Modify tibble in place

I'm looking for some efficiency advice. I am doing a simulation where I want to collect results in a tibble, tibbles being cool. The results of each simulation trial will go in one row of the tibble.

Everything I know how to do results in the entire tibble being rewritten each time I modify a row. That can burn a lot of computer time. I realize this is basically how R is designed, but is there any (relatively easy) way around this?

Also, does it save time if I do something like myTibble$myVariable[17] <- 3.14159 and so on for each column rather than replacing a row?

I wonder if you are looking for the new replace_values() function from dplyr 1.2.0!

library(tibble)
library(dplyr)

trial_data <- tibble(
  trial = c(1, 2, 3, 4, 5),
  results = c(4.2, 5.3, 8.1, 1.2, 2.2)
)

trial_data
#> # A tibble: 5 × 2
#>   trial results
#>   <dbl>   <dbl>
#> 1     1     4.2
#> 2     2     5.3
#> 3     3     8.1
#> 4     4     1.2
#> 5     5     2.2

trial_data |> 
  mutate(results = results |> 
           replace_values(
             5.3 ~ 1.0
           )
  )
#> # A tibble: 5 × 2
#>   trial results
#>   <dbl>   <dbl>
#> 1     1     4.2
#> 2     2     1  
#> 3     3     8.1
#> 4     4     1.2
#> 5     5     2.2

Created on 2026-03-18 with reprex v2.1.1

I didn't know about replace_values(), so that might be very helpful. Thanks! But let me ask two questions:
(1) Is looking up the matching value going to take enough time to offset efficiency gains?
(2) Is there a way to specify a row rather than look for a match?

My suggestion would be to collect the observations in a data frame and convert it to a tibble after the simulation is done. Assuming you have a fixed number of trials in mind, you can define the initial data frame with one row per observation and NA values in the columns, then overwrite each row as a new observation comes in.
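A minimal sketch of what I mean, with placeholder column names and trial count:

```r
n_trials <- 1000  # placeholder: however many trials you plan to run

# Pre-allocate one row per trial, with NA placeholders in the result columns
results <- data.frame(trial = seq_len(n_trials), value = NA_real_)

for (i in seq_len(n_trials)) {
  results$value[i] <- rnorm(1)  # stand-in for one simulation trial's result
}

results_tbl <- tibble::as_tibble(results)  # convert once, at the end
```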

Thanks, Paul. I don't think that works, because data frames also get rewritten...usually. But investigating it led me to some evidence that one can use data.table to get around the problem.
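For example, something along these lines with data.table's set(), which assigns by reference rather than copying (the column names and values here are just illustrative):

```r
library(data.table)

dt <- data.table(trial = 1:5, results = NA_real_)

for (i in 1:5) {
  # set() updates the cell by reference; the table itself is not copied
  set(dt, i = i, j = "results", value = i * 1.1)
}
```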

If all your results are of the same type, I'd guess one of the most idiomatic approaches for in-place modification would be a pre-allocated matrix, updating its rows, columns, or index ranges as results come in.

Updating mat's values in place does not trigger copy-on-write, and the object's location in memory remains constant:

mat <- matrix(NA_real_, 5, 5)
for (m in 1:5) {
  mat[m, ] <- m * 1:5
  print(lobstr::obj_addr(mat))
}

#> [1] "0x158346caf00"
#> [1] "0x158346caf00"
#> [1] "0x158346caf00"
#> [1] "0x158346caf00"
#> [1] "0x158346caf00"
mat
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    2    4    6    8   10
#> [3,]    3    6    9   12   15
#> [4,]    4    8   12   16   20
#> [5,]    5   10   15   20   25

But maybe the bottleneck of your actual task is somewhere else and perhaps you'd benefit more from parallel processing, assuming the nature of that specific simulation allows it.
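If the trials are independent, a rough sketch with base R's parallel package could look like the following (the trial function and trial count are placeholders):

```r
library(parallel)

one_trial <- function(i) {
  mean(rnorm(100))  # stand-in for a single simulation trial
}

cl <- makeCluster(detectCores() - 1)  # leave one core free
results <- parSapply(cl, 1:1000, one_trial)
stopCluster(cl)
```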


Thank you @margusl. That makes sense to me.

In fact, I had Codex rewrite my code to run in parallel and got a big speedup. But it also made some other changes, so I will take this as a hint to run the profiler on my code and see what the bottlenecks really are.