I have a list of about 200 tibbles, each 50 x 3, that takes up a huge amount of memory (over 2 GB).
I did some detective work and found that these tibbles carry a lot of information in the na.action attribute: a vector of about 700,000 elements, while the tibble itself has only 50 rows.
In other words, the data in each tibble is about 1 KB, but its na.action attribute is about 45 MB.
Is there a way to clear this out? How did this happen?
This honestly seems like a problem. In this case, I have 200 50-row tibbles that are all filtered subsets of a master tibble with a million rows. I guess the na.action attribute of the 1 million-row tibble remains in all the 50-row tibbles?
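For anyone who wants to see the mechanism, here is a minimal sketch of what I think is going on. The data frame `master` and its sizes are made up for illustration (1,000 rows instead of a million), and I am assuming the subsets were taken after calling na.omit() on the master data. As far as I can tell, na.omit() records the indices of every dropped row in the na.action attribute, and row subsetting copies that attribute along unchanged.

```r
# Hypothetical master data frame: 1,000 rows, 700 of which contain an NA
set.seed(1)
master <- data.frame(id = 1:1000, x = rnorm(1000))
master$x[sample(1000, 700)] <- NA

# na.omit() drops the incomplete rows and stores their indices
# in the "na.action" attribute of the result
cleaned <- na.omit(master)
dropped <- attr(cleaned, "na.action")
length(dropped)   # one entry per dropped row
class(dropped)    # "omit"

# A 50-row subset still carries the full record of dropped rows
small <- cleaned[1:50, ]
nrow(small)
length(attr(small, "na.action"))

# Compare the size of the subset with the size of the attribute alone
object.size(small)
object.size(attr(small, "na.action"))
```

With a million-row master, every small subset would drag the same large index vector along with it.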
Not at all an edge case. I wonder if this problem is an unrecognized gremlin in a lot of R code.
This might be a bug in na.omit(), since the behavior shows up with tibbles and data frames alike. If you use the tidyverse equivalent, tidyr::drop_na(), the problem goes away.
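To clear the attribute from objects you already have, something like the sketch below should work. The helper name `strip_na_action` and the toy data are my own, not part of any package. The second half shows a base R way to drop incomplete rows without creating the attribute in the first place, using complete.cases().

```r
# Hypothetical helper: remove the "na.action" attribute from one data frame
strip_na_action <- function(df) {
  attr(df, "na.action") <- NULL
  df
}

# Toy data with missing values
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# na.omit() leaves the attribute behind
with_attr <- na.omit(df)
is.null(attr(with_attr, "na.action"))   # FALSE

# Apply the helper to every element of a list of data frames or tibbles
pieces <- list(with_attr, with_attr)
pieces <- lapply(pieces, strip_na_action)
is.null(attr(pieces[[1]], "na.action")) # TRUE

# Base R alternative that never records the dropped rows
no_attr <- df[complete.cases(df), ]
is.null(attr(no_attr, "na.action"))     # TRUE
```

Setting the attribute to NULL only changes the metadata; the rows and columns are left as they were.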
I think it would be worth formally reporting this issue. If the bug is in the stats::na.omit() function, you would need to follow the instructions in "R: Bug Reporting in R".