I have about 4000 data frames, and for each one I need the percentage of NA values in a particular column. I found a solution in the topic "how to sum a specific column from multiple csv files", but I do not know how to adapt it to data frames rather than CSV files. I need the same kind of summary as shown in that topic. The topic is now closed, so I can't reply there.
Thanks
Here's the code with my minor adjustments:
library(tidyverse)
# load SAT files
path_load = "~/DATA_Rfiles/DAT_SAT/test"
# function to summarise file
sum_file = function(path = path_load){
  dat = load(path_load)
  tibble(file = path_load,
         sum = sum(is.na()/nrow()))
}
This will get you what you asked for. I would recommend you check out this article on nesting data as a thought starter. Getting comfortable with list-columns is really a game changer, and learning how to connect the {tidyr}, {dplyr} and {purrr} packages is a very useful skill to have.
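As a rough illustration of how {purrr} and {tibble} fit together for this kind of summary, here is a minimal sketch. It assumes the data frames are already in memory as a named list; the list `dfs`, the column name `value`, and the helper `pct_na()` are all hypothetical stand-ins for your own objects.

```r
library(purrr)
library(tibble)

# Hypothetical example data: a named list standing in for the ~4000 data frames.
dfs <- list(
  a = tibble(value = c(1, NA, 3, NA)),
  b = tibble(value = c(NA, 2)),
  c = tibble(value = c(5, 6, 7))
)

# Percentage of NA values in one column of one data frame.
# `col` is a placeholder for the real column name.
pct_na <- function(df, col = "value") {
  100 * mean(is.na(df[[col]]))
}

# Apply the helper to every data frame, then turn the named
# numeric vector into a two-column tibble (one row per data frame).
na_summary <- enframe(map_dbl(dfs, pct_na), name = "name", value = "pct_na")

print(na_summary)
```

Using `mean(is.na(x))` gives the share of missing values directly, so there is no need to divide a sum by `nrow()`.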
One point of clarification: strictly speaking, there is no such thing as an RStudio data frame. RStudio is simply the application you're using to interface with the R programming language.
Regarding which function to use to read in your data, I think you will want load() if you used save() to write the data out to your computer. Please see ?load for more details.
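One detail worth knowing is that load() restores objects into an environment and returns only their names, so assigning its result does not give you the data frame itself. A minimal sketch of one way to handle that, assuming each file holds exactly one object; the helper `load_one()`, the object `df_example`, and the column `value` are hypothetical:

```r
# Hypothetical setup: write one data frame to a temporary file with save().
df_example <- data.frame(value = c(1, NA, 3, NA))
path <- tempfile(fileext = ".RData")
save(df_example, file = path)

# Read a file written by save() and return the single object stored in it.
load_one <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # character vector of restored names
  stopifnot(length(obj_names) == 1)     # assumption: one object per file
  get(obj_names, envir = env)
}

dat <- load_one(path)

# Percentage of NA values in the (hypothetical) column `value`.
pct_na <- 100 * mean(is.na(dat$value))
print(pct_na)
```

Loading into a fresh environment also avoids overwriting objects of the same name in your workspace.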
In the future, I would recommend write_rds() from the {readr} package, as I believe that is the more common convention for saving your data to disk.
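For comparison, a minimal sketch of the write_rds() / read_rds() round trip, using a hypothetical data frame and a temporary file path:

```r
library(readr)

# Hypothetical data frame and temporary file path.
df_example <- data.frame(value = c(1, NA, 3))
path <- tempfile(fileext = ".rds")

# Write a single object to disk, then read it back under any name you like.
write_rds(df_example, path)
dat <- read_rds(path)
```

Unlike load(), read_rds() returns the object itself, so you choose the name it is assigned to.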
I recommend you check out this book to learn a little more about the fundamentals of R programming, since you said you're newer to R. It will help with the majority of common questions, such as yours about importing data and iterating over lists.