Multiple df using tidyverse

hawken1 · September 8, 2020, 4:32pm

I want to eventually combine multiple dataframes into one big dataframe. However, I want the identity of each dataframe to remain intact, so I was thinking of using map_dfr to add a new column (with the header "Day_") to each file and then somehow combining the files into one big dataframe, so I can run the rest of the code later. How might I go about doing this?

nirgrahamuk · September 8, 2020, 4:46pm

This can be done conveniently with dplyr.
I don't have your CSVs, so we'll use the iris dataset to generate an example.


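library(tidyverse)
# example frames: one row of iris each
df_1 <- slice(iris, 1)
df_2 <- slice(iris, 2)
df_3 <- slice(iris, 3)

# names of all workspace objects that start with "df_"
(getnames <- ls(pattern = "^df_"))

# fetch each frame by name and keep the names on the list
as_a_list <- map(getnames, ~ get(.)) %>%
  set_names(getnames)

# binding and recording source: the list names fill the "dfsource" column
(result_df <- bind_rows(as_a_list, .id = "dfsource"))
# cleanup
rm(list = getnames)
rm(getnames)
rm(as_a_list)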
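
For what it's worth, the "collect by name, then bind" step can probably be shortened with base R's mget(), which returns a named list directly. This is only a sketch on the same iris example; the name frames is made up for illustration, and it assumes the data frames live in the environment the code is run from.

```r
library(dplyr)

# example frames, as in the reply above
df_1 <- slice(iris, 1)
df_2 <- slice(iris, 2)
df_3 <- slice(iris, 3)

# mget() looks up each name and returns a list named after the objects
frames <- mget(ls(pattern = "^df_"))

# .id copies the list names into a column that identifies the source frame
result_df <- bind_rows(frames, .id = "dfsource")
result_df$dfsource
```

The list names do the bookkeeping here, so no separate set_names() call is needed.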

martin.R · September 8, 2020, 8:46pm

A slightly more concise solution:

files <- list.files(pattern = "df_") # adapt as required
result_df <- purrr::map_dfr(files, read.csv, .id = "file")

hawken1 · September 8, 2020, 9:38pm

Hello, thanks for the response. When I try this, "files" comes back as an empty character vector, and result_df is empty as well: it says 0 observations of 0 variables.

martin.R · September 8, 2020, 9:40pm

Sorry, I was copying the pattern from the previous reply's example, which does not match your file names.

Try this:

files <- list.files(pattern = "Day") # adapt as required
result_df <- purrr::map_dfr(files, read.csv, .id = "file")
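
If "files" still comes back empty, the usual suspects are the working directory and the pattern, since list.files() only looks in the current working directory unless a path is given. Below is a small self-contained sketch of how one might check this. The folder day_files and the names Day_1.csv and Day_2.csv are invented for the demo and written to a temporary directory; they are not the original poster's files.

```r
# Hypothetical folder and file names, created only for this illustration
data_dir <- file.path(tempdir(), "day_files")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1), file.path(data_dir, "Day_1.csv"), row.names = FALSE)
write.csv(data.frame(x = 2), file.path(data_dir, "Day_2.csv"), row.names = FALSE)

# list.files() searches this directory when no path is supplied
getwd()

# Point at the folder explicitly; the pattern is a regular expression
# matching names that start with "Day_" and end in ".csv"
files <- list.files(path = data_dir, pattern = "^Day_.*\\.csv$", full.names = TRUE)

# A length of zero means nothing matched: check the path and the pattern
length(files)
basename(files)
```

Using full.names = TRUE means read.csv() receives complete paths, so the later import step does not depend on the working directory.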

nirgrahamuk · September 9, 2020, 9:17am

I think map_dfr is similar to bind_rows in that, by default, the ID column is just a numeric index, which might be all that is needed in many cases.
Admittedly my example is a little overloaded / inelegant, but that is to allow for 'better naming' in the resulting dataset.
I'm thinking that perhaps in your map_dfr approach something like

names(files) <- files

before the map_dfr step would cover that, since the ID column is then filled from the names of files.
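
To make the naming idea concrete, here is a minimal sketch that writes two throwaway CSVs to a temporary folder and reads them back. The folder named_files_demo, the file names and the value column are all invented for illustration. Stripping the extension with tools::file_path_sans_ext() is an extra step beyond the suggestion above, added only so the ID column reads "Day_1" rather than a full path.

```r
library(purrr)  # map_dfr() also needs dplyr to be installed

# Hypothetical demo files in a temporary folder
data_dir <- file.path(tempdir(), "named_files_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(value = 1:2), file.path(data_dir, "Day_1.csv"), row.names = FALSE)
write.csv(data.frame(value = 3:4), file.path(data_dir, "Day_2.csv"), row.names = FALSE)

files <- list.files(data_dir, pattern = "^Day_", full.names = TRUE)

# Name each path after its file; without names, .id is a numeric index
names(files) <- tools::file_path_sans_ext(basename(files))

# Each row now carries the name of the file it came from
result_df <- map_dfr(files, read.csv, .id = "file")
result_df
```

Because the source is now an ordinary column, later steps such as group_by(file) can treat each day separately.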

martin.R · September 9, 2020, 10:09am

map_dfr() just combines map() and bind_rows() into one step.
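
A quick way to see this is to run both forms on the same input and compare them. The sketch below uses two slices of iris and head() as a stand-in for whatever function is being mapped; the names frames and src are made up. The last call assumes purrr 1.0.0 or later, where map_dfr() is marked as superseded and map() followed by list_rbind() is the suggested replacement.

```r
library(purrr)
library(dplyr)

# Two small named frames to combine
frames <- list(a = iris[1:3, ], b = iris[4:6, ])

# One step: map and row-bind together
one_step <- map_dfr(frames, head, n = 1, .id = "src")

# Two steps: map first, then bind the resulting list
two_step <- bind_rows(map(frames, head, n = 1), .id = "src")

identical(one_step, two_step)

# Assumes purrr >= 1.0.0: the newer spelling of the same idea
newer <- list_rbind(map(frames, head, n = 1), names_to = "src")
newer$src
```

Either way, the names of the list end up in the ID column, which is what keeps each source identifiable after binding.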
