Write multiple Dataframes to individual CSV (purr?)

I have many data frames that I would like to write out as csv. I have created a list to hold all of these data frames. But nothing is saved to my directory when I run the code and there is no error... I was following the example found here. Here is an example of what I am doing:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.3.3
#> Warning: package 'ggplot2' was built under R version 4.3.3
#> Warning: package 'tibble' was built under R version 4.3.3
#> Warning: package 'tidyr' was built under R version 4.3.3
#> Warning: package 'readr' was built under R version 4.3.3
#> Warning: package 'purrr' was built under R version 4.3.3
#> Warning: package 'dplyr' was built under R version 4.3.3
#> Warning: package 'stringr' was built under R version 4.3.3
#> Warning: package 'forcats' was built under R version 4.3.3
#> Warning: package 'lubridate' was built under R version 4.3.3

data1 <- data.frame(
  stringsAsFactors = FALSE,
                           ind = c("PctOverheadCover","PctBankOverheadCover",
                                   "VegComplexity","VegComplexityWoody",
                                   "VegComplexityUnderstoryGround","PctSedgeRushSpecies"),
          mean_val = c(16.2, 42.67, 1.015, 0.642, 0.64, 20),
                        sd_val = c(32.5904553171289,33.0573256295451,
                                   0.511886489153384,0.419014717323071,0.417745270602925,
                                   42.1637021355784),
                         nSize = c(5004.78205768309,742.783031768358,315.157023002379,
                                   527.378676333543,527.469202138303,
                                   5495.9981326272)
            )

data2 <- data.frame(
  stringsAsFactors = FALSE,
                           ind = c("PctOverheadCover","PctBankOverheadCover",
                                   "VegComplexity","VegComplexityWoody",
                                   "VegComplexityUnderstoryGround","PctSedgeRushSpecies"),
                      mean_val = c(39.0777202072539,73.860621761658,1.49690721649485,
                                   0.90840206185567,0.847680412371134,
                                   55.1744186046512),
                        sd_val = c(32.5924401859615,24.199397453478,0.405901697751818,
                                   0.360342415157486,0.343811734432535,
                                   45.9780489467759),
                         nSize = c(860.781149059716,133.408427528465,91.5972947463982,
                                   195.238406512757,204.080612429232,
                                   859.297538630485)
            )

dfs<- list(data1, data2)

dfs %>% 
  names(.) %>% 
  map(~ write_csv(dfs[[.]], paste0("C:/Users/", ., ".csv")))
#> list()

dfs %>% 
  names(.) %>%  
  walk(~ write_csv(dfs[[.]], paste0("C:/Users/", ., ".csv")))

I appreciate any feedback and help! I am trying to use a tidyverse approach here.

Notice the example you were following uses a named list. This code works to write the files to the working directory.

dfs<- list(data1 = data1, data2 = data2)

dfs %>% 
  names(.) %>% 
  map(~ write_csv(dfs[[.]], paste0( ., ".csv")))

Is there a more efficient way to do this? Having to write out a named list if you have 50+ dataframes that need saved out seems less than ideal. I am open to suggestions..

Thanks!

It is likely there is a more efficient way to get the names assigned to the files but the details depend on what you need the names to be and how those relate to the names of the data frames.
Can the names be gathered from the process of making the data frames?
Is there a pattern to the desired file names so they can be constructed with code?

Great questions!

The dataframes are created through a filter() of the data. In the filter I use grepl as some values are unique but they share an element and I want them combined into one data frame. For example I could have data on various animals and there is a column based on generic species names (Dog, Cat, Mouse, Bird) and some of the values in that column are Dogsmall, Doglarge and I would seek to filter using grepl("Dog", Species) to make sure I had all records for "dog". Below is an example of creating dataframes and naming them. Ideally I woud like to save out these data frames into csv's as stated in the original question.

# Cat dataframe
Catrange <- mast_dat |> 
  filter(grepl("Cat", Species) & PointSelectionType == "RandomGRTS") |> 
  select(all_of(metrics)) |> 
  pivot_longer(
    !c(EvaluationID,Date), 
    names_to = "ind", values_to = "value", values_transform = 
      list(value = as.double))|> 
  summarise(mean_val = mean(value, na.rm=T), sd_val = sd(value, na.rm=T), .by = ind)
Catrange[Catrange==0] <- NA #replace 0s with NA
Catrange_f <- na.omit(Catrange) # This is the final data frame 

#dog data frame
Dogrange <- mast_dat |> 
  filter(grepl("Dog", Species) & PointSelectionType == "RandomGRTS") |> 
  select(all_of(metrics)) |> 
  pivot_longer(
    !c(EvaluationID,Date), 
    names_to = "ind", values_to = "value", values_transform = 
      list(value = as.double))|> 
  summarise(mean_val = mean(value, na.rm=T), sd_val = sd(value, na.rm=T), .by = ind)
Dogrange[Dogrange==0] <- NA #replace 0s with NA
Dogrange_f <- na.omit(Dogrange) # This is the final data frame 

Hopefully this provides you with enough information!
Created on 2024-04-05 with reprex v2.1.0

Thanks for posting the code. It helps a lot.
Since your processing for Cat and Dog is identical except for the word used for filtering, I would write a function to do the data processing and use map() to iterate over the words. I think the following will work but I don't have data to test it with.

library(tidyverse)
DataProcess <- function(Txt, DF) {
  tmp <- DF |> 
    filter(grepl(Txt, Species) & PointSelectionType == "RandomGRTS") |> 
    select(all_of(metrics)) |> 
    pivot_longer(
      !c(EvaluationID,Date), 
      names_to = "ind", values_to = "value", values_transform = 
        list(value = as.double))|> 
    summarise(mean_val = mean(value, na.rm=T), sd_val = sd(value, na.rm=T), .by = ind)
  tmp[tmp==0] <- NA #replace 0s with NA
  tmp <- na.omit(tmp) # This is the final data frame 
  return(tmp)
}

Words <- c("Cat", "Dog")
OutList <- map(Words, \(Targ) DataProcess(Txt = Targ, DF = mast_dat))
names(OutList) <- Words

OutList %>% 
  names(.) %>% 
  map(~ write_csv(OutList[[.]], paste0( ., ".csv")))
1 Like

By the way, you could build the write.csv() into the function, but I kept them separate in case you need to do more with the data frames than is shown in your code.

Thanks! That worked great and as you alluded to I DO need to do a little more with the dataframes. I have very little experience building functions but I think I could copy the pervious answer you gave and create a function for the code below and then map the function to the outlist again? Or would I need to make another list (like outlist) that applies the new function to the "Words" ?

delta <- 0.10   # Detectable change (30% of the mean)
alpha <- 0.10   # Significance level
power <- 0.8    # Desired power

Cat_size <-
Cat |> 
  mutate(nSize = pwr.t.test(d = ((delta*mean_val) / sd_val), 
                            power = power, 
                            sig.level = alpha, 
                            type = "two.sample", 
                            alternative = "two.sided")$n, .by = ind
  )

Dog_size <-
  Dog |> 
  mutate(nSize = pwr.t.test(d = ((delta*mean_val) / sd_val), 
                            power = power, 
                            sig.level = alpha, 
                            type = "two.sample", 
                            alternative = "two.sided")$n, .by = ind
  )

## Here is my attempt at making the above into a function. I am not sure how to map the ##"Words" to the outlist so that each dataframe in the outlist is piped into the function

powertest <- function(delta, alpha, power, DF){
  tmp <- DF |> 
    mutate(nSize = pwr.t.test(d = ((delta*mean_val) / sd_val), 
                              power = power, 
                              sig.level = alpha, 
                              type = "two.sample", 
                              alternative = "two.sided")$n, .by = ind
    )
  return(tmp)
}
powerlist <- map(Words, \(Targ) powertest( delta = delta, alpha = alpha, power = power, DF = ))

Can you explain the relationship between the Txt input in the function, Words, and (targ)?

library(tidyverse)
DataProcess <- function(Txt, DF) {
  tmp <- DF |> 
    filter(grepl(Txt, Species) & PointSelectionType == "RandomGRTS") |> 
    select(all_of(metrics)) |> 
    pivot_longer(
      !c(EvaluationID,Date), 
      names_to = "ind", values_to = "value", values_transform = 
        list(value = as.double))|> 
    summarise(mean_val = mean(value, na.rm=T), sd_val = sd(value, na.rm=T), .by = ind)
  tmp[tmp==0] <- NA #replace 0s with NA
  tmp <- na.omit(tmp) # This is the final data frame 
  return(tmp)
}

Words <- c("Cat", "Dog")
OutList <- map(Words, \(Targ) DataProcess(Txt = Targ, DF = mast_dat))
names(OutList) <- Words

Created on 2024-04-05 with reprex v2.1.0

To understand the relationship between Words, Targ, and Txt. it is easiest to work from the inside out.

OutList <- map(Words, \(Targ) DataProcess(Txt = Targ, DF = mast_dat))

Txt is the argument of DataProcess() that receives the text that will be used to filter the data frame. You could run

DataProcess(Txt = "Cat", DF = mast_dat)

directly, after defining DataProcess().
The use of Targ might be obscured by the curious notation \(Targ). The \( )defines an anonymous function. That is simply a function that is not given a name. It does not need a name because it is only run inside of the map function.
Writing \(Targ) is the same as writing function(Targ) The code
\(Targ) DataProcess(Txt = Targ, DF = mast_dat)
accepts the argument Targ and passes that value to the Txt argument of DataProcess(). You might well ask "why do that instead of directly passing the value to DataProcess()". The only reason to do it is that DataProcess() accepts two arguments, Txt and DF. By nesting DataProcess() inside of the anonymous function, the value of DF is conveniently set. That is not the only way to set the value of DF, but I believe it is the preferred one.
Words holds the values that will be passed to Targ and Txt. The map() function iterates over Words, passing each value to the anonymous function.
You can build a similar arrangement for powertest() except that what changes is the value of DF. The OutList already provides the different data frames and the purpose of map() is to iterate over OutList and send each of its data frames to powertest(). A possible approach is

powerlist <- map(OutList, \(DatFrm) powertest( delta = delta, alpha = alpha, power = power, DF = DatFrm)

Each data frame in OutList will be passed to DatFrm and DatFrm is passed to the DF argument of powertest. I changed the argument name of the anonymous function to DatFrm to emphasize its purpose and distinguish it from the argument in the previous code. Using Targ as the name would have worked. As a matter of taste, I would make DF the first argument of powertest(), so it is easier to see where DatFrm gets passed.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.