Exporting ggplot to PDF from facet

Below is a subset of my data frame (Infil_Data). What I am looking to do is save each scatter plot as a separate page in a PDF. In this case I would have a PDF called "new_file": the first page would contain the scatter plot for A1 and the second page the scatter plot for A2. I have been reading "R for Data Science" by Garrett Grolemund and Hadley Wickham and have come up with the code below, but I am not sure how to save the individual plots:

ggplot(data = Infil_Data) + geom_point(mapping = aes(x = Sqrt_Time.x, 
    y = Calc_Vol_cm)) + facet_wrap(~Site.ID, nrow = 10)


tibble::tribble(
  ~Time.x, ~Site.ID, ~Sqrt_Time.x, ~Calc_Vol_cm,
       0L,     "A1",            0,           0L,
      30L,     "A1",  5.477225575,           1L,
      60L,     "A1",  7.745966692,           2L,
      90L,     "A1",  9.486832981,           3L,
     120L,     "A1",  10.95445115,           4L,
     150L,     "A1",  12.24744871,           5L,
     180L,     "A1",  13.41640786,           6L,
     210L,     "A1",  14.49137675,           7L,
     240L,     "A1",  15.49193338,           8L,
     270L,     "A1",  16.43167673,           9L,
     300L,     "A1",  17.32050808,          10L,
       0L,     "A2",            0,           0L,
      30L,     "A2",  5.477225575,           2L,
      60L,     "A2",  7.745966692,           4L,
      90L,     "A2",  9.486832981,           6L,
     120L,     "A2",  10.95445115,           8L,
     150L,     "A2",  12.24744871,          10L,
     180L,     "A2",  13.41640786,          12L,
     210L,     "A2",  14.49137675,          14L,
     240L,     "A2",  15.49193338,          16L,
     270L,     "A2",  16.43167673,          18L,
     300L,     "A2",  17.32050808,          20L
  )
#> # A tibble: 22 x 4
#>    Time.x Site.ID Sqrt_Time.x Calc_Vol_cm
#>     <int> <chr>         <dbl>       <int>
#>  1      0 A1             0              0
#>  2     30 A1             5.48           1
#>  3     60 A1             7.75           2
#>  4     90 A1             9.49           3
#>  5    120 A1            11.0            4
#>  6    150 A1            12.2            5
#>  7    180 A1            13.4            6
#>  8    210 A1            14.5            7
#>  9    240 A1            15.5            8
#> 10    270 A1            16.4            9
#> # ... with 12 more rows

I would suggest against using facet_wrap() if you want each plot on a different page. Instead, we can use purrr::map() to make a list of plots and then print each one to a .pdf file. A for loop would work for the printing step too, but purrr::walk() does the same job in one line.

library(tidyverse)

Infil_Data <- tibble::tribble(
  ~Time.x, ~Site.ID, ~Sqrt_Time.x, ~Calc_Vol_cm,
  0L,     "A1",            0,           0L,
  30L,     "A1",  5.477225575,           1L,
  60L,     "A1",  7.745966692,           2L,
  90L,     "A1",  9.486832981,           3L,
  120L,     "A1",  10.95445115,           4L,
  150L,     "A1",  12.24744871,           5L,
  180L,     "A1",  13.41640786,           6L,
  210L,     "A1",  14.49137675,           7L,
  240L,     "A1",  15.49193338,           8L,
  270L,     "A1",  16.43167673,           9L,
  300L,     "A1",  17.32050808,          10L,
  0L,     "A2",            0,           0L,
  30L,     "A2",  5.477225575,           2L,
  60L,     "A2",  7.745966692,           4L,
  90L,     "A2",  9.486832981,           6L,
  120L,     "A2",  10.95445115,           8L,
  150L,     "A2",  12.24744871,          10L,
  180L,     "A2",  13.41640786,          12L,
  210L,     "A2",  14.49137675,          14L,
  240L,     "A2",  15.49193338,          16L,
  270L,     "A2",  16.43167673,          18L,
  300L,     "A2",  17.32050808,          20L
)

plot_lst <- Infil_Data %>% 
  split(.$Site.ID) %>% 
  map(~ggplot(data = .x) + 
        geom_point(mapping = aes(x = Sqrt_Time.x,  y = Calc_Vol_cm)))

pdf("allplots.pdf",onefile = TRUE)
walk(plot_lst, print)
dev.off()
#> quartz_off_screen 
#>                 2

Created on 2019-01-16 by the reprex package (v0.2.1)

What we did was split the data into a list of tibbles by the Site.ID column (the one you were originally using in facet_wrap()). We then pass each tibble to ggplot() through map(), which gives us a list of plots, and finally print those plots one at a time to the .pdf, so each plot gets its own page.
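For comparison, here is a sketch of the same pipeline with a plain for loop instead of walk(), plus a quick look at what split() hands to map(). It uses a small hypothetical stand-in for Infil_Data (toy) and writes to a temporary file rather than allplots.pdf; both are assumptions for illustration only.

```r
library(ggplot2)
library(purrr)

# Hypothetical stand-in for Infil_Data: two sites, three readings each
toy <- data.frame(
  Site.ID     = rep(c("A1", "A2"), each = 3),
  Sqrt_Time.x = rep(c(0, 5.48, 7.75), times = 2),
  Calc_Vol_cm = c(0, 1, 2, 0, 2, 4)
)

# split() returns a named list: one data frame per Site.ID
site_lst <- split(toy, toy$Site.ID)
names(site_lst)   # "A1" "A2"

# One ggplot object per site
plot_lst <- map(site_lst, ~ ggplot(data = .x) +
                  geom_point(mapping = aes(x = Sqrt_Time.x, y = Calc_Vol_cm)))

# Open a multi-page PDF device; each printed plot starts a new page
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, onefile = TRUE)
for (p in plot_lst) {
  print(p)
}
invisible(dev.off())
```

The loop and walk(plot_lst, print) do the same thing here; walk() is simply the purrr way of calling a function for its side effect (drawing on the open device) and discarding the return value.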


I'm ashamed to say that I skipped right over your post and then arrived at exactly the same solution. Out of interest, does it still work if the loop is replaced with walk()?

Yes, that works! purrr can be such a lifesaver for problems like this.


I just want to make sure I understand some of the syntax you are using:

map(~ggplot(data = .x)
I read a little about "~", but I don't completely understand it. And what about ".x"? Is it the same as entering the tibble/data frame name (Infil_Data)?

split(.$Site.ID)
Does the "." also represent the tibble/dataframe (Infil_Data)?


In the context of map(), ~ is shorthand for function(.x) { ... }, so it saves a few keystrokes when writing functions for map(). For example, these two statements are equivalent:

map(split(Infil_Data, Infil_Data$Site.ID), function(x) {
  ggplot(data = x) +
    geom_point(mapping = aes(x = Sqrt_Time.x, y = Calc_Vol_cm))
})

vs.

map(split(Infil_Data, Infil_Data$Site.ID), ~ggplot(data = .x) +
      geom_point(mapping = aes(x = Sqrt_Time.x, y = Calc_Vol_cm)))

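.x only exists inside the ~ shorthand: it is the name purrr gives to the current element that map() passes to the function. So yes, it represents what you are passing in through map(), but after split() that is one Site.ID's tibble at a time rather than the whole of Infil_Data.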
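One way to see what ~ turns into is purrr::as_mapper(), the helper map() uses to convert its .f argument into a function. The sketch below uses toy numbers rather than the plotting code and simply checks that the formula and the anonymous function give the same result.

```r
library(purrr)

# The formula ~ .x * 2 is converted into a function of .x
double_it <- as_mapper(~ .x * 2)
double_it(21)   # 42

# These two calls are equivalent
with_fun     <- map_dbl(1:3, function(x) x * 2)
with_formula <- map_dbl(1:3, ~ .x * 2)
identical(with_fun, with_formula)   # TRUE
```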

split(.$Site.ID)
Does the "." also represent the tibble/dataframe (Infil_Data)?

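Yup. With %>%, the . is a placeholder for whatever is piped in, here Infil_Data. split() is a base R function rather than a dplyr verb, so unlike filter(), group_by(), or summarise() it can't look up a bare column name like Site.ID inside the data; .$Site.ID pulls the column out of the piped data explicitly.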
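To check that . really is the piped-in data, the sketch below compares the piped call with the explicit one on a small made-up data frame (toy is hypothetical, standing in for Infil_Data).

```r
library(magrittr)  # provides %>%; also attached by library(tidyverse)

toy <- data.frame(
  Site.ID     = c("A1", "A1", "A2"),
  Calc_Vol_cm = c(0, 1, 2)
)

# With %>%, `.` stands for the left-hand side (toy)
piped    <- toy %>% split(.$Site.ID)
explicit <- split(toy, toy$Site.ID)

identical(piped, explicit)   # TRUE
```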


Great, thank you for the clarification. One last question: I want to give each plot a title corresponding to its Site.ID. How do I do that? I tried the following, but it did not work:

geom_point(mapping = aes(x = Sqrt_Time.x, y = Calc_Vol_cm, main = .$Site.ID))

I lied, I do have another question. I tried the code below and received an error (Can't convert a data.frame object to function). When I re-entered what you had in your first post, the error went away, so I am confused about how the two are equivalent.

map(Infil_Data, function(x){
  ggplot(data = x) +
    geom_point(mapping = aes(x = Sqrt_Time.x, y = Calc_Vol_cm))
})

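Titles aren't aesthetics, so they belong in labs() (or ggtitle()) rather than aes(). To put each Site.ID in its title, you can change map() to map2() and pass in the names of the split list as a second input: split() names each tibble after its Site.ID, and inside the formula that name is available as .y.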

library(tidyverse)

Infil_Data <- tibble::tribble(
  ~Time.x, ~Site.ID, ~Sqrt_Time.x, ~Calc_Vol_cm,
  0L,     "A1",            0,           0L,
  30L,     "A1",  5.477225575,           1L,
  60L,     "A1",  7.745966692,           2L,
  90L,     "A1",  9.486832981,           3L,
  120L,     "A1",  10.95445115,           4L,
  150L,     "A1",  12.24744871,           5L,
  180L,     "A1",  13.41640786,           6L,
  210L,     "A1",  14.49137675,           7L,
  240L,     "A1",  15.49193338,           8L,
  270L,     "A1",  16.43167673,           9L,
  300L,     "A1",  17.32050808,          10L,
  0L,     "A2",            0,           0L,
  30L,     "A2",  5.477225575,           2L,
  60L,     "A2",  7.745966692,           4L,
  90L,     "A2",  9.486832981,           6L,
  120L,     "A2",  10.95445115,           8L,
  150L,     "A2",  12.24744871,          10L,
  180L,     "A2",  13.41640786,          12L,
  210L,     "A2",  14.49137675,          14L,
  240L,     "A2",  15.49193338,          16L,
  270L,     "A2",  16.43167673,          18L,
  300L,     "A2",  17.32050808,          20L
)

plot_lst <- Infil_Data %>% 
  split(.$Site.ID) %>% 
  map2(names(.), ~ggplot(data = .x) + 
        geom_point(mapping = aes(x = Sqrt_Time.x,  y = Calc_Vol_cm)) + 
        labs(title = .y))

pdf("allplots.pdf",onefile = TRUE)
walk(plot_lst, print)
dev.off()
#> quartz_off_screen 

Created on 2019-01-16 by the reprex package (v0.2.1)
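split() keeps the Site.ID column in every piece, so another option is to read the title from the data itself. purrr::imap() is a further alternative: it passes each element's name as .y, the same idea as map2(names(.), ...). A sketch on a hypothetical toy data frame:

```r
library(ggplot2)
library(purrr)

# Hypothetical stand-in for Infil_Data
toy <- data.frame(
  Site.ID     = rep(c("A1", "A2"), each = 2),
  Sqrt_Time.x = c(0, 5.48, 0, 5.48),
  Calc_Vol_cm = c(0, 1, 0, 2)
)

site_lst <- split(toy, toy$Site.ID)

# Site.ID is still present in every piece after split()
ids_in_data <- map_chr(site_lst, ~ unique(.x$Site.ID))

# Option 1: take the title from the data
plots_a <- map(site_lst, ~ ggplot(.x, aes(Sqrt_Time.x, Calc_Vol_cm)) +
                 geom_point() +
                 labs(title = unique(.x$Site.ID)))

# Option 2: imap() passes the list name as .y
plots_b <- imap(site_lst, ~ ggplot(.x, aes(Sqrt_Time.x, Calc_Vol_cm)) +
                  geom_point() +
                  labs(title = .y))
```

Both lists keep the Site.ID names, so either can be printed to the PDF with walk(plots_a, print) exactly as before.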


Also, make sure to mark the solution so that other people can use this thread in the future and know what the answer to the problem was.


Regarding the "Can't convert a data.frame object to function" error:

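Make sure you include the split(.$Site.ID) step. Without it, map() receives the whole data frame and iterates over its columns rather than its sites. And if the pipe is kept but Infil_Data is also passed to map(), the piped-in list becomes map()'s first argument and Infil_Data lands in the slot where map() expects a function, which is what produces that error.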
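To see why the split step matters, the sketch below (on a hypothetical toy data frame) shows that map() over an unsplit data frame walks over its columns, and that a data frame in the function slot fails. The exact error wording differs between purrr versions, so the sketch only checks that an error occurs.

```r
library(purrr)
library(magrittr)

# Hypothetical stand-in for Infil_Data
toy <- data.frame(
  Site.ID     = c("A1", "A1", "A2"),
  Calc_Vol_cm = c(0, 1, 2)
)

# A data frame is a list of columns, so map() iterates over columns, not sites
col_classes <- map_chr(toy, class)
names(col_classes)   # "Site.ID" "Calc_Vol_cm"

# Piping into map(toy, ...) makes the split list the first argument,
# which pushes toy into the .f (function) slot and raises an error
res <- tryCatch(
  toy %>% split(.$Site.ID) %>% map(toy, function(x) nrow(x)),
  error = function(e) e
)
inherits(res, "error")   # TRUE
```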


Is there any reason why split() put the Site.IDs out of order? For example, if I had A1, A2, A3, and A4, I would expect my PDF pages to be plot A1, plot A2, plot A3, and plot A4. Instead I get (for example) plot A3, plot A2, plot A1, and plot A4.

Maybe try arranging by Site.ID before plotting (this is also yet another way to do it):

plot_lst <- Infil_Data %>%
    group_by(Site.ID) %>%
    arrange(Site.ID) %>% 
    do(plots = ggplot(., aes(x = Sqrt_Time.x,  y = Calc_Vol_cm)) + 
           geom_point() + 
           labs(title = unique(.$Site.ID))) %>% 
    .$plots

pdf("allplots.pdf",onefile = TRUE)
walk(plot_lst, print)
dev.off()

This works as well, but the Site.IDs are still out of order.
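One likely explanation, assuming Site.ID is a factor: split() and group_by() order the groups by the factor levels, not by row order, and arrange() on a factor also sorts by level order, so arranging alone won't change the page order. The sketch below uses hypothetical data whose levels are deliberately scrambled, and resets the levels before splitting.

```r
# Hypothetical data: Site.ID is a factor whose levels are in an unwanted order
toy <- data.frame(
  Site.ID     = factor(c("A1", "A2", "A3", "A4"),
                       levels = c("A3", "A2", "A1", "A4")),
  Calc_Vol_cm = 1:4
)

# split() follows the factor levels, so the list (and the PDF pages) come out A3, A2, A1, A4
before <- names(split(toy, toy$Site.ID))
before

# Reset the levels to sorted order before splitting
toy$Site.ID <- factor(toy$Site.ID, levels = sort(levels(toy$Site.ID)))
after <- names(split(toy, toy$Site.ID))
after
```

If Site.ID is a plain character column, split() sorts the names alphabetically instead, which also means a site like "A10" would come before "A2".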

Sorry to bring this up again. I tried your solution for the same problem and it runs, but in my case all the plots are identical; it looks like my code only takes the first plot and repeats it for the others. In terms of the example in this post, it is as if every plot showed A1, ignoring A2, A3, and so on. Can you help me understand what is happening? Thanks in advance.

It's hard to help without a reprex. Please open a new topic with a REPRoducible EXample (reprex) illustrating your issue.