Using filenames to create a subset of data organized by months/years with PNG files

nburola · September 15, 2020, 6:38pm

Hey y'all, could use your feedback on a problem I've got. I have 176 PNG maps in a folder called "barca_monthly", with file names in a year_month_day format such as "Avg_monthly_2005_01_01". The data runs from 2005 to 2019, with one map for the 1st day of each month. My objective was to create a function that merges all of these files into a GIF, and I found out how to do that. However, I am having trouble selecting, for example, only the 2005 PNG files in that folder to create a GIF from. I worked around the problem by creating sub-folders of data, but it would be nice to do this in code.

One of my coworkers suggested using the file names of the PNG maps to create a subset based on the years/months that are wanted. Does anyone know how to do that? The filter function would not work here considering this is not a .csv file but rather 176 PNG files.

Appreciate the help!

AlexisW · September 15, 2020, 7:10pm

You can create a table. Here using tidyverse functions:

library(tidyverse)

DF <- tibble(filenames = list.files("my_dir")) %>%
  mutate(fields = strsplit(filenames, "_"),
         date_fields = map(fields, ~ .x[(length(.x)-2):length(.x)]),   # keep only the last 3 fields (since the number of initial fields is variable)
         year = map_chr(date_fields, ~ .x[1]),                         # extract the date information
         month = map_chr(date_fields, ~ .x[2]),
         day = map_chr(date_fields, ~ .x[3]),
         year = as.integer(year),                                      # convert to integer, you could keep as character or convert to factor if more practical
         month = as.integer(month),
         day = as.integer(day)) %>%
  select(filenames, year:day)                                          # get rid of the intermediary columns

DF
# A tibble: 3 x 4
#  filenames               year month   day
#  <chr>                  <int> <int> <int>
#1 Avg_monthly_2005_01_01  2005     1     1
#2 Avg_monthly_2005_01_03  2005     1     3
#3 Avg_monthly_1985_01_01  1985     1     1

Then you can apply filter or any grouping operation you like on the date columns, and use the filenames column to call file.copy() or something else.
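
As a minimal sketch of that filtering step, assuming a table shaped like DF above: the three file names are the sample values from the printed tibble, and "my_dir" is a placeholder folder, not a real path.

```r
library(dplyr)
library(tibble)

# Sample table mirroring DF above; in practice it would come from list.files().
DF <- tibble(
  filenames = c("Avg_monthly_2005_01_01", "Avg_monthly_2005_01_03", "Avg_monthly_1985_01_01"),
  year  = c(2005L, 2005L, 1985L),
  month = c(1L, 1L, 1L),
  day   = c(1L, 3L, 1L)
)

# Keep only the 2005 files, ordered chronologically.
files_2005 <- DF %>%
  filter(year == 2005) %>%
  arrange(year, month, day) %>%
  pull(filenames)

# Prepend the (placeholder) folder so the names can be passed to file.copy()
# or to an image-reading function.
paths_2005 <- file.path("my_dir", files_2005)
paths_2005
```

The same pattern works for months, for example filter(year == 2005, month %in% 6:8).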

I'm not familiar with png creation, but you might have functions inside R to directly combine them based on filenames, that would avoid moving files around at all (perhaps with png::readPNG() and caTools::write.gif()).

nburola · September 15, 2020, 11:21pm

Hey, AlexisW! Thanks for the ingenious solution, really appreciate it! I keep on getting the following error message, "Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character"" whenever I try and run the mutate line of the example that you wrote out. I searched up the class of the folder with class(barca_monthly) and it seems that my folder in question is that of a character class. I tried changing it but every time I change formats it results in NA_real, NA_integer, NA_complex, and other issues.

How would you change the class of the folder in question so that error message does not appear anymore?

In addition, for the input right after filenames, what exactly do I write in? Would I write the specific number of files from 1:20 as an example?

AlexisW · September 15, 2020, 11:35pm

What do you mean by "class of the folder"? You should just use list.files() to get the list of files, see ?list.files. So on Windows it may look like:

file_list <- list.files("C:\\Users\\my_username\\Documents\\dir_with_all_the_files\\subdir\\")
file_list
# [1] "Avg_monthly_2005_01_01" "Avg_monthly_2005_01_03" "Avg_monthly_1985_01_01"

Then you can create the tibble using this list as the content of a column:

DF <- tibble(my_filenames = file_list)
DF
# A tibble: 3 x 1
#   my_filenames             
#   <chr>                 
# 1 Avg_monthly_2005_01_01
# 2 Avg_monthly_2005_01_03
# 3 Avg_monthly_1985_01_01

If you still get the mutate error, try to see exactly which mutation is causing the problem (is it strsplit()? one of the 3 map_chr() calls? or the conversions to integer?)

Oh and just in case that's the problem: mutate() takes as input a data.frame (or tibble), not directly a vector. So the file names have to be in a column of a tibble, and can't be passed directly with a pipe.
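
To illustrate the point about mutate(), here is a small sketch. The two file names stand in for the output of list.files(), and the n_chars column is made up purely for the demonstration.

```r
library(dplyr)
library(tibble)

# Stand-in for the character vector that list.files() returns.
file_list <- c("Avg_monthly_2005_01_01", "Avg_monthly_2005_01_03")

# A bare character vector is not a data frame, so mutate() has no method for it.
class(file_list)
fails_on_vector <- inherits(try(mutate(file_list, n_chars = 1), silent = TRUE), "try-error")
fails_on_vector

# Wrapping the vector in a tibble gives mutate() a column to work on.
DF <- tibble(my_filenames = file_list) %>%
  mutate(n_chars = nchar(my_filenames))
DF
```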

nburola · September 16, 2020, 12:22am

Ah, my apologies. I was unclear how to use the list.files command with relation to writing code. I have conquered that and managed to create the tibble using the list as the content of the column:

file_list <- list.files("C:/Users/name/Documents/barca_monthly")
file_list

dataframe <- tibble(my_filenames = file_list)
dataframe

When trying to execute the remainder of the code, I get this error: "Error: Problem with mutate() input date_fields. x only 0's may be mixed with negative subscripts. i Input date_fields is map(fields, ~.x[(length(.x) - 2):length(.x)])."

I might be misusing the strsplit() function, considering this is my first time using it. I think the error comes from the mutate() call, where I did not use the "_" separator from your earlier post:

dataframe <- tibble(my_filenames = file_list) %>%
  mutate(fields = strsplit(my_filenames, "barca_monthly"),
         date_fields = map(fields, ~ .x[(length(.x)-2):length(.x)]),
         year = map_chr(date_fields, ~ .x[1]),
         month = map_chr(date_fields, ~ .x[2]),
         day = map_chr(date_fields, ~ .x[3]),
         year = as.integer(year),
         month = as.integer(month),
         day = as.integer(day)) %>%
  select(filenames, year:day)

Thanks for the help thus far, appreciate it!

nirgrahamuk · September 16, 2020, 12:34am

It would be a mistake to use "barca_monthly". The field separator is _, and the code provided selects the last 3 fields, so it's a good approach for you to use as well.

mutate(fields = strsplit(my_filenames, "_"),
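
A short base R sketch of why the separator matters, using one of the sample file names from this thread. The tail() call at the end is a suggested alternative, not part of the code posted earlier.

```r
f <- "Avg_monthly_2005_01_01"

# Splitting on "_" yields five fields; the last three are the date parts.
parts <- strsplit(f, "_")[[1]]
parts[(length(parts) - 2):length(parts)]

# "barca_monthly" never occurs in the file name, so the split returns a single field.
whole <- strsplit(f, "barca_monthly")[[1]]
length(whole)

# With one field the index becomes (1 - 2):1, i.e. c(-1, 0, 1), which R rejects
# because positive and negative subscripts cannot be mixed.
msg <- tryCatch(whole[(length(whole) - 2):length(whole)],
                error = function(e) conditionMessage(e))
msg

# tail() takes the last three fields without building the index by hand.
tail(parts, 3)
```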

nburola · September 16, 2020, 12:44am

It worked! I can't believe it, the code created a beautiful table with my_filenames, year, month, and day in a separate window!

I can use the filter() function to nab the years and months I want! Is there a way to combine the files directly based on their filenames, without moving them around within the folder, to make a GIF? An earlier samaritan mentioned using png::readPNG() and caTools::write.gif().
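
One possible way to do this is sketched below, with several assumptions. It uses the magick package rather than the png and caTools functions mentioned above. It also assumes the files end in ".png" and follow the Avg_monthly_YYYY_MM_DD pattern. The select_pngs() helper and the output name "barca_2005.gif" are hypothetical.

```r
# Hypothetical helper: return the full paths of the PNG files in `dir`
# whose file name carries one of the requested years.
select_pngs <- function(dir, years) {
  files <- list.files(dir, pattern = "\\.png$", full.names = TRUE)
  # Drop the folder and the extension, then split the remaining name on "_".
  stems <- tools::file_path_sans_ext(basename(files))
  # The year is the first of the last three fields; names that do not parse give NA.
  file_year <- vapply(strsplit(stems, "_"),
                      function(p) suppressWarnings(as.integer(tail(p, 3)[1])),
                      integer(1))
  # Zero-padded dates sort chronologically as plain text.
  sort(files[file_year %in% years])
}

# Folder name taken from the question; adjust it to the real path.
pngs_2005 <- select_pngs("barca_monthly", 2005)

# Build the GIF only if magick is installed and there are frames to animate.
if (length(pngs_2005) > 0 && requireNamespace("magick", quietly = TRUE)) {
  frames <- magick::image_read(pngs_2005)        # one frame per PNG
  gif <- magick::image_animate(frames, fps = 2)  # two frames per second
  magick::image_write(gif, "barca_2005.gif")
}
```

Because the selection works on file names alone, no files need to be copied or moved into sub-folders.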
