Filename manipulation

SidV · May 26, 2022, 7:06am

Hello.
I am relatively new to R and struggling with some basics about filenames and data structuring.
I have a folder with file names of the following type (I have only listed a few here, but there are ~300 files):

"125975_Face_1.fcsv"
"125975_Face_2.fcsv"
"126284_Face_1.fcsv"
"126292_Face_2.fcsv"
"126814_Face_2.fcsv"
"126878_Face_1.fcsv"
"126878_Face_2.fcsv"

I would like to create a new table (matrix):
The first column should have the FIRST NUMERIC component of the file name (e.g. 125975)
The second column should have the SECOND NUMERIC component of the file name (either 1 or 2)
How do I go about "extracting" that information from the file names?

Thank you for your help

Sid

pieterjanvc · May 26, 2022, 10:42am

Hi,

Welcome to the RStudio community!

Here is a way of doing this in base R

#Get the filenames (pattern is a regex, so escape the dot and anchor it at the end)
fileNames = list.files(path = ".", pattern = "\\.fcsv$")

#Dummy filenames (replace by above)
fileNames = c("125975_Face_1.fcsv", "125975_Face_2.fcsv", "126284_Face_1.fcsv")

#Split the filenames by _ or .
fileNames = strsplit(fileNames, '_|\\.')
fileNames
#> [[1]]
#> [1] "125975" "Face"   "1"      "fcsv"  
#> 
#> [[2]]
#> [1] "125975" "Face"   "2"      "fcsv"  
#> 
#> [[3]]
#> [1] "126284" "Face"   "1"      "fcsv"

#Get the numeric values
fileNames = data.frame(
  num1 = as.integer(sapply(fileNames, "[[", 1)),
  num2 = as.integer(sapply(fileNames, "[[", 3))
)
fileNames
#>     num1 num2
#> 1 125975    1
#> 2 125975    2
#> 3 126284    1

Created on 2022-05-26 by the reprex package (v2.0.1)

sapply(fileNames, "[[", 1) extracts the n-th element (here the 1st) from each vector in the list returned by strsplit().
I have also added as.integer() to convert the results to integers. If you would rather keep them as character strings, remove it.
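
To make the "[[" trick less mysterious, here is a small sketch of what it does. The names parts, first_a, first_b and third are only illustrative, and the list is typed in by hand to mimic the strsplit() output. vapply() is shown as a stricter alternative, not as something the code above requires.

```r
# Hand-written stand-in for the strsplit() result: a list of character vectors
parts <- list(
  c("125975", "Face", "1", "fcsv"),
  c("125975", "Face", "2", "fcsv"),
  c("126284", "Face", "1", "fcsv")
)

# "[[" is an ordinary function, so these two calls do the same thing
first_a <- sapply(parts, "[[", 1)
first_b <- sapply(parts, function(p) p[[1]])
identical(first_a, first_b)

# vapply() works the same way but also checks that each result is one string
third <- vapply(parts, "[[", character(1), 3)
as.integer(third)
```

If a file name had fewer pieces than expected, "[[" would stop with a subscript error instead of silently returning something odd, which is usually what you want here.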

If you don't mind using the Tidyverse, it has some very neat functions that do this more quickly with a little bit of regex:

library(tidyverse)

fileNames = c("125975_Face_1.fcsv", "125975_Face_2.fcsv", "126284_Face_1.fcsv")

#Create a data frame with file names
fileNames = data.frame(file = fileNames)

#Use the extract function from the tidyr package to split into new columns
fileNames = fileNames %>% 
  extract(file, into = c("num1", "num2"), regex = "^(\\d+)_Face_(\\d+)", convert = T)
fileNames
#>     num1 num2
#> 1 125975    1
#> 2 125975    2
#> 3 126284    1

Created on 2022-05-26 by the reprex package (v2.0.1)

Remove the convert = T option if you would like to keep the values as characters instead of integers.
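
Since the question asked for a matrix and not a data frame, here is a rough base R sketch that goes straight to an integer matrix using capture groups. It assumes every file name looks exactly like digits_Face_digits.fcsv; the names pattern, m and mat are just placeholders, and the stopifnot() line is there because names that do not match would otherwise be dropped silently.

```r
fileNames <- c("125975_Face_1.fcsv", "125975_Face_2.fcsv", "126284_Face_1.fcsv")

# Two capture groups: the leading number and the number after "_Face_"
pattern <- "^([0-9]+)_Face_([0-9]+)\\.fcsv$"

# regexec() + regmatches() give, per file, c(full match, group 1, group 2)
m <- regmatches(fileNames, regexec(pattern, fileNames))

# Fail early if any name did not match the assumed pattern
stopifnot(all(lengths(m) == 3))

# Stack the rows, keep only the two groups, and convert to integers
mat <- do.call(rbind, m)[, 2:3, drop = FALSE]
storage.mode(mat) <- "integer"
colnames(mat) <- c("num1", "num2")
mat
```

If you already have the data frame from either approach above, as.matrix() on it should give the same kind of result.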

Hope this helps,
PJ

SidV · May 26, 2022, 6:32pm

Dear PJ.
This is fantastic (as are you for helping with your time so generously).
I am sure I will be encountering some issues soon and will reach out again, but I thank you sincerely for this. I would have just been stuck for hours/days trying to figure it out.
Sid
