Greetings. I am new to this community and have recently been introduced to "R" in a course called "Data Carpentry" at the University of Cambridge. I am now conducting gene expression analysis (RNAseq) and have about 100 fastq files in a folder. My question:
Is it possible, using R, to create a categorical vector containing the file names from a folder that holds many files (in my case, more than 100)? If so, can one use "mutate" to split each name so that, for example, the first three characters of each file name are added to a second vector?
Thanks in advance for your help, and please let me know if the above is unclear.
list.files() does exactly what you want (creating your vector of file names). Note that you will get a character vector with the file names.
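A minimal sketch of this, run against a throwaway folder whose file names are made up for illustration: list.files() returns a plain character vector, and its optional pattern argument (a regular expression) can restrict the result to FASTQ files, assuming they end in .fastq or .fastq.gz.

```r
# Build a throwaway folder with a few hypothetical FASTQ file names
fastq_dir <- file.path(tempdir(), "fastq_demo")
dir.create(fastq_dir, showWarnings = FALSE)
invisible(file.create(file.path(fastq_dir, c("S01_R1.fastq.gz", "S01_R2.fastq.gz",
                                             "S02_R1.fastq.gz", "notes.txt"))))

# Every file name in the folder, as a character vector
all_files <- list.files(fastq_dir)

# Only names ending in .fastq or .fastq.gz (pattern is a regular expression)
file_names <- list.files(fastq_dir, pattern = "\\.fastq(\\.gz)?$")
file_names
# [1] "S01_R1.fastq.gz" "S01_R2.fastq.gz" "S02_R1.fastq.gz"
```

If you really need a categorical (factor) vector, factor(file_names) converts it, but for file names a character vector is usually the more convenient type.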
You can get information on this function with ?list.files.
The second step is also quite easy (with functional programming). There are various ways to achieve this, but I am not totally sure what you mean by your name-splitting information, so it is hard to give you code. But a regular expression will allow you to select the first 3 characters of your names.
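Since the exact splitting rule isn't known yet, here is a hedged sketch that assumes you want the first three characters of each name (for example a sample ID such as "S01"); the file names are hypothetical. substr() works by position and sub() uses a regular expression; both are vectorised, so no loop is needed.

```r
# Hypothetical file names, as list.files() might return them
file_names <- c("S01_R1.fastq.gz", "S01_R2.fastq.gz", "S02_R1.fastq.gz")

# Option 1: take characters 1 to 3 by position
prefix_substr <- substr(file_names, 1, 3)

# Option 2: a regular expression that keeps only the first three characters
prefix_regex <- sub("^(.{3}).*$", "\\1", file_names)

# Store names and prefixes together in a data frame, one row per file
samples <- data.frame(file = file_names, prefix = prefix_substr)
samples
```

mutate() from dplyr adds columns to a data frame rather than filling a standalone vector, so with dplyr loaded the equivalent of the last step would be data.frame(file = file_names) %>% mutate(prefix = substr(file, 1, 3)).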
As prosoitos noted, list.files will solve your first problem. For the second one, could you please provide a short reprex, so that we can see an actual example? See also here. Otherwise, we can't be sure exactly what you would like to obtain.
Many thanks for the prompt reply. I tried list.files(path = "file path") but ended up with an empty vector, character(0). I must be doing something wrong. Once I get this sorted, I will try to be more specific about the second part, splitting the variables in the vector.
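One thing that makes this confusing: list.files() does not raise an error when the folder does not exist; it silently returns character(0), exactly as it would for an empty folder. A quick dir.exists() check tells the two cases apart. A minimal sketch with a made-up path; list_files_checked() is a hypothetical helper, not part of base R.

```r
# A path that (almost certainly) does not exist
bad_path <- file.path(tempdir(), "no_such_folder")

list.files(bad_path)   # character(0), with no error or warning
dir.exists(bad_path)   # FALSE: the problem is the path, not the folder contents

# Hypothetical defensive wrapper: fail loudly when the folder is missing
list_files_checked <- function(path, ...) {
  if (!dir.exists(path)) {
    stop("Folder not found: ", path, " (working directory is ", getwd(), ")")
  }
  list.files(path, ...)
}
```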
Just to make sure, are you familiar with relative file paths? If not, this image from Automate the Boring Stuff with Python by Al Sweigart gives some good examples.
In R, you can find out what your current working directory is with the getwd() function. Also, make sure to use / (forward slash) instead of \ (backslash) for separating directories in a path (so, don't write them like in the image). Backslash has a special meaning in R strings, and / works just fine for Windows paths.
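To make the relative/absolute distinction concrete, here is a small sketch in a temporary folder (all folder and file names are made up): a relative path is resolved against getwd(), an absolute one works from anywhere, and file.path() joins components with forward slashes on every platform.

```r
# Create a project-like folder with a "data" subfolder inside tempdir()
project <- file.path(tempdir(), "path_demo")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(project, "data", "sample1.fastq")))

old_wd <- setwd(project)   # setwd() invisibly returns the previous directory
getwd()                    # now ends in ".../path_demo"

rel_files <- list.files("data")                       # relative: resolved from getwd()
abs_files <- list.files(file.path(project, "data"))   # absolute: independent of getwd()

setwd(old_wd)              # restore the original working directory
identical(rel_files, abs_files)   # TRUE
```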
Many thanks for the feedback. I am overwhelmed by the community's response. My issue with list.files was the file paths. Even though I was using forward slashes "/" (I am on a Mac), I could only get the command to work when I set the working directory to the folder containing the files. I will post details on the second part of my question next...
Have a great weekend, all!
The working directory is the default for the path argument of list.files(). But you should be able to get it to work with files anywhere if you give that argument a proper path. So, even though you got it to work by setting the working directory to your files' location, it might be worth getting it to work in the more general case: first, it will help you understand how R uses paths, and second, it will ensure that your script works without having to set the working directory to a place that might be awkward in your workflow.
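A sketch of the setwd()-free approach described here: pass the folder to the path argument and set full.names = TRUE, so each element is a complete path that later functions can open regardless of the working directory. The folder and file names are stand-ins for your own.

```r
# Stand-in for the real data folder (e.g. a directory on another drive)
data_dir <- file.path(tempdir(), "full_names_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, c("A01.fastq", "A02.fastq"))))

# Bare names: only meaningful relative to data_dir
short <- list.files(data_dir, pattern = "\\.fastq$")

# Full paths: usable from any working directory, no setwd() needed
full <- list.files(data_dir, pattern = "\\.fastq$", full.names = TRUE)

all(file.exists(full))   # TRUE
basename(full)           # recovers the bare names: "A01.fastq" "A02.fastq"
```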
If you post your code, maybe we will be able to see why it isn't working.
If you run getwd(), R will give you the path of your working directory. This will give you a template for a proper file path on your machine. It should be easy to then adapt this path to match your files location.
Thanks for the extra help, @prosoitos. I am posting the code below. I am working on a Mac with several hard drives attached.
My RStudio working directory is on "Macintosh HD", whereas the FASTQ files are on an internal drive called "MiguelDATA10TB". Therefore I issued the following command:
file_names <- list.files("/Volumes/Miguel_DATA10TB/Work/AS_Sep2018/FASTQ_files")
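When an absolute path like this returns character(0), it can help to check each component in turn, because a single mismatch is enough (for instance, the drive is written "MiguelDATA10TB" in the text but "Miguel_DATA10TB" in the command, so it is worth confirming which spelling is on disk). A hedged sketch: first_missing() is a hypothetical helper that assumes forward-slash paths as on macOS, and on a Mac, list.files("/Volumes") shows the exact names of the mounted drives.

```r
# Hypothetical helper (not part of base R): return the first component of
# `path` that does not exist, or NA if the whole path exists.
# Assumes forward-slash paths (macOS/Linux style), not Windows drive letters.
first_missing <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[parts != ""]                       # drop empty pieces from leading "/"
  current <- if (startsWith(path, "/")) "" else "."  # absolute vs relative start
  for (p in parts) {
    current <- paste(current, p, sep = "/")
    if (!file.exists(current)) return(current)       # first component that is missing
  }
  NA_character_
}

# On the Mac, something like:
# list.files("/Volumes")
# first_missing("/Volumes/Miguel_DATA10TB/Work/AS_Sep2018/FASTQ_files")
```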