Create vector from file names in folder?

Greetings. I am new to this community and have recently been introduced to "R" in a course called "Data Carpentry" at the University of Cambridge. I am now conducting gene expression analysis (RNAseq) and have about 100 fastq files in a folder. My question:

Is it possible, using R, to create a categorical vector containing the file names from a folder containing multiple files (in my case, more than 100)? If this step is possible, can one use "mutate" to split each name so that, for example, the first three characters of each file name are added to a second vector?

Thanks in advance for your help, and please let me know if the above is unclear.

list.files() does exactly what you want (to create your vector of file names). Note that you will get a character vector with the file names.

You can get information on this function with ?list.files.
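As a minimal sketch of that call, the example below builds a throwaway folder and lists its contents. The folder name and the file names are made up for illustration; only list.files() and its `path`, `pattern` and `full.names` arguments come from base R.

```r
# Create a throwaway folder with a few empty files (hypothetical names).
fastq_dir <- file.path(tempdir(), "fastq_demo")
dir.create(fastq_dir, showWarnings = FALSE)
demo_names <- c("WT1_rep1.fastq", "WT1_rep2.fastq", "KO2_rep1.fastq", "notes.txt")
file.create(file.path(fastq_dir, demo_names))

# All file names in the folder, returned as a character vector.
all_files <- list.files(path = fastq_dir)

# Only names ending in ".fastq"; `pattern` is a regular expression.
fastq_files <- list.files(path = fastq_dir, pattern = "\\.fastq$")

# full.names = TRUE prepends the folder path to each name.
fastq_paths <- list.files(path = fastq_dir, pattern = "\\.fastq$", full.names = TRUE)
```

The `pattern` filter is handy when the folder also holds files you do not want in the vector.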

The second step is also quite easy (with functional programming). There are various ways to achieve this, but I am not totally sure what you mean by splitting the names, so it is hard to give you code. But regular expressions will allow you to select the first 3 characters of your names.
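Since the exact splitting rule is not known yet, here is only a sketch of one possible reading: take the first three characters of each name. The file names are hypothetical, and the code uses base R so it runs without extra packages.

```r
# Hypothetical file names; in practice these would come from list.files().
file_names <- c("WT1_rep1.fastq", "WT1_rep2.fastq", "KO2_rep1.fastq")

# substr() extracts characters 1 to 3 of each name.
prefix <- substr(file_names, 1, 3)

# The same result with a regular expression: capture the first three characters.
prefix_regex <- sub("^(.{3}).*$", "\\1", file_names)

# Keep names and prefixes side by side; the prefix is stored as a factor (categorical).
samples <- data.frame(file = file_names, group = factor(prefix))
```

If dplyr is installed, `dplyr::mutate(samples, group = substr(file, 1, 3))` should give an equivalent column inside a pipeline.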

Hi, and welcome to the RStudio Community!

As prosoitos noted, list.files will solve your first problem. For the second one, could you please provide a short reprex, so that we can see an actual example? See also here. Otherwise, we can't be sure what exactly you would like to obtain.

Many thanks for the prompt reply. I tried list.files(path = "file path") but ended up with an empty vector, character(0). I must be doing something wrong. Once I get this sorted, I will try to be more specific about the second part, splitting the variables in the vector.

There are also several functions in the fs package that might help.

Just to make sure, are you familiar with relative file paths? If not, this image from Automate the Boring Stuff with Python by Al Sweigart gives some good examples.

In R, you can find out what your current working directory is with the getwd() function. Also, make sure to use / (forward slash) instead of \ (backslash) for separating directories in a path (so, don't write them like in the image). Backslash has a special meaning in R strings, and / works just fine for Windows paths.
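An empty character(0) result often just means the path does not point at an existing folder. The sketch below shows one way to check this; the folder name is deliberately made up so that it does not exist.

```r
# Where R resolves relative paths from.
wd <- getwd()

# A hypothetical folder name; file.path() joins the components with "/".
target <- file.path(wd, "FASTQ_files_that_do_not_exist")

# list.files() returns character(0), with no error, when the folder is missing...
result <- list.files(target)

# ...so check the folder explicitly before listing it.
folder_found <- dir.exists(target)
```

Checking `dir.exists()` first makes it possible to tell an empty folder apart from a mistyped path.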

Many thanks for the feedback. I am overwhelmed with the response of the community. My issue with using list.files was the file paths. Even though I was using forward slashes "/" (I am on a Mac), I could only get the command to work when I set the working directory to the place where the files were. I will post details on the second part of my question next...
Have a great weekend all!

That's the default for the path argument of list.files(): when no path is given, it lists the working directory (path = "."). But you should be able to get it to work with files anywhere if you give that argument a proper path. So, even though you got it to work by setting the working directory to your files' location, it is worth getting it to work in the general case. First, it will help you understand how R uses paths. Second, it will ensure that your script works without having to set the working directory to a place that might be awkward in your workflow.

If you post your code, maybe we will be able to see why it isn't working.

By the way, did you put quotes around your path?

Something that might help you:

If you run getwd(), R will give you the path of your working directory. This will give you a template for a proper file path on your machine. It should then be easy to adapt this path to match your files' location.

Thanks for the extra help @prosoitos. I am posting the code below. I am working on a Mac with several HDs attached.

My RStudio working directory is on "Macintosh HD", whereas I have the FASTQ files on an internal drive called "MiguelDATA10TB". Therefore I issued the following command:
file_names <- list.files("/Volumes/Miguel_DATA10TB/Work/AS_Sep2018/FASTQ_files")

Probably just a typo in your post, but in case:

Is your drive called "MiguelDATA10TB" (your text) or "Miguel_DATA10TB" (your code)?

If this is not a typo in the post but a typo in your code, that would be enough to explain why this is not working.
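One way to locate a typo like this is to walk up the path until an existing folder is found; the first missing component is then the one to check. This is only a sketch: `deepest_existing_dir()` is a hypothetical helper written for this post, and the path below is made up.

```r
# Return the deepest part of `path` that exists as a directory.
deepest_existing_dir <- function(path) {
  # Strip the last component until the folder exists or the root is reached.
  while (!dir.exists(path) && dirname(path) != path) {
    path <- dirname(path)
  }
  path
}

# Example with a made-up path under the temporary directory.
base <- normalizePath(tempdir(), winslash = "/")
bad_path <- file.path(base, "Typo_Drive", "Work", "FASTQ_files")

# Everything after the returned folder is where the path breaks.
last_good <- deepest_existing_dir(bad_path)
```

Here `last_good` is the temporary directory itself, which points at "Typo_Drive" as the component to fix.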

Many thanks again @prosoitos, it was indeed a typo. I now have the proper file path working in my code!

Great!

If you need help with your second question, let us know (but we will need a little more information on what you want to do).

Thanks, I don't want to waste your time with more rookie questions. I will have a go myself. When I get stuck, I will come back for help :wink:
