How to merge multiple csv files as a NEW COLUMNS with filenames as a column headers

Hi,
I have hundreds of instrument-generated csv files,
each file containing two columns of numbers (representing X and Y values of a spectrum) with no headers.

how to import the csv files into the dataframe so,
that for each imported file there are two NEW columns added with filenames in the column header.
(the number of columns is equal to the number of files*2 with the file names in the first row of the columns)

The final data frame / csv file should look like this:
05AC001.csv; 05AC001.csv; 05AC002.csv; 05AC002.csv
0.0001; 46823.112; 0.0001; 65944.23
0.0002; 45682.215; 0.0002; 65973.24
0.0003; 68231.212; 0.0003; 39751.11
; is used here as a column separator

I tried to google questions like: "how to append/merge/add/concatenate multiple csv files as a columns in R"
and found, that:

  1. almost all examples are appending the csv files to make one file with two long columns
    (It can be done with simple command even in the windows)
  2. stackoverflow.com is completely useless, people just reposting fragments of code that are not working at all or are missing something substantial for functioning.

I have some basic knowledge of R(ggplot2) and have all different packages/libraries installed (dplyr, tidyverse, ... etc) R 4.0.4 and the latest RStudio installed win10.

It should be elementary, and I have the feeling that I am not asking the correct keywords/phrases?

Jiri,

It is not that straightforward, and it is a little hard to help since you do not show any of the code you have written so far. Have you already written the part getting the file names? If not, that should be your first step. The code below will create a new data frame with all the names (instead of using here, you can add the folder path).

new_files <- tibble( # generate list of new files in directory
  file_name =
    list.files(
      here("raw_data"),
      pattern = "(?i)csv"
    )
)

Once you have that list, you can loop over the file names, read in the csv file, add the file name to the data, and then use bind_cols to get what you want. Note, you might have to create an empty data frame, so you have something to do the initial binding.

By the way, I assume you are not making the file names the variable names. That will not work since they are identical. A simple way around that is to append X and Y as appropriate.

Hi Jiri,

Indeed, you're not the first one with this kind of questions and I've found the following reply with an interesting reference. Maybe you find it also useful?

Edit: adding the filenames as column header is not really considered as best practice and this article also explains how the filenames of the imported csv files are added in a new column source.

HTH

Yes, that is better approach if each CSV is not too large. I used the above code because each file was massive and I wanted a way to continue reading in files and not having to do every single file again if something went wrong (reading and processing takes about an hour).

One thing, @Jiri would need map_dfc instead of map_dfr.

Please note your stated example cannot work as you cannot have 2 columns with the same name.
It would work if you'd have only 1 column to add, e.g. only the Y values to a fixed X.
Instead I would continue with the classical approach, add everything rowwise first into the long table style.
So you have 3 columns: imported file, X-value, Y-value.
Once everything is imported you can transform it into the desired table format, converting the file to file_X and file_Y.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.