read and merge csv file's

saso_008 · January 10, 2021, 11:59am

suppose we have a dataframe that have two cloumn's and any column have many value.

d1 <- data.frame(name = c("a", "b", "c", "d"),
                 code = c("x", "y", "w", "z"))

and we have four csv files (a.csv, b.csv ,c.csv ,d.csv) that we want to read the name's of column of d1 one by one and mutate in read csv file column's name and code by value's.
for example in a.csv file we want create two columns (name, code) bya and x value and then read b.csv create two columns (name, code) byb and y value and merged to previous , .... and the end return complete csv fie by for csv file's.

pieterjanvc · January 10, 2021, 1:03pm

Hi,

Could you provide a bit more details on an example It's a bit confusing what you want to do. So provide a sample of the inputs and the desired output data frames that will be saved as csv.

PJ

saso_008 · January 10, 2021, 1:33pm

for example suppose we have this csv file's:

a.cav
data.frame(age = c(10:14),
                 heaight = c(150,  198, 115, 124, 132))

b.cav
data.frame(age = c(20:26),
                 heaight = c(152, 154, 180, 165,190, 183, 125))

c.cav
data.frame(age = c(30:33),
                 heaight = c(150,  175, 145, 132))
d.cav
data.frame(age = c(40:42),
                 heaight = c(150,  198, 132))

and we have this data frame:

d1 <- data.frame(name = c("a", "b", "c", "d"),
                 code = c("x", "y", "w", "z"))

now we want to get names from d1 dataframe and read csv files (a, b,c,d) and add value a and x in new columnt to a and value b and y to b cav file and etc. and the end merge all them.
exit:

  age heaight  name code
1  10     150   a   x
2  11     198   a   x
3  12     115   a   x
4  13     124   a   x
5  14     132   a   x
7  20     152   b   y
8  21     154   b   y
9  22     180   b   y
10  23     165   b   y
11  24     190   b   y
12  25     183   b   y
13  26     125   b   y
14  30     150   c   w
15  31     175   c   w
16  32     145   c   w
17  33     132   c   w
18  40     150   d   z
19  41     198   d   z
20  42     132   d   z

pieterjanvc · January 10, 2021, 3:37pm

Hi,

Thanks, that's much clearer!
Here is a potential solution:

library(dplyr)
library(purrr)


#Create the dummy files
write.csv(data.frame(
  age = c(10:14),
  heaight = c(150,  198, 115, 124, 132)), "a.csv", 
  row.names = F)

write.csv(data.frame(
  age = c(20:26),
  heaight = c(152, 154, 180, 165,190, 183, 125)), "b.csv",
  row.names = F)


#THE CODE

#Get the new data
d1 = data.frame(name = c("a", "b"),
                 code = c("x", "y"))

#Read all files into one df
myData = map_df(d1$name, function(file){
  read.csv(paste0(file, ".csv")) %>% mutate(name = file)
})
myData
#>    age heaight name
#> 1   10     150    a
#> 2   11     198    a
#> 3   12     115    a
#> 4   13     124    a
#> 5   14     132    a
#> 6   20     152    b
#> 7   21     154    b
#> 8   22     180    b
#> 9   23     165    b
#> 10  24     190    b
#> 11  25     183    b
#> 12  26     125    b

#Join the data together
myData = myData %>% left_join(d1, by = "name")
myData
#>    age heaight name code
#> 1   10     150    a    x
#> 2   11     198    a    x
#> 3   12     115    a    x
#> 4   13     124    a    x
#> 5   14     132    a    x
#> 6   20     152    b    y
#> 7   21     154    b    y
#> 8   22     180    b    y
#> 9   23     165    b    y
#> 10  24     190    b    y
#> 11  25     183    b    y
#> 12  26     125    b    y

^{Created on 2021-01-10 by the reprex package (v0.3.0)}

EXPLANATION
The map_df function for the purrr package lets you apply a function over a list and then merge the results into one dataframe. I let it read all the filenames, then added the name of the file as an extra column.

In the second step, I used the left_join function from dplyr to join the data frame with new columns to the once we imported by joining on the "name" column.

Hope this helps,
PJ

saso_008 · January 10, 2021, 4:10pm

thanks @pieterjanvc i have a question. in this example for simple the example we use only four csv file's but if we had many(1000) csv file's how can merge them together?

pieterjanvc · January 10, 2021, 8:46pm

Hi,

That would not be too difficult. You just have to list the names of the files you're interested in in the function:
map_df(c("a", "b")... where you replace the vector with the vecor of file names. You can also use the nifty function list.files() to get a list of all files in a directory

PJ

saso_008 · January 10, 2021, 9:04pm

thanks
but when use list.file() i can't print name in the column name's and get me emety value.
now i want to use name column in d1 data frame for solve it.

d1 = data.frame(name = c("a", "b", "c", "d"),
                 code = c("x", "y", "w", "z"))

myData = map_df(d1$name, function(file){
  read.csv(paste0(file, ".csv")) %>% mutate(name = file)
})

error in file(file, "rt") : cannot open the connection

pieterjanvc · January 10, 2021, 10:05pm

Hi,

This error is because you have to make sure that the filenames exist in the correct folder. If you don't specify a folder, it will look in the current working directory. Also, if there is a file missing you will get this error.

#Check if files exist in parent directory
file.exists(paste0(d1$name, ".csv"))

#Check if files exist in other directory
file.exists(paste0("C:/Documents/Data/", d1$name, ".csv"))

You can use this to check which files are found and which not. Make sure that if the files are not in the parent directory, you need to specify the directory.

PJ

saso_008 · January 11, 2021, 9:48am

saso_008:

 age heaight  name code
1  10     150   a   x
2  11     198   a   x
3  12     115   a   x
4  13     124   a   x
5  14     132   a   x
7  20     152   b   y
8  21     154   b   y
9  22     180   b   y
10  23     165   b   y
11  24     190   b   y
12  25     183   b   y
13  26     125   b   y
14  30     150   c   w
15  31     175   c   w
16  32     145   c   w
17  33     132   c   w
18  40     150   d   z
19  41     198   d   z
20  42     132   d   z

yes its true but my file is exist to my directory.
this code run for you without error?

                 code = c("x", "y", "w", "z"))

myData = map_df(d1$name, function(file){
  read.csv(paste0(file, ".csv")) %>% mutate(name = file)
})

pieterjanvc · January 11, 2021, 12:27pm

Hi,

I updated my first post which now uses the d1$name. So run it, including the first step of file generation (beware it will overwrite a.csv and b.csv) and see if it works.

It should, an thus if that works, you should be able to do the same for any number of files providing they exist in the directory and have the same columns.

PJ

system · February 1, 2021, 12:27pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.