How can I add the CSV file name as a column in a data frame?

saso_008 · January 7, 2021, 7:57pm

I have many CSV files, each with its own name (for example: one.csv, two.csv, ...), and each file holds a data frame. I want to add a column to each data frame containing the name of its CSV file, and then merge all of these files. How can I do this?

FJCC · January 7, 2021, 8:32pm

If the files are read into a list of data frames, it is easy to use the bind_rows() function from dplyr to combine them into one data frame and give that data frame a column that shows the name of the original file. In the following code, the AllDat data frame has a column named Origin that shows which file provided the data. The code assumes the csv files are in the working directory.

library(dplyr)
Files <- list.files(pattern = "\\.csv$")
CSVlist <- lapply(Files, read.csv)
names(CSVlist) <- Files
AllDat <- bind_rows(CSVlist, .id = "Origin")
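
For readers without dplyr, the same idea can be sketched in base R. This is only an illustrative variant, not the code from the reply above: the temporary directory, the two demo files, and the helper name read_with_origin are assumptions made so the example runs on its own.

```r
# Hypothetical demo data: two small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "two.csv"), row.names = FALSE)

# Anchor the pattern so only names ending in ".csv" match.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and prepend a column with its file name.
read_with_origin <- function(path) {
  dat <- read.csv(path)
  data.frame(Origin = basename(path), dat)
}

# Read every file, then stack the resulting data frames row-wise.
all_dat <- do.call(rbind, lapply(files, read_with_origin))
print(all_dat)
```

Note that rbind() requires all files to share the same columns, whereas bind_rows() fills missing columns with NA.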

andresrcs · January 7, 2021, 11:57pm

You can also use purrr::map_dfr(). The code would be something like this:

library(tidyverse)

merged_df <- list.files(path = "path/to/your/files",
                        pattern = "\\.csv$",
                        full.names = TRUE) %>% 
    set_names() %>% 
    map_dfr(read_csv, .id = "file_name")

saso_008 · January 8, 2021, 8:29am

That works, but after merging the CSV files and adding the column, the file name column contains the complete path ("myPath/file/one_csv_name.csv"). I want only the CSV name in this column (one_csv_name).

nirgrahamuk · January 8, 2021, 8:41am

Try the basename() function

saso_008 · January 8, 2021, 9:16am

How can I use this function in the code above?
tools::file_path_sans_ext(basename())

nirgrahamuk · January 8, 2021, 9:37am

library(tidyverse)

merged_df <- list.files(path = "path/to/files",
                        pattern = "\\.csv$",
                        full.names = TRUE) %>% 
  set_names() %>%  
  map_dfr(read_csv, .id = "file_name") %>% 
  mutate(file_name=basename(file_name))
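
The pipeline above keeps the ".csv" extension. As a small sketch of how the extension could be dropped as well, basename() can be wrapped in tools::file_path_sans_ext(), as proposed in the question. The example path is the one quoted earlier in the thread; the variable names are illustrative.

```r
# Example path taken from the question above.
paths <- c("myPath/file/one_csv_name.csv")

# basename() drops the directory part; file_path_sans_ext() drops ".csv".
short_names <- tools::file_path_sans_ext(basename(paths))
print(short_names)

# In the pipeline above, the last step would then become:
#   mutate(file_name = tools::file_path_sans_ext(basename(file_name)))
```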

saso_008 · January 8, 2021, 10:31am

Thanks. I used the separate() function to split the file name, but the name part comes back empty.

df <- data.frame(file_name = c("آبین.csv",
                               "ملت.csv", 
                               "شاروم.csv"))
df %>% separate(file_name, c("name", "file"))

  name file
1       csv
2       csv
3       csv
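
By default, separate() splits on runs of non-alphanumeric characters (sep = "[^[:alnum:]]+"). One possible explanation, not confirmed in the thread, is that in this session the Arabic letters are not classified as alphanumeric and are therefore consumed as separators. A sketch that avoids the character-class question entirely is to split on the file extension with base R tools:

```r
df <- data.frame(file_name = c("آبین.csv", "ملت.csv", "شاروم.csv"))

# Split on the extension instead of on "non-alphanumeric" characters,
# so the result does not depend on how Arabic letters are classified.
df$name <- tools::file_path_sans_ext(df$file_name)
df$file <- tools::file_ext(df$file_name)
print(df)

# tidyr alternative (a literal dot as the separator):
#   df %>% separate(file_name, c("name", "file"), sep = "\\.")
```

The tidyr variant in the comment assumes each file name contains exactly one dot.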

nirgrahamuk · January 8, 2021, 10:32am

This thread is probably useful
Separate letters into columns for Arabic character strings - tidyverse - RStudio Community

saso_008 · January 8, 2021, 10:58am

It doesn't work; I still get an empty column!

nirgrahamuk · January 8, 2021, 11:00am

There are many ways in which unspecified things can 'not work'. If you want specific technical help, please be as descriptive as possible: state explicitly what you tried and how the problem presents itself.

For example, I believe the thread I pointed you to offers two approaches to separating Arabic character strings, one using a package and one using base R. Did you try one or both of these?

As an aside, when I run your code as provided, it does work in a sense: a data frame is created, although I see <U+xxxx> codes for the letters rather than the letters themselves. I can see that the Arabic letters are preserved (i.e. the U codes are correct) if I view() the result.

df <- data.frame(file_name = c("آبین.csv",
                               "ملت.csv", 
                               "شاروم.csv"))

> df
                                     file_name
1         <U+0622><U+0628><U+06CC><U+0646>.csv
2                 <U+0645><U+0644><U+062A>.csv
3 <U+0634><U+0627><U+0631><U+0648><U+0645>.csv

df2<- df %>% separate(file_name, c("name", "file"))
view(df2)

saso_008 · January 8, 2021, 11:25am

That works for you, but I get an empty name column when I use the view() function!

nirgrahamuk · January 8, 2021, 11:28am

I'm using RStudio 1.3.1056 and R 4.0.2 with a UK locale.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252

saso_008 · January 8, 2021, 11:36am

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

nirgrahamuk · January 8, 2021, 11:52am

Which R version are you using, please? Perhaps it's a 3.6 vs. 4 issue.

saso_008 · January 8, 2021, 11:55am

R version 4.0.3 (2020-10-10)
I think the problem comes from my Linux setup.
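
The thread does not establish the cause. As a rough diagnostic sketch, one could check how the current session classifies an Arabic letter under R's two regex engines, since the outcome can depend on the locale and on the engine used. The variable names are illustrative, and no particular output is assumed.

```r
# Which character locale is this session using, and is it UTF-8?
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])

# "\u0645" is the first letter of the second file name in the example.
letter <- "\u0645"

# Is the letter treated as alphanumeric by the default regex engine?
alnum_default <- grepl("[[:alnum:]]", letter)
# And by the PCRE engine (perl = TRUE)?
alnum_perl <- grepl("[[:alnum:]]", letter, perl = TRUE)
print(c(default = alnum_default, perl = alnum_perl))
```

If either result is FALSE, a separator pattern built on [[:alnum:]] may treat Arabic letters as delimiters in that setting, which would be consistent with the empty name column.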
