How can I add the CSV file name as a column in a data frame?

I have many CSV files, each with its own name (for example: one.csv, two.csv, ...), and each file contains a data frame. I want to add a column to each data frame that holds the name of the CSV file it came from, and then merge all of the files into one data frame. How can I do this?

If the files are read into a list of data frames, it is easy to use the bind_rows() function from dplyr to combine them into one data frame and give that data frame a column that shows the name of the original file. In the following code, the AllDat data frame has a column named Origin that shows which file provided the data. The code assumes the csv files are in the working directory.

library(dplyr)
Files <- list.files(pattern = "\\.csv$")  # escape the dot and anchor at the end so only *.csv files match
CSVlist <- lapply(Files, read.csv)
names(CSVlist) <- Files
AllDat <- bind_rows(CSVlist, .id = "Origin")

You can also use purrr::map_dfr(). The code would be something like this:

library(tidyverse)

merged_df <- list.files(path = "path/to/your/files",
                        pattern = "\\.csv$",
                        full.names = TRUE) %>% 
    set_names() %>% 
    map_dfr(read_csv, .id = "file_name")

That works, but after merging the CSV files and adding the column, the file name column contains the complete path ("myPath/file/one_csv_name.csv"). I want only the CSV name in this column (one_csv_name).

Try the basename() function

How can I use this function in the code above?
tools::file_path_sans_ext(basename())

library(tidyverse)

merged_df <- list.files(path = "path/to/files",
                        pattern = "\\.csv$",
                        full.names = TRUE) %>% 
  set_names() %>%  
  map_dfr(read_csv, .id = "file_name") %>% 
  mutate(file_name=basename(file_name))
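
Note that basename() on its own keeps the extension, so the column would still read "one_csv_name.csv". A minimal sketch of the tools::file_path_sans_ext(basename()) combination mentioned earlier in the thread, using made-up paths (the `paths` vector is only an illustration, not files from the original post):

```r
# Hypothetical full paths, similar to what list.files(full.names = TRUE) returns
paths <- c("myPath/file/one_csv_name.csv",
           "myPath/file/two_csv_name.csv")

# basename() drops the directory part; file_path_sans_ext() drops ".csv"
file_names <- tools::file_path_sans_ext(basename(paths))

print(file_names)
```

In the pipeline above, this would mean changing the last step to mutate(file_name = tools::file_path_sans_ext(basename(file_name))).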

Thanks. I used the separate() function to split the file name, but the name part comes back empty.

df <- data.frame(file_name = c("آبین.csv",
                               "ملت.csv", 
                               "شاروم.csv"))
df %>% separate(file_name, c("name", "file"))

  name file
1       csv
2       csv
3       csv
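
A possible workaround, offered as a sketch rather than a confirmed diagnosis: by default separate() splits on any run of non-alphanumeric characters, and one guess (an assumption, not verified in this thread) is that on some setups the Arabic letters are themselves treated as non-alphanumeric, leaving an empty first piece. Passing an explicit sep that matches only the literal dot avoids relying on that default:

```r
library(tidyr)

df <- data.frame(file_name = c("آبین.csv",
                               "ملت.csv",
                               "شاروم.csv"))

# Split only at the literal dot instead of using the default separator,
# which splits on any run of non-alphanumeric characters
out <- separate(df, file_name, into = c("name", "file"), sep = "\\.")

# Base R alternative that avoids separate() altogether
name_only <- tools::file_path_sans_ext(df$file_name)
```

If only the name without the extension is needed, the base R alternative is probably the simpler choice.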

This thread is probably useful
Separate letters into columns for Arabic character strings - tidyverse - RStudio Community

It doesn't work; I still get an empty column.

There are many ways in which something can "not work". If you want specific technical help, please be as descriptive as possible: say explicitly what you tried and how the problem presents itself.

For example, the thread I pointed you to offers, I believe, two approaches to separating Arabic character strings: one using a package and the other using base R. Did you try one or both of these?

As an aside, when I run your code as provided, it does work in a sense: a data frame is created, although the console shows Unicode escape codes for the letters rather than the characters themselves. The Arabic letters are preserved (that is, the U+ codes are correct), as I can see when I open the result with view().

df <- data.frame(file_name = c("آبین.csv",
                               "ملت.csv", 
                               "شاروم.csv"))
> df
                                     file_name
1         <U+0622><U+0628><U+06CC><U+0646>.csv
2                 <U+0645><U+0644><U+062A>.csv
3 <U+0634><U+0627><U+0631><U+0648><U+0645>.csv
df2<- df %>% separate(file_name, c("name", "file"))
view(df2)


That works for you, but for me the name column is empty when I use the view() function.

I'm using RStudio 1.3.1056 and R 4.0.2 with a UK locale.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

What R version are you using, please? Perhaps it's a 3.6 vs 4 issue.

R version 4.0.3 (2020-10-10)
I think the problem comes from my Linux setup.
