Incoming/Amesh_CV_v1.docx
Incoming/Amesh_CV_v2.docx
Incoming/Amesh_CV_v3.docx
Incoming/Amesh_CV_v4.docx
Incoming/Amesh_CV_v6.docx
Incoming/Amesh_Q_v1.docx
Incoming/MIT/Akash_MIT_SoP_v1.docx
Incoming/MIT/Akash_MIT_SoP_v2.docx
The data above is in a data.frame, stored on the E drive inside the "Amesh's" folder.
Given that data.frame, and starting from something like
strsplit('Incoming/Amesh_CV_v1.docx', '')
I want the output to have Path and Version in separate columns.
The split should be based on the version (i.e. v1, _v1, V1 and so on).
I tried text processing with tidytext, stringr, regmatches with gregexpr, and string_extract using regex (regular expressions), but I am not able to get any output.
Kindly let me know. I have been trying this for two days.
An imperfect, but somewhat workable solution…
I'm using the fill argument of separate() to get all of the files into the same column, despite the fact that the files are at different depths (you could probably extract file names using a regular expression, too, but I'm not that good at them).
Then, for the versions, I'm using a regular expression with str_extract() saying "give me lower-case v followed by numbers" (see more on stringr and regular expressions here).
suppressPackageStartupMessages(library(tidyverse))
library(stringr)

files <- c("Incoming/Amesh_CV_v1.docx",
           "Incoming/Amesh_CV_v2.docx",
           "Incoming/Amesh_CV_v3.docx",
           "Incoming/Amesh_CV_v4.docx",
           "Incoming/Amesh_CV_v6.docx",
           "Incoming/Amesh_Q_v1.docx",
           "Incoming/MIT/Akash_MIT_SoP_v1.docx",
           "Incoming/MIT/Akash_MIT_SoP_v2.docx")

tibble(files) %>%
  separate(files, into = c("lv1", "lv2", "lv3"), sep = "/", fill = "left") %>%
  mutate("version" = str_extract(lv3, regex("v\\d+")))
#> # A tibble: 8 x 4
#>   lv1      lv2      lv3                   version
#>   <chr>    <chr>    <chr>                 <chr>
#> 1 <NA>     Incoming Amesh_CV_v1.docx      v1
#> 2 <NA>     Incoming Amesh_CV_v2.docx      v2
#> 3 <NA>     Incoming Amesh_CV_v3.docx      v3
#> 4 <NA>     Incoming Amesh_CV_v4.docx      v4
#> 5 <NA>     Incoming Amesh_CV_v6.docx      v6
#> 6 <NA>     Incoming Amesh_Q_v1.docx       v1
#> 7 Incoming MIT      Akash_MIT_SoP_v1.docx v1
#> 8 Incoming MIT      Akash_MIT_SoP_v2.docx v2
Created on 2019-04-09 by the reprex package (v0.2.1)
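The question also mentions upper-case tags such as V1, which the lower-case pattern above would skip. Here is a minimal base R sketch that accepts both cases and works on the file name only, so folder depth does not matter. The helper name `extract_version` and the third example path are made up for illustration, and the pattern assumes the tag is a "v" or "V" directly followed by digits (a name like "rev3" would also match).

```r
# Hypothetical helper: pull a version tag such as "v1" or "V12" out of a file path.
extract_version <- function(paths) {
  file_names <- basename(paths)                   # drop the folders, keep the file name
  hits <- regexpr("[vV][0-9]+", file_names)       # first "v" or "V" followed by digits
  out <- rep(NA_character_, length(file_names))   # NA where no version tag is found
  out[hits != -1] <- tolower(regmatches(file_names, hits))  # report "V2" as "v2"
  out
}

files <- c("Incoming/Amesh_CV_v1.docx",
           "Incoming/MIT/Akash_MIT_SoP_V2.docx",  # upper-case tag, made up for this example
           "Incoming/notes.docx")                 # no tag at all, made up for this example

data.frame(path = dirname(files),
           file = basename(files),
           version = extract_version(files),
           stringsAsFactors = FALSE)
```

Because `basename()` and `dirname()` do the path handling, there is no need to guess how many folder levels to split into.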
Aside: If you do have access to the actual file paths, the fs package has a lot of nice helpers (e.g. is_dir(), is_file()) that could be handy.
Thank you, ma'am, but if I take this from a dataset or a data frame, I get an error: non character argument in R.
What should I do about this? Kindly reply.
There are 3000 file paths, so I can't type them all into R. How do I load the entire data set?
I can't tell without the exact error message and a sample of the data, but I'm guessing that you don't have stringsAsFactors=FALSE set, and, thus, are trying to do string operations on factors. You can fix this by converting the column in the data frame to character with as.character().
https://stat.ethz.ch/R-manual/R-devel/library/base/html/character.html
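To illustrate the guess about factors, here is a small sketch. The data frame and its column name `path` are invented for the example, and `stringsAsFactors = TRUE` is set explicitly to mimic what R versions before 4.0 did by default.

```r
# A data frame whose string column has been turned into a factor.
df <- data.frame(path = c("Incoming/Amesh_CV_v1.docx", "Incoming/Amesh_CV_v2.docx"),
                 stringsAsFactors = TRUE)
class(df$path)                      # "factor"

# String functions such as strsplit() reject factors with "non-character argument".
res <- try(strsplit(df$path, "/"), silent = TRUE)
inherits(res, "try-error")          # TRUE

# Convert the column to character; after that, string operations work.
df$path <- as.character(df$path)
strsplit(df$path, "/")
```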
This is a separate question, and the answer depends on the file format. There's readLines() in base R, as well as read.csv(), etc. — both of these things can also be done with other packages, such as readr and data.table.
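As a rough sketch of both base R options, assuming the paths are stored one per line in a text file or in a CSV with a header. The temporary files and the column name `path` below are stand-ins created only so the example runs on its own.

```r
# Stand-in for a real file: write a few paths to a temporary text file, one per line.
path_file <- tempfile(fileext = ".txt")
writeLines(c("Incoming/Amesh_CV_v1.docx",
             "Incoming/MIT/Akash_MIT_SoP_v2.docx"), path_file)

# readLines() returns a character vector, one element per line.
files <- readLines(path_file)

# For a CSV with a header, read.csv() returns a data frame; keep strings as character.
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(path = files), csv_file, row.names = FALSE)
paths_df <- read.csv(csv_file, stringsAsFactors = FALSE)
str(paths_df)
```

Either way, all 3000 paths come in with a single call, and nothing has to be typed by hand.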
I am getting an error: non character argument in R. There are versions here, right:
#> # A tibble: 8 x 4
#>   lv1      lv2      lv3                   version
#>   <chr>    <chr>    <chr>                 <chr>
#> 1 <NA>     Incoming Amesh_CV_v1.docx      v1
#> 2 <NA>     Incoming Amesh_CV_v2.docx      v2
#> 3 <NA>     Incoming Amesh_CV_v3.docx      v3
#> 4 <NA>     Incoming Amesh_CV_v4.docx      v4
#> 5 <NA>     Incoming Amesh_CV_v6.docx      v6
#> 6 <NA>     Incoming Amesh_Q_v1.docx       v1
#> 7 Incoming MIT      Akash_MIT_SoP_v1.docx v1
#> 8 Incoming MIT      Akash_MIT_SoP_v2.docx v2
i.e. v1, v2, v3, v4, v6. v5 is missing there, so how do I report the missing version, like: "Version V5 is missing" (I want it to be like this)?
I will try what you said. Thank you, ma'am.
files <- data.frame("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE)
tibble(files) %>%
  separate(files, into = c("lv1", "lv2", "lv3"), sep = "/", fill = "left") %>%
  mutate("version" = str_extract(lv3, regex("v\\d+")))
I am getting this:
  lv1 lv2    lv3              version
1 E:  Review Angshuman_Baruah NA
If I use the dir command instead, it does not work:
files <- data.frame(dir("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE))
Error in dir("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE) : unused argument (stringsAsFactors = FALSE)
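The error above comes from where the closing parenthesis sits: `stringsAsFactors` is an argument of `data.frame()`, not of `dir()`. Here is a minimal sketch of the corrected call. The temporary folder `review_demo` and the column name `path` are stand-ins for the real "E:/Review/..." folder, created only so the example runs anywhere.

```r
# Stand-in for the real folder: a temporary directory with a nested file.
root <- file.path(tempdir(), "review_demo")
dir.create(file.path(root, "Incoming", "MIT"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "Incoming", "Amesh_CV_v1.docx"))
file.create(file.path(root, "Incoming", "MIT", "Akash_MIT_SoP_v1.docx"))

# Close dir() first, then pass stringsAsFactors to data.frame().
files <- data.frame(path = dir(root, recursive = TRUE),
                    stringsAsFactors = FALSE)
files
```

With `recursive = TRUE`, `dir()` also lists the files inside sub-folders, and the resulting `path` column is character rather than factor.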
Kindly reply, ma'am. This is for the files that are in the directory:
files <- data.frame(dir("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE))
Error in dir("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE) : unused argument (stringsAsFactors = FALSE)
files <- data.frame("E:/Review/Angshuman_Baruah", stringsAsFactors = FALSE)
I want it for this:
dir("E:/Review/Angshuman_Baruah", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE)
It has 49 entries. How do I get output like this for those files, ma'am?
  lv1      lv2      lv3                   version
1 NA       Incoming Amesh_CV_v1.docx      1
2 NA       Incoming Amesh_CV_v2.docx      2
3 NA       Incoming Amesh_CV_v3.docx      3
4 NA       Incoming Amesh_CV_v4.docx      4
5 NA       Incoming Amesh_CV_v6.docx      6
6 NA       Incoming Amesh_Q_v1.docx       1
7 Incoming MIT      Akash_MIT_SoP_v1.docx 1
8 Incoming MIT      Akash_MIT_SoP_v2.docx 2
Do you have a list of files as a dataset, or are you trying to get a list of the files that are in a directory?
If the latter, you can use fs::dir_ls() (after installing fs, of course).
If you're still having trouble, a self-contained reprex (short for reproducible example) will help us help you.
install.packages("reprex")
If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.
There's also a nice FAQ on how to do a minimal reprex for beginners, below:
If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.
reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")
For pointers specific to the community site, check out the reprex FAQ.
dir("E:/Review/Nidha Khan", pattern=NULL, all.files=FALSE, full.names=TRUE, recursive = TRUE)
details<-data.frame(dir("E:/Review/Nidha Khan", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE))
With the code above I get the files inside the folder. Because of recursive = TRUE, it also goes into folders inside folders. The first command displays all the files, and the second puts them in a data frame. But when I try to use that data frame, stringsAsFactors does not work because of the dir command.
I want output like this:
From the folders inside the folder, I want to access all the files and display the versions from the file names, i.e. v1, v2, v4, v5, v6 and so on. If v3 is missing, I want the output "version v3 missing".
this is the goal
Ma'am could you please help.
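One possible sketch of that goal in base R, under some loud assumptions: versions are meant to run 1, 2, 3, ... with no intended gaps, files belong to the same document when their names match up to the version tag, and the helper name `missing_versions` is invented for this example.

```r
# Hypothetical helper: list version numbers missing between 1 and the highest one seen.
missing_versions <- function(versions) {
  seen <- as.integer(sub("^[vV]", "", versions))   # "v6" -> 6
  gaps <- setdiff(seq_len(max(seen)), seen)        # numbers that never appear
  if (length(gaps) == 0) character(0) else paste0("Version v", gaps, " is missing")
}

files <- c("Incoming/Amesh_CV_v1.docx", "Incoming/Amesh_CV_v2.docx",
           "Incoming/Amesh_CV_v3.docx", "Incoming/Amesh_CV_v4.docx",
           "Incoming/Amesh_CV_v6.docx", "Incoming/Amesh_Q_v1.docx",
           "Incoming/MIT/Akash_MIT_SoP_v1.docx", "Incoming/MIT/Akash_MIT_SoP_v2.docx")

file_names <- basename(files)
hits <- regexpr("[vV][0-9]+", file_names)
has_version <- hits != -1                            # ignore files without a version tag
versions <- regmatches(file_names, hits)             # one entry per tagged file
documents <- sub("_?[vV][0-9]+.*$", "", file_names[has_version])  # "Amesh_CV", "Amesh_Q", ...

# Check each document separately.
report <- lapply(split(versions, documents), missing_versions)
report$Amesh_CV   # "Version v5 is missing"
```

Documents with no gaps come back as empty character vectors, so only the problem cases carry a message.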
The command you sent earlier works for the typed-in input data, but it does not work for the data.frame or for the files inside the directory, i.e. E:/Review/Nidha Khan. Inside this there are folders like "Nidha Khan/MSc", and inside those there are docx, xlsx and png files and some more folders (with files inside them).
I am trying to get the list of files in the directory by using:
dir("E:/Review/Nidha_Khan", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE)
details<-data.frame(dir("E:/Review/Nidha_Khan", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE))
It's not clear to me what you're working with yet. I can see what you're trying to do, but not what you're getting back.
For splitting a character variable into an unknown number of columns, see the approaches mentioned in the thread below:
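Separately from that thread, here is a minimal base R sketch of one way to do it, assuming "/" is the separator and that shorter paths should be padded on the left so the file name always lands in the last column. The column names `lv1`, `lv2`, ... simply follow the earlier example.

```r
files <- c("Incoming/Amesh_CV_v1.docx",
           "Incoming/MIT/Akash_MIT_SoP_v1.docx")

parts <- strsplit(files, "/", fixed = TRUE)   # one character vector per path
depth <- max(lengths(parts))                  # deepest path decides the column count

# Left-pad shorter paths with NA so every row has `depth` pieces.
padded <- lapply(parts, function(p) c(rep(NA_character_, depth - length(p)), p))

levels_df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(levels_df) <- paste0("lv", seq_len(depth))
levels_df
```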
No, it's just folders inside directories, which I have to extract:
dir("E:/Review/Nidha_Khan", pattern=NULL, all.files=FALSE, full.names=TRUE, recursive = TRUE)
This is the command.
This is the above command in a data.frame:
details<-data.frame(dir("E:/Review/Angshuman_Baruah", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE))
files <- c("Incoming/Amesh_CV_v1.docx", "Incoming/Amesh_CV_v2.docx", "Incoming/Amesh_CV_v3.docx", "Incoming/Amesh_CV_v4.docx", "Incoming/Amesh_CV_v6.docx", "Incoming/Amesh_Q_v1.docx", "Incoming/MIT/Akash_MIT_SoP_v1.docx", "Incoming/MIT/Akash_MIT_SoP_v2.docx")
tibble(files) %>%
  separate(files, into = c("lv1", "lv2", "lv3"), sep = "/", fill = "left") %>%
  mutate("version" = str_extract(lv3, regex("v\\d+")))
#> # A tibble: 8 x 4
#>   lv1      lv2      lv3                   version
#>   <chr>    <chr>    <chr>                 <chr>
#> 1 <NA>     Incoming Amesh_CV_v1.docx      v1
#> 2 <NA>     Incoming Amesh_CV_v2.docx      v2
#> 3 <NA>     Incoming Amesh_CV_v3.docx      v3
#> 4 <NA>     Incoming Amesh_CV_v4.docx      v4
#> 5 <NA>     Incoming Amesh_CV_v6.docx      v6
#> 6 <NA>     Incoming Amesh_Q_v1.docx       v1
#> 7 Incoming MIT      Akash_MIT_SoP_v1.docx v1
#> 8 Incoming MIT      Akash_MIT_SoP_v2.docx v2
How do I use the command above for this data:
details<-data.frame(dir("E:/Review/Angshuman_Baruah", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE))
details<-data.frame(dir("E:/Review/Nidha", pattern=NULL, all.files=FALSE, full.names=FALSE, recursive = TRUE))
files <- data.frame(details, stringsAsFactors = FALSE)
View(files)
tibble(files) %>%
  separate(files, into = c("lv1", "lv2", "lv3"), sep = "/", fill = "left") %>%
  mutate("version" = str_extract(lv3, regex("v\\d+")))
Output:
   lv1 lv2 lv3  version
1  NA  NA  1:49 NA
2  NA  NA  1:49 NA
3  NA  NA  1:49 NA
4  NA  NA  1:49 NA
5  NA  NA  1:49 NA
6  NA  NA  1:49 NA
7  NA  NA  1:49 NA
8  NA  NA  1:49 NA
9  NA  NA  1:49 NA
10 NA  NA  1:49 NA
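A guess at what is happening here: `tibble(files)` receives a whole data frame rather than a character vector, and the column inside it is most likely a factor, so the "1:49" looks like the factor codes for the 49 entries rather than the paths themselves. Here is a minimal base R sketch that avoids the problem by keeping the result of `dir()` as a plain character vector. The temporary folder `versions_demo` and its file names are stand-ins for the real directory.

```r
# Stand-in for the real folder: a temporary directory tree, for illustration only.
root <- file.path(tempdir(), "versions_demo")
dir.create(file.path(root, "MSc"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, c("CV_v1.docx", "CV_v2.docx", "MSc/SoP_v1.docx")))

# dir() already returns a character vector; do not wrap it in data.frame() yet.
files <- dir(root, recursive = TRUE)

file_names <- basename(files)
hits <- regexpr("v[0-9]+", file_names)            # lower-case v followed by digits
version <- rep(NA_character_, length(files))      # NA for files without a tag
version[hits != -1] <- regmatches(file_names, hits)

# Build the data frame only at the end, from character vectors.
result <- data.frame(path = dirname(files),
                     file = file_names,
                     version = version,
                     stringsAsFactors = FALSE)
result
```

The same character vector could also be handed to the earlier pipeline as `tibble(files = files)`.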
Can you please look at the reprex materials I linked to earlier? I obviously don't have your actual system drive, and it's very hard to tell what's going on with unformatted code.