tjcnnl1
September 23, 2020, 7:37pm
1
I have a column in a data frame that contains something like student id, name, score, subject, and quarter. These items are joined by hyphens ("-"), for example:
Student$info: 1-john-80-math-4q19, 2-linda-90-art-4q10, …
I want to extract the subject item, i.e. the text after the 3rd hyphen, like "math" or "art". How can I do it? Is there a substring function in R based on a pattern's position?
jmcvw
September 23, 2020, 8:10pm
2
The following regex will work. Because the first instance of .+ is greedy,
it matches everything up to the second-to-last hyphen, so the captured group is the second-to-last field.
s <- c('1-john-80-math-4q19', '2-linda-90-art-4q10')
sub('.+-(.+)-.+', '\\1', s)
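Since the question asks for the field after the 3rd hyphen rather than the second-to-last field, a regex that counts hyphens from the start may be closer to what is wanted. This is only a sketch, assuming fields never contain a hyphen themselves; the variable name subject is just for illustration.

```r
s <- c('1-john-80-math-4q19', '2-linda-90-art-4q10')

# Skip exactly three "field-" groups from the start, capture the 4th field,
# then consume the rest so sub() replaces the whole string with the capture.
subject <- sub('^(?:[^-]*-){3}([^-]*).*$', '\\1', s, perl = TRUE)
subject
```

Note that sub() returns a string unchanged when the pattern does not match, so values with fewer than three hyphens come back as-is.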
An alternative might be to use strsplit()?
library(purrr)  # provides map() and re-exports %>%
strsplit(s, '-') %>%
map(4) %>%
unlist()
FJCC
September 23, 2020, 8:20pm
3
Here are two ways to separate the info column into its components.
TEXT <- c("1-john-80-math-4q19", "2-linda-90-art-4q10")
DF <- data.frame(OtherCol = c(23, 45), info = TEXT)
DF
#> OtherCol info
#> 1 23 1-john-80-math-4q19
#> 2 45 2-linda-90-art-4q10
library(stringr)
TextMat <- str_split(DF$info,"-", simplify = TRUE)
DF$Subject <- TextMat[, 4]
DF
#> OtherCol info Subject
#> 1 23 1-john-80-math-4q19 math
#> 2 45 2-linda-90-art-4q10 art
# with tidyr
library(tidyr)
DF <- data.frame(OtherCol = c(23, 45), info = TEXT)
DF <- separate(DF, info, into = c("ID", "name", "score", "subject", "quarter"), sep = "-")
DF
#> OtherCol ID name score subject quarter
#> 1 23 1 john 80 math 4q19
#> 2 45 2 linda 90 art 4q10
Created on 2020-09-23 by the reprex package (v0.3.0)
tjcnnl1
September 23, 2020, 8:40pm
4
Both work fine. Thank you!
tjcnnl1
September 23, 2020, 8:49pm
5
Thank you very much. The first one doesn't work for me. The alternative doesn't work either if the column value doesn't contain a hyphen.
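For values with no hyphen (or with fewer than four fields), one possible approach is to split each string and index the pieces directly, since indexing past the end of a character vector gives NA rather than an error. This is a minimal sketch: nth_field() is a hypothetical helper written for this example, and the 'unknown' value is made up to stand in for a row without hyphens.

```r
# nth_field() is a hypothetical helper, not part of base R or any package.
nth_field <- function(x, n, sep = '-') {
  # as.character() guards against factor columns, which strsplit() rejects
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # p[n] is NA_character_ when a value has fewer than n fields
  vapply(parts, function(p) p[n], character(1))
}

info <- c('1-john-80-math-4q19', '2-linda-90-art-4q10', 'unknown')
nth_field(info, 4)
```

With tidyr, separate() also accepts fill = "right", which pads missing pieces with NA instead of warning.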