string function

I have a column in a data frame that contains a student id, name, score, subject, and quarter. These items are joined by hyphens ("-"), for example:

Student$info: 1-john-80-math-4q19, 2-linda-90-art-4q10….

I want to extract the subject, i.e. the item after the 3rd hyphen, such as "math" or "art". How can I do it? Is there a substring function in R based on a pattern's position?

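The following regex will work. Because the first .+ is greedy, it matches everything up to the second-to-last hyphen, so the capture group (.+) holds the item between the last two hyphens, and sub() replaces the whole string with that group. Strings that do not contain at least two hyphens do not match and are returned unchanged.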

s <- c('1-john-80-math-4q19', '2-linda-90-art-4q10')
sub('.+-(.+)-.+', '\\1', s)
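The pattern above counts hyphens from the end of the string. To count from the start instead (the item after the 3rd hyphen, as asked), a pattern along these lines might be used. This is only a sketch on the same example vector; the name by_position is made up for illustration.

```r
s <- c('1-john-80-math-4q19', '2-linda-90-art-4q10')

# ^([^-]*-){3}  skips the first three hyphen-terminated items
# ([^-]*)       captures the 4th item (capture group 2)
# .*$           consumes whatever follows
by_position <- sub('^([^-]*-){3}([^-]*).*$', '\\2', s)
by_position
```

As with the first pattern, a string with fewer than three hyphens does not match and comes back unchanged.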

An alternative might be to use strsplit()?

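library(purrr)  # provides map() and re-exports the %>% pipe
strsplit(s, '-') %>%
  map(4) %>%  # keep the 4th piece of each split string
  unlist()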
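If purrr is not installed, roughly the same thing can be sketched in base R. This reuses the example vector from above; pieces and subject are illustrative names only.

```r
s <- c('1-john-80-math-4q19', '2-linda-90-art-4q10')

# Split each string on a literal hyphen; the result is a list of character vectors
pieces <- strsplit(s, '-', fixed = TRUE)

# Take the 4th piece of each vector; `[` gives NA when there are fewer than 4 pieces
subject <- sapply(pieces, `[`, 4)
subject
```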

Here are two ways to separate the info column into its components.

TEXT <- c("1-john-80-math-4q19", "2-linda-90-art-4q10")
DF <- data.frame(OtherCol = c(23, 45), info = TEXT)
DF
#>   OtherCol                info
#> 1       23 1-john-80-math-4q19
#> 2       45 2-linda-90-art-4q10
library(stringr)
TextMat <- str_split(DF$info,"-", simplify = TRUE)
DF$Subject <- TextMat[, 4]
DF
#>   OtherCol                info Subject
#> 1       23 1-john-80-math-4q19    math
#> 2       45 2-linda-90-art-4q10     art

#with tidyr
library(tidyr)
DF <- data.frame(OtherCol = c(23, 45), info = TEXT)
DF <- separate(DF, info, into = c("ID", "name", "score", "subject", "quarter"), sep = "-")
DF
#>   OtherCol ID  name score subject quarter
#> 1       23  1  john    80    math    4q19
#> 2       45  2 linda    90     art    4q10

Created on 2020-09-23 by the reprex package (v0.3.0)

Both work fine. Thank you!

Thank you very much. The first one doesn't work for me. The alternative doesn't work either if the column value doesn't contain a hyphen.
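For values without enough hyphens, one possible approach is a small helper that returns NA instead of failing or passing the value through unchanged. This is a sketch under assumptions, since the real data is not shown: extract_field is a hypothetical name, and the as.character() call is there only in case the column is a factor, which strsplit() does not accept.

```r
# Hypothetical helper: return the n-th sep-delimited item of each string,
# or NA when a string has fewer than n items.
extract_field <- function(x, n, sep = "-") {
  # as.character() guards against factor columns (assumption about the data)
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(
    parts,
    function(p) if (length(p) >= n) p[n] else NA_character_,
    character(1)
  )
}

info <- c("1-john-80-math-4q19", "nohyphen", NA, "2-linda-90-art-4q10")
extract_field(info, 4)
#> [1] "math" NA     NA     "art"
```

In the original example this would be used as Student$subject <- extract_field(Student$info, 4).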
