Extraction of some values from a series of strings

Hi everybody,
I need help to automate the extraction of some values from a series of strings.
Each string follows this pattern:
Message from authorname » Thu 6 May 2010, 21:21

I would like to create a table with one column for the author names and one for the dates. I am not interested in extracting the day of the week or the time.
Any suggestions? Thanks in advance.

You can use base R regex (you could also use stringr of course):

msg <-
  c(
    "Message from authorname1 » Thu 6 May 2010, 21:21",
    "Message from authorname2 » Thu 10 May 2010, 21:21"
  )

msg_rx <-
  regexec(
    "^Message from ([[:alnum:]]+) » [[:alpha:]]{3} ([[:digit:]]+ [[:alpha:]]+ [[:digit:]]+), ",
    msg
  )
msg_extract <- regmatches(msg, msg_rx)
name_date <- data.frame(t(sapply(msg_extract, function(x) x[2:3])))
name_date
#          X1          X2
#1 authorname1  6 May 2010
#2 authorname2 10 May 2010
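
If you also want the date column as a real Date rather than text, here is a minimal sketch along the same lines. It assumes every string matches the pattern and that the month names are English; the column names author and date and the helper variables rx, parts and month_num are my own choices, not something from the question.

```r
msg <- c(
  "Message from authorname1 » Thu 6 May 2010, 21:21",
  "Message from authorname2 » Thu 10 May 2010, 21:21"
)

# Capture author, day, month name and year as separate groups.
rx <- "^Message from ([[:alnum:]]+) » [[:alpha:]]{3} ([[:digit:]]+) ([[:alpha:]]+) ([[:digit:]]+), "

# One row per string: column 1 is the full match, columns 2-5 are the groups.
# Assumption: every string matches, otherwise rows would be silently dropped.
parts <- do.call(rbind, regmatches(msg, regexec(rx, msg)))

# Look the month up in the built-in English month.abb so the result does not
# depend on the session locale (as.Date()'s %b does).
month_num <- match(substr(parts[, 4], 1, 3), month.abb)

name_date <- data.frame(
  author = parts[, 2],
  date = as.Date(sprintf("%s-%02d-%02d", parts[, 5], month_num, as.integer(parts[, 3]))),
  stringsAsFactors = FALSE
)
```

With a Date column you can sort or filter by date directly instead of comparing strings.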

Regex is powerful, but I try to use it as little as possible:

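library(tidyverse)
library(lubridate)
df <- tibble(rawstring = "Message from authorname » Thu 6 May 2010, 21:21") %>%
  rowwise() %>%
  mutate(
    # split on the literal separator: author part first, date part second
    twohalves = str_split(rawstring, "»"),
    # drop the fixed prefix, then trim the space left before the separator
    authorname = str_trim(str_remove(
      string = head(twohalves, 1),
      pattern = "Message from "
    )),
    # keep only the text before the comma (weekday and date, no time)
    datetext = head(unlist(str_split(
      string = tail(twohalves, 1),
      pattern = ","
    )), 1),
    date = lubridate::dmy(datetext)
  ) %>%
  ungroup()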

Why, performance concerns? You are using regex too, just through stringr.

Guys, thanks to your help I've got what I needed.

Thank you very much, I've saved a lot of time!


Not performance; it's more that regex is a language I don't enjoy reading or reasoning about.
stringr might be using regex under the hood, but when I'm splitting on explicit strings I find it easier to follow the mechanics of what I'm doing. I'm sure that if I studied and practiced regex more I might find it more legible, but I don't have the motivation to dedicate time to it at the moment; complex string parsing doesn't come up much in my work.
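
For what it's worth, the "split on explicit strings" idea can be done with no regex at all. A rough base R sketch, assuming every message has exactly the " » " separator, the "Message from " prefix and a comma before the time; the variable names are only illustrative:

```r
msg <- c(
  "Message from authorname1 » Thu 6 May 2010, 21:21",
  "Message from authorname2 » Thu 10 May 2010, 21:21"
)

# fixed = TRUE makes strsplit() treat the separator as a literal string,
# not as a regular expression.
halves <- strsplit(msg, " » ", fixed = TRUE)
left   <- vapply(halves, `[`, character(1), 1)  # e.g. "Message from authorname1"
right  <- vapply(halves, `[`, character(1), 2)  # e.g. "Thu 6 May 2010, 21:21"

# Drop the literal prefix by its length rather than by pattern matching.
author <- substring(left, nchar("Message from ") + 1)

# Keep the text before the comma, then drop the leading weekday token.
before_comma <- vapply(strsplit(right, ",", fixed = TRUE), `[`, character(1), 1)
tokens <- strsplit(before_comma, " ", fixed = TRUE)
datetext <- vapply(tokens, function(x) paste(x[-1], collapse = " "), character(1))

name_date <- data.frame(author = author, datetext = datetext, stringsAsFactors = FALSE)
```

The trade-off is that this breaks as soon as a message deviates from the layout, whereas a regex can at least report a non-match.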

Good to hear :) Please mark the post that answered your question as the solution.