Not losing a character in a regex

cereghetti · June 24, 2020, 2:32am

Hi there!
I want to separate this sample:

library(dplyr)
library(tidyr)
example <- data.frame(data=c("Annie;7;4;1%;3%;Luciana;9;4%;2%;Lucas;1;2;3;7%;10%"))

I need to separate it so that each name and its information end up in the same column. Also, I do not know how long "data" will be, so I created a long vector of column names:

names<- paste0("name ",1:70,"")

Then I separated the example like this:

example <- example %>% separate(data,into=c(names),sep="(%;[A-z])")

This way I am able to separate all the cases, but each split loses the character before the semicolon and the character after it (the only one that matters is the first letter of the name):

Annie;7;4;1%;3 | uciana;9;4%;2 | ucas;1;2;3;7%;10%
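
A note on why the characters go missing: a split drops everything the separator pattern matches, and here the pattern matches the percent sign, the semicolon, and the first letter of the next name. The sketch below uses base R only (an assumption for illustration; the thread itself uses tidyr) to print the text that the pattern matches. It also shows why [A-z] is a risky character class: in ASCII order it spans a few punctuation characters that sit between "Z" and "a".

```r
x <- "Annie;7;4;1%;3%;Luciana;9;4%;2%;Lucas;1;2;3;7%;10%"

# Everything the separator pattern matches is discarded by a split.
dropped <- regmatches(x, gregexpr("%;[A-Za-z]", x, perl = TRUE))[[1]]
print(dropped)
#> [1] "%;L" "%;L"

# [A-z] also matches the punctuation between "Z" and "a", such as "_".
underscore_in_A_to_z <- grepl("[A-z]", "_", perl = TRUE)
underscore_in_letters <- grepl("[A-Za-z]", "_", perl = TRUE)
print(c(underscore_in_A_to_z, underscore_in_letters))
#> [1]  TRUE FALSE
```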

Is there any way to split the information without losing the first character? I am not good with regex.

Thank you!

EDIT:

Thanks to @andresrcs, I was able to do it using the regex (?<=%);(?=([a-zA-Z]))
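
For anyone landing here later, this is a minimal sketch of how that regex might be applied to the complete example. It assumes one row per person is acceptable and uses separate_rows() for that, which is one possible approach and not necessarily the exact code used in the edit. The column names "name" and "values" are made up for illustration. The inner capturing group from the regex is dropped because a lookahead does not need it.

```r
library(dplyr)
library(tidyr)

example <- data.frame(
  data = "Annie;7;4;1%;3%;Luciana;9;4%;2%;Lucas;1;2;3;7%;10%",
  stringsAsFactors = FALSE
)

# Split only on a ";" that has "%" before it and a letter after it.
# Both lookarounds are zero-width, so only the ";" itself is removed.
people <- example %>%
  separate_rows(data, sep = "(?<=%);(?=[A-Za-z])") %>%
  # extra = "merge" splits at the first ";" only, keeping the rest together.
  separate(data, into = c("name", "values"), sep = ";", extra = "merge")

print(people$name)
#> [1] "Annie"   "Luciana" "Lucas"
print(people$values)
#> [1] "7;4;1%;3%"    "9;4%;2%"      "1;2;3;7%;10%"
```

The "values" column still holds a variable number of fields per person, so how to split it further depends on what each field means.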

andresrcs · June 24, 2020, 3:55am

How about this?

library(dplyr)
library(tidyr)

example <- data.frame(data=c("Annie;7;3%;Luciana;9;2%;Lucas;1;10%"))

example %>% 
    separate_rows(data, sep = "(?<=%);") %>% 
    separate(data, into = c("name", "value", "percentage"), sep = ";", convert = TRUE)
#> # A tibble: 3 x 3
#>   name    value percentage
#>   <chr>   <int> <chr>     
#> 1 Annie       7 3%        
#> 2 Luciana     9 2%        
#> 3 Lucas       1 10%

Created on 2020-06-24 by the reprex package (v0.3.0)

cereghetti · June 24, 2020, 4:16am

Thank you, Andres. Your answer came really close. The mistake was mine: I am sorry, I am quite new to this, and I had to edit the example I posted; it is now complete.
With the complete example, your solution is close, but your regex splits each person's record into two pieces.
With your help I have a hint about how to solve it: a lookbehind.
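
To illustrate the difference on the complete example, here is a small base R sketch (the helper name pieces_between is hypothetical). A lookbehind alone splits at every ";" that follows a "%", while adding a lookahead for a letter restricts the split to the boundary between one person and the next.

```r
x <- "Annie;7;4;1%;3%;Luciana;9;4%;2%;Lucas;1;2;3;7%;10%"

# Hypothetical helper: return the text left between the regex matches.
pieces_between <- function(text, pattern) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE), invert = TRUE)[[1]]
}

# Lookbehind only: every ";" preceded by "%" is a split point.
lookbehind_only <- pieces_between(x, "(?<=%);")
print(lookbehind_only)
#> [1] "Annie;7;4;1%"   "3%"             "Luciana;9;4%"   "2%"
#> [5] "Lucas;1;2;3;7%" "10%"

# Lookbehind plus lookahead: the ";" must also be followed by a letter.
both <- pieces_between(x, "(?<=%);(?=[A-Za-z])")
print(both)
#> [1] "Annie;7;4;1%;3%"    "Luciana;9;4%;2%"    "Lucas;1;2;3;7%;10%"
```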

Thank you!
