dplyr::separate - I'm puzzled

Hi guys, while programming I just noticed a behavior of the separate function that I can't really explain.
I'm not a programmer, just a statistician using R, so I decided to post it here to understand what's going on.

df <- data.frame(vector = paste0("what.in.tarnation", c(1:10)))
df %>% separate(vector, c("a", "b", "c"))

So far it works fine, but if for some reason I specify the separator as the dot, things get weird.

df <- data.frame(vector = paste0("what.in.tarnation", c(1:10)))
df %>% separate(vector, c("a", "b", "c"), sep = ".")

I get the warning "Warning message: Expected 3 pieces. Additional pieces discarded in 10 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]." and clearly the output is wrong.

Strangely enough, only the dot causes this problem. With a comma, everything is fine.

df <- data.frame(vector = paste0("what,in,tarnation", c(1:10)))
df %>% separate(vector, c("a", "b", "c"), sep = ",")

Any possible explanation?

Welcome to the community!

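My knowledge of regular expressions is limited, but as far as I know, separate interprets a character sep as a regular expression, where an unescaped . matches any character. To split on a literal dot, you need to use "\\." as the sep argument.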

library(tidyr)

df <- data.frame(vector = paste0("what.in.tarnation", 1:10))

separate(data = df,
         col = vector,
         into = c("a", "b", "c"),
         sep = "\\.")
#>       a  b           c
#> 1  what in  tarnation1
#> 2  what in  tarnation2
#> 3  what in  tarnation3
#> 4  what in  tarnation4
#> 5  what in  tarnation5
#> 6  what in  tarnation6
#> 7  what in  tarnation7
#> 8  what in  tarnation8
#> 9  what in  tarnation9
#> 10 what in tarnation10

Created on 2019-06-13 by the reprex package (v0.3.0)

Here's the relevant explanation from R4DS:

But if “.” matches any character, how do you match the character “.”? You need to use an “escape” to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, \, to escape special behaviour. So to match an ., you need the regexp \.. Unfortunately this creates a problem. We use strings to represent regular expressions, and \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string "\\.".
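
To see the escaping in action without tidyr, here is a minimal sketch in base R. It assumes strsplit() handles these simple patterns the same way separate() handles sep. The variable names are only illustrative. The last pattern is the default value of sep listed in the tidyr documentation, which is probably why the first call in the question worked without any sep at all.

```r
# Base R only, so it runs without tidyr; `x` is an illustrative value
x <- "what.in.tarnation1"

# Unescaped dot: a regex that matches ANY character, so every character
# is treated as a separator and only empty pieces are left
any_char <- strsplit(x, ".")[[1]]

# Escaped dot: the R string "\\." holds the two-character regex \.
writeLines("\\.")  # shows the regex that the engine actually receives
literal_dot <- strsplit(x, "\\.")[[1]]

# Character class: another way to match a literal dot
class_dot <- strsplit(x, "[.]")[[1]]

# fixed = TRUE switches regex interpretation off (a strsplit() option)
fixed_dot <- strsplit(x, ".", fixed = TRUE)[[1]]

# Documented default `sep` of separate(): runs of non-alphanumeric characters
default_sep <- strsplit(x, "[^[:alnum:]]+")[[1]]
```

With the unescaped dot every piece comes back empty, which matches the "Additional pieces discarded" warning in the question: there are far more than 3 pieces and none of them contains text. In separate() itself, sep = "[.]" should behave like sep = "\\." if you find the double backslash hard to read.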

Hope this helps.


dplyr doesn't have the function separate. It's in tidyr.


Thanks!
I'm really getting into manipulating strings for work reasons, so for sure that helped!
