dplyr::separate - I'm puzzled

Hi guys, while programming I just noticed a behavior of the separate function that I can't really explain.
I'm not a programmer, just a statistician using R, so I decided to post it here to understand what's going on.

library(tidyr)  # provides separate() and re-exports the %>% pipe

df <- data.frame(vector = paste0("what.in.tarnation", 1:10))
df %>% separate(vector, c("a", "b", "c"))

So far it works well, but if for some reason I specify the dot as the separator, things get weird.

df <- data.frame(vector = paste0("what.in.tarnation", 1:10))
df %>% separate(vector, c("a", "b", "c"), sep = ".")

I get the warning "Warning message: Expected 3 pieces. Additional pieces discarded in 10 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]." and the output is clearly wrong: columns a, b and c come back as empty strings.

Strangely enough, only the dot causes this problem. With a comma, everything works fine.

df <- data.frame(vector = paste0("what,in,tarnation", 1:10))
df %>% separate(vector, c("a", "b", "c"), sep = ",")
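A quick base R check with grepl(), outside of separate(), hints at why the comma and the dot behave differently, assuming sep is read as a regular expression: "," has no special meaning in a regex and matches only a literal comma, while "." matches any single character. The variable names below are just for illustration.

```r
# "," is not a regex metacharacter, so it only matches a literal comma
comma_in_dots <- grepl(",", "what.in.tarnation1")

# "." is a metacharacter matching any single character,
# so it matches even in a string that contains no dot at all
dot_in_commas <- grepl(".", "what,in,tarnation1")

# Escaping the dot restores its literal meaning
escaped_in_commas <- grepl("\\.", "what,in,tarnation1")

print(c(comma_in_dots, dot_in_commas, escaped_in_commas))
# [1] FALSE  TRUE FALSE
```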

Any possible explanation?

Welcome to the community!

My knowledge of regular expressions is limited, but as far as I know, sep is interpreted as a regular expression, where . matches any character, so you need to use "\\." as the sep argument.


library(tidyr)

df <- data.frame(vector = paste0("what.in.tarnation", 1:10))

separate(data = df,
         col = vector,
         into = c("a", "b", "c"),
         sep = "\\.")
#>       a  b           c
#> 1  what in  tarnation1
#> 2  what in  tarnation2
#> 3  what in  tarnation3
#> 4  what in  tarnation4
#> 5  what in  tarnation5
#> 6  what in  tarnation6
#> 7  what in  tarnation7
#> 8  what in  tarnation8
#> 9  what in  tarnation9
#> 10 what in tarnation10

Created on 2019-06-13 by the reprex package (v0.3.0)
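The same effect can be reproduced with base R's strsplit(), which, like separate(), treats its split pattern as a regular expression by default. A minimal sketch comparing the unescaped dot with three ways of matching a literal dot (the variable names are illustrative only):

```r
x <- "what.in.tarnation1"

# Unescaped "." matches every character, so every piece is an empty string
by_any_char <- strsplit(x, ".")[[1]]

# Three ways to split on a literal dot
by_escape <- strsplit(x, "\\.")[[1]]              # escaped metacharacter
by_class  <- strsplit(x, "[.]")[[1]]              # "." inside a character class is literal
by_fixed  <- strsplit(x, ".", fixed = TRUE)[[1]]  # fixed string, no regex at all

# by_escape, by_class and by_fixed all equal c("what", "in", "tarnation1")
```

Because separate()'s sep is also a regular expression, sep = "[.]" should work there as well; fixed = TRUE, however, is an argument of strsplit() only, not of separate().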

Here's the relevant explanation from R for Data Science (R4DS):

But if "." matches any character, how do you match the character "."? You need to use an "escape" to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, \, to escape special behaviour. So to match an ., you need the regexp \.. Unfortunately this creates a problem. We use strings to represent regular expressions, and \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string "\\.".
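The two layers of escaping in that quote can be checked directly in R. A small sketch (the names are illustrative): the string literal "\\." holds just two characters, a backslash and a dot, and that two-character string is the regex \. that matches only a literal dot. Writing "\." instead does not even parse in R, since '\.' is an unrecognized escape in a character string.

```r
# The R string literal "\\." contains two characters: a backslash and a dot
pattern <- "\\."

n_chars <- nchar(pattern)   # 2
writeLines(pattern)         # prints \. , the regex the engine actually receives

# As a regex, \. matches only a literal dot
matches <- grepl(pattern, c("a.b", "ab"))   # TRUE FALSE
```

On R 4.0.0 or later, the raw string r"(\.)" produces the same two-character string without the doubled backslash.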

Hope this helps.

Also, dplyr doesn't have a separate() function; it comes from tidyr.


I'm really getting into string manipulation for work, so that definitely helped!
