I'm importing data that I need to separate into two columns. I'm having trouble getting the separator to match only the first whitespace: using \\s splits at every whitespace, and everything after the second piece is discarded. Normally I would use a split operator or drop the g flag on a regex, but I don't know how to do that here.
So, as an example:
library(tidyr)
fruits <- data.frame(
  col = c("apples and oranges and pears and bananas",
          "pineapples and mangos and guavas")
)
separate(fruits, col, into = c("first", "rest"), sep = "\\s")
first rest
1 apples and
2 pineapples and
Warning message:
Expected 2 pieces. Additional pieces discarded in 2 rows [1, 2].
I would have expected:
first rest
1 apples and oranges and pears and bananas
2 pineapples and mangos and guavas
I think you're just missing extra = "merge". It merges all "leftovers" into the last column you created with into.
separate(fruits, col, into = c("first", "rest"), sep = "\\s",
         extra = "merge")
first rest
1 apples and oranges and pears and bananas
2 pineapples and mangos and guavas
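To make the effect of extra easier to see, here is a minimal sketch comparing "drop" and "merge" on the same data. The object names dropped and merged are just for illustration; the default, extra = "warn", gives the same result as "drop" but with the warning shown in the question.

```r
library(tidyr)

fruits <- data.frame(
  col = c("apples and oranges and pears and bananas",
          "pineapples and mangos and guavas")
)

# extra = "drop": keep the first two pieces and silently discard the others
dropped <- separate(fruits, col, into = c("first", "rest"), sep = "\\s",
                    extra = "drop")

# extra = "merge": stop splitting once `into` is filled, so the remainder
# of the string stays together in the last column
merged <- separate(fruits, col, into = c("first", "rest"), sep = "\\s",
                   extra = "merge")

dropped$rest  # only the second word of each row
merged$rest   # everything after the first whitespace
```

With "merge" the number of splits is capped by the length of into, which is the closest equivalent to matching only the first separator.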
(Apparently I answered a very similar question on Stack Overflow in 2016. I don't remember doing so, so it's a good thing it came up when I searched!)
I think @aosmith's solution is the winner for this case, but here is another solution using str_extract() from stringr instead of separate(), just for variety's sake.
library(dplyr)
library(stringr)
fruits <- data.frame(
  col = c("apples and oranges and pears and bananas",
          "pineapples and mangos and guavas")
)
fruits %>%
  mutate(first = str_extract(col, "^[^\\s]+"),
         # the lookbehind skips the separating space, so rest has no leading blank
         rest = str_extract(col, "(?<=\\s).+")) %>%
  select(-col)
#> first rest
#> 1 apples and oranges and pears and bananas
#> 2 pineapples and mangos and guavas
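For completeness, tidyr also has its own extract() function, which splits a column by regex capture groups in a single call. This is only a sketch on the same fruits data: the regex and the object name extracted are my own choices, not something from the answers above.

```r
library(tidyr)

fruits <- data.frame(
  col = c("apples and oranges and pears and bananas",
          "pineapples and mangos and guavas")
)

# Group 1: the leading run of non-whitespace characters.
# Group 2: everything after the first whitespace character.
# The regex is illustrative; adjust it if rows can lack whitespace.
extracted <- tidyr::extract(fruits, col, into = c("first", "rest"),
                            regex = "^([^[:space:]]+)[[:space:]](.*)$")
extracted
```

Each capture group fills one column of into, so a row that does not match the pattern would come back as NA in both columns.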