Is there a 'tidy' approach to splitting text into columns when each string does not contain the same number of elements?
I'm having trouble: stringr::str_view recognizes the string I want to split on, but I can't get tidyr::separate to split the data properly.
Since I want to split wherever three spaces occur, I assumed the easiest way would be to specify the spaces in brackets, but tidyr doesn't seem to accept that?
library(stringr)
library(tidyr)
data<-tibble::tribble(
~value,
"RATINGS: 4 MEAN: 3.83/5.0 WEIGHTED AVG: 3.39/5 IBU: 35 EST. CALORIES: 204 ABV: 6.8%",
"RATINGS: 89 WEIGHTED AVG: 3.64/5 EST. CALORIES: 188 ABV: 6.25%",
"RATINGS: 8 MEAN: 3.7/5.0 WEIGHTED AVG: 3.45/5 IBU: 85 EST. CALORIES: 213 ABV: 7.1%"
)
separate(data, value, into = c("Ratings","Weighted Avg","IBU","Est Calories","abv"),sep="[//s]",fill = "right")
#this works
str_view_all(string = "RATINGS: 4 MEAN: 3.83/5.0 WEIGHTED AVG: 3.39/5 IBU: 35 EST. CALORIES: 204 ABV: 6.8%",pattern = "[ ]+")
str_view_all(string = "RATINGS: 4 MEAN: 3.83/5.0 WEIGHTED AVG: 3.39/5 IBU: 35 EST. CALORIES: 204 ABV: 6.8%",pattern = " ")
#but I can't split on it with tidyr's separate
separate(data, value, into = c("Ratings","Weighted Avg","IBU","est Calories","abv"),sep= " ",extra = "merge")
Warning message:
Too few values at 3 locations: 1, 2, 3
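For reference, here is a minimal sketch of one direction that might avoid the problem, assuming the goal is one column per labelled field. The strings do not all carry the same fields (the second one has no MEAN or IBU), so any purely positional split will misalign columns. Note also that "[//s]" is a character class matching a slash or the letter s, not whitespace. The sketch pulls each value out by its label instead; `extract_field` is a hypothetical helper written for this example, not part of stringr or tidyr, and it assumes each value is a single token with no internal spaces.

```r
library(stringr)
library(tibble)

data <- tribble(
  ~value,
  "RATINGS: 4 MEAN: 3.83/5.0 WEIGHTED AVG: 3.39/5 IBU: 35 EST. CALORIES: 204 ABV: 6.8%",
  "RATINGS: 89 WEIGHTED AVG: 3.64/5 EST. CALORIES: 188 ABV: 6.25%",
  "RATINGS: 8 MEAN: 3.7/5.0 WEIGHTED AVG: 3.45/5 IBU: 85 EST. CALORIES: 213 ABV: 7.1%"
)

# Hypothetical helper: return the non-whitespace token that follows "<label>:".
# `label` is treated as a regular expression, so literal dots must be escaped.
# Strings that lack the label yield NA.
extract_field <- function(x, label) {
  str_match(x, paste0(label, ":\\s*(\\S+)"))[, 2]
}

# Build one column per label; missing fields stay NA instead of shifting columns.
result <- tibble(
  ratings      = extract_field(data$value, "RATINGS"),
  mean         = extract_field(data$value, "MEAN"),
  weighted_avg = extract_field(data$value, "WEIGHTED AVG"),
  ibu          = extract_field(data$value, "IBU"),
  est_calories = extract_field(data$value, "EST\\. CALORIES"),
  abv          = extract_field(data$value, "ABV")
)

print(result)
```

Because the match is keyed on the label rather than on position, this should behave the same whether the fields are separated by one space or three.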