I have a document-feature matrix from the quanteda package. The features are bigrams of the form word1_word2:
feature
1 good_morning
2 right_now
3 years_ago
4 last_night
5 r_u
6 ou_know
I would like to separate these bigrams so that word1 is in one column and word2 is in another, like this:
word1 word2
good morning
right now
years ago
last night
r u
ou know
The left of the underscore becomes word1, while the right of the underscore becomes word2. How do I do this in R?
I'm not familiar with the output of the quanteda package, but if you can convert the output to a data frame, then you can use tidyr::separate():
library(tidyr)
df <- data.frame(stringsAsFactors = FALSE,
                 bigram = c("good_morning", "right_now", "years_ago",
                            "last_night", "r_u", "ou_know"))
separate(df, bigram, c("word1", "word2"), sep = "_")
#>   word1   word2
#> 1  good morning
#> 2 right     now
#> 3 years     ago
#> 4  last   night
#> 5     r       u
#> 6    ou    know
Created on 2019-04-15 by the reprex package (v0.2.1.9000)
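On the conversion step: a minimal sketch, assuming the bigram labels can be pulled out of the dfm with quanteda's featnames(). The toy sentences and the object names toks, bigram_dfm, df, and out are made up for illustration and are not from the original data.

```r
library(quanteda)
library(tidyr)

# Toy input (hypothetical); replace with your own text or tokens object
toks <- tokens(c("good morning", "right now", "years ago"))

# Build bigrams; tokens_ngrams() joins the words with "_" by default
bigram_dfm <- dfm(tokens_ngrams(toks, n = 2))

# featnames() returns the feature labels as a character vector,
# which can then be wrapped in a data frame
df <- data.frame(feature = featnames(bigram_dfm), stringsAsFactors = FALSE)

# Same separate() call as above, applied to the extracted labels
out <- separate(df, feature, c("word1", "word2"), sep = "_")
out
```

This only keeps the feature labels. If the counts are needed as well, they would have to be joined back on separately.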
Thank you very much! It worked! Another question: how do I separate trigrams? My data looks like this:
feature
1 enjoying_case_presentations
2 case_presentations_students
3 presentations_students_w
4 students_w_good
5 w_good_luck
6 good_luck_students
I would like it to look like this:
words1:2 word3
1 enjoying_case presentations
2 case_presentations students
3 presentations_students w
4 students_w good
5 w_good luck
6 good_luck students
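One possible approach, offered as a sketch rather than the only way: tidyr::extract() with a regular expression that splits each string at its last underscore. The column name words1_2 is a stand-in chosen here, because words1:2 is not a syntactic R name, and the object name trigrams is likewise just for illustration.

```r
library(tidyr)

df <- data.frame(stringsAsFactors = FALSE,
                 feature = c("enjoying_case_presentations",
                             "case_presentations_students",
                             "presentations_students_w",
                             "students_w_good",
                             "w_good_luck",
                             "good_luck_students"))

# "^(.*)_" is greedy, so the first group takes everything up to the LAST
# underscore; "([^_]+)$" then captures the final word on its own
trigrams <- tidyr::extract(df, feature, c("words1_2", "word3"),
                           regex = "^(.*)_([^_]+)$")
trigrams
```

The same pattern should carry over to longer n-grams, since it only looks for the final underscore and leaves any earlier ones inside the first column.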