Greetings all,
I have an issue and was not sure where to address it.
I am doing some very simple text mining with 'tm'. I have produced word counts from two different corpora. Now I would simply like to compare and contrast the two sets of words: for example, what is A intersect B, A-B and B-A? To my knowledge the 'tm' package does not do this. Am I incorrect?
Is there a better way/package to approach this? Is diff on the CLI all I really need?
I know that this is not strictly an RStudio problem. Where would you suggest I ask general questions like this in the future?
library(dplyr)
# example word count tables
corps_1 <- tibble(
  word = c('a', 'b', 'c'),
  n = c(2, 5, 4)
)
corps_2 <- tibble(
  word = c('b', 'c', 'd'),
  n = c(7, 8, 9)
)
# words that appear in both corpora (A intersect B)
intersect(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 2 x 1
#> word
#> <chr>
#> 1 b
#> 2 c
# words that appear in either corpus (A union B)
union(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 4 x 1
#> word
#> <chr>
#> 1 b
#> 2 c
#> 3 a
#> 4 d
# words that appear only in the first corpus (A-B)
setdiff(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 1 x 1
#> word
#> <chr>
#> 1 a
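
The question also asks about B-A, and the set operations above drop the counts because only the word column is selected. Here is a minimal sketch that covers both, assuming the same two example tables and that dplyr is installed. The result names (b_minus_a, a_only, shared) and the "_1"/"_2" suffixes are just illustrative choices, not anything from 'tm'.

```r
library(dplyr)

# same example word count tables as above
corps_1 <- tibble(word = c('a', 'b', 'c'), n = c(2, 5, 4))
corps_2 <- tibble(word = c('b', 'c', 'd'), n = c(7, 8, 9))

# B-A: swap the argument order to get words only in the second corpus
b_minus_a <- setdiff(
  corps_2 %>% select(word),
  corps_1 %>% select(word)
)

# A-B with counts kept: rows of corps_1 whose word has no match in corps_2
a_only <- anti_join(corps_1, corps_2, by = 'word')

# A intersect B with the counts from both corpora side by side
# (hypothetical suffixes; the count columns become n_1 and n_2)
shared <- inner_join(corps_1, corps_2, by = 'word', suffix = c('_1', '_2'))
```

The joins match on the word column only, so they still work when the counts differ between corpora, whereas intersect() on the full tables would require whole rows to be identical.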