Seeking Package Info/Advice


#1

Greetings all,
I have an issue and was not sure where to address it.

  1. I am doing some very simple text mining with 'tm'. I have produced word counts from two different corpora. Now I would simply like to compare the two sets of words: for example, what are A intersect B, A-B and B-A? To my knowledge the 'tm' package does not do this. Am I incorrect?
    Is there a better way/package to approach this? Is diff on the CLI all I really need?

  2. I know that this is not strictly an RStudio problem. Where might you suggest I ask my general questions in the future?

Thanks


#2

Hi Matt, maybe the tidytext package can help with this? https://www.tidytextmining.com/


#3

The #general category is a great place to ask general questions! :grin:


#4

Does the tm package allow you to make use of R's functions for set operations?

Using dplyr's set operations as an example of what I am suggesting:



library(dplyr)

# example word count tables
corps_1 <- tibble(
  word = c('a', 'b', 'c'),
  n = c(2,5,4)
)
corps_2 <- tibble(
  word = c('b', 'c', 'd'),
  n = c(7,8,9)
)

intersect(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 2 x 1
#>   word 
#>   <chr>
#> 1 b    
#> 2 c

union(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 4 x 1
#>   word 
#>   <chr>
#> 1 b    
#> 2 c    
#> 3 a    
#> 4 d

setdiff(
  corps_1 %>% select(word),
  corps_2 %>% select(word)
)
#> # A tibble: 1 x 1
#>   word 
#>   <chr>
#> 1 a

Created on 2018-08-08 by the reprex package (v0.2.0).
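
One thing worth adding, since the question asked for both A-B and B-A: setdiff() is not symmetric, so the second difference needs the arguments swapped. Here is a minimal sketch using base R on plain character vectors. The names words_a and words_b are made up for illustration; I am assuming you can pull the vocabulary of each corpus out as a character vector first.

```r
# Hypothetical vocabularies for two corpora (illustrative values only)
words_a <- c("a", "b", "c")
words_b <- c("b", "c", "d")

# A intersect B: words that appear in both corpora
in_both <- intersect(words_a, words_b)

# A - B: words in A that are not in B
only_a <- setdiff(words_a, words_b)

# B - A: same function, arguments swapped
only_b <- setdiff(words_b, words_a)

print(in_both)
print(only_a)
print(only_b)
```

These functions treat their inputs as sets, so duplicates are dropped and the counts are not carried along. If you need the counts too, a join on the word column is probably the better fit.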

The tidytext package supports working with word frequencies in word count tables similar to the ones above.
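
As a rough sketch of how that could look end to end, the example below builds word count tables with tidytext and then compares them with dplyr joins, which keeps the counts attached. The two small data frames (docs_a, docs_b) and their text are invented for illustration, and this is only one possible approach, not something tm does for you.

```r
library(dplyr)
library(tidytext)

# Hypothetical corpora, one row per document (illustrative text only)
docs_a <- tibble(text = c("the cat sat on the mat", "the cat slept"))
docs_b <- tibble(text = c("the dog sat on the log"))

# Tokenize into one word per row, then count each word
counts_a <- docs_a %>% unnest_tokens(word, text) %>% count(word, sort = TRUE)
counts_b <- docs_b %>% unnest_tokens(word, text) %>% count(word, sort = TRUE)

# A - B: words found only in corpus A, with A's counts kept
only_a <- anti_join(counts_a, counts_b, by = "word")

# B - A: words found only in corpus B, with B's counts kept
only_b <- anti_join(counts_b, counts_a, by = "word")

# A intersect B: shared words, with both counts side by side
both <- inner_join(counts_a, counts_b, by = "word", suffix = c("_a", "_b"))
```

Using anti_join() and inner_join() instead of setdiff() and intersect() means the n column survives, so you can see how often a shared word occurs in each corpus.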