Displaying the correlation between two categorical variables with many levels

I have a player dataset with the following variables: {score, age, city, state, zip}.

Where:

  • score: numeric variable
  • age: numeric variable
  • city: factor variable with 22,750 levels
  • state: factor variable with 50 levels
  • zip: factor variable with 26,659 levels

As one would expect, city and zip should be highly correlated.
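Strictly speaking, Pearson correlation is not defined for two factors; a common single-number measure of association between them is Cramér's V, V = sqrt(chi2 / (n * (min(r, k) - 1))), where r and k are the numbers of levels. A dense city × zip table would need 22,750 × 26,659 ≈ 606 million cells, so the sketch below computes the chi-squared statistic from the observed (city, zip) pairs only, using base R. `cramers_v_sparse` is a hypothetical helper written for illustration, not part of DataExplorer or any other package.

```r
# Hypothetical helper (not from DataExplorer): Cramér's V for two factors,
# computed from the observed (x, y) pairs only, so the dense
# nlevels(x) x nlevels(y) contingency table is never materialised.
cramers_v_sparse <- function(x, y) {
  x <- as.factor(x)
  y <- as.factor(y)
  ok <- !is.na(x) & !is.na(y)            # keep complete pairs only
  x <- droplevels(x[ok])
  y <- droplevels(y[ok])
  n <- length(x)
  r <- nlevels(x)
  k <- nlevels(y)
  if (min(r, k) < 2) stop("each variable needs at least two observed levels")

  xi <- as.integer(x)
  yj <- as.integer(y)
  # One row per observed (x, y) combination with its count n_ij
  cells <- aggregate(list(n_ij = rep(1, n)),
                     by = list(i = xi, j = yj), FUN = sum)
  # Margins as doubles so n_i * n_j cannot overflow R's integer range
  n_i <- as.numeric(tabulate(xi, nbins = r))   # row margins
  n_j <- as.numeric(tabulate(yj, nbins = k))   # column margins

  # Pearson chi-squared rewritten as n * (sum n_ij^2 / (n_i n_j) - 1);
  # empty cells contribute 0 to the sum, so only observed pairs are needed
  chi2 <- n * (sum(cells$n_ij^2 / (n_i[cells$i] * n_j[cells$j])) - 1)
  chi2 <- max(chi2, 0)                   # guard against tiny negative round-off
  sqrt(chi2 / (n * (min(r, k) - 1)))
}
```

With the dataset above this would be called as `cramers_v_sparse(dataset$city, dataset$zip)`; a value near 1 indicates that one variable almost determines the other. On roughly 840,000 rows, `aggregate()` may take a few seconds, but memory use scales with the number of observed pairs rather than with 22,750 × 26,659.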

I tried:

```r
library(dplyr); library(DataExplorer)
plot_correlation(dataset %>% select(city, zip), maxcat = 30000)
```

But got:

```
Error in CJ(1:841500, 1:22582) : 
  Cross product of elements provided to CJ() would result in 19002753000 rows which exceeds .Machine$integer.max == 2147483647
```

Is there any way I can display or plot the correlation between these two variables?
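A chart with 22,750 × 26,659 cells would be unreadable even if it could be built. One lower-dimensional view, sketched below in base R, reduces the pair to "how many distinct cities does each zip appear with?" and tabulates that count, so the result can be drawn as an ordinary bar chart. `cities_per_zip` is a hypothetical helper written for illustration, not an existing package function.

```r
# Hypothetical helper: for every zip, count the distinct cities it occurs
# with, then tabulate those counts. A distribution concentrated at 1 means
# zip (almost) determines city, i.e. the two variables are strongly associated.
cities_per_zip <- function(city, zip) {
  ok <- !is.na(city) & !is.na(zip)             # drop incomplete pairs
  links <- unique(data.frame(city = as.character(city[ok]),
                             zip  = as.character(zip[ok])))
  per_zip <- table(links$zip)                  # distinct cities for each zip
  table(as.integer(per_zip))                   # how many zips have 1, 2, ... cities
}
```

With the dataset above, the plot would be `barplot(cities_per_zip(dataset$city, dataset$zip), xlab = "Distinct cities per zip", ylab = "Number of zips")`. Because the same city name can occur in more than one state, passing `paste(dataset$city, dataset$state)` as the first argument may give a cleaner picture.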

Thanks!
