Hi there,
I'm trying to figure out a general approach to removing low cell counts, such as counts of less than 10, from output tables or graphs in R. The tables could be frequency tables of one variable, cross-tabulations or three-way tables. The graphs could be bar graphs, including clustered or stacked bar graphs, or histograms. It's a common requirement in social research.
In Stata, I was able to do this using the following code to generate a counter variable with which to set a logical condition (if) for producing the table or graph. I then dropped the counter from the data set. For example:
# bysort variable1: gen count = _N
# tab variable1 variable2 if count >= 10
# drop count
What would be the equivalent procedure in R using the following example?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
df <- data.frame(
  zone_1 = rpois(10, 5),
  zone_2 = rpois(10, 5),
  zone_3 = rpois(10, 5)
)
df
z2z3 <- table(df$zone_2, df$zone_3)  # what condition would I need to apply here to remove cell counts of, say, less than 2?
z2z3
    1 2 5 6 14
  1 0 1 0 0  0
  3 0 0 0 0  1
  4 0 0 0 2  0
  5 0 1 0 0  0
  7 1 1 0 1  0
  8 0 0 2 0  0
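To make the question concrete, here is a minimal base-R sketch of one possible approach, assuming that "removing" a cell means blanking it out with NA rather than dropping rows or columns. The threshold name `min_n` is hypothetical, and the fixed values below are an assumption chosen to reproduce the table shown above (the original data came from `rpois()` and cannot be regenerated exactly):

```r
# Fixed example data (assumed values that reproduce the table above)
zone_2 <- c(1, 3, 4, 4, 5, 7, 7, 7, 8, 8)
zone_3 <- c(2, 14, 6, 6, 2, 1, 2, 6, 5, 5)

min_n <- 2  # assumed suppression threshold (hypothetical name)

z2z3 <- table(zone_2, zone_3)

# Mask every cell whose count is below the threshold.
# Note: this also masks zero cells; to keep zeros visible, use
# z2z3_masked > 0 & z2z3_masked < min_n as the condition instead.
z2z3_masked <- z2z3
z2z3_masked[z2z3_masked < min_n] <- NA

# Print the table, showing suppressed cells as "."
print(z2z3_masked, na.print = ".")
```

Whether zero cells should also be suppressed depends on the disclosure rules being followed, so that choice is left as a comment in the sketch.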
Is there a common approach when using packages such as 'sjPlot'?
Using the same dataset,
library(sjPlot)
tab_xtab(df$zone_2,
         df$zone_3,
         var.labels = c("z2", "z3"),
         statistics = "fisher",
         show.row.prc = TRUE)
Where would I insert the condition or how would I transform the data prior to running 'tab_xtab'?
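For the "transform the data first" route, a rough base-R sketch that mirrors the three Stata steps (counter, condition, drop) might look like the following. It is only a sketch under assumptions: the data values are fixed, made-up stand-ins for the random example, `min_n` and `df_keep` are hypothetical names, and, like the Stata code, it filters on the group size of one variable rather than on individual cells:

```r
# Fixed stand-in data (assumed values, not the original random draw)
df <- data.frame(
  zone_2 = c(1, 3, 4, 4, 5, 7, 7, 7, 8, 8),
  zone_3 = c(2, 14, 6, 6, 2, 1, 2, 6, 5, 5)
)
min_n <- 2  # assumed threshold (hypothetical name)

# Like `bysort zone_2: gen count = _N`: size of each zone_2 group
df$count <- ave(seq_len(nrow(df)), df$zone_2, FUN = length)

# Like `tab ... if count >= 10`: keep rows from large-enough groups only
df_keep <- df[df$count >= min_n, ]

# Like `drop count`: remove the helper column again
df$count <- NULL
df_keep$count <- NULL

# Cross-tabulate the filtered data
table(df_keep$zone_2, df_keep$zone_3)
```

The filtered columns `df_keep$zone_2` and `df_keep$zone_3` could then be passed to `tab_xtab()` in place of the original ones.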
Thanks very much,
Steve