Suppress low count values in tables

rob_r · May 15, 2019, 9:36pm

Hi R Studio Community,

With public health data it is common to suppress low value cell counts (say less than 5).

Where a table cell value is less than 5, I wish to suppress it by substituting the text "<5".

Any tips/examples of this practice in R tables?

Thank you in advance for any direction.

technocrat · May 15, 2019, 10:28pm

With a data frame or tibble, you can easily apply boolean filters:

countLT5 <- cell_count %>% filter(type2 >= 5)

will create a new object with the rows whose count is less than 5 omitted. You can also overwrite the original:

cell_count <- cell_count %>% filter(type2 >= 5)

rob_r · May 15, 2019, 11:12pm

Thanks technocrat. I think my original post was vague.

By appending some code hopefully I explain myself better.

Starting with some counts:

df <- data.frame(
zone_1 = rpois(10, 5),
zone_2 = rpois(10, 5),
zone_3 = rpois(10, 5)
)

This provides a df such as:

zone_1 zone_2 zone_3
1 7 4 7
2 5 7 7
3 7 10 7
4 5 3 3
5 1 6 7
6 7 5 6
7 4 4 4
8 5 3 6
9 5 6 5
10 6 4 6

Now if I apply a filter such as:

df %>% filter_all(all_vars(. > 5))

It only returns one row, as every other row has at least one value of 5 or less.

zone_1 zone_2 zone_3
1 7 10 7

Rather than omit the values, the aim is to retain the original table structure but substitute values less than 5 with "<5" where applicable.

technocrat · May 16, 2019, 2:49am

A reproducible example, called a reprex, is always a good idea, and I see my problem: I assumed you were working with a single column, and filter takes the whole row.

So, mutate will do the job:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(
zone_1 = rpois(10, 5),
zone_2 = rpois(10, 5),
zone_3 = rpois(10, 5)
)
df
#>    zone_1 zone_2 zone_3
#> 1       8      5      7
#> 2       9      3      3
#> 3       7      6      8
#> 4       4      5      5
#> 5       4      5      7
#> 6       6      6      2
#> 7       7      8      1
#> 8       6      5      4
#> 9       8     11      6
#> 10      3      3      3
df %>% 
  mutate(zone_1 = ifelse(zone_1 < 5, NA, zone_1)) %>%
  mutate(zone_2 = ifelse(zone_2 < 5, NA, zone_2)) %>%
  mutate(zone_3 = ifelse(zone_3 < 5, NA, zone_3))
#>    zone_1 zone_2 zone_3
#> 1       8      5      7
#> 2       9     NA     NA
#> 3       7      6      8
#> 4      NA      5      5
#> 5      NA      5      7
#> 6       6      6     NA
#> 7       7      8     NA
#> 8       6      5     NA
#> 9       8     11      6
#> 10     NA     NA     NA

Created on 2019-05-15 by the reprex package (v0.2.1)

I don't think you want to use "<5" as a value, because then your column is no longer a numeric vector.
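
To make the point about numeric columns concrete, here is a minimal sketch. The small `counts` data frame is made up for illustration and is not from the thread. It shows that the `NA` version stays numeric and can still be summed, while substituting "<5" turns every value into character.

```r
# Hypothetical example data, not taken from the posts above
counts <- data.frame(zone_1 = c(8, 3, 7))

# Replacing small counts with NA keeps the column numeric
as_na <- ifelse(counts$zone_1 < 5, NA, counts$zone_1)
class(as_na)               # numeric
sum(as_na, na.rm = TRUE)   # arithmetic still works

# Replacing them with "<5" coerces the whole vector to character
as_text <- ifelse(counts$zone_1 < 5, "<5", counts$zone_1)
class(as_text)             # character
# sum(as_text) would now stop with an error, because sum() does not
# accept character input
```

So the text substitution is best left to the very last step, after any totals or rates have been calculated.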

andresrcs · May 16, 2019, 2:56am

I agree with technocrat: changing your data to character is not a good idea, but if you just want to do this for printing or visualization, then this is a shorter way.

library(dplyr)
set.seed(1)
df <- data.frame(
    zone_1 = rpois(10, 5),
    zone_2 = rpois(10, 5),
    zone_3 = rpois(10, 5)
)

df %>%
    mutate_all(~ifelse(.<5, "<5", .))
#>    zone_1 zone_2 zone_3
#> 1      <5     <5      9
#> 2      <5     <5     <5
#> 3       5      6      6
#> 4       8     <5     <5
#> 5      <5      7     <5
#> 6       8      5     <5
#> 7       9      6     <5
#> 8       6     11     <5
#> 9       6     <5      8
#> 10     <5      7     <5
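
In dplyr 1.0.0 and later, `mutate_all()` is superseded by `across()`. Below is a sketch of the same substitution in that style, assuming a recent dplyr is installed. The explicit `as.character()` and the separate `df_display` object are additions for illustration: they make the type change visible and leave the numeric `df` untouched.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  zone_1 = rpois(10, 5),
  zone_2 = rpois(10, 5),
  zone_3 = rpois(10, 5)
)

# Apply the same rule to every column. The result is character, so it is
# meant for display only; `df` keeps the original numeric counts.
df_display <- df %>%
  mutate(across(everything(), ~ ifelse(.x < 5, "<5", as.character(.x))))

df_display
```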

rob_r · May 16, 2019, 4:02am

Thank you both andresrcs and technocrat for considering my problem.

And yes I absolutely need to invest some time into reprex.

As the output is a final summary table, changing the data to character is a simple but effective solution.
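
For a final summary table that also carries label columns, one option is a small helper that only touches numeric columns and takes the threshold as an argument. This is a sketch in base R under those assumptions; `suppress_small()` and the example table are hypothetical names, not something from the replies above.

```r
# Hypothetical helper: replace numeric counts below `threshold` with a
# text label such as "<5", leaving non-numeric columns unchanged
suppress_small <- function(data, threshold = 5) {
  label <- paste0("<", threshold)
  is_num <- vapply(data, is.numeric, logical(1))
  data[is_num] <- lapply(data[is_num], function(x) {
    # Missing counts stay missing instead of being labelled
    ifelse(!is.na(x) & x < threshold, label, as.character(x))
  })
  data
}

# Made-up summary table with a non-numeric label column
summary_tbl <- data.frame(
  age_group = c("0-17", "18-64", "65+"),
  zone_1 = c(12, 3, 7),
  zone_2 = c(4, 9, 5)
)

suppressed <- suppress_small(summary_tbl)
suppressed
```

Calling the helper on a copy just before printing or exporting keeps the original counts available for any further calculation.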
