Summarise the frequency of values in a data frame

str_guru · October 23, 2020, 10:41am

I am trying to summarise how often the value 1 appears in each column of a data frame, keeping only the columns that contain at least one 1.

df <- data.frame(A1 = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
                 A2 = c(1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0),
                 A3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                 A4 = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1))

The output could look like this, with a column listed only if its count is > 0:

column count
A1 3
A2 5
A4 4

nirgrahamuk · October 23, 2020, 10:45am

library(dplyr)
library(tidyr)

summarise(df, across(.fns = sum)) %>%
  pivot_longer(cols = everything()) %>%
  filter(value > 0)
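For comparison, here is a minimal base R sketch of the same idea, assuming every column is numeric and contains only 0 and 1. It is an alternative to the dplyr pipeline, not a replacement for it; the names `counts` and `result` are illustrative.

```r
# Data from the question
df <- data.frame(
  A1 = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  A2 = c(1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0),
  A3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  A4 = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# df == 1 is a logical matrix; colSums() counts the TRUEs per column
counts <- colSums(df == 1)

# Keep only the columns that contain at least one 1
counts <- counts[counts > 0]

# Arrange as a two-column data frame like the requested output
result <- data.frame(column = names(counts), count = unname(counts))
print(result)
```

Counting `df == 1` rather than summing the raw values keeps the result a count even if a column were to hold values other than 0 and 1.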

str_guru · October 23, 2020, 11:32am

Also, if there are NA values in any column, how can I remove or filter them?

nirgrahamuk · October 23, 2020, 11:35am

If you change your example by making the first value of A1 NA and run the code, what happens?
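For readers following along, a minimal sketch of that experiment, assuming dplyr and tidyr are installed. `df_na` is a hypothetical copy of the question's data with the first value of A1 set to NA, and passing `na.rm = TRUE` to `sum()` is one possible way to ignore missing values, not necessarily the approach intended here.

```r
library(dplyr)
library(tidyr)

# Hypothetical copy of the question's data with one NA in A1
df_na <- data.frame(
  A1 = c(NA, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  A2 = c(1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0),
  A3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  A4 = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Plain sum(): A1 sums to NA, and filter() drops rows where value > 0 is NA
default_result <- df_na %>%
  summarise(across(everything(), sum)) %>%
  pivot_longer(cols = everything()) %>%
  filter(value > 0)

# sum() with na.rm = TRUE: the NA is skipped and A1 is counted
na_rm_result <- df_na %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = everything()) %>%
  filter(value > 0)

print(default_result)
print(na_rm_result)
```

With plain `sum()`, A1 silently disappears from the result rather than raising an error, which is easy to miss.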

str_guru · October 23, 2020, 11:50am

Some of the columns in my original data also have NA values.

Error: Problem with summarise() input ..1.
x ‘sum’ not meaningful for factors
i Input ..1 is across(.fns = sum).

nirgrahamuk · October 23, 2020, 11:58am

It would be best if your example provided representative data.
You previously implied your data was simple integers, that could be summed.
Yet if you have factors, do they logically map to summable concepts, or must they be excluded?

To provide an example of your data, you would do something like:

dput(head(mydata,n=10))
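Alongside `dput()`, a quick look at the column classes shows which columns are factors and would trip up `sum()`. This is a small sketch; `mydata` here is a hypothetical stand-in with one factor column and one numeric column, not the real data.

```r
# Hypothetical stand-in for the real data
mydata <- data.frame(
  flag = factor(c("0", "1", "0")),
  n    = c(1, 0, 1)
)

# Class of every column
col_classes <- sapply(mydata, class)
print(col_classes)

# Names of the columns that are factors
factor_cols <- names(mydata)[sapply(mydata, is.factor)]
print(factor_cols)
```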

str_guru · October 23, 2020, 12:13pm

structure(list(check_16 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"), check_7 = c(0, 
0, 0, 0, 1, 0, 0, 0, 0, 0), check_8 = c(1, 1, 0, 0, 0, 0, 0, 
0, 0, 0), check_9_1 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0), check_9_2 = c(0, 
0, 0, 1, 0, 1, 1, 1, 1, 1), check_12 = c(0, 0, 1, 0, 0, 0, 0, 
0, 0, 0), check_10 = c(1, 0, 0, 0, 1, 1, 1, 1, 1, 0), check_11 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), check_6_OR_14 = c(1L, 0L, 1L, 1L, 
1L, NA, 1L, 1L, 1L, 1L), Check_1 = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), Check_3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), Check_54 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    check_2 = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0), check_56 = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), check_51 = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")

nirgrahamuk · October 23, 2020, 12:17pm

Assuming the data is integer in principle and so can be summed, as your most recent example suggests, the following adjustment, which first converts all variables to integer, plugs the gap:

mutate_all(example_df, as.integer) %>%
  summarise(across(.fns = sum)) %>%
  pivot_longer(cols = everything()) %>%
  filter(value > 0)

# A tibble: 8 x 2
  name      value
  <chr>     <int>
1 check_16     12
2 check_7       1
3 check_8       2
4 check_9_1     1
5 check_9_2     6
6 check_12      1
7 check_10      6
8 check_2       1
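One caveat about the result above: `as.integer()` applied to a factor returns the internal level codes (1 and 2 here) rather than the labels "0" and "1", which is why check_16 shows 12 although only two of its rows are "1". check_6_OR_14 is also missing, because its single NA makes the sum NA and `filter()` drops that row. A hedged sketch of one way to handle both, assuming the factor labels are always numeric strings; `example_df` below is a reduced, hand-typed subset of the posted data, not the full frame.

```r
library(dplyr)
library(tidyr)

# Reduced subset of the posted data: a factor column, a numeric column,
# an integer column with an NA, and an all-zero column
example_df <- data.frame(
  check_16      = factor(c("0", "0", "0", "1", "0", "0", "0", "0", "0", "1")),
  check_7       = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  check_6_OR_14 = c(1L, 0L, 1L, 1L, 1L, NA, 1L, 1L, 1L, 1L),
  check_11      = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

result <- example_df %>%
  # Convert factors through character so the labels "0"/"1" become 0/1
  mutate(across(where(is.factor), ~ as.integer(as.character(.x)))) %>%
  # Ignore missing values when summing
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = everything()) %>%
  filter(value > 0)

print(result)
```

Going through `as.character()` first is what makes the factor labels, not the level codes, feed into the sum.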
