Nile
November 18, 2022, 6:34pm
Hi,
I have a dataset of over 100 columns with varying numbers of categories, e.g. some are dichotomous, some have 4 categories, and some have 7. My objective is to identify columns with the same number of categories so that I can process them further.
Here is some sample data:
df= data.frame(col1= sample(1:4, 10, replace = T), col2= sample(11:14, 10, replace = T), col3= sample(1:38, 10, replace = T), col4= sample(11:49, 10, replace = T), col5= sample(1:64, 10, replace = T) , col6= sample(11:75, 10, replace = T))
df
col1 col2 col3 col4 col5 col6
1 2 13 22 41 45 72
2 1 13 6 44 59 68
3 3 12 19 22 34 43
4 3 13 19 40 16 18
5 1 12 13 40 56 28
6 1 11 36 20 37 15
7 2 13 31 42 25 61
8 4 11 31 45 33 35
9 3 14 8 40 3 50
10 2 14 22 45 44 22
Please let me know if there is any way to achieve that.
Thank you.
Below is one way to count the number of unique values (categories) within each column of df.
library(tidyverse)
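df <- data.frame(
  col1 = sample(1:4, 10, replace = TRUE), col2 = sample(11:14, 10, replace = TRUE),
  col3 = sample(1:38, 10, replace = TRUE), col4 = sample(11:49, 10, replace = TRUE),
  col5 = sample(1:64, 10, replace = TRUE), col6 = sample(11:75, 10, replace = TRUE)
)
# For each column, build a one-column data frame holding its count of unique values,
# then bind the results side by side into a single one-row data frame
categories <- lapply(seq_along(df),
  function(i) {
    d <- data.frame(nrow(unique(df[i])))
    names(d) <- names(df[i])
    d
  }
) %>%
  bind_cols()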
categories
#> col1 col2 col3 col4 col5 col6
#> 1 4 4 9 10 10 9
Created on 2022-11-18 with reprex v2.0.2.9000
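The same counts can also be obtained in base R without loading the tidyverse. This is a minimal sketch, assuming the goal is just the number of distinct values per column; the small fixed data frame and the name `n_categories` are illustrative and not from the original post.

```r
# Small fixed example so the result is reproducible (illustrative data)
df <- data.frame(
  col1 = c(1, 2, 2, 1),
  col2 = c(11, 12, 13, 14),
  col3 = c(5, 5, 5, 6)
)

# Count the distinct values in each column; returns a named integer vector
n_categories <- vapply(df, function(x) length(unique(x)), integer(1))
n_categories
#> col1 col2 col3 
#>    2    4    2
```

Note that `unique()` treats `NA` as a value of its own, so a column containing missing values will report one extra category.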
There are usually many ways to do something in R. One alternative uses across():
library(tidyverse)
df <- data.frame(
  col1 = sample(1:4, 10, replace = TRUE), col2 = sample(11:14, 10, replace = TRUE),
  col3 = sample(1:38, 10, replace = TRUE), col4 = sample(11:49, 10, replace = TRUE),
  col5 = sample(1:64, 10, replace = TRUE), col6 = sample(11:75, 10, replace = TRUE)
)
df
#> col1 col2 col3 col4 col5 col6
#> 1 4 13 1 16 61 50
#> 2 1 13 18 24 8 55
#> 3 3 11 14 43 62 60
#> 4 4 13 18 20 5 20
#> 5 2 13 18 48 27 74
#> 6 3 13 25 15 13 20
#> 7 1 12 14 38 46 41
#> 8 1 14 25 14 20 29
#> 9 4 12 2 43 3 71
#> 10 1 12 8 38 7 23
categories <- df |> summarise(across(col1:col6, n_distinct))
categories
#> col1 col2 col3 col4 col5 col6
#> 1 4 4 6 8 10 9
Created on 2022-11-18 with reprex v2.0.2
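Once the per-column counts are known, the columns that share a count still need to be grouped, which is what the original question asks for. One possible sketch in base R, assuming a named vector of counts (something like `unlist(categories)` applied to a one-row result should give the same shape); the fixed data frame and the names `n_categories`, `cols_by_count`, and `two_level` are illustrative only.

```r
# Small fixed example so the result is reproducible (illustrative data)
df <- data.frame(
  col1 = c(1, 2, 2, 1),
  col2 = c(11, 12, 13, 14),
  col3 = c(5, 5, 5, 6),
  col4 = c(7, 8, 9, 10)
)

# Named integer vector: number of distinct values per column
n_categories <- vapply(df, function(x) length(unique(x)), integer(1))

# Group the column names by their category count;
# the list names are the counts as character strings
cols_by_count <- split(names(n_categories), n_categories)
cols_by_count
#> $`2`
#> [1] "col1" "col3"
#> 
#> $`4`
#> [1] "col2" "col4"

# Pull out, for example, all columns with exactly 2 categories
two_level <- df[, cols_by_count[["2"]], drop = FALSE]
```

Each element of `cols_by_count` can then be passed to whatever further processing applies to columns with that number of categories.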