Running Multiple Chi-Square Tests with the Tidyverse and Extracting P-Values

Dear all,

I am new to this community (though not to R and RStudio), so hopefully I won't break any thread rules 🙂

I've got a data frame with features and a categorical target. Before modelling, I want to run a chi-square test of each categorical feature against the target, and some other test (Kruskal-Wallis?) for each numerical feature. In short, a simple form of feature selection.

The desired output is two tables: one with the test statistics and p-values from the chi-square tests, and a similar one for the alternative test on the numerical variables.
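For the second table, the same map-then-collect pattern works with `kruskal.test()`. This is a minimal sketch, not the thread's accepted answer: it assumes the built-in `iris` data as a stand-in for the real data frame (four numeric features, categorical target `Species`), and the names `num_data`, `kw_results` and `kw_table` are hypothetical.

```r
library(purrr)   # map(), map_lgl(), map_dbl()
library(tibble)  # tibble()

# Stand-in data: iris has four numeric features and a categorical target (Species)
num_data <- iris[map_lgl(iris, is.numeric)]
target   <- iris$Species

# One Kruskal-Wallis test per numeric column, comparing its distribution across target groups
kw_results <- map(num_data, function(x) kruskal.test(x, target))

# Collect the statistic and p-value into one tibble, one row per feature
kw_table <- tibble(
  feature   = names(kw_results),
  statistic = map_dbl(kw_results, "statistic"),
  p.value   = map_dbl(kw_results, "p.value")
)
kw_table
```

Passing a string such as `"p.value"` to `map_dbl()` extracts that named element from each `htest` object, so no anonymous function is needed.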

I could cobble something together in base R, but I'd like to do it cleanly with the tidyverse, since I'm getting more and more familiar with it and liking it more and more.

My attempt led me here:

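library(purrr)                             # map_lgl(), map()
cat.vars = myData |> map_lgl(is.factor)    # TRUE for each factor column
Ctg_Data = myData[cat.vars]                # keep only the categorical columns
Ctg_Data[, -20] |> map(function(x) chisq.test(Ctg_Data[[20]], x))  # column 20 is the target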

This worked and gave me a long list of chi-square test outputs.

I am not sure how to extract the test statistic and p-value from this list without using for loops. I'd like to do it the tidyverse way; there must be a way with purrr...
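One purrr-only way: each `chisq.test()` result is an `htest` object, which is a list, so `map_dbl()` with an element name pulls out one number per test. This is a minimal sketch assuming hypothetical toy data (`colour`, `size`, `target`) in which the target is column 3 rather than column 20.

```r
library(purrr)   # map(), map_dbl()
library(tibble)  # tibble()

set.seed(1)
# Hypothetical toy data: two categorical features and a categorical target (column 3)
n <- 200
Ctg_Data <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), n, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L"), n, replace = TRUE)),
  target = factor(sample(c("yes", "no"), n, replace = TRUE))
)

# A named list of htest objects, one per feature
chi_results <- Ctg_Data[, -3] |> map(function(x) chisq.test(Ctg_Data[[3]], x))

# Pull single elements out of each htest object by name
chi_table <- tibble(
  feature   = names(chi_results),
  statistic = map_dbl(chi_results, "statistic"),
  p.value   = map_dbl(chi_results, "p.value")
)
chi_table
```

The list names come from the column names, so the `feature` column lines up with the original features without any extra bookkeeping.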

Thank you in advance!

The broom package has functions that return the results of various model fits and statistical tests as tibbles. For example, with glance() you can run:

library(broom)  # glance() turns an htest object into a one-row tibble
Ctg_Data[, -20] |> map(function(x) chisq.test(Ctg_Data[[20]], x)) |>
    map(function(RES) glance(RES)[1, c("statistic", "p.value")])
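To get a single table instead of a list of one-row tibbles, the `glance()` results can be stacked with `list_rbind()`, which needs purrr 1.0.0 or later (older code used `map_dfr(.id = "feature")` for the same job). A minimal sketch, assuming hypothetical toy data whose target is column 3:

```r
library(purrr)  # map(), list_rbind() (purrr >= 1.0.0)
library(broom)  # glance() method for htest objects

set.seed(42)
# Hypothetical toy data standing in for Ctg_Data; the target is column 3 here
n <- 200
Ctg_Data <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), n, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L"), n, replace = TRUE)),
  target = factor(sample(c("yes", "no"), n, replace = TRUE))
)

chi_table <- Ctg_Data[, -3] |>
  map(function(x) chisq.test(Ctg_Data[[3]], x)) |>
  map(glance) |>                      # one-row tibble per test
  list_rbind(names_to = "feature")    # stack the rows, keep list names as a column

chi_table[, c("feature", "statistic", "p.value")]
```

Keeping the list names as a `feature` column makes the table easy to sort by `p.value` when screening features.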