Contingency table for "character" data


First, I would like to make a summary contingency table for this data:

mydata <- structure(list(Sex = c("F", "M", "F", "F", "F", "F", "M", "M", 
"F", "F", "F", "M", "M", "F", "M", "F", "M", "M", "F", "M", "F", 
"F", "F", "F"), Q1 = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A"), Q2 = c("B", "A", "A", "C", "B", "B", "B", "C", 
"B", "C", "C", "C", "A", "B", "A", "B", "A", "A", "C", "C", "B", 
"C", "B", "B")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -24L))

but I received an error:

invalid 'type' (character) of argument

Then I would like to run chisq.test() on this data, but how do I create a matrix from character data? This is a survey with questions and answers A, B, C, etc. The Sex variable consists of Female and Male values.
I have done this in SPSS but would like to do it in R as well. Additionally, I want to do "post hoc" comparisons between groups to get p-values. Any help will be greatly appreciated.
For "post hoc" tests I found this package:

As the vignette says: "x is a matrix passed on to the chisq.test function".
So either way, I will have to create a matrix from this data.frame with character variables.

Hi again,

It is always better to split your questions out into separate sections. For your first question, you will either have to convert the columns to factors or, as in this case, get the frequencies of those observations, which is definitely what you want here; there are a variety of ways to coerce the data into the right shape.
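A minimal base-R sketch of the frequency route, assuming the same 24 rows as `mydata` above (rebuilt here as a plain `data.frame` so no packages are needed): `table()` counts the character values directly, and the result is already a numeric contingency table that `chisq.test()` accepts. Q1 is left out because every reply is "A", so it would only add a single column.

```r
# Same 24 rows as mydata, rebuilt as a plain data.frame (no tibble needed)
mydata <- data.frame(
  Sex = c("F","M","F","F","F","F","M","M","F","F","F","M",
          "M","F","M","F","M","M","F","M","F","F","F","F"),
  Q1  = rep("A", 24),
  Q2  = c("B","A","A","C","B","B","B","C","B","C","C","C",
          "A","B","A","B","A","A","C","C","B","C","B","B")
)

# table() counts the character values: rows = Sex, columns = Q2 answers
tab <- table(Sex = mydata$Sex, Q2 = mydata$Q2)
tab

# The table holds integer counts, so chisq.test() can use it directly.
# Expect a "Chi-squared approximation may be incorrect" warning,
# because several expected counts are below 5.
chisq.test(tab)
```

Because Q1 is constant, this Sex by Q2 table gives the same X-squared = 8.6044 with df = 2 that appears later in the thread for the Sex by Q1/Q2 table.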

This is how I would run the chi-square test. The example goes all the way from input data to output, so it should make clear how to run it.

M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    response = c("A","B", "C"))

#>       response
#> gender   A   B   C
#>      F 762 327 468
#>      M 484 239 477

(Xsq <- chisq.test(M))  # wrapping the assignment in () prints the test summary

#>  Pearson's Chi-squared test
#> data:  M
#> X-squared = 30.07, df = 2, p-value = 2.954e-07

Xsq$observed   # observed counts (same as M)
#>       response
#> gender   A   B   C
#>      F 762 327 468
#>      M 484 239 477
Xsq$expected   # expected counts under the null
#>       response
#> gender        A        B        C
#>      F 703.6714 319.6453 533.6834
#>      M 542.3286 246.3547 411.3166
Xsq$residuals  # Pearson residuals
#>       response
#> gender          A          B          C
#>      F  2.1988558  0.4113702 -2.8432397
#>      M -2.5046695 -0.4685829  3.2386734
Xsq$stdres     # standardized residuals
#>       response
#> gender          A          B          C
#>      F  4.5020535  0.6994517 -5.3159455
#>      M -4.5020535 -0.6994517  5.3159455

Created on 2021-11-07 by the reprex package (v2.0.0)
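For reference, these outputs follow the usual definitions, where $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total. The standardized residuals $d_{ij}$ (`stdres`) are the ones cell-wise post hoc comparisons are typically based on; their denominator matches how `chisq.test()` computes `stdres`.

$$
E_{ij} = \frac{R_i C_j}{N}, \qquad
X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}, \qquad
d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - R_i/N)\,(1 - C_j/N)}}
$$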

Thank you for the reply.
Where do these values come from:

c(762, 327, 468), c(484, 239, 477)

They are not from my data (the mydata object), are they?

Could you please show me an example of how to do it?

I just used a dummy table of frequencies to illustrate how to run it.

What exactly do P1 A and P2 A represent? A contingency table wouldn't include a total.

I have updated the picture.
This is my data frame:

It contains 3 variables: Sex (F and M), plus Q1 and Q2, which hold the replies (A, B and C) to a questionnaire.

I would like to count the frequencies here, if possible, and then perform chisq.test() on this data.
So I suppose my contingency table would look like this:

Correct me if I am wrong, please.

Even if I recode all three variables to numeric and then convert the Sex variable to a factor with levels "1" and "2", it still fails with:

Error in sum(x) : invalid 'type' (character) of argument

Maybe someone could help me prepare a matrix for chisq.test(). Thank you.
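The error comes from how `chisq.test()` treats a data frame: it converts it with `as.matrix()`, and a data frame with any character or factor column becomes a character matrix, so the internal `sum(x)` fails. That is why recoding the answers to numbers while `Sex` is still a factor gives the same message (and an all-numeric data frame would be wrong anyway, since its cells would be read as counts). A minimal base-R sketch, assuming `mydata` as above: pass two columns as separate vectors and `chisq.test()` cross-tabulates them itself. Note that it requires at least two levels in each vector, so Sex against Q1 (all "A") cannot be tested this way.

```r
# Same 24 rows as mydata, as a plain data.frame
mydata <- data.frame(
  Sex = c("F","M","F","F","F","F","M","M","F","F","F","M",
          "M","F","M","F","M","M","F","M","F","F","F","F"),
  Q1  = rep("A", 24),
  Q2  = c("B","A","A","C","B","B","B","C","B","C","C","C",
          "A","B","A","B","A","A","C","C","B","C","B","B")
)

# Passing the whole data frame reproduces the error: as.matrix() yields a
# character matrix, and chisq.test() then fails when it calls sum(x)
msg <- tryCatch(chisq.test(mydata), error = function(e) conditionMessage(e))
msg

# Passing two vectors instead: chisq.test() turns each into a factor and
# builds the Sex x Q2 contingency table internally
# (suppressWarnings() hides the small-expected-count warning)
res <- suppressWarnings(chisq.test(mydata$Sex, mydata$Q2))
res$observed
res$p.value
```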

mydata <- structure(list(Sex = c("F", "M", "F", "F", "F", "F", "M", "M", 
                                 "F", "F", "F", "M", "M", "F", "M", "F", "M", "M", "F", "M", "F", 
                                 "F", "F", "F"), Q1 = c("A", "A", "A", "A", "A", "A", "A", "A", 
                                                        "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
                                                        "A", "A", "A"), Q2 = c("B", "A", "A", "C", "B", "B", "B", "C", 
                                                                               "B", "C", "C", "C", "A", "B", "A", "B", "A", "A", "C", "C", "B", 
                                                                               "C", "B", "B")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -24L))

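library(dplyr)
library(tidyr)

# count every Sex/Q1/Q2 combination, then spread the Q1_Q2 combinations into columns
(step1 <- group_by_all(mydata) %>% 
  count() %>% 
  ungroup() %>%
  pivot_wider(names_from = c(Q1, Q2),
              values_from = "n",
              values_fill = 0))  # 0 instead of NA for combinations that never occur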

(step2 <- select(step1, 
                 where(is.numeric)) %>% as.matrix())

row.names(step2) <- pull(step1,Sex)

Xsq <- chisq.test(step2)

Thank you very much indeed Nir,

Is this "step1" object pulling replies A, B and C from both the Q1 and Q2 variables?

Yes, it is

Thank you. I would like to perform post hoc tests because in mydata I have three groups: Sex, Q1 and Q2.
When I do it, I get these results:


Pearson's Chi-squared test

data:  step2
X-squared = 8.6044, df = 2, p-value = 0.01354


Dimension     Value               A_A       A_B A_C
1         F Residuals -2.67775472558081  2.351899   0
2         F  p values           0.0445*  0.112100   1
3         M Residuals  2.67775472558081 -2.351899   0
4         M  p values           0.0445*  0.112100   1

This is a bit unclear to me, as I would like to have p-values for each pairwise comparison, I mean: Sex x Q1 x Q2.
Am I missing something here?

Here is a video showing how to do it in SPSS and Excel, unfortunately manually, but I hoped it would be possible in R.
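A base-R sketch of a cell-wise post hoc test: convert each adjusted standardized residual from `chisq.test()$stdres` into a two-sided normal p-value, then apply a Bonferroni correction over all cells with `p.adjust()`. It is an assumption that this is exactly what the package behind the table above does, but on this data it reproduces those p-values (0.0445 for the A column, 0.1121 for B, 1 for C). Each p-value tests one Sex by answer cell against independence; another correction can be swapped in through the `method` argument of `p.adjust()`.

```r
# Same 24 rows as mydata, as a plain data.frame
mydata <- data.frame(
  Sex = c("F","M","F","F","F","F","M","M","F","F","F","M",
          "M","F","M","F","M","M","F","M","F","F","F","F"),
  Q1  = rep("A", 24),
  Q2  = c("B","A","A","C","B","B","B","C","B","C","C","C",
          "A","B","A","B","A","A","C","C","B","C","B","B")
)

tab <- table(Sex = mydata$Sex, Q2 = mydata$Q2)
xsq <- suppressWarnings(chisq.test(tab))  # small expected counts trigger a warning

# adjusted standardized residual for each cell
z <- xsq$stdres

# two-sided p-value for each cell, using a standard normal reference
p_raw <- 2 * pnorm(-abs(z))

# Bonferroni adjustment across all cells; p_adj[] <- keeps the matrix shape
p_adj <- p_raw
p_adj[] <- p.adjust(p_raw, method = "bonferroni")

round(z, 3)
round(p_adj, 4)
```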

I think this arises from how you provided your example: Q1 and Q2 each contain values like 'A',
so if you need to distinguish the A's in Q1 from the A's in Q2, you should rename them, perhaps like so:

mydata %>% mutate(Q1 = paste0("Q1_", Q1),
                  Q2 = paste0("Q2_", Q2))
and then proceed to create step1.

This is a very good idea, but would it be possible, when creating "step1", to make a different contingency table that does not pool Q1 and Q2 together but keeps Q1 and Q2 separate in the resulting pivot_wider() data frame?

I don't understand your request; it might help to have an example from you.
However, on reflection, the Q1 and Q2 mutation that I suggested second would seem to give the sort of contingency table shown in your second screenshot.
