Adding rows in a variable to equal the number of observations with another variable

Hi,

I am trying desperately to adjust my data so that I can run a Kruskal-Wallis test. For that, I need two variables that have the same length (number of observations), which is not the case: one variable has 60 observations and the other only 40 (female and male participants in a survey). Is it somehow possible to equalise that? To use the data, I need one data frame with both variables at the same length. Any suggestions?

Thanks in advance!

The groups don't need to be the same size.

You can put your data in a list and feed that to kruskal.test. For example:

# Fake data
set.seed(2)
x = list(male = rgamma(40, 2),
         female = rgamma(60, 1.5))

kruskal.test(x)

Or, you can provide kruskal.test a data vector and a group vector:

x.data = c(x$male, x$female)
x.group = as.factor(rep(c("male", "female"), sapply(x, length)))

kruskal.test(x.data, x.group)

If you're relatively new to R, the help can sometimes be cryptic, but it's a good idea to get in the habit of checking the help to see if there's a way to cover your use case. In this case, the Details section of the help (?kruskal.test) describes the above options.
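
If your data is already in a data frame with one row per participant, the formula interface (also described in ?kruskal.test) may be the most convenient option. Here is a minimal sketch; the column names `score` and `sex` are made up for illustration, and the data is fake:

```r
# Fake data in "long" format: one row per participant.
# The column names `score` and `sex` are hypothetical.
set.seed(2)
df <- data.frame(
  score = c(rgamma(40, 2), rgamma(60, 1.5)),
  sex   = factor(rep(c("male", "female"), times = c(40, 60)))
)

# Group sizes differ (40 vs 60), which is fine for the test
table(df$sex)

# Formula interface: response ~ grouping variable
res_formula <- kruskal.test(score ~ sex, data = df)
res_formula

# Same test via the data-vector / group-vector interface
res_vectors <- kruskal.test(df$score, df$sex)
all.equal(res_formula$p.value, res_vectors$p.value)
```

In this layout the two groups never need equal lengths, because each observation is its own row and the group is just a label on that row.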

Ok thanks! But what is the 2 or 1.5?

I used the rgamma function to generate some fake data for illustration. 2 and 1.5 are just the shape parameters I chose for generating the fake data. The fake data are random draws from a gamma distribution.
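
To make that concrete, here is a small sketch of what one of those rgamma calls does on its own. The name `fake_male` is just for illustration:

```r
set.seed(2)

# 40 random draws from a gamma distribution with shape = 2
# (the rate argument is left at its default of 1)
fake_male <- rgamma(40, shape = 2)

length(fake_male)  # number of fake observations generated
mean(fake_male)    # sample mean; the theoretical mean is shape / rate = 2
head(fake_male)    # first few fake values
```

The shape only controls what the fake numbers look like. It is not an argument to kruskal.test and plays no role once you use real data.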

Ok, but R demands a shape. If I use that for my data, can I just use 1 to satisfy R, or what is the influence on the actual result? (Sorry, this might be a stupid question, but I am not really good with statistics in general.)

Your data is your data. You don't need to use rgamma to generate fake data. I just needed some fake data to illustrate how to use kruskal.test. In your case, just use your actual data.
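
As a rough sketch of what that could look like, assuming your observations are available as two numeric vectors (the names and values below are invented stand-ins, not your data):

```r
# Made-up survey scores standing in for real data;
# replace these with your own numeric vectors.
male_scores   <- c(3, 5, 4, 2, 5, 4)
female_scores <- c(4, 4, 5, 3, 5, 5, 2, 4, 3)

# The vectors have different lengths, so they go in a list
# rather than in a data frame with one column per group
my_data <- list(male = male_scores, female = female_scores)
lengths(my_data)

res <- kruskal.test(my_data)
res
```

No rgamma and no shape parameter appear anywhere; the list simply holds one numeric vector per group.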

Ok! I think I understand it now. I tried it without and it worked! Thank you so much for your quick reply and help! Really appreciated.

Do you know the solution for an Unpaired Two-Samples Wilcoxon Test?

It's similar. The test is unpaired by default (see ?wilcox.test). You just need to provide the two data vectors. With the list of fake data I created above:

wilcox.test(x$male, x$female)
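
For completeness, here is a sketch that spells out the default and also shows the formula interface for data stored in a data frame. The data frame and its column names `score` and `sex` are hypothetical:

```r
# Fake data, two groups of different size
set.seed(2)
x <- list(male = rgamma(40, 2), female = rgamma(60, 1.5))

# Default call: paired = FALSE, i.e. the unpaired two-sample test
res_default <- wilcox.test(x$male, x$female)

# Spelling out paired = FALSE gives the same test
res_unpaired <- wilcox.test(x$male, x$female, paired = FALSE)
identical(res_default$p.value, res_unpaired$p.value)

# Formula interface with long-format data (one row per participant)
df <- data.frame(
  score = c(x$male, x$female),
  sex   = factor(rep(c("male", "female"), times = lengths(x)))
)
res_formula <- wilcox.test(score ~ sex, data = df)
res_formula
```

With the formula interface the groups are ordered by factor level, so the reported W statistic can differ from the two-vector call even though it is the same comparison.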

Thanks! I still can't get it to work somehow. When I use the list function, I get an error saying that I need numeric vectors. I really don't understand the second solution you provided. I can't get it to run in R: it always gives me an error, and it just does not work because, again, the variables are not the same length, no matter what I try.

To be able to help you further, I think we need a reproducible example. That is, a data sample plus the code you've tried and error(s) you're getting. For more on providing a reproducible example, see here.
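
As a rough sketch of how to put the data sample together, assuming your data lives in a data frame (the `df` below is an invented stand-in with made-up column names):

```r
# Hypothetical stand-in for the real survey data
df <- data.frame(
  score = c(3, 5, 4, 2, 4, 5),
  sex   = c("male", "male", "female", "female", "female", "male")
)

# Show the structure: column types can help explain
# errors about non-numeric input
str(df)

# Print a copy-pasteable version of a small sample of the data
dput(head(df, 5))
```

Pasting the output of str() and dput() together with the exact code and error message lets others reproduce the problem on their own machine.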
