R-Beginner: How to separate into groups

I'm a beginner at R and I have a very basic question.

How do I create two new groups in my data?
I calculated mean scores on two continuous variables (A and B) for each study participant, and now I want to divide the participants into two groups based on these two mean scores.
The first group should contain the participants whose mean on variable A is higher than their mean on variable B.
The second group should contain the participants whose mean on variable B is higher than their mean on variable A.

I guess I should use the subset() function, but I'm not sure how.

Can anybody please help an R beginner?

As an R beginner, you need to learn how to ask for help properly; this FAQ is going to help you with that.

That said, here is one example of how you could solve your problem, applied to the iris dataset:

library(dplyr)
iris %>% 
    group_by(Species) %>% 
    summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width)) %>% 
    mutate(group = ifelse(Sepal.Length > Sepal.Width, "A", "B"))
#> # A tibble: 3 x 4
#>   Species    Sepal.Length Sepal.Width group
#>   <fct>             <dbl>       <dbl> <chr>
#> 1 setosa             5.01        3.43 A    
#> 2 versicolor         5.94        2.77 A    
#> 3 virginica          6.59        2.97 A

Created on 2019-01-10 by the reprex package (v0.2.1)
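
The iris example compares means per species, and all three species happen to land in group "A". A minimal sketch of the same idea applied per participant might look like the following. The data frame and its column names (id, mean_A, mean_B) are made up for illustration, and the explicit "tie" label is an assumption about how equal means should be handled, since the question does not say.

```r
library(dplyr)

# Hypothetical data: one row per participant, mean scores already computed
scores <- data.frame(
  id     = 1:4,
  mean_A = c(3.2, 2.1, 4.0, 2.5),
  mean_B = c(2.8, 3.5, 4.0, 1.9)
)

# Label each participant by which mean is higher; equal means get their own label
grouped <- scores %>%
  mutate(group = case_when(
    mean_A > mean_B ~ "A higher",
    mean_B > mean_A ~ "B higher",
    TRUE            ~ "tie"
  ))

grouped
```

Keeping the group as a column in one data frame, rather than splitting into separate objects, usually makes later steps such as group_by(group) or plotting by group easier.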

It's a bit tricky to give you exact code without seeing your data or what you'd like to do with it, but I'll take a crack at it.

Assuming your data is currently in a data.frame 'df' and has at least the following three columns:

  • Participant ID ('PID' below)
  • Score A mean ('A.bar' below)
  • Score B mean ('B.bar' below)

I would start by creating a new column that indicates whether the mean for score A is greater than the mean for score B, using the following code:

df$A.higher <- df$A.bar > df$B.bar
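
To see what that line produces, here is a small sketch on made-up data (the values below are invented; only the column names follow the assumptions above). One thing worth noticing is that a participant with equal means gets FALSE, so a strict ">" comparison would place ties together with the "B higher" participants unless they are handled separately.

```r
# Hypothetical data with the three assumed columns
df <- data.frame(
  PID   = c("p1", "p2", "p3", "p4"),
  A.bar = c(3.2, 2.1, 4.0, 2.5),
  B.bar = c(2.8, 3.5, 4.0, 1.9)
)

# Logical column: TRUE where the A mean is strictly greater than the B mean
df$A.higher <- df$A.bar > df$B.bar

# How many participants fall on each side of the comparison
table(df$A.higher)

# Participants with exactly equal means (FALSE above, but not "B higher" either)
n_ties <- sum(df$A.bar == df$B.bar)
n_ties
```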

Now you can use that column (which contains logical data, i.e., TRUE/FALSE values) to define your two groups and to create new objects from it.

With the subset function:

Greater_A <- subset(df, select = c('PID', 'A.bar'), subset = A.higher)
Greater_B <- subset(df, select = c('PID', 'B.bar'), subset = !A.higher)

(Note that '!' is the logical NOT operator, e.g. !TRUE is FALSE.)

The subset argument in the subset function (yeah, that can be confusing) will select the rows for which the expression is true.

With the [ ] selector:

Greater_A <- df[df$A.higher,]
Greater_B <- df[!df$A.higher,]

The [ ] version keeps all of the columns, whereas the subset function keeps only the columns named in its select argument.
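
The two approaches also differ when a mean is missing. The sketch below uses invented data in which one participant has NA for score A: subset() silently drops rows where the condition is NA, while [ ] with a logical index returns an all-NA row for them. Wrapping the condition in which() is one common way to make [ ] drop those rows as well.

```r
# Hypothetical data; the last participant has a missing mean for A
df <- data.frame(
  PID   = c("p1", "p2", "p3", "p4"),
  A.bar = c(3.2, 2.1, 2.9, NA),
  B.bar = c(2.8, 3.5, 3.1, 1.9)
)
df$A.higher <- df$A.bar > df$B.bar   # TRUE, FALSE, FALSE, NA

# subset(): rows where the condition is NA are dropped; only selected columns kept
via_subset <- subset(df, subset = A.higher, select = c(PID, A.bar))

# [ ]: an NA in the logical index produces a row filled with NA; all columns kept
via_bracket <- df[df$A.higher, ]

# which() turns the logical vector into the positions of the TRUE values only
via_which <- df[which(df$A.higher), ]

dim(via_subset)
dim(via_bracket)
dim(via_which)
```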

Hope that helps, let me know if you have any questions.

If you are a beginner, may I suggest the free online book R for Data Science?

You'll see that your question is addressed in Chapter 5, section 7, but there is a lot that comes before AND after that!
