R-Beginner: How to separate into groups

Lolo · January 10, 2019, 1:59pm

I'm a beginner at R and I have a very basic question.

How do I create two new groups in my data?
I calculated the mean scores for two continuous variables (A and B) for each study participant, and now I want to divide the participants into two groups based on these two mean scores.
The first group should be the participants whose mean for variable A is higher than their mean for variable B.
The second group should consist of the participants whose mean for variable B is higher than their mean for variable A.

I guess I should use the subset() function, but I'm not sure how.

Can anybody please help an R beginner?

andresrcs · January 10, 2019, 2:41pm

As an R beginner, you need to learn how to properly ask for help. This FAQ is going to help you with that.

FAQ: What's a reproducible example (`reprex`) and how do I create one?

Why reprex? Getting unstuck is hard. Your first step here is usually to create a reprex, or reproducible example. The goal of a reprex is to package your code, and information about your problem so that others can run it and feel your pain. Then, hopefully, folks can more easily provide a solution.

What's in a Reproducible Example? Parts of a reproducible example:

background information - Describe what you are trying to do. What have you already done?
complete set up - include any library() calls and data to reproduce your issue.
data for a reprex: Here's a discussion on setting up data for a reprex
make it run - include the minimal code required to reproduce your error on the data…

That said, here is one example of how you could solve your problem, applied to the iris dataset:

library(dplyr)
iris %>% 
    group_by(Species) %>% 
    summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width)) %>% 
    mutate(group = ifelse(Sepal.Length > Sepal.Width, "A", "B"))
#> # A tibble: 3 x 4
#>   Species    Sepal.Length Sepal.Width group
#>   <fct>             <dbl>       <dbl> <chr>
#> 1 setosa             5.01        3.43 A    
#> 2 versicolor         5.94        2.77 A    
#> 3 virginica          6.59        2.97 A

Created on 2019-01-10 by the reprex package (v0.2.1)
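The iris example above compares means per species. For data with one row per participant, the same idea might look like the sketch below. The data frame `scores` and its columns `PID`, `A.bar` and `B.bar` are made up for illustration, since the original data was not posted. Note that a participant with equal means would land in the "B higher" group here, which may or may not be what you want.

```r
library(dplyr)

# Hypothetical data: one row per participant with precomputed mean scores
scores <- data.frame(
  PID   = 1:4,
  A.bar = c(3.2, 4.1, 2.5, 3.9),
  B.bar = c(2.8, 4.6, 2.9, 3.1)
)

# Label each participant by which mean is higher
grouped <- scores %>%
  mutate(group = if_else(A.bar > B.bar, "A higher", "B higher"))

# Pull each group out as its own data frame
group_a <- grouped %>% filter(group == "A higher")
group_b <- grouped %>% filter(group == "B higher")
```

Keeping the label as a column in one data frame is often more convenient than two separate objects, because later steps such as group_by(group) can work on it directly.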

Will · January 10, 2019, 2:53pm

It's a bit tricky to give you exact code without seeing your data or what you'd like to do with it, but I'll take a crack at it.

Assuming your data is currently in a data.frame 'df' and has at least the following three columns:

Participant ID ('PID' below)
Score A mean ('A.bar' below)
Score B mean ('B.bar' below)

I would start by creating a new column indicating whether the mean for score A is greater than the mean for score B, using the following code:

df$A.higher <- df$A.bar > df$B.bar

Now you can use that column (containing logical data, i.e., TRUE/FALSE values) to define your two groups, and you can create new objects using the column.
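To make that step concrete, here is a small sketch on invented data. The four participants and their values are assumptions for illustration only; the column names follow the ones described above.

```r
# Hypothetical example data, one row per participant
df <- data.frame(
  PID   = c("p1", "p2", "p3", "p4"),
  A.bar = c(3.2, 4.1, 2.5, 3.9),
  B.bar = c(2.8, 4.6, 2.9, 3.1),
  stringsAsFactors = FALSE
)

# TRUE where the mean for A exceeds the mean for B, FALSE otherwise
df$A.higher <- df$A.bar > df$B.bar

# Count how many participants fall on each side
table(df$A.higher)
```

One thing to keep in mind: a participant whose two means are exactly equal gets FALSE, so they would end up with the "B higher" participants unless you handle ties separately.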

With the subset function:

Greater_A <- subset(df, select = c('PID', 'A.bar'), subset = A.higher)
Greater_B <- subset(df, select = c('PID', 'B.bar'), subset = !A.higher)

(Note that '!' is the logical NOT operator: !TRUE is FALSE and !FALSE is TRUE.)

The subset argument in the subset function (yeah, that can be confusing) will select the rows for which the expression is true.
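A self-contained sketch of the subset() approach, again on invented data (the participants and values are assumptions, not the poster's data):

```r
# Hypothetical example data, one row per participant
df <- data.frame(
  PID   = c("p1", "p2", "p3", "p4"),
  A.bar = c(3.2, 4.1, 2.5, 3.9),
  B.bar = c(2.8, 4.6, 2.9, 3.1),
  stringsAsFactors = FALSE
)
df$A.higher <- df$A.bar > df$B.bar

# Rows where A.higher is TRUE, keeping only the ID and the A mean
Greater_A <- subset(df, subset = A.higher, select = c("PID", "A.bar"))

# Rows where A.higher is FALSE, keeping only the ID and the B mean
Greater_B <- subset(df, subset = !A.higher, select = c("PID", "B.bar"))
```

With this data, Greater_A holds p1 and p4, and Greater_B holds p2 and p3.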

With the [ ] selector:

Greater_A <- df[df$A.higher,]
Greater_B <- df[!df$A.higher,]

The [ ] example keeps all of the columns, whereas the subset() call keeps only the columns named in its select argument.
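The two approaches also differ when a mean is missing. The sketch below uses invented data with one NA to show this; whether your data has missing means is an assumption on my part. With [ ], an NA in the logical index produces a row of NAs, while subset() silently drops such rows. Wrapping the condition in which() makes [ ] drop them too.

```r
# Hypothetical example data; p3 has no mean for A
df <- data.frame(
  PID   = c("p1", "p2", "p3", "p4"),
  A.bar = c(3.2, 4.1, NA, 3.9),
  B.bar = c(2.8, 4.6, 2.9, 3.1),
  stringsAsFactors = FALSE
)
a_higher <- df$A.bar > df$B.bar   # TRUE, FALSE, NA, TRUE

# Plain logical indexing keeps an all-NA row for the NA comparison
with_na_row <- df[a_higher, ]

# which() returns only the positions that are TRUE, so NAs are dropped
Greater_A <- df[which(a_higher), ]
Greater_B <- df[which(!a_higher), ]

# subset() drops rows with an NA condition on its own
Greater_A_subset <- subset(df, subset = a_higher)
```

Either way, p3 ends up in neither group, so it is worth checking for missing means before splitting.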

Hope that helps, let me know if you have any questions.

apreshill · January 11, 2019, 8:50pm

If you are a beginner, may I suggest the free online book R for Data Science?

You'll see that your question is addressed in Chapter 5, section 7, but there is a lot that comes before AND after that!
