Code to return new df without some rows if condition is true???

Hi, everyone! I'm trying to write some code that returns a new dataframe without certain rows if a condition is true. In the tibble below, I have the study name, the test used in the study, and a code that corresponds to the type of test used in the study. To reduce dependencies in the dataset, I want to write some code that checks, for each study in the data, whether the study has both a total test score ("2") and a score for only a subsection ("1") of the total test ("reading", "writing", etc.). So, I want to write code that says, "For each study, if the study has both a 2 and a 1, return a new df without the 2's". Any help with this? Thanks!


library(dplyr)  # provides tibble(), %>%, filter(), group_by(), inner_join()

df <- tibble(
  Study = c(rep("Todd (2002)", 5), rep("Liz (2004)", 5)),
  Test = c("TOEFL (total)", "TOEFL (reading)", "TOEFL (writing)", rep("cloze", 2),
           "IELTS (total)", "IELTS (listening)", "IELTS (speaking)", rep("c-test", 2)),
  Code = c(2, 1, 1, 0, 0, 2, 1, 1, 0, 0)
)

Not sure if I've understood perfectly, but does this help?

# Keep the Code == 1 rows from every study that also has a Code == 2 row
df %>%
 inner_join(df %>%
   filter(Code == 2) %>%
   distinct(Study),
   by = "Study") %>%
 filter(Code == 1)

The result for your data frame should be

# A tibble: 4 x 3
  Study       Test               Code
  <chr>       <chr>             <dbl>
1 Todd (2002) TOEFL (reading)       1
2 Todd (2002) TOEFL (writing)       1
3 Liz (2004)  IELTS (listening)     1
4 Liz (2004)  IELTS (speaking)      1

Hi, Kaushik.

Thanks for your reply! Once I figure this bit of code out, I can finally wrap up some edits to my dissertation. So, I appreciate your help!

Yes! This totally works! However, is there a way to filter rows only for studies that have both Code == 1 and Code == 2? I need the conditional execution because, with my larger dataset, sometimes a study ends up with just a 1 or just a 2 after running some other code. I just really want to avoid manually checking the final dataset!

Thanks a bunch for your time here.

Another option is to filter the data frame directly. For example:

# Within each study, drop the Code == 2 rows only if the study also has a Code == 1 row
df %>% 
  group_by(Study) %>% 
  filter(if (any(Code == 2) && any(Code == 1)) Code != 2 else TRUE)

  Study       Test               Code
1 Todd (2002) TOEFL (reading)       1
2 Todd (2002) TOEFL (writing)       1
3 Todd (2002) cloze                 0
4 Todd (2002) cloze                 0
5 Liz (2004)  IELTS (listening)     1
6 Liz (2004)  IELTS (speaking)      1
7 Liz (2004)  c-test                0
8 Liz (2004)  c-test                0

Your sample data frame only has studies that fulfill the condition, so the rows with Code==2 for those studies are deleted. For studies that don't have at least one row with Code==1 and at least one row with Code==2, all rows will be returned.
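To see the "all rows will be returned" case in action, here is a small sketch. The study "Ana (2010)" is made up for illustration (it is not in the original data) and has a total score but no subsection score, so its Code == 2 row should survive. I'm assuming a reasonably recent dplyr (1.0 or later).

```r
library(dplyr)

# Hypothetical data: "Ana (2010)" is invented for this example and has
# only a total score (Code 2), with no subsection score (Code 1).
df2 <- tibble(
  Study = c(rep("Todd (2002)", 3), rep("Ana (2010)", 2)),
  Test  = c("TOEFL (total)", "TOEFL (reading)", "cloze",
            "IELTS (total)", "c-test"),
  Code  = c(2, 1, 0, 2, 0)
)

result <- df2 %>%
  group_by(Study) %>%
  # Drop the 2's only in studies that have both a 2 and a 1
  filter(if (any(Code == 2) && any(Code == 1)) Code != 2 else TRUE) %>%
  ungroup()

print(result)
```

With this data, the total-score row for Todd (2002) is dropped because that study also has a Code == 1 row, while both rows for Ana (2010) are kept.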

Yaaaaay! Thank you both so much! I'm really so thankful that folks like you exist and contribute to the RStudio Community! You've both made my life so much easier. This bit of code was all I needed to really finish wrapping up my dissertation. Thank you!!!!!!!

Joels,

Can you also explain this bit of code for me? I'm not sure how to translate this into beginning-coder dummyspeak. What does the {TRUE} bit do?

Thanks, again.

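filter() keeps every row where the expression evaluates to TRUE. The if/else is evaluated once per study: when a study has both a 1 and a 2, filter() gets Code != 2, which is FALSE for the 2's, so those rows are dropped. When the condition is not satisfied, filter() gets a plain TRUE, which is applied to every row in that study, so all of its rows are returned.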
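If it helps, here is a rough sketch of what that single TRUE is doing. The toy data and the study names "A" and "B" are made up for illustration. The idea is that a lone TRUE gets recycled to every row in the group, so writing rep(TRUE, n()) in the else branch should behave the same way.

```r
library(dplyr)

# Hypothetical toy data: study "B" is invented and has only a total score (Code 2).
toy <- tibble(
  Study = c("A", "A", "A", "B", "B"),
  Code  = c(2, 1, 0, 2, 0)
)

# A lone TRUE is recycled across all rows, so filter() keeps everything.
all_rows <- toy %>% filter(TRUE)

# The same filter, with the else branch written out as one TRUE per row.
spelled_out <- toy %>%
  group_by(Study) %>%
  filter(if (any(Code == 2) && any(Code == 1)) Code != 2 else rep(TRUE, n())) %>%
  ungroup()

# The short form, with a single TRUE in the else branch.
short_form <- toy %>%
  group_by(Study) %>%
  filter(if (any(Code == 2) && any(Code == 1)) Code != 2 else TRUE) %>%
  ungroup()

# Both versions keep the same rows.
print(spelled_out)
print(short_form)
```

Study "A" loses its Code == 2 row in both versions, and study "B" keeps all of its rows because it never had a Code == 1 row to begin with.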
