Having issues writing R code to filter out data

I collected data online for my university dissertation experiment and have loaded it into R. I am having trouble filtering out participants who scored less than 80% in the first stage of the experiment and participants who scored less than 100% in the second stage. What code should I use to filter out this data?
Here is the code I have used so far:

library(tidyverse)
dat <- read_csv("Learningtask.csv")
tidydat <- dat %>% select(participant, response_age_a, response_age_b, response_gender, stage, trial_type, cue, correct_response, response, correct, rt, block_s1, count_s1, count_s2)
dat %>% head()
dat %>% glimpse()
dat_clean <-
  dat %>% select(participant, 
                 trial_type, 
                 stage,
                 cue, 
                 correct_response, 
                 response, 
                 correct, 
                 rt, 
                 block_s1, 
                 count_s1, 
                 count_s2)

Here I have dropped the columns I do not need, but I am unsure what code to put in the lines below:

 exclude<- c()
 dat_clean <- dat_clean %>%filter(!subject %in% exclude))

If anyone could help with this it would be much appreciated :).

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

It's hard to tell without a reprex, but one thing I see is that you probably want to use one of the "not in" variations (see SO thread linked below), as opposed to !subject if you're doing something with exclude (though that wouldn't be my recommended approach).

I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.
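A minimal sketch of that idea, assuming the scores have already been summarised into one row per participant with one column per stage. The table `scores` and the columns `stage1_score` and `stage2_score` are hypothetical, with toy values:

```r
library(dplyr)

# Hypothetical one-row-per-participant summary with a score column per stage
# (stage1_score and stage2_score are assumed names; values are toy data)
scores <- tibble(
  participant  = c(101, 102, 103),
  stage1_score = c(85, 70, 90),
  stage2_score = c(100, 100, 95)
)

# Keep participants with at least 80% in stage 1 AND 100% in stage 2;
# comma-separated conditions in filter() are combined with AND
passed <- scores %>%
  filter(stage1_score >= 80, stage2_score >= 100)

passed$participant
```

Only rows where every condition is TRUE are kept, so a participant has to meet both thresholds to stay in.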


'I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.'

So would I.

But I suspect that's the problem: he doesn't seem to have a score column. Note the "response", "correct" and "correct_response" columns, so I think he has the data in long form.

So he needs to group_by(participant, stage) and then summarise(score = 100 * mean(correct)), perhaps (assuming correct is coded 1/0 or TRUE/FALSE).

Then he can choose what to do with that...?
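A sketch of that summarise step, assuming `correct` is coded 1/0 (or TRUE/FALSE) so that its mean is the proportion of correct trials. The `trials` tibble is a hypothetical stand-in for the real long-format data, with toy values:

```r
library(dplyr)

# Hypothetical long-format data: one row per trial, 0/1 `correct` column
# (column names follow the question's select() call; values are toy data)
trials <- tibble(
  participant = rep(c(101, 102), each = 6),
  stage       = rep(rep(c(1, 2), each = 3), times = 2),
  correct     = c(1, 1, 0,  1, 1, 1,
                  1, 1, 1,  1, 0, 1)
)

# Percentage correct per participant and stage:
# the mean of a 0/1 column is the proportion of 1s
stage_scores <- trials %>%
  group_by(participant, stage) %>%
  summarise(score = 100 * mean(correct), .groups = "drop")

stage_scores
```

If `correct` has missing values, `mean(correct, na.rm = TRUE)` would skip them. Using mean() also avoids n(), which counts the rows in the current group and takes no arguments.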

Yeah, for sure! I was just answering about the filtering issue, and assumed the data was being summarized somewhere. You'd definitely want to get the participant scores at each stage, etc.

At this point we're basically imaginary coding without a reprex. :upside_down_face:
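Continuing the imaginary coding: a sketch of turning per-stage scores into an `exclude` vector and then dropping those participants from the trial-level data. The `stage_scores` and `trials` tibbles are hypothetical toy data; the pass marks (80% for stage 1, 100% for stage 2) come from the question:

```r
library(dplyr)

# Hypothetical per-participant, per-stage scores (e.g. from group_by() + summarise())
stage_scores <- tibble(
  participant = c(101, 101, 102, 102, 103, 103),
  stage       = c(1, 2, 1, 2, 1, 2),
  score       = c(85, 100, 70, 100, 90, 95)
)

# Stage-specific pass marks: 80 for stage 1, 100 for stage 2
stage_scores <- stage_scores %>%
  mutate(threshold = if_else(stage == 1, 80, 100),
         passed    = score >= threshold)

# Exclude anyone who fell below the mark in at least one stage
exclude <- stage_scores %>%
  filter(!passed) %>%
  distinct(participant) %>%
  pull(participant)

# Hypothetical trial-level data; drop the excluded participants
trials <- tibble(participant = c(101, 102, 103, 101),
                 rt          = c(512, 430, 610, 488))
kept <- trials %>% filter(!participant %in% exclude)

exclude
kept
```

Building `exclude` from the scores, rather than typing the IDs by hand, means the list stays in sync if the data change.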

Thank you!! I will try this tomorrow and see if it works :grin::grin:

Hello everyone, I have managed to work out which participants to remove, but when I type this code into R:

exclude<- c(22617, 22638, 22666, 22701, 22714, 22720,22790, 22802, 22806, 2371, 23180, 23273,23300, 23469)
datclean <- dat_clean %>%filter(!subj %in% exclude))

it comes up with this message:

Error: unexpected ')' in "datclean <- dat_clean %>%filter(!subj %in% exclude))"

I have followed my uni's R code and it still comes up with this message :/

Not sure if this code is of any help :(

You get the error because there is an unmatched ) at the end of the line.
I think your code should look like this:

exclude<- c(22617, 22638, 22666, 22701, 22714, 22720,22790, 22802, 22806, 2371, 23180, 23273,23300, 23469)
datclean <- dat_clean %>%filter(!subj %in% exclude)
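The message is a syntax (parse) error, so R rejects the line before dplyr or the data are involved at all. A small base-R sketch using parse(), which checks syntax without evaluating anything:

```r
# parse() only checks syntax; nothing is evaluated, so no packages or data needed
bad  <- "datclean <- dat_clean %>% filter(!subj %in% exclude))"
good <- "datclean <- dat_clean %>% filter(!subj %in% exclude)"

# The extra ")" makes parsing fail; capture the error message instead of stopping
bad_result <- tryCatch(parse(text = bad),
                       error = function(e) conditionMessage(e))

# The corrected line parses into a single expression
good_result <- parse(text = good)

bad_result
length(good_result)
```

Once the line parses, a separate issue may surface: the `dat_clean` built in the question keeps a `participant` column rather than `subj`, so unless the data also contain `subj`, the filter would probably need to be `filter(!participant %in% exclude)`.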

You do have an extraneous parenthesis in there. You could also write this with a not-in operator, but the result would be the same (h/t @martin.R).
e.g.

`%nin%` <- Negate(`%in%`)
datclean <- dat_clean %>% filter(subj %nin% exclude)

edit: updated to reflect the equivalence of !foo %in% bar and foo %nin% bar

How is that against dplyr syntax?
dat_clean %>% filter(!subj %in% exclude) is surely identical to dat_clean %>% filter(subj %nin% exclude), but avoids an unnecessary function. I would wager that the majority of people use the former version.

You're right. I guess I've just never done it that way… I thought there was something about operator precedence that made that funky. I've edited my earlier post to reflect that. Here's a reprex demonstrating your point (n.b. I've just added the as_tibble() at the end to shorten the printing):

library(tidyverse)
exclude <- c("Mazda RX4 Wag", "Duster 360", "Hornet Sportabout", "Cadillac Fleetwood")
mtcars %>%
  rownames_to_column() %>%
  filter(!rowname %in% exclude) %>%
  as_tibble()
#> # A tibble: 28 x 12
#>    rowname       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  3 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  5 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  6 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#>  7 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8 Merc 280C    17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9 Merc 450SE   16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10 Merc 450SL   17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 18 more rows

`%nin%` <- Negate(`%in%`)

mtcars %>%
  rownames_to_column() %>%
  filter(rowname %nin% exclude) %>%
  as_tibble()
#> # A tibble: 28 x 12
#>    rowname       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  3 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  5 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  6 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#>  7 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8 Merc 280C    17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9 Merc 450SE   16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10 Merc 450SL   17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 18 more rows

Created on 2021-04-07 by the reprex package (v1.0.0)
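On the operator-precedence worry: in R, `!` has lower precedence than `%in%` (and other `%op%` operators), so `!x %in% y` is parsed as `!(x %in% y)`, not `(!x) %in% y`. A small base-R check with toy vectors:

```r
# Toy vectors for the check
x <- c("a", "b", "c", "d")
y <- c("b", "d")

# Negate() wraps a function so it returns the logical opposite
`%nin%` <- Negate(`%in%`)

v1 <- !x %in% y      # relies on precedence
v2 <- !(x %in% y)    # explicit parentheses
v3 <- x %nin% y      # custom operator

identical(v1, v2)
identical(v1, v3)

# The parse tree confirms it: the outermost call is `!`
as.character(quote(!x %in% y)[[1]])
```

All three forms give the same logical vector, which is why both filter() calls in the reprex return the same 28 rows.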
