Filter if criteria matches an exact range of data

I've run across a situation where I need to filter my column data$SEASONS to include only the names in data$NAMES that satisfies the criteria of playing in EVERY season in the range: 2016, 2017, 2018, 2019, and 2020.

I've tried:
new_df <- data %>% filter(SEASON %in% c(2016:2020))

But this includes NAMES that didn't play in EVERY year of the range.

Thanks for the help.
Clair

Can you please post a sample of the contents of data?

I'm a newbie here.... when you ask for a sample how would I provide? Thanks for your patience.

Run dput(head(data, n = 20)) in RStudio and copy the output of that command here. That will generate the code required for us to reproduce the first 20 observations of your data set.

I created a sample dataframe and ran the code:

dput(head(sample_df, n=20))
structure(list(NAME = c("Jim", "Bob", "John", "Justin", "Jim",
"Bob", "John", "Justin", "Jim", "Bob", "John", "Justin", "Jim",
"Bob", "John", "Justin", "Jim", "Bob", "John", "Justin"), SEASON = c(2020,
2019, 2020, 2018, 2019, 2018, 2019, 2017, 2018, 2017, 2018, 2016,
2017, 2016, 2017, 2015, 2016, 2015, 2016, 2014), zscore = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))

That helps, thank you. Something like this would give you the desired result.

library(tidyverse)

data <- structure(list(NAME = c("Jim", "Bob", "John", "Justin", "Jim",
                                "Bob", "John", "Justin", "Jim", "Bob", "John", "Justin", "Jim",
                                "Bob", "John", "Justin", "Jim", "Bob", "John", "Justin"), 
                       SEASON = c(2020, 2019, 2020, 2018, 2019, 2018, 2019, 2017, 2018, 2017, 2018, 2016, 
                                  2017, 2016, 2017, 2015, 2016, 2015, 2016, 2014), 
                       zscore = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), 
                  row.names = c(NA, -20L), 
                  class = c("tbl_df", "tbl", "data.frame"))
data %>% 
  add_column(PLAYED = 1) %>% 
  arrange(SEASON) %>% 
  pivot_wider(id_cols = NAME, names_from = SEASON, values_from = PLAYED) %>% 
  filter_at(vars(`2016`:`2020`), all_vars(!is.na(.))) %>% 
  select(NAME)
#> # A tibble: 2 x 1
#>   NAME 
#>   <chr>
#> 1 Jim  
#> 2 John

Created on 2020-04-13 by the reprex package (v0.3.0)

This is a really nice solution. I was just wondering what is the purpose of adding the new column PLAYED? Could we get the same result by extracting values from zscore?

data %>% 
  arrange(SEASON) %>% 
  pivot_wider(id_cols = NAME, names_from = SEASON, values_from = zscore) %>%
  filter_at(vars(`2016`:`2020`), all_vars(!is.na(.))) %>%
  select(NAME)

#> # A tibble: 2 x 1
#>   NAME 
#>   <chr>
#> 1 Jim  
#> 2 John

and for OPs benefit (since he said he's a newbie), we can get our data back into the same format as the original dataframe by adapting @siddharthprabhu's solution and using pivot_longer():

data %>% 
  arrange(SEASON) %>% 
  pivot_wider(id_cols = NAME, names_from = SEASON, values_from = zscore) %>% 
  filter_at(vars(`2016`:`2020`), all_vars(!is.na(.))) %>%
  pivot_longer(-NAME, names_to = "year", values_to = "zscore")

#> # A tibble: 14 x 3
#>    NAME  year  zscore
#>    <chr> <chr>  <dbl>
#>  1 Jim   2014      NA
#>  2 Jim   2015      NA
#>  3 Jim   2016       1
#>  4 Jim   2017       1
#>  5 Jim   2018       1
#>  6 Jim   2019       1
#>  7 Jim   2020       1
#>  8 John  2014      NA
#>  9 John  2015      NA
#> 10 John  2016       1
#> 11 John  2017       1
#> 12 John  2018       1
#> 13 John  2019       1
#> 14 John  2020       1
1 Like

Thank you both!
Yes, I need the data back to long format and only the NAME, SEASON and zscore of those who played in 2016:2020

I've found my solution. Thank you!

r
#Solution
solution <- sample_df %>% 
  arrange(SEASON) %>% 
  pivot_wider(id_cols = NAME, names_from = SEASON, values_from = zscore) %>% 
  filter_at(vars(`2016`:`2020`), all_vars(!is.na(.))) %>%
  pivot_longer(-NAME, names_to = "SEASON", values_to = "zscore") %>%
  drop_na()

Be careful when calling drop_na() like that in the future because it will drop all rows with missing values in any column (https://tidyr.tidyverse.org/reference/drop_na.html):

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df
#> # A tibble: 3 x 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a    
#> 2     2 <NA> 
#> 3    NA b

df %>% drop_na()
#> # A tibble: 1 x 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a

df %>% drop_na(x)
#> # A tibble: 2 x 2
#>       x y    
#>   <dbl> <chr>
#> 1     1 a    
#> 2     2 <NA>

Not an issue in this instance, but something to be aware of.

thank you.... so much to learn!

True. But since the OP had only provided a small sample of the data set, I didn't know if the variable zscore contained the value 1 in every row. So I simply erred on the side of caution and created my own count variable. Good call on the reverse transformation though.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.