 # Using Filter function. Need to assign NA and keep length of dataset.

Hi All.
I'm still new to the group and R.
I had some really helpful feedback on my last query so hoping I can get
some more support with the following:

I am working on a horse racing database that at this stage has 4 variables:
horse number, race id, race distance, and the rating (DaH) assigned for the horse's
performance in the race.

The dataset:

```r
horse_ratings <- tibble(
  horse  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  raceid = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Dist   = c(9.47, 9.47, 10, 10.1, 10.2, 9, 11, 9.47, 10.5),
  DaH    = c(101, 99, 103, 101, 94, 87, 102, 96, 62)
)
```

Giving:

```
> horse_ratings
# A tibble: 9 x 4
  horse raceid  Dist   DaH
  <dbl>  <dbl> <dbl> <dbl>
1     1      1  9.47   101
2     1      2  9.47    99
3     1      3 10      103
4     2      1 10.1    101
5     2      2 10.2     94
6     2      3  9       87
7     3      1 11      102
8     3      2  9.47    96
9     3      3 10.5     62
```

I will perform a number of calculations on the dataset, such as mean rating, max rating etc.,
which I'd like to result in a number of vectors of equal length (one value per horse).

I'm using the filter function to look at the performance ratings achieved for different
race distances (i.e. distance greater than 10 to begin with). However, if one of the horses has not
run a race over that distance, I've noticed that the result does not include that
horse in the output, i.e.:

```
> horse_ratings %>%
+   group_by(horse) %>%
+   filter(Dist>10) %>%
+   summarise(mean_rating=mean(DaH))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
  horse mean_rating
  <dbl>       <dbl>
1     2        97.5
2     3        82
```
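The dropped group can be checked directly by counting groups before and after the filter. This is only a minimal diagnostic sketch, assuming dplyr's `n_groups()`; the names `groups_before` and `groups_after` are made up for illustration.

```r
library(dplyr)

horse_ratings <- tibble(
  horse  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  raceid = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Dist   = c(9.47, 9.47, 10, 10.1, 10.2, 9, 11, 9.47, 10.5),
  DaH    = c(101, 99, 103, 101, 94, 87, 102, 96, 62)
)

# Number of groups (horses) before filtering
groups_before <- horse_ratings %>%
  group_by(horse) %>%
  n_groups()

# By default filter() recalculates the grouping from the remaining rows,
# so a horse with no rows left no longer has a group
groups_after <- horse_ratings %>%
  group_by(horse) %>%
  filter(Dist > 10) %>%
  n_groups()

groups_before
groups_after
```

If the two counts differ, at least one horse had no qualifying races and will be missing from any `summarise()` that follows.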

So horse 1 has disappeared, as it has not run a race of distance greater than 10.
Ideally I need to keep the output vector at length 3 so I can put all the calculations
into a dataframe of the same length (for my final data output/print out).
I'm hoping there's a way of assigning an NA or similar to the output for horse 1.
Giving:

```
# A tibble: 3 x 2
  horse mean_rating
  <dbl>       <dbl>
1     1        NA
2     2        97.5
3     3        82
```

Or a similar solution.
Help would be much appreciated!!

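Use `.preserve = TRUE` in `filter()`. It keeps the original grouping structure, so horse 1 remains as an empty group and `summarise()` returns `NaN` for it (the mean of no values):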

```r
library(dplyr)

horse_ratings <- tibble(
  horse  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  raceid = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Dist   = c(9.47, 9.47, 10, 10.1, 10.2, 9, 11, 9.47, 10.5),
  DaH    = c(101, 99, 103, 101, 94, 87, 102, 96, 62)
)

horse_ratings %>%
  group_by(horse) %>%
  filter(Dist > 10, .preserve = TRUE) %>%
  summarise(mean_rating = mean(DaH))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#>   horse mean_rating
#>   <dbl>       <dbl>
#> 1     1       NaN
#> 2     2        97.5
#> 3     3        82
```

Created on 2020-06-18 by the reprex package (v0.3.0)
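If a real `NA` is wanted rather than `NaN`, or if other statistics such as the max are needed (in base R, `max()` of an empty vector gives `-Inf` with a warning), one possible alternative is to skip `filter()` and subset inside `summarise()`. This is only a sketch: `stat_or_na` is a hypothetical helper written for this example, not a dplyr function, and `long_dist` is just an illustrative name.

```r
library(dplyr)

horse_ratings <- tibble(
  horse  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  raceid = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Dist   = c(9.47, 9.47, 10, 10.1, 10.2, 9, 11, 9.47, 10.5),
  DaH    = c(101, 99, 103, 101, 94, 87, 102, 96, 62)
)

# Hypothetical helper (not part of dplyr): apply `f` to `x`,
# but return NA when `x` is empty instead of NaN or -Inf
stat_or_na <- function(x, f) {
  if (length(x) == 0) NA_real_ else f(x)
}

# No filter() step, so every horse keeps its group;
# the distance condition is applied inside each summary instead
long_dist <- horse_ratings %>%
  group_by(horse) %>%
  summarise(
    mean_rating = stat_or_na(DaH[Dist > 10], mean),
    max_rating  = stat_or_na(DaH[Dist > 10], max),
    .groups = "drop"
  )

long_dist
```

Because no rows are removed before summarising, the result has one row per horse, and horse 1 gets `NA` in both columns. The same pattern can be repeated for other distance bands so that every summary has the same length.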
