Using pivotlonger on multiple variables of horse racing dataframe in R

Hi and Thanks in advance for any assistance the group can give.

I have a dataset which gives the performance ratings for 7 race horses
over their last 3 races. The performance ratings are DaH1, DaH2 and DaH3 where
DaH1 is the performance rating for the last race etc.
I also have data for race distances over which the races were ran, where the distances are
Dist1, Dist2 and Dist3 and they correspond to the performance ratings. ie. Horse 2 has a
performance rating of 124 for DaH1, with a race distance, Dist1, of 12.

The dataset is:

horse_data <- tibble(
  DaH1=c(0, 124, 121, 123, 0, NA, 110),
  DaH2=c(124, 117, 125, 120, 125, 0, NA),
  DaH3=c(121, 119, 123, 119, NA, 0, 123),
  Dist1 =c(10,12,10.3,11,11.5,14,10),
  Dist2 =c(10,10.1,12,8,9.5,10.25,8.75),
  Dist3 =c(11.5,12.5,9.8,10,10,15,10),
  horse =c(1,2,3,4,5,6,7),
)

I am trying to use pivot_longer to convert the data to a better dataset for performing
calculations depending upon race distances.
So far I used this code:

tidyData <- horse_data %>% 
  pivot_longer(
    values_to="Rating", 
    cols=c(DaH1, DaH2, DaH3), 
    names_prefix="DaH", 
    names_to="RaceIdx"
  )

To achieve:

> tidyData
# A tibble: 21 x 6
   Dist1 Dist2 Dist3 horse RaceIdx Rating
   <dbl> <dbl> <dbl> <dbl> <chr>    <dbl>
 1  10    10    11.5     1 1            0
 2  10    10    11.5     1 2          124
 3  10    10    11.5     1 3          121
 4  12    10.1  12.5     2 1          124
 5  12    10.1  12.5     2 2          117
 6  12    10.1  12.5     2 3          119
 7  10.3  12     9.8     3 1          121
 8  10.3  12     9.8     3 2          125
 9  10.3  12     9.8     3 3          123
10  11     8    10       4 1          123
# ... with 11 more rows

Where RaceIdx is the race number.
This has achieved the desired result for 'Rating' column but I need to be able to convert
Dist1, Dist2 and Dist3 in to a separate column 'Distance' that matches up each horses
corresponding DaH rating with Dist.
To illustrate, I am trying to end up with a dataset as follows:

      Distance horse RaceIdx Rating
      <dbl>  <dbl> <chr>    <dbl>
 1      10       1 1            0
 2      10       1 2          124
 3      11       1 3          121
 4      12       2 1          124
 5      10.1     2 2          117
 6      12.5     2 3          119
 7      10.3     3 1          121
 8      12       3 2          125
 9      9.8      3 3          123
10      11       4 1          123
# ... with 11 more rows

I need to filter the Ratings by Distance.
Then I hope to be able to produce average ratings for each horse ratings where the
race Distance is between 10 and 11.

Many Thanks in advance.

I am still learning all the options of pivot_longer so there may be a better method than this. It gets the data arrangement you want but not the column names.

library(tidyr)

horse_data <- tibble(
  DaH1=c(0, 124, 121, 123, 0, NA, 110),
  DaH2=c(124, 117, 125, 120, 125, 0, NA),
  DaH3=c(121, 119, 123, 119, NA, 0, 123),
  Dist1 =c(10,12,10.3,11,11.5,14,10),
  Dist2 =c(10,10.1,12,8,9.5,10.25,8.75),
  Dist3 =c(11.5,12.5,9.8,10,10,15,10),
  horse =c(1,2,3,4,5,6,7),
)
pivot_longer(horse_data, cols = -horse, names_to = c(".value", "RaceIdx"),
             names_pattern = "(DaH|Dist)(\\d)")
#> # A tibble: 21 x 4
#>    horse RaceIdx   DaH  Dist
#>    <dbl> <chr>   <dbl> <dbl>
#>  1     1 1           0  10  
#>  2     1 2         124  10  
#>  3     1 3         121  11.5
#>  4     2 1         124  12  
#>  5     2 2         117  10.1
#>  6     2 3         119  12.5
#>  7     3 1         121  10.3
#>  8     3 2         125  12  
#>  9     3 3         123   9.8
#> 10     4 1         123  11  
#> # ... with 11 more rows

Created on 2020-06-10 by the reprex package (v0.3.0)

That's Amazing. I was tearing out what's left of my hair. Many Thanks

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.