Hi All.
I am still new to the group and R.
I am working on a horse racing database.
The data frame below shows the performance ratings achieved by seven different
racehorses (rows 1 to 7). The ratings for each horse are in the columns labelled DaH1 to DaH3,
where DaH1 is the most recent performance rating and DaH3 is the third most recent
rating. The dataset also has some NA values. These are races that took place under different
conditions to today's race, so they are not considered valid performances.
> racehorse_data
  DaH1 DaH2 DaH3
1    0  124  121
2  124  117  119
3  121  125  123
4  123  120  119
5    0  125   NA
6   NA    0    0
7  110   NA  123
I have generated some basic stats on the dataset such as mean and max (see code below).
I would now like to generate some calculations on the ratings of each racehorse using custom functions.
I am reading about the apply functions and mutate but am struggling to write code that
will find what I describe as the average of the best two performances over the last
three races (for each horse, i.e. each row).
So for example, for horse 3, the average of the best two performances is 124, using (123+125)/2.
Where there are only two values to consider because of an NA, I would just average the two
ratings that I have. For horse 7 that would be (110+123)/2 = 116.5.
My idea was to sort the rating values in each row from highest to lowest and then take the
average of the highest two. I'm trying this with apply and mutate but not quite getting there.
The code is below. I'd be grateful for any help the group can provide.
Thanks
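rating_cols <- c("DaH1", "DaH2", "DaH3")  # use only the rating columns, so Max does not also scan Mean
racehorse_data$Mean <- apply(racehorse_data[rating_cols], 1, mean, na.rm = TRUE)
racehorse_data$Max <- apply(racehorse_data[rating_cols], 1, max, na.rm = TRUE)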
Gives:
  DaH1 DaH2 DaH3      Mean Max
1    0  124  121  81.66667 124
2  124  117  119 120.00000 124
3  121  125  123 123.00000 125
4  123  120  119 120.66667 123
5    0  125   NA  62.50000 125
6   NA    0    0   0.00000   0
7  110   NA  123 116.50000 123
Thanks
Graham
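
One possible sketch of the sort-then-average idea described above, using base R only. The helper name best_n_mean and the new column name Best2 are made up for illustration and are not part of the original code. It assumes that a rating of 0 counts as a valid performance and only NA is excluded; if zeros should also be ignored, they would need to be filtered out first.

```r
# Rebuild the example data from the post (rating columns only).
racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)

# Hypothetical helper: mean of the n highest non-NA values in x.
# Assumption: zeros are valid ratings; only NA is dropped.
best_n_mean <- function(x, n = 2) {
  x <- sort(x, decreasing = TRUE)       # sort() removes NA values by default
  if (length(x) == 0) return(NA_real_)  # every rating was NA
  mean(head(x, n))                      # head() copes with fewer than n values
}

# Apply the helper to each row, restricted to the rating columns.
rating_cols <- c("DaH1", "DaH2", "DaH3")
racehorse_data$Best2 <- apply(racehorse_data[rating_cols], 1, best_n_mean)
racehorse_data$Best2
```

With the data above, Best2 should come out as 122.5, 121.5, 124, 121.5, 62.5, 0 and 116.5 for horses 1 to 7. Horse 3 gives (125+123)/2 = 124, matching the worked example, and horse 7 falls back to its two available ratings.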