Using apply and/or mutate in R to find the average of the best 2 values in each row of a data frame

Hi All.

I am still new to the group and to R.
I am working on a horse racing database.
The data frame below displays the performance ratings achieved by seven different racehorses
(rows 1 to 7). Each horse's ratings are in columns DaH1 to DaH3,
where DaH1 is the most recent performance rating and DaH3 is the third most recent performance
rating. The dataset also has some NA values. These are races that took place under different
conditions to today's race, so they are not considered valid performances.

> racehorse_data
  DaH1 DaH2 DaH3
1    0  124  121
2  124  117  119
3  121  125  123
4  123  120  119
5    0  125   NA
6   NA    0    0
7  110   NA  123
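For anyone wanting to try the answers below, the printed data can be rebuilt as a data frame in base R. This is a sketch that assumes the three columns are plain numeric vectors; the values are copied from the printout above.

```r
# Rebuild the example data shown above; NA marks races run under
# different conditions, which should be left out of the calculations
racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)

print(racehorse_data)
```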

I have generated some basic stats on the dataset, such as the mean and max (see code below).
I would now like to calculate other figures from the ratings of each racehorse using custom functions.
I am reading about the apply functions and mutate but am struggling to write code that
finds what I describe as the average of the best two performances over the last
three races for each horse (row).
For example, for horse 3 the average of the best two performances is 124, i.e. (123 + 125)/2.
Where only two ratings are available because of NA values, I would just average the two
ratings that I have.
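The calculation described above can be checked by hand on single rows in base R: sort the ratings from highest to lowest and average the first two. This is only a sketch on plain vectors (horse3 and horse7 are illustrative names); note that sort() drops NA values by default (na.last = NA), so a row with one NA is averaged over the two ratings that remain.

```r
# Horse 3: ratings 121, 125, 123 -> best two are 125 and 123
horse3 <- c(DaH1 = 121, DaH2 = 125, DaH3 = 123)
best_two_h3 <- head(sort(horse3, decreasing = TRUE), 2)
mean(best_two_h3)  # (125 + 123) / 2 = 124

# Horse 7: 110, NA, 123 -> sort() drops the NA, leaving 123 and 110
horse7 <- c(DaH1 = 110, DaH2 = NA, DaH3 = 123)
best_two_h7 <- head(sort(horse7, decreasing = TRUE), 2)
mean(best_two_h7)  # (123 + 110) / 2 = 116.5
```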
My idea was to sort the rating values in each row from highest to lowest and then take the
average of the highest two. I'm trying this with apply and mutate but not quite getting there.
The code is below. I'd be grateful for any help the group can provide.
Thanks

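rating_cols <- c("DaH1", "DaH2", "DaH3")  # restrict to the rating columns, so Max does not also see Mean
racehorse_data$Mean <- apply(racehorse_data[rating_cols], 1, mean, na.rm = TRUE)
racehorse_data$Max <- apply(racehorse_data[rating_cols], 1, max, na.rm = TRUE)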

Gives:


  DaH1 DaH2 DaH3      Mean Max
1    0  124  121  81.66667 124
2  124  117  119 120.00000 124
3  121  125  123 123.00000 125
4  123  120  119 120.66667 123
5    0  125   NA  62.50000 125
6   NA    0    0   0.00000   0
7  110   NA  123 116.50000 123
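As an aside, base R also has vectorised helpers that compute these two row statistics without apply(): rowMeans() for the mean and pmax() for the row-wise maximum. A sketch, assuming only the three rating columns should be used:

```r
racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)
rating_cols <- c("DaH1", "DaH2", "DaH3")

# rowMeans() averages across each row; na.rm = TRUE ignores NA ratings
racehorse_data$Mean <- rowMeans(racehorse_data[rating_cols], na.rm = TRUE)

# pmax() takes the element-wise maximum of the vectors passed to it;
# do.call() hands it each rating column as a separate argument
racehorse_data$Max <- do.call(pmax, c(racehorse_data[rating_cols], na.rm = TRUE))

print(racehorse_data)
```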

Thanks
Graham

You had the correct idea. Instead of using the built-in mean and max, use your own function. For example:

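mean_of_best_k_races <- function(performances, k)
{
    # Sort ascending; na.last = FALSE puts any NA values first,
    # so the k highest ratings end up at the end of the vector
    sorted_x <- sort(x = performances,
                     na.last = FALSE)
    # Average the last k values; na.rm = TRUE drops any NA that is
    # included when a horse has fewer than k valid ratings
    mean(x = tail(x = sorted_x,
                  n = k),
         na.rm = TRUE)
}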
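A few quick checks of the function on single vectors, including rows with missing ratings. With k = 2, a row with only one valid rating returns that rating, and a row with no valid ratings returns NaN (the mean of an empty vector).

```r
mean_of_best_k_races <- function(performances, k)
{
    sorted_x <- sort(x = performances, na.last = FALSE)
    mean(x = tail(x = sorted_x, n = k), na.rm = TRUE)
}

mean_of_best_k_races(c(121, 125, 123), 2)  # 124
mean_of_best_k_races(c(110, NA, 123), 2)   # 116.5 (NA ignored)
mean_of_best_k_races(c(NA, NA, 119), 2)    # 119 (only one valid rating)
mean_of_best_k_races(c(NA, NA, NA), 2)     # NaN (no valid ratings)
```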

Then call apply over the rows with this function, and pass k as 2.
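A sketch of that call on the example data, assuming racehorse_data holds only the three rating columns. MARGIN = 1 tells apply() to pass each row, as a numeric vector, to the function; the extra argument k = 2 is forwarded to mean_of_best_k_races().

```r
mean_of_best_k_races <- function(performances, k)
{
    sorted_x <- sort(x = performances, na.last = FALSE)
    mean(x = tail(x = sorted_x, n = k), na.rm = TRUE)
}

racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)

# One value per horse: 122.5 121.5 124.0 121.5 62.5 0.0 116.5
best_two <- apply(racehorse_data, MARGIN = 1, FUN = mean_of_best_k_races, k = 2)
print(best_two)
```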

Hope this helps.

Hi.
Thanks for your help.
I have tested the code.
I tried:

> mean_of_best_k_races(c(4,6,9),2)
[1] 7.5

Which works.
The problem I'm having is running the apply function over the data frame,
i.e. passing each row through the mean_of_best_k_races function.

When I try to pass the dataframe I get an error:

> mean_of_best_k_races(racehorse_data, 2)
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
  undefined columns selected

I'm new to R, so I'm clearly not passing the data frame through correctly.
Any ideas?
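Calling the function directly on racehorse_data hands sort() the entire data frame rather than one horse's ratings; the traceback suggests this is where it fails, since the data frame itself is being indexed with order(x, ...). The function expects a single numeric vector, so one way to check it on one horse is to turn a single row into a vector with unlist(). A sketch using horse 3 from the example data:

```r
mean_of_best_k_races <- function(performances, k)
{
    sorted_x <- sort(x = performances, na.last = FALSE)
    mean(x = tail(x = sorted_x, n = k), na.rm = TRUE)
}

racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)

# racehorse_data[3, ] is still a one-row data frame; unlist() turns it
# into a named numeric vector c(DaH1 = 121, DaH2 = 125, DaH3 = 123)
horse3 <- unlist(racehorse_data[3, ])
mean_of_best_k_races(horse3, 2)  # 124
```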

Thanks
Graham

Hi.
I've thought this through again and realised that I need to use:

apply(racehorse_data, 1, mean_of_best_k_races, k = 2)

It works!
Many Thanks
Graham
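One caution if the Mean and Max columns from the first post are already in racehorse_data: apply() over the whole data frame then passes those two columns to mean_of_best_k_races() as well, so the "best two" can include the row's own Max. For horse 3 that gives (125 + 125)/2 = 125 instead of 124. A sketch that selects the rating columns explicitly and stores the result as a new column (the column name Best2 is only an illustrative choice):

```r
mean_of_best_k_races <- function(performances, k)
{
    sorted_x <- sort(x = performances, na.last = FALSE)
    mean(x = tail(x = sorted_x, n = k), na.rm = TRUE)
}

racehorse_data <- data.frame(
  DaH1 = c(0, 124, 121, 123, 0, NA, 110),
  DaH2 = c(124, 117, 125, 120, 125, 0, NA),
  DaH3 = c(121, 119, 123, 119, NA, 0, 123)
)
rating_cols <- c("DaH1", "DaH2", "DaH3")

# Summary columns as added earlier in the thread
racehorse_data$Mean <- apply(racehorse_data[rating_cols], 1, mean, na.rm = TRUE)
racehorse_data$Max <- apply(racehorse_data[rating_cols], 1, max, na.rm = TRUE)

# Applying over every column now also sees Mean and Max
all_cols <- apply(racehorse_data, 1, mean_of_best_k_races, k = 2)

# Restricting to the rating columns gives the intended figure
racehorse_data$Best2 <- apply(racehorse_data[rating_cols], 1,
                              mean_of_best_k_races, k = 2)

all_cols[3]              # 125: DaH2 (125) and Max (125) are both counted
racehorse_data$Best2[3]  # 124: (125 + 123) / 2
```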
