How to write a function which finds the mean of a column based on a specific row

Dino25 · June 8, 2023, 12:41pm

I have been trying to write a function that finds the mean and median of a single column, using only the rows for one country.

I have a list of countries, and I want to pass in the dataset for a specific country and get back the mean of that column.

country year      score
Algeria 1980 -1.1201501
Algeria 1981 -1.0526943
Algeria 1982 -1.0561565
Algeria 1983 -1.1274560
Algeria 1984 -1.1353926

I have tried the below:

output <- function(dataset) {
  mean_country <- mean(dataset[country, score])
  median_country <- median(dataset[country, score])
  return(list(mean_country, median_country))
}

I expected to be able to test the function with output(dataset[Algeria, score]) and get the correct result.

I am aware it can be done quickly using rowMeans or the tidyverse, but I need to write it as a function, and the above doesn't work.

Also, any advice on having the function return a data frame instead of a list would be great.

Thank you in advance.
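On the list-versus-data-frame question, a minimal sketch of the two return styles, assuming only a numeric vector of scores (the Algeria values from the sample above). `scores_as_list` and `scores_as_df` are hypothetical names, not from the thread. Note that the original attempt's list(mean_country, median_country) is unnamed, so its elements can only be reached by position.

```r
# Algeria scores from the sample data above
score <- c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926)

# Hypothetical helper: returns a *named* list, so results are reachable
# as res$mean / res$median instead of only res[[1]] / res[[2]]
scores_as_list <- function(s) {
  list(mean = mean(s, na.rm = TRUE), median = median(s, na.rm = TRUE))
}

# Hypothetical helper: returns the same values as a one-row data frame
scores_as_df <- function(s) {
  data.frame(mean = mean(s, na.rm = TRUE), median = median(s, na.rm = TRUE))
}

res_list <- scores_as_list(score)
res_df   <- scores_as_df(score)

res_list$median   # same value as res_df$median
# Data frames with identical columns can be stacked row-wise
rbind(res_df, res_df)
```

A data frame is usually the more convenient choice when results for several countries will be stacked together with rbind().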

technocrat · June 9, 2023, 3:53am

d <- data.frame(
  country = c(
    "Albania", "Albania", "Albania", "Albania", "Albania",
    "Algeria", "Algeria", "Algeria", "Algeria", "Algeria"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)

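output <- function(x) {
  # filter the global data frame d to the rows for country x
  # (inside the function, the local copy named d shadows the global d)
  d        = d[which(d["country"] == x),]
  mean_s   = mean(d$score, na.rm = TRUE)
  median_s = median(d$score, na.rm = TRUE)
  # return a one-row data frame rather than a list
  return(data.frame(
    country  = x,
    mean_s   = mean_s,
    median_s = median_s))
}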

output("Algeria")
#>   country   mean_s median_s
#> 1 Algeria -1.09837 -1.12015

Created on 2023-06-08 with reprex v2.0.2

Dino25 · June 9, 2023, 7:25am

Hi there, that’s amazing, thank you. Can the argument of function(x) be changed so that it takes a dataset instead of the country name, for example the dataset for a particular country? I need to test the function with a couple of different datasets that have been filtered accordingly.

Thank you!
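A minimal sketch of a function that takes an already-filtered data frame, rather than a country name, and summarises whatever rows it receives. It assumes `country` and `score` columns as in the sample data; `summarise_scores` is a hypothetical name. Because the caller does the filtering, the function never refers to a global `d`.

```r
# Sample data from the thread (the Albania rows repeat the Algeria scores)
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926),
                times = 2)
)

# Hypothetical helper: summarise the rows it is given
summarise_scores <- function(df) {
  data.frame(
    country  = paste(unique(df$country), collapse = ", "),
    mean_s   = mean(df$score, na.rm = TRUE),
    median_s = median(df$score, na.rm = TRUE)
  )
}

# Filter first, then pass the filtered data frame in
algeria <- d[d$country == "Algeria", ]
res <- summarise_scores(algeria)
res
```

The same function can be called on any other pre-filtered data frame, e.g. `summarise_scores(d[d$country == "Albania", ])`.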

technocrat · June 9, 2023, 8:12am

Sure. This one is hardwired to a data.frame object named d. If you have several data frames, the function can take the data frame as an argument as well:

d <- data.frame(
  country = c(
    "Albania", "Albania", "Albania", "Albania", "Albania",
    "Algeria", "Algeria", "Algeria", "Algeria", "Algeria"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)

e <- data.frame(
  country = c(
    "Estonia", "Estonia", "Estonia", "Estonia", "Estonia",
    "France", "France", "France", "France", "France"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)


output <- function(x,y) {
  # x: the data frame to use; y: the country name to keep
  z        = x[which(x["country"] == y),]
  mean_s   = mean(z$score, na.rm = TRUE)
  median_s = median(z$score, na.rm = TRUE)
  return(data.frame(
    country  = y,
    mean_s   = mean_s,
    median_s = median_s))
}

output(d,"Algeria")
#>   country   mean_s median_s
#> 1 Algeria -1.09837 -1.12015
output(e,"Estonia")
#>   country   mean_s median_s
#> 1 Estonia -1.09837 -1.12015

Dino25 · June 9, 2023, 8:42am

Hi Richard, thank you for this, but I’m looking for a function that takes the dataset in and gives the output for a single country only. For example, output(dataset[country$Albania, ]) would give me the outputs. Apologies if I was not clear enough. Is this something that could be done with distinct(), perhaps inside the function?

Thank you.
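On the distinct() idea: dplyr's distinct() keeps unique rows; it does not select a country. A small sketch of the difference, using toy values that are illustrative only (not from the thread), with base R's unique() method for data frames standing in for distinct() so the example has no package dependency.

```r
# Toy data, illustrative only: the first two rows are exact duplicates
toy <- data.frame(
  country = c("Albania", "Albania", "Algeria"),
  score   = c(-1.12, -1.12, -1.05)
)

# De-duplicating rows (what distinct() does) still leaves other countries in
deduped <- unique(toy)

# Keeping a single country is a filter on the country column instead
albania <- toy[toy$country == "Albania", ]

nrow(deduped)   # one Albania row and one Algeria row remain
nrow(albania)   # both Albania rows, duplicates included
```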

technocrat · June 9, 2023, 10:38am

The following does that, but it doesn’t really benefit much from being inside a function:

d[which(d$country == "Algeria"),]

This returns a data frame containing only the given country, but it doesn’t go on to do the calculations.

Dino25 · June 10, 2023, 7:55pm

Quoting technocrat's first reply (the sample data frame d and output("Algeria")):

Hi Richard,

this code gives the same output if I run it for another country (e.g. Albania). Is there a way to overcome this?

Thank you very much for your help with this.

technocrat · June 10, 2023, 8:27pm

In the example data, the Albania scores are a copy of the Algeria scores; only the country name differs. With real data, the results will be different.

Dino25 · June 10, 2023, 10:01pm

I’ve managed to get around this; all I had to do was convert my data table to a data frame.

Thank you.
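Converting with as.data.frame() is one fix. An alternative sketch, assuming the object may be either a data.frame or a data.table: extract columns with `[[` and filter the score vector directly, so the code does not rely on `x["country"]`. On a data.table, a character value inside `[ ]` is likely interpreted as a key lookup (join) rather than a column selection, which may be why the original code misbehaved. `output_vec` is a hypothetical name.

```r
# Hypothetical variant: works on column vectors, so it should behave the same
# for a data.frame or a data.table (both are lists of columns, and `[[`
# extracts a single column from either)
output_vec <- function(x, y) {
  keep   <- x[["country"]] == y   # logical vector, one entry per row
  scores <- x[["score"]][keep]    # scores for the chosen country only
  data.frame(
    country  = y,
    mean_s   = mean(scores, na.rm = TRUE),
    median_s = median(scores, na.rm = TRUE)
  )
}

d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), 2)
)
res <- output_vec(d, "Algeria")
res
```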

Dino25 · June 14, 2023, 10:58am

Hi Richard, I’ve got one more question, if that’s OK.

Could I adjust the code to take the dataset as an argument and return the mean for each of the unique countries instead? That is, something similar to the code you’ve provided, but with unique() inside the function? I’ve tried a few different versions, but none of them worked for me.

Thank you.
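A minimal sketch of the unique() approach, assuming `country` and `score` columns as in the sample data: loop over the unique countries, summarise each subset into a one-row data frame, and stack the rows. `summary_by_country` is a hypothetical name.

```r
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), 2)
)

# Hypothetical helper: one row of summary statistics per unique country
summary_by_country <- function(x) {
  countries <- unique(x$country)        # each country once, in order of appearance
  rows <- lapply(countries, function(cn) {
    s <- x$score[x$country == cn]       # scores for this country only
    data.frame(
      country  = cn,
      mean_s   = mean(s, na.rm = TRUE),
      median_s = median(s, na.rm = TRUE)
    )
  })
  do.call(rbind, rows)                  # stack the one-row data frames
}

res <- summary_by_country(d)
res
```

With the sample data both rows show the same values, since the Albania scores copy the Algeria scores.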

technocrat · June 14, 2023, 11:01am

output <- function(x,y) {
  d        = x[which(x["country"] == y),]
  mean_s   = mean(d$score, na.rm = TRUE)
  median_s = median(d$score, na.rm = TRUE)
  return(data.frame(
    country  = y,
    mean_s   = mean_s,
    median_s = median_s))
}

nirgrahamuk · June 14, 2023, 2:08pm



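output <- function(d, cntry) {
  # row indices of the requested countries
  sub <- which(d[["country"]] %in% cntry)
  # aggregate() returns one row per country present, sorted by country name
  means   <- aggregate(score ~ country, data = d, subset = sub, FUN = mean)
  medians <- aggregate(score ~ country, data = d, subset = sub, FUN = median)
  # take country from the aggregate result so names and values stay aligned
  data.frame(
    country  = means$country,
    mean_s   = means$score,
    median_s = medians$score
  )
}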

output(d,"Algeria") 
output(d,"Albania")
output(d,c("Albania",
           "Algeria"))