How to write a function that finds the mean of a column for a specific country

I have been trying to write a function that finds the mean and median of a single column, using only the rows for one country.

I have a list of countries, and I want to pass in the data for a specific country and get back the mean of that column.

```
country year      score
Algeria 1980 -1.1201501
Algeria 1981 -1.0526943
Algeria 1982 -1.0561565
Algeria 1983 -1.1274560
Algeria 1984 -1.1353926
```

I have tried the below:

```r
output <- function(dataset) {
  mean_country <- mean(dataset[country, score])
  median_country <- median(dataset[country, score])
  return(list(mean_country, median_country))
}
```

I was expecting to test the function with `output(dataset[Algeria, score])` and get the correct result.

I am aware this can be done quickly with `rowMeans` or the tidyverse, but I need to write it as a function, and the code above doesn't work.

Also, any advice on making the function return a data frame instead of a list would be great.
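For reference, a likely reason the attempt fails with a plain data frame: base R's `[` does not treat bare names such as `country` or `score` as column names, so `dataset[country, score]` looks for ordinary variables with those names and stops with an "object not found" error. A minimal sketch of the indexing the attempt seems to be aiming for, assuming columns named `country` and `score`; `country_stats()` is a hypothetical name, and returning `data.frame()` instead of `list()` gives the data-frame result asked about above:

```r
# Example data in the same shape as the question (values taken from the thread)
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), times = 2)
)

# Column names are quoted; rows are picked with a logical test on the country column
country_stats <- function(dataset, cntry) {
  vals <- dataset[dataset$country == cntry, "score"]  # score vector for one country
  data.frame(
    country  = cntry,
    mean_s   = mean(vals, na.rm = TRUE),
    median_s = median(vals, na.rm = TRUE)
  )
}

country_stats(d, "Algeria")
```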

```r
d <- data.frame(
  country = c(
    "Albania", "Albania", "Albania", "Albania", "Albania",
    "Algeria", "Algeria", "Algeria", "Algeria", "Algeria"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)

output <- function(x) {
  # keep only the rows of the global data frame `d` for country x
  d        = d[which(d["country"] == x), ]
  mean_s   = mean(d$score, na.rm = TRUE)
  median_s = median(d$score, na.rm = TRUE)
  # return a one-row data frame rather than a list
  return(data.frame(
    country  = x,
    mean_s   = mean_s,
    median_s = median_s))
}

output("Algeria")
#>   country   mean_s median_s
#> 1 Algeria -1.09837 -1.12015
```

Created on 2023-06-08 with reprex v2.0.2

Hi there, that's amazing, thank you. Can the function be changed so that its argument is the dataset instead of the country name? That is, I would pass in the dataset for a particular country. I need to test the function with a few different datasets that have been filtered accordingly.

Thank you!

Sure. This version is hardwired to a data frame named `d`. If you have several data frames, the data frame can be passed in as an argument as well:

```r
d <- data.frame(
  country = c(
    "Albania", "Albania", "Albania", "Albania", "Albania",
    "Algeria", "Algeria", "Algeria", "Algeria", "Algeria"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)

e <- data.frame(
  country = c(
    "Estonia", "Estonia", "Estonia", "Estonia", "Estonia",
    "France", "France", "France", "France", "France"
  ),
  year = c(1980, 1981, 1982, 1983, 1984, 1980, 1981, 1982, 1983, 1984),
  score = c(
    -1.1201501, -1.0526943, -1.0561565,
    -1.127456, -1.1353926, -1.1201501, -1.0526943, -1.0561565, -1.127456,
    -1.1353926
  )
)

# x: the data frame to use; y: the country to summarise
output <- function(x, y) {
  z        = x[which(x["country"] == y), ]
  mean_s   = mean(z$score, na.rm = TRUE)
  median_s = median(z$score, na.rm = TRUE)
  return(data.frame(
    country  = y,
    mean_s   = mean_s,
    median_s = median_s))
}

output(d, "Algeria")
#>   country   mean_s median_s
#> 1 Algeria -1.09837 -1.12015
output(e, "Estonia")
#>   country   mean_s median_s
#> 1 Estonia -1.09837 -1.12015
```

Hi Richard, thank you for this, but I'm looking for a function that takes the dataset in and gives the output for a single country only. For example, `output(dataset[country$Albania, ])` would give me the outputs. Apologies if I was not clear enough. Could this be done with the `distinct` command, perhaps inside the function?

Thank you.

Plain subsetting does that, but it doesn't gain much from being inside a function:

```r
d[which(d$country == "Algeria"), ]
```

This returns a data frame containing only the given country, but it doesn't go on to do the calculations.
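Combining the two steps, a minimal sketch of a function that takes an already-filtered data frame for one country and returns the statistics. `country_summary()` is a hypothetical name; it assumes `country` and `score` columns and uses `unique()` to read the country name from the data, stopping if more than one country is present:

```r
# Same example data as in the thread
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), times = 2)
)

# Takes a data frame that has already been filtered to a single country
country_summary <- function(dataset) {
  cntry <- unique(dataset$country)
  # This sketch expects exactly one country in the input
  if (length(cntry) != 1) stop("expected data for exactly one country")
  data.frame(
    country  = cntry,
    mean_s   = mean(dataset$score, na.rm = TRUE),
    median_s = median(dataset$score, na.rm = TRUE)
  )
}

# Filter first, then summarise
country_summary(d[which(d$country == "Algeria"), ])
```

Keeping the filtering outside the function lets the same function be tested on any number of pre-filtered datasets.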

Hi Richard,

This code gives the same output if I run it for another country (i.e. Albania). Is there a way to overcome this?

Thank you very much for your help with this.

In the example data, the Albania scores are a copy of the Algeria scores; only the country name differs. With real data, the results will be different.

I've managed to get around this: all I had to do was convert my data table to a data frame.

Thank you.
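The conversion can also be moved inside the function so that callers do not have to remember it. A minimal sketch, assuming the same `country` and `score` columns; `output_df()` is a hypothetical name. `as.data.frame()` leaves a plain data frame unchanged and converts other table classes (such as a data.table or tibble) first, so the data-frame indexing below applies; the example exercises only a plain data frame:

```r
# Example data in the thread's format
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), times = 2)
)

output_df <- function(x, y) {
  # Convert first, so data.frame indexing rules apply whatever table type was passed in
  x <- as.data.frame(x)
  # [[ extracts the country column as a plain vector
  z <- x[which(x[["country"]] == y), ]
  data.frame(
    country  = y,
    mean_s   = mean(z$score, na.rm = TRUE),
    median_s = median(z$score, na.rm = TRUE)
  )
}

output_df(d, "Albania")
```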

Hi Richard, I've got one more question, if that's OK.

Could I adjust the code to take the dataset as an argument and return the mean for each of the unique countries instead? That is, something similar to the code you've provided, but with the `unique()` command inside the function? I've tried a few different versions, but they didn't work for me.

Thank you.
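Since the question mentions `unique()`, one way to do this without `aggregate()` is to loop over the unique country names with `lapply()` and stack the one-row results with `do.call(rbind, ...)`. A minimal sketch; `all_countries()` is a hypothetical name, and the columns are assumed to be `country` and `score` as above:

```r
# Same example data as in the thread
d <- data.frame(
  country = rep(c("Albania", "Algeria"), each = 5),
  year    = rep(1980:1984, times = 2),
  score   = rep(c(-1.1201501, -1.0526943, -1.0561565, -1.1274560, -1.1353926), times = 2)
)

all_countries <- function(x) {
  # each country once, in order of first appearance
  countries <- unique(x[["country"]])
  per_country <- lapply(countries, function(cntry) {
    s <- x[["score"]][which(x[["country"]] == cntry)]  # scores for this country
    data.frame(
      country  = cntry,
      mean_s   = mean(s, na.rm = TRUE),
      median_s = median(s, na.rm = TRUE)
    )
  })
  # stack the one-row data frames into one result
  do.call(rbind, per_country)
}

all_countries(d)
```

Unlike `aggregate()`, which sorts its groups, this keeps the countries in the order they first appear in the data.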

```r
output <- function(x, y) {
  d        = x[which(x["country"] == y), ]
  mean_s   = mean(d$score, na.rm = TRUE)
  median_s = median(d$score, na.rm = TRUE)
  return(data.frame(
    country  = y,
    mean_s   = mean_s,
    median_s = median_s))
}
```
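```r
output <- function(d, cntry) {
  # row numbers of the requested countries
  rows <- which(d[["country"]] %in% cntry)
  # aggregate() returns one row per country, sorted by country,
  # so the labels are taken from its result rather than from `cntry`
  means   <- aggregate(score ~ country, data = d, subset = rows, FUN = mean)
  medians <- aggregate(score ~ country, data = d, subset = rows, FUN = median)
  data.frame(
    country  = means$country,
    mean_s   = means$score,
    median_s = medians$score
  )
}

output(d, "Algeria")
output(d, "Albania")
output(d, c("Albania", "Algeria"))
```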