Hi,
The request from @Yarnabrina set me on a quest to create a more versatile system for summarizing curves and plotting them using ggplot ...
There's so much I could explain about it, but I think the post would become too long lol. I'm considering writing it up properly because I do think it can be handy for others, but for now I'll just give the code and a summary.
SUMMARY CURVE FUNCTION
library("tibble")
library("dplyr")
summaryCurve = function(datasets, columnInfo, summaryFunction = mean,
                        interpolationMethod = "linear", onlyReturnSummary = T,
                        longFormat = F){

  #Prepare datasets
  #----------------
  if(is.data.frame(datasets)){ #The user provided one data frame with one x-column and multiple y-columns
    xColName = columnInfo[[1]][1]
    combinedData = cbind(data.frame(id = 1:nrow(datasets)),
                         data.frame(x = datasets[[xColName]]),
                         datasets %>% select(-all_of(xColName)))
    if(any(is.na(combinedData$x))){
      stop("The x-column cannot have missing values")
    }
  } else { #The user provided a list of data frames
    nSets = length(datasets)
    #Get all possible x-values
    x = lapply(1:nSets, function(i){
      datasets[[i]][[columnInfo[[i]][1]]]
    }) %>% unlist %>% unique %>% sort
    #Build data frame with column for x value
    combinedData = tibble(id = 1:length(x), x = x)
    # ... and add the y-column(s) of every set
    for(i in 1:nSets){
      xColName = columnInfo[[i]][1]
      nYcols = length(columnInfo[[i]]) - 1
      if(nYcols > 0){ #Only keep the y-columns that were named
        combinedData = combinedData %>%
          left_join(datasets[[i]] %>% select(all_of(columnInfo[[i]])),
                    by = c(x = xColName))
      } else { #Only the x-column was named: all other columns are y-values
        combinedData = combinedData %>%
          left_join(datasets[[i]], by = c(x = xColName))
      }
    }
  }

  #Interpolate curves
  #------------------
  #Apply an interpolation function to every y-column to fill in missing values
  #(points outside a curve's own x-range stay NA, curves are not extended)
  yCols = setdiff(names(combinedData), c("id", "x"))
  combinedData[yCols] = lapply(combinedData[yCols], function(y){
    approx(combinedData$x, y, xout = combinedData$x, method = interpolationMethod)$y
  })

  #Now add the summaryCurve (NA values are ignored at every x)
  summaryValues = apply(combinedData[yCols], 1, function(y){
    summaryFunction(y[!is.na(y)])
  })

  if(onlyReturnSummary){
    result = data.frame(x = combinedData$x, summary = summaryValues)
  } else {
    result = cbind(combinedData, data.frame(summary = summaryValues))
  }

  #Optionally stack all curves into the columns x, y and curve
  if(longFormat){
    curveCols = setdiff(names(result), c("id", "x"))
    result = do.call(rbind, lapply(curveCols, function(curveName){
      data.frame(x = result$x, y = result[[curveName]], curve = curveName)
    }))
    result$curve = factor(result$curve, levels = curveCols)
  }

  return(result)
}
The summaryCurve function takes several arguments:

datasets: a list of data frames that hold the data for all curves
 Each dataset must have one x-column and can have multiple y-columns (multiple curves)
 Different datasets can be of different length (i.e. x-values can have different ranges)

columnInfo: list of column name mappings of x and y values, one entry per dataset
 if only one name is provided, it is assumed to be the column name of the x-values and all other columns are treated as y-values (1 or more). NA is allowed in y-values
 if multiple names are provided for a dataset, the first refers to the x-values and all the others to the specific columns to be treated as y-values (any other columns will be ignored)

summaryFunction: the function to be applied across all curves. Default is 'mean' but it can be anything like min, max, sum, ... even a custom function, as long as it returns a single value from all the y-values at a given x.

interpolationMethod: defaults to "linear". All curves are interpolated (but not extended beyond their own x-range) so that curves with different levels of detail can be summarized together and missing values are filled in. The other option is "constant", where the previous point is carried forward instead.

onlyReturnSummary: defaults to TRUE, in which case only the x and y-values of the summary curve are returned. If FALSE, one dataset with all interpolated curves plus the summary curve will be returned

longFormat: defaults to FALSE. If TRUE, there is only one y-column (y) and an extra column (curve) holds a factor denoting which curve each point belongs to (this can aid in plotting with ggplot)
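To make the two interpolation options a bit more concrete, here is a minimal sketch using base R's approx directly (the same function summaryCurve relies on). The toy vectors x, y and xout are made up for illustration only. It shows that points outside the known x-range come back as NA rather than being extrapolated, and that "constant" carries the previous point forward.

```r
# Two known points, queried both inside and outside their x-range
x    <- c(0, 10)
y    <- c(0, 1)
xout <- c(-5, 0, 5, 10, 15)

# "linear" draws a straight line between neighbouring points
linear   <- approx(x, y, xout = xout, method = "linear")$y

# "constant" carries the last known point forward until the next one
constant <- approx(x, y, xout = xout, method = "constant")$y

print(linear)    # NA 0.0 0.5 1.0 NA
print(constant)  # NA 0 0 1 NA
```

The NA values at -5 and 15 are what "interpolated but not extended" means in practice: a curve only contributes to the summary within its own range.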
EXAMPLE APPLYING THE FUNCTION AND PLOTTING (GGPLOT)
Let's start by creating 3 different curves
library("ggplot2")
#Column names match the ones used in the plot and in columnInfo below
dataset1 = data.frame(x = 50:6, y1 = runif(45))
dataset2 = data.frame(theX = seq(1, 55, 4), result1 = runif(14), result2 = LETTERS[1:14])
dataset3 = data.frame(xVal = c(0, 50), yVal = c(0,1))
ggplot() +
geom_point(data = dataset1, aes(x = x, y = y1), colour = "darkgreen") +
geom_line(data = dataset1, aes(x = x, y = y1), colour = "darkgreen") +
geom_point(data = dataset2, aes(x = theX, y = result1), colour = "red") +
geom_line(data = dataset2, aes(x = theX, y = result1), colour = "red") +
geom_point(data = dataset3, aes(x = xVal, y = yVal), colour = "blue") +
geom_line(data = dataset3, aes(x = xVal, y = yVal), colour = "blue") +
theme_minimal()
You can see that the curves have different starting and ending points and that their x-values do not line up (some curves have many more points than others).
Now run the summaryCurve function with the appropriate arguments and plot the results using ggplot:
mySummarycurve = summaryCurve(datasets = list(dataset1, dataset2, dataset3),
columnInfo = list("x", c("theX", "result1"), "xVal"),
summaryFunction = sum, onlyReturnSummary = F,
longFormat = T)
ggplot(mySummarycurve %>% filter(curve != "summary"), aes(x = x, y = y, group = curve)) +
geom_point(aes(colour = curve)) +
geom_line(aes(colour = curve), linetype = 2) +
geom_line(data = mySummarycurve %>% filter(curve == "summary"), colour = "orange") +
geom_area(data = mySummarycurve %>% filter(curve == "summary"),
fill = "gray", alpha = 0.3) +
theme_minimal() + theme(legend.position = "none")
As you can see, the function interpolated all curves so they all have matching x-values, over which the summary function of choice (in this case sum) is applied. The resulting area is shaded, but that's just done because it was in the initial example in this post.
There you go! Hope you like it and find it useful. I think I might tinker with it a bit more and maybe get it onto GitHub or something.
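If you want to see the idea without the full function, here is a tiny hand-worked sketch in base R. The two toy curves (curveA, curveB) and the variable names are made up for illustration and this is not the function itself, just the same three steps: build a shared x grid, interpolate each curve onto it, then summarize row by row while ignoring NA.

```r
# Two toy curves with x-values that do not line up
curveA <- data.frame(x = c(0, 2, 4), y = c(0, 2, 4))
curveB <- data.frame(x = c(1, 3),    y = c(10, 30))

# 1. Shared grid: every x-value that occurs in any curve
grid <- sort(unique(c(curveA$x, curveB$x)))       # 0 1 2 3 4

# 2. Interpolate each curve onto the grid (NA outside its own range)
yA <- approx(curveA$x, curveA$y, xout = grid)$y   # 0 1 2 3 4
yB <- approx(curveB$x, curveB$y, xout = grid)$y   # NA 10 20 30 NA

# 3. Apply the summary function at every x, dropping NA first
summaryValues <- apply(cbind(yA, yB), 1, function(y) sum(y[!is.na(y)]))

print(data.frame(x = grid, summary = summaryValues))
# summary column: 0 11 22 33 4
```

At x = 0 and x = 4 only curveA exists, so the sum there is just curveA's value, which is why the summary curve can jump where a curve starts or ends.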
Looking forward to your feedback
PJ