How can we plot the accumulated curve values and expand the plot

AbhishekHP · July 15, 2019, 2:41pm

Please find the reprex and screenshot highlighting the issue,
Am trying to accumulate the curve values but
am not sure if they are accumulating considering all the curves in the plot for every x axis.

Also, the plot has been cropped at origin and on top (highlighted in screenshot)
Tried bunch of things to expand the plot area but couldnt

library(tidyverse)
mean_sim = 2.7
N = 10

up_lmt <- (max(1:N) * 2 * mean_sim)

# Setting x axis
u <- seq(from = 0,
         to = up_lmt,
         length.out = 1e+3)

## Curve simulation
v <- sapply(X = 1:N,
            FUN = function(i) dnorm(x = u,
                                    mean = (i * mean_sim),
                                    sd = i))
matplot(x = u,
        y = v,
        type = "l",
        lty = 1,
        col = rainbow(n = 10),
        xlab = "x axis",
        ylab = "y axis",
        main = "Accumulating curves")
legend(x = "topright",
       title = "Curve number",
       legend = 1:N,
       inset = -.05,
       col = rainbow(n = 10),
       bty = "n",
       lty = 1)

# function to shade area below the accumulative curve
shd_ara <- function(start_pt, end_pt, colour){
  idx <- ((u >= start_pt) & (u <= end_pt))
  u_obj <- u[idx]

# Accumulating the curve (Does this sum up all the intersecting curves ?)
 v_obj <- purrr::map(.x = 1:sum(idx), .f = ~sum(as.data.frame(v[idx,])[.x,])) %>% unlist()
lines(x = u_obj,
          y = v_obj,
          type = "l",
          lty = 1,
          col = "black")
          polygon(x = c(start_pt, u_obj, end_pt),
                  y = c(0, v_obj, 0),
                  border = NA,
                  col = adjustcolor(col = "grey",
                                    alpha.f = 0.5))
  axis(side = 1, at = 1:54)
}
# Tried bunch of things to expand the plot area
# par(new=TRUE)
#dev.new(width=10, height=10)
#dev.size(10,10)
#par(mar=c(10,10,10,10))
shd_ara(start_pt = 0,
        end_pt = 50,
        colour = "grey")
abline(v = 1, col = "red")
abline(v = 2.7, col = "green")

pieterjanvc · July 16, 2019, 1:35am

Hi,

Let me explain the issues at each axis separately:

Y-AXIS
Your y-axis is cropped because the original plot size is defined in the function matplot. There, only the colored curves are drawn, and thus the plot scales correctly to them. You then plot the shaded curve with shd_ara on top of that, but here the values are higher (cumulative) and thus do not fit that plot anymore.

This is fixed by first plotting the shaded area changing the lines function to plot, and then plotting the matplot, where add = T

X-AXIS
The x-axis is not cropped, it is indeed displaying the correct values. This is because the first points of the 10 curves have a y-value that is NOT 0, (check the variable v) and thus the sum at origin is not 0.

Furthermore, your shaded curve function seems to be working correctly, and although it seems some of the shaded points do not match what you'd expect on the y-axis, they actually do as again your're summing all 10 curves.

library(tidyverse)
mean_sim = 2.7
N = 10

up_lmt <- (max(1:N) * 2 * mean_sim)

# Setting x axis
u <- seq(from = 0,
         to = up_lmt,
         length.out = 1e+3)

## Curve simulation
v <- sapply(X = 1:N,
            FUN = function(i) dnorm(x = u,
                                    mean = (i * mean_sim),
                                    sd = i))
# function to shade area below the accumulative curve
shd_ara <- function(start_pt, end_pt, colour){
  idx <- ((u >= start_pt) & (u <= end_pt))
  test1 <<- idx
  u_obj <- u[idx]
  
  # Accumulating the curve (Does this sum up all the intersecting curves ?)
  v_obj <- purrr::map(.x = 1:sum(idx), .f = ~sum(as.data.frame(v[idx,])[.x,])) %>% unlist()
  plot(x = u_obj,
        y = v_obj,
        type = "l",
        lty = 1,
        col = "black")
  polygon(x = c(start_pt, u_obj, end_pt),
          y = c(0, v_obj, 0),
          border = NA,
          col = adjustcolor(col = "grey",
                            alpha.f = 0.5))
  axis(side = 1, at = 1:54)
  
}

shd_ara(start_pt = 0,
        end_pt = 50,
        colour = "grey")

matplot(x = u,
        y = v,
        type = "l",
        lty = 1,
        col = rainbow(n = 10),
        xlab = "x axis",
        ylab = "y axis",
        main = "Accumulating curves", add = T)
legend(x = "topright",
       title = "Curve number",
       legend = 1:N,
       inset = -.05,
       col = rainbow(n = 10),
       bty = "n",
       lty = 1)


abline(v = 1, col = "red")
abline(v = 2.7, col = "green")

Hope this helps
PJ

PS: I think ggplot might be easier in the long run to plot more complex curves...

Yarnabrina · July 16, 2019, 1:51am

I agree with everything @pieterjanvc said, and want to add two notes:

Loading tidyverse at the beginning was unnecessary. This code doesn't require it. The only tidy function you're using is map, and you've called it as purrr::map. Though I must say that a much easier way to sum the curves would have been apply(v[idx,], 1, sum).
You can't change the limits of a existing base plot. Take a look here. The range of the accumulated curve exceeds the previous limit initially, and hence it'll be plotted outside. What you can do is plot the accumulated curve first, and then plot the original curves using matlines.

Hope this helps.

@pieterjanvc, if you don't mind, may I request you to please provide a ggplot way to plot this graph? @AbhishekHP asked this question in a previous thread, and I had to provide a base solution because I failed to do it in ggplot (I am learning it currently). This might be a little off topic here, so if you prefer, you can show me the ggplot solution in a DM. It'll be very helpful.

AbhishekHP · July 16, 2019, 7:26am

Thanks for suggestion.

by placing
add = T within matplot depending on whether we call matplot or shade_find_area or ploting curve first. below plots have been obtained but both are different.

Signal =4:10 have been removed when we plot shade and then plot curve

Please find screenshot

Additionally, both have different areas as shown below

Could you please help me understand why doesnt the shaded curve starts from 0,
Even though it is summation of 10 curves, but all are 0 @ u = 0,

pieterjanvc · July 16, 2019, 11:06am

Hi,

My first thought looking at the plots is that the difference in the two figures is an illusion because if you look carefully the second one has a y-axis where 0 is missing, probably clipping off a piece of the chart. I will try and look into the details later.

The reason your shaded curve does not start at 0 is because u = 0 is the x-axis, where the y-values of all those curves are stored in v. Take a look at the first row (those are all the first y-values), they are not 0 and thus add up to > 0

@Yarnabrina I'll see if I can find a way to do this in ggplot, but I'll need some time because I have a feeling this should already be implemented somewhere given the ggplot community is big

Grtz

pieterjanvc · July 20, 2019, 2:12am

Hi,

The request from @Yarnabrina set me on a quest to create a more versatile system for summarizing curves and plotting them using ggplot ...

There's so much I could explain about it, but think the post would become too long lol. I am thinking of writing it up because I do think it can be handy for others to use, but for now I'll just give the code and summary.

SUMMARY CURVE FUNCTION

library("tibble")
library("dplyr")

summaryCurve = function(datasets, columnInfo, summaryFunction = mean, 
                        interpolationMethod = "linear", onlyReturnSummary = T){
  
  #Prepare datasets
  #----------------
  nSets = length(datasets)
  
  if(nSets == 1){#The user provided one data frame with one x-column and multiple y-columns
    
    if(!all(!is.na(combinedData[,2]))){
      stop("The x-column cannot have missing values")
    }
    
    combinedData = cbind(data.frame(id = 1:nrow(datasets)),
                         data.frame(x = datasets[,columnInfo]),
                         datasets %>% select(-columnInfo)
                         )
    
  } else { #The user provided multiple data frames
    
    #Get all possible x-values
    x = sapply(1:nSets, function(i){
      datasets[[i]] %>% select(columnInfo[[i]][1])
    }) %>% unlist %>% unique %>% sort
    
    #Build data frame with column for x value 
    combinedData = tibble(id = 1:length(x), x = x)
    
    # ... and one column for y for every set
    for(i in 1:nSets){
      xColName = columnInfo[[i]][1]
      nYcols = length(columnInfo[[i]]) - 1
      if(nYcols > 0){
        combinedData = combinedData %>% 
          left_join(datasets[[i]] %>% select(columnInfo[[i]]), 
                      by = c(x = xColName))
      } else {
        combinedData = combinedData %>% 
          left_join(datasets[[i]], by = c(x = xColName))
      }
      
    }
  }
  
  #Interpolate curves
  #-------------------
  
  #Apply an interpolation function to every y-column to fill in missing values
  combinedData[,-c(1,2)] = apply(combinedData[,-c(1,2)], 2, function(y){
    approx(combinedData$x, y, xout = combinedData$x, method = interpolationMethod)$y
  })
  
  #Now add the summaryCurve
  summaryValues = apply(combinedData[,-c(1,2)], 1, function(x){
    summaryFunction(x[!is.na(x)])
  })
  
  if(onlyReturnSummary){
    return(data.frame(x = combinedData[,2], summary = summaryValues))
  } else {
    return(cbind(combinedData, data.frame(summary = summaryValues)))
  }
  
  
}

The summaryCurve function takes several arguments:

datasets: a list of data frames that hold the info for all curves
- Datasets must have at least one x-column, can have multiple y (multiple curves)
- Different datasets can be of different length (i.e. x-values can have different ranges)
columnInfo: list of column name mappings of x and y values per dataset
- if only one name is provide, this is to be assumed the column names of the x-values and all other columns are treated as y values (1 or more). NA is allowed in y values
- if multiple values are provided per dataset, the first refers to the x-values, all other values to specific columns to be treated as y-values (other will be ignored)
summaryFunction: the function to be applied to all curves. Default is 'mean' but can be anything like min, max, sum, ... even custom function, as long as it outputs one value for all y-values of at a certain x.
interpolationMethod: defaults to 'linear', all curves are interpolated (but not extended) to provide the best summary between curves of different detail and filling in missing values. Other option is "constant" where points are carried forward instead.
onlyReturnSummary: defaults to TRUE in which case the x and y-values of the summary curve are returned. If FALSE, one dataset with all interpolated curves plus summary function will be returned
longFormat: defaults to FALSE, if TRUE there is only one y-column and an extra column curve has a factor denoting the points belonging to different curves (can aid in plotting with ggplot)

EXAMPLE APPLYING THE FUNCTION AND PLOTTING (GGPLOT)

Let's start by creating 3 different curves

library("ggplot2")

dataset1 = data.frame(x = 50:6, y = runif(45))
dataset2 = data.frame(theX = seq(1, 55, 4), result1 = runif(14), result2 = LETTERS[1:14])
dataset3 = data.frame(x = c(0, 50), y = c(0,1))

ggplot() +
  geom_point(data = dataset1, aes(x = x, y = y1), colour = "darkgreen") +
  geom_line(data = dataset1, aes(x = x, y = y1), colour = "darkgreen") +
  geom_point(data = dataset2, aes(x = theX, y = result1), colour = "red") +
  geom_line(data = dataset2, aes(x = theX, y = result1), colour = "red") +
  geom_point(data = dataset3, aes(x = xVal, y = yVal), colour = "blue") +
  geom_line(data = dataset3, aes(x = xVal, y = yVal), colour = "blue") + 
  theme_minimal()

You can see that the curves have different starting and ending points and the x-values do not overlap (some have many more points than others)

Now run the summaryCurve function with the appropriate arguments and plot the results using ggplot:

mySummarycurve = summaryCurve(datasets = list(dataset1, dataset2, dataset3), 
                              columnInfo = list("x", c("theX", "result1"), "xVal"),
                              summaryFunction = sum, onlyReturnSummary = F, 
                              longFormat = T)

ggplot(mySummarycurve %>% filter(curve != "summary"), aes(x = x, y = y, group = curve)) +
  geom_point(aes(colour = curve)) +
  geom_line(aes(colour = curve), linetype = 2) +
  geom_line(data = mySummarycurve %>% filter(curve == "summary"), colour = "orange") +
  geom_area(data = mySummarycurve %>% filter(curve == "summary"), 
            fill = "gray", alpha = 0.3) +
  theme_minimal() + theme(legend.position = "none")

As you can see, the summary function interpolated all curves so they all have matching x-values over which the summary function of choice (in this case sum) is applied. The resulting area is shaded, but that's just done because it was in the initial example in this post.

There you go! Hope you like it and find it useful. I think I might tinker with it bit more and maybe get it to GitHub or something.

Looking forward to your feedback
PJ

system · July 27, 2019, 2:12am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.