Calculate height (y value) for elements of a vector based on the heights of the previous several elements.

I've created the chart below (only a small portion is pictured), but whereas here I manually entered a height for each data point, I want to calculate it with a function. The y-value isn't meaningful; it is only there for visual purposes, to make sure the data points don't overlap on the chart. Each data point has 3 x-values: the start, end, and midpoint of the line representing it. These are the vectors x1, x2, and x3, respectively.
The y-values are integers between 1 and 5 inclusive, to stack the data points as shown. The code defining these vectors is:

x1 <- c(Elution_times_csv$`Min elution time`)
x2 <- c(Elution_times_csv$`Max elution time`)
x3 <- c((x1+x2)/2)

The data points are drawn on the chart using this code:
segments(x1,y,x2,y,col=colours,lwd=10)

How can I create a vector y, the same length as the other three vectors, that examines the x2 values of the preceding data points, compares them to the x1 value of the element being defined, and generates a value of y such that the resulting data point doesn't overlap any preceding data point on the chart?

I can probably work out the logic to calculate the value of y that I want; I just need to know the syntax for iteratively getting y[n] to evaluate y[n-1], x2[n-1], x2[n-2], and so on.
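In R this kind of look-back is usually written as a for loop over seq_along(x1), with y preallocated to its full length so that, inside the loop, y[seq_len(n - 1)] and x2[seq_len(n - 1)] return all the earlier elements. The minimal sketch below only illustrates that indexing; the placement rule it uses (one level above the number of earlier segments still running at x1[n]) is a hypothetical placeholder and does not guarantee that segments won't overlap.

```r
# Example start/end values (the small test vectors used later in this thread)
x1 <- c(1, 1.5, 2.7, 3, 4.75, 9, 11, 15, 15.1, 18)
x2 <- c(2, 4, 3, 7, 6, 9.5, 17, 15.6, 24, 19)

y <- integer(length(x1))  # preallocate: same length as x1 and x2

for (n in seq_along(x1)) {
  prev <- seq_len(n - 1)    # indices of all earlier elements (empty when n == 1)
  earlier_x2 <- x2[prev]    # x2[1], ..., x2[n-1]; y[prev] works the same way
  # Placeholder rule (hypothetical): one more than the number of earlier
  # segments that have not ended by the time segment n starts
  y[n] <- 1L + sum(earlier_x2 > x1[n])
}
y
```

With this placeholder rule, segments 2 and 3 both land at height 2 even though they overlap, which is why a real rule has to check which levels the clashing earlier segments occupy rather than just counting them.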

Your chances of getting help with a solution will be increased immeasurably if you provide a reprex:
FAQ: How to do a minimal reproducible example (reprex) for beginners

Here is my attempt at making a reprex. The empty calculate.height function is what I hope to use to replace the line that gets the y values from the csv file.

structure(list(Compound = c("tricyclene", "allyl sulfide", "a-thujene", 
"iso-butyl iso butyrate", "a-pinene"), `Min elution time` = c(7.303, 
7.54, 7.948, 8.182, 8.219), `Max elution time` = c(7.303, 7.54, 
8.015, 8.182, 8.255)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))
#>                 Compound Min elution time Max elution time
#> 1             tricyclene            7.303            7.303
#> 2          allyl sulfide            7.540            7.540
#> 3              a-thujene            7.948            8.015
#> 4 iso-butyl iso butyrate            8.182            8.182
#> 5               a-pinene            8.219            8.255

fig.myplot <- function() {
  x1 <- c(Elution_times_csv$`Min elution time`)
  x2 <- c(Elution_times_csv$`Max elution time`)
  x3 <- c((x1+x2)/2)
  
  calculate.height <- function() {
    
  }
  
  y <- c(Elution_times_csv$Height)
  labs <- c(Elution_times_csv$Compound)
  colours = c(Elution_times_csv$Colour)
  par(bg="azure")
  
  plot.new()
  plot.window(c(7,29.5),c(0,6))
  axis(1, at=seq(7,29.5,0.1), cex.axis=0.5)
  abline(v=seq(7,29.5,0.05),col="gray95",lty="dotted")
  abline(v=seq(7,29.5,0.1),col="lightgrey",lty="dotted")
  abline(v=seq(7,29.5,0.5),col="lightgrey",lty="dashed")
  abline(v=seq(7,29.5,1),col="darkgrey")
  segments(x1,y,x2,y,col=colours,lwd=10)
  text(x3,y,labs,cex=0.5,srt=-45)
  title(main="Elution Times of Compounds on EO Basemethod 2020.M")
  title(xlab="Elution Time in Minutes")
  abline(v=7)
  abline(v=29.5)
  segments(7,6.25,29.5,6.25)
}

fig.myplot()

Thank you for your response, nirgrahamuk. However, your code doesn't work for me. I get an error message after the loop:
"Error in if (df$clash[i]) { : missing value where TRUE/FALSE needed"

The dataframe created looks like this:

df
      x1   x2     x3 lag_x1 lag_x2 clash height
1   1.00  2.0  1.500     NA     NA    NA      1
2   1.50  4.0  2.750   1.00    2.0    NA      1
3   2.70  3.0  2.850   1.50    4.0    NA      1
4   3.00  7.0  5.000   2.70    3.0    NA      1
5   4.75  6.0  5.375   3.00    7.0    NA      1
6   9.00  9.5  9.250   4.75    6.0    NA      1
7  11.00 17.0 14.000   9.00    9.5    NA      1
8  15.00 15.6 15.300  11.00   17.0    NA      1
9  15.10 24.0 19.550  15.00   15.6    NA      1
10 18.00 19.0 18.500  15.10   24.0    NA      1

Indeed, there was an issue with the code I provided, but the issue is higher up in the code. The between() function has two definitions, one in dplyr and the one I intended from data.table, so change the line to:

df$clash <-data.table::between(df$x1,df$lag_x1,df$lag_x2) | data.table::between(df$x2,df$lag_x1,df$lag_x2)

OK, it now runs without any error messages. However, when I apply this approach to my original dataset, it doesn't do what I need: I have over 200 x1 and x2 values spread over the range 7 to 30 on the x-axis. I don't really understand what the lag function is doing in your solution, but it clearly isn't leaving enough gap to separate all the values.
Compare this output to my original image above:

Presumably they aren't exactly overlapping, but are side by side with each other?
Can I suggest that you provide example data that is relevant to the problematic region?

For your information, the lag function lets you compare the current x coordinates with the lagged (i.e. previous) coordinates, which is a way of determining whether they overlap.
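As a rough base-R illustration of what the lag comparison does (assuming the rows are sorted by x1): c(NA, head(x, -1)) shifts a vector down one place, so each row sees the previous row's coordinates. This sketch uses the standard interval-overlap test x1 <= lag_x2 & x2 >= lag_x1 instead of the two between() calls, which also flags the case where the current segment completely covers the previous one. The first row has no predecessor, so its NA is replaced with FALSE explicitly; any NA left in clash triggers the "missing value where TRUE/FALSE needed" error when used in if().

```r
df <- data.frame(
  x1 = c(1, 1.5, 2.7, 3, 4.75, 9, 11, 15, 15.1, 18),
  x2 = c(2, 4, 3, 7, 6, 9.5, 17, 15.6, 24, 19)
)

# Shift each column down by one row: row i now also holds row i-1's values
df$lag_x1 <- c(NA, head(df$x1, -1))
df$lag_x2 <- c(NA, head(df$x2, -1))

# Two closed intervals overlap when each one starts before the other ends
df$clash <- df$x1 <= df$lag_x2 & df$x2 >= df$lag_x1

# Row 1 has nothing to compare with; replace its NA so if() never sees NA
df$clash[is.na(df$clash)] <- FALSE
df$clash
```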

You are correct that they don't overlap, but because the x-values are recorded to 3 decimal places, they appear to overlap when plotted on the chart, especially since the line width is set to 10 (lwd=10) for visual purposes.

df[24:26,]
       x1     x2     x3 lag_x1 lag_x2 clash height
24 10.140 10.140 10.140 10.017 10.017 FALSE      1
25 10.191 10.205 10.198 10.140 10.140 FALSE      1
26 10.208 10.208 10.208 10.191 10.205 FALSE      1

I attempted a different approach, but something is wrong with the calculate.height function: it returns values of y that are all NA, except for the last one, which is 1.
The idea here was for each entry to evaluate all previous entries and determine whether their x2 coordinate overlaps the x1 of the current entry, and if so, to rule out that entry's height as a valid height for the current entry. Lower heights are preferred over higher ones. It doesn't work, though.

calculate.height <- function(i) {
  if (i==1){height <- 1}
  else {
    bad.height.list <- c()
    for (j in i-1){
        if (x2[j] > x1[i] - 0.01) {append(bad.height.list,y[j])}
    }
    if (1 %in% bad.height.list == FALSE) {height <- 1}
    else if(2 %in% bad.height.list == FALSE) {height <- 2}
    else if(3 %in% bad.height.list == FALSE) {height <- 3}
    else if(4 %in% bad.height.list == FALSE) {height <- 4}
    else if(5 %in% bad.height.list == FALSE) {height <- 5}
    else {height <- 1}
  }
  return(height)
}

x1 <- c(Elution_times_csv$`Min elution time`)
#> Error in eval(expr, envir, enclos): object 'Elution_times_csv' not found
x2 <- c(Elution_times_csv$`Max elution time`)
#> Error in eval(expr, envir, enclos): object 'Elution_times_csv' not found
x3 <- c((x1+x2)/2)
#> Error in eval(expr, envir, enclos): object 'x1' not found
y <- c()

for (i in length(x1))  {
  y[i] <- calculate.height(i)
}
#> Error in eval(expr, envir, enclos): object 'x1' not found

Created on 2020-06-05 by the reprex package (v0.3.0)
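For reference, the attempt above has three separate bugs: for (j in i-1) visits only the single index i-1 instead of all earlier indices (seq_len(i - 1)); append(bad.height.list, y[j]) returns a new vector that is never assigned, so bad.height.list stays empty; and for (i in length(x1)) runs the loop body only once, for the last index, which on its own explains why every element except the last is NA. A minimal corrected sketch follows; passing x1, x2 and y in as arguments instead of reading them from the global environment is a change of convenience, not a requirement.

```r
calculate.height <- function(i, x1, x2, y, gap = 0.01) {
  if (i == 1) return(1)
  bad.height.list <- c()
  for (j in seq_len(i - 1)) {                 # every earlier entry, not just i-1
    if (x2[j] > x1[i] - gap) {
      bad.height.list <- append(bad.height.list, y[j])  # keep the result
    }
  }
  free <- setdiff(1:5, bad.height.list)       # levels no clashing entry uses
  if (length(free) == 0) 1 else min(free)     # same fallback as the original
}

x1 <- c(1, 1.5, 2.7, 3, 4.75, 9, 11, 15, 15.1, 18)
x2 <- c(2, 4, 3, 7, 6, 9.5, 17, 15.6, 21, 19)

y <- numeric(length(x1))
for (i in seq_along(x1)) {                    # every index, not just length(x1)
  y[i] <- calculate.height(i, x1, x2, y)
}
y
```

Like the final solution further down, this compares only x2[j] with x1[i], so it assumes the entries are sorted by start time.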

Sticking with my approach, I would just apply a margin:

margin <- 0.5
df$clash <-data.table::between(df$x1,df$lag_x1-margin,df$lag_x2+margin) | data.table::between(df$x2,df$lag_x1-margin,df$lag_x2+margin)
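To see the effect of the margin on rows 24 to 26 shown earlier, here is a sketch with a base-R stand-in for data.table::between() (the helper name in_range is made up for this example), using a margin of 0.1 for illustration rather than the 0.5 above. With no margin, all three rows come out FALSE, as in the printout; with the margin, rows 25 and 26 are flagged.

```r
# Hypothetical helper: TRUE when lower <= x <= upper (closed interval)
in_range <- function(x, lower, upper) x >= lower & x <= upper

rows <- data.frame(
  x1     = c(10.140, 10.191, 10.208),
  x2     = c(10.140, 10.205, 10.208),
  lag_x1 = c(10.017, 10.140, 10.191),
  lag_x2 = c(10.017, 10.140, 10.205)
)

# Widen the previous row's interval by `margin` on both sides before testing
clash_with_margin <- function(d, margin) {
  lo <- d$lag_x1 - margin
  hi <- d$lag_x2 + margin
  in_range(d$x1, lo, hi) | in_range(d$x2, lo, hi)
}

clash_with_margin(rows, 0)    # no margin
clash_with_margin(rows, 0.1)  # margin of 0.1
```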

The margin does help, once I played with it a bit to get the number right, but the lag function seems to evaluate only the previous entry, not earlier ones. So if entry a occupies a wide range of x, entry b will be placed at a greater height, but entry c will then be placed on top of entry a.
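The problem can be reproduced with three made-up segments: a wide entry a, followed by b and c, which each sit inside a's range but not inside each other's. Comparing only with the previous row (the lag approach) reports no clash for c, while comparing with every earlier row finds that c overlaps a.

```r
# Made-up example: a = [1, 10], b = [2, 3], c = [4, 5]
x1 <- c(1, 2, 4)
x2 <- c(10, 3, 5)

# Lag approach: compare entry i with entry i-1 only
lag_clash <- c(FALSE, x1[-1] <= head(x2, -1) & x2[-1] >= head(x1, -1))

# All-previous approach: does entry i overlap any earlier segment?
all_clash <- vapply(seq_along(x1), function(i) {
  prev <- seq_len(i - 1)
  any(x2[prev] >= x1[i] & x1[prev] <= x2[i])
}, logical(1))

lag_clash  # FALSE  TRUE FALSE: c is never compared with a
all_clash  # FALSE  TRUE  TRUE: c overlaps a
```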


I attempted to address this problem by allocating all the entries with a wide span to height = 6, which kind of works, except where these wider entries clash with each other, which some of them do.

A better solution would be for each entry to check for clashes at each height level (1 through 6) and select the lowest non-clashing height, but I don't know how to go about implementing that.
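One way to implement that, sketched here under the assumption that the entries are sorted by x1 (as elution data ordered by time would be), is first-fit level assignment: remember the rightmost x2 placed so far on each level, and put each new entry on the lowest level whose rightmost end is no later than x1[i] - margin. Tracking only that rightmost end is enough, because an earlier entry on a level clashes exactly when its x2 exceeds x1[i] - margin, and the largest x2 on the level does so whenever any does. The function name assign_levels, the default of 6 levels and the choice to return NA for entries that fit on no level are all assumptions made for this example.

```r
assign_levels <- function(x1, x2, n_levels = 6, margin = 0.05) {
  level_end <- rep(-Inf, n_levels)        # rightmost x2 placed so far on each level
  height <- rep(NA_integer_, length(x1))
  for (i in seq_along(x1)) {
    # Levels where every earlier entry ends by x1[i] - margin
    free <- which(level_end <= x1[i] - margin)
    if (length(free) == 0) next           # no level available: leave NA
    lvl <- free[1]                        # lowest free level
    height[i] <- lvl
    level_end[lvl] <- max(level_end[lvl], x2[i])
  }
  height
}

x1 <- c(1, 1.5, 2.7, 3, 4.75, 9, 11, 15, 15.1, 18)
x2 <- c(2, 4, 3, 7, 6, 9.5, 17, 15.6, 21, 19)
assign_levels(x1, x2)
```

The result could stand in for df$height before the 0.5 shift, for example df$height <- assign_levels(df$x1, df$x2); entries left as NA would simply not be drawn by segments().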

Another (failed) attempt:

calculate.height <- function(i) {
  if (i==1){height <- 1}
  else {
    bad.height.list <- c()
    for (j in i-1){
      if (x2[j] > x1[i] - 0.01) {append(bad.height.list,y[j])}
    }
    
    bad.levels <- c(1,2,3,4,5,6) %in% bad.height.list
    height <- min(which(bad.levels==FALSE))
    
  }
  return(height)
}

x1 <- c(1,1.5,2.7,3,4.75,9,11,15,15.1,18)
x2 <- c(2,4,3,7,6,9.5,17,15.6,21,19)

y <- c()

for (i in length(x1))  {
  y[i] <- calculate.height(i)
}

Created on 2020-06-06 by the reprex package (v0.3.0)

I finally found a solution that works perfectly. Here it is in total.

library(data.table)
library(readr)
library(RColorBrewer)

Elution_times_csv <- read_csv("~/ARL/EO basemethod elution times/elution times/Elution times 200603 csv.csv")
#> Parsed with column specification:
#> cols(
#>   Compound = col_character(),
#>   Count = col_double(),
#>   `Min elution time` = col_double(),
#>   `Max elution time` = col_double(),
#>   `Elution span` = col_double(),
#>   Notes = col_character()
#> )
#View(Elution_times_csv)


x1 <- c(Elution_times_csv$`Min elution time`)
x2 <- c(Elution_times_csv$`Max elution time`)

df<- data.frame(labs=Elution_times_csv$Compound,x1=x1,x2=x2) #%>% arrange(x1,x2)
df$x3 <- (x1+x2)/2
df$width <- (x2-x1)
df$height <- 1
df$bad.height.one <- FALSE
df$bad.height.two <- FALSE
df$bad.height.three <- FALSE
df$bad.height.four <- FALSE
df$bad.height.five <- FALSE
df$bad.height.six <- FALSE
margin <- 0.05

# Entry 1 keeps the default height of 1
for (i in seq_along(df$x2)) {
  if (i > 1) {
    # Check every earlier entry j, not just the previous one
    for (j in seq_len(i - 1)) {
      # Entry j blocks its level if its x2 is later than x1[i] - margin
      if (df$x2[j] > df$x1[i] - margin) {
        if (df$height[j] == 1) {df$bad.height.one[i] <- TRUE}
        else if (df$height[j] == 2) {df$bad.height.two[i] <- TRUE}
        else if (df$height[j] == 3) {df$bad.height.three[i] <- TRUE}
        else if (df$height[j] == 4) {df$bad.height.four[i] <- TRUE}
        else if (df$height[j] == 5) {df$bad.height.five[i] <- TRUE}
        else if (df$height[j] == 6) {df$bad.height.six[i] <- TRUE}
      }
    }
    # Choose the lowest level that no earlier entry blocks
    if (df$bad.height.one[i] == FALSE) {df$height[i] <- 1}
    else if (df$bad.height.two[i] == FALSE) {df$height[i] <- 2}
    else if (df$bad.height.three[i] == FALSE) {df$height[i] <- 3}
    else if (df$bad.height.four[i] == FALSE) {df$height[i] <- 4}
    else if (df$bad.height.five[i] == FALSE) {df$height[i] <- 5}
    else if (df$bad.height.six[i] == FALSE) {df$height[i] <- 6}
    else {df$height[i] <- 1.5}  # all six levels blocked: in-between fallback
  }
}

df$height <- df$height - 0.5


fig.myplot <- function() {
  par(bg="azure")
  plot.new()
  plot.window(c(7,29.5),c(0,6))
  axis(1, at=seq(7,29.5,0.1), cex.axis=0.5)
  abline(v=seq(7,29.5,0.05),col="gray95",lty="dotted")
  abline(v=seq(7,29.5,0.1),col="lightgrey",lty="dotted")
  abline(v=seq(7,29.5,0.5),col="lightgrey",lty="dashed")
  abline(v=seq(7,29.5,1),col="darkgrey")
  segments(df$x1,df$height,df$x2,df$height,col=brewer.pal(n=11,name="Set3"),lwd=10)
  text(df$x3,df$height,df$labs,cex=0.5,srt=-30)
  title(main="Elution Times of Compounds on EO Basemethod 2020.M")
  title(xlab="Elution Time in Minutes")
  abline(v=7)
  abline(v=29.5)
  segments(7,6.25,29.5,6.25)
}

fig.myplot()


pdf("elution_times.pdf",width=60, height=6)
fig.myplot()
dev.off()

Well done, phenomniverse!

Thanks for your help, much appreciated.


Here is a simplified version where I've tried to use conditional logic to find the lowest value of y that keeps the x-values from overlapping. It doesn't work, though.

x1 <- c(1,1.5,2.7,3,4.75,9,11,15,15.1,18)
x2 <- c(2,4,3,7,6,9.5,17,15.6,24,19)

find.height <- function(i){
  if (i==1) {height <- (1)}
  
  else{
  max.height.one <- max(x2[y==1])
  max.height.two <- max(x2[y==2])
  max.height.three <- max(x2[y==3])
  max.height.four <- max(x2[y==4])
  max.height.five <- max(x2[y==5])
 
  if (max.height.one < x1[i]-0.01) {height <- (1)}
  else if (max.height.two < x1[i]-0.01) {height <- (2)}
  else if (max.height.three < x1[i]-0.01) {height <- (3)}
  else if (max.height.four < x1[i]-0.01) {height <- (4)}
  else if (max.height.five < x1[i]-0.01) {height <- (5)}
  else {height <- (6)}}
  
  return(height)
}

for (i in length(x1)) {
  y[i] <- find.height(i)