Define y-range (horizontal asymptote) in log fitting linear models

lohrmanf · January 8, 2021, 5:36pm

Hello community,

I am new to R, so please forgive me if the answer to my question is obvious or if my code is rubbish.
I have data that represents time (on x) and a percentage (on y). "y" can thus not be higher than 100. I would like to calculate a log trendline/fitting linear model. I used the following code:

x <- c(3, 5, 7, 14, 21, 28, 56, 100, 200)
y <- c(20, 30, 50, 70, 89, 95, 99.9, 99.99, 99.999)
fitlog <-lm(y~log(x))
xx <- seq(1,200, length=200)
plot(x,y,pch=19,ylim=c(0,200))
lines(xx, predict(fitlog, data.frame(x=xx)), col="green")

#this way the curve goes higher than 100 (apart from also not fitting very well). I have three questions related to this project:

How can I define the "y" range to be between 0-100, so that I still get the best linear modelling within this boundary?
How can I make R display the function behind the trendline?
How can I calculate the asymptote (in a similar case, where it might not be 100) with R?
Thank you very much in advance!

(I am using RStudio Version 1.2.5033 on a Windows 10 Pro 64-Bit.)

Best regards

FL

gueyenono · January 8, 2021, 8:27pm

Hello @lohrmanf,

You need to change ylim = c(0, 200) to ylim = c(0, 100).
You can use the text() function
Could you provide more details about what you are looking for?

Here is my code:

x <- c(3, 5, 7, 14, 21, 28, 56, 100, 200)
y <- c(20, 30, 50, 70, 89, 95, 99.9, 99.99, 99.999)

fitlog <-lm(y ~ log(x))

xx <- seq(1, 200, length = 200)
predicted <- predict(fitlog, data.frame(x = xx))

plot(x, y, pch = 19, ylim = c(0, 100))
lines(xx, predicted, col = "green")
text(x = 50, y = 80, labels = "y = f(x)")

lohrmanf · January 9, 2021, 9:46am

Hello @gueyenono,

thank you very much for your help! However, I have not explained my problem well enough:

your modification limits the range of the graph to 100. But I was looking for a command to limit the maximum y value within the log fitting model to 100, so that any x I feed into the function yields a y that is never above 100 (as y represents a percentage in my data). I therefore chose to display a higher range in the plot (up to 200) to be able to see whether my line actually remains below 100.
I need this in order to have a resonable predictions on y values that are not actually covered by the representative values I have measured.
Here I was looking for the actual function of the log fitting model, so that I can use it to feed a specific x into it and get the corresponding y out. It does not necessarily have to be shown within the graph.
We keep this problem for later

Best regards

FL

nirgrahamuk · January 9, 2021, 5:07pm

If you insist on the 'purity' of the log fit model, and need it to not return values above 100, you are basically going to need something thats indistinguishable from a straight line I think... Otherwise, you could just apply a maximum rule over the top like so.

x <- c(3, 5, 7, 14, 21, 28, 56, 100, 200)
y <- c(20, 30, 50, 70, 89, 95, 99.9, 99.99, 99.999)
basefit <-lm(y~log(x))

# a wrapper to clamp down 'too high' values
mypred <- function(model_obj,
                   df,max){
  temp <- predict(model_obj, df) 
  pmin(temp,max)
}

xx <- seq(1,200, length=200)
plot(x,y,pch=19,ylim=c(0,200))
lines(xx, mypred(basefit, data.frame(x=xx),max=100), col="green",lwd=8)

coefficients(basefit)
# (Intercept)      log(x) 
# 9.06515          20.73667 
myscorefunc <- function(x){
  pmin(9.065150 + 20.73667 * log(x),100)
}

lines(xx, myscorefunc(xx), col="red",lty="dotted",lwd=4)

lohrmanf · January 10, 2021, 9:30am

Hello @nirgrahamuk,
thank you very much for your help! If I understand it correctly, you found a nice way to limit the y values to 100. (And the coefficient command shows me the actual function, thanks a lot). However, it seems to me that the function is now only meaningful up to a certain point (where y "naturally" reaches 100), after which the y values are artificially kept at 100, is that correct?

Maybe I can describe what I am looking for by giving an example:
The curve I am looking for is similar to a concentration increase ("invasion") with a specific saturation level. In pharmakokinetics this would look like this (though my application is not for pharmakokinetics):
("Konzentration" =concentration, "Zeit"=time)

The corresponding formula would be:

dc(t)/dt = k(c*-c(t))
c=c*(1-e^-kt)

#with "-kt" being the exponent

(source: Sernetz et al.: https://www.researchgate.net/publication/265605774_Pharmakokinetik_und_Wachstumskinetik)

Now my question is: Can I perform a trend fitting in R, where I define how the formula has to look (e.g. as in the figure above) and R finds the missing konstants?

Thank you very much!

Best regards

nirgrahamuk · January 10, 2021, 10:42am

I came up with an optimise process to try to find the best fit given the additional constraints, empirically. if we could fit on a value of x at a lower number than 3 it would be more protected from violation <0 for the lowest x values

x <- c(3, 5, 7, 14, 21, 28, 56, 100, 200)
y <- c(20, 30, 50, 70, 89, 95, 99.9, 99.99, 99.999)

plot(x,y,pch=19,ylim=c(0,200))
over_100_penalty = -1
under0_penalty = -1
lmfit <- function(k){
  model <- lm(y~exp(-k*x))
  smry <- summary(model)
  base_score <- smry$adj.r.squared
  rawpred <- predict(model,newdata=data.frame(x=x))
  # print(rawpred)
  over_penalty_score <- sum(ifelse(rawpred>100,over_100_penalty,0))
  under_penalty_score <- sum(ifelse(rawpred<0,under0_penalty,0))
  total_score <-  over_penalty_score+under_penalty_score+base_score
  cat("tested ",k, "total",total_score, " over penalty score ",over_penalty_score, " under penalty score ", under_penalty_score," fit score (adj r square) ",base_score,"\n")
  total_score
}
optimise(lmfit,c(0,10),maximum = TRUE)
# $maximum
# [1] 0.1046782
# 
# $objective
# [1] 0.992785
basefit <- lm(y~exp(-0.1046782*x))



xx <- seq(1,200, length=200)

lines(xx, predict(basefit, data.frame(x=xx),max=100), col="green",lwd=1)

coefficients(basefit)
# Coefficients:
# (Intercept) exp(-0.1046782 * x) 
# 99.99861          -111.77309 
myscorefunc <- function(x){
  99.99861          -111.77309     * exp(-0.1046782*x)
}

lines(xx, myscorefunc(xx), col="red",lty="dotted",lwd=4)

myscorefunc(xx)

lohrmanf · January 10, 2021, 11:36am

Thank you @nirgrahamuk! I will gladly work with this.

Best regards
FL

system · January 17, 2021, 11:36am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.