# `ifelse()` in a Linear Model

I still don't entirely know my way around modelling in R, so I might be approaching this in completely the wrong way. I'm trying to fit a model to data that should initially increase linearly and then flatten out at some value, `a`. The switch should be quite sharp.

Is there a way to include an `ifelse()` or similar statement in a formula? In my testing below I have been able to make it kind of work with a fixed `a`, but I have not been able to estimate the point at which the values flatten.

Should I be approaching this a completely different way? Is this a silly idea? Any help would be appreciated!

```r
library(ggplot2)

dat <- data.frame(
  x = c(1:10),
  y = c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))
)

cutoff <- function(x, a = 4) {
  ifelse(x > a, a, x)
}

plt <- ggplot(dat, aes(x, y)) +
  geom_point()

plt +
  geom_smooth(
    method = 'lm',
    formula = y ~ cutoff(x),
    se = FALSE
  )
```

```r
plt +
  geom_smooth(
    method = 'lm',
    formula = y ~ cutoff(x, a),
    se = FALSE
  )
#> Warning: Computation failed in `stat_smooth()`:
```

Created on 2022-09-05 with reprex v2.0.2


What you are doing is called 'piecewise linear regression'. There is a package on CRAN, `segmented`, with functions that support this kind of model:

```r
library(tidyverse)
library(segmented)

dat <- data.frame(
  x = c(1:10),
  y = c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))
)

# Start from an ordinary straight-line fit
(lm_1 <- lm(y ~ x, data = dat))

# Let segmented() estimate a breakpoint in x
(lm_2 <- segmented::segmented(lm_1))

dat$pred_y <- predict(lm_2, newdata = dat)

(plt <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_line(aes(y = pred_y), color = "blue"))
```
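For comparison, here is a minimal base-R sketch that stays close to the `cutoff()` idea from the question. A breakpoint inside `lm()` cannot be estimated directly because the model is not linear in `a`, so this sketch fits `lm()` for each value on a grid of candidate breakpoints and keeps the one with the smallest residual sum of squares. The helper `rss_at()`, the grid `candidates`, and the seed are illustrative assumptions, not something from the posts above.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
dat <- data.frame(
  x = 1:10,
  y = c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))
)

# Hypothetical helper: residual sum of squares when the breakpoint is fixed at a.
# pmin(x, a) is equivalent to ifelse(x > a, a, x).
rss_at <- function(a, data) {
  fit <- lm(y ~ pmin(x, a), data = data)
  sum(resid(fit)^2)
}

# Assumed grid of candidate breakpoints inside the range of x
candidates <- seq(2, 9, by = 0.05)
rss <- vapply(candidates, rss_at, numeric(1), data = dat)

# Keep the candidate with the smallest RSS and refit at that value
a_hat <- candidates[which.min(rss)]
fit <- lm(y ~ pmin(x, a_hat), data = dat)

a_hat
coef(fit)
```

A grid search like this is crude and ignores the uncertainty in `a_hat`, which is one reason a dedicated package such as `segmented` is usually the better choice.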


Actually, your approach seems OK to me.
If I am sure that my data will "initially increase linearly, then at some value, `a`, flatten out", I would find the point where the values stop rising, split the data into two subsets at that point, and fit a separate model to each subset.
To find the point at which the values flatten (called the local maximum below), one can use the slope between consecutive points.
Here is my code to find that point and fit lines to the two subsets.

```r
library(tidyverse)

y <- c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))

x <- 1:10

# Slope between each pair of consecutive points
slope <- diff(y) / diff(x)

# Last position with the steepest slope, plus one: where the rise ends
local_maxima <- max(which(slope == max(slope))) + 1

local_maxima
#> [1] 4

# Model for the rising part
lm_model1 <-
  tibble(x, y) %>%
  filter(x <= local_maxima) %>%
  lm(formula = y ~ x)

# Model for the flat part
lm_model2 <-
  tibble(x, y) %>%
  filter(x > local_maxima) %>%
  lm(formula = y ~ x)

# Use each model's prediction on its own side of the split
final_results <-
  tibble(x, y) %>%
  mutate(pred_line1 = predict(lm_model1, newdata = .)) %>%
  mutate(pred_line2 = predict(lm_model2, newdata = .)) %>%
  mutate(pred_line_final = if_else(x > local_maxima, pred_line2, pred_line1)) %>%
  select(x, y, pred_line_final)

# Plot the data and the combined fit (pass the data to ggplot())
ggplot(final_results, aes(x)) +
  geom_line(aes(y = y)) +
  geom_line(aes(y = pred_line_final), color = 'blue')
```
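The `slope == max(slope)` test above relies on exact floating-point equality, which works for this toy data but may not hold for noisier measurements. A possible variation, sketched below, looks for the first step whose slope falls below a threshold instead. The threshold `tol` and the names `flat_start` and `break_x` are assumptions chosen for illustration, and the right threshold depends on the data.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- 1:10
y <- c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))

# slope[i] is the slope of the step from x[i] to x[i + 1]
slope <- diff(y) / diff(x)

# Assumed threshold: steps flatter than this count as "no longer rising"
tol <- 0.5

# First step that is flatter than the threshold; it starts at the breakpoint
flat_start <- which(slope < tol)[1]
break_x <- x[flat_start]

break_x
```

If no step is flatter than `tol`, `flat_start` is `NA`, so real code would need to handle that case.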

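There is also the brokenstick R package, which can accommodate longitudinal data. In rms you have `lsp()` for linear splines with known knots, or `rcs()` for restricted cubic splines, which picks default knot locations from the data when given only the number of knots: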

```r
library(rms)
#>
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#>
#>     format.pval, units
#>
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#>
#>     backsolve
set.seed(134564)
dat <- data.frame(
  x1 = c(1:10),
  y1 = c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))
)

# rms looks up predictor ranges in a datadist object; Predict() relies on it
dd <- datadist(dat)
options(datadist = "dd")

m1 <- ols(y1 ~ x1, data = dat)                         # straight line
m2 <- ols(y1 ~ rcs(x1, 3, label = "rcs"), data = dat)  # restricted cubic spline, 3 knots
m3 <- ols(y1 ~ lsp(x1, 4, label = "lsp"), data = dat)  # linear spline, knot at x1 = 4
ggplot(Predict(m3))
```

Created on 2022-09-06 with reprex v2.0.2
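To make the linear-spline idea concrete without rms, here is a small base-R sketch, assuming the knot is known to be at 4. A linear spline with one knot can be written with the terms `x` and `pmax(x - knot, 0)`, so the slope before the knot is the first coefficient and the slope after it is the sum of the two. The names `knot`, `slope_before` and `slope_after`, and the seed, are illustrative choices.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
dat <- data.frame(
  x = 1:10,
  y = c(1:4, rep(4, 6) + rnorm(6, sd = 0.1))
)

# Assumed known knot location
knot <- 4

# Linear spline basis: x, plus the "hinge" term (x - knot)+ that is zero up to the knot
fit <- lm(y ~ x + pmax(x - knot, 0), data = dat)

b <- unname(coef(fit))
slope_before <- b[2]         # slope for x <= knot
slope_after  <- b[2] + b[3]  # slope for x > knot; near zero if the data flatten

c(before = slope_before, after = slope_after)
```

Unlike the `cutoff()` term in the question, this form does not force the second segment to be exactly flat; it estimates that slope, so a value near zero supports the plateau assumption.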

Thanks! This is exactly what I am after. Now that I know the name of it, I can read more about it.
