How can I represent y~f(x)+F1+F2, with nonlinear f and categorical F1 and F2?

Hi everyone,

I want to draw raw points and fitted lines (corresponding to a broken-line-plateau model which is an nls-object), both the points and the lines colored depending on a factor (group) values and splitted up in facets depending on another factor (sex) values.

The data and the code I've tested using ggplot() are provided below. The plot is just it should be, but showing the adjusted plateau model too.

I'm asking the comunity to help me, maybe using alternative code.

Thanks in advance
Mercè

# The data:
 x   <- c(rep(c(4.27,3.86,3.46,3.05,2.64),each=12)) 

 sex <- c("b","g","g","b","g","b","g","b","b","g","g","b","g","b","g","b","b","g","g","g","g","b","b","b","b","g","g","b","b","b","g","g","b","g","b",   "g","g","b","b","g","g","b","g","g","b","b","g","b","b","b","g","g","b","g","g","b","b","b","g","g")
  group<-c("G","P","G","P","G","G","M","P","M","M","P","M","P","G","M","P","G","G","G","P","M","M","P","M","M","G","M","G","P","M","P","M","G","P","P","G","M","G","P","P","M","M","G","P","G","M","G","P","G","P","P","P","M","G","M","G","P","M","M","G")
  
  y<-c(1063.2867,800.0000,829.0210,916.7832,958.3916,1016.6667,863.6364,902.4476,927.9720,910.8392,948.6014,982.5175,917.4242,980.4196,892.3077,897.9021,973.7762,861.1888,946.8531,708.0000,963.6364,855.2448,942.6573,953.4965,947.2028,936.3636,807.3427,972.7273,906.9930,917.8322,848.2517,824.8252, 1048.9510,774.1259,808.7121,951.7483,924.2424, 1020.2797,932.8671,834.9650,872.7273,905.9441,905.9441,879.0000,863.9860,945.1049,874.0000,837.4126,805.9441,825.8741,764.6853,684.9650,763.9860,859.4406,794.4056,892.6573,806.4394,883.2168,857.6923,974.8252)
  
dat<-data.frame(x,sex,group,y)
dat$x<-as.character(dat$x)
dat$sex<-as.factor(sex)
dat$group<-as.factor(dat$group)

# The model:
## broken line plateau for y with respect to x, with two block factors (sex and group) 
 
model.nls_plat<-nls(y ~ ifelse(as.numeric(x) < 3.18, a + b *(3.18 - as.numeric(x)), a) +  b1*(group=="M") + b2*(group=="G") + b3*(sex=="g"), data=dat, start = c(a=870, b=-150,b1=-47,b2=-90,b3=-51), trace = TRUE, control = list(maxiter=50))

summary(model.nls_plat)
  
pred.nls<-predict(model.nls_plat)
dat$pred.nls<-pred.nls  
    
 # the tested plot: 

  formula <- y ~ ifelse(as.numeric(x) < 3.18, a + b *(3.18 - as.numeric(x)), a) + b1*(group=="M") + b2*(group=="G") + b3*(sex=="g")

  argum<- list(start=c(a=870, b=-150,b1=-47,b2=-90,b3=-51))
  
  ggplot(dat, aes(x, y, fill = group)) +
    geom_point(shape = 21, size = 3) +
    xlab("x")  +
    ylab("y") +
    geom_smooth(method = "nls", formula = formula, method.args = argum) +  
    facet_wrap(~sex, scales = "free_y")
 
  # plateau lines are not visible !!

You need to set group explicitly:

ggplot(dat, aes(x, y, fill = group, group = group)) +
    geom_point(shape = 21, size = 3) +
    xlab("x")  +
    ylab("y") +
    geom_smooth(method = "nls", formula = formula, method.args = argum) +  
    facet_wrap(~sex, scales = "free_y")

When group is not set directly it is deduced from the interaction of all discrete aesthetics. As x is discrete in your case it will influence the grouping and result in groups that doesn't span multiple x values, thus giving data that can't be fitted.

You still get a warning, but that seems to be specific to your setup, so I'll let you tackle that :slight_smile:

3 Likes

Thanks for your quick reply @thomasp85, but i get the same figure as before. The geom_smooth() lines do not appear in the plot.

Mercè

Yeah, but it throws a warning about your formula now. I don’t know anything about that type of fit, so I can’t help you with that

Hi, and welcome!

It's kind of hard to smooth a non-existent line! There's no

geom_line(aes(...))

(I hate it when this happens to me!)

I didn't want to guess what the aesthetic to go with pred.nls was.

I have a feeling that you won't be able to fit your model in ggplot2 even after making some changes to the plotting code to get things nominally working. The formula should only contain x and y, which refer to those aesthetics you defined in aes().

The variable you use as a grouping variable will allow a separate smooth per group. However, this is fit where the relationship is estimated separately for each group. In your fitted model you only allow the intercepts to change, not both intercepts and "slopes". The same is true for the facet groups.

The first thing I did to get the smooth line plotting was to make the x variable numeric in aes() rather than the formula. It's hard to plot continuous variables from geom_smooth() on top of a discrete axis. In addition, I changed the formula to only refer to x and y and set se = FALSE (since nls models don't return standard errors on predictions).

That all looks like:

ggplot(dat, aes(as.numeric(x), y, fill = group, color = group)) +
     geom_point(shape = 21, size = 3) +
     xlab("x")  +
     ylab("y") +
     geom_smooth(method = "nls",
                 formula = y ~ ifelse((x) < 3.18, a + b *(3.18 - (x)), a),
                 se = FALSE,
                 method.args = list(start=c(a=870, b=-150) ) ) +
     facet_wrap(~sex, scales = "free_y")

But I don't think that's the model you want, since the slopes vary among groups and sexes.

I would tackle this problem by making the predictions from the fitted model (as you did) and plotting those with geom_line(). However, I couldn't tell if these were correct because they don't seem to be plateaus. Maybe you need a new dataset with more x values in it for predicting? What does look correct is that the lines are parallel; the only difference are the intercepts.

ggplot(dat, aes(as.numeric(x), y, fill = group, color = group)) +
     geom_point(shape = 21, size = 3) +
     xlab("x")  +
     ylab("y") +
     geom_line(aes(y = pred.nls) ) +
     facet_wrap(~sex, scales = "free_y")

3 Likes

Thanks a lot @aosmith. Both solutions help me a lot.

As you pointed out, the first model is giving the two parameters (intercept "a" and slope "b") depending on the groupsex level. As it can be expected, this model seems to fit better the data compared to the fixed slope model which I was trying to represent. The obvious handicap is that the groupsex level sample size to estimate "b" dramatically small, so the standard errors are huge.

In the second figure, I've realized that the plateaus can be plotted only if you give a sampled "x" value as the changing slope point. For instance, using 3.07 instead of 3.18, as you can see:

model.nls_plat<-nls(y ~ ifelse(as.numeric(x) <= 3.05, a + b *(3.05 - as.numeric(x)), a) +  b1*(group=="M") + b2*(group=="G") + b3*(sex=="g"), data=dat, start = c(a=870, b=-150,b1=-47,b2=-90,b3=-51), trace = TRUE, control = list(maxiter=50))

summary(model.nls_plat)
   
pred.nls<-predict(model.nls_plat)
dat$pred.nls<-pred.nls
   
ggplot(dat, aes(as.numeric(x), y, fill = group, color = group)) +
     geom_point(shape = 21, size = 3) +
     xlab("x")  +
     ylab("y") +
     geom_line(aes(y = pred.nls) ) +
     facet_wrap(~sex, scales = "free_y")

By now, this solution is satisfactory enough for me.

Thanks to everyone again,
Mercè

Glad it worked!

I was suspecting that some of the problem with plotting the predictions had to do with the limited range of x. Often in this case, making predictions for a longer sequence could help. This is actually what geom_smooth() does under the hood when it makes predicted lines for you.

The expand.grid() is useful for making a prediction dataset with a new x as well as the character variables. You can see I make a longer sequence of numeric x and repeat it for every group and every sex. I had to be careful to first convert x to numeric for making the sequence, which I made from the minimum to the maximum value with a total of 50 values, and then making it a character.

When making predictions with a new dataset, all variables in the right-hand side of the model must be in the dataset.

The plot is done using geom_line() with data = newdat. The result looks better to me than the one made using the original dataset for predictions.

model.nls_plat = nls(y ~ ifelse(as.numeric(x) < 3.18, 
                                a + b *(3.18 - as.numeric(x)), a) +  
                          b1*(group=="M") + b2*(group=="G") + b3*(sex=="g"), 
                     data=dat, 
                     start = c(a=870, b=-150,b1=-47,b2=-90,b3=-51), 
                     control = list(maxiter=50) )

newdat = expand.grid(x = seq(min( as.numeric(dat$x) ), 
                             max( as.numeric(dat$x) ), length.out = 50),
                     group = unique(dat$group),
                     sex = unique(dat$sex) )
newdat$x = as.character(newdat$x)

# Make predictions using the longer dataset
newdat$pred.nls = predict(model.nls_plat, newdata = newdat)

ggplot(dat, aes(as.numeric(x), y, fill = group, color = group)) +
     geom_point(shape = 21, size = 3) +
     xlab("x")  +
     ylab("y") +
     geom_line(data = newdat, aes(y = pred.nls) ) +  
     facet_wrap(~sex, scales = "free_y")

Created on 2020-01-10 by the reprex package (v0.2.0).

1 Like

Great !

I get it, for continuous plotting you need an appropriate network, thin enough. You have helped me a lot with ggplot (). I'm too used to basic R :slight_smile:

Mercè

1 Like

To be pedantic, I think it fits separate models per group. This matters when se = TRUE.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.