Calculating Regressions with the by() Function

Hello everyone!

I am currently working with the built-in data set iris. I have calculated separate regressions for the three plant species like this, and it worked:

reg1= with(iris[iris$Species=="setosa",], lm(Sepal.Width~Sepal.Length))
reg2= with(iris[iris$Species=="versicolor",], lm(Sepal.Width~Sepal.Length))
reg3= with(iris[iris$Species=="virginica",], lm(Sepal.Width~Sepal.Length))

Now I am wondering whether it is also possible to calculate the three sub-regressions with the by() function in R.

I have tried it like this, but I only receive error messages:
b <- by(iris, iris$Species, function(x) {
  regby <- lm(Sepal.Width ~ Sepal.Length)
})
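The key problem in the attempt above is that lm() inside the anonymous function is never told to use the subset x, so it looks for Sepal.Width and Sepal.Length outside of it. A minimal sketch, assuming the goal is simply one lm() fit per species, passes x through the data argument (the name fits is just for illustration):

```r
# One lm() fit per species: by() hands each subset of iris to the function as `x`
fits <- by(iris, iris$Species, function(x) {
  lm(Sepal.Width ~ Sepal.Length, data = x)
})

# The fit for setosa, equivalent to reg1 in the question
fits[["setosa"]]
```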

Does anyone have any tips for me?

Thank you. Your answers are very much appreciated.

Welcome to the community!

Are you looking for something like this?

with(data = iris,
     expr = {
       by(data = data.frame(Sepal.Width,
                            Sepal.Length),
          INDICES = Species,
          FUN = lm)
     })
#> Species: setosa
#> 
#> Call:
#> FUN(formula = data[x, , drop = FALSE])
#> 
#> Coefficients:
#>  (Intercept)  Sepal.Length  
#>      -0.5694        0.7985  
#> 
#> -------------------------------------------------------- 
#> Species: versicolor
#> 
#> Call:
#> FUN(formula = data[x, , drop = FALSE])
#> 
#> Coefficients:
#>  (Intercept)  Sepal.Length  
#>       0.8721        0.3197  
#> 
#> -------------------------------------------------------- 
#> Species: virginica
#> 
#> Call:
#> FUN(formula = data[x, , drop = FALSE])
#> 
#> Coefficients:
#>  (Intercept)  Sepal.Length  
#>       1.4463        0.2319

Created on 2019-03-31 by the reprex package (v0.2.1)
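The result of by() is an object of class "by" that holds one fitted model per species, so ordinary list tools work on it. For example, a sketch that collects all coefficients into one matrix with sapply() (the same pattern appears in the examples of ?by):

```r
fits <- with(iris, by(data.frame(Sepal.Width, Sepal.Length), Species, lm))

# One column per species, one row per coefficient
coefs <- sapply(fits, coef)
round(coefs, 4)
```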


If you want to give the tidyverse a try, here is another solution:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

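iris %>% 
    group_nest(Species) %>% 
    mutate(model = map(data, function(df) lm(Sepal.Width~Sepal.Length, data = df)),
           tidied = map(model, tidy)) %>% 
    # keep only Species and the tidied results; newer tidyr versions keep other list-columns when unnesting
    select(Species, tidied) %>% 
    unnest(tidied)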
#> # A tibble: 6 x 6
#>   Species    term         estimate std.error statistic  p.value
#>   <fct>      <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 setosa     (Intercept)    -0.569    0.522      -1.09 2.81e- 1
#> 2 setosa     Sepal.Length    0.799    0.104       7.68 6.71e-10
#> 3 versicolor (Intercept)     0.872    0.445       1.96 5.56e- 2
#> 4 versicolor Sepal.Length    0.320    0.0746      4.28 8.77e- 5
#> 5 virginica  (Intercept)     1.45     0.431       3.36 1.55e- 3
#> 6 virginica  Sepal.Length    0.232    0.0651      3.56 8.43e- 4

Created on 2019-03-31 by the reprex package (v0.2.1.9000)

Oh WOW! Great, thanks a lot. Your help is very much appreciated.

I have some follow-up questions so that I can fully understand the code.

What exactly does the expr function do, and why do you need to write data.frame(Sepal.Width, Sepal.Length)?

Thanks a lot.

Sure. Understanding is more important than implementing, in my opinion.

expr is not a function; it is the name of the second argument of with(). You have already used it yourself, just without writing its name.

In your own code, lm(Sepal.Width~Sepal.Length) is the expr part. For details, check the documentation of with (?with).
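To see what with() does with expr: it evaluates the expression with the columns of the data frame available as ordinary variables. A small sketch comparing it with passing the same subset through lm()'s data argument (the names setosa, fit_with and fit_data are just for illustration):

```r
setosa <- iris[iris$Species == "setosa", ]

# expr is evaluated inside `setosa`, so the bare column names resolve
fit_with <- with(setosa, lm(Sepal.Width ~ Sepal.Length))

# Equivalent fit that points lm() at the columns via the data argument instead
fit_data <- lm(Sepal.Width ~ Sepal.Length, data = setosa)

all.equal(coef(fit_with), coef(fit_data))  # TRUE
```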

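If you have a data frame df with two columns x and y (in that order), lm(df) regresses x on y: it fits the model x ~ y, taking the first column as the response. In iris, Sepal.Length comes before Sepal.Width, but you wanted Sepal.Width as the response; hence the explicit data.frame(Sepal.Width, Sepal.Length), which puts Sepal.Width first.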
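A quick way to check this column-order rule is to pass the same two columns to lm() in both orders and look at which variable ends up as the predictor. A sketch (fit_wl and fit_lw are illustrative names):

```r
# Sepal.Width first: Sepal.Width is the response, Sepal.Length the predictor
fit_wl <- lm(data.frame(Sepal.Width = iris$Sepal.Width,
                        Sepal.Length = iris$Sepal.Length))
names(coef(fit_wl))  # "(Intercept)" "Sepal.Length"

# Columns swapped: now Sepal.Length is the response
fit_lw <- lm(iris[, c("Sepal.Length", "Sepal.Width")])
names(coef(fit_lw))  # "(Intercept)" "Sepal.Width"
```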

Thank you so much. It has become so much clearer to me now.

Just one last question (I promise 🙂). It does not seem to be necessary to write expr explicitly. It seems to work just as well without it (at least the output is the same).

So, could I also omit expr?

expr is a parameter name. If you pass arguments to a function in the predefined order, you don't have to name them; if you want to pass them in a different order, you need to name them explicitly. See this example:

# Order of parameters: with(data, expr, ...)
# Passing unnamed parameters in order
with(iris, { by(data.frame(Sepal.Width, Sepal.Length), Species, lm) })
# Passing named parameters out of order
with(expr = { by(data = data.frame(Sepal.Width, Sepal.Length), INDICES = Species, FUN = lm) },
     data = iris)
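As a quick check, a sketch that runs both calls from the example above and compares their fitted coefficients, which should match (b_pos and b_named are illustrative names):

```r
# Positional: data first, then expr
b_pos <- with(iris, by(data.frame(Sepal.Width, Sepal.Length), Species, lm))

# Named and out of order
b_named <- with(expr = by(data = data.frame(Sepal.Width, Sepal.Length),
                          INDICES = Species, FUN = lm),
                data = iris)

# Same coefficient matrix either way
all.equal(sapply(b_pos, coef), sapply(b_named, coef))  # TRUE
```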
