Linear regression by rows


#1

Hi,
I have several items, each with a quantity recorded at the end of every month:

|        | JAN | FEB | MAR | APR |
|--------|----:|----:|----:|----:|
| item A |  10 |  20 |  15 |  17 |
| item B |  30 |  35 |  37 |  38 |

For each item, I would like the coefficient (slope) of a linear regression, so I can see whether the trend is positive or negative.
In some cases, I do not have a value for every month.
Can someone help?


#2

You would need:

  1. the tidyr package to reshape your data
  2. the lm() function in base R
  3. the purrr package to apply lm() to each item.

If you are working with time series, the forecast and fable packages can also help.
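
For comparison, the same three steps (reshape, fit one `lm()` per item, extract the slope) can be sketched in base R without extra packages. This is a minimal sketch that assumes the wide layout from the question, with one row per item and the month columns in calendar order; the `months`, `long`, `models` and `slopes` names are illustrative.

```r
# Wide data as in the question: one row per item, one column per month
data <- data.frame(ITEM = c("item A", "item B"),
                   JAN = c(10, 30), FEB = c(20, 35),
                   MAR = c(15, 37), APR = c(17, 38),
                   stringsAsFactors = FALSE)
months <- c("JAN", "FEB", "MAR", "APR")  # assumed to be in calendar order

# Step 1: reshape to long format, one row per item and month.
# unlist() stacks the month columns one after another, so ITEM repeats
# once per month and the month index n repeats once per item.
long <- data.frame(
  ITEM  = rep(data$ITEM, times = length(months)),
  n     = rep(seq_along(months), each = nrow(data)),
  value = unlist(data[months], use.names = FALSE)
)

# Step 2: fit one linear model per item
models <- lapply(split(long, long$ITEM),
                 function(df) lm(value ~ n, data = df))

# Step 3: extract the slope (the coefficient of n) from each model
slopes <- sapply(models, function(m) unname(coef(m)["n"]))
print(slopes)
```

Both slopes come out positive (1.6 for item A and 2.6 for item B), so both items trend upward over these four months.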


#3

The broom package can also be useful here.

Here are some examples
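
As a point of reference, broom's `tidy()` turns a fitted `lm` into a data frame with the columns `term`, `estimate`, `std.error`, `statistic` and `p.value`. The same numbers are available in base R from `summary(fit)$coefficients`, as this small sketch shows for item A from the question (the `item_a`, `fit` and `coefs` names are illustrative).

```r
# Item A from the question, with the month index n = 1..4
item_a <- data.frame(n = 1:4, value = c(10, 20, 15, 17))
fit <- lm(value ~ n, data = item_a)

# Coefficient matrix: rows "(Intercept)" and "n";
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- summary(fit)$coefficients
print(coefs)

slope   <- coefs["n", "Estimate"]   # trend per month
p_value <- coefs["n", "Pr(>|t|)"]   # test of H0: slope = 0
```

With broom loaded, `tidy(fit)` should report the same slope in the `estimate` column of its `n` row.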


#4

You can use this example as a starting point:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

data <- data.frame(stringsAsFactors=FALSE,
                   ITEM = c("item A", "item B"),
                   JAN = c(10, 30),
                   FEB = c(20, 35),
                   MAR = c(15, 37),
                   APR = c(17, 38)
                   )

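# Fit a linear model of the monthly value against the month index n
fit_model <- function(df) lm(value ~ n, data = df)
# Row 2 of tidy()'s output is the coefficient of n, i.e. the slope
get_slope <- function(mod) tidy(mod)$estimate[2]

data %>% 
    gather('month', 'value', -ITEM) %>%        # wide to long: one row per item and month
    group_by(ITEM) %>% 
    mutate(n = row_number()) %>%               # month index 1, 2, 3, ... within each item
    arrange(ITEM, n) %>% 
    nest() %>%                                 # one nested data frame per item
    mutate(model = map(data, fit_model)) %>%   # one lm() per item
    mutate(slope = map_dbl(model, get_slope))  # slope of each model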
#> # A tibble: 2 x 4
#>   ITEM   data             model    slope
#>   <chr>  <list>           <list>   <dbl>
#> 1 item A <tibble [4 x 3]> <S3: lm>  1.6 
#> 2 item B <tibble [4 x 3]> <S3: lm>  2.60

Created on 2019-01-12 by the reprex package (v0.2.1)
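
The question mentions that some months have no value. The pipeline above keeps NA cells when reshaping (`gather()` defaults to `na.rm = FALSE`), so `row_number()` still lines up with the months and `lm()`'s default `na.action` simply drops the NA rows. If, however, missing months are removed before the index is assigned, the later months shift down by one and the slope changes. A minimal base-R sketch, assuming a hypothetical item whose FEB value is missing, shows the difference; deriving the index from the month name with `match()` keeps the gaps.

```r
months <- c("JAN", "FEB", "MAR", "APR")

# Hypothetical item with the FEB value missing
item <- data.frame(month = months, value = c(10, NA, 15, 17),
                   stringsAsFactors = FALSE)

# Month index taken from the month name, so the gap is preserved
item$n <- match(item$month, months)
fit_by_month <- lm(value ~ n, data = item)  # NA row dropped by na.omit
slope_by_month <- unname(coef(fit_by_month)["n"])

# For contrast: drop the missing month first, then number rows 1, 2, 3
complete <- item[!is.na(item$value), ]
complete$n <- seq_len(nrow(complete))
slope_by_row <- unname(coef(lm(value ~ n, data = complete))["n"])

print(c(by_month = slope_by_month, by_row = slope_by_row))
```

Here the index-by-name slope is about 2.36 per month, while renumbering the remaining rows gives 3.5, because MAR and APR are moved to positions 2 and 3.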


#5

@andresrcs, you are a master! Thank you so much.
I can't tell you how grateful I am.


#6

Since you are looking for trends, you may also want to look at the p-values of the slopes to check their statistical significance.

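# Reuses the packages and `data` from the previous post
fit_model <- function(df) lm(value ~ n, data = df)
get_slope <- function(mod) tidy(mod)$estimate[2]
# Row 2 of tidy()'s output is the slope; its p.value tests whether the slope differs from 0
get_p_value <- function(mod) tidy(mod)$p.value[2]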

data %>% 
    gather('month', 'value', -ITEM) %>% 
    group_by(ITEM) %>% 
    mutate(n = row_number()) %>% 
    arrange(ITEM, n) %>% 
    nest() %>% 
    mutate(model = map(data, fit_model)) %>% 
    mutate(slope = map_dbl(model, get_slope)) %>% 
    mutate(p_value = map_dbl(model, get_p_value))
#> # A tibble: 2 x 5
#>   ITEM   data             model    slope p_value
#>   <chr>  <list>           <list>   <dbl>   <dbl>
#> 1 item A <tibble [4 × 3]> <S3: lm>  1.6   0.509 
#> 2 item B <tibble [4 × 3]> <S3: lm>  2.60  0.0569
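
The p-value reported for the slope comes from a t-test of the null hypothesis that the slope is zero, using the model's residual degrees of freedom (number of months minus 2, so only 2 here). A minimal base-R sketch reproducing item A's p-value by hand, with illustrative names:

```r
x <- 1:4                 # month index
y <- c(10, 20, 15, 17)   # item A
fit <- lm(y ~ x)

est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
t_stat <- est / se

# Two-sided p-value from the t distribution with the residual df
p_manual <- 2 * pt(abs(t_stat), df = fit$df.residual, lower.tail = FALSE)
print(p_manual)
```

With so few points per item the test has little power, which is consistent with item B's fairly steep slope of 2.6 still giving p ≈ 0.057 in the output above.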

#7

@andresrcs, not at this point, but many thanks.
For now, although I was able to use the example effortlessly, I'm trying to learn and fully understand your solution step by step, and to document my code so I retain the knowledge.
That part takes more effort for me :slight_smile: