 # Create a column to calculate the slope of other columns

I'm trying to look though lots of data and see what categories have had the most drastic change over time. I was thinking that calculating the slope of the normalized values per row would give me that information, but I seem to be lost on how to get that.

``````category <- c("red","blue","black","orange")
yr2016 <- c(1,20,4,8)
yr2017 <- c(3,24,5,1)
yr2018 <- c(5,28,9,4)
df <- data.frame(
category=category,
yr2016=yr2016,
yr2017=yr2017,
yr2018=yr2018
)
df
``````

What I want to achieve is first normalize the values so it would look like this.

``````category <- c("red","blue","black","orange")
yr2016 <- c(-1.00,	-1.00,	-0.76,	1.04)
yr2017 <- c(0.00,	0.00,	-0.38,	-0.95)
yr2018 <- c(1.00,	1.00,	1.13,	-0.09)
df <- data.frame(
category=category,
yr2016=yr2016,
yr2017=yr2017,
yr2018=yr2018
)
df
``````

and then make it to this where I have the slope for each category

``````category <- c("red","blue","black","orange")
slope <- c(1.00,	1.00,	0.94,	-0.57)
df <- data.frame(
category=category,
slope = slope
)
df
``````

For the life of me, I'm drawing a blank on how to do his. I've done it in Excel, but I'm figuring out how to transition my method.

I'm open to other suggestions for figuring this out if slope isn't the best way.

I can't tell if using the slope is a good approach, but this would be a tidy way to do get it

``````library(tidyverse)
library(broom)
df <- data.frame(stringsAsFactors = FALSE,
category = c("red","blue","black","orange"),
yr2016 = c(1,20,4,8),
yr2017 = c(3,24,5,1),
yr2018 = c(5,28,9,4)
)

fit_model <- function(df) lm(Value ~ Year, data = df)
get_slope <- function(mod) tidy(mod)\$estimate

df %>%
mutate_if(is.numeric, scale) %>% # Normalize numeric variables
gather(Year, Value, -category) %>% # Reshape to long format
mutate(Year = parse_number(Year)) %>% # Get numeric value for the year
group_nest(category) %>% # nest by category
mutate(model = map(data, fit_model)) %>% # Fit a linear model for each category
mutate(slope = map_dbl(model, get_slope)) # Get the slope for each model
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> # A tibble: 4 x 4
#>   category data             model      slope
#>   <chr>    <list>           <list>     <dbl>
#> 1 black    <tibble [3 × 2]> <S3: lm>  0.143
#> 2 blue     <tibble [3 × 2]> <S3: lm>  0.0316
#> 3 orange   <tibble [3 × 2]> <S3: lm> -0.320
#> 4 red      <tibble [3 × 2]> <S3: lm>  0.145
``````

Created on 2019-04-16 by the reprex package (v0.2.1.9000)

Thanks, I got this to work and learned something new. I'm open to suggestions for something other than slope. I didn't pick it because I thought it was best, more like it was what I could think of. If you have another idea, I'd love to hear it.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.