# Help with standardizing multiple variables for time series data

Hi,

I am working on predicting sales from multiple independent variables using time series forecasting. I am still learning time series, so you may see more questions like this from me.

My data includes independent variables on very different scales. Because of this, when I plot all the variables together to see their trends, some of them look like flat lines near zero. So I am wondering whether I should standardize all of these variables except sales and month, since I want to predict sales from the independent variables. However, I am not sure how to do that. In my example below, the trend in variable A is clearly visible in the line plot, but C and D sit at the very bottom and B is almost lost. I would like to see the trend for all of these variables. Ideally, all four variables would have values in the same range, but I am not sure how to achieve that.

```r
# Packages: dplyr/tidyr for wrangling, tsibble for the time-series table,
# ggplot2 + fabletools for autoplot() on a tsibble
library(dplyr)
library(tidyr)
library(ggplot2)
library(tsibble)
library(fabletools)

# Example data
df <- data.frame(
  stringsAsFactors = FALSE,
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2)
)

# Creating the tsibble (stored separately so `df` keeps the original data)
df_ts <- df %>%
  select(-sales) %>%
  gather(key = "factors", value = "value", -month) %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(key = factors, index = month)

# Plotting the variables
df_ts %>%
  autoplot(value)
```

Hopefully we can see the trends for all the independent variables once they are standardized; otherwise, please recommend another efficient method. Once we do that, I hope we don't need to do anything to sales, since it is the target variable and I don't want to change it.
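For reference, standardizing usually means a z-score: subtract each variable's mean and divide by its standard deviation, so every predictor ends up centered at 0 with unit spread while the shape of its line stays the same. The sketch below is a minimal illustration under a few assumptions not in the original post: it uses dplyr 1.0 or later (`across()`) and tidyr's `pivot_longer()` instead of `gather()`, and it plots `month` as an ordered factor rather than a parsed date. `sales` and `month` are left unchanged.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
df <- data.frame(
  stringsAsFactors = FALSE,
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2)
)

# z-score each predictor: subtract its mean, then divide by its standard deviation.
# The target `sales` and the `month` column are not touched.
df_std <- df %>%
  mutate(across(c(A, B, C, D), ~ (.x - mean(.x)) / sd(.x)))

# Long format for plotting; sales is dropped so only the predictors are compared
df_std %>%
  select(-sales) %>%
  mutate(month = factor(month, levels = unique(month))) %>%  # keep calendar order
  pivot_longer(c(A, B, C, D), names_to = "key", values_to = "value") %>%
  ggplot(aes(x = month, y = value, colour = key, group = key)) +
  geom_line()
```

Because the transform is linear, each series rises and falls exactly as in the raw data; only the y-axis units change (standard deviations from that variable's mean).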

Hi @ksingh19,

How about plotting relative values, where each series is divided by its first value so that it starts at 1 (100%)?

```r
library(tidyverse)

df %>%
  # parse the 'month' and convert it to a date being the first of the month
  mutate(DATE = readr::parse_date(month, format = "%Y %b")) %>%
  select(-month) %>%
  # make a long table to be able to work in groups
  gather(key = "key", value = "value", -DATE) %>%
  group_by(key) %>%
  arrange(DATE) %>%
  # the normalised value starts from 1
  mutate(NORMALISED_VALUE = value / first(value)) %>%
  ungroup() %>%
  ggplot(aes(x = DATE, y = NORMALISED_VALUE, colour = key)) +
  geom_line()
```

Thanks @smichal!

It does help! But because I am working with time series, I don't think relative values help that much. It is definitely a good alternative, though, if we can't figure out the right presentation.
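Another linear rescaling that puts every series on a common range, without dividing by an arbitrary first value, is min-max scaling: each series is mapped onto [0, 1], where 0 is its minimum and 1 its maximum. A minimal sketch follows; `min_max()` is a hypothetical helper defined here, and `month` is treated as an ordered factor instead of a parsed date (an assumption made for simplicity).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  stringsAsFactors = FALSE,
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2)
)

# Hypothetical helper: map a numeric vector linearly onto [0, 1].
# The shape of the line is unchanged; only its offset and scale change.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

df_long <- df %>%
  mutate(month = factor(month, levels = unique(month))) %>%  # keep calendar order
  pivot_longer(-month, names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  mutate(SCALED_VALUE = min_max(value)) %>%  # rescale each series separately
  ungroup()

ggplot(df_long, aes(x = month, y = SCALED_VALUE, colour = key, group = key)) +
  geom_line()
```

The original `value` column is kept alongside `SCALED_VALUE`, so the rescaling only affects the plot, not the data used for modelling.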

An alternative is a faceted plot with a free y-axis in each panel. This preserves the original values.

```r
library(tidyverse)

df %>%
  # parse the 'month' and convert it to a date being the first of the month
  mutate(DATE = readr::parse_date(month, format = "%Y %b")) %>%
  select(-month) %>%
  # make a long table with one row per date and variable
  gather(key = "key", value = "value", -DATE) %>%
  # plot the original values, one panel per variable with its own y range
  ggplot(aes(x = DATE, y = value, colour = key)) +
  geom_line() +
  facet_grid(rows = vars(key), scales = "free_y")
```

Thank you so much! This is great! We didn't need to standardize after all.

Is there a way to remove the y-axis tick labels in this plot? I tried adding the following as the last line of your code above, but nothing changed:

```r
theme(axis.text.y = element_blank())
```

Have you tried `scale_y_continuous(labels = NULL)`?

```r
library(tidyverse)
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  scale_y_continuous(labels = NULL)
```

Created on 2020-09-21 by the reprex package (v0.3.0)
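Applied to the faceted plot, a sketch might look like this: `scale_y_continuous(labels = NULL)` removes the y-axis tick labels, and the extra `theme(axis.ticks.y = element_blank())` also drops the tick marks themselves. The `theme(axis.text.y = element_blank())` call from the question should remove the labels too, as long as it is chained onto the plot with `+`; a missing `+` at the end of the previous line is one possible reason it appeared to do nothing, though that is only a guess. Here `month` is plotted as an ordered factor instead of a parsed date (a simplification, not the thread's original code).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  stringsAsFactors = FALSE,
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2)
)

# Long format: one row per month and series (sales included, as in the thread)
df_long <- df %>%
  mutate(month = factor(month, levels = unique(month))) %>%  # keep calendar order
  pivot_longer(-month, names_to = "key", values_to = "value")

p <- ggplot(df_long, aes(x = month, y = value, colour = key, group = key)) +
  geom_line() +
  facet_grid(rows = vars(key), scales = "free_y") +  # each panel gets its own y range
  scale_y_continuous(labels = NULL) +                # remove the y-axis tick labels
  theme(axis.ticks.y = element_blank())              # remove the y-axis tick marks as well

p
```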

Thank you so much! It works great!
