Help with standardizing multiple variables for time series data

Hi,

I am working on predicting sales from multiple independent variables using time series forecasting. I am still learning time series, so you may see more questions on this topic from me.

My independent variables come in very different ranges. Because of this, when I plot all of them together to look at their trends, some of them sit almost at zero in the line plot. So I am wondering whether I can standardize all of these variables except sales and month, since I want to predict sales from the independent variables. However, I am not sure how to do that. In the example below, the trend of variable A is clearly visible in the line plot, but C and D sit at the very bottom and B is almost lost. I would like to see the trend of every variable. Perhaps all four variables could be put on the same range, but I am not sure how to achieve that.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(fabletools) # provides autoplot() for tsibble objects
library(ggplot2)

# Example data
df <- data.frame(
  stringsAsFactors = FALSE,
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2)
)

# Creating a long tsibble of the predictors (sales excluded);
# stored in a new object so the original df stays intact
df_long <- df %>%
  select(-sales) %>%
  gather(key = "factors", value = "value", -month) %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(key = factors, index = month)

# Plotting variables
df_long %>%
  autoplot(value)
```

Hopefully standardization will let us see the trends of all the independent variables; otherwise, please recommend another efficient method. Once that is done, I hope nothing needs to change for sales, since it is the target variable and I don't want to modify it.

Thanks for your help!
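A minimal sketch of the standardization asked about above, assuming dplyr >= 1.0.0 (for `across()`). Each predictor is converted to a z-score using its own mean and standard deviation, while `month` and the target `sales` are left unchanged. Min-max rescaling to [0, 1] is shown as an alternative way to put all four variables on the same range. The object names `df_std` and `df_range` are illustrative only.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2),
  stringsAsFactors = FALSE
)

# z-score: (x - mean(x)) / sd(x), applied to the predictors only;
# month and the target sales are not modified
df_std <- df %>%
  mutate(across(c(A, B, C, D), ~ (.x - mean(.x)) / sd(.x)))

# Alternative: min-max rescaling so every predictor lies in [0, 1]
df_range <- df %>%
  mutate(across(c(A, B, C, D), ~ (.x - min(.x)) / (max(.x) - min(.x))))
```

Either result can go through the same `gather()` / `as_tsibble()` / `autoplot()` pipeline as the question to show all four trends on one axis. If the scaled predictors are later used in a forecasting model, future predictor values would need to be scaled with the same mean and standard deviation (or min and max) computed here.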

Hi @ksingh19,

How about plotting relative values, where the first value of each series is 1 (or 100%)?

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  # parse the 'month' and convert it to a date being the first of the month
  mutate(DATE = readr::parse_date(month, format = "%Y %b")) %>%
  select(-month) %>%
  # make a long table to be able to work in groups
  gather(key = "key", value = "value", -DATE) %>%
  group_by(key) %>%
  arrange(DATE) %>%
  # the normalised value starts from 1
  mutate(NORMALISED_VALUE = value / first(value)) %>%
  ungroup() %>%
  ggplot(aes(x = DATE, y = NORMALISED_VALUE, colour = key)) +
  geom_line()
```


Thanks @smichal!

It does help! But since I am working with time series, I don't think relative values help that much. Still, this is definitely a good alternative if we can't figure out a better presentation.

Thanks for your help!

An alternative could be a facet plot with a free y axis. This is a way to preserve the original numbers.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  # parse the 'month' and convert it to a date being the first of the month
  mutate(DATE = readr::parse_date(month, format = "%Y %b")) %>%
  select(-month) %>%
  # make a long table: one row per (DATE, key) pair
  gather(key = "key", value = "value", -DATE) %>%
  # plot the raw values; each facet row gets its own y-axis range
  ggplot(aes(x = DATE, y = value, colour = key)) +
  geom_line() +
  facet_grid(rows = vars(key), scales = "free_y")
```
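The same free-y facet idea can also be applied to the tsibble and `autoplot()` pipeline from the question. This is a sketch, assuming the tsibble and fabletools packages are installed (fabletools supplies the `autoplot()` method for tsibble objects); `df_long` and `p` are illustrative names, and `sales` is dropped so only the predictors are plotted.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(fabletools) # autoplot() method for tsibble objects
library(ggplot2)

df <- data.frame(
  month = c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Apr", "2020 May"),
  sales = c(2061292, 2087140, 2136628, 449335, 1105069),
  A = c(5067331.423, 4856897.658, 4175123.217, 3494987.878, 3768201.526),
  B = c(153, 146, 115, 108, 133),
  C = c(58.247345, 50.548263, 30.994029, 20.521175, 28.040035),
  D = c(609026, 595426.8, 598968.2, 544902.2, 556805.2),
  stringsAsFactors = FALSE
)

# Long tsibble of the predictors only (sales is the target, so it is dropped)
df_long <- df %>%
  select(-sales) %>%
  gather(key = "factors", value = "value", -month) %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(key = factors, index = month)

# One panel per predictor, each with its own y-axis range
p <- autoplot(df_long, value) +
  facet_grid(rows = vars(factors), scales = "free_y")
```

Because the values are not transformed, the y-axis of each panel still shows the original units of that predictor.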


Thank you so much! This is great! We didn't need to standardize :slight_smile:


Is there a way to remove the y-axis ticks in this plot? I tried adding the following as the last line of your code above, but nothing changes:

```r
theme(axis.text.y = element_blank())
```

Have you tried scale_y_continuous(labels = NULL)?

```r
library(tidyverse)

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  scale_y_continuous(labels = NULL)
```

Created on 2020-09-21 by the reprex package (v0.3.0)
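For comparison, a sketch of the `theme()` route on the same mtcars example, assuming the `theme()` call is chained onto the plot with `+`. In ggplot2, `axis.text.y` controls the y-axis tick labels and `axis.ticks.y` controls the tick marks, so setting both to `element_blank()` hides the labels and the small tick lines.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # hide the y-axis tick labels and the tick marks themselves
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```

`scale_y_continuous(labels = NULL)` removes only the labels; the breaks, and therefore the tick marks, are still drawn unless `axis.ticks.y` is blanked as well.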

Thank you so much! It works great!

