Calculate Alpha number for every 2nd overlap

I have the following code for the Alpha calculation:

df <- tibble::tribble(
~seq, ~date, ~sales,
1, "3/01/2017", 40,
2, "4/01/2017", 2,
3, "5/01/2017", 2,
4, "6/01/2017", 2,
5, "7/01/2017", 30,
6, "8/01/2017", 2,
7, "1/02/2017", 9,
8, "2/02/2017", 5,
9, "3/02/2017", 65,
10, "4/02/2017", 3,
11, "5/02/2017", 65
)
library(tidyverse)
library(magrittr)
library(psy)

df %<>% mutate(lagsales = lag(sales))
df2 <- rowwise(df) %>% mutate(z = cronbach(cbind(sales,lagsales))$alpha) %>% ungroup
df2

It creates the following output:

df2
seq date sales lagsales z
1 1 3/01/2017 40 NA -1.232498
2 2 4/01/2017 2 40 -1.232498
3 3 5/01/2017 2 2 -1.232498
4 4 6/01/2017 2 2 -1.232498
5 5 7/01/2017 30 2 -1.232498
6 6 8/01/2017 2 30 -1.232498
7 7 1/02/2017 9 2 -1.232498
8 8 2/02/2017 5 9 -1.232498
9 9 3/02/2017 65 5 -1.232498
10 10 4/02/2017 3 65 -1.232498
11 11 5/02/2017 65 3 -1.232498

Question:

I am looking to get the Alpha for overlap for every 2nd line
Can I use this formula to calc Chronbachs Alpha for each item?;

Reliability = N / ( N - 1)x(Total Variance - Sum of Variance for Each number)/Total Variance

So I would not have – 1.232498 for every item.Only probably for item 1and 2 together.but 2 and 3 would have another alpha and 3 and 4 together would also have another alpha .etc.et. it has to look at row level and taking 2 rows rolling into consideration.

What would the R code look like?

Hi,

I don't know anything about Cronbach (or if what you're trying to do makes sense). I also don't understand the lag idea, but I came up with an implementation of what you asked for. So beware this is just the implementation, I don't know if the values make any sense :slight_smile:

library(tidyverse)
library(psy)

#Number of rows before to take into account
rolling = 2

df <- tibble::tribble(
  ~seq, ~date, ~sales,
  1, "3/01/2017", 40,
  2, "4/01/2017", 2,
  3, "5/01/2017", 2,
  4, "6/01/2017", 2,
  5, "7/01/2017", 30,
  6, "8/01/2017", 2,
  7, "1/02/2017", 9,
  8, "2/02/2017", 5,
  9, "3/02/2017", 65,
  10, "4/02/2017", 3,
  11, "5/02/2017", 65
)

#Lag
df = df %>%  mutate(lagsales = lag(sales))

#Get the rolling alpha
df$alpha = c(
  rep(NA, rolling), 
  map_dbl((rolling + 1):nrow(df), function(x){
    cronbach(df %>% select(sales, lagsales) %>% slice((x-rolling):x))$alpha
  })
)
df
# A tibble: 11 x 5
     seq date      sales lagsales      alpha
   <dbl> <chr>     <dbl>    <dbl>      <dbl>
 1     1 3/01/2017    40       NA    NA     
 2     2 4/01/2017     2       40    NA     
 3     3 5/01/2017     2        2     0     
 4     4 6/01/2017     2        2     0     
 5     5 7/01/2017    30        2     0     
 6     6 8/01/2017     2       30    -2     
 7     7 1/02/2017     9        2    -4.44  
 8     8 2/02/2017     5        9    -1.48  
 9     9 3/02/2017    65        5    -0.0604
10    10 4/02/2017     3       65    -2.69  
11    11 5/02/2017    65        3 -3782 

You can adjust the rolling parameter to take a set number of previous rows into account, but know that that means that the first lag rows will have NA for alpha. I set the value to 2, because 1 as in your example generates a lot of NaN and Inf

Hope this helps,
PJ

Hi Pieter, can you help me confirm that I'm not crazy.
I believe that pasting the reprex provided by ron4, doesn't produce ron4's claimed output in a fresh session.
i.e. the z values are not all -1.232498, but rather NA (and for reasons I originally discussed with ron4 when I provided the code he has shared as his reprex... link
Or am I missing something?

I think ron4 must have applied the cronbach() function to the whole table:

df %>% 
  mutate(lagsales = lag(sales)) %>% 
  mutate(z = cronbach(cbind(sales, lagsales))$alpha)
1 Like

hi, looks great. i will test these values.
i will try to add Bet..dist and gamma.dist values per line for the same sample .

you guys are very knowledgeable :slight_smile:

Hi Nir,
Indeed ,z values are not all -1.232498. i will test the values with the master file against this sample and report back to the two of you
great support.:slight_smile:

Hi,

Indeed that value is if you use the whole table. Again, I just implemented the code, I have no idea how this statistic is used and if the rolling idea makes sense :slight_smile:

PJ

Hi PieterJan
(sounds Dutch...ben je Nederlands?)

I have amended the code to relate to a file. (sales1.xls) The sample code was perfect.
So now I have refer it to an excel file.

code as below:
read_file(sales1)
library(tidyverse)
library(psy)

#Number of rows before to take into account
rolling = 2

sales1 <- sales::sales( ~date, ~sales,)
#Lag
sales1 = sales1 %>% mutate(lagsales = lag(sales))

#Get the rolling alpha
sales1$alpha = c( rep(NA, rolling),
map_dbl((rolling + 1):nrow(sales1), function(x){cronbach(sales1 %>% select(sales, lagsales) %>% slice((x-rolling):x))$alpha
}))
sales1

it shows the following output:

sales1

A tibble: 341 x 4

Date sales lagsales alpha

1 03-01-2017 40 NA NA
2 04-01-2017 2 40 NA
3 05-01-2017 2 2 0
4 06-01-2017 2 2 0
5 07-01-2017 30 2 0
6 08-01-2017 2 30 -2
7 01-02-2017 9 2 -4.44
8 02-02-2017 5 9 -1.48
9 03-02-2017 65 5 -0.0604
10 04-02-2017 3 65 -2.69

... with 331 more rows

Question: how do i get to see the other 331 rows?

cheers

Hi,

If you just run the variable sales, it will be printed in the console, and there the results are trimmed to prevent too much space being taken up.

To see more, either click the variable in the environment pane (with default settings top-right window in RStudio) or save it as a csv file

write.csv(sales1, "sales1.csv", row.names = F)

Or as Excel

library(xlsx)

write.xlsx(sales1, "sales1.xlsx", row.names = F)

You can now open the file in Excel or any other program to explore the results (Excel can also read csv files, you can set them to be opened automatically with Excel in Windows).

Hope this helps,
PJ

PS: Ja ik ben Belg :slight_smile:

Hi,
so the code would be ?

read_file(sales1)
library(tidyverse)
library(psy)

#Number of rows before to take into account
rolling = 2

sales1 <- sales::sales(
~date, ~sales,)
#Lag
sales1 = sales1 %>% mutate(lagsales = lag(sales))

#Get the rolling alpha
sales1$alpha = c(
rep(NA, rolling),
map_dbl((rolling + 1):nrow(sales1), function(x){
cronbach(sales1 %>% select(sales, lagsales) %>% slice((x-rolling):x))$alpha
})
)
sales1
write.csv(sales1, "sales1.csv", row.names = F)

is this correct?

ps. Mijn grootouders komen uit Zaventem.

Hi,

I don't know the read_file command, but if the input file is also an Excel file I'd do this

library(tidyverse)
library(psy)
library(xlsx)

#Number of rows before to take into account
rolling = 2

#Load the data
sales1 = read.xlsx("sales1.xls", 1)

#Lag
sales1 = sales1 %>% mutate(lagsales = lag(sales))

#Get the rolling alpha
sales1$alpha = c(
rep(NA, rolling),
map_dbl((rolling + 1):nrow(sales1), function(x){
cronbach(sales1 %>% select(sales, lagsales) %>% slice((x-rolling):x))$alpha
})
)

write.csv(sales1, "sales1.csv", row.names = F)
#OR
write.xlsx(sales1, "sales1.xlsx", row.names = F)

PJ

Geweldig.
Het is gelukt.
Het script laat alle Alpha's zien. En n/a waar geen baseline is.
Geweldig!

Great ,it worked, Success. The script shows all Alpha's and N/a as where there is no baseline.
Great work.

My next step is now to add a column with the Beta.dist

cheers.

Hallo PieterJan
Ik probeer de beta en gamma te berekenen.
Ik heb de volgende packages
geinstalleerd
read_file(sales1)
library(tidyverse)
library(psy)
library(beta)
library(LearnBayes)
library(lm.beta)

Is de code voor de alpha hetzelfde voor de beta en gamma berekening ?
Kan ik het woord alpha vervangen voor beta en consequent ook voor de gamma factor?

groet.

Hi @ron4, your note wasn't sent to @pieterjanvc, but I can try answer some of your questions: The cronbach function takes a table as input and gives a number as output, but lm.beta() gives a model as output, and rinvgamma() which you asked about here, gives random values from a ditribution. So to asnwer you question, I would say that, no, you can't change 'beta' or 'gamma' for 'alpha' and have the calculations work out -- that's why I had asked what your motivation was for using the beta- and gamma-related commands. What were you hoping to use them for?

Hi David.
Thank you for clarification. I would need to see the the trend line and seasonal pattern per two numbers .Goal is to both Beta and gamma compare with the trend of the Alpha.
I work in the food industry where we have irregular demand of some produced recipes within '2 day cycles ' and influenced by our production plan.
Hope the above info helps :slight_smile:

Hi @ron4, I guess what I was trying to ask was why you think using lm.beta() would be helpful: How would it help you understand trends? Is there something you've read where it was used? Maybe there are others who may understand, but without more information, I'm not sure I'll be able to be of much help.

Hi David
No worries.Maybe I can help you understanding the Alpha,Beta and Gamma and their relation towards each other.
Maybe the following internet site will provide you some back ground info.if you need some more info , i will be happy to explain.I am a six sigma black belt and very familiar with ABG,

Here's the internet link
https://docs.rapidminer.com/9.3/studio/operators/modeling/time_series/forecasting/holt-winters_trainer.html

cheers

Btw,maybe i try to use the incorrect code for the beta and alpha?
Is there another Beta or and gamma code is should use?

cheers

Thanks for the reference, @ron4 : It looks like you don't want Cronbach's alpha, or lm.beta(), or anything related to the gamma distribution as far as I can tell, since alpha, beta, and gamma seems to have a completely meanings from those in your context. Instead, it looks like you may want to use the HoltWinters() function, which you can find out more about by simply running ?HoltWinters since it turns out be a base R function. I hope that helps.

Hi David, Thank you for your help.I kinda had a bit of a feeling i was having the wrong Alpha as it usually provides different output.

I will work with the Holtwinters .

if using the following code, would that work?

HoltWinters(x, alpha = NULL, beta = NULL, gamma = NULL,
seasonal = c("additive", "multiplicative"),
start.periods = 2, l.start = NULL, b.start = NULL,
s.start = NULL,
optim.start = c(alpha = 0.3, beta = 0.1, gamma = 0.1),
optim.control = list()

I would need to use the packageHoltwinters() i guess?
How do i get the ABG shown in my table per separate line?

current reprex:

read_file(sales1)
library(tidyverse)
library(psy)

#Number of rows before to take into account
rolling = 2

sales1 <- sales::sales(
~date, ~sales,)
#Lag
sales1 = sales1 %>% mutate(lagsales = lag(sales))

#Get the rolling alpha
sales1$alpha = c(
rep(NA, rolling),
map_dbl((rolling + 1):nrow(sales1), function(x){
cronbach(sales1 %>% select(sales, lagsales) %>% slice((x-rolling):x))$alpha
})
)
sales1

cheers