There's a package, rdd, which I haven't used, that implements regression discontinuity design. Before plunging in, however, I'd recommend some warm-up exercises.
When you have quantitative variables, even with categorical variables thrown in, it often pays to start out with ordinary least squares regression.
First, set up your data structure. These are the columns of a data frame or tibble called df:
sales_id <chr>
prodA_4Qv <dbl>
prodB_4Qv <dbl>
bonus <dbl>
prodA_1Qv <dbl>
prodB_1Qv <dbl>
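As a rough sketch of that layout, here is one way to build a data frame with those columns in base R. The values are simulated placeholders, not real sales figures, and the row count of 20 and the value ranges are my own assumptions; swap in your actual records.

```r
# Simulated placeholder data with the column layout described above.
# n and the value ranges are assumptions for illustration only.
set.seed(42)
n <- 20

df <- data.frame(
  sales_id  = sprintf("S%02d", seq_len(n)),  # character id per salesperson
  prodA_4Qv = round(runif(n, 50, 150)),      # product A volume, 4Q
  prodB_4Qv = round(runif(n, 50, 150)),      # product B volume, 4Q
  bonus     = round(runif(n, 0, 5000)),      # bonus paid
  prodA_1Qv = round(runif(n, 50, 150)),      # product A volume, 1Q
  prodB_1Qv = round(runif(n, 50, 150)),      # product B volume, 1Q
  stringsAsFactors = FALSE
)

str(df)
```

If you prefer a tibble, tibble::as_tibble(df) should give the same columns with the <chr> and <dbl> types shown in the listing.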
Populate df with your data and do a little exploratory data analysis:
- What are the 4Q and 1Q product ratios for each sales_id?
- What is the difference between the ratios?
- If there's no difference, are you still interested in the effect of the bonus?
- If there is a difference, which test statistic and p-value are appropriate for testing whether it is due to chance? (More formally, the null hypothesis is that the mean difference between the ratios is zero.) The choice depends on the number of observations; with only 20 or so sales_id records, for example, you might use Student's t.
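The questions above could be worked through roughly like this. I'm assuming the "product ratio" means product A volume divided by product B volume, and that a paired t-test is reasonable because each sales_id contributes one ratio per quarter; both are assumptions on my part. The data are simulated so the snippet runs on its own.

```r
# Simulated stand-in for df; replace with your own data.
set.seed(1)
n <- 20
df <- data.frame(
  sales_id  = sprintf("S%02d", seq_len(n)),
  prodA_4Qv = runif(n, 50, 150),
  prodB_4Qv = runif(n, 50, 150),
  prodA_1Qv = runif(n, 50, 150),
  prodB_1Qv = runif(n, 50, 150),
  stringsAsFactors = FALSE
)

# Product ratio (A relative to B, an assumed definition) for each quarter
df$ratio_4Q <- df$prodA_4Qv / df$prodB_4Qv
df$ratio_1Q <- df$prodA_1Qv / df$prodB_1Qv

# Per-salesperson change in the ratio from 4Q to 1Q
df$ratio_diff <- df$ratio_1Q - df$ratio_4Q
summary(df$ratio_diff)

# Paired Student's t-test: the null hypothesis is a mean difference of zero
tt <- t.test(df$ratio_1Q, df$ratio_4Q, paired = TRUE)
print(tt)
```

With a small sample it is worth looking at a histogram of ratio_diff before trusting the t-test; if the differences are badly skewed, wilcox.test() with paired = TRUE is a common alternative.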
OK, assuming there is a statistically significant difference, let's develop two models.
The first model is trivial: f(x) = y, where x is the volume of product A and y is the bonus. It illustrates that we don't really need to model at all for the first half of the data.
The second model has psr, the product sales ratio, as the response variable, with bp, the prior bonus, and bc, the current bonus, as predictors.
fit <- lm(psr ~ bp + bc, data = df)
summary(fit)
plot(fit)
You might also try adding an interaction term
fit2 <- lm(psr ~ bp * bc, data = df)  # bp * bc expands to bp + bc + bp:bc
summary(fit2)
plot(fit2)
or just the interaction
fit3 <- lm(psr ~ bp:bc, data = df)  # interaction only, no main effects
summary(fit3)
plot(fit3)
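A note on R's formula operators, since they are easy to mix up: `*` means main effects plus their interaction, while `:` is the interaction alone. The sketch below checks that on simulated data and then compares the additive and interaction models with an F-test. The coefficients and noise level used to generate psr are arbitrary assumptions, and the fit_* names are only for this example.

```r
# Simulated stand-in data; the generating coefficients are arbitrary.
set.seed(7)
n <- 40
df <- data.frame(bp = runif(n, 0, 5000), bc = runif(n, 0, 5000))
df$psr <- 1 + 1e-4 * df$bp + 2e-4 * df$bc + rnorm(n, sd = 0.2)

fit_add  <- lm(psr ~ bp + bc, data = df)          # main effects only
fit_star <- lm(psr ~ bp * bc, data = df)          # main effects + interaction
fit_long <- lm(psr ~ bp + bc + bp:bc, data = df)  # same model written out

# The * formula expands to the same terms as the long form
attr(terms(fit_star), "term.labels")
all.equal(coef(fit_star), coef(fit_long))

# Does the interaction improve on the additive model?
anova(fit_add, fit_star)
```

If the anova() row for the interaction model shows a large p-value, the simpler additive model is probably enough.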
Armed with these results, I think you'll have a better idea of how to construct an rdd model.
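To connect the OLS warm-up to regression discontinuity, here is a minimal sharp-RD sketch using nothing but lm(). Everything in it is hypothetical: the running variable x (think of a sales volume), the cutoff of 100, the rule that units at or above the cutoff are treated, and the simulated jump of 0.3. It is not the rdd package's method, just the basic idea of estimating a jump at a threshold.

```r
# Simulated sharp regression discontinuity; all numbers are assumptions.
set.seed(123)
n <- 200
cutoff <- 100                        # hypothetical eligibility threshold
x <- runif(n, 50, 150)               # hypothetical running variable
treated <- as.integer(x >= cutoff)   # treatment assigned by the threshold

# Outcome with a smooth trend in x plus a jump of 0.3 at the cutoff
y <- 1 + 0.005 * (x - cutoff) + 0.3 * treated + rnorm(n, sd = 0.1)

sim <- data.frame(x_c = x - cutoff, treated = treated, y = y)

# Separate slopes on each side of the cutoff; because x is centered,
# the coefficient on `treated` estimates the jump at the threshold
rd_fit <- lm(y ~ treated * x_c, data = sim)
summary(rd_fit)
coef(rd_fit)["treated"]
```

A dedicated package would typically add bandwidth selection and kernel weighting near the cutoff, which this sketch leaves out.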