Modeling price elasticity of demand for product categories

Hello, I want to calculate/estimate the price elasticity of demand for different products or product categories. It would be great to be guided towards literature or better yet tutorials or packages.
Let's say I've got a dataframe like the following:


df <- tibble(cafe_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 
                         2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), 
             timestamp = structure(
               c(1596283511, 1596283511, 1596287730, 1596287730, 1596287730, 
                 1596370505, 1596378076, 1596386672, 1596386672, 1596283999, 
                 1596283999, 1596283999, 1596291727, 1596368783, 1596368783, 
                 1596369685, 1596285196, 1596285327, 1596285327, 1596286869, 
                 1596291836, 1596293731, 1596293731, 1596370648, 1596370648), 
               class = c("POSIXct", "POSIXt"), tzone = "UTC"),
             invoice_id = c(1, 1, 2, 2, 2, 3, 4, 5, 5, 1, 1, 1, 2,
                            3, 3, 4, 1, 2, 2, 3, 4, 5, 5, 6, 6), 
             item = c("coffee", "latte macchiato", "cake", "cappuchino", 
                      "tea", "coffee", "cake", "espresso", "orange juice", 
                      "tea", "bagel", "espresso", "coffee", "ice cream", 
                      "espresso", "cake", "latte macchiato", "cake", 
                      "cappuchino", "bagel", "espresso", "tea", "cake",
                      "coffee", "bagel"), 
             quantity = c(1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 3, 1, 1, 
                          2, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2), 
             item_price = c(2, 3, 3, 2.5, 2.5, 2, 3, 1.5, 2.6, 
                            2.5, 3, 1.5, 2, 2.3, 1.5, 3, 3, 3,
                            2.5, 3, 1.5, 2.5, 3, 2, 3), 
             total = c(2, 3, 6, 5, 2.5, 4, 3, 3, 2.6, 5, 9, 1.5, 2, 
                       4.6, 4.5, 3, 3, 3, 2.5, 3, 3, 2.5, 3, 4, 6))

#> # A tibble: 6 x 7
#>   cafe_id timestamp           invoice_id item          quantity item_price total
#>     <dbl> <dttm>                   <dbl> <chr>            <dbl>      <dbl> <dbl>
#> 1       1 2020-08-01 12:05:11          1 coffee               1        2     2  
#> 2       1 2020-08-01 12:05:11          1 latte macchi~        1        3     3  
#> 3       1 2020-08-01 13:15:30          2 cake                 2        3     6  
#> 4       1 2020-08-01 13:15:30          2 cappuchino           2        2.5   5  
#> 5       1 2020-08-01 13:15:30          2 tea                  1        2.5   2.5
#> 6       1 2020-08-02 12:15:05          3 coffee               2        2     4

This is the scenario:
I've got several cafés of the same brand (e.g. Starbucks) which offer more or less the same products with the same price. I want to calculate the own and cross price elasticity for hot drinks (coffee, tea, espresso, etc.) as the demand of e.g. normal coffee is probably influenced by the prices of cappuchino, espresso etc.

As I have several cafés offering the same products I want to capture the hierarchical nature of the data with a mixed effects model. I have time series data for around 2 years of operations during which the prices of the products have changed. Now I'm wondering how I would implement such a model (maybe even in the tidymodels framework?). Any help is much appreciated. Thanks a lot.

Run your code and add these actions to your business:

df <- df %>%
mutate(offer = case_when(
item_price == 1 ~ "Offer Mints",
item_price >= 2 & total <= 3 ~ "Chocolate",
total >=1 ~ "Offer a hug",
total >6 ~ "Gift a Mints"))

df <- df %>%
mutate(as_the_demand = case_when(
item == "espresso" ~ "ask why preffer espreso",
item == "tea" ~ "invite answer the form"))


I hope this helps!

Now I have time for a linear regression for your business, I hope this helps!

cor(df$total, df$item_price, method = c("pearson")) # I got 0.2886805 no so far at least!

linear model of total according to bevarage.

li <- lm(total ~ item_price, data=df)

data frame with the variable you want to plot against item_price
predic <- data.frame(yourprediction = predict(li, df), itemprice=df$item_price)

The predicted line of linear regression
ggplot(data = df, aes(x = total, y = item_price)) +
geom_point(position = "jitter", aes(color = item))+
geom_line(color='red',data = predic, aes(x=yourprediction, y=itemprice))+
ylab("This is your item price")+
xlab("This is how your Total increments")+
ggtitle("Red Regression Line", subtitle = "item_price is red line, that increments")+

In conclusion, the total will increment, if you increment the price, but your total don´t increment too much as x_axis shows

Thanks for your response. I am interested in the estimates for own and cross price elasticity. I think your post does not really answer this question. The dataframe is also just a small example of how the data looks like.

I'm not an expert in mixed/hierarchical models, but it seems like a good idea to accomplish the clustering of different items.

I would start with something like this.

Keep in mind though that elasticity is not the same as slope as far as I understand it.

lmer(quantity ~ (item_price | item), data = df) %>% coef()

Others have made some helpful suggestions, but let me add two points:

(1) As you probably know, estimation of elasticities is typically done with a log-log specification.
(2) Estimation of demand elasticities is the classical example of where a least squares regression doesn't work. The issue is that the prices are likely correlated with the error term.


Thanks! With regard to (2) could you direct me to some literature dealing with this problem and what might be alternative regression methods?

Here is one link, Pretty much any econometrics text will discuss this.

The solution is to use instrumental variables--which is not always easy.

1 Like

Maybe this can help as an inspiration Food for Regression: Using Sales Data to Identify Price Elasticity - STATWORX
In your model try to track the market effect, e.g. at a given point in time what were average prices of competitors in the same category? Try to include lagged effects, e.g. past week sales may influence current sales (by the way, you may want to build a separate model for such kind of effects.) As you see your task is strongly connected to time, so pay attention in proper implementation of time effects in your model.
So called brand effect or halo effect should also be included in your model - how well a particular brand is recognized by customers etc., how sales of cappuchino affects sales of macchiato etc. You may want to have it as a result, but carefull examination of the data may give you hints what should be the shape of your eventual model.
And before you start modeling, take a close look at basic statistics of your data and look at it from different angles - plot sales curves (quantities and prices) against time axis. At this stage you may notice some crucial relationships.
And a general remark - these types of problems are not easy to solve and sometimes you are able to build a good quality model explaining the past, but with weak power to predict the future. Your data is also not very reach, only few predictors. Try some feature engineering to extend this set.
And as one of collegues said, you should work with log-log models.



Interesting. Maybe this is an easier approach (correcting for autocorrelation).

Unfortunately this does nothing about the problem of simultaneous equations bias, which is almost always an issue when estimating a demand elasticity.

1 Like

I appreciate your insight on this. I have an engineering background but am sometimes asked to advice problems related to pricing and production.

Is there a typical selection for exogenous variables in application to pricing? Perhaps a variable to account for inflation and another for market size?

How much of a concern is weak instruments? Wonder if the bias introduced by weak instruments might be more than the bias introduced by confounding of demand and price.

Great questions @arthur.t , although the answers aren't always simple.

As you suggest, you would want to account for inflation and market size. While there is no single way to do this, sometimes scaling all the variables makes sense. One usually wants to deflate all dollar figures. So instead of dollars use dollars divided by a price index such as the consumer price index.

Market size is harder. Sometimes one divides quantities by market size (perhaps population, perhaps something else). This makes sense for modeling individual behavior, but might miss out on what happens when the whole market grows. So one might scale by market size and then include a separate market size variable.

Weak instruments aren't likely to be worse than least squares, but if they are really weak they won't be much better either. (I can say this because together with a colleague, Charles Nelson, I'm the person who discovered the problem of weak instruments.)

1 Like