# Having trouble finding which test to run for significance

I'm having trouble finding which kind of test can show the significance of price on consumption in two different countries. I am comparing the price of oil and consumption in Saudi Arabia and Japan, and trying to prove that a price increase has a bigger effect on consumption in Saudi Arabia than it does in a non-resource-dependent country like Japan.

Sounds to me like you could use a regression model for consumption ~ price * country to see whether there is any effect modification (i.e. whether the effect of price on consumption varies by country).

Something like:

lm(consumption ~ price * country, data = data)
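To make the interaction model concrete, here is a minimal sketch on simulated data. The data frame `oil`, the slopes, and every number in it are invented for illustration; they are not real Japanese or Saudi figures.

```r
# Simulated data: all numbers below are invented for illustration only.
set.seed(42)
n <- 40
oil <- data.frame(
  country = factor(rep(c("Japan", "Saudi Arabia"), each = n)),
  price   = rep(runif(n, min = 40, max = 120), times = 2)
)

# Assumed "true" slopes for the simulation: -0.5 for Japan, -2.0 for Saudi Arabia.
slope <- ifelse(oil$country == "Japan", -0.5, -2.0)
oil$consumption <- 500 + slope * oil$price + rnorm(2 * n, sd = 10)

# price * country expands to price + country + price:country.
fit <- lm(consumption ~ price * country, data = oil)

# The "price:countrySaudi Arabia" row estimates the difference in slopes
# (Saudi Arabia minus Japan, because Japan is the reference level).
coef(summary(fit))
```

The `price` row is the slope for the reference country (Japan here), and the interaction row is how much the slope changes for the other country.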


Hi, and welcome!

A couple of things preliminarily. Most of what we do here is to help solve usage errors illustrated in a reproducible example, called a reprex. Also, there's a homework policy, if that's applicable.

@mattwarkentin sketched an approach. I'd like to add some foundational comments on two important parts of your question that most beginners in classical statistics struggle with: "proof" and "significance."

Classical statistics doesn't prove hypotheses; it either rejects or fails to reject them. In your example, it may be possible to say that a difference between the two countries exists and that the difference isn't, to a greater or lesser degree of confidence, likely to be due simply to chance.

In formal terms, the null hypothesis, conventionally designated H_0, is that any apparent effect in some statistical measure, such as a difference in means, arose simply by chance. The alternative hypothesis, H_1, is that the effect is not simply random. When we reject H_0, we accept H_1; when we fail to reject H_0, we have not shown H_0 to be true, only that the data give no good reason to accept H_1.

The distinction takes on a greater urgency when you consider confirmation bias, the human tendency to look for reasons to prove what we want to believe.

The second foundational point is the importance of understanding the word significance in statistics.

"You keep using that word. I do not think it means what you think it means." (Iñigo Montoya)

Here are a few things it doesn't mean:

• meaningful
• important
• proven

Statistical significance is a statement about how likely it would be to see a result at least as extreme as the one observed if only random variation were at work.

You often hear of some result being described as having a 95% probability or confidence. What this comes down to is that, if there were really no effect, a result this extreme would turn up about one time in 20 as random noise. Technically, the measure is called a p-value, and it is compared against a significance level called \alpha; for the 95% example \alpha = 0.05, so the confidence level is 1 - \alpha = 0.95.

I call this value of \alpha passing the laugh test. Think of four five-shot revolvers lying on the table with a single bullet in one of them. Pick one at random, hold it to your head and pull the trigger for \$1 million? Not a great bet.
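The "one chance in 20" idea can be checked with a small simulation. This is only a sketch: the sample size of 30 and the 2000 repetitions are arbitrary choices, and both variables are pure noise, so the null hypothesis is true by construction.

```r
set.seed(1)
alpha  <- 0.05
n_sims <- 2000

# Each simulation regresses pure noise on pure noise, so H_0 (no association) is true.
p_values <- replicate(n_sims, {
  x <- rnorm(30)
  y <- rnorm(30)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
})

# Share of simulations that look "significant" by chance alone.
false_positive_rate <- mean(p_values < alpha)
false_positive_rate
```

The share should land close to \alpha, which is the sense in which a 0.05 threshold lets noise through roughly one time in 20.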

On to the choice of a statistical design for your problem. Let's focus on one country at a time, say Japan.

We have two variables, the price of oil and consumption.

Price of oil is usually measured by the "barrel" of 42 US gallons, or just short of 159 liters. But, there are different types of crude oil and therefore different benchmarks. You will want to use one that is appropriate to Japan, considering the markets in which it buys crude.

Crude oil is not directly useful for much beyond refining into petrochemical products, so you need to consider what measure of consumption is relevant. As the industrial input? Or as an ultimate consumer product, such as gasoline or heating oil?

Once you've picked the units of measurement for price and consumption, it's time to assign them roles as dependent and independent variables. Conventionally, Y is used for the dependent, and X for the independent variable. Here, consumption is Y and price is X.

So, how are Y and X associated?

A very useful rule of thumb is to start off with the assumption that the association is linear, because so many things are. To restrict @mattwarkentin's model to just Japan:

fit <- lm(Y ~ X, data = prices)


Why just Japan? Because if there's no association between price and consumption in Japan, there's no reason to look into differences with Saudi Arabia.
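As a rough sketch of what the Japan-only check could look like, here is the same `lm(Y ~ X, data = prices)` call run on invented data. The contents of `prices` are simulated, with an assumed slope of -0.5, purely to show where to look in the output.

```r
set.seed(7)

# Hypothetical Japan-only data; 'prices', 'X' and 'Y' follow the names used above.
prices <- data.frame(X = runif(50, min = 40, max = 120))  # price per barrel (invented)
prices$Y <- 300 - 0.5 * prices$X + rnorm(50, sd = 10)     # consumption (invented units)

fit <- lm(Y ~ X, data = prices)

# Estimate, standard error, t value and p-value for the slope on price.
slope_row <- coef(summary(fit))["X", ]
slope_row

# 95% confidence interval for the slope.
confint(fit, "X")
```

If the interval for the slope comfortably excludes zero, there is an association worth comparing across countries.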

Linear regression is easy to run, but its results can be harder to interpret. See my post for an orientation.


Just to add a bit here in case you get to that point. If you run a model like the one @mattwarkentin suggests, you would still need to test a linear inequality constraint on the coefficients (in this case a trivial one: whether the coefficient associated with Saudi Arabia is larger than the one associated with Japan). An interesting package I just came across that can help with this is restriktor (yes, with a "k"). But you need to be careful interpreting the coefficients of a linear regression with an interaction term; there should be plenty of resources to dig into that. I guess the post @technocrat mentions might be useful to that end.
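For the simple two-country case, the inequality can also be checked in base R with a one-sided t-test on the interaction coefficient. Note this is a swapped-in shortcut, not the restriktor workflow, and the data are again simulated with assumed slopes of -0.5 (Japan) and -2.0 (Saudi Arabia).

```r
set.seed(42)
n <- 40
oil <- data.frame(
  country = factor(rep(c("Japan", "Saudi Arabia"), each = n)),
  price   = rep(runif(n, min = 40, max = 120), times = 2)
)
# Invented consumption values with a steeper assumed slope for Saudi Arabia.
oil$consumption <- 500 +
  ifelse(oil$country == "Japan", -0.5, -2.0) * oil$price +
  rnorm(2 * n, sd = 10)

fit <- lm(consumption ~ price * country, data = oil)

# H_0: the two slopes are equal (interaction coefficient = 0).
# H_1: the Saudi Arabia slope is more negative (interaction coefficient < 0).
int_row <- coef(summary(fit))["price:countrySaudi Arabia", ]
p_one_sided <- pt(int_row[["t value"]], df = df.residual(fit), lower.tail = TRUE)
p_one_sided
```

With more than two groups or several constraints at once, a dedicated tool such as restriktor becomes the more natural choice.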


I didn't go as far into lm as interactions, plot interpretation or validation of the required assumptions, so thanks for pointing those out.