Avoiding false discoveries when running many p-value tests


I am currently reading the book Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques.

One of the book's use cases recommends using Benford's Law to detect anomalous values in a distribution that might suggest someone made up the numbers. This could be applied to things like tax returns or "lists of socioeconomic data submitted in support of public planning decision".

There seems to be an R package that runs a hypothesis test comparing two distributions, to determine whether the distribution you are looking at matches the Benford distribution.
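The R package's own functions are not reproduced here. As a rough, language-neutral illustration of what such a test does, here is a minimal sketch of a chi-square goodness-of-fit test of leading digits against Benford's Law, P(d) = log10(1 + 1/d). The names `BENFORD_PROBS`, `first_digit` and `benford_chisq_test` are hypothetical. The sketch assumes positive or negative non-zero amounts (zeros are skipped) and uses the closed-form chi-square tail for 8 degrees of freedom (9 digits minus 1). The chi-square approximation is usually considered reliable when every expected count is at least about 5. Digit 9 has the smallest Benford probability (about 4.6%), so by that rule of thumb a company needs roughly 110 or more amounts.

```python
import math
from collections import Counter

# Benford's Law: P(leading digit = d) = log10(1 + 1/d), for d = 1, ..., 9
BENFORD_PROBS = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Return the leading (first non-zero) digit of a non-zero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scale x into [1, 10); its integer part is then the leading digit
    scaled = x / 10 ** math.floor(math.log10(x))
    # Correct rare floating-point rounding right at powers of ten
    if scaled < 1:
        scaled *= 10
    elif scaled >= 10:
        scaled /= 10
    return int(scaled)


def benford_chisq_test(amounts):
    """Chi-square goodness-of-fit test of leading digits against Benford's Law.

    Returns (statistic, p_value). Zeros are skipped because they have no
    leading digit. Degrees of freedom = 9 digits - 1 = 8.
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)  # missing digits count as 0
    stat = sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD_PROBS.items()
    )
    # Chi-square survival function for even df = 2k has the closed form
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!; here df = 8, so k = 4
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
    return stat, p_value
```

For each company you would call `benford_chisq_test` on all of its line-item amounts and keep the returned p-value. Amounts spread evenly on a log scale (e.g. `10 ** (i / 1000)` for `i` in 0..999) conform closely to Benford and give a p-value near 1. Amounts whose leading digits are spread uniformly over 1 to 9 give a p-value very close to 0.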

Take tax returns per company as an example. Say, for argument's sake, that I have 1000 companies and 1000 tax returns, one per company, each containing many income figures for items sold. Say I also have plenty of other categorical and numerical data on these companies. I want to predict fraudulent returns using a model, and I am interested in using the Benford test to highlight anomalous distributions in each company's returns. If I get a significant p-value, I simply add the result to my training set as a binary (1/0) column called anomalous_distribution.

My question is about multiple comparisons. I am effectively running 1000 of these tests, and at a 0.05 p-value cutoff that means I am going to get many false positives. Is there a way to implement these tests that avoids this problem, or at least mitigates it somewhat?
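To put a number on the concern, here is a rough sketch. It assumes that every one of the 1000 companies is honest (all null hypotheses true) and that the tests are independent. Both are simplifying assumptions. Under a true null, a continuous p-value is uniform on (0, 1), so each test still rejects with probability 0.05. The Benford chi-square p-value is only approximately uniform because digit counts are discrete. The expected number of false positives is 1000 × 0.05 = 50. The chance of at least one false positive is 1 - 0.95^1000, which is essentially 1.

```python
import random

ALPHA = 0.05
N_TESTS = 1000  # one test per company, as in the question

# If every company is honest (all nulls true), each test still rejects with prob ALPHA
expected_false_positives = N_TESTS * ALPHA
# Assumes independent tests; this is the family-wise error rate with no correction
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

# Quick simulation: under a true null, a p-value is (approximately) Uniform(0, 1)
rng = random.Random(42)
null_p_values = [rng.random() for _ in range(N_TESTS)]
n_flagged = sum(p < ALPHA for p in null_p_values)

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"P(at least one false positive): {prob_at_least_one:.6f}")
print(f"simulated flags among {N_TESTS} honest companies: {n_flagged}")
```

Even with no fraud at all, an uncorrected 0.05 cutoff would set anomalous_distribution = 1 for roughly 50 companies.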

I think this probably applies to other domains apart from fraud, namely any setting with hypothesis tests across lots of groups, but I just need a pointer in the right direction.

Thanks very much for your time, and I hope you have a lovely weekend.


What you want is called the Bonferroni correction. It was created for exactly this problem: it lowers the p-value threshold a test must reach when you run many tests at once.

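In short, you divide your significance level (e.g. 0.05) by the number of tests, and that becomes your new cut-off for rejecting the null hypothesis. With 1000 tests at 0.05, you would only reject when the p-value is below 0.05 / 1000 = 0.00005. As you will see, it is a very conservative correction, and it can increase your chances of committing a Type II error (i.e. failing to reject H0 when it is actually false). Bonferroni is one of the best-known corrections. If you look at the literature, you will find less conservative ones that are more commonly used for problems like this. One example is the Benjamini-Hochberg procedure, which controls the false discovery rate (the expected proportion of flagged results that are false) rather than the chance of any false positive at all.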
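As a minimal sketch, here is the Bonferroni rule next to the Benjamini-Hochberg step-up procedure. The latter is named here as a swapped-in example of a less conservative alternative, not as the only option. Both produce 0/1 flags that could feed the anomalous_distribution column. The function names are hypothetical, and the ten p-values are made up purely for illustration. Benjamini-Hochberg is usually stated to be valid under independence or certain kinds of positive dependence between tests. In R, base `p.adjust(p, method = "bonferroni")` or `p.adjust(p, method = "BH")` returns adjusted p-values; comparing those to 0.05 is equivalent to the rules below.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank  # keep the largest qualifying rank (step-up)
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values for 10 companies (illustration only, not real data)
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [int(p <= 0.05) for p in p_values]
bh = [int(r) for r in benjamini_hochberg_reject(p_values)]
bonf = [int(r) for r in bonferroni_reject(p_values)]

print(sum(naive), sum(bh), sum(bonf))  # 5 2 1
```

With these made-up p-values, the uncorrected cutoff flags 5 companies, Benjamini-Hochberg flags 2, and Bonferroni flags only 1. The gap between the two corrections widens as the number of tests grows, which is why FDR-style procedures are often preferred when screening 1000 groups.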

You can read about it in more detail here:
