# How to calculate McLoone Index?

The McLoone Index divides the sum of all observations below the median by the product of the median and the number of observations below the median.

Example:

number of employees --- salary

2 --- 1000
4 --- 200
6 --- 100
6 --- 60
8 --- 45
12 --- 24

In this example, the sum of the 19 observations below the median is 603, and the median is 45. Thus, the McLoone Index = 603/(45 × 19) = 0.7053.
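The arithmetic above can be checked with a short R sketch. It assumes that "below the median" means the lower half of the sorted observations (19 of the 38 values, seven of which equal the median), since that is the reading that gives 603; the variable names are illustrative only.

```r
# Expand the frequency table into one salary value per employee.
salary <- c(1000, 200, 100, 60, 45, 24)
n_emp  <- c(2, 4, 6, 6, 8, 12)
dat <- rep(salary, times = n_emp)

med <- median(dat)

# Assumption: the lower group is the lower half of the sorted data,
# which here includes some values equal to the median.
half  <- floor(length(dat) / 2)
lower <- sort(dat)[seq_len(half)]

tot   <- sum(lower)
index <- tot / (med * half)
round(index, 4)
```

With this reading, `med` is 45, `half` is 19, `tot` is 603, and the rounded index is 0.7053, matching the figures above.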

Hi, and welcome!

The key to getting more and better answers lies in the FAQ: What's a reproducible example (`reprex`) and how do I do one?

For your problem, at a minimum it should look like

```r
# One value per employee: 2 at 1000, 4 at 200, 6 at 100, 6 at 60, 8 at 45, 12 at 24
dat <- c(rep(1000, 2), rep(200, 4), rep(100, 6), rep(60, 6), rep(45, 8), rep(24, 12))
```

Created on 2020-02-13 by the reprex package (v0.3.0)

and, ideally, what you have tried.

Also, if applicable, see FAQ: Homework Policy

So, to start you need the median.

```r
med <- median(dat)
```

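Next, you need to subset for all observations below the median (assuming that you don't mean at or below the median). Note that the figures in the question (603 over 19 observations) correspond to the lower half of the sorted data, which includes seven of the values equal to the median, so a strict `<` comparison will not reproduce 0.7053.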

```r
lower <- dat[dat < med]
```

The sum is easy enough

```r
tot <- sum(lower)
```

as is the number of observations

```r
# Number of observations below the median
obs <- length(lower)
```

And with that, the `/` operator will give you the final result: `tot` divided by `med * obs`.
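Putting the steps above together, a minimal sketch might look like the following. The function name `mcloone` and its `strict` argument are hypothetical, introduced only for illustration. With `strict = TRUE` it follows the `dat < med` subset used above; with `strict = FALSE` it assumes the lower half of the sorted values is meant, which is the reading that reproduces the 0.7053 figure in the question.

```r
# Hypothetical helper: McLoone Index of a numeric vector.
# strict = TRUE  : use only values strictly below the median (as in dat < med).
# strict = FALSE : assumption, use the lower half of the sorted values instead.
mcloone <- function(x, strict = TRUE) {
  x <- x[!is.na(x)]
  med <- median(x)
  if (strict) {
    lower <- x[x < med]
  } else {
    lower <- sort(x)[seq_len(floor(length(x) / 2))]
  }
  # Sum of the lower group divided by (median * size of the lower group).
  sum(lower) / (med * length(lower))
}

# Salary data from the question, one value per employee.
dat <- rep(c(1000, 200, 100, 60, 45, 24), times = c(2, 4, 6, 6, 8, 12))

mcloone(dat)                  # strictly below the median, about 0.533
mcloone(dat, strict = FALSE)  # lower half of the sorted data, about 0.705
```

The two readings differ here because eight observations are tied at the median, so it is worth deciding which definition your source uses before comparing results.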


@technocrat Thank you for your help, but I actually have a large database of educational information, and my difficulty is how to calculate this index separately for each state. I was using the `aggregate` function to calculate other indexes by state, but I couldn't get it to work with this one. I think the example below illustrates my problem better. The database is called `fundeb` and the variables are `StateCode` and `RevenuePerStudent`.

StateCode --- RevenuePerStudent

1 --- 100
1 --- 100
1 --- 110
1 --- 130
1 --- 150
2 --- 200
2 --- 200
2 --- 205
2 --- 210
2 --- 230
3 --- 250
3 --- 250
3 --- 260
3 --- 280
3 --- 290
4 --- 307
4 --- 324
4 --- 320
4 --- 334
4 --- 350

You should consider wrapping technocrat's solution into a function that can receive a data frame and calculate the statistic on it.
Then you could cut up your master data frame into a list of data frames, one per group, and use the purrr package's `map` function to call your statistic calculation on each group's data frame.

Well that's the high-level tactic of one possible approach.
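As a rough sketch of that tactic on the example data from the question: the helper name `mcloone` is hypothetical, and it follows the strictly-below-the-median steps from the earlier reply. Base R's `split()` and `vapply()` are used here instead of purrr to keep the example dependency-free; `purrr::map_dbl(by_state, mcloone)` would be the purrr equivalent of the `vapply()` call.

```r
# Example data from the question.
fundeb <- data.frame(
  StateCode = rep(1:4, each = 5),
  RevenuePerStudent = c(100, 100, 110, 130, 150,
                        200, 200, 205, 210, 230,
                        250, 250, 260, 280, 290,
                        307, 324, 320, 334, 350)
)

# Hypothetical helper wrapping the steps from the earlier reply
# (values strictly below the median).
mcloone <- function(x) {
  med <- median(x)
  lower <- x[x < med]
  sum(lower) / (med * length(lower))
}

# Split the revenue column into one vector per state, then apply the helper.
by_state <- split(fundeb$RevenuePerStudent, fundeb$StateCode)
result <- vapply(by_state, mcloone, numeric(1))
result

# The same calculation with aggregate(), which the question mentions.
agg <- aggregate(RevenuePerStudent ~ StateCode, data = fundeb, FUN = mcloone)
agg
```

Because the helper takes a plain numeric vector and returns a single number, it can be passed directly as `FUN` to `aggregate()`, so the existing by-state workflow may not need to change.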

:Edited my purrr typo, thanks technocrat:


`purrr` (I hate it when my fingers do that!)

Can we get a `reprex` for that?
