Categorical and Influential Points

I am trying to remove the categorical outliers and influential data points from the infmort dataset in the faraway package. Every time I try to use Cook's distance to get rid of them, I wipe out all of my data. The outliers are Saudi Arabia, Afghanistan, and Nigeria. Any assistance is greatly appreciated.

library(faraway)
data(infmort)
IMR<-infmort

IMR <- na.omit(IMR)
summary(IMR)

lm.1=lm(mortality~., IMR)
summary(lm.1)
plot(lm.1)

influential <- as.numeric(names(cooks.distance>1) )
IMR <- IMR[-influential,]  # <- this is where the data disappears

This should work:

# don't run this bit of your code
# IMR <- IMR[-influential,] 

library(tidyverse)

# get cooks distance values
cook_outlier <- cooks.distance(lm.1) 

# put in dataframe
cooks <- enframe(cook_outlier) %>% 
  mutate(name = str_trim(name)) # had to clean up name column

# data frame with cooks distance values
joined <- IMR %>%
  rownames_to_column(var="name") %>% 
  mutate(name = str_trim(name)) %>% # had to clean up name column
  left_join(cooks, by = "name")

# outliers removed
joined %>% 
  filter(value < 1)

It produces a data frame with the Cook's distance values:

> head(joined)
       name   region income mortality            oil        value
1 Australia     Asia   3426      26.7 no oil exports 8.322499e-03
2   Austria   Europe   3350      23.7 no oil exports 6.532994e-05
3   Belgium   Europe   3346      17.0 no oil exports 7.182426e-07
4    Canada Americas   4751      16.8 no oil exports 9.367508e-04
5   Denmark   Europe   5029      13.5 no oil exports 7.037544e-05
6   Finland   Europe   3312      10.1 no oil exports 1.047359e-04

Thank you so much! You're a rock star!

@williaml's approach is how I would do it. To understand where your method failed, look at what the comparison returns, using mtcars as a stand-in:

fit = lm(mpg ~., mtcars)
summary(fit)
#> 
#> Call:
#> lm(formula = mpg ~ ., data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.4506 -1.6044 -0.1196  1.2193  4.6271 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) 12.30337   18.71788   0.657   0.5181  
#> cyl         -0.11144    1.04502  -0.107   0.9161  
#> disp         0.01334    0.01786   0.747   0.4635  
#> hp          -0.02148    0.02177  -0.987   0.3350  
#> drat         0.78711    1.63537   0.481   0.6353  
#> wt          -3.71530    1.89441  -1.961   0.0633 .
#> qsec         0.82104    0.73084   1.123   0.2739  
#> vs           0.31776    2.10451   0.151   0.8814  
#> am           2.52023    2.05665   1.225   0.2340  
#> gear         0.65541    1.49326   0.439   0.6652  
#> carb        -0.19942    0.82875  -0.241   0.8122  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.65 on 21 degrees of freedom
#> Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
#> F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07
par(mfrow = c(2,2))
plot(fit)

cooks.distance(fit) > 1
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>               FALSE               FALSE               FALSE               FALSE 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>               FALSE               FALSE               FALSE               FALSE 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>               FALSE               FALSE               FALSE               FALSE 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>               FALSE               FALSE               FALSE               FALSE 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>               FALSE               FALSE               FALSE               FALSE 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>               FALSE               FALSE               FALSE               FALSE 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>               FALSE               FALSE               FALSE               FALSE 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>               FALSE               FALSE               FALSE               FALSE
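
The output above is a named logical vector with one TRUE/FALSE per row, and that is probably where the original approach goes wrong: names() returns every label regardless of the value, and as.numeric() turns text labels into NA. A minimal sketch of that failure and of a base R alternative, again on mtcars; the 4/n cutoff and the object names (cd, wiped, trimmed) are my own assumptions for illustration, not something from the question:

```r
# Sketch on mtcars (a built-in stand-in for the question's IMR data)
fit <- lm(mpg ~ ., data = mtcars)
cd  <- cooks.distance(fit)

# names() returns every label, whether the value is TRUE or FALSE
all_labels <- names(cd > 1)

# Row names are text, so as.numeric() gives NA for each one (with a warning)
bad_index <- suppressWarnings(as.numeric(all_labels))

# Indexing rows with an all-NA vector yields only NA rows: the "disappearing" data
wiped <- mtcars[-bad_index, ]

# Working alternative: pick the labels where the condition holds, then drop them
cutoff <- 4 / nrow(mtcars)  # assumed rule-of-thumb cutoff, for illustration only
influential <- names(cd)[cd > cutoff]
trimmed <- mtcars[!(rownames(mtcars) %in% influential), ]
```

For the data in the question, the same pattern should carry over by swapping in IMR and lm.1 and using a cutoff of 1, assuming the row names of IMR match the names on the Cook's distance vector.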