I am trying to eliminate the categorical outliers and influential data points in the infmort dataset from the faraway package. Every time I try to use Cook's distance to get rid of them, I wipe out all of my data. The outliers are Saudi Arabia, Afghanistan, and Nigeria. Any assistance in this matter is greatly appreciated.
# don't run this bit of your code
# IMR <- IMR[-influential,]
library(tidyverse)
# get Cook's distance values (a named vector, one value per fitted row)
cook_outlier <- cooks.distance(lm.1)
# put them in a data frame with columns `name` and `value`
cooks <- enframe(cook_outlier) %>%
  mutate(name = str_trim(name)) # had to clean up name column
# original data with the Cook's distance values joined on by country name
joined <- IMR %>%
  rownames_to_column(var = "name") %>%
  mutate(name = str_trim(name)) %>% # had to clean up name column
  left_join(cooks, by = "name")
# keep only rows with Cook's distance below 1 (influential points removed)
joined %>%
  filter(value < 1)
This produces a data frame with the Cook's distance in the value column:
> head(joined)
       name   region income mortality            oil        value
1 Australia     Asia   3426      26.7 no oil exports 8.322499e-03
2   Austria   Europe   3350      23.7 no oil exports 6.532994e-05
3   Belgium   Europe   3346      17.0 no oil exports 7.182426e-07
4    Canada Americas   4751      16.8 no oil exports 9.367508e-04
5   Denmark   Europe   5029      13.5 no oil exports 7.037544e-05
6   Finland   Europe   3312      10.1 no oil exports 1.047359e-04
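If you would rather stay in base R, here is a minimal sketch of the same idea. It runs on a toy data frame with invented values, and `toy`, `fit`, `flagged`, and `kept` are placeholder names rather than objects from your session. It selects rows with a logical index built from row names instead of `-influential`. I can't see how `influential` was built, but a negative index is fragile: if `influential` ends up as an empty integer vector, `IMR[-influential, ]` returns zero rows.

```r
# Toy data (invented values): a perfect line y = 2x with one corrupted point.
toy <- data.frame(
  x = 1:10,
  y = 2 * (1:10),
  row.names = paste0("country", 1:10)  # hypothetical row names
)
toy$y[10] <- 100

fit <- lm(y ~ x, data = toy)

# Named numeric vector: one Cook's distance per row used in the fit.
d <- cooks.distance(fit)

# Row names whose Cook's distance exceeds the cutoff of 1 used above.
flagged <- names(d)[d > 1]

# Logical indexing keeps every row that was not flagged; unlike a
# negative index, it still works when nothing is flagged.
kept <- toy[!(rownames(toy) %in% flagged), ]

# Refit on the reduced data.
refit <- lm(y ~ x, data = kept)
```

On your data the equivalent would be `d <- cooks.distance(lm.1)` followed by `IMR[!(rownames(IMR) %in% names(d)[d > 1]), ]`, assuming the row names of `IMR` match the names on `d`. If they carry stray whitespace, as they did above, clean both with `trimws()` first.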