How to detect and remove spatial outliers and non-spatial (boxplot) outliers

Greetings to all.

I have the following questions:

As you can see from the boxplot, there are some outliers in my data.

How can I detect and remove them?

Also, how can I detect and remove the spatial outliers as well, so that I can apply spatial interpolation techniques (e.g. kriging)?

Hello,

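A boxplot is built from the quartiles: the box runs from Q1 to Q3, so its height is the interquartile range (IQR = Q3 - Q1), and the whiskers reach at most 1.5*IQR beyond the box.

That means that everything below the lower limit (< Q1 - 1.5*IQR) or above the upper limit (> Q3 + 1.5*IQR) is considered an outlier. With this in mind, we can write a simple function that returns, for a vector of values, whether each one is an outlier: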

set.seed(4) #For reproducibility

#Generate random data with outliers
x = runif(250)
x[sample(1:250, 10)] = c(runif(5, -1.5, -0.5), runif(5, 1, 2))

#Boxplot
boxplot(x)


#Outlier function
outlier = function(x){
  
  #Q1 and Q3 (named q so that stats::IQR is not masked)
  q = quantile(x, probs = c(0.25, 0.75))
  iqr = q[2] - q[1]
  
  #TRUE where a value lies outside the 1.5*IQR limits
  return(x < q[1] - 1.5*iqr | x > q[2] + 1.5*iqr)
}

#list outliers
outlier(x)
#>   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
#>  [13] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [61] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
#>  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [109] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [121] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
#> [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [217]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

#Total outliers (verify on plot)
sum(outlier(x))
#> [1] 8

Created on 2023-01-10 with reprex v2.0.2

You can see that the function detects 8 outliers, which matches what you can see on the plot.
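To actually remove the flagged values, the logical vector can be used as an index. The following is a minimal sketch that recreates the example data so it runs on its own; `x_clean` is just an illustrative name. It also shows `boxplot.stats()`, the base R function that `boxplot()` relies on. Note that `boxplot.stats()` works with hinges rather than `quantile()`, so in edge cases it may flag a slightly different set of values.

```r
set.seed(4) # same seed as in the example above

# Recreate the example data
x = runif(250)
x[sample(1:250, 10)] = c(runif(5, -1.5, -0.5), runif(5, 1, 2))

# Same 1.5*IQR rule as the outlier function above
outlier = function(x){
  q = quantile(x, probs = c(0.25, 0.75))
  iqr = q[2] - q[1]
  x < q[1] - 1.5*iqr | x > q[2] + 1.5*iqr
}

# Keep only the values that are not flagged
x_clean = x[!outlier(x)]
length(x_clean)

# The values that boxplot() itself draws as individual points
boxplot.stats(x)$out

# Re-check the cleaned data: new outliers may appear
sum(outlier(x_clean))
```

For a data frame, the same logical vector can be used to drop whole rows instead of single values.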

NOTE: Removing the outliers and plotting again will change the shape of the boxplot and can create new outliers, because the quartiles are recomputed from the remaining data.
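For the spatial part of the question, one possible heuristic (not the only one, and not a substitute for inspecting the data) is to compare each point with its neighbours: a value that differs strongly from the median of its k nearest neighbours is a candidate spatial outlier, even if it looks normal in the overall boxplot. The sketch below uses base R only. The data frame `pts`, its columns, the planted anomalies, the helper `local_diff` and the choice `k = 8` are all assumptions made for illustration.

```r
set.seed(1) # for reproducibility

# Hypothetical point data: coordinates plus a spatially smooth value
n = 200
pts = data.frame(x = runif(n), y = runif(n))
pts$value = pts$x + pts$y + rnorm(n, sd = 0.05)

# Plant a few local anomalies so there is something to find
planted = c(10, 50, 120)
pts$value[planted] = pts$value[planted] + 1

# Difference between each value and the median of its k nearest neighbours
local_diff = function(coords, value, k = 8){
  d = as.matrix(dist(coords)) # pairwise Euclidean distances
  diag(d) = Inf               # a point is not its own neighbour
  sapply(seq_along(value), function(i){
    nb = order(d[i, ])[1:k]   # indices of the k closest points
    value[i] - median(value[nb])
  })
}

# Boxplot rule (1.5*IQR limits) applied to the local differences
outlier = function(x){
  q = quantile(x, probs = c(0.25, 0.75))
  iqr = q[2] - q[1]
  x < q[1] - 1.5*iqr | x > q[2] + 1.5*iqr
}

pts$spatial_outlier = outlier(local_diff(pts[, c("x", "y")], pts$value))

# Drop the flagged points before interpolating
pts_clean = pts[!pts$spatial_outlier, ]
```

The full distance matrix is fine for a few thousand points; for larger data sets a dedicated nearest-neighbour search would be needed. The flagged points are worth a look before deleting them, since a genuine local extreme can matter for kriging.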

Hope this helps,
PJ
