Labelling outliers with row names in a boxplot

I want to label the outliers in a box plot. I am using factoextra.

I tried the solution "To label the outliers with rownames" (based on JasonAizkalns's answer) from this post: Labeling Outliers of Boxplots in R.

Here is a reprex of my problem:

library(factoextra)
#> Loading required package: ggplot2
#> Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at 
library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(reprex)
#> Warning: package 'reprex' was built under R version 3.5.3

data("ToothGrowth")
ToothGrowth$dose<-as.factor(ToothGrowth$dose)

is_outlier<-function(x){
  return(x<quantile(x,0.25)-1.5*IQR(x)|x>quantile(x,0.75)+1.5*IQR(x))
}

dat<-ToothGrowth%>%tibble::rownames_to_column(var="outlier")%>%group_by(dose)%>%mutate(is_outlier=ifelse(is_outlier(len),len,as.numeric(NA)))
dat$outlier[which(is.na(dat$is_outlier))]<-as.numeric(NA)

ggplot(ToothGrowth,aes(x=dose,y=len,fill=supp))+geom_boxplot()+
  theme_classic()+
  stat_summary(fun.y=mean, geom="point", shape=10, size=4,position = position_dodge(0.75))+
  geom_text(aes(label=outlier),na.rm=TRUE,nudge_y = 0.05)  
#> Error in FUN(X[[i]], ...): object 'outlier' not found

Can you help me please?

Thanks in advance, Bérangère

If you change the data argument in ggplot() from ToothGrowth to dat, ggplot2 will find the outlier column, because that column exists only in dat and not in ToothGrowth. Based on the output, you might also want to change group_by(dose) to group_by(dose, supp), so that outliers are computed per box rather than per dose. Cheers!

library(tidyverse)
library(rlang)
library(factoextra)

is_outlier <- function(x) {
  x < quantile(x, 0.25) - (1.5 * IQR(x)) | x > quantile(x, 0.75) + (1.5 * IQR(x))
}

data("ToothGrowth")
tooth_growth <- ToothGrowth %>% 
  tibble::rownames_to_column(var = "outlier") %>% 
  mutate(dose = factor(dose))

dat <- tooth_growth %>% 
  group_by(dose) %>% 
  mutate(outlier1 = if_else(is_outlier(len), len, rlang::na_dbl)) %>% 
  group_by(dose, supp) %>% 
  mutate(outlier2 = if_else(is_outlier(len), len, rlang::na_dbl))

pos <- position_dodge(0.75)
ggplot(dat, aes(x = dose, y = len, fill = supp)) + 
  geom_boxplot() +
  # `fun.y` was deprecated in ggplot2 3.3.0 in favour of `fun`; keep `fun.y` on older versions
  stat_summary(fun = mean, geom = "point", shape = 10, size = 4, position = pos) +
  geom_text(aes(label = outlier1), color = 'red', 
            na.rm = TRUE, 
            nudge_y = 0.05) +
  geom_text(aes(label = outlier2), 
            na.rm = TRUE, 
            position = pos, 
            vjust = 0) +
  theme_classic()

Created on 2019-03-27 by the reprex package (v0.2.1)
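
To see why the grouping matters, here is a minimal sketch that counts flagged points under each grouping. It reuses the is_outlier() helper from the thread; the names by_dose and by_dose_supp are only illustrative, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Same rule as in the thread: beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

data("ToothGrowth")

# Outliers relative to each dose as a whole (one group per dose)
by_dose <- ToothGrowth %>%
  group_by(dose) %>%
  summarise(n_outliers = sum(is_outlier(len)))

# Outliers relative to each dose/supp combination (one group per box in the plot)
by_dose_supp <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(n_outliers = sum(is_outlier(len)), .groups = "drop")

print(by_dose)
print(by_dose_supp)
```

Since geom_boxplot() draws one box per dose/supp combination when fill = supp, the second grouping should be the one that lines up with the points the plot marks as outliers.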

Thanks a lot! It also works with my data!
