How to remove outliers from a scatter plot

Good afternoon, all.
Given the data below, I would like to know the best way to check for outliers in a scatter plot without using the ggplotly function from plotly. Thanks.

Load Required Packages

library(tidyverse)
library(ggpmisc)
library(plotly)
library(ggrepel)

Load the Data

LeafArea <- tibble::tribble(
  ~Planting,  ~Variety, ~Inoculation,      ~Fertilizer, ~Plant, ~leaf, ~mid.lobe.length, ~mid.lobe.width,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     1L,    1L,             15.4,             4.6,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     1L,    2L,             12.3,             3.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     2L,    1L,             15.6,             4.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     2L,    2L,               18,             6.5,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     3L,    3L,               44,             6.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     3L,    4L,             17.1,             6.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     4L,    3L,             14.9,             6.9,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     4L,    4L,             21.5,             7.2,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     5L,    5L,             20.2,             7.1,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     5L,    6L,             21.5,             6.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     6L,    5L,             22.5,             7.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     6L,    6L,             18.7,             6.5,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     7L,    7L,             17.2,             6.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     7L,    8L,             20.8,             6.4,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     8L,    7L,               60,              18,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     8L,    8L,               18,             7.1,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     9L,    9L,               16,             6.2,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",     9L,   10L,             15.5,             4.7,
      "May", "TME 419",         "No", "0 kg P2O5 ha-1",    10L,    9L,             11.8,             3.4
  )

Plot

p1 <- ggplot(data = LeafArea, aes(x = mid.lobe.length, y = mid.lobe.width)) +
  stat_poly_line(fullrange = TRUE) +             # fitted regression line
  stat_poly_eq(use_label(c("eq", "R2", "P"))) +  # equation, R2 and P-value label
  # stat_poly_eq(label.y = 0.8) +
  geom_point() +
  labs(y = "Actual Mid-lobe Width (cm)", x = "Mid-lobe Length (cm)") +
  # coord_cartesian(xlim = c(0, 1)) +
  theme_test()

p1

Checking for Outliers

ggplotly(p1)

I think the basic procedure is to eyeball the graph and then manually delete the outliers in your tibble or data.frame.
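
If eyeballing feels too subjective, the flagging step can also be done in code rather than with ggplotly. Below is a minimal sketch that uses the 1.5 * IQR rule on each axis. That rule is an assumption on my part, not something from the original post, and the helper name is_iqr_outlier is hypothetical. Only the two numeric columns of LeafArea are copied so the block runs on its own.

```r
# Minimal sketch: flag outliers with the 1.5 * IQR rule (an assumed choice).
# Only the two numeric columns of LeafArea are copied so this runs standalone.
leaf <- data.frame(
  mid.lobe.length = c(15.4, 12.3, 15.6, 18, 44, 17.1, 14.9, 21.5, 20.2, 21.5,
                      22.5, 18.7, 17.2, 20.8, 60, 18, 16, 15.5, 11.8),
  mid.lobe.width  = c(4.6, 3.7, 4.7, 6.5, 6.7, 6.7, 6.9, 7.2, 7.1, 6.7,
                      7.7, 6.5, 6.7, 6.4, 18, 7.1, 6.2, 4.7, 3.4)
)

# Hypothetical helper: TRUE where a value lies below Q1 - k * IQR
# or above Q3 + k * IQR.
is_iqr_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  unname(x < q[1] - fence | x > q[2] + fence)
}

# A row is flagged if either variable falls outside its fences.
leaf$outlier <- is_iqr_outlier(leaf$mid.lobe.length) |
  is_iqr_outlier(leaf$mid.lobe.width)

# Inspect the flagged rows before deciding whether to drop them.
print(leaf[leaf$outlier, ])

# Keep only the rows that were not flagged.
leaf_clean <- leaf[!leaf$outlier, ]
```

On this data the rule flags the two leaves with mid-lobe lengths of 44 and 60. To redraw the plot without them, apply the same filter to LeafArea and pass the result as the data argument of the ggplot call. Whether a flagged point is a measurement error or a genuine observation is still a judgment call, so check the flagged rows before deleting anything.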
