Fill the outliers with NAs.

My code is below; however, it does not change anything in my dataset:

# create a vector of outliers for the numeric factor
outliers <- boxplot(df$SupDem, plot = FALSE)$out

df_a[df_a$SupDem %in% outliers, "SupDem"] = NA

Also, I am not sure what "SupDem" stands for in df_a[df_a$SupDem %in% outliers, "SupDem"] = NA.

Hi,

The first line of code is finding the set of outliers (as described by a boxplot) in the "SupDem" column of your dataframe.

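The second line will change all these outliers into NA, so it effectively removes them. The part before the comma picks the rows whose "SupDem" value is one of the outliers, and the "SupDem" after the comma names the column that gets overwritten.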

If there are no outliers in the plot, nothing will change (you can check that by setting plot = TRUE in the boxplot).
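
As a minimal sketch of what those two lines do, here is a small made-up data frame (the values are invented for illustration, not taken from your data). Note that the outliers are computed from, and written back to, the same data frame; in the question the vector comes from df while the assignment targets df_a, which may matter if the two differ.

```r
# Hypothetical toy data: 100 sits far away from the other values
df_a <- data.frame(SupDem   = c(1, 2, 3, 4, 5, 100),
                   distance = c(10, 20, 30, 40, 50, 60))

# Values beyond the boxplot whiskers (1.5 * IQR rule by default)
outliers <- boxplot(df_a$SupDem, plot = FALSE)$out

# Rows where SupDem is an outlier; the "SupDem" after the comma
# names the column that gets overwritten with NA
df_a[df_a$SupDem %in% outliers, "SupDem"] <- NA

# Number of values that were replaced
sum(is.na(df_a$SupDem))
```

If length(outliers) is 0, the assignment matches no rows and the data frame stays as it was.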

Hope this helps,
PJ

Many thanks for your helpful comment. It has been very useful. This outlier analysis is based on one variable, but what if I am interested in outliers when the dependent and independent variables are considered together? For example, I have a model in which SupDem is the dependent variable and distance is the independent variable. When I plot them with the following code:

ggplot(df_a, aes(x = distance, y = SupDem)) +
  geom_point(shape = 1, color = "orange", stroke = 0.70, size = 1) +
  geom_smooth(method = lm, se = FALSE, size = 0.70)

There may be outliers. How could I find them and fill them with NAs?
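
One possible approach, sketched here under assumptions rather than as a definitive answer, is to fit the same linear model that geom_smooth(method = lm) draws and flag points whose studentized residual is large. The toy data and the cutoff of 3 are assumptions for illustration; the cutoff is a common rule of thumb, not a fixed standard.

```r
# Hypothetical toy data: SupDem is roughly linear in distance,
# with one point placed far off the line
set.seed(1)
df_a <- data.frame(distance = 1:30)
df_a$SupDem <- 2 * df_a$distance + rnorm(30)
df_a$SupDem[15] <- 200

# na.exclude keeps the residual vector aligned with the rows of df_a
# even if some rows contain missing values
fit <- lm(SupDem ~ distance, data = df_a, na.action = na.exclude)

# Studentized (leave-one-out) residuals from the fitted line
res <- rstudent(fit)

# Rows whose residual exceeds the assumed cutoff of 3
flagged <- which(abs(res) > 3)

# Replace the flagged responses with NA
df_a$SupDem[flagged] <- NA
```

Setting the flagged values to NA rather than dropping the rows keeps the data frame the same size, which matches what the boxplot-based code above does for a single variable.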
