Fill the outliers with NAs.

korkut_keles · January 27, 2023, 12:35pm

My code is as follows; however, it does not change anything in my dataset:

# create a vector of outliers for the numeric factor
outliers <- boxplot(df$SupDem, plot = FALSE)$out

df_a[df_a$SupDem %in% outliers, "SupDem"] = NA

Also, I am not sure what the "SupDem" in df_a[df_a$SupDem %in% outliers, "SupDem"] = NA stands for.

pieterjanvc · January 27, 2023, 1:07pm

Hi,

The first line of code is finding the set of outliers (as described by a boxplot) in the "SupDem" column of your dataframe.

The second line will change all these outliers into NA, so effectively remove them.

If there are no outliers in the plot, nothing will change (you can check that by setting plot = TRUE in the boxplot).
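
To make the two lines concrete, here is a minimal sketch on a made-up data frame (the name `toy` and its values are assumptions for illustration, not the poster's data). In `df_a[rows, "SupDem"]`, the first index selects the rows whose `SupDem` value is among the outliers, and the second names the column to overwrite. Note also that the posted code computes the outliers from `df` but assigns into `df_a`; if those two objects hold different values, `%in%` may match nothing, which could be why the dataset appears unchanged.

```r
# Hypothetical toy data; 100 lies far outside the rest of the values
toy <- data.frame(id = 1:8, SupDem = c(2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 100))

# Points beyond the whiskers (default: 1.5 * IQR from the box), as boxplot() reports them
outliers <- boxplot(toy$SupDem, plot = FALSE)$out

# Row index: rows whose SupDem is an outlier; column index: "SupDem"
toy[toy$SupDem %in% outliers, "SupDem"] <- NA

sum(is.na(toy$SupDem))  # number of values replaced
```

Computing the outliers and doing the replacement on the same data frame keeps the `%in%` match reliable.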

Hope this helps,
PJ

korkut_keles · January 27, 2023, 1:58pm

Many thanks for your helpful comment. It has been very useful. This outlier analysis is based on a single variable, but what if I am interested in outliers when the dependent and independent variables are considered together? For example, I have a model in which SupDem is my dependent variable and distance is my independent variable. When I plot them with the following code:

ggplot(df_a, aes(x = distance, y = SupDem)) +
  geom_point(shape = 1, color = "orange", stroke = 0.70, size = 1) +
  geom_smooth(method = lm, se = FALSE, size = 0.70)

There may be outliers. How could I find them and replace them with NAs?
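
One common way to flag outliers relative to a fitted line, rather than one variable at a time, is to look at the residuals of the regression. The sketch below rests on an assumption about what "outlier" should mean here and is not the only definition: it fits `lm(SupDem ~ distance)` on simulated data (the `sim` data frame and the cutoff of 3 are illustrative choices, not taken from this thread) and sets `SupDem` to NA wherever the absolute standardized residual exceeds the cutoff.

```r
set.seed(1)
# Simulated stand-in for df_a: a linear trend plus noise
sim <- data.frame(distance = 1:50)
sim$SupDem <- 0.5 * sim$distance + rnorm(50)
sim$SupDem[10] <- 60  # plant one point far from the trend

fit <- lm(SupDem ~ distance, data = sim)

# Standardized residuals; |value| > 3 is a conventional but arbitrary cutoff
flag <- abs(rstandard(fit)) > 3

sim$SupDem[flag] <- NA
which(flag)  # rows that were set to NA
```

If influence on the fitted line matters more than vertical distance from it, `cooks.distance(fit)` could be used in place of `rstandard(fit)`, again with a cutoff chosen by judgment.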
