Problem removing the outlier from ggplot

Hi all,

I'm running the code below:

library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

snp_rs9303277 <- as.factor(age16_RV_SNP_Rawdata$rs9303277_C)
levels(snp_rs9303277) <- c("TT", "CT", "CC")
p1 <- ggplot(aes(x = as.factor(rs9303277_C), y = IFN_beta_RV1B), data = age16_RV_SNP_Rawdata) +
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12),
        axis.title = element_text(size = 16)) +
  geom_boxplot(outlier.size = -1) +
  xlab("rs9303277_C") + ylab("pg/ml") + ggtitle("RV1B IFN-beta age16") +
  scale_x_discrete(labels = c("0" = "TT", "1" = "CT", "2" = "CC"))
A1 <- p1 + stat_compare_means(label.y = 30) +
  stat_compare_means(comparisons = my_comperisons) +
  geom_jitter(position = position_jitter(0.15), aes(color = snp_rs9303277))
A1

My plot looks like this:

My data is from 0 to around 20 units, but I have one outlier with 38 units.
Can you please let me know how to remove the outlier?

Thank you,
Eteri

Hi @ebakhsol. geom_jitter has no outlier argument; it draws every point in the data. If you really want to remove the data point, filter the data with filter(age16_RV_SNP_Rawdata, IFN_beta_RV1B < 20) before plotting.
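
As a rough illustration of the threshold filter, here is a minimal base-R sketch. The data frame toy and its values are made up and only stand in for age16_RV_SNP_Rawdata. With dplyr loaded, filter(toy, IFN_beta_RV1B < 20) should behave the same way on this data.

```r
# Hypothetical stand-in for age16_RV_SNP_Rawdata (values invented for illustration).
toy <- data.frame(
  rs9303277_C   = c(0, 1, 2, 1, 0),
  IFN_beta_RV1B = c(4.2, 11.8, 38.0, NA, 19.5)
)

# Keep only rows below the cut-off of 20 units.
# Rows where the comparison is NA are dropped as well.
kept <- subset(toy, IFN_beta_RV1B < 20)
kept
```

Note that the row with a missing measurement disappears along with the 38-unit point, which may or may not be what you want.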

This is a good solution for this specific, simple case, but in general you may want to identify outliers using a known method. You could define your own outlier function and filter the data with something like this:

library(dplyr)

is_outlier <- function(x) {
    return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

age16_RV_SNP_Rawdata %>% 
    group_by(rs9303277_C) %>% 
    mutate(outlier = is_outlier(IFN_beta_RV1B)) %>% 
    filter(outlier == FALSE)

I got this error message when I applied the code you suggested:

Error in mutate_impl(.data, dots) :
Evaluation error: missing values and NaN's not allowed if 'na.rm' is FALSE.

My IFN_beta_RV1B column has some NA values.
Thanks.

Hi @ebakhsol!

This (and any other) community is not intended for doing your work for you. We can guide you to the solution, but you'll have to put in some effort. After all, the problem is actually yours.

This particular message is quite self-explanatory. You already know that there are some missing values, and it specifically says that you're having problems because na.rm is set to FALSE. Have you tried changing it to TRUE?

Both of the functions that Andres used (quantile and IQR) have an na.rm argument, which is FALSE by default. Try changing it and see what happens. If you're unfamiliar with these functions, read their documentation.

If that does not solve your problem, please provide a REPRoducible EXample (reprex) of your problem.

Dear Yarnabrina,

Yes, I tried to change it to TRUE:
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x), na.rm=TRUE)}

age16_RV_SNP_Rawdata %>%
group_by(rs9303277_C) %>%
mutate(outlier = is_outlier(IFN_beta_RV1B)) %>%
filter(outlier == FALSE)

but I got this error message:
Error in mutate_impl(.data, dots) :
Evaluation error: multi-argument returns are not permitted.

thanks

Once again, the error message is very helpful.

R is telling you that you can't give more than one argument to return. It says so because you supplied na.rm=TRUE to return instead of to the functions that need it, quantile and IQR.

Please go through the documentation of these functions. You're supposed to call them like quantile(x = variable_of_interest, probs = probabilities_of_interest, na.rm = TRUE) and IQR(x = variable_of_interest, na.rm = TRUE), passing na.rm in each call.
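
To make that concrete, here is one possible sketch of the helper with na.rm forwarded to every quantile and IQR call. The na.rm parameter on is_outlier and the toy vector are my own additions for illustration, not something from the original code.

```r
# Flag values outside the 1.5 * IQR fences.
# na.rm is passed on to quantile() and IQR(), not to return().
is_outlier <- function(x, na.rm = TRUE) {
  lower <- quantile(x, probs = 0.25, na.rm = na.rm, names = FALSE) -
    1.5 * IQR(x, na.rm = na.rm)
  upper <- quantile(x, probs = 0.75, na.rm = na.rm, names = FALSE) +
    1.5 * IQR(x, na.rm = na.rm)
  x < lower | x > upper
}

# Made-up values: five ordinary points, one missing value, one extreme point.
toy <- c(1, 2, 3, 4, 5, NA, 38)
flags <- is_outlier(toy)
flags
# FALSE FALSE FALSE FALSE FALSE    NA  TRUE
```

The missing value gets an NA flag rather than FALSE, so a later filter(outlier == FALSE) drops that row too, because dplyr's filter keeps only rows where the condition is TRUE.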

An alternative is to remove all the missing values beforehand, so that you don't need several na.rm arguments.
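
A minimal base-R sketch of that alternative follows. The data frame toy is invented and only mimics the column names in this thread. With dplyr, filter(toy, !is.na(IFN_beta_RV1B)) would be the equivalent step.

```r
# Hypothetical stand-in for age16_RV_SNP_Rawdata (values invented for illustration).
toy <- data.frame(
  rs9303277_C   = c(0, 0, 1, 1, 2, 2),
  IFN_beta_RV1B = c(3.1, NA, 7.4, 12.0, NA, 5.5)
)

# Drop rows with a missing measurement once, up front.
toy_complete <- toy[!is.na(toy$IFN_beta_RV1B), ]

# quantile() and IQR() now run without na.rm because no NA is left.
q1  <- quantile(toy_complete$IFN_beta_RV1B, probs = 0.25)
iqr <- IQR(toy_complete$IFN_beta_RV1B)
```

The trade-off is that the rows with missing values are gone from everything downstream, including the plot and any group counts.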

The outlier function could also be defined this way, passing any extra arguments (such as na.rm) through to quantile and IQR:

is_outlier <- function(x, ...) {
    return(x < quantile(x, 0.25, ...) - 1.5 * IQR(x, ...) | x > quantile(x, 0.75, ...) + 1.5 * IQR(x, ...))
}

age16_RV_SNP_Rawdata %>%
    group_by(rs9303277_C) %>%
    mutate(outlier = is_outlier(IFN_beta_RV1B, na.rm = TRUE)) %>%
    filter(outlier == FALSE)
