How to ignore some of the factor levels in a column


#1

Hello all, my group and I are trying to analyze a data frame in which most columns are factors. One of our columns has six levels: 1, 2, 3, 4, 9999, and blank (" "). We want to plot a bar graph with the code below, but the graph shows a bar for each of 1, 2, 3, 4, 9999, and the blank. We want to leave 9999 and the blank out of the plot. What code can we use? Thanks in advance.

plot(tab_catalog)
barplot(tab_catalog)
barplot(prop.table(tab_catalog))
barplot(prop.table(tab_catalog),col=c("red", "orange"), 
        main="Distribution of Customers by Catalog Years", 
        ylim=c(0,1))
box(lwd=2)

#2

I think you can replace "" and 9999 with NA before plotting. There are several ways to do that, depending on the tools you want to use.

The naniar package has really good tools for dealing with missing values. Among them, you'll find naniar::replace_with_na, which will do the job.
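As a rough sketch of the naniar route, assuming the naniar package is installed: the data frame `customers` and its column `catalog` are made-up stand-ins for the real data, and the blank is assumed to be a single space as in the question.

```r
library(naniar)

# Hypothetical data standing in for the real data frame.
customers <- data.frame(
  catalog = factor(c("1", "2", "9999", " ", "3", "4", "1", "2"))
)

# Replace the unwanted values in the `catalog` column with NA.
cleaned <- replace_with_na(customers,
                           replace = list(catalog = c("9999", " ")))

# The replaced values are now NA, but their levels may still be
# attached to the factor, so drop the unused levels as well.
cleaned$catalog <- droplevels(cleaned$catalog)

table(cleaned$catalog)
```

By default `table()` leaves NA out of the counts, so a bar plot built from this table would only show the remaining levels.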

You can also recode your column (with dplyr's recode, if_else or case_when) to replace the values.
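One possible way to do this with dplyr, assuming the package is installed, is sketched here with if_else. The names `customers` and `catalog` are hypothetical, and the blank is assumed to be a single space.

```r
library(dplyr)

# Hypothetical data standing in for the real data frame.
customers <- data.frame(
  catalog = factor(c("1", "2", "9999", " ", "3", "4", "1", "2"))
)

cleaned <- customers %>%
  mutate(
    # Work on the character values so the comparison is unambiguous.
    catalog = as.character(catalog),
    # Turn the unwanted values into missing values.
    catalog = if_else(catalog %in% c("9999", " "), NA_character_, catalog),
    # Rebuild the factor; NA does not become a level by default.
    catalog = factor(catalog)
  )

levels(cleaned$catalog)
```

Converting to character first and rebuilding the factor at the end means the unwanted levels are not carried along.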

Since you are dealing with factors, you may want more factor-oriented tools. There are dplyr::recode_factor and forcats::fct_recode, which let you remove levels easily. forcats is the toolbox for factors in the tidyverse.
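A minimal sketch of the forcats approach, assuming the package is installed: in fct_recode, naming the new level NULL removes the old level, and values that had it become NA. The vector `catalog` is a made-up example, with the blank assumed to be a single space.

```r
library(forcats)

# Hypothetical factor standing in for the real column.
catalog <- factor(c("1", "2", "9999", " ", "3", "4", "1", "2"))

# Recoding a level to NULL removes it from the factor.
cleaned <- fct_recode(catalog, NULL = "9999", NULL = " ")

levels(cleaned)
table(cleaned)
```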

You can also do it with base R. In the end, it is up to you.
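For the base R route, one way it could look is sketched here. The vector `catalog` is hypothetical, and the blank is assumed to be a single space.

```r
# Hypothetical factor standing in for the real column.
catalog <- factor(c("1", "2", "9999", " ", "3", "4", "1", "2"))

# Mark the unwanted values as missing...
catalog[catalog %in% c("9999", " ")] <- NA

# ...then drop the levels that are no longer used.
catalog <- droplevels(catalog)

levels(catalog)
table(catalog)
```

Both steps are needed: assigning NA changes the values, but the levels "9999" and " " stay attached to the factor (and would show up as empty bars) until droplevels removes them.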

If you can provide a reprex, we can confirm that this is the issue and work through an example for you.

Note: I also edited your post for formatting.


#3

Thank you so much for your help !


#4

If your question's been answered, would you mind choosing a solution? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.

Thanks


#5

Base R has the droplevels function. Its exclude argument turns the listed levels into NA and removes them from the factor:

x <- factor(c(1:4, 9999, " "))
x
# [1] 1    2    3    4    9999     
# Levels:   1 2 3 4 9999
droplevels(x, exclude = c("9999", " "))
# [1] 1    2    3    4    <NA> <NA>
# Levels: 1 2 3 4
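To tie this back to the original plot, here is a sketch that assumes `tab_catalog` in the question was built with table() on the factor column. The vector `catalog` is made-up example data.

```r
# Hypothetical factor standing in for the real column.
catalog <- factor(c("1", "2", "9999", " ", "3", "4", "1", "2"))

# Turn the unwanted levels into NA and remove them from the factor.
kept <- droplevels(catalog, exclude = c("9999", " "))

# table() leaves NA out by default, so only 1-4 are counted.
tab_catalog <- table(kept)
props <- prop.table(tab_catalog)

# Same plotting call as in the question, now without the extra bars.
barplot(props, col = c("red", "orange"),
        main = "Distribution of Customers by Catalog Years",
        ylim = c(0, 1))
box(lwd = 2)
```

Because the NA values are excluded before prop.table is called, the proportions are computed over the remaining customers only and still sum to 1.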