factors and filter using value

Hello,
I'll be direct. I know, only a bit, how to convert an integer vector to a factor.
However, I don't know how to use the original values when I filter with dplyr.
I mean, suppose you have some data that contains the variable "region":

data$region=as.factor(data$region)
data$region=factor(data$region, levels=c(1,3,2 ), labels=c("a","Z+","koko"))

Obviously, if I try this, it fails:

data %>% filter( region==2)

It only works using the factor labels

data %>% filter( region=="Z+")

So, I tried this.

data %>% filter( as.integer(region)==2)

But the code above filters the data by the original value 3 (label "Z+"), not by the original value 2. It filters by the position in which I declared the levels.
So, how can I access the original integer values in order to filter by "2"?
My question may look silly, but the factors are useful when I export data to Excel, so I need them.
Thanks for your replies and interest.
Thanks a lot.

By setting the levels argument to c(1,3,2) you are assigning a value of 2 to the elements of data$region that contain 3. The easy fix is to change the ordering of that argument, as shown below.

library(dplyr, warn.conflicts = FALSE)
#Current version of factor()
data <- data.frame(region = c(1,2,3,2,3,1))
data$region=factor(data$region, levels=c(1,3,2 ), labels=c("a","Z+","koko"))

filter(data, as.numeric(region) == 2)
#>   region
#> 1     Z+
#> 2     Z+

levels(data$region)
#> [1] "a"    "Z+"   "koko"
as.numeric(data$region)
#> [1] 1 3 2 3 2 1

#Change the levels argument
data <- data.frame(region = c(1,2,3,2,3,1))
data$region=factor(data$region, levels=c(1,2,3), labels=c("a", "koko","Z+"))

filter(data, as.numeric(region) == 2)
#>   region
#> 1   koko
#> 2   koko

levels(data$region)
#> [1] "a"    "koko" "Z+"
as.numeric(data$region)
#> [1] 1 2 3 2 3 1

Created on 2020-09-10 by the reprex package (v0.3.0)


So, what you are saying is that I can't do what I need without altering the factor() call. I thought there was some kind of trick to write it directly in the filter step.
If that is so, can you confirm it?
It's great to know that.
Thanks again for your help.

There may be a way to do what you want just using the filter step, but I do not know what it is. I certainly do not claim to know everything about the tidyverse. Maybe someone else will have a suggestion.

I'm afraid, as soon as you run this line:

data$region=factor(data$region, levels=c(1,3,2 ), labels=c("a","Z+","koko"))

the original values of region are lost forever, and the content of the levels= argument is forgotten.
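
To make that concrete, here is a minimal sketch of what a factor keeps after that call. The names region_raw and region_fct are only mine, for illustration; the point is that the factor holds positions into its levels plus the labels, and nothing else.

```r
# Illustrative names only: region_raw / region_fct are not from the original post
region_raw <- c(1, 2, 3, 2, 3, 1)
region_fct <- factor(region_raw, levels = c(1, 3, 2), labels = c("a", "Z+", "koko"))

# The integer codes are positions in the levels, not the original values
codes <- as.integer(region_fct)   # 1 3 2 3 2 1

# The levels attribute only holds the labels; c(1, 3, 2) is not stored anywhere
lvls <- levels(region_fct)        # "a" "Z+" "koko"
```

So as.integer(region_fct) == 2 picks the second declared level ("Z+", originally 3), which is exactly what the filter in the question was doing.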

But why don't you keep the original values in the data? You can always remove them just before saving:

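library(dplyr, warn.conflicts = FALSE)
library(readr)  # provides write_csv()

data <- data.frame(region = c(1,2,3,2,3,1))

data %>%
  # keep the original integer column and add the factor alongside it
  mutate(region_fct = factor(region, levels=c(1,3,2 ), labels=c("a","Z+","koko"))) %>%
  filter(region == 2) %>%
  select(-region) %>%
  rename(region = region_fct) %>%
  write_csv("my_file.csv")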

Another possibility: how do you choose the order in levels=c(1,3,2)? I imagine you're not making up this order. You could probably use the data that gave you this order as a look-up table.
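
A rough sketch of that look-up idea, assuming you still have the codes and labels in the order you passed them to factor(). The region_lookup data frame and its column names are hypothetical, made up for this example.

```r
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(region = c(1, 2, 3, 2, 3, 1))

# Hypothetical look-up table: one row per region, in the desired level order
region_lookup <- data.frame(
  code  = c(1, 3, 2),
  label = c("a", "Z+", "koko")
)

data$region <- factor(data$region,
                      levels = region_lookup$code,
                      labels = region_lookup$label)

# Level position i corresponds to region_lookup$code[i], so indexing the
# codes by the factor's integer positions gives back the original values
result <- data %>%
  filter(region_lookup$code[as.integer(region)] == 2)
```

This only works as long as the look-up table stays in the same order as the factor levels; if the levels are reordered later, the mapping has to be rebuilt.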

Can you justify why the params you pass to factor() are "set in stone" and must be worked around...?
Following FJCC's example, it's possible to simplify and omit the levels argument altogether and still get the intended behaviour, because FJCC is explicitly stating the default behaviour for the levels param.

I think it might just be a case of you trying to do 'too much' when you can achieve your effect with less code.
You know the label you want for each level, so when you supply the labels, state them in the order that corresponds to the region level values...
```r
library(dplyr, warn.conflicts = FALSE)
#Current version of factor()
data <- data.frame(region = c(1,2,3,2,3,1))
# passing the labels using an ordering that corresponds to the intended integer values seen in the data originally
data$region=factor(data$region, labels=c("a","koko","Z+"))

data %>% filter( as.integer(region)==2)
```
