dplyr::filter function turns no result

sirius1170 · November 26, 2019, 8:27pm

Hi everybody,

I would like to ask for help with the function filter.
I have data like that

data1<-data.frame(
  closed_price = c(49900L, 46600L, 46900L, 45200L, 45100L, 45600L,
                   46500L, 46100L, 46100L, 45800L),
  opened_price = c(51000L, 49500L, 49500L, 48000L, 45200L, 45100L,
                   47500L, 46100L, 46500L, 46900L),
  adjust_closed_price = c(12951L, 12095L, 12173L, 11731L, 11705L, 11835L,
                          12069L, 11965L, 11965L, 11887L),
  stock = as.factor(c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA",
                      "AAC", "AAC", "AAC", "AAC")),
  date3 = as.factor(c("2010-07-15", "2010-07-16", "2011-07-19",
                      "2011-07-20", "2012-07-21", "2012-07-22",
                      "2011-07-23", "2011-07-26", "2012-07-27",
                      "2012-07-28")),
  code = as.factor(c("AAA2010", "AAA2010", "AAA2011",
                     "AAA2011", "AAA2012", "AAA2012",
                     "AAC2011", "AAC2011", "AAC2012", "AAC2012"))
)
data2<-data.frame(
  code = as.factor(c("AAA2010", "AAC2011", "ABT2006", "ACC2009",
                     "ACL2007", "ACM2015", "ADC2010", "ADS2016",
                     "AGF2002", "AGM2012")),
  ticker = as.factor(c("AAA", "AAM", "ABT", "ACC", "ACL", "ACM", "ADC",
                       "ADS", "AGF", "AGM"))
)

I would like to filter from data1 to data2 by the same variable "code".
I used this code but it turns no results.

price_code<-data1%>% 
  filter(code %in% data2)

I appreciate any help.
Thank you in advance.

cderv · November 26, 2019, 9:12pm

I think you could be interested in semi_join from dplyr

https://dplyr.tidyverse.org/reference/join.html

semi_join()
return all rows from x where there are matching values in y , keeping just columns from x . A semi join differs from an inner join because an inner join will return one row of x for each matching row of y , where a semi join will never duplicate rows of x .

data1<-data.frame(
  closed_price = c(49900L, 46600L, 46900L, 45200L, 45100L, 45600L,
                   46500L, 46100L, 46100L, 45800L),
  opened_price = c(51000L, 49500L, 49500L, 48000L, 45200L, 45100L,
                   47500L, 46100L, 46500L, 46900L),
  adjust_closed_price = c(12951L, 12095L, 12173L, 11731L, 11705L, 11835L,
                          12069L, 11965L, 11965L, 11887L),
  stock = as.factor(c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA",
                      "AAC", "AAC", "AAC", "AAC")),
  date3 = as.factor(c("2010-07-15", "2010-07-16", "2011-07-19",
                      "2011-07-20", "2012-07-21", "2012-07-22",
                      "2011-07-23", "2011-07-26", "2012-07-27",
                      "2012-07-28")),
  code = as.factor(c("AAA2010", "AAA2010", "AAA2011",
                     "AAA2011", "AAA2012", "AAA2012",
                     "AAC2011", "AAC2011", "AAC2012", "AAC2012"))
)
data2<-data.frame(
  code = as.factor(c("AAA2010", "AAC2011", "ABT2006", "ACC2009",
                     "ACL2007", "ACM2015", "ADC2010", "ADS2016",
                     "AGF2002", "AGM2012")),
  ticker = as.factor(c("AAA", "AAM", "ABT", "ACC", "ACL", "ACM", "ADC",
                       "ADS", "AGF", "AGM"))
)

library(dplyr, warn.conflicts = FALSE)

data1 %>%
  semi_join(data2, by = "code")
#> Warning: Column `code` joining factors with different levels, coercing to
#> character vector
#>   closed_price opened_price adjust_closed_price stock      date3    code
#> 1        49900        51000               12951   AAA 2010-07-15 AAA2010
#> 2        46600        49500               12095   AAA 2010-07-16 AAA2010
#> 3        46500        47500               12069   AAC 2011-07-23 AAC2011
#> 4        46100        46100               11965   AAC 2011-07-26 AAC2011

^{Created on 2019-11-26 by the reprex package (v0.3.0)}

Let's note the warning: You should use characters instead of factors or use forcats to get some identical levels on both factors before joining

See also

sirius1170 · November 27, 2019, 10:52am

Thank you so much. I used your code and it works for me. But I still don't understand why it doesn't work with the filter function?

StatSteph · November 27, 2019, 1:50pm

You can get filter to work if you do the following:

data1<-data.frame(
  closed_price = c(49900L, 46600L, 46900L, 45200L, 45100L, 45600L,
                   46500L, 46100L, 46100L, 45800L),
  opened_price = c(51000L, 49500L, 49500L, 48000L, 45200L, 45100L,
                   47500L, 46100L, 46500L, 46900L),
  adjust_closed_price = c(12951L, 12095L, 12173L, 11731L, 11705L, 11835L,
                          12069L, 11965L, 11965L, 11887L),
  stock = as.factor(c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA",
                      "AAC", "AAC", "AAC", "AAC")),
  date3 = as.factor(c("2010-07-15", "2010-07-16", "2011-07-19",
                      "2011-07-20", "2012-07-21", "2012-07-22",
                      "2011-07-23", "2011-07-26", "2012-07-27",
                      "2012-07-28")),
  code = as.factor(c("AAA2010", "AAA2010", "AAA2011",
                     "AAA2011", "AAA2012", "AAA2012",
                     "AAC2011", "AAC2011", "AAC2012", "AAC2012"))
)
data2<-data.frame(
  code = as.factor(c("AAA2010", "AAC2011", "ABT2006", "ACC2009",
                     "ACL2007", "ACM2015", "ADC2010", "ADS2016",
                     "AGF2002", "AGM2012")),
  ticker = as.factor(c("AAA", "AAM", "ABT", "ACC", "ACL", "ACM", "ADC",
                       "ADS", "AGF", "AGM"))
)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

price_code <- data1 %>%
  filter(code %in% data2$code)

price_code
#>   closed_price opened_price adjust_closed_price stock      date3    code
#> 1        49900        51000               12951   AAA 2010-07-15 AAA2010
#> 2        46600        49500               12095   AAA 2010-07-16 AAA2010
#> 3        46500        47500               12069   AAC 2011-07-23 AAC2011
#> 4        46100        46100               11965   AAC 2011-07-26 AAC2011

^{Created on 2019-11-27 by the reprex package (v0.3.0)}

filter is looking for a logical expression. The %in% function works on vectors only but you were applying it to a data.frame

system · December 4, 2019, 1:57pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.