Help with filtering with specific condition

Hello!

Trying to find Product Ids that have different Product Types in the following sample data. In this case, looking for solution with all rows where Product ID is M12

df <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
              Date = c("2022-01-01", "2022-01-01", "2022-01-01"),
           Country = c("Canada", "Canada", "USA"),
    `Product Type` = c("AFC", "FBF", "CBF"),
      `Product ID` = c("M10", "M12", "M12")
)%>%
  mutate(Date = ymd(Date))

Thanks for the help!

This is conveniently done with the filter() function from the dplyr package.

library(dplyr)

library(lubridate)
df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Date = c("2022-01-01", "2022-01-01", "2022-01-01"),
  Country = c("Canada", "Canada", "USA"),
  `Product Type` = c("AFC", "FBF", "CBF"),
  `Product ID` = c("M10", "M12", "M12")
)%>%
  mutate(Date = ymd(Date))
dfFiltered <- df |> filter(`Product ID` == "M12")
dfFiltered
#>         Date Country Product Type Product ID
#> 1 2022-01-01  Canada          FBF        M12
#> 2 2022-01-01     USA          CBF        M12

Created on 2022-08-25 by the reprex package (v2.0.1)

Well! I am looking for solution without knowing Product ID to look at al those Ids that have different Types. I am trying to check those and find data discrepancy.

Thanks!

You could something like this:

library(tidyverse)

df |>
  group_by(`Product ID`) |>
  summarise(
    different_prodTypes = n_distinct(`Product Type`)
  ) |>
  ungroup() |>
  filter(different_prodTypes > 1)
#> # A tibble: 1 × 2
#>   `Product ID` different_prodTypes
#>   <chr>                      <int>
#> 1 M12                            2

Created on 2022-08-25 by the reprex package (v2.0.1)

This will show you all Product IDs with more than one Product Type.

Kind regards

Thanks @FactOREO!
Is there a way to see Product Types along with it. It will make it easier for us to clear discrepancies in data.

You could do something like this:

library(tidyverse)

df |>
  group_by(`Product ID`) |>
  summarise(
    different_prodTypes = n_distinct(`Product Type`),
    the_prodTypes = list(unique(`Product Type`))
  ) |>
  ungroup() |>
  filter(different_prodTypes > 1) |>
  unnest(the_prodTypes)
#> # A tibble: 2 × 3
#>   `Product ID` different_prodTypes the_prodTypes
#>   <chr>                      <int> <chr>        
#> 1 M12                            2 FBF          
#> 2 M12                            2 CBF

Created on 2022-08-25 by the reprex package (v2.0.1)

It's in long format, for wide format use a counter along the different Product Types and reshape it with e.g. pivot_wider or another function that does this and you like to use.

Kind regards

Thank you @FactOREO!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.