Create dataset based on observations

Hello, can someone please help me work out how I can make a new dataset based on conditions?
I want a new dataset containing only the products Russian banana and Vermillion, but I don't know how to add the second condition here:

df <- read.csv("F:/SASUniversityEdition/myfolders/jar.csv", stringsAsFactors = F)
colnames(df)[1] = "assessor"
#subset of 2 samples
newdata <- df[ which(df$product=='Russian banana' ), ]

Created on 2020-09-25 by the reprex package (v0.3.0)
Here is the dataset:

squads <-tibble::tribble(
           ~assessor,         ~product, ~category,
                  1L, "Russian banana",        2L,
                  1L,     "Vermillion",        1L,
                  1L,       "Atlantic",        1L,
                  1L,    "POR12PG28-3",        2L,
                  1L,         "Valery",        1L,
                  1L,   "Rio colorado",        1L,
                  1L,     "CO99076-6R",        2L,
                  1L, "Purple majesty",        2L,
                  1L,   "AC99330-1P/Y",        2L,
                  1L,    "CO05068-1RU",        1L,
                  1L,     "Masquerade",        2L,
                  1L,   "Canela ruset",        2L,
                  2L, "Russian banana",        3L,
                  2L,     "Vermillion",        3L,
                  2L,       "Atlantic",        2L
           )
head(squads)
#> # A tibble: 6 x 3
#>   assessor product        category
#>      <int> <chr>             <int>
#> 1        1 Russian banana        2
#> 2        1 Vermillion            1
#> 3        1 Atlantic              1
#> 4        1 POR12PG28-3           2
#> 5        1 Valery                1
#> 6        1 Rio colorado          1

Hello @sharmachetan,

The easiest way to accomplish this would be as follows. As you can see, I am just taking your data frame, which I have called df, and then I "pipe" it into the filter command, which keeps the rows where product matches either one condition or the other. Other logical expressions work too. Let me know if this solves your problem.

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <-tibble::tribble(
  ~assessor,         ~product, ~category,
  1L, "Russian banana",        2L,
  1L,     "Vermillion",        1L,
  1L,       "Atlantic",        1L,
  1L,    "POR12PG28-3",        2L,
  1L,         "Valery",        1L,
  1L,   "Rio colorado",        1L,
  1L,     "CO99076-6R",        2L,
  1L, "Purple majesty",        2L,
  1L,   "AC99330-1P/Y",        2L,
  1L,    "CO05068-1RU",        1L,
  1L,     "Masquerade",        2L,
  1L,   "Canela ruset",        2L,
  2L, "Russian banana",        3L,
  2L,     "Vermillion",        3L,
  2L,       "Atlantic",        2L
)

df_filter <- df %>% filter(product == "Russian banana" | product == "Vermillion")

df_filter
#> # A tibble: 4 x 3
#>   assessor product        category
#>      <int> <chr>             <int>
#> 1        1 Russian banana        2
#> 2        1 Vermillion            1
#> 3        2 Russian banana        3
#> 4        2 Vermillion            3

@GreyMerchant Yes, it served the purpose. Thanks!
I am trying to learn R, so could you please also tell me how I can add one more condition to the original code?

You can chain a third condition with another | ("or"), like this:

df_filter <- df %>% filter(product == "Russian banana" | product == "Vermillion" | product == "Canela ruset")
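The same idea should carry over to the base R subsetting in the original code. The following is only a minimal sketch: it assumes a small stand-in data frame (a shortened version of the data posted above, not the real jar.csv), and the names newdata and newdata3 are just illustrative.

```r
# Small stand-in for the data in this thread (base R only, no packages needed).
# This is an assumed, shortened version of the posted data, not the real file.
df <- data.frame(
  assessor = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  product  = c("Russian banana", "Vermillion", "Atlantic", "Canela ruset",
               "Russian banana", "Vermillion", "Atlantic"),
  category = c(2L, 1L, 1L, 2L, 3L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Two products: combine the comparisons with | (element-wise "or")
newdata <- df[df$product == "Russian banana" | df$product == "Vermillion", ]

# Three products: add one more | condition
newdata3 <- df[df$product == "Russian banana" |
               df$product == "Vermillion" |
               df$product == "Canela ruset", ]

nrow(newdata)   # 4
nrow(newdata3)  # 5
```

The which() wrapper from the original code is optional here. It mainly matters when product contains NA values, because which() drops those rows instead of returning NA-filled rows.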

You can also use != ("not equal") to keep every product except Russian banana:

df_filter <- df %>% filter(product != "Russian banana")

You can also use other operators, such as "and" (written as &), and my favourite, %in%, which keeps any value that is within the set you provide.
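As a rough illustration of those two operators, here is a short sketch. It assumes dplyr is installed and uses a small stand-in data frame rather than the full dataset; the names wanted, df_in, df_and and df_not_in are made up for this example.

```r
library(dplyr)

# Assumed stand-in data, shortened from the data posted in this thread
df <- data.frame(
  assessor = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  product  = c("Russian banana", "Vermillion", "Atlantic", "Canela ruset",
               "Russian banana", "Vermillion", "Atlantic"),
  category = c(2L, 1L, 1L, 2L, 3L, 3L, 2L),
  stringsAsFactors = FALSE
)

# %in%: keep rows whose product is any member of the set
wanted <- c("Russian banana", "Vermillion", "Canela ruset")
df_in <- df %>% filter(product %in% wanted)

# &: both conditions must hold for a row to be kept
df_and <- df %>% filter(product == "Vermillion" & category == 3L)

# ! negates a condition: keep rows whose product is NOT in the set
df_not_in <- df %>% filter(!product %in% wanted)
```

With %in%, adding another product only means adding one more string to wanted, which tends to be easier to read than a long chain of | conditions.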

Have a look here: https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter
