Create subset of a dataset

Hello, we need to create a subset of a dataset in R by keeping only two of the 9 labels in a column. In our new data frame all 9 labels still exist, even though 7 of them have no rows; we would like those 7 not to exist. Does anyone know how to do this?
Thanks for your help :)

Hello,

Welcome to RStudio Community.

It would be helpful if you were to provide a "reproducible example", detailed here:

FAQ: What's a reproducible example (reprex) and how do I create one?

I believe you are talking about filtering. Here is an example using dplyr:

library(tidyverse)

diamonds
#> # A tibble: 53,940 x 10
#>    carat cut       color clarity depth table price     x     y     z
#>    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#>  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#>  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#>  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#>  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#>  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#>  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
#>  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#>  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
#> 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
#> # ... with 53,930 more rows

diamonds$cut %>% unique()
#> [1] Ideal     Premium   Good      Very Good Fair     
#> Levels: Fair < Good < Very Good < Premium < Ideal

# keep only the rows whose cut is one of the two labels of interest
best_diamonds <- diamonds %>%
  filter(cut %in% c("Ideal", "Premium"))

best_diamonds
#> # A tibble: 35,342 x 10
#>    carat cut     color clarity depth table price     x     y     z
#>    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
#>  2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
#>  3  0.29 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
#>  4  0.23 Ideal   J     VS1      62.8    56   340  3.93  3.9   2.46
#>  5  0.22 Premium F     SI1      60.4    61   342  3.88  3.84  2.33
#>  6  0.31 Ideal   J     SI2      62.2    54   344  4.35  4.37  2.71
#>  7  0.2  Premium E     SI2      60.2    62   345  3.79  3.75  2.27
#>  8  0.32 Premium E     I1       60.9    58   345  4.38  4.42  2.68
#>  9  0.3  Ideal   I     SI2      62      54   348  4.31  4.34  2.68
#> 10  0.24 Premium I     VS1      62.5    57   355  3.97  3.94  2.47
#> # ... with 35,332 more rows

Created on 2021-12-16 by the reprex package (v2.0.1)
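
One thing to be aware of: filter() only removes rows, so the factor column still carries all of its original levels afterwards, which sounds like what you are seeing with your 9 labels. A minimal sketch of one way to remove the empty ones, assuming your label column is a factor like `cut` here, is to call base R's droplevels() on that column after filtering:

```r
library(dplyr)
library(ggplot2)  # only needed here for the diamonds example data

best_diamonds <- diamonds %>%
  filter(cut %in% c("Ideal", "Premium")) %>%
  # filter() keeps all 5 levels; droplevels() removes the ones with no rows
  mutate(cut = droplevels(cut))

levels(best_diamonds$cut)
#> [1] "Premium" "Ideal"
```

Calling droplevels() on the whole data frame instead would do the same for every factor column at once, which may or may not be what you want.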
