Trouble excluding rows using the crunch package

Hi All,
I'm trying to use the crunch package to exclude certain rows from my dataset. I found some simple code online and tried to adapt it to my data, but I'm getting an error message. The original online solution was:

dim(ds)
exclusion(ds) <-ds$perc_skipped > 15
exclusion(ds)
dim(ds)

I tried:

library(crunch)
dim(all_data_merged)
exclusion(all_data_merged) <- all_data_merged$completeness > 1
exclusion(all_data_merged)
dim(all_data_merged)

I get the error message: Error in exclusion(all_data_merged) : is.dataset(x) is not TRUE

I don't know what this means. R recognizes my dataset and runs the dim(all_data_merged) command with no trouble.

My goal is to be able to run summary statistics on my data without the rows that have the value of 1 in the completeness column. I don't want to delete those rows because I will need to run some statistics on them separately.

Would it be simpler to use the dplyr::filter() function?

Any help would be greatly appreciated!

Yes, definitely.
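
As for the error itself: as far as I can tell, crunch's exclusion() is meant for datasets loaded from the Crunch platform as Crunch dataset objects, not for ordinary data frames, which would explain why the is.dataset(x) check fails even though dim() works. Here is a quick way to check what kind of object you have, using only base R. The small data frame is a made-up stand-in for your data, and the class name "CrunchDataset" is my assumption about what crunch calls its dataset class.

```r
# Made-up stand-in for all_data_merged (hypothetical values)
all_data_merged <- data.frame(
  site = c(4, 4, 4),
  completeness = c(2, 1, 4)
)

# dim() works on any data frame, which is why that call succeeds
dim(all_data_merged)

# What kind of object is this?
class(all_data_merged)
is_plain_df <- is.data.frame(all_data_merged)

# Assumption: crunch dataset objects carry the class "CrunchDataset"
is_crunch_ds <- inherits(all_data_merged, "CrunchDataset")

is_plain_df   # TRUE
is_crunch_ds  # FALSE
```

If the object is a plain data frame, dplyr::filter() is the more natural tool.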

If you still need help with this, please provide a minimal REPRoducible EXample (reprex) illustrating your issue. A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Thanks Andre,
I still need some help. I tried the following code to exclude rows with a value of 1 in the completeness column, but I'm not sure it worked. It stated that 3041 rows were omitted, but I have 3091 rows with a value of 1 in the completeness column. I'm also unsure how to work with the filtered data once I have it sorted out. Thanks for your patience, I'm just getting started with R.

library(dplyr)
filter(all_data_merged,!(completeness >1))

site  Weight  Completeness  termination
   4    7.76             2            2
   4    8.44             2            3
   4    7.67             1            0
   4    7.51             2            1
   4   10.19             4            5
   4    8.47             1            0
   4    4.81             1            0
   4    4.88             2            5
   4    5.46             4            5

To get you going with both reprex and dplyr::filter, here is a small example of how it works (there are more examples in the documentation):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mtcars <- tibble::as_tibble(mtcars)
mtcars
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

mtcars %>%
  dplyr::filter(cyl > 4)
#> # A tibble: 21 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  5  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  6  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  7  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 11 more rows

res <- mtcars %>%
  dplyr::filter(cyl > 4)

Created on 2019-08-12 by the reprex package (v0.3.0)

As you can see, after I've used dplyr::filter, only rows with cyl > 4 are left. However, after applying the transformation, you still need to save it to an object to continue working with it.
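
To make the "save it to an object" step concrete, here is a minimal sketch of carrying on with the saved result. It repeats the mtcars filter from above; the column names `n_rows` and `mean_mpg` are just illustrative choices.

```r
library(dplyr)

# Save the filtered rows to an object so they can be reused
res <- mtcars %>%
  dplyr::filter(cyl > 4)

# The original data is untouched; res holds only the kept rows
nrow(mtcars)  # 32
nrow(res)     # 21

# Any further step now starts from res instead of mtcars
res_stats <- res %>%
  summarise(n_rows = n(), mean_mpg = mean(mpg))
res_stats

# Base R functions work on the saved object as well
summary(res$mpg)
```

The same pattern applies to your own data: assign the filtered result to a name, then run your statistics on that name.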

Let us know if that helps.

From what you are saying, I understand that you want to "exclude rows with a value of 1 in the completeness column" and "run summary statistics on my data without the rows that have the value of 1 in the completeness column".

If so, this would do the job:

# Libraries used
library(dplyr)

# Sample data on a copy/paste friendly format
all_data_merged <- data.frame(
           site = c(4, 4, 4, 4, 4, 4, 4, 4, 4),
         Weight = c(7.76, 8.44, 7.67, 7.51, 10.19, 8.47, 4.81, 4.88, 5.46),
   Completeness = c(2, 2, 1, 2, 4, 1, 1, 2, 4),
    termination = c(2, 3, 0, 1, 5, 0, 0, 5, 5)
)

# Relevant code
all_data_merged %>% 
    filter(Completeness != 1) %>% 
    summary()
#>       site       Weight        Completeness    termination  
#>  Min.   :4   Min.   : 4.880   Min.   :2.000   Min.   :1.00  
#>  1st Qu.:4   1st Qu.: 5.973   1st Qu.:2.000   1st Qu.:2.25  
#>  Median :4   Median : 7.635   Median :2.000   Median :4.00  
#>  Mean   :4   Mean   : 7.373   Mean   :2.667   Mean   :3.50  
#>  3rd Qu.:4   3rd Qu.: 8.270   3rd Qu.:3.500   3rd Qu.:5.00  
#>  Max.   :4   Max.   :10.190   Max.   :4.000   Max.   :5.00

Note: Please notice the way I'm sharing the sample data and code. That is the proper way of posting a reproducible example.
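
Since the original post also mentions needing the Completeness == 1 rows for separate statistics, one option is to keep both subsets as objects rather than discarding anything. This is only a sketch reusing the sample data above; the names `other_rows`, `value_one_rows` and `is_one` are illustrative. Two things that may help when reconciling row counts: filter() drops rows where the condition evaluates to NA, and a condition like !(Completeness > 1) keeps the rows with a value of 1 or less instead of removing them.

```r
library(dplyr)

# Same sample data as in the example above
all_data_merged <- data.frame(
  site = c(4, 4, 4, 4, 4, 4, 4, 4, 4),
  Weight = c(7.76, 8.44, 7.67, 7.51, 10.19, 8.47, 4.81, 4.88, 5.46),
  Completeness = c(2, 2, 1, 2, 4, 1, 1, 2, 4),
  termination = c(2, 3, 0, 1, 5, 0, 0, 5, 5)
)

# Rows per Completeness value (any NA values would show up as their own row)
count(all_data_merged, Completeness)

# Keep both subsets so neither is lost
other_rows     <- all_data_merged %>% filter(Completeness != 1)
value_one_rows <- all_data_merged %>% filter(Completeness == 1)

nrow(other_rows)      # 6
nrow(value_one_rows)  # 3

# The two counts add up to the total only if Completeness has no NA values
nrow(other_rows) + nrow(value_one_rows) == nrow(all_data_merged)

# Alternative: summarise both groups in one step without splitting
by_group <- all_data_merged %>%
  group_by(is_one = Completeness == 1) %>%
  summarise(n = n(), mean_weight = mean(Weight))
by_group
```

Note that column names are case-sensitive in R, so the name used inside filter() has to match the data exactly (Completeness here).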
