trouble excluding rows using crunch package

hoode · August 11, 2019, 11:15pm

Hi All,
I'm trying to use the crunch package to exclude certain rows from my dataset. I found a simple code snippet online and tried to adapt it to my data, but I'm getting an error message. The original online solution was:

dim(ds)
exclusion(ds) <- ds$perc_skipped > 15
exclusion(ds)
dim(ds)

I tried:

library(crunch)
dim(all_data_merged)
exclusion(all_data_merged) <- all_data_merged$completeness > 1
exclusion(all_data_merged)
dim(all_data_merged)

I get the error message: Error in exclusion(all_data_merged) : is.dataset(x) is not TRUE

I don't know what this means; R recognizes my dataset and runs the dim(all_data_merged) command with no trouble.
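A note on the error itself: as far as I can tell, crunch's exclusion() is meant for datasets loaded from the Crunch service (objects of class CrunchDataset), and the failed is.dataset(x) check is the package saying that it was handed something else. dim() is a generic that works on any data frame, which would explain why it runs without complaint. The sketch below uses a small, hypothetical stand-in for all_data_merged and only base R to check what kind of object it is.

```r
# Hypothetical stand-in for all_data_merged: an ordinary in-memory data frame
all_data_merged <- data.frame(
  Weight       = c(7.76, 8.44, 7.67),
  Completeness = c(2, 2, 1)
)

# An ordinary data frame reports "data.frame" here
class(all_data_merged)

# Assumption: crunch datasets carry the class "CrunchDataset";
# a plain data frame does not, so this is FALSE
inherits(all_data_merged, "CrunchDataset")

# dim() works on any data frame, regardless of the crunch package
dim(all_data_merged)
```

If class() reports "data.frame" (or a tibble), regular data frame tools such as dplyr::filter() or base subsetting are the natural fit.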

My goal is to be able to run summary statistics on my data without the rows that have the value of 1 in the completeness column. I don't want to delete those rows because I will need to run some statistics on them separately.

Would it be simpler to use the dplyr::filter() function?

Any help would be greatly appreciated!

andresrcs · August 11, 2019, 11:34pm

Yes, definitely.

If you still need help with this, please provide a minimal REPRoducible EXample (reprex) illustrating your issue. A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue

The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

hoode · August 12, 2019, 5:02am

Thanks Andre,
I still need some help. I tried the following code to exclude rows with a value of 1 in the completeness column but I'm not sure it worked. It stated that 3041 rows were omitted but I have 3091 rows with a value of 1 in the completeness column. I'm also unsure of how to manipulate the filtered data after I get it sorted. Thanks for your patience, I'm just getting started with R.

library(dplyr)
filter(all_data_merged,!(completeness >1))

site	Weight	Completeness	termination
4	7.76	2	2
4	8.44	2	3
4	7.67	1	0
4	7.51	2	1
4	10.19	4	5
4	8.47	1	0
4	4.81	1	0
4	4.88	2	5
4	5.46	4	5
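One thing worth checking on the nine sample rows above: the condition !(Completeness > 1) keeps the rows whose value is 1 (or less) rather than excluding them, and R column names are case-sensitive, so completeness and Completeness are different names. The sketch below compares that condition with Completeness != 1; the object names sample_rows, kept_by_original and kept_by_intended are made up for illustration.

```r
library(dplyr)

# The nine sample rows from the post, typed in as a data frame
sample_rows <- data.frame(
  site         = rep(4, 9),
  Weight       = c(7.76, 8.44, 7.67, 7.51, 10.19, 8.47, 4.81, 4.88, 5.46),
  Completeness = c(2, 2, 1, 2, 4, 1, 1, 2, 4),
  termination  = c(2, 3, 0, 1, 5, 0, 0, 5, 5)
)

# Condition from the post: keeps rows where Completeness is NOT greater than 1
kept_by_original <- filter(sample_rows, !(Completeness > 1))

# Condition matching the stated goal: keeps rows where Completeness is not 1
kept_by_intended <- filter(sample_rows, Completeness != 1)

nrow(kept_by_original)  # 3 rows, all with Completeness == 1
nrow(kept_by_intended)  # 6 rows, none with Completeness == 1
```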

mishabalyasin · August 12, 2019, 8:51am

To get you going with both reprex and dplyr::filter, here is a small example of how it works (there are more examples in the documentation):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mtcars <- tibble::as_tibble(mtcars)
mtcars
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

mtcars %>%
  dplyr::filter(cyl > 4)
#> # A tibble: 21 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  5  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  6  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  7  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 11 more rows

res <- mtcars %>%
  dplyr::filter(cyl > 4)

Created on 2019-08-12 by the reprex package (v0.3.0)

As you can see, after I've used dplyr::filter, only rows with cyl > 4 are left. However, after applying the transformation, you still need to save it to an object to continue working with it.
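To make the "save it to an object" step concrete, here is a minimal sketch using the built-in mtcars data. The name six_plus is hypothetical; any valid object name works.

```r
library(dplyr)

# filter() returns a new data frame; assign it to a name to keep it
six_plus <- mtcars %>%
  filter(cyl > 4)

# The saved object can now be used like any other data frame
nrow(six_plus)       # 21 rows remain
mean(six_plus$mpg)   # summary statistic on the filtered rows only

# The original data is left untouched
nrow(mtcars)         # still 32 rows
```

Because the original object is unchanged, several differently filtered copies can be kept side by side.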

Let us know if that helps.

andresrcs · August 12, 2019, 3:57pm

From what you are saying, I understand that you want to "exclude rows with a value of 1 in the completeness column" and "run summary statistics on my data without the rows that have the value of 1 in the completeness column".

Then this would do the job

# Libraries used
library(dplyr)

# Sample data in a copy/paste-friendly format
all_data_merged <- data.frame(
           site = c(4, 4, 4, 4, 4, 4, 4, 4, 4),
         Weight = c(7.76, 8.44, 7.67, 7.51, 10.19, 8.47, 4.81, 4.88, 5.46),
   Completeness = c(2, 2, 1, 2, 4, 1, 1, 2, 4),
    termination = c(2, 3, 0, 1, 5, 0, 0, 5, 5)
)

# Relevant code
all_data_merged %>% 
    filter(Completeness != 1) %>% 
    summary()
#>       site       Weight        Completeness    termination  
#>  Min.   :4   Min.   : 4.880   Min.   :2.000   Min.   :1.00  
#>  1st Qu.:4   1st Qu.: 5.973   1st Qu.:2.000   1st Qu.:2.25  
#>  Median :4   Median : 7.635   Median :2.000   Median :4.00  
#>  Mean   :4   Mean   : 7.373   Mean   :2.667   Mean   :3.50  
#>  3rd Qu.:4   3rd Qu.: 8.270   3rd Qu.:3.500   3rd Qu.:5.00  
#>  Max.   :4   Max.   :10.190   Max.   :4.000   Max.   :5.00

Note: Please notice the way I'm sharing the sample data and code. That is the proper way of posting a reproducible example.
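Since the original question also mentions running statistics on the excluded rows separately, one possible approach is to keep both subsets as their own objects, or to summarise by group in a single pass. This is only a sketch on the same sample data; the names without_ones, only_ones and is_one are made up for illustration.

```r
library(dplyr)

all_data_merged <- data.frame(
  site         = c(4, 4, 4, 4, 4, 4, 4, 4, 4),
  Weight       = c(7.76, 8.44, 7.67, 7.51, 10.19, 8.47, 4.81, 4.88, 5.46),
  Completeness = c(2, 2, 1, 2, 4, 1, 1, 2, 4),
  termination  = c(2, 3, 0, 1, 5, 0, 0, 5, 5)
)

# Two complementary subsets; the original data frame is not modified
without_ones <- all_data_merged %>% filter(Completeness != 1)
only_ones    <- all_data_merged %>% filter(Completeness == 1)

# Summary statistics on each subset separately
summary(without_ones$Weight)
summary(only_ones$Weight)

# Alternative: one grouped summary covering both subsets at once
all_data_merged %>%
  group_by(is_one = Completeness == 1) %>%
  summarise(n = n(), mean_weight = mean(Weight))
```

Keeping two named objects is convenient when each subset needs its own follow-up analysis; the grouped version is shorter when only a side-by-side comparison is needed.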

system · August 30, 2019, 10:37pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.