Calculate Percent of Total for Values greater than a certain value

Hi,

I have what seems like a simple question.

I am trying to calculate the relative percent of values greater than a certain value within a column of numbers.

My code is below. I was first able to get the correct number of samples >0.04 using tally, and I can also tally the total number of samples >0. Since I only have four groups, I could work out the percentages with a calculator, but I am sure there must be a more efficient method.
Thank you.

exceed<- moor4 %>%
  filter(code == "tp") %>%
  group_by(station) %>%
  tally(adj2>0.04)
#> Error in moor4 %>% filter(code == "tp") %>% group_by(station) %>% tally(adj2 > : could not find function "%>%"
exceed
#> Error in eval(expr, envir, enclos): object 'exceed' not found

exceed2<- moor4 %>%
  filter(code == "tp") %>%
  group_by(station) %>%
  tally(adj2>0)/tally(adj2>0.04)
#> Error in moor4 %>% filter(code == "tp") %>% group_by(station) %>% tally(adj2 > : could not find function "%>%"
exceed2 
#> Error in eval(expr, envir, enclos): object 'exceed2' not found

Created on 2021-08-25 by the reprex package (v2.0.1)

What does moor4 look like?

Hi @Craigdux,
Those errors are occurring because you haven't loaded the {tidyverse} packages - your code can't find the %>% pipe function.
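
As a minimal sketch of that fix: attaching dplyr (or the whole tidyverse) before the pipeline makes `%>%` available. The `toy` data frame below is made up for illustration and only stands in for `moor4`.

```r
library(dplyr)  # attaches %>% along with filter(), group_by(), tally()

# Hypothetical stand-in for moor4, only here to make the snippet runnable
toy <- tibble::tribble(
  ~station, ~code, ~adj2,
  "MB1",    "tp",  0.05,
  "MB1",    "tp",  0.02,
  "MB2",    "tp",  0.07
)

exceed <- toy %>%
  filter(code == "tp") %>%
  group_by(station) %>%
  tally(adj2 > 0.04)  # a logical weight, so n counts the TRUE values per station
exceed
```

With the package attached, the "could not find function" error should go away; any remaining error would then point at the data or the logic rather than the pipe.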

@DavoWW I have been having issues with tidyverse.

However, I loaded the tidyverse library and put dplyr:: in front of the piped functions. It appears to work, but my math is off?

library(tidyverse)
exceed<- moor4 %>%
  dplyr::filter(code == "tp") %>%
  dplyr::group_by(station) %>%
  tally(adj2>0.04)
#> Error in dplyr::filter(., code == "tp"): object 'moor4' not found
exceed
#> Error in eval(expr, envir, enclos): object 'exceed' not found

exceed2<- moor4 %>%
  dplyr::filter(code == "tp") %>%
  dplyr::group_by(station) %>%
  tally(adj2>0)/tally(adj2>0.04)
#> Error in dplyr::filter(., code == "tp"): object 'moor4' not found
exceed2 
#> Error in eval(expr, envir, enclos): object 'exceed2' not found

Created on 2021-08-26 by the reprex package (v2.0.1)
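
One likely reason the `tally(adj2>0)/tally(adj2>0.04)` line cannot give the intended ratio, even once the data is found: `%>%` binds more tightly than `/`, so only the first `tally()` receives the piped data. The sketch below just inspects how R parses such an expression; `a`, `f`, `g`, `x`, and `y` are placeholder names, not anything from the thread.

```r
# quote() captures the expression without evaluating it,
# so no packages or data are needed here.
expr <- quote(a %>% f(x) / g(y))

top_call <- as.character(expr[[1]])  # the outermost operator
left_arg <- deparse(expr[[2]])       # everything to the left of that operator
top_call
left_arg
```

The outermost call is the division, with the whole pipeline on its left, so the second `tally()` would be called on its own with no data frame.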

@williaml
moor4 has several columns including:
Factor: station (4 of these)
Factor: code (16 of these)
Date: date
Numeric: adj2

A subset of the data is shown below:

sum(moor4)
#> Error in eval(expr, envir, enclos): object 'moor4' not found
tibble::tribble(
  ~station,       ~date,  ~code, ~adj2,
     "MB1", "10/7/2008", "chla",   6.9,
     "MB3", "10/7/2008", "chla",   3.7,
     "MB4", "10/7/2008", "chla",   7.5,
     "MB3", "10/7/2008", "chla",   5.9,
     "MB2", "10/7/2008", "chla",   5.9,
     "MB1", "11/6/2008", "chla",   1.5,
     "MB3", "11/6/2008", "chla",   1.5,
     "MB4", "11/6/2008", "chla",   1.5,
     "MB2", "11/6/2008", "chla",   1.5,
     "MB3", "11/6/2008", "chla",   1.5,
     "MB2", "12/4/2008", "chla",   1.5,
     "MB3", "12/4/2008", "chla",   1.5,
     "MB2", "12/4/2008", "chla",   1.5,
     "MB4", "12/4/2008", "chla",   3.2,
     "MB1", "12/4/2008", "chla",   5.9,
     "MB3", "1/13/2009", "chla",   1.5,
     "MB4", "1/13/2009", "chla",   1.5,
     "MB1", "1/13/2009", "chla",  10.1,
     "MB3", "1/13/2009", "chla",   1.5,
     "MB2", "1/13/2009", "chla",   3.7,
     "MB2", "2/12/2009", "chla",   1.5,
     "MB2", "2/12/2009", "chla",   1.5,
     "MB3", "2/12/2009", "chla",   1.5,
     "MB4", "2/12/2009", "chla",   1.5,
     "MB1", "2/12/2009", "chla",   1.5,
     "MB2", "3/12/2009", "chla",   1.5,
     "MB4", "3/12/2009", "chla",   1.5,
     "MB4", "3/12/2009", "chla",   1.5,
     "MB4", "3/12/2009", "chla",   1.5,
     "MB1", "3/12/2009", "chla",   1.5,
     "MB2", "3/12/2009", "chla",   1.5,
     "MB3", "3/12/2009", "chla",   1.5
  )
#> # A tibble: 32 x 4
#>    station date      code   adj2
#>    <chr>   <chr>     <chr> <dbl>
#>  1 MB1     10/7/2008 chla    6.9
#>  2 MB3     10/7/2008 chla    3.7
#>  3 MB4     10/7/2008 chla    7.5
#>  4 MB3     10/7/2008 chla    5.9
#>  5 MB2     10/7/2008 chla    5.9
#>  6 MB1     11/6/2008 chla    1.5
#>  7 MB3     11/6/2008 chla    1.5
#>  8 MB4     11/6/2008 chla    1.5
#>  9 MB2     11/6/2008 chla    1.5
#> 10 MB3     11/6/2008 chla    1.5
#> # ... with 22 more rows

Created on 2021-08-26 by the reprex package (v2.0.1)

This isn't particularly clean, and your example data doesn't really show the values well, but you could use a join. I am sure there is a better way of doing it.

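library(dplyr)

left_join(moor4 %>%
            # filter(code == "tp") %>% # skipped: the sample data only contains "chla"
            group_by(station) %>%
            tally(adj2 > 2),           # n = rows per station with adj2 > 2
          moor4 %>%
            # filter(code == "tp") %>%
            group_by(station) %>%
            tally(adj2 > 7) %>%        # count of rows per station with adj2 > 7
            rename(n2 = n),
          by = "station"               # join the two counts on station explicitly
          ) %>%
  mutate(prop = n/n2)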
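
A possibly more compact alternative, offered only as a sketch: compute both counts in a single `summarise()` call so no join is needed. The `toy` data and the column names `n_pos`, `n_exceed`, and `pct` are invented for illustration; the thresholds 0 and 0.04 are the ones from the original question.

```r
library(dplyr)

# Hypothetical stand-in for moor4 (values invented for illustration)
toy <- tibble::tribble(
  ~station, ~code, ~adj2,
  "MB1",    "tp",  0.05,
  "MB1",    "tp",  0.02,
  "MB1",    "tp",  0.00,
  "MB2",    "tp",  0.07,
  "MB2",    "tp",  0.01
)

pct <- toy %>%
  filter(code == "tp") %>%
  group_by(station) %>%
  summarise(
    n_pos    = sum(adj2 > 0),            # samples above zero
    n_exceed = sum(adj2 > 0.04),         # samples above the threshold
    pct      = 100 * n_exceed / n_pos    # percent of positive samples exceeding it
  )
pct
```

This assumes `adj2` has no missing values; if it does, `sum(..., na.rm = TRUE)` would be needed in both counts.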

@williaml this worked great! I had to swap n2 and n, and I multiplied by 100 to get the results as percentages. (I also found I have something funky going on with my data!)

Thanks again. Here is my modification:

left_join(moor4 %>%
            filter(code == "tp") %>%
            group_by(station) %>%
            tally(adj2 > 0),
          moor4 %>%
            filter(code == "tp") %>%
            group_by(station) %>%
            tally(adj2 > 0.04) %>%
            rename(n2 = n)
          ) %>%
  mutate(prop = (n2/n)*100)
#> Error in left_join(moor4 %>% filter(code == "tp") %>% group_by(station) %>% : could not find function "%>%"

Created on 2021-08-26 by the reprex package (v2.0.1)
