I am trying to subtract values of different observations of a variable (table), based on their rank variable (rank), which reflects their position within groups of a third variable (color). What I need to do is subtract the values of the first and second ranked table values in each color group. The code I tried deleted all observations from the df. I'm not really sure where to go next, if anyone can help I'd be really grateful.
(d <- head(ggplot2::diamonds))
# A tibble: 6 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
##I then create a rank variable to hold the rank position of table in each color:
d <- d %>%
arrange(color, table) %>%
group_by(color) %>%
mutate(rank = rank(order(table, decreasing = TRUE)))
##The code that deletes all observations in the df:
d <- d %>% filter(rank == 1 & rank == 2) %>%
mutate(diff = table - lag(table, order_by = rank))
If you want your filtering to work you need to give it a valid condition. See below:
d <- d %>% filter(rank == 1 | rank == 2) %>%
mutate(diff = table - lag(table, order_by = rank))
No value will have 1 and 2 as rank at the same time so it will filter out everything.
Then in terms of your other code you will need to make a condition that color is only valid when it at least has 2 values associated with that color. Otherwise you will have a problem with some who only have single. You will also have unfortunate overlap so you should always calculate when it should occur.
A bit of code that might help you as well is something like this. You can essentially create a mutate with conditionals. If rank 1 or 2 then it will perform depth - lag(depth) for diff otherwise it will just have 0. You can adjust this accordingly to work for your problem (more than 2 levels etc).
Thank you for your input! The valid condition is something that slipped by, good that you noticed right away. Though I still had a problem when running the code you provided. I got this error message back:
rlang::last_trace()
<error/dplyr:::mutate_error>
Problem with `mutate()` input `diff`.
x comparação (1) é possível apenas para tipos lista ou atômicos
i Input `diff` is `case_when(...)`.
Backtrace:
x
1. +-`%>%`(...)
2. +-dplyr::mutate(...)
3. +-dplyr:::mutate.data.frame(...)
4. | \-dplyr:::mutate_cols(.data, ...)
5. | +-base::withCallingHandlers(...)
6. | \-mask$eval_all_mutate(dots[[i]])
7. +-dplyr::case_when(...)
8. | \-rlang::eval_tidy(pair$lhs, env = default_env)
9. \-base::.handleSimpleError(...)
10. \-dplyr:::h(simpleError(msg, call))
Sorry for the Portuguese in there, but it translates to something like 'comparison (1) is only possible for lists or atomic vectors"