library(dplyr)
set.seed(123)
# Toy data
weight_actual_a <- rnorm(50, mean = 10, sd = 4)
weight_actual_b <- rnorm(50, mean = 4, sd = 2)
weight_expected_a <- rnorm(50, mean = 10, sd = 1)
weight_expected_b <- rnorm(50, mean = 4, sd = .2)
df <- tibble(
  product = rep(c("A", "B"), each = 50),
  weight_actual = c(weight_actual_a, weight_actual_b),
  weight_expected = c(weight_expected_a, weight_expected_b)
) %>%
  # difference = expected weight minus actual weight
  mutate(difference = weight_expected - weight_actual)
head(df)
#> # A tibble: 6 x 4
#>   product weight_actual weight_expected difference
#>   <chr>           <dbl>           <dbl>      <dbl>
#> 1 A                7.76            9.29      1.53
#> 2 A                9.08           10.3       1.18
#> 3 A               16.2             9.75     -6.48
#> 4 A               10.3             9.65     -0.630
#> 5 A               10.5             9.05     -1.47
#> 6 A               16.9             9.95     -6.91
tail(df)
#> # A tibble: 6 x 4
#>   product weight_actual weight_expected difference
#>   <chr>           <dbl>           <dbl>      <dbl>
#> 1 B                6.72            3.74     -2.98
#> 2 B                2.80            4.40      1.60
#> 3 B                8.37            4.12     -4.25
#> 4 B                7.07            3.75     -3.32
#> 5 B                3.53            3.88      0.349
#> 6 B                1.95            3.76      1.82
Based on the variable difference, and taking the product group into account, how do I mutate a new variable, say outlier, that takes the value "yes" if the corresponding value of difference is an outlier and "no" otherwise?
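A minimal sketch of one possible approach, assuming "outlier" means a value outside Tukey's fences (more than 1.5 × IQR below the first quartile or above the third quartile), computed separately within each product group via group_by(). The question does not fix a definition, so the 1.5 multiplier and the quantile method (quantile()'s default, type 7) are assumptions. Note that boxplot() draws its whiskers from hinges computed by boxplot.stats(), so borderline points may be classified differently there. The names df_flagged, lower and upper are illustrative only.

```r
library(dplyr)

set.seed(123)
# Recreate the toy data from the question
weight_actual_a <- rnorm(50, mean = 10, sd = 4)
weight_actual_b <- rnorm(50, mean = 4, sd = 2)
weight_expected_a <- rnorm(50, mean = 10, sd = 1)
weight_expected_b <- rnorm(50, mean = 4, sd = .2)
df <- tibble(
  product = rep(c("A", "B"), each = 50),
  weight_actual = c(weight_actual_a, weight_actual_b),
  weight_expected = c(weight_expected_a, weight_expected_b)
) %>%
  mutate(difference = weight_expected - weight_actual)

# Assumed rule: flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR],
# where the quartiles and IQR are computed within each product group.
df_flagged <- df %>%
  group_by(product) %>%
  mutate(
    lower = quantile(difference, 0.25, names = FALSE) - 1.5 * IQR(difference),
    upper = quantile(difference, 0.75, names = FALSE) + 1.5 * IQR(difference),
    outlier = if_else(difference < lower | difference > upper, "yes", "no")
  ) %>%
  ungroup() %>%
  # drop the helper columns, keeping only the new flag
  select(-lower, -upper)

# How many rows were flagged per product
count(df_flagged, product, outlier)
```

Because the fences are computed inside group_by(product), product B's much smaller spread gets its own, tighter cutoffs instead of being judged against product A's.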