Hey,
I have a data set with amounts of yield, 6 different crops, 12 different irrigation scenarios and other variables.
I would like to remove the outliers in the amount of yield for each crop in each irrigation scenario.
My idea was to split the data into a list by irrigation scenario, loop through the list, and combine the resulting data frames afterwards.
I have tried various ways and none of them worked.
The code below works for each irrigation scenario individually but I don't know how to make a loop over all 12 of them.
Thank you for your help!
library(tidyverse)
#make example data
a1 <- purrr::map_dfr(1:100,~expand.grid(
crops=letters[1:6],
irig =LETTERS[1:12]
))
set.seed(42)
a1$yield <- runif(nrow(a1),0,1000)
(start_df <- tibble(a1) %>% mutate(row_id = row_number()))
# pick 50 random rows whose yield will be forced to an extreme value
(force_make_outlier <- sort(sample(seq_len(nrow(a1)),size=50,replace=FALSE)))
start_df$yield[force_make_outlier] <- (100+start_df$yield[force_make_outlier])^3
# having made example data, here is a solution, note the use of row_id which we made in the data prep
# use boxplot to detect and then eliminate outliers within each crops, irig combination groups
(b2 <- start_df %>%
  group_by(crops, irig) %>%
  summarise(outlier_row_ids = list(row_id[yield %in% boxplot(yield, plot = FALSE)$out]),
            .groups = "drop"))
# peek at where there is an outlier in a group
(c2 <- filter(rowwise(b2),length(outlier_row_ids)>0))
# attach each group's outlier row ids to the full data set to make it easy to filter
(d2 <- left_join(start_df, b2, by = c("crops", "irig")))
(end_df <- filter(rowwise(d2),
                  !row_id %in% outlier_row_ids) %>% select(-outlier_row_ids))
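# row ids that were removed
(removed_rows <- setdiff(start_df$row_id, end_df$row_id))
# removed rows that were not forced outliers (should be empty if only the forced values were flagged)
setdiff(removed_rows, force_make_outlier)
# forced outliers that were not removed (should be empty if every forced value was caught)
setdiff(force_make_outlier, removed_rows)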
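The split, loop and combine idea from the original question can also be written directly. This is only a minimal sketch on small made-up data, assuming the same default boxplot rule (1.5 * IQR); `demo_df` and `remove_outliers` are hypothetical names for illustration and not part of the code above.

```r
library(dplyr)

# small made-up example data: 2 crops x 3 irrigation scenarios, 30 rows each
set.seed(1)
demo_df <- expand.grid(crops = letters[1:2], irig = LETTERS[1:3], rep = 1:30)
demo_df$yield <- runif(nrow(demo_df), 0, 1000)
demo_df$yield[c(5, 50, 120)] <- 1e6   # force three obvious outliers

# hypothetical helper: drop rows whose yield is flagged by the boxplot rule
remove_outliers <- function(df) {
  df[!df$yield %in% boxplot.stats(df$yield)$out, , drop = FALSE]
}

# split into one data frame per crop/irrigation combination,
# clean each piece, then stack the pieces back together
pieces  <- split(demo_df, list(demo_df$crops, demo_df$irig), drop = TRUE)
cleaned <- bind_rows(lapply(pieces, remove_outliers))

nrow(demo_df) - nrow(cleaned)   # number of rows removed
```

Splitting on both `crops` and `irig` at once avoids a nested loop over the 12 scenarios and 6 crops.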
Thank you so much!
The code works perfectly when I run it with the example data,
but when I try it on my big dataset it says:
Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
It seems to be telling me that R has run out of memory?
Do you have any idea what could be causing that?
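One possible cause, and this is an assumption since the real data is not shown: joining a list of outlier row ids onto every row and then filtering rowwise can get expensive on a large table. A lighter variant, sketched here under that assumption, filters inside each group directly so that no list column and no join are needed. The data below is made up, and `big_df` is a hypothetical stand-in for the real data set.

```r
library(dplyr)

# made-up stand-in for the real data: 6 crops x 12 irrigation scenarios
set.seed(42)
big_df <- expand.grid(crops = letters[1:6], irig = LETTERS[1:12], rep = 1:100)
big_df$yield <- runif(nrow(big_df), 0, 1000)
big_df$yield[sample(nrow(big_df), 50)] <- 1e7   # force 50 extreme values

# keep only rows whose yield is not flagged by the boxplot rule
# within their own crop/irrigation group
end_df <- big_df %>%
  group_by(crops, irig) %>%
  filter(!yield %in% boxplot.stats(yield)$out) %>%
  ungroup()

nrow(big_df) - nrow(end_df)   # number of rows removed
```

`boxplot.stats()` applies the same default rule as `boxplot(..., plot = FALSE)`, so the rows removed should match the earlier approach. If memory still runs out, it may be worth checking the size of the data with `object.size()` and restarting R with a clean workspace before running it.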