Greetings RStudio Community:
I have a data frame of x and y coordinates representing baseball pitch locations (`df_2`). I also have a reference data frame containing a region label as well as the corresponding `xmin`, `xmax`, `ymin`, and `ymax` region parameters (`df_1`). I'm trying to apply the value of `df_1$region` to each row of `df_2` whose `df_2$x` and `df_2$y` are between the corresponding region bounds.
I can get the code to run using a nasty series of nested ifelse statements, but ideally the solution would be much faster and more elegant. I’ve tried using purrr and a for loop to no avail.
```r
# Objective:
# Match x and y in df_2 with corresponding region number in df_1
library(tidyverse)

# df_1: region labels and coordinates
load(url("http://aaronbaggett.com/data/df_1.Rda"))

# df_2: x and y coordinates
load(url("http://aaronbaggett.com/data/df_2.Rda"))

# Attempt 1: Using purrr
df_2 %>%
  mutate(region = map2_dbl(x, y, ~df_1$region[.x >= df_1$xmin & .x <= df_1$xmax &
                                               .y >= df_1$ymin & .y <= df_1$ymax]))
#> Error in mutate_impl(.data, dots): Evaluation error: Result 52 is not a length 1 atomic vector.

df_2[52, ]
#> # A tibble: 1 x 3
#>   region       x     y
#>    <dbl>   <dbl> <dbl>
#> 1      0 -0.0200  1.83
```
One potential problem with the `df_1` region parameters is that when a pitch is directly over one of the borders (see blue lines in the figure below), the function isn't sure to which region those pitch coordinates should be assigned. For example, `df_2[52, ]` could be in either region 27 or 21. The output snippet below is what `df_2` should look like after the iteration.
```r
df_2
#> # A tibble: 100 x 3
#>    region      x     y
#>     <dbl>  <dbl> <dbl>
#>  1     25 -1.37  1.42
#>  2     28  0.405 1.21
#>  3     31 -1.37  0.682
#>  4     36  1.58  0.912
#>  5     10  0.304 3.50
#>  6     14 -0.906 3.03
#>  7     23  0.620 2.41
#>  8      9 -0.202 3.38
#>  9     14 -0.987 2.93
#> 10      8 -1.02  3.77
#> # ... with 90 more rows
```
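To make the border ambiguity concrete, here is a minimal sketch on made-up data rather than the real `df_1` and `df_2`. The names `regions`, `pitches`, and `first_region()` are hypothetical, and the tie-breaking rule (the first matching row of the reference table wins) is only an assumption. Any other rule, such as half-open intervals, would need the comparison operators changed.

```r
library(tidyverse)

# Hypothetical toy reference table: two regions sharing the border x = 0
regions <- tibble(
  region = c(1, 2),
  xmin   = c(-1, 0),
  xmax   = c(0, 1),
  ymin   = c(0, 0),
  ymax   = c(1, 1)
)

# Hypothetical toy pitches: inside region 1, inside region 2,
# exactly on the shared border, and outside every region
pitches <- tibble(
  x = c(-0.5, 0.5, 0, 2),
  y = c(0.5, 0.5, 0.5, 0.5)
)

# Return the label of the first region whose box contains (x, y),
# or NA when no region matches. Always returns a length-1 value,
# which is what map2_dbl() requires.
first_region <- function(x, y, ref) {
  hit <- which(x >= ref$xmin & x <= ref$xmax &
               y >= ref$ymin & y <= ref$ymax)
  if (length(hit) == 0) NA_real_ else ref$region[hit[1]]
}

labelled <- pitches %>%
  mutate(region = map2_dbl(x, y, ~ first_region(.x, .y, regions)))
```

With this rule the border pitch at `x = 0` is assigned to region 1 simply because that row comes first in `regions`, and the pitch outside every box gets `NA` instead of triggering the length-1 error.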
Any help is appreciated.