Filter Data by Seasonal Ranges Over Several Years Based on Month and Day

0

I am trying to filter a large dataset to contain results between a range of days and months over several years to evaluate seasonal objectives. My season is defined from 15 March through 15 September. I can't figure out how to filter the days so that they are only applied to March and September and not the other months within the range. My dataframe is very large and contains proprietary information, but I think the most important information is that the dates are describes by columns: SampleDate (date formatted as %y%m%d), day (numeric), and month (numeric).

I have tried filtering using multiple conditions like so:

S1 <- S1 %>% filter((S1$month >= 3 & S1$day >=15) , (S1$month<=9 & S1$day<=15 ))

I also attempted to set ranges using between for every year that I have data with no luck:

 S1 %>% filter(between(SampleDate, as.Date("2010-03-15"), as.Date("2010-09-15") & 
                as.Date("2011-03-15"), as.Date("2011-09-15")&
               as.Date("2012-03-15"), as.Date("2012-09-15")&
               as.Date("2013-03-15"), as.Date("2013-09-15")&
               as.Date("2014-03-15"), as.Date("2014-09-15")&
               as.Date("2015-03-15"), as.Date("2015-09-15")&
               as.Date("2016-03-15"), as.Date("2016-09-15")&
               as.Date("2017-03-15"), as.Date("2017-09-15")&
               as.Date("2018-03-15"), as.Date("2018-09-15")))

I am pretty new to R and can't find any solution online. I know there must be a somewhat simple way to do this! Any help is greatly appreciated!

Hello there,

Happy to help if you have at least some data to work with (it can just be the actual date variables and not all of it). That way at least we can see if the proposed solution will work correctly for your data. See here on reprex: FAQ: How to do a minimal reproducible example ( reprex ) for beginners

Thank you! Here is a sample of my data:
S1:
Analyte Name SampleDate day month
chemx 2012-01-09 9 1
chemx 2012-08-14 14 8
chemx 2013-05-21 21 5
chemx 2014-09-09 9 9
chemx 2010-12-28 28 12
chemx 2013-03-28 28 3
chemx 2010-10-12 12 10
chemx 2010-03-14 14 3
chemx 2010-09-16 16 9

I added the last 2 dates as they lie right outside of my bounds. There are several other columns I did not include that need to be preserved in my dataset after filtering. I really appreciate the help!

I'm not sure I understand your problem, is this what you mean?

library(dplyr)
library(lubridate)

# Sample data on a copy/paste friendly format
S1 <- data.frame(
  stringsAsFactors = FALSE,
      Analyte_Name = c("chemx","chemx","chemx",
                       "chemx","chemx","chemx","chemx","chemx","chemx"),
        SampleDate = c("2012-01-09","2012-08-14",
                       "2013-05-21","2014-09-09","2010-12-28","2013-03-28",
                       "2010-10-12","2010-03-14","2010-09-16"),
               day = c(9L, 14L, 21L, 9L, 28L, 28L, 12L, 14L, 16L),
             month = c(1L, 8L, 5L, 9L, 12L, 3L, 10L, 3L, 9L)
)

S1 %>%
    filter(SampleDate >= as.Date(paste(year(SampleDate), 03, 15, sep = "-")),
           SampleDate <= as.Date(paste(year(SampleDate), 09, 15, sep = "-")))
#>   Analyte_Name SampleDate day month
#> 1        chemx 2012-08-14  14     8
#> 2        chemx 2013-05-21  21     5
#> 3        chemx 2014-09-09   9     9
#> 4        chemx 2013-03-28  28     3

Created on 2020-10-03 by the reprex package (v0.3.0)

2 Likes

@andresrcs - this is also how I understood the problem.

@Micaela - able to confirm this is what you wanted?

Yes, this is what I meant. My apologies for not providing the proper syntax.

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.