Selecting Specific Months

Hello again!

I have a large dataset (subset here) and I need to make three histograms:

  1. All of the data
  2. Data from June through September (all years)
  3. Data from October through May

I am able to use filter() (dplyr) to select years, but I have not been able to figure out how to filter ranges of months.

I have copied a subset of my data here. "Result" is obviously the value. I apologize if data are not in correct format. Please advise if not.

Thank you.

Date Parameter Result
15-Nov-17 nitrogen 7.97
5-Apr-18 nitrogen 7.47
4-Apr-19 nitrogen 11.63
16-Nov-17 nitrogen 1.14
6-Apr-18 nitrogen 1
21-Jun-18 nitrogen 0.992
30-Aug-18 nitrogen 1.1
6-Dec-18 nitrogen 1.2
27-Mar-19 nitrogen 1.12
4-Apr-19 nitrogen 1.124
25-Jul-19 nitrogen 1.24
17-Oct-19 nitrogen 1.25
19-Nov-19 nitrogen 1.528
23-Oct-19 nitrogen 1.217
15-Nov-17 nitrogen 0.17
5-Apr-18 nitrogen 0.1
16-Nov-17 nitrogen 0.083
6-Apr-18 nitrogen 0.09
21-Jun-18 nitrogen 0.36
30-Aug-18 nitrogen 0.17
6-Dec-18 nitrogen 0.09
27-Mar-19 nitrogen 0.09
25-Jul-19 nitrogen 0.09
17-Oct-19 nitrogen 0.12
15-Nov-17 nitrogen 8.14
5-Apr-18 nitrogen 7.56
4-Apr-19 nitrogen 11.785
16-Nov-17 nitrogen 1.14
6-Apr-18 nitrogen 1
21-Jun-18 nitrogen 0.99
30-Aug-18 nitrogen 1.27
6-Dec-18 nitrogen 1.2
27-Mar-19 nitrogen 1.12
4-Apr-19 nitrogen 1.173
25-Jul-19 nitrogen 1.24
17-Oct-19 nitrogen 1.36
19-Nov-19 nitrogen 1.765
23-Oct-19 nitrogen 1.381

This is my simplified R code. Having a problem with the date selections...

plot1 <- Data1 %>%
filter(Data1$Date == months= )%>%
ggplot(x=Result)+
geom_histogram()
plot1

I'm sure this isn't the most concise way to do it, but the way I've approached problems like this in the past is by creating dummy variables. For example:

Data1$Summer <- ifelse(grepl("Jun", Data1$Date) | grepl("Jul", Data1$Date) |
                         grepl("Aug", Data1$Date) | grepl("Sep", Data1$Date), 1, 0)

Then do the same for Data1$Winter

Then you can use filter() to create three separate data frames and plot a histogram of each.
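
To make that concrete, here is a minimal sketch of the dummy-variable approach. It assumes Data1 holds the character Date column shown in the question; the small stand-in data frame, the Winter column, and the names all_data, summer_data and winter_data are only illustrative. A single regular expression replaces the chained grepl() calls, and it relies on the month abbreviations being in English.

```r
library(dplyr)

# Stand-in for Data1: a few rows copied from the posted subset
Data1 <- data.frame(
  Date = c("15-Nov-17", "5-Apr-18", "21-Jun-18",
           "30-Aug-18", "25-Jul-19", "17-Oct-19"),
  Parameter = "nitrogen",
  Result = c(7.97, 7.47, 0.992, 1.1, 1.24, 1.25),
  stringsAsFactors = FALSE
)

# 1 if the date string contains a June-September abbreviation, else 0
Data1$Summer <- ifelse(grepl("Jun|Jul|Aug|Sep", Data1$Date), 1, 0)

# October-May is simply every row that is not flagged as summer
Data1$Winter <- 1 - Data1$Summer

# Three data frames, one per histogram
all_data    <- Data1
summer_data <- filter(Data1, Summer == 1)
winter_data <- filter(Data1, Winter == 1)
```

Each of the three data frames can then be passed to ggplot() with geom_histogram().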

I'll be keeping an eye on the replies for more efficient ways of doing this.

@Erica and @andresrcs

Thanks for your help! I'm looking forward to diving into this.

Craig


This is simple if you transform your Date variable into an actual date variable instead of a character variable.

library(dplyr)
library(lubridate)

# Sample data in a copy/paste-friendly format
sample_df <- data.frame(
  stringsAsFactors = FALSE,
              Date = c("15-Nov-17","5-Apr-18",
                       "4-Apr-19","16-Nov-17","6-Apr-18","21-Jun-18","30-Aug-18",
                       "6-Dec-18","27-Mar-19","4-Apr-19","25-Jul-19",
                       "17-Oct-19","19-Nov-19","23-Oct-19","15-Nov-17","5-Apr-18",
                       "16-Nov-17","6-Apr-18","21-Jun-18","30-Aug-18",
                       "6-Dec-18","27-Mar-19","25-Jul-19","17-Oct-19","15-Nov-17",
                       "5-Apr-18","4-Apr-19","16-Nov-17","6-Apr-18",
                       "21-Jun-18","30-Aug-18","6-Dec-18","27-Mar-19","4-Apr-19",
                       "25-Jul-19","17-Oct-19","19-Nov-19","23-Oct-19"),
         Parameter = c("nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen","nitrogen","nitrogen","nitrogen",
                       "nitrogen","nitrogen"),
            Result = c(7.97,7.47,11.63,1.14,1,
                       0.992,1.1,1.2,1.12,1.124,1.24,1.25,1.528,1.217,0.17,
                       0.1,0.083,0.09,0.36,0.17,0.09,0.09,0.09,0.12,
                       8.14,7.56,11.785,1.14,1,0.99,1.27,1.2,1.12,1.173,
                       1.24,1.36,1.765,1.381)
)

sample_df %>% 
    mutate(Date = dmy(Date)) %>% 
    filter(month(Date) %in% 6:9)
#>         Date Parameter Result
#> 1 2018-06-21  nitrogen  0.992
#> 2 2018-08-30  nitrogen  1.100
#> 3 2019-07-25  nitrogen  1.240
#> 4 2018-06-21  nitrogen  0.360
#> 5 2018-08-30  nitrogen  0.170
#> 6 2019-07-25  nitrogen  0.090
#> 7 2018-06-21  nitrogen  0.990
#> 8 2018-08-30  nitrogen  1.270
#> 9 2019-07-25  nitrogen  1.240

Created on 2020-05-15 by the reprex package (v0.3.0)
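
As a possible next step toward the three histograms in the question, the same month() test can label each row with a season, so that October through May is just the complement of 6:9. This is only a sketch: demo_df, the Season column, the plot names and bins = 10 are assumptions for illustration, ggplot2 is assumed to be installed, and dmy() is assumed to be parsing English month abbreviations.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in: a few rows from the posted subset
demo_df <- data.frame(
  Date = c("15-Nov-17", "5-Apr-18", "21-Jun-18", "30-Aug-18", "17-Oct-19"),
  Result = c(7.97, 7.47, 0.992, 1.1, 1.25),
  stringsAsFactors = FALSE
)

# Parse the dates, then label each row by season
seasonal_df <- demo_df %>%
  mutate(
    Date = dmy(Date),
    Season = if_else(month(Date) %in% 6:9, "Jun-Sep", "Oct-May")
  )

# 1. All of the data
p_all <- ggplot(seasonal_df, aes(x = Result)) +
  geom_histogram(bins = 10)

# 2. June through September (all years)
p_summer <- seasonal_df %>%
  filter(Season == "Jun-Sep") %>%
  ggplot(aes(x = Result)) +
  geom_histogram(bins = 10)

# 3. October through May
p_winter <- seasonal_df %>%
  filter(Season == "Oct-May") %>%
  ggplot(aes(x = Result)) +
  geom_histogram(bins = 10)
```

Printing p_all, p_summer or p_winter draws the corresponding histogram. Note that the mapping goes inside aes(); ggplot(x = Result) on its own does not map the column.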
