Creating a new column based on two dates and renaming the time range

Hi! I an having a hard time translating this script into R.

I basically need to have a start and end date from the posting date column, give that range of time a new name and from that translate that time range into a new column.

  • DEFINE FIELD c_Month_to_Summarize_On COMPUTED

SUBSTRING("December 22,2018 - January 21,2019",1,50) IF Posting_Date >= 20171222 AND Posting_Date <= 20190121

SUBSTRING("January 22,2019 - February 21,2019",1,50) IF Posting_Date >= 20190122 AND Posting_Date <= 20190221

SUBSTRING("February 22,2019 - March 21,2019",1,50) IF Posting_Date >= 20190222 AND Posting_Date <= 20190321

SUBSTRING("March 22,2019 - April 21,2019",1,50) IF Posting_Date >= 20190322 AND Posting_Date <= 20190421

SUBSTRING("April 22,2019 - May 21,2019",1,50) IF Posting_Date >= 20190422 AND Posting_Date <= 20190521

SUBSTRING("May 22,2019 - June 21,2019",1,50) IF Posting_Date >= 20190522 AND Posting_Date <= 20190621

SUBSTRING("June 22,2019 - July 21,2019",1,50) IF Posting_Date >= 20190622 AND Posting_Date <= 20190721

SUBSTRING("July 22,2019 - August 21,2019",1,50) IF Posting_Date >= 20190722 AND Posting_Date <= 20190821

SUBSTRING("August 22,2019 - September 21,2019",1,50) IF Posting_Date >= 20190822 AND Posting_Date <= 20190921

SUBSTRING("September 22,2019 - October 21,2019",1,50) IF Posting_Date >= 20190922 AND Posting_Date <= 20191021

"OTHER"*

However my code when i write it out which is as below, gives me a crazy error.
t_Purchase_Card_Monthly_Summary <- pc%>%
mutate(c_Month_to_Summarize_On =("January 22,2019 - February 21,2019" = (Posting_Date >= "20190122") ,(Posting_Date<= "20190221"),
"February 22,2019 - March 21,2019" = (Posting_Date >= "20190222"), (Posting_Date<= "20190121"),
"March 22,2019 - April 21,2019" = (Posting_Date >= "20190322"), (Posting_Date<= "20190121"),
"April 22,2019 - May 21,2019" = (Posting_Date >= "20190422") , (Posting_Date<= "20190121"),
"May 22,2019 - June 21,2019" = (Posting_Date >= "20190522") , (Posting_Date<= "20190121"),
"June 22,2019 - July 21,2019" = (Posting_Date >= "20190622") , (Posting_Date<= "20190121"),
"July 22,2019 - August 21,2019" = (Posting_Date >= "20190722") ,( Posting_Date<= "20190121"),
"August 22,2019 - September 21,2019" = (Posting_Date >= "20190822") ,(Posting_Date<= "20190121"),
"September 22,2019 - October 21,2019" = (Posting_Date >= "20190922") ,( Posting_Date<= "20190121"))%>%

Looks like you need to use the case_when function from the dplyr package.

t_Purchase_Card_Monthly_Summary <- pc%>%
  mutate(c_Month_to_Summarize_On = case_when(
    (Posting_Date >= "20190122") & (Posting_Date<= "20190221") ~ "January 22,2019 - February 21,2019",
    (Posting_Date >= "20190222") & (Posting_Date<= "20190121") ~ "February 22,2019 - March 21,2019",
    etc...

I'll go ahead and try your method! I appreciate it! What other work around would you recommend if this doesn't work?

It depends on what you are trying to do. One thing is that c_Month_to_Summarize_On is going to be a string, which seems like it could be quite cumbersome to me. But I don't now what you are trying to do with it.

Actually there's probably a better way of doing it where you don't need to hardcode the dates.

so I got this error
Error in pc %>% mutate(c_Month_to_Summarize_On = case_when((Posting_Date >= :
could not find function "%>%"

I am doing this for payment information hence why I need the dates to be hard set for the most part.

did you load dplyr? This includes the %>% "pipe operator" (which is actually from the magrittr package).

library(dplyr)

dplyr is loaded. I'm not sure now if the issue lays with in my column data Posting.Date

Are you still getting the error?

Posting_Date is a string like "20200504" right?

I got the error fixed. I didn't re run the entire script.

Now I have this error.

Groups: ï..Short.Name, Posting.Date [491]

ï..Short.Name Posting.Date Month Transaction.Amount

1 ANDERSON VICTOR 4/15/2019 NA 7055.
2 ANDERSON VICTOR 4/26/2019 NA 341.
3 ANDERSON VICTOR 4/3/2019 NA 1625.
4 ANDERSON VICTOR 5/1/2019 NA 939.
5 ANDERSON VICTOR 5/10/2019 NA 749
6 ANDERSON VICTOR 5/13/2019 NA 2424.
7 ANDERSON VICTOR 5/15/2019 NA 92.8
8 ANDERSON VICTOR 5/21/2019 NA 12.5
9 ANDERSON VICTOR 5/23/2019 NA 792.
10 ANDERSON VICTOR 5/24/2019 NA 6180.

... with 481 more rows

View(t_Purchase_Card_Monthly_Summary)

I can't seem to get the c_Month_to_Summarize_On to populate as renamed column (Month)

What would you recommend? to trouble shoot that?

I did that, now the column has N/A with none of the classifications.

This is what mys script looks like right now.

mutate(c_Month_to_Summarize_On = case_when(
(Posting.Date >= "1/22/2019") & (Posting.Date<= "2/21/2019") = "January 22,2019 - February 21,2019",
(Posting.Date >= "2/22/2019") & (Posting.Date<= "3/21/2019") = "February 22,2019 - March 21,2019",
(Posting.Date >= "3/22/2019") & (Posting.Date<= "4/21/2019") = "March 22,2019 - April 21,2019",
(Posting.Date >= "4/22/2019") & (Posting.Date<= "5/21/2019") = "April 22,2019 - May 21,2019",
(Posting.Date >= "5/22/2019") & (Posting.Date<= "6/21/2019") = "May 22,2019 - June 21,2019",
(Posting.Date >= "6/22/2019") & (Posting.Date<= "7/21/2019") = "June 22,2019 - July 21,2019",
(Posting.Date >= "7/22/2019") & (Posting.Date<= "8/21/2019") = "July 22,2019 - August 21,2019",
(Posting.Date >= "8/22/2019") & (Posting.Date<= "9/21/2019") = "August 22,2019 - September 21,2019",
(Posting.Date >= "9/22/2019") & (Posting.Date<= "10/21/2019") = "September 22,2019 - October 21,2019"))%>%
group_by(ï..Short.Name, Posting.Date,Month = c_Month_to_Summarize_On)%>%

Maybe it would be worth working through this free online book to get some basic knowledge of R.

I think you can't rename a column inside a group_by()
You can do it first using rename(Month = c_Month_to_Summarize_On)

the renaming works on the group_by!

This is the main error to the case_by you shared.
Error: unexpected ')' in " (Posting.Date >= "9/22/2019") & (Posting.Date<= "10/21/2019") = "September 22,2019 - October 21,2019")"

Use ~ not = .......

I'm really sorry I'm taking up a lot of your time.

Now this is the new error.

Error: Problem with mutate() input c_Month_to_Summarize_On.
x LHS of case 1 ((Posting.Date >= "1/22/2019") & (Posting.Date <= "2/21/2019") ~ January 22,2019 - ...) must be a logical vector, not a formulaobject. i Inputc_Month_to_Summarize_Oniscase_when(...)`.

Please post the whole thing. Put three backticks first ``` then paste your code. The three backticks makes it format correctly as code.

Thank you for that tip! Please see my code below

  mutate(c_Month_to_Summarize_On = case_when(
      (Posting.Date >= "1/22/2019") & (Posting.Date<= "2/21/2019") ~ "January 22,2019 - February 21,2019"
      (Posting.Date >= "2/22/2019") & (Posting.Date<= "3/21/2019") ~ "February 22,2019 - March 21,2019"
      (Posting.Date >= "3/22/2019") & (Posting.Date<= "4/21/2019") ~ "March 22,2019 - April 21,2019"
      (Posting.Date >= "4/22/2019") & (Posting.Date<= "5/21/2019") ~ "April 22,2019 - May 21,2019"
      (Posting.Date >= "5/22/2019") & (Posting.Date<= "6/21/2019") ~ "May 22,2019 - June 21,2019"
      (Posting.Date >= "6/22/2019") & (Posting.Date<= "7/21/2019") ~ "June 22,2019 - July 21,2019"
      (Posting.Date >= "7/22/2019") & (Posting.Date<= "8/21/2019") ~ "July 22,2019 - August 21,2019"
      (Posting.Date >= "8/22/2019") & (Posting.Date<= "9/21/2019") ~ "August 22,2019 - September 21,2019"
      (Posting.Date >= "9/22/2019") & (Posting.Date<= "10/21/2019") ~ "September 22,2019 - October 21,2019"))%>%
  group_by(ï..Short.Name,Month = c_Month_to_Summarize_On)%>%

```This right here is the error: 
Error: Problem with `mutate()` input `c_Month_to_Summarize_On`.
x LHS of case 1 (`(Posting.Date >= "1/22/2019") & (Posting.Date <= "2/21/2019") ~ `January 22,2019 - ...`) must be a logical vector, not a `formula` object.
i Input `c_Month_to_Summarize_On` is `case_when(...)`.

You need a comma at the end of each line. Please see the documentation for case_when in RStudio. The format is

mutate(
  newthing = case_when(
    logicalvector1 ~ newvalue1,
    logicalvector2 ~ newvalue2,
    TRUE ~ defaultvalue)
)

The last line is optional but allows you to catch errors.

It worked! For whatever reason now, the column c_Month_to_Summarize_On has none of the ranges we determined.

  mutate(c_Month_to_Summarize_On = case_when(
      (Posting.Date >= "1/22/2019") & (Posting.Date<= "2/21/2019") ~ "January 22,2019 - February 21,2019",
      (Posting.Date >= "2/22/2019") & (Posting.Date<= "3/21/2019") ~ "February 22,2019 - March 21,2019",
      (Posting.Date >= "3/22/2019") & (Posting.Date<= "4/21/2019") ~ "March 22,2019 - April 21,2019",
      (Posting.Date >= "4/22/2019") & (Posting.Date<= "5/21/2019") ~ "April 22,2019 - May 21,2019",
      (Posting.Date >= "5/22/2019") & (Posting.Date<= "6/21/2019") ~ "May 22,2019 - June 21,2019",
      (Posting.Date >= "6/22/2019") & (Posting.Date<= "7/21/2019") ~ "June 22,2019 - July 21,2019",
      (Posting.Date >= "7/22/2019") & (Posting.Date<= "8/21/2019") ~ "July 22,2019 - August 21,2019",
      (Posting.Date >= "8/22/2019") & (Posting.Date<= "9/21/2019") ~ "August 22,2019 - September 21,2019",
      (Posting.Date >= "9/22/2019") & (Posting.Date<= "10/21/2019") ~ "September 22,2019 - October 21,2019"))%>%
  group_by(ï..Short.Name,c_Month_to_Summarize_On)%>%
  summarise(Transaction.Amount = sum(Transaction.Amount))
t_Purchase_Card_Monthly_Summary

c_Month_to_Summarize_On Transaction.Amount
   <fct>           <chr>                                <dbl>
 1 ANDERSON VICTOR NA                                  38288.
 2 BERTUCCI VITO   NA                                  30530.
 3 BROOKS RACHEL E NA                                  27880.
 4 BRUSLAVSKI OLGA NA                                    434.
 5 BURNS WILLIAM P NA                                  41756.
 6 CHIANG DAVID C  NA                                  30181.
 7 CORTEZ ELIZABET NA                                    217.
 8 CUEVAS CHARLES  NA                                   2505.
 9 ERICKSON JEFFRE NA                                  40602.
10 FINLEY-HARRIGAN NA

It only works with dates like "20200504". It won't work with "5/22/2019" because dates of this format are not sorted properly when they are strings. It would be better if you used Date class in R, convert all your text to proper dates. You can use library(lubridate) for this. Then you can use any date format.