Sorting By Date -- New To R

Hi everyone, I am new to R and looking for some help categorizing my data, or better, a good tutorial on how this can be done. I currently have a data set where the date column is in the format YYYY-MM-DD. I would like to categorize the dates by year, then by month. If there is no date in column 1, I would like to use the date in column 2, meaning that row's information is ported over to column 1. This probably means I need to categorize columns 1 and 2 and then substitute 2 where there is no 1.

Date Column 1 | Date Column 2 | Information
YYYY-MM-DD | YYYY-MM-DD | XXXXXX
YYYY-MM-DD | YYYY-MM-DD | XXXXXX
YYYY-MM-DD | YYYY-MM-DD | XXXXXX
YYYY-MM-DD | YYYY-MM-DD | XXXXXX
YYYY-MM-DD | YYYY-MM-DD | XXXXXX

Hi, and welcome.

Questions benefit greatly from a reproducible example, called a reprex. In this case it would resolve the ambiguity of whether your date columns are character strings, like "2020-01-18", or objects of class Date.

The lubridate package has functions to discard DD to get a year/month.
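As a rough illustration of the point above, here is a minimal sketch using lubridate's year(), month() and floor_date(). The sample dates are made up for the example and are not from the original poster's data.

```r
library(lubridate)

# Hypothetical sample dates in the YYYY-MM-DD format described in the question
dates <- ymd(c("2020-01-18", "2019-11-03", "2020-01-02"))

yr <- year(dates)    # numeric year of each date
mo <- month(dates)   # numeric month of each date, 1 to 12

# floor_date() keeps a Date but resets the day to the first of the month,
# which can serve as a single year/month grouping key
ym <- floor_date(dates, unit = "month")
```

Keeping a floored Date rather than separate year and month numbers means the values still sort chronologically.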

The dplyr package allows you to replace the contents of the first column with those of the second. Here, again, an assumption is needed without a reprex: that the empty data for column 1 is represented by NA.

The syntax to do this would be

my_data %<>% mutate(Date1 = ifelse(is.na(), Date2, Date1))

There's also a sort function, called arrange, but you'll want to look at the R for Data Science chapters on this first.
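For readers following along, a minimal sketch of the fill-then-sort idea might look like the following. It swaps in dplyr's coalesce() for ifelse(), because base ifelse() drops the Date class from its result. The data frame and its column names are hypothetical and assume both columns are already of class Date.

```r
library(dplyr)

# Hypothetical data: Date1 has a gap (NA), Date2 is always present
my_data <- data.frame(
  Date1 = as.Date(c("2020-01-18", NA, "2019-11-03")),
  Date2 = as.Date(c("2020-02-01", "2018-06-15", "2019-12-01")),
  Information = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

filled <- my_data %>%
  mutate(Date1 = coalesce(Date1, Date2)) %>%  # take Date2 wherever Date1 is NA
  arrange(Date1)                              # sort from oldest to newest
```

coalesce() returns the first non-missing value at each position, so rows that already have a Date1 are left unchanged.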

Hello, Thanks for your reply.

Currently, the date column is represented as a string, e.g. YYYY-MM-DD. However, when there is no date, the cell is left empty. Should this change how I go about answering this question?

You can use the lubridate::ymd() function to convert. That will probably produce NAs for the empty cells.
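A minimal sketch of that conversion, assuming the dates are YYYY-MM-DD strings and that blank cells arrive as empty strings (the vector below is invented for illustration):

```r
library(lubridate)

# Hypothetical column: YYYY-MM-DD strings with one blank cell
raw <- c("2020-01-18", "", "2019-11-03")

raw[raw == ""] <- NA   # mark blanks as missing before parsing
parsed <- ymd(raw)     # Date vector; the blank position stays NA
```

Marking the blanks as NA first keeps the missing cells explicit instead of relying on the parser to reject them.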

Thanks. I will try this and get back to you

Should something go inside the is.na() call? I have it in my script, but I am getting an error:

Error in is.na() : 0 arguments passed to 'is.na' which requires 1 

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

I cannot share my data set because of the project I am working on, but I have attached a similar dataset showing what I am trying to do. Where there is no Incident Date, take the date from the Report Date column and put it in the Incident Date column as well, so it shows in both. My goal is to fill in all the NAs and then organize by date; I think I already have a script for the sorting.

That is not a dataset, it is a screenshot, and it is not very useful since I can't copy your sample data into my R session and give you a working solution. Please read the guide I gave you and try to provide a proper reproducible example, including sample data in a copy/paste-friendly format.

Perhaps this might help. The dataset I am actually using comes from a CSV file; this is a small table I have just created in R. I hope this helps.

datapasta::df_paste(head(iris, 11)[, c('Incident.Date', 'Report.Date','Artifact.Number')])

data.frame(
  Incident.Date = c('1/1/2015','NA','2/2/2015','NA','5/5/2016','NA','4/4/2017','4/4/2018','NA','5/5/2018','1/4/2015'),
  Report.Date = c('3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019'),
  Artifact.Number = c(1,2,3,4,5,6,7,8,9,10,11)
)

In the dataset that I am working with, there are no 'NA' strings; the cells are just blank.

Yes, it helps; this is sample data in a proper format.

You are getting 'NA' as a string (with quotes) because you are not reading the data correctly from the CSV file. They should be NA (without quotes), which is how R represents missing values; it stands for Not Available.
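One possible way to get real NA values when reading the file is the na.strings argument of base R's read.csv(). The sketch below reads from an inline string so it is self-contained; the contents are made up to resemble the sample data, and with a real file you would pass the file path instead of text =.

```r
# Hypothetical CSV contents; the second row has a blank Incident.Date cell
csv_text <- "Incident.Date,Report.Date,Artifact.Number
1/1/2015,3/3/2019,1
,3/3/2019,2
2/2/2015,3/3/2019,3"

incidents <- read.csv(
  text = csv_text,
  na.strings = c("", "NA"),   # treat blank cells and literal NA as missing
  stringsAsFactors = FALSE    # keep the date columns as character strings
)

is.na(incidents$Incident.Date)  # TRUE only for the blank cell
```

By default read.csv() keeps a blank cell in a character column as an empty string, which is why the extra na.strings entry is needed.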

Anyway, if I understand you correctly, this is what you are trying to do:

library(tidyverse)
library(lubridate)

# This is just sample data, you can replace this with the actual dataset that you read from the CSV file
sample_data <- data.frame(
    Incident.Date = c('1/1/2015','NA','2/2/15','NA','5/5/2016','NA','4/4/2017','4/4/2018','NA','5/5/2018','1/4/2015'),
    Report.Date = c('3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019'),
    Artifact.Number = c(1,2,3,4,5,6,7,8,9,10,11)
)

sample_data %>%
    mutate_at(vars(contains("Date")), dmy) %>% 
    rowwise() %>% 
    mutate(Incident.Date = if_else(
        is.na(Incident.Date),
        true = Report.Date,
        false = Incident.Date)
    ) %>% 
    ungroup() %>% 
    mutate(year = year(Incident.Date),
           month = month(Incident.Date)) %>% 
    arrange(Incident.Date)
#> Warning: 4 failed to parse.
#> # A tibble: 11 x 5
#>    Incident.Date Report.Date Artifact.Number  year month
#>    <date>        <date>                <dbl> <dbl> <dbl>
#>  1 2015-01-01    2019-03-03                1  2015     1
#>  2 2015-02-02    2019-03-03                3  2015     2
#>  3 2015-04-01    2019-03-03               11  2015     4
#>  4 2016-05-05    2019-03-03                5  2016     5
#>  5 2017-04-04    2019-03-03                7  2017     4
#>  6 2018-04-04    2019-03-03                8  2018     4
#>  7 2018-05-05    2019-03-03               10  2018     5
#>  8 2019-03-03    2019-03-03                2  2019     3
#>  9 2019-03-03    2019-03-03                4  2019     3
#> 10 2019-03-03    2019-03-03                6  2019     3
#> 11 2019-03-03    2019-03-03                9  2019     3

Created on 2020-01-20 by the reprex package (v0.3.0.9000)

Thank you for your solution. On one machine it worked perfectly, but on another I get the following error:

Error in UseMethod("tbl_vars")
    no applicable method for 'tbl_vars' applied to an object of class "c('matrix', 'logical')"

I've double checked to make sure all the correct packages are installed and that the libraries are also being used correctly.

Can you provide a reproducible example for this? Otherwise I have no means to help you any further.

I've been doing some research, and it seems that in my provided example all the date values were entered as strings. However, in my data set they could have been vectorized or be in a different format. That could potentially be why I am getting this error. I will have to do some further digging and update the thread when I know more.
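The class named in the error, a logical matrix, hints that something other than a data frame reached a dplyr verb. Without a reprex that is only a guess, but a quick check along these lines (with a hypothetical data frame) can narrow it down:

```r
# Hypothetical data frame with one missing value
incidents <- data.frame(
  Incident.Date = c("1/1/2015", NA),
  Report.Date = c("3/3/2019", "3/3/2019"),
  stringsAsFactors = FALSE
)

is.data.frame(incidents)   # dplyr verbs expect a data frame (or tibble)
sapply(incidents, class)   # class of each column

# is.na() applied to a whole data frame returns a logical matrix,
# which mutate() and arrange() cannot work on
na_flags <- is.na(incidents)
class(na_flags)
```

Running the same checks on both machines, on the object that is actually piped into the dplyr code, should show whether the inputs differ.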

Yes, is.na(SOMEVARIABLE)
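To make that concrete, here is a small base R sketch with made-up vectors standing in for the two date columns. is.na() takes the vector to test and returns TRUE wherever a value is missing:

```r
# Hypothetical vectors standing in for the two date columns
Date1 <- as.Date(c("2020-01-18", NA, "2019-11-03"))
Date2 <- as.Date(c("2020-02-01", "2018-06-15", "2019-12-01"))

missing <- is.na(Date1)           # logical vector: TRUE where Date1 has no value
Date1[missing] <- Date2[missing]  # fill only those positions from Date2
```

Calling is.na() with nothing inside the parentheses is what produces the "0 arguments passed" error shown earlier in the thread.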

You can show the actual structure of your data by using dput() instead of datapasta::df_paste().
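As a small illustration of why dput() is useful here, the sketch below (with an invented data frame) prints an R expression that rebuilds the object, including column types and real NA values, and then checks that evaluating that expression gives back equivalent data:

```r
# Hypothetical data frame with a genuine missing value
sample_data <- data.frame(
  Incident.Date = c("1/1/2015", NA),
  Artifact.Number = c(1, 2),
  stringsAsFactors = FALSE
)

dput(sample_data)  # prints code that can be pasted straight into a forum post

# Round trip: capture the printed code and evaluate it again
code <- paste(capture.output(dput(sample_data)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(rebuilt, sample_data)
```

Because the output records whether a column is character, factor or Date, it removes the guesswork about how the data is actually stored.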
