Convert numbers to date


#1

Hi all,

I'm a bit struggling with converting a vector of integers into dates. I've imported a dataset from Stata using:

> dataire <- read.dta13("~/lcapm_ireland.dta", convert.factors = TRUE, 
generate.factors = FALSE, encoding = "UTF-8", fromEncoding = NULL, 
convert.underscore = FALSE, missing.type = FALSE, convert.dates = TRUE, 
replace.strl = TRUE, add.rownames = FALSE)

My date variable in Stata is a monthly time series starting on January 2000 formatted as "2000-Jan". However, Stata handles dates as integers, similarly to R. Therefore, when importing the dataset into R, I end up with a vector of dates of the form:

> c(478, 479, 480, ...)

In addition, my date variable is

> class(datem)
[1] "Date"

How can I use as.Date or other functions to transform the time-series of integers in a monthly date variable formatted as "%Y-%b" ?

Thank you for your help

S


#2

There're a few ways to do this, but my favourite in terms of readability is to use the lubridate package (which comes with the tidyverse but isn't loaded with the core set of packages):

library(lubridate)

# here's your vector: 478, 479, 480, ...
datem = 478:493


# … and here it is as Date objects:
as.Date('2000-01-01') + months(datem)

You can also use this technique with days(), years(), weeks() and other periods of time!


#3

Another option: as.Date() takes an origin parameter that does what you are looking for as far as turning your integers into dates:

I misunderstood the scenario — since the integers in this case represent months (I think?) from the origin (which is not 2000-01-01), the conversion is more complicated. But the next bit stands, regardless, I think! (if anything, the mixup reinforces the advice to always compare converted dates to known dates in the other system)

It’s worth taking note of this warning from the as.Date() docs:

Conversion From Other Systems

Most systems record dates internally as the number of days since some origin, but this is fraught with problems, including

  • Is the origin day 0 or day 1? As the ‘Examples’ show, Excel manages to use both choices for its two date systems.

  • If the origin is far enough back, the designers may show their ignorance of calendar systems. For example, Excel's designer thought 1900 was a leap year (claiming to copy the error from earlier DOS spreadsheets), and Matlab's designer chose the non-existent date of ‘January 0, 0000’ (there is no such day), not specifying the calendar. (There is such a year in the ‘Gregorian’ calendar as used in ISO 8601:2004, but that does say that it is only to be used for years before 1582 with the agreement of the parties in information exchange.)

The only safe procedure is to check the other systems values for known dates: reports on the Internet (including R-help) are more often wrong than right.

When it comes time to output your dates as character strings in your preferred format, you can use base format() (and relatives — see documentation) or lubridate::stamp()


#4

Thanks for your reply! This code actually works, but instead of putting
as.Date('2000-01-01') + months(datem), I need to impose ('1960-03-01') to have a as.Date vector that starts in 01/01/2000.

Thanks again!


#5

Thanks for your reply! While your code also works, it converts daily dates, while I need monthly dates.
Can I include the format() option on the code line?

Thanks


#6

No worries, @smgrilli. The incorrect origin in my code was down to me misunderstanding your code (I thought you meant 1 = Jan 2000, not 478 = Jan 2000). You could use months(datem - 478) of you wanted to leave the origin as it is. Glad it's working for you!


#7

Hm, I’m not sure I’ve understood your situation correctly! For what it’s worth, R stores dates as the number of days since its origin date, so there’s not really any such thing as a “monthly date” internally. Some day of the month has to be specified (usually the first of the month).


#8

Thank you for your reply. In fact, while the full date has to be specified YYYY-MM-DD, I wanted to proceed on a monthly base, such that 478 = 01/01/2000 and 479 = 01/02/2000. The solution provided by @rensa works, but I'm always interested in looking at other alternatives if you have some to suggest.

Thanks