Advice on creating a new data type (for day of the year)

Background
I often work with and support people working with daily climatic time series data. The day of the year (number from 1 to 365/366) is a useful variable. I often calculate a yearly summary such as day of the year of a first occurrence of some event. I then want to plot a simple time series of this "day of the year". It's much nicer to display this as e.g. "1 Feb" for 32 so the meaning is clear. For graphs, I convert the day of year to a date with an arbitrary year and then just display the day and month on the axis, as shown below.

library(ggplot2)
df <- data.frame(year = 2011:2015, doy = c(215, 101, 135, 53, 325))

ggplot(df, aes(x = year, y = as.Date(doy, origin = "2015-12-31"))) + 
  geom_line() +
  scale_y_date(date_labels="%d %b") +
  labs(y = "doy")


From searching around, this seems to be the easiest solution. But it gets a bit repetitive, and it is off-putting to the new R users I work with.

Plus, it would also be nice if this could display as "1 Feb" when printing, and in data frames etc, not just graphs.

I've not found any data structure/type set up for this. So I'm thinking that it could be useful to make a "day of the year" data structure which is internally just a number, but can display in other ways. I think what I want is something similar to how difftime works. So, I'm wanting to create something that would work like:

as.doy(c(32, 33))
#> [1]   "1-Feb" "2-Feb"

I have experience in R, but not with this kind of programming, so I'm not really sure where to start. I'm assuming it's feasible.
Is there any documentation or packages that have done similar things that I could learn from?
And, if I had my own data type, could I create my own scale_ functions which ggplot2 would recognise so that it could by default display as I like on an axis?

Any help or guidance would be very appreciated, thanks.

I cannot answer any of your questions, but is a new type necessary? You can get the day number from a date (and vice versa) simply by using the %j format.

day_numbers <- c(215, 101, 135, 53, 325)

# %j is the day of the year (001-366); combine it with a year to parse a date
calendar_dates <- as.Date(x = paste(2019, day_numbers),
                          format = "%Y %j")

display_without_year <- format(x = calendar_dates,
                               format = "%d %b")

data.frame(day_numbers, calendar_dates, display_without_year)
#>   day_numbers calendar_dates display_without_year
#> 1         215     2019-08-03               03 Aug
#> 2         101     2019-04-11               11 Apr
#> 3         135     2019-05-15               15 May
#> 4          53     2019-02-22               22 Feb
#> 5         325     2019-11-21               21 Nov

Thanks for the reply. The problem with this is that I can't use display_without_year for graphs as it's a character/factor, so I would still need to transform the day_numbers for plotting and also use day_numbers for any calculations.
Also, managing two columns for the same information could lead to mistakes if someone makes a change and forgets to keep them in sync. So making this as simple as possible for (inexperienced) users was the motivation for suggesting a new data type.

Do you know how to create R packages? This is a prerequisite for what you want. If you do, it's kinda trivial.

x <- 12

as_doy = function(x){
  stopifnot(x %in% 1:366)
  structure(x, class = "doy")  # add a class attribute
}

x <- as_doy(x)

# the function.dot notation is used to define s3 methods, that's why it's 
# bad practice to put dots in function/variable names, even if 
# base R does that as well. 
print.doy <- function(x, ...){
  # end each element with a newline so the prompt starts on a fresh line
  cat(paste0(x, "th day of the year\n"), sep = "")
  invisible(x)  # print methods should always return their input invisibly
}

x

You have to register print.doy() as an S3 method in a dedicated R package to really make that approach practical for other uses (refer to Hadley Wickham's "R Packages" book).
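As a rough sketch of what that registration can look like: assuming the package uses roxygen2 (the post does not say so), tagging the method with @export makes roxygen2 write S3method(print, doy) into NAMESPACE. For quick experiments outside a package, base R's registerS3method() does a similar job for the current session. The file name in the comment is only illustrative.

```r
# In a package source file, e.g. R/doy.R (illustrative file name).
# With roxygen2, @export on an S3 method results in the NAMESPACE line
#   S3method(print, doy)
# the next time the documentation is regenerated.

#' @export
print.doy <- function(x, ...) {
  cat(paste0(x, "th day of the year\n"), sep = "")
  invisible(x)
}

# Outside a package (e.g. in a script), register the method for this
# session so that print() finds it wherever it is called from.
registerS3method("print", "doy", print.doy)

print(structure(12, class = "doy"))
```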

If you want to create scales for ggplot2, that's an extra step and a bit more complicated. You can look at the code of my package dint for examples :slight_smile: (the most complicated part is getting the breaks right).

If you just want to label the axes appropriately, it might be enough to define format.doy() (like print.doy(), but it returns a character representation of x).
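A minimal sketch of the "just label the axes" route, without a new class at all: doy_labels() is a hypothetical helper (it is not part of ggplot2 or dint) that turns numeric breaks into day-month text, and it is passed to the labels argument of scale_y_continuous(). It assumes a leap year (2000) so that day 366 is valid, and the month abbreviations follow the session locale.

```r
# Hypothetical helper: convert day-of-year numbers to "day month" labels.
# Assumes the leap year 2000, so day 60 is 29 Feb and day 366 is 31 Dec.
doy_labels <- function(breaks) {
  dates <- as.Date("2000-01-01") + (round(breaks) - 1)
  format(dates, "%d %b")
}

doy_labels(c(1, 32, 366))
# in an English locale: "01 Jan" "01 Feb" "31 Dec"

# The plotting part only runs if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  df <- data.frame(year = 2011:2015, doy = c(215, 101, 135, 53, 325))
  p <- ggplot2::ggplot(df, ggplot2::aes(x = year, y = doy)) +
    ggplot2::geom_line() +
    # doy stays numeric; only the axis labels are reformatted
    ggplot2::scale_y_continuous(labels = doy_labels)
}
```

Keeping the column numeric means calculations still work on it, while the axis shows calendar-style labels.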


@hoelk thank you so much, this is exactly what I needed to get started.
Thanks for the link to your package too, for the ggplot2 scales creation. I can take examples from there. The package itself looks relevant for me too!

Glad to help! Also take a look at the S3 chapter of Advanced R.

There are a few built-in generics in R that you want to define S3 methods for if you create a new S3 class. print() and format() are probably the most important ones for you, plot() and summary() are also common.

You can also define +, - (relevant for you, I think) and max(), min() etc.
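A sketch of what such arithmetic methods might look like, under stated assumptions: new_doy() and wrap_doy() are hypothetical helpers invented for this example, the year is taken to have 366 days, and the right-hand side of - is treated as a number of days (subtracting one doy from another would really call for a different return type).

```r
# Hypothetical constructor used only for this sketch
new_doy <- function(x) structure(x, class = c("doy", "numeric"))

# Hypothetical helper: map any number onto 1..year_length
wrap_doy <- function(x, year_length = 366) (x - 1) %% year_length + 1

`+.doy` <- function(e1, e2) {
  if (missing(e2)) return(e1)  # unary plus: nothing to do
  # unclass() both sides so the built-in numeric + does the arithmetic
  new_doy(wrap_doy(unclass(e1) + unclass(e2)))
}

`-.doy` <- function(e1, e2) {
  if (missing(e2)) stop("unary minus is not meaningful for a day of the year")
  new_doy(wrap_doy(unclass(e1) - unclass(e2)))
}

unclass(new_doy(360) + 10)  # 4: wraps past the end of the year
unclass(new_doy(5) - 10)    # 361: wraps back into the previous year
```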

If you skim a bit through the code of dint, you'll probably find examples for everything you need for your doy, as it does something pretty similar :slight_smile:

Thanks, yes I think what I need is pretty similar to what you've already done in dint :slight_smile:

Am I right that if I don't define max.doy then it will default to using max for numeric vectors? I know R doesn't have inheritance in the same way as other languages, but is it still numeric in that sense? It seems so, as + and - do something without defining functions for them.

Also, I've tried something quickly, but I'm getting an error when I put a doy vector into a data frame. Is this because I need to define as.data.frame.doy? If so, what does that need to achieve? Sorry, I couldn't see this easily in your package.

doy <- function(x, y_length = 366) {
  class(x) <- "doy"
  attr(x, "y_length") <- y_length
  x
}

print.doy <- function(x, ...) {
  cat(format.doy(x, ...))
}

format.doy <- function(x, ...) {
  y <- as.Date(paste("2000", x), format = "%Y %j")
  format(y, format = "%d-%b")
}

x <- doy(1:10)
x
#> 01-Jan 02-Jan 03-Jan 04-Jan 05-Jan 06-Jan 07-Jan 08-Jan 09-Jan 10-Jan
data.frame(x = doy(1:10))
#> Error in as.data.frame.default(x[[i]], optional = TRUE): cannot coerce class '"doy"' to a data.frame

Hmm, that's weird. Probably my code example should have been:

class(x) <- c("doy", "numeric")

instead of just "doy". I would still have expected data.frame() to work though... strange...
Inheritance for S3 classes works from left to right, so R first looks for print.doy(), then print.numeric() (and then print.default()).
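A minimal sketch of the fix, reusing the doy() and format.doy() names from the question but with simplified bodies (the y_length attribute is left out here). data.frame() calls the S3 generic as.data.frame(); with a class attribute of just "doy" there is no as.data.frame.doy, so dispatch ends at as.data.frame.default, which raises the error shown above. Putting "numeric" second in the class vector lets dispatch find as.data.frame.numeric instead.

```r
# Keep "numeric" after "doy" so methods written for numeric vectors
# (such as as.data.frame.numeric) are found by S3 dispatch.
doy <- function(x) {
  structure(x, class = c("doy", "numeric"))
}

format.doy <- function(x, ...) {
  # 2000 is a leap year, so day 366 is valid
  dates <- as.Date("2000-01-01") + (unclass(x) - 1)
  format(dates, "%d-%b")
}

print.doy <- function(x, ...) {
  print(format(x, ...), quote = FALSE)
  invisible(x)
}

df <- data.frame(x = doy(c(32, 33)))
class(df$x)
#> [1] "doy"     "numeric"
df  # the column is displayed through format.doy()
```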

And yes, you inherit everything from numeric, but not everything necessarily makes sense (so you'll want + to wrap around at the end of the year, etc.).
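A small check of that fallback behaviour, as a sketch: no max.doy, min.doy or range.doy is defined here, so the built-in numeric implementations run on the underlying values. The sample values are the ones from the original question.

```r
x <- structure(c(215, 101, 135, 53, 325), class = c("doy", "numeric"))

inherits(x, "numeric")  # TRUE: "numeric" is in the class vector
is.numeric(x)           # TRUE: the underlying data are doubles

# No doy-specific methods exist for these, so the defaults are used
max(x)    # the value is 325
range(x)  # the values are 53 and 325

# Plain numeric arithmetic knows nothing about the calendar:
unclass(x)[5] + 50  # 375, which is not a valid day of the year
```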

Great, thanks, that worked by adding "numeric" to the class.
