How to properly structure a Data Analysis when dealing with Dates

I am trying to improve the quality of my data analysis scripts.

I have been doing several analyses for different clients, and very often I have to deal with dates and the general notion of time (hourly/daily/weekly/monthly/yearly).

Every time, after a while of analysis, I end up building a Shiny dashboard.
I would like advice on the best way to structure my code when I have event data at hourly granularity.

Below is the restructuring plan I have in mind for my different scripts, but I would like it to be challenged if you anticipate anything inefficient in the following process:

  1. Ingest the raw data at hourly granularity
  2. Perform the different analyses at the hourly level
  3. Aggregate to daily/weekly/monthly/yearly only when necessary for building ggplot graphs
  4. Build the Shiny app with a date filter to select the granularity of the x axis (hourly/daily/weekly/monthly/yearly)
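Steps 1–3 can be sketched roughly like this, assuming dplyr and lubridate, with a hypothetical `events` tibble (a POSIXct `timestamp` column plus a numeric `value`) standing in for the real raw data:

```r
library(dplyr)
library(lubridate)

# Hypothetical hourly event data (one month, one value per hour)
set.seed(1)
events <- tibble(
  timestamp = seq(ymd_h("2023-01-01 00"), ymd_h("2023-01-31 23"), by = "hour"),
  value     = rpois(744, lambda = 3)
)

# Keep the raw data hourly; aggregate only at plot time,
# choosing the unit from the Shiny filter
aggregate_events <- function(df, unit = c("hour", "day", "week", "month", "year")) {
  unit <- match.arg(unit)
  df %>%
    mutate(period = floor_date(timestamp, unit = unit)) %>%
    group_by(period) %>%
    summarise(value = sum(value), .groups = "drop")
}

daily <- aggregate_events(events, "day")
```

The point of the single `aggregate_events()` helper is that the Shiny granularity filter can just pass its selected unit through, so the hourly data stays the single source of truth.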

Thanks a lot for any of your advice!
Max

What sort of analysis do you do?

Some common analyses you can do with time series are identifying seasonality and making forecasts.

You can do regression to try to relate variables, but because time series typically have autocorrelation, the observations are not statistically independent, so the p-values are compromised.
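A quick way to see that autocorrelation, using only base R on a simulated series (illustrative only, not your data):

```r
# Simulated hourly series with strong autocorrelation (AR(1), phi = 0.8)
set.seed(42)
x <- arima.sim(model = list(ar = 0.8), n = 500)

# Sample autocorrelation function: bars crossing the dashed bands
# indicate significant autocorrelation at that lag, which is what
# breaks the independence assumption behind the usual p-values
acf(x, lag.max = 48, main = "ACF of simulated hourly series")
```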

It's mostly analyzing user behavior inside web apps or mobile apps: how much media they have, which features they use, different cohort analyses, etc.

Cool. You might want one data set aggregated by app/service and another aggregated by customer. The first would be useful for high-level plots of usage over time, and the second for analyzing customer behavior and clustering customers into types.
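A hedged sketch of those two aggregations with dplyr, assuming a hypothetical raw `usage` log (all column names here are illustrative):

```r
library(dplyr)

# Hypothetical raw usage log
usage <- tibble(
  app         = c("photos", "photos", "chat", "chat"),
  customer_id = c(1, 2, 1, 2),
  timestamp   = as.POSIXct("2023-01-01 10:00:00", tz = "UTC") + 3600 * 0:3,
  events      = c(5, 2, 7, 1)
)

# Data set 1 - per app/service: high-level usage over time
by_app <- usage %>%
  group_by(app, date = as.Date(timestamp)) %>%
  summarise(events = sum(events), .groups = "drop")

# Data set 2 - per customer: behavioral features for clustering
by_customer <- usage %>%
  group_by(customer_id) %>%
  summarise(
    total_events = sum(events),
    active_days  = n_distinct(as.Date(timestamp)),
    .groups = "drop"
  )
```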


Thanks, duly noted. What bothers me the most is how to handle dates: when it makes sense to use the most granular level (hourly, in my case) and when it makes more sense to use something coarser like weeks or months.

You might make that decision based on the sparsity of the data. If an hourly frequency produces observations with lots of zeros, you might choose daily or monthly instead.
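That sparsity check can be sketched like this, assuming dplyr, tidyr, and lubridate, with a deliberately sparse hypothetical event log:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sparse hypothetical event log: ~50 events spread over one month of hours
set.seed(1)
log <- tibble(
  timestamp = ymd_h("2023-01-01 00") + hours(sample(0:743, 50))
)

# Proportion of empty periods at a given granularity; a high share of
# zeros suggests aggregating one level coarser
zero_share <- function(df, unit) {
  df %>%
    count(period = floor_date(timestamp, unit)) %>%
    complete(period = seq(min(period), max(period), by = unit),
             fill = list(n = 0)) %>%
    summarise(share_zero = mean(n == 0)) %>%
    pull(share_zero)
}

zero_share(log, "hour")  # mostly empty hours -> hourly is too fine here
zero_share(log, "day")   # far fewer empty days -> daily may be enough
```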