Visualize by week number and year

ggplot2
visualization

#1

I have a data frame df with two columns, x and y. x represents a date and y represents a numerical value.

I want to aggregate it by week_no and year, and similarly by month_no and year.

Then I want to visualize the result (e.g. as a histogram). Please help.


#2

Could you please supply a reprex?

That'd make it a lot easier to help out :slightly_smiling_face:


#3

Start with Stack Overflow.


#4

If you have something specific in mind, it would help to understand better what you mean by:

  • aggregate: sum? average? count? (I assume average below.)
  • week_no: there are a few ways to calculate these, could you give some examples of what week_no should correspond to what dates? (I assume ISOweek below.)
  • visualize: histograms are great for showing the distribution of one variable. Do you want to see the distribution across those summaries, or within them? (I assume the former.)
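To make the week_no point a bit more concrete, here is a small sketch of how a few common conventions disagree for the same date. The date is arbitrary and chosen by me for illustration; it is not from the original question.

```r
library(lubridate)

d <- ymd("2016-01-01")  # a Friday, picked only for illustration

# ISO 8601: weeks start on Monday, and this date still belongs to 2015
iso_week <- isoweek(d)   # 53
iso_year <- isoyear(d)   # 2015

# lubridate::week(): complete 7-day periods since January 1, plus one
simple_week <- week(d)   # 1

# strftime-style %W: Monday-based, days before the first Monday are week "00"
w_label <- format(d, "%Y-%W")  # "2016-00"
```

So "week 1 of 2016" can mean different dates depending on which of these you pick.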

Here's an example of how that could work.

library(tidyverse)
library(lubridate)

example_data <-
  tibble(date = seq.Date(from = ymd(20110101),
                          to   = ymd(20151231),
                          by   = "day"),
         value = rnorm(length(date), 1000, sd = 500),
         year = year(date),
         week = isoweek(date))

example_data_summary <-
  example_data %>%
  group_by(year, week) %>%
  summarize(avg = mean(value)) %>%
  ungroup()

ggplot(example_data_summary, aes(x = avg)) +
  geom_histogram()
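The original question also asked about month_no and year. A minimal sketch of the same idea by month, again assuming "aggregate" means average; the names monthly_example and monthly_summary are made up for this illustration, and the data is random.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(1)  # only so the random example is reproducible

# Two years of made-up daily values
monthly_example <-
  tibble(date  = seq.Date(from = ymd("2011-01-01"),
                          to   = ymd("2012-12-31"),
                          by   = "day"),
         value = rnorm(length(date), 1000, sd = 500))

# One row per (year, month) with the average value
monthly_summary <-
  monthly_example %>%
  group_by(year = year(date), month = month(date)) %>%
  summarize(avg = mean(value), .groups = "drop")

# Distribution of the monthly averages
ggplot(monthly_summary, aes(x = avg)) +
  geom_histogram(bins = 10)
```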



#5

Hey jonspring, thanks for your response! Sorry for the late reply.

But I want to visualize it with labels like 2017-46, 2017-51, 2018-12, i.e. exactly in the format %Y-%W. And by aggregate I mean count/frequency.

Also, a bar chart would be better than a histogram in this case.


#6

Here are some examples along the lines you described.

(BTW, I heartily recommend the R4DS book, available free online, especially the chapters on data visualization and data transformation. I think it would give you a good idea about different ways to explore the kind of question you posed.)

library(tidyverse)
library(lubridate)

example_data <-
  tibble(date = seq.Date(from = ymd(20160101),
                         to   = ymd(20171231),
                         by   = "day"),
         daily_instances = rnorm(length(date), 10, sd = 2) %>% as.integer()) %>%
  uncount(weights = daily_instances) %>%
  mutate(value = rnorm(length(date), 100, 5))

example_data_summary <-
  example_data %>%
  group_by(year_week = floor_date(date, "1 week")) %>%
  summarize(count = n(),
            sum = sum(value))

# This shows the sum of each week's values like your first post.
ggplot(example_data_summary, aes(x = year_week, y = sum)) +
  geom_col()

# This shows the count of how many instances there are per week, like your 2nd post.
# (But it doesn't use the "y" variable you mentioned in your first post.)
ggplot(example_data_summary, aes(x = year_week, y = count)) +
  geom_col()



#7

Hey jonspring, I have a data frame with two columns, URL and Date.

library(tidyverse)
library(lubridate)

example_data_summary <- df %>%
  group_by(year_week = floor_date(Date, "1 week")) %>%
  dplyr::summarize(URL, count = n())

I ran the above code but I am getting an error:

Error in summarise_impl(.data, dots) :
Column URL must be length 1 (a summary value), not 2


#8

Here's what summarize() is expecting to get:

Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value like min(x), n(), or sum(is.na(y)).

summarize() feeds each group of rows (identified by group_by) into whichever aggregation functions you pass to it. Then it collapses your dataframe so that there is one row for each group, with only your grouping columns plus columns for the results of the aggregation functions.

The n() function automatically counts observations in each group, so you don't need to directly process URL in order to get a count of URLs per group. Funnily enough, it doesn't matter what columns you have other than Date; the code is the same:

example_data_summary <- df %>%
  group_by(year_week = floor_date(Date, "1 week")) %>%
  summarize(count = n())
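Since I don't have your data, here is a small self-contained sketch of the same pattern. The toy_df data frame and its URLs are invented for illustration, and I pass week_start = 7 explicitly so the result doesn't depend on your lubridate options.

```r
library(dplyr)
library(lubridate)

# Invented stand-in for your data: one row per URL visit
toy_df <- tibble(
  URL  = c("a.example/1", "a.example/2", "a.example/1", "a.example/3"),
  Date = ymd(c("2018-03-12", "2018-03-14", "2018-03-20", "2018-03-21"))
)

weekly <- toy_df %>%
  group_by(year_week = floor_date(Date, "1 week", week_start = 7)) %>%
  summarize(count         = n(),              # rows per week
            distinct_urls = n_distinct(URL),  # a single value per group
            .groups = "drop")

weekly
```

Each expression inside summarize() returns one value per group, which is why n() and n_distinct(URL) work where a bare URL does not.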

One note: originally, you wanted your dates formatted like YYYY-WW where WW is the week of the year. This code rounds each date down to the start of the week it falls in, but on its own it doesn't format the dates. This is a very good thing for plotting, because it means you have real date values from which to construct an axis (if you formatted the dates in your dataframe, they'd be character values and ggplot() would treat them as categories, which is not what you want!). If necessary, you can choose whatever formatting you need later as part of the plotting code (but ggplot() tends to make pretty good automatic guesses).
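As a rough sketch of what "formatting later" could look like: keep year_week as a real date and only ask the axis for %Y-%W labels. The toy data is random and made up by me. I also use week_start = 1 here, which is my own choice, because %W counts weeks from Monday and the labels would otherwise be off by one relative to Sunday-based bins.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(42)  # only so the random example is reproducible

# Invented data: 300 random dates over a few months
toy <- tibble(
  Date = sample(seq.Date(ymd("2017-11-01"), ymd("2018-03-31"), by = "day"),
                size = 300, replace = TRUE)
)

weekly_counts <- toy %>%
  group_by(year_week = floor_date(Date, "1 week", week_start = 1)) %>%
  summarize(count = n(), .groups = "drop")

# year_week is still a Date; only the axis labels are formatted
p <- ggplot(weekly_counts, aes(x = year_week, y = count)) +
  geom_col() +
  scale_x_date(date_labels = "%Y-%W")
p
```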

There are different standards in different parts of the world for how to count weeks. The floor_date() default is to start the week on Sunday. If you need weeks to start on a different day, you'll have to set that using the week_start parameter. For instance, to have weeks start on Monday:

floor_date(Date, "1 week", week_start = 1)
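A quick way to see the difference is to floor one date both ways. The date below is just one I picked (a Wednesday), and I set week_start explicitly in both calls.

```r
library(lubridate)

d <- ymd("2018-03-14")  # a Wednesday, chosen only for illustration

sunday_start <- floor_date(d, "1 week", week_start = 7)  # 2018-03-11
monday_start <- floor_date(d, "1 week", week_start = 1)  # 2018-03-12
```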

#9

Thanks a lot, jonspring. I am really thankful to you for the book and for the solution as well.

I was not aware that the solution mark can only be assigned to one answer.

Anyway, thanks a lot again :slight_smile:


#10

Thanks a lot to you too, jcblum. I have seen your message.

To be honest, I wanted to thank both of you, and I thought I had marked both your answer and @jonspring's as the solution. But it can only be assigned to one.

Thank you again for the way you explained such a small function. :slight_smile:


#11

I am really sorry for the way I asked this question. I understand why you mentioned this.
Though I have used R for a long time, I am new to this community. Next time I'll take care of this :slight_smile:


#12

Hey leon!! Next time I'll take care of this. I am new to this community and was unaware of this. :slight_smile: