Getting ratios of categories over time

Hello,
I'm trying to observe how the ratio between categories changes over two decades in some data. Essentially, for each year I want to be able to say what share of that year's total each category holds, and display that over the two decades. Does anyone have any advice on how to start?
Cheers,
J. Maxwell

It is hard to give specific advice without knowing more about the data you are starting with. For example, do you already have yearly values for the categories or does the yearly value have to be calculated? Please give more information about your data. It would help a lot if you could show a bit of the data. If it is in a data frame called DF, the result of the following command would be very helpful.

```
dput(head(DF))
```

Paste the result of that between two lines that consist of only three back ticks.

```
Your output here.
```

Each category doesn't have a specific yearly value. The categories are based on the title posts, and a date is recorded for each title post. I ran dput(head(DF)), but because of the length of the titles the output is pretty unwieldy. What specifically are you looking for with the dput? Here is an example of some of the data. Does this help clear things up? I apologize for being unclear initially.

| Category | month | replies | title | views | year |
|---|---|---|---|---|---|
| Education | 10 | 463 | NEW WHITE NATIONAL SCHOOL (K-12) FORMING.... | 210859 | 2005 |
| Education | 5 | 72 | Homeschool lessons | 69853 | 2004 |
| Children | 1 | 52 | Book suggestions for children and young adults | 34967 | 2012 |
| Misc | 8 | 304 | Firefox - A Better Browser For Whites That's Spreading Like Wildfire | 166373 | 2005 |
| Education | 8 | 46 | College textbooks: buy/sell | 38115 | 2007 |
| Misc | 12 | 42 | Free Online Course on How to Start a Business | 43691 | 2004 |
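
If long titles make the dput() output hard to read, one option is to keep only the columns that matter for the question before calling dput(). The sketch below uses a hypothetical three-row stand-in named SF whose column names follow the sample rows above; the column selection is an assumption about what is needed for a year-by-category ratio.

```r
# Hypothetical stand-in for the real data frame; only the column names
# and a few values are taken from the sample rows shown in the post.
SF <- data.frame(
  Category = c("Education", "Children", "Education"),
  month    = c(5L, 1L, 8L),
  replies  = c(72L, 52L, 46L),
  title    = c("Homeschool lessons",
               "Book suggestions for children and young adults",
               "College textbooks: buy/sell"),
  views    = c(69853L, 34967L, 38115L),
  year     = c(2004L, 2012L, 2007L)
)

# Drop the long title column, then print a compact, pasteable description
small <- head(SF[, c("Category", "views", "year")])
dput(small)
```

The output of dput(small) can be pasted into a reply, and anyone can rebuild the same small data frame from it.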

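I invented a small data set with only two categories, two years, and no extra columns, but I think that is sufficient to show the method: total the views for each year, total them for each category within each year, join the two results, and divide.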

```
library(dplyr)
DF <- data.frame(category = rep(LETTERS[1:2], each = 6),
                 views = c(143, 198, 87, 252, 632, 56, 484, 399, 144, 256, 532, 333),
                 year = rep(2010:2011, 6))
DF
#>    category views year
#> 1         A   143 2010
#> 2         A   198 2011
#> 3         A    87 2010
#> 4         A   252 2011
#> 5         A   632 2010
#> 6         A    56 2011
#> 7         B   484 2010
#> 8         B   399 2011
#> 9         B   144 2010
#> 10        B   256 2011
#> 11        B   532 2010
#> 12        B   333 2011

AnnualTotal <- DF %>% group_by(year) %>% summarize(Total = sum(views))
#> `summarise()` ungrouping output (override with `.groups` argument)

Cat_Year <- DF %>% group_by(category, year) %>% 
  summarise(GroupTotal = sum(views))
#> `summarise()` regrouping output by 'category' (override with `.groups` argument)

Cat_Year <- inner_join(Cat_Year, AnnualTotal, by = "year")
Cat_Year
#> # A tibble: 4 x 4
#> # Groups:   category [2]
#>   category  year GroupTotal Total
#>   <chr>    <int>      <dbl> <dbl>
#> 1 A         2010        862  2022
#> 2 A         2011        506  1494
#> 3 B         2010       1160  2022
#> 4 B         2011        988  1494

Cat_Year <- Cat_Year %>% mutate(Ratio = GroupTotal/Total)
Cat_Year
#> # A tibble: 4 x 5
#> # Groups:   category [2]
#>   category  year GroupTotal Total Ratio
#>   <chr>    <int>      <dbl> <dbl> <dbl>
#> 1 A         2010        862  2022 0.426
#> 2 A         2011        506  1494 0.339
#> 3 B         2010       1160  2022 0.574
#> 4 B         2011        988  1494 0.661
```

Created on 2020-09-09 by the reprex package (v0.3.0)
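
The same ratios can also be computed in a single pipeline, without building AnnualTotal and joining it back. This is a sketch of an alternative, not the method used above; it reuses the invented data and assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Same invented data as in the example above
DF <- data.frame(category = rep(LETTERS[1:2], each = 6),
                 views = c(143, 198, 87, 252, 632, 56, 484, 399, 144, 256, 532, 333),
                 year = rep(2010:2011, 6))

Cat_Year <- DF %>%
  group_by(year, category) %>%
  summarise(GroupTotal = sum(views), .groups = "drop") %>%  # one row per year/category
  group_by(year) %>%
  mutate(Total = sum(GroupTotal),       # yearly total, repeated on each row of that year
         Ratio = GroupTotal / Total) %>%
  ungroup()

Cat_Year
```

Because the yearly total is computed inside a grouped mutate(), the ratios within each year sum to 1 by construction.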

```
library(dplyr)
DF <- data.frame(category = rep(LETTERS[1:2], each = 6),
                 views = c(143, 198, 87, 252, 632, 56, 484, 399, 144, 256, 532, 333),
                 year = rep(2010:2011, 6))
```

For this section, is there an easier way to handle the views in your script? I have 3,169 different titles, and therefore a lot of different view counts, and typing out each one seems a bit much.

That part of the code is simply me inventing some data. You should use your own data set, which you partially displayed in a previous post. Whatever it is named, substitute that name for DF in the line below, and likewise in the later line that starts with Cat_Year <- DF.

```
AnnualTotal <- DF %>% group_by(year) %>% summarize(Total = sum(views))
```

I'm still struggling a bit to get your script to work with my data. I probably should have mentioned up front that I am relatively new to R. Here's where I think I am getting stuck.
- You're creating an object (AnnualTotal) that should have the titles organized and then summarized. When I try to run that on my data, however, I get this error:

```
Error in summarize(., Total = sum(views)) :
  argument "by" is missing, with no default
```

I do not know what the problem is. Please post the output of

```
dput(head(DF))
```

except replace DF with the name of your data frame. When you paste the output into your reply, put a line containing only three back ticks just before and after. Like this
```
Paste your output here
```
The back tick key is just to the left of the number 1 on US keyboards.
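
One cause worth ruling out for the "by" error, offered as a guess rather than a diagnosis confirmed in this thread: a package attached after dplyr that also exports a summarize() function can mask the dplyr version (Hmisc is a common example, and its summarize() takes a `by` argument). Calling the dplyr functions with an explicit `dplyr::` prefix sidesteps any masking. The small data frame below is invented for illustration.

```r
library(dplyr)

# Invented data, just enough to exercise the call
DF <- data.frame(year = c(2004L, 2004L, 2005L),
                 views = c(10L, 20L, 5L))

# The dplyr:: prefix guarantees the dplyr versions are used,
# whatever other packages happen to be attached
AnnualTotal <- DF %>%
  dplyr::group_by(year) %>%
  dplyr::summarise(Total = sum(views))

AnnualTotal

# Reports which package supplies the summarize() found first on the search path
environmentName(environment(summarize))
```

If the last line reports something other than dplyr, masking is the likely culprit, and either the prefix or loading dplyr last should avoid it.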

```
> dput(head(SF))
structure(list(Category = c("Education", "Education", "Children", 
"Misc", "Education", "Misc"), month = c(10L, 5L, 1L, 8L, 8L, 
12L), replies = c(463L, 72L, 52L, 304L, 46L, 42L), title = c("NEW WHITE NATIONAL SCHOOL (K-12) FORMING....", 
"Homeschool lessons", "Book suggestions for children and young adults", 
"Firefox - A Better Browser For Whites That's Spreading Like Wildfire", 
"College textbooks: buy/sell", "Free Online Course on How to Start a Business"
), views = c(210859L, 69853L, 34967L, 166373L, 38115L, 43691L
), year = c(2005L, 2004L, 2012L, 2005L, 2007L, 2004L), moyr = c(50, 
33, 125, 48, 72, 40)), row.names = c(NA, 6L), class = "data.frame")
```

The following code works for me using the data you posted. The only changes I made to my original code were to write Category with an upper case C to match your data and to add the arrange() function at the end to sort the final data frame so that data from each year are displayed together.

```
library(dplyr)

DF <- structure(list(Category = c("Education", "Education", "Children", 
                                  "Misc", "Education", "Misc"), 
                     month = c(10L, 5L, 1L, 8L, 8L, 12L), 
                     replies = c(463L, 72L, 52L, 304L, 46L, 42L), 
                     title = c("NEW WHITE NATIONAL SCHOOL (K-12) FORMING....",
                               "Homeschool lessons", "Book suggestions for children and young adults",
                               "Firefox - A Better Browser For Whites That's Spreading Like Wildfire",
                               "College textbooks: buy/sell", 
                               "Free Online Course on How to Start a Business"), 
                     views = c(210859L, 69853L, 34967L, 166373L, 38115L, 43691L), 
                     year = c(2005L, 2004L, 2012L, 2005L, 2007L, 2004L), 
                     moyr = c(50, 33, 125, 48, 72, 40)), row.names = c(NA, 6L), class = "data.frame")

AnnualTotal <- DF %>% group_by(year) %>% summarize(Total = sum(views))
#> `summarise()` ungrouping output (override with `.groups` argument)

Cat_Year <- DF %>% group_by(Category, year) %>% 
  summarise(GroupTotal = sum(views))
#> `summarise()` regrouping output by 'Category' (override with `.groups` argument)

Cat_Year <- inner_join(Cat_Year, AnnualTotal, by = "year")
Cat_Year
#> # A tibble: 6 x 4
#> # Groups:   Category [3]
#>   Category   year GroupTotal  Total
#>   <chr>     <int>      <int>  <int>
#> 1 Children   2012      34967  34967
#> 2 Education  2004      69853 113544
#> 3 Education  2005     210859 377232
#> 4 Education  2007      38115  38115
#> 5 Misc       2004      43691 113544
#> 6 Misc       2005     166373 377232

Cat_Year <- Cat_Year %>% mutate(Ratio = GroupTotal/Total) %>% 
  arrange(year, Category)
Cat_Year
#> # A tibble: 6 x 5
#> # Groups:   Category [3]
#>   Category   year GroupTotal  Total Ratio
#>   <chr>     <int>      <int>  <int> <dbl>
#> 1 Education  2004      69853 113544 0.615
#> 2 Misc       2004      43691 113544 0.385
#> 3 Education  2005     210859 377232 0.559
#> 4 Misc       2005     166373 377232 0.441
#> 5 Education  2007      38115  38115 1    
#> 6 Children   2012      34967  34967 1
```

Created on 2020-09-11 by the reprex package (v0.3.0)
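
For the "display it over the two decades" part of the original question, which the code above does not cover, here is one possible sketch using only base R. It builds a category-by-year table of view totals with xtabs(), converts each year's column to shares with prop.table(), and draws a stacked bar chart. The data frame repeats the three relevant columns from the posted sample; the axis labels are my own choice.

```r
# The three relevant columns from the six sample rows posted above
SF <- data.frame(
  Category = c("Education", "Education", "Children", "Misc", "Education", "Misc"),
  views    = c(210859L, 69853L, 34967L, 166373L, 38115L, 43691L),
  year     = c(2005L, 2004L, 2012L, 2005L, 2007L, 2004L)
)

# Rows are categories, columns are years, cells are summed views
totals <- xtabs(views ~ Category + year, data = SF)

# margin = 2 divides each column by its own sum, so every year adds up to 1
shares <- prop.table(totals, margin = 2)
round(shares, 3)

# One stacked bar per year, one segment per category
barplot(shares, legend.text = TRUE,
        xlab = "Year", ylab = "Share of views")
```

If the share of posts rather than the share of views is wanted, xtabs(~ Category + year, data = SF) counts rows instead of summing views, and the remaining steps stay the same.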

That worked. Thank you so much for your help.
