Summarizing by year not working

Hello

I'm still new to RStudio and would love some help figuring out what I'm doing wrong. I have a dataset with a year variable, "aar", and many other variables recorded for each year.

I'd like to sum these variables for each year and plot the result.

Specifically, I'd like to start by counting how many occurrences of "KTT" there are in each year.
So I tried the following code, where aar is the year:

r_e %>%
  group_by(aar) %>%
  summarize(Test = sum(KTT, na.rm = TRUE))

It yields a single row instead of one row per year:
Test
1 168652

I hope this is the correct way to post the code:

dput(head(r_e, n = 20))

structure(list(aar = c("2019", "2019", "2019", "2019", "2019",
"2019", "2019", "2019", "2019", "2019", "2019", "2019", "2019",
"2019", "2019", "2019", "2019", "2019", "2019", "2019"), KOB = c(3404515,
3402234, 3402167, 3395227, 3395226, 3391197, 3391195, 3389888,
3389887, 3388696, 3388695, 3388694, 3388031, 3388030, 3388029,
3388028, 3388027, 3388026, 3387367, 3387365), `CVR-nummer` = c(41015861,
41001887, 40997180, 40967524, 40967508, 40945482, 40945490, 40939105,
40939091, 40930949, 40930930, 40930922, 40927522, 40927514, 40927506,
40927492, 40927484, 40927476, 40925422, 40925414), Egenkapital = c(40,
88, 332, 1786, 1786, 35935, 15729, 20052, 23809, 44993, 5624,
5624, 33103, 33103, 189184, 15006, 10003, 10003, 860, 715), Selskabskapital = c(40,
90, 40, 40, 40, 50, 50, 100, 100, 400, 50, 50, 40, 40, 40, 40,
40, 40, 40, 40), Aarets_res = c(0, -2, 292, 0, 0, 453, -7, 2047,
432, 10867, 1358, 1358, 603, 603, 2731, 6, 3, 3, 386, 37), Virksomhedsform = c("APS",
"APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS",
"APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS",
"APS"), Valuta = c("DKK", "DKK", "DKK", "DKK", "DKK", "DKK",
"DKK", "DKK", "DKK", "DKK", "DKK", "DKK", "DKK", "DKK", "DKK",
"DKK", "DKK", "DKK", "DKK", "DKK"), Region = c(84, 83, 82, 85,
85, 85, 85, 85, 85, 82, 84, 82, 81, 81, 81, 81, 81, 81, 82, 82
), Oevr_reserver = c(NA, NA, NA, NA, NA, 7356, NA, NA, NA, 10867,
1358, 1358, 596, 596, 2736, NA, NA, NA, NA, NA), Udskudt_skat = c(NA,
NA, 712, NA, NA, 1885, NA, 805, 479, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 254), Lang_gaeld_t_ejer_mv = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), Prioritetesgaeld = c(NA, NA, 4447, NA, NA, NA, NA, 352, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Lang_gaeld_t_realk = c(NA,
NA, 4101, NA, NA, NA, NA, 211, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Lang_bankgaeld = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), Lang_gaeld_t_koncern = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Oevr_lang_gaeld = c(NA,
NA, NA, NA, NA, NA, NA, 361, 114, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Langfristet_gaeld = c(NA, NA, 4101, NA, NA,
NA, NA, 572, 114, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
Kort_gaeld_t_realkr = c(NA, NA, 346, NA, NA, NA, NA, 141,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Kort_bankgaeld = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), Kort_gaeld_t_kon = c(NA, 1050, 1521, NA, NA, 48,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
Kort_gaeld_t_ejer_mv = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Anden_gaeld = c(NA,
NA, 192, NA, NA, 456, 13, 1488, 1928, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 2304, 6287), Oevr_kort_gaeld = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), Kortfristet_gaeld = c(NA, 1050, 2059, NA, NA,
660, 13, 2345, 2023, NA, NA, NA, 7, 7, 5, 7, 6, 6, 2450,
6300), Type = c("APS", "APS", "APS", "APS", "APS", "APS",
"APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS", "APS",
"APS", "APS", "APS", "APS", "APS"), Valuta.y = c("Danish Krone",
"Danish Krone", "Danish Krone", "Danish Krone", "Danish Krone",
"Danish Krone", "Danish Krone", "Danish Krone", "Danish Krone",
"Danish Krone", "Danish Krone", "Danish Krone", "Danish Krone",
"Danish Krone", "Danish Krone", "Danish Krone", "Danish Krone",
"Danish Krone", "Danish Krone", "Danish Krone"), value = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
Aarets_res_DKK = c(0, -2, 292, 0, 0, 453, -7, 2047, 432,
10867, 1358, 1358, 603, 603, 2731, 6, 3, 3, 386, 37), Egenkapital_DKK = c(40,
88, 332, 1786, 1786, 35935, 15729, 20052, 23809, 44993, 5624,
5624, 33103, 33103, 189184, 15006, 10003, 10003, 860, 715
), Selskabskapital_DKK = c(40, 90, 40, 40, 40, 50, 50, 100,
100, 400, 50, 50, 40, 40, 40, 40, 40, 40, 40, 40), AE = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
KT = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), num = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L), antal = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), KTT = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))

I hope you can help me! Thanks a lot

You're definitely on the right track with your code. Looking at the sample of your data, all the values of KTT are 0, so I made a small example that I think will demonstrate what you want to do.

library(tidyverse)

df <- tibble(
  aar = rep(2017:2019, each = 5),
  ktt = rpois(15, 5)
)

df %>% 
  group_by(aar) %>% 
  summarize(ktt = sum(ktt))
#> # A tibble: 3 x 2
#>     aar   ktt
#>   <int> <int>
#> 1  2017    25
#> 2  2018    28
#> 3  2019    26

Created on 2020-06-04 by the reprex package (v0.3.0)
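Since the question also mentions plotting the yearly totals, here is a minimal sketch of that step. The toy data, the column name n_ktt, and the choice of a bar chart are assumptions made for illustration; they are not taken from the real r_e data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: KTT is treated as a 0/1 indicator with some NAs
df <- tibble(
  aar = rep(c("2017", "2018", "2019"), each = 4),
  KTT = c(0L, 1L, 1L, NA, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L)
)

# One row per year; na.rm = TRUE skips missing values when summing
per_year <- df %>%
  group_by(aar) %>%
  dplyr::summarise(n_ktt = sum(KTT, na.rm = TRUE))

# Bar chart of the yearly totals
p <- ggplot(per_year, aes(x = aar, y = n_ktt)) +
  geom_col() +
  labs(x = "Year (aar)", y = "Sum of KTT")

print(p)
```

Because KTT only takes the values 0 and 1 here, the sum per year is the same as the number of rows where KTT equals 1.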

Thanks for the reply!

There are over a million observations. In most of them KTT is equal to 0, yet a significant number are equal to 1. I can see where the confusion came from, though, haha.

I feel like the code you posted is identical to mine:
r_e %>%
  group_by(aar) %>%
  summarize(KTT = sum(KTT, na.rm = TRUE))

Yielding:

KTT
1 168652

Running an exact replica of your code (without na.rm = TRUE):
r_e %>%
  group_by(aar) %>%
  summarize(KTT = sum(KTT))

Yields
KTT
1 NA

I think I found the solution!

My tidyverse installation had been corrupted, so reinstalling it and loading only that package helped!

Thanks for the answer!
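For anyone who lands here with the same symptom (a single summary row even though group_by() was used): one common cause, which this thread does not confirm, is that another attached package such as plyr masks dplyr's summarise()/summarize(), so the grouping is ignored. That would also be consistent with "only using that package" fixing it. The sketch below, on made-up toy data, shows one way to check which packages provide summarise and how to call the dplyr version explicitly.

```r
library(dplyr)

# Hypothetical toy data, not the real r_e
toy <- tibble(
  aar = rep(c("2018", "2019"), each = 3),
  KTT = c(0L, 1L, NA, 1L, 1L, 0L)
)

# Lists every attached package that defines summarise, in search-path order;
# an unqualified call uses the first one listed
find("summarise")

# Namespace-qualifying the call makes sure dplyr's version is used,
# regardless of what else is attached
res <- toy %>%
  group_by(aar) %>%
  dplyr::summarise(Test = sum(KTT, na.rm = TRUE))

res
```

If find("summarise") lists another package ahead of package:dplyr, either qualify the call as above or avoid attaching that package after dplyr.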
