Hello everyone,
I have a data table with more than 5'000'000 rows and 18 columns.
I would like to count all my rows for each year, if a certain column (occupation) is a specific value.
Fortunately, I was already able to successfully create the table, which obtains the desired occupation for each year and lists also the frequency.
Unfortunately, I struggle with counting the frequency of the occupation 9920 in each year. Therefore, my problem is that I don't know how to connect the data that is stored in my data table with the commands that count the frequency.
A selection of my data:
row YEAR OCC2010
1 1990 4600
2 1990 2810
3 1990 9920
4 1990 9920
5 1991 5600
6 1991 5700
7 1991 5160
...
My file is:
setwd("C:\\Users\\power\\Desktop\\Data Thesis")
library(ipumsr)
library(dplyr, warn.conflicts = FALSE)
cps_ddi <- read_ipums_ddi("cps_00002.xml")
cps_data <- read_ipums_micro(cps_ddi, verbose = FALSE)
View(cps_data)
# Problem
cps_data(year = (sample(c("1990", "1991", "1992", "1993"), 100000, replace=T)),
OCC2010 = rep("9920", 100000))
combns = expand.grid(c("1990", "1991", "1992", "1993"), "9920")
combns = cbind (combns, apply(combns, 1, function(x)sum(transp$YEAR==x[1] & transp$OCC2010==x[2])))
colnames(combns) = c("year", "occupation", "count")
View(combns)
My desired table, which I've created already succesfully
row year occupation count
1 1990 9920 0
2 1991 9920 0
3 1992 9920 0
4 1992 9920 0
Many thanks in advance and I hope somebody could help me
Xx
Freddy