Count scholarity by year

ecacarva · October 2, 2020, 1:33pm

I have the following data set:

|3475 | PROF1 | 1988 | Undergraduate
|3478 | PROF2 | 1996 | Undergraduate
|3476 | PROF1 | 1998 | Master
|3479 | PROF2 | 2000 | Master
|3477 | PROF1 | 2002 | Doctoral
|3480 | PROF3 | 2004 | Doctoral

I am trying to do a script that accum the type of graduate by year, but, when change the scholing type decrease from another. I need to show something like this:

Undergraduate	Master	Doctoral

1988 1 0 0
1989 1 0 0
1990 1 0 0
1991 1 0 0
1992 1 0 0
1993 1 0 0
1994 1 0 0
1995 1 0 0
1996 2 0 0
1997 2 0 0
1998 1 1 0
1999 1 1 0
2000 0 2 0
2001 0 2 0
2002 0 1 1
2003 0 1 1
2004 0 0 2

Explaining: Until 1997 we had 2 undergraduate professors, so in 1998 one of them concluded a master degree, then we need to decrease the number of undergraduates in 1 because now we need to account him as a master and not an undergraduate, and the same for doctoral.

Somebody would help?

andresrcs · October 3, 2020, 1:05am

You can do something like this

library(tidyr)
library(dplyr)

# Sample data on a copy/paste friendly format
sample_df <- data.frame(
  stringsAsFactors = FALSE,
                id = c(3475, 3478, 3476, 3479, 3477, 3480),
              prof = c("PROF1", "PROF2", "PROF1", "PROF2", "PROF1", "PROF3"),
              year = c(1988, 1996, 1998, 2000, 2002, 2004),
            degree = c("Undergraduate",
                       "Undergraduate","Master","Master","Doctoral","Doctoral")
)

sample_df %>%
    count(year, degree) %>% 
    pivot_wider(names_from = degree, values_from = n, values_fill = 0)
#> # A tibble: 6 x 4
#>    year Undergraduate Master Doctoral
#>   <dbl>         <int>  <int>    <int>
#> 1  1988             1      0        0
#> 2  1996             1      0        0
#> 3  1998             0      1        0
#> 4  2000             0      1        0
#> 5  2002             0      0        1
#> 6  2004             0      0        1

^{Created on 2020-10-03 by the reprex package (v0.3.0)}

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

nirgrahamuk · October 3, 2020, 12:02pm

my solution:

exdf <- structure(list(
  code = c(3475, 3478, 3476, 3479, 3477, 3480),
  p = c(
    "PROF1", "PROF2", "PROF1", "PROF2", "PROF1",
    "PROF3"
  ), y = c(1988, 1996, 1998, 2000, 2002, 2004), deg = c(
    "Undergraduate",
    "Undergraduate", "Master", "Master", "Doctoral", "Doctoral"
  )
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

library(tidyverse)


maxy <- max(exdf$y)

code_a_prof <- function(df,maxy){
  
  df2 <- mutate(df,
    next_deg = lead(deg),
    next_y = lead(y) - 1
  ) %>%
    mutate(next_y = ifelse(is.na(next_y), maxy, next_y)) %>%
    rowwise() %>%
    mutate(
      deg_list = list(rep(deg, 1 + next_y - y)),
      y_list = list(seq(from = y, to = next_y, by = 1))
    )
  
  pname <- pull(df, p) %>% unique()

  tibble(
    y = unlist(df2$y_list),
    deg = unlist(df2$deg_list)
  ) %>% mutate(p = pname)
}

# code_a_prof(exdf%>%filter(p=="PROF1"),maxy)

group_by(exdf,p) %>% 
  group_map(~code_a_prof(.,maxy),.keep=TRUE) %>%
  bind_rows %>% 
  group_by(y,deg) %>% 
  summarise(n=n()) %>%
  pivot_wider(names_from="deg",
              values_from="n")

system · October 11, 2020, 2:47pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.