Error when using mutate to label sets of rows in dataset

Context: I am attempting to label subsets of my data frame using the code below so that I can then:

coll_count <- coll_count %>% mutate(I=year==1974:1979,
                                      II=year==1980:1984,
                                      III=year==1985:1989,
                                      IV=year==1990:1994,
                                      V=year==1995:1999,
                                      VI=year==2000:2004,
                                      VII=year==2005:2009,
                                      VIII=year==2010:2014,
                                      IX=year==2014:2017)
  1. Use pivot_longer() to place the "I:IX" columns into one column named "epoch".

  2. Run the following code:

df_word <- df_word %>% # sum repetitions by epoch (denominator)  
  group_by(epoch) %>% 
  mutate(sum_repet_epoch = sum(repet)) %>% 
  ungroup()

df_word_year <- df_word %>% # compute standardization (for AV,A,V)
  group_by(epoch) %>%
  mutate(sev_word = (sumAVprod.word/sum_repet_epoch),
         aro_word = (sumAprod.word/sum_repet_epoch),  
         val_word = (sumVprod.word/sum_repet_epoch)) %>%
  distinct() %>%
  select(year, lemma, sev_word) %>%
  ungroup()
  3. Partition the data frame into the epochs:
sev1 <- df_word_year %>% filter(year==1974:1979) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev2 <- df_word_year %>% filter(year==1980:1984) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev3 <- df_word_year %>% filter(year==1985:1989) %>% arrange(sev_word) %>% slice_max(sev_word, n=100) 
sev4 <- df_word_year %>% filter(year==1990:1994) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev5 <- df_word_year %>% filter(year==1995:1999) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev6 <- df_word_year %>% filter(year==2000:2004) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev7 <- df_word_year %>% filter(year==2005:2009) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev8 <- df_word_year %>% filter(year==2010:2014) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
sev9 <- df_word_year %>% filter(year==2014:2017) %>% arrange(sev_word) %>% slice_max(sev_word, n=100)
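Once an `epoch` column exists, the per-epoch denominator and the nine separate filter()/slice_max() calls can probably be collapsed into one grouped pipeline. A minimal sketch, assuming made-up toy data whose column names mirror the question (`epoch`, `year`, `lemma`, `repet`, `sumAVprod.word`); `n = 2` stands in for the `n = 100` used above. slice_max() already picks the largest values, so a separate arrange() is not needed.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the question, values are made up
df_word <- tibble(
  epoch          = c("I", "I", "I", "II", "II", "II"),
  year           = c(1974, 1975, 1976, 1980, 1981, 1982),
  lemma          = c("a", "b", "c", "a", "b", "c"),
  repet          = c(2, 3, 5, 1, 1, 2),
  sumAVprod.word = c(4, 9, 5, 3, 1, 8)
)

top_by_epoch <- df_word %>%
  group_by(epoch) %>%
  mutate(sum_repet_epoch = sum(repet),                    # denominator within each epoch
         sev_word = sumAVprod.word / sum_repet_epoch) %>% # standardized score
  slice_max(sev_word, n = 2) %>%                          # top n rows per epoch
  ungroup()

# Only if separate data frames per epoch are really needed afterwards:
sev_list <- split(top_by_epoch, top_by_epoch$epoch)
sev_list$I
```

Keeping everything in one data frame with an `epoch` column is usually easier to work with than nine objects `sev1` to `sev9`; split() is there only for the case where separate pieces are genuinely required.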

Problem: It almost seems like it would be quicker to section the data manually in Excel and then import it, but I am trying to learn to handle a bigger data frame efficiently, as there are 40,597 rows.

I am essentially trying to add another column to my data frame that partitions the rows by the "year" column (broken into the 9 groups specified above). Because I think I want to group_by using this "epoch" column afterwards, I am not immediately partitioning the initial data frame using slice etc.

Would anyone have an idea as to how to better automate this? Currently, I am getting the following error after the first code chunk: "longer object length is not a multiple of shorter object length"
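That message is normally a warning rather than an error, and the more serious problem is that `year == 1974:1979` does not test whether a year lies in the range at all: `==` compares element by element, recycling the shorter vector. If the row count happened to be a multiple of 6 there would be no warning, but the flags would still be wrong. A small illustration with made-up years:

```r
# Toy stand-in for the `year` column (made-up values, several rows per year)
year <- c(1974, 1974, 1975, 1975, 1981, 1979, 1990)

# `==` compares position by position and recycles 1974:1979 (length 6)
# to length 7: row 2 is compared with 1975, row 3 with 1976, and so on.
# R warns because 7 is not a multiple of 6; suppressWarnings() hides it here.
by_position <- suppressWarnings(year == 1974:1979)
print(by_position)   # TRUE FALSE FALSE FALSE FALSE TRUE FALSE

# `%in%` asks "is this year anywhere in 1974:1979?", which is the intended test
by_membership <- year %in% 1974:1979
print(by_membership) # TRUE TRUE TRUE TRUE FALSE TRUE FALSE
```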

Hi, I imagine this bit doesn't work. You probably want to use dplyr::case_when() for this. For the rest of your problem, it would be good if you provided a reproducible example.

Thanks. I am looking at how to apply case_when(), but it is a little confusing. Would you be able to give an example of how to do this, or correct the attempt below?

coll_count <- coll_count %>% mutate(I = case_when(year==1974:1979))

At present, I receive the following error:
longer object length is not a multiple of shorter object length
Error in mutate():
! Problem while computing I = case_when(year == 1974:1979).
Caused by error in case_when():
! Case 1 (year == 1974:1979) must be a two-sided formula, not a logical vector.
Backtrace:
  1. coll_count %>% mutate(I = case_when(year == 1974:1979))
  2. dplyr::case_when(year == 1974:1979)
Error in mutate(., I = case_when(year == 1974:1979)) :
  Caused by error in case_when():
! Case 1 (year == 1974:1979) must be a two-sided formula, not a logical vector.
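For reference, case_when() expects each argument written as `condition ~ value`, which is why a bare logical such as `year == 1974:1979` is rejected. If the original plan is preferred (one logical column per epoch, then pivot_longer() into an `epoch` column), case_when() is not actually needed for the flags: replacing `==` with `%in%` is enough. A minimal sketch on made-up data, with only three epochs for brevity:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for coll_count
coll_count <- tibble(year = c(1974, 1978, 1980, 1983, 1985, 1989))

# One logical flag column per epoch; `%in%` tests range membership
flags <- coll_count %>%
  mutate(I   = year %in% 1974:1979,
         II  = year %in% 1980:1984,
         III = year %in% 1985:1989)

# Stack the flag columns into a single `epoch` column and keep only TRUE rows
with_epoch <- flags %>%
  pivot_longer(I:III, names_to = "epoch", values_to = "in_epoch") %>%
  filter(in_epoch) %>%
  select(-in_epoch)

with_epoch
```

One caveat with this route: a year that falls in two ranges (2014 is in both VIII and IX as written in the question) would appear twice after the pivot.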

Hi, can you provide a reproducible example of your dataset and perhaps an example of the output you are after?

This is a very basic example anyway:

library(tidyverse)

tibble(year = seq(1974, 2009, 1)) %>% 
  mutate(x = case_when(year %in% 1974:1979 ~ "I",
                       year %in% 1980:1984 ~ "II",
                       TRUE ~ "III"))

# A tibble: 36 x 2
    year x    
   <dbl> <chr>
 1  1974 I    
 2  1975 I    
 3  1976 I    
 4  1977 I    
 5  1978 I    
 6  1979 I    
 7  1980 II   
 8  1981 II   
 9  1982 II   
10  1983 II   
# ... with 26 more rows
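Extending the example above to all nine epochs from the question is mostly a matter of adding cases. A sketch, where the toy `tibble(year = 1974:2018)` stands in for the real data. As written in the question, 2014 appears in both VIII and IX; case_when() returns the value of the first condition that is TRUE, so 2014 is labelled VIII here. Years outside every range get `NA`.

```r
library(dplyr)

# Toy data standing in for coll_count; 2018 is included to show the NA case
epochs <- tibble(year = 1974:2018) %>%
  mutate(epoch = case_when(
    year %in% 1974:1979 ~ "I",
    year %in% 1980:1984 ~ "II",
    year %in% 1985:1989 ~ "III",
    year %in% 1990:1994 ~ "IV",
    year %in% 1995:1999 ~ "V",
    year %in% 2000:2004 ~ "VI",
    year %in% 2005:2009 ~ "VII",
    year %in% 2010:2014 ~ "VIII",   # first match wins, so 2014 lands here
    year %in% 2014:2017 ~ "IX",
    TRUE ~ NA_character_            # years outside every range, e.g. 2018
  ))

count(epochs, epoch)
```

After this, group_by(epoch) can be used directly, without the pivot_longer() step.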
