Make a category variable based on a range of row numbers

Hello community,

I want to make a category variable, a simple 1...n category, based on specific ranges of row numbers in the dataset. For example, for the first 100 rows I want the category variable to have a value of 1, for the next 100 rows a value of 2, and so on. I've tried it with the following code and it works, but I have to specify every break-point by hand, and I would rather work with percentage-based ranges. My current code hardly seems efficient, especially for larger datasets, where listing all the break-points is impractical.
Does anyone have a more robust or elegant solution?

Code:
```r
install.packages("gapminder")   # only needed once
library(gapminder)
library(dplyr)   # provides %>%, mutate() and row_number()

df <- gapminder

df <- df %>%
  mutate(rownumber = row_number()) %>%
  mutate(category = cut(rownumber,
                        breaks = c(0, 300, 600, 900, 1400, 1704),
                        labels = c("1", "2", "3", "4", "5")))

head(df)
```
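For fixed-size blocks like the 100-row example in the question, one way to avoid listing break-points at all is integer division on the row index. This is a minimal base-R sketch; the synthetic `df` (1704 rows, the size of gapminder) and the `block_size` name are assumptions for illustration only.

```r
# Sketch: 1..n category for consecutive fixed-size blocks of rows.
# Assumption: a synthetic data frame stands in for gapminder (1704 rows).
df <- data.frame(value = seq_len(1704))

block_size <- 100L  # rows per category (hypothetical parameter name)

# Rows 1-100 -> 1, rows 101-200 -> 2, ...; the last block may be shorter.
df$category <- (seq_len(nrow(df)) - 1L) %/% block_size + 1L

# 17 blocks of 100 rows, then a final block of 4 rows (rows 1701-1704)
table(df$category)
```

Inside a dplyr pipeline the same expression could be written as `mutate(category = (row_number() - 1L) %/% block_size + 1L)`. No break-points are needed, and the number of categories follows automatically from the row count.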

Reply:

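```r
library(gapminder)
library(dplyr)   # provides %>%, mutate(), row_number(), group_by() and summarise()

df <- gapminder

maxrow <- nrow(gapminder)              # 1704 rows in gapminder
split_length <- 300                    # rows per category
ccats <- maxrow %/% split_length + 1   # number of categories (6 here)

df <- df %>%
  mutate(rownumber = row_number()) %>%
  mutate(category = cut(rownumber,
                        breaks = c(seq(0, maxrow, split_length), Inf),
                        labels = 1:ccats,
                        right = FALSE))  # left-closed intervals [0, 300), [300, 600), ...

df %>% group_by(category) %>% summarise(mn = min(rownumber), mx = max(rownumber))
```

```
# A tibble: 6 x 3
category    mn    mx
<fct>    <int> <int>
1            1   299
2          300   599
3          600   899
4          900  1199
5         1200  1499
6         1500  1704
```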
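For the percentage-based ranges mentioned in the question, the break-points can be derived from cumulative shares instead of being typed in. A minimal sketch, assuming a hypothetical `shares` vector giving the fraction of rows in each category (summing to 1) and the same synthetic 1704-row `df` standing in for gapminder:

```r
# Sketch: categories from percentage-based ranges of row numbers.
# Assumption: synthetic data frame with as many rows as gapminder (1704).
df <- data.frame(value = seq_len(1704))

# Hypothetical split: fraction of rows per category, must sum to 1.
shares <- c(0.2, 0.2, 0.2, 0.3, 0.1)
stopifnot(isTRUE(all.equal(sum(shares), 1)))

n <- nrow(df)

# Cumulative shares -> row-number break-points: 0, 341, 682, 1022, 1534, 1704
breaks <- c(0, round(cumsum(shares) * n))
breaks[length(breaks)] <- n  # guard against floating-point drift in the last break

# cut() uses right-closed intervals by default: (0, 341], (341, 682], ...
df$category <- cut(seq_len(n), breaks = breaks, labels = seq_along(shares))

# Rows per category: 341, 341, 340, 512, 170
table(df$category)
```

With `shares <- rep(1 / k, k)` this gives k groups of (nearly) equal size; for that special case dplyr's `ntile(row_number(), k)` is another option. If a share is so small that two rounded break-points coincide, `cut()` stops with an error because the breaks are not unique, so very small shares need care.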

Thank you for the code, Nir! This is exactly what I had in mind.
