Creating a new column with breaks


#1
mydata <- cbind(
        ID = c(1, 2, 3, 4, 5 ,6, 7, 8, 9, 10), # ID of patient
        Age = c(22, 55, 90, 7, 14, 100, 72, 85, 91, 43),# Age of patient
        Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2)# Gender, 1 = Male, 2 = Female
        )
# I am trying to create a new column which would have age breaks.
# I've provided the matrix above as an example of what I am working on. 
# I'd like to create a column which can take my age data and put it in categories of 
# 0 - 29, 30 - 34, 35-39 up to 90+. 
# Can anyone help me with the code please?
# I've tried using 'cut' function but it will only take a single number 
# rather than what I've put.

#3

First, here's a nice way to set up your data for the reprex:

df <- data.frame(
  ID = c(1, 2, 3, 4, 5 ,6, 7, 8, 9, 10), # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 72, 85, 91, 43),# Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2)# Gender, 1 = Male, 2 = Female
)

I think cut could still work for you (I think you want slightly different breaks...).


library(dplyr)
df %>% 
  mutate(
    age_cut = cut(Age, breaks = c(0,30,35,40,Inf))
  )
#>    ID Age Gender  age_cut
#> 1   1  22      1   (0,30]
#> 2   2  55      2 (40,Inf]
#> 3   3  90      2 (40,Inf]
#> 4   4   7      2   (0,30]
#> 5   5  14      1   (0,30]
#> 6   6 100      1 (40,Inf]
#> 7   7  72      2 (40,Inf]
#> 8   8  85      1 (40,Inf]
#> 9   9  91      1 (40,Inf]
#> 10 10  43      2 (40,Inf]

Created on 2018-05-21 by the reprex package (v0.2.0).


#4

Hi Curtis, thanks for this. I've tried the code - but I am getting breaks of 0 - 30, 30 - 35, 35 - 40, 45 - 50 etc. What I need to achieve is 0 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84, 85 - 89, 90+

This is to match my patient data to some published data, which has the above breaks. Not my choice of banding :slight_smile:


#5

Note the breaks argument in cut. You can adjust it to whatever bins you’d like.

Note that the bin intervals in cut use standard interval notation: a square bracket, [ or ], marks an inclusive endpoint and a parenthesis, ( or ), an exclusive one. So (0,30] excludes 0 and includes 30.
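To make the notation concrete, here is a small sketch using made-up ages chosen to sit on the bin edges (they are not from the data in this thread). By default cut closes each interval on the right. The right = FALSE argument flips this so each bin includes its lower edge, which tends to line up better with whole-number bands such as 30 - 34.

```r
# Illustrative ages sitting on the bin edges (assumed values, not the thread's data)
ages <- c(29, 30, 34, 35)
breaks <- c(0, 30, 35, 40, Inf)

# Default (right = TRUE): intervals are (lower, upper]
right_closed <- cut(ages, breaks = breaks)

# right = FALSE: intervals are [lower, upper)
left_closed <- cut(ages, breaks = breaks, right = FALSE)

data.frame(ages, right_closed, left_closed)
```

With the default, an age of exactly 30 lands in (0,30]; with right = FALSE it lands in [30,35).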


#6

I have tried all morning to create the breaks I listed with the 'cut' function - which was why I posted. Not to worry - I have gone with the old-fashioned method below - a bit long-winded but it's done the trick!


df1$AgeCat[df1$Age >= 0 & df1$Age <= 29] <- "0 - 29"
df1$AgeCat[df1$Age >= 30 & df1$Age <= 34] <- "30 - 34"
df1$AgeCat[df1$Age >= 35 & df1$Age <= 39 ] <- "35 - 39"
df1$AgeCat[df1$Age >= 40 & df1$Age <= 44 ] <- "40 - 44"
df1$AgeCat[df1$Age >= 45 & df1$Age <= 49 ] <- "45 - 49"
df1$AgeCat[df1$Age >= 50 & df1$Age <= 54 ] <- "50 - 54"
df1$AgeCat[df1$Age >= 55 & df1$Age <= 59 ] <- "55 - 59"
df1$AgeCat[df1$Age >= 60 & df1$Age <= 64 ] <- "60 - 64"
df1$AgeCat[df1$Age >= 65 & df1$Age <= 69 ] <- "65 - 69"
df1$AgeCat[df1$Age >= 70 & df1$Age <= 74 ] <- "70 - 74"
df1$AgeCat[df1$Age >= 75 & df1$Age <= 79 ] <- "75 - 79"
df1$AgeCat[df1$Age >= 80 & df1$Age <= 84 ] <- "80 - 84"
df1$AgeCat[df1$Age >= 85 & df1$Age <= 89 ] <- "85 - 89"
df1$AgeCat[df1$Age >= 90] <- "90+"

# Create Age bands table ------------------------------------------------------
df2 <- as.data.frame.matrix(table(df1$AgeCat, df1$Gender))
# Converts exactly as laid out in table

#7

It's useful to keep your code concise and avoid repetition.
Unless I am misunderstanding something (wouldn't be the first time), cut can give you the same result, though with slightly different category text.

For example, for the first few categories:

df <- data.frame(
  ID = c(1, 2, 3, 4, 5 ,6, 7, 8, 9, 10), # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 33, 85, 91, 43),# Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2)# Gender, 1 = Male, 2 = Female
)
library(dplyr)
df %>% 
  mutate(
    age_cut = cut(Age, breaks = c(0,29,34,39,Inf))
  )
#>    ID Age Gender  age_cut
#> 1   1  22      1   (0,29]
#> 2   2  55      2 (39,Inf]
#> 3   3  90      2 (39,Inf]
#> 4   4   7      2   (0,29]
#> 5   5  14      1   (0,29]
#> 6   6 100      1 (39,Inf]
#> 7   7  33      2  (29,34]
#> 8   8  85      1 (39,Inf]
#> 9   9  91      1 (39,Inf]
#> 10 10  43      2 (39,Inf]

Created on 2018-05-21 by the reprex package (v0.2.0).

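For whole-number ages, the category (29,34] is equivalent to your 30 - 34 range.

The breaks argument is where you set these ranges. For example, with breaks = c(0,29,34,39,Inf), the bin edges are 0, 29, 34, 39 and infinity. Note that (0,29] excludes an age of exactly 0; adding include.lowest = TRUE to the call closes that gap.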
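Extending this to the full set of published bands, here is a minimal base R sketch. It assumes whole-number ages and uses right = FALSE so each band includes its lower edge. The column name AgeCat is borrowed from the earlier post, and the label text is built to match the bands listed there; both are choices made for illustration rather than anything cut requires.

```r
df <- data.frame(
  ID = 1:10,                                         # ID of patient
  Age = c(22, 55, 90, 7, 14, 100, 72, 85, 91, 43),   # Age of patient
  Gender = c(1, 2, 2, 2, 1, 1, 2, 1, 1, 2)           # 1 = Male, 2 = Female
)

# Lower edges of the five-year bands from 30 - 34 up to 85 - 89
lower <- seq(30, 85, by = 5)

# Bin edges: 0, 30, 35, ..., 90, Inf (15 edges give 14 bins)
age_breaks <- c(0, lower, 90, Inf)

# One label per bin, written the same way as the published bands
age_labels <- c("0 - 29", paste(lower, "-", lower + 4), "90+")

# right = FALSE makes each bin [lower, upper), so 30 falls in "30 - 34"
df$AgeCat <- cut(df$Age, breaks = age_breaks, labels = age_labels, right = FALSE)

# Counts per age band and gender; empty bands are kept because AgeCat is a factor
table(df$AgeCat, df$Gender)
```

Because the bins are closed on the left, an age of exactly 0 also falls in the first band, and the labels replace the default interval text.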


#8

No, you're not misunderstanding anything, it's me being thick! I kept putting 0 - 34, 35 - 39 etc. in the 'cut' function (I know, silly move). Curtis, this is fab, thanks very much :):grinning: