A question about the cut function

makisa · February 19, 2023, 2:56pm

Hello

I am expected to create a new Age variable with quantiles. Upper bounds of the classes which are in the "middle" should be excluded. So the levels should look like this:
Levels: [20,30] [30,40) [40,50]

I tried this (and many more commands):
cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = TRUE, right = FALSE)

But it didn't work. Do you have any idea?

Thank you.

jrkrideau · February 19, 2023, 3:29pm

I think we need to see your code and some sample data.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

You might find "CUT in R" helpful.

makisa · February 19, 2023, 3:42pm

Thank you for your answer. Please find below some sample data:

And this is the code:
cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = TRUE, right = FALSE)

nirgrahamuk · February 19, 2023, 3:51pm

Hello,
I'm sure you shared this image with the best intentions, but perhaps you didn't realise what it implies.
If someone wished to use your example data to test code against, they would have to type it out from your screenshot.

This is very unlikely to happen, so it reduces the likelihood that you will receive the help you want.
Therefore, please see this guide on how to reprex data. The key is to use either datapasta or dput() to share your data as code.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

jrkrideau · February 19, 2023, 4:12pm

A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here.
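As a rough illustration of why dput() output is convenient, the sketch below builds a tiny data frame (the name `mydata` and its values are made up for the example), prints it with dput(), and then rebuilds it from the printed text. The capture.output()/parse()/eval() step only simulates someone pasting the output into their own R session.

```r
# Hypothetical three-row stand-in for a real data set
mydata <- data.frame(
  Age  = c(26, 40, 50),
  Sexe = factor(c("Hommes", "Femmes", "Femmes"))
)

# This is the text you would copy into a forum post
dput(head(mydata, 2))

# Simulate a reader pasting the full dput() output into R
txt     <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Column types, including the factor, survive the round trip
str(rebuilt)
all.equal(rebuilt, mydata)
```

Unlike a screenshot, the pasted text recreates the column classes as well as the values.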

makisa · February 19, 2023, 6:00pm

This is probably not what you mean, but thank you anyway:

dput(head(don_ind, 100))
structure(list(ID = c("88KA2U6AOE6K", "HQ6JA5AZ4T3B", "IA8LCFMN5LP2",
"8UEO3PCE07XX", "UUH8A81JLYQU", "SAD8N6CHLVMY", "WTWPLHFJLLXS",
"L5RG1JO68E82", "YUCNWXK1AZQL", "VD4PN7XC7B0D", "FACED2LOU7A4",
"QVUXGAC5YSEP", "Z2ER73W064CM", "TWVO6F0NIEMM", "9KPD4G0BHISJ",
"0PALZVHZCWOI", "CEW6N5J2T5IE", "UKRPBR1ZHE2B", "NDKKFXDOE07S",
"JZUUIN8W29FV", "I55NX5DGPBXY", "61LTIITHE509", "VJCHJPP7O2G0",
"R5DQXWQMZH7O", "H22J09R44IJZ", "0OGELQFVC321", "4FVPW2VP7YT2",
"X86A81CUZQQ3", "238P1IXZFMDJ", "JV62N6PZQ2GL", "DZFR2DN8HIIB",
"3XR61KX8D8ID", "GX9K9I2DMRTU", "0ZT5WA0URUGY", "DJXJOSNIM5W2",
"WSVV8HR7FIX3", "U0G292A03D4N", "WF77UDDYI94W", "SUS7J52SJMMT",
"9YB29VZGHD98", "5307S9J8O21Y", "FQO6V073PTFH", "ZBV07SCATBXN",
"5AY4366QQXXS", "8GL62YDUMWNZ", "96VDVX8B9GUF", "6CT9WZ43D1H2",
"N71K6J257XPL", "64O9516N9RSB", "HADDTUOCDOOM", "QJL12653JPT9",
"XOEW9LLDUXRP", "SVY7PY2J6I3A", "IAAGWIUBE9HJ", "VCFEHS5768O8",
"XVMIC9GWBP9N", "F876XR5LC5RK", "4JMH44XEY6IS", "2OELC8DT7BQF",
"OCA13TFRMGDF", "V0NNH8J6KSAD", "Y9PG3WV96QA8", "D36TM3V4QDC7",
"VU2HYFPF3O0A", "XPZTPC4TS4YW", "EZYPCIREYKUY", "3Q8K3IDSX390",
"A39RE6FXKC9Y", "T67L2G50TIRV", "IEETVGXAEP2A", "FN1NZQ6S9IKN",
"IB933IKA9CKN", "8Y6O60K25AC1", "3PHC26VY15PZ", "PXG590IY0AB3",
"51YZJMVBLQPR", "44ODR7ARTQXG", "9ZOHOCX2RD9K", "FSDU1YA1UCED",
"IHBSJVF998H3", "5KLASQ52A7M2", "KP7JSMK0A1UO", "IQFM972HLC4D",
"2OF2Z2T4J79I", "KT7UNYJFF85B", "75KJ2Y2UWYKC", "SZE9ZIBLXJ50",
"MRDVB9HYGZQI", "7ZN52FQUUJ4O", "OOH60LO4TAAR", "DCLWUMTK0BKZ",
"9F7O3X0EUJN5", "VA8GT3N1C76D", "9LZCQ89U7EIC", "MDHHCZFY6JMO",
"ONXJ8AKB9DE9", "Z5X8ZB3AL5G7", "8V40B8WTZVHD", "CYCMS470905X",
"I9LQUHT18EHR"), Sexe = structure(c(2L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L), .Label = c("Femmes",
"Hommes"), class = "factor"), Age = c(26, 40, 50, 43, 48, 47,
45, 42, 47, 23, 45, 43, 24, 61, 38, 51, 46, 39, 42, 51, 47, 52,
43, 56, 49, 51, 49, 51, 42, 51, 49, 28, 37, 50, 31, 55, 45, 43,
52, 33, 33, 47, 47, 50, 26, 39, 56, 48, 56, 42, 41, 37, 51, 44,
41, 47, 44, 60, 43, 38, 56, 44, 55, 41, 34, 54, 45, 44, 47, 44,
43, 35, 42, 49, 35, 47, 51, 38, 43, 52, 61, 51, 36, 47, 51, 36,
51, 46, 33, 57, 41, 42, 37, 41, 50, 51, 50, 41, 46, 38), Region = structure(c(3L,
7L, 1L, 5L, 5L, 9L, 6L, 6L, 2L, 7L, 10L, 1L, 8L, 7L, 7L, 8L,
3L, 2L, 10L, 4L, 9L, 3L, 8L, 8L, 8L, 6L, 2L, 9L, 10L, 8L, 1L,
1L, 5L, 7L, 6L, 2L, 2L, 8L, 3L, 9L, 6L, 2L, 6L, 4L, 6L, 7L, 7L,
8L, 6L, 6L, 3L, 8L, 1L, 5L, 4L, 3L, 2L, 5L, 7L, 10L, 8L, 6L,
8L, 4L, 1L, 6L, 9L, 9L, 9L, 7L, 2L, 5L, 1L, 4L, 7L, 3L, 9L, 6L,
9L, 8L, 7L, 1L, 5L, 1L, 7L, 4L, 8L, 7L, 8L, 6L, 6L, 8L, 4L, 5L,
8L, 10L, 6L, 5L, 9L, 6L), .Label = c("Alberta", "Colombie-Britannique",
"Ile-du-Prince-Édouard", "Manitoba", "Nouveau-Brunswick", "Nouvelle-Ecosse",
"Ontario", "Quebec", "Saskatchewan", "Terre-Neuve et Labrador"
), class = "factor")), row.names = c(NA, 100L), class = "data.frame")

jrkrideau · February 19, 2023, 9:12pm

That is exactly what we wanted. Thanks.

This gives us an exact copy of your data set, so before I or anyone else tries to answer your question, we can do something like

class(dat1)
[1] "data.frame"

which assures us we are dealing with a data.frame and not a matrix or a table and so on.

Then we can do this:

str(dat1)
'data.frame':	100 obs. of  4 variables:
 $ ID    : chr  "88KA2U6AOE6K" "HQ6JA5AZ4T3B" "IA8LCFMN5LP2" "8UEO3PCE07XX" ...
 $ Sexe  : Factor w/ 2 levels "Femmes","Hommes": 2 1 1 1 1 2 1 1 1 1 ...
 $ Age   : num  26 40 50 43 48 47 45 42 47 23 ...
 $ Region: Factor w/ 10 levels "Alberta","Colombie-Britannique",..: 3 7 1 5 5 9 6 6 2 7 ...

which gives us the structure of the data.frame and assures us Age is numeric and we can use "cut()" on it.
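A further check that may be worth doing at this stage is comparing the range of Age with the intended breaks, because cut() returns NA for any value that falls outside them. The sketch below uses a handful of ages taken from the posted data (which includes values above 50) in a stand-in vector named `age`; the variable names are just for illustration.

```r
# A few ages copied from the posted data; `age` is a stand-in for dat1$Age
age  <- c(26, 40, 50, 43, 61, 51, 23)
brks <- c(20, 30, 40, 50)

is.numeric(age)   # cut() needs a numeric vector
range(age)        # compare with min(brks) and max(brks)

# Values outside the breaks are not assigned to any class
outside <- age < min(brks) | age > max(brks)
sum(outside)

grp <- cut(age, breaks = brks, right = FALSE, include.lowest = TRUE)
sum(is.na(grp))   # same count: those values became NA
```

If NA values are not wanted, the outer breaks need to be widened to cover the full range of the data.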

BTW, none of the territories?

makisa · February 19, 2023, 9:38pm

Yes, cut. I tried all of the commands below:

cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = TRUE, right = TRUE)
cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = TRUE, right = FALSE)
cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = FALSE, right = FALSE)
cut(don_ind$Age, breaks = c(20, 30, 40, 50), include.lowest = FALSE, right = TRUE)
cut(don_ind$Age, breaks = c(20, 30, 40, 50), right = FALSE)

But I couldn't get these levels:
[20,30] [30,40) [40,50]

(Upper bound of the "middle" should be excluded.)
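For reference, here is a small sketch of the labels that the `right` and `include.lowest` arguments produce with these breaks. The vector `ages` is made up so that every value sits on a boundary or just past the last break. With the default labels, `right = FALSE` together with `include.lowest = TRUE` gives classes that exclude their upper bound except for the last one, which seems closest to the request; the first class then reads `[20,30)` rather than `[20,30]`.

```r
ages <- c(20, 30, 40, 50, 51)   # boundary values plus one value past the last break
brks <- c(20, 30, 40, 50)

# Left-closed classes; include.lowest = TRUE closes the LAST class on the right
left_closed <- cut(ages, breaks = brks, right = FALSE, include.lowest = TRUE)
levels(left_closed)        # "[20,30)" "[30,40)" "[40,50]"
as.character(left_closed)  # "[20,30)" "[30,40)" "[40,50]" "[40,50]" NA

# Right-closed classes; include.lowest = TRUE closes the FIRST class on the left
right_closed <- cut(ages, breaks = brks, right = TRUE, include.lowest = TRUE)
levels(right_closed)       # "[20,30]" "(30,40]" "(40,50]"
```

Note that 30 cannot belong to both `[20,30]` and `[30,40)`, so one of those two brackets has to be open.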

jrkrideau · February 20, 2023, 12:00am

Do you have to use cut?

I don't quite understand the levels you want. I am not sure I have the divisions exactly as you intend, but if you do not have to use cut(), I think this gives you the same basic result.

## Assuming your data.frame is named `dat1`
library(data.table)
dat2 <-  as.data.table(dat1)  ## convert dat1 from a data.frame to a data.table

dat2[, Agegroup := fcase(
                         Age %between% c(0,  20), "A",
                         Age %between% c(21, 30), "B",
                         Age %between% c(31, 40), "C",
                         Age %between% c(41, 50), "D",
                                 default = "E")]
dat2[, Agegroup := as.factor(Agegroup)]

dat2

setDF(dat2)  ## convert dat2 from a data.table back to a data.frame
             ## probably not needed but just in case.
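If cut() is required after all and "with quantiles" means that the breaks should come from the data, one possible base R sketch is shown below. It is an assumption about the assignment, not a confirmed solution: the breaks are quartiles computed with quantile(), and `age` holds only the first ten ages from the posted data as a stand-in for the full column.

```r
# First ten ages from the posted data, standing in for dat1$Age
age <- c(26, 40, 50, 43, 48, 47, 45, 42, 47, 23)

# Quartile breaks: minimum, 25%, 50%, 75%, maximum
brks <- quantile(age, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)

# Upper bounds excluded, except for the last class, which keeps the maximum
grp <- cut(age, breaks = brks, right = FALSE, include.lowest = TRUE)

table(grp, useNA = "ifany")
```

Because the outer breaks are the minimum and maximum of the data, no value falls outside them. If two quantiles coincide, cut() stops with an error about non-unique breaks, so the breaks would need deduplicating in that case.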
