How do I generate a dummy variable from this variable?

chiaoyi · March 12, 2020, 5:28am

I am new to R. The variable test.ind has 2 levels. It is an indicator of whether the customer participated in the program: "R" = participated, "C" = nonparticipant (control).

The question is: Convert the variable test.ind to a 0/1 indicator variable (an integer), and name it treat, where customers enrolled in the program are assigned a value of 1, and customers not enrolled in the program are assigned a value of 0.

How should I do that?

> dput(head(newdata, 5)[c("test.ind")])
structure(list(test.ind = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("C", 
"R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

ismirsehregal · March 12, 2020, 8:11am

I changed your dataset a little, as it contained only "R".
Are you sure you want integer values? I'd go with a logical (boolean) vector (see participatedBool).

participated <- structure(list(test.ind = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("C", "R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

participatedInt <- as.integer(participated$test.ind)-1
print(participatedInt)

participatedBool <- as.logical(participatedInt)
print(participatedBool)
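Why subtracting 1 works can be checked directly. A minimal sketch, assuming the default (alphabetical) level order; the toy vector x is made up for illustration:

```r
# A factor stores integer codes that point into its levels.
x <- factor(c("R", "C", "C", "R", "R"))

lv <- levels(x)           # "C" "R": sorted alphabetically by default
codes <- as.integer(x)    # 2 1 1 2 2: "C" is code 1, "R" is code 2
zero_one <- codes - 1L    # 1 0 0 1 1: shifts the codes to 0/1

print(lv)
print(codes)
print(zero_one)
```

The shift only gives R = 1 and C = 0 because "C" happens to be the first level, so it is worth running levels() on your own column first.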

chiaoyi · March 12, 2020, 12:10pm

What does your first line mean? My data set has tons of observations. I only gave 5 observations as an example.

nirgrahamuk · March 12, 2020, 1:04pm

chiaoyi
Please see the Forum's Homework Policy.
FAQ: Homework Policy

The first line assigns your example data frame to an object named participated.
You could assign your own data frame to the same name.

chiaoyi · March 13, 2020, 2:41pm

But I already have a variable named test.ind. Why do I still need to generate a new variable called participated?

nirgrahamuk · March 13, 2020, 2:43pm

Use whatever name you want, but you still need to apply the transformations: subtract one and, if you want TRUE/FALSE, cast to logical.

chiaoyi · March 13, 2020, 2:48pm

I see. But how to add this new variable to my data frame? I tried this, but it fails.

newdata <- newdata %>% participated <- as.integer(newdata$test.ind)-1

nirgrahamuk · March 13, 2020, 2:59pm

You should avoid using the assignment operator <- more than once on a single line of code. It's confusing at the very least.
Here is a full example.

new_data <- structure(list(test.ind = structure(c(1L, 2L, 1L, 2L, 2L), .Label = c("C", "R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")
new_data
# > new_data
# test.ind
# 1        C
# 2        R
# 3        C
# 4        R
# 5        R
new_data$test.ind <- as.integer(new_data$test.ind) - 1

new_data
# > new_data
# test.ind
# 1        0
# 2        1
# 3        0
# 4        1
# 5        1
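If you would rather keep test.ind and add the result as a separate column, something like the following should work. This is only a sketch on made-up toy data; the column name treat comes from the original question:

```r
# Toy data frame standing in for the real one.
new_data <- data.frame(test.ind = factor(c("C", "R", "C", "R", "R")))

# Assigning to a column name that does not exist yet creates the column;
# the original factor column is left untouched.
new_data$treat <- as.integer(new_data$test.ind) - 1L

print(new_data)
```

If dplyr is loaded, newdata %>% mutate(treat = as.integer(test.ind) - 1L) should do the same thing inside a pipeline.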

chiaoyi · March 13, 2020, 4:29pm

I see. But how do you ensure that R becomes 1, C becomes 0, and NA stays NA?

nirgrahamuk · March 13, 2020, 4:31pm

Your dput() output showed us that C is stored as 1 and R as 2, so we merely subtracted 1 to get C = 0 and R = 1.
If C were 2 and R were 1, then to get C = 0 and R = 1 you would
multiply by -1, giving C = -2 and R = -1,
then add 2, giving C = 0 and R = 1.
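The reversed case can be tried out like this. It is a sketch on a made-up factor whose levels are deliberately given as R then C; the lookup vector recode is a hypothetical helper, shown as one way to avoid depending on level order at all:

```r
# Levels given explicitly in reverse order: "R" is code 1, "C" is code 2.
x <- factor(c("C", "R", "R", NA), levels = c("R", "C"))
codes <- as.integer(x)          # 2 1 1 NA

# The arithmetic described above: multiply by -1, then add 2.
by_arithmetic <- -codes + 2L    # 0 1 1 NA

# Alternative: look each label up by name, so level order no longer matters.
recode <- c(C = 0L, R = 1L)
by_lookup <- unname(recode[as.character(x)])   # 0 1 1 NA

print(by_arithmetic)
print(by_lookup)
```

In both versions the NA stays NA.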

chiaoyi · March 13, 2020, 4:36pm

How can I find out the values of C and R? I reran this on my original data set, using:
unique(newdata$test.ind)

It says:
[1] R C
Levels: C R

It doesn't mention numeric values here. What I want is to create a new variable where R=1, C=0, NA=NA.

nirgrahamuk · March 13, 2020, 4:48pm

OK, so your dput() revealed the internal representation of the factor to us.
You can also use as.integer() to see the factor's internal integer representation.
Or you can avoid thinking about that and use a logical test to assign the value you want conditionally.
Perhaps you prefer that approach. In this case I can say: if the factor label is 'C', give me 0; otherwise give me 1 (otherwise includes 'R').

new_data <- structure(list(test.ind = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("C", "R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")
new_data$test.ind
# > new_data$test.ind
# [1] C R R R R
# Levels: C R
as.integer(new_data$test.ind)
# > as.integer(new_data$test.ind)
# [1] 1 2 2 2 2
new_data$test.ind <- ifelse(new_data$test.ind=="C",0,1)
new_data$test.ind 
# > new_data$test.ind 
# [1] 0 1 1 1 1
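One detail if the result really has to be an integer: ifelse() with 0 and 1 returns a double vector, even though it prints the same. A small sketch on a made-up factor; using the integer literals 0L and 1L is one way to get integer output:

```r
test.ind <- factor(c("C", "R", "R", "R", "R"))

treat_dbl <- ifelse(test.ind == "C", 0, 1)     # plain 0 and 1 are doubles
treat_int <- ifelse(test.ind == "C", 0L, 1L)   # the L suffix marks integers

print(typeof(treat_dbl))   # "double"
print(typeof(treat_int))   # "integer"
```

Wrapping the first version in as.integer() should give the same result as the second.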

chiaoyi · March 13, 2020, 5:05pm

It's weird. I thought this was a simple question, but it seems I've messed it up. I'm very sorry.

If I use

unique(newdata$test.ind)

this gives me R and C.

If I use

as.integer(newdata$test.ind)

this gives me a full list of 2s and 1s.
What I want is to create a new variable where R=1, C=0, NA=NA.

nirgrahamuk · March 13, 2020, 5:07pm

Yes, if you ran as.integer(unique(newdata$test.ind)) you would get the unique integer values.
But unique() isn't much use here aside from helping you understand your data. It isn't required in order to change your data.
To change "C" into 0 and "R" into 1,
do

new_data$test.ind <- ifelse(new_data$test.ind=="C",0,1)

chiaoyi · March 13, 2020, 5:53pm

Thanks. I checked the help file for ifelse.
"ifelse(test, yes, no)": yes gives the return values for true elements of test, and no gives the return values for false elements of test.

Can you check whether my understanding is correct? In your code, the test is whether the value equals "C". If the test is true, we assign 0; otherwise, we assign 1.

But in that case, will NA also be assigned 1?

nirgrahamuk · March 13, 2020, 5:56pm

Good question. NAs are often a special case in base R. With ifelse, when the test itself is NA, the result is NA.
See this example:

ifelse(c("C","R","X",NA) =="C",0,1)
#  0  1  1 NA

C goes to 0 as required. R and X both go to 1, since they aren't C. The NA stays NA.
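The same check can be run on a factor column. This is a sketch on made-up data with one missing value; cross-tabulating old against new values is a suggestion here, one quick way to confirm the mapping on a large data set:

```r
test.ind <- factor(c("R", "C", NA, "R"))

# The comparison test.ind == "C" is NA for the missing entry,
# so ifelse() returns NA there instead of 0 or 1.
treat <- ifelse(test.ind == "C", 0L, 1L)
print(treat)

# Cross-tabulate original labels against the new codes, keeping NAs visible.
print(table(test.ind, treat, useNA = "ifany"))

# NA positions should line up exactly.
na_preserved <- identical(is.na(treat), is.na(test.ind))
print(na_preserved)
```

Note that the "otherwise" branch sends anything that is not "C" to 1, so an unexpected label (like the "X" above) would also become 1. Testing for == "R" instead flips which label is the catch-all.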
