How to generate a dummy variable for this variable?

I am new to R. This variable, test.ind, has 2 levels. It's an indicator of whether the customer participated in the program: "R" = participated, "C" = nonparticipant (control).

The question is: Convert the variable test.ind to a 0/1 indicator variable (an integer), and name it treat, where customers enrolled in the program are assigned a value of 1, and customers not enrolled in the program are assigned a value of 0.

How should I do that?

> dput(head(newdata, 5)[c("test.ind")])
structure(list(test.ind = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("C", 
"R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

I changed your dataset a little, as it only contained "R".
Are you sure you want integer values? I'd use a logical (TRUE/FALSE) vector instead (see participatedBool).

participated <- structure(list(test.ind = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("C", "R"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

participatedInt <- as.integer(participated$test.ind) - 1L  # C (level 1) -> 0, R (level 2) -> 1; 1L keeps the result integer
print(participatedInt)

participatedBool <- as.logical(participatedInt)
print(participatedBool)
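Why `1L` rather than `1`: in R the literal `1` is a double, so `as.integer(f) - 1` returns a double vector even though the values look like whole numbers, while subtracting the integer literal `1L` keeps the type integer, which is what the assignment asks for. A quick check on a small made-up factor with the same levels as test.ind:

```r
# Made-up factor with the same levels as test.ind (C = level 1, R = level 2)
f <- factor(c("R", "C", "C", "R", "R"), levels = c("C", "R"))

as_double  <- as.integer(f) - 1   # 1 is a double literal, so the result is double
as_integer <- as.integer(f) - 1L  # 1L is an integer literal, so the result stays integer

class(as_double)   # "numeric"
class(as_integer)  # "integer"
as_integer         # 1 0 0 1 1
```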

What does your first line mean? My data set has tons of observations. I only gave 5 observations as an example.

chiaoyi, please see the forum's Homework Policy (FAQ: Homework Policy).

The first line assigns your example data frame to an object named participated.
You could assign your own data to the same name.

But I already have a variable named test.ind. Why do I still need to generate a new variable called participated?

Use whatever name you want, but you need to apply the transformations, like subtracting one and converting to logical type.
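Applied to your own data, no separate object is needed: you can compute the transformation from `newdata$test.ind` and store it as a new column named `treat` in the same data frame. A sketch, where the small `newdata` below is only a stand-in for your real data, and which assumes the levels are in the order C, R (as your dput() output shows):

```r
# Stand-in for the real data; with your own newdata, skip this part
newdata <- data.frame(
  test.ind = factor(c("R", "C", "R", NA, "C"), levels = c("C", "R"))
)

# New integer column: C (level 1) -> 0, R (level 2) -> 1; NA stays NA
newdata$treat <- as.integer(newdata$test.ind) - 1L

newdata
```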

I see. But how do I add this new variable to my data frame? I tried this, but it fails.

newdata <- newdata %>% participated <- as.integer(newdata$test.ind)-1

You should avoid using the assignment operator <- more than once on a single line of code; it's confusing, to say the least.
Here is a full example.

new_data <- structure(list(test.ind = structure(c(1L, 2L, 1L, 2L, 2L), .Label = c("C", "R"), class = "factor")),
                      row.names = c(NA, 5L), class = "data.frame")
new_data
# > new_data
# test.ind
# 1        C
# 2        R
# 3        C
# 4        R
# 5        R
new_data$test.ind <- as.integer(new_data$test.ind) - 1L  # C (code 1) -> 0, R (code 2) -> 1

new_data
# > new_data
# test.ind
# 1        0
# 2        1
# 3        0
# 4        1
# 5        1
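The line that failed in the question combines a pipe with a second `<-`. A pipe only passes the data frame into a function, so the new column has to be created by a function call whose result is assigned once. A sketch using base R's `transform()`; written with a pipe it would be `new_data <- new_data %>% transform(treat = ...)`, and with dplyr loaded, `mutate()` plays the same role as `transform()`:

```r
new_data <- data.frame(
  test.ind = factor(c("C", "R", "C", "R", "R"), levels = c("C", "R"))
)

# One assignment: transform() returns a copy of new_data with the extra
# column 'treat'; test.ind is left unchanged
new_data <- transform(new_data, treat = as.integer(test.ind) - 1L)

new_data
```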

I see. But how do you ensure that R becomes 1, C becomes 0, and NA stays NA?

Your dput() output showed us that C is stored as 1 and R as 2, so we merely subtracted 1 to make C 0 and R 1.
If C were 2 and R were 1, then to make C 0 and R 1 you would
first multiply by -1 (giving C -2, R -1),
then add 2 (giving C 0, R 1).
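The multiply-by-minus-one-then-add-two arithmetic collapses to `2L - as.integer(f)`. A sketch with a hypothetical factor whose levels are deliberately ordered R first, so R is stored as 1 and C as 2:

```r
# Hypothetical factor with the levels in the opposite order: R = 1, C = 2
f_rev <- factor(c("C", "R", "C", "R", "R"), levels = c("R", "C"))
as.integer(f_rev)          # 2 1 2 1 1

# -1 * code + 2: C (2) -> 0, R (1) -> 1
treat <- 2L - as.integer(f_rev)
treat                      # 0 1 0 1 1
```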

How can I find out the values of C and R? I reran my original data set and used:
unique(newdata$test.ind)

It says:
[1] R C
Levels: C R

It doesn't mention numeric values here. What I want is to create a new variable such that R = 1, C = 0, and NA = NA.

OK, so your dput() output revealed the internal representation of the factor.
You can also use as.integer() to see the factor's internal integer codes.
Or you can avoid thinking about the codes entirely and use a logical test to assign the value you want conditionally.
Perhaps you prefer that approach. Here I say: if the factor's label is 'C', give me 0; otherwise give me 1 (otherwise includes 'R').

new_data <- structure(list(test.ind = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("C", "R"), class = "factor")),
                      row.names = c(NA, 5L), class = "data.frame")
new_data$test.ind
# > new_data$test.ind
# [1] C R R R R
# Levels: C R
as.integer(new_data$test.ind)
# > as.integer(new_data$test.ind)
# [1] 1 2 2 2 2
new_data$test.ind <- ifelse(new_data$test.ind=="C",0,1)
new_data$test.ind 
# > new_data$test.ind 
# [1] 0 1 1 1 1
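Testing the labels rather than the codes is what makes this approach safe: the result does not depend on the order of the levels. A quick comparison on a made-up factor whose levels are ordered R first:

```r
# Made-up factor where R is level 1 and C is level 2
f <- factor(c("C", "R", "R", "C"), levels = c("R", "C"))

by_code  <- as.integer(f) - 1L        # assumes C is level 1: gives 1 0 0 1 (wrong here)
by_label <- ifelse(f == "C", 0L, 1L)  # tests the label: gives 0 1 1 0 regardless of level order
```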

This is confusing. I think it's a simple question, but it seems I've messed it up. Sorry.

If I use

unique(newdata$test.ind)

then this gives me R and C.

If I use

as.integer(newdata$test.ind)

This gives me a full list of 2s and 1s.
What I want is to create a new variable such that R = 1, C = 0, and NA = NA.

Yes, if you ran as.integer(unique(newdata$test.ind)) you would get the unique integer codes.
But unique() is only useful for understanding your data; it isn't needed to actually transform it.
To change "C" into 0 and "R" into 1,
do

new_data$test.ind <- ifelse(new_data$test.ind=="C",0,1)
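To see which integer code belongs to which label without scanning the whole column, you can pair each level with its position; the codes always follow the order of `levels()`. A small sketch on made-up data:

```r
f <- factor(c("R", "C", "R", NA, "C"))

levels(f)                                           # "C" "R": codes follow this order
codes <- setNames(seq_along(levels(f)), levels(f))
codes                                               # C = 1, R = 2
as.integer(unique(f))                               # codes in order of first appearance: 2 1 NA
```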

Thanks. I checked the help file for ifelse:
"ifelse(test, yes, no): yes: return values for true elements of test; no: return values for false elements of test."

Can you check whether my understanding is correct? In your code, the test is whether the value equals "C". If it does, we assign 0; otherwise, we assign 1.

But in that case, will NA also be assigned 1?

Good question. NAs are often a special case in base R; with ifelse, an NA in the test gives NA in the result.
See this example:

ifelse(c("C","R","X",NA) =="C",0,1)
#  0  1  1 NA

C goes to 0 as required, R and X both go to 1 because they aren't C, but NA stays NA.
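If you also want unexpected values (like the "X" above) to become NA instead of 1, and want an integer result, you can test both labels explicitly with a nested ifelse. A sketch; `NA_integer_` is base R's integer NA:

```r
x <- c("C", "R", "X", NA)

# R -> 1, C -> 0, anything else (including NA) -> NA; the result is integer
treat <- ifelse(x == "R", 1L, ifelse(x == "C", 0L, NA_integer_))
treat   # 0 1 NA NA
```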

