How to create a new column in which I can manually fill in the data

I would like to create a new column for an existing data set where all values will be either 0 or 1. As only a small number of the values are 1, I will manually input these according to their row numbers. How can I do this?

Am I right in using the following code for the first step?

cbind(new_column_name = 0, df)
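
For reference, here is a minimal sketch of what that call does. The original data was not posted, so the first rows of the built-in `cars` data set stand in for `df` (an assumption), and `with_cbind` and `with_dollar` are illustrative names only.

```r
# Stand-in for the poster's data (assumption): first rows of built-in `cars`
df <- head(cars, 3)

# cbind() recycles the single 0 down every row and puts the new column first
with_cbind <- cbind(new_column_name = 0, df)
names(with_cbind)

# Assigning with $ gives the same values but appends the column at the end
with_dollar <- df
with_dollar$new_column_name <- 0
names(with_dollar)
```

Either form gives an all-zero column to start from; the only difference shown here is where the column ends up.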

It would be better if you provided a reproducible example, but you can do something like this:

library(dplyr)
new_dataset <-  cars %>% 
    mutate(new_column = ifelse(dist > 90, 1, 0))
head(new_dataset)
#>   speed dist new_column
#> 1     4    2          0
#> 2     4   10          0
#> 3     7    4          0
#> 4     7   22          0
#> 5     8   16          0
#> 6     9   10          0

Created on 2018-12-19 by the reprex package (v0.2.1)

You could first make a new variable on your data.frame:

df['newVar'] <- 0

After that, it will depend on how the row info is stored. If you have the row numbers stored in a vector named 'ones' (for example), you can do:

df[ones, 'newVar'] <- 1
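
Put together, the two steps might look like the sketch below. The data and the row numbers are placeholders: `head(cars, 6)` stands in for the real data set and `ones <- c(2, 5)` is a hypothetical choice of rows.

```r
# Stand-in data (assumption): first six rows of built-in `cars`
df <- head(cars, 6)

# Hypothetical row numbers that should receive a 1
ones <- c(2, 5)

df['newVar'] <- 0        # start with every row at 0
df[ones, 'newVar'] <- 1  # overwrite only the listed rows
df
```

Note that `ones` here holds row positions, so this only works as intended if the rows have not been reordered or filtered since the positions were noted.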

But of course, it may need to be adapted to your original data formatting, or you could use an ifelse as suggested above.

cheers
Fer

The value of 1 means that the individual in that row possesses a certain quality, e.g. black_hair.
I would like to have the entire column as 0, then input 1 into a cell in that column for only four individuals.
Could you give me an example of how I could, for example, make the individual who is defined by column "X1" as 4, i.e. the fourth individual, take the value of 0 in the new column "black_hair"?

Thanks again

It would be easier if you could present us with a reproducible example. Just 5 or 6 observations would be cool.

You are asking for a specific answer to an unspecific question, so the best we can do is guess what you are trying to do.

If I guessed right, you want to do one-hot encoding on your dataset. If you want to follow a manual approach, you can do something like this:

library(dplyr)

example_data <- tibble(X1 = c(1,2,3,4,5),
                       hair_color = factor(c('blonde', 'blonde', 'red', 'black', 'brown')),
                       eyes_color = factor(c('green', 'green', 'blue', 'brown', 'brown')))

new_data <- example_data %>% 
    mutate(black_hair = ifelse(hair_color == 'black', 1, 0))
new_data
#> # A tibble: 5 x 4
#>      X1 hair_color eyes_color black_hair
#>   <dbl> <fct>      <fct>           <dbl>
#> 1     1 blonde     green               0
#> 2     2 blonde     green               0
#> 3     3 red        blue                0
#> 4     4 black      brown               1
#> 5     5 brown      brown               0
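
If there is no hair_color column to test against and the individuals are only known by their "X1" value, a similar sketch can match on the IDs directly. This is base R rather than dplyr, and `black_hair_ids` is a hypothetical vector standing in for the four individuals mentioned in the question.

```r
# Minimal stand-in data with only the ID column (assumption)
example_data <- data.frame(X1 = c(1, 2, 3, 4, 5))

# Hypothetical IDs of the individuals who should get a 1
black_hair_ids <- c(4)

# %in% is TRUE for rows whose X1 appears in the ID vector, FALSE otherwise
example_data$black_hair <- ifelse(example_data$X1 %in% black_hair_ids, 1, 0)
example_data
```

Matching on the ID column rather than on row position keeps the result correct even if the rows are later sorted or filtered.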

You can also use a library to encode all your variables at once. With caret, for example, you can do something like this:

library(dplyr)
library(caret)
example_data <- tibble(X1 = c(1,2,3,4,5),
                       hair_color = factor(c('blonde', 'blonde', 'red', 'black', 'brown')),
                       eyes_color = factor(c('green', 'green', 'blue', 'brown', 'brown')))

dmy <- dummyVars(" ~ .", data = example_data)
trsf <- data.frame(predict(dmy, newdata = example_data))
trsf
#>   X1 hair_color.black hair_color.blonde hair_color.brown hair_color.red
#> 1  1                0                 1                0              0
#> 2  2                0                 1                0              0
#> 3  3                0                 0                0              1
#> 4  4                1                 0                0              0
#> 5  5                0                 0                1              0
#>   eyes_color.blue eyes_color.brown eyes_color.green
#> 1               0                0                1
#> 2               0                0                1
#> 3               1                0                0
#> 4               0                1                0
#> 5               0                1                0