Create dummy variable

Hi everybody,
I would like to ask for help. I have data like this:

df <- data.frame(stringsAsFactors = FALSE,
                 ticker = as.factor(c("AAA", "AAA", "AAA", "AAM", "AAM", "AAM", "AAM",
                                      "AAM", "AAM", "AAM")),
                 code = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
                 year = c(2016L, 2017L, 2018L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L,
                          2015L),
                 Rsquared = c(0.352056874, 0.153949414, 0.294185982, 0.471601759,
                              0.492063894, 0.072787034,
                              0.088809017, 0.027639196, 0.062938271,
                              0.01485792),
                 SPI = as.factor(c("0.849306191", "1.614827022", "1.023443021",
                                   "0.54360083",
                                   "0.495553509", "2.273584969",
                                   "2.099646998", "3.116617298",
                                   "2.400446634", "3.655988108")),
                 State = as.factor(c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0")),
                 Foreign = as.factor(c("0.1749", "0.1193", "0.0988", "0.003550307",
                                       "0.014386416", "0.015",
                                       "0.0265", "0.0255", "0.0097",
                                       "0.0049")),
                 Domestic = as.factor(c("0.8251", "0.8807", "0.9012", "0.9964", "0.9856",
                                        "0.985", "0.9735",
                                        "0.9745", "0.9903", "0.9951"))
)

(1) I would like to generate a new variable "code" for each stock ticker. For example, ticker AAA gets code 001, AAM gets 002, ABC gets 003, and so on.
(2) I would like to generate a new dummy variable "State10": if the value in "State" is greater than 10%, it should be 1; otherwise it should be 0.
I would be grateful if anyone could help.

I'm going to pass on (1) because of the left padding and because I don't know how many tickers you're dealing with. If you have a way of generating a vector of the same length as the ticker column, you can attach it with dplyr::bind_cols.

For (2), use dplyr::mutate:

rev_df <- df %>% mutate(STATE10 = ifelse(State > 10%, 1,0))

However, State is stored as a factor of character strings, so the comparison won't work as written: > is not meaningful for factors, and the column has to be converted to numeric first.
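To illustrate the point about factors, here is a minimal base R sketch. The vector `state` and its values are made up for illustration and are not taken from the real data.

```r
# Toy factor standing in for a column such as State (values are assumptions)
state <- factor(c("0", "0.25", "0.05"))

# Comparing an unordered factor with > is not meaningful:
# R warns and returns NA for every element
cmp <- suppressWarnings(state > 0.1)
print(cmp)

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(state)
print(codes)

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(state))
print(values > 0.1)
```

The last form is the one that makes a threshold test possible.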

I will take the plunge on the code generation. If you want a three-digit code padded with zeros:

df <- df %>% mutate(code = formatC(as.numeric(ticker), width = 3, flag = "0"))

Thank you so much.
For (2), I converted the variables "State", "Foreign", "Domestic", ... to numeric using this:
df$State=as.numeric(levels(df$State))[df$State]
But there is a warning:
"Warning message:
NAs introduced by coercion"
Is this warning a problem?

For (1):
df$ticker is a factor, so just convert it to an integer and you have your code. If you want it to be a zero-padded string, pass it through sprintf:

df$code = sprintf("%03d", as.integer(df$ticker))

Of course, if you want to control which ticker gets which code, you will need to set up a manual mapping. This can be done in a new data.frame that is then merged onto df:

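tickermap = data.frame(ticker = as.factor(c("AAA","AAM","ABC")), code = c("003","001","002"))
df$code = NULL  # drop the existing all-NA code column so it is not used as a merge key
df = merge(df, tickermap, by = "ticker", all.x = TRUE)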
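As a middle ground between the default factor order and a hand-written mapping, the sketch below compares codes derived from sorted factor levels with codes assigned in order of first appearance. The `ticker` vector is a toy example, and `by_appearance` is a name introduced here for illustration.

```r
# Toy ticker column (assumed values, deliberately not in sorted order)
ticker <- factor(c("AAM", "AAM", "AAA", "ABC", "AAA"))

# Default: codes follow the sorted factor levels (AAA, AAM, ABC)
code_sorted <- sprintf("%03d", as.integer(ticker))
print(code_sorted)

# Alternative: rebuild the factor so levels follow order of first appearance
ticker_chr <- as.character(ticker)
by_appearance <- factor(ticker_chr, levels = unique(ticker_chr))
code_seen <- sprintf("%03d", as.integer(by_appearance))
print(code_seen)
```

Either way the codes depend on which tickers are present, so a code can change when tickers are added; an explicit mapping table avoids that.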

My code is not tested, so there may be typos and other mistakes :)

Cheers
Steen


The message indicates that some values could not be converted to numeric and became NA. You will have to exclude those NAs from calculations, which is an extra step; beyond that, whether it is a problem depends on how much data is lost.
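One way to judge the size of the loss is to list the values that failed to convert. This is a sketch in base R; the factor `state` and the unparseable entries "n/a" and "12%" are invented examples, not values known to be in the real data.

```r
# Toy factor; "n/a" and "12%" are made-up examples of unparseable values
state <- factor(c("0", "0.15", "n/a", "12%", "0.05"))

# Convert via character; suppressWarnings() only hides the coercion warning
state_chr <- as.character(state)
state_num <- suppressWarnings(as.numeric(state_chr))

# Values that were present but could not be converted
bad <- unique(state_chr[is.na(state_num) & !is.na(state_chr)])
print(bad)

# Share of rows that ended up as NA
na_share <- mean(is.na(state_num))
print(na_share)
```

Inspecting `bad` usually shows whether the problem is a formatting quirk that can be cleaned up or genuinely missing data.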

I converted "State" to numeric and then ran your code, but it still doesn't work:

rev_df <- df %>% mutate(STATE10 = ifelse(State > 10%, 1,0))
Error: unexpected input in "rev_df <- df %>% mutate(STATE10 = ifelse(State > 10%, 1,0))"

10% isn't valid numeric syntax in R; try 0.1.

You have to do something like this:

library(tidyverse)
df %>% 
    mutate(State = parse_number(as.character(State)),   # factor -> character -> numeric
           State10 = if_else(State > 0.1, 1, 0))        # 1 if State exceeds 10%, else 0
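For comparison, the same dummy can be built without the tidyverse. This is a base R sketch on a made-up data frame `df_small` (a name and values assumed here for illustration); note that a missing State stays NA in the dummy rather than becoming 0.

```r
# Toy data frame; State is a factor of strings, as in the question
df_small <- data.frame(State = factor(c("0", "0.25", "0.1", NA)))

# Convert the factor to numeric via character
df_small$State <- as.numeric(as.character(df_small$State))

# 1 if State is strictly greater than 10%, 0 otherwise, NA if State is missing
df_small$State10 <- as.integer(df_small$State > 0.1)
print(df_small)
```

Because the comparison is strict, a value of exactly 0.1 gets 0.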

It seems like you are struggling with some very basic data wrangling. It might be better to work on your R basics first; take a look at these free resources, which will get you up and running with R.

