Issue converting character variables to numeric variables

dplyr

#1

I want to convert character variables to numeric variables in R.

e.g.

BALOCHISTAN (BAL) = 1
Islamabad ICT = 2
KHYBER PAKHTUNKHWA(KPK) = 3
PUNJAB (PUJ) = 4
SINDH (SNI) = 5

#2

Like so?

library("tidyverse")
ids = c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)", "PUNJAB (PUJ)", "SINDH (SNI)")
classes = ids %>% factor(levels = ids) %>% as.numeric
names(classes) = ids

Yielding

> print(classes)
      BALOCHISTAN (BAL)           Islamabad ICT KHYBER PAKHTUNKHWA(KPK)            PUNJAB (PUJ)             SINDH (SNI) 
                      1                       2                       3                       4                       5 

#3

Thanks.
But I want to show the character variable as a numeric variable without the classes. Also, is there an alternative that does not use library("tidyverse")?


#4

@Leon’s code only uses the tidyverse package for the pipe (%>%), though the pipe does make the code easier to read and follow. You can do the same thing with only base R:

ids = c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)", "PUNJAB (PUJ)", "SINDH (SNI)")
classes = as.numeric(factor(ids, levels = ids))
names(classes) = ids

print(classes)

#>       BALOCHISTAN (BAL)           Islamabad ICT KHYBER PAKHTUNKHWA(KPK) 
#>                       1                       2                       3 
#>            PUNJAB (PUJ)             SINDH (SNI) 
#>                       4                       5 

A named numeric vector is still a numeric vector; it just carries a names attribute, which is metadata that may be useful later (in plotting, for instance). If you don’t care about preserving the names, though, you can skip that step:

ids = c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)", "PUNJAB (PUJ)", "SINDH (SNI)")
classes = as.numeric(factor(ids, levels = ids))

print(classes)
#> [1] 1 2 3 4 5

All that being said, if you are trying to do this because you want to find a way to use the categorical variable (the place names) in an analysis, it is very likely that you should only be converting to factor, not all the way to numeric. The whole point of factors is that they create a numeric encoding for categorical variables that allows them to be treated properly by other R statistical modeling functions.
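
To make the point about factors concrete, here is a minimal sketch in base R. The `obs` vector is made up for illustration and is not from the original question. It shows that a factor already stores integer codes behind the labels, and that a modeling helper such as `model.matrix()` expands the factor into indicator columns rather than treating the codes as quantities.

```r
# Province labels from the thread; their order defines the integer codes.
ids <- c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)",
         "PUNJAB (PUJ)", "SINDH (SNI)")

# Hypothetical observations, for illustration only.
obs <- c("SINDH (SNI)", "BALOCHISTAN (BAL)", "PUNJAB (PUJ)",
         "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)", "SINDH (SNI)")

province <- factor(obs, levels = ids)

# The factor keeps the labels and stores one integer code per element.
codes <- as.integer(province)
print(codes)
#> [1] 5 1 4 2 3 5

# model.matrix() expands the factor into an intercept plus indicator
# (dummy) columns instead of treating the codes 1..5 as a quantity.
design <- model.matrix(~ province)
print(dim(design))
#> [1] 6 5
```

If the numbers are only needed for display, `as.integer(province)` is enough; for modeling, passing the factor itself is usually the safer choice.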


#5

:+1: What @jcblum said :slightly_smiling_face:


#6

Perhaps this is what you are thinking of? (I still agree with @jcblum's comment on factors, though.)

set.seed(950515)
ids = c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)", "PUNJAB (PUJ)", "SINDH (SNI)")
test_dat = sample(ids, 20, replace = TRUE)
data.frame(City = test_dat, Class = as.numeric(factor(test_dat, levels = ids)))
                      City Class
1              SINDH (SNI)     5
2        BALOCHISTAN (BAL)     1
3        BALOCHISTAN (BAL)     1
4             PUNJAB (PUJ)     4
5              SINDH (SNI)     5
6        BALOCHISTAN (BAL)     1
7            Islamabad ICT     2
8  KHYBER PAKHTUNKHWA(KPK)     3
9              SINDH (SNI)     5
10             SINDH (SNI)     5
11       BALOCHISTAN (BAL)     1
12       BALOCHISTAN (BAL)     1
13 KHYBER PAKHTUNKHWA(KPK)     3
14            PUNJAB (PUJ)     4
15             SINDH (SNI)     5
16 KHYBER PAKHTUNKHWA(KPK)     3
17            PUNJAB (PUJ)     4
18            PUNJAB (PUJ)     4
19            PUNJAB (PUJ)     4
20       BALOCHISTAN (BAL)     1
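
As a possible alternative that skips the factor step, base R's `match()` looks up each value's position in `ids`, which gives the same numbers as `as.numeric(factor(x, levels = ids))`. The `cities` vector below is a made-up example, not data from the thread, so treat this as a sketch rather than the method anyone above proposed.

```r
ids <- c("BALOCHISTAN (BAL)", "Islamabad ICT", "KHYBER PAKHTUNKHWA(KPK)",
         "PUNJAB (PUJ)", "SINDH (SNI)")

# Hypothetical input vector, for illustration only.
cities <- c("SINDH (SNI)", "PUNJAB (PUJ)", "BALOCHISTAN (BAL)", "SINDH (SNI)")

# match() returns the position of each value in ids (NA if it is absent).
class_match <- match(cities, ids)

# The factor-based conversion used earlier in the thread.
class_factor <- as.numeric(factor(cities, levels = ids))

print(class_match)
#> [1] 5 4 1 5
print(all(class_match == class_factor))
#> [1] TRUE
```

In both approaches a value that does not appear in `ids` (for example, because of a spelling difference) comes back as `NA`, so it is worth checking for missing values after the conversion.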