recoding many string variables into a set of numbers

Hi... I'm new to R. I've got a data frame with 170,000 observations and 69 variables. Some of these variables are strings that I would like to recode to numbers. But I don't want to be typing each string and a corresponding number. I'd rather just have R assign ascending numerical values to strings in alphabetical order for each variable.

Just as an example, suppose each observation is nested within one of the 50 US states, and each state is stored in the data frame as a string. I'd like to create a new variable (e.g., 'state_numeric') such that Alabama = 1, Alaska = 2, Arizona = 3, etc., without having to type out each state name and its corresponding number.

Any suggestions?


Welcome to the community!

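You can combine factor() and as.integer() like this. By default, factor() sorts the unique strings to build its levels, and as.integer() returns each value's position among those levels: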

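# Make the random sample reproducible
set.seed(seed = 32767)

# 100 random month names (month.name is built into R)
u <- sample(x = month.name,
            size = 100,
            replace = TRUE)

# factor() sorts the unique months alphabetically to form its levels;
# as.integer() gives each value's position among those levels
v <- as.integer(x = factor(x = u))

head(x = data.frame(u, v),
     n = 10)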
#>           u  v
#> 1  February  4
#> 2      July  6
#> 3       May  9
#> 4     April  1
#> 5   January  5
#> 6  December  3
#> 7   January  5
#> 8      June  7
#> 9  November 10
#> 10 February  4

Created on 2019-06-10 by the reprex package (v0.3.0)
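Applied to the state example from the question, a minimal sketch might look like this. It assumes a data frame called df with a character column called state; both names are hypothetical, and the data are simulated from R's built-in state.name vector rather than taken from the real data set:

```r
# Hypothetical data: 'df' and its 'state' column stand in for the real data.
# state.name is a built-in R vector of the 50 US state names, in
# alphabetical order.
set.seed(1)
df <- data.frame(state = sample(state.name, size = 200, replace = TRUE),
                 stringsAsFactors = FALSE)

# Option 1: number only the states that actually occur in the data.
# factor() sorts the unique values to build its levels, and as.integer()
# returns each value's position among those levels.
df$state_numeric <- as.integer(factor(df$state))

# Option 2: use the full state list as the code book, so that
# Alabama = 1, ..., Wyoming = 50 even if some states never occur.
# Equivalent to as.integer(factor(df$state, levels = state.name)).
df$state_code <- match(df$state, state.name)

head(df)
```

With option 1, the numbers depend on which states are present, so a state missing from the data shifts every later code down by one; levels(factor(df$state)) shows the lookup table. Option 2 keeps the codes stable because state.name is already alphabetical. Also note that "alphabetical" here means the order sort() uses in your current locale, which can matter for mixed case or accented characters.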

Hope this helps.
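Since the question mentions many string variables, here is a sketch of applying the same idea to every character column at once. The data frame df and its columns state, month and score are made up for illustration; in practice df would be the 69-column data frame:

```r
# Hypothetical example data standing in for the real data frame
df <- data.frame(
  state = c("Texas", "Alaska", "Ohio", "Alaska"),
  month = c("May", "April", "May", "June"),
  score = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Flag every character column
is_chr <- vapply(df, is.character, logical(1))

# Recode each character column to integers in sorted order. The new
# columns get a "_numeric" suffix so the original strings are kept.
df[paste0(names(df)[is_chr], "_numeric")] <-
  lapply(df[is_chr], function(x) as.integer(factor(x)))

df
```

Here state_numeric comes out as 3, 1, 2, 1 (Alaska = 1, Ohio = 2, Texas = 3) and month_numeric as 3, 1, 3, 2 (April = 1, June = 2, May = 3), while the numeric score column is left alone. Each column is numbered independently, so the same integer means different things in different columns.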

