Recoding many string variables into a set of numbers

Hi... I'm new to R. I've got a data frame with 170,000 observations and 69 variables. Some of these variables are strings that I would like to recode to numbers. But I don't want to be typing each string and a corresponding number. I'd rather just have R assign ascending numerical values to strings in alphabetical order for each variable.

Just as an example, suppose each observation is nested within 1 of the 50 US states and each state is stored in the data frame as a string. I'd like to create a new variable (e.g., 'state_numeric') such that Alabama = 1, Alaska = 2, Arizona = 3, etc., without having to type out each state name and each corresponding numeric value.

Any suggestions?

Thanks

Welcome to the community!

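You can combine factor() and as.integer(). By default, factor() builds its levels from the sorted distinct values (alphabetical for strings), and as.integer() returns each value's position among those levels: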

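# fix the random seed so the sample is reproducible
set.seed(seed = 32767)

# example string variable: 100 month names drawn with replacement
u <- sample(x = month.name,
            size = 100,
            replace = TRUE)

# levels are sorted alphabetically, so April = 1, August = 2, ...
v <- as.integer(x = factor(x = u))

# compare the strings with their integer codes
head(x = data.frame(u, v),
     n = 10)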
#>           u  v
#> 1  February  4
#> 2      July  6
#> 3       May  9
#> 4     April  1
#> 5   January  5
#> 6  December  3
#> 7   January  5
#> 8      June  7
#> 9  November 10
#> 10 February  4

Created on 2019-06-10 by the reprex package (v0.3.0)
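
Since the question is about many string columns at once, here is a minimal sketch of the same idea applied to every character column of a data frame. The data frame df and its columns (state, region, income) are made up for illustration, and the "_numeric" suffix just follows the 'state_numeric' naming from the question; the real column names will differ.

```r
# Hypothetical stand-in for the real 170,000 x 69 data frame
df <- data.frame(
  state  = c("Alaska", "Arizona", "Alabama", "Alaska"),
  region = c("West", "West", "South", "West"),
  income = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# names of the columns that hold strings
chr_cols <- names(df)[vapply(df, is.character, logical(1))]

# add a "<name>_numeric" column for each string column,
# leaving the original strings in place
for (col in chr_cols) {
  df[[paste0(col, "_numeric")]] <- as.integer(factor(df[[col]]))
}

df
```

If the strings were already read in as factors, swapping is.character for is.factor should give the same result. To see which number stands for which string, levels(factor(df$state)) lists the strings in code order (the first one is 1, the second is 2, and so on). The sort order depends on the locale, so it may differ slightly for strings with mixed case or accented characters.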

Hope this helps.
