Create new variable, change character to integer

Hi,

I have a variable called IMD_group with 5 different values (all character; in my tibble here they are factors, as I can't work out how to make them character in a csv and import with datapasta, sorry). The entries are currently messy (e.g. A1 (LQ) should be 1 or one, and so on up to 5). The code below doesn't seem to be working. Any ideas?

Thanks

tibble::tribble(
  ~ID, ~IMD_decile,
  1L, "A1 (LQ)",
  2L, "A2",
  3L, "A3",
  4L, "A1 (LQ)",
  5L, "A5 (HD)",
  6L, "A4",
  7L, "A1",
  8L, "A4",
  9L, "A5 (HD)",
  10L, "A2"
)

IMD_one <- c("A1 (LQ)")
IMD_two <- c("A2")
IMD_three <- c("A3")
IMD_four <- c("A4")
IMD_five <- c("A5 (HD)")

tribble1 <- tribble %>%
  mutate(IMD_group = case_when(
    IMD_quintile %in% IMD_one ~ "2",
    IMD_quintile %in% IMD_two ~ "3",
    IMD_quintile %in% IMD_three ~ "4",
    IMD_quintile %in% IMD_four ~ "5"
  ))

Hi there, it seems you have your variables mixed up: you want to define IMD_quintile using the values in IMD_group, but your code tries to do the opposite. Also, you need to define the tribble (assign it to a name) before working on it. Lastly, since you want an integer, I removed the quotes around the recoded values. This creates a dbl (double) vector, so if you need integers you have to force that conversion too, which the as.integer() line does. How does this look?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

my_tribble <- tibble::tribble(
  ~ID, ~IMD_group,
  1L, "A1 (LQ)",
  2L, "A2",
  3L, "A3",
  4L, "A1 (LQ)",
  5L, "A5 (HD)",
  6L, "A4",
  7L, "A1",
  8L, "A4",
  9L, "A5 (HD)",
  10L, "A2"
)

IMD_one <- c("A1 (LQ)")
IMD_two <- c("A2")
IMD_three <- c("A3")
IMD_four <- c("A4")
IMD_five <- c("A5 (HD)")

tribble1 <- my_tribble %>%
  mutate(
    IMD_quintile = case_when(
      IMD_group %in% IMD_one ~ 2,
      IMD_group %in% IMD_two ~ 3,
      IMD_group %in% IMD_three ~ 4,
      IMD_group %in% IMD_four ~ 5
    ),
    IMD_quintile = as.integer(IMD_quintile)
  )

print(tribble1)
#> # A tibble: 10 × 3
#>       ID IMD_group IMD_quintile
#>    <int> <chr>            <int>
#>  1     1 A1 (LQ)              2
#>  2     2 A2                   3
#>  3     3 A3                   4
#>  4     4 A1 (LQ)              2
#>  5     5 A5 (HD)             NA
#>  6     6 A4                   5
#>  7     7 A1                  NA
#>  8     8 A4                   5
#>  9     9 A5 (HD)             NA
#> 10    10 A2                   3

Created on 2021-11-29 by the reprex package (v2.0.1)

Hi,
Thanks very much. It works on my reprex but not in the real data frame, I think because the A1 (LQ) etc. values are in character format. It's a secure portal so I can't reprex it. Do you know how I can either make a csv reprex with a character variable to add to the thread, or how this code would need to change to make it work for a character variable? The message I get is "NAs introduced by coercion". Do you know what that means?
Thanks again
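
On the message itself: R emits "NAs introduced by coercion" as a warning when as.integer() or as.numeric() is given a string that cannot be read as a number. That element becomes NA and the rest of the vector converts normally. A minimal base R sketch, using a made-up vector x rather than anything from the real data:

```r
# Hypothetical input: one string that parses as a number, one that does not
x <- c("2", "A1 (LQ)")

# Capture the warning text instead of printing it, so it can be inspected
warning_text <- NULL
converted <- withCallingHandlers(
  as.integer(x),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

converted     # "2" converts to an integer, "A1 (LQ)" becomes NA
warning_text  # the text of the coercion warning
```

So if the raw labels (or a character column built from them) are passed straight to as.integer(), every label that is not a plain number would be expected to come back as NA with this warning.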

I've worked it out in the dummy data with a character column, and will try to apply it to the real df now. Thanks very much again.

You say it is not working. Do you see that in your end result?

You don't have a rule to translate A5 (HD): you do define IMD_five, but you don't use it in case_when().
So it is not surprising that A5 (HD) ends up as NA, and that could be related to the message about coercion.
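
Building on that point, one possible complete version of the recode is sketched below. It assumes the intended mapping is A1 to 1 through A5 to 5, as the original question describes, and it adds the bare "A1" to IMD_one because row 7 of the sample data has no "(LQ)" suffix and would otherwise fall through to NA as well. Writing the recoded values as integer literals (1L, 2L, ...) is a choice made here so that a separate as.integer() step is not needed.

```r
library(dplyr)
library(tibble)

my_tribble <- tribble(
  ~ID, ~IMD_group,
  1L, "A1 (LQ)",
  2L, "A2",
  3L, "A3",
  4L, "A1 (LQ)",
  5L, "A5 (HD)",
  6L, "A4",
  7L, "A1",
  8L, "A4",
  9L, "A5 (HD)",
  10L, "A2"
)

# Assumed lookup sets: every spelling seen in the sample data is listed
IMD_one   <- c("A1 (LQ)", "A1")
IMD_two   <- c("A2")
IMD_three <- c("A3")
IMD_four  <- c("A4")
IMD_five  <- c("A5 (HD)")

tribble1 <- my_tribble %>%
  mutate(
    IMD_quintile = case_when(
      IMD_group %in% IMD_one   ~ 1L,
      IMD_group %in% IMD_two   ~ 2L,
      IMD_group %in% IMD_three ~ 3L,
      IMD_group %in% IMD_four  ~ 4L,
      IMD_group %in% IMD_five  ~ 5L
    )
  )

# Any label not covered by a rule is still NA, so count the leftovers
sum(is.na(tribble1$IMD_quintile))
```

If the count on the last line is above zero in the real data, there are labels that none of the five sets covers. Running unique() on the original column is one way to list the spellings that still need a rule.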

Thanks so much to you both. Problem solved!
