Need help with the categorical variable

I'm cleaning a variable that is currently stored as a character vector.

head(bil$location, 20)
[1] "Right parietal lobe tumor" "Right frontal lobe tumor" "Rt. Frontal Astrocytoma" "Right Parietal Tumor"
[5] "Right Frontal Parietal Tumor" "Right Parietal Tumor" "Left Frontal Mass" "Left frontal tumor"
[9] "Right frontal tumor" "Left parietal lesion" "Right Frontal Lobe Tumor" "Left Frontal Lobe Astrocytoma"
[13] "Left Temporal Lobe Tumor" "Left Frontal Lobe Tumor" "Left Frontal Low Grade Glioma" "Right sided Tumor"
[17] "Left sided glioma" "Left Frontal Lesion" "Left Frontal Lesion" "Left Frontal Lesion"

I want to create another variable as a factor with 11 levels:
1-Frontal
2-Parietal
3-Fronto-parietal
4-Temporal
5-Fronto-temporal
6-Parietal
7-Parieto-occipital
8-Temporo-occipital
9-Insula
10-Temporo-insula
11-multiple

The new variable should be derived from the original variable. For example, if the observation is "Right Frontal Tumor", it should be assigned to level 1, "Frontal". If the observation is "Right Fronto-parietal astrocytoma", it should be assigned to level 3, "Fronto-parietal". If the observation is "Recurrent Right Frontal PNET", it should be assigned to level 1, "Frontal".

If an observation in bil$location does not match any level, it should be NA. For example, if the value in bil$location is "Left Hemisphere Astrocytoma", the new variable should be NA.

Can anyone suggest an approach, or an appropriate package, for tackling this problem?

The general approach will involve regular expressions for pattern matching in text. A good package in that area is stringr.
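As a rough illustration of that idea, here is a minimal sketch, assuming stringr is installed. The vector name `loc` is made up for the example, and its values are copied from the question. Note that the data mixes "Frontal" and "frontal", so the match needs to ignore case.

```r
library(stringr)

# A few values copied from the question's head() output (loc is a hypothetical name)
loc <- c("Right parietal lobe tumor", "Rt. Frontal Astrocytoma", "Right sided Tumor")

# regex(..., ignore_case = TRUE) makes the pattern match regardless of capitalisation
is_frontal  <- str_detect(loc, regex("frontal", ignore_case = TRUE))
is_parietal <- str_detect(loc, regex("parietal", ignore_case = TRUE))

is_frontal   # FALSE  TRUE FALSE
is_parietal  #  TRUE FALSE FALSE
```

The third value mentions no lobe at all, so both tests return FALSE for it. That is the kind of observation that should end up as NA.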


I would make use of stringr::str_detect and dplyr::case_when. Maybe something like:

library(dplyr)

bil_coded <- bil %>%
    mutate(new_variable = factor(case_when(
        # ignore_case = TRUE so that both "Frontal" and "frontal" match
        stringr::str_detect(location, stringr::regex("frontal", ignore_case = TRUE)) ~ "1-Frontal", # create as many criteria as needed
        stringr::str_detect(location, stringr::regex("parietal", ignore_case = TRUE)) ~ "2-Parietal",
        TRUE ~ NA_character_ # this should create an NA observation for everything else that doesn't fit your criteria
    )))
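One thing to watch with case_when is that the first matching condition wins, so combined regions such as "Fronto-parietal" probably need to be tested before the single-region patterns. Below is a sketch of that ordering, not a complete solution. It covers only four of the eleven levels. The sample data frame and the helper `has()` are made up for illustration, and treating "Frontal Parietal" as "Fronto-parietal" is an assumption about your coding scheme.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data, using strings quoted in the question
bil <- data.frame(
  location = c(
    "Right Frontal Tumor",
    "Right Fronto-parietal astrocytoma",
    "Recurrent Right Frontal PNET",
    "Left Temporal Lobe Tumor",
    "Left Hemisphere Astrocytoma"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive pattern test
has <- function(x, pattern) str_detect(x, regex(pattern, ignore_case = TRUE))

bil_coded <- bil %>%
  mutate(new_variable = factor(
    case_when(
      # Combined region first; matches "Fronto-parietal" and "Frontal Parietal" (assumption)
      has(location, "front(o|al)[- ]?parietal") ~ "Fronto-parietal",
      has(location, "frontal")                  ~ "Frontal",
      has(location, "parietal")                 ~ "Parietal",
      has(location, "temporal")                 ~ "Temporal",
      TRUE                                      ~ NA_character_  # no rule matched
    ),
    # Explicit levels keep the order fixed even if a level is absent from the data
    levels = c("Frontal", "Parietal", "Fronto-parietal", "Temporal")
  ))

as.character(bil_coded$new_variable)
# "Frontal" "Fronto-parietal" "Frontal" "Temporal" NA
```

The remaining levels would need analogous rules, with every combined region placed above the single regions it contains. It is also worth checking the rows that come back as NA, since abbreviations or misspellings in the free text may need patterns of their own.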
