I'm cleaning a variable that is currently stored as a character vector. Here are the first 20 values:
head(bil$location, 20)
[1] "Right parietal lobe tumor" "Right frontal lobe tumor" "Rt. Frontal Astrocytoma" "Right Parietal Tumor"
[5] "Right Frontal Parietal Tumor" "Right Parietal Tumor" "Left Frontal Mass" "Left frontal tumor"
[9] "Right frontal tumor" "Left parietal lesion" "Right Frontal Lobe Tumor" "Left Frontal Lobe Astrocytoma"
[13] "Left Temporal Lobe Tumor" "Left Frontal Lobe Tumor" "Left Frontal Low Grade Glioma" "Right sided Tumor"
[17] "Left sided glioma" "Left Frontal Lesion" "Left Frontal Lesion" "Left Frontal Lesion"
I want to create a new factor variable with 11 levels:
1-Frontal
2-Parietal
3-Fronto-parietal
4-Temporal
5-Fronto-temporal
6-Parietal
7-Parieto-occipital
8-Temporo-occipital
9-Insula
10-Temporo-insula
11-multiple
The new variable should be derived from the text in the original variable. For example, if the observation is "Right Frontal Tumor", it should be assigned to level 1, "Frontal". If the observation is "Right Fronto-parietal astrocytoma", it should be assigned to level 3, "Fronto-parietal". If the observation is "Recurrent Right Frontal PNET", it should be assigned to level 1, "Frontal".
If an observation in bil$location cannot be matched to any level, the new variable should be NA. For example, if bil$location is "Left Hemisphere Astrocytoma", the result should be NA.
Can anyone suggest an approach, or an appropriate package, for this problem?
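
To make the intended mapping concrete, here is a minimal base R sketch of the kind of keyword matching described above. It is only one possible approach, and it rests on several assumptions: the function name classify_location and the search stems ("front", "pariet", "tempor", "occipit", "insul") are hypothetical choices, "multiple" is taken to mean any combination of regions that is not one of the named pairs, and "Parietal" is used only once as a level because it appears twice in the list above and factor levels must be unique.

```r
# Hypothetical helper: map free-text tumor locations to lobe categories.
# Assumption: matching is case-insensitive and based on word stems.
classify_location <- function(x) {
  x <- tolower(x)

  # One logical vector per region; stems also catch "fronto-", "parieto-", etc.
  frontal   <- grepl("front",   x, fixed = TRUE)
  parietal  <- grepl("pariet",  x, fixed = TRUE)
  temporal  <- grepl("tempor",  x, fixed = TRUE)
  occipital <- grepl("occipit", x, fixed = TRUE)
  insula    <- grepl("insul",   x, fixed = TRUE)

  # Number of distinct regions mentioned in each string
  n <- frontal + parietal + temporal + occipital + insula

  out <- rep(NA_character_, length(x))

  # Single-region categories
  out[n == 1 & frontal]  <- "Frontal"
  out[n == 1 & parietal] <- "Parietal"
  out[n == 1 & temporal] <- "Temporal"
  out[n == 1 & insula]   <- "Insula"

  # Named two-region categories
  out[n == 2 & frontal  & parietal]  <- "Fronto-parietal"
  out[n == 2 & frontal  & temporal]  <- "Fronto-temporal"
  out[n == 2 & parietal & occipital] <- "Parieto-occipital"
  out[n == 2 & temporal & occipital] <- "Temporo-occipital"
  out[n == 2 & temporal & insula]    <- "Temporo-insula"

  # Assumption: any other combination of two or more regions is "multiple"
  out[n >= 2 & is.na(out)] <- "multiple"

  # Strings with no recognised region (or an unlisted single region) stay NA
  factor(out, levels = c(
    "Frontal", "Parietal", "Fronto-parietal", "Temporal", "Fronto-temporal",
    "Parieto-occipital", "Temporo-occipital", "Insula", "Temporo-insula",
    "multiple"
  ))
}

# Examples taken from the question
examples <- c(
  "Right Frontal Tumor",
  "Right Fronto-parietal astrocytoma",
  "Recurrent Right Frontal PNET",
  "Left Hemisphere Astrocytoma"
)
print(data.frame(location = examples, category = classify_location(examples)))
```

With something like this, the new variable could be created as bil$location_cat <- classify_location(bil$location) (location_cat is just a placeholder name), and table(bil$location_cat, useNA = "ifany") would show how many observations were left unmatched.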