Find & replace values and splitting columns

I'm fairly new to RStudio. Can someone help?

I have a data frame named "Data2" in which I would like to do a few find & replace operations and also split one column into 2 columns.

The current structure and the expected new structure are shown below.

Current Structure

ID Group
1 XXXXXX_MaleFR_YY
2 XXXXXX_FemaleFR_YY
3 XXXXXX_FemaleFR_YY
4 XXXXXX_UnknownNL_YY
... ...
500 XXXXXX_MaleNL_YY

Expected Structure

ID Gender Language
1 Male FR
2 Female FR
3 Female FR
4 Unknown NL
... ... ...
500 Male NL

My thoughts so far:

  • Find & Replace "XXXXXX_" to ""
  • Find & Replace "_YY" to ""
  • Find & Replace "Male" to "Male,"
  • Find & Replace "Female" to "Female,"
  • Find & Replace "Unknown" to "NA,"
  • Then split the Group column into 2 (Gender, Language), using "," as the delimiter.

Hope someone can help.

This might be one way to go:

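library(tidyverse)

# Small example in the same shape as the question's data
data <- tibble(
  ID = c(1, 2),
  Group = c("XXX_MaleAB_YY", "XXX_FemaleCD_YY")
)

data2 <- data %>%
  # Split on "_" into prefix, middle part, and suffix
  separate(Group, c("remove_this1", "keep_this_and_split", "remove_this2"), sep = "_") %>%
  # Drop the prefix and suffix columns
  select(-contains("remove")) %>%
  mutate(
    # Everything except the last two characters is the gender
    Gender = substr(keep_this_and_split, 1, nchar(keep_this_and_split) - 2),
    # The last two characters are the language code
    Language = substr(keep_this_and_split, nchar(keep_this_and_split) - 1, nchar(keep_this_and_split))
  ) %>%
  select(-keep_this_and_split)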

This would be another way to do it:

library(tidyverse)

df <- data.frame(stringsAsFactors=FALSE,
                 ID = c(1, 2, 3, 4, 500),
                 Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
                           "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY",
                           "XXXXXX_MaleNL_YY")
)

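df %>% 
    transmute(ID = ID,
              # Text after the first "_", up to the two capitals that sit right before the next "_"
              Gender = str_extract(Group, "(?<=_).+(?=[:upper:]{2}_)"),
              # Two capitals that are followed by "_" and two more capitals
              Language = str_extract(Group, "[:upper:]{2}(?=_[:upper:]{2})"))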
#>    ID  Gender Language
#> 1   1    Male       FR
#> 2   2  Female       FR
#> 3   3  Female       FR
#> 4   4 Unknown       NL
#> 5 500    Male       NL
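
For comparison, the find & replace plan from the question can also be followed fairly literally in base R. This is only a sketch: it assumes every Group value has the shape prefix_GenderXX_suffix, where XX is a two-letter uppercase code. The names `core`, `parts`, and `result` are made up for this example, and it inserts the comma before the two-letter code rather than after each gender word.

```r
df <- data.frame(
  ID = c(1, 2, 3, 4, 500),
  Group = c("XXXXXX_MaleFR_YY", "XXXXXX_FemaleFR_YY",
            "XXXXXX_FemaleFR_YY", "XXXXXX_UnknownNL_YY",
            "XXXXXX_MaleNL_YY"),
  stringsAsFactors = FALSE
)

# Find & replace: drop everything up to and including the first "_"
core <- sub("^[^_]*_", "", df$Group)
# Find & replace: drop the last "_" and everything after it
core <- sub("_[^_]*$", "", core)

# Put a "," in front of the trailing two-letter code (assumed uppercase),
# e.g. "MaleFR" becomes "Male,FR"
core <- sub("([A-Z]{2})$", ",\\1", core)

# Split on "," and build the two new columns
parts <- strsplit(core, ",", fixed = TRUE)
result <- data.frame(
  ID = df$ID,
  Gender = vapply(parts, `[`, character(1), 1),
  Language = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

print(result)
```

This keeps "Unknown" as a value, matching the expected structure in the question. If missing values are preferred instead, something like `result$Gender[result$Gender == "Unknown"] <- NA` could be run afterwards.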
