Create Function

I have this dataset; I'm pasting only a few values:

"73C23" "62R31" "62M26" "58C44" "53R02" NA "78R58" "76C63"

I'm trying to write a function that performs the following operations on these strings. The first two digits must be extracted and compared with the threshold 18: if the value is > 18 it must be added to 1900, if it is <= 18 it must be added to 2000, and the result goes into a separate Year column. The central letter must be compared with this legend, and the matching month is written to another column:

head(mesi_legend)
 January February    March    April      May     June
     "A"      "B"      "C"      "D"      "E"      "H"

The last two digits must be taken as they are if the person is male, and reduced by 40 if female (as a discriminator I can use: if > 31, subtract 40, but I think that leaves some margin of error).

I start from this:

"GFNNTN78R58G812M"

first step> "78R58"

Output_of_my_function(78R58) > Year 1978 Month October Day 18

Can someone help me?

Using some tidyverse tools,

library(dplyr)
library(tidyr)
library(lubridate)

df <- tibble(
 demo = c("73C23", "62R31", "62M26", "58C44", "53R02", NA, "78R58", "76C63") 
)
df <- df %>% 
  separate(
    col = demo, 
    into = c("birth_year", "birth_month_index", "other"),
    sep = c(2,3)
  ) %>% 
  mutate(
    birth_year = as.numeric(birth_year),
    birth_year = case_when(
      birth_year > 18 ~  (1900 + birth_year),
      birth_year <= 18 ~ (2000 + birth_year),
      TRUE ~ birth_year
    )
  )

df
#> # A tibble: 8 x 3
#>   birth_year birth_month_index other
#>        <dbl> <chr>             <chr>
#> 1       1973 C                 23   
#> 2       1962 R                 31   
#> 3       1962 M                 26   
#> 4       1958 C                 44   
#> 5       1953 R                 02   
#> 6         NA <NA>              <NA> 
#> 7       1978 R                 58   
#> 8       1976 C                 63

  • Note separate() for splitting your single variable into its parts.
  • Note case_when() and mutate() for reworking the new columns.
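The `other` column still needs the day and sex rule from the question. Here is a minimal sketch of that step, kept in base R so it runs without extra packages (the same expressions would drop into the mutate() call above). It assumes the strings follow the Italian fiscal code convention, where a female birth day is stored as day + 40, so any value above 40 is treated as female. The vector `other` below is a hypothetical stand-in for the column created by separate().

```r
# Hypothetical stand-in for the `other` column produced by separate() above
other <- c("23", "31", "26", "44", "02", NA, "58", "63")

day_raw <- as.integer(other)

# Assumption: a value above 40 encodes a female birth day as day + 40
sex       <- ifelse(day_raw > 40L, "F", "M")
birth_day <- ifelse(day_raw > 40L, day_raw - 40L, day_raw)

data.frame(other, sex, birth_day)
```

If that convention holds, valid values fall in 1 to 31 or 41 to 71, so testing > 31 or > 40 gives the same split; the missing value stays NA in both new columns.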

Here's a nice cheatsheet on these tasks:


For your birth month, I personally like to join a lookup table on your index.
The month indices you're working with are not totally clear to me, so the letters below are placeholders, but I hope you get the idea.



Months <- tibble(
  month_index = LETTERS[1:12], 
  month = lubridate::month(1:12, label = TRUE)
)

df <- df %>% 
  left_join(
    Months,
    by = c('birth_month_index' = 'month_index')
)
df
#> # A tibble: 8 x 4
#>   birth_year birth_month_index other month
#>        <dbl> <chr>             <chr> <ord>
#> 1       1973 C                 23    Mar  
#> 2       1962 R                 31    <NA> 
#> 3       1962 M                 26    <NA> 
#> 4       1958 C                 44    Mar  
#> 5       1953 R                 02    <NA> 
#> 6         NA <NA>              <NA>  <NA> 
#> 7       1978 R                 58    <NA> 
#> 8       1976 C                 63    Mar

Created on 2018-10-31 by the reprex package (v0.2.1)
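The NA months above come from using LETTERS[1:12] as a placeholder index. Here is a sketch of the same lookup with a full legend. The question only shows the first six letters, so the remaining six are an assumption based on the month letters of the Italian fiscal code (they do agree with the question's example, where R is October). `birth_month_index` is a hypothetical copy of the column from the table above.

```r
# Assumption: the full legend follows the Italian fiscal code month letters;
# only "A" through "H" (January to June) appear in the question.
mesi_legend <- c(
  January = "A", February = "B", March = "C", April = "D",
  May = "E", June = "H", July = "L", August = "M",
  September = "P", October = "R", November = "S", December = "T"
)

# Hypothetical copy of the birth_month_index column shown above
birth_month_index <- c("C", "R", "M", "C", "R", NA, "R", "C")

# match() gives the position of each letter in the legend (NA if absent),
# and that position is used to pick the month name
birth_month <- names(mesi_legend)[match(birth_month_index, mesi_legend)]
birth_month
```

The same legend could be turned into the Months tibble by putting unname(mesi_legend) in month_index and names(mesi_legend) in month, so the left_join() stays as it is.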


A good way to ask a question like this is with a reproducible example, or what folks call a reprex for short. A reprex makes it much easier for others to understand your issue and figure out how to help.


Thanks a lot, I solved it with this for my whole column:
calcolo_ext_anno <- ifelse(ext_anno > 18, ext_anno+1900, ext_anno+2000)
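For anyone landing here later, the same ifelse() idea can be stretched to cover the whole request in one function. This is only a sketch: `decode_birth` and `month_letters` are hypothetical names, the month letters and the day + 40 rule for females are assumed to follow the Italian fiscal code convention, and the cutoff of 18 is the one given in the question.

```r
# Assumption: month letters follow the Italian fiscal code convention
month_letters <- c("A", "B", "C", "D", "E", "H", "L", "M", "P", "R", "S", "T")

# Hypothetical helper: decode five-character strings such as "78R58"
decode_birth <- function(x, cutoff = 18L) {
  yy  <- as.integer(substr(x, 1, 2))            # two-digit year
  mon <- match(substr(x, 3, 3), month_letters)  # month number, NA if unknown
  dd  <- as.integer(substr(x, 4, 5))            # day, plus 40 if female
  female <- dd > 40L
  data.frame(
    code  = x,
    year  = ifelse(yy > cutoff, 1900L + yy, 2000L + yy),
    month = month.name[mon],
    day   = ifelse(female, dd - 40L, dd),
    sex   = ifelse(female, "F", "M"),
    stringsAsFactors = FALSE
  )
}

# The five characters sit at positions 7 to 11 of the full 16-character code
decode_birth(substr("GFNNTN78R58G812M", 7, 11))
```

For the example in the question this gives year 1978, month October and day 18, and it is vectorised, so it can be applied to the whole column at once; NA inputs come back as NA in every decoded field.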


If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: