Revising the code on a conditional join with a lookup table (mapping)

Note: I solved the problem, I need to learn which is the most efficient or "R way" to solve it.

The data:

I have a set of observations that corresponds to encounters in a urgency centre. Diagnositcs are coded by ICD-10. I wanted to give a general idea using the ICD chapters (https://en.wikipedia.org/wiki/ICD-10#Chapters), so I needed some sort of "conditional" join (an alphanumeric range). So in the end, I had two dfs: the ICD-10 table (wikipedia link above), and my data that looks like this:

# tabla_cie10
# originally the code was represented as a range like: A00-B99. Since I couldn't do anything with it, I split it to a "start" and "end" column.
  capitulo codigo_inicio codigo_termino                                                                                                                 titulo
1        I           A00            B99                                                                        Ciertas enfermedades infecciosas y parasitarias
2       II           C00            D48                                                                                                             Neoplasias
3      III           D50            D89 Enfermedades de la sangre y de los órganos hematopoyéticos y otros trastornos que afectan el mecanismo de la inmunidad
4       IV           E00            E90                                                                   Enfermedades endocrinas, nutricionales y metabólicas
5        V           F00            F99                                                                               Trastornos mentales y del comportamiento
6       VI           G00            G99                                                                                      Enfermedades del sistema nervioso
# df
  CODIGO_ESTANDAR id cie10_titulo
1           Z02.7  1
2           N39.0  2
3           S42.3  3
4             B86  4
5             N23  5
6           R50.9  6

I ended up doing a rowwise for loop that solved my problem:

btw <- function(x, left, right) {
  x >= left & x <= right
}


for (i in 1:nrow(tabla_cie10)) {
 
  row <- tabla_cie10[i,]
  
  df <- df %>%
    mutate(cie10_titulo = if_else(
      btw(CODIGO_ESTANDAR, row$codigo_inicio, row$codigo_termino),
      row$titulo,
      cie10_titulo))
}

The code actually worked:

  CODIGO_ESTANDAR id                                                                                    cie10_titulo
1           Z02.7  1               Factores que influyen en el estado de salud y contacto con los servicios de salud
2           N39.0  2                                                         Enfermedades del aparato genitourinario
3           S42.3  3                    Traumatismos, envenenamientos y algunas otras consecuencias de causa externa
4             B86  4                                                 Ciertas enfermedades infecciosas y parasitarias
5             N23  5                                                         Enfermedades del aparato genitourinario
6           R50.9  6 Síntomas, signos y hallazgos anormales clínicos y de laboratorio, no clasificados en otra parte

While I could solve my problem, I have the feeling that I solved it in a non elegant or the "R" way. Also I felt I couldn't search this issue because I didn't know how to look for it (so if you could help me to correct the title of the topic it would be welcome too).

If you want to stay in the tidy-world, you could check out the fuzzyjoin package which allows you to specify non exact matching joins.

There are some other R solutions to a similar problem written up here, including using data.table::foverlaps().

1 Like

If you could post your tables as dput() output for copying, I could be more sure, but here's a possible solution that uses the %>% pipe (are you familiar with it?):

library(tidyverse)

df1 <- 
tribble(
~capitulo, ~codigo_inicio, ~codigo_termino,                                                                                                                 ~titulo,
        'I',           'A00',            'B99',                                                                        'Ciertas enfermedades infecciosas y parasitarias',
       'II',           'C00',            'D48',                                                                                                             'Neoplasias',
      'III',           'D50',            'D89', 'Enfermedades de la sangre y de los órganos hematopoyéticos y otros trastornos que afectan el mecanismo de la inmunidad',
       'IV',           'E00',            'E90',                                                                   'Enfermedades endocrinas, nutricionales y metabólicas',
        'V',           'F00',            'F99',                                                                               'Trastornos mentales y del comportamiento',
       'VI',           'G00',            'G99',                                                                                      'Enfermedades del sistema nervioso'
)

df2 <- 
tribble(
~CODIGO_ESTANDAR, ~id,
           'Z02.7',  1,
           'N39.0',  2,
           'S42.3',  3,
             'B86',  4,
             'N23',  5,
           'R50.9',  6
)

btw <- function(x, left, right) {
  x >= left & x <= right
}

df1 %>% mutate(dummy = 1) %>% 
  inner_join(df2 %>% mutate(dummy = 1)) %>% 
  filter(CODIGO_ESTANDAR %>% btw(codigo_inicio, codigo_termino)) %>% 
  select(codigo_inicio, CODIGO_ESTANDAR, codigo_termino, titulo )
#> Joining, by = "dummy"
#> # A tibble: 1 x 4
#>   codigo_inicio CODIGO_ESTANDAR codigo_termino titulo                      
#>   <chr>         <chr>           <chr>          <chr>                       
#> 1 A00           B86             B99            Ciertas enfermedades infecc…

df1 %>% mutate(dummy = 1) %>% 
  inner_join(df2 %>% mutate(dummy = 1)) %>% 
  filter(CODIGO_ESTANDAR %>% btw(codigo_inicio, codigo_termino)) %>% 
  select(CODIGO_ESTANDAR, id, titulo)
#> Joining, by = "dummy"
#> # A tibble: 1 x 3
#>   CODIGO_ESTANDAR    id titulo                                         
#>   <chr>           <dbl> <chr>                                          
#> 1 B86                 4 Ciertas enfermedades infecciosas y parasitarias

Created on 2020-03-06 by the reprex package (v0.3.0)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.