Hello! I have a list of medical organization types that looks like this:
list <- c("hospital", "center", "polyclinic", "dispancer")
I also have a data frame with the organization name and its assigned type, which looks like this (it shows an edge case that needs a solution):
| Name | Type |
| -------- | -------------- |
| cure center state hospital | hospital |
| state polyclinic cure center | center |
| state hospital main dispancer | dispancer |
| first hospital number one | hospital |
As you can see, some names contain two organization types. To handle them, I want to remove items from the list above according to the value in the Type column. For example, if the value in Type is center, the word center should be removed from the list, which then looks like this: c("hospital", "polyclinic", "dispancer"). After that, I will delete everything up to and including any word that is still in the list, so that the result looks like this:
| Name | Type | Name after |
| -------- | -------------- | -------------- |
| cure center state hospital | hospital | state hospital |
| state polyclinic cure center | center | cure center |
| state hospital main dispancer | dispancer | main dispancer |
| first hospital number one | hospital | first hospital number one |
The data to work with is:
Name <- c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")
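The first step described above (dropping each row's own Type from the list) can be sketched in base R as follows. This is only an illustration: the variable names `types` and `remaining` are assumptions, not names from the original post, and the spelling "dispancer" follows the data.

```r
# Full list of organization types, spelled as in the data
types <- c("hospital", "center", "polyclinic", "dispancer")
Type <- c("hospital", "center", "dispancer", "hospital")

# For each row, drop that row's own Type from the list
remaining <- lapply(Type, function(t) setdiff(types, t))

# Row 2 has Type "center", so "center" is gone from its list
remaining[[2]]
# "hospital" "polyclinic" "dispancer"
```

`setdiff()` keeps the order of its first argument, so each per-row list stays in the order of `types`.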
Hello! The thing is that your code deletes only the one word before the type given in the Type column, whereas I need to detect the word from the Type column, remove it from the list of types, and then remove everything up to and including any other type that remains in the list. For example, your code will not work for this instance: "state polyclinic state adult hospital one". It will leave "adult hospital one", while I need "state adult hospital one".
Hi @gocoyd,
Not sure it's the most elegant solution, but this should work:
library(tidyverse)
types <- c("hospital", "center", "dispancer", "polyclinic")
data <- tibble(Name = c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one", "state polyclinic state adult hospital one"),
               Type = c("hospital", "center", "dispancer", "hospital", "hospital"))
data %>%
  mutate(
    # type_count: number of type words found in Name
    type_count = str_count(Name, paste(types, collapse = "|")),
    # type_other: first type in Name other than the row's own Type (setdiff() has to run per row)
    type_other = map2_chr(Name, Type, ~ str_extract(.x, paste(setdiff(types, .y), collapse = "|"))),
    `Name After` = case_when(
      type_count == 1 ~ Name,
      TRUE ~ str_remove(Name, paste0(".*", type_other, "\\s"))
    )
  ) %>%
  select(-c(type_count, type_other))
#> # A tibble: 5 × 3
#> Name Type `Name After`
#> <chr> <chr> <chr>
#> 1 cure center state hospital hospital state hospital
#> 2 state polyclinic cure center center cure center
#> 3 state hospital main dispancer dispancer main dispancer
#> 4 first hospital number one hospital first hospital number one
#> 5 state polyclinic state adult hospital one hospital state adult hospital one
It creates two additional columns that are selected out for the final output:
type_count: how many types are present in the name
type_other: what type is contained in the name, other than what is in the type column
Then with case_when(), we return the full name if only one type is present in Name (type_count == 1); otherwise we remove the beginning of the string up to and including type_other.
Caveat: if Name contains more than 2 types, Name After will start after the first type occurrence, so it can still contain another type.
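To make the caveat concrete, here is a small sketch with a made-up name that contains three types. The name, the `types` vector, and the variable names are assumptions for illustration only.

```r
library(stringr)

types <- c("hospital", "center", "dispancer", "polyclinic")
name <- "state polyclinic cure center main hospital"  # hypothetical name with three types
type <- "hospital"

# Pattern matching any type other than the row's own Type
other_pattern <- paste(setdiff(types, type), collapse = "|")

# str_extract() returns the leftmost match, here "polyclinic"
first_other <- str_extract(name, other_pattern)

# Removing everything up to that type still leaves "center" in the result
result <- str_remove(name, paste0(".*", first_other, "\\s"))
result
# "cure center main hospital"
```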
I also thought of this solution, which I think is more robust: if Name contains more than one type (after removing the one in the Type column), it returns the part of the name after the last occurrence found.
library(tidyverse)
library(tidytext)
data <- tibble(Name = c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one", "state polyclinic state adult hospital one"),
               Type = c("hospital", "center", "dispancer", "hospital", "hospital"))
# Keep only the words that follow the last type other than `Type`
name_after <- function(Name, Type) {
  types_remaining <- setdiff(c("hospital", "center", "dispancer", "polyclinic"), Type)
  as_tibble(Name) %>%
    unnest_tokens(words, value, "words") %>%                        # one row per word
    mutate(types = ifelse(words %in% types_remaining, words, NA)) %>%  # flag the remaining types
    fill(types, .direction = "up") %>%                              # mark every word up to the last flagged type
    filter(is.na(types)) %>%                                        # keep the unmarked words after it
    pull(words) %>%
    paste0(collapse = " ")
}
data %>%
  rowwise() %>%
  mutate(`Name After` = name_after(Name, Type))
#> # A tibble: 5 × 3
#> # Rowwise:
#> Name Type `Name After`
#> <chr> <chr> <chr>
#> 1 cure center state hospital hospital state hospital
#> 2 state polyclinic cure center center cure center
#> 3 state hospital main dispancer dispancer main dispancer
#> 4 first hospital number one hospital first hospital number one
#> 5 state polyclinic state adult hospital one hospital state adult hospital one
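For comparison, the same "keep what follows the last other type" idea can be sketched in base R without tidytext. This is an alternative under assumptions, not the code above: the helper name `name_after_last` is hypothetical, and unlike `unnest_tokens()` it does not lowercase the words or strip punctuation.

```r
types <- c("hospital", "center", "dispancer", "polyclinic")

# Hypothetical helper: return the part of `name` after the last type word other than `type`
name_after_last <- function(name, type) {
  words <- strsplit(name, "\\s+")[[1]]
  hits <- which(words %in% setdiff(types, type))
  if (length(hits) == 0) return(name)  # no other type present: keep the full name
  paste(words[seq_along(words) > max(hits)], collapse = " ")
}

Name <- c("cure center state hospital", "state polyclinic cure center",
          "state hospital main dispancer", "first hospital number one",
          "state polyclinic state adult hospital one")
Type <- c("hospital", "center", "dispancer", "hospital", "hospital")

name_after <- mapply(name_after_last, Name, Type, USE.NAMES = FALSE)
name_after

# A made-up name with three types keeps only what follows the last other type
name_after_last("state polyclinic cure center main hospital", "hospital")
# "main hospital"
```

If the other type happens to be the last word of the name, this sketch returns an empty string.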