Remove item from the list based on the column value

Hello! I have a list of medical organization types, which looks like this:

list <- c("hospital", "center", "polyclinic", "dispancer")

I also have a data frame with the name of each organization and its assigned type, which looks like this (it includes an edge case that needs a solution):

| Name | Type |
| -------- | -------- |
| cure center state hospital | hospital |
| state polyclinic cure center | center |
| state hospital main dispancer | dispancer |
| first hospital number one | hospital |

As you can see, some names contain two organization types. To deal with them, I want to remove an item from the list above according to the value in the Type column. For example, if the value in Type is center, then the word center should be deleted from the list, leaving c("hospital", "polyclinic", "dispancer"). After that, I will delete everything in the name up to and including the remaining type word, so that the result looks like this:

| Name | Type | Name after |
| -------- | -------- | -------- |
| cure center state hospital | hospital | state hospital |
| state polyclinic cure center | center | cure center |
| state hospital main dispancer | dispancer | main dispancer |
| first hospital number one | hospital | first hospital number one |
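A minimal base-R sketch of these steps for a single name, assuming words are separated by single spaces and matched exactly (so the type list must use the same spelling as the data, here dispancer). The helper name_after_base() and its types argument are hypothetical, not from any package:

```r
# Hypothetical helper: keep the part of `name` that follows the last
# occurrence of any type other than the row's own `type`.
name_after_base <- function(name, type,
                            types = c("hospital", "center", "polyclinic", "dispancer")) {
  other_types <- setdiff(types, type)              # step 1: drop the row's own type
  words <- strsplit(name, " ", fixed = TRUE)[[1]]  # split the name into words
  hits <- which(words %in% other_types)            # positions of the remaining types
  if (length(hits) == 0) return(name)              # no competing type: keep the full name
  # step 2: drop everything up to and including the last remaining type
  paste(words[-seq_len(max(hits))], collapse = " ")
}

Name <- c("cure center state hospital", "state polyclinic cure center",
          "state hospital main dispancer", "first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")

# apply the helper row by row
mapply(name_after_base, Name, Type, USE.NAMES = FALSE)
```

Using the last remaining type (max(hits)) is a design choice that only matters when a name contains more than two types.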

The data to work with is:

Name <- c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")

Do you have any ideas?

Name <- c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")

library(tidyverse)  # note: the native pipe |> used below requires R >= 4.1.0
(dset <- tibble(
  Name = Name,
  Type = Type
) |> rowwise() |> mutate(
  list_of_words = strsplit(Name, " "),
  detect = list(which(unlist(list_of_words) == Type)),
  rebuilt = paste0(unlist(list_of_words)[
    seq(
      from = detect - 1,
      to = length(unlist(list_of_words))
    )
  ], collapse = " ")
) |> ungroup())


(slim_result <- select(dset,
  Name,
  Type,
  `Name after` = rebuilt
))

Hello! My R gives me an error:
Error: unexpected '>' in:
" Type = Type
) |>"

What can I do?

Update: I changed |> to %>% and now it works.
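For reference: the native pipe |> only became part of R's syntax in R 4.1.0, so older versions fail while parsing the code, which matches the "unexpected '>'" error above. A quick check, assuming magrittr is installed (it comes with the tidyverse):

```r
# TRUE means this R version understands the native pipe |>
getRversion() >= "4.1.0"

# magrittr's %>% (re-exported by dplyr and the tidyverse) also works on older R
library(magrittr)
words <- c("cure", "center", "state", "hospital")
words %>% paste(collapse = " ")   # same as paste(words, collapse = " ")
```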

Hello! The thing is that your code just keeps the one word before the type given in the Type column, while I need to take the value in Type, delete it from the list of types, and remove everything in the name up to and including the remaining type. For example, your code will not work for "state polyclinic state adult hospital one": it leaves "adult hospital one", while I need "state adult hospital one".

I don't understand what you want. 🤷‍♂️

Hi @gocoyd,
Not sure it's the most elegant solution, but this should work:

library(tidyverse)
data <- tibble(Name = c("cure center state hospital","state polyclinic cure center","state hospital main dispancer","first hospital number one", "state polyclinic state adult hospital one"), 
       Type = c("hospital", "center", "dispancer", "hospital", "hospital"))
data %>% mutate(type_count = str_count(Name, "hospital|polyclinic|dispancer|center"), 
                # per-row pattern of every type except this row's Type
                type_other = str_extract(Name, map_chr(Type, ~ paste0(setdiff(c("hospital", "center", "dispancer", "polyclinic"), .x), collapse = "|"))),
                `Name After` = case_when(
                  type_count == 1 ~ Name, 
                  TRUE ~ str_remove(Name, paste0(".*", type_other, "\\s")))
) %>% select(-c(type_count, type_other))
#> # A tibble: 5 × 3
#>   Name                                      Type      `Name After`             
#>   <chr>                                     <chr>     <chr>                    
#> 1 cure center state hospital                hospital  state hospital           
#> 2 state polyclinic cure center              center    cure center              
#> 3 state hospital main dispancer             dispancer main dispancer           
#> 4 first hospital number one                 hospital  first hospital number one
#> 5 state polyclinic state adult hospital one hospital  state adult hospital one

It creates two additional columns that are dropped from the final output:

  • type_count: how many types are present in the name
  • type_other: the type contained in the name other than the one in the Type column

Then, with case_when(), we return the full name if only one type is present in Name (type_count == 1); otherwise we remove the beginning of the string up to and including that other type.
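Building the type_other pattern per row is the subtle part: piping the types into paste0(collapse = "|") before calling setdiff() produces one combined string, so setdiff() removes nothing. A sketch of a per-row alternative, assuming purrr's map_chr() (loaded with the tidyverse); the two example rows are taken from the data above:

```r
library(stringr)
library(purrr)

types <- c("hospital", "center", "dispancer", "polyclinic")
Name  <- c("cure center state hospital", "state polyclinic cure center")
Type  <- c("hospital", "center")

# Pasting first yields a single string, so setdiff() leaves it untouched
combined <- setdiff(paste0(types, collapse = "|"), Type)
# combined is still "hospital|center|dispancer|polyclinic"

# One alternation pattern per row, leaving out that row's own type
patterns <- map_chr(Type, ~ paste0(setdiff(types, .x), collapse = "|"))
# gives "center|dispancer|polyclinic" and "hospital|dispancer|polyclinic"

# str_extract() is vectorised over the pattern, so each name uses its own
str_extract(Name, patterns)   # "center" and "polyclinic"
```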

Caveat: if Name contains more than two types, Name After will start after the first other type found in it.
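To see the caveat concretely, here is a made-up name with three types (not from the thread). str_extract() returns the leftmost match, so the approach above picks polyclinic and keeps everything after it, including the second other type:

```r
library(stringr)

name <- "state polyclinic cure center city hospital"  # hypothetical example
type <- "hospital"

# pattern of every type except the row's own type
pattern <- paste0(setdiff(c("hospital", "center", "dispancer", "polyclinic"), type),
                  collapse = "|")

# leftmost other type in the name
other <- str_extract(name, pattern)            # "polyclinic"
str_remove(name, paste0(".*", other, "\\s"))   # "cure center city hospital"
```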


I also thought of this solution, which I think is more robust: if Name contains more than one type (after removing the one in the Type column), it returns the part of the name after the last occurrence found.

library(tidyverse); library(tidytext)
data <- tibble(Name = c("cure center state hospital","state polyclinic cure center","state hospital main dispancer","first hospital number one", "state polyclinic state adult hospital one"), 
               Type = c("hospital", "center", "dispancer", "hospital", "hospital"))

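name_after <- function(Name, Type){
  # all organization types except the one in this row's Type column
  types_remaining <- setdiff(c("hospital", "center", "dispancer", "polyclinic"), Type)
  tibble(value = Name) %>% unnest_tokens(words, value, "words") %>% 
    # mark the words that are one of the remaining types
    mutate(types = ifelse(words %in% types_remaining, words, NA)) %>% 
    # copy each mark upwards, so every word up to and including a type is marked
    fill(types, .direction = "up") %>% 
    # keep only the unmarked words, i.e. those after the last remaining type
    filter(is.na(types)) %>% pull(words) %>% paste0(collapse = " ")
}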

data %>% 
  rowwise() %>% 
  mutate(`Name After` = name_after(Name, Type))
#> # A tibble: 5 × 3
#> # Rowwise: 
#>   Name                                      Type      `Name After`             
#>   <chr>                                     <chr>     <chr>                    
#> 1 cure center state hospital                hospital  state hospital           
#> 2 state polyclinic cure center              center    cure center              
#> 3 state hospital main dispancer             dispancer main dispancer           
#> 4 first hospital number one                 hospital  first hospital number one
#> 5 state polyclinic state adult hospital one hospital  state adult hospital one
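The key step in name_after() is fill(.direction = "up"): each NA is replaced by the next non-missing value below it, so every word up to and including a remaining type gets marked, and only the words after the last one stay NA. A small standalone illustration on the example name, assuming dplyr and tidyr and splitting the words by hand instead of with unnest_tokens():

```r
library(dplyr)
library(tidyr)

words <- c("state", "polyclinic", "state", "adult", "hospital", "one")
types_remaining <- c("center", "dispancer", "polyclinic")  # Type is "hospital"

marked <- tibble(words = words) %>%
  mutate(types = ifelse(words %in% types_remaining, words, NA)) %>%
  fill(types, .direction = "up")
# the first "state" is filled with "polyclinic" from the row below;
# the last four words have no type below them and stay NA

kept <- marked %>% filter(is.na(types)) %>% pull(words)
paste(kept, collapse = " ")   # "state adult hospital one"
```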
