Remove item from the list based on the column value

gocoyd · April 11, 2023, 3:27pm

Hello! I have a list of medical organization types that looks like this:

list <- c("hospital", "center", "polyclinic", "dispancer")

I also have a data frame with each organization's name and its assigned type (the rows below are deliberately the difficult cases that need a solution):

| Name | Type |
| -------- | -------------- |
| cure center state hospital | hospital |
| state polyclinic cure center | center |
| state hospital main dispancer | dispancer |
| first hospital number one | hospital |

As you can see, some names contain two organization types. To deal with them, I want to remove from the list above the item that matches the value in the Type column. For example, if the value in Type is center, then the word center should be deleted from the list, leaving c("hospital", "polyclinic", "dispancer"). After that I will delete everything in the name up to and including the word that is still in the list, so that the result looks like this:

| Name | Type | Name after |
| -------- | -------------- | -------------- |
| cure center state hospital | hospital | state hospital |
| state polyclinic cure center | center | cure center |
| state hospital main dispancer | dispancer | main dispancer |
| first hospital number one | hospital | first hospital number one |

The data to work with is:

Name <- c("cure center state hospital","state polyclinic cure center","state hospital main dispancer","first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")

Do you have any ideas?

nirgrahamuk · April 11, 2023, 5:13pm

Name <- c("cure center state hospital", "state polyclinic cure center", "state hospital main dispancer", "first hospital number one")
Type <- c("hospital", "center", "dispancer", "hospital")

library(tidyverse)
(dset <- tibble(
  Name = Name,
  Type = Type
) |> rowwise() |> mutate(
  list_of_words = strsplit(Name, " "),
  detect = list(which(unlist(list_of_words) == Type)),
  rebuilt = paste0(unlist(list_of_words)[
    seq(
      from = detect - 1,
      to = length(unlist(list_of_words))
    )
  ], collapse = " ")
) |> ungroup())


(slim_result <- select(dset,
  Name,
  Type,
  `Name after` = rebuilt
))

gocoyd · April 12, 2023, 6:15am

Hello! My R gives me an error:
Error: unexpected '>' in:
" Type = Type
) |>"

What can I do?

Upd: I changed |> to %>% and now it works
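
For context: the native pipe `|>` was added in R 4.1.0, so older versions cannot parse it and report this kind of "unexpected '>'" error, while `%>%` from magrittr (loaded by the tidyverse) works on older versions as well. A minimal check, using only base R (the variable name `has_native_pipe` is just illustrative):

```r
# TRUE if this R session is new enough to parse the native pipe (R >= 4.1.0)
has_native_pipe <- getRversion() >= "4.1.0"

if (has_native_pipe) {
  message("The native pipe is available in this session.")
} else {
  message("Use %>% from magrittr/tidyverse, or upgrade R.")
}
```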

gocoyd · April 12, 2023, 7:01am

Hello! The problem is that your code always keeps exactly one word before the type given in the Type column. What I need is to take the value in the Type column, delete it from the list of types, and then remove everything in the name up to and including any other type that is still in the list. For example, your code will not work for this name: "state polyclinic state adult hospital one". It returns "adult hospital one", while I need "state adult hospital one".
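
One possible reading of this rule, sketched in base R. The helper name `name_after_rule` and the `all_types` argument are hypothetical, and the sketch assumes that when several other types appear, the cut is made after the last one:

```r
# Full list of known organization types (spelling as in the data)
types <- c("hospital", "center", "polyclinic", "dispancer")

# Hypothetical helper: drop everything up to and including the last
# type word in `name` that is NOT the row's own `type`.
name_after_rule <- function(name, type, all_types = types) {
  words <- strsplit(name, " ", fixed = TRUE)[[1]]
  # positions of type words other than the one in the Type column
  other_pos <- which(words %in% setdiff(all_types, type))
  if (length(other_pos) == 0) {
    return(name)  # no other type present: keep the name unchanged
  }
  # keep only the words that follow the last "other" type
  paste(words[-seq_len(max(other_pos))], collapse = " ")
}

name_after_rule("state polyclinic state adult hospital one", "hospital")
#> [1] "state adult hospital one"
name_after_rule("first hospital number one", "hospital")
#> [1] "first hospital number one"
```

The function works on one name at a time, so applying it to whole columns would need something like `mapply(name_after_rule, Name, Type)`.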

nirgrahamuk · April 12, 2023, 7:44am

I don't understand what you want.

xvalda · April 12, 2023, 9:00am

Hi @gocoyd ,
Not sure it's the most elegant solution, but this should work:

library(tidyverse)
data <- tibble(Name = c("cure center state hospital","state polyclinic cure center","state hospital main dispancer","first hospital number one", "state polyclinic state adult hospital one"), 
       Type = c("hospital", "center", "dispancer", "hospital", "hospital"))
types <- c("hospital", "center", "dispancer", "polyclinic")
data %>% mutate(type_count = str_count(Name, paste0(types, collapse = "|")), 
                # build the pattern per row so the row's own Type is really excluded
                type_other = str_extract(Name, map_chr(Type, ~ paste0(setdiff(types, .x), collapse = "|"))),
                `Name After` = case_when(
                  type_count == 1 ~ Name, 
                  TRUE ~ str_remove(Name, paste0(".*", type_other, "\\s")))
) %>% select(-c(type_count, type_other))
#> # A tibble: 5 × 3
#>   Name                                      Type      `Name After`             
#>   <chr>                                     <chr>     <chr>                    
#> 1 cure center state hospital                hospital  state hospital           
#> 2 state polyclinic cure center              center    cure center              
#> 3 state hospital main dispancer             dispancer main dispancer           
#> 4 first hospital number one                 hospital  first hospital number one
#> 5 state polyclinic state adult hospital one hospital  state adult hospital one

It creates two helper columns that are dropped from the final output:

type_count: how many types are present in the name
type_other: which type is contained in the name, other than the one in the Type column

Then with case_when(), we return the full name if only one type is present in Name (type_count == 1); otherwise we remove the beginning of the string up to and including type_other.

Caveat: if Name contains more than 2 types, Name After will start after the first type occurrence.
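
To make the caveat concrete, here is a small sketch with a made-up name that contains three types (the name is an assumption, not part of the original data; its Type would be "hospital"). The removal step only cuts through the first other type found, so a second one survives in the result:

```r
library(stringr)

# Made-up example: three types in one name, Type column would be "hospital"
name <- "state polyclinic cure center main hospital"

# First type in the name other than "hospital"
type_other <- str_extract(name, "center|dispancer|polyclinic")

# Same removal step as in the code above
result <- str_remove(name, paste0(".*", type_other, "\\s"))

print(type_other)
#> [1] "polyclinic"
print(result)
#> [1] "cure center main hospital"
```

Here "center" is still in the result, which is the situation the caveat describes.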

xvalda · April 12, 2023, 2:31pm

I also thought of this solution, which I think is more robust: if Name contains more than one type (after excluding the one in the Type column), it returns the part of the name after the last such occurrence.

library(tidyverse); library(tidytext)
data <- tibble(Name = c("cure center state hospital","state polyclinic cure center","state hospital main dispancer","first hospital number one", "state polyclinic state adult hospital one"), 
               Type = c("hospital", "center", "dispancer", "hospital", "hospital"))

name_after <- function(Name, Type){
  types_remaining <- setdiff(c("hospital", "center", "dispancer", "polyclinic"), Type)
  as_tibble(Name) %>% unnest_tokens(words, value, "words") %>% 
    mutate(types = ifelse(words %in% types_remaining, words, NA)
    ) %>% 
    fill(types, .direction = "up") %>% 
    filter(is.na(types)) %>% pull(words) %>% paste0(collapse = " ")
}

data %>% 
  rowwise() %>% 
  mutate(`Name After` = name_after(Name, Type))
#> # A tibble: 5 × 3
#> # Rowwise: 
#>   Name                                      Type      `Name After`             
#>   <chr>                                     <chr>     <chr>                    
#> 1 cure center state hospital                hospital  state hospital           
#> 2 state polyclinic cure center              center    cure center              
#> 3 state hospital main dispancer             dispancer main dispancer           
#> 4 first hospital number one                 hospital  first hospital number one
#> 5 state polyclinic state adult hospital one hospital  state adult hospital one
