Grouping variables in a Column

Hello All,

I am in need of your assistance. Below you will find various categories of testing data. My question is, how can I combine or group certain variables together?

For example, how would I combine HIV-1 Positive (72) with Positive (110) so that the result is a single category: HIV Positive = 182?

Ideally, I would like to restructure this column so that all the negative categories are combined with each other and all the positive categories are combined with each other.

The column title is "Final.Test.Result".

|Row|Final.Test.Result|Count|
|1||328|
|2|Discordant|5|
|3|HIV-1 Negative|88|
|4|HIV-1 Positive|72|
|5|HIV Negative|5304|
|6|HIV Positive, undifferentiated|1|
|7|Inconclusive, further testing needed|30|
|8|Invalid|10|
|9|Negative|37722|
|10|Positive|110|
|11|Preliminary positive|17|

All the help is greatly appreciated.

Thank you!

You can do something like this:

library(tidyverse)

# Sample data in a copy/paste-friendly format; replace with your actual data frame.
sample_df <- data.frame(
    stringsAsFactors = FALSE,
    Final.Test.Result = c(NA,"Discordant",
                          "HIV-1 Negative","HIV-1 Positive","HIV Negative",
                          "HIV Positive, undifferentiated","Inconclusive, further testing needed",
                          "Invalid","Negative","Positive","Preliminary positive"),
    Count = c(328L,5L,88L,72L,5304L,1L,
              30L,10L,37722L,110L,17L)
)

sample_df %>% 
    mutate(group = case_when(
        str_detect(Final.Test.Result, "[Pp]ositive") ~ "Positive",
        str_detect(Final.Test.Result, "[Nn]egative") ~ "Negative",
        TRUE ~ "Other"
    )) %>% 
    group_by(group) %>% 
    summarise(Count = sum(Count))
#> # A tibble: 3 x 2
#>   group    Count
#>   <chr>    <int>
#> 1 Negative 43114
#> 2 Other      373
#> 3 Positive   200

Created on 2021-03-26 by the reprex package (v1.0.0.9002)
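
One detail in the output above is worth spelling out: the blank first row (NA, count 328) ends up in "Other". The sketch below shows why, using a made-up three-value vector `x` that is not part of the original data. str_detect() returns NA for a missing input, and case_when() treats an NA condition as not matched, so the row falls through to the TRUE branch.

```r
library(dplyr)
library(stringr)

# Hypothetical vector, for illustration only
x <- c(NA, "HIV-1 Positive", "Discordant")

# str_detect() propagates missing values instead of returning FALSE
detected <- str_detect(x, "[Pp]ositive")
detected
#> [1]    NA  TRUE FALSE

# case_when() skips conditions that evaluate to NA, so the NA element
# reaches the TRUE fallback
grouped <- case_when(
    str_detect(x, "[Pp]ositive") ~ "Positive",
    TRUE ~ "Other"
)
grouped
#> [1] "Other"    "Positive" "Other"
```

If missing results should stay missing rather than be counted as "Other", an explicit `is.na(Final.Test.Result) ~ NA_character_` branch placed first would be one way to do that.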

Note: Next time please provide a proper REPRoducible EXample (reprex) illustrating your issue.
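
For anyone unsure how to share data that way, one common approach is to paste the output of dput() into the post, since it prints R code that rebuilds the object. A small sketch, where `my_df` is a hypothetical stand-in for your own data frame:

```r
# Hypothetical stand-in for the real data frame
my_df <- data.frame(
    Final.Test.Result = c("Negative", "Positive", NA),
    Count = c(37722L, 110L, 328L),
    stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that text into the post.
# head() keeps the sample small when the real data frame is large.
dput(head(my_df, 10))

# Round trip: evaluating the dput() text gives back an equivalent data frame
dput_text <- paste(capture.output(dput(my_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, my_df)
```

The reprex package can then be used to run the whole script and format it together with its output, as in the answers in this thread.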

This was very helpful. However, there are a couple more things I would like to do, and I apologize for not being more specific.

I want to keep the original data frame and use mutate() to add the recoded values of Final.Test.Result to it.

Ideally, I would like to have this:

New_Final_Test_Result
Discordant
Negative
Positive
Invalid

How would you code that?

I thank you for your help!

Is this what you mean?

library(tidyverse)

# Sample data in a copy/paste-friendly format; replace with your actual data frame.
sample_df <- data.frame(
    stringsAsFactors = FALSE,
    Final.Test.Result = c(NA,"Discordant",
                          "HIV-1 Negative","HIV-1 Positive","HIV Negative",
                          "HIV Positive, undifferentiated","Inconclusive, further testing needed",
                          "Invalid","Negative","Positive","Preliminary positive"),
    Count = c(328L,5L,88L,72L,5304L,1L,
              30L,10L,37722L,110L,17L)
)

sample_df %>% 
    mutate(New_Final.Test.Result = case_when(
        str_detect(Final.Test.Result, "[Pp]ositive") ~ "Positive",
        str_detect(Final.Test.Result, "[Nn]egative") ~ "Negative",
        str_detect(Final.Test.Result, "[Dd]iscordant") ~ "Discordant",
        TRUE ~ "Invalid"
    ))
#>                       Final.Test.Result Count New_Final.Test.Result
#> 1                                  <NA>   328               Invalid
#> 2                            Discordant     5            Discordant
#> 3                        HIV-1 Negative    88              Negative
#> 4                        HIV-1 Positive    72              Positive
#> 5                          HIV Negative  5304              Negative
#> 6        HIV Positive, undifferentiated     1              Positive
#> 7  Inconclusive, further testing needed    30               Invalid
#> 8                               Invalid    10               Invalid
#> 9                              Negative 37722              Negative
#> 10                             Positive   110              Positive
#> 11                 Preliminary positive    17              Positive

Created on 2021-03-26 by the reprex package (v1.0.0.9002)
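
Because the new column sits next to the original rows, the combined totals are still only one step away. A brief sketch, assuming the recoded result is stored in a hypothetical object `coded_df` (built here from just four of the rows above to keep it short):

```r
library(dplyr)
library(stringr)

# Hypothetical subset of the sample data, recoded as in the answer above
coded_df <- data.frame(
    Final.Test.Result = c("HIV-1 Positive", "Positive", "Negative", "Invalid"),
    Count = c(72L, 110L, 37722L, 10L),
    stringsAsFactors = FALSE
) %>%
    mutate(New_Final.Test.Result = case_when(
        str_detect(Final.Test.Result, "[Pp]ositive") ~ "Positive",
        str_detect(Final.Test.Result, "[Nn]egative") ~ "Negative",
        TRUE ~ "Invalid"
    ))

# count() with wt = Count sums Count within each recoded category;
# the totals are returned in a column named n
totals <- coded_df %>% count(New_Final.Test.Result, wt = Count)
totals
```

In this subset the "Positive" total is 72 + 110 = 182, the figure asked about in the original question.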

With your help and examples, I was able to work out how to get the result I wanted. This is the code I used:

# Recode Final.Test.Result into broad categories, stored in a new column.
# str_detect() matches anywhere in the string, so plain substrings are enough.
Aphirm_2019_2020_HIV_Testing_Data <- Aphirm_2019_2020_HIV_Testing_Data %>%
    mutate(final_test_result_coded = case_when(
        str_detect(str_to_lower(Final.Test.Result), "discordant") ~ "Discordant",
        str_detect(str_to_lower(Final.Test.Result), "invalid") ~ "Invalid",
        str_detect(str_to_lower(Final.Test.Result), "inconclusive") ~ "Inconclusive",
        str_detect(str_to_lower(Final.Test.Result), "pos") ~ "Positive",
        str_detect(str_to_lower(Final.Test.Result), "neg") ~ "Negative",
        TRUE ~ NA_character_  # blank or unmatched results stay missing
    ))

Overall, I wanted what you showed, but added as a new column in the original data frame.

I appreciate your time and willingness to help!
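
As a side note on the str_to_lower() calls in the code above: an alternative is stringr's regex() helper with ignore_case = TRUE, which leaves the input untouched. A minimal sketch on a made-up vector `x`. Since str_detect() already matches anywhere in the string, a bare "pos" is enough, whereas a pattern with a leading "." requires at least one character before the match.

```r
library(stringr)

# Hypothetical values, for illustration only
x <- c("Positive", "HIV-1 Positive", "Preliminary positive", "Negative", NA)

# Case-insensitive substring match without lowercasing the input first
is_pos <- str_detect(x, regex("pos", ignore_case = TRUE))
is_pos
#> [1]  TRUE  TRUE  TRUE FALSE    NA

# "." stands for exactly one character, so ".pos" cannot match at the
# very start of a string
leading_dot <- str_detect("positive", ".pos")
leading_dot
#> [1] FALSE
```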
