Replace a particular value with text

I want to replace the value 98 with "No response" in my entire excel workbook wherever it appears. Pls help me out.

library(tidyverse)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
tibble::tribble(
  ~enumerator, ~l1el4_identify_letter3, ~l1el4_identify_letter4, ~l1el4_identify_letter5, ~l1el4_total, ~l1el5_dummy, ~l1el5_match_pic1,
     "PEN008",                      0L,                      1L,                      1L,           0L,           0L,                0L,
     "PEN001",                      1L,                      1L,                      1L,           1L,           0L,                1L,
     "PEN006",                      0L,                      1L,                     98L,           1L,           1L,                0L,
     "PEN006",                      0L,                      0L,                      0L,           0L,           0L,                0L,
     "PEN010",                      1L,                      0L,                      0L,           1L,          98L,                0L,
     "PEN010",                     98L,                      1L,                      0L,           0L,           0L,                0L,
     "PEN003",                      1L,                      0L,                      0L,           1L,           0L,                0L,
     "PEN003",                      1L,                      1L,                      0L,           0L,           0L,                1L
  )
#> # A tibble: 8 x 7
#>   enumerator l1el4_identify_le~ l1el4_identify_le~ l1el4_identify_l~ l1el4_total
#>   <chr>                   <int>              <int>             <int>       <int>
#> 1 PEN008                      0                  1                 1           0
#> 2 PEN001                      1                  1                 1           1
#> 3 PEN006                      0                  1                98           1
#> 4 PEN006                      0                  0                 0           0
#> 5 PEN010                      1                  0                 0           1
#> 6 PEN010                     98                  1                 0           0
#> 7 PEN003                      1                  0                 0           1
#> 8 PEN003                      1                  1                 0           0
#> # ... with 2 more variables: l1el5_dummy <int>, l1el5_match_pic1 <int>

Created on 2021-10-30 by the reprex package (v2.0.1)

You can do something like this but have in mind that all the affected variables are going to be converted to class "character".

library(tidyverse)

sample_df <- tibble::tribble(
    ~enumerator, ~l1el4_identify_letter3, ~l1el4_identify_letter4, ~l1el4_identify_letter5, ~l1el4_total, ~l1el5_dummy, ~l1el5_match_pic1,
    "PEN008",                      0L,                      1L,                      1L,           0L,           0L,                0L,
    "PEN001",                      1L,                      1L,                      1L,           1L,           0L,                1L,
    "PEN006",                      0L,                      1L,                     98L,           1L,           1L,                0L,
    "PEN006",                      0L,                      0L,                      0L,           0L,           0L,                0L,
    "PEN010",                      1L,                      0L,                      0L,           1L,          98L,                0L,
    "PEN010",                     98L,                      1L,                      0L,           0L,           0L,                0L,
    "PEN003",                      1L,                      0L,                      0L,           1L,           0L,                0L,
    "PEN003",                      1L,                      1L,                      0L,           0L,           0L,                1L
)

sample_df %>% 
    mutate(across(everything(), ~ if_else(. == 98, "No Response", as.character(.))))
#> # A tibble: 8 × 7
#>   enumerator l1el4_identify_le… l1el4_identify_le… l1el4_identify_l… l1el4_total
#>   <chr>      <chr>              <chr>              <chr>             <chr>      
#> 1 PEN008     0                  1                  1                 0          
#> 2 PEN001     1                  1                  1                 1          
#> 3 PEN006     0                  1                  No Response       1          
#> 4 PEN006     0                  0                  0                 0          
#> 5 PEN010     1                  0                  0                 1          
#> 6 PEN010     No Response        1                  0                 0          
#> 7 PEN003     1                  0                  0                 1          
#> 8 PEN003     1                  1                  0                 0          
#> # … with 2 more variables: l1el5_dummy <chr>, l1el5_match_pic1 <chr>

Created on 2021-10-30 by the reprex package (v2.0.1)

All variables in "character" class will make it difficult for me for further analysis. Is there any method to maintain its numeric form?

Keep everything numeric but recode 98 to NA
https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/

isn't there any other way to change it to text format and keep the variables as numeric?

I already have NAs in my dataset. So if I recode 98 to NA, then that becomes problematic.

Given your context, what do those other NA values mean ?

There are a bunch of blank cells in our dataset. These are already showing "NA". If I convert 98 to NA, that would be a trouble for me.

I suppose you either keep them as number 98 but take care in any calculation to treat them specially. Maybe a thing to convert them to that would enforce that discipline would be to have them be Inf for infinity

This is a challenge as we have around 400+ variables. So adjusting for calculation each time would be difficult.

It's not possible to maintain the numeric form for just some rows in a column. tibble will force all values to have the same format. This is to optimize the performance.

If you want the values recoded, why don't you use them as they are and recode them at the end of the analysis?

Or change them as character because from what I can gather, you are using them as categorical variables anyway so I don't see why you need them in numeric value anyway.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.