How transform "<X" values to "X/2" in a dataframe with several variables

Hi all,

I am dealing with a dataframe containing several variables and some of them are non-mesurable and reported as "<X" - for example <10.
How can I convert <X values in (X/2)?

I tried the following, using pacman + rio + tidyverse, but it is very long and seems to give some issues:

df2 <- df %>% #CREATE NEW DATA FRAME WITH NO NON-DETECTS
mutate(Benzene = ifelse(str_detect(
string = Benzene,
pattern = "<"
),
parse_number(gsub(
pattern = "<",
replacement = "", x = Benzene
)) / 2, parse_number(Benzene)
)) %>%
mutate(C6_C10 = ifelse(str_detect( #SUBSTITUTE C6-C10
string = C6_C10,
pattern = "<"
),
parse_number(gsub(
pattern = "<",
replacement = "", x = C6_C10
)) / 2, parse_number(C6_C10)
)) %>% #AND SO ON FOR ALL VARIABLES, WHICH ARE 20

Regards,

The approach seems valid overall, but can be simplified since parse_number() already removes any non-string characters.

To apply it to many columns, you can simply put the operations in a function that you call with across().

You don't provide data to test, I would expect something like that to work:


library(tidyverse)
df <- data.frame(col1=c(1,4,7,2,"<56",8,3,8,"<10",7,2),
                 col2=c(65,"<7",1,5,"<99",3,6,8,876,"<34",1))

process_numbers <- function(x){
  if_else(str_detect(x, "^<"),
          parse_number(x)/2,
          parse_number(x))
}

df %>%
  mutate(across(col1:col2, process_numbers))
#>    col1  col2
#> 1     1  65.0
#> 2     4   3.5
#> 3     7   1.0
#> 4     2   5.0
#> 5    28  49.5
#> 6     8   3.0
#> 7     3   6.0
#> 8     8   8.0
#> 9     5 876.0
#> 10    7  17.0
#> 11    2   1.0

Created on 2020-12-21 by the reprex package (v0.3.0)

Of course you can consider using round() or floor() to have only integer results.

1 Like

Hi AlexisW,

Thank you so much for answering. It is appreciated very much.
I have tried your code many times, but I cannot make it work. This is the error I get:

Error: Problem with mutate() input ..1.
x is.character(x) is not TRUE
i Input ..1 is across(C6_C10:Sulphate, process_numbers).
Run rlang::last_error() to see where the error occurred.

Note that columns in my "df", , which are 20, are named C6_C10 throught to Sulphate.

are they all character columns or a mix of character columns with the artifacts your describe, but some purely numeric with no such artifacts ? I think its likely a matter of identifying the variables to process (and which to not), or to add conditional tests into the process function, so it can work with either (i.e. the full set)

1 Like

They are all value, comprised between 0.01 to 11,000. Some data point were non-detects and reported as less-than a certain value, for example <2 or <0.01.
However, some columns are "dbl" while others are "chr".

I tried this:

INSTALL AND LOAD PACKAGES

pacman::p_load(pacman, rio, tidyverse)

LOAD AND PREPARE DATA

df <- import("data/Dubbo_One_event.xlsx") %>%
as_tibble() %>%
mutate(Well_ID = as.factor(Well_ID)) %>%
rename(y = Well_ID) %>%
print()

#CONVERT CHARACTERS INTO NUMBERS

df <- lapply(df, function(x) as.numeric(as.character(x)))
warnings()
sapply(df, class)

#CONVERT 'LESS-THAN X' INTO (X/2)

library(tidyverse)

process_numbers <- function(x){
if_else(str_detect(x, "<"),
parse_number(x)/2,
parse_number(x))
}

df %>%
mutate(across(C6_C10:Sulphate, process_numbers))

IT STILL GIVES AN ERROR:

rlang::last_error()
<error/dplyr:::mutate_error>
Problem with mutate() input ..1.
x is.character(x) is not TRUE
i Input ..1 is across(C6_C10:Sulphate, process_numbers).
Backtrace:
Run rlang::last_trace() to see the full context.

process_numbers <- function(x){
if(is.character(x){
if_else(str_detect(x, "<"),
parse_number(x)/2,
parse_number(x))} else{x}
}
2 Likes

As an alternative to changing process_numbers(), the across() function can be used in conjunction with where():

process_numbers <- function(x){
  if_else(str_detect(x, "<"),
  parse_number(x)/2,
  parse_number(x))
}

df %>%
  mutate(
    across(
      C6_C10:Sulphate & where(is.character),
      process_numbers
    )
  )

I add this as an alternative, because in this example, you have defined process_numbers() yourself, sometimes we may wish to apply a function in this way that we don't have control over and using where() can add that additional caveat.

1 Like

Hi guys,
thanks for all your answers. It still does not work - it seems hard to conver characters '<X" into X/2. I thought with R it would have been easy but I might do it using excel.

I still got:

Error: Problem with mutate() input ..1.
x object 'process_numbers' not found
i Input ..1 is across(C6_C10:Sulphate & where(is.character), process_numbers).
Run rlang::last_error() to see where the error occurred.

This means that in your session, you have not run code that defines the process_numbers function.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.