Error in Rounding for Comparison Purposes

Hello,

Recently, in another topic, I received some guidance on putting together code to compare two files, filtered down to specific items of interest. The goal is to QC vast amounts of data, with an output of TRUE/FALSE indicating whether the values match. I have a few variations of the code that all work (for differing scenarios); however, one possible scenario is that the values in one file are rounded and those in the other are not.

The code works for the most part, but it sometimes throws false errors (saying values do not match when they should) when the value being compared contains an x.5. After some research, I believe this is the built-in "round half to even" rule that is standard in computing and statistics. To start, please do not get into the "oh this is the way it is" argument. I totally understand its validity for statistics and am not arguing that point. What I am looking for is help integrating something into my function that would make it not do this and instead follow the "normal" rounding rule taught since grade school.

First time using reprex in this format, so will edit after posting if it looks weird. The dataset I am using is:

data.frame(
  stringsAsFactors = FALSE,
           Barcode = c("621175209036","621175209037",
                       "621175209038","621175209039","621175209136",
                       "621175209137","621175209138","621175209139","621175209633",
                       "621175209634","621175209635","621175209636",
                       "621175209733","621175209734","621175209735","621175209736",
                       "621175210333","621175210334","621175210335",
                       "621175210336"),
          Result.x = c("157.86","6931.095","43.938",
                       "<2.34","<31.26","5623.703","65.535","<2.34",
                       "<31.26","5319.741","28.478","<2.34","257.009","3241.802",
                       "55.961","<2.34","198.521",">24000","60.129",
                       "<2.34"),
          Result.y = c("157.9","6931.1","43.9",
                       "<2.34","<31.26","5623.7","65.5","<2.34","<31.26",
                       "5319.7","28.5","<2.34","257.0","3241.8","56.0","<2.34",
                       "198.5",">24000","60.1","<2.34")
)
#>         Barcode Result.x Result.y
#> 1  621175209036   157.86    157.9
#> 2  621175209037 6931.095   6931.1
#> 3  621175209038   43.938     43.9
#> 4  621175209039    <2.34    <2.34
#> 5  621175209136   <31.26   <31.26
#> 6  621175209137 5623.703   5623.7
#> 7  621175209138   65.535     65.5
#> 8  621175209139    <2.34    <2.34
#> 9  621175209633   <31.26   <31.26
#> 10 621175209634 5319.741   5319.7
#> 11 621175209635   28.478     28.5
#> 12 621175209636    <2.34    <2.34
#> 13 621175209733  257.009    257.0
#> 14 621175209734 3241.802   3241.8
#> 15 621175209735   55.961     56.0
#> 16 621175209736    <2.34    <2.34
#> 17 621175210333  198.521    198.5
#> 18 621175210334   >24000   >24000
#> 19 621175210335   60.129     60.1
#> 20 621175210336    <2.34    <2.34

Created on 2021-08-16 by the reprex package (v2.0.0)

The code I am looking to hopefully modify is:

myData %>%
+ mutate(identical = round(as.numeric(Result.x,1)) == round(as.numeric(Result.y, 1)))
#> Error in myData %>% +mutate(identical = round(as.numeric(Result.x, 1)) == : could not find function "%>%"

Any help or advice on finding a workaround for this would be greatly appreciated. Like I said before, I understand and appreciate the rounding standard, but for the sake of a comparison I need rounding to behave "normally".

-Q

Instead of round(x) you can use floor(x + 0.5).
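
To make the difference concrete, here is a small sketch comparing the two on values that sit exactly on a tie. The variable names are just for illustration.

```r
# Values that are exactly representable and end in .5
x <- c(0.5, 1.5, 2.5, 3.5)

# round() sends exact ties to the nearest even integer
half_even <- round(x)

# floor(x + 0.5) always sends exact ties upward
half_up <- floor(x + 0.5)

print(half_even)  # 0 2 2 4
print(half_up)    # 1 2 3 4
```

Note that this only covers rounding to whole numbers; rounding to a given number of decimal places needs the value scaled first.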

The error is because you haven't loaded dplyr. Add this to the start: library(dplyr)

I believe this can be generalized as

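round_as_third_grade <- function(x, digits = 0){
  # Shift the decimal point so the digit to round on sits just after it
  scalar <- 10 ^ digits
  
  out <- x * scalar
  # Adding 0.5 and flooring sends exact ties upward (2.5 -> 3)
  out <- floor(out + 0.5)
  # Divide rather than multiply by 1 / scalar: 3 / 10 is 0.3, but 3 * (1 / 10) is not exactly 0.3
  out <- out / scalar
  
  out
}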


round_as_third_grade(0.05, 1)
round_as_third_grade(0.15, 1)
round_as_third_grade(0.25, 1)
round_as_third_grade(2.5)
round_as_third_grade(2.225, 2)

Someone should check my work on that, though.
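
As a partial check, two things seem worth watching. First, floor(x + 0.5) rounds negative ties toward positive infinity (-2.5 becomes -2), which may or may not be what "grade school" rounding means to you. Second, values such as 2.225 cannot be stored exactly in binary, so a tie on paper is not always a tie in memory and the result can land on either side. Below is a rough sketch of a sign-aware variant, assuming you want ties rounded away from zero; round_half_away is a made-up name, not an existing function.

```r
# Hypothetical variant: rounds ties away from zero for both signs
round_half_away <- function(x, digits = 0) {
  scalar <- 10 ^ digits
  # Work on the magnitude, then restore the sign afterwards.
  # Dividing by scalar (not multiplying by 1 / scalar) keeps 3 / 10 equal to 0.3.
  sign(x) * floor(abs(x) * scalar + 0.5) / scalar
}

positive_tie <- round_half_away(2.5)    # 3
negative_tie <- round_half_away(-2.5)   # -3
plain_floor  <- floor(-2.5 + 0.5)       # -2, the behaviour without the sign handling
one_decimal  <- round_half_away(0.25, 1)
```

If all of your values are positive, as in the posted data, the sign handling makes no difference.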

The issue is not dplyr. I always load dplyr as part of my startup. Good thought though, I went back to double check that it indeed is loaded.

You had not loaded it in the example you posted. Hence the error:

#> Error in myData %>% +mutate(identical = round(as.numeric(Result.x, 1)) == : could not find function "%>%"

I was curious about that, as that only appeared when I used reprex/datapasta. Prior to that the code still ran without issues.

This seems to be on the right track. Running the individual numbers confirms that it works. However, when I try to incorporate this into the code above, it returns FALSE for the comparison when it should be TRUE. Were you able to get this to run successfully?

I just found something I hadn't noticed before. You seem to have placed the , 1 in the wrong set of parentheses. Compare these two commands:

round(as.numeric(Result.x,1))   # your original command
round(as.numeric(Result.x), 1)  # what I think you intended.
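
Here is a quick sketch of what the placement changes, using one value from your data. With the 1 inside as.numeric(), it appears to be silently ignored, so round() falls back to its default of zero decimal places.

```r
value <- "65.535"

# The 1 is passed to as.numeric(), not to round(): rounds to a whole number
misplaced <- round(as.numeric(value, 1))

# The 1 is passed to round() as the digits argument: rounds to one decimal place
intended <- round(as.numeric(value), 1)

print(misplaced)  # 66
print(intended)   # 65.5
```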

The following appears to work:

library(dplyr)

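round_as_third_grade <- function(x, digits = 0){
  # Shift the decimal point so the digit to round on sits just after it
  scalar <- 10 ^ digits
  
  out <- x * scalar
  # Adding 0.5 and flooring sends exact ties upward (2.5 -> 3)
  out <- floor(out + 0.5)
  # Divide rather than multiply by 1 / scalar: 3 / 10 is 0.3, but 3 * (1 / 10) is not exactly 0.3
  out <- out / scalar
  
  out
}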


myData <- 
  data.frame(
    stringsAsFactors = FALSE,
    Barcode = c("621175209036","621175209037",
                "621175209038","621175209039","621175209136",
                "621175209137","621175209138","621175209139","621175209633",
                "621175209634","621175209635","621175209636",
                "621175209733","621175209734","621175209735","621175209736",
                "621175210333","621175210334","621175210335",
                "621175210336"),
    Result.x = c("157.86","6931.095","43.938",
                 "<2.34","<31.26","5623.703","65.535","<2.34",
                 "<31.26","5319.741","28.478","<2.34","257.009","3241.802",
                 "55.961","<2.34","198.521",">24000","60.129",
                 "<2.34"),
    Result.y = c("157.9","6931.1","43.9",
                 "<2.34","<31.26","5623.7","65.5","<2.34","<31.26",
                 "5319.7","28.5","<2.34","257.0","3241.8","56.0","<2.34",
                 "198.5",">24000","60.1","<2.34")
  )

myData %>%
  mutate(Rounded.x = round_as_third_grade(as.numeric(Result.x), 1), 
         Rounded.y = round_as_third_grade(as.numeric(Result.y), 1), 
         # mutate() can use columns created earlier in the same call
         identical = Rounded.x == Rounded.y)
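
One caveat: entries such as "<2.34" and ">24000" are not numbers, so as.numeric() turns them into NA (with a coercion warning) and the identical column is NA for those rows rather than TRUE or FALSE. If you want those rows compared too, one possible approach is to fall back to comparing the original strings. This is only a sketch; round_half_up and compare_results are hypothetical helper names, and it assumes censored values should match only when the strings are exactly equal.

```r
# Hypothetical helpers, not part of the code above
round_half_up <- function(x, digits = 0) {
  scalar <- 10 ^ digits
  floor(x * scalar + 0.5) / scalar
}

compare_results <- function(x, y, digits = 1) {
  # Non-numeric strings become NA; the coercion warning is expected here
  nx <- suppressWarnings(as.numeric(x))
  ny <- suppressWarnings(as.numeric(y))
  both_numeric <- !is.na(nx) & !is.na(ny)

  # Default: exact string comparison (covers "<2.34" and ">24000")
  out <- x == y
  # Where both sides parse as numbers, compare the rounded values instead
  out[both_numeric] <- round_half_up(nx[both_numeric], digits) ==
    round_half_up(ny[both_numeric], digits)
  out
}

matches <- compare_results(
  c("65.535", "<2.34", ">24000", "28.478"),
  c("65.5",   "<2.34", ">24000", "28.5")
)
print(matches)
```

Inside mutate() this would be used as identical = compare_results(Result.x, Result.y, 1).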

It would appear you are correct. I think I may have introduced this error when I put the "as.numeric" parameter into the function. Changing this along with what you illustrated works! Exactly what I was looking for. I might have to keep the "round_as_third_grade" name for that function, it fits so perfectly. Thank you and to all who provided feedback!

Q
