tibbles: my tibble adds precision where it didn't exist

Overall I'm loving how tibbles print - much easier to view and understand how many observations you have etc.
But I have a follow-up question with an opposite problem - my tibble adds precision where it didn't exist. See this example: both numbers have 2 decimals, but their difference has 3 decimals. Is this expected? I don't want to set pillar.sigfig to 2 as I'll have other numbers, e.g. 12.34, that would then be affected too.

Ideas, comments, workarounds? Thanks!

library(tidyverse)
tibbledata = tibble(x = 0.81, y = 0.91)

tibbledata %>% 
  mutate(y-x)
#> # A tibble: 1 x 3
#>       x     y `y - x`
#>   <dbl> <dbl>   <dbl>
#> 1  0.81  0.91   0.100

Created on 2019-08-07 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Ubuntu 16.04.5 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2019-08-07                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 3.6.1)
#>  backports     1.1.4   2019-04-10 [2] CRAN (R 3.6.1)
#>  broom         0.5.2   2019-04-07 [2] CRAN (R 3.6.1)
#>  callr         3.3.1   2019-07-18 [2] CRAN (R 3.6.1)
#>  cellranger    1.1.0   2016-07-27 [2] CRAN (R 3.6.1)
#>  cli           1.1.0   2019-03-19 [2] CRAN (R 3.6.1)
#>  colorspace    1.4-1   2019-03-18 [2] CRAN (R 3.6.1)
#>  crayon        1.3.4   2017-09-16 [2] CRAN (R 3.6.1)
#>  desc          1.2.0   2018-05-01 [2] CRAN (R 3.6.1)
#>  devtools      2.1.0   2019-07-06 [2] CRAN (R 3.6.1)
#>  digest        0.6.20  2019-07-04 [2] CRAN (R 3.6.1)
#>  dplyr       * 0.8.3   2019-07-04 [2] CRAN (R 3.6.1)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 3.6.1)
#>  fansi         0.4.0   2018-10-05 [2] CRAN (R 3.6.1)
#>  forcats     * 0.4.0   2019-02-17 [2] CRAN (R 3.6.1)
#>  fs            1.3.1   2019-05-06 [2] CRAN (R 3.6.1)
#>  generics      0.0.2   2018-11-29 [2] CRAN (R 3.6.1)
#>  ggplot2     * 3.2.0   2019-06-16 [2] CRAN (R 3.6.1)
#>  glue          1.3.1   2019-03-12 [2] CRAN (R 3.6.1)
#>  gtable        0.3.0   2019-03-25 [2] CRAN (R 3.6.1)
#>  haven         2.1.1   2019-07-04 [2] CRAN (R 3.6.1)
#>  highr         0.8     2019-03-20 [2] CRAN (R 3.6.1)
#>  hms           0.5.0   2019-07-09 [2] CRAN (R 3.6.1)
#>  htmltools     0.3.6   2017-04-28 [2] CRAN (R 3.6.1)
#>  httr          1.4.0   2018-12-11 [2] CRAN (R 3.6.1)
#>  jsonlite      1.6     2018-12-07 [2] CRAN (R 3.6.1)
#>  knitr         1.23    2019-05-18 [2] CRAN (R 3.6.1)
#>  lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
#>  lazyeval      0.2.2   2019-03-15 [2] CRAN (R 3.6.1)
#>  lubridate     1.7.4   2018-04-11 [2] CRAN (R 3.6.1)
#>  magrittr      1.5     2014-11-22 [2] CRAN (R 3.6.1)
#>  memoise       1.1.0   2017-04-21 [2] CRAN (R 3.6.1)
#>  modelr        0.1.4   2019-02-18 [2] CRAN (R 3.6.1)
#>  munsell       0.5.0   2018-06-12 [2] CRAN (R 3.6.1)
#>  nlme          3.1-140 2019-05-12 [2] CRAN (R 3.6.1)
#>  pillar        1.4.2   2019-06-29 [2] CRAN (R 3.6.1)
#>  pkgbuild      1.0.3   2019-03-20 [2] CRAN (R 3.6.1)
#>  pkgconfig     2.0.2   2018-08-16 [2] CRAN (R 3.6.1)
#>  pkgload       1.0.2   2018-10-29 [2] CRAN (R 3.6.1)
#>  prettyunits   1.0.2   2015-07-13 [2] CRAN (R 3.6.1)
#>  processx      3.4.1   2019-07-18 [2] CRAN (R 3.6.1)
#>  ps            1.3.0   2018-12-21 [2] CRAN (R 3.6.1)
#>  purrr       * 0.3.2   2019-03-15 [2] CRAN (R 3.6.1)
#>  R6            2.4.0   2019-02-14 [2] CRAN (R 3.6.1)
#>  Rcpp          1.0.2   2019-07-25 [2] CRAN (R 3.6.1)
#>  readr       * 1.3.1   2018-12-21 [2] CRAN (R 3.6.1)
#>  readxl        1.3.1   2019-03-13 [2] CRAN (R 3.6.1)
#>  remotes       2.1.0   2019-06-24 [2] CRAN (R 3.6.1)
#>  rlang         0.4.0   2019-06-25 [2] CRAN (R 3.6.1)
#>  rmarkdown     1.14    2019-07-12 [2] CRAN (R 3.6.1)
#>  rprojroot     1.3-2   2018-01-03 [2] CRAN (R 3.6.1)
#>  rvest         0.3.4   2019-05-15 [2] CRAN (R 3.6.1)
#>  scales        1.0.0   2018-08-09 [2] CRAN (R 3.6.1)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 3.6.1)
#>  stringi       1.4.3   2019-03-12 [2] CRAN (R 3.6.1)
#>  stringr     * 1.4.0   2019-02-10 [2] CRAN (R 3.6.1)
#>  testthat      2.1.1   2019-04-23 [2] CRAN (R 3.6.1)
#>  tibble      * 2.1.3   2019-06-06 [2] CRAN (R 3.6.1)
#>  tidyr       * 0.8.3   2019-03-01 [2] CRAN (R 3.6.1)
#>  tidyselect    0.2.5   2018-10-11 [2] CRAN (R 3.6.1)
#>  tidyverse   * 1.2.1   2017-11-14 [2] CRAN (R 3.6.1)
#>  usethis       1.5.1   2019-07-04 [2] CRAN (R 3.6.1)
#>  utf8          1.1.4   2018-05-24 [2] CRAN (R 3.6.1)
#>  vctrs         0.2.0   2019-07-05 [2] CRAN (R 3.6.1)
#>  withr         2.1.2   2018-03-15 [2] CRAN (R 3.6.1)
#>  xfun          0.8     2019-06-25 [2] CRAN (R 3.6.1)
#>  xml2          1.2.0   2018-01-24 [2] CRAN (R 3.6.1)
#>  yaml          2.2.0   2018-07-25 [2] CRAN (R 3.6.1)
#>  zeallot       0.1.0   2018-01-28 [2] CRAN (R 3.6.1)
#> 
#> [1] /home/rots/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /opt/R/3.6.1/lib/R/library

Floating-point arithmetic is always tricky, and I think that is exactly what is happening here. Since computers store most decimal fractions only approximately in binary floating point, 0.91 - 0.81 is not exactly 0.1, but can be something like 0.10000000000001.

I would bet that this is what is happening here, and I don't think there is a ready-made solution for avoiding situations like these.
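
To make the point above concrete, here is a minimal base R sketch (no tibble involved). It only assumes standard IEEE double arithmetic, which is what R uses on common platforms; the variable name `d` is just for illustration.

```r
# The stored difference is close to, but not exactly, 0.1
d <- 0.91 - 0.81

d == 0.1                    # FALSE: exact comparison fails
isTRUE(all.equal(d, 0.1))   # TRUE: equal within numerical tolerance

# Print 17 significant digits to see the values that are actually stored
sprintf("%.17g", d)
sprintf("%.17g", 0.1)
```

This is why `all.equal()` is usually preferred over `==` when comparing results of decimal arithmetic.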


Thank you for this, much appreciated.
Why is the number of trailing zeros / significant digits presented in a tibble a floating point issue?

Because as far as computers are concerned, most decimal fractions cannot be stored exactly. They are stored in binary with some small error. As to why tibble specifically is so careful about printing those significant digits I can't say, but you can look a bit deeper if you set pillar's significant digits to a high number:

library(tibble)
library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibbledata = tibble(x = 0.81, y = 0.91)

res <- tibbledata %>% 
  mutate(y-x)

options(pillar.sigfig = 20)

res
#> # A tibble: 1 x 3
#>                          x     y                  `y - x`
#>                      <dbl> <dbl>                    <dbl>
#> 1         0.81000000000000  0.91      0.09999999999999998

Created on 2019-08-13 by the reprex package (v0.3.0)

As you can see, the reality is even freakier than the initial example.

There has been a lot of discussion on default rounding rules for tibbles; is this not just another example?
I don't think the current behaviour makes sense.
We are simply trying to teach good data science principles around rounding.
It is difficult to explain this behaviour to people.
I don't see it as a floating-point issue:

library(tibble)
library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tibbledata = tibble(x = c(0.81, 0.71), y = c(0.91, 0.91))

res <- tibbledata %>% 
  mutate(y-x)
res
#> # A tibble: 2 x 3
#>       x     y `y - x`
#>   <dbl> <dbl>   <dbl>
#> 1  0.81  0.91   0.100
#> 2  0.71  0.91   0.2
options(pillar.sigfig = 20)
res
#> # A tibble: 2 x 3
#>                          x     y                  `y - x`
#>                      <dbl> <dbl>                    <dbl>
#> 1         0.81000000000000  0.91      0.09999999999999998
#> 2         0.71              0.91      0.20000000000000008

Created on 2019-08-13 by the reprex package (v0.3.0)

If you think that this is a bug, you can always file an issue on GitHub. Maybe there is another answer that I don't know about.

Fundamentally, I agree with you that 0.100 and 0.1 are not the same and it's misleading to print 0.100. However, I've explained the underlying reason. It is not unique to R in any way, it's just how computers work - https://0.30000000000000004.com/.
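
One possible workaround, sketched here under the assumption that you know how many decimals the inputs carry (2 in this thread), is to round the computed difference yourself rather than changing `pillar.sigfig`. The names `raw` and `rounded` are illustrative only.

```r
# Same values as the two-row example in this thread
x <- c(0.81, 0.71)
y <- c(0.91, 0.91)

raw     <- y - x            # carries floating-point error
rounded <- round(y - x, 2)  # rounded to the 2 decimals of the inputs

raw == c(0.1, 0.2)          # FALSE FALSE
rounded == c(0.1, 0.2)      # TRUE TRUE
```

The same call can go inside the pipeline, e.g. `mutate(diff = round(y - x, 2))` (the column name `diff` is made up), which leaves `pillar.sigfig` untouched for the other columns.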


Thanks very much both for the discussion and extra examples.
What I still don't understand is why in Ewen's last example, 0.100 gets the extra zeros, whereas 0.2 in the same column doesn't?
Anyone?

I suspect this is because decimal 0.2 can be represented exactly in binary whereas decimal 0.1 requires a float in binary.

EDIT: Actually, I think my explanation is wrong, but I'll leave it up anyway.
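
For what it's worth, neither 0.1 nor 0.2 has an exact binary representation, which is consistent with the edit above. A quick base R check, offered as a sketch only; it says nothing about how pillar decides on trailing zeros:

```r
# Sums of powers of two are exact in binary floating point
0.5 + 0.25 == 0.75    # TRUE

# 0.1, 0.2 and 0.3 are all stored approximately
0.1 + 0.2 == 0.3      # FALSE

# Print more digits than the default to see the stored values
sprintf("%.20f", c(0.1, 0.2))
```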