Arithmetics with extremely small numbers

Hi there!

I am facing a big issue in computing the cumsum() of a vector. The vector has a length of ~ 10,000 elements and from element say 2000 the values go down to 1e-310. To give a feeling of the distribution I am dealing with, here is a plot.

When I try to apply cumsum() I get lots of ones, which is impossible, and a minimum value around 10^-2. I searched over stack overflow and found these two posts:

  1. https://stackoverflow.com/questions/51490022/r-number-precision-how-to-prevent-rounding
  2. https://stackoverflow.com/questions/2287616/controlling-number-of-decimal-digits-in-print-output-in-r

None of them helped me out.
I am not even sure how to reproduce this so I am happy to share my 9137 x 2 matrix.
Looking forward to hearing from you guys!
Thanks

Welcome to our community!

As it is mentioned in the question of the 1st link you posted, it's just a result of numeric precision of R. I didn't know about the Rmpfr package, but it seems very useful. What are the problems you're facing using it?

It'll be better to express these problems in form of a reproducible example. In case you don't know how to make one, here's a great post:

Just out of curiosity, why is the dimension 9137\times2? You've 9137 observations I guess, but why 2, and not 1?

This is a general issue, not really R specific. Usually, you solve it by applying logarithm, summing the results, and then converting the final result back. Is it possible to do something like this in your case?

2 Likes

Hi, just the stupid comment of the day, focused also to others that could say why is not good/worth etc...

What about splitting the numbers to characters, split them by character, row bind onto a matrix, do cumsum by column and collate back onto a very long string? It would be a huge matrix to work with, but no accumulation of rounding errors due to R.. no?
Also, each cumsum should be made a string, and split, to send the bigger numbers than 9 to the next column

Simply put, I use the function mpfr() to define the precision of my initial matrix with no luck. Even if I put 200 digits this does not affect my final result whatsoever.

To answer your question, I need the second column for later operations. So in principle, this could a vector per se. It's just handy to have it stored in a matrix.

Here is a minimal for you:

y <- sample( BestPair, 100 )
dput( y )

c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
3.66277772043574e-304)

It's the first time I noticed dput on this community.

I think mpfr works. You should try higher precision, more than 320 for this sample. Check below.

library(Rmpfr)
#> Loading required package: gmp
#> 
#> Attaching package: 'gmp'
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, crossprod, matrix, tcrossprod
#> C code of R package 'Rmpfr': GMP using 64 bits per limb
#> 
#> Attaching package: 'Rmpfr'
#> The following object is masked from 'package:gmp':
#> 
#>     outer
#> The following objects are masked from 'package:stats':
#> 
#>     dbinom, dnorm, dpois, pnorm
#> The following objects are masked from 'package:base':
#> 
#>     cbind, pmax, pmin, rbind

y <- c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
       2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
       1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
       8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
       0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
       2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
       0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
       8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
       1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
       3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
       1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
       0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
       8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
       4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
       9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
       3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
       2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
       1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
       3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
       1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
       5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
       0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
       0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
       8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
       6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
       3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
       0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
       1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
       6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
       6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
       7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
       4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
       3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
       3.66277772043574e-304)

min(y)
#> [1] 7.304761e-320

z <- mpfr(x = y,
          precBits = 325)

t <- cumsum(x = z)

tail(x = t)
#> 6 'mpfr' numbers of precision  325   bits 
#> [1] 0.013264632965598830110880157473474127982626669108867645263671875
#> [2] 0.013264632965598830110880157473474127982626669108867645263671875
#> [3] 0.013264632965598830110880157473474127982626669108867645263671875
#> [4] 0.013264632965598830110880157473474127982626669108867645263671875
#> [5] 0.013264632965598830110880157473474127982626669108867645263671875
#> [6] 0.013264632965598830110880157473474127982626669108867645263671875

tail(x = y)
#> [1] 2.163099e-295 7.552486e-314 3.163655e-310 1.969615e-305 3.211254e-318
#> [6] 3.662778e-304

Created on 2019-03-20 by the reprex package (v0.2.1)

Thanks for the support. I tried with logarithm but no luck with that.

I already tried with mpfr() as well but I don't think it's working. Since my vector is a probability, I am expecting to get only one element with value 1. This is not the case, actually I get most of the elements with value 1.

Regarding your example, I think the same is happening to you. You have the very same number for the first 6 elements. If you did a plot of t you would probably see an almost flat distribution like the following one, which is wrong.

Am I explaining myself? What is it that I don't understand here?

Just to be clear, this is what I am getting out of Matlab. This represents the Cumulative sum of my vector and as you can see it works.

No, you're right and I was clearly wrong.

I tried with some higher values and by trial and error, 1053 was 1st precision value to generate 100 distinct t value. So, you'll have to figure what works for the total dataset.

library(Rmpfr)
#> Loading required package: gmp
#> 
#> Attaching package: 'gmp'
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, crossprod, matrix, tcrossprod
#> C code of R package 'Rmpfr': GMP using 64 bits per limb
#> 
#> Attaching package: 'Rmpfr'
#> The following object is masked from 'package:gmp':
#> 
#>     outer
#> The following objects are masked from 'package:stats':
#> 
#>     dbinom, dnorm, dpois, pnorm
#> The following objects are masked from 'package:base':
#> 
#>     cbind, pmax, pmin, rbind

y <- c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
       2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
       1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
       8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
       0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
       2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
       0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
       8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
       1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
       3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
       1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
       0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
       8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
       4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
       9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
       3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
       2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
       1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
       3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
       1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
       5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
       0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
       0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
       8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
       6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
       3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
       0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
       1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
       6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
       6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
       7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
       4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
       3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
       3.66277772043574e-304)

z <- mpfr(x = y,
          precBits = 1053)

t <- cumsum(x = z)

length(x = unique(x = t))
#> [1] 100

Created on 2019-03-20 by the reprex package (v0.2.1)

Again out of curiosity, how did you plot t? It's a list, and if I unlist it, R uses its own precision.

So I could try to increase the precision like crazy but that heavily affects speed and efficiency, of course. This is because it requires much more space to store the matrix and apparently, computing the cumsum on mpfr class objects is really slow. This operation has to be done hundreds of times so that is why I am skeptical about this.

Well, that is the plot coming out without applying mpfr(). To check that this was doing something wrong I just inspected the first elements.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.