Arithmetics with extremely small numbers

contefranz · March 20, 2019, 7:34am

Hi there!

I am facing a big issue in computing the cumsum() of a vector. The vector has a length of ~ 10,000 elements and from element say 2000 the values go down to 1e-310. To give a feeling of the distribution I am dealing with, here is a plot.

When I try to apply cumsum() I get lots of ones, which is impossible, and a minimum value around 10^-2. I searched over stack overflow and found these two posts:

None of them helped me out.
I am not even sure how to reproduce this so I am happy to share my 9137 x 2 matrix.
Looking forward to hearing from you guys!
Thanks

Yarnabrina · March 20, 2019, 8:02am

Welcome to our community!

As it is mentioned in the question of the 1st link you posted, it's just a result of numeric precision of R. I didn't know about the Rmpfr package, but it seems very useful. What are the problems you're facing using it?

It'll be better to express these problems in form of a reproducible example. In case you don't know how to make one, here's a great post:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Just out of curiosity, why is the dimension 9137\times2? You've 9137 observations I guess, but why 2, and not 1?

mishabalyasin · March 20, 2019, 8:13am

This is a general issue, not really R specific. Usually, you solve it by applying logarithm, summing the results, and then converting the final result back. Is it possible to do something like this in your case?

Fer · March 20, 2019, 8:23am

Hi, just the stupid comment of the day, focused also to others that could say why is not good/worth etc...

What about splitting the numbers to characters, split them by character, row bind onto a matrix, do cumsum by column and collate back onto a very long string? It would be a huge matrix to work with, but no accumulation of rounding errors due to R.. no?
Also, each cumsum should be made a string, and split, to send the bigger numbers than 9 to the next column

contefranz · March 20, 2019, 8:35am

Simply put, I use the function mpfr() to define the precision of my initial matrix with no luck. Even if I put 200 digits this does not affect my final result whatsoever.

To answer your question, I need the second column for later operations. So in principle, this could a vector per se. It's just handy to have it stored in a matrix.

Here is a minimal for you:

y <- sample( BestPair, 100 )
dput( y )

c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
3.66277772043574e-304)

Yarnabrina · March 20, 2019, 8:50am

It's the first time I noticed dput on this community.

I think mpfr works. You should try higher precision, more than 320 for this sample. Check below.

library(Rmpfr)
#> Loading required package: gmp
#> 
#> Attaching package: 'gmp'
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, crossprod, matrix, tcrossprod
#> C code of R package 'Rmpfr': GMP using 64 bits per limb
#> 
#> Attaching package: 'Rmpfr'
#> The following object is masked from 'package:gmp':
#> 
#>     outer
#> The following objects are masked from 'package:stats':
#> 
#>     dbinom, dnorm, dpois, pnorm
#> The following objects are masked from 'package:base':
#> 
#>     cbind, pmax, pmin, rbind

y <- c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
       2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
       1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
       8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
       0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
       2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
       0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
       8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
       1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
       3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
       1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
       0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
       8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
       4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
       9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
       3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
       2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
       1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
       3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
       1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
       5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
       0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
       0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
       8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
       6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
       3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
       0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
       1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
       6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
       6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
       7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
       4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
       3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
       3.66277772043574e-304)

min(y)
#> [1] 7.304761e-320

z <- mpfr(x = y,
          precBits = 325)

t <- cumsum(x = z)

tail(x = t)
#> 6 'mpfr' numbers of precision  325   bits 
#> [1] 0.013264632965598830110880157473474127982626669108867645263671875
#> [2] 0.013264632965598830110880157473474127982626669108867645263671875
#> [3] 0.013264632965598830110880157473474127982626669108867645263671875
#> [4] 0.013264632965598830110880157473474127982626669108867645263671875
#> [5] 0.013264632965598830110880157473474127982626669108867645263671875
#> [6] 0.013264632965598830110880157473474127982626669108867645263671875

tail(x = y)
#> [1] 2.163099e-295 7.552486e-314 3.163655e-310 1.969615e-305 3.211254e-318
#> [6] 3.662778e-304

^{Created on 2019-03-20 by the reprex package (v0.2.1)}

contefranz · March 20, 2019, 8:58am

Thanks for the support. I tried with logarithm but no luck with that.

I already tried with mpfr() as well but I don't think it's working. Since my vector is a probability, I am expecting to get only one element with value 1. This is not the case, actually I get most of the elements with value 1.

Regarding your example, I think the same is happening to you. You have the very same number for the first 6 elements. If you did a plot of t you would probably see an almost flat distribution like the following one, which is wrong.

Am I explaining myself? What is it that I don't understand here?

contefranz · March 20, 2019, 9:06am

Just to be clear, this is what I am getting out of Matlab. This represents the Cumulative sum of my vector and as you can see it works.

Yarnabrina · March 20, 2019, 9:07am

No, you're right and I was clearly wrong.

I tried with some higher values and by trial and error, 1053 was 1st precision value to generate 100 distinct t value. So, you'll have to figure what works for the total dataset.

library(Rmpfr)
#> Loading required package: gmp
#> 
#> Attaching package: 'gmp'
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, crossprod, matrix, tcrossprod
#> C code of R package 'Rmpfr': GMP using 64 bits per limb
#> 
#> Attaching package: 'Rmpfr'
#> The following object is masked from 'package:gmp':
#> 
#>     outer
#> The following objects are masked from 'package:stats':
#> 
#>     dbinom, dnorm, dpois, pnorm
#> The following objects are masked from 'package:base':
#> 
#>     cbind, pmax, pmin, rbind

y <- c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319, 
       2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301, 
       1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305, 
       8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608, 
       0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310, 
       2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299, 
       0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304, 
       8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311, 
       1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313, 
       3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310, 
       1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318, 
       0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306, 
       8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291, 
       4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298, 
       9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311, 
       3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311, 
       2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345, 
       1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319, 
       3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302, 
       1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313, 
       5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309, 
       0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312, 
       0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313, 
       8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311, 
       6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318, 
       3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295, 
       0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305, 
       1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894, 
       6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319, 
       6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319, 
       7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303, 
       4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314, 
       3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318, 
       3.66277772043574e-304)

z <- mpfr(x = y,
          precBits = 1053)

t <- cumsum(x = z)

length(x = unique(x = t))
#> [1] 100

^{Created on 2019-03-20 by the reprex package (v0.2.1)}

Again out of curiosity, how did you plot t? It's a list, and if I unlist it, R uses its own precision.

contefranz · March 20, 2019, 9:13am

So I could try to increase the precision like crazy but that heavily affects speed and efficiency, of course. This is because it requires much more space to store the matrix and apparently, computing the cumsum on mpfr class objects is really slow. This operation has to be done hundreds of times so that is why I am skeptical about this.

Well, that is the plot coming out without applying mpfr(). To check that this was doing something wrong I just inspected the first elements.

system · April 10, 2019, 9:13am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.