The irlba package for computing a partial SVD returns inconsistent results when using the center and scale arguments

I've been using `irlba::irlba` to compute a partial SVD of very large, sparse, one-hot encoded datasets. The advantage of irlba is that it is efficient for sparse data and lets you specify center and scale vectors without explicitly forming the centered and scaled matrix, thereby preserving sparsity. `base::svd` can't do this.
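To see why sparsity can survive, note that iterative methods such as the Lanczos bidiagonalization irlba uses only need products of the matrix with vectors. For the standardized matrix $Z = (A - \mathbf{1}\mu^\top)\,\mathrm{diag}(1/s)$, both products can be rewritten in terms of the sparse $A$ alone. The sketch below checks this identity with the Matrix package. It illustrates the idea and is not irlba's actual internals; the helper names `Zx` and `Zty` and the random sparse matrix are hypothetical.

```r
# Sketch (not irlba's internals): products with the centered and scaled
# matrix Z = (A - 1 mu^T) diag(1/s), computed from the sparse A alone.
library(Matrix)

set.seed(1)
A  <- rsparsematrix(200, 10, density = 0.3)  # hypothetical sparse data
n  <- nrow(A)
mu <- colMeans(A)                                    # column means
s  <- sqrt((colSums(A^2) - n * mu^2) / (n - 1))      # column sds, no densifying

# Z %*% x = A %*% (x / s) - sum(mu * x / s); the offset is the same for every row
Zx <- function(A, mu, s, x) {
  w <- x / s
  as.vector(A %*% w) - sum(mu * w)
}

# t(Z) %*% y = (t(A) %*% y - mu * sum(y)) / s
Zty <- function(A, mu, s, y) {
  (as.vector(crossprod(A, y)) - mu * sum(y)) / s
}

# Check against the dense standardized matrix (only feasible for small examples)
Z <- scale(as.matrix(A), center = mu, scale = s)
x <- rnorm(ncol(A))
y <- rnorm(nrow(A))
all.equal(Zx(A, mu, s, x), as.vector(Z %*% x))
all.equal(Zty(A, mu, s, y), as.vector(crossprod(Z, y)))
```

Both comparisons should return `TRUE`; only `A`, `mu` and `s` are ever touched, so `A` stays sparse.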

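The reprex below uses the four numeric columns of the iris data to show that scaling explicitly with `irlba(scale(N), ...)` produces the correct result, while passing the vectors through the arguments with `irlba(N, center = colMeans(N), scale = apply(N, 2, sd), ...)` produces an incorrect one: the leading singular values are 67.9 and 27.7 instead of 20.9 and 11.7, and the singular vectors differ as well.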
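For reference, both calls are supposed to decompose the same matrix. With $\mu$ the vector of column means, $s$ the vector of column standard deviations and $\mathbf{1}$ a column of ones, `scale(N)` computes the matrix below. The irlba documentation describes `center` as subtracted from, and `scale` as dividing, each column of the input, which gives the same matrix:

$$
\tilde{N} = (N - \mathbf{1}\mu^\top)\,D^{-1}, \qquad D = \operatorname{diag}(s_1, s_2, s_3, s_4), \qquad \tilde{N} = U \Sigma V^\top .
$$

The singular values on the diagonal of $\Sigma$ are uniquely determined by $\tilde{N}$, so `a$d` and `b$d` should match. The columns of $U$ and $V$ are only determined up to sign, so a sign flip on its own would not be a discrepancy; differing `d` values are what show that the two calls are not decomposing the same matrix.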

Is this a bug, or am I doing something wrong? Any help appreciated.

``` r
library(irlba)
N <- iris[-5]  # the four numeric columns of iris
str(N)
#> 'data.frame':    150 obs. of  4 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

# Center and scale explicitly, then take the partial SVD
a <- irlba(scale(N), nv = 2, nu = 2)
str(a)
#> List of 5
#>  $ d    : num [1:2] 20.9 11.7
#>  $ u    : num [1:150, 1:2] -0.1082 -0.0995 -0.113 -0.1099 -0.1142 ...
#>  $ v    : num [1:4, 1:2] 0.521 -0.269 0.58 0.565 -0.377 ...
#>  $ iter : num 0
#>  $ mprod: num 0
biplot(a$u, a$v)

# Pass the centering and scaling vectors to irlba instead
b <- irlba(N, center = colMeans(N), scale = apply(N, 2, sd), nv = 2, nu = 2)
str(b)
#> List of 5
#>  $ d    : num [1:2] 67.9 27.7
#>  $ u    : num [1:150, 1:2] 0.01973 -0.05465 0.06383 0.00869 0.02097 ...
#>  $ v    : num [1:4, 1:2] -0.7531 -0.1218 -0.6345 -0.1242 0.0144 ...
#>  $ iter : num 0
#>  $ mprod: num 0
biplot(b$u, b$v)
```

Created on 2019-01-08 by the reprex package (v0.2.1)
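One way to narrow this down is to compare both results against a dense `base::svd()` of the same explicitly standardized matrix, which is cheap for a 150-by-4 example. The sketch below reuses the reprex's calls unchanged. Because singular vectors are only defined up to sign, it compares absolute values; the helper `same_v` is hypothetical.

```r
library(irlba)

N   <- iris[-5]                   # same input as the reprex
ref <- svd(scale(N), nu = 2, nv = 2)  # dense reference SVD of the standardized data

a <- irlba(scale(N), nv = 2, nu = 2)
b <- irlba(N, center = colMeans(N), scale = apply(N, 2, sd), nv = 2, nu = 2)

# Singular values are unique, so all three rows should agree
rbind(reference = ref$d[1:2], explicit = a$d, arguments = b$d)

# Singular vectors are only defined up to sign, so compare magnitudes
same_v <- function(v1, v2) isTRUE(all.equal(abs(v1), abs(v2), tolerance = 1e-6))
c(explicit = same_v(ref$v, a$v), arguments = same_v(ref$v, b$v))
```

If only the `arguments` row disagrees with the reference, the discrepancy is confined to how the `center` and `scale` arguments are applied.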
