mbpca R package - multiblock PCA

Hello,

I have three biofluids in which a group of fatty acids (almost the same set in each biological matrix) has been measured, and I would like to use a dimension reduction method that takes into account that the variables come from three different blocks (= biofluids). I have already performed a simple PCA, but I am looking for a method better suited to the three-block structure. I found that the mbpca function from the mogsa package (see the mbpca documentation: https://www.rdocumentation.org/packages/mogsa/versions/1.6.4/topics/mbpca) is designed for this kind of problem and offers three different methods: consensus PCA (CPCA), generalized CCA (GCCA) and multiple co-inertia analysis (MCIA). I read the article "A framework for sequential multiblock component methods" (https://doi.org/10.1002/cem.811), but I do not fully understand the differences between the methods, nor which one is suitable for my case.
I would be grateful if someone could clarify this for me.

Thanks in advance
Aline

The example from the vignette uses microarray data for NCI-60 cell lines from different platforms. Seems analogous.

library(mogsa)
data(NCI60_4arrays)
sapply(NCI60_4arrays, dim)
#>      agilent hgu133 hgu133p2 hgu95
#> [1,]     300    298      268   288
#> [2,]      60     60       60    60
tumorType <- sapply(strsplit(colnames(NCI60_4arrays$agilent), split = "\\."), "[", 1)
colcode <- as.factor(tumorType)
levels(colcode) <- c(
  "red", "green", "blue", "cyan", "orange",
  "gray25", "brown", "gray75", "pink"
)
colcode <- as.character(colcode)
moa <- mbpca(NCI60_4arrays,
  ncomp = 10, k = "all", method = "globalScore", option = "lambda1",
  center = TRUE, scale = FALSE, moa = TRUE, svd.solver = "fast", maxiter = 1000
)
#> calculating component 1 ...
#> calculating component 2 ...
#> calculating component 3 ...
#> calculating component 4 ...
#> calculating component 5 ...
#> calculating component 6 ...
#> calculating component 7 ...
#> calculating component 8 ...
#> calculating component 9 ...
#> calculating component 10 ...

plot(moa, value = "eig", type = 2)


r <- bootMbpca(moa, mc.cores = 1, B = 20, replace = FALSE, resample = "sample")
#> method is set to 'globalScore'.


plot(moa, value = "eig", type = 2)


moas <- mbpca(NCI60_4arrays,
  ncomp = 3, k = 0.1, method = "globalScore", option = "lambda1",
  center = TRUE, scale = FALSE, moa = TRUE, svd.solver = "fast", maxiter = 1000
)
#> calculating component 1 ...
#> calculating component 2 ...
#> calculating component 3 ...
scr <- moaScore(moa)
scrs <- moaScore(moas)
diag(cor(scr[, 1:3], scrs))
#>       PC1       PC2       PC3 
#> 0.9741884 0.9889647 0.9546203
layout(matrix(1:2, 1, 2))
plot(scrs[, 1:2], col = colcode, pch = 20)
legend("topright", legend = unique(tumorType), col = unique(colcode), pch = 20)
plot(scrs[, 2:3], col = colcode, pch = 20)


gap <- moGap(moas, K.max = 12, cluster = "hcl")

layout(matrix(1, 1, 1))

hcl <- hclust(dist(scrs))
cls <- cutree(hcl, k = 4)
clsColor <- as.factor(cls)
levels(clsColor) <- c("red", "blue", "orange", "pink")
clsColor <- as.character((clsColor))
heatmap(t(scrs[hcl$order, ]), ColSideColors = colcode[hcl$order], Rowv = NA, Colv = NA)

heatmap(t(scrs[hcl$order, ]), ColSideColors = clsColor[hcl$order], Rowv = NA, Colv = NA)

genes <- moaCoef(moas)
genes$nonZeroCoef$agilent.V1.neg
#>                            id         coef
#> FGD3_agilent     FGD3_agilent -0.105242702
#> TMC6_agilent     TMC6_agilent -0.045236957
#> GMFG_agilent     GMFG_agilent -0.042502839
#> IQGAP2_agilent IQGAP2_agilent -0.001185483

Created on 2023-06-22 with reprex v2.0.2

Thank you for your help!

I tried the mbpca function as shown above, but I get an error. I think it is because my data frame is not in the right format.
The documentation says the input should be: a list of matrices or data.frames, where rows are variables and columns are samples. The columns of the matrices need to match, but the variables do not.
In my case, for now, I have a single matrix with the individuals in rows and the variables in columns; each fatty acid from each biofluid has a distinct name (see below).
Should I swap rows and columns, or is there something else to do?

I tried transposing my matrix, but the function still does not work. I get the error message: Error in svd.sol(tab) : infinite or missing values in 'x'.

Code used:
MBPCA <- mbpca(transposed_data,
  ncomp = 10, method = "globalScore", k = "all", center = TRUE, scale = TRUE,
  option = "uniform", maxiter = 1000, moa = TRUE, verbose = TRUE, svd.solver = "svd"
)
Aline
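
The error message quoted above says that NA, NaN or Inf values reached the SVD step. A minimal base R sketch for locating them before calling mbpca() follows. The toy matrix and its column names are hypothetical stand-ins for the real data, and whether mbpca() standardises exactly like base scale() is an assumption here; the point is only that a constant variable becomes NaN when divided by a standard deviation of zero, so scale = TRUE can create non-finite values even when the raw data contain none.

```r
# Toy samples-by-variables matrix (hypothetical names, not the real data)
set.seed(1)
m <- matrix(rnorm(20), nrow = 5,
  dimnames = list(
    paste0("s", 1:5),
    c("plasma_C16", "plasma_C18", "urine_C16", "urine_C18")
  )
)
m[2, 3] <- NA # one missing measurement
m[, 4] <- 1   # one constant variable

# Number of NA / NaN / Inf entries per variable
n_bad <- colSums(!is.finite(m))
n_bad[n_bad > 0]

# Variables with zero standard deviation
sds <- apply(m, 2, sd, na.rm = TRUE)
constant <- names(sds)[sds == 0]
constant

# Base R standardisation turns a constant column into NaN
scaled_has_nan <- any(is.nan(scale(m[, constant, drop = FALSE])))
scaled_has_nan
```

Variables flagged by either check would need to be imputed, filtered or dropped before the analysis; which of these is appropriate depends on the data.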

library(mogsa)
# data used in example
data(NCI60_4arrays)
# it's a list
is.list(NCI60_4arrays)
#> [1] TRUE
# of data frames
is.data.frame(NCI60_4arrays[1][[1]])
#> [1] TRUE
# like this one (60 columns trimmed to 5 for display here)
head(NCI60_4arrays[1][[1]])[1:5]
#>         BR.MCF7 BR.MDA_MB_231 BR.HS578T BR.BT_549 BR.T47D
#> ST8SIA1   -6.97         -5.51     -7.30     -6.11   -6.59
#> YWHAQ      5.19          5.83      5.12      5.64    4.40
#> EPHA4     -2.27         -2.62      1.02      2.42    0.10
#> GTPBP5     0.11          0.00     -0.03      0.19    0.26
#> PVR       -0.92          1.28      1.46      1.69    0.24
#> ATP6V1H    1.52          1.42      2.07      2.63    2.73
# names of the data frames
sapply(NCI60_4arrays,names) |> colnames()
#> [1] "agilent"  "hgu133"   "hgu133p2" "hgu95"
# dimensions of the data frames
sapply(NCI60_4arrays, dim)
#>      agilent hgu133 hgu133p2 hgu95
#> [1,]     300    298      268   288
#> [2,]      60     60       60    60

So, to proceed from a single matrix, m, try the following to create an argument, l, for mbpca():

d <- as.data.frame(m)
l <- list(d)
