Vectorisation or a numpy-like variable in R to speed up calculation

I have a y variable that contains 160 000 columns that I am going to use in a mediation analysis. How can I speed things up? Is it possible to use vectorisation like in numpy with these commands?

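'''
library(mediation)  # provides mediate()

model.0 <- lm(ydata ~ vlbw)        # total effect of vlbw on the outcome
summary(model.0)
model.M <- lm(iq ~ vlbw)           # mediator model
summary(model.M)
model.Y <- lm(ydata ~ vlbw + iq)   # outcome model including the mediator
results <- mediate(model.M, model.Y, treat = 'vlbw', mediator = 'iq', boot = TRUE, sims = 500)
# extract_mediation_summary() is a helper defined elsewhere in my script
coefs <- extract_mediation_summary(summary(results))
p[vector] <- coefs[1, 4]           # store the value for the current index `vector`
'''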

Can I plug in the entire y variable as the dependent variable? What data format can I use for y?

R is inherently vectorised.
Has your code been particularly slow?
Your question reads as though you are anticipating problems before encountering them.
I found it confusing that you say you have a variable with 160000 columns. Do you mean that you have a dataframe? Or did you mean values/entries rather than columns?
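
For reference, a quick way to tell the two cases apart. This is only a sketch with toy objects (the names `values` and `wide` are made up, not your data):

```r
# Toy objects, for illustration only
values <- c(2.1, 2.4, 1.9, 2.8, 3.0, 2.2)              # plain vector: 6 values, no columns
wide   <- as.data.frame(matrix(values, nrow = 2))      # data frame: 2 rows, 3 columns

length(values)        # number of values in the vector
is.null(dim(values))  # TRUE for a plain vector, which has no rows or columns
dim(wide)             # rows, then columns
ncol(wide)            # number of columns only
```

Running `dim()` and `str()` on your own object would show which of the two you have.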

I have 160 000 values for cortical thickness and cortical area for each person that I wish to use as the dependent variable. I have read the data from FreeSurfer and it is a data frame with these dimensions: dim(df$x)
[1] 163842 1000, and I have age, sex and group for the 1000 persons.

And is there some column/variable in the data frame that is a signifier of some treatment (or absence of treatment), that would take the place of vlbw in the example code you shared?

This is neuroimaging, where I perform 160 000 statistical tests with vlbw = group and IQ. I use a for loop to go through each test. Y holds the values for 160 000 points on the left part of the brain.

OK. It's hard for me to tell what you do and don't know in R...
What's the most specific question you would like some support with relating to this issue?

Basically, what I want to do is the reverse. If I have an x vector with the numbers 1:16000,
I can do this in Python with numpy: y = x**2. How can I apply lm to all of y using single instruction, multiple data, which is also done in Keras?

(ints_to_10 <- 1:10)
(sqrd_result <- ints_to_10 ^2 )

Is this what you mean?

What makes the data multiple rather than singular? I will assume you mean one data frame with some grouping variable, and that you want a list of models.

This sort of pattern can be used.

library(tidyverse)
options(tibble.print_min = 25)
# list of coefficients 
(coeffs <- 1:10)

(example_data <- 
  tibble(
    xvals = rep(coeffs,10),
    coeff = map(coeffs,~rep(.,10)) %>% unlist
  ) %>% mutate(y=coeff*xvals + 1,
               group=paste0("g",coeff)) %>% select(-coeff))


#make a list of models, one for each group

list_of_lms <- map(unique(example_data$group),
    ~ lm(y~xvals,data=filter(example_data,group==.)))

library(broom)
(analyse_lms <- map_dfr(list_of_lms,
                       ~glance(.)) )
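
If the outcomes are instead the columns of one numeric matrix (as with the 163842 x 1000 FreeSurfer data described above, transposed so that rows are persons), another option is that lm() accepts a matrix on the left-hand side and fits every column against the same predictors in one call. A minimal sketch with simulated data: the sizes and all values are made up, and only the names vlbw, iq and ydata come from this thread.

```r
set.seed(1)
n_subjects <- 50    # hypothetical; the thread has 1000 persons
n_vertices <- 200   # hypothetical; the thread has 163842 points

# Simulated predictors (assumed: vlbw is a 0/1 group indicator)
vlbw <- rep(c(0, 1), length.out = n_subjects)
iq   <- 100 - 5 * vlbw + rnorm(n_subjects, sd = 10)

# Simulated outcomes in the thread's layout: rows = points, columns = persons
vertex_by_person <- matrix(rnorm(n_vertices * n_subjects), nrow = n_vertices)

# lm() needs rows = persons, so transpose first
ydata <- t(vertex_by_person)

# One call fits all columns; the result has class "mlm"
fit_all  <- lm(ydata ~ vlbw + iq)
coef_all <- coef(fit_all)   # 3 rows (intercept, vlbw, iq), one column per point

# Sanity check: column 1 matches a separate single-outcome fit
fit_one <- lm(ydata[, 1] ~ vlbw + iq)
all.equal(unname(coef_all[, 1]), unname(coef(fit_one)))
```

This only covers the regression step. As far as I know, mediate() expects one outcome model at a time, so the bootstrapped mediation itself would still need a loop (or a map as above) over the columns.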