Vectorisation or numpy-like variable in R to speed up calculation

karo · July 9, 2020, 1:16pm

I have a y variable that contains 160 000 columns that I am going to use in a mediation analysis. How can I speed things up? Is it possible to use vectorisation like in numpy? I would like to use these commands:

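```r
library(mediation)  # provides mediate()

model.0 <- lm(ydata ~ vlbw)          # total effect of group on the outcome
summary(model.0)
model.M <- lm(iq ~ vlbw)             # mediator model
summary(model.M)
model.Y <- lm(ydata ~ vlbw + iq)     # outcome model including the mediator
results <- mediate(model.M, model.Y, treat = "vlbw", mediator = "iq", boot = TRUE, sims = 500)
# extract_mediation_summary() is assumed to be a helper defined elsewhere
coefs <- extract_mediation_summary(summary(results))
p[vector] <- coefs[1, 4]             # `vector` is presumably the loop index
```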

Can I plug in the entire y variable as the dependent variable? What data format can I use for y?

nirgrahamuk · July 9, 2020, 2:52pm

R is inherently vectorised.
Has your code been particularly slow?
Your question reads as though you are anticipating problems before encountering them.
I found it confusing that you say you have a variable with 160000 columns. Do you mean that you have a dataframe? Or did you mean values/entries rather than columns?

karo · July 9, 2020, 4:41pm

I have 160 000 values of cortical thickness and cortical area for each person that I want to use as the dependent variable. I have read the data from FreeSurfer and it is a dataframe with these dimensions: dim(df$x)
[1] 163842 1000, and I have age, sex and group for the 1000 persons.
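
On the data format question: `lm()` expects one row per observation, so a table with vertices in rows and persons in columns would need transposing first. A minimal sketch on small simulated data, assuming `x` is an all-numeric data frame laid out as in the `dim()` output above (the sizes and names here are made up for illustration):

```r
# Simulated stand-in for the FreeSurfer table: vertices in rows, persons in columns.
# Sizes are shrunk for illustration (assumed layout, not the real data).
set.seed(1)
n_vertices <- 50
n_persons  <- 20
x <- as.data.frame(matrix(rnorm(n_vertices * n_persons), nrow = n_vertices))

# A numeric matrix is cheaper to index than a data frame,
# and t() puts persons in rows, which is what lm() expects.
Y <- t(as.matrix(x))
colnames(Y) <- paste0("v", seq_len(n_vertices))

dim(Y)  # persons x vertices
```

With the real data the same two steps would give a 1000 x 163842 numeric matrix.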

nirgrahamuk · July 9, 2020, 4:47pm

And is there some column/variable in the dataframe that signifies some treatment (or absence of treatment), and that would take the place of vlbw in the example code you shared?

karo · July 9, 2020, 4:52pm

This is neuroimaging, where I perform 160 000 statistical tests with vlbw=group and IQ. I use a for loop to go through each test. Y holds the values for 160 000 points on the left part of the brain.

nirgrahamuk · July 9, 2020, 4:54pm

OK. It's hard for me to tell what you do and don't know in R...
What's the most specific question you would like some support with relating to this issue?

karo · July 9, 2020, 6:51pm

What I basically want is this in reverse. If I have an x vector with the numbers 1:16000,
I can do y = x**2 in Python (with numpy). How can I apply lm to all of y using single instruction, multiple data, which is also done in Keras?
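
One way to fit the same regression for every column at once: `lm()` accepts a matrix on the left-hand side and fits one least-squares model per column in a single call. A minimal sketch on simulated data, assuming `Y` has persons in rows and vertices in columns (the sizes, names and effect sizes are made up):

```r
set.seed(42)
n <- 100                      # persons
p <- 200                      # vertices (the real data has about 160 000)
vlbw <- rbinom(n, 1, 0.5)     # simulated group indicator
iq   <- 100 - 5 * vlbw + rnorm(n, sd = 10)            # simulated mediator
Y    <- matrix(rnorm(n * p), nrow = n) + 0.02 * iq    # simulated outcomes

# One call fits all p regressions; the result has class "mlm".
fit_all <- lm(Y ~ vlbw + iq)
B <- coef(fit_all)            # 3 x p matrix: rows are (Intercept), vlbw, iq

# Check the first column against an ordinary single-response fit.
fit_one <- lm(Y[, 1] ~ vlbw + iq)
all.equal(unname(B[, 1]), unname(coef(fit_one)))
```

This only speeds up the regression step. Whether `mediate()` accepts such a multi-response fit is something to check before relying on it.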

nirgrahamuk · July 10, 2020, 8:49am

```r
(ints_to_10 <- 1:10)
(sqrd_result <- ints_to_10^2)
```

?

nirgrahamuk · July 10, 2020, 9:04am

What makes the data multiple rather than singular? I will assume you mean one dataframe with some grouping variable, and that you want a list of models.

This sort of pattern can be used.

```r
library(tidyverse)
library(broom)
options(tibble.print_min = 25)

# list of coefficients
(coeffs <- 1:10)

# ten groups; group g<k> follows y = k * xvals + 1
(example_data <-
  tibble(
    xvals = rep(coeffs, 10),
    coeff = rep(coeffs, each = 10)
  ) %>%
  mutate(y = coeff * xvals + 1,
         group = paste0("g", coeff)) %>%
  select(-coeff))

# make a list of models, one for each group
list_of_lms <- map(unique(example_data$group),
                   ~ lm(y ~ xvals, data = filter(example_data, group == .x)))

# one row of model-level statistics per model
(analyse_lms <- map_dfr(list_of_lms, glance))
```
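
For the per-vertex mediation loop described earlier in the thread, the point estimates can also be computed without a loop. For linear models with no treatment-mediator interaction, the indirect effect is commonly estimated as the product of the group-to-mediator coefficient and the mediator-to-outcome coefficient. The sketch below does only that on simulated data. It is not what `mediate()` does with `boot = TRUE`, and all names, sizes and effect sizes are made up:

```r
set.seed(7)
n <- 200                      # persons
p <- 50                       # vertices, shrunk for illustration
vlbw <- rbinom(n, 1, 0.5)     # simulated group indicator
iq   <- 100 - 8 * vlbw + rnorm(n, sd = 10)            # simulated mediator
Y    <- matrix(rnorm(n * p), nrow = n) + 0.03 * iq    # simulated outcomes

# Path a: effect of group on the mediator (one model, shared by all vertices).
a <- unname(coef(lm(iq ~ vlbw))["vlbw"])

# Path b: effect of the mediator on every vertex, adjusting for group.
b <- coef(lm(Y ~ vlbw + iq))["iq", ]

# Product-of-coefficients estimate of the indirect effect, one value per vertex.
indirect <- a * b
head(indirect)
```

Confidence intervals or p-values would still need a bootstrap or a similar procedure, which this sketch leaves out.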
