Using lapply with data.table to generate several outputs at once.

skan · February 5, 2019, 7:27pm

I have a data.table with several variables (columns) and their standard errors. I'd like to calculate several things such as the weighted mean, the mean of the standard error, p-values...

Toy example:

myDT <- data.table(ID=1:8, AA=21:28, BB=31:38, CC=41:48, stdAA=(11:18)/10, stdBB=(11:18)/10, stdCC=(11:18)/10)
mycol <- c("AA", "BB", "CC")   # Select these ones.

OUTPUT1 <- myDT[,.(Variables=mycol,Estimate=lapply(mycol, function (x) weighted.mean(get(x),1/(get(paste0("std",x)))^2 ) ),
   Std.Error=lapply(mycol, function (x) mean(get(paste0("std",x)))))]  
OUTPUT1[,tvalue := as.numeric(Estimate)/as.numeric(Std.Error) ][,pvalue := 2*pt(-abs(tvalue),df=7)]

As you see I've used "as.numeric" because otherwise I'm getting the error "non-numeric argument to binary operator",
I guess the problem is I the data.table contains lists instead of vectors.
I would like to get the same but simpler, inserting everything inside the same lapply.

How can I do it?

The following code doesn't produce the expected output because it inserts a vector on each element of the second column, they I can't further work with it.

OUTPUT2 <- myDT[,.(mycol,lapply(mycol, function (x) 
  c(weighted.mean(get(x),1/(get(paste0("std",x)))^2 ),
   mean(get(paste0("std",x))) )))]

Maybe I need to add some kind of rbindlist but I haven't be able to do it properly.

My next try works though it's not shorter. (And I haven't event added columnames because they get removed, I need to add them later)

OUTPUT3 <- data.table(mycol,t(myDT[,c(lapply(mycol, function (x) .(weighted.mean(get(x),1/(get(paste0("std",x)))^2 ), mean(get(paste0("std",x))))))]))
OUTPUT3[,tvalue := as.numeric(V1)/as.numeric(V2) ][,pvalue := 2*pt(-abs(tvalue),df=7)]

It would be great to have all calculations inside a single lapply.

skan · February 5, 2019, 7:51pm

I'm sorry but I have unintentionally undo the modifications you have made to the format and I don't know how to go back.

martin.R · February 5, 2019, 8:09pm

I changed the code in blockquotes to formatted code blocks (using the </> button), which is how code is supposed to be presented. This will make it more likely that somebody will respond to your query..

nwerth · February 5, 2019, 8:11pm

The problem here is untidy data. It's a bad sign when code needs to do string manipulation of names for anything other than making them look pretty. So the first step is to clean this data.

values <- melt(myDT, "ID", mycol)
head(values)
#    ID variable value
# 1:  1       AA    21
# 2:  2       AA    22
# 3:  3       AA    23
# 4:  4       AA    24
# 5:  5       AA    25
# 6:  6       AA    26

stds <- melt(myDT, "ID", paste0("std", mycol))
setnames(stds, "value", "std")
stds[, variable := gsub("std", "", variable)]
head(stds)
#    ID variable std
# 1:  1       AA 1.1
# 2:  2       AA 1.2
# 3:  3       AA 1.3
# 4:  4       AA 1.4
# 5:  5       AA 1.5
# 6:  6       AA 1.6

cleanDT <- values[
  stds,
  on = c("ID", "variable")
]
head(cleanDT)
#    ID variable value std
# 1:  1       AA    21 1.1
# 2:  2       AA    22 1.2
# 3:  3       AA    23 1.3
# 4:  4       AA    24 1.4
# 5:  5       AA    25 1.5
# 6:  6       AA    26 1.6

Now the rest is simple.

OUTPUT1 <- cleanDT[
  ,
  list(
    Estimate  = weighted.mean(value, 1 / std^2),
    Std.Error = mean(std)
  ),
  by = list(variable)
][
  ,
  tvalue := Estimate / Std.Error
][
  ,
  pvalue := 2 * pt(-abs(tvalue), df=7)
]

OUTPUT1
#    variable Estimate Std.Error   tvalue       pvalue
# 1:       AA 23.76617      1.45 16.39046 7.671357e-07
# 2:       BB 33.76617      1.45 23.28702 6.833554e-08
# 3:       CC 43.76617      1.45 30.18357 1.129896e-08

skan · February 5, 2019, 8:55pm

It works, thanks, though I would like to know how to do it without first reshaping the data.
How to solve the "nested lists" problem easily.

system · February 26, 2019, 8:56pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.