Using lapply with data.table to generate several outputs at once.

I have a data.table with several variables (columns) and their standard errors. I'd like to calculate several things such as the weighted mean, the mean of the standard error, p-values...

Toy example:

myDT <- data.table(ID=1:8, AA=21:28, BB=31:38, CC=41:48, stdAA=(11:18)/10, stdBB=(11:18)/10, stdCC=(11:18)/10)
mycol <- c("AA", "BB", "CC")   # Select these ones.
OUTPUT1 <- myDT[,.(Variables=mycol,Estimate=lapply(mycol, function (x) weighted.mean(get(x),1/(get(paste0("std",x)))^2 ) ),
   Std.Error=lapply(mycol, function (x) mean(get(paste0("std",x)))))]  
OUTPUT1[,tvalue := as.numeric(Estimate)/as.numeric(Std.Error) ][,pvalue := 2*pt(-abs(tvalue),df=7)]

As you see I've used "as.numeric" because otherwise I'm getting the error "non-numeric argument to binary operator",
I guess the problem is I the data.table contains lists instead of vectors.
I would like to get the same but simpler, inserting everything inside the same lapply.

How can I do it?

The following code doesn't produce the expected output because it inserts a vector on each element of the second column, they I can't further work with it.

OUTPUT2 <- myDT[,.(mycol,lapply(mycol, function (x) 
  c(weighted.mean(get(x),1/(get(paste0("std",x)))^2 ),
   mean(get(paste0("std",x))) )))]

Maybe I need to add some kind of rbindlist but I haven't be able to do it properly.

My next try works though it's not shorter. (And I haven't event added columnames because they get removed, I need to add them later)

OUTPUT3 <- data.table(mycol,t(myDT[,c(lapply(mycol, function (x) .(weighted.mean(get(x),1/(get(paste0("std",x)))^2 ), mean(get(paste0("std",x))))))]))
OUTPUT3[,tvalue := as.numeric(V1)/as.numeric(V2) ][,pvalue := 2*pt(-abs(tvalue),df=7)]

It would be great to have all calculations inside a single lapply.

I'm sorry but I have unintentionally undo the modifications you have made to the format and I don't know how to go back.

I changed the code in blockquotes to formatted code blocks (using the </> button), which is how code is supposed to be presented. This will make it more likely that somebody will respond to your query..

The problem here is untidy data. It's a bad sign when code needs to do string manipulation of names for anything other than making them look pretty. So the first step is to clean this data.

values <- melt(myDT, "ID", mycol)
head(values)
#    ID variable value
# 1:  1       AA    21
# 2:  2       AA    22
# 3:  3       AA    23
# 4:  4       AA    24
# 5:  5       AA    25
# 6:  6       AA    26

stds <- melt(myDT, "ID", paste0("std", mycol))
setnames(stds, "value", "std")
stds[, variable := gsub("std", "", variable)]
head(stds)
#    ID variable std
# 1:  1       AA 1.1
# 2:  2       AA 1.2
# 3:  3       AA 1.3
# 4:  4       AA 1.4
# 5:  5       AA 1.5
# 6:  6       AA 1.6

cleanDT <- values[
  stds,
  on = c("ID", "variable")
]
head(cleanDT)
#    ID variable value std
# 1:  1       AA    21 1.1
# 2:  2       AA    22 1.2
# 3:  3       AA    23 1.3
# 4:  4       AA    24 1.4
# 5:  5       AA    25 1.5
# 6:  6       AA    26 1.6

Now the rest is simple.

OUTPUT1 <- cleanDT[
  ,
  list(
    Estimate  = weighted.mean(value, 1 / std^2),
    Std.Error = mean(std)
  ),
  by = list(variable)
][
  ,
  tvalue := Estimate / Std.Error
][
  ,
  pvalue := 2 * pt(-abs(tvalue), df=7)
]

OUTPUT1
#    variable Estimate Std.Error   tvalue       pvalue
# 1:       AA 23.76617      1.45 16.39046 7.671357e-07
# 2:       BB 33.76617      1.45 23.28702 6.833554e-08
# 3:       CC 43.76617      1.45 30.18357 1.129896e-08
1 Like

It works, thanks, though I would like to know how to do it without first reshaping the data.
How to solve the "nested lists" problem easily.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.