Unable to set the names for each list in the list column within the data table.

Ship · February 13, 2023, 10:57am

I have a data table, as shown below.

dt <- data.table(label = c("a", "b", "c", "d", "e"), value = c(list(1), list(2), list(3), list(4), list(5)))

The "value" column is a list column, and currently, each element (which is also a list) in this column doesn't have any names. I want to use the set_names function (or any other similar function) to assign names to these lists in the "value" column.

However, when I tried using the following code, it didn't work.

dt[, value := set_names(value, label)]

I tried to use:

set_names(dt$value, dt$label)

It didn't change the names of the "value" column within the dt data table.

I also tried:

setattr(dt$value, 'names', dt$label)

which solved the previous problem, but now I have another issue: if there are other columns that are created using the "value" column, such as dt[, value2 := map(value, as.character)], these other columns are unable to retain the names of the lists in the "value" column.

So the question is: how do I set the names for each list in the list column within the data table?

nirgrahamuk · February 13, 2023, 12:11pm

It seems like you are fighting with data.table...

library(tibble)
library(data.table)

# Do data.tables support list columns with named members?
tibble(v = list("a" = 1)) |> str()
data.table(v = list("a" = 1)) |> str() # no sign of 'a'
data.table(v = list(list("a" = 1))) |> str() # can doubly embed it, but the structure is odder now

It seems you found a partial approach by way of setattr to get something going. However, I don't have any context for the programming objective you are trying to reach, so I can't speak with much confidence. If I were in your shoes, I would review the decision to use data.table, or at least to use it in quite the way you initially intended, to achieve your ultimate goal. I assume that having named list members in a column was only a step on the way towards achieving what you want, and not the true end goal in and of itself?

data.table is best for processing the tabular data it was designed for; if your data deviates from that significantly you may well have a rough road to travel.

If it's just to have an associated label for a single thing in data.table, you already had that to start with by having the related label column alongside the value column. That seems a perfectly normal and traditional approach to maintaining related facts across rows.
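A minimal sketch of that approach, assuming the names are only needed once the values leave the table: keep `label` as an ordinary column and attach it with base R's `setNames()` at the point of use. The three-row table and the variable names here are illustrative, not taken from the original post.

```r
library(data.table)

# Illustrative table: the labels live in their own column next to the list column.
dt <- data.table(label = c("a", "b", "c"), value = list(1, 2, 3))

# Attach the labels only when the list leaves the table.
named_values <- setNames(dt$value, dt$label)

# lapply() keeps names, so results derived from the named list stay labelled.
named_chars <- lapply(named_values, as.character)

names(named_values)
names(named_chars)
```

This leaves `dt` itself untouched, so nothing depends on how data.table treats names stored on a list column.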

Ship · February 14, 2023, 9:47am

Thank you for answering my questions!!

You are right. Having named list members in a column was only a step; what I want is to label the lists in these list columns.
Here are my objectives and the steps I have taken so far.

I have several datasets that I would like to perform some analysis on. To begin, I nest each dataset into one column called "data", which is preceded by a "label" column. The reason for this is that some analysis functions may return multiple outputs, and instead of creating separate columns for each output, I prefer to combine them into one column as a list. This way, I don't need to differentiate between results from different models by assigning different names, such as "model_p" or "ttest_p".

For example, if I fit multiple linear models (for multiple variables) and tidy them using the "map" function, I can extract the p-values and save all of the results in a "model_result" column. The same procedure applies for t-tests. Once I have completed this process for one dataset, I can map it onto the other datasets. If I want to extract all p-values, I can use "map(model_result, 'pvalue')". The result will be a list of p-values named after the variables (I set the names for each model in my 'my_model' function). I can do the same for t-tests.

If the lists have labels, I can directly convert them into a data table, with the list names becoming the column names. For example, "label1_model" and "label2_ttest". This way, I can plot or perform other data presentation tasks more easily.
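The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `datasets`, `fit_model`, and the split of the built-in `mtcars` data are hypothetical stand-ins for the real datasets and the 'my_model' function, and the labels are carried as list names rather than inside a data.table column.

```r
library(data.table)
library(purrr)

# Hypothetical stand-in for the nested datasets: one data frame per label.
datasets <- list(
  label1 = mtcars[mtcars$am == 0, ],
  label2 = mtcars[mtcars$am == 1, ]
)

# Hypothetical analysis function that returns several outputs as one list.
fit_model <- function(d) {
  fit <- lm(mpg ~ wt, data = d)
  list(model = fit, pvalue = summary(fit)$coefficients["wt", "Pr(>|t|)"])
}

# map() keeps the names of `datasets`, so every result stays labelled.
model_result <- map(datasets, fit_model)

# Pluck one output from every result by name; the labels are still attached.
pvalues <- map(model_result, "pvalue")

# A named list converts directly, with the list names becoming column names.
pvalue_dt <- as.data.table(pvalues)
names(pvalue_dt)
```

Because the labels travel with the list itself, the final conversion needs no separate renaming step.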

Those are all my thoughts. Thank you very much for reading this. If you can point out problems in my thinking or suggest a more standard practice, I will be very grateful. Thanks again.

nirgrahamuk · February 14, 2023, 10:09am

Well, it seems that tibbles allow for the sort of name labelling you prefer. I don't think stepping away from data.table is going to be a great loss to you, unless you have a final table with millions of records after nesting. If the tables are a way to manage a few dozen different datasets and model setups, the tibble should be plenty performant, I would think.
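A small sketch of what that could look like, assuming the tibble and purrr packages are available; the three-row example and the names `lbl`, `tb`, and `value2` are illustrative only.

```r
library(tibble)
library(purrr)

lbl <- c("a", "b", "c")

# Name the list before it goes into the tibble; the names stay on the column.
tb <- tibble(label = lbl, value = set_names(list(1, 2, 3), lbl))

# map() preserves names, so a list derived from the column is labelled too.
value2 <- map(tb$value, as.character)

names(tb$value)
names(value2)
```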
