Using quantile function in a large data.table

Hi there,

I want to calculate the 5%, 25%, 50%, 75% and 95% quantiles (percentiles) for my data.table for every datetime (yellow). So I get five values for every time in a day.

I tried:

quantile(dt.data, c(.05, .25, .5, .75, .95))

but I get this Error: Error in [.data.frame(x, i) : undefined columns selected

Maybe I have to use something like this:

out <- dt.data[ ,lapply(.SD, quantile, prob = c(.05, .25, .5, .75, .95), na.rm = TRUE),  by = c("date")]

What's wrong?

Greetings, Christina

See https://stackoverflow.com/questions/16846380/apply-a-function-to-every-specified-column-in-a-data-table-and-update-by-referen . Maybe this works.

The error tells you the problem: you need to specify over which columns to apply the function.

You have edited the code while have been writing this, so in the absence of a reproducible example I think something like this should work:

out <- dt.data[ ,lapply(.SD, quantile, prob = c(.05, .25, .5, .75, .95), na.rm = TRUE),  .SDcols = grep("V", names(dt.data), value = TRUE)]

There may be a shorter version of the .SDcols specification.

Hm, with your code I get a table for the v's from 1 to 31.

I need it per quantiles, so it looks like this:

image

In that case I suspect you need to reshape the data first via melt().

We'll need a reproducible example:
FAQ: How to do a minimal reproducible example ( reprex ) for beginners - meta / Guides & FAQs - RStudio Community

Thanks, it worked with:

dt.data[, as.list(quantile(.SD, c(.05, .25, .5, .75, .95), na.rm = TRUE)), by c("date")]

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.