# Using quantile function in a large data.table

Hi there,

I want to calculate the 5%, 25%, 50%, 75% and 95% quantiles (percentiles) for my data.table for every datetime (yellow). So I get five values for every time in a day.

I tried:

``````quantile(dt.data, c(.05, .25, .5, .75, .95))
``````

but I get this Error: Error in `[.data.frame`(x, i) : undefined columns selected

Maybe I have to use something like this:

``````out <- dt.data[ ,lapply(.SD, quantile, prob = c(.05, .25, .5, .75, .95), na.rm = TRUE),  by = c("date")]
``````

What's wrong?

Greetings, Christina

The error tells you the problem: you need to specify over which columns to apply the function.

You have edited the code while have been writing this, so in the absence of a reproducible example I think something like this should work:

``````out <- dt.data[ ,lapply(.SD, quantile, prob = c(.05, .25, .5, .75, .95), na.rm = TRUE),  .SDcols = grep("V", names(dt.data), value = TRUE)]
``````

There may be a shorter version of the .SDcols specification.

Hm, with your code I get a table for the v's from 1 to 31.

I need it per quantiles, so it looks like this:

In that case I suspect you need to reshape the data first via `melt()`.

We'll need a reproducible example:
FAQ: How to do a minimal reproducible example ( reprex ) for beginners - meta / Guides & FAQs - RStudio Community

Thanks, it worked with:

``````dt.data[, as.list(quantile(.SD, c(.05, .25, .5, .75, .95), na.rm = TRUE)), by c("date")]

``````

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.