Decrease processing time using data.table in R

I would like some help with decreasing the processing time of the code below. In this example it takes 0.28 seconds, but I have a much larger version of this code that takes much longer. I originally used pivot_longer to calculate adjusted1, and after switching to data.table it improved considerably. However, I would like to know whether there is any way to improve it further, for example by keeping the adjusted2 output in data.table as well.

library(dplyr)
library(tidyr)
library(lubridate)
library(data.table)
library(tictoc)

# sample database
df1 <- data.frame(
  date1 = as.Date("2021-12-01"),
  date2 = rep(seq(as.Date("2021-01-01"), length.out = 27500, by = 1), each = 2),
  Category = rep(c("ABC", "EFG"), length.out = 55000),
  Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
               "Saturday", "Sunday"), length.out = 55000),
  DR1 = sample(200:250, 55000, replace = TRUE),
  setNames(replicate(365, sample(0:55000, 55000), simplify = FALSE),
           paste0("DRM", formatC(1:365, width = 2, format = "d", flag = "0"))))
df1 <- as.data.table(df1)

dmda <- "2021-12-10"
code <- "ABC"

tic()        

adjusted1 <- melt(
  df1[date2 == dmda & Category == code][
    , lapply(.SD, sum, na.rm = TRUE), by = Category, .SDcols = patterns("^DRM")],
  id.vars = "Category", variable.name = "name", value.name = "val"
)[, name := readr::parse_number(as.character(name))][]
colnames(adjusted1)[-1] <- c("days", "numbers")

adjusted2 <- adjusted1 %>%
  group_by(Category) %>%
  slice((ymd(dmda) - min(as.Date(df1$date1)[df1$Category == first(Category)])):max(days) + 1) %>%
  ungroup() %>%
  data.frame()


if (any(table(adjusted2$numbers) >= 3) & length(unique(adjusted2$numbers)) == 1) {
  # constant series: use the single repeated value
  yz  <- unique(adjusted2$numbers)
  var <- as.numeric(yz)
} else {
  # otherwise regress numbers on days^2 and keep the intercept, floored at 0
  model <- lm(numbers ~ I(days^2), adjusted2)
  coef  <- max(coef(model)[1], 0)
  var   <- as.numeric(coef)
}

toc()
#> 0.28 sec elapsed

{profvis} will be helpful in determining where you should look to optimize your code. I just ran profvis() on your code, and it looks like constructing your data frame at the top, specifically the sample() calls, takes the bulk of the time. Are you trying to speed that up too, or is that just sample data used to replicate the wrangling code below it?
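
Roughly something like this, just as a sketch: wrap only the wrangling steps in profvis(), so the construction of the sample data above stays out of the measurement.

library(profvis)

profvis({
  adjusted1 <- melt(
    df1[date2 == dmda & Category == code][
      , lapply(.SD, sum, na.rm = TRUE), by = Category, .SDcols = patterns("^DRM")],
    id.vars = "Category", variable.name = "name", value.name = "val"
  )[, name := readr::parse_number(as.character(name))][]
  colnames(adjusted1)[-1] <- c("days", "numbers")

  # ... the adjusted2 and lm() steps from the post go here as well ...
})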

Thanks @michaelbgarcia!

It's just a sample to replicate the code, but I'm using a real database with specifications similar to the one I presented here. The idea is to reduce the computational time a little more. I've already managed to use data.table for adjusted1, which helped. Could I do something along the same lines for adjusted2?

I think it would be helpful to run the profiler against your real data to get a better sense. This sample data is so small that I feel like we are splitting hairs. If the data are large enough, converting to data.table will help in adjusted2 (or just plug in {dtplyr}, per my previous post to you :wink:).
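
For adjusted2 itself, a data.table version could look roughly like this. It's just a sketch, assuming adjusted1 only contains the single Category you filter on above (as in your code); start_gap is simply a name I'm introducing for the date offset.

# gap in days between dmda and the earliest date1 for the chosen Category
start_gap <- as.numeric(ymd(dmda) - min(df1[Category == code, date1]))

# same row selection as the dplyr slice(), but kept in data.table;
# out-of-range indices are dropped to mimic slice()'s behaviour
adjusted2 <- adjusted1[, {
  idx <- (start_gap:max(days)) + 1
  .SD[idx[idx >= 1 & idx <= .N]]
}, by = Category]

Whether that actually beats the dplyr version will only show up on the real data, which is why I'd profile that first.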
