Optimize code for scatter plot generation in R

The executable code below generates a scatter plot for a chosen date (date2). It also draws three horizontal lines: the mean, the mean plus one standard deviation, and the mean minus one standard deviation. These statistics are computed for the day of the week (Week) that is selected.

As you can see, I use the index i to pick the day of the week from the vector DS before computing the mean and standard deviation. I would like to optimize this: when the user chooses a date, the code should work out which day of the week it is, so that i is no longer needed.

For example, when I generate the scatter plot for 10/04/2021, the code should know that this date is a Saturday without my having to set i to 3.
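
A minimal sketch of such a lookup, using only base R, could look like the following. The helper name weekday_name is hypothetical and not part of the code in this post. It assumes the date is a Date or a "YYYY-MM-DD" string, and that the Week column stores English day names as in DS. Indexing a fixed vector of names avoids depending on the session locale, which weekdays() would.

```r
# Hypothetical helper (not in the original code): map a date to its
# English weekday name, independent of the session locale.
weekday_name <- function(dt) {
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  # as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday
  day_names[as.POSIXlt(as.Date(dt))$wday + 1]
}

weekday_name("2021-04-10")  # "Saturday"
```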

Can you help me with this question?

The link to download the database is: database_test.xlsx - Google Sheets

library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)
library(readxl)  # provides read_excel(), used below

df<-read_excel('C:/Users/Downloads/database_test1.xlsx')

df<-subset(df,df$date2<df$date1) 

dim_data<-dim(df)

day<-c(seq.Date(from = as.Date(df$date2[1]),
                to = as.Date(df$date2[dim_data[1]]),
                by = "1 day"))

df_grouped <- df %>%
  mutate(across(starts_with("date"), as.Date)) %>% 
  group_by(date2) %>% 
  summarise(Id = first(Id),
            date1 = first(date1),
            Week = first(Week),
            D = first(D),
            D1 = sum(D1)) %>% 
  select(Id,date1,date2,Week,D,D1)

df_grouped <- df_grouped %>% mutate(date1=format(date1,"%d/%m/%Y"),
                                    date2=format(date2,"%d/%m/%Y"))
df_grouped<-data.frame(df_grouped)

DS=c("Thursday","Friday","Saturday") 

i<-3
df_OC<-subset(df_grouped,is.na(D)) 
ds_OC<-subset(df_OC,df_OC$Week==DS[i])

#Mean and Standard Deviation
mean_Week<-mean(as.numeric(ds_OC[,"D1"]) )
sdeviation_Week<-sd(as.numeric(ds_OC[,"D1"]))

#create scatter plot
scatter_date <- function(dt, dta = df) {
  dta %>%
    filter(date2 == ymd(dt)) %>%
    summarize(across(starts_with("DR"), sum)) %>%
    pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
    mutate(name = as.numeric(name)) %>%
    plot(xlab = "Days", ylab = "Types", xlim = c(0, 7),
         ylim = c((min(.$val) %/% 10) * 10, (max(.$val) %/% 10 + 1) * 15))
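  # Reference lines: mean and mean +/- one standard deviation for the chosen weekday
  abline(h = mean_Week, col = 'blue')
  abline(h = (mean_Week + sdeviation_Week), col = 'green', lty = 2)
  abline(h = (mean_Week - sdeviation_Week), col = 'orange', lty = 2)
}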

scatter_date("2021-04-10",df)
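
One possible way to remove the manual index is to compute the weekday statistics from the date itself. The sketch below is an assumption-laden illustration, not a tested change to the code above: week_stats is a hypothetical helper, toy is made-up data standing in for df_grouped, and the Week column is assumed to hold English day names as in DS.

```r
# Toy stand-in for df_grouped (values are made up for illustration only)
toy <- data.frame(
  date2 = as.Date(c("2021-04-01", "2021-04-02", "2021-04-03",
                    "2021-04-08", "2021-04-09", "2021-04-10")),
  Week  = c("Thursday", "Friday", "Saturday",
            "Thursday", "Friday", "Saturday"),
  D     = NA,
  D1    = c(10, 20, 30, 14, 26, 34)
)

# Hypothetical helper: mean and standard deviation of D1 for the weekday
# that the date `dt` falls on, using only rows where D is missing
week_stats <- function(dt, dta) {
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  wk <- day_names[as.POSIXlt(as.Date(dt))$wday + 1]
  vals <- as.numeric(dta$D1[is.na(dta$D) & dta$Week == wk])
  list(week = wk, mean = mean(vals), sd = sd(vals))
}

stats <- week_stats("2021-04-10", toy)
stats$week  # "Saturday"
stats$mean  # 32
```

Inside scatter_date, the same idea would mean calling something like week_stats(dt, df_grouped) and passing stats$mean and stats$sd to the abline() calls, instead of reading the global mean_Week and sdeviation_Week.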
