The executable code below generates a scatter plot for the date (date2) that the user chooses. It also draws three horizontal lines, for the mean, the mean + standard deviation and the mean - standard deviation, which are computed for the chosen day of the week (Week).
As you can see, I use the variable i to select the day of the week for the mean and standard deviation. I would like to optimize this: when the user chooses the date, the code should already work out which day of the week it is, so that i is no longer needed.
For example, when I generate the scatter plot for 10/04/2021, the code should know that it is a Saturday, without my having to set i to 3.
Can you help me with this question?
The link to download the database is: database_test.xlsx - Google Sheets
library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)
library(readxl)  # provides read_excel()
df<-read_excel('C:/Users/Downloads/database_test1.xlsx')
df<-subset(df,df$date2<df$date1)
dim_data<-dim(df)
day<-c(seq.Date(from = as.Date(df$date2[1]),
                to = as.Date(df$date2[dim_data[1]]),
                by = "1 day"))
df_grouped <- df %>%
  mutate(across(starts_with("date"), as.Date)) %>%
  group_by(date2) %>%
  summarise(Id = first(Id),
            date1 = first(date1),
            Week = first(Week),
            D = first(D),
            D1 = sum(D1)) %>%
  select(Id,date1,date2,Week,D,D1)
df_grouped <- df_grouped %>% mutate(date1=format(date1,"%d/%m/%Y"),
                                    date2=format(date2,"%d/%m/%Y"))
df_grouped<-data.frame(df_grouped)
DS=c("Thursday","Friday","Saturday")
i<-3
df_OC<-subset(df_grouped,is.na(D))
ds_OC<-subset(df_OC,df_OC$Week==DS[i])
#Mean and Standard Deviation
mean_Week<-mean(as.numeric(ds_OC[,"D1"]) )
sdeviation_Week<-sd(as.numeric(ds_OC[,"D1"]))
#create scatter plot
scatter_date <- function(dt, dta = df) {
  dta %>%
    filter(date2 == ymd(dt)) %>%
    summarize(across(starts_with("DR"), sum)) %>%
    pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
    mutate(name = as.numeric(name)) %>%
    plot(xlab = "Days", ylab = "Types", xlim = c(0, 7),
         ylim = c((min(.$val) %/% 10) * 10, (max(.$val) %/% 10 + 1) * 15))
  abline(h=mean_Week, col='blue')
  abline(h=(mean_Week + sdeviation_Week), col='green',lty=2)
  abline(h=(mean_Week - sdeviation_Week), col='orange',lty=2)
}
scatter_date("2021-04-10",df)
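To illustrate the behaviour I am after, here is a minimal sketch of how the weekday could be derived from the date itself instead of from i. The helpers weekday_name() and week_stats() are hypothetical names, not part of my code above, and the toy data frame only stands in for df_grouped with made-up values. The sketch assumes that the Week column holds English day names, as in DS.

```r
# Hypothetical helper: English weekday name for a date.
# as.POSIXlt()$wday is 0 (Sunday) to 6 (Saturday), so it does not
# depend on the locale the way weekdays() does.
weekday_name <- function(dt) {
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  day_names[as.POSIXlt(as.Date(dt))$wday + 1]
}

# Hypothetical helper: mean and standard deviation of D1 for the weekday
# of `dt`, using only rows where D is NA (same filter as df_OC above).
week_stats <- function(dt, grouped) {
  day_name <- weekday_name(dt)
  rows <- grouped[is.na(grouped$D) & grouped$Week == day_name, ]
  values <- as.numeric(rows$D1)
  list(day = day_name, mean = mean(values), sd = sd(values))
}

# Toy data standing in for df_grouped (values are made up for illustration).
toy <- data.frame(
  Week = c("Saturday", "Saturday", "Friday", "Saturday"),
  D    = c(NA, NA, NA, 1),
  D1   = c(10, 20, 5, 99)
)

stats <- week_stats("2021-04-10", toy)
print(stats$day)
```

Inside scatter_date, something like week_stats(dt, df_grouped) could then supply the values passed to the three abline() calls, so that DS and i would no longer be needed.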