How to include time-delayed weather variables in a classifier?

I'm having difficulty putting an idea into practice using classifiers. I am working with data on an infectious disease: I estimate the probability of the disease occurring in a given location, and I would like to improve my current accuracy, precision and F1 results. I obtained data for this disease from one city, where cases were reported monthly for each neighborhood from 2000 to 2003. I also use climatic variables (vegetation index, precipitation and temperature). My dataset has the format (Neighborhood, Month/Year, Cases, EVI, Precipitation, Temperature). I turned the variable Cases into a categorical variable, where “Yes” marks the neighborhoods that had at least one case and “No” marks those with no reported case.
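
For reference, a minimal sketch of the binarization described above. The counts are made up, and the column name `CasesBin2` is taken from the code further down; the real `original_data` has more columns.

```r
# Hypothetical monthly case counts for a handful of neighborhood-months
original_data <- data.frame(Cases = c(0, 3, 0, 1, 0, 0))

# "Yes" when at least one case was reported, "No" otherwise.
# With classProbs = TRUE, caret needs the level names to be valid R names,
# which "Yes" and "No" are.
original_data$CasesBin2 <- factor(ifelse(original_data$Cases > 0, "Yes", "No"),
                                  levels = c("No", "Yes"))

# Class balance, useful context when reading precision and recall
class_counts <- table(original_data$CasesBin2)
print(class_counts)
```

If "Yes" turns out to be much rarer than "No", that imbalance is worth keeping in mind when interpreting the metrics.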

In my R implementation, I'm applying five classifiers (Random Forest, LDA, Decision Tree, Bayesian generalized linear models and Naive Bayes). My dependent variable is Cases and the others are independent variables. But I get very low precision, recall and F1 results (less than 30%).

Here is the part of the code I used in R:

    library(caret)   # createDataPartition, trainControl, train, confusionMatrix

    data <- dplyr::select(original_data, EVI, Precip, Temperature, Humidity, CasesBin2)
    set.seed(123)
    train <- createDataPartition(data$CasesBin2,
                                 p = 0.85, # % of data going to training
                                 times = 1,
                                 list = FALSE)
    train.orig <- data[ train,]
    test <- data[-train,]
    #A. Global options that we will use in all our trained models
    
    ctrl <- trainControl(method = "cv",
                         number = 10,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary)
    
     
    # Naive Bayes: original data
    
    nb_orig_start <- Sys.time()
    nb_orig <- train(CasesBin2 ~ .,
                     data = train.orig,
                     method = "naive_bayes",
                     trControl = ctrl,
                     metric = "ROC")
    
    nb_orig_end <- Sys.time()
    nb_orig_runtime <- nb_orig_end - nb_orig_start
    nb_orig_runtime
    
    nb_orig_train_pred <- predict(nb_orig,train.orig,type = "prob")
    nb_orig_train <- factor(ifelse(nb_orig_train_pred$Yes > 0.8,"Yes","No"))
    confusionMatrix(nb_orig_train, getElement(train.orig,'CasesBin2'), positive="Yes")
    
       
    # Random Forest: original data
    rf_orig_start <- Sys.time()
    rf_orig <- train(CasesBin2 ~ .,
                     data = train.orig,
                     method = "rf",
                     trControl = ctrl,
                     metric = "ROC")
    
    rf_orig_end <- Sys.time()
    rf_orig_runtime <- rf_orig_end - rf_orig_start
    rf_orig_runtime
    
    rf_orig_train_pred <- predict(rf_orig,train.orig,type = "prob")
    rf_orig_train <- factor(ifelse(rf_orig_train_pred$Yes > 0.8,"Yes","No"))
    confusionMatrix(rf_orig_train, getElement(train.orig,'CasesBin2'), positive="Yes")
 
 
 
    ################################################
    # Naive Bayes Model - Test on original dataset #
    ################################################
    #A. NB Model predictions
    
    nb_orig_pred_start <- Sys.time()
    nb_orig_pred <- predict(nb_orig,test,type = "prob")
    
    #B. NB - Assign class to probabilities
    
    nb_orig_test <- factor(ifelse(nb_orig_pred$Yes> 0.8,"Yes","No"))
    nb_orig_pred_end <- Sys.time()
    nb_orig_pred_runtime <- nb_orig_pred_end - nb_orig_pred_start
    nb_orig_pred_runtime
    
    confusionMatrix(nb_orig_test, getElement(test,'CasesBin2'), positive="Yes")
    
    #C. NB Save Precision/Recall/F
    
    precision_nbOrig <- posPredValue(nb_orig_test,test$CasesBin2,positive = "Yes")
    recall_nbOrig    <- sensitivity(nb_orig_test,test$CasesBin2,positive = "Yes")
    F1_nbOrig         <- (2 * precision_nbOrig * recall_nbOrig) / (recall_nbOrig + precision_nbOrig)
    
    
    ##################################################
    # Random Forest Model - Test on original dataset #
    ##################################################
    #A. RF Model predictions
    rf_orig_pred_start <- Sys.time()
    rf_orig_pred <- predict(rf_orig,test,type = "prob")
    
    #B. RF  - Assign class to probabilities
    
    rf_orig_test <- factor(ifelse(rf_orig_pred$Yes> 0.8,"Yes","No"))
    rf_orig_pred_end <- Sys.time()
    rf_orig_pred_runtime <- rf_orig_pred_end - rf_orig_pred_start
    rf_orig_pred_runtime
    
    confusionMatrix(rf_orig_test, getElement(test,'CasesBin2'), positive="Yes")
    
    #C. RF Save Precision/Recall/F
    
    precision_rfOrig <- posPredValue(rf_orig_test,test$CasesBin2,positive = "Yes")
    recall_rfOrig    <- sensitivity(rf_orig_test,test$CasesBin2,positive = "Yes")
    F1_rfOrig   <- (2 * precision_rfOrig * recall_rfOrig) / (recall_rfOrig + precision_rfOrig)

My idea for improving the results is to address two issues:

First, I think spatial relationships between neighborhoods would make a difference in my metrics. It would be interesting to define a distance metric between neighborhoods (Euclidean distance between centers, a binary variable that is 1 if two neighborhoods are adjacent and 0 if not, or some other metric) and to use as an input whether a neighboring neighborhood had dengue cases in the previous time step. However, I don't know how to do this in my classification model.
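
A minimal sketch of how such a neighbor feature could be built, under stated assumptions rather than as a definitive method. It assumes a binary adjacency matrix `adj` (hypothetical here; the real one would have to come from a map of the city), a consecutive integer month index `Month` standing in for Month/Year, and toy data. The helper `neighbor_cases_prev` and the column `NeighborCasesPrev` are illustrative names, not part of the code above.

```r
# Toy panel: 3 neighborhoods x 3 months (all values hypothetical)
panel <- data.frame(
  Neighborhood = rep(c("A", "B", "C"), each = 3),
  Month        = rep(1:3, times = 3),
  CasesBin2    = c("No", "Yes", "No",   # A
                   "No", "No",  "Yes",  # B
                   "No", "No",  "No"),  # C
  stringsAsFactors = FALSE
)

# Hypothetical adjacency matrix: 1 = the two neighborhoods share a border
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# For each row, count adjacent neighborhoods that reported a case
# in the previous month; NA when no previous month is available
neighbor_cases_prev <- function(panel, adj) {
  has_case <- panel$CasesBin2 == "Yes"
  vapply(seq_len(nrow(panel)), function(i) {
    nbrs <- names(which(adj[panel$Neighborhood[i], ] == 1))
    prev <- panel$Neighborhood %in% nbrs & panel$Month == panel$Month[i] - 1
    if (!any(prev)) NA_real_ else sum(has_case[prev])
  }, numeric(1))
}

panel$NeighborCasesPrev <- neighbor_cases_prev(panel, adj)
print(panel)
```

The new column can then be used as one more predictor in `train()`. With a distance matrix instead of a binary one, the sum could be replaced by a distance-weighted sum; that variant is not shown here.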

Second, I'm only considering static relationships in the data: I use the values of the input variables in a given month to predict cases in that same month. But I would also like to explore dynamic temporal relationships. Most likely, the number of cases at a given time depends on past climatological variables, given the disease cycle. I thought of including past weather information in the model as additional input variables, but I don't know how to do that either.
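
One possible way to build such time-delayed inputs, sketched in base R on toy data. It assumes one row per neighborhood per month with no gaps, and a consecutive integer month index `Month` standing in for Month/Year. The helpers `lag_vec` and `add_lags`, and the `_lag1`/`_lag2` column names, are illustrative, and the choice of one and two months of delay is arbitrary.

```r
# Toy data: 2 neighborhoods x 4 consecutive months (all values hypothetical)
weather <- data.frame(
  Neighborhood = rep(c("A", "B"), each = 4),
  Month        = rep(1:4, times = 2),
  Precip       = c(10, 20, 30, 40, 5, 15, 25, 35),
  Temperature  = c(25, 26, 27, 28, 24, 25, 26, 27)
)

# Shift a vector k steps back in time, padding the start with NA
lag_vec <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

# Add lagged copies of the chosen variables, computed within each group
add_lags <- function(df, vars, lags, group = "Neighborhood", time = "Month") {
  # Lags only make sense on rows ordered by time within each group
  df <- df[order(df[[group]], df[[time]]), ]
  for (v in vars) {
    for (k in lags) {
      df[[paste0(v, "_lag", k)]] <- ave(df[[v]], df[[group]],
                                        FUN = function(x) lag_vec(x, k))
    }
  }
  df
}

lagged <- add_lags(weather, vars = c("Precip", "Temperature"), lags = 1:2)
print(lagged)
```

The lagged columns could then be added to the `select()` call and passed to `train()` alongside the current-month variables. The first k months of each neighborhood have `NA` in the lag-k columns, so those rows would need to be dropped (for example with `na.omit()`) or imputed before training.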
