How to add back existing columns to model after building

The issue is not necessarily memory; it could also be due to a 32-bit installation of R, which supports only up to 3 GB of address space. If your OS supports it, install the 64-bit version of R.

I called activity toy data only because it seemed designed solely to illuminate the issue of manipulating data frames. Because it has four categorical variables and one numeric variable, it's not obvious how a logistic model would be constructed, since each of the categorical variables has only unique values. The Big Book of R has many examples and explanations, but finding the appropriate ones requires careful framing of the question.

The goal is to model a dependent variable Y as a function of X_1 … X_n, where Y is continuous and every X_i is categorical.

For example, with Supply_hrs in the role of Y and the remaining variables as X_1 … X_4, it becomes immediately clear that a logistic model is inappropriate:

> fit <- glm(Supply_hrs ~ ., data = activity, family = "binomial")
Error in eval(family$initialize) : y values must be 0 <= y <= 1
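For contrast, here is a minimal sketch of what `family = "binomial"` expects: a 0/1 (or two-level factor) response. The data below are invented purely for illustration.

```r
# toy data with a genuinely binary outcome (invented for illustration)
toy <- data.frame(
  churned = c(0, 0, 1, 1, 0, 1),          # binary response
  usage   = c(3, 30, 4, 20, 25, 2)        # numeric predictor
)

# this glm call succeeds because the response is 0/1
fit_ok <- glm(churned ~ usage, data = toy, family = "binomial")

# predicted probabilities, each between 0 and 1
predict(fit_ok, type = "response")
```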

By contrast, an OLS model will run, although the data characteristics limit its usefulness (all values are unique):

suppressPackageStartupMessages({
  library(dplyr)
})
activity <- data.frame(
  Customer_Name = c("Jane", "Bill", "Fred", "Tina", "Joe"),
  Account_No = c("332", "432", "556", "884", "119"),
  supply_line = c("York", "shark", "Aba", "kwara", "Bethel"),
  Cons = c("0-2300", "2300-4003", "4003-1121", "1121-3022", "3022-1713"),
  Supply_hrs = c(9, 5, 8, 10, 1)
)

fit <- lm(Supply_hrs ~ ., data = activity)
summary(fit)
#> 
#> Call:
#> lm(formula = Supply_hrs ~ ., data = activity)
#> 
#> Residuals:
#> ALL 5 residuals are 0: no residual degrees of freedom!
#> 
#> Coefficients: (12 not defined because of singularities)
#>                   Estimate Std. Error t value Pr(>|t|)
#> (Intercept)              5         NA      NA       NA
#> Customer_NameFred        3         NA      NA       NA
#> Customer_NameJane        4         NA      NA       NA
#> Customer_NameJoe        -4         NA      NA       NA
#> Customer_NameTina        5         NA      NA       NA
#> Account_No332           NA         NA      NA       NA
#> Account_No432           NA         NA      NA       NA
#> Account_No556           NA         NA      NA       NA
#> Account_No884           NA         NA      NA       NA
#> supply_lineBethel       NA         NA      NA       NA
#> supply_linekwara        NA         NA      NA       NA
#> supply_lineshark        NA         NA      NA       NA
#> supply_lineYork         NA         NA      NA       NA
#> Cons1121-3022           NA         NA      NA       NA
#> Cons2300-4003           NA         NA      NA       NA
#> Cons3022-1713           NA         NA      NA       NA
#> Cons4003-1121           NA         NA      NA       NA
#> 
#> Residual standard error: NaN on 0 degrees of freedom
#> Multiple R-squared:      1,  Adjusted R-squared:    NaN 
#> F-statistic:   NaN on 4 and 0 DF,  p-value: NA

Created on 2020-11-14 by the reprex package (v0.3.0.9001)

"Adding back" means rejoining customer name and account number to the model. With your example, how do I join the fitted model's output back to customer name and account number? This issue was brought about by the size of memory.
"Toy data that doesn't seem fit for a logistic regression" — OK. Can you refer me to a good book with numerous examples of building machine learning models in R and deploying them with Shiny?

Here is about the simplest example I can give of fitting a model on only a few variables (i.e., a reduced-column training set) while still getting the predictions, along with all the original data, onto a final dataset.

Conceptually, do you have a very different workflow from this?

library(tidyverse)

# just for example train on odd rows of iris which has 150 rows
(train0 <- iris[as.logical(1:150 %% 2), ] %>%
  as_tibble(rownames = "rownum"))
# omit rownum, Sepal.Length and Petal.Length from train

(train1 <- select(
  train0,
  -rownum, -Sepal.Length, -Petal.Length
))

# fit a model on train
my_model <- lm(Petal.Width ~ ., data = train1)
summary(my_model)

# can predict on the full data which has all the variables...

iris2 <- iris %>% as_tibble()
iris2$pred_petal_width <- predict(my_model, newdata = iris2)
iris2

Shiny deployment I can't help you with. As it is typically used, it's data science in the same way that PowerPoint is rhetoric: p-hacking for the masses. It's great as an EDA tool for users who know what they are about.

A model is a description of a population, not of an observation. When we say that a patient has a 0.02 probability of dying of COVID-19 exposure, that does not mean that an observed patient is 0.02 dead. The patient is either dead or not, 0 or 1. Only if the patient is a random observation from the population the model describes can we say anything useful about their status.

What you may need to be looking at is classification methods. Given an observation of meter readings and estimates of distribution line capacity, what is the likely status of a meter? For that, see the Irizarry text.
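To make the classification framing concrete, here is a sketch on invented meter data; the column names (readings, line_capacity, Status) and the data-generating rule are hypothetical, not your dataset.

```r
# invented meter data for illustration only
set.seed(42)
meters <- data.frame(
  readings      = rnorm(100, mean = 50, sd = 10),
  line_capacity = runif(100, 20, 80)
)
# simulate a binary status that depends on the predictors
meters$Status <- factor(
  ifelse(meters$readings + rnorm(100, sd = 5) > meters$line_capacity,
         "active", "inactive")
)

# logistic regression as a classifier
fit <- glm(Status ~ readings + line_capacity,
           data = meters, family = binomial)

# classify by thresholding the predicted probability at 0.5
pred_class <- ifelse(predict(fit, type = "response") > 0.5,
                     levels(meters$Status)[2], levels(meters$Status)[1])
table(pred_class, meters$Status)
```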

I was unable to fit a glm model on Status ~ [anything] with these data. I don't think I will be able to assist further.

Thank you, technocrat, you have been most helpful. The dataset I shared here is not a true representation of the original dataset, for privacy reasons, but I expected that anyone could simply look at it and see my pain points. I use the 64-bit version of R.

After much trial and error, this is what I am running:

# caret is needed for createDataPartition() and dplyr for glimpse()
library(caret)
library(dplyr)

# remove unwanted columns (the first seven)
model_input_df <- ml[, -(1:7)]
glimpse(model_input_df)

# preliminary casting to the appropriate data types
model_input_df$Status     <- as.factor(model_input_df$Status)
model_input_df$Feeder     <- as.character(model_input_df$Feeder)
model_input_df$group_cons <- as.factor(model_input_df$group_cons)
#...........................................................................
# BUILDING THE MACHINE LEARNING MODEL / partitioning the data
set.seed(2017)  # the seed must be set before the random partition
intrain  <- createDataPartition(model_input_df$Status, p = 0.75, list = FALSE)
training <- model_input_df[intrain, ]
testing  <- model_input_df[-intrain, ]
#...........................................................................
# confirm the splitting is correct
dim(training); dim(testing)

LogModel <- glm(Status ~ ., data = training, family = binomial, maxit = 100)
print(summary(LogModel))
#...........................................................................
colnames(model_input_df)
# note: this overwrites the fitted model with a plain numeric vector
LogModel <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# binding them together using the rbind function of base R
final_df <- rbind(ml[, -(1:7)], "pred_values" = LogModel)
head(final_df)

I get these warning messages:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = 6) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = 7) :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, ri, value = 8) :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, ri, value = 9) :
  invalid factor level, NA generated
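These warnings come from rbind(): it tries to append the numeric vector 1:9 as a new row of the data frame, and each factor column rejects values that are not existing levels, generating NAs. A sketch of the intended workflow, assuming `ml` and the fitted glm object `LogModel` from the code above (and without overwriting `LogModel` with a vector):

```r
# predict on the full data, then attach the predictions as a new
# COLUMN with cbind, not a new row with rbind
pred_probs <- predict(LogModel, newdata = ml, type = "response")

# binding to the full `ml` keeps customer name, account number, and
# every other original column alongside the predictions
final_df <- cbind(ml, pred_values = pred_probs)
head(final_df)
```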