I am working on a project that is using 5 hour ms time periods (continuous predictor) to determine the H2S concentration (continuous response). I want to determine if the type of experiment (with a sponge or without) Has an effect on the uptake rate of H2S concentration. I attempted to run an ANCOVA. But my data was not normal. I have attached my code.
>class(Average5HourDropData$Type)
>attach (Average5HourDropData)
>boxplot (Concentration ~ Type)
#Does not look normal. even after transformations.
#Now doing GAM
>library('mgcv')
>gam1<-gam(Concentration~s(Time)+factor(Type),data = Average5HourDropData)
>gam2 <- gam (Concentration~s(Time)+ s(Time, by=factor(Type)) + factor(Type), data=Average5HourDropData)
>AIC(gam1,gam2)
#df AIC
#gam1 11.92811 204265.6
#gam2 20.99418 116060.9
>summary(gam2)
#Family: gaussian
#Link function: identity
#Formula:
#Concentration ~ s(Time) + s(Time, by = factor(Type)) + factor(Type)
#Parametric coefficients:
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) 43.52397 0.01777 2449 <2e-16 ***
#factor(Type)Sponge 38.27730 0.02513 1523 <2e-16 ***
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Approximate significance of smooth terms:
#edf Ref.df F p-value
#s(Time) 0.6667 0.6667 1471 <2e-16 ***
#s(Time):factor(Type)Control 8.6618 8.6667 1253 <2e-16 ***
#s(Time):factor(Type)Sponge 8.6657 8.6667 11594 <2e-16 ***
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Rank: 28/29
#R-sq.(adj) = 0.994 Deviance explained = 99.4%
#GCV = 4.2744 Scale est. = 4.2713 n = 27050
>anova (gam2)
#Family: gaussian
#Link function: identity
#Formula:
#Concentration ~ s(Time) + s(Time, by = factor(Type)) + factor(Type)
#Parametric Terms:
#df F p-value
#factor(Type) 1 2319697 <2e-16
#Approximate significance of smooth terms:
#edf Ref.df F p-value
#s(Time) 0.6667 0.6667 1471 <2e-16
#s(Time):factor(Type)Control 8.6618 8.6667 1253 <2e-16
#s(Time):factor(Type)Sponge 8.6657 8.6667 11594 <2e-16
>gam.check(gam2)
#Method: GCV Optimizer: magic
#Smoothing parameter selection converged after 20 iterations.
#The RMS GCV score gradient at convergence was 4.213637e-07 .
#The Hessian was positive definite.
#Model rank = 28 / 29
#Basis dimension (k) checking results. Low p-value (k-index<1) may
#indicate that k is too low, especially if edf is close to k'.
#k' edf k-index p-value
#s(Time) 9.000 0.667 0.5 <2e-16 ***
#s(Time):factor(Type)Control 9.000 8.662 0.5 <2e-16 ***
#s(Time):factor(Type)Sponge 9.000 8.666 0.5 <2e-16 ***
#Signif. codes: 0 ‘***’0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>plot(gam2)
#Summary plot
>new.data.IV1 <- rep(seq(min(Average5HourDropData$Time),max(Average5HourDropData$Time)), length(unique(factor(Average5HourDropData$Type)))[1])
>new.data.IV2 <- c(rep("1", length(new.data.IV1)/2), rep("2", length(new.data.IV1)/2))
>new.data <- as.data.frame(cbind(new.data.IV1, new.data.IV2))
>names(new.data) <- c("Time", "Type")
>new.data$Time<- as.numeric(as.character(new.data$Time))
#Error is here
>model.predict <- predict(gam2, newdata = new.data,type="response", se.fit=TRUE)
Error in model.frame.default(Terms[[i]], data, xlev = object$xlevels) :
factor factor(Type) has new level 1