Shelby
September 15, 2020, 11:59am
1
I ran a GAM with on categorical variable and one continuous variable. I am getting this error when trying to graph the function:
Error in model.frame.default(Terms[[i]], data, xlev = object$xlevels) :
factor factor(Type) has new level 1
I do not understand what it means even though I have looked at other comments is various communities.
Hi!
To help us help you, could you please prepare a repr oducible ex ample (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:
A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run
on the given dataset, and including the necessary information on the used packages.
Let's quickly go over each one of these with examples:
Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue.
Let's say, as an example, that you are working with the iris data frame
head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…
Shelby
September 15, 2020, 3:06pm
3
I am working on a project that is using 5 hour ms time periods (continuous predictor) to determine the H2S concentration (continuous response). I want to determine if the type of experiment (with a sponge or without) Has an effect on the uptake rate of H2S concentration. I attempted to run an ANCOVA. But my data was not normal. I have attached my code.
>class(Average5HourDropData$Type)
>attach (Average5HourDropData)
>boxplot (Concentration ~ Type)
#Does not look normal. even after transformations.
#Now doing GAM
>library('mgcv')
>gam1<-gam(Concentration~s(Time)+factor(Type),data = Average5HourDropData)
>gam2 <- gam (Concentration~s(Time)+ s(Time, by=factor(Type)) + factor(Type), data=Average5HourDropData)
>AIC(gam1,gam2)
#df AIC
#gam1 11.92811 204265.6
#gam2 20.99418 116060.9
>summary(gam2)
#Family: gaussian
#Link function: identity
#Formula:
#Concentration ~ s(Time) + s(Time, by = factor(Type)) + factor(Type)
#Parametric coefficients:
#Estimate Std. Error t value Pr(>|t|)
#(Intercept) 43.52397 0.01777 2449 <2e-16 ***
#factor(Type)Sponge 38.27730 0.02513 1523 <2e-16 ***
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Approximate significance of smooth terms:
#edf Ref.df F p-value
#s(Time) 0.6667 0.6667 1471 <2e-16 ***
#s(Time):factor(Type)Control 8.6618 8.6667 1253 <2e-16 ***
#s(Time):factor(Type)Sponge 8.6657 8.6667 11594 <2e-16 ***
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Rank: 28/29
#R-sq.(adj) = 0.994 Deviance explained = 99.4%
#GCV = 4.2744 Scale est. = 4.2713 n = 27050
>anova (gam2)
#Family: gaussian
#Link function: identity
#Formula:
#Concentration ~ s(Time) + s(Time, by = factor(Type)) + factor(Type)
#Parametric Terms:
#df F p-value
#factor(Type) 1 2319697 <2e-16
#Approximate significance of smooth terms:
#edf Ref.df F p-value
#s(Time) 0.6667 0.6667 1471 <2e-16
#s(Time):factor(Type)Control 8.6618 8.6667 1253 <2e-16
#s(Time):factor(Type)Sponge 8.6657 8.6667 11594 <2e-16
>gam.check(gam2)
#Method: GCV Optimizer: magic
#Smoothing parameter selection converged after 20 iterations.
#The RMS GCV score gradient at convergence was 4.213637e-07 .
#The Hessian was positive definite.
#Model rank = 28 / 29
#Basis dimension (k) checking results. Low p-value (k-index<1) may
#indicate that k is too low, especially if edf is close to k'.
#k' edf k-index p-value
#s(Time) 9.000 0.667 0.5 <2e-16 ***
#s(Time):factor(Type)Control 9.000 8.662 0.5 <2e-16 ***
#s(Time):factor(Type)Sponge 9.000 8.666 0.5 <2e-16 ***
#Signif. codes: 0 ‘***’0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>plot(gam2)
#Summary plot
>new.data.IV1 <- rep(seq(min(Average5HourDropData$Time),max(Average5HourDropData$Time)), length(unique(factor(Average5HourDropData$Type)))[1])
>new.data.IV2 <- c(rep("1", length(new.data.IV1)/2), rep("2", length(new.data.IV1)/2))
>new.data <- as.data.frame(cbind(new.data.IV1, new.data.IV2))
>names(new.data) <- c("Time", "Type")
>new.data$Time<- as.numeric(as.character(new.data$Time))
#Error is here
>model.predict <- predict(gam2, newdata = new.data,type="response", se.fit=TRUE)
Error in model.frame.default(Terms[[i]], data, xlev = object$xlevels) :
factor factor(Type) has new level 1
Shelby:
Average5HourDropData
Hello, and thanks again for providing code, however this is not a reproducible example because without data the code will not reproduce the error you experienced.
That said, my guess is that your use of factor(Type) throughout the model fit code, has no correlate with the later dataset on which you want to predict. You should predict on data that has the same form as that on which your model was trained. if you want Type to be a factor, then make it a factor everywhere, rather than inline/on the fly during fitting.
Shelby
September 16, 2020, 5:52pm
5
I don't think I understand your assumption. Because there is a correlation between type and concentration measurement. This is seen in the significance of the model. Type is a factor throughout the GAM.
Well let's say
Average5HourDropData is this
Time
Concentration
Type
600
120.3155574
Sponge
601
120.2848155
Sponge
602
120.2538643
Sponge
603
120.2235755
Sponge
604
120.1931335
Sponge
605
120.1661757
Sponge
606
120.1422358
Sponge
607
120.1132464
Sponge
608
120.0870242
Sponge
609
120.0613516
Sponge
610
120.0466342
Sponge
611
120.0327227
Sponge
612
120.0201408
Sponge
613
120.0020943
Sponge
614
119.989341
Sponge
615
119.9760124
Sponge
616
119.9590184
Sponge
617
119.945784
Sponge
618
119.9372138
Sponge
619
119.9233473
Sponge
620
119.9054254
Sponge
621
119.88972
Sponge
622
119.8812268
Sponge
623
119.8618457
Sponge
624
119.8420157
Sponge
600
55.81117364
Control
601
55.87698653
Control
602
55.94301309
Control
603
56.00449191
Control
604
56.03258702
Control
605
56.0574053
Control
606
56.08215191
Control
607
56.11047429
Control
608
56.14380632
Control
609
56.18873294
Control
610
56.24251382
Control
611
56.29417378
Control
612
56.34920071
Control
613
56.4034161
Control
614
56.47717362
Control
615
56.55403724
Control
616
56.6249944
Control
617
56.6987154
Control
618
56.76957097
Control
619
56.84017442
Control
620
56.90509887
Control
621
56.96636615
Control
622
57.0188798
Control
623
57.06590791
Control
624
57.1107242
Control
system
Closed
October 7, 2020, 5:52pm
6
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.