 # Weibull log likelihood when shape parameter is large.

Hi I have some data for which I compute the Weibull log likelihood. The problem is if shape is very large NaN is produced example:
dweibull(4.5,shape=exp(10),scale=2,log=T)
gives NaN
Any suggestions how to avoid this issue?

You can write a little function to catch the NaN, as shown below, but you have to decide what to put in its place. I chose -Inf but that is likely not very useful in further calculations.
Are you sure you need to have such a large shape parameter?

``````dweibull(4.5,shape = c(2,4,8, 16, 800, 1000), scale = 2, log = TRUE)
#> Warning in dweibull(4.5, shape = c(2, 4, 8, 16, 800, 1000), scale = 2, log
#> = TRUE): NaNs produced
#>   -4.251570e+00  -2.250297e+01  -6.497780e+02  -4.314256e+05
#>  -5.572043e+281            NaN

My_dweibull <- function(x) {
ifelse(is.nan(dweibull(4.5, x, scale = 2, log = TRUE)), -Inf,
dweibull(4.5, x, scale = 2, log = TRUE))
}
My_dweibull(c(2,4,8,16,800,1000))
#> Warning in dweibull(4.5, x, scale = 2, log = TRUE): NaNs produced
#> Warning in dweibull(4.5, x, scale = 2, log = TRUE): NaNs produced
#>   -4.251570e+00  -2.250297e+01  -6.497780e+02  -4.314256e+05
#>  -5.572043e+281           -Inf

suppressWarnings(My_dweibull(c(2,4,8,16,800,1000)))
#>   -4.251570e+00  -2.250297e+01  -6.497780e+02  -4.314256e+05
#>  -5.572043e+281           -Inf
``````

Created on 2019-08-30 by the reprex package (v0.2.1)

1 Like

Thanks for your reply. I am actually doing Bayesian MCMC(Metropolis Hastings) to sample the log of shape parameter using rate parameter fixed at 0.05. The proposal is a normal proposal with mean at current value and scale 2. Prior is standard normal prior. So in my log likelihood function the shape is exp(log_shape). This is giving NaN for the loglikelihood after few iterations and the sampling stops due to that.

I did some Bayesian MCMC a few years ago with Stan/rstan and I can only extrapolate from that bit of experience that the structure of your model is allowing the chains to wander into unrealistic regions. Can you tighten up priors or put constraints on parameters or maybe define the model differently so the chains stay away from such numerically difficult regions? I do not understand enough of your explanation to offer specific suggestions, due to my thin knowledge in the subject.

I agree with you about the chain wandering into some unrealistic regions. I think the prior might be something I want to look into. My code is very specific but I am sharing it nevertheless.

``````#function to generate epidemic using weibull as the hazard of contact
epi_weibull<-function(infector,n.pop,shape,rate.contact, rate.recovery)
{
#set.seed(10007)
#infector <- 1; n.pop <- 500; rate.contact<-0.0005; rate.recovery<-0.2
i <- 1
timemat <- result <- NULL
k <- 1:n.pop
susceptible <- k[k!=infector]
time_inf <- 0 # infection time of index case
inc_pd<-rgamma(1,shape=9,scale = 1/3) #incubation period of index case
sym_onset_time_own<- time_inf+inc_pd
IP <- 1/rate.recovery#rexp(1,rate.recovery)   #major change here
CI <- rweibull(length(susceptible),shape=shape,scale=1/rate.contact)
contact_idx <- which(CI<=IP)
contact <- susceptible[contact_idx]
contact_time <- sym_onset_time_own + CI[contact_idx] #incubation period is present in contact time
n_contact <- length(contact_idx)
# add the index case to the output data frame.
result <- rbind(result, c(infector, 0, time_inf,sym_onset_time_own, IP, -1,-1, -1, -1))

if(n_contact > 0)
{
# timemat is a matrix containing susceptible individuals who had infectious contact with
# potential infectors.
timemat<-data.frame(infector=rep(infector, n_contact), contact=contact,
recovery_time_infector=rep(sym_onset_time_own + IP, n_contact),
contact_time=contact_time,time_inf_infector=rep(time_inf,n_contact),
sym_onset_infector=rep(sym_onset_time_own, n_contact), IP_infector=rep(IP, n_contact),inc_infector=rep(inc_pd,n_contact))
#contact_time<-round(timemat[,"contact_time"],digits=3)
}

while(!is.null(timemat) && nrow(timemat)>0)
{
i<-i+1
# a new infector is generated
index<-which.min(timemat\$contact_time)
infector<-timemat\$infector[index]
sym_onset_infector<-timemat\$sym_onset_infector[index]
infectee<-timemat\$contact[index]
time<-timemat\$contact_time[index] #Time of contact
IP_infector<-timemat\$IP_infector[index]
inc_infector<-timemat\$inc_infector[index]
time_inf_infector=timemat\$time_inf_infector[index]

IP <- 1/rate.recovery #rexp(1,rate.recovery) #generate ip for newly infected & merge the ip for previous person
inc_pd<-rgamma(1,shape=9,scale=1/3) #generate incubation period for newly infected

sym_onset_time= time+inc_pd
result<-rbind(result, c(infectee, i, time, sym_onset_time, IP,infector, time_inf_infector,sym_onset_infector, IP_infector))

#update timemat
timemat<-timemat[!(timemat\$contact==infectee),] #remove newly infected from matrix
# update the population of susceptibles
susceptible<-susceptible[!(susceptible==infectee)]
if(length(susceptible) >0 )
{
CI <- rweibull(length(susceptible), shape=shape,scale=1/rate.contact) # generate ci with newly infected with susceptible list
contact_idx <- which(CI<=IP)
contact <- susceptible[contact_idx]

contact_time <-sym_onset_time + CI[contact_idx]
n_contact<-length(contact_idx)
if(n_contact > 0)
{
k<- which(!(contact %in% timemat\$contact))
m <- length(k)
if(m > 0)  timemat.new<-data.frame(infector=rep(infectee, m), contact=contact[k],
recovery_time_infector=rep(sym_onset_time+IP, m),
contact_time=contact_time[k], time_inf_infector= time,
sym_onset_infector=rep(sym_onset_time, m), IP_infector=rep(IP, m),inc_infector=rep(inc_pd,m))

k<- which(contact %in% timemat\$contact)
n <- length(k)
if(n > 0)
{
for(j in k)
{
l <- which(timemat\$contact == contact[j])
if(contact_time[j] < timemat\$contact_time[l])
{
timemat\$contact_time[l] <- contact_time[j]
timemat\$infector[l] <- infectee
timemat\$sym_onset_infector[l] <- sym_onset_time
timemat\$recovery_time_infector[l] <- sym_onset_time+IP
timemat\$IP_infector[l] <- IP
timemat\$inc_infector[l]<-inc_pd
timemat\$time_inf_infector<-time
}
}
}
if(!is.null(timemat))
{
if(m > 0) timemat<-rbind(timemat,timemat.new)
} else  if(m > 0) timemat<-timemat.new
}
}
}
if(!is.null(result))
{
result <- as.data.frame(result)
colnames(result) <- c('infectee', 'generation', 'timeofinfection','sym_onset_time', 'IP','infector','time_inf_infector','sym_onset_time_infector', 'IP_infector')
}
return(result)
}
#log_likelihood function
log_likelihood_weibull_ind_an=function(shape,contact_rate,time_vec,npop,infector_vec,obs_data,indexcase){
#time is a vector containing times of infection for all infectees
#infector is a vector for infectors for all infectees

l=length(indexcase)
position=match(indexcase,obs_data\$infectee)
total_inf= nrow(obs_data)-l
infected_seq= obs_data\$infectee[-position]
seq=1:npop
escaped_inf_index= seq[!seq %in% obs_data\$infectee]
escaped_inf_length= length(escaped_inf_index)
each_inf= numeric(total_inf)
for (i in 1:length(infected_seq)){
index=which(infected_seq==infected_seq[i]) #an index that runs from 1,2,
time_inf= time_vec[index]
infector= infector_vec[index]
inf_index=which(obs_data\$infectee==infector)
sym_onsetinf= obs_data\$sym_onset_time[inf_index]
exposed_time=time_inf-sym_onsetinf
escapeindex=which(!obs_data\$infectee %in% infector & obs_data\$sym_onset_time<time_inf)
escapetime= pmin(time_inf-obs_data\$sym_onset_time[escapeindex],obs_data\$IP[escapeindex])
inc_per= obs_data\$sym_onset_time[index+1]-time_inf
each_inf[i]= -(contact_rate^shape)*exposed_time^shape+shape*log(contact_rate)+log(shape)+
shape*log(exposed_time)-(contact_rate^shape)*(sum(escapetime^shape))+dgamma(inc_per,shape=9,scale=1/3,log = T)
}
return(sum(each_inf)-escaped_inf_length*contact_rate^shape*sum(obs_data\$IP))
}
#Running the MCMC
obs_data=epi_weibull(1,1000,5,0.055,0.2)
while(nrow(obs_data)<500) obs_data=epi_weibull(1,1000,5,0.055,0.2)
sd_log_scale=2
accept_shape=0
iter=1000
log_shape=numeric(iter)
log_shape=log(5) #true shape paraneter is 5
for(i in 2:iter){
old_log_shape=log_shape[i-1]
new_log_shape=rnorm(1,mean=old_log_shape,sd=sd_log_scale)
log_lik_top=log_likelihood_weibull_ind_an(contact_rate=0.05,shape=exp(new_log_shape),npop=1000,infector_vec=obs_data\$infector[-1],time_vec=obs_data\$timeofinfection[-1],indexcase=1,obs_data=obs_data)
log_lik_bottom=log_likelihood_weibull_ind_an(contact_rate = 0.05,shape=exp(old_log_shape),npop=1000,infector_vec=obs_data\$infector[-1],time_vec=obs_data\$timeofinfection[-1],indexcase=1,obs_data=obs_data)
print(c(old_log_shape,new_log_shape,log_lik_top,log_lik_bottom))
ratio_lik=exp(log_lik_top-log_lik_bottom)
ratio_prior= dnorm(new_log_shape)/dnorm(old_log_shape)
ratio_prop= dnorm(old_log_shape,mean=new_log_shape,sd=sd_log_scale)/dnorm(new_log_shape,mean=old_log_shape,sd=sd_log_scale)
accept_prob= ratio_lik*ratio_prior*ratio_prop
u=runif(1)
if(u<accept_prob){
log_shape[i]=new_log_shape
accept_shape=accept_shape+1
} else log_shape[i]=old_log_shape
}

}

``````

One thought looking at your code - you have the rate of log_likelihood_weibull_ind_an set to 0.05. and sd_log_scale=2.
The shape of log_likelihood_weibull_ind_an comes from

``````new_log_shape=rnorm(1,mean=old_log_shape,sd=sd_log_scale)
``````

If old_log_shape is 3 (so shape = exp(3) = 20), the new_log_shape has a reasonable chance of being in the range 1 (shape = 2.7) to 5 (shape = 148). If you plot a weibull with a rate of 0.05 and that range of shapes, you will see huge variation in the curves. That seems to me to be a possible cause of the chains getting into unrealistic areas, the model is taking steps that are too large. What happens if you make sd_log_scale smaller?

I hope that suggestion isn't completely nuts. I am really a novice at MCMC.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.