This document was generated:

## [1] "Wednesday, June 15, 2022"


Background / Setup

I frequently encounter convergence problems when fitting mixed models. The goal of this vignette is to show how to use allFit() in the lme4 package to quickly test different optimizers in search of convergence. This document draws heavily on posts by Steven V. Miller and two other authors, many thanks!

We assume a linear mixed model (LMM) approach to analyzing data from a two-arm cluster-randomized trial where the cluster-level variance differs between the two arms. The betas give the mean responses in each arm (we’ll fix the control arm at zero), sig_b2 is a vector of the cluster-level variances in each arm, and sigma_residual is the within-cluster variance, assumed to be the same across both arms. To relax the assumption that sigma_residual is constant across arms, you can use the nlme package with different weights by arm instead of lme4.
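As a reference point, here is a hedged sketch of that nlme alternative (not run here): varIdent() lets the residual standard deviation differ by arm. It assumes a data frame like the one sim() below produces.

# Sketch only: nlme fit with arm-specific residual variances
library(nlme)
fit_nlme <- lme(y ~ arm, random = ~ 1 + arm | clustid,
                weights = varIdent(form = ~ 1 | arm),  # separate residual SD per arm
                data = dat)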

library(tidyverse)
library(lme4)
library(optimx)
library(minqa)
library(dfoptim)
sim <- function(nsub = 2, nclust = 3, sigma_residual2 = 1, sig_b2 = c(.3, 10), betas = c(0, 1)){
  narm <- 2                                     # Number of arms of the trial
  n <- nsub * nclust * narm                     # Total number of participants
  arm <- rep(0:(narm-1), each = nsub*nclust)    # Arm indicator (0 = control, 1 = treatment)
  clustid <- rep(1:(nclust*narm), each = nsub)  # Cluster IDs
  # Cluster-level random effects, drawn with an arm-specific variance
  clustRE <- rep(rnorm(narm*nclust, mean = 0, sd = rep(sqrt(sig_b2), each = nclust)), each = nsub)
  sig_b2 <- rep(sig_b2, each = nclust*nsub)     # Expand to one value per participant
  error <- rnorm(n, mean = 0, sd = sqrt(sigma_residual2))  # Within-cluster error
  beta <- rep(betas, each = nclust*nsub)        # Arm-specific means
  y <- beta + clustRE + error                   # Simulated outcome
  return(cbind.data.frame(arm, clustid, sig_b2, clustRE, error, beta, y))
}


Basic use case: Convergence fails with default settings

Simulate some data with a seed that I know will cause a convergence problem:

set.seed(4)
dat <- sim(nsub = 100, nclust = 100, sigma_residual2 = .2,  sig_b2 = c(.2, 1), betas = c(0,1))

Let’s fit a model that allows for different cluster-level variances in the two arms. We specify a random intercept that is shared between both arms, plus additional possible variance when arm = 1. When fitting this model with default settings, we get a convergence warning.

model <- lmer(y ~ arm + (1 + arm | clustid), data = dat)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0153682 (tol = 0.002, component 1)

Quick fix with allFit()

To use allFit() to iterate through all available (g)lmer optimizers and find one that converges, use code similar to what’s below. The maxfun option sets the maximum number of function evaluations - allowing more can increase the likelihood of convergence, though I have had R freeze up sometimes if the value is too high. In this case, 1e5 works.

# Iterate through a set of optimizers, report convergence results
diff_optims <- allFit(model, maxfun = 1e5)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## nmkbw : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]

Do not be fooled! These are not necessarily “OK” fits - [OK] only means the optimizer finished without an error, not that the model converged. We need to check for convergence flags. Below, we’re looking for the result NULL, meaning the fitted model from that optimizer carries no convergence warnings.

diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")]
lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
## $bobyqa
## NULL
## 
## $Nelder_Mead
## NULL
## 
## $nlminbwrap
## [1] "unable to evaluate scaled gradient"                                       
## [2] "Model failed to converge: degenerate  Hessian with 1 negative eigenvalues"
## 
## $nmkbw
## NULL
## 
## $`optimx.L-BFGS-B`
## [1] "unable to evaluate scaled gradient"                                       
## [2] "Model failed to converge: degenerate  Hessian with 1 negative eigenvalues"
## 
## $nloptwrap.NLOPT_LN_NELDERMEAD
## [1] "unable to evaluate scaled gradient"                                       
## [2] "Model failed to converge: degenerate  Hessian with 1 negative eigenvalues"
## 
## $nloptwrap.NLOPT_LN_BOBYQA
## [1] "Model failed to converge with max|grad| = 0.0153682 (tol = 0.002, component 1)"

We see that bobyqa, Nelder_Mead, and nmkbw converged. We’ll now output (arbitrarily) the first converged model in the list, which used the bobyqa optimizer. Further, that algorithm is built into lme4, so we do not require the optimx or nloptr packages (at least for now).

convergence_results <- lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
working_indices <- sapply(convergence_results, is.null)
if(sum(working_indices) == 0){
  print("No algorithms from allFit converged. You may still be able to use the results, but proceed with extreme caution.")
  first_fit <- NULL
} else {
  first_fit <- diff_optims[working_indices][[1]]
}
first_fit
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ arm + (1 + arm | clustid)
##    Data: dat
## REML criterion at convergence: 25728.9
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  clustid  (Intercept) 0.4059       
##           arm         0.6376   0.87
##  Residual             0.4482       
## Number of obs: 20000, groups:  clustid, 200
## Fixed Effects:
## (Intercept)          arm  
##     0.03783      0.88385

Hooray! However, if that doesn’t work, see the Power Users section below.

If we had known ahead of time which optimizer to use, we could have specified it directly and gotten the same result:

lmer(y ~ arm + (1 + arm | clustid),
     data = dat, control = lmerControl(optimizer = "bobyqa"))
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ arm + (1 + arm | clustid)
##    Data: dat
## REML criterion at convergence: 25728.9
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  clustid  (Intercept) 0.4059       
##           arm         0.6376   0.87
##  Residual             0.4482       
## Number of obs: 20000, groups:  clustid, 200
## Fixed Effects:
## (Intercept)          arm  
##     0.03783      0.88385

If you needed to use one of the algorithms from the optimx or nloptr families, the syntax in lmerControl() is a little different. We won’t run these; they are just here as examples.

lmer(y ~ arm + (1 + arm | clustid), data = dat,
               control = lmerControl(optimizer = "optimx",
                                     optCtrl = list(method = "L-BFGS-B", maxit = 1e5)))
lmer(y ~ arm + (1 + arm | clustid), data = dat,
               control = lmerControl(optimizer = "nloptwrap",
                                     optCtrl = list(algorithm = "NLOPT_LN_BOBYQA", maxit = 1e5)))

Which optimizer to choose?

I’ve found that different optimizers may converge on estimates of the arm-specific variance and intercept correlation that look quite different. However, when you transform those values back into the original parameters, taking the estimated correlation into account, they turn out to be two routes to the same underlying parameter values.
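As a quick illustration (my sketch, not from the original posts): the total cluster-level variance in arm 1 is Var(intercept) + Var(arm) + 2 * Cov(intercept, arm), which you can compute from any fitted model.

# Total arm-1 cluster variance: Var(b0 + b1) = Var(b0) + Var(b1) + 2*Cov(b0, b1)
arm1_cluster_var <- function(fit){
  vc <- VarCorr(fit)$clustid   # variance-covariance matrix of the random effects
  vc["(Intercept)", "(Intercept)"] + vc["arm", "arm"] + 2 * vc["(Intercept)", "arm"]
}
arm1_cluster_var(first_fit)

Applied to the bobyqa fit above (SDs 0.4059 and 0.6376, correlation 0.87) and to the NLOPT_LN_PRAXIS fit in the Third Step below (SDs 0.4059 and 0.8477, correlation 0.20), both give roughly 1.02, close to the simulated value of 1.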

allFit() called inside a function

Note: allFit() sometimes has trouble finding the input data for the models when it is nested inside another function. See discussion here and / or contact me if you want some ideas for workarounds; one possibility is sketched below.
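One workaround sketch (my own suggestion, so treat it as an assumption): build the lmer() call with do.call(), which embeds the evaluated data frame in the call that allFit() later re-runs.

# Hedged workaround: do.call() stores the evaluated data in the model's call,
# so allFit()'s refits can find it even inside a function
fit_all <- function(d){
  model <- do.call(lmer, list(formula = y ~ arm + (1 + arm | clustid), data = d))
  allFit(model, maxfun = 1e5)
}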

Parallel processing option

Since allFit() processes each algorithm independently, it’s a natural candidate for parallel processing. Here’s a simple example that speeds things up on my computer; the details may vary by operating system (for instance, 'multicore' is not available on Windows, where parallel = 'snow' is the option to try).

library(parallel)
ncores <- detectCores()
diff_optims <- allFit(model, maxfun = 1e5, parallel = 'multicore', ncpus = ncores)


Power Users: Obtaining convergence if allFit() doesn’t work

One downside of using allFit() is that you can’t customize the convergence/optimizer settings - you have to use the defaults. (NOTE: Recent updates to lme4 allow control arguments to be passed to allFit(), so some of the following may be unnecessary. See the lme4 news / release notes for more information.) Yet changing the settings (increasing the maximum number of iterations or function evaluations, adjusting the stopping tolerances, etc.) can greatly improve convergence, though sometimes at the cost of slower fitting. Here is some simple code you can use that might help your algorithm converge.

First Step: Change settings

Let’s try the default optimizer with more generous settings and see if it works. Note that this is almost the same call as the first model we tried to fit, just with more iterations allowed. Some algorithms use maxit, some use maxfun; check the documentation for the one you’re using.

test_more_iterations <- lmer(y ~ arm + (1 + arm | clustid),
               data = dat,
               control = lmerControl(optCtrl = list(maxit = 1e9, maxfun = 1e9)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0153682 (tol = 0.002, component 1)

I’ve seen that work in the past, but not this time. Let’s try something else.

Second Step: Amp up settings and loop through optimx optimizers

Below we loop through some of optimx’s optimizer choices AND the better optimizer settings. These choices have worked for me in mixed models; for more info see the optimx documentation.

optimx_options <- c("L-BFGS-B", "nlminb", "nlm", "bobyqa", "nmkb", "hjkb")
for(i in 1:length(optimx_options)){
  print(paste0("Testing optimx: ", optimx_options[i]))
  model_flex <- lmer(y ~ arm + (1 + arm | clustid),
                     data = dat,
                     control = lmerControl(optimizer = "optimx",
                                           optCtrl = list(method = optimx_options[i],
                                                           maxit = 1e9)))
  if(is.null(model_flex@optinfo$conv$lme4$messages)){
    print(paste0("One of the optimx options, ", optimx_options[i],", worked!"))
    print(model_flex)
    break
  }
}
## [1] "Testing optimx: L-BFGS-B"
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## [1] "Testing optimx: nlminb"
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## [1] "Testing optimx: nlm"
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## [1] "Testing optimx: bobyqa"
## Warning in (function (npt = min(n + 2L, 2L * n), rhobeg = NA, rhoend = NA, :
## unused control arguments ignored
## [1] "One of the optimx options, bobyqa, worked!"
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ arm + (1 + arm | clustid)
##    Data: dat
## REML criterion at convergence: 25728.9
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  clustid  (Intercept) 0.4059       
##           arm         0.6376   0.87
##  Residual             0.4482       
## Number of obs: 20000, groups:  clustid, 200
## Fixed Effects:
## (Intercept)          arm  
##     0.03783      0.88385  
## optimizer (optimx) convergence code: 0 (OK) ; 1 optimizer warnings; 0 lme4 warnings

Third Step: Amp up settings and loop through nloptwrap optimizers

The nloptwrap set of optimizers is a little easier to use, because it takes the same option control formatting as the default optimizer. It also has a large (20+) set of algorithm options, and it’s easy to cycle through them in a for loop, with whatever customized settings you want, to find one that works. In this run the very first option, NLOPT_LN_PRAXIS, converges right away. Yay…

# More options are here: nloptr::nloptr.print.options() under "algorithm",
# but the set below seem to work most of the time.
algoptions <- c("NLOPT_LN_PRAXIS", "NLOPT_GN_CRS2_LM",
                "NLOPT_LN_COBYLA", "NLOPT_LN_NEWUOA",
                "NLOPT_LN_NEWUOA_BOUND", "NLOPT_LN_NELDERMEAD",
                "NLOPT_LN_SBPLX", "NLOPT_LN_BOBYQA")

for(i in 1:length(algoptions)){
  print(paste0("Testing nloptwrap: ", algoptions[i]))
  model_flex <- lmer(y ~ arm + (1 + arm | clustid),
                     data = dat,
                     control = lmerControl(optimizer = "nloptwrap",
                                           optCtrl = list(algorithm = algoptions[i],
                                                          maxfun = 1e9,
                                                          maxeval = 1e7,
                                                          xtol_abs = 1e-9,
                                                          ftol_abs = 1e-9)))
  if(is.null(model_flex@optinfo$conv$lme4$messages)){
    print(paste0("One of the nloptwrap options, ", algoptions[i],", worked!"))
    print(model_flex)
    break
  }
}
## [1] "Testing nloptwrap: NLOPT_LN_PRAXIS"
## [1] "One of the nloptwrap options, NLOPT_LN_PRAXIS, worked!"
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ arm + (1 + arm | clustid)
##    Data: dat
## REML criterion at convergence: 25728.9
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  clustid  (Intercept) 0.4059       
##           arm         0.8477   0.20
##  Residual             0.4482       
## Number of obs: 20000, groups:  clustid, 200
## Fixed Effects:
## (Intercept)          arm  
##     0.03783      0.88385


STILL no convergence?!?!

If none of the algorithms converge, you may not be stuck yet. In lme4’s documentation (see ?lme4::convergence), the authors note that you may still be able to use the results of a non-converging model, though there is not universal agreement about that; you can also re-run the convergence checks by hand, as sketched after the links below. Here are some links that could help in that situation and provide further info:

Examining the log-likelihood, coefficients, and runtime of different algorithms

Examples of other approaches to achieving convergence besides just switching the algorithm

Tricks for speeding up model fitting

More general discussion of convergence, very illuminating
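Finally, if you do decide to inspect a non-converged fit, you can recompute the convergence diagnostics yourself; a minimal sketch, adapted from the checks described in ?lme4::convergence and assuming the model object from above:

# Re-do the convergence checks by hand
dd <- model@optinfo$derivs
max(abs(with(dd, solve(Hessian, gradient))))  # scaled gradient, compare to the 0.002 tolerance
eigen(dd$Hessian)$values                      # negative eigenvalues flag a degenerate Hessian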