(Latest changes: 16.11: IL added links, 13.11.2018 - first version).

Overview

Aim: Present methods for analysing correlated responses in a generalized linear models setting - LMM meets GLM to become GLMM.

Also here (as in module LMM) we will only consider two-level models and in particular focus on random intercept models for binomial and Poisson responses. Emphasis will be on understanding.


Learning material


Topics

  • beaches example - revisited
  • notation
  • the generalized linear mixed effect model (three ingredients)
  • the GLMM with random intercept
  • the marginal model
  • parameter estimation and Laplace approximation
  • summing up: what do we need to know about the GLMM?
  • additional info on different software (not on reading list)

Summing up - what have we learned about the GLMM?

  • The GLMM can be formulated using three ingredients:
    • distributional assumption: \(f(y_{ij}\mid \gamma_i)\) from exponential family
    • structural assumptions: linear predictor \(\eta_{ij}={\bf x}_{ij}^T {\boldsymbol \beta}+{\bf u}_{ij}\gamma_i\) and link function \(\eta_{ij}=g(\mu_{ij})\) where \(\mu_{ij}=\text{E}(Y_{ij}\mid \gamma_i)\).
    • distributional assumptions for the random effects: \(\gamma_i\sim N({\bf 0},{\bf Q})\).
  • The GLMM likelihood function is expressed as an integral with respect to the random effects and does (in general) not have a closed form solution.
  • Numerical approximation methods need to be used to find parameter estimates, and one possibility is to use the Laplace approximation, but many competing method exists.

  • Three R packages that can be investigated is lme4(function glmer) and glmmTMB (template model builder), and the NTNU-flagship inla. How to use these three packages on a simulated data set (binary data, logit link, random intercept and slope) is shown in the end of the module page (NOT on our reading list but for the interested student).

Interactive session week

This is the last interactive session of the course.

Problem 1: Fruits of the weed

This is an analysis that was carried out to look at how plants respond to herbivory, and specifically if there is variation in how well they tolerate it. The species was thale cress, Arabidopsis thaliana, which is a small weed of no commercial value except to some plant biologists.

The response variable is the number of fruits (i.e. seed capsules) per plant. We expect this to be a function of:

  • nutrient levels (low vs. high)
  • simulated herbivory (none vs. apical meristem damage)
  • region (Sweden, Netherlands, Spain)
  • population (3 per region, so 9 in total)
  • individual genotype (24 in total)

Because A. thaliana is highly selfing, seeds of a single individual served as replicates of that genotype (if you don’t understand, don’t panic: genotypes are different varieties).

The data also include nuisance variables, including the placement of the plant in the greenhouse, and the method used to germinate seeds. If these influence the response, they should be in the model, even if we are not interested in them. We will ignore them, though, and leave it as a problem for the reader to see if this is a wise decision.

We read in the data and convert some variables to factors. We could keep reg and amd as they are, as characters, as lmer() would convert them to factors. Note also that we re-level amd to make unclipped the reference level.

library(lme4)
Banta <- read.csv("http://glmm.wikidot.com/local--files/trondheim/Banta_TotalFruits.csv")

Banta$reg <- factor(Banta$reg)  # region: NL, SP, SW
Banta$gen <- factor(Banta$gen)  # genotype: number, but these are arbitrary
Banta$rack <- factor(Banta$rack)  # rack: 1, 2
Banta$nutrient <- factor(Banta$nutrient)  # 1, 8.
Banta$status <- factor(Banta$status)  # Method used to germinate: Transplant, Petri.Plate, Normal
Banta$amd <- factor(Banta$amd, levels = c("unclipped", "clipped"))  # make unclipped the reference

str(Banta)
## 'data.frame':    625 obs. of  9 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ reg         : Factor w/ 3 levels "NL","SP","SW": 1 1 1 1 1 1 1 1 1 1 ...
##  $ popu        : chr  "3.NL" "3.NL" "3.NL" "3.NL" ...
##  $ gen         : Factor w/ 24 levels "4","5","6","11",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ rack        : Factor w/ 2 levels "1","2": 2 1 1 2 2 2 2 1 2 1 ...
##  $ nutrient    : Factor w/ 2 levels "1","8": 1 1 1 1 2 1 1 1 2 1 ...
##  $ amd         : Factor w/ 2 levels "unclipped","clipped": 2 2 2 2 2 1 2 2 1 2 ...
##  $ status      : Factor w/ 3 levels "Normal","Petri.Plate",..: 3 2 1 1 3 2 1 1 1 2 ...
##  $ total.fruits: int  0 0 0 0 0 0 0 3 2 0 ...

First fit a GLM with nutrient and amd. This is helpful to start to understand the data:

mp1 <- glm(total.fruits ~ nutrient * amd, data = Banta, family = "poisson")

Write down the assumptions of the model

Answer

If the total number of fruits for plant \(i\) is \(N_i\), then \(N_i ~ \text{Poisson}(\lambda_i)\), and \(\log(\lambda_i) = \eta_i = \bf{x}_i^T \mathbf{\beta}\).

What does this suggest? Are there effects of clipping and nutrient? Is there overdispersion, and if so how does it affect the interpretation?

Answer

First, the summary:

summary(mp1)
## 
## Call:
## glm(formula = total.fruits ~ nutrient * amd, family = "poisson", 
##     data = Banta)
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           2.81849    0.01926  146.37   <2e-16 ***
## nutrient8             1.03152    0.02243   46.00   <2e-16 ***
## amdclipped           -0.43367    0.03158  -13.73   <2e-16 ***
## nutrient8:amdclipped  0.36842    0.03571   10.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 29269  on 624  degrees of freedom
## Residual deviance: 23550  on 621  degrees of freedom
## AIC: 25921
## 
## Number of Fisher Scoring iterations: 6

Looking at the coefficients, nutrient level 8 increases the number of fruits by round(exp(coef(mp1)[‘nutrient8’]), 1)` times in the unclipped plants. Clipping reduced the number of fruits in the nutrient level 1, but the effect in level 8 is much smaller.

The data seem to be over-dispersed. The residual deviance is much larger than the residual degrees of freedom. We can estimate the overdispersion parameter: it is about 37.9. This makes us much less certain about any inferences:

summary(mp1, dispersion = deviance(mp1)/mp1$df.residual)
## 
## Call:
## glm(formula = total.fruits ~ nutrient * amd, family = "poisson", 
##     data = Banta)
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            2.8185     0.1186  23.769  < 2e-16 ***
## nutrient8              1.0315     0.1381   7.469 8.08e-14 ***
## amdclipped            -0.4337     0.1945  -2.230   0.0258 *  
## nutrient8:amdclipped   0.3684     0.2199   1.675   0.0939 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 37.92277)
## 
##     Null deviance: 29269  on 624  degrees of freedom
## Residual deviance: 23550  on 621  degrees of freedom
## AIC: 25921
## 
## Number of Fisher Scoring iterations: 6

Now let’s fit a GLMM!

mp2 <- glmer(total.fruits ~ nutrient * amd + (1 | gen) + (1 | popu), data = Banta,
    family = "poisson")
summary(mp2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: total.fruits ~ nutrient * amd + (1 | gen) + (1 | popu)
##    Data: Banta
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   20862.8   20889.4  -10425.4   20850.8       619 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -8.418 -3.668 -1.904  1.466 42.941 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  gen    (Intercept) 0.06411  0.2532  
##  popu   (Intercept) 0.25791  0.5079  
## Number of obs: 625, groups:  gen, 24; popu, 9
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           2.75654    0.17836   15.45   <2e-16 ***
## nutrient8             1.02915    0.02249   45.75   <2e-16 ***
## amdclipped           -0.44645    0.03161  -14.12   <2e-16 ***
## nutrient8:amdclipped  0.39541    0.03577   11.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ntrnt8 amdclp
## nutrient8   -0.093              
## amdclipped  -0.066  0.523       
## ntrnt8:mdcl  0.058 -0.627 -0.884

Here we have two random effects: gen and ´popu`.

Write down the linear predictor for plant \(i\) (you can keep the fixed effects in \(\bf{x}_i\), but be explicit about the random effects).

Answer

OK, you could write

\(\eta_i = \bf{x}_i^T \boldsymbol{\beta} + \bf{u}_i^T \boldsymbol{\gamma}\)

and explain the components in \(\gamma\) that have a non-zero \(u_i^T\). But we can also write it like this:

\(\eta_{ijk} = \bf{x}_i^T \mathbf{\beta} + \gamma_{0i} + \gamma_{1j}\)

Where we have population \(i\) \((i=1,\dots 9)\) and genotype \(j\) \((j=1,\dots 24)\).

What does this new analysis tell us? (a) Does it change our interpretation above? (b) how do you interpret the random effects?

Answer
  1. The fixed effects are very similar to before, so their interptation doesn’t change.

  2. The random effects are both intercepts: the variance for the population is much higher tna. for genotype, so there is more variation between populations than genotypes.

What about overdispersion? We can use random effects to deal with it: we have a random effect with one level per observation. This is the same as assuming that the data follow a Poisson log-normal distribution.

Fit a model with a random effect to overdispersion (you may need to create a new factor). How does this new model compare to the others?

Answer

First we have to create a factor for the overdispersion, with a different level for each observation.

Banta$OD <- factor(1:nrow(Banta))
mp3 <- glmer(total.fruits ~ nutrient * amd + (1 | gen) + (1 | popu) + (1 | OD), data = Banta,
    family = "poisson")
summary(mp3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: total.fruits ~ nutrient * amd + (1 | gen) + (1 | popu) + (1 |      OD)
##    Data: Banta
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##    5070.9    5102.0   -2528.4    5056.9       618 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.17448 -0.33938 -0.00083  0.05951  0.54331 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  OD     (Intercept) 2.32534  1.5249  
##  gen    (Intercept) 0.08488  0.2913  
##  popu   (Intercept) 0.68806  0.8295  
## Number of obs: 625, groups:  OD, 625; gen, 24; popu, 9
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            1.8123     0.3125   5.800 6.63e-09 ***
## nutrient8              1.3832     0.1793   7.714 1.22e-14 ***
## amdclipped            -0.6638     0.1903  -3.487 0.000488 ***
## nutrient8:amdclipped   0.4052     0.2605   1.556 0.119820    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ntrnt8 amdclp
## nutrient8   -0.298              
## amdclipped  -0.278  0.485       
## ntrnt8:mdcl  0.203 -0.685 -0.730

Looking at the model, the fixed effects are similar, but the standard errors are larger, so that the interaction is not now statistically significant. The overdispersion term is large, suggesting that there is still quite a bit of variation unaccounted for.

Finally, the “full” model for this data is much more complicated: it was taken from here. If you look at this, the first problem is to understand the model, which has categorical variables as random effects.

Problem 2: Taken from UiO, STK3100, 2011, problem 2

a) Show that the binomial distribution is a member of the exponential family of distributions. What do we mean with canonical link, and which advantages do we get when using canonical link? What is the canonical link for the binomial distibution?

Remark: this has been done several times before. Instead of doing the maths on the board, focus on the concepts. Also discuss this relationship: \(\theta_i\leftrightarrow \mu_i\leftrightarrow \eta_i \leftrightarrow \mathbf{\beta}\) and fill in what is over the arrows, and the missing arrow for the canonical link.

b) This excercise is modified!

Assume the following model:

\[y_{ij} \sim \text{Bin}(1, \pi_{ij})\]

\[ g(\pi_{ij}) = \beta_0 + \beta_1x_{1,ij} + \beta_2x_{2,i} \]

where \(g(\cdot)\) is a suitable link function and all \(y_{ij}\) are independent.

What kind of model is this?

The table below shows AIC-values for three different link functions often used with binary response data:

Link function AIC
probit 57.51571
cloglog 58.30338
logit 57.69237

Explain what AIC is, and argue why it is reasonable to use such criteria in this situation (instead of other tests from this course). Based on these three values, which link-function do you prefer? Why?

Remark: what is the take home message here? Hint: nested models?

Answer

The lowest AIC is for the probit link. But the difference between the AICs for the probit and logit is 0.18, which is almost nothing. Indeed, the difference between the AICs for the probit and cloglog is 0.79, which is also really small (the general advice is that a difference of less than 2 means the models are almost the same). Thus, in reality you could chose any link function based on these results, but the probit is (strictly) the best. It is also not the link function used to simulate the data.

The take-home message is that AIC (and BIC) can be used to compare non-nested models, indeed they can be used to compare models with the same model for the linear predictor.

We will, as we did a few weeks back, look at a fish dataset consisting of observations of cod from the year 2000, but we are now interested in the age of each fish, rather than the weight1. The dataset is from Havforskningsinstituttet in Bergen, and we use only a subset of a large dataset with information about fish in Barentshavet. The following variables are available on a random sample within each catch (a draft of trawl (trål)):

  • length: the length of fish (in cm)
  • weight: the weight of fish (in grams)
  • age: the age of fish (in years)
  • haulsize: the size of the total catch (in ton (1000 kg))

Let \(i\) be the index for catch, and \(j\) be the index for an individual fish in a given catch.

Age is a categorical variable between 2 and 12, but we create a new variable \(A_{ij}\):

\[ A_{ij} = \begin{cases} 1 \text{ if age}_{ij} > 6 \\ 0 \text{ else} \end{cases} \]

which is a binary variable, and we use this for the response.

Remark: before we used weight as response (normal), but now we use dichotomized age as a binary response.

We look at the following model:

\[ A_{ij}|\gamma_{0i} \sim \text{Bin}(1, \pi_{ij}) \] \[ g(\pi_{ij}) = \beta_0 + \beta_1\log(\texttt{length}_{ij}) + \beta_2\log(\texttt{haulsize}_i) + \gamma_{0i} \]

\[ \gamma_{0i} \sim N(0, \tau_0^2) \]

where \(g(\cdot)\) is the logit link function, and all \(A_{ij}\) are conditional independent (given \(\gamma_i\)).

c1) New: What kind of model is this? What is \(\gamma_{0i}\)? Compare to the linear mixed effects model- similarities and differences.

Answer

The model is a GLMM (Generalised Linear Mixed model).

\(\gamma_{0i}\) is a random effect, actually a haul random effect.

Compared to the LMM, this has the same form for the linear predictor, but the difference is that this is transformed with the link function onto a probability scale, and the likelihood is binomial, not normal.

Below you find an excerpt of this model:

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: A ~ log(length) + haulsize + (1 | haul)
##    Data: fish
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##    5117.6    5146.6   -2554.8    5109.6     10497 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -17.780  -0.187  -0.043   0.168  35.835 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  haul   (Intercept) 1.91     1.382   
## Number of obs: 10501, groups:  haul, 127
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -86.1367     2.1222 -40.588   <2e-16 ***
## log(length)  20.0270     0.4958  40.395   <2e-16 ***
## haulsize      0.1797     0.1156   1.554     0.12    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) lg(ln)
## log(length) -0.998       
## haulsize    -0.041  0.040
Answer

Fitting by maximum likelihood means, well, maximising the likelihood: \(f(y_{ij})=\int_{\gamma_i} f(y_{ij}\mid \gamma_i) f(\gamma_i) d\gamma_i\).

The Laplace approximation is used because we have to integrate over the \(\gamma_i\)’s. As we can’t do it analytically, we resort to an approximation that is quick to fit.

Discuss why using a likelihood ratio test to test if \(\tau_0^2 = 0\) is difficult (as in “too difficult to do in this course”).

Answer Because the distributions of the test statistics are difficult/impossible to work out. In part this is because the null hypothesis is that a parameter (the variance of the random effect) is on the boundary of the parameter space.

Without using a formal test, use the model results to evaluate if the random effect should be in the model

Hint How does the variation in the random effect compare to the variation explained by the other effects? (Note: the other effects are not scaled, so you may want to extract their variances from the data)
Answer

We can get some idea about whether the random effects are important by comparing their variance to the amount of variance in the linear predictors explained by the fixed effects, haulsize (i.e. how many fish in a haul) and log(length):

  • The variance of the random effects is \(\tau_0^2 =\) 1.91 (extracted using VarCorr(fit1)).
  • The fixed effects have variances of 1.21 for haul size and 0.03 for log(length).
  • the variance in a linear predictor explained by a fixed effect is \(\beta^2 Var(X)\) (ignoring any covariance between different effects in the model!), So, the variances explained are about 0.18\(^2 \times\) 1.21 = 0.039 for haulsize and 20.03\(^2 \times\) 0.03 = 12.799 for log(length).
So the variation between hauls is larger than the varuation due to haul size, but smaller than the length effect. If you know about fish biology, it will not be surprising that older fish are also larger, and that this is a large effect.

e) The Havforskningsinstituttet believes that the haulsize is important in the modelling of age and length of the fish (read: as a fixed effect). Based on the excerpts above (on dichotomized age) and below (on weight), what do you think about this?

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(weight) ~ scale(log(length)) + scale(haulsize) + (1 | haul)
##    Data: fish
## 
## REML criterion at convergence: -18583.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.1597 -0.6178 -0.0233  0.5886  4.8924 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  haul     (Intercept) 0.004181 0.06466 
##  Residual             0.009378 0.09684 
## Number of obs: 10409, groups:  haul, 127
## 
## Fixed effects:
##                    Estimate Std. Error  t value
## (Intercept)        7.964596   0.005847 1362.210
## scale(log(length)) 0.519109   0.001306  397.559
## scale(haulsize)    0.007670   0.005503    1.394
## 
## Correlation of Fixed Effects:
##             (Intr) sc(())
## scl(lg(ln)) -0.002       
## scale(hlsz)  0.084  0.017
Answer

The coefficients look small, but they also depend on the variances of the data, which is why they are scaled in the model for weight. Once we do that, we can see that the haulsize effect is (very) roughly 100 times smaller than the length effect, but also that it is about 10 times smaller than the haul standard deviation. So it is almost certainly not important.

We get a similar story if we look at Age, see the answer to the previous question.

Remark: It is in general a very bad idea to do a dichotomization of a continuous variable - like what is done here for transforming the age into \(A_{ij}\), because then information is lost. In a hypothesis testing set-up this will in general give substantial loss of power to detect the effect of a covariate.

Software for GLMM: demonstration of glmer, glmmTMB and inla for analysing GLMMs

This is NOT on the reading list, but an extra service to the interested students.

As a general piece of advice, for the models we have been looking at in this course, glmer() works perfectly well. inla and glmmTMB work better for more advanced models (e.g. with spatial effects or other complex correlation structures).

# install.packages('arm') install.packages('reshape2') install.packages('sp')
# install.packages('glmmTBM') install.packages('INLA',
# repos='https://inla.r-inla-download.org/R/stable', dep=TRUE)
library(ggplot2)
library(ggpubr)
library(lme4)
library(glmmTMB)
library(INLA)
library(reshape2)
# library(arm)

For this demonstration, we will simulate a dataset with one fixed effect (x) and random intercept and random slope (always nice to know the true values for parameters to compare with estimates).

Q: What is the model that we simulate from?

Answer

The model is a GLMM, assuming a Bernoulli/binomial response, and a probit link function.

This is the model code:

for (i in 1:15) {
    y[, i] <- beta_0 + beta_1 * x1 + random_int[i] + random_slope[i] * x1 + rnorm(200,
        0, 1)
}

y_binom <- as.numeric(c(y) > 0)

So the model is

\[ \eta_{ij} = \beta_0 + \beta_1 x_{j1} + \gamma_{i} + \delta x_{j1} \]

Why a probit? Because we have \(y=1\) if \(\eta_{ij} + \varepsilon_{ij}>0\), where \(\varepsilon_{ij} \sim N(0,1)\). This is equivalent to a probit link.

set.seed(90) # to get reproduciability
library(MASS)
library(arm)

no.gr <- 15
x1 <- runif(200, 0, 1)

y <- matrix(NA, nrow = 200, ncol = 15)

Q <- matrix(c(0.5, 0.3, 0.3, 0.5), nrow = 2)
random_both <- MASS::mvrnorm(no.gr, c(0,0), Q)
random_int <- random_both[,1]; random_slope <- random_both[,2]

# to test out
#random_int <- runif(15, -2, 2) # when we want to simulate from a model that does not fit exactly
#random_slope <- runif(15, -2, 2)

beta_0 <- -0.45
beta_1 <- 0.76

for (i in 1:15){
  y[,i] <- beta_0 + beta_1*x1 + random_int[i] + random_slope[i]*x1 + rnorm(200, 0, 1)
}

y_binom <- as.numeric(c(y) > 0)

ggplot(cbind(melt(data.frame(y)), x = rep(x1, 15), y = as.factor(y_binom)), 
       mapping = aes(x = x, y = value)) +
  geom_point(aes(col = y)) + geom_smooth(method = "lm", col = "black", se = FALSE) +
  facet_wrap(~ variable) + labs(x = "x", y = "y") 

mydata <- data.frame(y = y_binom, x = x1, group = rep(1:15, each = 200))

head(mydata)
##   y         x group
## 1 0 0.5307603     1
## 2 0 0.8850698     1
## 3 0 0.4399639     1
## 4 1 0.9422908     1
## 5 1 0.1883275     1
## 6 0 0.6555310     1

Use a binary regression with logit link to model the probability. That is, we have one covariate and make the following assumptions.

  1. Distributional assumptions: \(Y_{ij}\mid \gamma_i \sim \text{Bin}(n_{i}, \pi_{ij}) \text{ for } i = 1, \dots, 15, \ j = 1, \dots, n_i = 200, \ N = \sum_i n_i = 3000\).

  2. Structural assumptions: The linear predictor (with random effects) with one fixed covariate: \[\eta_{ij} = \beta_0+\beta_1 x_{ij} + \gamma_{0i}+\gamma_{1i}x_{ij} \] where we will choose \(x_{ij}=u_{ij}\). The link function is: \[\eta_{ij} = \ln \left(\frac{\pi_i}{1-\pi_i}\right)\]

  3. Distributional assumptions for random effects \[\begin{pmatrix} \gamma_{0i}\\ \gamma_{1i} \end{pmatrix} \sim N \left( \begin{pmatrix}0\\ 0 \end{pmatrix},\begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} &\tau_1^2 \end{pmatrix} \right)\]

Now we will look at three common ways of analysing such models. For each type we will fit the model and look at results.

fit_glmer <- glmer(y ~ x + (1 + x | group), data = mydata, family = binomial(link = "logit"))

fit_glmmTMB <- glmmTMB(y ~ x + (1 + x | group), data = mydata, family = binomial(link = "logit"))

n.gr <- max(mydata$group)
mydata$group2 <- mydata$group + n.gr
fit_inla <- inla(y ~ x + f(group, model = "iid2d", n = 2*n.gr) + f(group2, x, copy = "group"),
                 data = mydata, family = "binomial",
                 control.compute = list(dic = TRUE),
                 control.family = list(link = "logit"))
coefdf <- data.frame(mean = 
                       c(summary(fit_glmer)$coefficients[,1], 
                         summary(fit_glmmTMB)$coefficients$cond[,1], 
                         fit_inla$summary.fixed[,1]),
                     sd = 
                       c(summary(fit_glmer)$coefficients[,2], 
                         summary(fit_glmmTMB)$coefficients$cond[,2], 
                         fit_inla$summary.fixed[,2]),
                     mod = rep(c("glmer", "glmmTMB", "INLA"), each = length(fit_inla$names.fixed)),
                     par = rep(fit_inla$names.fixed, 3))

true_frame <- data.frame(par = fit_inla$names.fixed, beta = c(beta_0, beta_1))

critval <- qnorm(0.025, lower.tail = FALSE)
ggplot(coefdf) + geom_point(aes(x = mod, y = mean)) + 
  geom_hline(aes(yintercept = 0), col = "#D55E00") +
  geom_hline(data = true_frame, aes(yintercept = beta), col = "forestgreen", size = 1) +
  geom_errorbar(aes(x = mod, ymin = mean-critval*sd, ymax = mean+critval*sd)) +
  facet_wrap(~ par, scales = "free_y") + labs (x = "", y = "")

randdf_intercept <- data.frame(mean = 
                      c(ranef(fit_glmer)$group[,1],
                        ranef(fit_glmmTMB)$cond$group[,1],
                        fit_inla$summary.random$group$mean[unique(mydata$group)],
                        random_int),
                    mod = rep(c("glmer", "glmmTMB", "INLA", "true"), each = length(unique(mydata$group))),
                    x = paste0("y", sprintf("%02d", rep(1:length(unique(mydata$group)), 4))))

critval <- qnorm(0.025, lower.tail = FALSE)
randdf_intercept$low <- c(se.ranef(fit_glmer)$group[,1]*(-critval) + ranef(fit_glmer)$group[,1],
                          ranef(fit_glmmTMB)$cond$group[,1], # do not know how to get these values for glmmTMB
                          fit_inla$summary.random$group$`0.025quant`[unique(mydata$group)],
                          rep(0, 15))
randdf_intercept$high <- c(se.ranef(fit_glmer)$group[,1]*(critval) + ranef(fit_glmer)$group[,1],
                          ranef(fit_glmmTMB)$cond$group[,1], # do not know how to get these values for glmmTMB
                          fit_inla$summary.random$group$`0.975quant`[unique(mydata$group)],
                          rep(0, 15))
randdf_intercept$x2 <- rep(1:15, 4) + rep(c(-0.15,-0.05,0.05,0.15), each = 15)

ggplot(randdf_intercept) + geom_point(aes(x = mean, y = x2, col = mod)) + geom_vline(xintercept = 0, lty = 2) +
  geom_segment(aes(x = low, xend = high, y = x2, yend = x2, col = mod)) +
  labs(x = "random intercept", y = "") + 
  scale_y_continuous(breaks = 1:15, labels = paste0("y", sprintf("%02d", rep(1:length(unique(mydata$group))))))

randdf_slope <- data.frame(mean =
                      c(ranef(fit_glmer)$group[,2],
                        ranef(fit_glmmTMB)$cond$group[,2],
                        fit_inla$summary.random$group$mean[unique(mydata$group2)],
                        random_slope),
                    mod = rep(c("glmer", "glmmTMB", "INLA", "true"), each = length(unique(mydata$group))),
                    x = paste0("y", sprintf("%02d", rep(1:length(unique(mydata$group)), 4))))


critval <- qnorm(0.025, lower.tail = FALSE)
randdf_slope$low <- c(se.ranef(fit_glmer)$group[,2]*(-critval) + ranef(fit_glmer)$group[,2],
                          ranef(fit_glmmTMB)$cond$group[,2], # do not know how to get these values for glmmTMB
                          fit_inla$summary.random$group$`0.025quant`[unique(mydata$group2)],
                          rep(0, 15))
randdf_slope$high <- c(se.ranef(fit_glmer)$group[,2]*(critval) + ranef(fit_glmer)$group[,2],
                          ranef(fit_glmmTMB)$cond$group[,2], # do not know how to get these values for glmmTMB
                          fit_inla$summary.random$group$`0.975quant`[unique(mydata$group2)],
                          rep(0, 15))
randdf_slope$x2 <- rep(1:15, 4) + rep(c(-0.15,-0.05,0.05,0.15), each = 15)

ggplot(randdf_slope) + geom_point(aes(x = mean, y = x2, col = mod)) + geom_vline(xintercept = 0, lty = 2) +
  geom_segment(aes(x = low, xend = high, y = x2, yend = x2, col = mod)) +
  labs(x = "random slope", y = "") + 
  scale_y_continuous(breaks = 1:15, labels = paste0("y", sprintf("%02d", rep(1:length(unique(mydata$group))))))

fittedvalues <- data.frame(mean = 
                             c(fitted.values(fit_glmer),
                               fitted.values(fit_glmmTMB),
                               fit_inla$summary.fitted.values$mean),
                           mod = rep(c("glmer", "glmmTMB", "INLA"), each = nrow(mydata)),
                           x = rep(mydata$x, 3),
                           group = rep(mydata$group,3))

ggplot(fittedvalues) + geom_count(data = mydata, aes(x = y*0.8+0.1, y = y*0.8+0.1), col = grey(0.4)) + scale_size_area() +
  geom_point(aes(x = x, y = mean, col = mod, pch = mod, size = 1.5)) + 
  facet_wrap(~ group) + labs(x = "x", y = expression(hat(y)))

The results are quite similar (as approximation is necessary in the computations, some differences are expected). We can make more complicated models, and (especially from INLA) we also get much more information about the model, but this is way out of the scope of this course. Models like this require computer intenisve methods, and if you are interested in learning more about this, you can take the course TMA4300 - Computer Intensive Statistical Methods in the spring semester.

R packages

# install.packages('arm')
install.packages("reshape2")
install.packages("sp")
install.packages("glmmTBM")
install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable", dep = TRUE)
install.packages("lme4")
install.packages("devtools")
library(devtools)
# install_github('romunov/AED')
install.packages("sjPlot")
install.packages("sjmisc")

Further reading


  1. if you want to know how to age a fish, NOAA have you covered↩︎