Fun with link-functions

This site is meant to help future me, and anyone else interested, understand how to deal with link functions in linear models. One of the difficulties is that the interpretation of a model’s parameters changes with the link function and the type of predictors used (see this paper).

If you detect an error, please let me know: lilla.gurtner2@unibe.ch

In general, there are several packages that provide helper functions to get at the estimates of a model on the link level and on the expected and predicted response levels (see this helpful site for the distinctions). I will compare the results of three packages: brms, tidybayes, and marginaleffects.

brms has functions like posterior_linpred() and posterior_epred() that give access to the draws of the posterior estimates. They are the basis for the following packages, but their results are a bit cumbersome to work with.

tidybayes makes access to these draws simpler by providing them in a tidy format, as a tibble rather than a matrix. Based on this direct access to the posterior, summary statistics can be computed and plotted.

marginaleffects uses the insight package to provide marginal means and contrasts for a given model. It provides the predictions() function to get predictions for each observation, and the comparisons() function to easily take the difference between, e.g., groups. Both can take the prefix avg_ to average over the above.

In the first section, I collect the different functions from the different packages and compare them to each other, to the best of my understanding. In the second section, I go through two examples.

Setup

An example generalized linear regression model.

library(tidyverse)
library(brms)
library(marginaleffects)
library(tidybayes)
library(ggdist)

set.seed(123)
dat <- epilepsy 

# a simple model fitting the number of epileptic seizures by treatment (0 or 1)
mod <- brm(count ~ 1 + Trt, 
                  family = poisson(), 
                  data = dat, 
                  file = "poisson_model") 

# the link function of the Poisson regression is log()

Functions to get to the estimates

Estimates on the expected response level

In this part, we get estimates that are back-transformed through the inverse link function. In most brms models, the linear part of the model predicts, by default, the central tendency or expected value of the outcome distribution, so the back-transformed estimates are only expected values. Any additional observation-level noise (e.g., the sigma term in Gaussian brms models, which could be measurement error) is not taken into consideration. Therefore, credible intervals on this level are much narrower than in the next section.
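In formula terms: if \(\eta = a + bx\) is the linear predictor and \(g\) the link function, the estimates on this level are

\(E(y \mid x) = g^{-1}(\eta)\)

which, for the Poisson model above, is \(\lambda = \exp(a + b \cdot Trt)\): the expected count, with no extra noise added.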

using brms

Again, brms returns matrices, which are not super handy to work with, but they are the basis for the following functions.

a <- brms::posterior_linpred(mod, transform = TRUE)
# expected response, without sigma
# matrix with n_samples x n_observations
b <- brms::posterior_epred(mod)

unique(a == b)
(a 1 × 236 logical matrix, TRUE in every column: the two functions return identical draws here)

using tidybayes

a <- tidybayes::add_linpred_draws(object = mod, newdata = dat, transform = TRUE) 
# tibbles with n_observations * n_chains * n_draws rows
b <- tidybayes::add_epred_draws(object = mod, newdata = dat)
c <- tidybayes::epred_draws(object = mod, newdata = dat)

unique(a$.linpred == b$.epred)
[1] TRUE
unique(a$.linpred == c$.epred)
[1] TRUE
unique(b$.epred == c$.epred)
[1] TRUE

In this case, the three lines give the same result. But see here and here for why this will not always be the case (e.g. in lognormal models!).
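To see why, take the lognormal case: there, the linear predictor \(\mu\) is the mean of \(\log(y)\), not of \(y\). Back-transforming the linear predictor gives the median,

\(\text{median}(y) = \exp(\mu)\)

while the expected value also involves the residual standard deviation,

\(E(y) = \exp(\mu + \sigma^2 / 2)\)

so for such a model, add_linpred_draws(transform = TRUE) and add_epred_draws() would give different results.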

using marginaleffects

type = “response”: Compute posterior draws of the expected value using the brms::posterior_epred function.

# get the estimates on a unit level, conditional on all other predictors being average
a <- marginaleffects::predictions(mod, type = "response") 
# tibble with a prediction for each row of the dataset used to fit the model 
# (not the same as each observation in the dataset if rows with NAs were dropped)
a <- marginaleffects::predictions(mod, type = "response", newdata = dat) # prediction for each row of the dataset
# tibble with n_observations rows (including those with NAs that were not part of the model if you did not impute)
b <- get_draws(a) # tibble with n_chains*n_samples*n_observations rows
a <- marginaleffects::avg_predictions(mod, type = "response", variables = "Trt") 
# tibble with the mean prediction per level of Trt, averaged over the rows used to fit the model
a <- marginaleffects::avg_predictions(mod, type = "response", newdata = dat) # averaged over newdata
# tibble with one row: the mean over n_observations (including those with NAs that were not part of the model if you did not impute)
# compare the estimates by taking the difference

a <- marginaleffects::comparisons(mod, variables = "Trt", type = "response")
a <- marginaleffects::avg_comparisons(mod, variables = "Trt", type = "response")

Estimates on the predicted response level

Here, we produce predictions for either our existing data frame or new data. The predictions are now on the actual outcome level, i.e. not the expected number of seizures, but predictions of actual integer counts.
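Schematically, for the Poisson model: instead of returning the expected value \(\lambda = \exp(\eta)\) for each posterior draw, each draw now also samples an actual outcome

\(\tilde{y} \sim \text{Poisson}(\exp(\eta))\)

which is why the values are integers and the intervals are wider than on the expected response level.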

using brms

a <- brms::posterior_predict(mod)
# prediction of individual responses, including observation-level noise
# matrix with n_samples x n_observations

using tidybayes

a <- tidybayes::add_predicted_draws(object = mod, newdata = dat)
b <- tidybayes::predicted_draws(object = mod, newdata = dat)

unique(a$.prediction == b$.prediction)
[1] FALSE  TRUE

These are not always the same, since the predictions now include “sampling error”. Also, rerunning each line will produce different .prediction values (unless you set a seed).

using marginaleffects

a <- marginaleffects::predictions(mod, type = "prediction")
a <- marginaleffects::predictions(mod, type = "prediction", newdata = dat)
#type = "prediction": Compute posterior draws of the posterior predictive distribution using the brms::posterior_predict function.
b <- get_draws(a) # tibble with n_chain*n_sample*n_observations

a <- marginaleffects::avg_predictions(mod, type = "prediction")
# averaged over predictions() from above

Direct comparison of outputs

In this part, I directly compare the output of tidybayes and marginaleffects with by-hand computations of the estimates, to better understand what does what. I start with the Poisson regression from above.

Poisson Regression

Poisson regressions use log() as the link function, to ensure that the linear part of the model does not produce a negative lambda (which the Poisson distribution cannot handle). So, the model goes like this:

\(\text{outcome} \sim \text{Poisson}(\lambda)\)

\(\log(\lambda) = a + bx\)

For a and b, we need to specify priors in a Bayesian analysis:

\(a \sim \text{Normal}(0, 2)\)

\(b \sim \text{Normal}(0, 1)\)

We can use the marginaleffects package for prior predictive checks, or we can also use pp_check(). This is not of interest here.
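If you want the priors above in the model explicitly, they could be passed to brm() like this. A sketch only: the model mirrors the one from the setup, and sample_prior = "only" is what you would use to draw from the prior for a prior predictive check.

```r
# hypothetical refit of the Poisson model with a ~ Normal(0, 2), b ~ Normal(0, 1)
mod_prior <- brm(count ~ 1 + Trt,
                 family = poisson(),
                 data   = dat,
                 prior  = c(set_prior("normal(0, 2)", class = "Intercept"),
                            set_prior("normal(0, 1)", class = "b")),
                 sample_prior = "only") # sample from the prior only, for prior predictive checks
```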

Expected response level comparison

Now, let’s look at the numbers on the response level

response_comparison <- tibble(origin = character(), 
                          estimate = numeric(), 
                          lowerCI = numeric(), 
                          upperCI = numeric()) 

## fixef baseline ----
# push link-level estimates through the inverse linkfunction
fixef_row <- tibble(origin = "fixef(mod)", # from fixef as baseline
          estimate = exp(fixef(mod)[2,1] + fixef(mod)[1,1]) - exp(fixef(mod)[1,1]), 
          lowerCI = exp(fixef(mod)[2,3] + fixef(mod)[1,3]) - exp(fixef(mod)[1,3]), 
          upperCI = exp(fixef(mod)[2,4] + fixef(mod)[1,4]) - exp(fixef(mod)[1,4])) 


## tidybayes ----
# group_by(.draw, Trt)
# compute mean per draw
# pivot wider
# subtract group estimates within draw
# summarise

# correct way to do it
tidybayes_row <- mod |>
  add_linpred_draws(newdata = dat,
    transform = T) |> # now on the response level
  group_by(.draw, Trt) |> # for each draw and level
  summarise(.linpred = mean(.linpred), .groups = "drop") |> # take the mean => 2*4000 rows
  tidyr::pivot_wider(names_from = Trt,
    values_from = .linpred) |># prepare to take the difference
  mutate(diff = `1` - `0`) |>        
  summarise(origin = "tidybayes_correct",# now summarize over the draws
    estimate = mean(diff),
    lowerCI  = quantile(diff, 0.025),
    upperCI  = quantile(diff, 0.975))





# Summarizing first and then doing the subtraction produces much too narrow CIs.
# There are two ways to make this mistake.

# one
tidybayes_diffs <- tidybayes::add_linpred_draws(object = mod, 
                                  newdata = dat, 
                                  transform = T) |> # set T/F for response / link level
  group_by(Trt) |> 
  summarise(mean_linpred = mean(.linpred),
            lower_CI = quantile(.linpred, probs = 0.025), 
            upper_CI = quantile(.linpred, probs = 0.975)) |> # group expected values
  summarise(across(2:4, ~ .[2] - .[1])) # group difference

tidybayes_row_w1 <- tibble(origin = "tidybayes_wrong1",
          estimate = tidybayes_diffs$mean_linpred, 
          lowerCI = tidybayes_diffs$lower_CI, 
          upperCI = tidybayes_diffs$upper_CI) 

# two
a <- tidybayes::add_linpred_draws(object = mod, 
                                  newdata = dat, 
                                  transform = T) |> 
  filter(Trt == 1) |> 
  pull(.linpred) |> 
  mean_qi(.width = 0.95)

  
b <- tidybayes::add_linpred_draws(object = mod, 
                                  newdata = dat, 
                                  transform = T) |> 
  filter(Trt == 0) |> 
  pull(.linpred) |> 
  mean_qi(.width = 0.95)

tidybayes_diffs2 <- tibble(mean_linpred = a$y - b$y, 
                     lower_CI = a$ymin - b$ymin, 
                     upper_CI = a$ymax - b$ymax)

tidybayes_row_w2 <- tibble(origin = "tidybayes_wrong2",
          estimate = tidybayes_diffs2$mean_linpred, 
          lowerCI = tidybayes_diffs2$lower_CI, 
          upperCI = tidybayes_diffs2$upper_CI) 

### marginaleffects -----
# per default, marginaleffects uses the median as the measure of central tendency
# https://marginaleffects.com/man/r/predictions.html#bayesian-posterior-summaries
# to make it comparable to the other things, change this to the mean: 

options("marginaleffects_posterior_center" = "mean", # not the default
        "marginaleffects_posterior_interval" = "eti") # default, just to be explicit

# again here, there is a right and a wrong way to do it

# correct: 
me_row_avg_pred_correct <- marginaleffects::predictions(mod,
  newdata = dat,
  type = "response") |>
  get_draws() |> 
  group_by(drawid, Trt) |> 
   summarise(draws = mean(draw), .groups = "drop") |> 
  pivot_wider(names_from = Trt,
    values_from = draws) |># prepare to take the difference
  mutate(diff = `1` - `0`) |>        
  summarise(origin = "marginaleffects::predictions_correct",# now summarize over the draws
    estimate = mean(diff),
    lowerCI  = quantile(diff, 0.025),
    upperCI  = quantile(diff, 0.975))

# wrong: summarizing before subtracting

margEf_avgPred <- marginaleffects::avg_predictions(mod, 
                                 type = "response", # set to "link" or "response"
                                 variables = "Trt") |> # grouped expected values
  summarise(across(2:4, ~ .[2] - .[1])) # group difference


me_row_avg_pred_w <- tibble(origin = "marginaleffects::avg_predictions_wrong",
          estimate = margEf_avgPred$estimate, 
          lowerCI = margEf_avgPred$conf.low, 
          upperCI = margEf_avgPred$conf.high) 



# do it with the comparisons function
margEf_avgComp <- marginaleffects::avg_comparisons(mod, 
                                 type = "response", # set to "link" or "response"
                                 variables = "Trt")


me_row_avg_comp <- tibble(origin = "marginaleffects::avg_comparisons_mean",
          estimate = margEf_avgComp$estimate, 
          lowerCI = margEf_avgComp$conf.low, 
          upperCI = margEf_avgComp$conf.high) 
response_comparison <- response_comparison |> 
  add_row(fixef_row) |> 
  add_row(tidybayes_row) |> 
  add_row(tidybayes_row_w1) |> 
  add_row(tidybayes_row_w2) |> 
  add_row(me_row_avg_pred_correct) |> 
  add_row(me_row_avg_pred_w) |> 
  add_row(me_row_avg_comp)

response_comparison
# A tibble: 7 × 4
  origin                                 estimate lowerCI upperCI
  <chr>                                     <dbl>   <dbl>   <dbl>
1 fixef(mod)                               -0.626  -1.23   0.110 
2 tidybayes_correct                        -0.626  -1.38   0.0983
3 tidybayes_wrong1                         -0.626  -0.614 -0.664 
4 tidybayes_wrong2                         -0.626  -0.614 -0.664 
5 marginaleffects::predictions_correct     -0.626  -1.38   0.0983
6 marginaleffects::avg_predictions_wrong   -0.626  -0.614 -0.664 
7 marginaleffects::avg_comparisons_mean    -0.626  -1.38   0.0983

The numbers in this table are more interpretable: they are the expected difference in seizures between the treatment and the control group. However, the CI numbers are again not all the same. Let’s plot to see what exactly is equal:

tidybayes_draws <- tidybayes::add_linpred_draws(object = mod, 
                                  newdata = dat, 
                                  transform = T) |> 
 ggplot(aes(x = .linpred, fill =  Trt)) +
  stat_halfeye() + 
  ylab("tidybayes::linpred")


marginaleff_pred_draws <- marginaleffects::predictions(model = mod, type = "response") |> 
  get_draws() |> 
  ggplot(aes(x = draw, fill =  Trt)) +
  stat_halfeye()+
  ylab("marginaleffects::predictions")

cowplot::plot_grid(tidybayes_draws, marginaleff_pred_draws, nrow = 2, 
                   labels = "poisson estimates on the response level")

Bernoulli / Logistic regression

Now, let’s walk through another example, this time a logistic regression with a new link function, the logit(). To make such a model, let’s dichotomize the count column into “many” and “not many” seizures, and predict it using the Trt variable again.

dat_logistic <- dat |> 
  mutate(count_many = case_when(count < 10 ~ 0, 
                                count >= 10 ~ 1, 
                                TRUE ~ -10)) # test for NAs


mod_logistic <- brm(count_many ~ 1 + Trt, 
                    family = bernoulli(), 
                    data = dat_logistic,
                    file = "logistic_model" )

Expected response level comparison

Now, let’s look at the numbers on the response level. Because we will look at the estimated increase in the probability of having “many” seizures, the by-hand calculations become a bit complicated. Because of the non-linearity of the link function and the model specification, the increase has to be calculated by first adding the intercept and the beta coefficient, pushing the sum through the inverse link, doing the same with the intercept alone, and then subtracting the two results from one another. The numbers fit the conditional_effects() plot well, and therefore I trust them.

response_comparison <- tibble(origin = character(), 
                          estimate = numeric(), 
                          lowerCI = numeric(), 
                          upperCI = numeric()) 

## fixef baseline ----
# push link-level estimates through the inverse linkfunction

fixef_row <- tibble(origin = "fixef(mod)", # from fixef as baseline
          estimate = boot::inv.logit(fixef(mod_logistic)[2,1] + 
                                       fixef(mod_logistic)[1,1]) - 
            boot::inv.logit(fixef(mod_logistic)[1,1]), 
          lowerCI = boot::inv.logit(fixef(mod_logistic)[2,3] +
                                      fixef(mod_logistic)[1,3]) -
            boot::inv.logit(fixef(mod_logistic)[1,3]), 
          upperCI = boot::inv.logit(fixef(mod_logistic)[2,4] +
                                      fixef(mod_logistic)[1,4]) - 
            boot::inv.logit(fixef(mod_logistic)[1,4])) 


## tidybayes ----
# group_by(.draw, Trt)
# compute mean per draw
# pivot wider
# subtract group estimates within draw
# summarise

# correct way to do it
tidybayes_row <- mod_logistic |>
  add_linpred_draws(newdata = dat_logistic,
    transform = T) |> # now on the response level
  group_by(.draw, Trt) |> # for each draw and level
  summarise(.linpred = mean(.linpred), .groups = "drop") |> # take the mean => 2*4000 rows
  tidyr::pivot_wider(names_from = Trt,
    values_from = .linpred) |># prepare to take the difference
  mutate(diff = `1` - `0`) |>        
  summarise(origin = "tidybayes_correct",# now summarize over the draws
    estimate = mean(diff),
    lowerCI  = quantile(diff, 0.025),
    upperCI  = quantile(diff, 0.975))





# Summarizing first and then doing the subtraction produces much too narrow CIs.
# There are two ways to make this mistake.

# one
tidybayes_diffs <- tidybayes::add_linpred_draws(object = mod_logistic, 
                                  newdata = dat_logistic, 
                                  transform = T) |> # set T/F for response / link level
  group_by(Trt) |> 
  summarise(mean_linpred = mean(.linpred),
            lower_CI = quantile(.linpred, probs = 0.025), 
            upper_CI = quantile(.linpred, probs = 0.975)) |> # group expected values
  summarise(across(2:4, ~ .[2] - .[1])) # group difference

tidybayes_row_w1 <- tibble(origin = "tidybayes_wrong1",
          estimate = tidybayes_diffs$mean_linpred, 
          lowerCI = tidybayes_diffs$lower_CI, 
          upperCI = tidybayes_diffs$upper_CI) 

# two
a <- tidybayes::add_linpred_draws(object = mod_logistic, 
                                  newdata = dat_logistic, 
                                  transform = T) |> 
  filter(Trt == 1) |> 
  pull(.linpred) |> 
  mean_qi(.width = 0.95)

  
b <- tidybayes::add_linpred_draws(object = mod_logistic, 
                                  newdata = dat_logistic, 
                                  transform = T) |> 
  filter(Trt == 0) |> 
  pull(.linpred) |> 
  mean_qi(.width = 0.95)

tidybayes_diffs2 <- tibble(mean_linpred = a$y - b$y, 
                     lower_CI = a$ymin - b$ymin, 
                     upper_CI = a$ymax - b$ymax)

tidybayes_row_w2 <- tibble(origin = "tidybayes_wrong2",
          estimate = tidybayes_diffs2$mean_linpred, 
          lowerCI = tidybayes_diffs2$lower_CI, 
          upperCI = tidybayes_diffs2$upper_CI) 

### marginaleffects -----
# per default, marginaleffects uses the median as the measure of central tendency
# https://marginaleffects.com/man/r/predictions.html#bayesian-posterior-summaries
# to make it comparable to the other things, change this to the mean: 

options("marginaleffects_posterior_center" = "mean", # not the default
        "marginaleffects_posterior_interval" = "eti") # default, just to be explicit

# again here, there is a right and a wrong way to do it

# correct: 
me_row_avg_pred_correct <- marginaleffects::predictions(mod_logistic,
  newdata = dat_logistic,
  type = "response") |>
  get_draws() |> 
  group_by(drawid, Trt) |> 
   summarise(draws = mean(draw), .groups = "drop") |> 
  pivot_wider(names_from = Trt,
    values_from = draws) |># prepare to take the difference
  mutate(diff = `1` - `0`) |>        
  summarise(origin = "marginaleffects::predictions_correct",# now summarize over the draws
    estimate = mean(diff),
    lowerCI  = quantile(diff, 0.025),
    upperCI  = quantile(diff, 0.975))

# wrong: summarizing before subtracting

margEf_avgPred <- marginaleffects::avg_predictions(mod_logistic, 
                                 type = "response", # set to "link" or "response"
                                 variables = "Trt") |> # grouped expected values
  summarise(across(2:4, ~ .[2] - .[1])) # group difference


me_row_avg_pred_w <- tibble(origin = "marginaleffects::avg_predictions_wrong",
          estimate = margEf_avgPred$estimate, 
          lowerCI = margEf_avgPred$conf.low, 
          upperCI = margEf_avgPred$conf.high) 



# do it with the comparisons function
margEf_avgComp <- marginaleffects::avg_comparisons(mod_logistic, 
                                 type = "response", # set to "link" or "response"
                                 variables = "Trt")


me_row_avg_comp <- tibble(origin = "marginaleffects::avg_comparisons_mean",
          estimate = margEf_avgComp$estimate, 
          lowerCI = margEf_avgComp$conf.low, 
          upperCI = margEf_avgComp$conf.high) 
response_comparison <- response_comparison |> 
  add_row(fixef_row) |> 
  add_row(tidybayes_row) |> 
  add_row(tidybayes_row_w1) |> 
  add_row(tidybayes_row_w2) |> 
  add_row(me_row_avg_pred_correct) |> 
  add_row(me_row_avg_pred_w) |> 
  add_row(me_row_avg_comp)

response_comparison
# A tibble: 7 × 4
  origin                                 estimate lowerCI upperCI
  <chr>                                     <dbl>   <dbl>   <dbl>
1 fixef(mod)                              -0.0815 -0.122   0.0415
2 tidybayes_correct                       -0.0809 -0.190   0.0302
3 tidybayes_wrong1                        -0.0809 -0.0673 -0.0922
4 tidybayes_wrong2                        -0.0809 -0.0673 -0.0922
5 marginaleffects::predictions_correct    -0.0809 -0.190   0.0302
6 marginaleffects::avg_predictions_wrong  -0.0809 -0.0673 -0.0922
7 marginaleffects::avg_comparisons_mean   -0.0809 -0.190   0.0302

The numbers in this table indicate the increase in the probability of having “many” seizures in the treatment group compared to the control group. The fixef(mod) row differs from the others, presumably because back-transforming the CI endpoints through the inverse logit and then subtracting is not the same as computing the interval of the difference when the link is nonlinear.

And let’s look at the plot again to verify that tidybayes and marginaleffects give the same result.

tidybayes_draws <- tidybayes::add_linpred_draws(object = mod_logistic, 
                                  newdata = dat_logistic, 
                                  transform = T) |> 
 ggplot(aes(x = .linpred, fill =  Trt)) +
  stat_halfeye() + 
  ylab("tidybayes::linpred")


marginaleff_pred_draws <- marginaleffects::predictions(model = mod_logistic, type = "response") |> 
  get_draws() |> 
  ggplot(aes(x = draw, fill =  Trt)) +
  stat_halfeye()+
  ylab("marginaleffects::predictions")

cowplot::plot_grid(tidybayes_draws, marginaleff_pred_draws, nrow = 2, 
                   labels = "logistic regression estimates on the response level")

tl;dr

Both tidybayes and marginaleffects have nice functions to get at the estimates of generalized linear models, on the link, the expected-response, and the predicted-response levels. They can be made to give the same results. The most important thing is to always summarize last: work with the draws for as long as possible.
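The “summarize last” recipe used throughout can be written down schematically (draws stands for any tidy draws tibble with one row per draw and observation, f for the per-draw quantity of interest; both are placeholders):

```r
draws |>
  group_by(.draw) |>                # stay inside each posterior draw
  summarise(q = f(estimate)) |>     # compute the quantity of interest per draw
  summarise(estimate = mean(q),     # only now summarise across draws
            lowerCI  = quantile(q, 0.025),
            upperCI  = quantile(q, 0.975))
```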

However, the calculation of the credible intervals is not always consistent across link and response levels. The credible intervals of marginaleffects::avg_comparisons(), marginaleffects::predictions(), and tidybayes::linpred_draws() are identical to the CIs of fixef(model) on the link level. But the central tendencies and CIs for the response-level estimates are slightly off, because the link function introduces non-linearity. Essentially:

\(E(\text{inverse.link}(\eta)) \neq \text{inverse.link}(E(\eta))\)

if the link function is not the identity function.
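A quick numeric illustration with the log link: take two posterior draws of \(\eta\), 0 and 2. Then

\(E(\exp(\eta)) = \tfrac{1}{2}(e^{0} + e^{2}) \approx 4.19\)

but

\(\exp(E(\eta)) = e^{1} \approx 2.72\)

so summarizing before back-transforming gives a different number.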

The measure of central tendency of marginaleffects::avg_comparisons() is by default the median, which is why, on the link level, it gives a slightly different number than fixef() if this default is not changed.

Caveat: I have only tried this with models with one predictor. Things might change with more predictors, because averaging and interactions make everything more complicated.