Bootstrapped 95%CIs and ECx from GAMMs

Checked 02/05/2023

Similar to my previous blog on deriving ‘Effect Concentrations’ from GLMMs, this blog details a way to code bootstrapped 95% CI from GAMMs and their bootstrapped ECx. I know of no method to extract ECx from GAMMs analytically like we can for GLMMs, so here I interpolate from the predictions. This isn’t super accurate but the error will be quite small – make sure to keep the prediction length >100 though.

This dataset doesn’t really require a GAMM, I’ve just used it for demonstration purposes. I’ve added an observational-level random effect, which is typically used for tank experiments.

Lets load in some data.

# Load packages (if not installed Tools->Install Packages)
data1 <- read.table(file="", header= TRUE, dec=",")
# 2 Labeling and wrangling -----------------------------------------------
data1$sus.sed <- as.numeric(as.character(data1$sus.sed))
data1$log.x <- log10(data1$sus.sed) #log10 trans
data1$fert <- as.numeric(as.character(data1$fert))
data1$suc <- data1$fert
data1$tot <- as.numeric(as.character(data1$tot))
data1$obs <- factor(formatC(1:nrow(data1), flag="0", width = 3))# unique tank ID for later on
data1$log.x <- ifelse(data1$log.x < 0.1, 0, data1$log.x)  #Replace neg with zero

Next let’s fit a regular GAMM for demonstration sake. Here we are fitting a GAMM, and predicting along raw.x in the data.frame df1. We convert the predicted SE to approximate 95% CI by multiplying them by 1.96.

m2.gamm<-gamm(cbind(suc,(tot - suc))~s(log.x, fx = F, k = 3),
              random = list(obs=~1),
              method = "REML") #

df1 <- data.frame(log.x = seq(min(data1$log.x), max(data1$log.x), length = 30)) #setting up  new  data frame (df) defining log.x values to run 
pred.md1 <- predict(m2.gamm, newdata=df1, type = "response",, lpmatrix=TRUE)
df1$pred = pred.md1$fit
plot(data1$log.x,(data1$suc / data1$tot),main="GAM") #second plot
lines(df1$log.x, df1$pred, type = "l", lwd = 2, col = 2, xaxt = "n", las = 1) #plot model mean line
df1$ = pred.md1$fit + 1.96 * pred.md1$ #predicting the data1 95% CI
df1$ = pred.md1$fit - 1.96 * pred.md1$ #predicting the lower 95% CI
lines(df1$log.x, df1$, type = "l", lwd = 2,  xaxt = "n", las = 1) #plot data1 95% CI
lines(df1$log.x, df1$, type = "l", lwd = 2,  xaxt = "n", las = 1) #plot lower 95% CI

The code used to interpolate the ECx is below. This function calculates the ECx from the maximum of the curve. It simply works by finding the nearest value in the prediction data.frame to X% of the maximum response, and then the corresponding value in the predictor column. If you have a unimodal ‘hump-shaped’ trend, it will only calculate one ECx (I think).

ecx.interp <- function(ecx, df1) {
  inhibx <- max(df1$pred) * ecx
  nearest_idx <- which.min(abs(df1$pred - inhibx))
  df2 <- data.frame(
    ecx = df1$log.x[nearest_idx],
    lower = df1$log.x[which.min(abs(df1$ - inhibx))],
    data = df1$log.x[which.min(abs(df1$ - inhibx))]
ecx.interp(0.5, df1) #this is the ECx calculate between the top and 0.

Ok, now for a GAMM with bootstrapped 95% CI. Make sure to keep sims = as multiple of 200, otherwise the CIs can’t calculate properly. Ideally your finally attempt will want around 1000 otherwise the CIs will look wobbly, like below.

Here we have the code above wrapped into a function called m2.gamm, which outputs the prediction. In the for-loop, we resample the raw data.frame and then apply this function. The prediction is stored into a list and then a data.frame called df3. This data.frame is then ordered and next subset by the 0.025 (2.5%) or 0.975 (97.5%) columns. Because columns are integers, make sure your sims multiplied by these proportions result in an integer e.g. 200.

lst = list() #initial list
fit_gam <- function(resampled) {
  m2.gamm <- gamm(cbind(suc, (tot - suc)) ~ s(log.x, fx = F, k = 3),
                  random = list(obs = ~1),
                  family = binomial,
                  data = resampled,
                  method = "REML")
  pred.md1 <- predict(m2.gamm, newdata=df1, type = "response",
  out = unname(pred.md1$fit)
  # lst[[1]] = dd
  # return(lst)
} #gam function returns prediction

#Now resample the residuals x times
resid_gam <- residuals(m2.gamm$gam, type = "response")
sims = 200
for (i in 1:sims) {
  resampled_resid <- resid_gam[sample(length(resid_gam), replace = TRUE)]
  new_response <- m2.gamm$gam$fitted.values + resampled_resid
  resampled_data <- data1
  resampled_data$suc <- round(pmin(pmax(new_response * resampled_data$tot, 0), resampled_data$tot)) #pmax prevents values less than 0
  out <- fit_gam(resampled_data)
  lst[[i]] <- out

lst  #returns all predictions in list
df3 <-, lst)  #add this to 'loop into a list'
df3 #so all the predicted fits from the x simulations are stored as rows in this list

Now lets see how each resampled curve looks

#plot output of predictions
df4 <-
df4$log.x = df1$log.x
data1.long = df4 %>% tidyr::pivot_longer(-log.x,  names_to = "factors" ,values_to = "meas") %>% data.frame() #keep vec.x, add all other columns to factors , add all their values to meas)
ggplot(data1.long, aes(x = log.x, y = meas, color = 'steelblue1', group = factors)) +
  geom_line() +
  #geom_point(aes(y = meas)) +
  labs(x = "Predictor", y = "Response", title = "Multiple Model Fits") +
  theme_minimal() + theme(legend.position="none")

Let’s take the 2.5% and the 97.5% lines to see the bootstrapped CIs

#ordering and take the bottom 2.5% and top 97.5% lines
eta = 0.5*sims
lowerCI = 0.025*sims
dataCI = 0.975*sims
bb_se1<-apply(df3,2,function(X) X[order(X)]) #orders the predictions from lowest to highest
df1$boot.pred = bb_se1[eta,] #find the bottom 2.5%
df1$boot_lo = bb_se1[lowerCI,] #find the bottom 2.5%
df1$boot_up = bb_se1[dataCI,] #find the top 2.5%
plot(data1$log.x,(data1$suc / data1$tot),main="GAM") #second plot
lines(df1$log.x, df1$boot.pred, type = "l", lwd = 2, col = 2, xaxt = "n", las = 1) #plot model mean line
lines(df1$log.x, df1$boot_lo, type = "l", lwd = 2,  xaxt = "n", las = 1) #plot data2 95% CI
lines(df1$log.x, df1$boot_up, type = "l", lwd = 2,  xaxt = "n", las = 1) #plot lower 95% CI

With that in mind, we can now bootstrap the ECx.

Just to recap:

1) we fit a GAMM to the data

2) we extracted the residuals and resampled them

3) made new predictions from the resampled residuals

4) repeated this 200 times

Ok. Let resample just once to see how it look, but keeping everything in a function, which we will need below.

lst1 = list() #initial list
fit_gam2 <- function(resampled, ecx) {
  m2.gamm <- gamm(cbind(suc, (tot - suc)) ~ s(log.x, fx = F, k = 3),
                  random = list(obs = ~1),
                  family = binomial,
                  data = resampled,
                  method = "REML")
  pred<- predict(m2.gamm, newdata = df1, type = "response",
  df2 = data.frame(log.x = df1$log.x, pred = pred$fit)
  ecx.interp2 <- function(ecx, df2) {
    inhibx <- max(df2$pred) * ecx
    nearest_idx <- which.min(abs(df2$pred - inhibx))
    ecx = df2$log.x[nearest_idx]
  out1 = ecx.interp2(ecx, df2)
  out = unname(out1)
  # lst[[1]] = dd
  # return(lst)
} #gam function returns prediction
fit_gam2(resampled_data, 0.5)   

Now let’s run it in a loop for each resample

#Now run in loop for sims
ecx = 0.5
sims = 200
for(i in 1:sims) {
  # resampled  = data2[sample(nrow(data2), replace = TRUE), ]
  # out = fit_gam2(resampled, ecx)
  # lst[[i]] = out
  resampled_resid <- resid_gam[sample(length(resid_gam), replace = TRUE)]
  new_response <- m2.gamm$gam$fitted.values + resampled_resid
  resampled_data <- data1
  resampled_data$suc <- round(pmin(pmax(new_response * resampled_data$tot, 0), resampled_data$tot)) #pmax prevents values less than 0
  out <- fit_gam2(resampled_data, ecx)
  lst[[i]] <- out
lst  #returns all predictions in list
df3 = unlist(lst)
plot(density(df3)) #density plot
df4 = df3[order(df3)]
df1$boot.pred = df4[eta] #find the bottom 2.5%
df1$boot_lo = df4[lowerCI] #find the bottom 2.5%
df1$boot_up = df4[dataCI] #find the top 2.5%


df5 = data.frame(ecx = df3)
p1 = ggplot(df5, aes(x=ecx))+geom_density(aes(fill = 'steelblue4'), alpha=0.3)+
  stat_pointinterval(aes(y = 0.00, x = ecx),.width = c(.66, .95))
p1 = p1+scale_fill_manual( values = c("steelblue4"))+
  scale_color_manual( values = c("steelblue4"))+theme(legend.position="none")#nice
p1 = p1 + scale_y_continuous(name ="Density")
p1 = p1 + coord_cartesian(ylim = c(0.0, 3.5))

And if you want everything wrapped into a function

gamm_ecx <- function(data1, ecx) {
  data1$suc <- as.numeric(as.character(data1$suc))
  data1$tot <- as.numeric(as.character(data1$tot))
  data1$obs <- factor(formatC(1:nrow(data1), flag="0", width = 3))# unique tank ID for later on
lst1 = list() #initial list
data1$log.x <- ifelse(data1$log.x < 0.1, 0, data1$log.x)  #Replace neg with zero
m2.gamm<-gamm(cbind(suc,(tot - suc))~s(log.x, fx = F, k = 3),
              random = list(obs=~1),
              method = "REML") #
resid_gam <- residuals(m2.gamm$gam, type = "response")
df1 <- data.frame(log.x = seq(min(data1$log.x), max(data1$log.x), length = 30)) #setting up  new  data frame (df) defining log.x values to run 

fit_gam2 <- function(resampled, ecx) {
  m2.gamm <- gamm(cbind(suc, (tot - suc)) ~ s(log.x, fx = F, k = 3),
                  random = list(obs = ~1),
                  family = binomial,
                  data = resampled,
                  method = "REML")
  pred<- predict(m2.gamm, newdata = df1, type = "response",
  df2 = data.frame(log.x = df1$log.x, pred = pred$fit)
  ecx.interp2 <- function(ecx, df2) {
    inhibx <- max(df2$pred) * ecx
    nearest_idx <- which.min(abs(df2$pred - inhibx))
    ecx = df2$log.x[nearest_idx]
  out1 = ecx.interp2(ecx, df2)
  out = unname(out1)
} #gam function returns prediction
#fit_gam2(resampled_data, 0.5)   #gives the same asnwer as above but in a function

sims = 200
lst = list() #initial list
for(i in 1:sims) {
  resampled_resid <- resid_gam[sample(length(resid_gam), replace = TRUE)]
  new_response <- m2.gamm$gam$fitted.values + resampled_resid
  resampled_data <- data1
  resampled_data$suc <- round(pmin(pmax(new_response * resampled_data$tot, 0), resampled_data$tot)) #pmax prevents values less than 0
  out <- fit_gam2(resampled_data, ecx)
  lst[[i]] <- out
lst  #returns all predictions in list
df3 = unlist(lst)
plot(density(df3)) #density plot
df4 = df3[order(df3)]
eta = 0.5*sims
lowerCI = 0.025*sims
dataCI = 0.975*sims

boot.pred = df4[eta] #find the bottom 2.5%
boot_lo = df4[lowerCI] #find the bottom 2.5%
boot_up = df4[dataCI] #find the top 2.5%
df5 = data.frame(boot.pred, boot_lo, boot_up)
output_list = list(df4, df5)

boot_ecx = gamm_ecx(data1, 0.5)

If you fit multiple curves, you can then compare their ECx distributions together to assess for evidence of differences.

Happy bootstrapping!


Automated rapid light curve analyses in R

The objective of this post is to simplify and automate the process of conducting rapid light curve analysis from a Walz Pulse Amplitude Modulation Fluorometry (PAM) fluorometer using their WinControl generated csv files. The first time I did this I jumped around between Excel and Graphpad and eventually finished my analysis in under a day. But I didn’t like the jumping around, and wanted to keep my raw csv files so I decided to code it in R. Little did I realise it would take me a good number of days, although I learnt a lot in the process.

The code is long, and I suspect at times ugly. If you have a cleaner way to run some of the functions, I would be happy to know. For simplicity I haven’t included all the code below but rather described the steps in the text. The full code can be found here ( including some Walz csv files.

Major steps

  1. Naming csv files. Each RLC data from the PAM needs to be saved in a csv file. 2) Every file needs to be in an allocated folder, and each file labelled with a unique ID, with each label the same length seperated by spaces. I use Bulk Rename Utility to do this. An example may be ‘d001 mille’, ‘d050 mille’, ‘d100 mille’. Here the unique ID is the 4-digit ID, and the second block of letters describes the experiment. The code below will read the first bloack of letters.
  2. Importing the folder of RLC into R. Note that if you have delimited you read_delim(filename, delim = ‘;’). I have the unique ID saved as ‘disc’.
  3. Importing the environmental treatment data. Note: As you likely subsampled for the RLC curves, this code uses left_join to merge the two data.frames. You do not need them to match, as long as each has the same unique ID.
  4. Create an rETR column and data clean for anomalies.
  5. Get starting values. Run the function. This is from the modified from the Platt package (they did all the hard work).
  6. Run the nonlinear models and derive all the RLC parameters and models

Ok Let’s get started.

Step 1) Name the csv files in a format of the same length. The first block of letters should be your ID.

This is what it has inside a file. I have many parameters but you can see I have 3 RLCs in just this file. For the RLCs, the only parameter you need are the PAR and the Y(II) columns.

Step 2) Importing the folder of RLC

# Set the working directory
setwd("C:/.../r mil csvs")

# Load necessary libraries

# Define a function to read in csv files and add a "disc" column with the filename
read_csv_filename1 <- function(filename){
  ret <- read_csv(filename)
  ret$disc <- filename

# Read in all csv files in the working directory
filenames <- list.files(full.names = TRUE)
data1 <- ldply(filenames, read_csv_filename1)

# Extract the ID from the start of the file name and add it as a new column
data1$disc1 <- data1 %>% separate(disc, c("a", "b", 'c', 'd')) %>% .$b

Step 3. Importing the environmental treatment data. You should now have everything in a dataframe, however you will need to correctly label your columns into numeric, factors etc. You will also notice that you are lacking your experimental data, which we will now add in. Note: this does not need to be perfectly aligned or sorted, ‘left_join’ will match you ID’s and remove the rest.

# Read in environmental factors data
env.fact <- read.table(file="", 
                       header = TRUE,
                       dec = ",", 
                       na.strings = c("",".","NA"))

# Join data1 and env.fact by the disc1 column
data2 <- left_join(data1, env.fact, by = 'disc1')

# Select only necessary columns from data1
data.s <- dplyr::select(data1, c('disc1', 'PAR','dli', 'spec', 'Y(II)1','Y(II)2', 'Y(II)3'))

# Create long format and pivot data
data1.long <- data.s %>% pivot_longer(-c(disc1, PAR ,dli, spec), 
                                      names_to = "rep" ,
                                      values_to = "meas")

# Create individual ID for each RLC
data1.long$id <- paste(data1.long$disc1, data1.long$rep, sep = "")

Step 4. Create an rETR column and data clean for anomalies. For corals we use rETR instead of ETR which is simply the PAR x Yield

# Calculate rETR 
data1.long$rETR <- data1.long$PAR * data1.long$meas

# Load ggplot2 library

# source theme 

# create a scatter plot with ggplot2
p0 <- ggplot() + 
  geom_point(data = data1.long, 
             mapping = aes(x = PAR, y = rETR), 
             size = 1) + 

# print the plot

Most of my curves look okay at first glance but you can see some have some weird anomalies. I’m going to remove those data points as they are likely to be artifacts.

data1.l = data1.long[which(data1.long$rETR<100),] #remove rTER anomalies

Looking better now. Okay, now we are getting ready to fit the Marquardt–Levenberg regression algorithm which is the standard equation used for RLC. But before we do that, often non-linear models do not handle zero values well. So I will replace all zeros with a very small value. As a rule of thumb, I use one magnitude lower than the lowest treatment level.

data1.l$PAR <- ifelse(data1.l$PAR <= 0, 0.1, data1.l$PAR) #
data1.l$rETR <- ifelse(data1.l$rETR <= 0, 0.01, data1.l$rETR)

Step 5 and 6. Now it is time to run the equations on the data. I have to say, finding a starter function for the curves was far harder than I thought, but the author of the Platt package ( has done all the work here, from which I slightly modified the code. But first we might want to run it on just one RLC to make sure it is functioning correctly.

data1.s = split(data1.l, data1.l$id)
source("") #this is the starter values. Full #credit to the author of the Platt package
start = unname(getInitial(rETR ~, alpha, beta, Pmax), data1.s$m001Y(II)1)) #this finds the starting #values
md1 = nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)), start=list(Pmax=start[3],alpha=start[1], beta=start[2]), data = data1.s$m001Y(II)1) #notice I added the starting values in the equation
df.x <- data.frame(PAR = seq(0.1, 926, length = 100)) #setting up new data frame (df) defining log.x values to run
vec.x =df.x[,1]
plot(data1.s$m001Y(II)1$PAR, data1.s$m001Y(II)1$rETR, col = 'red')
lines(vec.x, predict(md1, df.x)) #looks good for m001Y(II)1

You can see for my first RLC, the curve has fit well. This is a pretty standard RLC, with a sharp increase in rETR with PAR and eventually a small decrease in rETR after a maximum has been reached.

Ok. Easy part done. At this point we need to now do this for multiple RLCs. That means finding starting values for each RLC, and adding it into multiple models. There is quite a bit of data wrangling here which I won’t explain involving removing errors in the functions. Please consult the full code for details.

Find all starting values

starts = data1.l %>% group_by(id) %>% do(broom::tidy(stats::getInitial(rETR ~, alpha, beta, Ys), data = . ))) %>%
pivot_wider(names_from = names, values_from = x, names_prefix = "") %>% dplyr::select (.,-c('NA'))
colnames(starts) <- c("id", "alpha.s", 'beta.s', 'Pmax.s')
starts = NaRV.omit(starts) #removes inf

Group each RLC and run the model on each.

test2 = data1.l %>% right_join(.,starts, by = 'id') %>%
group_by(id) %>%
do(model = try(nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)),
start = list(Pmax = mean(.$Pmax.s),
alpha = mean(.$alpha.s),
beta = mean(.$beta.s)),
data = .),silent = TRUE) ) #this gets models for all RLCs

We can then predict for each RLC to check the fit. Consult the github code for lots of wrangling.

Most look okay, I have ~6 out of 42 that didn’t fit, which I might want to look into why. Once happy, I can now run a similar set of code which stores the Pmax, alpha and beta parameters in a nested list.

test = data1.l %>% right_join(.,starts, by = 'id') %>%
group_by(id) %>%
do(model = try(broom::tidy(nlsLM(rETR ~ Pmax*(1-exp(-alphaPAR/Pmax))*(exp(-beta*PAR/Pmax)),
start = list(Pmax = mean(.$Pmax.s),
alpha = mean(.$alpha.s),
beta = mean(.$beta.s)),
data = .),silent = TRUE)) ) #this gets parameters for all models

Finally, we can now calculate the other parameters (i.e rETRmax, Ek, and Em)

unest.test = test %>% unnest(model)
df.param = dplyr::select(unest.test, c(id, term, estimate))
dat_wide <- df.param %>% pivot_wider(names_from = term, values_from = estimate) %>% dplyr::select(.,-c("NA")) #year goes to columns, their areas go as the values, area is the prefix
dat_wide$ETRm = dat_wide$Pmax*(dat_wide$alpha/(dat_wide$alpha+dat_wide$beta))*((dat_wide$beta/(dat_wide$alpha+dat_wide$beta)))^(dat_wide$beta/dat_wide$alpha)
dat_wide$Ek = dat_wide$ETRm/dat_wide$alpha
dat_wide$Em =(dat_wide$Pmax/dat_wide$alpha)*log((dat_wide$alpha+dat_wide$beta)/dat_wide$beta)
final.df = left_join(dat_wide, data1.long, by = 'id')

From here you can analyse your data as normal using you treatment levels. I hope this saved someone some time, and let me know if there are issues running this code with other data sets.

Should I use environmentally unrealistic concentrations? Sometimes…

It’s come up now and then in Reviewer’ comments, written in papers, and during conference questions – are you using environmentally relevant concentrations (ERCs)? It is a good question but I think often over-simplistic in that it creates a false-dichotomy i.e. use of only ERCs is a good thing, and any use of non-ERCs is a bad thing.

I put forward that this view has come from a more traditional Ecologist view of experimental design, when papers were dominated by categorical analyses and ANOVAs, and what were okay rules-of-thumb back some time ago, are pretty outdated now. But at the heart of this are differences in philosophical views of the purpose of laboratory experiments. More traditional Ecologists sometimes believe that purpose of running lab experiments is simply to replicate the field (spoiler alert — you can’t), whereas the alternative view is that the main purpose of lab experiments (excluding mechanistic experiments) is to derive causation, through the removal of confounding variables, and observing effects that scale with concentration etc. This occasionally means moving into the realm of artificiality in efforts to improve experimental control. Whether you use all treatment levels in ERCs really depends on quite a few things, but most importantly how many treatment levels you can spare. I’d argue that using some treatment levels outside of ERC is actually good experimental design, for some of the reasons I outline below.

  1. We need thresholds.

If an effect occurs outside of ERCs i.e. above or below what is environmentally relevant, we still need to know the threshold of an effect for regulation and risk assessment purposes. As Harris et al. 2014 argue (and not just relevant to hazardous substances):

“Indeed, there will be occasions where researchers have to use significantly higher concentrations in order to properly define a LOEC [Low observed effect concentration] for a substance. The LOEC of a substance is, in fact, far more useful in the regulatory sphere than is a conclusion that no effect occurs at environmentally relevant concentrations, because a LOEC enables the regulators to impose more accurate and meaningful safety limits.”

2. We might not know ERCs in other places or in the future

Take work on climate change – more extreme treatment levels are designed for future and often worse-case scenarios such as RCP 8.5. We don’t actually know what will happen in the future but we rely on modelling to get a decent estimate of likely future ERC. (Note: there is some discussion whether RCP 8.5 is actually a future ERC now, given more certainty in our carbon trajectory). It would be a waste of resources that one plans and undertakes an experiment, only to find out that what they thought was an ERC turned out to be only the 50th percentile in another space or time. So the experiment needs to be repeated again.

3. Interpolation

There are lots of reasons to design experiments with many treatment levels that allow for regression models, rather than categorical designs. Some authors have been ramming this home for 20+yrs and yet categorical derived thresholds (LOEC/NOEC) are still common place in ecology. I won’t go into all the details here but one benefit of regressions is the power of interpolation, which allows us to derive all the responses between the treatment levels tested, rather than solely the treatments used.

4. Modelling fit, and being sure the response you see is real.

Some models improve under a greater response. Take for example the four-parameter logistic equation, a nonlinear model traditionally used for dose-responses. Two parameters in the model are the ‘inflection point’ and the ‘lower asymptote’ of the model. Failing to acquire data near these parameters means their estimations aren’t great, leading to large confidence intervals. We can also make more assumptions about the data if we know its shape, leading to better model selection.

But more than that, we want to know that if we do see a response at the highest ERC, that it is real and not by chance. Seeing a continued response in that extra one-higher treatment level (outside the ERCs) will give you confidence to stand by your data. If however, the response disappears in the one- higher treatment level, you almost certainly have a false-positive. This is a type of Quality Control and should be congratulated.

5. Uncoupling correlating treatments

Often two factors or treatments are closely tied to each other and to understand the relative roles of each treatment, the experiment needs to be designed so each treatment can be assessed independent of each other and in various combinations i.e. a fully-crossed design. But many treatment combinations will be by nature unrealistic. Take for example my study area where I investigate the impacts of sediment and low light on organisms, both of these stressors individually cause effects, but both are also correlated. To uncouple and assess the effects on an organism we would need some unrealistic treatment combinations with low sediment and low light, and some with high sediment and high light. If the organisms died in the high sediment/high light treatment but not in the low sediment/low light treatment, then we would know it was the sediment, not the low-light, that caused mortality.

6. Mechanisms Sometimes mechanisms are difficult to observe at ERC, so you need to increase the ‘concentration’ to get a clear signal. The caveat for this is that there needs to be a clear response in the range of the ERCs first (or that this response is relatively established in the literature). For example, researchers know elevated temperatures cause coral bleaching well within environmentally relevant levels; it is unequivocal. So Researchers may simply skip to high temperatures to enable a strong response to work on physiological mechanisms such as heat-shock proteins. In my work looking at how spectral light profiles impact coral, an artificial (monochromatic) spectral profile that causes a response may further ‘hone in’ on the active wavelengths that cause a response under a broader spectrum, even though monochromatic light is unrealistic.


So what is to stop Researchers using strong responses and significant results outside of ERCs to their advantage such as acceptance into journals? Unfortunately this is common (take early microplastic research as an example) but is relatively easy to fix. Authors should (unless completely unknown) need to provide ERC percentiles with their treatment levels. It should be made very clearly what is likely and what is not.

Conclusion and cautions

A few treatment levels outside EVC can, if chosen correctly, add to quality assurance of experimental design, and make the data more far more useable. However if treatment levels are limited, then there is no point to use non-ERCs, and using them will do more harm than good. This is common in categorical designs where interpolation is not possible. Say we have a two factor, 3 x 3 treatment-level design. If one treatment level is not realistic, then 1/3 of treatment levels are unrealistic. If two treatment levels are unrealistic, then nearly half the experiment is unrealistic. Those using categorical designs really do need to choose their treatment levels very carefully around in situ percentiles of field data. However, in treatments/factors with numerous treatment levels, some unrealistically high or low treatments can often help improve the output of the analysis in ways that using solely ERCs can not.

Harris CA, Scott AP, Johnson AC, Panter GH, Sheahan D, Roberts M, Sumpter JP (2014) Principles of sound ecotoxicology. Environ Sci Technol 48:3100-3111

Bundle ballasting model in shiny

A few years ago I was lucky enough to collaborate with prominent researcher in fluid mechanics, Roman Stocker. We developed a mathematical mechanistic model to predict if coral egg-sperm bundles released during coral spawning could possibly sink (ballast) after encounters with sediment grains, as a function of depth, suspended sediment concentration and grain size.

However the main issue with more complicated models is that you can’t really present all combinations of useful inputs that may be relevant for Marine Managers that need to risk assess project. To address this I created a shiny app which I hope will allow easier use of the model.

Using this link you can manipulate the input parameters, changing the size of the bundles to match your species of interest, grain size, or water quality conditions at your local sites.

We also created another equation for the subsequent reduction in egg-sperm encounters that could occur at the water’s surface, which can be found on the second tab.

Ricardo GF, Negri AP, Jones RJ, Stocker R (2016) That sinking feeling: Suspended sediments can prevent the ascent of coral egg bundles. Scientific Reports 6:21567

Introducing jagsNEC – a Bayesian ‘No Effect Concentration’ package

Recently the Australian and New Zealand Water Quality Guidelines (ANZECC) have been updated (Warne et al 2018) and one significant change was the preference of ‘No Effect Concentrations (NECs)’ above other statistical estimates of toxicity such as EC10s and NOECs.

Here is a condensed list of the hierarchy from the Guidelines.

  1. NECs
  2. ECx where x is 10 or less
  3. BEC10s
  4. ECx where x is between 10 and 20
  5. NOECs
  6. NOECs estimated indirectly

Warne et al goes on to state ‘Although NECs are not regularly reported, they are considered the preferred measure of toxicity as they are more closely aligned with the objective of GVs [guideline values], that is, to protect aquatic ecosystems, as they are the concentrations that have no adverse effect on species. Reporting of NECs, and their subsequent use in GV derivation, is likely to increase in the future.

So what is a NEC? A NEC is a ‘no effect concentration’ estimate and is probably easiest to explain when compared to other statistical estimates. Below we have a NOEC calculated using a categorical analysis, an EC10 calculated from a logistic curve, and a NEC calculated from our package. The red indicates the statistical estimate or threshold. There are many problems with NOECs that have been thoroughly described, and which I won’t go into, but you can see it greatly underestimates the threshold. EC10s are better but the S-shape nature of the model means that estimates within 10% of the control often have large error, therefore researchers often report EC10s instead of EC0s. NECs on the other-hand use two models to work out where they best converge (segmented regression), leading to a cleaner estimate of the threshold.

As Warne et al states above, few researches use NECs, probably because they are not aware of them, or do not know how to code them. Our package builds on Pires et al (2002) and Fox (2010), introducing a large suite of probability distributions for many data types. And the whole process is simple, as it automatically detects the data-type you wish to model and selects an appropriate model. We have also included more traditional models such as four-parameter logistic curves and functions to extract ECx’s, even from segmented (NEC) models. And there are also functions to compare NECs or ECx’s of two models.

Below is a working example of how to run jagsNEC

Sys.setenv(“TAR” = “internal”)
devtools::install_github(“AIMS/NEC-estimation”) #
library(R2jags) <- read.table(“;, header= TRUE,dec=”,”) #bent.4$raw.x <- as.numeric(as.character($raw.x))
out <- fit.jagsNEC(,

Once the model has been selected, you will get a message identifying the distribution used “Response variable ‘suc’ modeled using a binomial distribution.”

Next you may want to do some model diagnostics using


All the output from the model is stored in the object ‘out’. To extract the NEC and EC50, and their 95% credible intervals we can use:

2.5% 50% 97.5%
4.391069 4.553966 4.686907

EC_50 EC_50_lw EC_50_up
6.819824 7.019522 7.239189

We can plot the data using:

plot_jagsNEC(out, log = ‘x’)

But you may want to customize the plot using the output. Below is an example of how this can be done.

p0= ggplot()
p0= p0+ geom_point(data =, aes(x = raw.x, y = suc/tot, alpha = 0.8), color = ‘steelblue’, size =$tot/max($tot)*3 , position=position_jitter(width = .01, height = .01))
p0= p0+ geom_line(aes(x = out$pred.vals$x, y = out$pred.vals$y), color = ‘grey30’, size=1)
p0= p0+ geom_ribbon(aes(x = out$pred.vals$x, ymin=out$pred.vals$lw, ymax=out$pred.vals$up), fill=’grey30′, alpha=0.2)
p0= p0+ geom_vline(xintercept = out$NEC[2], col = ‘red’, linetype=1)
p0= p0+ geom_vline(xintercept = out$NEC[1], col = ‘red’, linetype=2)
p0= p0+ geom_vline(xintercept = out$NEC[3], col = ‘red’, linetype=2)
p0= p0+ scale_x_log10()
p0 = p0+ labs(x=expression(XXX~(X~”cm”^{-2})),
p0= p0+ scale_y_continuous( limits = c(0, 1))
p0= p0+ theme_classic()
p0= p0+ theme(legend.position=”none”)

This package is still in beta-testing so you may notice some bugs. If you do, let us know and we may be able to help. Eventually the plan is to move the package to rstan in version 2.0, so look for updates.

If you want more information about the model behind the NEC, please see the description in:

Thomas MC, Flores F, Kaserzon S, Fisher R, Negri AP (2020) Toxicity of ten herbicides to the tropical marine microalgae Rhodomonas salina. Scientific reports 10:1-16.

And finally to cite our package use


To cite package ‘jagsNEC’ in publications use:

Rebecca Fisher, Gerard Ricardo and David Fox (2020). jagsNEC: A Bayesian No Effect Concentration (NEC) package. R package
version 1. R package version 1.0.


Warne M, Batley G, Van Dam R, Chapman J, Fox D, Hickey C, Stauber J (2018) Revised Method for Deriving Australian and New Zealand Water Quality Guideline Values for Toxicants. Prepared for the revision of the Australian and New Zealand Guidelines for Fresh and Marine Water Quality. Australian and New Zealand Governments and Australian state and territory governments, Canberra.

Pires AM, Branco JA, Picado A, Mendonça E (2002) Models for the estimation of a ‘no effect concentration’. Environmetrics: The official journal of the International Environmetrics Society 13:15-27

Fox DR (2010) A Bayesian approach for determining the no effect concentration and hazardous concentration in ecotoxicology. Ecotoxicol Environ Saf 73:123-131

Calculating p-value intervals using bootstrapping

In my previous posts, I showed how we can use simulated data to calculate power and sample size analyses, and estimate distributions around the effect size. But we can also use simulated data to estimate error around p-values. Unless you have been living under a rock, you should know p-values have been getting smashed for the past decade. But while many admit there isn’t anything inherently bad with p-values if they are used and interpreted carefully, it is the abuse and misinterpretation that causes many to argue they should be banned.

Today I was reading Halsey’s The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum?” and they discuss using p-value intervals to estimate the error around the p-value. Using the standard cut-off threshold of 0.05, if a p-value falls under the threshold but the interval is greater than 0.05, then the p-value is likely not reliable. Let’s simulate some binomial data around 0.5 and 0.4, with just 20 individuals per group.

raw.x = c(rep('control', 10), rep('treat', 10))
tot = 20
suc = c(rbinom(10, tot, 0.5), rbinom(10, tot, 0.4))
data1 = data.frame(raw.x, suc, tot = rep(tot, 20))
data1$obs <- factor(formatC(1:nrow(data1), flag="0", width = 3))# unique obs ID
p0 = ggplot()+geom_point(data1, mapping = aes(x = raw.x, y = suc/tot),color = 'steelblue',alpha = 0.5, size = 2, position=position_jitter(width = .01))

I’ve deliberately made it so there is quite a bit of overlap. Let’s fit a basic GLMM and assess the associated p-values.

md3 <- glmer(cbind(suc,(tot - suc)) ~ raw.x+(1|obs),family = binomial (link = logit),data = data1)



You can see the p-value of the slope (raw.x) is 0.0355, just under the 0.05 threshold. But what happens when we resample the data and then refit the model a few hundred times.

sims = 200 #number of simulations
lowerCI = 0.025sims upperCI = 0.975sims
medianCI = 0.5*sims
predFun1<-function(.)(coef(summary(.))[,"Pr(>|z|)"]) ####extract p value function
bb1<-bootMer(md3,FUN=predFun1,nsim=sims, parallel = "multicore", .progress = 'txt') #get bootstrapping, 200 sims df<-apply(bb1$t,2,function(X) X[order(X)]) %>% data.frame()#apply function, just orders the sim intercepts
int = df$X.Intercept. %>% data.frame()
raw.x = df$raw.x %>% data.frame()
blo<-int[lowerCI,] #find the bottom 2.5%
bhi<-int[upperCI,] #find the top 2.5%
c(med, blo, bhi)
blo<-raw.x[lowerCI,] #find the bottom 2.5%
bhi<-raw.x[upperCI,] #find the top 2.5%
c(med, blo, bhi)

The p-intervals for ‘raw.x’ range from <0.001 (highly significant) to 0.692 (highly non-significant). This is a problem, and indicates the p-value wasn’t really reliable. We may have lucked it in the initial analysis.

Finally, let’s plot it up adding a red line through the 0.05 threshold.

p1 = ggplot(raw.x, aes(x=.))+geom_density(aes( color ='red' , fill='red'), alpha=0.3)+
stat_pointintervalh(aes(y = 0.00, x = .),.width = c(.66, .95))
p1 = p1 + scale_y_continuous(name ="Density")
p1 = p1 + scale_x_continuous(name ="p-values")
p1 = p1 + coord_cartesian(ylim = c(0.0, 8))
p1 = p1 +geom_vline(xintercept = 0.05, color = "red", lty = 2)+ theme_light()
p1= p1+ theme(legend.position="none")

You can see the p-value distribution has a long tail. The 95% interval is in light grey and the 66% in black, showing a very unreliable test.

So what do you think? Should we be reporting p-value intervals with all our statistics?

Halsey LG (2019) The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol Lett 15:20190174

Comparing EC50s between two or more GLMM models

Note: Change of various wordpress themes and R updates may mean the code below might not work without some minor tweaks. Find the full code here (last checked 20/02/23).

In this post I am going to describe how to compare EC50s from two or more GLMM curves. To begin, run the code in my recent post

Two factor dose-response with binomial GLMMs and ECx’s

In the post above we calculated the 95% CI around the ECx using an approximation by multiplying the standard error by 1.96 and adding or subtracting it from the mean ECx. But this makes it difficult to compare the ECx’s except to say that the 95% CI do or not overlap. In this example, we bootstrap the data which creates a distributions around the ECx. From there we can visually present the distributions or compare the distributions statistically.

Let’s first check out our old EC50 of ‘factor a’. The EC50 = 46.12 (95% CI 36.85 – 55.39)

> ec.df1
ecx1   lower1    upper1
(Intercept) 46.11674   36.84638      55.3871

Now let’s bootstrap the EC50. Remember in the previous code I had “ecx1 <- (eta1 – betas1[1])/betas1[2] “, which interpolates the 50% response onto the curve? I will run this 200 times, but resampling the data each time. This will give 200 EC50 values that we can make a distribution out of. I will use ‘lme4’ again, ignoring the initial convergence issue. The 200 EC50s will be stored in ‘bb1$t’. We can also order then and get the bottom 2.5% and 97.5% i.e the 95% CI

#Bootstrapped confidence intervals on ECx values
md1 =glmer(cbind(suc,(tot-suc))~raw.x*factor+(1|obs) ,data1,family=’binomial’ )
sims = 200 #number of simulations.
lowerCI = 0.025*sims
median.CI = 0.5*sims
upperCI = 0.975*sims
# ecx <- (logit(0.5) – coef(md1)[1])/coef(md1)[2]
predFun2<-function(.)((eta1 – fixef(.)[1])/fixef(.)[2]) #define the function
bb1<-bootMer(md1,FUN=predFun2,nsim=200, parallel = “multicore”, .progress = ‘txt’ ) #get bootstrapping, 200 sims
bb_se1<-apply(bb1$t,2,function(X) X[order(X)]) #apply function
median<-bb_se1[median.CI,] #find the 50%
low<-bb_se1[lowerCI,] #find the bottom 2.5%
up<-bb_se1[upperCI,] #find the top 2.5%

ec.df1.boot = data.frame(median, low, up)

> ec.df1.boot
median   low   up
(Intercept) 45.22589   36.04048    55.48538

You can see they are similar to the approximated confidence intervals above but slightly different. Generally when random effects are involved, bootstrapped confidence intervals are the way to go. Now let’s see the distribution using a density plot



So here is the EC50 distribution for ‘factor a’ that we want to compare factors b and c against. We will need to again relevel the model for the other factors and repeat the code.

#Bootstrapped confidence intervals for factor b and c
data1$factor <- relevel(data1$factor, ref = “b”) #set reference levels for GLMs
md2 <- glmer(cbind(suc,(tot-suc))~raw.x*factor+(1|obs) ,data1,family=’binomial’ )
bb2<-bootMer(md2,FUN=predFun2,nsim=200, parallel = “multicore”, .progress = ‘txt’ ) #get bootstrapping, 200 sims
data1$factor <- relevel(data1$factor, ref = “c”) #set reference levels for GLMs
md3 <- glmer(cbind(suc,(tot-suc))~raw.x*factor+(1|obs) ,data1,family=’binomial’ )
bb3<-bootMer(md3,FUN=predFun2,nsim=200, parallel = “multicore”, .progress = ‘txt’ ) #get bootstrapping, 200 sims
all.dist = data.frame(‘a’ = as.vector(bb1$t), ‘b’ = as.vector(bb2$t), ‘c’ = as.vector(bb3$t))
all.dist.long <- gather(all.dist, factor, ecx, a:c, factor_key=TRUE)

p1 = ggplot(all.dist.long, aes(x=ecx))+geom_density(aes(group=factor, color =factor , fill=factor), alpha=0.3)+
stat_pointintervalh(aes(y = 0.00, x = ecx, group=factor),.width = c(.66, .95))+#+facet_wrap(~contrast+time, nrow = 3, ncol = 2)+
p1 = p1+scale_fill_manual( values = c(“steelblue4”, “orange”, ‘red’))+
scale_color_manual( values = c(“steelblue4″,”grey”, “steelblue1″,”steelblue4”, “grey”,”grey”, “grey”,”grey”))+theme(legend.position=”none”)#nice
p1 = p1 + scale_y_continuous(name =”Density”)
p1 = p1 + coord_cartesian(xlim = c(0.0, 50))
p1 = p1 + coord_cartesian(ylim = c(0.0, 0.1))


Starting to look pretty. The blue is ‘factor ‘a’, yellow ‘factor b’, and red ‘factor c’. By looking at the 95% CI on the bottom, the red looks statistically different than the factor a EC50.

Now let’s compare the distributions against ‘factor a’ by subtracting the two distributions.

#Differences posterior
df4.s = data.frame(atob= all.dist$a – all.dist$b, atoc = all.dist$a – all.dist$c)
df4.s.long <- gather(df4.s, factor, diff, atob:atoc, factor_key=TRUE)
p2 = ggplot(df4.s.long, aes(x=diff))+geom_density(aes(group=factor, color =factor , fill=factor), alpha=0.3)+
stat_pointintervalh(aes(y = 0.00, x = diff, group=factor),.width = c(.66, .95))+#+facet_wrap(~contrast+time, nrow = 3, ncol = 2)+
p2 = p2+scale_fill_manual( values = c(“steelblue4”, “orange”, ‘red’))+
scale_color_manual( values = c(“steelblue4″,”grey”, “steelblue1″,”steelblue4”, “grey”,”grey”, “grey”,”grey”))+theme(legend.position=”none”)#nice
p2 = p2 + scale_y_continuous(name =”Density”)
p2 = p2 +geom_vline(xintercept = 0, color = “red”, lty = 2)+ theme_light()
p2 = p2 + coord_cartesian(xlim = c(0.0, 50))
p2 = p2 + coord_cartesian(ylim = c(0.0, 0.1))
# p2 = p2 + scale_x_continuous(name =”Standardized effect size”)
p2 = p2+facet_wrap(~factor)


I’ve added a line through zero, which represents ‘no difference’ i.e if both distribution had the same ECx, the black dot for the median ECx would be on zero. Values less than zero mean that factor b or c are greater than factor a, and vice versa for positive numbers. So the interpretation is that if the 95% interval see by the thin black line overlaps with the red line, the two distributions are not different. Here ‘a’ and ‘b’ are similar, and ‘factor a’ and ‘factor c’ are different.

You can of course get the values for these 95% CI.

atob.diff = sort(df4.s$atob)
median.ab<-atob.diff[median.CI] #find the 50%
low.ab<-atob.diff[lowerCI] #find the bottom 2.5%
up.ab<-atob.diff[upperCI] #find the top 2.5%
ab.diff = data.frame(median.ab, low.ab, up.ab)

> ab.diff
median.ab    low.ab       up.ab
3.940983   -8.379134   16.5668

You may also be interested in making a probability type statement. This probably should be reserved to Bayesian analysis (and yes you can do this exact analysis above using Bayesian model parameter outputs). But in a frequentist sense, we could say, ‘if this experiment was repeated 100 times, the EC50 of factor b would be greater than the EC50 of factor a, X many times‘.

length(which (all.dist$a < all.dist$b))/sims*100

length(which (all.dist$a < all.dist$b))/sims*100
[1] 28

length(which (all.dist$a < all.dist$c))/sims*100

length(which (all.dist$a < all.dist$c))/sims*100


‘if this experiment was repeated 100 times, the EC50 of factor b would be greater than the EC50 of ‘factor a’ 28 of the times’.

‘if this experiment was repeated 100 times, the EC50 of factor c would be greater than the EC50 of ‘factor a’ less than 1 time’.

A comparison of Onset Hobo Light loggers to a wipping multispectral light sensor

This review is obviously not peer-reviewed and represents my opinion only.

Recently I deployed some Onset Hobo Light Loggers ( mounted to a frame next to a top-end multispectral light sensor ( Onset Light loggers are relatively cheap, and log for long periods of time, making them a common tool in the Marine Scientist’s toolbox. However a short-coming of the loggers are that they measure in Lux (a measure of illuminance) rather than irradiance (a measure of energy, which is more relevant to the photobiology of organisms). A second issue is that they foul with algae thereby reducing the light measurement, whereas top-end sensors have wipers and antifoulants to reduce this.

To address the first issue with lux, the sensors are often converted using a standard conversion factor of 0.0185 ( which is appropriate for the loggers above water in direct sunlight. But there is concern that changes in the light spectrum under water with depth and turbidity, may mean these conversion factors are not accurate.

To address the second issue, researchers often only use the first few days of data or clean the sensors intermittently, but how many days until the reading become unreliable?

Below is a 20-day deployment using only daylight hours, with sensor measurements converted to PAR.


A Hobo Onset light logger


A MS8 multispectral light sensor (facing vertically). 

light 1

Multispectral light sensor (black) and Hobo light logger (red) deployed at an inshore turbid reef. The hobo data was converted from lux to PAR using the c.f of 0.0185. The multispec was converted from Watts/nm to photons/nm and then integrated to derive the total PAR. 

Using the standard conversion factor you can see some divergence between the sensors after about day 4, but otherwise the Hobos were surprisingly reliable, given how many factors can affect light intensity.

Put as a proportion (below) and using the same standard conversion factor, you can see the error is pretty reliable for the first 4 days, with a little variation on the 4th day.

light 2

Hobo PAR as a proportion of the MS8 PAR. Values less than 1 indicate underestimation of the light (because of fouling). 

To correct for low but measurable turbidity (0.5 – 0.8 NTU), I found using a slightly higher conversion factor at 0.022736 (r^2 = 0.9847) for the first 4 days produced the best fit. The error during these days remained with +/- 11%.

light 3

I should also mention that the MS8 sensors are not perfect. They have a wiper that stops fouling but only have 8 spectral bands between 400 – 700 nm. This means that there is a bit of course interpolation between each of the bands to derive the PAR. The newer version contains 9 bands and covers the visible spectrum better.

Bottom-line. Calibrated Hobo loggers provide a pretty decent estimation of the downwelling irradiance for the first 3-4 days. For clear-water shallow reefs, a standard conversion factor of 0.0185 would provide a decent estimation of PAR. The amount of fouling could vary between reefs, but given this was deployment was done in one of the most inshore reefs of the GBR, it probably couldn’t be much worse. 



Power/ sample size analysis for dose-response binomial GLMMs

A few tools have come available recently that has aided power analysis of more complicated models. Here I am going to recreate Johnson et al 2015Power analysis for generalized linear mixed models in ecology and evolution for typical survival dose-response data. You can also find their examples attached to their paper.

How it works

More complex models with random effects require simulations to work out their power. Simulation aren’t nearly as complicated as you think and they can tell you masses about your data. The approach Johnson et al take is to i) simulate some data with known trends, ii) analyse the data to find evidence that the null-hypothesis can be rejected  (i.e p-values  < 0.05), iii) repeated the process many times.

The overall percentage of correct rejections is the power. Generally you want at least 80% power (i.e less than 20% false-negatives). Let’s start.

First we need to design our experiment. A typical dose-response design is some continuous predictor that we want to find thresholds for, and we may run these at two levels. An example of this could be a toxic metal at two temperatures. This design will create two ‘curves’. For each combination of the treatments, let’s trial five replicates, and each tank containing 30 larvae.

#Simulate data
x = c(0.1, 0.3, 1, 3, 10, 30, 100)  #proposed treatment levels
vec.x = rep(x, 5)   #number of tank replicates
vec.x <- sort(vec.x)
data1 <- expand.grid(raw.x = vec.x,  factor = c(“a”,”b”))   #two factors ‘a’ and ‘b’
data1$obs <- factor(formatC(1:nrow(data1), flag=”0″, width = 3))# unique tank ID for later on

data1$n = 30   #total number of organisms per tank
str(data1)  #raw.x is my continous factor, ‘factor’ is by categorical factor

We now have a data frame setup, ready for our response data. Now this is the part that requires a little attention. We need to simulate the minimum trends that we think is realistic to detect. In ecotoxicology, a 10% effect from the control is considered reasonable. Obviously there is some pragmatism being used here, as a sample size to pick up <10% may require an unrealistically large sample size (you can test this later).

For the continuous factor ‘raw.x’, I will choose a very slight slope of -0.01 which I can adjust later if there is not a 10% effect between my ‘control dose’ (0.1) and my highest dose (100). For the categorical factor, I will work out the odds ratio of a 10% effect of factor b intercept from factor a. To do this you need to define what is a likely ‘control’ response. In this experiment, I consider a healthy control to have a minimum of 70% survivorship, and a 10% effect from this is 63%. The input parameters are:

raw.x.slope = -0.01
fac.a.control = 0.7
odds.ratio = (63 / 37) / (70 / 30) #factor b (success/fail) / (factor a (success/fail)
obs = 0.1 #random tank error

> odds.ratio
[1] 0.73

If you aren’t confident with these numbers, don’t worry. You will have plenty of opportunity to iteratively fix them when we start plotting. Now we will use the package attached to the Johnson paper to simulate all this together.

data1 <-
sim.glmm( = data1,
fixed.eff = list(
intercept = qlogis(fac.a.control),
factor = log(c(a = 1, b = odds.ratio)),
raw.x = -raw.x.slope),
rand.V = c(obs = obs),
distribution = “binomial”)

Use can see that there is now a ‘response’ column in your data.frame. Let’s plot it.

p0 = ggplot()+geom_point(data1, mapping = aes(x = raw.x, y = response/n),position = position_jitter(width = .02), alpha = 0.50,size = 2)+theme_bw()+facet_wrap(~factor)+scale_x_log10(name =”raw.x”)
p0= p0+ scale_y_continuous( limits = c(0, 1))

My data will look different to yours because we have randomly simulated once


Can you see the weak trends we simulated in the data? There are (apparently) slight negative trends running from left to right across the x-axes, and data set ‘b’ is slightly lower than data set ‘a’. But because we have binomial error (chance) and a random tank effect, these trends are hard to visually-separate the signal from the noise. Let’s analyse the data and see if any significant trends emerge.

fit <- glmer(cbind(response, n – response) ~ raw.x + factor +(1 | obs) , family = binomial, data = data1)
summary(fit) #to show how well it estimates the simulation
coef(summary(fit))[“raw.x”, “Pr(>|z|)”] #p-values for raw.x
coef(summary(fit))[“factorb”, “Pr(>|z|)”] #for factor b

My analysis were actually significant for each factor. Were yours?

Ok, now this time let’s wrap everything into a function, set the ‘seed’ so it returns the same response each time, and plot the analysis.

sim.data1 <- function(…){sim.glmm( = data1,
fixed.eff = list(
intercept = qlogis(fac.a.control),
factor = log(c(a = 1, b = odds.ratio)),
raw.x = raw.x.slope),
rand.V = c(obs = obs),
distribution = “binomial”)}

p0 = ggplot()+geom_point(sim.data1(), mapping = aes(x = raw.x, y = response/n),position = position_jitter(width = .02), alpha = 0.50,size = 2)+theme_bw()+facet_wrap(~factor)+scale_x_log10(name =”dep sed”)
p0= p0+ scale_y_continuous( limits = c(0, 1))

#And check we can still analyse our data witht he function
fit = glmer(cbind(response, n – response) ~ raw.x + factor +(1 | obs) , family = binomial, data = sim.data1())
vec.x = seq(min(data1$raw.x), max(data1$raw.x), length = 100)
df1 <- expand.grid(raw.x = vec.x, factor = levels(data1$factor))
mm <- model.matrix(~raw.x + factor, df1) # build model matrix
eta <- mm %*% fixef(fit)
df1$prediction <- as.vector(exp(eta) / (1 + exp(eta)))
se <- sqrt(diag(mm %*% vcov(fit) %*% t(mm)))
df1$upper <- as.vector(exp(eta + 1.96 *se) /(1 + exp(eta + 1.96 *se)))
df1$lower <- as.vector(exp(eta – 1.96 *se) /(1 + exp(eta – 1.96 *se)))
p0= p0+ geom_line(data = df1, aes(x = raw.x, y = prediction), color = ‘grey30′, size=1)
p0= p0+ geom_ribbon(data = df1, aes(x = raw.x, ymin=lower, ymax=upper,fill=’grey’), alpha=0.2)


Unsurprisingly, this looks quite similar to coral settlement data that I collect (the input parameters have been built around real assay outputs). At this point, you may want run this a few times and refine some of the input parameters. Change the values of the random effect obs to see if it influences the noise, and how much of the error is from (binomial) chance. You might be quite surprised how much noise really is just chance — welcome to binomial data! You can confirm by running this a few times for the control in factor ‘a’.

rbinom(5, 30, 0.7)/30   #binomial error around 0.7 without random tank error

#[1] 0.7666667 0.7666667 0.6000000 0.6666667 0.5666667    #yep, lots of noise

For me, I want to the option of detecting slightly smaller effects of the continuous predictor, so I will change the slope -0.008, and keep the random effect at 0.1.

raw.x.slope = -0.008

obs = 0.1 #random tank error

Finally we will wrap the analysis into a two functions with the outputs being the p-values for each of the coefficients. Note: I have scaled the predictor variable, which can be important to do for multifactor experiments.

sim.mos.pval <- function(…){
fit <- glmer(cbind(response, n – response) ~ scale(raw.x) + factor +(1 | obs) , family = binomial, data = sim.data1())
coef(summary(fit))[“scale(raw.x)”, “Pr(>|z|)”]
} #if this sig, then we have a true positive (which we want to occur 80% of the time – see below).

sim.mos.pval <- function(…){
fit <- glmer(cbind(response, n – response) ~ scale(raw.x) + factor +(1 | obs) , family = binomial, data = sim.data1())
coef(summary(fit))[“factorb”, “Pr(>|z|)”]

Now we can run everything together. So just to recap, we will simulate the response data, analyse it and extract the p-value, and repeat again and again. We will then calculate the proportion of correctly significant p-values. I have only run 20 below for speed, but you should run many more once you have locked in your input parameters.

set.seed(978675) # set seed for random number generator to give repeatable results
sim.pvals <- sapply(1:20, sim.mos.pval) #creates 20 simulations
mean(sim.pvals < 0.05) #what is the prop. of the true positives. Need to be >0.8
binom.test(table(factor(sim.pvals < 0.05, c(T, F))))$ #confidence intervals around this estimate

For the ‘factorb’ function,  my power is 70% (95% CI: 46 – 88).  Johnson recommends increasing the simulation until the 95% CI is within ~2.5% of the mean. If I run it 1000 times I get 75.8% (95% CI: 73.0 – 78). So I have run enough simulation but my power is just short. That leaves me a few options:

  1. Increase my tank replicate by one (may be an overkill since I am so close)
  2. Increase my larvae in each tank (sounds doable)
  3. Decrease my expectations i.e acknowledge I may not pick up subtle effects. I may need to be happy with only a 15% decrease from my control.


My thoughts

What I like about analysis is that the random effects are quite easily incorporated into the simulation. This is great for field data that inherently have lots of random effects, and where we are using null-hypothesis significance testing (NHST) to tease apart many predictor variables.

However in lab studies, we have the benefit of control, and a properly designed experiment will aim to reduce random effects as best as possible. Further, we can move away from NHST and move towards something much more meaningful – effect sizes (sensu Gelman

You can see above that it was a bit clunky to estimate a 10% effect of raw.x. In a previous post, I showed how you could extract standardised effect sizes (EC10s) from similar binomial GLMMs. Using a similar methodology to above we could i) simulate the data, ii) extract EC10s and iii) repeat many times, iv) and compare EC10s until 80% were in 10% of the real value. This is something I need to code up…

But this has been done for nonlinear models where the ECx value is a coefficient in the model and therefore can be easily compared. See this excellant post ( that describes the process.

1. Johnson PC, Barry SJ, Ferguson HM, Müller P. Power analysis for generalized linear mixed models in ecology and evolution. Methods in ecology and evolution. 2015;6(2):133-42.