The code is long, and I suspect at times ugly. If you have a cleaner way to run some of the functions, I would be happy to know. For simplicity I haven’t included all the code below but rather described the steps in the text. The full code can be found here (https://github.com/gerard-ricardo/PAM-RLC-analysis) including some Walz csv files.

Major steps

- Naming the csv files. Each RLC from the PAM needs to be saved as a csv file. Every file needs to be in a dedicated folder, and each file labelled with a unique ID, with each label the same length and the blocks separated by spaces. I use Bulk Rename Utility to do this. Examples are ‘d001 mille’, ‘d050 mille’, ‘d100 mille’. Here the unique ID is the 4-character first block, and the second block of letters describes the experiment. The code below will read the first block of letters.
- Importing the folder of RLCs into R. Note that if your files are semicolon-delimited, use read_delim(filename, delim = ';'). I have the unique ID saved as ‘disc’.
- Importing the environmental treatment data. Note: As you likely subsampled for the RLC curves, this code uses left_join to merge the two data.frames. You do not need them to match, as long as each has the same unique ID.
- Create an rETR column and data clean for anomalies.
- Get starting values by running the SSPlatt.my function. This is modified from the Platt package (its author did all the hard work).
- Run the nonlinear models and derive all the RLC parameters.

OK, let’s get started.

Step 1) Name the csv files so each filename has the same length and format. The first block of letters should be your unique ID.

This is what a file contains. I log many parameters, but you can see there are 3 RLCs in just this file. For the RLCs, the only parameters you need are the PAR and the Y(II) columns.

Step 2) Importing the folder of RLC

```
# Set the working directory
setwd("C:/.../r mil csvs")
# Load necessary libraries
library(tidyr)
library(readr)
library(plyr)
library(dplyr)
# Define a function to read in csv files and add a "disc" column with the filename
read_csv_filename1 <- function(filename){
  ret <- read_csv(filename)
  ret$disc <- filename
  ret
}
# Read in all csv files in the working directory
filenames <- list.files(full.names = TRUE)
data1 <- ldply(filenames, read_csv_filename1)
# Extract the ID from the start of the file name and add it as a new column
data1$disc1 <- data1 %>% separate(disc, c("a", "b", 'c', 'd')) %>% .$b
```

Step 3. Importing the environmental treatment data. You should now have everything in a data frame; however, you will need to correctly set your column classes (numeric, factor, etc.). You will also notice that you are lacking your experimental data, which we will now add in. Note: this does not need to be perfectly aligned or sorted; ‘left_join’ will match your IDs and drop the rest.

```
# Read in environmental factors data
env.fact <- read.table(file="https://raw.githubusercontent.com/gerard-ricardo/data/master/postset%20treat%20mil",
header = TRUE,
dec = ",",
na.strings = c("",".","NA"))
# Join data1 and env.fact by the disc1 column
data2 <- left_join(data1, env.fact, by = 'disc1')
# Select only the necessary columns from the joined data
data.s <- dplyr::select(data2, c('disc1', 'PAR', 'dli', 'spec', 'Y(II)1', 'Y(II)2', 'Y(II)3'))
# Create long format and pivot data
data1.long <- data.s %>% pivot_longer(-c(disc1, PAR ,dli, spec),
names_to = "rep" ,
values_to = "meas")
# Create individual ID for each RLC
data1.long$id <- paste(data1.long$disc1, data1.long$rep, sep = "")
```

Step 4. Create an rETR column and clean the data for anomalies. For corals we use rETR instead of ETR, which is simply PAR multiplied by the yield.

```
# Calculate rETR
data1.long$rETR <- data1.long$PAR * data1.long$meas
# Load ggplot2 library
library(ggplot2)
# source theme
source("https://raw.githubusercontent.com/gerard-ricardo/data/master/theme_sleek1")
# create a scatter plot with ggplot2
p0 <- ggplot() +
geom_point(data = data1.long,
mapping = aes(x = PAR, y = rETR),
size = 1) +
facet_wrap(~id)
# print the plot
print(p0)
```

Most of my curves look okay at first glance but you can see some have some weird anomalies. I’m going to remove those data points as they are likely to be artifacts.

`data1.l = data1.long[which(data1.long$rETR < 100), ] # remove rETR anomalies`

Looking better now. Okay, now we are getting ready to fit the curves with the Marquardt–Levenberg algorithm, which is the standard fitting routine used for RLCs. But before we do that: non-linear models often do not handle zero values well, so I will replace all zeros with a very small value. As a rule of thumb, I use one order of magnitude lower than the lowest treatment level.

```
data1.l$PAR <- ifelse(data1.l$PAR <= 0, 0.1, data1.l$PAR)     # replace zero PAR
data1.l$rETR <- ifelse(data1.l$rETR <= 0, 0.01, data1.l$rETR) # replace zero rETR
```

Step 5 and 6. Now it is time to run the equations on the data. I have to say, finding a starter function for the curves was far harder than I thought, but the author of the Platt package (https://github.com/SWotherspoon/Platt) has done all the work here, from which I slightly modified the code. But first we might want to run it on just one RLC to make sure it is functioning correctly.

```
data1.s = split(data1.l, data1.l$id)  # split into one data frame per RLC
data1.s$`m001Y(II)1`  # inspect one RLC (non-syntactic names need backticks)

source("https://raw.githubusercontent.com/gerard-ricardo/data/master/ssplattmy")  # the starter values. Full credit to the author of the Platt package
library(minpack.lm)
start = unname(getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Pmax), data1.s$`m001Y(II)1`))  # this finds the starting values
md1 = nlsLM(rETR ~ Pmax * (1 - exp(-alpha * PAR / Pmax)) * exp(-beta * PAR / Pmax),
            start = list(Pmax = start[3], alpha = start[1], beta = start[2]),
            data = data1.s$`m001Y(II)1`)  # notice the starting values are passed to the model
df.x <- data.frame(PAR = seq(0.1, 926, length = 100))  # new data frame of PAR values to predict over
vec.x = df.x[, 1]
plot(data1.s$`m001Y(II)1`$PAR, data1.s$`m001Y(II)1`$rETR, col = 'red')
lines(vec.x, predict(md1, df.x))  # looks good for m001Y(II)1
```

You can see for my first RLC, the curve has fit well. This is a pretty standard RLC, with a sharp increase in rETR with PAR and eventually a small decrease in rETR after a maximum has been reached.
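To build intuition for this shape, the Platt et al. curve can be sketched on its own with made-up parameter values (the Pmax, alpha, and beta below are illustrative, not fitted):

```r
# Platt curve: initial slope alpha, photoinhibition beta, capacity Pmax
platt <- function(PAR, Pmax, alpha, beta) {
  Pmax * (1 - exp(-alpha * PAR / Pmax)) * exp(-beta * PAR / Pmax)
}
PAR <- seq(0.1, 926, length.out = 100)
plot(PAR, platt(PAR, Pmax = 60, alpha = 0.4, beta = 0.01), type = "l", ylab = "rETR")
```

The sharp initial rise is set by alpha and the gentle decline after the peak by beta.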

Ok. Easy part done. At this point we need to now do this for multiple RLCs. That means finding starting values for each RLC, and adding it into multiple models. There is quite a bit of data wrangling here which I won’t explain involving removing errors in the functions. Please consult the full code for details.

Find all starting values

```
starts = data1.l %>% group_by(id) %>%
  do(broom::tidy(stats::getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Ys), data = .))) %>%
  pivot_wider(names_from = names, values_from = x, names_prefix = "") %>%
  dplyr::select(., -c('NA'))
colnames(starts) <- c("id", "alpha.s", 'beta.s', 'Pmax.s')
library(IDPmisc)
starts = NaRV.omit(starts)  # removes Inf and NaN rows
```

Group each RLC and run the model on each.

```
test2 = data1.l %>% right_join(., starts, by = 'id') %>%
  group_by(id) %>%
  do(model = try(nlsLM(rETR ~ Pmax * (1 - exp(-alpha * PAR / Pmax)) * exp(-beta * PAR / Pmax),
                       start = list(Pmax = mean(.$Pmax.s),
                                    alpha = mean(.$alpha.s),
                                    beta = mean(.$beta.s)),
                       data = .), silent = TRUE))  # this fits a model for every RLC
```

We can then predict for each RLC to check the fit. Consult the github code for lots of wrangling.

Most look okay; ~6 out of 42 didn’t fit, which I might want to look into. Once happy, I can run a similar piece of code that stores the Pmax, alpha, and beta parameters in a nested list.
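Because `try()` stores failed fits as ‘try-error’ objects, one quick way to count and identify them is (a sketch, assuming the `test2` object from the chunk above):

```r
# Flag the RLCs whose nlsLM fit failed inside try()
failed <- sapply(test2$model, function(m) inherits(m, "try-error"))
sum(failed)       # how many RLCs did not fit
test2$id[failed]  # which RLC IDs to inspect
```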

```
test = data1.l %>% right_join(., starts, by = 'id') %>%
  group_by(id) %>%
  do(model = try(broom::tidy(nlsLM(rETR ~ Pmax * (1 - exp(-alpha * PAR / Pmax)) * exp(-beta * PAR / Pmax),
                                   start = list(Pmax = mean(.$Pmax.s),
                                                alpha = mean(.$alpha.s),
                                                beta = mean(.$beta.s)),
                                   data = .), silent = TRUE)))  # this extracts the parameters for every model
```

Finally, we can now calculate the other parameters (i.e. rETRmax, Ek, and Em).

```
unest.test = test %>% unnest(model)
df.param = dplyr::select(unest.test, c(id, term, estimate))
dat_wide <- df.param %>% pivot_wider(names_from = term, values_from = estimate) %>%
  dplyr::select(., -c("NA"))  # parameter terms go to columns, estimates become the values
dat_wide$ETRm = dat_wide$Pmax * (dat_wide$alpha / (dat_wide$alpha + dat_wide$beta)) *
  (dat_wide$beta / (dat_wide$alpha + dat_wide$beta))^(dat_wide$beta / dat_wide$alpha)
dat_wide$Ek = dat_wide$ETRm / dat_wide$alpha
dat_wide$Em = (dat_wide$Pmax / dat_wide$alpha) * log((dat_wide$alpha + dat_wide$beta) / dat_wide$beta)
final.df = left_join(dat_wide, data1.long, by = 'id')
```

From here you can analyse your data as normal using your treatment levels. I hope this saved someone some time, and let me know if there are issues running this code with other data sets.

I put forward that this view comes from a more traditional ecologist’s view of experimental design, from when papers were dominated by categorical analyses and ANOVAs, and what were okay rules of thumb some time ago are pretty outdated now. But at the heart of this are differences in philosophical views of the purpose of laboratory experiments. More traditional ecologists sometimes believe the purpose of running lab experiments is simply to replicate the field (spoiler alert: you can’t), whereas the alternative view is that the main purpose of lab experiments (excluding mechanistic experiments) is to derive **causation**, through the removal of confounding variables and by observing effects that scale with concentration. This occasionally means moving into the realm of artificiality in an effort to improve experimental control. Whether you use only ERC treatment levels really depends on quite a few things, but most importantly on how many treatment levels you can spare. I’d argue that using some treatment levels outside of ERCs is actually good experimental design, for some of the reasons I outline below.

**1. We need thresholds**

If an effect occurs outside of ERCs (i.e. above or below what is environmentally relevant), we still need to know the threshold of the effect for regulation and risk-assessment purposes. As Harris et al. (2014) argue (and this is not just relevant to hazardous substances):

*“Indeed, there will be occasions where researchers have to use significantly higher concentrations in order to properly define a LOEC [Low observed effect concentration] for a substance. The LOEC of a substance is, in fact, far more useful in the regulatory sphere than is a conclusion that no effect occurs at environmentally relevant concentrations, because a LOEC enables the regulators to impose more accurate and meaningful safety limits.”*

**2. We might not know ERCs in other places or in the future**

Take work on climate change: more extreme treatment levels are designed for future and often worst-case scenarios such as RCP 8.5. We don’t actually know what will happen in the future, but we rely on modelling to get a decent estimate of likely future ERCs. (*Note: there is some discussion about whether RCP 8.5 is actually a future ERC now, given more certainty in our carbon trajectory*). It would be a waste of resources if one plans and undertakes an experiment, only to find out that what they thought was an ERC turned out to be only the 50th percentile in another place or time, and the experiment needs to be repeated.

**3. Interpolation**

There are lots of reasons to design experiments with many treatment levels that allow for regression models, rather than categorical designs. Some authors have been ramming this home for 20+ years, and yet categorically derived thresholds (LOEC/NOEC) are still commonplace in ecology. I won’t go into all the details here, but one benefit of regressions is the power of interpolation, which allows us to derive the responses between the treatment levels tested, rather than only at the treatment levels used.
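As a toy illustration of interpolation (simulated numbers, not data from this post), a fitted regression can return an estimate at a level that was never run:

```r
# Interpolating a response at an untested treatment level
dose <- c(0, 10, 50, 100)  # treatment levels run
resp <- c(95, 88, 60, 31)  # hypothetical responses
m <- lm(resp ~ dose)
predict(m, newdata = data.frame(dose = 75))  # estimate between the 50 and 100 levels
```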

**4. Modelling fit, and being sure the response you see is real.**

Some models fit better when the full response range is captured. Take for example the four-parameter logistic equation, a nonlinear model traditionally used for dose-responses. Two parameters in the model are the ‘inflection point’ and the ‘lower asymptote’. Failing to acquire data near these parameters means their estimates are poor, leading to large confidence intervals. We can also make more assumptions about the data if we know its shape, leading to better model selection.
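For reference, the four-parameter logistic can be written out directly; the lower asymptote (`c`) and inflection point (`e`) are only well estimated if the data span them (the parameter values here are illustrative):

```r
# Four-parameter logistic: c = lower asymptote, d = upper asymptote,
# e = inflection point, b = slope at the inflection
fpl <- function(x, b, c, d, e) c + (d - c) / (1 + exp(b * (log(x) - log(e))))
x <- seq(1, 100, length.out = 50)
plot(x, fpl(x, b = 2, c = 5, d = 95, e = 30), type = "l", log = "x", ylab = "response")
```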

But more than that, we want to know that if we do see a response at the highest ERC, it is real and not by chance. Seeing a continued response in that extra one-higher treatment level (outside the ERCs) will give you confidence to stand by your data. If, however, the response disappears in the one-higher treatment level, you almost certainly have a false positive. This is a type of quality control and should be congratulated.

**5. Uncoupling correlating treatments**

Often two factors or treatments are closely tied to each other, and to understand the relative roles of each, the experiment needs to be designed so each treatment can be assessed independently of the other and in various combinations, i.e. a fully-crossed design. But many treatment combinations will by nature be unrealistic. Take for example my study area, where I investigate the impacts of sediment and low light on organisms: both of these stressors individually cause effects, but both are also correlated. To uncouple and assess the effects on an organism, we would need some unrealistic treatment combinations with low sediment and low light, and some with high sediment and high light. If the organisms died in the high sediment/high light treatment but not in the low sediment/low light treatment, then we would know it was the sediment, not the low light, that caused mortality.
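A fully-crossed design like this is easy to lay out with `expand.grid()`; the ‘unrealistic’ corners are exactly the combinations that do the uncoupling:

```r
# Fully-crossed sediment x light design, including unrealistic corners
design <- expand.grid(sediment = c("low", "high"), light = c("low", "high"))
design
#   sediment light
# 1      low   low
# 2     high   low
# 3      low  high
# 4     high  high
```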

**6. Mechanisms**

Sometimes mechanisms are difficult to observe at ERCs, so you need to increase the ‘concentration’ to get a clear signal. The caveat is that there needs to be a clear response in the range of the ERCs first (or that this response is relatively established in the literature). For example, researchers know elevated temperatures cause coral bleaching well within environmentally relevant levels; it is unequivocal. So researchers may simply skip to high temperatures to get a strong response to work on physiological mechanisms such as heat-shock proteins. In my work looking at how spectral light profiles impact coral, an artificial (monochromatic) spectral profile that causes a response may further ‘hone in’ on the active wavelengths that cause a response under a broader spectrum, even though monochromatic light is unrealistic.

############

So what is to stop researchers from using strong responses and significant results outside of ERCs to their advantage, such as for acceptance into journals? Unfortunately this is common (take early microplastic research as an example), but it is relatively easy to fix. Authors should (unless completely unknown) provide ERC percentiles with their treatment levels. It should be made very clear what is likely and what is not.

**Conclusion and cautions**

A few treatment levels outside ERCs **can**, if chosen correctly, add to the quality assurance of an experimental design and make the data far more usable. However, if treatment levels are limited, then there is no point using non-ERCs, and using them will do more harm than good. This is common in categorical designs where interpolation is not possible. Say we have a two-factor, 3 x 3 treatment-level design. If one treatment level is not realistic, then a third of the treatment levels are unrealistic. If two treatment levels are unrealistic, then nearly half the experiment is unrealistic. Those using categorical designs really do need to choose their treatment levels very carefully around *in situ* percentiles of field data. However, in treatments/factors with numerous treatment levels, some unrealistically high or low treatments can often improve the output of the analysis in ways that using solely ERCs cannot.

Harris CA, Scott AP, Johnson AC, Panter GH, Sheahan D, Roberts M, Sumpter JP (2014) Principles of sound ecotoxicology. Environ Sci Technol 48:3100-3111

However, the main issue with more complicated models is that you can’t really present all combinations of useful inputs that may be relevant for marine managers who need to risk-assess projects. To address this I created a Shiny app, which I hope will allow easier use of the model.

Using this link https://ricardo-gf.shinyapps.io/Bundle/ you can manipulate the input parameters, changing the size of the bundles to match your species of interest, grain size, or water quality conditions at your local sites.

We also created another equation for the subsequent reduction in egg-sperm encounters that could occur at the water’s surface, which can be found on the second tab.

Ricardo GF, Negri AP, Jones RJ, Stocker R (2016) That sinking feeling: Suspended sediments can prevent the ascent of coral egg bundles. Scientific Reports 6:21567

Here is a condensed list of the hierarchy from the Guidelines.

- NECs
- ECx where x is 10 or less
- BEC10s
- ECx where x is between 10 and 20
- NOECs
- NOECs estimated indirectly

Warne et al goes on to state ‘*Although NECs are not regularly reported, they are considered the preferred measure of toxicity as they are more closely aligned with the objective of GVs [guideline values], that is, to protect aquatic ecosystems, as they are the concentrations that have no adverse effect on species. Reporting of NECs, and their subsequent use in GV derivation, is likely to increase in the future.*‘

So what is a NEC? A NEC is a ‘no effect concentration’ estimate, and it is probably easiest to explain when compared to other statistical estimates. Below we have a NOEC calculated using a categorical analysis, an EC10 calculated from a logistic curve, and a NEC calculated from our package. The red indicates the statistical estimate or threshold. There are many problems with NOECs that have been thoroughly described, and which I won’t go into, but you can see it greatly underestimates the threshold. EC10s are better, but the S-shaped nature of the model means that estimates within 10% of the control often have large error, so researchers report EC10s instead of EC0s. NECs, on the other hand, use two models and find where they best converge (segmented regression), leading to a cleaner estimate of the threshold.
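The broken-stick form behind a NEC can be sketched in a few lines of base R: the response sits at the control level up to the NEC and declines beyond it. This is only an illustration of the shape (cf. Fox 2010), not the jagsNEC implementation, and the parameter values are made up:

```r
# Segmented NEC form: flat below the NEC, exponential decline above it
nec_curve <- function(x, top, beta, NEC) top * exp(-beta * pmax(x - NEC, 0))
x <- seq(0, 10, length.out = 200)
plot(x, nec_curve(x, top = 0.9, beta = 0.5, NEC = 4), type = "l", ylab = "response")
abline(v = 4, col = "red")  # the NEC threshold
```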

As Warne et al. state above, few researchers use NECs, probably because they are not aware of them or do not know how to code them. Our package builds on Pires et al. (2002) and Fox (2010), introducing a large suite of probability distributions for many data types. And the whole process is simple: it automatically detects the data type you wish to model and selects an appropriate model. We have also included more traditional models such as four-parameter logistic curves, and functions to extract ECx’s, even from segmented (NEC) models. There are also functions to compare the NECs or ECx’s of two models.

Below is a working example of how to run jagsNEC

```
Sys.setenv("TAR" = "internal")
devtools::install_github("AIMS/NEC-estimation")
library(jagsNEC)
library(R2jags)
binom.data <- read.table("https://pastebin.com/raw/dUMSAvYi", header = TRUE, dec = ",")  # bent.4
binom.data$raw.x <- as.numeric(as.character(binom.data$raw.x))
out <- fit.jagsNEC(data = binom.data, x.var = "raw.x", y.var = "suc", trials.var = "tot")
```

Once the model has been selected, you will get a message identifying the distribution used “*Response variable ‘suc’ modeled using a binomial distribution.”*

Next you may want to do some model diagnostics using

`check.chains(out)`

All the output from the model is stored in the object ‘out’. To extract the NEC and EC50, and their 95% credible intervals we can use:

```
out$NEC
#     2.5%      50%    97.5%
# 4.391069 4.553966 4.686907

out$EC50
#    EC_50 EC_50_lw EC_50_up
# 6.819824 7.019522 7.239189
```

We can plot the data using:

`plot_jagsNEC(out, log = 'x')`

But you may want to customize the plot using the output. Below is an example of how this can be done.

```
library(ggplot2)
p0 <- ggplot()
p0 <- p0 + geom_point(data = binom.data, aes(x = raw.x, y = suc/tot, alpha = 0.8),
                      color = 'steelblue', size = binom.data$tot/max(binom.data$tot) * 3,
                      position = position_jitter(width = .01, height = .01))
p0 <- p0 + geom_line(aes(x = out$pred.vals$x, y = out$pred.vals$y), color = 'grey30', size = 1)
p0 <- p0 + geom_ribbon(aes(x = out$pred.vals$x, ymin = out$pred.vals$lw, ymax = out$pred.vals$up),
                       fill = 'grey30', alpha = 0.2)
p0 <- p0 + geom_vline(xintercept = out$NEC[2], col = 'red', linetype = 1)
p0 <- p0 + geom_vline(xintercept = out$NEC[1], col = 'red', linetype = 2)
p0 <- p0 + geom_vline(xintercept = out$NEC[3], col = 'red', linetype = 2)
p0 <- p0 + scale_x_log10()
p0 <- p0 + labs(x = expression(XXX~(X~"cm"^{-2})), y = expression(XXX~(prop.)))
p0 <- p0 + scale_y_continuous(limits = c(0, 1))
p0 <- p0 + theme_classic()
p0 <- p0 + theme(legend.position = "none")
p0
```

This package is still in beta-testing so you may notice some bugs. If you do, let us know and we may be able to help. Eventually the plan is to move the package to rstan in version 2.0, so look for updates.

If you want more information about the model behind the NEC, please see the description in:

*Thomas MC, Flores F, Kaserzon S, Fisher R, Negri AP (2020) Toxicity of ten herbicides to the tropical marine microalgae Rhodomonas salina. Scientific reports 10:1-16.*

And finally to cite our package use

`citation('jagsNEC')`

*To cite package ‘jagsNEC’ in publications use:*

*Rebecca Fisher, Gerard Ricardo and David Fox (2020). jagsNEC: A Bayesian No Effect Concentration (NEC) package. R packageversion 1. https://github.com/AIMS/NEC-estimation. R package version 1.0.*

**References**

*Warne M, Batley G, Van Dam R, Chapman J, Fox D, Hickey C, Stauber J (2018) Revised Method for Deriving Australian and New Zealand Water Quality Guideline Values for Toxicants. Prepared for the revision of the Australian and New Zealand Guidelines for Fresh and Marine Water Quality. Australian and New Zealand Governments and Australian state and territory governments, Canberra. *

*Pires AM, Branco JA, Picado A, Mendonça E (2002) Models for the estimation of a ‘no effect concentration’. Environmetrics: The official journal of the International Environmetrics Society 13:15-27*

*Fox DR (2010) A Bayesian approach for determining the no effect concentration and hazardous concentration in ecotoxicology. Ecotoxicol Environ Saf 73:123-131*

Today I was reading Halsey’s *The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum?*, which discusses using p-value intervals to estimate the error around a p-value. Using the standard cut-off threshold of 0.05: if a p-value falls under the threshold but its interval extends above 0.05, then the p-value is likely not reliable. Let’s simulate some binomial data around 0.5 and 0.4, with 10 replicates of 20 individuals per group.

```
raw.x = c(rep('control', 10), rep('treat', 10))
tot = 20
set.seed(123)
suc = c(rbinom(10, tot, 0.5), rbinom(10, tot, 0.4))
data1 = data.frame(raw.x, suc, tot = rep(tot, 20))
data1$obs <- factor(formatC(1:nrow(data1), flag = "0", width = 3))  # unique obs ID
data1

library(ggplot2)
p0 = ggplot() + geom_point(data1, mapping = aes(x = raw.x, y = suc/tot),
                           color = 'steelblue', alpha = 0.5, size = 2,
                           position = position_jitter(width = .01))
p0
```

I’ve deliberately made it so there is quite a bit of overlap. Let’s fit a basic GLMM and assess the associated p-values.

```
library(lme4)
md3 <- glmer(cbind(suc, (tot - suc)) ~ raw.x + (1|obs), family = binomial(link = logit), data = data1)
summary(md3)
coef(summary(md3))[, "Pr(>|z|)"]
```

You can see the p-value of the slope (raw.x) is 0.0355, just under the 0.05 threshold. But what happens when we resample the data and refit the model a few hundred times?

```
sims = 200  # number of simulations
lowerCI = 0.025 * sims
upperCI = 0.975 * sims
medianCI = 0.5 * sims
library(dplyr)  # for the %>% pipe
predFun1 <- function(.) (coef(summary(.))[, "Pr(>|z|)"])  # function to extract the p-values
bb1 <- bootMer(md3, FUN = predFun1, nsim = sims, parallel = "multicore", .progress = 'txt')  # bootstrap, 200 sims
df <- apply(bb1$t, 2, function(X) X[order(X)]) %>% data.frame()  # order the simulated p-values
int = df$X.Intercept. %>% data.frame()
raw.x = df$raw.x %>% data.frame()
blo <- int[lowerCI, ]   # find the bottom 2.5%
bhi <- int[upperCI, ]   # find the top 2.5%
med <- int[medianCI, ]
c(med, blo, bhi)
blo <- raw.x[lowerCI, ]  # find the bottom 2.5%
bhi <- raw.x[upperCI, ]  # find the top 2.5%
med <- raw.x[medianCI, ]
c(med, blo, bhi)
```

The p-intervals for ‘raw.x’ range from <0.001 (highly significant) to 0.692 (highly non-significant). This is a problem, and indicates the p-value wasn’t really reliable. We may have just got lucky in the initial analysis.

Finally, let’s plot it up adding a red line through the 0.05 threshold.

```
library(ggplot2)
library(tidybayes)
p1 = ggplot(raw.x, aes(x = .)) +
  geom_density(aes(color = 'red', fill = 'red'), alpha = 0.3) +
  stat_pointintervalh(aes(y = 0.00, x = .), .width = c(.66, .95))
p1 = p1 + scale_y_continuous(name = "Density")
p1 = p1 + scale_x_continuous(name = "p-values")
p1 = p1 + coord_cartesian(ylim = c(0.0, 8))
p1 = p1 + geom_vline(xintercept = 0.05, color = "red", lty = 2) + theme_light()
p1 = p1 + theme(legend.position = "none")
p1
```

You can see the p-value distribution has a long tail. The 95% interval is in light grey and the 66% in black, showing a very unreliable test.

So what do you think? Should we be reporting p-value intervals with all our statistics?

*Halsey LG (2019) The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol Lett 15:20190174*

In the post above we calculated the 95% CI around the ECx using an approximation: multiplying the standard error by 1.96 and adding or subtracting it from the mean ECx. But this makes it difficult to compare ECx’s, except to say that the 95% CIs do or do not overlap. In this example, we bootstrap the data, which creates a distribution around each ECx. From there we can visually present the distributions or compare them statistically.

Let’s first check out our old EC50 of ‘factor a’. The EC50 = 46.12 (95% CI 36.85 – 55.39)

```
> ec.df1
                ecx1   lower1  upper1
(Intercept) 46.11674 36.84638 55.3871
```

Now let’s bootstrap the EC50. Remember in the previous code I had `ecx1 <- (eta1 - betas1[1])/betas1[2]`, which interpolates the 50% response onto the curve? I will run this 200 times, resampling the data each time. This will give 200 EC50 values that we can make a distribution out of. I will use ‘lme4’ again, ignoring the initial convergence issue. The 200 EC50s will be stored in ‘bb1$t’. We can also order them and take the bottom 2.5% and top 97.5%, i.e. the 95% CI.
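For context, `eta1` is the target response expressed on the link (logit) scale, so for an EC50 it is simply (a sketch; `eta1` and `betas1` were defined in the earlier post):

```r
# The 50% response on the logit link scale
eta1 <- qlogis(0.5)  # logit(0.5) = 0
# Inverse prediction then solves intercept + slope * ecx = eta1 for ecx:
# ecx1 <- (eta1 - betas1[1]) / betas1[2]
```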

```
# Bootstrapped confidence intervals on ECx values
library(lme4)
md1 = glmer(cbind(suc, (tot - suc)) ~ raw.x * factor + (1|obs), data1, family = 'binomial')
sims = 200  # number of simulations
lowerCI = 0.025 * sims
median.CI = 0.5 * sims
upperCI = 0.975 * sims
# ecx <- (logit(0.5) - coef(md1)[1]) / coef(md1)[2]
predFun2 <- function(.) ((eta1 - fixef(.)[1]) / fixef(.)[2])  # define the ECx function
bb1 <- bootMer(md1, FUN = predFun2, nsim = 200, parallel = "multicore", .progress = 'txt')  # bootstrap, 200 sims
bb_se1 <- apply(bb1$t, 2, function(X) X[order(X)])  # order the simulated EC50s
median <- bb_se1[median.CI, ]  # find the 50%
low <- bb_se1[lowerCI, ]       # find the bottom 2.5%
up <- bb_se1[upperCI, ]        # find the top 2.5%
ec.df1.boot = data.frame(median, low, up)
```

```
> ec.df1.boot
              median      low       up
(Intercept) 45.22589 36.04048 55.48538
```

You can see they are similar to the approximated confidence intervals above but slightly different. Generally when random effects are involved, bootstrapped confidence intervals are the way to go. Now let’s see the distribution using a density plot

`plot(density(bb1$t))`

So here is the EC50 distribution for ‘factor a’ that we want to compare factors b and c against. We will need to again relevel the model for the other factors and repeat the code.

```
# Bootstrapped confidence intervals for factors b and c
library(lme4)
library(tidyr)  # for gather()
data1$factor <- relevel(data1$factor, ref = "b")  # set the reference level for the GLM
md2 <- glmer(cbind(suc, (tot - suc)) ~ raw.x * factor + (1|obs), data1, family = 'binomial')
bb2 <- bootMer(md2, FUN = predFun2, nsim = 200, parallel = "multicore", .progress = 'txt')
data1$factor <- relevel(data1$factor, ref = "c")
md3 <- glmer(cbind(suc, (tot - suc)) ~ raw.x * factor + (1|obs), data1, family = 'binomial')
bb3 <- bootMer(md3, FUN = predFun2, nsim = 200, parallel = "multicore", .progress = 'txt')
all.dist = data.frame('a' = as.vector(bb1$t), 'b' = as.vector(bb2$t), 'c' = as.vector(bb3$t))
all.dist.long <- gather(all.dist, factor, ecx, a:c, factor_key = TRUE)

# Plotting
library(ggplot2)
library(tidybayes)
p1 = ggplot(all.dist.long, aes(x = ecx)) +
  geom_density(aes(group = factor, color = factor, fill = factor), alpha = 0.3) +
  stat_pointintervalh(aes(y = 0.00, x = ecx, group = factor), .width = c(.66, .95)) +
  theme_light()
p1 = p1 + scale_fill_manual(values = c("steelblue4", "orange", 'red')) +
  scale_color_manual(values = c("steelblue4", "grey", "steelblue1", "steelblue4", "grey", "grey", "grey", "grey")) +
  theme(legend.position = "none")
p1 = p1 + scale_y_continuous(name = "Density")
p1 = p1 + coord_cartesian(xlim = c(0.0, 50), ylim = c(0.0, 0.1))
p1
```

Starting to look pretty. The blue is factor ‘a’, yellow factor ‘b’, and red factor ‘c’. Looking at the 95% CIs at the bottom, the red looks statistically different from the factor ‘a’ EC50.

Now let’s compare the distributions against ‘factor a’ by subtracting the two distributions.

```
# Differences between the bootstrap distributions
df4.s = data.frame(atob = all.dist$a - all.dist$b, atoc = all.dist$a - all.dist$c)
df4.s.long <- gather(df4.s, factor, diff, atob:atoc, factor_key = TRUE)
plot(density(df4.s$atob))
p2 = ggplot(df4.s.long, aes(x = diff)) +
  geom_density(aes(group = factor, color = factor, fill = factor), alpha = 0.3) +
  stat_pointintervalh(aes(y = 0.00, x = diff, group = factor), .width = c(.66, .95)) +
  theme_light()
p2 = p2 + scale_fill_manual(values = c("steelblue4", "orange", 'red')) +
  scale_color_manual(values = c("steelblue4", "grey", "steelblue1", "steelblue4", "grey", "grey", "grey", "grey")) +
  theme(legend.position = "none")
p2 = p2 + scale_y_continuous(name = "Density")
p2 = p2 + geom_vline(xintercept = 0, color = "red", lty = 2) + theme_light()
p2 = p2 + coord_cartesian(xlim = c(0.0, 50), ylim = c(0.0, 0.1))
# p2 = p2 + scale_x_continuous(name = "Standardized effect size")
p2 = p2 + facet_wrap(~factor)
p2
```

I’ve added a line through zero, which represents ‘no difference’, i.e. if both distributions had the same ECx, the black dot for the median difference would sit on zero. Values less than zero mean that factor b or c is greater than factor a, and vice versa for positive values. So the interpretation is: if the 95% interval shown by the thin black line overlaps the red line, the two distributions are not different. Here ‘a’ and ‘b’ are similar, and ‘a’ and ‘c’ are different.

You can of course get the values for these 95% CI.

```
atob.diff = sort(df4.s$atob)  # order the bootstrapped differences
nrow(df4.s)
median.ab <- atob.diff[median.CI]  # find the 50%
low.ab <- atob.diff[lowerCI]       # find the bottom 2.5%
up.ab <- atob.diff[upperCI]        # find the top 2.5%
ab.diff = data.frame(median.ab, low.ab, up.ab)
ab.diff
```

```
> ab.diff
  median.ab    low.ab   up.ab
1  3.940983 -8.379134 16.5668
```

You may also be interested in making a probability-type statement. This should probably be reserved for Bayesian analyses (and yes, you can do this exact analysis using Bayesian model parameter outputs). But in a frequentist sense, we could say, ‘*if this experiment was repeated 100 times, the EC50 of factor b would be greater than the EC50 of factor a X many times*’.

length(which(all.dist$a < all.dist$b)) / sims * 100

# [1] 28

length(which(all.dist$a < all.dist$c)) / sims * 100

# [1] 0.5

*'if this experiment was repeated 100 times, the EC50 of factor b would be greater than the EC50 of factor a 28 of those times'.*

*'if this experiment was repeated 100 times, the EC50 of factor c would be greater than the EC50 of factor a less than 1 of those times'.*

**How it works**

More complex models with random effects require simulations to work out their power. Simulations aren't nearly as complicated as you might think, and they can tell you masses about your data. The approach *Johnson et al.* take is to i) simulate some data with known trends, ii) analyse the data to find evidence that the null hypothesis can be rejected (i.e. p-values < 0.05), and iii) repeat the process many times.

The overall percentage of correct rejections is the power. Generally you want at least 80% power (i.e. fewer than 20% false negatives). Let's start.

First we need to design our experiment. A typical dose-response design has some continuous predictor that we want to find thresholds for, and we may run it at two levels of a second factor. An example could be a toxic metal at two temperatures. This design will create two 'curves'. For each combination of the treatments, let's trial five replicates, with each tank containing 30 larvae.

# Simulate data

x = c(0.1, 0.3, 1, 3, 10, 30, 100) # proposed treatment levels

vec.x = rep(x, 5) # five tank replicates per level

vec.x <- sort(vec.x)

data1 <- expand.grid(raw.x = vec.x, factor = c("a", "b")) # two factor levels 'a' and 'b'

data1$obs <- factor(formatC(1:nrow(data1), flag = "0", width = 3)) # unique tank ID for later on

data1$n = 30 # total number of organisms per tank

str(data1) # raw.x is my continuous factor, 'factor' is my categorical factor

We now have a data frame set up, ready for our *response* data. Now this is the part that requires a little attention. We need to simulate the *minimum* trends that we think are realistic to detect. In ecotoxicology, a 10% effect from the control is considered reasonable. Obviously there is some pragmatism being used here, as detecting an effect smaller than 10% may require an unrealistically large sample size (you can test this later).

For the continuous factor 'raw.x', I will choose a very slight slope of -0.01, which I can adjust later if there is not a 10% effect between my 'control dose' (0.1) and my highest dose (100). For the categorical factor, I will work out the odds ratio of a 10% effect of the factor b intercept relative to factor a. To do this you need to define what a likely 'control' response is. In this experiment, I consider a healthy control to have a minimum of 70% survivorship, and a 10% effect from this is 63%. The input parameters are:

raw.x.slope = -0.01

fac.a.control = 0.7

odds.ratio = (63 / 37) / (70 / 30) # factor b odds (success/fail) divided by factor a odds (success/fail)

odds.ratio

# [1] 0.73

obs = 0.1 # random tank error
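As a quick sanity check (base R only), you can push this odds ratio back through the logit link and confirm it gives the intended 63% response for factor b; plogis() and qlogis() are the base-R inverse-logit and logit functions:

```r
fac.a.control = 0.7
odds.ratio = (63 / 37) / (70 / 30)

# shift the control intercept by log(odds ratio) on the logit scale,
# then back-transform to a proportion
p.b = plogis(qlogis(fac.a.control) + log(odds.ratio))
p.b # 0.63, i.e. exactly a 10% drop from the 70% control
```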

If you aren’t confident with these numbers, don’t worry. You will have plenty of opportunity to iteratively fix them when we start plotting. Now we will use the package attached to the Johnson paper to simulate all this together.

library(GLMMmisc)

data1 <-
  sim.glmm(design.data = data1,
           fixed.eff = list(
             intercept = qlogis(fac.a.control),
             factor = log(c(a = 1, b = odds.ratio)),
             raw.x = raw.x.slope), # raw.x.slope is already negative
           rand.V = c(obs = obs),
           distribution = "binomial")

You can see that there is now a 'response' column in your data.frame. Let's plot it.

library(ggplot2)

p0 = ggplot() + geom_point(data1, mapping = aes(x = raw.x, y = response/n), position = position_jitter(width = .02), alpha = 0.50, size = 2) + theme_bw() + facet_wrap(~factor) + scale_x_log10(name = "raw.x")

p0 = p0 + scale_y_continuous(limits = c(0, 1))

p0

My data will look different to yours because we have each randomly simulated it once (no seed has been set yet).

Can you see the weak trends we simulated in the data? There are (apparently) slight negative trends running from left to right across the x-axes, and data set 'b' is slightly lower than data set 'a'. But because we have binomial error (chance) and a random tank effect, it is hard to visually separate the *signal from the noise*. Let's analyse the data and see if any significant trends emerge.

library(lme4)

fit <- glmer(cbind(response, n - response) ~ raw.x + factor + (1 | obs), family = binomial, data = data1)

summary(fit) # to show how well it estimates the simulation

coef(summary(fit))["raw.x", "Pr(>|z|)"] # p-value for raw.x

coef(summary(fit))["factorb", "Pr(>|z|)"] # for factor b

My analyses were actually significant for each predictor. Were yours?

Ok, now this time let’s wrap everything into a function, set the ‘seed’ so it returns the same response each time, and plot the analysis.

sim.data1 <- function(...){
  sim.glmm(design.data = data1,
           fixed.eff = list(
             intercept = qlogis(fac.a.control),
             factor = log(c(a = 1, b = odds.ratio)),
             raw.x = raw.x.slope),
           rand.V = c(obs = obs),
           distribution = "binomial")
}

set.seed(978675)

sim.data1()

library(ggplot2)

p0 = ggplot() + geom_point(sim.data1(), mapping = aes(x = raw.x, y = response/n), position = position_jitter(width = .02), alpha = 0.50, size = 2) + theme_bw() + facet_wrap(~factor) + scale_x_log10(name = "dep sed")

p0 = p0 + scale_y_continuous(limits = c(0, 1))

p0

# And check we can still analyse our data with the function

fit = glmer(cbind(response, n - response) ~ raw.x + factor + (1 | obs), family = binomial, data = sim.data1())

summary(fit)

vec.x = seq(min(data1$raw.x), max(data1$raw.x), length = 100)

df1 <- expand.grid(raw.x = vec.x, factor = levels(data1$factor))

mm <- model.matrix(~raw.x + factor, df1) # build model matrix

eta <- mm %*% fixef(fit)

df1$prediction <- as.vector(exp(eta) / (1 + exp(eta)))

se <- sqrt(diag(mm %*% vcov(fit) %*% t(mm)))

df1$upper <- as.vector(exp(eta + 1.96 * se) / (1 + exp(eta + 1.96 * se)))

df1$lower <- as.vector(exp(eta - 1.96 * se) / (1 + exp(eta - 1.96 * se)))

p0 = p0 + geom_line(data = df1, aes(x = raw.x, y = prediction), color = 'grey30', size = 1)

p0 = p0 + geom_ribbon(data = df1, aes(x = raw.x, ymin = lower, ymax = upper, fill = 'grey'), alpha = 0.2)

p0
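A small aside: the manual back-transform exp(eta) / (1 + exp(eta)) used above is just the inverse-logit, which base R supplies as plogis(); either form gives the same predictions:

```r
eta = c(-2, 0, 1.5) # example values on the logit scale
manual = exp(eta) / (1 + exp(eta))
builtin = plogis(eta)
all.equal(manual, builtin) # TRUE
```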

Unsurprisingly, this looks quite similar to the coral settlement data that I collect (the input parameters have been built around real assay outputs). At this point, you may want to run this a few times and refine some of the input parameters. Change the value of the random effect *obs* to see if it influences the noise, and how much of the error is from (binomial) chance. You might be quite surprised how much noise really is just chance: welcome to binomial data! You can confirm by running this a few times for the control in factor 'a'.

rbinom(5, 30, 0.7)/30 #binomial error around 0.7 without random tank error

#[1] 0.7666667 0.7666667 0.6000000 0.6666667 0.5666667 #yep, lots of noise
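You can also put a number on that pure-chance noise analytically: the standard deviation of a binomial proportion is sqrt(p(1 - p)/n), so with 30 larvae and a true 70% survivorship a tank wanders by roughly 8 percentage points per standard deviation, with no tank effect at all:

```r
p = 0.7; n = 30
sd.prop = sqrt(p * (1 - p) / n) # SD of the observed proportion
sd.prop # ~0.084
p + c(-2, 2) * sd.prop # rough 95% range: ~0.53 to 0.87
```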

For me, I want the option of detecting slightly smaller effects of the continuous predictor, so I will change the slope to -0.008, and keep the random effect at 0.1.

raw.x.slope = -0.008

obs = 0.1 #random tank error

Finally, we will wrap the analysis into two functions, with the outputs being the p-values for each of the coefficients. Note: I have scaled the predictor variable, which can be important for multifactor experiments.

sim.mos.pval <- function(...){
  fit <- glmer(cbind(response, n - response) ~ scale(raw.x) + factor + (1 | obs), family = binomial, data = sim.data1())
  coef(summary(fit))["scale(raw.x)", "Pr(>|z|)"]
} # if this is significant, we have a true positive (which we want to occur 80% of the time - see below)

sim.facb.pval <- function(...){ # a second, separately named function for the factor b coefficient
  fit <- glmer(cbind(response, n - response) ~ scale(raw.x) + factor + (1 | obs), family = binomial, data = sim.data1())
  coef(summary(fit))["factorb", "Pr(>|z|)"]
}

Now we can run everything together. So just to recap, we will simulate the response data, analyse it and extract the p-value, and repeat again and again. We will then calculate the proportion of correctly significant p-values. I have only run 20 below for speed, but you should run many more once you have locked in your input parameters.

set.seed(978675) # set seed for random number generator to give repeatable results

sim.mos.pval()

sim.pvals <- sapply(1:20, sim.mos.pval) #creates 20 simulations

mean(sim.pvals < 0.05) #what is the prop. of the true positives. Need to be >0.8

binom.test(table(factor(sim.pvals < 0.05, c(T, F))))$conf.int #confidence intervals around this estimate
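If the GLMMmisc and lme4 machinery obscures the recipe, here is a self-contained toy version of the same simulate-fit-extract loop using a plain binomial glm() (no random tank effect; the intercept, slope and replication below are made-up numbers, not the post's design):

```r
set.seed(1)
x <- rep(c(0.1, 0.3, 1, 3, 10, 30, 100), 5) # 7 doses x 5 tanks
n <- 30                                     # larvae per tank

sim.pval <- function(...) {
  eta  <- qlogis(0.7) - 0.008 * x           # true logit-scale trend
  resp <- rbinom(length(x), n, plogis(eta)) # simulate survivors per tank
  fit  <- glm(cbind(resp, n - resp) ~ x, family = binomial)
  coef(summary(fit))["x", "Pr(>|z|)"]       # p-value for the slope
}

pvals <- sapply(1:200, sim.pval)
power <- mean(pvals < 0.05) # proportion of true positives across simulations
power
```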

For the 'factorb' function, my power is 70% (95% CI: 46-88). Johnson recommends increasing the number of simulations until the 95% CI is within ~2.5% of the mean. If I run it 1000 times I get 75.8% (95% CI: 73.0-78.0). So I have run enough simulations, but my power falls just short. That leaves me a few options:

- Increase my tank replicates by one (may be overkill since I am so close)
- Increase the number of larvae in each tank (sounds doable)
- Decrease my expectations, i.e. acknowledge I may not pick up subtle effects. I may need to be happy with only a 15% decrease from my control.

**My thoughts**

What I like about this analysis is that the random effects are quite easily incorporated into the simulation. This is great for field data that inherently have lots of random effects, and where we are using null-hypothesis significance testing (NHST) to tease apart many predictor variables.

However, in lab studies we have the benefit of control, and a properly designed experiment will aim to reduce random effects as much as possible. Further, we can move away from NHST and towards something much more meaningful: effect sizes (sensu Gelman http://www.stat.columbia.edu/~gelman/research/published/retropower20.pdf).

You can see above that it was a bit clunky to estimate a 10% effect of raw.x. In a previous post, I showed how you could extract standardised effect sizes (EC10s) from similar binomial GLMMs. Using a similar methodology to the above, we could i) simulate the data, ii) extract EC10s, iii) repeat many times, and iv) compare EC10s until 80% were within 10% of the real value. This is something I need to code up…

But this has been done for nonlinear models, where the ECx value is a coefficient in the model and can therefore be easily compared. See this excellent post (http://edild.github.io/lc50_bias_sim/) that describes the process.

*1. Johnson PC, Barry SJ, Ferguson HM, Müller P. Power analysis for generalized linear mixed models in ecology and evolution. Methods in ecology and evolution. 2015;6(2):133-42.*


The easiest way to use this code is to reformat the headings in your data sheet to match the general headings I use here, so that no code needs to be altered.

Note: This code does not go deep into data exploration, model selection, or model validation to keep things simple. I am going to assume a random ‘tank’ effect (obs), which occurs in almost all experiments.

- Import some data and set the data types. Your data should be organised in this type of long format.

data1 <- read.table(file = "https://raw.githubusercontent.com/gerard-ricardo/data/master/mockmulticb", header = TRUE, dec = ",", na.strings = c("", ".", "NA"))

head(data1)

options(scipen = 999) # turn off scientific notation

# 2) Organising and wrangling

str(data1) # check data types are correct

data1$raw.x <- as.numeric(as.character(data1$raw.x))

# data1$counts <- as.numeric(as.character(data1$counts))

data1$suc <- as.integer(as.character(data1$suc))

data1$tot <- as.integer(as.character(data1$tot))

data1$prop <- data1$suc / data1$tot

data1$obs <- factor(formatC(1:nrow(data1), flag = "0", width = 3)) # unique tank ID for later on

nrow(data1)

str(data1)

data1 = data1[complete.cases(data1), ] # remove rows with NAs (make sure the import matches your NA type)

Okay let’s do some very basic data exploration

table(data1$tot)

# 10 12 15 16 17 18 19 20 21 22 23 24 25 26 27 29 30 31 33 34 35 39 40
#  1  1  3  5  5  5  5  3  5  3  6  4  3  3  4  4  3  3  2  1  1  1  1

This isn't great; it means each tank has a different number of animals, and predictions for tanks with very few animals will be very coarse. GLMs can weight this imbalance a bit, so let's leave it in for the moment. We can see this in the data by making the point size for each tank proportional to the number of animals in it.

library(ggplot2)

p0 = ggplot() + geom_point(data1, mapping = aes(x = raw.x, y = prop), position = position_jitter(width = .02), alpha = 0.50, size = data1$tot * 0.2) + theme_light()

p0 = p0 + facet_wrap(~factor) + scale_x_log10(name = "raw.x")

p0

Okidok, at this stage you may want to launch into data exploration and consider some of the following

- is the control healthy, or representative of health in field data?
- how does the control relate to the first treatment?
- are data evenly spaced across the log-x scale?
- is the effect linear or non-linear?
- are there any non-monotonic (bell-shaped) effects?
- are there any outliers?

This data set has a number of issues, but let's push on ahead anyway.

#4) Fit the model

library(lme4)

library(splines)

md3 <- glmer(cbind(suc, (tot - suc)) ~ scale(raw.x) * factor + (1 | obs), family = binomial(link = logit), data = data1)

summary(md3)

library(RVAideMemoire) # GLMM overdispersion test

overdisp.glmer(md3) # overdispersion for GLMM

# is there an interactive effect?

# is it overdispersed?

At this stage, many would stop and interpret. They would talk about the interactive effect, and for significant effects say something like 'for every unit increase of X, there is a decrease in response of blah blah blah'. Neither statement is super useful for regulators that actually need to make decisions. Why? Because p-values in lab assays often just reflect the experimental design. Experiments that use high concentrations and lots of replicates will likely find significant effects, but these may not be biologically meaningful. And placing the interpretation on the treatment (which is what gets managed), rather than the response, makes the metric much more usable. So let's take an effect-size approach by looking at the magnitude of change of the response. These metrics (ECx) can also be used in other models such as SSDs and meta-analyses. But first let's predict and plot up the models. Here I'm going to switch to glmmTMB because of convergence issues.

library(glmmTMB)

md1 = glmmTMB(cbind(suc, (tot - suc)) ~ raw.x * factor + (1 | obs), data1, family = 'binomial')

summary(md1)

vec.x = seq(min(data1$raw.x), max(data1$raw.x), length = 100)

df.x <- expand.grid(raw.x = vec.x,
                    factor = levels(data1$factor)) # make sure names are the same as in the model

mm <- model.matrix(~raw.x * factor, df.x) # build model matrix to sub the parameters into the equation

eta <- mm %*% fixef(md1)$cond # sub in parameters to get the mean on the logit scale

df.x$prediction <- as.vector(exp(eta) / (1 + exp(eta))) # back-transform the mean from the logit scale

se <- sqrt(diag(mm %*% vcov(md1)$cond %*% t(mm)))

df.x$upper <- exp(eta + 1.96 * se) / (1 + exp(eta + 1.96 * se)) # work out the upper 95% CI

df.x$lower <- exp(eta - 1.96 * se) / (1 + exp(eta - 1.96 * se)) # work out the lower 95% CI

data1$factor = factor(data1$factor, levels = c("a", "b", "c")) # set levels in order for the model

library(ggplot2)

p0 = ggplot()

p0 = p0 + geom_point(data = data1, aes(x = raw.x, y = prop, alpha = 0.1), color = 'steelblue', size = data1$tot * 0.1, position = position_jitter(width = .01))

p0 = p0 + geom_line(data = df.x, aes(x = raw.x, y = prediction), color = 'grey30', size = 1)

p0 = p0 + geom_ribbon(data = df.x, aes(x = raw.x, ymin = lower, ymax = upper, fill = 'grey'), alpha = 0.2)

p0 = p0 + scale_x_log10(limits = c(0.9, 100))

p0 = p0 + labs(x = expression(Deposited~sediment~(mg~"cm"^{-2})),
               y = expression(Recruit~survival~(prop.)))

p0 = p0 + scale_y_continuous(limits = c(-0.05, 1.01))

p0 = p0 + theme_light()

p0 = p0 + scale_fill_manual(values = c("grey", "khaki2"))

p0 = p0 + theme(legend.position = "none")

p0 = p0 + facet_wrap(~factor, nrow = 1)

p0

You can see that for each factor, some thresholds are occurring. For the first level of the categorical factor (the left plot) the curve starts to bend steeply around 10 units. For the third plot, it occurs below 10 units, although the confidence bands are still wide in this region.

We want to predict what level of the continuous stressor (on the x-axis) causes a 50% effect, i.e. how much of the stressor will impact 50% of the population. We can do this by interpolating onto the curve. I have used Venables' dose.p function* to do this. In this example I want to compare every EC50 against the EC50 of 'factor a' (represented by the red line).

Yep the code below looks intense but let me explain what is fundamentally going on here.

- We say what type of effect we want to see (here 50%, i.e. ec = 50).
- Work out the top of each model at the control.
- Find a 50% decrease from that.
- Get everything on the logit scale ready for interpolation.
- Interpolate for the mean of each curve. Note the model needs to be releveled to get the coefficients for the second curve.
- Use the variance-covariance matrix to get the standard errors.
- Use the old Wald approximation to approximate the 95% confidence intervals.
- Wrap everything into a data.frame.
- This needs to be done for each curve, so I relevel for each factor.
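Stripped of the releveling bookkeeping, the interpolation and Wald steps for a single curve boil down to a few lines. This sketch uses a plain binomial glm() fitted to simulated data (the intercept, slope and doses are toy numbers, not the post's dataset) so every step is visible:

```r
set.seed(1)
x <- rep(c(0.1, 1, 3, 10, 30, 100), each = 5)
p.true <- plogis(1.5 - 0.05 * x)          # toy dose-response curve
suc <- rbinom(length(x), 30, p.true)
fit <- glm(cbind(suc, 30 - suc) ~ x, family = binomial)

b <- coef(fit)                            # intercept and slope
top <- unname(plogis(b[1]))               # modelled control response
eta50 <- qlogis(top * 0.5)                # 50% of the control, on the logit scale
ecx <- unname((eta50 - b[1]) / b[2])      # interpolate onto the curve

# delta-method (Wald) standard error, in the spirit of MASS's dose.p
pd <- -cbind(1, ecx) / b[2]               # gradient of ecx w.r.t. the coefficients
se <- sqrt(as.numeric(pd %*% vcov(fit) %*% t(pd)))
c(ecx = ecx, lower = ecx - 1.96 * se, upper = ecx + 1.96 * se)
```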

#6) Getting ECx’s

ec = 50 #put in your ecx here

library(dplyr)

group.fac <- df.x %>% group_by(factor) %>% summarise(estimate = max(prediction)) %>% as.data.frame()

top1 = group.fac$estimate[1] # the modelled control of factor 1

inhib1 = top1 - ((ec/100) * top1) # an x% decrease from the control for factor 1

library(VGAM)

eta1 <- logit(inhib1)

data1$factor <- relevel(data1$factor, ref = "b") # set reference level for the GLMM

md2 <- glmmTMB(cbind(suc, (tot - suc)) ~ raw.x * factor + (1 | obs), data1, family = 'binomial')

data1$factor <- relevel(data1$factor, ref = "c") # set reference level for the GLMM

md3 <- glmmTMB(cbind(suc, (tot - suc)) ~ raw.x * factor + (1 | obs), data1, family = 'binomial')

data1$factor = factor(data1$factor, levels = c("a", "b", "c")) # set levels back in order

betas1 = fixef(md1)$cond[1:2] #intercept and slope for ref 1

betas2 = fixef(md2)$cond[1:2] #intercept and slope for ref 2

betas3 = fixef(md3)$cond[1:2]

ecx1 <- (eta1 - betas1[1]) / betas1[2]

ecx2 <- (eta1 - betas2[1]) / betas2[2]

ecx3 <- (eta1 - betas3[1]) / betas3[2]

pd1 <- -cbind(1, ecx1)/betas1[2]

pd2 <- -cbind(1, ecx2)/betas2[2]

pd3 <- -cbind(1, ecx3)/betas3[2]

ff1 = as.matrix(vcov(md1)$cond[1:2,1:2])

ff2 = as.matrix(vcov(md2)$cond[1:2,1:2])

ff3 = as.matrix(vcov(md3)$cond[1:2,1:2])

ec.se1 <- sqrt(((pd1 %*% ff1 )* pd1) %*% c(1, 1))

ec.se2 <- sqrt(((pd2 %*% ff2 )* pd2) %*% c(1, 1))

ec.se3 <- sqrt(((pd3 %*% ff3 )* pd3) %*% c(1, 1))

upper1 = (ecx1+ec.se1*1.96)

lower1 = (ecx1-ec.se1*1.96)

upper2 = (ecx2+ec.se2*1.96)

lower2 = (ecx2-ec.se2*1.96)

upper3 = (ecx3 + ec.se3*1.96) # note: ec.se3 here, not ec.se2

lower3 = (ecx3 - ec.se3*1.96)

ec.df1 = data.frame(ecx1, lower1, upper1)

ec.df2 = data.frame(ecx2, lower2, upper2)

ec.df3 = data.frame(ecx3, lower3, upper3)

ecall = cbind(ec.df1, ec.df2, ec.df3)

ec.df1 # this is your factor 1 ECx values

ec.df2 # this is your factor 2 ECx values

ec.df3 # this is your factor 3 ECx values

p0 = p0 + geom_hline(yintercept = inhib1, col = 'red', linetype = "dashed")

p0

Finally, let's visualise the EC50s on the plot.

# Visualising the ECx

ecx.all = bind_cols(data.frame(factor = c('a', 'b', 'c')), data.frame(ecx = c(ecx1, ecx2, ecx3)), data.frame(inhib = c(inhib1, inhib1, inhib1)))

upper.all = bind_cols(data.frame(factor = c('a', 'b', 'c')), data.frame(upper = c(upper1, upper2, upper3)), data.frame(inhib = c(inhib1, inhib1, inhib1)))

lower.all = bind_cols(data.frame(factor = c('a', 'b', 'c')), data.frame(lower = c(lower1, lower2, lower3)), data.frame(inhib = c(inhib1, inhib1, inhib1)))

p0 = p0 + geom_point(data = upper.all, aes(x = upper, y = inhib), color = 'red', size = 2)

p0 = p0 + geom_point(data = ecx.all, aes(x = ecx, y = inhib), color = 'red', size = 2)

p0 = p0 + geom_point(data = lower.all, aes(x = lower, y = inhib), color = 'red', size = 2)

p0

p0

You can see the EC50 values decrease between the factors, especially for factor c. The EC50 for the right-hand plot is quite a bit lower, so a regulator may set lower 'trigger values' for areas where these combinations of stressors occur. The curve on the left has an EC50 of 46.11, but the curve on the right has an EC50 of around 27.83, so 'factor c' is of much greater concern for regulators. In my view this is much better than simply going off p-values. However, we can still compare the EC50s statistically, which is a topic for a later post.

I hope this post was useful to you, and that it gets researchers new to modelling thinking through the lens of effect sizes.

*Venables WN, Ripley BD. Modern applied statistics with S-PLUS. Springer Science & Business Media. 2013.
