Automated rapid light curve analyses in R

The objective of this post is to simplify and automate rapid light curve (RLC) analysis from a Walz pulse amplitude modulation (PAM) fluorometer, using the csv files generated by its WinControl software. The first time I did this I jumped around between Excel and GraphPad and eventually finished my analysis in under a day. But I didn’t like the jumping around, and I wanted to keep my raw csv files, so I decided to code it in R. Little did I realise it would take me a good number of days, although I learnt a lot in the process.

The code is long, and I suspect at times ugly. If you have a cleaner way to run some of the functions, I would be happy to hear it. For simplicity I haven’t included all the code below, but rather described the steps in the text. The full code, including some Walz csv files, can be found here (https://github.com/gerard-ricardo/PAM-RLC-analysis).

Major steps

  1. Naming csv files. The RLC data from each PAM run needs to be saved as a csv file. Every file needs to be in an allocated folder, and each file labelled with a unique ID, with each label the same length and separated by spaces. I use Bulk Rename Utility to do this. Examples are ‘d001 mille’, ‘d050 mille’ and ‘d100 mille’. Here the unique ID is the 4-character first block, and the second block of letters describes the experiment. The code below will read the first block of letters.
  2. Importing the folder of RLCs into R. Note that if your files are semicolon-delimited, you can use read_delim(filename, delim = ';'). I have the unique ID saved as ‘disc’.
  3. Importing the environmental treatment data. Note: as you likely subsampled for the RLC curves, this code uses left_join to merge the two data frames. They do not need to match row for row, as long as each has the same unique ID column.
  4. Create an rETR column and data clean for anomalies.
  5. Get starting values by running the SSPlatt.my function. This is modified from the Platt package (its author did all the hard work).
  6. Run the nonlinear models and derive all the RLC parameters.

OK, let’s get started.

Step 1) Name the csv files in a consistent format of the same length, with the first block of characters as your ID.

This is what a file contains. There are many parameters, but you can see I have three RLCs in just this one file. For the RLCs, the only columns you need are PAR and Y(II).
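
To make the structure concrete, here is a hypothetical sketch of the columns used from one file (values invented for illustration; real WinControl exports contain many more parameters):

toy <- data.frame(PAR = c(1, 56, 111, 186, 281), #light steps of the RLC
                  `Y(II)1` = c(0.62, 0.55, 0.48, 0.41, 0.34), #effective quantum yield, RLC 1
                  `Y(II)2` = c(0.60, 0.54, 0.47, 0.40, 0.33), #RLC 2
                  `Y(II)3` = c(0.61, 0.53, 0.46, 0.39, 0.32), #RLC 3
                  check.names = FALSE) #keep the special characters in the column names
toy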

Step 2) Importing the folder of RLC

setwd("C:/Users/gricardo/OneDrive - Australian Institute of Marine Science/3 Results/6 Post-settlement/2 2018/2018 lab/3 PAM/180119 rlc/r csvs") #set your wd to the folder of your csvs
library(tidyr)
library(readr)
library(plyr)
library(dplyr)

read_csv_filename1 <- function(filename){
ret <- read_csv(filename)
ret$disc <- filename #EDIT
ret
} #read in csvs. Also can read delim using read_delim(delim = ';')
filenames = list.files(full.names = TRUE)
data1 <- ldply(filenames, read_csv_filename1) #compile into a data.frame

data1$disc1 = data1 %>% separate(disc, c("a", "b", "c", "d")) %>% .$b #this extracts your ID from the start of the file name
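
As a quick check of the extraction, here it is on a hypothetical file name (separate() splits on non-alphanumeric characters, so the leading './' added by list.files() becomes an empty first piece):

data.frame(disc = "./d001 mille.csv") %>%
  separate(disc, c("a", "b", "c", "d")) %>%
  .$b #returns "d001", the unique ID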

Step 3. Importing the environmental treatment data. You should now have everything in one data frame; however, you will need to set your columns to the correct classes (numeric, factor etc.). You will also notice that you are lacking your experimental treatment data, which we will now add in. Note: the two data frames do not need to be perfectly aligned or sorted; ‘left_join’ will match your IDs and discard the rest.

env.fact <- read.table(file = "https://raw.githubusercontent.com/gerard-ricardo/data/master/postset%20treat%20mil", header = TRUE, dec = ",", na.strings = c("", ".", "NA"))
data2 = left_join(data1, env.fact, by = 'disc1') #join the treatment data to the RLC data by the unique ID
data.s = dplyr::select(data2, c('disc1', 'PAR', 'dli', 'spec', 'Y(II)1', 'Y(II)2', 'Y(II)3')) #extract just the needed columns
data1.long <- data.s %>% pivot_longer(-c(disc1, PAR, dli, spec), names_to = "rep", values_to = "meas") #convert to long format

data1.long$id <- paste(data1.long$disc1, data1.long$rep, sep = "") #create individual ID for each RLC

Step 4. Create an rETR column and clean the data for anomalies. For corals we use rETR instead of ETR, which is simply PAR multiplied by the yield (Y(II)).

data1.long$rETR = data1.long$PAR * data1.long$meas

library(ggplot2)
source("https://raw.githubusercontent.com/gerard-ricardo/data/master/theme_sleek1") #set theme in code
p0 = ggplot() + geom_point(data1.long, mapping = aes(x = PAR, y = rETR), size = 1) + facet_wrap(~id)
p0

Most of my curves look okay at first glance, but you can see some have weird anomalies. I’m going to remove those data points as they are likely to be artifacts.

data1.l = data1.long[which(data1.long$rETR < 100), ] #remove rETR anomalies

Looking better now. Okay, now we are ready to fit the RLC model using the Levenberg–Marquardt algorithm, the standard fitting approach for RLCs. But before we do that, note that nonlinear models often do not handle zero values well, so I will replace all zeros with a very small value. As a rule of thumb, I use one order of magnitude lower than the lowest treatment level.

data1.l$PAR <- ifelse(data1.l$PAR <= 0, 0.1, data1.l$PAR) #replace zero PAR with a small value
data1.l$rETR <- ifelse(data1.l$rETR <= 0, 0.01, data1.l$rETR) #and likewise for rETR

Steps 5 and 6. Now it is time to run the equations on the data. I have to say, finding a starter function for the curves was far harder than I thought, but the author of the Platt package (https://github.com/SWotherspoon/Platt) has done all the hard work here, and I have slightly modified the code. But first we might want to run it on just one RLC to make sure it is functioning correctly.
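
For reference, the equation we are about to fit is the Platt et al. (1980) photoinhibition model, written here in the same notation as the code below:

rETR = Pmax * (1 - exp(-alpha * PAR / Pmax)) * exp(-beta * PAR / Pmax)

where alpha is the initial slope of the curve, beta describes the decline at high light (photoinhibition), and Pmax is a scaling parameter. The realised maximum, rETRmax, is derived from all three parameters in Step 6.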

data1.s = split(data1.l, data1.l$id) #split into a list, one element per RLC
data1.s$`m001Y(II)1` #note the backticks: the list names contain special characters
source("https://raw.githubusercontent.com/gerard-ricardo/data/master/ssplattmy") #this provides the self-start function. Full credit to the author of the Platt package
library(minpack.lm)
start = unname(getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Pmax), data1.s$`m001Y(II)1`)) #this finds the starting values
md1 = nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)), start = list(Pmax = start[3], alpha = start[1], beta = start[2]), data = data1.s$`m001Y(II)1`) #notice I added the starting values to the model call
df.x <- data.frame(PAR = seq(0.1, 926, length = 100)) #set up a new data frame of PAR values for prediction
vec.x = df.x[, 1]
plot(data1.s$`m001Y(II)1`$PAR, data1.s$`m001Y(II)1`$rETR, col = 'red')
lines(vec.x, predict(md1, df.x)) #looks good for m001Y(II)1


You can see that for my first RLC the curve has fit well. This is a pretty standard RLC, with a sharp increase in rETR with PAR and eventually a small decrease in rETR after the maximum has been reached.

Ok. Easy part done. At this point we need to do this for multiple RLCs. That means finding starting values for each RLC and feeding them into multiple models. There is quite a bit of data wrangling here, involving removing errors from the functions, which I won’t explain. Please consult the full code for details.

Find all starting values

starts = data1.l %>% group_by(id) %>%
  do(broom::tidy(stats::getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Pmax), data = .))) %>%
  pivot_wider(names_from = names, values_from = x) %>%
  dplyr::select(., -c('NA'))
colnames(starts) <- c("id", "alpha.s", "beta.s", "Pmax.s")
library(IDPmisc)
starts = NaRV.omit(starts) #removes rows with Inf/NaN starting values

Group each RLC and run the model on each.

test2 = data1.l %>% right_join(., starts, by = 'id') %>%
  group_by(id) %>%
  do(model = try(nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)),
                       start = list(Pmax = mean(.$Pmax.s),
                                    alpha = mean(.$alpha.s),
                                    beta = mean(.$beta.s)),
                       data = .), silent = TRUE)) #this fits a model for every RLC

We can then predict for each RLC to check the fit. Consult the GitHub code for the full wrangling; a minimal sketch is below.
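
As a minimal sketch of that step (object names assumed from the code above; the GitHub code does this more thoroughly):

new.par <- data.frame(PAR = seq(0.1, 926, length = 100)) #PAR gradient for prediction
pred.list <- lapply(seq_len(nrow(test2)), function(i) {
  m <- test2$model[[i]]
  if (inherits(m, "try-error")) return(NULL) #skip the RLCs that failed to fit
  data.frame(id = test2$id[i], PAR = new.par$PAR, rETR = predict(m, new.par))
})
preds <- dplyr::bind_rows(pred.list) #predictions for every RLC that fitted
test2$id[sapply(test2$model, inherits, "try-error")] #lists the RLCs that failed to fit
p1 = ggplot() + geom_point(data1.l, mapping = aes(x = PAR, y = rETR), size = 1) +
  geom_line(preds, mapping = aes(x = PAR, y = rETR), colour = "red") + facet_wrap(~id)
p1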

Most look okay; I have ~6 out of 42 that didn’t fit, and I might want to look into why. Once happy, I can run a similar set of code that stores the Pmax, alpha and beta parameters in a nested list.

test = data1.l %>% right_join(., starts, by = 'id') %>%
  group_by(id) %>%
  do(model = try(broom::tidy(nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)),
                                   start = list(Pmax = mean(.$Pmax.s),
                                                alpha = mean(.$alpha.s),
                                                beta = mean(.$beta.s)),
                                   data = .)), silent = TRUE)) #this extracts the parameters for all models

Finally, we can now calculate the other parameters (i.e. rETRmax, Ek, and Em).

unest.test = test %>% unnest(model)
df.param = dplyr::select(unest.test, c(id, term, estimate))
dat_wide <- df.param %>% pivot_wider(names_from = term, values_from = estimate) %>% dplyr::select(., -c("NA")) #parameter terms become columns, their estimates the values
dat_wide$ETRm = dat_wide$Pmax*(dat_wide$alpha/(dat_wide$alpha+dat_wide$beta))*((dat_wide$beta/(dat_wide$alpha+dat_wide$beta)))^(dat_wide$beta/dat_wide$alpha)
dat_wide$Ek = dat_wide$ETRm/dat_wide$alpha
dat_wide$Em =(dat_wide$Pmax/dat_wide$alpha)*log((dat_wide$alpha+dat_wide$beta)/dat_wide$beta)
final.df = left_join(dat_wide, data1.long, by = 'id')
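
As a quick sanity check, you can compute the same derived parameters for the single fit from Step 5 using coef(md1) (a sketch; ETRm is the realised rETRmax, Ek the minimum saturating irradiance, and Em the optimal irradiance):

cf <- coef(md1) #named vector: Pmax, alpha, beta
ETRm <- cf["Pmax"] * (cf["alpha"] / (cf["alpha"] + cf["beta"])) * (cf["beta"] / (cf["alpha"] + cf["beta"]))^(cf["beta"] / cf["alpha"])
Ek <- ETRm / cf["alpha"]
Em <- (cf["Pmax"] / cf["alpha"]) * log((cf["alpha"] + cf["beta"]) / cf["beta"])
c(ETRm = unname(ETRm), Ek = unname(Ek), Em = unname(Em))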

From here you can analyse your data as normal using your treatment levels. I hope this saves someone some time, and let me know if there are issues running this code with other data sets.
