Automated rapid light curve analyses in R

The objective of this post is to simplify and automate the process of conducting rapid light curve analysis from a Walz Pulse Amplitude Modulation Fluorometry (PAM) fluorometer using their WinControl generated csv files. The first time I did this I jumped around between Excel and Graphpad and eventually finished my analysis in under a day. But I didn’t like the jumping around, and wanted to keep my raw csv files so I decided to code it in R. Little did I realise it would take me a good number of days, although I learnt a lot in the process.

The code is long, and I suspect at times ugly. If you have a cleaner way to run some of the functions, I would be happy to know. For simplicity I haven’t included all the code below but rather described the steps in the text. The full code can be found here (https://github.com/gerard-ricardo/PAM-RLC-analysis) including some Walz csv files.

Major steps

  1. Naming csv files. Each RLC data from the PAM needs to be saved in a csv file. 2) Every file needs to be in an allocated folder, and each file labelled with a unique ID, with each label the same length seperated by spaces. I use Bulk Rename Utility to do this. An example may be ‘d001 mille’, ‘d050 mille’, ‘d100 mille’. Here the unique ID is the 4-digit ID, and the second block of letters describes the experiment. The code below will read the first bloack of letters.
  2. Importing the folder of RLC into R. Note that if you have delimited you read_delim(filename, delim = ‘;’). I have the unique ID saved as ‘disc’.
  3. Importing the environmental treatment data. Note: As you likely subsampled for the RLC curves, this code uses left_join to merge the two data.frames. You do not need them to match, as long as each has the same unique ID.
  4. Create an rETR column and data clean for anomalies.
  5. Get starting values. Run the SSplatt.my function. This is from the modified from the Platt package (they did all the hard work).
  6. Run the nonlinear models and derive all the RLC parameters and models

Ok Let’s get started.

Step 1) Name the csv files in a format of the same length. The first block of letters should be your ID.

This is what it has inside a file. I have many parameters but you can see I have 3 RLCs in just this file. For the RLCs, the only parameter you need are the PAR and the Y(II) columns.

Step 2) Importing the folder of RLC

# Set the working directory
setwd("C:/.../r mil csvs")

# Load necessary libraries
library(tidyr)
library(readr)
library(plyr)
library(dplyr)

# Define a function to read in csv files and add a "disc" column with the filename
read_csv_filename1 <- function(filename){
  ret <- read_csv(filename)
  ret$disc <- filename
  ret
}

# Read in all csv files in the working directory
filenames <- list.files(full.names = TRUE)
data1 <- ldply(filenames, read_csv_filename1)

# Extract the ID from the start of the file name and add it as a new column
data1$disc1 <- data1 %>% separate(disc, c("a", "b", 'c', 'd')) %>% .$b

Step 3. Importing the environmental treatment data. You should now have everything in a dataframe, however you will need to correctly label your columns into numeric, factors etc. You will also notice that you are lacking your experimental data, which we will now add in. Note: this does not need to be perfectly aligned or sorted, ‘left_join’ will match you ID’s and remove the rest.

# Read in environmental factors data
env.fact <- read.table(file="https://raw.githubusercontent.com/gerard-ricardo/data/master/postset%20treat%20mil", 
                       header = TRUE,
                       dec = ",", 
                       na.strings = c("",".","NA"))

# Join data1 and env.fact by the disc1 column
data2 <- left_join(data1, env.fact, by = 'disc1')

# Select only necessary columns from data1
data.s <- dplyr::select(data1, c('disc1', 'PAR','dli', 'spec', 'Y(II)1','Y(II)2', 'Y(II)3'))

# Create long format and pivot data
data1.long <- data.s %>% pivot_longer(-c(disc1, PAR ,dli, spec), 
                                      names_to = "rep" ,
                                      values_to = "meas")

# Create individual ID for each RLC
data1.long$id <- paste(data1.long$disc1, data1.long$rep, sep = "")

Step 4. Create an rETR column and data clean for anomalies. For corals we use rETR instead of ETR which is simply the PAR x Yield

# Calculate rETR 
data1.long$rETR <- data1.long$PAR * data1.long$meas

# Load ggplot2 library
library(ggplot2)

# source theme 
source("https://raw.githubusercontent.com/gerard-ricardo/data/master/theme_sleek1")

# create a scatter plot with ggplot2
p0 <- ggplot() + 
  geom_point(data = data1.long, 
             mapping = aes(x = PAR, y = rETR), 
             size = 1) + 
  facet_wrap(~id)

# print the plot
print(p0)

Most of my curves look okay at first glance but you can see some have some weird anomalies. I’m going to remove those data points as they are likely to be artifacts.

data1.l = data1.long[which(data1.long$rETR<100),] #remove rTER anomalies

Looking better now. Okay, now we are getting ready to fit the Marquardt–Levenberg regression algorithm which is the standard equation used for RLC. But before we do that, often non-linear models do not handle zero values well. So I will replace all zeros with a very small value. As a rule of thumb, I use one magnitude lower than the lowest treatment level.

data1.l$PAR <- ifelse(data1.l$PAR <= 0, 0.1, data1.l$PAR) #
data1.l$rETR <- ifelse(data1.l$rETR <= 0, 0.01, data1.l$rETR)

Step 5 and 6. Now it is time to run the equations on the data. I have to say, finding a starter function for the curves was far harder than I thought, but the author of the Platt package (https://github.com/SWotherspoon/Platt) has done all the work here, from which I slightly modified the code. But first we might want to run it on just one RLC to make sure it is functioning correctly.

data1.s = split(data1.l, data1.l$id)
data1.s$m001Y(II)1
source("https://raw.githubusercontent.com/gerard-ricardo/data/master/ssplattmy") #this is the starter values. Full #credit to the author of the Platt package
library(minpack.lm)
start = unname(getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Pmax), data1.s$m001Y(II)1)) #this finds the starting #values
md1 = nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)), start=list(Pmax=start[3],alpha=start[1], beta=start[2]), data = data1.s$m001Y(II)1) #notice I added the starting values in the equation
df.x <- data.frame(PAR = seq(0.1, 926, length = 100)) #setting up new data frame (df) defining log.x values to run
vec.x =df.x[,1]
plot(data1.s$m001Y(II)1$PAR, data1.s$m001Y(II)1$rETR, col = 'red')
lines(vec.x, predict(md1, df.x)) #looks good for m001Y(II)1


You can see for my first RLC, the curve has fit well. This is a pretty standard RLC, with a sharp increase in rETR with PAR and eventually a small decrease in rETR after a maximum has been reached.

Ok. Easy part done. At this point we need to now do this for multiple RLCs. That means finding starting values for each RLC, and adding it into multiple models. There is quite a bit of data wrangling here which I won’t explain involving removing errors in the functions. Please consult the full code for details.

Find all starting values

starts = data1.l %>% group_by(id) %>% do(broom::tidy(stats::getInitial(rETR ~ SSPlatt.my(PAR, alpha, beta, Ys), data = . ))) %>%
pivot_wider(names_from = names, values_from = x, names_prefix = "") %>% dplyr::select (.,-c('NA'))
colnames(starts) <- c("id", "alpha.s", 'beta.s', 'Pmax.s')
library(IDPmisc)
starts = NaRV.omit(starts) #removes inf

Group each RLC and run the model on each.

test2 = data1.l %>% right_join(.,starts, by = 'id') %>%
group_by(id) %>%
do(model = try(nlsLM(rETR ~ Pmax*(1-exp(-alpha*PAR/Pmax))*(exp(-beta*PAR/Pmax)),
start = list(Pmax = mean(.$Pmax.s),
alpha = mean(.$alpha.s),
beta = mean(.$beta.s)),
data = .),silent = TRUE) ) #this gets models for all RLCs

We can then predict for each RLC to check the fit. Consult the github code for lots of wrangling.

Most look okay, I have ~6 out of 42 that didn’t fit, which I might want to look into why. Once happy, I can now run a similar set of code which stores the Pmax, alpha and beta parameters in a nested list.

test = data1.l %>% right_join(.,starts, by = 'id') %>%
group_by(id) %>%
do(model = try(broom::tidy(nlsLM(rETR ~ Pmax*(1-exp(-alphaPAR/Pmax))*(exp(-beta*PAR/Pmax)),
start = list(Pmax = mean(.$Pmax.s),
alpha = mean(.$alpha.s),
beta = mean(.$beta.s)),
data = .),silent = TRUE)) ) #this gets parameters for all models

Finally, we can now calculate the other parameters (i.e rETRmax, Ek, and Em)

unest.test = test %>% unnest(model)
df.param = dplyr::select(unest.test, c(id, term, estimate))
dat_wide <- df.param %>% pivot_wider(names_from = term, values_from = estimate) %>% dplyr::select(.,-c("NA")) #year goes to columns, their areas go as the values, area is the prefix
dat_wide$ETRm = dat_wide$Pmax*(dat_wide$alpha/(dat_wide$alpha+dat_wide$beta))*((dat_wide$beta/(dat_wide$alpha+dat_wide$beta)))^(dat_wide$beta/dat_wide$alpha)
dat_wide$Ek = dat_wide$ETRm/dat_wide$alpha
dat_wide$Em =(dat_wide$Pmax/dat_wide$alpha)*log((dat_wide$alpha+dat_wide$beta)/dat_wide$beta)
final.df = left_join(dat_wide, data1.long, by = 'id')

From here you can analyse your data as normal using you treatment levels. I hope this saved someone some time, and let me know if there are issues running this code with other data sets.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s