Friday, September 7, 2012

Phylogenetic Comparative Methods: Some Generalities

The comparative method, in one form or another, has been of paramount importance in biology since its inception. It is the basis for understanding how and why organisms differ. Before the 1980s, the comparative method most often relied on simple statistics (regression, correlation). This approach runs into problems when comparing different taxa (species, genera, etc.). First, taxa do not have completely independent evolutionary histories: closely related species share much of their history and so tend to resemble one another. This is not a new concept; even Linnaeus' hierarchical classification system indirectly reflects the non-independence of species. Second, simple statistics such as correlation assume that observations are independent, and the evolutionary process inherently violates this assumption. But comparative biologists need not despair!

There are a few methods for dealing with phylogenetic non-independence. These include (but are not limited to) Phylogenetically Independent Contrasts (PIC; Felsenstein 1985) and Phylogenetic Generalized Least Squares regression (PGLS; Grafen 1989). Of these, PIC has been the most commonly used.

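PIC involves calculating contrasts (differences standardized by branch length) at each node of the tree, between its two descendant lineages (sister taxa, or the estimated values of their ancestors), working from the tips toward the root. Regressions are then carried out on the contrasts rather than on the raw data, which effectively removes the influence of relatedness (Felsenstein 1985). The regression is forced through the origin because the direction of subtraction at each node is arbitrary. PIC can be used to compare two continuous traits, or one continuous with one categorical trait.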

R code:
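library(ape) # Paradis (2006)
tree <- read.nexus("tree.nex") # A fully bifurcating tree with branch lengths (pic() requires a dichotomous tree)
data <- read.csv("data.csv", header = TRUE, row.names = 1) # Row names are taxon names; two trait columns
trait1 <- setNames(data$trait1, rownames(data)) # Named vectors let pic() match values to tip labels
trait2 <- setNames(data$trait2, rownames(data))
pic1 <- pic(trait1, tree) # Calculate contrasts for each trait
pic2 <- pic(trait2, tree)
piclm <- lm(pic1 ~ pic2 - 1) # Regress contrasts through the origin
summary(piclm)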
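
To see what pic() is doing behind the scenes, the contrasts for a tiny tree can be computed by hand. This is only an illustrative sketch: the three-taxon tree ((A:1,B:1):1,C:2); and the trait values are hypothetical, and it uses base R arithmetic rather than ape. It follows Felsenstein's (1985) recipe: scale the difference between two sister values by the square root of their summed branch lengths, replace the pair by a weighted average, and lengthen the branch below that average to account for the uncertainty in estimating it.

```r
# Hypothetical tree ((A:1,B:1):1,C:2); and hypothetical trait values
x <- c(A = 1, B = 3, C = 6)
v <- c(A = 1, B = 1, AB = 1, C = 2)  # length of the branch leading to each node

# Contrast 1: sister tips A and B, standardized by sqrt(v_A + v_B)
c1 <- (x[["A"]] - x[["B"]]) / sqrt(v[["A"]] + v[["B"]])

# Estimated value at the A+B ancestor: average weighted by inverse branch length
x_AB <- (x[["A"]] / v[["A"]] + x[["B"]] / v[["B"]]) /
  (1 / v[["A"]] + 1 / v[["B"]])

# The ancestor's stem branch is lengthened to reflect estimation error
v_AB <- v[["AB"]] + v[["A"]] * v[["B"]] / (v[["A"]] + v[["B"]])

# Contrast 2: the A+B ancestor versus its sister C
c2 <- (x_AB - x[["C"]]) / sqrt(v_AB + v[["C"]])

contrasts <- c(c1, c2)
print(contrasts)
```

With ape loaded, pic(x, read.tree(text = "((A:1,B:1):1,C:2);")) should give contrasts of the same magnitude; their signs may differ, since the sign of each contrast depends only on which descendant is subtracted from which.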

PGLS, on the other hand, is very similar to the methods used by ecologists concerned about spatial autocorrelation (localities closer to each other are likely to be more similar). PGLS constructs a correlation matrix from the tree, in which taxa that share a longer path from the root are expected to be more strongly correlated. This matrix is then incorporated into a generalized least squares regression (Grafen 1989). One advantage of PGLS over PIC is that it can accommodate several models of evolution (e.g. changes in evolutionary rate, stabilizing selection). In contrast, PIC assumes that traits evolve by Brownian motion (BM), a random-walk process (Felsenstein 1985). PIC and PGLS give equivalent slope estimates under BM (Blomberg et al. 2012), but this equivalence does not extend to other evolutionary models (evolutionary models will be the subject of a coming post). PGLS can be used to compare two continuous traits; for comparing a continuous and a categorical trait, Phylogenetic Generalized Estimating Equations (Paradis and Claude 2002) work similarly to PGLS.

R code:
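# PGLS assuming BM
library(ape)  # provides corBrownian()
library(nlme) # provides gls()
tree <- read.nexus("tree.nex")
data <- read.csv("data.csv", header = TRUE, row.names = 1)
data$species <- rownames(data) # Column used to match rows to tip labels
gls1 <- gls(trait1 ~ trait2, data = data, correlation = corBrownian(1, phy = tree, form = ~species), method = "ML")
# Using maximum likelihood (method = "ML") enables the comparison of evolutionary models using AIC
summary(gls1)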
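
The core of PGLS can also be written out in a few lines of base R, which makes the role of the tree-derived matrix explicit. This sketch uses a hypothetical four-taxon tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5); with made-up trait values, and it swaps nlme's gls() for the plain matrix formula b = (X'V^-1 X)^-1 X'V^-1 y. Under BM, the covariance between two taxa is the branch length they share from the root. For comparison, it also computes the PIC slope through the origin with a hypothetical helper, pic_4taxa(), hard-wired to this tree; under BM the two slopes should agree, as Blomberg et al. (2012) describe.

```r
# Hypothetical ultrametric tree ((A:1,B:1):1,(C:1.5,D:1.5):0.5); and made-up traits
x <- c(A = 1, B = 2, C = 4, D = 7)
y <- c(A = 2, B = 3, C = 7, D = 8)

# Under BM, cov(i, j) = branch length shared by i and j from the root;
# the diagonal holds each tip's root-to-tip distance
V <- matrix(c(2, 1, 0,   0,
              1, 2, 0,   0,
              0, 0, 2,   0.5,
              0, 0, 0.5, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(names(x), names(x)))

# GLS estimator with an intercept column in the design matrix
X <- cbind(1, x)
Vi <- solve(V)
b_gls <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)
b_pgls <- b_gls[2, 1]  # slope of y on x

# Hypothetical helper: independent contrasts for this specific tree only
pic_4taxa <- function(z) {
  v_AB <- 1 + (1 * 1) / (1 + 1)            # stem of (A,B), lengthened: 1.5
  v_CD <- 0.5 + (1.5 * 1.5) / (1.5 + 1.5)  # stem of (C,D), lengthened: 1.25
  z_AB <- (z[["A"]] + z[["B"]]) / 2        # equal sister branches, so a plain mean
  z_CD <- (z[["C"]] + z[["D"]]) / 2
  c((z[["A"]] - z[["B"]]) / sqrt(1 + 1),
    (z[["C"]] - z[["D"]]) / sqrt(1.5 + 1.5),
    (z_AB - z_CD) / sqrt(v_AB + v_CD))
}
cx <- pic_4taxa(x)
cy <- pic_4taxa(y)
b_pic <- sum(cx * cy) / sum(cx^2)  # least-squares slope through the origin

print(c(PGLS = b_pgls, PIC = b_pic))
```

Because PGLS only needs V, a covariance matrix derived from a different evolutionary model can be substituted without changing the rest of the calculation.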

This post has been rather brief, but I intend to write more posts on this topic.

References

Blomberg, S.P., J.G. Lefevre, J.A. Wells, and M. Waterhouse. 2012. Independent contrasts and PGLS regression estimators are equivalent. Systematic Biology 61:382-391.

Felsenstein, J. 1985. Phylogenies and the Comparative Method. The American Naturalist 125:1-15.

Grafen, A. 1989. The Phylogenetic Regression. Philosophical Transactions of the Royal Society of London B 326:119-157.

Paradis, E. and J. Claude. 2002. Analysis of Comparative Data Using Generalized Estimating Equations. Journal of Theoretical Biology 218:175-185.

Paradis, E. 2006. Analysis of Phylogenetics and Evolution with R. Springer Science+Business Media, LLC, New York.