Friday, September 7, 2012

Phylogenetic Comparative Methods: Some Generalities

The comparative method, in one form or another, has been of paramount importance in biology since its inception. It is the basis for understanding how and why organisms differ. Prior to the 1980s, the comparative method most often involved simple statistics (regression, correlation). There are several problems with this approach when comparing different taxa (species, genera, etc.). Firstly, taxa do not have completely independent evolutionary histories. This is not a new concept. Even Linnaeus' hierarchical classification system indirectly represents the non-independence of species. Secondly, simple statistics such as correlation assume that observations are independent, and shared evolutionary history inherently violates this assumption. But comparative biologists need not despair!

There are a few methods for dealing with the issue of phylogenetic non-independence. These include (but are not limited to) Phylogenetically Independent Contrasts (PIC) (Felsenstein 1985) and Phylogenetic Generalized Least Squares Regression (PGLS) (Grafen 1989). The most commonly used method has been PIC.

PIC involves the calculation of contrasts (differences standardized by branch length) between sister taxa, or between internal nodes using estimated ancestral values. Regressions are then carried out on the contrasts (through the origin) rather than on the raw data, which effectively removes the influence of relatedness (Felsenstein 1985). PIC can be used to compare one continuous trait with one categorical trait, or two continuous traits.

R code:
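require(ape) # Paradis (2006); provides read.nexus() and pic()
tree <- read.nexus("tree.nex") # A tree with branch lengths
data <- read.csv("data.csv", header = TRUE, row.names = 1) # Row names are taxon names; columns trait1 and trait2
# pic() matches named values to tree$tip.label, so name each trait vector by taxon
trait1 <- setNames(data$trait1, rownames(data))
trait2 <- setNames(data$trait2, rownames(data))
pic1 <- pic(trait1, tree) # Calculate contrasts for each trait
pic2 <- pic(trait2, tree)
piclm <- lm(pic1 ~ pic2 - 1) # Regress contrasts through the origin
summary(piclm)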
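To make the contrast calculation concrete, here is a minimal base-R sketch of Felsenstein's (1985) algorithm for a fully bifurcating tree. It is not how ape stores trees: the nested-list tree format, the function names `pic_sketch()` and `make_tree()`, and the trait values are all hypothetical, chosen only for illustration. At each node the two descendant values are differenced and divided by the square root of their summed branch lengths, a weighted ancestral value is passed up, and the branch above the node is lengthened to reflect the uncertainty in that estimate.

```r
# Minimal base-R sketch of Felsenstein's (1985) independent contrasts.
# Assumption: a fully bifurcating tree stored as nested lists, a hypothetical
# format used only for illustration (use ape::pic() on real trees).
# A tip is list(value = <trait>, len = <branch length>);
# an internal node is list(len = <branch length>, left = <node>, right = <node>).
pic_sketch <- function(node) {
  if (is.null(node$left)) {
    # Tip: no contrast here; pass its value and branch length up the tree
    return(list(value = node$value, len = node$len, contrasts = numeric(0)))
  }
  l <- pic_sketch(node$left)
  r <- pic_sketch(node$right)
  v1 <- l$len
  v2 <- r$len
  # Standardized contrast: difference scaled by its expected BM standard deviation
  u <- (l$value - r$value) / sqrt(v1 + v2)
  # Ancestral estimate: average of the two children weighted by 1 / branch length
  anc <- (l$value / v1 + r$value / v2) / (1 / v1 + 1 / v2)
  # Lengthen the branch above this node to reflect uncertainty in the estimate
  len_up <- node$len + v1 * v2 / (v1 + v2)
  list(value = anc, len = len_up, contrasts = c(l$contrasts, r$contrasts, u))
}

# Hypothetical toy tree ((A:1,B:1):1,(C:1,D:3):1); x holds values for A, B, C, D
make_tree <- function(x) {
  list(len = 0,
       left  = list(len = 1,
                    left  = list(value = x[1], len = 1),   # A
                    right = list(value = x[2], len = 1)),  # B
       right = list(len = 1,
                    left  = list(value = x[3], len = 1),   # C
                    right = list(value = x[4], len = 3)))  # D
}

pic1 <- pic_sketch(make_tree(c(1, 3, 4, 8)))$contrasts  # made-up trait 1
pic2 <- pic_sketch(make_tree(c(2, 2, 5, 9)))$contrasts  # made-up trait 2

# Regress the contrasts through the origin, as in the PIC code above
fit <- lm(pic1 ~ pic2 - 1)
coef(fit)  # slope = 25/29, about 0.862
```

Because the same traversal is used for both traits, the (arbitrary) signs of the contrasts are consistent between them, which is what the regression through the origin requires.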

PGLS, on the other hand, is very similar to the statistics employed by ecologists concerned about spatial autocorrelation (localities closer to each other are likely to be more similar). PGLS constructs a variance-covariance matrix from the tree, in which the expected covariance between two taxa reflects the branch length they share from the root. This matrix is then incorporated into a generalized least squares model (Grafen 1989). One advantage of PGLS over PIC is that it can accommodate several models of evolution (e.g. changes in evolutionary rate, stabilizing selection). In contrast, PIC assumes that traits evolve by Brownian motion (BM), a random-walk process (Felsenstein 1985). Although PIC and PGLS give equivalent estimates under a BM model (Blomberg et al. 2012), the comparison cannot be made under other evolutionary models (evolutionary models will be the subject of a coming post). PGLS can be used to compare two continuous traits; for the comparison of a continuous and a categorical trait, Phylogenetic Generalized Estimating Equations (Paradis and Claude 2002) function similarly to PGLS.

R code:
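# PGLS assuming BM
require(ape)  # read.nexus(), corBrownian()
require(nlme) # gls()
tree <- read.nexus("tree.nex")
data <- read.csv("data.csv", header = TRUE, row.names = 1)
data$species <- rownames(data) # Taxon names, matched to tree$tip.label through the form argument
gls1 <- gls(trait1 ~ trait2, data = data, correlation = corBrownian(phy = tree, form = ~species), method = "ML")
# Using maximum likelihood (rather than the default REML) enables the comparison of evolutionary models using AIC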
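The GLS calculation behind PGLS can also be written out by hand. The sketch below is a minimal base-R version assuming a BM covariance matrix for a toy four-taxon tree, ((A:1,B:1):1,(C:1,D:3):1); the tree, the trait values, and the function name `pgls_bm()` are hypothetical. It computes the GLS coefficients, the ML estimate of the BM rate, the log-likelihood and AIC, and compares the BM model with an ordinary least squares fit that ignores the phylogeny. It is not a substitute for `gls()`: it uses the covariance matrix directly rather than nlme's correlation-structure machinery.

```r
# Minimal base-R sketch of PGLS under Brownian motion (BM); in practice use
# nlme::gls() with ape::corBrownian(). Tree and trait values are made up.
# Under BM the covariance of two tips is proportional to the branch length they
# share from the root; the diagonal holds each tip's root-to-tip distance.
V <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              0, 0, 2, 1,
              0, 0, 1, 4), nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
trait1 <- c(A = 1, B = 3, C = 4, D = 8)  # response
trait2 <- c(A = 2, B = 2, C = 5, D = 9)  # predictor

# Hypothetical helper: GLS fit of y on x with residual covariance sigma^2 * V
pgls_bm <- function(y, x, V) {
  X <- cbind(intercept = 1, slope = x)
  Vinv <- solve(V)
  # GLS estimator: beta = (X' V^-1 X)^-1 X' V^-1 y
  beta <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
  r <- y - X %*% beta
  n <- length(y)
  # ML estimate of the BM rate (sigma^2) given beta
  sigma2 <- as.numeric(t(r) %*% Vinv %*% r) / n
  # Multivariate normal log-likelihood with covariance sigma2 * V
  logL <- -0.5 * (n * log(2 * pi * sigma2) +
                  as.numeric(determinant(V, logarithm = TRUE)$modulus) + n)
  k <- ncol(X) + 1  # regression coefficients plus sigma^2
  list(coef = drop(beta), sigma2 = sigma2, logLik = logL, AIC = -2 * logL + 2 * k)
}

bm  <- pgls_bm(trait1, trait2, V)
ols <- pgls_bm(trait1, trait2, diag(4))  # independent tips with equal variance
bm$coef                        # intercept = 2/29, slope = 25/29
c(BM = bm$AIC, OLS = ols$AIC)  # lower AIC indicates the better-supported model
```

For this toy example the BM slope (25/29) equals the slope obtained by regressing the independent contrasts of trait1 on those of trait2 through the origin, as expected from Blomberg et al. (2012).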

This post has been rather brief, but I intend to continue writing on this topic.


Blomberg, S.P., J.G. Lefevre, J.A. Wells, and M. Waterhouse. 2012. Independent Contrasts and PGLS Regression Estimators Are Equivalent. Systematic Biology 61:382-391.

Felsenstein, J. 1985. Phylogenies and the Comparative Method. The American Naturalist 125:1-15.

Grafen, A. 1989. The Phylogenetic Regression. Philosophical Transactions of the Royal Society of London B 326:119-157.

Paradis, E. and J. Claude. 2002. Analysis of Comparative Data Using Generalized Estimating Equations. Journal of Theoretical Biology 218:175-185.

Paradis, E. 2006. Analysis of Phylogenetics and Evolution with R. Springer Science+Business Media, LLC, New York.


  1. Hi, you said:

    "PIC can be used for the comparison of one continuous with one categorical trait and for two continuous traits"

    How can I incorporate categorical traits in PIC? Do they have to be binary? If not, how many "values" can a category take on?


  2. Hi Pearsy,

    They don't have to be binary. I have done PIC with three dietary categories for ungulates. In my data the dietary categories were "Grazer, Browser, Mixed feeder." I didn't use any method of scoring (1,2,3) but that should not make any difference. You will obviously end up with some contrasts that are zero and others that are non-zero.

    Alternatively, you can use generalized estimating equations. The code for GEE is summarized in Paradis' book "Analysis of Phylogenetics and Evolution with R." The original paper is as follows:

    Paradis, E., and J. Claude. 2002. Analysis of Comparative Data Using Generalized Estimating Equations. Journal of theoretical biology 218:175–185.

  3. For your second question, I have not done any testing to determine how many categories can be used. That probably depends on the size of your data set and how many observations you have for each category.
