Monday, October 1, 2012

A Brief Introduction to Testing for Phylogenetic Signal in Comparative Data

Phylogenetic comparative methods (PCMs) were the subject of my last post. You can read it here. It was a VERY brief description of two commonly employed PCMs (Phylogenetically Independent Contrasts and Phylogenetic Generalized Least Squares Regression). However, PCMs should not be applied unless their use is justified. The availability of phylogenies and the array of methods for reconstructing them have skyrocketed, and as a result reviewers are suggesting PCMs more and more. It is still important to consider whether PCMs are necessary, and whether they add to your analysis or aid in the interpretation of your data. So how can you determine whether your study needs PCMs?

One question to ask is whether your data (or, more precisely, the residuals of your model; Revell 2010) are phylogenetically structured. In other words, do your data show phylogenetic signal? Two common measures are the K statistic of Blomberg et al. (2003) and Pagel's lambda (Pagel, 1999). The K statistic compares the phylogenetic signal observed in a trait with the signal expected if the trait had evolved by Brownian motion along the tree (K = 1). Its significance is commonly assessed by comparing the variance of the independent contrasts with the variances obtained after randomly shuffling the trait values among the tips (Blomberg et al. 2003; Glor, 2009). Pagel's lambda is a multiplier of the off-diagonal elements of the phylogenetic variance-covariance matrix and usually varies between 0 and 1. Lambda transforms the phylogenetic tree so that you can compare a complete lack of phylogenetic structure (lambda = 0; a star phylogeny) with the untransformed topology and branch lengths of your original tree (lambda = 1) (Pagel, 1999; Glor, 2009). In other words, estimating lambda tells you where your data fall between these two situations, a star phylogeny or a structured phylogeny, and which one fits your data better.
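For reference, both statistics can be written compactly. In the notation below (ours, following the definitions in Blomberg et al. 2003 and Pagel 1999), x is the vector of trait values at the n tips, **1** is a vector of ones, and V is the phylogenetic variance-covariance matrix, whose element V_ij is the branch length shared by tips i and j from the root (so V_ii is the root-to-tip distance of tip i).

```latex
% Pagel's lambda: scale only the off-diagonal (shared-history) elements of V
V_{ij}(\lambda) =
\begin{cases}
  V_{ij}, & i = j,\\
  \lambda\, V_{ij}, & i \neq j,
\end{cases}
\qquad \text{usually } 0 \le \lambda \le 1

% Blomberg et al.'s K: GLS root estimate and the two mean squared errors
\hat{a} = \frac{\mathbf{1}^\top V^{-1} x}{\mathbf{1}^\top V^{-1} \mathbf{1}}, \qquad
\mathrm{MSE}_0 = \frac{(x - \hat{a}\mathbf{1})^\top (x - \hat{a}\mathbf{1})}{n-1}, \qquad
\mathrm{MSE} = \frac{(x - \hat{a}\mathbf{1})^\top V^{-1} (x - \hat{a}\mathbf{1})}{n-1}

K = \frac{\mathrm{MSE}_0 / \mathrm{MSE}}
         {\left[\operatorname{tr}(V) - n\left(\mathbf{1}^\top V^{-1}\mathbf{1}\right)^{-1}\right] / (n-1)}
```

Setting lambda = 0 removes all shared history (a star phylogeny), while lambda = 1 leaves the tree unchanged. The denominator of K is the value of MSE0/MSE expected under Brownian motion, so K = 1 matches that expectation, K < 1 means close relatives resemble each other less than expected, and K > 1 means they resemble each other more.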

Here is some basic R code for using Blomberg et al.'s K:

library(picante)
?phylosignal # Help file
K <- phylosignal(data, tree)
# Your data must have taxon names matching the tip labels or be sorted in the same order as the tip labels of the phylogeny
# This will return the K statistic and p value (as well as the variance of the independent contrasts and the associated z value)
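To see what K actually measures, here is a minimal base-R sketch that computes it directly from the formula of Blomberg et al. (2003). It is not the code of any package and returns only K, without a p value. It assumes you already have the phylogenetic variance-covariance matrix V (for an ape `phylo` object, `ape::vcv.phylo(tree)` returns one); the helper name `blomberg_k`, the 4-tip tree and the trait values are hypothetical.

```r
# Minimal sketch of Blomberg et al.'s (2003) K in base R (no packages).
# Assumptions: V is the phylogenetic variance-covariance matrix of the tree
# (for an ape "phylo" object, ape::vcv.phylo(tree) returns one); blomberg_k
# is a hypothetical helper name and the toy tree and data are made up.
blomberg_k <- function(x, V) {
  x <- x[rownames(V)]                   # match trait values to tips by name
  n <- length(x)
  invV <- solve(V)
  a <- sum(invV %*% x) / sum(invV)      # GLS estimate of the root state
  r <- x - a
  mse0 <- sum(r^2) / (n - 1)                            # ignores the tree
  mse <- as.numeric(t(r) %*% invV %*% r) / (n - 1)      # accounts for the tree
  expected <- (sum(diag(V)) - n / sum(invV)) / (n - 1)  # E[MSE0/MSE] under Brownian motion
  (mse0 / mse) / expected
}

# Hypothetical 4-tip tree ((A:1,B:1):1,(C:1,D:1):1): tips share branch length 1
# within each clade and 0 across clades; each root-to-tip distance is 2.
tips <- c("A", "B", "C", "D")
V <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              0, 0, 2, 1,
              0, 0, 1, 2), nrow = 4, byrow = TRUE, dimnames = list(tips, tips))
x <- c(A = 1.0, B = 1.2, C = 3.0, D = 3.1)   # hypothetical trait values
blomberg_k(x, V)
```

Because the two clades differ clearly in their trait values, this returns a K of about 1.78, i.e. more resemblance among close relatives than Brownian motion alone would predict. Getting a p value would additionally require a randomization test, for example shuffling x among the tips many times and recomputing K each time.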

Here is some basic R code for using Pagel's lambda:

library(phytools)
?phylosig # Help file
lamb <- phylosig(tree, data, method="lambda")
# Your data also require names that match the tip labels on the tree
# This will return a lambda value and log likelihood; values of lambda closer to 1 indicate stronger phylogenetic signal (add test=TRUE to get a likelihood ratio test against lambda = 0)
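The estimation itself can be sketched in a few lines of base R: transform V with lambda, compute the Brownian-motion log-likelihood with the root state and rate set to their maximum-likelihood values, maximize over lambda in [0, 1] with `optimize()`, and compare the result against lambda = 0 with a likelihood ratio test (the kind of test `test=TRUE` asks phylosig to perform). This is an illustration under stated assumptions, not phytools' implementation; the helper names, the 4-tip tree and the trait values are hypothetical, and V is assumed to come from something like `ape::vcv.phylo(tree)`.

```r
# Minimal sketch of maximum-likelihood estimation of Pagel's lambda (base R only).
# Assumption: V is the phylogenetic variance-covariance matrix of the tree
# (e.g. ape::vcv.phylo(tree)); the helper names and toy data are hypothetical.

lambda_transform <- function(V, lambda) {
  Vl <- V * lambda       # scale every element by lambda ...
  diag(Vl) <- diag(V)    # ... then restore the diagonal (root-to-tip distances)
  Vl
}

# Brownian-motion log-likelihood on the lambda-transformed tree, with the root
# state and the rate (sigma^2) set to their maximum-likelihood estimates.
lambda_loglik <- function(lambda, x, V) {
  x <- x[rownames(V)]                               # match data to tips by name
  n <- length(x)
  Vl <- lambda_transform(V, lambda)
  invV <- solve(Vl)
  a <- sum(invV %*% x) / sum(invV)                  # GLS estimate of the root state
  r <- x - a
  sig2 <- as.numeric(t(r) %*% invV %*% r) / n       # ML estimate of the rate
  logdet <- as.numeric(determinant(Vl, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi * sig2) + logdet + n)
}

estimate_lambda <- function(x, V) {
  fit <- optimize(lambda_loglik, interval = c(0, 1), x = x, V = V, maximum = TRUE)
  logL0 <- lambda_loglik(0, x, V)                   # star phylogeny (lambda = 0)
  lr <- 2 * (fit$objective - logL0)                 # likelihood ratio statistic
  list(lambda = fit$maximum, logL = fit$objective, logL0 = logL0,
       P = pchisq(lr, df = 1, lower.tail = FALSE))
}

# Hypothetical 4-tip tree ((A:1,B:1):1,(C:1,D:1):1) and trait values
tips <- c("A", "B", "C", "D")
V <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              0, 0, 2, 1,
              0, 0, 1, 2), nrow = 4, byrow = TRUE, dimnames = list(tips, tips))
x <- c(A = 1.0, B = 1.2, C = 3.0, D = 3.1)
fit <- estimate_lambda(x, V)
fit
```

With these toy values the log-likelihood keeps increasing up to lambda = 1, so the estimate sits at the upper bound, yet with only four tips the likelihood ratio test is not significant at the 0.05 level. A high lambda on its own is therefore not evidence of significant signal.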

Using the K statistic and Pagel's lambda, you can justify the use of PCMs or demonstrate that they are not necessary. Although I feel strongly that PCMs are powerful tools in comparative studies, I also feel they should only be used when it is statistically justifiable to do so.

You can follow the instructions of Glor (2009) to further understand Pagel's lambda. There are also other methods for testing for phylogenetic signal that I have not covered here.


Blomberg, S. P., T. Garland, Jr., and A. R. Ives. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717-745.

Glor. 2009. IV. Testing Phylogenetic Signal in R. Bodega Phylogenetics Wiki.

Pagel, M. 1999. Inferring the historical patterns of biological evolution. Nature 401:877-884.

Revell, L. J. 2010. Phylogenetic signal and linear regression on species data. Methods in Ecology and Evolution 1:319-329.


  1. Hi Dani,

    I'm a bit out of my league when it comes to the details of when phylogenetic methods are necessary or not. But I have a different understanding than you of when they're required.

    You say that phylogenetic methods should only be used if the data have a phylogenetic signal. While this is a good rule of thumb, I believe it's actually signal in the model RESIDUALS that determines whether phylogenetic methods are required. In practice you don't often get residuals with signal unless there is also signal in the data set, but it's possible, and the focus on residuals drives home how these methods work.

    I got most of this from a paper I read a while ago:
    Revell, L. J. 2010. Phylogenetic signal and linear regression on species data. Methods in Ecology and Evolution 1:319-329.

  2. You are correct and I have oversimplified the issue here!

  3. Further reading on the K statistic might also include Revell et al. 2008. Phylogenetic Signal, Evolutionary Process, and Rate. Syst. Biol. 57:591–601.

  4. You can also add test=T to phylosig to obtain a p value.

  5. Thanks for the clarification Dani.

    There have been a couple of interesting posts at Dynamic Ecology (an awesome blog, now even awesomer with more bloggers on board) questioning the necessity of phylogenetic and other autocorrelation-correcting methods.

    1. Original post:

    2. Subsequent clarification of basic stats underlying arguments in original post:

    And here are a few recent papers on the topic in case readers are interested (it's details of statistical methodology - how could anyone not be interested?!).

    1. Review finding that most comparative studies account for phylogenetic autocorrelation nowadays (but ignore intraspecific variation). Reanalysis of data sets reveals that accounting for phylogenetic autocorrelation didn't affect results much.
    Garamszegi, L. Z. and A. P. Møller. 2010. Effects of sample size and intraspecific variation in phylogenetic comparative studies: a meta-analytic review. Biological Reviews 85:797-805.

    2. Meta-analysis of meta-analyses (!) investigating whether accounting for phylogenetic autocorrelation affected results. It does; phylogenetic methods tended to increase confidence intervals around terms' coefficients, sometimes changing their effects from significant to non-significant.
    Chamberlain, S. A., S. M. Hovick, C. J. Dibble, N. L. Rasmussen, B. G. Van Allen, B. S. Maitner, J. R. Ahern, L. P. Bell-Dereske, C. L. Roy, M. Meza-Lopez, J. Carrillo, E. Siemann, M. J. Lajeunesse, and K. D. Whitney. 2012. Does phylogeny matter? Assessing the impact of phylogenetic information in ecological meta-analysis. Ecology Letters 15:627-636.

    3. Math-based paper breaking down the phylogenetic generalized least squares approach to its mathy basics to demonstrate that it is robust to a lot of errors in phylogenetic trees.
    Stone, E. A. 2011. Why the phylogenetic regression appears robust to tree misspecification. Systematic Biology 60:245-260.

    Jay Fitzsimmons

  6. I have read the post on statistical machismo and thought it was quite good. I also agree with the second blog post. In my experience, phylogenetic correction makes the biggest difference when your result is only marginally significant. This is when I have seen p values that differ by orders of magnitude between OLS and GLS. When your p value is far below 0.05, you wouldn't expect OLS and GLS to lead to different conclusions, but sometimes reviewers request a phylogenetically controlled analysis (this may in fact be the difference between having your paper accepted or rejected), in which case I see no problem with further supporting your story by providing it.
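The point about marginal p values is easy to explore on your own data by fitting the same regression with and without the phylogeny. Below is a minimal base-R sketch of one standard way to do phylogenetic GLS under Brownian motion: "whiten" the data with the Cholesky factor of V, then run ordinary least squares on the transformed data. The function name `ols_vs_pgls`, the 6-tip tree and the data are hypothetical, and V is assumed to come from something like `ape::vcv.phylo(tree)`.

```r
# Sketch: slope test from OLS versus phylogenetic GLS (Brownian motion) in base R.
# Assumption: V is the phylogenetic variance-covariance matrix (e.g. from
# ape::vcv.phylo); the function name, tree and data below are hypothetical.
ols_vs_pgls <- function(y, x, V) {
  y <- y[rownames(V)]
  x <- x[rownames(V)]
  ols <- summary(lm(y ~ x))$coefficients["x", ]
  # If V = t(R) %*% R (R = chol(V)), then solve(t(R)) %*% y has covariance
  # proportional to the identity, so OLS on the transformed response and the
  # transformed design matrix (intercept column included) gives the GLS fit.
  W <- solve(t(chol(V)))
  Xw <- W %*% cbind(intercept = 1, slope = x)
  yw <- as.vector(W %*% y)
  gls <- summary(lm(yw ~ Xw - 1))$coefficients[2, ]
  rbind(OLS = ols, PGLS = gls)
}

# Hypothetical ultrametric tree (((A:1,B:1):1,C:2):1,((D:1,E:1):1,F:2):1)
tips <- c("A", "B", "C", "D", "E", "F")
V <- matrix(c(3, 2, 1, 0, 0, 0,
              2, 3, 1, 0, 0, 0,
              1, 1, 3, 0, 0, 0,
              0, 0, 0, 3, 2, 1,
              0, 0, 0, 2, 3, 1,
              0, 0, 0, 1, 1, 3), nrow = 6, byrow = TRUE, dimnames = list(tips, tips))
x <- c(A = 1.0, B = 1.2, C = 2.0, D = 3.0, E = 3.3, F = 4.0)
y <- c(A = 2.0, B = 2.1, C = 3.5, D = 5.0, E = 5.4, F = 6.8)
ols_vs_pgls(y, x, V)
```

The two rows report the slope estimate, standard error, t value and p value side by side. With a star phylogeny (V proportional to the identity matrix) the two rows coincide, which is a quick sanity check on the transformation.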

    I also meant this post to be a very basic overview of two methods that are commonly used to test for phylogenetic signal and not an extensive review of the entire issue. But I agree that there are many issues surrounding the use of phylogenetic comparative methods.

  7. Hello Danielle!!
    I'm trying to run the analysis with morphometric data. I have a matrix of coordinates, but what I need is a vector. What can I do to run the phylogenetic signal test?

    1. Abril,

      I think you would typically be testing for phylogenetic signal in PCA scores or something else rather than the raw coordinate data. A good starting place might be the geomorph R package.
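As a rough illustration of that suggestion, the base-R sketch below flattens a landmark array into one row of shape variables per species, runs a principal component analysis with `prcomp()`, and keeps the PC1 scores as a named vector, which is the kind of input phylosig expects. It assumes the coordinates are already Procrustes-aligned (for example with geomorph's `gpagen()`) and stored as a landmarks × dimensions × species array; the array here is simulated placeholder data, not real morphometrics.

```r
# Sketch: turn a landmark array into one PC score per species (base R only).
# Assumption: coords is a landmarks x dimensions x species array that has
# already been Procrustes-aligned; the random values are placeholders.
set.seed(1)
n_lm <- 4; n_dim <- 2; species <- paste0("sp", 1:5)
coords <- array(rnorm(n_lm * n_dim * length(species)),
                dim = c(n_lm, n_dim, length(species)),
                dimnames = list(NULL, c("x", "y"), species))

# One row per species: x1, y1, x2, y2, ... for the landmarks
shape <- t(apply(coords, 3, function(m) as.vector(t(m))))

pca <- prcomp(shape)
pc1 <- pca$x[, 1]        # named vector: one PC1 score per species
pc1
# pc1 (or another PC axis) could then be passed to phylosig(tree, pc1, ...)
```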