There are a few methods for dealing with the issue of phylogenetic non-independence. These include (but are not limited to) Phylogenetically Independent Contrasts (PIC) (Felsenstein 1985) and Phylogenetic Generalized Least Squares Regression (PGLS) (Grafen 1989). The most commonly used method has been PIC.
PIC involves the calculation of contrasts (branch length calibrated differences) between sister taxa. Regressions are then carried out on the contrasts (through the origin) as opposed to the raw data, which effectively removes the influence of relatedness (Felsenstein 1985). PIC can be used for the comparison of one continuous with one categorical trait and for two continuous traits.
R code:
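library(ape) # Paradis (2006)
tree <- read.nexus("tree.nex") # A fully resolved tree with branch lengths
data <- read.csv("data.csv", header=TRUE, row.names=1) # Data with row names as taxon names and two traits
data <- data[tree$tip.label, ] # Put the rows in the same order as the tips of the tree
trait1 <- setNames(data$trait1, rownames(data)) # Named vectors let pic() match values to tips
trait2 <- setNames(data$trait2, rownames(data)) # (assumes the columns are called trait1 and trait2)
pic1 <- pic(trait1, tree) # Calculate contrasts for each trait
pic2 <- pic(trait2, tree)
piclm <- lm(pic1 ~ pic2 - 1) # Regress contrasts through the origin
summary(piclm)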
On the other hand, PGLS is very similar to statistics employed by ecologists concerned about spatial autocorrelation (localities closer to each other are likely to be more similar). PGLS constructs a correlation matrix based on the distance between taxa on the tree. The matrix is then incorporated into a generalized least squares model (Grafen 1989). One advantage of PGLS over PIC is that it can accommodate several models of evolution (e.g. changes in evolutionary rate, stabilizing selection). In contrast, PIC assumes a Brownian Motion (stochastic; BM) process of trait evolution (Felsenstein 1985). Although PIC and PGLS are equivalent under a BM model (Blomberg et al. 2012), the comparison cannot be made under other evolutionary models (evolutionary models will be the subject of a coming post). PGLS can be used to compare two continuous traits. For the comparison of a continuous and a categorical trait, Phylogenetic Generalized Estimating Equations (Paradis and Claude 2002) function similarly to PGLS.
R code:
# PGLS assuming BM
library(ape)
library(nlme) # Provides gls()
tree <- read.nexus("tree.nex")
data <- read.csv("data.csv", header=TRUE, row.names=1)
data <- data[tree$tip.label, ] # Rows must follow the order of the tips of the tree
gls1 <- gls(trait1 ~ trait2, data=data, correlation=corBrownian(phy=tree), method="ML")
summary(gls1)
# Using the maximum likelihood method enables the comparison of evolutionary models using AIC
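As a rough illustration of what the correlation structure does, the base R sketch below builds the BM covariance matrix for a small tree by hand and solves the generalized least squares equations directly. The tree, the trait values, and the helper gls_by_hand are hypothetical and only meant to show the idea; real analyses should rely on gls() as above.

```r
# Hypothetical four-taxon tree: ((A:1,B:1):2,(C:2,D:2):1);
# Trait values are invented for illustration only.
taxa <- c("A", "B", "C", "D")
x <- c(1, 3, 6, 10)
y <- c(2, 3, 7, 9)

# Under BM, the covariance between two taxa is proportional to the branch
# length they share from the root; the variance is the root-to-tip distance
C <- matrix(c(3, 2, 0, 0,
              2, 3, 0, 0,
              0, 0, 3, 1,
              0, 0, 1, 3),
            nrow = 4, byrow = TRUE, dimnames = list(taxa, taxa))

# GLS estimator: solve (X' V^-1 X) beta = X' V^-1 y
gls_by_hand <- function(X, y, V) {
  Vinv <- solve(V)
  drop(solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y))
}

X <- cbind(intercept = 1, slope = x)
beta_bm  <- gls_by_hand(X, y, C)        # Phylogeny taken into account
beta_ols <- gls_by_hand(X, y, diag(4))  # Identity matrix: ordinary least squares
print(beta_bm)
print(beta_ols)
```

For this tree and these values, the BM slope works out to 31/42, which is also the slope obtained by regressing independent contrasts of the same data through the origin. That is the equivalence described by Blomberg et al. (2012).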
This post has been rather brief, but I intend to write further posts on this topic.
References
Blomberg, S.P., J.G. Lefevre, J.A. Wells, and M. Waterhouse. 2012. Independent contrasts and PGLS regression estimators are equivalent. Systematic Biology 61:382-391.
Felsenstein, J. 1985. Phylogenies and the Comparative Method. The American Naturalist 125:1-15.
Grafen, A. 1989. The Phylogenetic Regression. Philosophical Transactions of the Royal Society of London B 326:119-157.
Paradis, E. and J. Claude. 2002. Analysis of Comparative Data Using Generalized Estimating Equations. Journal of Theoretical Biology 218:175-185.
Paradis, E. 2006. Analysis of Phylogenetics and Evolution with R. Springer Science+Business Media, LLC, New York.
Hi, you said:
"PIC can be used for the comparison of one continuous with one categorical trait and for two continuous traits"
How can I incorporate categorical traits in PIC? Do they have to be binary? If not, how many "values" can a category take in?
Thanks!
Hi Pearsy,
They don't have to be binary. I have done PIC with three dietary categories for ungulates. In my data the dietary categories were "Grazer, Browser, Mixed feeder." I didn't use any method of scoring (1,2,3) but that should not make any difference. You will obviously end up with some contrasts that are zero and others that are non-zero.
Alternatively, you can use generalized estimating equations. The code for GEE is summarized in Paradis' book "Analysis of Phylogenetics and Evolution with R." The original paper is as follows:
Paradis, E., and J. Claude. 2002. Analysis of Comparative Data Using Generalized Estimating Equations. Journal of theoretical biology 218:175–185.
For your second question, I have not done any testing to determine how many categories can be used. That probably depends on the size of your data set and how many observations you have for each category.