Seeing the trees for the network: Consensus, information content, and superphylogenies
Abstract
Phylogenetic inference often leads to solutions made up of multiple trees on a given set of leaves or taxa. These competing hypotheses might be the equally optimal trees obtained from the analysis of a single matrix using maximum parsimony (Hennig, 1979) or maximum likelihood (Felsenstein, 1981) or the set of most probable trees produced by Bayesian analysis (Rannala and Yang, 1996; Larget and Simon, 1999; Mau et al., 1999). They might also have been inferred independently from different data sets or even be the result of resampling methods such as the bootstrap or the jackknife (Felsenstein, 1985; Penny and Hendy, 1985, 1986). Consensus methods are commonly used to identify areas of conflict and agreement among such multiple trees; they can represent the relationships that are supported either unanimously or by a majority of trees while discarding other, less supported relationships (Bryant, 2003). Thus, a consensus method is usually defined as a function that takes as input a set, or profile, of trees on the same set of taxa and returns a single tree on the same set of taxa (Leclerc and Cucumel, 1987; Steel et al., 2000; Bryant, 2003). This function can take different forms, and many consensus methods have been proposed (see Swofford, 1991; Bryant, 2003, for reviews), amongst which the strict consensus (Sokal and Rohlf, 1981) and majority-rule consensus (Margush and McMorris, 1981) are perhaps the most widely used and understood. These various functions differ in two major respects: (1) the kind of information they preserve and (2) the way they deal with conflict among input trees. Indeed, the information contained in a tree can be considered in terms of nesting, three- or four-taxon statements, components, and branch lengths, while conflict can be left unresolved or dealt with using different criteria (Bryant, 2003). Notwithstanding these differences, most methods abide by the prevailing phylogenetic model: that of a tree embedding a hierarchy of descent.
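As an illustration of a consensus method viewed as a function from a profile to a single tree, the following minimal sketch computes the strict and majority-rule consensus of a small profile. It assumes, for illustration only, that each rooted tree is encoded as the set of its non-trivial clusters (components); the taxon names, the example profile, and the function names `strict_consensus` and `majority_rule_consensus` are hypothetical and do not come from any particular software package.

```python
from collections import Counter

# Assumption of this sketch: a rooted tree is represented by the set of its
# non-trivial clusters, i.e. the leaf sets of its internal nodes (root excluded).
profile = [
    {frozenset("ab"), frozenset("abc")},  # ((a,b),c),d
    {frozenset("ab"), frozenset("abc")},  # ((a,b),c),d
    {frozenset("ab"), frozenset("abd")},  # ((a,b),d),c
]


def strict_consensus(profile):
    """Return the clusters present in every tree of the profile."""
    return set.intersection(*map(set, profile))


def majority_rule_consensus(profile):
    """Return the clusters present in more than half of the trees."""
    # Each tree is a set, so a cluster is counted at most once per tree.
    counts = Counter(cluster for tree in profile for cluster in tree)
    return {c for c, n in counts.items() if n > len(profile) / 2}


print(strict_consensus(profile))
print(majority_rule_consensus(profile))
```

In this example the strict consensus retains only the cluster {a, b}, whereas the majority-rule consensus also retains {a, b, c}, which is found in two of the three trees. The two functions thus preserve the same kind of information (clusters) but handle conflict differently.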
This model puts forward a fully resolved, or binary, tree as the ideal representation of evolution. In practice, consensus trees are seldom binary and they embed—i.e., are compatible with—multiple binary trees. Indeed, because it is often impossible to distinguish between hard and soft polytomies (Nelson and Platnick, 1980; Maddison, 1989) and because the latter can be resolved in a number of different ways, an unresolved tree can be refined by a set of binary trees (Rohlf, 1982; Mickevich and Platnick, 1989; Steel et al., 2000).
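To give a sense of how many binary trees an unresolved tree can embed, the sketch below counts the binary refinements of a rooted tree. It relies on the standard result that a polytomy with k children can be resolved into (2k - 3)!! distinct rooted binary subtrees, and it assumes that trees are rooted, that leaves are labelled, and that a tree is encoded as nested tuples of leaf names; the encoding and the function names are hypothetical choices made for this example.

```python
def resolutions(k):
    """Number of rooted binary resolutions of a node with k children: (2k - 3)!!."""
    result = 1
    for m in range(3, 2 * k - 2, 2):  # product 3 * 5 * ... * (2k - 3)
        result *= m
    return result


def count_binary_refinements(tree):
    """Count the binary trees that refine a rooted tree given as nested tuples."""
    if isinstance(tree, str):  # a leaf admits exactly one (trivial) refinement
        return 1
    total = resolutions(len(tree))  # ways to resolve this node
    for child in tree:  # resolutions of distinct nodes are independent
        total *= count_binary_refinements(child)
    return total


star = ("a", "b", "c", "d")              # completely unresolved tree
partial = (("a", "b", "c"), "d", "e")    # two trichotomies
binary = ((("a", "b"), "c"), "d")        # already fully resolved

print(count_binary_refinements(star))
print(count_binary_refinements(partial))
print(count_binary_refinements(binary))
```

Under these assumptions, a fully unresolved tree on four taxa is refined by 15 binary trees, a tree with two trichotomies by 3 × 3 = 9, and a binary tree only by itself.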