# 11.3: Tree Balance

As we discussed in Chapter 10, tree balance considers how "balanced" the branches of a phylogenetic tree are. That is, if we look at each node in the tree, are the two sister clades of the same size (balanced) or wildly different (imbalanced)?

Birth-death trees have a certain amount of "balance," perhaps a bit less than your intuition might suggest (see Chapter 10). We can look to real trees to see if the amount of balance matches what we expect under birth-death models. A less balanced pattern in real trees would suggest that speciation and/or extinction rates vary among lineages more than we would expect. By contrast, more balanced trees would suggest more even and predictable diversification across the tree of life than expected under birth-death models. This approach traces back to Raup and colleagues, who applied stochastic birth-death models to paleontology in a series of influential papers in the 1970s (e.g. Raup et al. 1973; Raup and Gould 1974). I will show how to do this for both individual nodes and for whole trees in the following sections.

## Section 11.3a: Sister clades and the balance of individual nodes

For single nodes, we already know that the distribution of sister taxa species richness is uniform over all possible divisions of *N*_{n} species into two clades of size *N*_{a} and *N*_{b} (Chapter 11). This idea leads to a simple test of whether the distribution of species between two sister clades is unusual compared to the expectation under a birth-death model (Slowinski and Guyer 1993). This test can be used, for example, to test whether the diversity of exceptional clades, like passerine birds, is higher than one would expect when compared to their sister clade. This is the simplest measure of tree balance, as it only considers one node in the tree at a time.

Slowinski and Guyer (1993) developed a test based on calculating a P-value for a division at least as extreme as seen in a particular comparison of sister clades. We consider *N*_{n} total species divided into two sister clades of sizes *N*_{a} and *N*_{b}, where *N*_{a} ≤ *N*_{b} and *N*_{a} + *N*_{b} = *N*_{n}. Then:

If *N*_{a} ≠ *N*_{b}:

\[ P = \frac{2 N_a}{N_n - 1} \label{11.7}\]

If *N*_{a} = *N*_{b} or *P* > 1 then set *P* = 1
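The logic behind equation 11.7 is that there are *N*_{n} − 1 equally likely ways to split *N*_{n} species between two sister clades, and 2*N*_{a} of those splits are at least as uneven as the observed one. A minimal Python sketch of the calculation follows; the function name `sister_clade_p` and the clade sizes in the usage lines are hypothetical, chosen only for illustration.

```python
def sister_clade_p(n_a, n_b):
    """Two-tailed P-value for the split between two sister clades (equation 11.7).

    n_a and n_b are the species counts of the two clades, in either order.
    """
    if n_a < 1 or n_b < 1:
        raise ValueError("each clade must contain at least one species")
    if n_a == n_b:
        return 1.0
    smaller = min(n_a, n_b)
    total = n_a + n_b
    # 2 * smaller of the (total - 1) equally likely splits are this uneven or worse
    return min(1.0, 2 * smaller / (total - 1))


# Hypothetical clade sizes, for illustration only
print(sister_clade_p(2, 98))   # a very uneven split: 4/99
print(sister_clade_p(40, 60))  # a fairly even split: 80/99
```

Even a split as lopsided as 2 versus 98 species only just crosses the conventional 0.05 threshold, which reflects how much imbalance birth-death models produce by chance.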

For example, we can assess diversification in the Andean representatives of the legume genus *Lupinus* (Hughes and Eastwood 2006). This genus includes one young radiation of 81 Andean species, spanning a wide range of growth forms. The likely sister clade to this spectacular Andean radiation is a clade of *Lupinus* species in Mexico that includes 46 species (Drummond et al. 2012). In this case the smaller clade has *N*_{a} = 46 species and the two clades together contain *N*_{n} = 81 + 46 = 127 species, so we can calculate a P-value testing the null hypothesis that both of these clades have the same diversification rate:

\[ P = \frac{2 N_a}{N_n - 1} = \frac{2 \cdot 46}{127 - 1} = 0.730 \label{11.8} \]

We cannot reject the null hypothesis. Indeed, later work suggests that the actual increase in diversification rate for *Lupinus* occurred deeper in the phylogenetic tree, in the ancestor of a more broadly ranging New World clade (Hughes and Eastwood 2006; Drummond et al. 2012).

Often, we are interested in testing whether a particular trait - say, dispersal into the Páramo - is responsible for the increase in species richness that we see in some clades. In that case, a single comparison of sister clades may be unsatisfying, as sister clades almost always differ in many characters, beyond just the trait of interest. Even if the clade with our putative "key innovation" is more diverse, we still might not be confident in inferring a correlation from a single observation. We need replication.

To address this problem, many studies have used natural replicates across the tree of life, comparing the species richnesses of many pairs of sister clades that differ in a given trait of interest. Following Slowinski and Guyer (1993), we could calculate a p-value for each pair of sister clades, and then combine those p-values into an overall test. In this case, one clade (with diversity *N*_{1}) has the trait of interest and the other does not (*N*_{0}), and our formula is half of equation 11.7 since we will consider this a one-tailed test:

\[ P = \frac{N_0}{N_n - 1} \label{11.9} \]

When analyzing replicate clade comparisons - e.g. many sister clades, where in each case one has the trait of interest and the other does not - Slowinski and Guyer (1993) recommended combining these p-values using Fisher's combined probability test, so that:

\[ \chi^2_{combined} = -2 \sum_{i=1}^{k} \ln (P_i) \label{11.10} \]

Here, the *P*_{i} values are from *k* independent sister clade comparisons, each using equation 11.9. Under the null hypothesis where the character of interest does not increase diversification rates, the test statistic, *χ*^{2}_{combined}, should follow a chi-squared distribution with 2*k* degrees of freedom. But before you use this combined probability approach, see what happens when we apply it to a real example!
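As a small worked illustration of the arithmetic (the two p-values here are hypothetical, chosen only to keep the calculation short), suppose there are *k* = 2 comparisons with *P*_{1} = 0.01 and *P*_{2} = 0.5:

\[ \chi^2_{combined} = -2 \left[ \ln(0.01) + \ln(0.5) \right] = -2 \ln(0.005) \approx 10.60 \]

This value is compared to a chi-squared distribution with 2*k* = 4 degrees of freedom. For an even number of degrees of freedom the upper tail has a closed form, which for 4 degrees of freedom and *x* = *χ*^{2}_{combined} gives:

\[ P = e^{-x/2} \left( 1 + \frac{x}{2} \right) \approx 0.005 \, (1 + 5.30) \approx 0.031 \]

In this illustration a single small p-value is enough to make the combined test significant, even though the second comparison provides no evidence for an effect.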

As an example, consider the following data, which compares the diversity of many sister pairs of plants. In each case, one clade has fleshy fruits and the other dry (data from Vamosi and Vamosi 2005):

Fleshy fruit clade | n_{fleshy} | Dry fruit clade | n_{dry} |
---|---|---|---|
A | 1 | B | 2 |
C | 1 | D | 64 |
E | 1 | F | 300 |
G | 1 | H | 89 |
I | 1 | J | 67 |
K | 3 | L | 4 |
M | 3 | N | 34 |
O | 5 | P | 10 |
Q | 9 | R | 150 |
S | 16 | T | 35 |
U | 33 | V | 2 |
W | 40 | X | 60 |
Y | 50 | Z | 81 |
AA | 100 | BB | 1 |
CC | 216 | DD | 3 |
EE | 393 | FF | 1 |
GG | 850 | HH | 11 |
II | 947 | JJ | 1 |
KK | 1700 | LL | 18 |

The clades in the above table are as follows: A: *Pangium*, B: *Acharia + Kiggelaria*, C: *Cyrilla*, D: *Clethra*, E: *Roussea*, F: *Lobelia*, G: *Myriophyllum + Haloragis + Penthorum*, H: *Tetracarpaea*, I: *Austrobaileya*, J: *Illicium + Schisandra*, K: *Davidsonia*, L: *Bauera*, M: *Mitchella*, N: *Pentas*, O: *Milligania*, P: *Borya*, Q: *Sambucus*, R: *Viburnum*, S: *Pereskia*, T: *Mollugo*, U: *Decaisnea + Sargentodoxa + Tinospora + Menispermum + Nandina + Caulophyllum + Hydrastis + Glaucidium*, V: *Euptelea*, W: *Tetracera*, X: *Dillenia*, Y: *Osbeckia*, Z: *Mouriri*, AA: *Hippocratea*, BB: *Plagiopteron*, CC: *Cyclanthus + Sphaeradenia + Freycinetia*, DD: *Petrosavia + Japonlirion*, EE: *Bixa*, FF: *Theobroma + Grewia + Tilia + Sterculia + Durio*, GG: *Impatiens*, HH: *Idria*, II: *Lamium + Clerodendrum + Callicarpa + Phyla + Pedicularis + Paulownia*, JJ: *Euthystachys*, KK: *Callicarpa + Phyla + Pedicularis + Paulownia + Solanum*, LL: *Solanum*.

The individual clades show mixed support for the hypothesis, with only 7 of the 18 comparisons showing higher diversity in the fleshy clade, but 6 of those 7 comparisons significant at *P* < 0.05 using equation 11.9. The combined probability test gives a test statistic of *χ*^{2}_{combined} = 72.8. Comparing this to a *χ*^{2} distribution with 36 degrees of freedom, we obtain *P* = 0.00027, a highly significant result. This implies that fleshy fruits do, in fact, result in a higher diversification rate.

However, if we test the opposite hypothesis, we see a problem with the combined probability test of equation 11.10 (Vamosi and Vamosi 2005). First, notice that 11 of 18 comparisons show higher diversity in the non-fleshy clade, with 4 significant at *P* < 0.05. The combined probability test gives *χ*^{2}_{combined} = 58.9 and *P* = 0.0094. So we reject the null hypothesis and conclude that non-fleshy fruits diversify at a higher rate! In other words, we can reject the null hypothesis in both directions with this example.
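The two-directional rejection can be checked with a short Python sketch that applies equations 11.9 and 11.10 to the species counts in the table above. The function names are my own, and the chi-squared tail is computed with the closed form that holds for an even number of degrees of freedom, so only the standard library is needed. Because the statistics depend on exactly which sister pairs are included, the printed values need not match the figures quoted in the text to the last digit.

```python
import math

# (n_fleshy, n_dry) for each sister pair, copied from the table above
pairs = [(1, 2), (1, 64), (1, 300), (1, 89), (1, 67), (3, 4), (3, 34),
         (5, 10), (9, 150), (16, 35), (33, 2), (40, 60), (50, 81),
         (100, 1), (216, 3), (393, 1), (850, 11), (947, 1), (1700, 18)]


def one_tailed_p(n_without, n_total):
    """Equation 11.9: one-tailed P that the clade with the trait is this rich."""
    return min(1.0, n_without / (n_total - 1))


def chi2_upper_tail_even_df(x, k):
    """P(X >= x) for a chi-squared variable with 2k degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Equation 11.10 and its P-value on 2k degrees of freedom."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_upper_tail_even_df(stat, len(p_values))


# Hypothesis 1: fleshy fruits increase diversification
p_fleshy = [one_tailed_p(dry, fleshy + dry) for fleshy, dry in pairs]
# Hypothesis 2: dry fruits increase diversification
p_dry = [one_tailed_p(fleshy, fleshy + dry) for fleshy, dry in pairs]

stat_fleshy, pval_fleshy = fisher_combined(p_fleshy)
stat_dry, pval_dry = fisher_combined(p_dry)
print(f"fleshy favoured: chi2 = {stat_fleshy:.1f}, P = {pval_fleshy:.5f}")
print(f"dry favoured:    chi2 = {stat_dry:.1f}, P = {pval_dry:.5f}")
```

Both combined tests come out below 0.05, driven in each direction by a handful of very lopsided pairs.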

What's going on here? It turns out that this test is very sensitive to outliers, that is, sister pairs with extreme differences in diversity. Such pairs are very different from what one would expect under the null hypothesis, and so lead to its rejection. When there are outliers on both sides (e.g. when the proportion of species in each state has a U-shaped distribution; Paradis 2012), we can "show" that both character states significantly increase diversity (Vamosi and Vamosi 2005)!

Fortunately, a number of improved methods are similar in spirit to the original Slowinski and Guyer test but more statistically robust (e.g. Paradis 2012). For example, we can apply the "richness Yule test" described by Paradis (2012) to the data from Vamosi and Vamosi (2005). This is a modified version of the McConway-Sims test (McConway and Sims 2004); it compares the likelihood of an equal-rate Yule model applied to all clades to that of a model where one trait is associated with higher or lower diversification rates. This test requires knowledge of clade ages, which I don't have for these data, but Paradis (2012) shows that the test is robust to this assumption and recommends substituting a large and equal age for each clade. I chose 1000 as an arbitrary age, and found a significant likelihood ratio test (null model ln *L* = −215.6, alternative model ln *L* = −205.7, *P* = 0.000008). This method estimates a higher rate of diversification for fleshy fruits: since the age of the clade is arbitrary, the actual rates are not meaningful, but their estimated ratio *λ*_{1}/*λ*_{0} = 1.39 suggests that fleshy-fruited lineages have a diversification rate almost 40% higher.

## Section 11.3b: Balance of whole phylogenetic trees

We can assess the overall balance of an entire phylogenetic tree using tree balance statistics. As discussed, I will describe just one common statistic, Colless' I, since other metrics capture the same pattern in slightly different ways.

To calculate Colless' I, we can use Equation 10.18. The result depends strongly on tree size, and so is not comparable across trees of different sizes; to allow comparisons, *I*_{c} is usually standardized by subtracting the expected mean for trees of that size under a random model (see below), and dividing by the standard deviation. Both of these can be calculated analytically (Blum et al. 2006), and the standardized *I*_{c} can be calculated using a small approximation (following Bortolussi et al. 2006) as:

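\[ I^{'}_c = \frac{I_c - n \log(n) - n(\gamma - 1 - \log 2)}{n} \label{11.11} \]

Here *n* is the number of tips in the tree, log is the natural logarithm, and *γ* is Euler's constant (approximately 0.577).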
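Equation 11.11 is simple enough to check by hand or in a few lines of code. The sketch below is a direct transcription into Python; the function name `standardized_colless` is my own, and the inputs are the values reported for the *Lupinus* tree at the end of this section (137 tips, *I*_{c} = 1010).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def standardized_colless(i_c, n_tips):
    """Standardized Colless index following equation 11.11 (natural logs)."""
    n = n_tips
    expected = n * math.log(n) + n * (EULER_GAMMA - 1 - math.log(2))
    return (i_c - expected) / n


lupinus_ic_std = standardized_colless(1010, 137)
print(round(lupinus_ic_std, 2))  # 3.57
```

A positive value means the tree is more imbalanced than the average tree of that size under the null model; a value near zero means it is about as balanced as expected.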

Since the test statistics are based on descriptions of patterns in trees rather than particular processes, the relationship between imbalance and evolutionary processes can be difficult to untangle! But all tree balance indices allow one to reject the null hypothesis that the tree was generated under a birth-death model. Actually, the expected patterns of tree balance are absolutely identical under a broader class of models called "Equal-Rates Markov" (ERM) models (Harding 1971; Mooers and Heard 1997). ERM models specify that diversification rates (both speciation and extinction) are equal across all lineages for any particular point in time. However, those rates may or may not change through time. If they don't change through time, then we have a constant rate birth-death model, as described above - so birth-death models are ERM models. But ERM models also include, for example, models where birth rates slow through time, or extinction rates increase through time, and so on. As long as the changes in rates occur in exactly the same way across all lineages at any time, then all of these models predict exactly the same pattern of tree balance.

Typical steps for using tree balance indices to test the null hypothesis that the tree was generated under an ERM model are as follows:

1. Calculate tree balance using a tree balance statistic.
2. Simulate pure-birth trees to generate a null distribution of the test statistic. We are considering the set of ERM models as our null, but since pure-birth is simple and still ERM we can use it to get the correct null distribution.
3. Compare the actual test statistic to the null distribution. If the actual test statistic is in the tails of the null distribution, then your data deviate from an ERM model.
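The steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than a reference implementation: tree shapes are grown by repeatedly splitting a tip chosen uniformly at random (which is how a pure-birth process orders its speciation events), Colless' I is taken as the sum over internal nodes of the absolute difference in tip counts between the two daughter clades, and the observed value and tree size are those reported for the *Lupinus* tree at the end of this section. The function names, the number of simulations, and the random seed are arbitrary choices of mine.

```python
import random


def colless_of_random_yule_tree(n_tips, rng):
    """Grow one pure-birth tree shape with n_tips tips; return its Colless' I."""
    left, right = [None], [None]   # child indices per node; node 0 is the root
    tips = [0]
    while len(tips) < n_tips:
        slot = rng.randrange(len(tips))    # every tip is equally likely to split
        parent = tips[slot]
        child_a, child_b = len(left), len(left) + 1
        left += [None, None]
        right += [None, None]
        left[parent], right[parent] = child_a, child_b
        tips[slot] = child_a               # parent is no longer a tip
        tips.append(child_b)
    # Children always have larger indices than their parent, so sweeping from
    # the last node to the first visits each child before its parent.
    n_desc = [1] * len(left)
    colless = 0
    for node in range(len(left) - 1, -1, -1):
        if left[node] is not None:
            n_l, n_r = n_desc[left[node]], n_desc[right[node]]
            n_desc[node] = n_l + n_r
            colless += abs(n_l - n_r)
    return colless


def upper_tail_p(observed, null_values):
    """Share of simulated values at least as large as the observed one."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


rng = random.Random(1)                     # fixed seed for repeatability
n_tips, observed_ic = 137, 1010            # values reported for Lupinus
null_ic = [colless_of_random_yule_tree(n_tips, rng) for _ in range(2000)]
null_mean = sum(null_ic) / len(null_ic)
p_value = upper_tail_p(observed_ic, null_ic)
print(f"null mean = {null_mean:.1f}, upper-tail P = {p_value:.4f}")
```

The test here is one-tailed, asking only whether the tree is more imbalanced than expected. With 2000 simulations the smallest attainable P-value is about 0.0005, so more replicates are needed to resolve very small probabilities.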

Step 2 is unnecessary in cases where we know the null distribution of a tree balance statistic analytically, which is true for some (but not all) balance metrics (e.g. Blum and François 2006). There are also some examples in the literature of considering null distributions other than ERM. For example, Mooers and Heard (1997) consider two other null models, PDA and EPT, which consider different statistical distributions of tree shapes (but both of these are difficult to tie to any particular evolutionary process).

Typically, phylogenetic trees are more imbalanced than expected under the ERM model. In fact, this is one of the most robust generalizations that one can make about macroevolutionary patterns in phylogenetic trees. This deviation means that diversification rates vary among lineages in the tree of life. We will discuss how to quantify and describe this variation in later chapters. These tests are all similar in that they use multiple non-nested comparisons of species richness in sister clades to calculate a test statistic, which is then compared to a null distribution, usually based on a constant-rates birth-death process (reviewed in Vamosi and Vamosi 2005; Paradis 2012).

As an example, we can apply the whole-tree balance approach to the tree of *Lupinus* (Drummond et al. 2012). For this tree, which has 137 tips, we calculate *I*_{c} = 1010 and *I*_{c}^{′} = 3.57. This is much higher than expected by chance under an ERM model, with *P* = 0.0004. That is, our tree is significantly more imbalanced than expected under an ERM model, which includes both pure birth and birth-death. We can safely conclude that there is variation in speciation and/or extinction rates across lineages in the tree.