ECToBlob infers a tree of blobs under the network
multispecies coalescent (NMSC) model. The resulting unrooted tree
represents only the cut edges of the underlying species network, that
is, edges whose removal disconnects the network.
More complex structures, such as “blobs,” are collapsed into multifurcations (polytomies). In this way, the tree highlights where evolutionary relationships are tree-like and where they are not.
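The notion of a cut edge can be illustrated on a toy example. The sketch below uses only base R and a small hypothetical undirected graph (node names and the helper is_connected are illustrative assumptions, not part of any package): an edge is flagged as a cut edge if deleting it leaves the graph disconnected.

```r
# Hypothetical toy network: a 3-cycle (the blob) on u, v, w, with leaves
# a and b attached to it and a tree-like part (r with leaves c and d).
edges <- rbind(
  c("u", "v"), c("v", "w"), c("w", "u"),   # cycle edges
  c("u", "a"), c("v", "b"),                # pendant edges
  c("w", "r"), c("r", "c"), c("r", "d")    # tree-like part
)

# Breadth-first search: TRUE if every node is reachable from the first one.
is_connected <- function(edges, nodes) {
  visited <- nodes[1]
  frontier <- nodes[1]
  while (length(frontier) > 0) {
    nbrs <- c(edges[edges[, 1] %in% frontier, 2],
              edges[edges[, 2] %in% frontier, 1])
    frontier <- setdiff(nbrs, visited)
    visited <- c(visited, frontier)
  }
  length(visited) == length(nodes)
}

nodes <- unique(as.vector(edges))

# An edge is a cut edge if removing it disconnects the graph.
is_cut <- sapply(seq_len(nrow(edges)), function(i) {
  !is_connected(edges[-i, , drop = FALSE], nodes)
})
cut_edges <- edges[is_cut, , drop = FALSE]
```

In this toy graph the three cycle edges are not cut edges, so the cycle would be collapsed to a single multifurcating node joined to a, b, and r, while the remaining five edges are retained.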
This makes the tree of blobs useful for network inference. It
isolates regions where reticulation has occurred. Researchers can then
investigate each blob separately, possibly focusing on smaller subsets
of taxa. While current methods may not be able to resolve complex blobs,
ECToBlob is statistically consistent under the NMSC model,
regardless of the unknown internal blob structure.
ECToBlob requires two inputs:
Gene data (genedata): A collection of gene trees on a common set of taxa. These are typically inferred from sequence data for individual loci.
Although gene trees are themselves inferred, ECToBlob
treats them as observed data. The root and branch lengths (if present)
are ignored.
Example:
Gene data can be supplied to ECToBlob
either as a collection of gene trees in a phylo object or as the path
and filename of a document containing gene trees in Newick format, with
each gene tree on a separate row. Alternatively, the data may be
supplied as a table obtained by tallying quartets and conducting
statistical tests. Such a table can be generated from the output of
several functions, including a previous run of ECToBlob (or
NANUQ or TINNIK). Providing such a table is
usually faster, as it avoids the need to tally quartets and
perform some statistical tests again.
Tree (tree): A tree assumed to be a resolution of the true tree of blobs. Such a tree can be produced by ASTRAL or TREE-QMC since, for sufficiently large datasets, trees inferred by these methods have been shown to be resolutions of the true tree of blobs.
MSCquartets contains the ASTRAL tree from the gene trees
of Lescroart et al. (2023):
If a suitable reference tree (e.g., from ASTRAL or TOB-QMC) has not yet been computed, a tree for exploratory purposes can be constructed using a quartet distance implemented in the MSCquartets package:
Note that there is no guarantee that the tree obtained from
the function quartetDistTree is a resolution of the tree of
blobs, although in practice it often matches the ASTRAL tree. We
therefore suggest that the quartetDistTree tree be used only
for exploratory purposes, and not for final analyses.
ECToBlob begins by counting quartet topologies across
gene trees and applying hypothesis tests to each quartet. These tests
evaluate fit to:
the star tree model (p_star)
the T1 tree model (p_T1)
A basic analysis is run by:
out <- ECToBlob(tableLeopardusLescroart, AstralLeopard_tree, alpha = 0.05)
#> Applying hypothesis tests for T1 model to 1820 quartets.
#> Applying hypothesis test for star tree model to 1820 quartets.
#> Contracting edges with star test at level beta = 0.8
pTable <- out$pTable
The function produces several plots:
Contracted tree (p_star) → A plot of the input tree
after collapsing those edges for which the data do not support a
resolution. (This tree may be the same as the input tree.)
For large datasets (many taxa or gene trees), computing the quartet
table is the most time-consuming step. Saving pTable for
reuse is recommended.
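One way to keep the table between sessions is base R's saveRDS and readRDS. The sketch below is only an illustration: the stand-in matrix, its column names, and the file name are hypothetical, and in practice the object saved would be out$pTable from a real run.

```r
# Stand-in for the table returned as out$pTable (hypothetical values and
# column names, used only so that this snippet runs on its own).
pTable <- matrix(
  c(10, 3, 2, 0.40, 0.01),
  nrow = 1,
  dimnames = list(NULL, c("12|34", "13|24", "14|23", "p_T1", "p_star"))
)

# Write the table to disk (a temporary directory here; choose a
# permanent location for real analyses).
path <- file.path(tempdir(), "pTable.rds")
saveRDS(pTable, path)

# Later, possibly in a new R session, read it back.
pTable_reloaded <- readRDS(path)
```

The reloaded table can then be passed to ECToBlob in place of the gene trees, so quartets are not tallied again.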
Each edge in the input tree is assigned a p-value by combining the p-values of quartets that support it. The set of quartet p-values combined for a given edge can be determined in three ways:
"bi": bipartition quartets
"quad": quadripartition quartets
"mul": multipartition quartets (updated during contraction)
Once the quartet p-values defining an edge have been determined, they can be combined in four different ways to account for multiple testing:
"Bon" (Bonferroni)
"Cauchy" (Cauchy)
two further corrections, selected with "BBC" and "CBC" in the examples below
Here we show examples of different choices of both multiple testing correction and quartet p-value selection.
out_1 <- ECToBlob(pTable, AstralLeopard_tree, alpha=0.05, qType="mul", testCorrection="Bon", plot=0)
#> Not recomputing T1 p-values for displayed trees.
#> Not recomputing p_star values.
#> Contracting edges with star test at level beta = 0.8
out_2 <- ECToBlob(pTable, AstralLeopard_tree, alpha=0.05, qType="quad", testCorrection="Cauchy", plot=0)
#> Not recomputing T1 p-values for displayed trees.
#> Not recomputing p_star values.
#> Contracting edges with star test at level beta = 0.8
out_3 <- ECToBlob(pTable, AstralLeopard_tree, alpha=0.05, qType="bi", testCorrection="BBC", plot=0)
#> Not recomputing T1 p-values for displayed trees.
#> Not recomputing p_star values.
#> Contracting edges with star test at level beta = 0.8
out_4 <- ECToBlob(pTable, AstralLeopard_tree, alpha=0.05, qType="mul", testCorrection="CBC", plot=0)
#> Not recomputing T1 p-values for displayed trees.
#> Not recomputing p_star values.
#> Contracting edges with star test at level beta = 0.8
These choices may yield different results and should be explored.
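To give a feel for how a set of quartet p-values might be combined into a single edge p-value, the sketch below applies the textbook Bonferroni and Cauchy combination rules in base R. The p-values and function names are hypothetical, and these are the standard forms of the two rules, which are not necessarily the exact computations ECToBlob performs internally.

```r
# Hypothetical quartet p-values associated with one edge.
p_edge <- c(0.01, 0.20, 0.50, 0.03)

# Bonferroni: multiply the smallest p-value by the number of tests,
# capping the result at 1.
bonferroni_combine <- function(p) {
  min(1, length(p) * min(p))
}

# Cauchy combination (equal weights): map each p-value to a standard
# Cauchy quantile, average, and map the average back to a p-value.
cauchy_combine <- function(p) {
  t_stat <- mean(tan((0.5 - p) * pi))
  0.5 - atan(t_stat) / pi
}

p_bon <- bonferroni_combine(p_edge)
p_cauchy <- cauchy_combine(p_edge)
```

The two rules weight small p-values differently, which is one reason the choice of testCorrection can change which edges are contracted.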
The parameter \(\alpha\)
(alpha) determines which tree in the sequence will be
selected. In contrast, the choice of \(\beta\) (beta, test level for
collapsing edges showing no resolution) influences which edges will
initially be contracted as not consistent with any resolution.
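Succinctly, \(\beta\) controls which edges are initially contracted, while \(\alpha\) controls which tree is selected.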
To select a tree based on \(\alpha\), one has two options, though often they will agree:
indexEarly: the first tree whose p-value exceeds \(\alpha\)
indexLate: the first tree for which its p-value and all later tree p-values exceed \(\alpha\)
The trees associated with these indices can be extracted from the output, which stores the indices as $indexEarly and $indexLate.
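As a toy illustration of how the two selection rules can differ (this is not the package's own code; the vector of tree p-values and the variable names are hypothetical):

```r
# Hypothetical p-values for a sequence of five candidate trees.
tree_p <- c(0.001, 0.08, 0.02, 0.10, 0.30)
alpha <- 0.05

exceeds <- tree_p > alpha

# Early rule: first tree whose p-value exceeds alpha
# (NA if no tree qualifies).
index_early <- which(exceeds)[1]

# Late rule: first tree such that it and every later tree exceed alpha.
# The reversed cumulative product is 1 only where all later entries are TRUE.
all_later_exceed <- rev(cumprod(rev(exceeds))) == 1
index_late <- which(all_later_exceed)[1]
```

Here the second tree passes the threshold but the third does not, so the early and late rules select different trees; when the p-values increase monotonically along the sequence, the two indices coincide.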
Any reported ECToBlob analysis should include:
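the quartet p-value selection (qType)
the multiple testing correction (testCorrection)
whether the selected tree is the early ($indexEarly) or late ($indexLate) one
Exploring multiple parameter settings is strongly recommended.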