Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets
Planet, Paul J.
Keywords: Computer Science Applications; Horizontal gene transfer
Abstract
Background: Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons.
Findings: In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold.
Conclusions: Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
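To make the idea of flocking on a unit-interval distance concrete, the following is a minimal toy sketch in Python. It is not Clusterflock's code or interface: the function names (`jaccard_distance`, `flock`), the choice of Jaccard distance, the 2-D arena, and the movement rule (attract when the distance is below 0.5, repel when above) are all assumptions made for illustration. In particular, the fixed neutral point at 0.5 is a simplification that the tool described in the abstract does not require.

```python
import random


def jaccard_distance(a, b):
    """Illustrative metric on [0, 1]: 0 for identical sets, 1 for disjoint sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def flock(items, distance, steps=200, radius=0.5, rate=0.05, seed=0):
    """Toy flocking loop (hypothetical, not the published algorithm).

    Each item is an agent placed at a random point in the unit square.
    On every step an agent looks only at agents within `radius` and moves
    toward those it is similar to and away from those it is dissimilar to.
    Returns the final list of [x, y] positions.
    """
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in items]
    for _ in range(steps):
        new_pos = []
        for i, (xi, yi) in enumerate(pos):
            dx = dy = 0.0
            for j, (xj, yj) in enumerate(pos):
                if i == j:
                    continue
                gap = ((xj - xi) ** 2 + (yj - yi) ** 2) ** 0.5
                if gap == 0.0 or gap > radius:
                    continue  # agents respond only to their local neighborhood
                # Assumed rule: weight is +1 at distance 0 and -1 at distance 1.
                w = 1.0 - 2.0 * distance(items[i], items[j])
                dx += w * (xj - xi)
                dy += w * (yj - yi)
            new_pos.append([xi + rate * dx, yi + rate * dy])
        pos = new_pos  # synchronous update: all agents move at once
    return pos


if __name__ == "__main__":
    # Two pairs of items; items within a pair share all of their elements.
    data = [{"a", "b"}, {"a", "b"}, {"x", "y"}, {"x", "y"}]
    for item, (x, y) in zip(data, flock(data, jaccard_distance, radius=2.0)):
        print(sorted(item), round(x, 3), round(y, 3))
```

Any function that returns a value between 0 and 1 for a pair of items could be passed in place of `jaccard_distance`. In this sketch, flocks would still have to be read off the final positions by a separate step (for example, by grouping agents that end up close together).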
Citation: Narechania A, Baker R, DeSalle R, Mathema B, Kolokotronis SO, Kreiswirth B, Planet PJ. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. Gigascience. 2016 Oct 24;5(1):44. doi: 10.1186/s13742-016-0152-3. PMID: 27776538; PMCID: PMC5078944.
The following license files are associated with this item:
- Creative Commons
Related items:
- mILD: a tool for constructing and analyzing matrices of pairwise phylogenetic character incongruence tests.
  Authors: Planet PJ, Sarkar IN
  Issue date: 2005 Dec 15
- Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.
  Authors: Driebe EM, Sahl JW, Roe C, Bowers JR, Schupp JM, Gillece JD, Kelley E, Price LB, Pearson TR, Hepp CM, Brzoska PM, Cummings CA, Furtado MR, Andersen PS, Stegger M, Engelthaler DM, Keim PS
  Issue date: 2015
- The recombination dynamics of Staphylococcus aureus inferred from spA gene.
  Authors: Santos-Júnior CD, Veríssimo A, Costa J
  Issue date: 2016 Jul 11
- Semi-flocking algorithm for motion control of mobile sensors in large-scale surveillance systems.
  Authors: Semnani SH, Basir OA
  Issue date: 2015 Jan
- ClonalFrameML: efficient inference of recombination in whole bacterial genomes.
  Authors: Didelot X, Wilson DJ
  Issue date: 2015 Feb