
dc.contributor.author: Narechania, Apurva
dc.contributor.author: Baker, Richard
dc.contributor.author: DeSalle, Rob
dc.contributor.author: Mathema, Barun
dc.contributor.author: Kolokotronis, Sergios-Orestis
dc.contributor.author: Kreiswirth, Barry
dc.contributor.author: Planet, Paul J.
dc.date.accessioned: 2022-08-23T19:11:19Z
dc.date.available: 2022-08-23T19:11:19Z
dc.date.issued: 2016-10-24
dc.identifier.citation: Narechania A, Baker R, DeSalle R, Mathema B, Kolokotronis SO, Kreiswirth B, Planet PJ. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. Gigascience. 2016 Oct 24;5(1):44. doi: 10.1186/s13742-016-0152-3. PMID: 27776538; PMCID: PMC5078944. [en_US]
dc.identifier.eissn: 2047-217X
dc.identifier.doi: 10.1186/s13742-016-0152-3
dc.identifier.pmid: 27776538
dc.identifier.pii: 152
dc.identifier.uri: http://hdl.handle.net/20.500.12648/7491
dc.description.abstract: Background: Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. Findings: In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. Conclusions: Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data. [en_US]
dc.description.sponsorship: National Institute of Allergy and Infectious Diseases (US) [en_US]
dc.language.iso: en [en_US]
dc.publisher: Oxford University Press (OUP) [en_US]
dc.relation.url: https://academic.oup.com/gigascience/article/5/1/s13742-016-0152-3/2737427 [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Computer Science Applications [en_US]
dc.subject: Health Informatics [en_US]
dc.subject: Data mining [en_US]
dc.subject: Flocking algorithm [en_US]
dc.subject: Horizontal gene transfer [en_US]
dc.subject: Recombination [en_US]
dc.subject: Staphylococcus aureus [en_US]
dc.subject: Swarms [en_US]
dc.subject: Unsupervised clustering [en_US]
dc.title: Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets [en_US]
dc.type: Article/Review [en_US]
dc.source.journaltitle: GigaScience [en_US]
dc.source.volume: 5
dc.source.issue: 1
dc.description.version: VoR [en_US]
refterms.dateFOA: 2022-08-23T19:11:19Z
dc.description.institution: SUNY Downstate [en_US]
dc.description.department: Epidemiology and Biostatistics [en_US]
dc.description.degreelevel: N/A [en_US]


Files in this item

Name: gigascience.pdf
Size: 936.2Kb
Format: PDF

Except where otherwise noted, this item's license is described as Attribution 4.0 International