Data in highlighted phylogenies
Lineage | Level | Taxa | Root |
---|---|---|---|
Viridiplantae | Genera | 513 | Klebsormidium nitens |
Liliopsida | Species | 182 | Amborella trichopoda |
Eudicots | Species | 817 | Amborella trichopoda |
Chlorophyta | Species | 83 | Klebsormidium nitens |
Fungi | Genera | 989 | Polychytrium aggregatum |
Ascomycota | Genera | 591 | Saitoella complicata |
Basidiomycota | Species | 782 | Wallemia mellicola |
Metazoa | Genera | 2405 | Amphimedon queenslandica |
Arthropoda | Genera | 993 | Limulus polyphemus |
Vertebrata | Genera | 1199 | Callorhinchus milii |
Bulk research data
Data file: Compiled genome statistics.
Description: NCBI metadata, assembly quality statistics, BUSCO annotation statistics and taxonomic information for all genomes analyzed.
Data file: CUSCO genes.
Description: Curated BUSCO gene sets (CUSCOs) for 10 lineages that have up to 7% higher precision in annotations.
Data file: BUSCO gene length statistics.
Description: Annotated gene length summary statistics for all BUSCO genes in 10 lineages across all taxa.
Data files: Gene alignments.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Aligned FASTA files of the gene alignments, produced with MUSCLE v5.
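Because these files are aligned, every sequence within one file should have the same number of columns. The following is a minimal sketch of a sanity check for a downloaded alignment. It assumes plain FASTA with `>` header lines; the function names and the example records are illustrative and are not taken from the archive.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a {header: sequence} dict."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_length(records):
    """Return the shared column count, or raise if rows differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows differ in length: {sorted(lengths)}")
    return lengths.pop()


# Made-up two-taxon alignment, wrapped over two lines per record.
example = [">taxon_A", "MKT-LLV", "A", ">taxon_B", "MKTALL-", "A"]
records = read_fasta(example)
print(alignment_length(records))
```

For a real file, the same functions can be applied to an open file handle instead of the `example` list.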
Data file: Treefiles.
Description: An account of the 3,566 computed trees is provided in the research description. There are three sets of trees.
1. One tree each per lineage computed with amino acid states ranging from 2-14 and alignment lengths ranging from 1,000-15,000.
2. For five selected lineages (Eudicots, Ascomycota, Basidiomycota, Arthropoda, Vertebrata), 10 trees each from 5 sets of sampled sites under 9 conditions in total (3 rates x 3 alignment lengths). The rate profiles used amino acid states 2, 8 and 14. Alignment lengths were 1,000, 5,000 and 10,000.
3. Based on the results, sets of about 50-100 trees for all 10 lineages of the highest possible rate configurations. Amino acid states 8 and 14 were also included.
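The second set of trees follows a full factorial design. As a minimal sketch, the grid can be enumerated from the numbers given in item 2. The variable names are illustrative, and nothing is assumed about how the tree files are named or organised in the archive.

```python
from itertools import product

# Values taken from the description of the second tree set.
lineages = ["Eudicots", "Ascomycota", "Basidiomycota", "Arthropoda", "Vertebrata"]
aa_states = [2, 8, 14]                      # rate profiles (amino acid states)
alignment_lengths = [1_000, 5_000, 10_000]  # alignment lengths in sites
site_samples = 5                            # sets of sampled sites per condition
trees_per_sample = 10                       # trees computed from each sampled set

conditions = list(product(aa_states, alignment_lengths))
trees_per_condition = site_samples * trees_per_sample
total_trees = len(lineages) * len(conditions) * trees_per_condition

print(len(conditions))       # conditions per lineage
print(trees_per_condition)   # trees per condition and lineage
print(total_trees)           # trees in the second set under this reading
```

Under this reading, each lineage contributes 9 conditions of 50 trees, so the second set accounts for 2,250 of the computed trees.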
Data file: Treeset taxonomic congruity.
Description: For sets of 50 trees created under 9 experimental conditions (3 rates x 3 alignment lengths) for the 5 tested lineages, the extent of taxonomic congruity is measured by the number of families resolved as monophyletic by the phylogenies.
There are a total of 543 families that were tested.
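A family counts as resolved when its member taxa form exactly one clade of the rooted tree. The sketch below illustrates that criterion on a toy tree written as nested tuples. The representation, the function names and the example taxa are assumptions made for illustration and do not reproduce the code used to generate this data file.

```python
def leaves(clade):
    """Return the set of leaf names under a clade (nested tuples, str leaves)."""
    if isinstance(clade, str):
        return {clade}
    result = set()
    for child in clade:
        result |= leaves(child)
    return result


def clade_leaf_sets(tree):
    """Collect the leaf set of every clade in the tree, single leaves included."""
    sets = {frozenset(leaves(tree))}
    if not isinstance(tree, str):
        for child in tree:
            sets |= clade_leaf_sets(child)
    return sets


def count_monophyletic(tree, family_of):
    """Count families whose members form exactly one clade of the rooted tree."""
    clades = clade_leaf_sets(tree)
    members = {}
    for taxon, family in family_of.items():
        members.setdefault(family, set()).add(taxon)
    # Note: a family with a single sampled taxon is trivially monophyletic here.
    return sum(1 for taxa in members.values() if frozenset(taxa) in clades)


# Toy rooted tree: families A and C are clades, family B is split.
tree = ((("a1", "a2"), "b1"), ("b2", ("c1", "c2")))
family_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
print(count_monophyletic(tree, family_of))
```

The check depends on the rooting, which is why each phylogeny is rooted on the outgroup listed in the table at the top of this page.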
Data file: Conserved gene blocks.
Description: For the 10 BUSCO lineages, collinear gene blocks of identified (true) and remnant (null) BUSCO genes that were found to be conserved across very long divergence times were extracted.
Searches for gene blocks of up to about 8 genes were computationally feasible, and the 10 gene blocks with the highest incidence have been cataloged.
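To illustrate what ranking gene blocks by incidence means, here is a naive sketch that counts fixed-size windows of consecutive genes across assemblies. It is not the method used to produce this data file. It assumes one ordered gene list per assembly, treats a block and its reverse as the same block, and uses made-up gene and assembly names.

```python
from collections import Counter


def blocks_in(order, k):
    """Return the set of k-gene windows in one gene order, orientation-agnostic."""
    found = set()
    for i in range(len(order) - k + 1):
        window = tuple(order[i:i + k])
        # Keep one canonical form so a block matches when read from either end.
        found.add(min(window, window[::-1]))
    return found


def top_blocks(gene_orders, k, n=10):
    """Rank k-gene blocks by the number of assemblies that contain them."""
    incidence = Counter()
    for order in gene_orders.values():
        incidence.update(blocks_in(order, k))
    return incidence.most_common(n)


# Hypothetical gene orders along one chromosome in three assemblies.
gene_orders = {
    "asm1": ["g1", "g2", "g3", "g4"],
    "asm2": ["g4", "g3", "g2", "g9"],
    "asm3": ["g2", "g3", "g7", "g1"],
}
print(top_blocks(gene_orders, k=2, n=1))
```

The number of candidate windows grows with both block size and the number of assemblies, which is consistent with the search being limited to blocks of about 8 genes.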
Data file: Synteny plots of Oryza chromosomes.
Description: The Oryza genus is presented as a case study demonstrating the utility of BUSCO syntenic information in assembly evaluation, because highly contiguous reference assemblies are available for several species within the genus.
The synteny plots are split by chromosome.
Data files: Compleasm annotations for all assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies. These have been referred to as true or identified genes.
Data files: Compleasm annotations for chromosome-level assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for chromosome-level assemblies. These assemblies are used during phyca (collinearity) analysis.
Data files: Compleasm annotations for all BUSCO-depleted assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies after deleting all BUSCO genes from the assemblies. These have been referred to as null or remnant genes.
Data files: Gene trees for comparisons to coalescent methods.
Eudicots
Ascomycota
Basidiomycota
Arthropoda
Vertebrata
Description: Gene trees were created for 5 lineages for comparing the BUSCO-concatenation method to multi-species coalescent methods, and for computing gene concordance factors.
Data files: BUSCO gene loss data.
Full table formatted in Excel.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: BUSCO gene loss events along the phylogenies. In the Excel file, each sheet represents a BUSCO lineage, with genes on the horizontal axis and ordered leaves of the inferred phylogeny on the vertical axis. The cells contain sequence similarity to the ODB_v10 database query genes. The top 100 genes (57 for Viridiplantae) with the most frequently detected loss events are also visualized in PDF files for each lineage.