Fraxinus uhdei genome assembly

We have made available here the preliminary genome assembly of Fraxinus uhdei, assembled by Laura Kelly and Endymion Cooper. These data are as yet unpublished. If you want to publish any analysis of these data you must either wait until we have published them in a journal, or contact Richard Buggs to negotiate a co-authored paper.

Released on 12th April 2017. The genome of Fraxinus uhdei was sequenced to a depth of approximately 51x (but see update below) on the Illumina HiSeq platform. Paired reads from libraries made from total genomic DNA, with approximate average insert sizes of 300 bp, 500 bp and 800 bp, were adapter trimmed and then filtered by length and quality. De novo assembly of the filtered read pairs, with a minimum read length of 50 nt, was conducted in the CLC Genomics Workbench with the following parameter settings: automatic optimization of word (k-mer) size; maximum size of bubble to try to resolve = 5000; minimum contig length = 200 bp. As total genomic DNA was sequenced and assembled, contigs in the assembly include those that originate from the organellar genomes as well as those from the nuclear genome. The assembly contained a single contig representing the Illumina PhiX control library; this contig was removed from the assembly.

Assembled contigs were joined to form scaffolds using SSPACE (version 3.0) with default parameters; library insert lengths were specified with a broad error range (i.e. ±40%). Gaps in the SSPACE scaffolds were filled using GapCloser (version 1.12) with default parameters; for this step, the average library insert lengths were set to the estimates produced by SSPACE during scaffolding.
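To give a feel for what a ±40% error range on library insert lengths means in practice, here is a minimal Python sketch. It assumes the error is applied symmetrically as a fraction of the mean insert length, and it uses the approximate insert sizes quoted above (300 bp, 500 bp and 800 bp) rather than the exact values given to the scaffolder, which are not stated here. The function and library names are hypothetical and are not part of SSPACE.

```python
def insert_size_window(mean_insert_bp, error_fraction=0.40):
    """Return the (min, max) insert length accepted for a read pair.

    Assumption: the error range is symmetric and expressed as a
    fraction of the mean insert length (0.40 means +/-40%).
    """
    low = round(mean_insert_bp * (1 - error_fraction))
    high = round(mean_insert_bp * (1 + error_fraction))
    return low, high


# Approximate library insert sizes from the text (illustrative names).
libraries = {"lib300": 300, "lib500": 500, "lib800": 800}

windows = {name: insert_size_window(size) for name, size in libraries.items()}

for name, (low, high) in windows.items():
    print(f"{name}: read pairs spanning {low}-{high} bp would be accepted")
```

Under these assumptions the windows of neighbouring libraries overlap, which is one consequence of choosing a broad error range.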

Update 14th August 2017: a recent C-value estimate (generated by Alan Whittemore and colleagues at the US National Arboretum) indicates that the individual used for genome sequencing is a hexaploid; the estimated genome size in the table below has been updated to reflect this new C-value and the genome coverage is now estimated to be c. 18x.
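The change in estimated coverage follows from the usual relationship, coverage = total sequenced bases / genome size. The Python sketch below is a rough back-calculation from the rounded figures on this page (51x, c. 18x and c. 2873 Mbp); the implied sequencing yield and the ratio of genome size estimates are derived approximations, not values reported by the authors.

```python
def coverage(total_bases_mbp, genome_size_mbp):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases_mbp / genome_size_mbp


# Rounded values taken from this page.
updated_genome_size_mbp = 2873   # "c. 2873" in the statistics table
updated_coverage = 18            # "c. 18x" after the C-value update
original_coverage = 51           # "approximately 51x" at first release

# Approximate total yield implied by the updated figures (an estimate).
implied_yield_mbp = updated_coverage * updated_genome_size_mbp

# For a fixed yield, coverage is inversely proportional to genome size,
# so this ratio approximates how much the genome size estimate grew.
genome_size_ratio = original_coverage / updated_coverage

print(f"Implied yield: about {implied_yield_mbp / 1000:.1f} Gbp")
print(f"Genome size estimate grew roughly {genome_size_ratio:.1f}-fold")
```

The sequencing data themselves did not change; only the denominator did.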

Assembly statistics
Number of scaffolds 754,045
Assembly size (Mbp) 757.8
Estimated genome size (Mbp) c. 2,873
# scaffolds > 1,000 bp 171,284
# scaffolds > 10,000 bp 5,848
Largest scaffold (bp) 196,289
Smallest scaffold (bp) 200
GC (%) 34.6
N50 (bp) 2,413
L50 71,645
Ns (%) 0.07
Complete BUSCOs [% searched] 1,121 [77.8%]
Complete and single-copy BUSCOs 971 [67.4%]
Complete and duplicated BUSCOs 150 [10.4%]
Fragmented BUSCOs 124 [8.6%]
Missing BUSCOs 195 [13.5%]
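For readers unfamiliar with the N50 and L50 values in the table, the following Python sketch shows one common way to compute them from a list of scaffold lengths. It uses a small made-up list of lengths for illustration, not the real assembly, and the function name is hypothetical; assembly tools may differ in details such as how ties are handled.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size. L50 is the number of scaffolds needed to get there.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (made-up lengths in bp, not taken from the assembly).
toy_lengths = [200, 300, 400, 500, 600, 700, 800, 900, 1000]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

Read this way, the table says that the 71,645 longest scaffolds, each at least 2,413 bp long, together make up half of the 757.8 Mbp assembly.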

As a public service, preliminary sequences of this genome are being made available before scientific publication. The purpose of this policy is to balance the desire that the ash genomes be made available to the scientific community as soon as possible with the reasonable expectation that the group responsible for the sequencing will publish their results in peer-reviewed journals without concerns about potential pre-emption by other groups that did not directly participate in the effort.

These pre-publication data are preliminary and may contain errors. The goal of our policy is that early release should enable the progress of science. By accessing these data, you agree not to submit to scientific journals any articles containing analyses of these data prior to the peer-reviewed journal publication, by us and our collaborators, of a comprehensive genome analysis.

Any analyses involving these data are covered by this data usage policy, including annotation of genes, identification of sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, etc., and whole-genome comparisons of regions of among-species conservation. Also included are uses of the genome data as a reference for transcriptomic and related analyses (RNA-seq, bisulfite-seq, ChIP-seq or similar). Interested parties are encouraged to contact the principal investigator if they wish to discuss the possibility of collaborative publication of such analyses.

The data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. In the period before peer-reviewed journal publication, the assembly and raw sequence reads should not be redistributed or repackaged without the permission of Richard Buggs.

Once moved to unreserved status, the data will be freely available for any subsequent use.

By downloading these data you are agreeing to the terms outlined above.

Proceed to data download