Fraxinus pennsylvanica genome assembly [accession PE_00248]

We have made available here the preliminary genome assembly of Fraxinus pennsylvanica [accession PE_00248], assembled by Laura Kelly and Endymion Cooper, using materials provided by Jennifer Koch (USDA Forest Service). These data are as yet unpublished. If you want to publish any analysis of these data you must either wait until we have published them in a journal, or contact Richard Buggs to negotiate a co-authored paper.

Released on 23rd February 2017.

The genome of Fraxinus pennsylvanica was sequenced to a depth of approximately 41x on the Illumina NextSeq and HiSeq platforms. Paired reads from libraries made from total genomic DNA, with an approximate average insert size of 500 bp, were adapter trimmed and then filtered by length and quality.

De novo assembly of the filtered read pairs, with a minimum read length of 50 nt, was conducted in the CLC Genomics Workbench with the following parameter settings: automatic optimization of word (k-mer) size; maximum size of bubble to try to resolve = 5000; minimum contig length = 200 bp. As total genomic DNA was sequenced and assembled, contigs in the assembly include those that originate from the organellar genomes as well as those from the nuclear genome. The assembly contained a single contig representing the Illumina PhiX control library; this contig was removed from the assembly.

Assembled contigs were joined to form scaffolds using SSPACE (version 3.0) with default parameters, incorporating data from mate-pair libraries with 3 kb and 10 kb insert sizes. Library insert lengths were specified with a broad error range (i.e. ±40%). Gaps in the SSPACE scaffolds were filled using GapCloser (version 1.12) with default parameters; for this step, the average library insert lengths were set to the estimates produced by SSPACE during scaffolding.
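To make the ±40% error range concrete, the short Python sketch below works out the window of insert sizes that such a tolerance implies for the two mate-pair libraries. It assumes nominal means of exactly 3000 bp and 10000 bp, and the function name insert_size_window is hypothetical; it illustrates the arithmetic only and is not part of SSPACE.

```python
def insert_size_window(mean_insert_bp, error_fraction):
    """Return the (min, max) insert size implied by a mean and a +/- tolerance."""
    delta = mean_insert_bp * error_fraction
    return round(mean_insert_bp - delta), round(mean_insert_bp + delta)


# Nominal library means are assumptions taken from the "3 kb" and "10 kb" labels.
for name, mean_bp in [("3 kb mate-pair", 3000), ("10 kb mate-pair", 10000)]:
    low, high = insert_size_window(mean_bp, 0.40)
    print(f"{name}: {low}-{high} bp")
```

Under these assumptions the accepted windows are 1800-4200 bp and 6000-14000 bp respectively, which shows how permissive a ±40% setting is.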


Assembly statistics
Number of scaffolds 555,484
Assembly size (Mbp) 902.5
Estimated genome size (Mbp) 893
# scaffolds > 1000 bp 74,643
# scaffolds > 10000 bp 20,214
Largest scaffold (bp) 524,151
Smallest scaffold (bp) 200
GC (%) 33.2
N50 (bp) 18,659
L50 11,707
Ns (%) 11.4
Complete BUSCOs [% searched] 1,266 [87.9%]
Complete and single-copy BUSCOs 1,045 [72.6%]
Complete and duplicated BUSCOs 221 [15.3%]
Fragmented BUSCOs 42 [2.9%]
Missing BUSCOs 132 [9.2%]
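
For readers unfamiliar with the N50 and L50 figures above: under the usual definition, the L50 is the smallest number of scaffolds, taken from longest to shortest, whose combined length reaches at least half of the assembly size, and the N50 is the length of the shortest scaffold in that set. The Python sketch below shows that calculation. The function name n50_l50 and the toy lengths are illustrative assumptions; the software used to produce the statistics in the table is not stated here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest scaffold needed to reach half the assembly
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (made up for illustration, not from this assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy))
```

Read this way, the reported values mean that the 11,707 longest scaffolds hold at least half of the assembled bases, and the shortest of those scaffolds is 18,659 bp long.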

As a public service, preliminary sequences of this genome are being made available before scientific publication. The purpose of this policy is to balance the desire that the ash genomes be made available to the scientific community as soon as possible with the reasonable expectation that the group responsible for the sequencing will publish their results in peer-reviewed journals without concerns about potential pre-emption by other groups that did not directly participate in the effort.

These pre-publication data are preliminary and may contain errors. The goal of our policy is that early release should enable the progress of science. By accessing these data, you agree not to submit to scientific journals any articles containing analyses of these data before we and our collaborators have published a comprehensive genome analysis in a peer-reviewed journal.

Any analyses involving these data are covered by this data usage policy, including annotation of genes, identification of sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, etc., and whole-genome comparisons of regions of among-species conservation. Also covered are uses of the genome data as a reference for transcriptomic and related analyses (RNA-seq, bisulfite-seq, ChIP-seq or similar). Interested parties are encouraged to contact the principal investigator if they wish to discuss the possibility of collaborative publication of such analyses.

The data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. In the period before peer-reviewed journal publication, the assembly and raw sequence reads should not be redistributed or repackaged without the permission of Richard Buggs.

Once moved to unreserved status, the data will be freely available for any subsequent use.

By downloading these data you are agreeing to the terms outlined above.

Proceed to data download