Genome Assemblies for Fraxinus excelsior

BATG-0.5

Download Assembly

Released 29/10/2015, assembled by Lizzy Sollars. Contigs were built in CLC Genomics Workbench from 454 reads and from paired reads with insert sizes of 200 bp, 300 bp, 500 bp and 5 kb. Scaffolding was performed in SSPACE using all paired reads: those listed above plus Long Jumping Distance (LJD) libraries of 3, 8, 20 and 40 kb. Gaps in the scaffolds were closed using GapCloser, and scaffolds were joined further using PBJelly with the 454 reads. A major update in this version is that the chloroplast and mitochondrial genomes have been separated from the nuclear genome; both are placed at the end of the assembly file. The chloroplast genome is contained in a single contig of 155,498 bp, named 'Cp1'. The mitochondrial genome is present as a draft in 26 contigs, named 'Mt#' (where # runs from 1 to 26). The table below gives statistics for the mitochondrial genome alongside those for the whole genome assembly.

  Contigs Scaffolds Mt Genome
Number 118,959 89,514 26
Total size 718.4 Mbp 867.5 Mbp 580,788 bp
Longest 209,591 bp 884,900 bp 184,534 bp
Shortest 326 bp 326 bp 326 bp
Number > 1K nt 68,628 40,777 25
Number > 10K nt 18,860 10,151 11
Number > 100K nt 210 2,522 1
Mean size 6,039 bp 9,691 bp 22,338 bp
Median size 1,228 bp 911 bp 6,487 bp
N50 length 25,341 bp 103,995 bp 60,627 bp
L50 count 7,922 2,389 3
A 32.87% 27.22% 27.49%
C 17.13% 14.19% 22.50%
G 17.14% 14.19% 22.29%
T 32.85% 27.20% 27.67%
N 0.00% 17.19% 0.04%
CEGMA complete hits   208 genes (84%)
CEGMA partial hits   238 genes (96%)
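
Because the organelle sequences sit at the end of the assembly file under distinct names, they can be separated from the nuclear scaffolds by name alone. The following is a minimal sketch, not part of the project's pipeline. It assumes the assembly is a standard FASTA file, that the sequence name is the first whitespace-delimited token of each header, and that the organelle contigs are named exactly 'Cp1' and 'Mt1' to 'Mt26'. The function names are illustrative.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)

# Assumed naming, based on the release notes: 'Cp1' and 'Mt1' to 'Mt26'.
CHLOROPLAST_NAME = re.compile(r"Cp1")
MITOCHONDRIAL_NAME = re.compile(r"Mt([1-9]|1[0-9]|2[0-6])")


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first token of the header as the sequence name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_compartment(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records into nuclear, chloroplast and mitochondrial sets by name."""
    groups: Dict[str, List[Record]] = {
        "nuclear": [],
        "chloroplast": [],
        "mitochondrial": [],
    }
    for name, seq in records:
        if CHLOROPLAST_NAME.fullmatch(name):
            groups["chloroplast"].append((name, seq))
        elif MITOCHONDRIAL_NAME.fullmatch(name):
            groups["mitochondrial"].append((name, seq))
        else:
            groups["nuclear"].append((name, seq))
    return groups
```

With a downloaded assembly, an open file handle can be passed straight to `read_fasta`, since it only iterates over lines. If the headers in the file differ from the assumed names, the two patterns need adjusting.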

BATG-0.4-CLCbioSSPACE

Download Assembly

Released 11/11/2013, assembled by Lizzy Sollars. Since the previous release, we have improved the contiguity of the assembly by scaffolding the CLC contigs with all paired read files and by lowering the SSPACE parameter '-k' (the minimum number of read pairs required to link two contigs) to 7. This reduced the number of N nucleotides inserted into gaps between contigs, and SOAP's GapCloser reduced the number of Ns further still.

  Contigs Scaffolds
Number 120,753 89,285
Total size 706,623,136 bp 875,243,685 bp
Longest 219,535 bp 696,341 bp
Shortest 500 bp 500 bp
Number > 1K nt 69,334 39,713
Number > 10K nt 19,491 10,818
Number > 100K nt 123 2,484
Mean size 5,852 bp 9,803 bp
Median size 1,246 bp 878 bp
N50 length 22,633 bp 98,766 bp
L50 count 8,788 2,526
A 32.87 % 26.54 %
C 17.13 % 13.83 %
G 17.14 % 13.84 %
T 32.85 % 26.52 %
N 0.00 % 19.27 %
CEGMA complete hits   220 genes (89 %)
CEGMA partial hits   241 genes (97 %)
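
The base composition rows, including the N percentage that tracks gap content in the scaffolds, can be recomputed from any set of sequences. The sketch below is one simple way to do this and is not the script used for the table; the function name is illustrative. Characters other than A, C, G, T and N (for example IUPAC ambiguity codes) are assumed to be rare: they count towards the total but are not reported.

```python
from collections import Counter
from typing import Dict, Iterable


def base_composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of A, C, G, T and N across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        # Upper-case first so soft-masked (lower-case) bases are counted too.
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGTN"}
    return {base: 100.0 * counts[base] / total for base in "ACGTN"}


if __name__ == "__main__":
    # Toy input only; real use would pass the scaffold sequences.
    for base, percent in base_composition(["ACGTNNNN", "aacc"]).items():
        print(f"{base} {percent:.2f} %")
```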

BATG-0.3-CLCbioSSPACE

Download Assembly

Released 23/09/2013, assembled by Lizzy Sollars. This represents a significant improvement on our previous release: the total size is now closer to the 877 Mbp genome size measured by flow cytometry, and the scaffold N50 length is about five times that of the previous release. Reads from paired libraries with insert sizes of 200 bp, 300 bp and 500 bp, and from Long Jumping Distance (LJD) libraries of 3 kb, 8 kb, 20 kb and 40 kb, were trimmed to a minimum Phred quality score of 20 and a minimum length of 50 bp, and adaptor and repetitive telomere sequences were removed. Reads were assembled de novo using CLC bio with a word size (k-mer) of 50 into contigs with a minimum length of 500 bp. The LJD pairs and the 454 contigs assembled using Newbler (BATG-0.1-Newbler) were used as guidance reads to resolve ambiguities in the de Bruijn graphs. Contigs were then scaffolded using the stand-alone tool SSPACE, and gaps in the scaffolds were closed using the GapCloser program from SOAP.

  Contigs Scaffolds
Number 180,582 142,021
Total size 719,081,378 bp 982,425,322 bp
Longest 174,200 bp 560,578 bp
Shortest 500 bp 500 bp
Number > 1K nt 96,147 (53.2%) 60,768 (42.8%)
Number > 10K nt 19,217 (10.6%) 14,113 (9.9%)
Mean size 3,982 bp 6,917 bp
Median size 1,084 bp 866 bp
N50 length 14,766 bp 68,494 bp
L50 count 12,859 3,996
A 32.89 % 24.07 %
C 17.13 % 12.54 %
G 17.13 % 12.54 %
T 32.85 % 24.05 %
N 0.00 % 26.81 %
CEGMA complete hits   214 genes (86 %)
CEGMA partial hits   242 genes (98 %)
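
The N50 length and L50 count reported in these tables follow the usual definitions: with sequences sorted from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the total assembly size, and L50 is the number of sequences needed to get there. A minimal sketch of that calculation is shown below; the function name is illustrative and this is not necessarily the script used to produce the tables.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50 length, L50 count) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first sequence that brings the sum to half the total.
        if 2 * running >= total:
            return length, count
    raise ValueError("no sequence lengths given")


if __name__ == "__main__":
    # Toy lengths only; real use would pass every contig or scaffold length.
    print(n50_l50([80, 70, 50, 40, 30, 20, 10]))
```

For reference, the scaffold N50 values quoted for this release and the previous one (68,494 bp and 14,228 bp) differ by a factor of about 4.8.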

BATG-0.2-CLCbio

Download Assembly

Released 11/06/2013, assembled by Lizzy Sollars. Illumina 100 bp reads were trimmed in two steps: bases with a Phred score below 15 were trimmed from the read ends, and reads shorter than 50 bp were then discarded. Paired reads from the 200 bp insert library were joined if they overlapped by at least 20 bp. A de novo assembly was performed with the CLC assembler using a word size (k-mer) of 64, with trimmed reads from the 200 bp, 300 bp and 500 bp insert-size Illumina libraries used to construct the de Bruijn graph. Reads from an 8 kb LJD library and the 454 contigs (assembly BATG-0.1-Newbler) were used to resolve ambiguities in the graph. All paired-end reads were used for scaffolding. The gaps in the scaffolds were then filled using the GapCloser program from SOAP, which approximately halved the number of 'N' nucleotides.
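
The two-step trimming rule can be illustrated with a short sketch. This is a simple reading of the description above (strip low-quality bases from both ends, then apply the length filter) and is not the trimming algorithm implemented in the CLC software, which may behave differently. The function names are illustrative, and the Phred+33 quality encoding in the helper is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def phred33_to_scores(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def trim_read(
    seq: str,
    quals: Sequence[int],
    min_quality: int = 15,
    min_length: int = 50,
) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality bases from both ends; return None if too short."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    # Step 1: remove bases below the threshold from the 5' and 3' ends.
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    # Step 2: discard reads that are too short after trimming.
    if end - start < min_length:
        return None
    return seq[start:end], list(quals[start:end])
```

Low-quality bases in the interior of a read are left untouched by this sketch; only the ends are trimmed.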

  Contigs Scaffolds
Number 358,807 283,188
Total size 1,214,865,037 bp 1,469,483,817 bp
Longest 141,171 bp 221,212 bp
Shortest 56 bp 344 bp
Number > 1K nt 197,676 (55.1%) 163,534 (57.7%)
Number > 10K nt 33,093 (9.2%) 45,273 (16.0%)
Mean size 3,386 bp 5,189 bp
Median size 1,156 bp 1,266 bp
N50 length 9,493 bp 14,228 bp
L50 count 35,755 30,458
A 32.86 % 27.16 %
C 17.14 % 14.17 %
G 17.13 % 14.16 %
T 32.86 % 27.17 %
N 0.01 % 17.34 %
CEGMA complete hits   189 genes (76 %)
CEGMA partial hits   237 genes (96 %)

BATG-0.1-Newbler

Download Assembly

Released 22/04/2013. This is the first genome assembly release of the British Ash Tree Genome project. It is based on 4.3X coverage of the ash genome by Roche 454 sequencing, and was assembled by Lizzy Sollars using Newbler and the CLC Genome Finishing Module. The options used in Newbler to make this assembly were: '-sl 32', '-urt', '-m', '-e 5'. Ten other assemblies were run in Newbler with different options, and this assembly was selected because it had the highest contig N50 length and the highest number of complete hits to core eukaryote genes using CEGMA (a search for 248 ultra-conserved core eukaryote genes). Statistics for this assembly are as follows:

Number of contigs 417,760
Total size of contigs 618,360,624 bp
Longest contig 51,710 bp
Shortest contig 100 bp
Number of contigs > 1K nt 209,375 (50.1%)
Number of contigs > 10K nt 1,549 (0.4%)
Mean contig size 1,480 bp
Median contig size 1,004 bp
N50 contig length 2,412 bp
L50 contig count 73,440
A 32.60 %
C 17.38 %
G 17.15 %
T 32.86 %
N 0.00 %
CEGMA complete hits 127 genes (51 %)
CEGMA partial hits 220 genes (89 %)
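
The CEGMA percentages quoted on this page are the number of hits divided by the 248 core genes in the CEGMA set, rounded to the nearest whole percent. The sketch below recomputes them from the hit counts given for each release; the function and variable names are illustrative.

```python
CEGMA_CORE_GENES = 248  # size of the CEGMA core eukaryote gene set

# (complete hits, partial hits) as reported for each release on this page.
CEGMA_HITS = {
    "BATG-0.5": (208, 238),
    "BATG-0.4-CLCbioSSPACE": (220, 241),
    "BATG-0.3-CLCbioSSPACE": (214, 242),
    "BATG-0.2-CLCbio": (189, 237),
    "BATG-0.1-Newbler": (127, 220),
}


def cegma_percent(hits: int, total: int = CEGMA_CORE_GENES) -> int:
    """Return hits as a whole-number percentage of the core gene set."""
    return round(100 * hits / total)


if __name__ == "__main__":
    for release, (complete, partial) in CEGMA_HITS.items():
        print(release, cegma_percent(complete), cegma_percent(partial))
```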

We ask all users of these assemblies to adhere to a data usage policy that can be found here.