Pseudomonas syringae

From Cbcb
Revision as of 17:11, 30 October 2007 by Dpuiu

Pseudomonas syringae pv. tomato strain DC3000

Originally sequenced and finished at TIGR; published September 2003.

NCBI:

 Assembly Archive (AA): no assembly
 Trace Archive (TA): 80,959 reads
 Genome Project
 Taxonomy TaxId=223283

UNC:

New sequence:

 * Solexa: 3 lanes
 * 454 shotgun: 1/4 plate (250 bp reads)
 * 454 paired ends: 1/4 plate
     * each read contains a 44 bp linker in the middle
     * the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
     * a few 454 paired-end reads contain multiple tandem copies of the linker, e.g. EUEIEUN01ANUGL_length=128_xy=0154_1891
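Splitting a paired-end read into its two mates amounts to cutting it at the linker. The sketch below is one minimal way to do that, assuming exact linker matches only (a linker copy with a sequencing error is missed). The function name and the min_mate_len parameter are hypothetical. Because this linker is its own reverse complement, searching one strand finds it in reads from either strand.

```python
from typing import List, Tuple

# 44-bp 454 paired-end linker as given on this page.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_paired_end(read: str, linker: str = LINKER,
                     min_mate_len: int = 1) -> Tuple[List[str], int]:
    """Split a 454 paired-end read at every exact, non-overlapping linker copy.

    Returns (mates, n_linkers). Tandem linker copies leave empty pieces
    between them; pieces shorter than min_mate_len are dropped. Linker
    copies containing sequencing errors are not detected (exact match only).
    """
    pieces: List[str] = []
    n_linkers = 0
    start = 0
    while True:
        hit = read.find(linker, start)
        if hit == -1:
            break
        pieces.append(read[start:hit])   # sequence before this linker copy
        n_linkers += 1
        start = hit + len(linker)        # resume the search after the copy
    pieces.append(read[start:])          # sequence after the last copy
    mates = [p for p in pieces if len(p) >= min_mate_len]
    return mates, n_linkers
```

Reads that come back with n_linkers greater than 1, such as the tandem example above, can then be counted or set aside separately.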

UNC sequence data:

 http://biology622.dhcp.unc.edu/~labweb/DCData/

UNC assembly:

 * The theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
 * Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
 * 98.7% of the genome is covered by a contig.
 * 84% of the genome is covered by contigs of 10,000 bp or longer.
 * The average gap size between contigs is 98 bp.
 * The average contig size is 7,401 bp.
 * The N50 is 37,444 bp.
 * Our largest BAMBUS "scaffold" is 2,565,761 bp.
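Coverage and gap figures like these depend on where the contigs land on the reference. A minimal sketch of one way to compute them, assuming each contig has already been placed on the reference as a 0-based, half-open interval (for example from nucmer alignment coordinates). The function name and interval convention are assumptions; this is not necessarily how the numbers above were produced, and whether the 98 bp average includes negative gaps is not stated.

```python
from typing import List, Tuple

# A contig's placement on the reference: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def coverage_and_gaps(intervals: List[Interval],
                      genome_len: int) -> Tuple[float, List[int]]:
    """Return (fraction of reference covered, gaps between consecutive contigs).

    Contigs are ordered by start position. A gap is next.start - prev.end,
    so a negative value means two neighboring contigs overlap.
    """
    if not intervals:
        return 0.0, []
    ordered = sorted(intervals)
    covered = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start > cur_end:              # disjoint: close the current block
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                            # overlapping or touching: extend it
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return covered / genome_len, gaps
```

For example, contigs at [0, 100), [150, 300) and [290, 400) on a 500 bp reference give a covered fraction of 0.7 and gaps of [50, -10].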

Data stats (all lengths in bp):

 file                            #elem    min      median   max      sum        mean     stdev  n50      description
 DC3000.format.454Reads.fna      123992   38       86       329      15623908   126.01   58.89  142      DC3000 Paired End Reads
 DC3000.TCA.454reads.format.fna  77466    35       244      371      18627363   240.46   26.85  245      DC3000 454 Reads
 DC3000.reads.filtered.fasta     6340136  32       32       32       202884352  32       0      32       DC3000 Solexa Reads
 DC3000Plasmids.fa               2        67473    73661    73661    141134     70567    3094   73661    Pseudomonas syringae pv. tomato DC3000 Plasmids
 Psudomonas_syringae.fa          1        6397126  6397126  6397126  6397126    6397126  0      6397126  Pseudomonas syringae pv. tomato DC3000 reference
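The columns of the Data stats table can be recomputed from any FASTA file. A minimal sketch with hypothetical function names: the plasmid row (two sequences of 67,473 and 73,661 bp reported with median 73,661 and stdev 3,094) matches the upper median and the population standard deviation, so the sketch uses statistics.median_high and statistics.pstdev. N50 is taken as the largest length L such that sequences of length at least L hold at least half of the total.

```python
import statistics
from typing import Dict, Iterable, List, Union


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA text (multi-line records allowed)."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that sequences of length >= L cover at least half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def length_stats(lengths: List[int]) -> Dict[str, Union[int, float]]:
    """One 'Data stats' row: upper median and population stdev, as in the table."""
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "median": statistics.median_high(lengths),
        "max": max(lengths),
        "sum": sum(lengths),
        "mean": round(statistics.mean(lengths), 2),
        "stdev": round(statistics.pstdev(lengths), 2),
        "n50": n50(lengths),
    }
```

Usage: with open("DC3000Plasmids.fa") as fh: print(length_stats(fasta_lengths(fh))).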

Files location:

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly

Best CBCB assemblies:

1. AMOScmp

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1009_AMOSCmp-relaxed
 142 contigs (37 negative gaps)
 Based on a mix of 454 single-end reads and Solexa reads (no 454 paired ends).
 No read trimming was done.
 AMOScmp used the following parameters:
   nucmer -c 20
   casm-layout -t 20 -o 5
 "-t 20" allows dirty sequence ends up to 20 bp long, which seems to solve the "low quality" problem.
 22 large contigs
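The "-t 20" note describes tolerating up to 20 bp of dirty sequence at read ends. The following is a hypothetical illustration of such an end-tolerance rule, not casm-layout's code: the names ReadAlignment, accept and split_by_dirty_ends are invented, and the assumption is that an alignment is kept only when neither unaligned read end exceeds the limit.

```python
from typing import List, NamedTuple, Tuple


class ReadAlignment(NamedTuple):
    read_len: int   # full read length
    aln_start: int  # first aligned read base (0-based)
    aln_end: int    # one past the last aligned read base


def accept(aln: ReadAlignment, max_dirty_end: int = 20) -> bool:
    """Hypothetical rule: keep the alignment if each unaligned end is <= max_dirty_end bp."""
    left_dirty = aln.aln_start
    right_dirty = aln.read_len - aln.aln_end
    return left_dirty <= max_dirty_end and right_dirty <= max_dirty_end


def split_by_dirty_ends(alns: List[ReadAlignment], max_dirty_end: int = 20
                        ) -> Tuple[List[ReadAlignment], List[ReadAlignment]]:
    """Partition alignments into (accepted, rejected) under the rule above."""
    accepted = [a for a in alns if accept(a, max_dirty_end)]
    rejected = [a for a in alns if not accept(a, max_dirty_end)]
    return accepted, rejected
```

With max_dirty_end=0 the rule only keeps end-to-end alignments, which is one way to picture why a stricter setting would lose reads with low-quality tails.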

2. AMOScmp

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1015_AMOSCmp-relaxed-MAJORITY50
 131 contigs (18 negative gaps)
 Based on a mix of 454 single-end reads and Solexa reads (no 454 paired ends).
 No read trimming was done.
 AMOScmp used the following parameters:
   nucmer -c 20
   casm-layout -t 20 -o 5 -m 50
 "-t 20" allows dirty sequence ends up to 20 bp long, which seems to solve the "low quality" problem.
 "-m 50" merges some contigs together.
 10 large contigs
 contig  length (bp)  GC (%)
 4       2290968      59.00
 7       1817904      58.18
 3       1405326      58.08
 5       648413       58.48
 2       192413       57.86
 6       87152        58.02
 131     71251        56.47
 1       32939        54.86
 130     29120        59.36
 9       20309        53.56
 95      3589         59.46
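A contig table like the one above (length and GC content, longest first) can be regenerated from the contig sequences. A minimal sketch with hypothetical names, assuming GC% is computed over unambiguous A/C/G/T bases only; the table does not say how N or other ambiguity codes were treated.

```python
from typing import Dict, List, Tuple


def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def contig_table(contigs: Dict[str, str]) -> List[Tuple[str, int, float]]:
    """(name, length in bp, GC %) rows, longest contig first."""
    rows = [(name, len(seq), round(gc_percent(seq), 2))
            for name, seq in contigs.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Contigs whose GC content departs noticeably from the rest (such as those near 54% above) are natural candidates for a closer look, e.g. for plasmid or repeat origin.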

3. AMOScmp of Sanger reads

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
 Many mis-oriented mate pairs in the 4.8-5.0 Mbp region of the chromosome
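One way to locate a pile of mis-oriented mates is to check each pair's strand orientation along the reference. A minimal sketch, assuming Sanger clone-end mates are expected to face each other (left mate on the forward strand, right mate on the reverse strand) and that each mate is described by its leftmost aligned coordinate; MatePair, is_innie and misoriented_in_window are hypothetical names.

```python
from typing import List, NamedTuple


class MatePair(NamedTuple):
    pos_a: int    # leftmost aligned reference coordinate of mate A
    fwd_a: bool   # True if mate A aligns to the forward strand
    pos_b: int    # leftmost aligned reference coordinate of mate B
    fwd_b: bool   # True if mate B aligns to the forward strand


def is_innie(pair: MatePair) -> bool:
    """True if the mates face each other: left mate forward, right mate reverse."""
    if pair.pos_a <= pair.pos_b:
        left_fwd, right_fwd = pair.fwd_a, pair.fwd_b
    else:
        left_fwd, right_fwd = pair.fwd_b, pair.fwd_a
    return left_fwd and not right_fwd


def misoriented_in_window(pairs: List[MatePair], start: int,
                          end: int) -> List[MatePair]:
    """Pairs with both mates in [start, end) that are not in innie orientation."""
    return [
        p for p in pairs
        if start <= min(p.pos_a, p.pos_b) and max(p.pos_a, p.pos_b) < end
        and not is_innie(p)
    ]
```

For the region above one would call misoriented_in_window(pairs, 4_800_000, 5_000_000); a dense cluster of hits there is the kind of signal that suggests a local inversion or misassembly relative to the reference.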

4. Celera 3.11 assembly of Sanger reads

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
 Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid
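The page does not say how this scaffold was judged circular; the evidence may well have been mate pairs spanning its two ends. One simple sequence-level heuristic is to look for an exact overlap between the scaffold's suffix and its prefix. A minimal sketch, with the function name and the min_overlap/max_overlap bounds as assumptions:

```python
def end_overlap(seq: str, min_overlap: int = 50, max_overlap: int = 5000) -> int:
    """Length of the longest exact suffix == prefix overlap within bounds, else 0.

    A long overlap between the end and the start of a sequence is consistent
    with (but does not prove) a circular molecule such as a plasmid.
    """
    seq = seq.upper()
    upper = min(max_overlap, len(seq) - 1)
    for k in range(upper, min_overlap - 1, -1):   # try the longest overlap first
        if seq[-k:] == seq[:k]:
            return k
    return 0
```

Exact matching is strict: assembled ends often carry a few errors, so a real check would usually tolerate mismatches or align the two ends instead.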

5. AMOScmp of Sanger reads

 Reference = complete genome (chromosome + 3 plasmids) + the "circular contig" from the Celera 3.11 assembly
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
 38 contigs: 15 for the main chromosome, 1 for the longer plasmid, 21 for the shorter plasmid, and 1 for the "circular contig"
 The pile of mis-oriented reads on the chromosome seen in assembly 3 (AMOScmp of Sanger reads) has disappeared.