Pseudodomonas syringae: Difference between revisions
Revision as of 18:42, 27 November 2007
Pseudomonas syringae pv. tomato str. DC3000
Originally sequenced and finished at TIGR: published Sept 2003
NCBI:
AA: no assembly; TA: 80,959 reads; Genome Project; Taxonomy TaxId=223283
UNC:
New sequence:
* Solexa: 3 lanes
* 454 shotgun: 1/4 plate (250 bp reads)
* 454 paired ends: 1/4 plate
** the reads contain a 44 bp linker in the middle
** the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
** some (not many) 454 paired-end sequences contain multiple instances of the linker in tandem. Example: EUEIEUN01ANUGL_length=128_xy=0154_1891
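The linker described above can be used to split a 454 paired-end read into its two mates. The following is a minimal sketch only, not the procedure used for these assemblies: it assumes exact linker matches (real reads may carry sequencing errors inside the linker), and the function name `split_paired_read` is hypothetical.

```python
# Linker sequence as given on this page (44 bp).
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_paired_read(seq, linker=LINKER):
    """Split a paired-end read at exact copies of the linker.

    Returns (left_mate, right_mate, linker_count), or None when the
    read contains no exact copy of the linker. For tandem linkers the
    outermost flanks are returned and linker_count is greater than 1.
    """
    parts = seq.upper().split(linker)
    linker_count = len(parts) - 1
    if linker_count == 0:
        return None
    return parts[0], parts[-1], linker_count


if __name__ == "__main__":
    # Toy read (made-up flanks) with a single linker in the middle.
    print(split_paired_read("ACGTACGT" + LINKER + "TTGACCA"))
```

Reads with `linker_count > 1` correspond to the tandem-linker cases mentioned above and could be set aside for manual inspection.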
UNC sequence data:
http://biology622.dhcp.unc.edu/~labweb/DCData/
UNC assembly:
* Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
* Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
* 98.7% of the genome is covered by a contig.
* 84% of the genome is covered by contigs 10,000 bp or greater.
* The average gap size between contigs is 98 bp.
* The average contig size is 7,401 bp.
* The N50 is 37,444 bp.
* Our largest BAMBUS "scaffold" is 2,565,761 bp.
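For reference, the N50 and average contig size quoted above can be computed from a list of contig lengths. This is a sketch using the common definition of N50 (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the tool that produced the UNC numbers may use a slightly different convention.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk the contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def mean_length(total_bp, n_contigs):
    """Average contig size in bp."""
    return total_bp / n_contigs


if __name__ == "__main__":
    # The two plasmid lengths from the data stats on this page.
    print(n50([67473, 73661]))
    # 853 contigs totaling 6,313,026 bp, as reported for the UNC assembly.
    print(round(mean_length(6_313_026, 853)))
```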
Data stats
file                            #elem    min      median   max      sum        mean     stdev  n50      description
DC3000.format.454Reads.fna      123992   38       86       329      15623908   126.01   58.89  142      DC3000 Paired End Reads
DC3000.TCA.454reads.format.fna  77466    35       244      371      18627363   240.46   26.85  245      DC3000 454 Reads
DC3000.reads.filtered.fasta     6340136  32       32       32       202884352  32       0      32       DC3000 Solexa Reads
DC3000Plasmids.fa               2        67473    73661    73661    141134     70567    3094   73661    Pseudomonas syringae pv. tomato DC3000 Plasmids
Psudomonas_syringae.fa          1        6397126  6397126  6397126  6397126    6397126  0      6397126  Pseudomonas syringae pv. tomato DC3000 reference
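The "sum" column above gives the total bases per read set, so a rough depth of coverage follows by dividing by the reference length. The sketch below is only an estimate under stated assumptions: it uses the 6,397,126 bp reference from the table as the genome size, ignores the plasmids, and assumes every base maps to the reference. The short dictionary labels are illustrative names, not file names.

```python
# Length of the reference listed in the table above (bp).
GENOME_SIZE = 6_397_126

# Total bases per read set, copied from the "sum" column.
TOTAL_BASES = {
    "454 paired ends": 15_623_908,
    "454 shotgun": 18_627_363,
    "Solexa": 202_884_352,
}


def coverage(total_bases, genome_size=GENOME_SIZE):
    """Approximate depth of coverage: sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    for name, bases in TOTAL_BASES.items():
        print(f"{name}: {coverage(bases):.1f}x")
```

Under these assumptions the Solexa reads contribute roughly 32x coverage and each 454 data set contributes under 3x.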
Files location:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly
Best CBCB assemblies:
1. AMOSCmp
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1009_AMOSCmp-relaxed
* 142 contigs (37 negative gaps), based on the mix of 454 single reads + Solexa reads (no 454 paired ends)
* 22 large contigs
* No read trimming was done.
* AMOScmp used the following parameters:
** nucmer -c 20
** casm-layout -t 20 -o 5
* "-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
2. AMOSCmp
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1015_AMOSCmp-relaxed-MAJORITY50
* 131 contigs (18 negative gaps), based on the mix of 454 single reads + Solexa reads (no 454 paired ends)
* 10 large contigs
* No read trimming was done.
* AMOScmp used the following parameters:
** nucmer -c 20
** casm-layout -t 20 -o 5 -m 50
* "-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
* "-m 50" merges some contigs together.
contig#  len      gc%
4        2290968  59.00
7        1817904  58.18
3        1405326  58.08
5        648413   58.48
2        192413   57.86
6        87152    58.02
131      71251    56.47
1        32939    54.86
130      29120    59.36
9        20309    53.56
95       3589     59.46
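The gc% column above is the percentage of G and C bases in each contig. A minimal sketch of that calculation follows; it assumes ambiguous bases such as N count toward the length but not toward G+C, which may differ from the tool that produced the table.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    # Toy sequence, not taken from the assembly.
    print(f"{gc_percent('ATGCGC'):.2f}")
```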
3. AMOSCmp of Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
* Many mis-oriented mates in the 4.8M-5M region of the chromosome
4. Celera 3.11 of Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
* Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid
5. AMOSCmp of Sanger reads
Reference = complete genome (chromosome + 3 plasmids) + "circular contig" in Celera 3.11 assembly
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
* 38 contigs: 15 for main chromosome, 1 for longer plasmid, 21 for shorter plasmid, 1 for "circular contig"
* The mis-oriented read pile on the chromosome (seen in 3. AMOSCmp of Sanger reads) has disappeared.
* AA ready for submission: /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz