Pseudomonas syringae
Revision as of 21:14, 28 November 2007
Pseudomonas syringae pv. tomato str. DC3000
Originally sequenced and finished at TIGR: published Sept 2003
NCBI:
* AA: no assembly
* TA: 80,959 reads
* Genome Project
* Taxonomy: TaxId=223283
Chromosome + 2 plasmids:
Name         Length     %GC
NC_004578.1  6,397,126  58.40
NC_004633.1     73,661  55.15
NC_004632.1     67,473  56.17
UNC: Jeff Dangl
New sequence:
* Solexa: 3 lanes
* 454 shotgun: 1/4 plate (250 bp reads)
* 454 paired ends: 1/4 plate
  * The reads contain a 44 bp linker in the middle.
  * The linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
  * Some (not many) 454 paired-end sequences contain multiple tandem instances of the linker. Example: EUEIEUN01ANUGL_length=128_xy=0154_1891
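Because the linker sits in the middle of each 454 paired-end read, the two mates can be recovered by cutting the read at the linker. The following is a minimal sketch, not the procedure actually used for these assemblies: the function name `split_paired_read` is hypothetical, it only finds exact copies of the linker (sequencing errors inside the linker are not handled), and it treats tandem linker copies as a single break point.

```python
# 44 bp linker sequence quoted in the notes above.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_paired_read(seq, linker=LINKER):
    """Split a 454 paired-end read at exact copies of the linker.

    Returns (left, right) when exactly two non-empty fragments remain
    after removing the linker copies. Returns None when the read has no
    linker, or when it does not yield exactly two non-empty fragments
    (for example, linker copies separated by extra sequence).
    """
    seq = seq.upper()
    if linker not in seq:
        return None
    # str.split leaves empty strings between tandem linker copies
    # and at the read ends; drop them.
    fragments = [part for part in seq.split(linker) if part]
    if len(fragments) != 2:
        return None
    return fragments[0], fragments[1]


if __name__ == "__main__":
    read = "ACGT" + LINKER + LINKER + "TTGA"  # tandem linker example
    print(split_paired_read(read))
```

Reads for which the function returns None could be set aside and used as unpaired reads or inspected by hand.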
UNC sequence data (may no longer be available):
http://biology622.dhcp.unc.edu/~labweb/DCData/
UNC assembly:
* The theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
* Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
* 98.7% of the genome is covered by a contig.
* 84% of the genome is covered by contigs of 10,000 bp or greater.
* The average gap size between contigs is 98 bp.
* The average contig size is 7,401 bp.
* The N50 is 37,444 bp.
* Our largest BAMBUS "scaffold" is 2,565,761 bp.
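The lower bound on the number of contigs rests on the fact that no assembler can join reads across a position that no read covers. A minimal sketch of that counting follows. It assumes a hypothetical per-base read depth list (for example, obtained by mapping the reads to the finished reference); the function name `count_covered_runs` is illustrative, and this is not necessarily how the UNC figure was derived.

```python
def count_covered_runs(depth, circular=False):
    """Count maximal runs of consecutive positions with depth > 0.

    Each run is a stretch of the reference that could, at best,
    be assembled into a single contig.
    """
    runs = 0
    previous_covered = False
    for d in depth:
        covered = d > 0
        if covered and not previous_covered:
            runs += 1  # a new covered stretch starts here
        previous_covered = covered
    # On a circular replicon, a run that wraps around the origin
    # was counted twice; merge the first and last runs.
    if circular and runs > 1 and depth[0] > 0 and depth[-1] > 0:
        runs -= 1
    return runs


if __name__ == "__main__":
    toy_depth = [3, 2, 0, 5, 0, 0, 1]
    print(count_covered_runs(toy_depth))                 # linear sequence
    print(count_covered_runs(toy_depth, circular=True))  # circular sequence
```

For a genome with several replicons, the function would be applied to each replicon separately and the counts summed.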
Data stats
file                            #elem      min      median   max      sum        mean     stdev  n50      description
DC3000.format.454Reads.fna      123,992    38       86       329      15623908   126.01   58.89  142      DC3000 Paired End Reads
DC3000.TCA.454reads.format.fna  77,466     35       244      371      18627363   240.46   26.85  245      DC3000 454 Reads
DC3000.reads.filtered.fasta     6,340,136  32       32       32       202884352  32       0      32       DC3000 Solexa Reads
DC3000Plasmids.fa               2          67473    73661    73661    141134     70567    3094   73661    Pseudomonas syringae pv. tomato DC3000 Plasmids
Psudomonas_syringae.fa          1          6397126  6397126  6397126  6397126    6397126  0      6397126  Pseudomonas syringae pv. tomato DC3000 reference
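The columns of the table above can be recomputed from sequence lengths alone. The sketch below is an illustration, not the script that produced the table: the helper names `fasta_lengths` and `length_summary` are hypothetical, and the conventions (upper median for an even number of sequences, population standard deviation) are inferred from the two-sequence plasmid row rather than taken from any tool documentation.

```python
import statistics


def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new record
        elif line and lengths:
            lengths[-1] += len(line)   # accumulate sequence characters
    return lengths


def length_summary(lengths):
    """Summarize a non-empty list of sequence lengths."""
    ordered = sorted(lengths)
    total = sum(ordered)
    # N50: walk from the longest sequence down until at least half
    # of the total length has been accumulated.
    running = 0
    n50 = 0
    for length in reversed(ordered):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "#elem": len(ordered),
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],  # upper median (assumption)
        "max": ordered[-1],
        "sum": total,
        "mean": round(total / len(ordered), 2),
        "stdev": round(statistics.pstdev(ordered), 2),  # population stdev (assumption)
        "n50": n50,
    }


if __name__ == "__main__":
    # Plasmid lengths from the DC3000Plasmids.fa row.
    print(length_summary([67473, 73661]))
```

With the two plasmid lengths as input, the sketch reproduces the values in the DC3000Plasmids.fa row.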
Quality values are missing for all data sets. I assigned a default quality value (qual=3) to every base in the .frg and .afg files.
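One way to supply placeholder qualities is to generate a FASTA-style .qual file in which every base gets the same value, and pass it to whatever converter builds the assembler input. The sketch below only illustrates that idea; the function name `default_qual_records` and the `per_line` parameter are hypothetical, and the notes do not say how the default value was actually written into the .frg and .afg files.

```python
def default_qual_records(fasta_lines, qual=3, per_line=20):
    """Yield .qual-format lines giving every base the same quality value."""

    def emit(name, length):
        # One header line, then space-separated values, per_line per row.
        yield ">" + name
        values = [str(qual)] * length
        for start in range(0, length, per_line):
            yield " ".join(values[start:start + per_line])

    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield from emit(name, length)
            name, length = line[1:], 0
        elif line:
            length += len(line)
    if name is not None:
        yield from emit(name, length)


if __name__ == "__main__":
    toy_fasta = [">read1", "ACGTAC", ">read2", "GGT"]
    for out_line in default_qual_records(toy_fasta, qual=3, per_line=4):
        print(out_line)
```

A constant quality carries no information about which bases are reliable, so downstream consensus calls effectively rest on read depth alone.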
Files location:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly
Best CBCB assemblies:
1. AMOSCmp
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1009_AMOSCmp-relaxed
142 contigs (37 negative gaps), based on the mix of 454 single reads + Solexa reads (no 454 paired ends).
No read trimming was done.
AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5
"-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
=> 22 large contigs
2. AMOSCmp
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1015_AMOSCmp-relaxed-MAJORITY50
131 contigs (18 negative gaps), based on the mix of 454 single reads + Solexa reads (no 454 paired ends).
No read trimming was done.
AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5 -m 50
"-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
"-m 50" merges some contigs together.
=> 10 large contigs
contig#  len      gc%
4        2290968  59.00
7        1817904  58.18
3        1405326  58.08
5        648413   58.48
2        192413   57.86
6        87152    58.02
131      71251    56.47
1        32939    54.86
130      29120    59.36
9        20309    53.56
95       3589     59.46
3. AMOSCmp of Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
Many mis-oriented mates in the 4.8M-5M region of the chromosome.
22 contigs
(Figures: Chromosome; Chromosome problem)
4. Celera 3.11 of Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
22 scaffolds, 46 contigs, 181 degenerates
Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid.
5. AMOSCmp of Sanger reads
Reference = complete genome (chromosome + 2 plasmids) + the "circular contig" from the Celera 3.11 assembly
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
38 contigs: 15 for the main chromosome, 1 for the longer plasmid, 21 for the shorter plasmid, 1 for the "circular contig".
The pile of mis-oriented reads seen on the chromosome in assembly 3 (AMOSCmp of Sanger reads) has disappeared.
AA ready for submission:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz
Solexa assemblies for different read coverages
qc stats for Solexa assemblies done at different coverage levels
cvg: 30,27,24...3
contig.summary

%reads  #elem  #elem0  #elem<0  min  median  max    sum      mean     stdev    n50
100     5502   0       0        32   338     32148  7296600  1326.17  2157.6   3714
90      6463   0       0        32   330     25252  7252304  1122.13  1799.43  3009
80      7570   0       0        32   303     20690  7209479  952.38   1487.03  2573
70      9030   0       0        32   309     26306  7170384  794.06   1219.53  1986
60      10571  0       0        32   295     22249  7124996  674.01   961.22   1608
50      12598  0       0        32   274     22204  7075934  561.67   767.55   1266
40      15343  0       0        32   252     9176   7011485  456.98   575.64   934
30      21248  0       0        32   202     7751   6931907  326.24   376.06   597
20      38702  0       0        32   117     3276   6807914  175.91   178.92   278
10      84545  0       0        32   56      2652   6267925  74.14    57.62    90

positiveGaps.summary

%reads  #elem  #elem0  #elem<0  min  median  max   sum     mean    stdev   n50
100     117    10      0        0    22      3065  19625   167.74  418.75  1308
90      130    16      0        0    19      2100  19725   151.73  369.07  1211
80      142    18      0        0    15      2174  20034   141.08  361.86  1209
70      178    15      0        0    9       3417  20443   114.85  395.13  1823
60      263    35      0        0    6       3875  21161   80.46   345.97  1457
50      450    64      0        0    4       3398  22305   49.57   278.39  1823
40      1047   156     0        0    4       3398  26488   25.3    173.77  929
30      2915   446     0        0    4       3426  39094   13.41   115.74  104
20      11154  1324    0        0    5       3420  110485  9.91    57.22   19
10      44751  3321    0        0    9       3875  631930  14.12   35.45   25
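The %reads column can be related to the nominal coverage levels (30, 27, 24 ... 3) with a short calculation. The sketch below uses the Solexa read count and read length from the data stats and the replicon lengths from the NCBI table. It assumes that coverage is computed over the chromosome plus both plasmids and that each subset is a uniform fraction of the reads; neither assumption is stated in the notes.

```python
# Values taken from the data stats and the replicon table above.
READS = 6_340_136                          # Solexa reads
READ_LEN = 32                              # bp per Solexa read
GENOME = 6_397_126 + 73_661 + 67_473       # chromosome + 2 plasmids, bp


def expected_coverage(percent_reads):
    """Average depth of coverage when using a given percentage of the reads."""
    return READS * READ_LEN * (percent_reads / 100) / GENOME


if __name__ == "__main__":
    for pct in range(100, 0, -10):
        print(f"{pct:3d}% reads -> {expected_coverage(pct):.1f}x")
```

Under these assumptions the full read set corresponds to roughly 31x, which is consistent with the nominal 30x top level.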