Pseudodomonas syringae: Difference between revisions
Revision as of 15:20, 30 November 2007
Pseudomonas syringae pv. tomato str. DC3000
Originally sequenced and finished at TIGR: published Sept 2003
Data
NCBI
* AA: no assembly
* TA: 80,959 reads
* Genome Project
* Taxonomy: TaxId=223283
Chromosome + 2 plasmids:
Name         Length     %GC
NC_004578.1  6,397,126  58.40
NC_004633.1  73,661     55.15
NC_004632.1  67,473     56.17
UNC: Jeff Dangl
New sequence:
* Solexa 3 lanes;
* 454 shotgun 1/4 Plate (250bp read);
* 454 paired ends 1/4 Plate :
* contain a 44 bp linker in the middle
* the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
* there are some (not many) 454 paired end sequences that contain multiple instances of the linker (tandem): Example EUEIEUN01ANUGL_length=128_xy=0154_1891
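The linker description above can be turned into a small read-splitting helper. This is only a minimal sketch: it assumes exact (error-free) linker matches, and the function names `split_paired_end` and `count_linkers` are hypothetical, not part of any pipeline mentioned on this page. Orientation handling of the two mates is deliberately left out.

```python
# Linker sequence quoted in the notes above (44 bp).
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def count_linkers(seq, linker=LINKER):
    """Count exact, non-overlapping linker copies in a read."""
    return seq.upper().count(linker)


def split_paired_end(seq, linker=LINKER):
    """Split a 454 paired-end read at the linker.

    Returns (left_part, right_part) when the read contains exactly one
    exact linker copy with sequence on both sides, otherwise None.
    Reads with tandem linkers (two or more copies) are rejected.
    """
    seq = seq.upper()
    if seq.count(linker) != 1:
        return None
    left, right = seq.split(linker)
    if not left or not right:
        return None
    return left, right
```

Reads for which `count_linkers` returns 2 or more correspond to the tandem-linker cases mentioned above; a real filter would probably also tolerate a few mismatches in the linker.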
UNC sequence data: (no longer available?)
http://biology622.dhcp.unc.edu/~labweb/DCData/
UNC (e-mail):
* Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
* Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
* 98.7% of the genome is covered by a contig;
* 84% of the genome is covered by contigs 10,000 bp or greater.
* The average gap size between contigs is 98 bp;
* average contig size 7401 bp.
* The N50 = 37,444 bp.
* Our largest BAMBUS "scaffold" is 2,565,761 bp,
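For reference, the N50 and average contig size quoted above can be computed as in the sketch below. It assumes the common definition of N50 (the length of the contig at which the running sum of contig lengths, sorted longest first, reaches half of the total assembly size); the e-mail does not say which definition was used, and the function name `n50` is just illustrative.

```python
def n50(lengths):
    """N50 of a list of contig lengths (assumed definition, see text)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Average contig size from the totals quoted in the e-mail:
# 6,313,026 bp in 853 contigs.
mean_contig_size = 6_313_026 / 853
```

Rounding `mean_contig_size` gives the 7401 bp reported in the e-mail. The individual contig lengths are not listed on this page, so the N50 of 37,444 bp cannot be re-derived here.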
Data stats
. #elem min median max sum mean stdev n50
DC3000.format.454Reads.fna 123,992 38 86 329 15623908 126.01 58.89 142 DC3000 Paired End Reads (forward+linker+reverse)
DC3000.TCA.454reads.format.fna 77,466 35 244 371 18627363 240.46 26.85 245 DC3000 454 Reads
DC3000.reads.filtered.fasta 6,340,136 32 32 32 202884352 32 0 32 DC3000 Solexa Reads
DC3000Plasmids.fa 2 67473 73661 73661 141134 70567 3094 73661 Pseudomonas syringae pv. tomato DC3000 Plasmids
Psudomonas_syringae.fa 1 6397126 6397126 6397126 6397126 6397126 0 6397126 Pseudomonas syringae pv. tomato DC3000 reference
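The "sum" column above gives total bases per data set, so a rough fold coverage follows by dividing by the genome size. The sketch below assumes the genome size is the chromosome plus the two known plasmids (lengths from the NCBI table) and that every read base counts toward coverage; for the paired-end file this overestimates slightly because the reads still contain the linker. The names `GENOME_BP` and `fold_coverage` are illustrative.

```python
# Chromosome + two plasmids, lengths taken from the NCBI table above.
GENOME_BP = 6_397_126 + 73_661 + 67_473

# Total bases per data set, copied from the "sum" column.
TOTAL_BASES = {
    "454 paired-end (incl. linker)": 15_623_908,
    "454 single reads": 18_627_363,
    "Solexa reads": 202_884_352,
}


def fold_coverage(total_read_bp, genome_bp=GENOME_BP):
    """Approximate fold coverage: total read bases / genome length."""
    return total_read_bp / genome_bp


coverage = {name: fold_coverage(bp) for name, bp in TOTAL_BASES.items()}
```

Under these assumptions the Solexa set comes out at roughly 31x and the 454 single reads at a little under 3x.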
Quality values are missing for all data sets!
I assigned a default quality value (qual=3) to every base (.frg & .afg files)
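As an illustration of the default-quality workaround, the sketch below emits a FASTA-style quality record with the same value for every base. This is an assumption-laden stand-in: the notes say the defaults were written into the .frg and .afg files, whose exact encoding is not shown here, so the `.qual`-style output and the function name `default_qual_record` are hypothetical.

```python
def default_qual_record(name, seq, qual=3, width=60):
    """Build a FASTA-style quality record with one constant value per base.

    name  : read identifier (without the leading '>')
    seq   : read sequence; only its length is used
    qual  : constant quality value assigned to every base
    width : number of quality values per output line
    """
    values = [str(qual)] * len(seq)
    lines = [" ".join(values[i:i + width]) for i in range(0, len(values), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `default_qual_record("r1", "ACGT")` yields a header line `>r1` followed by `3 3 3 3`.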
Files location:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly
Assemblies
CBCB
1. AMOSCmp
454 single reads + Solexa reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1009_AMOSCmp-relaxed
* 142 contigs (37 negative gaps, 89 positive gaps)
* No read trimming was done.
* AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5
* "-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem. => 22 large contigs
* 454 single reads + 30 bp Solexa reads => 167 contigs, 49 negative gaps, 100 positive gaps
* 454 single reads + 25 bp Solexa reads => 293 contigs, 144 negative gaps, 131 positive gaps
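The notes do not say how the 30 bp and 25 bp Solexa read sets were produced. One plausible way, shown purely as an assumption, is to truncate each 32 bp read to its first N bases. The helper name `truncate_reads` and the in-memory `(name, sequence)` representation are hypothetical.

```python
def truncate_reads(records, length):
    """Keep the first `length` bases of every read.

    records : iterable of (name, sequence) pairs
    length  : target read length, e.g. 30 or 25

    Reads shorter than `length` are dropped so that the output
    has a uniform read length.
    """
    return [(name, seq[:length]) for name, seq in records if len(seq) >= length]
```

Usage would look like `truncate_reads(reads, 30)`; whether trimming the 3' end like this matches what was actually done for the assemblies above is not stated on this page.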
2. AMOSCmp
454 single reads + Solexa reads + 454 paired ends
* Only the 454 paired ends that contain exactly one complete adaptor sequence were used (almost all of them)
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454-454p/2007_1011_AMOSCmp-relaxed-filtered
* 149 contigs; very similar to the previous one
3. AMOSCmp (MAJORITY=50)
454 single reads + Solexa reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1015_AMOSCmp-relaxed-MAJORITY50
* 131 contigs (18 negative gaps)
* No read trimming was done.
* AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5 -m 50
* "-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
* "-m 50" merges some contigs together => 10 large contigs
contig#  len      gc%
4        2290968  59.00
7        1817904  58.18
3        1405326  58.08
5        648413   58.48
2        192413   57.86
6        87152    58.02
131      71251    56.47
1        32939    54.86
130      29120    59.36
9        20309    53.56
95       3589     59.46
4. AMOSCmp
Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
* Many mis-oriented mates in the 4.8M-5M region of the chromosome
* 22 contigs
Chromosome
Chromosome problem
5. Celera 3.11
Sanger reads
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
* 22 scaff, 46 contigs, 181 degens
* Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid
* aligns to 4.8M-5M "problem" region in the chromosome
7180000001443.png
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [LEN R] [LEN Q] | [COV R] [COV Q] | [TAGS]
===============================================================================================================================
1 175592 | 1 175592 | 175592 175592 | 100.00 | 175592 175592 | 100.00 100.00 | 7180000001443 7180000001443 [IDENTITY]
1 12519 | 163075 175592 | 12519 12518 | 99.98 | 175592 175592 | 7.13 7.13 | 7180000001443 7180000001443 [BEGIN]
163075 175592 | 1 12519 | 12518 12519 | 99.98 | 175592 175592 | 7.13 7.13 | 7180000001443 7180000001443 [END]
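The 163,074 bp plasmid estimate can be read off the self-alignment above: the first ~12.5 kb of the scaffold matches its last ~12.5 kb, so the circle length is the scaffold length minus that terminal overlap. The sketch below is one way to express this check; the function name `circular_length` is hypothetical, and the calculation ignores the 1 bp length difference between the two aligned segments.

```python
def circular_length(contig_len, s1, e1, s2, e2):
    """Estimate the length of a circular molecule from a self-alignment.

    (s1, e1) and (s2, e2) are the 1-based coordinates of one off-diagonal
    self-alignment. The contig is treated as circular only if the match
    starts at the first base and ends at the last base of the contig.
    Returns the estimated circle length, or None if the ends do not match.
    """
    if s1 != 1 or e2 != contig_len:
        return None
    overlap = e2 - s2 + 1  # bases repeated at the end of the contig
    return contig_len - overlap


# [BEGIN] row of the table above: 1-12519 aligns to 163075-175592.
plasmid_len = circular_length(175592, 1, 12519, 163075, 175592)
```

With the [BEGIN] row this gives 175,592 - 12,518 = 163,074 bp, matching the estimate in the notes.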
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [LEN R] [LEN Q] | [COV R] [COV Q] | [TAGS]
===============================================================================================================================
4790727 4911492 | 120764 1 | 120766 120764 | 99.98 | 6397126 175592 | 1.89 68.78 | gi|28867243|ref|NC_004578.1| 7180000001443
4898971 4955870 | 175592 118697 | 56900 56896 | 99.98 | 6397126 175592 | 0.89 32.40 | gi|28867243|ref|NC_004578.1| 7180000001443
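The two alignments above overlap on the scaffold (68.78% + 32.40% is more than 100%), so the covered fraction is better computed from the union of the aligned intervals. The following is a minimal sketch of that interval merge; `covered_bases` is a hypothetical helper and coordinates are treated as 1-based and inclusive, with reversed pairs (start > end) normalised first.

```python
def covered_bases(intervals):
    """Number of positions covered by the union of inclusive intervals."""
    spans = sorted((min(a, b), max(a, b)) for a, b in intervals)
    total = 0
    current = None
    for start, end in spans:
        if current is None:
            current = [start, end]
        elif start <= current[1] + 1:
            # Overlapping or directly adjacent: extend the current span.
            current[1] = max(current[1], end)
        else:
            total += current[1] - current[0] + 1
            current = [start, end]
    if current is not None:
        total += current[1] - current[0] + 1
    return total


# Coordinates copied from the two rows above.
scaffold_cov = covered_bases([(120764, 1), (175592, 118697)])
chromosome_cov = covered_bases([(4790727, 4911492), (4898971, 4955870)])
```

Here the scaffold intervals merge into 1-175592, so the whole scaffold aligns to the chromosome, and the chromosome intervals merge into a single span of 165,144 bp inside the 4.8M-5M region.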
6. AMOSCmp (Chromosome+3 plasmids ref)
Sanger reads
* Reference = complete genome (chromosome + 3 plasmids), using the "circular contig" from the Celera 3.11 assembly
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
* 38 contigs: 15 for main chromosome, 1 for longer plasmid, 21 for shorter plasmid, 1 for "circular contig"
* The pile of mis-oriented reads on the chromosome (see 4. AMOSCmp of Sanger reads) has disappeared
* AA ready for submission:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz
Solexa assemblies for different read coverages
qc stats for Solexa assemblies done at different coverage levels
cvg: 30,27,24...3
contig.summary
%reads  #elem  #elem0  #elem<0  min  median  max    sum      mean     stdev    n50
100     5502   0       0        32   338     32148  7296600  1326.17  2157.6   3714
90      6463   0       0        32   330     25252  7252304  1122.13  1799.43  3009
80      7570   0       0        32   303     20690  7209479  952.38   1487.03  2573
70      9030   0       0        32   309     26306  7170384  794.06   1219.53  1986
60      10571  0       0        32   295     22249  7124996  674.01   961.22   1608
50      12598  0       0        32   274     22204  7075934  561.67   767.55   1266
40      15343  0       0        32   252     9176   7011485  456.98   575.64   934
30      21248  0       0        32   202     7751   6931907  326.24   376.06   597
20      38702  0       0        32   117     3276   6807914  175.91   178.92   278
10      84545  0       0        32   56      2652   6267925  74.14    57.62    90

positiveGaps.summary
%reads  #elem  #elem0  #elem<0  min  median  max   sum     mean    stdev   n50
100     117    10      0        0    22      3065  19625   167.74  418.75  1308
90      130    16      0        0    19      2100  19725   151.73  369.07  1211
80      142    18      0        0    15      2174  20034   141.08  361.86  1209
70      178    15      0        0    9       3417  20443   114.85  395.13  1823
60      263    35      0        0    6       3875  21161   80.46   345.97  1457
50      450    64      0        0    4       3398  22305   49.57   278.39  1823
40      1047   156     0        0    4       3398  26488   25.3    173.77  929
30      2915   446     0        0    4       3426  39094   13.41   115.74  104
20      11154  1324    0        0    5       3420  110485  9.91    57.22   19
10      44751  3321    0        0    9       3875  631930  14.12   35.45   25
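The %reads column suggests that each assembly used a subset of the Solexa reads. How the subsets were drawn is not recorded here; the sketch below shows one reasonable option, uniform random sampling without replacement with a fixed seed, and should be read as an assumption rather than the procedure actually used. The function name `subsample_reads` is hypothetical.

```python
import random


def subsample_reads(records, percent, seed=0):
    """Return a random subset containing `percent`% of the reads.

    records : list of reads (any objects, e.g. (name, sequence) pairs)
    percent : fraction to keep, 0-100
    seed    : seed for the random generator, so runs are reproducible

    The original read order is preserved in the output.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    keep = round(len(records) * percent / 100)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(records)), keep))
    return [records[i] for i in chosen]
```

With a full data set at about 30x, keeping 90%, 80%, ... 10% of the reads would give nominal coverages of roughly 27x, 24x, ... 3x, which is consistent with the "cvg" line above.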