Pseudomonas syringae
Revision as of 17:04, 16 October 2007
Pseudomonas syringae strain DC3000
Originally sequenced and finished at TIGR; published September 2003.
NCBI:
* AA: no assembly
* TA: ftp://ftp.ncbi.nih.gov/pub/TraceDB/pseudomonas_syringae_pv_tomato_str_dc3000/ (80,959 reads)
* Genome Project: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genomeprj&cmd=ShowDetailView&TermToSearch=15584
UNC:
New sequence:
* Solexa: 3 lanes
* 454 shotgun: 1/4 plate (250 bp reads)
* 454 paired ends: 1/4 plate
** Each read contains a 44 bp linker in the middle.
** The linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
** A few 454 paired-end sequences contain multiple tandem copies of the linker (example: EUEIEUN01ANUGL_length=128_xy=0154_1891)
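A minimal sketch of how a 454 paired-end read could be split into its two mates at the linker quoted above. It assumes exact linker matches (sequencing errors inside the linker are not tolerated) and treats a tandem run of linker copies as a single block; the function names are hypothetical. Because this linker is its own reverse complement, one forward-strand search covers reads in either orientation.

```python
from typing import List, Optional, Tuple

# 44 bp linker sequence listed above for the UNC 454 paired-end reads.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_linkers(read: str, linker: str = LINKER) -> List[int]:
    """Start positions of exact, non-overlapping linker matches."""
    hits = []
    pos = read.find(linker)
    while pos != -1:
        hits.append(pos)
        pos = read.find(linker, pos + len(linker))
    return hits


def split_paired_read(read: str, linker: str = LINKER) -> Optional[Tuple[str, str]]:
    """Split a paired-end read into its two mates around the linker.

    A tandem run of linker copies is removed as one block. Returns None
    if no linker is found, or if the copies are separated by other
    sequence (the mate boundaries are then ambiguous).
    """
    hits = find_linkers(read, linker)
    if not hits:
        return None
    # Every linker copy must directly follow the previous one.
    for prev, cur in zip(hits, hits[1:]):
        if cur != prev + len(linker):
            return None
    return read[: hits[0]], read[hits[-1] + len(linker):]


if __name__ == "__main__":
    # Hypothetical toy read: mate 1 + two tandem linker copies + mate 2.
    toy = "ACGTACGT" + LINKER * 2 + "TTGGCCAA"
    print(find_linkers(toy))       # [8, 52]
    print(split_paired_read(toy))  # ('ACGTACGT', 'TTGGCCAA')
```

Reads for which `split_paired_read` returns None (no linker, or scattered linker copies) would need separate handling, for example being kept as unpaired reads.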
UNC sequence data:
http://biology622.dhcp.unc.edu/~labweb/DCData/
UNC assembly:
* The theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
* Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
* 98.7% of the genome is covered by a contig.
* 84% of the genome is covered by contigs of 10,000 bp or greater.
* The average gap size between contigs is 98 bp.
* The average contig size is 7,401 bp.
* The N50 is 37,444 bp.
* Our largest BAMBUS "scaffold" is 2,565,761 bp.
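The summary figures above are related by simple arithmetic. The sketch below is a back-of-the-envelope consistency check, not the method UNC used. It assumes that "the genome" means the 6,397,126 bp chromosome reference listed under Data stats (plasmids excluded), that contigs do not overlap, and that a circular chromosome with n contigs has n gaps. Under those assumptions it gives about 98.7% coverage, a mean contig of about 7,401 bp and a mean gap of about 98.6 bp. The page does not say how the 98 bp average gap was measured, so a small difference there is expected.

```python
from typing import Dict

# Chromosome length from the "Data stats" table (Psudomonas_syringae.fa).
# Using it as "the genome" (plasmids excluded) is an assumption.
REFERENCE_BP = 6_397_126


def summarize_assembly(total_contig_bp: int, n_contigs: int, genome_bp: int) -> Dict[str, float]:
    """Rough whole-assembly metrics.

    Assumes contigs do not overlap and every contig base lies on the
    reference, so the uncovered bases are genome_bp - total_contig_bp.
    """
    uncovered_bp = genome_bp - total_contig_bp
    return {
        "covered_pct": 100.0 * total_contig_bp / genome_bp,
        "mean_contig_bp": total_contig_bp / n_contigs,
        # On a circular chromosome, n contigs leave n gaps (assumption).
        "mean_gap_bp": uncovered_bp / n_contigs,
    }


if __name__ == "__main__":
    # Figures from the UNC de novo assembly summary above.
    for name, value in summarize_assembly(6_313_026, 853, REFERENCE_BP).items():
        print(f"{name}: {value:.1f}")
```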
Data stats
All length columns are in bp.
file                            #elem    min      median   max      sum        mean     stdev   n50      description
DC3000.format.454Reads.fna      123992   38       86       329      15623908   126.01   58.89   142      DC3000 Paired End Reads
DC3000.TCA.454reads.format.fna  77466    35       244      371      18627363   240.46   26.85   245      DC3000 454 Reads
DC3000.reads.filtered.fasta     6340136  32       32       32       202884352  32       0       32       DC3000 Solexa Reads
DC3000Plasmids.fa               2        67473    73661    73661    141134     70567    3094    73661    Pseudomonas syringae pv. tomato DC3000 Plasmids
Psudomonas_syringae.fa          1        6397126  6397126  6397126  6397126    6397126  0       6397126  Pseudomonas syringae pv. tomato DC3000 reference
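The Data stats table reports, for each FASTA file, the record count and sequence-length statistics including N50. The tool that produced it is not named, so the sketch below is only one way to compute the same columns. Taking the upper median (`sorted[n // 2]`) and the population standard deviation are assumptions, chosen because they reproduce the two-record DC3000Plasmids.fa row (median 73661, stdev 3094). The function names are hypothetical, and the input must contain at least one record.

```python
import math
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("n50 of an empty list")


def length_stats(lengths: List[int]) -> Dict[str, float]:
    """#elem, min, median, max, sum, mean, stdev and n50 for a non-empty list."""
    ordered = sorted(lengths)
    n = len(ordered)
    total = sum(ordered)
    mean = total / n
    return {
        "elem": n,
        "min": ordered[0],
        # Upper median and population stdev (assumptions, see lead-in).
        "median": ordered[n // 2],
        "max": ordered[-1],
        "sum": total,
        "mean": mean,
        "stdev": math.sqrt(sum((x - mean) ** 2 for x in ordered) / n),
        "n50": n50(ordered),
    }


if __name__ == "__main__":
    # Plasmid lengths from the table; on real data, pass an open file
    # handle to fasta_lengths() instead.
    print(length_stats([67473, 73661]))
```

For a real file, something like `length_stats(fasta_lengths(open("DC3000Plasmids.fa")))` would produce one row of the table.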
Files location:
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly
Best CBCB assembly:
1.
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1009_AMOSCmp-relaxed
142 contigs (37 negative gaps)
Based on a mix of 454 single reads and Solexa reads (no 454 paired ends).
No read trimming was done.
AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5
"-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
22 large contigs
2.
/fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Solexa-454/2007_1015_AMOSCmp-relaxed-MAJORITY50
131 contigs (18 negative gaps)
Based on a mix of 454 single reads and Solexa reads (no 454 paired ends).
No read trimming was done.
AMOScmp used the following parameters:
nucmer -c 20
casm-layout -t 20 -o 5 -m 50
"-t 20" allows for 20 bp long dirty sequence ends, which seems to solve the "low quality" problem.
"-m 50" merges some contigs together.
10 large contigs
contig#  len (bp)   gc%
4        2290968    59.00
7        1817904    58.18
3        1405326    58.08
5        648413     58.48
2        192413     57.86
6        87152      58.02
131      71251      56.47
1        32939      54.86
130      29120      59.36
9        20309      53.56
95       3589       59.46
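The contig table lists each contig's length and GC content, longest first. The sketch below shows one way such a table could be produced from a contig FASTA file; it is not necessarily how these numbers were generated. Counting GC over unambiguous A/C/G/T bases only (ignoring N) and taking the contig name from the first word of each header are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """(name, sequence) pairs; the name is the first word of each header."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def gc_percent(seq: str) -> float:
    """G+C as a percentage of A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def contig_table(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """(contig, length, GC%) rows, longest contig first."""
    rows = [(name, len(seq), gc_percent(seq)) for name, seq in records]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical toy contigs; a real run would read the assembly's contig FASTA.
    toy = [">1\n", "GGCCAT\n", ">2\n", "ATATGCGGGG\n", "CCCCAA\n"]
    print("contig#\tlen\tgc%")
    for name, length, gc in contig_table(parse_fasta(toy)):
        print(f"{name}\t{length}\t{gc:.2f}")
```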