Pseudodomonas syringae: Difference between revisions
		
		
		
		
| No edit summary | |||
| (86 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
| '''Pseudomonas syringae  | '''Pseudomonas syringae pv. tomato str. DC3000''' | ||
| == Data == | |||
| Originally sequenced and finished at TIGR: published Sept 2003 | Originally sequenced and finished at TIGR: published Sept 2003 | ||
| NCBI | === NCBI === | ||
|    AA: no assembly |    AA: no assembly | ||
|    [ftp://ftp.ncbi.nih.gov/pub/TraceDB/pseudomonas_syringae_pv_tomato_str_dc3000/ TA] 80,959 reads   | |||
|    [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genomeprj&cmd=ShowDetailView&TermToSearch=15584  Genome Project] | |||
|   [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=223283 Taxonomy] TaxId=223283 | |||
| Chromosome + 2 plasmids: | |||
|   Name           Length    %GC    Info | |||
|   NC_004578.1    6,397,126 58.40  chromosome | |||
|   NC_004633.1    73,661    55.15  plasmid pDC3000A | |||
|   NC_004632.1    67,473    56.17  plasmid pDC3000B | |||
|   total          6,538,260 | |||
|   Little similarity between the chromosome and plasmids. | |||
|   The 2 plasmids share a significant amount of DNA; see /fs/szasmg2/Bacteria/Pseudomonas_syringae/Data/nucmer/NC_004633-NC_004632.png | |||
| UNC: | === UNC: Jeff Dangl === | ||
| New sequence: | New sequence: | ||
| Read stats | |||
|   Type               File                            #reads            min     median  max     sum             mean    stdev   n50 | |||
|   Solexa             DC3000.reads.filtered.fasta     6,340,136         32      32      32      202884352       32      0       32 | |||
|   454p(end+linker)   DC3000.format.454Reads.fna      123,992           38      86      329     15623908        126.01  58.89   142 | |||
|  454                DC3000.TCA.454reads.format.fna   77,466            35      244     371     18627363        240.46  26.85   245  | |||
|    * Solexa 3 lanes;   |    * Solexa 3 lanes;   | ||
|    * 454 shotgun 1/4 Plate (250bp read);   |    * 454 shotgun 1/4 Plate (250bp read);   | ||
| Line 16: | Line 38: | ||
|        * contain a 44 bp linker in the middle |        * contain a 44 bp linker in the middle | ||
|        * the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC |        * the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC | ||
|        * there are some (not many) 454 paired end sequences that contain multiple instances of the linker (tandem): Example EUEIEUN01ANUGL_length=128_xy=0154_1891   |        * <span style="color:red">there are some (not many) 454 paired end sequences that contain multiple instances of the linker (tandem): Example EUEIEUN01ANUGL_length=128_xy=0154_1891 </span> | ||
|   <span style="color:red"> | |||
|   Quality values are missing for all data sets!!! | |||
|   I assigned default qual=3 to all the bases (.frg & .afg files)  </span> | |||
| UNC sequence data: | 454p | ||
| * Out of 123992 454 paired ends, 111028 (90%) align to linker (nucmer -c 20 -l 20) | |||
| * Non linked(end) sequences (5' & 3')  | |||
|           #elem   min     max     mean    median  n50     sum | |||
|   five    111028  0       265     37      21      61      4090475 | |||
|   three   111028  0       266     39      20      81      4385391 | |||
| * 20bp is the mode | |||
| * 75% of the end sequences are 19-21 bp long | |||
| * 67871 out of 111028 end pairs align within 5kbp | |||
|              #elem   min     max     mean    stdev   sum | |||
|   distance   67871   1       4991    2450    702     166283200 | |||
| UNC sequence data: (not avail any more?) | |||
|    http://biology622.dhcp.unc.edu/~labweb/DCData/ |    http://biology622.dhcp.unc.edu/~labweb/DCData/ | ||
| UNC  | UNC (e-mail):   | ||
|    * Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).   |    * Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).   | ||
|    * Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.   |    * Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.   | ||
| Line 29: | Line 68: | ||
|    * average contig size 7401 bp.   |    * average contig size 7401 bp.   | ||
|    * The N50 = 37,444 bp.   |    * The N50 = 37,444 bp.   | ||
|    * Our largest BAMBUS "scaffold" is 2,565,761 bp |    * Our largest BAMBUS "scaffold" is 2,565,761 bp | ||
| Files location: | Files location: | ||
| Line 43: | Line 74: | ||
|    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly |    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly | ||
| == Assemblies == | |||
| === 454 AMOScmp === | |||
|   /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/454/2007_1015_AMOSCmp-relaxed | |||
|   no trimming;  | |||
|   AMOScmp -D MINCLUSTER=20 -D MAXTRIM=10 -D MAJORITY=50 ... | |||
|   Stats: | |||
|   desc            #elem   min     max     mean    stdev   sum | |||
|   contigs         6131    43      8261    966.57  829.44  5926089 | |||
|   pos_gaps        5622    1       10394   110.32  283.78  620259 | |||
|   Slight improvement by doing alignment based trimming of the 454 reads | |||
| === Solexa AMOScmp === | |||
|   /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0116_AMOSCmp-relaxed | |||
|   Duplication if ALIGNWIGGLE=15 | |||
|   Align all reads (Solexa) to the reference using nucmer.  | |||
|   6340136 reads | |||
|   5641782 (88.98%) aligned by nucmer -c 20 -l 20 | |||
|   3453618 (54.47%) aligned by nucmer -c 32 -l 20 | |||
|   2707005 (42.69%) aligned by nucmer -c 32 -l 32 | |||
|   AMOScmp -D MAJORITY=50 -D MINOVL=5 -D MINCLUSTER=20 -D ALIGNWIGGLE=2 ... | |||
|   Stats: | |||
|   desc            #elem   min     max     mean            stdev           sum | |||
|   contigs         187     20      577910  34862.83        91691.51        6519350 | |||
|   pos_gaps        147     1       1716    131.05          288.69          19265 | |||
| === Solexa maq === | |||
|   /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0213_maq/maq | |||
|   Stats: | |||
|   desc            #elem   min     max     mean            stdev           sum | |||
|   contigs         106     32      2067205 61489.83        230284.47       6517923 | |||
|   pos_gaps        104     1       3278    195.54          511.06          20337 | |||
| === 454 + Solexa AMOScmp === | |||
|   Locations:  | |||
|     /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa-454/2008_1016_AMOSCmp-relaxed/  | |||
|     ftp://ftp.cbcb.umd.edu/pub/data/dpuiu/Pseudomonas_syringae/Solexa-454/  | |||
|   AMOScmp -D MINCLUSTER=20 -D MAXTRIM=20 -D MINOVL=5 -D MAJORITY=50 -D ALIGNWIGGLE=2 ... | |||
|   All stats: | |||
|   desc            #elem   min     max     mean            stdev           sum | |||
|   contigs         139     20      1895644 46899.82        243273.92       6519075 | |||
|   pos_gaps        124     1       1809    156.78          323.66          19441 | |||
|   Chromosome stats: | |||
|   desc            #elem   min     max     mean            stdev           sum | |||
|   contigs         8       85757   1895607 799498.75       692179.25       6395990 | |||
|   pos_gaps        2       4       9       6.5             3.53            13 | |||
| === 454 + Solexa + 454p AMOScmp === | |||
|    Only the 454 paired ends that contain a single complete adaptor sequence were used (almost all) | |||
|    149 contigs; very similar to the previous one | |||
| === Sanger AMOScmp === | |||
|    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed |    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed | ||
|    Many misoriented mates in the 4.8M-5M region of the chromosome |    <span style="color:red">Many misoriented mates in the 4.8M-5M region of the chromosome</span> | ||
|   22 contigs | |||
|   [[Media:Pseudodomonas_syringae.Sanger.AMOSCmp.chromosome.png|Chromosome]] | |||
|   [[Media:Pseudodomonas_syringae.Sanger.AMOSCmp.chromosome_problem.png|Chromosome problem]] | |||
| === Sanger Celera 3.11 === | |||
|    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA |    /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA | ||
|    Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid |    22 scaff, 46 contigs, 181 degens | ||
|   <span style="color:red">Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid | |||
|   aligns to 4.8M-5M "problem" region in the chromosome</span> | |||
|   [[Media:Pseudodomonas_syringae.Sanger.CA.circ_contig.7180000001443.png|7180000001443.png]] | |||
|       [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS] | |||
|   =============================================================================================================================== | |||
|          1   175592  |        1   175592  |   175592   175592  |   100.00  |   175592   175592  |   100.00   100.00  | 7180000001443   7180000001443   [IDENTITY] | |||
|          1    12519  |   163075   175592  |    12519    12518  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [BEGIN] | |||
|     163075   175592  |        1    12519  |    12518    12519  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [END] | |||
|       [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q] | [TAGS] | |||
|   =============================================================================================================================== | |||
|    4790727  4911492  |   120764        1  |   120766   120764  |    99.98  |  6397126   175592  |     1.89    68.78  | gi|28867243|ref|NC_004578.1|    7180000001443 | |||
|    4898971  4955870  |   175592   118697  |    56900    56896  |    99.98  |  6397126   175592  |     0.89    32.40  | gi|28867243|ref|NC_004578.1|    7180000001443 | |||
| === Sanger AMOScmp  (Chromosome+3 plasmids ref) === | |||
|   Reference = complete genome (chromosome + 3 plasmids); the third plasmid is the "circular contig" from the Celera 3.11 assembly | |||
|   /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids | |||
|   38 contigs: 15 for main chromosome, 1 for longer plasmid, 21 for shorter plasmid, 1 for "circular contig" | |||
|   The misoriented read pile corresponding to the chromosome (4. AMOSCmp of Sanger reads) has disappeared | |||
|   AA ready for submission: /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz | |||
| === Solexa assembled at different read coverages === | |||
|   Location: /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/sample/ | |||
|   Several assemblies, using 10%,20%, ... 100%, of the P. syringae Solexa reads.  | |||
|   These would correspond to 3X,6X ... 30X coverage  | |||
|   The read sampling was done randomly. One sample set for each coverage. | |||
| ---- | |||
|   Assembler: Sanger maq | |||
|   all contigs | |||
|   cvg   %reads  #ctgs   min     max     mean            stdev           sum | |||
|   3     10      43136   32      7712    135.11          140.61          5828148 | |||
|   6     20      11243   32      20190   570.01          686.5           6408705 | |||
|   9     30      2972    32      27962   2185.32         2804.56         6494784 | |||
|   12    40      1058    32      63125   6152.98         7871.7          6509855 | |||
|   15    50      455     32      163430  14319.01        19663.15        6515153 | |||
|   18    60      267     32      328882  24406.61        46172.62        6516567 | |||
|   21    70      166     32      671064  39260.9         84200.42        6517311 | |||
|   24    80      143     32      906652  45577.16        111875.19       6517535 | |||
|   27    90      117     32      1433643 55708.4         164246.61       6517883 | |||
|   30    100     106     32      2067205 61489.83        230284.47       6517923 | |||
|   chromo contigs | |||
|   cvg   %reads  #ctgs   min     max     mean            stdev           sum | |||
|   3     10      42845   32      1845    133.32          118.36          5712348 | |||
|   6     20      11124   32      9650    565.41          625.32          6289649 | |||
|   9     30      2876    32      26076   2216.64         2714.92         6375063 | |||
|   12    40      965     32      63125   6621.71         7893.19         6389957 | |||
|   15    50      362     32      163430  17665.19        20565.31        6394800 | |||
|   18    60      167     32      328882  38299.32        53660.75        6395987 | |||
|   21    70      75      257     671064  85287.52        108858.19       6396564 | |||
|   24    80      49      940     906652  130546.42       160470.1        6396775 | |||
|   27    90      25      42603   1433643 255877.72       277650.54       6396943 | |||
|   30    100     18      42603   2067205 355387.77       465907.88       6396980 | |||
|   all gaps | |||
|   cvg   %reads  #gaps   min     max     mean    stdev   sum | |||
|   3     10      43137   1       3874    16.46   38.01   710112 | |||
|   6     20      11242   1       3919    11.52   64.43   129555 | |||
|   9     30      2971    1       3418    14.63   114.29  43476 | |||
|   12    40      1056    1       3873    26.89   196.7   28405 | |||
|   15    50      454     1       3415    50.89   291.04  23107 | |||
|   18    60      265     1       3870    81.86   380.9   21693 | |||
|   21    70      165     1       3868    126.96  486.88  20949 | |||
|   24    80      141     1       3414    146.98  461.06  20725 | |||
|   27    90      115     1       3418    177.19  520.11  20377 | |||
|   30    100     104     1       3278    195.54  511.06  20337 | |||
|   chromo gaps | |||
|   cvg   %reads  #gaps   min     max     mean    stdev   sum | |||
|   3     10      42846   1       240     15.98   16.33   684778 | |||
|   6     20      11125   1       146     9.66    9.72    107477 | |||
|   9     30      2876    1       76      7.67    7.73    22063 | |||
|   12    40      965     1       58      7.42    7.8     7169 | |||
|   15    50      362     1       48      6.42    7.08    2326 | |||
|   18    60      167     1       58      6.82    7.63    1139 | |||
|   21    70      76      1       55      7.39    7.9     562 | |||
|   24    80      49      1       55      7.16    10.08   351 | |||
|   27    90      25      1       45      7.31    10.12   183 | |||
|   30    100     18      1       55      8.11    13.62   146 | |||
| ---- | |||
|   Assembler: AMOScmp | |||
|   all contigs | |||
|   cvg   %reads  #ctgs   min     max     mean            stdev           sum | |||
|   3     10      61330   20      9181    97.08           101.04          5954113 | |||
|   6     20      18764   20      19803   343.93          431.9           6453593 | |||
|   9     30      5723    20      28103   1137.41         1498.76         6509417 | |||
|   12    40      2045    20      33780   3186.49         4337.72         6516385 | |||
|   15    50      859     20      90346   7588.66         11436.97        6518661 | |||
|   18    60      479     20      219894  13609.97        22470.18        6519176 | |||
|   21    70      319     20      289494  20436.94        37964.34        6519384 | |||
|   24    80      246     20      385663  26502.45        61309.04        6519605 | |||
|   27    90      237     20      577910  27510.48        71767.85        6519985 | |||
|   30    100     187     20      577910  34862.83        91691.51        6519350 | |||
|   chromo contigs | |||
|   cvg   %reads  #ctgs   min     max     mean            stdev           sum | |||
|   3     10      60923   20      1052    95.8            79.21           5836796 | |||
|   6     20      18583   20      4800    340.81          368.44          6333397 | |||
|   9     30      5567    22      20245   1147.53         1401.05         6388303 | |||
|   12    40      1883    20      33780   3396.09         4327.93         6394855 | |||
|   15    50      699     24      90346   9151.31         12023.55        6396771 | |||
|   18    60      313     29      219894  20437.56        25135.49        6396957 | |||
|   21    70      155     32      289494  41271.96        45893.6         6397154 | |||
|   24    80      82      28      385663  78014.63        85498.69        6397200 | |||
|   27    90      64      35      577910  99957.57        109269.83       6397285 | |||
|   30    100     40      46      577910  159930.32       139830.19       6397213 | |||
|   all gaps | |||
|   cvg   %reads  #gaps   min     max     mean    stdev   sum | |||
|   3     10      45068   1       2228    14.89   26.44   671499 | |||
|   6     20      11034   1       3148    10.81   49.56   119340 | |||
|   9     30      2816    1       2296    16.57   106.54  46663 | |||
|   12    40      1022    1       1903    25.98   125.51  26559 | |||
|   15    50      456     1       1716    46.91   159.75  21394 | |||
|   18    60      294     1       1445    68.35   189.57  20097 | |||
|   21    70      221     1       1716    88.33   225.56  19523 | |||
|   24    80      182     1       1716    105.12  244.55  19132 | |||
|   27    90      181     1       1716    103.78  235.41  18785 | |||
|   30    100     147     1       1716    131.05  288.69  19265 | |||
|   chromo gaps | |||
|   cvg   %reads  #gaps   min     max     mean    stdev   sum | |||
|   3     10      44767   1       197     14.45   14.86   647093 | |||
|   6     20      10884   1       1008    9.02    17.01   98181 | |||
|   9     30      2677    1       2296    9.88    70.77   26464 | |||
|   12    40      869     1       685     7.8     24.09   6786 | |||
|   15    50      303     1       59      6.55    6.88    1986 | |||
|   18    60      137     1       33      7.35    7.22    1007 | |||
|   21    70      65      1       33      6.83    6.9     444 | |||
|   24    80      27      1       36      8.37    9.74    226 | |||
|   27    90      18      1       42      8.83    11.44   159 | |||
|   30    100     10      1       33      10.7    12.58   107 | |||
Latest revision as of 17:16, 28 May 2008
Pseudomonas syringae pv. tomato str. DC3000
Data
Originally sequenced and finished at TIGR: published Sept 2003
NCBI
 AA: no assembly
 TA 80,959 reads
 Genome Project
 Taxonomy TaxId=223283
Chromosome + 2 plasmids:
 Name           Length     %GC    Info
 NC_004578.1    6,397,126  58.40  chromosome
 NC_004633.1    73,661     55.15  plasmid pDC3000A
 NC_004632.1    67,473     56.17  plasmid pDC3000B
 total          6,538,260
 Little similarity between the chromosome and plasmids.
 The 2 plasmids share a significant amount of DNA; see /fs/szasmg2/Bacteria/Pseudomonas_syringae/Data/nucmer/NC_004633-NC_004632.png
UNC: Jeff Dangl
New sequence:
Read stats
 Type               File                             #reads      min  median  max  sum        mean    stdev  n50
 Solexa             DC3000.reads.filtered.fasta      6,340,136   32   32      32   202884352  32      0      32
 454p(end+linker)   DC3000.format.454Reads.fna       123,992     38   86      329  15623908   126.01  58.89  142
 454                DC3000.TCA.454reads.format.fna   77,466      35   244     371  18627363   240.46  26.85  245
 * Solexa 3 lanes; 
 * 454 shotgun 1/4 Plate (250bp read); 
 * 454 paired ends 1/4 Plate : 
     * contain a 44 bp linker in the middle
     * the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
     * there are some (not many) 454 paired end sequences that contain multiple instances of the linker (tandem): Example EUEIEUN01ANUGL_length=128_xy=0154_1891 
 
 
 Quality values are missing for all data sets!!!
 I assigned default qual=3 to all the bases (.frg & .afg files)
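The default-quality step can be sketched as follows. This is only an illustration of the idea, not the script that produced the .frg and .afg files: it writes a FASTA-style quality record with the value 3 for every base. The function names `parse_fasta` and `default_qual_records` are hypothetical.

```python
from typing import Iterator, Tuple

DEFAULT_QUAL = 3  # placeholder quality value taken from the note above


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def default_qual_records(fasta_text: str, qual: int = DEFAULT_QUAL) -> str:
    """Build a .qual-style text with one constant quality value per base."""
    out = []
    for name, seq in parse_fasta(fasta_text):
        out.append(">" + name)
        out.append(" ".join([str(qual)] * len(seq)))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(default_qual_records(">read1\nACGT\n>read2\nAC\nGT\nA\n"), end="")
```

A constant quality carries no information about individual base errors, so any downstream step that weights bases by quality treats all reads equally.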
454p
- Out of 123992 454 paired ends, 111028 (90%) align to linker (nucmer -c 20 -l 20)
- Non linked(end) sequences (5' & 3')
         #elem   min  max  mean  median  n50  sum
 five    111028  0    265  37    21      61   4090475
 three   111028  0    266  39    20      81   4385391
- 20bp is the mode
- 75% of the end sequences are 19-21 bp long
- 67871 out of 111028 end pairs align within 5kbp
            #elem  min  max   mean  stdev  sum
 distance   67871  1    4991  2450  702    166283200
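A minimal sketch of splitting a 454 paired-end read into its 5' and 3' end sequences around the linker. It assumes an exact, full-length linker match, which is a simplification: the counts above came from nucmer alignments (-c 20 -l 20), which also tolerate partial and inexact matches. The function name `split_on_linker` is hypothetical. Reads with no linker or with tandem linker copies are rejected.

```python
from typing import Optional, Tuple

# 44 bp linker sequence quoted in the read description above.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_on_linker(read: str, linker: str = LINKER) -> Optional[Tuple[str, str]]:
    """Return (five_prime_end, three_prime_end) for a read that contains
    exactly one copy of the linker, otherwise None."""
    if read.count(linker) != 1:
        # No linker at all, or several (tandem) copies: skip the read.
        return None
    five, three = read.split(linker)
    return five, three


if __name__ == "__main__":
    example = "ACGTACGTACGTACGTACGT" + LINKER + "TTGGCCAATTGGCCAATTGG"
    print(split_on_linker(example))
    print(split_on_linker("AC" + LINKER + LINKER + "GT"))  # tandem linker
```

Either end may be empty when the linker sits at the very start or end of a read, which matches the minimum end length of 0 in the table above.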
UNC sequence data: (not avail any more?)
http://biology622.dhcp.unc.edu/~labweb/DCData/
UNC (e-mail):
 * Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides).
 * Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp.
 * 98.7% of the genome is covered by a contig;
 * 84% of the genome is covered by contigs 10,000 bp or greater.
 * The average gap size between contigs is 98 bp;
 * average contig size 7401 bp.
 * The N50 = 37,444 bp.
 * Our largest BAMBUS "scaffold" is 2,565,761 bp
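For reference, a short sketch of how figures such as the N50 and the average contig size are usually derived from a list of contig lengths. It uses the common definition of N50 (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size). Whether the UNC numbers were computed with exactly this definition is an assumption.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def mean_length(total_bp: int, n_contigs: int) -> float:
    """Average contig size from the assembly total and the contig count."""
    return total_bp / n_contigs


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    # Totals quoted in the e-mail: 853 contigs, 6,313,026 bp.
    print(round(mean_length(6_313_026, 853)))
```

The second call reproduces the quoted average contig size from the contig count and total length given in the same list.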
Files location:
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly
Assemblies
454 AMOScmp
 /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/454/2007_1015_AMOSCmp-relaxed
 no trimming;
 AMOScmp -D MINCLUSTER=20 -D MAXTRIM=10 -D MAJORITY=50 ...
 Stats:
 desc      #elem  min  max    mean    stdev   sum
 contigs   6131   43   8261   966.57  829.44  5926089
 pos_gaps  5622   1    10394  110.32  283.78  620259
 Slight improvement by doing alignment based trimming of the 454 reads
Solexa AMOScmp
 /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0116_AMOSCmp-relaxed
 Duplication if ALIGNWIGGLE=15
 Align all reads (Solexa) to the reference using nucmer.
 6340136 reads
 5641782 (88.98%) aligned by nucmer -c 20 -l 20
 3453618 (54.47%) aligned by nucmer -c 32 -l 20
 2707005 (42.69%) aligned by nucmer -c 32 -l 32
 AMOScmp -D MAJORITY=50 -D MINOVL=5 -D MINCLUSTER=20 -D ALIGNWIGGLE=2 ...
 Stats:
 desc      #elem  min  max     mean      stdev     sum
 contigs   187    20   577910  34862.83  91691.51  6519350
 pos_gaps  147    1    1716    131.05    288.69    19265
Solexa maq
/fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0213_maq/maq
 Stats:
 desc      #elem  min  max      mean      stdev      sum
 contigs   106    32   2067205  61489.83  230284.47  6517923
 pos_gaps  104    1    3278     195.54    511.06     20337
454 + Solexa AMOScmp
 Locations:
   /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa-454/2008_1016_AMOSCmp-relaxed/
   ftp://ftp.cbcb.umd.edu/pub/data/dpuiu/Pseudomonas_syringae/Solexa-454/
 AMOScmp -D MINCLUSTER=20 -D MAXTRIM=20 -D MINOVL=5 -D MAJORITY=50 -D ALIGNWIGGLE=2 ...
 All stats:
 desc      #elem  min    max      mean       stdev      sum
 contigs   139    20     1895644  46899.82   243273.92  6519075
 pos_gaps  124    1      1809     156.78     323.66     19441
 Chromosome stats:
 desc      #elem  min    max      mean       stdev      sum
 contigs   8      85757  1895607  799498.75  692179.25  6395990
 pos_gaps  2      4      9        6.5        3.53       13
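A sketch of how the gap rows in these stats tables could be derived from contig placements on the reference. It assumes that a "positive gap" means an uncovered stretch between consecutive contigs, with 1-based inclusive coordinates; the exact definition used by the stats script is not given here, and the coordinates in the example are made up so that they reproduce two gaps of 4 bp and 9 bp.

```python
from typing import Iterable, List, Tuple


def positive_gaps(intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Sizes of uncovered stretches between contigs placed on a reference.

    Each interval is (start, end), 1-based and inclusive. Overlapping or
    abutting contigs produce no gap.
    """
    gaps = []
    covered_end = None
    for start, end in sorted(intervals):
        if covered_end is not None and start > covered_end + 1:
            gaps.append(start - covered_end - 1)
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


if __name__ == "__main__":
    # Hypothetical placements, not taken from the actual assembly.
    placements = [(1, 100), (105, 200), (150, 300), (310, 400)]
    gaps = positive_gaps(placements)
    print(gaps, sum(gaps), sum(gaps) / len(gaps))
```

With gaps of 4 and 9 the sum is 13 and the mean is 6.5, the same shape as the chromosome gap row above.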
454 + Solexa + 454p AMOScmp
 Only the 454 paired ends that contain a single complete adaptor sequence were used (almost all)
 149 contigs; very similar to the previous one
Sanger AMOScmp
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
 Many misoriented mates in the 4.8M-5M region of the chromosome
 22 contigs
 Chromosome
 Chromosome problem
Sanger Celera 3.11
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
 22 scaff, 46 contigs, 181 degens
 Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid
 aligns to 4.8M-5M "problem" region in the chromosome
 7180000001443.png
     [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
 ===============================================================================================================================
        1   175592  |        1   175592  |   175592   175592  |   100.00  |   175592   175592  |   100.00   100.00  | 7180000001443   7180000001443   [IDENTITY]
        1    12519  |   163075   175592  |    12519    12518  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [BEGIN]
   163075   175592  |        1    12519  |    12518    12519  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [END]
     [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
 ===============================================================================================================================
  4790727  4911492  |   120764        1  |   120766   120764  |    99.98  |  6397126   175592  |     1.89    68.78  | gi|28867243|ref|NC_004578.1|    7180000001443
  4898971  4955870  |   175592   118697  |    56900    56896  |    99.98  |  6397126   175592  |     0.89    32.40  | gi|28867243|ref|NC_004578.1|    7180000001443
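The 163,074 bp estimate follows from the self-alignment of the scaffold: its first 12,519 bases match its last 12,518 bases, so the non-redundant circle ends just before the position where the repeated copy starts. A small sketch of that check, working on (S1, E1, S2, E2) tuples copied from the self-alignment table; the function name `circular_length` and the tuple layout are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

Hit = Tuple[int, int, int, int]  # (S1, E1, S2, E2) from a self-alignment


def circular_length(contig_len: int, self_hits: Iterable[Hit]) -> Optional[int]:
    """If the start of the contig also aligns to its end, return the length
    of the circle with the duplicated end removed, otherwise None."""
    for s1, e1, s2, e2 in self_hits:
        # Prefix (starting at base 1) matching a suffix (ending at the last
        # base), excluding the trivial whole-contig identity hit.
        if s1 == 1 and e2 == contig_len and s2 > e1:
            return s2 - 1
    return None


if __name__ == "__main__":
    hits = [
        (1, 175592, 1, 175592),        # [IDENTITY]
        (1, 12519, 163075, 175592),    # [BEGIN]
        (163075, 175592, 1, 12519),    # [END]
    ]
    print(circular_length(175592, hits))
```

The identity hit is skipped because its second start does not lie beyond the end of the first range, so only the begin/end overlap decides the result.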
Sanger AMOScmp (Chromosome+3 plasmids ref)
 Reference = complete genome (chromosome + 3 plasmids); the third plasmid is the "circular contig" from the Celera 3.11 assembly
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
 38 contigs: 15 for main chromosome, 1 for longer plasmid, 21 for shorter plasmid, 1 for "circular contig"
 The misoriented read pile corresponding to the chromosome (4. AMOSCmp of Sanger reads) has disappeared
 AA ready for submission: /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz
Solexa assembled at different read coverages
 Location: /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/sample/
 Several assemblies, using 10%, 20%, ... 100% of the P. syringae Solexa reads.
 These would correspond to 3X, 6X, ... 30X coverage.
 The read sampling was done randomly. One sample set for each coverage.
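The 3X to 30X labels are rounded. A quick calculation from the read count, read length and genome size listed on this page gives about 31X for the full Solexa set. The sketch below shows that arithmetic together with one possible way to draw a random subset of reads; the sampling helper `sample_reads` and its fixed seed are assumptions for illustration, since the actual sampling script is not shown here.

```python
import random
from typing import List, Sequence

GENOME_BP = 6_538_260   # chromosome + 2 plasmids (NCBI total above)
N_READS = 6_340_136     # Solexa reads
READ_LEN = 32           # all Solexa reads are 32 bp


def expected_coverage(fraction: float) -> float:
    """Average depth when a given fraction of the Solexa reads is used."""
    return fraction * N_READS * READ_LEN / GENOME_BP


def sample_reads(read_ids: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Draw a random subset of reads without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    k = round(len(read_ids) * fraction)
    return rng.sample(list(read_ids), k)


if __name__ == "__main__":
    for pct in range(10, 101, 10):
        print(pct, round(expected_coverage(pct / 100), 1))
    ids = ["read%d" % i for i in range(100)]
    print(len(sample_reads(ids, 0.3)))
```

Fixing the seed makes a sample reproducible, which matters when a single sample set per coverage level is compared across assemblers.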
 Assembler: Sanger maq
 all contigs
 cvg   %reads  #ctgs   min     max      mean        stdev       sum
 3     10      43136   32      7712     135.11      140.61      5828148
 6     20      11243   32      20190    570.01      686.5       6408705
 9     30      2972    32      27962    2185.32     2804.56     6494784
 12    40      1058    32      63125    6152.98     7871.7      6509855
 15    50      455     32      163430   14319.01    19663.15    6515153
 18    60      267     32      328882   24406.61    46172.62    6516567
 21    70      166     32      671064   39260.9     84200.42    6517311
 24    80      143     32      906652   45577.16    111875.19   6517535
 27    90      117     32      1433643  55708.4     164246.61   6517883
 30    100     106     32      2067205  61489.83    230284.47   6517923
 chromo contigs
 cvg   %reads  #ctgs   min     max      mean        stdev       sum
 3     10      42845   32      1845     133.32      118.36      5712348
 6     20      11124   32      9650     565.41      625.32      6289649
 9     30      2876    32      26076    2216.64     2714.92     6375063
 12    40      965     32      63125    6621.71     7893.19     6389957
 15    50      362     32      163430   17665.19    20565.31    6394800
 18    60      167     32      328882   38299.32    53660.75    6395987
 21    70      75      257     671064   85287.52    108858.19   6396564
 24    80      49      940     906652   130546.42   160470.1    6396775
 27    90      25      42603   1433643  255877.72   277650.54   6396943
 30    100     18      42603   2067205  355387.77   465907.88   6396980
 all gaps
 cvg   %reads  #gaps   min     max     mean    stdev   sum
 3     10      43137   1       3874    16.46   38.01   710112
 6     20      11242   1       3919    11.52   64.43   129555
 9     30      2971    1       3418    14.63   114.29  43476
 12    40      1056    1       3873    26.89   196.7   28405
 15    50      454     1       3415    50.89   291.04  23107
 18    60      265     1       3870    81.86   380.9   21693
 21    70      165     1       3868    126.96  486.88  20949
 24    80      141     1       3414    146.98  461.06  20725
 27    90      115     1       3418    177.19  520.11  20377
 30    100     104     1       3278    195.54  511.06  20337
 chromo gaps
 cvg   %reads  #gaps   min     max     mean    stdev   sum
 3     10      42846   1       240     15.98   16.33   684778
 6     20      11125   1       146     9.66    9.72    107477
 9     30      2876    1       76      7.67    7.73    22063
 12    40      965     1       58      7.42    7.8     7169
 15    50      362     1       48      6.42    7.08    2326
 18    60      167     1       58      6.82    7.63    1139
 21    70      76      1       55      7.39    7.9     562
 24    80      49      1       55      7.16    10.08   351
 27    90      25      1       45      7.31    10.12   183
 30    100     18      1       55      8.11    13.62   146
 Assembler: AMOScmp
 all contigs
 cvg   %reads  #ctgs   min     max      mean        stdev       sum
 3     10      61330   20      9181     97.08       101.04      5954113
 6     20      18764   20      19803    343.93      431.9       6453593
 9     30      5723    20      28103    1137.41     1498.76     6509417
 12    40      2045    20      33780    3186.49     4337.72     6516385
 15    50      859     20      90346    7588.66     11436.97    6518661
 18    60      479     20      219894   13609.97    22470.18    6519176
 21    70      319     20      289494   20436.94    37964.34    6519384
 24    80      246     20      385663   26502.45    61309.04    6519605
 27    90      237     20      577910   27510.48    71767.85    6519985
 30    100     187     20      577910   34862.83    91691.51    6519350
 chromo contigs
 cvg   %reads  #ctgs   min     max      mean        stdev       sum
 3     10      60923   20      1052     95.8        79.21       5836796
 6     20      18583   20      4800     340.81      368.44      6333397
 9     30      5567    22      20245    1147.53     1401.05     6388303
 12    40      1883    20      33780    3396.09     4327.93     6394855
 15    50      699     24      90346    9151.31     12023.55    6396771
 18    60      313     29      219894   20437.56    25135.49    6396957
 21    70      155     32      289494   41271.96    45893.6     6397154
 24    80      82      28      385663   78014.63    85498.69    6397200
 27    90      64      35      577910   99957.57    109269.83   6397285
 30    100     40      46      577910   159930.32   139830.19   6397213
 all gaps
 cvg   %reads  #gaps   min     max     mean    stdev   sum
 3     10      45068   1       2228    14.89   26.44   671499
 6     20      11034   1       3148    10.81   49.56   119340
 9     30      2816    1       2296    16.57   106.54  46663
 12    40      1022    1       1903    25.98   125.51  26559
 15    50      456     1       1716    46.91   159.75  21394
 18    60      294     1       1445    68.35   189.57  20097
 21    70      221     1       1716    88.33   225.56  19523
 24    80      182     1       1716    105.12  244.55  19132
 27    90      181     1       1716    103.78  235.41  18785
 30    100     147     1       1716    131.05  288.69  19265
 chromo gaps
 cvg   %reads  #gaps   min     max     mean    stdev   sum
 3     10      44767   1       197     14.45   14.86   647093
 6     20      10884   1       1008    9.02    17.01   98181
 9     30      2677    1       2296    9.88    70.77   26464
 12    40      869     1       685     7.8     24.09   6786
 15    50      303     1       59      6.55    6.88    1986
 18    60      137     1       33      7.35    7.22    1007
 21    70      65      1       33      6.83    6.9     444
 24    80      27      1       36      8.37    9.74    226
 27    90      18      1       42      8.83    11.44   159
 30    100     10      1       33      10.7    12.58   107