Pseudomonas syringae


Revision as of 19:51, 21 February 2008

Pseudomonas syringae pv. tomato str. DC3000


Data

Originally sequenced and finished at TIGR: published Sept 2003

NCBI

 AA (Assembly Archive): no assembly
 TA (Trace Archive): 80,959 reads
 Genome Project
 Taxonomy TaxId=223283

Chromosome + 2 plasmids:

 Name           Length    %GC    Info
 NC_004578.1    6,397,126 58.40  chromosome
 NC_004633.1    73,661    55.15  plasmid pDC3000A
 NC_004632.1    67,473    56.17  plasmid pDC3000B
 total          6,538,260
 Little similarity between the chromosome and plasmids.
 The 2 plasmids share a significant amount of DNA; see /fs/szasmg2/Bacteria/Pseudomonas_syringae/Data/nucmer/NC_004633-NC_004632.png

UNC: Jeff Dangl

New sequence:

Read stats

 Type   File                            #reads            min     median  max     sum             mean    stdev   n50
 Solexa DC3000.reads.filtered.fasta     6,340,136         32      32      32      202884352       32      0       32
 454p   DC3000.format.454Reads.fna      123,992           38      86      329     15623908        126.01  58.89   142
 454    DC3000.TCA.454reads.format.fna  77,466            35      244     371     18627363        240.46  26.85   245 
 * Solexa 3 lanes; 
 * 454 shotgun 1/4 Plate (250bp read); 
 * 454 paired ends 1/4 Plate : 
     * contain a 44 bp linker in the middle
     * the linker sequence is: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
     * there are some (not many) 454 paired end sequences that contain multiple instances of the linker (tandem): Example EUEIEUN01ANUGL_length=128_xy=0154_1891 
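
The linker notes above suggest a simple way to split a 454 paired-end read into its two mates. The sketch below is only an illustration, not the script actually used here: it assumes exact, error-free linker matches, and the function names and the `min_mate_len` parameter are hypothetical. Reads with tandem (multiple) linker copies are rejected.

```python
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"  # 44 bp, from the notes above


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_paired_read(seq, linker=LINKER, min_mate_len=1):
    """Split a paired-end read into (left, right) mates at the linker.

    Returns None when the read does not contain exactly one exact copy of
    the linker, or when either mate is shorter than min_mate_len
    (a hypothetical cutoff, not taken from the notes).
    """
    if seq.count(linker) != 1:
        return None
    left, right = seq.split(linker)
    if len(left) < min_mate_len or len(right) < min_mate_len:
        return None
    return left, right
```

The linker is its own reverse complement, so the same search string works for reads from either strand. Real 454 reads carry indel errors, so an alignment-based linker search would likely recover more pairs than this exact match.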
 
 
 Quality values are missing for all data sets!
 I assigned a default quality value (qual=3) to every base in the .frg & .afg files.
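
A minimal sketch of assigning one default quality score to every base. It writes a FASTA-style .qual record, which is an assumption made for illustration: the .frg and .afg files encode quality differently, and the function name and `per_line` parameter are hypothetical.

```python
def default_qual_record(read_id, seq, qual=3, per_line=20):
    """Build a FASTA-style .qual record giving every base the same score."""
    scores = [str(qual)] * len(seq)
    # Wrap the scores so that each output line holds at most per_line values.
    lines = [" ".join(scores[i:i + per_line])
             for i in range(0, len(scores), per_line)]
    return ">" + read_id + "\n" + "\n".join(lines) + "\n"
```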

UNC sequence data (no longer available?):

 http://biology622.dhcp.unc.edu/~labweb/DCData/

UNC (e-mail):

 * Theoretical minimum number of contigs we can obtain is 268 (our reads fail to cover 269 nucleotides). 
 * Our de novo assembly spans the genome in 853 contigs totaling 6,313,026 bp. 
 * 98.7% of the genome is covered by a contig; 
 * 84% of the genome is covered by contigs 10,000 bp or greater. 
 * The average gap size between contigs is 98 bp; 
 * average contig size 7401 bp. 
 * The N50 = 37,444 bp. 
 * Our largest BAMBUS "scaffold" is 2,565,761 bp
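
For reference, N50 and average contig size as quoted in the e-mail can be computed from a list of contig lengths. This is a sketch under the common N50 definition (the length of the contig at which the cumulative sum of lengths, sorted longest first, reaches half of the total); the e-mail does not state which definition UNC used.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def mean_length(total_bp, n_contigs):
    """Average contig size in bp."""
    return total_bp / n_contigs
```

With the figures above, `mean_length(6313026, 853)` rounds to 7401, which agrees with the reported average contig size.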

Files location:

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Data
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly

Assemblies

454 AMOScmp

 /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/454/2007_1015_AMOSCmp-relaxed
 no trimming; 
 AMOScmp -D MINCLUSTER=20 -D MAXTRIM=10 -D MAJORITY=50 ...
 Stats:
 desc            #elem   min     max     mean    stdev   sum
 contigs         6131    43      8261    966.57  829.44  5926089
 pos_gaps        5622    1       10394   110.32  283.78  620259

 Slight improvement from alignment-based trimming of the 454 reads

Solexa AMOScmp

 /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0116_AMOSCmp-relaxed
 Duplication if ALIGNWIGGLE=15
 Align all reads (Solexa) to the reference using nucmer. 
 6340136 reads
 5641782 (88.98%) aligned by nucmer -c 20 -l 20
 3453618 (54.47%) aligned by nucmer -c 32 -l 20
 2707005 (42.69%) aligned by nucmer -c 32 -l 32
 AMOScmp -D MAJORITY=50 -D MINOVL=5 -D MINCLUSTER=20 -D ALIGNWIGGLE=2 ...
 Stats:
 desc            #elem   min     max     mean            stdev           sum
 contigs         187     20      577910  34862.83        91691.51        6519350
 pos_gaps        147     1       1716    131.05          288.69          19265
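
The Stats rows on this page list #elem, min, max, mean, stdev and sum for contig (or gap) lengths. A minimal sketch of how such a row could be derived from a list of lengths follows. The tool that produced the tables is not named here, so the use of the sample standard deviation is an assumption, and the function name is hypothetical.

```python
import statistics


def length_stats(lengths):
    """Summarise a non-empty list of lengths in the column layout used above."""
    n = len(lengths)
    return {
        "#elem": n,
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(statistics.mean(lengths), 2),
        # Sample standard deviation (assumption); undefined for a single element.
        "stdev": round(statistics.stdev(lengths), 2) if n > 1 else 0.0,
        "sum": sum(lengths),
    }
```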

Solexa maq

 /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/2008_0213_maq/maq
 Stats:
 desc            #elem   min     max     mean            stdev           sum
 contigs         106     32      2067205 61489.83        230284.47       6517923
 pos_gaps        104     1       3278    195.54          511.06          20337

454 + Solexa AMOScmp

 AMOScmp -D MINCLUSTER=20 -D MAXTRIM=20 -D MINOVL=5 -D MAJORITY=50 -D ALIGNWIGGLE=2 ...
 Stats:
 desc            #elem   min     max     mean            stdev           sum
 contigs         139     20      1895644 46899.82        243273.92       6519075
 pos_gaps        124     1       1809    156.78          323.66          19441

454 + Solexa + 454p AMOScmp

 Only the 454 paired ends that contain exactly one complete adaptor sequence were used (almost all)
 149 contigs; very similar to the previous one

Sanger AMOScmp

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_AMOSCmp-relaxed
 Many mis-oriented mates in the 4.8M-5M region of the chromosome
 22 contigs
 Chromosome
 Chromosome problem

Sanger Celera 3.11

 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1011_WGA
 22 scaffolds, 46 contigs, 181 degenerate contigs (degens)
 Scaffold 7180000001443 looks circular: possible 163,074 bp plasmid
 aligns to 4.8M-5M "problem" region in the chromosome
 7180000001443.png
     [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
 ===============================================================================================================================
        1   175592  |        1   175592  |   175592   175592  |   100.00  |   175592   175592  |   100.00   100.00  | 7180000001443   7180000001443   [IDENTITY]
        1    12519  |   163075   175592  |    12519    12518  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [BEGIN]
   163075   175592  |        1    12519  |    12518    12519  |    99.98  |   175592   175592  |     7.13     7.13  | 7180000001443   7180000001443   [END]


     [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q] | [TAGS]
 ===============================================================================================================================
  4790727  4911492  |   120764        1  |   120766   120764  |    99.98  |  6397126   175592  |     1.89    68.78  | gi|28867243|ref|NC_004578.1|    7180000001443
  4898971  4955870  |   175592   118697  |    56900    56896  |    99.98  |  6397126   175592  |     0.89    32.40  | gi|28867243|ref|NC_004578.1|    7180000001443
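
The 163,074 bp plasmid estimate follows from the self-alignment above: the first 12,519 bases of the scaffold repeat at its end, starting at position 163,075, so one full turn of the circle is everything before that second copy. A small sketch of the arithmetic, with a hypothetical function name and 1-based inclusive coordinates as in the show-coords output:

```python
def circular_unit_length(scaffold_len, begin_hit, end_hit):
    """Estimate the circular unit length from an end-to-end self-overlap.

    begin_hit and end_hit are (start, end) coordinates of the two copies of
    the repeated sequence. Returns None unless the first copy starts at
    base 1 and the second copy ends at the last base of the scaffold.
    """
    (s1, _e1), (s2, e2) = begin_hit, end_hit
    if s1 != 1 or e2 != scaffold_len:
        return None
    # Everything before the second copy is one full turn of the circle.
    return s2 - 1
```

Calling `circular_unit_length(175592, (1, 12519), (163075, 175592))` gives 163074.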

Sanger AMOScmp (Chromosome+3 plasmids ref)

 Reference = complete genome (chromosome + 3 plasmids); the third "plasmid" is the "circular contig" from the Celera 3.11 assembly
 /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids
 38 contigs: 15 for main chromosome, 1 for longer plasmid, 21 for shorter plasmid, 1 for "circular contig"
 The mis-oriented read pile on the chromosome (see the Sanger AMOScmp assembly above) has disappeared
 AA ready for submission: /fs/szasmg2/Bacteria/Pseudodomonas_syringae/Assembly/Sanger/2007_1012_AMOSCmp-relaxed-3plasmids/AA/umd-20071030-141700.tar.gz

Solexa assembled at different read coverages

 Location: /fs/szasmg2/Bacteria/Pseudomonas_syringae/Assembly/Solexa/sample/
 
 Several assemblies, using 10%, 20%, ..., 100% of the P. syringae Solexa reads.
 These correspond to roughly 3X, 6X, ..., 30X coverage.
 Reads were sampled at random, with one sample set per coverage level.
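
A minimal sketch of the random read sampling and of the expected coverage at each level. The actual sampling script is not described on this page, so the function names, the fixed seed, and the use of the full 6,538,260 bp genome (chromosome + 2 plasmids) as the denominator are assumptions.

```python
import random


def sample_reads(reads, fraction, seed=0):
    """Return a random subset of round(fraction * len(reads)) reads, in input order."""
    k = round(fraction * len(reads))
    rng = random.Random(seed)  # fixed seed so that a sample can be reproduced
    keep = sorted(rng.sample(range(len(reads)), k))
    return [reads[i] for i in keep]


def expected_coverage(n_reads, read_len, genome_len):
    """Average depth of coverage for fixed-length reads."""
    return n_reads * read_len / genome_len
```

With all 6,340,136 Solexa reads of 32 bp, `expected_coverage(6340136, 32, 6538260)` is about 31.03, so each 10% step adds roughly 3X, in line with the 3X to 30X figures above.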

 Assembler: Sanger maq

 all contigs
 desc  #elem   min     max     mean            stdev           sum
 10    43136   32      7712    135.11          140.61          5828148
 20    11243   32      20190   570.01          686.5           6408705
 30    2972    32      27962   2185.32         2804.56         6494784
 40    1058    32      63125   6152.98         7871.7          6509855
 50    455     32      163430  14319.01        19663.15        6515153
 60    267     32      328882  24406.61        46172.62        6516567
 70    166     32      671064  39260.9         84200.42        6517311
 80    143     32      906652  45577.16        111875.19       6517535
 90    117     32      1433643 55708.4         164246.61       6517883
 100   106     32      2067205 61489.83        230284.47       6517923
 
 chromo contigs
 desc  #elem   min     max     mean            stdev           sum
 10    42845   32      1845    133.32          118.36          5712348
 20    11124   32      9650    565.41          625.32          6289649
 30    2876    32      26076   2216.64         2714.92         6375063
 40    965     32      63125   6621.71         7893.19         6389957
 50    362     32      163430  17665.19        20565.31        6394800
 60    167     32      328882  38299.32        53660.75        6395987
 70    75      257     671064  85287.52        108858.19       6396564
 80    49      940     906652  130546.42       160470.1        6396775
 90    25      42603   1433643 255877.72       277650.54       6396943
 100   18      42603   2067205 355387.77       465907.88       6396980
 
 all gaps
 desc  #elem   min     max     mean    stdev   sum
 10    43137   1       3874    16.46   38.01   710112
 20    11242   1       3919    11.52   64.43   129555
 30    2971    1       3418    14.63   114.29  43476
 40    1056    1       3873    26.89   196.7   28405
 50    454     1       3415    50.89   291.04  23107
 60    265     1       3870    81.86   380.9   21693
 70    165     1       3868    126.96  486.88  20949
 80    141     1       3414    146.98  461.06  20725
 90    115     1       3418    177.19  520.11  20377
 100   104     1       3278    195.54  511.06  20337
 
 chromo gaps
 desc  #elem   min     max     mean    stdev   sum
 10    42846   1       240     15.98   16.33   684778
 20    11125   1       146     9.66    9.72    107477
 30    2876    1       76      7.67    7.73    22063
 40    965     1       58      7.42    7.8     7169
 50    362     1       48      6.42    7.08    2326
 60    167     1       58      6.82    7.63    1139
 70    76      1       55      7.39    7.9     562
 80    49      1       55      7.16    10.08   351
 90    25      1       45      7.31    10.12   183
 100   18      1       55      8.11    13.62   146

 Assembler: AMOScmp
 
 all contigs
 desc    #elem   min     max     mean          stdev           sum
 10    61330   20      9181    97.08           101.04          5954113
 20    18764   20      19803   343.93          431.9           6453593
 30    5723    20      28103   1137.41         1498.76         6509417
 40    2045    20      33780   3186.49         4337.72         6516385
 50    859     20      90346   7588.66         11436.97        6518661
 60    479     20      219894  13609.97        22470.18        6519176
 70    319     20      289494  20436.94        37964.34        6519384
 80    246     20      385663  26502.45        61309.04        6519605
 90    237     20      577910  27510.48        71767.85        6519985
 100   187     20      577910  34862.83        91691.51        6519350
 
 chromo contigs
 desc    #elem   min     max     mean            stdev           sum
 10      60923   20      1052    95.8            79.21           5836796
 20      18583   20      4800    340.81          368.44          6333397
 30      5567    22      20245   1147.53         1401.05         6388303
 40      1883    20      33780   3396.09         4327.93         6394855
 50      699     24      90346   9151.31         12023.55        6396771
 60      313     29      219894  20437.56        25135.49        6396957
 70      155     32      289494  41271.96        45893.6         6397154
 80      82      28      385663  78014.63        85498.69        6397200
 90      64      35      577910  99957.57        109269.83       6397285
 100     40      46      577910  159930.32       139830.19       6397213
 
 all gaps
 desc    #elem   min     max     mean    stdev   sum
 10    45068   1       2228    14.89   26.44   671499
 20    11034   1       3148    10.81   49.56   119340
 30    2816    1       2296    16.57   106.54  46663
 40    1022    1       1903    25.98   125.51  26559
 50    456     1       1716    46.91   159.75  21394
 60    294     1       1445    68.35   189.57  20097
 70    221     1       1716    88.33   225.56  19523
 80    182     1       1716    105.12  244.55  19132
 90    181     1       1716    103.78  235.41  18785
 100   147     1       1716    131.05  288.69  19265
 
 chromo gaps
 desc    #elem   min     max     mean    stdev   sum
 10    44767   1       197     14.45   14.86   647093
 20    10884   1       1008    9.02    17.01   98181
 30    2677    1       2296    9.88    70.77   26464
 40    869     1       685     7.8     24.09   6786
 50    303     1       59      6.55    6.88    1986
 60    137     1       33      7.35    7.22    1007
 70    65      1       33      6.83    6.9     444
 80    27      1       36      8.37    9.74    226
 90    18      1       42      8.83    11.44   159
 100   10      1       33      10.7    12.58   107
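
The tables above are plain whitespace-delimited text, so they are easy to parse for comparison or plotting. The sketch below reads "desc #elem min max mean stdev sum" rows and checks that each mean agrees with sum / #elem. The function names and the 0.01 tolerance (to allow for two-decimal rounding or truncation) are assumptions.

```python
def parse_stats_rows(text):
    """Parse stats rows into dicts, skipping headers and other non-data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[1].isdigit():
            continue
        rows.append({
            "desc": fields[0],
            "elem": int(fields[1]),
            "min": int(fields[2]),
            "max": int(fields[3]),
            "mean": float(fields[4]),
            "stdev": float(fields[5]),
            "sum": int(fields[6]),
        })
    return rows


def mean_matches(row, tol=0.01):
    """True when the listed mean equals sum / #elem within tol."""
    return abs(row["sum"] / row["elem"] - row["mean"]) <= tol
```

For example, the 90% and 100% AMOScmp "chromo contigs" rows both pass `mean_matches`.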