Bacillus anthracis

From Cbcb

Last edited by Dpuiu
Latest revision as of 14:59, 23 September 2008

Background

  • 89 known strains
  • Most virulent: Ames (USA, 2001) and Vollum (WWII biological weapon)
  • Benign: Sterne (used as vaccine)

Virulence factors that distinguish Bacillus anthracis from Bacillus cereus are encoded on two plasmids: pXO1 (anthrax toxin) and pXO2 (capsule genes). The capsule protects against phagocytosis once the vegetative bacterium enters the bloodstream. The anthrax toxin consists of three components: protective antigen (PA), lethal factor (LF), and edema factor (EF). PA/LF and PA/EF complexes are internalized by host cells, where LF (a metalloprotease) and EF (a calmodulin-dependent adenylate cyclase) act. At high levels, LF induces cell death and release of the bacterium, while EF increases host susceptibility to infection and promotes fluid accumulation in the cells. (NCBI)

  • Ames, AmesAncestor, and Sterne are 99.9% identical, with no rearrangements
  • The chromosome and plasmids do not appear to share sequence: there are only a few alignments, all < 2 kb and < 92% identity


Genome Projects (listed by NCBI)

NIH Genome Projects

 Center          Complete     Assembly    Progress  Total
 TIGR/JCVI       2            8           1         11
 LANL            0            6           0         6
 DOE             1            0           0         1 
 NMRC            0            0           2         2
 Total           3            14          3         20
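The Total row should equal the column sums of the per-center rows. Below is a minimal sketch of that consistency check. The counts are copied from the table, and `PROJECTS` and `column_totals` are hypothetical names introduced only for illustration.

```python
# Genome-project counts per sequencing center, copied from the table above:
# (complete, assembly, in progress)
PROJECTS = {
    "TIGR/JCVI": (2, 8, 1),
    "LANL": (0, 6, 0),
    "DOE": (1, 0, 0),
    "NMRC": (0, 0, 2),
}


def column_totals(projects):
    """Return (complete, assembly, progress, total) summed over all centers."""
    complete = sum(c for c, _, _ in projects.values())
    assembly = sum(a for _, a, _ in projects.values())
    progress = sum(p for _, _, p in projects.values())
    return complete, assembly, progress, complete + assembly + progress


def row_total(projects, center):
    """Total number of projects for one center."""
    return sum(projects[center])
```

With these numbers, `column_totals(PROJECTS)` reproduces the Total row (3, 14, 3, 20).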

TIGR/JCVI Strains

 Contigs  Traces   Status    Completed   Strain 
 0        96,532   Progress  .           A0039  
 62       67,600   Assembly  2007/07/25  Tsiankovskii-I AA ; ??? possible update
 1(+2)    101,379  Complete  2004/05/20  Ames Ancestor (Ames 0581)  AA ; Insignia; pXO1, pXO2
 42       86,181   Assembly  2004/06/04  A1055 AA ; Insignia
 1(+2)    83,552   Assembly  2005/05/16  A2012  Insignia; the single contig contains 469 gaps; 65,508 of the traces have no qualities; pXO1, pXO2
 1        125,879  Complete  2002/05/16  Ames  ??? complete but not in AA ; Insignia
 49       0        Assembly  2004/06/07  Australia 94   ??? no TRACES ; Insignia
 30       90,308   Assembly  2004/06/04  CNEVA-9066 Insignia
 64       92,429   Assembly  2004/06/07  Kruger B AA ; Insignia
 52       103,144  Assembly  2004/06/04  Vollum Insignia
 44       95,078   Assembly  2004/06/07  Western North America USA6153 Insignia

LANL Strains

No traces in TA; none in Insignia

 Contigs  Traces   Status    Completed    Strain       
 60       0        Assembly  2008/04/08   A0174
 60       0        Assembly  2008/02/12   A0193                              
 68       0        Assembly  2008/03/24   A0389
 46       0        Assembly  2008/02/12   A0442
 57       0        Assembly  2008/03/24   A0465
 63       0        Assembly  2008/01/16   A0488

+ 2 plasmid genome projects (pXO1, pXO2) completed in 1999

DOE Strains

 Contigs  Traces   Status    Completed    Strain
 1        147,665  Complete  2004/06/24   Sterne ??? complete but not in AA; Insignia; pXO1, pXO2

Naval Medical Research Center

 Contigs  Traces   Status    Strain
 0        0        Progress  34F2 (NMRC)
 0        0        Progress  34F2 delta gerH

Complete

 Strain        Status       chromosome      pXO1       pXO2
 A2012         Assembly     .               181677     94829 
 Ames          Complete     5227293         .          .
 AmesAncestor  Complete     5227419         181677     94830
 Sterne        Complete     5228663         181654     96231
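Summing chromosome + pXO1 + pXO2 gives the full genome size per strain. For Ames Ancestor this reproduces the 5,503,926 bp total listed under that strain. The sketch below assumes "." means the replicon length is not given. `LENGTHS` and `total_length` are hypothetical names.

```python
# Replicon lengths (bp) from the table above; None stands for "." (not given).
LENGTHS = {
    "A2012": (None, 181677, 94829),
    "Ames": (5227293, None, None),
    "AmesAncestor": (5227419, 181677, 94830),
    "Sterne": (5228663, 181654, 96231),
}


def total_length(strain):
    """Sum chromosome + pXO1 + pXO2, or return None if any replicon is missing."""
    parts = LENGTHS[strain]
    if any(p is None for p in parts):
        return None
    return sum(parts)
```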

Genome Projects (not listed by NCBI)

BCM

Data available "by request"

 Strains  reads    cvg   ctgs   N50ctg    AssemblyDate
 31-101   451,308  9.4   3,418  2,592     5-22-2006
 500      363,269  7.6   5,578  1,744     5-22-2006
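The BCM table gives reads and coverage but not read length. Under the usual relation cvg = reads × mean read length / genome size, the mean read length can be back-computed. This is a rough sketch under two assumptions: that relation holds for these numbers, and the genome size is the Ames Ancestor total (5,503,926 bp). The coverage values are rounded to 0.1, so the result is only approximate. `implied_read_length` is a hypothetical helper.

```python
# Assumption: genome size taken as Ames Ancestor chromosome + pXO1 + pXO2.
GENOME_SIZE = 5_503_926


def implied_read_length(reads, coverage, genome_size=GENOME_SIZE):
    """Mean read length implied by coverage = reads * length / genome_size."""
    return coverage * genome_size / reads


for strain, reads, cvg in [("31-101", 451_308, 9.4), ("500", 363_269, 7.6)]:
    print(strain, round(implied_read_length(reads, cvg), 1))
```

Both strains work out to roughly 115 bp per read under these assumptions.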

Strain Assemblies

A2012

  • NCBI Genome
              RefId           Len     GC%
 chromosome   NZ_AAAC02000001 5093554 35.36
 pXO1         NC_003980       181677  32             # 100%   identical to AmesAncestor pXO1
 pXO2         NC_003981       94829   33             # 99.99% identical to AmesAncestor pXO2; 1 del
 chromosome
       #elem   min     max     mean    median  n50     sum
 ctg   425     94      132589  11885   6855    24366   5051252 # 42,346 N's
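Every stats table on this page uses the same columns: #elem, min, max, mean, median, n50, sum. Below is a minimal sketch of how such a row can be computed from a list of contig lengths. It assumes N50 is the length of the contig at which the cumulative length (longest first) first reaches half of the total. The page does not document the exact conventions of the original scripts (tie handling, median of an even count, rounding of the mean), and `summary` is a hypothetical name.

```python
import statistics


def summary(lengths):
    """Return #elem, min, max, mean, median, n50 and sum for a list of lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    # N50: walk from the longest element down and stop once the running
    # sum covers at least half of the total length.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "#elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "median": statistics.median(lengths),
        "n50": n50,
        "sum": total,
    }
```

Under this definition, N50 equals max whenever the largest element alone covers half the sum. That is the case for the Ames Ancestor CA contigs further down: 2 × 2,750,171 ≥ 5,369,744.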


Traces:

  • 18,045 reads have quality values and 65,507 do not
 Libraries:
 Lib             Mean    Stdev           Count
 T13322          2000    600             32133
 T13323          4000    1200            31455
 1047127226559   2000    600             18036
 T10914          3000    900             1719
 T10930          10000   3000            150
 GBZH            4500    .               29
 ...
 Total                                   83553
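The library Mean/Stdev values are what mate pairs get checked against when they are later called "stretched", "compressed" or "misoriented". Below is a minimal sketch of such a classification. It makes three assumptions: the library values are insert sizes in bp, a pair is out of range beyond mean ± 3 × stdev, and a correctly oriented pair has the left read forward and the right read reverse. The threshold `k=3` and the names `MatePair` and `classify` are hypothetical, not the validator's actual rules.

```python
from dataclasses import dataclass

# Insert-size mean and stdev (bp) for some libraries from the table above.
LIBRARIES = {
    "T13322": (2000, 600),
    "T13323": (4000, 1200),
    "T10914": (3000, 900),
    "T10930": (10000, 3000),
}


@dataclass
class MatePair:
    library: str
    left_pos: int         # leftmost reference coordinate of the left read
    left_forward: bool    # True if the left read aligns to the forward strand
    right_end: int        # rightmost reference coordinate of the right read
    right_forward: bool   # True if the right read aligns to the forward strand


def classify(pair, k=3.0):
    """Label a placed mate pair as ok, stretched, compressed or misoriented."""
    mean, stdev = LIBRARIES[pair.library]
    # Assumed expected orientation: reads point toward each other ("innie").
    if not (pair.left_forward and not pair.right_forward):
        return "misoriented"
    insert = pair.right_end - pair.left_pos
    if insert > mean + k * stdev:
        return "stretched"
    if insert < mean - k * stdev:
        return "compressed"
    return "ok"
```

For T13322 (2000 ± 600), this accepts inserts between 200 and 3800 bp.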

CA

Summary

         #elem   min     max     mean    median  n50     sum
 scf     325     1001    245405  16824   1250    86284   5467834
 ctg     476     844     122753  11450   2025    33522   5450075   # larger N50 than the NCBI assembly
 deg     1760    172     4878    852     818     882     1498894


0cvg (no plasmids)

         #elem   min     max     mean    median  n50     sum
 1con    223     1       783     91.28   33      252     20355
 ctg-deg 973     1       10316   239     140     444     232946

0cvg (including plasmids)

         #elem   min     max     mean    median  n50     sum
 ctg-deg 841     1       3183    183     82      402     154114

!!! there are some regions in the CA assembly not present in the reference

CA bog

         #elem   min     max     mean    median  n50     sum
 scf     334     930     282449  16611   1227    92676   5547969
 ctg     513     643     105647  10776   1630    36335   5528273
 deg     1898    172     8320    875     827     883     1659854

bog vs unitigger:

        count avg  max  N50  totalBases
scaff    +    -    +    +    +
ctg      +    -    -    +    +
deg      +    +    +    .    +
sur      +    -    +    .    +
utg      -    +    +    .    .
sing     -    .    .    .    .
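The +/- matrix can be regenerated from the two A2012 stat tables above (CA and CA bog). For the scf and ctg rows, the signs match if "+" is read as "bog value larger than unitigger value" and "-" as smaller. That reading is inferred from the numbers, not stated on the page. For deg, the table shows "." under N50, while the raw values (883 vs 882) would give "+". The sur, utg and sing rows cannot be checked because their numbers are not listed. `row_signs` is a hypothetical helper.

```python
# Columns: (#elem, mean, max, n50, sum), i.e. count avg max N50 totalBases.
UNITIGGER = {
    "scf": (325, 16824, 245405, 86284, 5467834),
    "ctg": (476, 11450, 122753, 33522, 5450075),
    "deg": (1760, 852, 4878, 882, 1498894),
}
BOG = {
    "scf": (334, 16611, 282449, 92676, 5547969),
    "ctg": (513, 10776, 105647, 36335, 5528273),
    "deg": (1898, 875, 8320, 883, 1659854),
}


def sign(bog_value, utg_value):
    """'+' if bog is larger, '-' if smaller, '=' if equal."""
    if bog_value > utg_value:
        return "+"
    if bog_value < utg_value:
        return "-"
    return "="


def row_signs(row):
    """Compare one row of the bog table against the unitigger table."""
    return "".join(sign(b, u) for b, u in zip(BOG[row], UNITIGGER[row]))
```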

AMOScmp-alignmentTrimmed

  • Ref: Ames
       #elem   min     max     mean    median  n50     sum
 ctg   123     122     215418  42281   24322   98944   5200569

!!! larger contigs than NCBI/CA assembly

  • Ref: A2012 pXO1 => 1 ctg
  • Ref: A2012 pXO2 => 1 ctg & 2 SNPs

Ames

  • Complete
 NC_003997.3    5227293 35.38  Bacillus anthracis str. Ames, complete genome
  • not in AA

AMOScmp-alignmentTrimmed

  • no 0-cvg regions when factory-trimmed reads are aligned to it
  • -D LAYERR=90 => 1 piece
 ref=5227293 bp
 assembly=5227311
 amosvalidate => 1,555 SNPs
 nucmer alignment of assembly to ref, filtered with -q => 287 SNPs
  • Many stretched and misoriented mates
  • Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/Ames/Assembly/2008_0820_AMOScmp-alignmentTrimmed

CA

Output:

         #elem   min     max     mean    median  n50     sum       snps
 scf     59      1021    893362  83396   1970    593436  4920387
 ctg     67      1021    736364  73430   2483    280568  4919826   496
 deg     245     70      141988  2016    671     34515   493924
  • There are many stretched mates; no compressed ones !!!

0cvg

         #elem   min     max     mean    median  n50     sum
 1con    10      9       804     147     80      804     1467
 ctg-deg 224     1       2197    604     633     715     135220

Ctg 0 cvg regions:

 #id                   len     gc%     start   end     len     cvg
 ctg7180000001099      2637    35.04   439     2637    2198    0        33.97% gc; blastn: 96% identity to Bacillus cereus; blastx: alpha/beta hydrolase & protein disulfide isomerase hits
 ctg7180000001288      28653   36.17   19125   19233   108     0        52.29% gc ; blastn: 100% identity to Ames Ancestor & Sterne
 ctg7180000001288      28653   36.17   28636   28653   17      0        aligned to other Bacillus anthracis strains
 ctg7180000001300      156978  35.35   1       161     160     0        46.58% gc; cloning vector
 ctg7180000001302      70072   34.28   69464   70072   608     0        55.83% gc; cloning vector

Deg 0 cvg regions:

 all have high GC% (>48.24%) probably cloning vector
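The 0-cvg tables list start, end and len for each uncovered run (len = end − start), and the GC% of those runs is used to suspect cloning vector. That works because the B. anthracis chromosome is about 35% GC. Below is a minimal sketch of both steps. It assumes a per-base coverage list and 0-based, end-exclusive coordinates, which match the len = end − start arithmetic in the table. The 45% cutoff and the function names are hypothetical, not the thresholds actually used here.

```python
def zero_coverage_runs(coverage):
    """Return (start, end, length) for each maximal run of zero coverage.

    Coordinates are 0-based and end-exclusive, so length == end - start.
    """
    runs = []
    start = None
    for i, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = i
        elif depth != 0 and start is not None:
            runs.append((start, i, i - start))
            start = None
    if start is not None:
        runs.append((start, len(coverage), len(coverage) - start))
    return runs


def gc_percent(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def likely_vector(seq, coverage, min_gc=45.0):
    """Zero-coverage runs whose GC% exceeds min_gc (hypothetical cutoff)."""
    flagged = []
    for start, end, length in zero_coverage_runs(coverage):
        gc = gc_percent(seq[start:end])
        if gc > min_gc:
            flagged.append((start, end, length, round(gc, 2)))
    return flagged
```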

Ref breaks:

 NC_003997.3   145564
 NC_003997.3   627742
 NC_003997.3   1151234
 NC_003997.3   2085561
 NC_003997.3   3515380

CA bog

         #elem   min     max     mean    median  n50     sum
 scf     94      1006    1158003 55534   14912   636683  5220223
 ctg     103     1006    594903  50630   15054   280568  5214932
 deg     501     70      15945   1044    609     1432    522839
 

0cvg

         #elem   min     max     mean    median  n50     sum
 1con    7       9       120     72.14   75      112     505
 ctg-deg 236     3       2197    607     632     713     143262

Ref breaks:

 NC_003997.3     145564
 NC_003997.3     627742
 NC_003997.3     1151234
 NC_003997.3     2085561
 NC_003997.3     3515380

bog vs unitigger:

        count avg  max  N50  totalBases
scaff    +    -    +    -    +
ctg      +    -    -    +    +
deg      +    -    -    .    +
sur      +    -    -    .    +
utg      +    -    +    .    .
sing     -    .    .    .    .

Ames Ancestor

  • Complete & in AA
 NC_007530.2    5227419 35.38  Bacillus anthracis str. 'Ames Ancestor', complete genome
 NC_007322.2    181677 32.53  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence
 NC_007323.3    94830  33.04  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence
 total          5503926
  • Downloaded from AA and converted to bank
  • Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/AmesAncestor/CS--AI-293

CA

         #elem   min     max     mean    median  n50     sum
 ctg     13      1316    2750171 413057  187947  2750171 5369744
 deg     11      271     62530   13876   1627    62530   152635
 ctg+deg 24      271     2750171 230099  37452   2750171 5522379
 scf     11      1316    2750171 488185  212112  2750171 5370036

Ref: no alignment breaks, no 0cvg regions

Sterne

  • Complete but not in AA.
            RefId        Len        GC%
 chromosome NC_005945.1  5228663   35.38
 pXO1       NC_001496.1  181654    32
 pXO2       NC_002146.1  96231     33 

!!! the plasmids are not listed with the genome project

  • Traces are available in TA. Some reads seem to be missing: aligning the reads to the finished genome leaves many 0-cvg regions
       #elem   min     max     mean    median  n50     sum
 0cvg  53      8       4946    529     272     1508    28017

AMOScmp

  • version: June 12 2007
  • untrimmed reads => 358 ctg
  • 53 zero cvg regions, max is almost 5K

AMOScmp-alignmentTrimmed

  • reads are trimmed according to alignment coords
       #elem   min     max     mean    median  n50     sum
 ctg   46      736     468060  112792  49000   355370  5188425
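"Trimmed according to alignment coords" means each read is cut down to the part that aligned to the reference before assembly. Below is a minimal sketch of that step. It assumes 1-based inclusive read coordinates that may be given in reverse order (start > end) for reverse-strand hits. `trim_to_alignment` is a hypothetical helper, not the AMOScmp code. Because it uses plain slicing, it works equally on a base string or a list of quality values.

```python
def trim_to_alignment(read, qstart, qend):
    """Keep only the aligned part of a read (1-based, inclusive coordinates).

    Works for a sequence string or a list of quality values; reversed
    coordinates (qstart > qend) are normalized first.
    """
    lo, hi = sorted((qstart, qend))
    if lo < 1 or hi > len(read):
        raise ValueError("alignment coordinates fall outside the read")
    return read[lo - 1:hi]
```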

CA

  • runCA-OBT.pl script
  • version 5.1
  • Output:
 #elem   min     max     mean    median  n50     sum
 ctg     204     1000    468299  26560   1310    189443  5418252
 deg     145     266     32820   1756    804     21949   254596
 ctg+deg 349     266     468299  16255   1146    181331  5672848
 scf     186     1000    671877  29170   1274    294809  5425585
 singleton  2418

Ctg 0 cvg regions:

 all have high GC% (>52.08%) probably cloning vector

Deg 0 cvg regions: actually they align to pXO1

 #id                             len     gc%
 deg7180000001258.1-297          297     31.65
 deg7180000001300.420-6468       6049    32.05
 deg7180000001254.1-28442        28442   33.01
 deg7180000001258.557-1060       504     34.52
 deg7180000001253.1-576          576     35.76
 deg7180000001300.6728-6824      97      36.08
 deg7180000001300.1-160          160     38.12
 ..

minimus2

Input:

                 #elem   min     max     mean    median  n50     sum
 AMOS            46      736     468060  112792  49000   355370  5188425
 CA              349     266     468299  16255   1146    181331  5672848
 AMOS+CA         395     266     468299  27497   1210    215336  10861273
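The AMOS+CA row can be derived from the AMOS and CA rows for the additive columns. Counts and sums add, min and max combine, and the mean is the pooled sum divided by the pooled count. Median and N50 cannot be pooled this way, because they need the individual lengths. A small sketch, with `pool` as a hypothetical helper:

```python
def pool(a, b):
    """Combine the additive summary columns of two length sets."""
    count = a["#elem"] + b["#elem"]
    total = a["sum"] + b["sum"]
    return {
        "#elem": count,
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "mean": total / count,
        "sum": total,
    }


AMOS = {"#elem": 46, "min": 736, "max": 468060, "sum": 5188425}
CA = {"#elem": 349, "min": 266, "max": 468299, "sum": 5672848}
```

`pool(AMOS, CA)` gives 395 elements, min 266, max 468,299, sum 10,861,273 and a mean of about 27,497, matching the AMOS+CA row.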

Output:

                 #elem   min     max     mean    median  n50     sum
 ctg             47      931     468316  110527  37238   355370  5194753
 singl           282     24      32820   1602    1053    1341    451821
 ctg+singl       329     24      468316  17163   1103    296377  5646574

Vollum (DOE)

NCBI data:

         #elem   min     max     mean    median  n50     sum
 ctg     52      311     812727  105547  29178   400992  5488459

CA

         #elem   min     max     mean    median  n50     sum
 ctg     39      1073    1541457 136843  51912   422289  5336858
 deg     18      693     54711   7994    2244    54711   143898
 scf     25      1440    1593252 214001  94958   676360  5350037

!!! Bigger contigs

  • No alignment breaks vs NCBI assembly

CA bog

Same number of scf & ctg; fewer & longer unitigs than CA

A0442 (LANL)

  • no traces
  • NCBI
       #elem   min     max     mean    median  n50     sum
 ctg   46      7229    1040654 116844  74960   223192  5374836

Some contigs align to Ames Ancestor pXO1 & pXO2.

There are some "unique" regions in A0442 not present in Ames Ancestor.