Bacillus anthracis: Difference between revisions

Dpuiu (talk | contribs)
(91 intermediate revisions by the same user not shown)
Line 1: Line 1:
= Genome Projects =  
= Background =
 
* 89 known strains
* Most virulent: Ames(USA 2001), Vollum (WWII, biological weapon)
* Benign: Sterne (used as vaccine)
 
''Virulence factors that distinguish Bacillus anthracis from Bacillus cereus are encoded on two plasmids, pXO1 (anthrax toxin) and pXO2 (capsule genes). The capsule protects against phagocytosis once the vegetative bacterium enters the bloodstream. The anthrax toxin consists of 3 components, a protective antigen (PA), lethal factor (LF), and edema factor (EF). PA/LF and PA/EF complexes are internalized by host cells where the LF (metalloprotease) and EF (calmodulin-dependent adenylate cyclase) components act. At high levels LF induces cell death and release of the bacterium while EF increases host susceptibility to infection and promotes fluid accumulation in the cells. (NCBI)
''
 
* Ames, Ames Ancestor, and Sterne are 99.9% identical, with no rearrangements
* The chromosome and plasmids do not appear to share sequence: only a few alignments, each shorter than 2 kb and below 92% identity
 
 
* [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=PubMed&Cmd=Retrieve&list_uids=12004073  TIGR Publication]
 
= Genome Projects (listed by NCBI) =  


[http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Bacillus%20anthracis NIH Genome Projects]
[http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Bacillus%20anthracis NIH Genome Projects]
Line 12: Line 27:
== TIGR/JCVI Strains ==
== TIGR/JCVI Strains ==


   Contigs  Traces  Status    Date        Strain  
   Contigs  Traces  Status    Completed  Strain  
   0        96,532  Progress               A0039   
   0        96,532  Progress .          A0039   
   62      67,600  Assembly  16-JUN-2008 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5765 Tsiankovskii-I] AA
   62      67,600  Assembly  2007/07/25 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5765 Tsiankovskii-I] AA ; ??? possible update
   1(+2)    101,379  Complete  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=403  Ames Ancestor (Ames 0581)]  AA ; Insignia
   1(+2)    101,379  Complete 2004/05/20 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=403  Ames Ancestor (Ames 0581)]  AA ; Insignia; pXO1, pXO2
   42      86,181  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5239 A1055] AA ; Insignia
   42      86,181  Assembly 2004/06/04 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5239 A1055] AA ; Insignia
   1(+2)   83,552  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5278 A2012]  
   1(+2)469 83,552  Assembly 2005/05/16 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5278 A2012] Insignia; The  1 contig contains 469 gaps; 65508 of the traces have no qualities; pXO1, pXO2
   1        125,879  Complete  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=299 Ames]  ??? complete but not in AA ; Insignia
   1        125,879  Complete 2002/05/16 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=299 Ames]  ??? complete but not in AA ; Insignia
   49      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5241 Australia 94]  ??? no TRACES ; Insignia
   49      0        Assembly 2004/06/07 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5241 Australia 94]  ??? no TRACES ; Insignia
   30      90,308  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5240 CNEVA-9066] Insignia
   30      90,308  Assembly 2004/06/04 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5240 CNEVA-9066] Insignia
   64      92,429  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5127 Kruger B] AA ; Insignia
   64      92,429  Assembly 2004/06/07 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5127 Kruger B] AA ; Insignia
   52      103,144  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5238 Vollum] Insignia
   52      103,144  Assembly 2004/06/04 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5238 Vollum] Insignia
   44      95,078  Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5165 Western North America USA6153] Insignia
   44      95,078  Assembly 2004/06/07 [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5165 Western North America USA6153] Insignia


== LANL Strains ==
== LANL Strains ==


   Contigs  Traces  Status    Strain       
No traces in TA; none in Insignia
   60      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5943 A0174]
 
   60      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=27923 A0193]
   Contigs  Traces  Status   Completed   Strain       
   68      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=27917 A0389]
   60      0        Assembly  2008/04/08  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5943 A0174]
   46      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5910 A0442]
   60      0        Assembly  2008/02/12  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5911 A0193]                            
   57      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5932 A0465]
   68      0        Assembly  2008/03/24  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5935 A0389]
   63      0        Assembly  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5890 A0488]
   46      0        Assembly  2008/02/12  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5910 A0442]
   57      0        Assembly  2008/03/24  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5932 A0465]
   63      0        Assembly  2008/01/16  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=5890 A0488]
 
+ 2 plasmid genome projects (pXO1, pXO2) completed in 1999


== DOE Strains ==
== DOE Strains ==


   Contigs  Traces  Status    Strain
   Contigs  Traces  Status   Completed   Strain
   1        147,665  Complete  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=405 Sterne] ??? complete but not in AA; Insignia
   1        147,665  Complete  2004/06/24  [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=Retrieve&dopt=Overview&list_uids=405 Sterne] ??? complete but not in AA; Insignia; pXO1, pXO2


== Naval Medical Research Center ==
== Naval Medical Research Center ==
Line 45: Line 64:
   0        0        Progress  34F2(NMRC)
   0        0        Progress  34F2(NMRC)
   0        0        Progress  34F2 delta gerH
   0        0        Progress  34F2 delta gerH
== Complete ==
  Strain        Status      chromosome      pXO1      pXO2
  A2012        Assembly    .              181677    94829
  Ames          Complete    5227293        .          .
  AmesAncestor  Complete    5227419        181677    94830
  Sterne        Complete    5228663        181654    96231
= Genome Projects (not listed by NCBI) =
== BCM ==
Data available "by request"
  Strains  reads    cvg  ctgs  N50ctg    AssemblyDate
  31-101  451,308  9.4  3,418  2,592    5-22-2006
  500      363,269  7.6  5,578  1,744    5-22-2006
= Strain Assemblies =
== A2012 ==
* NCBI Genome 
              RefId          Len    GC%
  chromosome  NZ_AAAC02000001 5093554 35.36
  pXO1        NC_003980      181677  32            # 100%  identical to AmesAncestor pXO1
  pXO2        NC_003981      94829  33            # 99.99% identical to AmesAncestor pXO2; 1 del
  chromosome
        #elem  min    max    mean    median  n50    sum
  ctg  425    94      132589  11885  6855    24366  5051252 # 42,346 N's
Traces:
* 18,045 reads have qual. & 65,507 don't
  Libraries:
  Lib            Mean    Stdev          Count
  T13322          2000    600            32133
  T13323          4000    1200            31455
  1047127226559  2000    600            18036
  T10914          3000    900            1719
  T10930          10000  3000            150
  GBZH            4500    .              29
  ...
  Total                                  83553
=== CA  ===
Summary
          #elem  min    max    mean    median  n50    sum
  scf    325    1001    245405  16824  1250    86284  5467834
  ctg    476    844    122753  11450  2025    33522  5450075  # larger N50 than the NCBI assembly
  deg    1760    172    4878    852    818    882    1498894
0cvg(no plasmids)
          #elem  min    max    mean    median  n50    sum
  1con    223    1      783    91.28  33      252    20355
  ctg-deg 973    1      10316  239    140    444    232946
0cvg(including plasmids)
          #elem  min    max    mean    median  n50    sum
  ctg-deg 841    1      3183    183    82      402    154114
!!! there are some regions in the CA assembly not present in the reference
=== CA bog ===
          #elem  min    max    mean    median  n50    sum
  scf    334    930    282449  16611  1227    92676  5547969
  ctg    513    643    105647  10776  1630    36335  5528273
  deg    1898    172    8320    875    827    883    1659854
bog vs unitigger:
        count avg  max  N50  totalBases
scaff    +    -    +    +    +
ctg      +    -    -    +    +
deg      +    +    +    .    +
sur      +    -    +    .    +
utg      -    +    +    .    .
sing    -    .    .    .    .
=== AMOScmp-alignmentTrimmed ===
* Ref: Ames
        #elem  min    max    mean    median  n50    sum
  ctg  123    122    215418  42281  24322  98944  5200569
!!! larger contigs than NCBI/CA assembly
* Ref : A2012 pXO1 => 1ctg
* Ref : A2012 pXO2 => 1ctg & 2 snps
== Ames ==
* Complete
  NC_003997.3    5227293 35.38  Bacillus anthracis str. Ames, complete genome
* not in AA
=== AMOScmp-alignmentTrimmed ===
* no 0 cvg regions when factory trimmed reads aligned to it
* -D LAYERR=90 => 1 piece
  ref=5227293 bp
  assembly=5227311
  amosvalidate=>1555 snps
  nucmer align of assembly to ref & filter -q => 287 snps
* Many stretched & misoriented mates
* Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/Ames/Assembly/2008_0820_AMOScmp-alignmentTrimmed
=== CA  ===
Output:
          #elem  min    max    mean    median  n50    sum      snps
  scf    59      1021    893362  83396  1970    593436  4920387
  ctg    67      1021    736364  73430  2483    280568  4919826  496
  deg    245    70      141988  2016    671    34515  493924
* There are many stretched mates; no compressed ones !!!
0cvg
          #elem  min    max    mean    median  n50    sum
  1con    10      9      804    147    80      804    1467
  ctg-deg 224    1      2197    604    633    715    135220
Ctg 0 cvg regions:
  #id                  len    gc%    start  end    len    cvg
  ctg7180000001099      2637    35.04  439    2637    2198    0        33.97% gc ; blastn: 96% identity to Bacillus cereus ; blastx: alpha/beta hydrolase & protein disulfide isomerase hits
  ctg7180000001288      28653  36.17  19125  19233  108    0        52.29% gc ; blastn: 100% identity to Ames Ancestor & Sterne
  ctg7180000001288      28653  36.17  28636  28653  17      0        aligned to other Bacillus anthracis strains
  ctg7180000001300      156978  35.35  1      161    160    0        46.58% gc ; cloning vector
  ctg7180000001302      70072  34.28  69464  70072  608    0        55.83% gc ; cloning vector
Deg 0 cvg regions:
  all have high GC% (>48.24%) probably cloning vector
Ref breaks:
  NC_003997.3  145564
  NC_003997.3  627742
  NC_003997.3  1151234
  NC_003997.3  2085561
  NC_003997.3  3515380
=== CA bog ===
          #elem  min    max    mean    median  n50    sum
  scf    94      1006    1158003 55534  14912  636683  5220223
  ctg    103    1006    594903  50630  15054  280568  5214932
  deg    501    70      15945  1044    609    1432    522839
 
0cvg
          #elem  min    max    mean    median  n50    sum
  1con    7      9      120    72.14  75      112    505
  ctg-deg 236    3      2197    607    632    713    143262
Ref breaks:
  NC_003997.3    145564
  NC_003997.3    627742
  NC_003997.3    1151234
  NC_003997.3    2085561
  NC_003997.3    3515380
bog vs unitigger:
        count avg  max  N50  totalBases
scaff    +    -    +    -    +
ctg      +    -    -    +    +
deg      +    -    -    .    +
sur      +    -    -    .    +
utg      +    -    +    .    .
sing    -    .    .    .    .
== Ames Ancestor ==
* Complete & in AA
  NC_007530.2    5227419 35.38  Bacillus anthracis str. 'Ames Ancestor', complete genome
  NC_007322.2    181677 32.53  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence
  NC_007323.3    94830  33.04  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence
  total          5503926
* Downloaded from AA and converted to bank
* Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/AmesAncestor/CS--AI-293
=== CA ===
          #elem  min    max    mean    median  n50    sum
  ctg    13      1316    2750171 413057  187947  2750171 5369744
  deg    11      271    62530  13876  1627    62530  152635
  ctg+deg 24      271    2750171 230099  37452  2750171 5522379
  scf    11      1316    2750171 488185  212112  2750171 5370036
Ref: no alignment breaks, no 0cvg regions
== Sterne ==
* Complete but not in AA.
            RefId        Len        GC%
  chromosome NC_005945.1  5228663  35.38
  pXO1      NC_001496.1  181654    32
  pXO2      NC_002146.1  96231    33
!!! the plasmids are not listed with the genome project
* Traces available in TA. Looks like some reads are missing; I'm getting many 0 cvg regions when aligning the reads to the finished genome
        #elem  min    max    mean    median  n50    sum
  0cvg  53      8      4946    529    272    1508    28017
=== AMOScmp ===
* version: June 12 2007
* untrimmed reads => 358 ctg
* 53 zero cvg regions, max is almost 5K
=== AMOScmp-alignmentTrimmed ===
* reads are trimmed according to alignment coords
        #elem  min    max    mean    median  n50    sum
  ctg  46      736    468060  112792  49000  355370  5188425
=== CA ===
* runCA-OBT.pl script
* version 5.1
* Output:
  #elem  min    max    mean    median  n50    sum
  ctg    204    1000    468299  26560  1310    189443  5418252
  deg    145    266    32820  1756    804    21949  254596
  ctg+deg 349    266    468299  16255  1146    181331  5672848
  scf    186    1000    671877  29170  1274    294809  5425585
  singleton  2418
Ctg 0 cvg regions:
  all have high GC% (>52.08%) probably cloning vector
Deg 0 cvg regions: actually they align to pXO1
  #id                            len    gc%
  deg7180000001258.1-297          297    31.65
  deg7180000001300.420-6468      6049    32.05
  deg7180000001254.1-28442        28442  33.01
  deg7180000001258.557-1060      504    34.52
  deg7180000001253.1-576          576    35.76
  deg7180000001300.6728-6824      97      36.08
  deg7180000001300.1-160          160    38.12
  ..
=== minimus2 ===
Input:
                  #elem  min    max    mean    median  n50    sum
  AMOS            46      736    468060  112792  49000  355370  5188425
  CA              349    266    468299  16255  1146    181331  5672848
  AMOS+CA        395    266    468299  27497  1210    215336  10861273
Output:
                  #elem  min    max    mean    median  n50    sum
  ctg            47      931    468316  110527  37238  355370  5194753
  singl          282    24      32820  1602    1053    1341    451821
  ctg+singl      329    24      468316  17163  1103    296377  5646574
== Vollum (DOE) ==
NCBI data:
          #elem  min    max    mean    median  n50    sum
  ctg    52      311    812727  105547  29178  400992  5488459
=== CA ===
          #elem  min    max    mean    median  n50    sum
  ctg    39      1073    1541457 136843  51912  422289  5336858
  deg    18      693    54711  7994    2244    54711  143898
  scf    25      1440    1593252 214001  94958  676360  5350037
!!! Bigger contigs
* No alignment breaks vs NCBI assembly
=== CA bog ===
Same number of scf & ctg; fewer & longer unitigs than CA
== A0442 (LANL) ==
* no traces
* NCBI
        #elem  min    max    mean    median  n50    sum
  ctg  46      7229    1040654 116844  74960  223192  5374836
Some contigs align to Ames Ancestor pXO1 & pXO2
There are some "unique" regions in A0442 not present in Ames Ancestor

Latest revision as of 14:59, 23 September 2008

Background

  • 89 known strains
  • Most virulent: Ames(USA 2001), Vollum (WWII, biological weapon)
  • Benign: Sterne (used as vaccine)

Virulence factors that distinguish Bacillus anthracis from Bacillus cereus are encoded on two plasmids, pXO1 (anthrax toxin) and pXO2 (capsule genes). The capsule protects against phagocytosis once the vegetative bacterium enters the bloodstream. The anthrax toxin consists of 3 components, a protective antigen (PA), lethal factor (LF), and edema factor (EF). PA/LF and PA/EF complexes are internalized by host cells where the LF (metalloprotease) and EF (calmodulin-dependent adenylate cyclase) components act. At high levels LF induces cell death and release of the bacterium while EF increases host susceptibility to infection and promotes fluid accumulation in the cells. (NCBI)

  • Ames, Ames Ancestor, and Sterne are 99.9% identical, with no rearrangements
  • The chromosome and plasmids do not appear to share sequence: only a few alignments, each shorter than 2 kb and below 92% identity


Genome Projects (listed by NCBI)

NIH Genome Projects

 Center          Complete     Assembly    Progress  Total
 TIGR/JCVI       2            8           1         11
 LANL            0            6           0         6
 DOE             1            0           0         1 
 NMRC            0            0           2         2
 Total           3            14          3         20

TIGR/JCVI Strains

 Contigs  Traces   Status    Completed   Strain 
 0        96,532   Progress  .           A0039  
 62       67,600   Assembly  2007/07/25  Tsiankovskii-I AA ; ??? possible update
 1(+2)    101,379  Complete  2004/05/20  Ames Ancestor (Ames 0581)  AA ; Insignia; pXO1, pXO2
 42       86,181   Assembly  2004/06/04  A1055 AA ; Insignia
 1(+2)469 83,552   Assembly  2005/05/16  A2012  Insignia; The  1 contig contains 469 gaps; 65508 of the traces have no qualities; pXO1, pXO2
 1        125,879  Complete  2002/05/16  Ames  ??? complete but not in AA ; Insignia
 49       0        Assembly  2004/06/07  Australia 94   ??? no TRACES ; Insignia
 30       90,308   Assembly  2004/06/04  CNEVA-9066 Insignia
 64       92,429   Assembly  2004/06/07  Kruger B AA ; Insignia
 52       103,144  Assembly  2004/06/04  Vollum Insignia
 44       95,078   Assembly  2004/06/07  Western North America USA6153 Insignia

LANL Strains

No traces in TA; none in Insignia

 Contigs  Traces   Status    Completed    Strain       
 60       0        Assembly  2008/04/08   A0174
 60       0        Assembly  2008/02/12   A0193                              
 68       0        Assembly  2008/03/24   A0389
 46       0        Assembly  2008/02/12   A0442
 57       0        Assembly  2008/03/24   A0465
 63       0        Assembly  2008/01/16   A0488

+ 2 plasmid genome projects (pXO1, pXO2) completed in 1999

DOE Strains

 Contigs  Traces   Status    Completed    Strain
 1        147,665  Complete  2004/06/24   Sterne ??? complete but not in AA; Insignia; pXO1, pXO2

Naval Medical Research Center

 Contigs  Traces   Status    Strain
 0        0        Progress  34F2(NMRC)
 0        0        Progress  34F2 delta gerH

Complete

 Strain        Status       chromosome      pXO1       pXO2
 A2012         Assembly     .               181677     94829 
 Ames          Complete     5227293         .          .
 AmesAncestor  Complete     5227419         181677     94830
 Sterne        Complete     5228663         181654     96231

Genome Projects (not listed by NCBI)

BCM

Data available "by request"

 Strains  reads    cvg   ctgs   N50ctg    AssemblyDate
 31-101   451,308  9.4   3,418  2,592     5-22-2006
 500      363,269  7.6   5,578  1,744     5-22-2006

Strain Assemblies

A2012

  • NCBI Genome
              RefId           Len     GC%
 chromosome   NZ_AAAC02000001 5093554 35.36
 pXO1         NC_003980       181677  32             # 100%   identical to AmesAncestor pXO1
 pXO2         NC_003981       94829   33             # 99.99% identical to AmesAncestor pXO2; 1 del
 chromosome
       #elem   min     max     mean    median  n50     sum
 ctg   425     94      132589  11885   6855    24366   5051252 # 42,346 N's
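
The #elem / min / max / mean / median / n50 / sum columns used throughout these notes can be recomputed from a plain list of sequence lengths. The sketch below is only an illustration, not the script that produced the tables: `length_stats` is a hypothetical helper, and it assumes the common N50 definition (the largest length L such that sequences of length >= L hold at least half of the total bases), which may differ in edge cases from the tool used here.

```python
from statistics import median


def length_stats(lengths):
    """Summarize contig/scaffold lengths with the columns used in these tables."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # Walk from the longest sequence down until half of all bases are covered.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "elem": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total / len(ordered),
        "median": median(ordered),
        "n50": n50,
        "sum": total,
    }


# Toy input, not data from this page.
stats = length_stats([2, 3, 4, 5, 6, 10])
print(stats)
```

As a quick consistency check on the row above, mean is sum / #elem: 5051252 / 425 is about 11885.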


Traces:

  • 18,045 reads have quality values; 65,507 do not
 Libraries:
 Lib             Mean    Stdev           Count
 T13322          2000    600             32133
 T13323          4000    1200            31455
 1047127226559   2000    600             18036
 T10914          3000    900             1719
 T10930          10000   3000            150
 GBZH            4500    .               29
 ...
 Total                                   83553

CA

Summary

         #elem   min     max     mean    median  n50     sum
 scf     325     1001    245405  16824   1250    86284   5467834
 ctg     476     844     122753  11450   2025    33522   5450075   # larger N50 than the NCBI assembly
 deg     1760    172     4878    852     818     882     1498894


0cvg(no plasmids)

         #elem   min     max     mean    median  n50     sum
 1con    223     1       783     91.28   33      252     20355
 ctg-deg 973     1       10316   239     140     444     232946

0cvg(including plasmids)

         #elem   min     max     mean    median  n50     sum
 ctg-deg 841     1       3183    183     82      402     154114

!!! there are some regions in the CA assembly not present in the reference

CA bog

         #elem   min     max     mean    median  n50     sum
 scf     334     930     282449  16611   1227    92676   5547969
 ctg     513     643     105647  10776   1630    36335   5528273
 deg     1898    172     8320    875     827     883     1659854

bog vs unitigger:

        count avg  max  N50  totalBases
scaff    +    -    +    +    +
ctg      +    -    -    +    +
deg      +    +    +    .    +
sur      +    -    +    .    +
utg      -    +    +    .    .
sing     -    .    .    .    .
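
One way to read the +/- table: each cell records whether the bog run gave a larger (+) or smaller (-) value than the default unitigger run. The sketch below reproduces the scaff row from the scf lines of the two A2012 summary tables above. `compare_metrics` is a hypothetical helper, and using "." for equal or missing values is an assumption, since the notes do not say what "." means.

```python
def compare_metrics(bog, unitigger):
    """Return '+', '-' or '.' per metric (bog larger, smaller, equal/missing)."""
    signs = {}
    for name, bog_value in bog.items():
        other = unitigger.get(name)
        if other is None or bog_value == other:
            signs[name] = "."
        elif bog_value > other:
            signs[name] = "+"
        else:
            signs[name] = "-"
    return signs


# scf rows of the A2012 "CA" and "CA bog" summary tables.
ca_scf = {"count": 325, "avg": 16824, "max": 245405, "N50": 86284, "totalBases": 5467834}
bog_scf = {"count": 334, "avg": 16611, "max": 282449, "N50": 92676, "totalBases": 5547969}

print(compare_metrics(bog_scf, ca_scf))
```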

AMOScmp-alignmentTrimmed

  • Ref: Ames
       #elem   min     max     mean    median  n50     sum
 ctg   123     122     215418  42281   24322   98944   5200569

!!! larger contigs than NCBI/CA assembly

  • Ref : A2012 pXO1 => 1ctg
  • Ref : A2012 pXO2 => 1ctg & 2 snps

Ames

  • Complete
 NC_003997.3    5227293 35.38  Bacillus anthracis str. Ames, complete genome
  • not in AA

AMOScmp-alignmentTrimmed

  • no 0 cvg regions when factory trimmed reads aligned to it
  • -D LAYERR=90 => 1 piece
 ref=5227293 bp
 assembly=5227311
 amosvalidate=>1555 snps
 nucmer align of assembly to ref & filter -q => 287 snps
  • Many stretched & misoriented mates
  • Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/Ames/Assembly/2008_0820_AMOScmp-alignmentTrimmed

CA

Output:

         #elem   min     max     mean    median  n50     sum       snps
 scf     59      1021    893362  83396   1970    593436  4920387
 ctg     67      1021    736364  73430   2483    280568  4919826   496
 deg     245     70      141988  2016    671     34515   493924
  • There are many stretched mates; no compressed ones !!!
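
A stretched or compressed mate pair is usually judged against its library's insert-size distribution. The sketch below shows one simple way to make that call. It is not the check used by the assembler or by amosvalidate: `classify_mate` is a hypothetical helper, and the cutoff of 3 standard deviations is an assumption. The example library values (mean 2000, stdev 600) are taken from the A2012 library table purely for illustration.

```python
def classify_mate(observed, lib_mean, lib_stdev, k=3.0):
    """Label a mate pair by its observed insert size.

    observed  : distance spanned by the two mates in the assembly (bp)
    lib_mean  : expected insert size of the library (bp)
    lib_stdev : standard deviation of the library insert size (bp)
    k         : allowed deviation in standard deviations (assumed, not from the notes)
    """
    if observed > lib_mean + k * lib_stdev:
        return "stretched"
    if observed < lib_mean - k * lib_stdev:
        return "compressed"
    return "ok"


for size in (100, 2500, 4000):
    print(size, classify_mate(size, lib_mean=2000, lib_stdev=600))
```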

0cvg

         #elem   min     max     mean    median  n50     sum
 1con    10      9       804     147     80      804     1467
 ctg-deg 224     1       2197    604     633     715     135220

Ctg 0 cvg regions:

 #id                   len     gc%     start   end     len     cvg
 ctg7180000001099      2637    35.04   439     2637    2198    0        33.97% gc ; blastn: 96% identity to Bacillus cereus ; blastx: alpha/beta hydrolase & protein disulfide isomerase hits
 ctg7180000001288      28653   36.17   19125   19233   108     0        52.29% gc ; blastn: 100% identity to Ames Ancestor & Sterne
 ctg7180000001288      28653   36.17   28636   28653   17      0        aligned to other Bacillus anthracis strains
 ctg7180000001300      156978  35.35   1       161     160     0        46.58% gc ; cloning vector
 ctg7180000001302      70072   34.28   69464   70072   608     0        55.83% gc ; cloning vector

Deg 0 cvg regions:

 all have high GC% (>48.24%); probably cloning vector
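
The GC% screen behind this observation can be sketched as follows. `gc_percent` and `is_gc_outlier` are hypothetical helpers. The genome-wide value of 35.38% is the Ames GC% listed above, while the 10-point margin is an arbitrary assumption. A high GC% only hints at non-genomic sequence; it does not confirm cloning vector on its own.

```python
def gc_percent(seq):
    """GC content in percent, counting only A/C/G/T (N and other symbols are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / acgt


def is_gc_outlier(seq, genome_gc=35.38, margin=10.0):
    """Flag a region whose GC% exceeds the genome-wide GC% by more than `margin` points."""
    return gc_percent(seq) - genome_gc > margin


# Toy sequences, not data from this page.
print(gc_percent("GGCCAATT"), is_gc_outlier("GGCCAATT"))
print(gc_percent("GCATATATAT"), is_gc_outlier("GCATATATAT"))
```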

Ref breaks:

 NC_003997.3   145564
 NC_003997.3   627742
 NC_003997.3   1151234
 NC_003997.3   2085561
 NC_003997.3   3515380

CA bog

         #elem   min     max     mean    median  n50     sum
 scf     94      1006    1158003 55534   14912   636683  5220223
 ctg     103     1006    594903  50630   15054   280568  5214932
 deg     501     70      15945   1044    609     1432    522839
 

0cvg

         #elem   min     max     mean    median  n50     sum
 1con    7       9       120     72.14   75      112     505
 ctg-deg 236     3       2197    607     632     713     143262

Ref breaks:

 NC_003997.3     145564
 NC_003997.3     627742
 NC_003997.3     1151234
 NC_003997.3     2085561
 NC_003997.3     3515380

bog vs unitigger:

        count avg  max  N50  totalBases
scaff    +    -    +    -    +
ctg      +    -    -    +    +
deg      +    -    -    .    +
sur      +    -    -    .    +
utg      +    -    +    .    .
sing     -    .    .    .    .

Ames Ancestor

  • Complete & in AA
 NC_007530.2    5227419 35.38  Bacillus anthracis str. 'Ames Ancestor', complete genome
 NC_007322.2    181677 32.53  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence
 NC_007323.3    94830  33.04  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence
 total          5503926
  • Downloaded from AA and converted to bank
  • Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/AmesAncestor/CS--AI-293

CA

         #elem   min     max     mean    median  n50     sum
 ctg     13      1316    2750171 413057  187947  2750171 5369744
 deg     11      271     62530   13876   1627    62530   152635
 ctg+deg 24      271     2750171 230099  37452   2750171 5522379
 scf     11      1316    2750171 488185  212112  2750171 5370036

Ref: no alignment breaks, no 0cvg regions

Sterne

  • Complete but not in AA.
            RefId        Len        GC%
 chromosome NC_005945.1  5228663   35.38
 pXO1       NC_001496.1  181654    32
 pXO2       NC_002146.1  96231     33 

!!! the plasmids are not listed with the genome project

  • Traces available in TA. Some reads appear to be missing: aligning the reads to the finished genome leaves many 0 cvg regions
       #elem   min     max     mean    median  n50     sum
 0cvg  53      8       4946    529     272     1508    28017
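
Zero-coverage regions like these can be derived from the reference length and the read alignment coordinates. The sketch below is a minimal illustration and not the pipeline used for these notes. `zero_coverage_regions` is a hypothetical helper, and it assumes 1-based, inclusive (start, end) coordinates on a single reference sequence.

```python
def zero_coverage_regions(ref_len, alignments):
    """Return the reference intervals covered by no alignment.

    ref_len    : length of the reference sequence (bp)
    alignments : iterable of (start, end) pairs, 1-based and inclusive
    """
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted(alignments):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


# Toy alignments, not data from this page.
gaps = zero_coverage_regions(100, [(1, 20), (15, 40), (50, 60), (90, 100)])
gap_lengths = [end - start + 1 for start, end in gaps]
print(gaps, gap_lengths)
```

The resulting gap lengths are the kind of values summarized in the 0cvg rows of these tables.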

AMOScmp

  • version: June 12 2007
  • untrimmed reads => 358 ctg
  • 53 zero cvg regions, max is almost 5K

AMOScmp-alignmentTrimmed

  • reads are trimmed according to alignment coords
       #elem   min     max     mean    median  n50     sum
 ctg   46      736     468060  112792  49000   355370  5188425

CA

  • runCA-OBT.pl script
  • version 5.1
  • Output:
         #elem   min     max     mean    median  n50     sum
 ctg     204     1000    468299  26560   1310    189443  5418252
 deg     145     266     32820   1756    804     21949   254596
 ctg+deg 349     266     468299  16255   1146    181331  5672848
 scf     186     1000    671877  29170   1274    294809  5425585
 singleton  2418

Ctg 0 cvg regions:

 all have high GC% (>52.08%); probably cloning vector

Deg 0 cvg regions: actually they align to pXO1

 #id                             len     gc%
 deg7180000001258.1-297          297     31.65
 deg7180000001300.420-6468       6049    32.05
 deg7180000001254.1-28442        28442   33.01
 deg7180000001258.557-1060       504     34.52
 deg7180000001253.1-576          576     35.76
 deg7180000001300.6728-6824      97      36.08
 deg7180000001300.1-160          160     38.12
 ..

minimus2

Input:

                 #elem   min     max     mean    median  n50     sum
 AMOS            46      736     468060  112792  49000   355370  5188425
 CA              349     266     468299  16255   1146    181331  5672848
 AMOS+CA         395     266     468299  27497   1210    215336  10861273

Output:

                 #elem   min     max     mean    median  n50     sum
 ctg             47      931     468316  110527  37238   355370  5194753
 singl           282     24      32820   1602    1053    1341    451821
 ctg+singl       329     24      468316  17163   1103    296377  5646574

Vollum (DOE)

NCBI data:

         #elem   min     max     mean    median  n50     sum
 ctg     52      311     812727  105547  29178   400992  5488459

CA

         #elem   min     max     mean    median  n50     sum
 ctg     39      1073    1541457 136843  51912   422289  5336858
 deg     18      693     54711   7994    2244    54711   143898
 scf     25      1440    1593252 214001  94958   676360  5350037

!!! Bigger contigs

  • No alignment breaks vs NCBI assembly

CA bog

Same number of scf & ctg; fewer & longer unitigs than CA

A0442 (LANL)

  • no traces
  • NCBI
       #elem   min     max     mean    median  n50     sum
 ctg   46      7229    1040654 116844  74960   223192  5374836

Some contigs align to Ames Ancestor pXO1 & pXO2.

There are some "unique" regions in A0442 not present in Ames Ancestor.