Bacillus anthracis
Latest revision as of 14:59, 23 September 2008
Background
- 89 known strains
- Most virulent: Ames (USA, 2001), Vollum (WWII, biological weapon)
- Benign: Sterne (used as a vaccine)
Virulence factors that distinguish Bacillus anthracis from Bacillus cereus are encoded on two plasmids: pXO1 (anthrax toxin) and pXO2 (capsule genes). The capsule protects against phagocytosis once the vegetative bacterium enters the bloodstream. The anthrax toxin consists of three components: protective antigen (PA), lethal factor (LF), and edema factor (EF). PA/LF and PA/EF complexes are internalized by host cells, where the LF (metalloprotease) and EF (calmodulin-dependent adenylate cyclase) components act. At high levels, LF induces cell death and release of the bacterium, while EF increases host susceptibility to infection and promotes fluid accumulation in the cells. (NCBI)
- Ames, AmesAncestor, and Sterne are 99.9% identical; no rearrangements
- The chromosome and plasmids don't seem to share sequence: only a few alignments, all < 2 kb and < 92% identity
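Genome Projects (listed by NCBI)
NIH Genome Projects: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Bacillus%20anthracis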
  Center      Complete  Assembly  Progress  Total
  TIGR/JCVI   2         8         1         11
  LANL        0         6         0         6
  DOE         1         0         0         1
  NMRC        0         0         2         2
  Total       3         14        3         20
TIGR/JCVI Strains
  Contigs   Traces   Status    Completed   Strain                          Notes
  0         96,532   Progress  .           A0039
  62        67,600   Assembly  2007/07/25  Tsiankovskii-I                  AA; ??? possible update
  1(+2)     101,379  Complete  2004/05/20  Ames Ancestor (Ames 0581)       AA; Insignia; pXO1, pXO2
  42        86,181   Assembly  2004/06/04  A1055                           AA; Insignia
  1(+2)469  83,552   Assembly  2005/05/16  A2012                           Insignia; the 1 contig contains 469 gaps; 65508 of the traces have no qualities; pXO1, pXO2
  1         125,879  Complete  2002/05/16  Ames                            ??? complete but not in AA; Insignia
  49        0        Assembly  2004/06/07  Australia 94                    ??? no traces; Insignia
  30        90,308   Assembly  2004/06/04  CNEVA-9066                      Insignia
  64        92,429   Assembly  2004/06/07  Kruger B                        AA; Insignia
  52        103,144  Assembly  2004/06/04  Vollum                          Insignia
  44        95,078   Assembly  2004/06/07  Western North America USA6153   Insignia
LANL Strains
No traces in TA; none in Insignia
  Contigs  Traces  Status    Completed   Strain
  60       0       Assembly  2008/04/08  A0174
  60       0       Assembly  2008/02/12  A0193
  68       0       Assembly  2008/03/24  A0389
  46       0       Assembly  2008/02/12  A0442
  57       0       Assembly  2008/03/24  A0465
  63       0       Assembly  2008/01/16  A0488
+ 2 plasmid genome projects (pXO1, pXO2), completed in 1999
DOE Strains
  Contigs  Traces   Status    Completed   Strain  Notes
  1        147,665  Complete  2004/06/24  Sterne  ??? complete but not in AA; Insignia; pXO1, pXO2
NMRC Strains
  Contigs  Traces  Status    Strain
  0        0       Progress  34F2(NMRC)
  0        0       Progress  34F2 delta gerH
Complete
  Strain        Status    chromosome  pXO1    pXO2
  A2012         Assembly  .           181677  94829
  Ames          Complete  5227293     .       .
  AmesAncestor  Complete  5227419     181677  94830
  Sterne        Complete  5228663     181654  96231
Genome Projects (not listed by NCBI)
BCM
Data available "by request"
  Strains  reads    cvg  ctgs   N50ctg  AssemblyDate
  31-101   451,308  9.4  3,418  2,592   5-22-2006
  500      363,269  7.6  5,578  1,744   5-22-2006
Strain Assemblies
A2012
- NCBI Genome
               RefId            Len      GC%
  chromosome   NZ_AAAC02000001  5093554  35.36
  pXO1         NC_003980        181677   32     # 100% identical to AmesAncestor pXO1
  pXO2         NC_003981        94829    33     # 99.99% identical to AmesAncestor pXO2; 1 del
 chromosome
       #elem   min     max     mean    median  n50     sum
 ctg   425     94      132589  11885   6855    24366   5051252 # 42,346 N's
Traces:
- 18,045 reads have quality values; 65,507 don't
Libraries:
  Lib            Mean   Stdev  Count
  T13322         2000   600    32133
  T13323         4000   1200   31455
  1047127226559  2000   600    18036
  T10914         3000   900    1719
  T10930         10000  3000   150
  GBZH           4500   .      29
  ...
  Total                        83553
CA
Summary
            #elem    min      max      mean     median   n50      sum
  scf       325      1001     245405   16824    1250     86284    5467834
  ctg       476      844      122753   11450    2025     33522    5450075  # larger N50 than the NCBI assembly
  deg       1760     172      4878     852      818      882      1498894
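The "#elem min max mean median n50 sum" rows used throughout this page can be reproduced from a list of contig (or scaffold, or region) lengths. This is a minimal sketch that assumes the usual N50 definition: the largest length L such that elements of length >= L contain at least half of all bases. The function names are hypothetical. The scripts that produced these tables may handle ties or the median of an even count differently.

```python
from statistics import mean, median


def n50(lengths):
    """Largest length L such that elements of length >= L hold at least half of all bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise ValueError("n50() needs at least one length")


def assembly_stats(lengths):
    """One row of the '#elem min max mean median n50 sum' tables (hypothetical helper)."""
    if not lengths:
        raise ValueError("assembly_stats() needs at least one length")
    return {
        "#elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 2),
        # statistics.median averages the two middle values for an even count
        "median": median(lengths),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }
```

The Ames Ancestor CA row is a quick sanity check of the definition. Its largest contig (2750171 bp) already holds more than half of the 5369744 bp total, so N50 equals max.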
0cvg (no plasmids)
            #elem    min      max      mean     median   n50      sum
  1con      223      1        783      91.28    33       252      20355
  ctg-deg   973      1        10316    239      140      444      232946
0cvg (including plasmids)
            #elem    min      max      mean     median   n50      sum
  ctg-deg   841      1        3183     183      82       402      154114
!!! there are some regions in the CA assembly not present in the reference
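The 0cvg regions are the stretches of a sequence that no aligned read (or contig) covers. The sketch below finds them by sweeping sorted alignment intervals and recording the gaps. It assumes 1-based inclusive coordinates, with alignments given as (start, end) pairs that lie inside the sequence. The function name is hypothetical; this is not the tool used for the tables above.

```python
def zero_coverage_regions(seq_len, alignments):
    """Return the (start, end) stretches of a sequence not covered by any alignment.

    Coordinates are 1-based and inclusive. Each alignment is a (start, end) pair on
    the sequence; reverse-strand hits given as (end, start) are normalized first.
    """
    gaps = []
    next_uncovered = 1  # leftmost position not yet covered by a processed alignment
    for start, end in sorted((min(a, b), max(a, b)) for a, b in alignments):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= seq_len:
        gaps.append((next_uncovered, seq_len))
    return gaps
```

The gap lengths (end - start + 1 in this convention) are the values the 0cvg tables summarize. Note that the "Ctg 0 cvg regions" tables on this page report len as end - start, so the coordinates there may follow a different (for example half-open) convention.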
CA bog
            #elem    min      max      mean     median   n50      sum
  scf       334      930      282449   16611    1227     92676    5547969
  ctg       513      643      105647   10776    1630     36335    5528273
  deg       1898     172      8320     875      827      883      1659854
bog vs unitigger:
           count  avg  max  N50  totalBases
  scaff    +      -    +    +    +
  ctg      +      -    -    +    +
  deg      +      +    +    .    +
  sur      +      -    +    .    +
  utg      -      +    +    .    .
  sing     -      .    .    .    .
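The "bog vs unitigger" grid compares two stats rows column by column. Checked against the A2012 CA and CA bog scf/ctg/deg rows, "+" appears to mean the bog value is larger, "-" smaller, and "." not compared. The sketch below assumes that reading, and it adds "=" for equal values, which is my assumption since the grid never shows a tie. The dictionary keys (count, avg, max, N50, totalBases) map to the #elem, mean, max, n50 and sum columns. The function name is hypothetical.

```python
def compare_rows(bog, unitigger, skip=()):
    """Per-column sign of bog vs unitigger: '+' larger, '-' smaller, '=' equal, '.' skipped."""
    signs = {}
    for column, bog_value in bog.items():
        other = unitigger[column]
        if column in skip:
            signs[column] = "."
        elif bog_value > other:
            signs[column] = "+"
        elif bog_value < other:
            signs[column] = "-"
        else:
            signs[column] = "="
    return signs


# A2012 scaffolds: CA bog vs CA (default unitigger), values from the tables above
bog_scf = {"count": 334, "avg": 16611, "max": 282449, "N50": 92676, "totalBases": 5547969}
utg_scf = {"count": 325, "avg": 16824, "max": 245405, "N50": 86284, "totalBases": 5467834}
```

Running `compare_rows(bog_scf, utg_scf)` on these values gives "+ - + + +", which matches the scaff row.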
AMOScmp-alignmentTrimmed
- Ref: Ames
            #elem    min      max      mean     median   n50      sum
  ctg       123      122      215418   42281    24322    98944    5200569
!!! larger contigs than NCBI/CA assembly
- Ref: A2012 pXO1 => 1 ctg
- Ref: A2012 pXO2 => 1 ctg & 2 snps
Ames
- Complete
NC_003997.3 5227293 35.38 Bacillus anthracis str. Ames, complete genome
- not in AA
AMOScmp-alignmentTrimmed
- No 0 cvg regions when the factory-trimmed reads are aligned to it
- -D LAYERR=90 => 1 piece
ref=5227293 bp
assembly=5227311 bp
amosvalidate => 1555 snps
nucmer align of assembly to ref & filter -q => 287 snps
- Many stretched & misoriented mates
- Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/Ames/Assembly/2008_0820_AMOScmp-alignmentTrimmed
CA
Output:
            #elem    min      max      mean     median   n50      sum      snps
  scf       59       1021     893362   83396    1970     593436   4920387
  ctg       67       1021     736364   73430    2483     280568   4919826  496
  deg       245      70       141988   2016     671      34515    493924
- There are many stretched mates; no compressed ones !!!
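A mate pair is "stretched" when its observed distance is well above the library's expected insert size, and "compressed" when it is well below. The sketch below classifies mate distances against a library mean and stdev, such as T13322 (mean 2000, stdev 600) in the A2012 libraries table. The 3-stdev cutoff and the function names are assumptions for illustration, not the thresholds CA or amosvalidate actually use. Misoriented mates need strand information and are not handled here.

```python
from collections import Counter


def classify_mate(distance, lib_mean, lib_stdev, n_sd=3.0):
    """Label a mate pair by its observed distance relative to the library's size range."""
    if distance > lib_mean + n_sd * lib_stdev:
        return "stretched"
    if distance < lib_mean - n_sd * lib_stdev:
        return "compressed"
    return "ok"


def summarize_mates(distances, lib_mean, lib_stdev, n_sd=3.0):
    """Count stretched / compressed / ok mates for one library."""
    return Counter(classify_mate(d, lib_mean, lib_stdev, n_sd) for d in distances)
```

With mean 2000 and stdev 600, the assumed acceptable range is 200 to 3800 bp.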
0cvg
            #elem    min      max      mean     median   n50      sum
  1con      10       9        804      147      80       804      1467
  ctg-deg   224      1        2197     604      633      715      135220
Ctg 0 cvg regions:
  #id               len     gc%    start  end    len   cvg
  ctg7180000001099  2637    35.04  439    2637   2198  0    # 33.97% gc; blastn: 96% identity to Bacillus cereus; blastx: alpha/beta hydrolase & protein disulfide isomerase hits
  ctg7180000001288  28653   36.17  19125  19233  108   0    # 52.29% gc; blastn: 100% identity to Ames Ancestor & Sterne
  ctg7180000001288  28653   36.17  28636  28653  17    0    # aligned to other Bacillus anthracis strains
  ctg7180000001300  156978  35.35  1      161    160   0    # 46.58% gc; cloning vector
  ctg7180000001302  70072   34.28  69464  70072  608   0    # 55.83% gc; cloning vector
Deg 0 cvg regions:
All have high GC% (>48.24%); probably cloning vector
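The cloning-vector calls above rely on GC content: the B. anthracis genome is about 35.4% GC, while these 0 cvg regions are above 48%. The sketch below computes GC% and flags regions over a cutoff. The 48% threshold is a hypothetical value loosely based on the >48.24% observation, and the function names are illustrative. High GC alone is only a hint, so the page's blastn hits are what actually confirm vector.

```python
def gc_percent(seq):
    """GC content in percent over unambiguous bases (N and other IUPAC codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_possible_vector(regions, threshold=48.0):
    """Names of regions whose GC% exceeds threshold (far above the ~35% genome average)."""
    return [name for name, seq in regions.items() if gc_percent(seq) > threshold]
```

Leaving ambiguous bases out of the denominator keeps N runs (such as the 42,346 N's in the NCBI A2012 contigs) from diluting the percentage.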
Ref breaks:
  NC_003997.3   145564
  NC_003997.3   627742
  NC_003997.3   1151234
  NC_003997.3   2085561
  NC_003997.3   3515380
CA bog
            #elem    min      max      mean     median   n50      sum
  scf       94       1006     1158003  55534    14912    636683   5220223
  ctg       103      1006     594903   50630    15054    280568   5214932
  deg       501      70       15945    1044     609      1432     522839
0cvg
            #elem    min      max      mean     median   n50      sum
  1con      7        9        120      72.14    75       112      505
  ctg-deg   236      3        2197     607      632      713      143262
Ref breaks:
  NC_003997.3   145564
  NC_003997.3   627742
  NC_003997.3   1151234
  NC_003997.3   2085561
  NC_003997.3   3515380
bog vs unitigger:
           count  avg  max  N50  totalBases
  scaff    +      -    +    -    +
  ctg      +      -    -    +    +
  deg      +      -    -    .    +
  sur      +      -    -    .    +
  utg      +      -    +    .    .
  sing     -      .    .    .    .
Ames Ancestor
- Complete & in AA
  NC_007530.2   5227419  35.38  Bacillus anthracis str. 'Ames Ancestor', complete genome
  NC_007322.2   181677   32.53  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence
  NC_007323.3   94830    33.04  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence
  total         5503926
- Downloaded from AA and converted to bank
- Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/AmesAncestor/CS--AI-293
CA
            #elem    min      max      mean     median   n50      sum
  ctg       13       1316     2750171  413057   187947   2750171  5369744
  deg       11       271      62530    13876    1627     62530    152635
  ctg+deg   24       271      2750171  230099   37452    2750171  5522379
  scf       11       1316     2750171  488185   212112   2750171  5370036
Ref: no alignment breaks, no 0cvg regions
Sterne
- Complete but not in AA.
               RefId         Len      GC%
  chromosome   NC_005945.1   5228663  35.38
  pXO1         NC_001496.1   181654   32
  pXO2         NC_002146.1   96231    33
!!! the plasmids are not listed with the genome project
- Traces available in TA. Some reads appear to be missing: aligning the reads to the finished genome gives many 0 cvg regions
            #elem    min      max      mean     median   n50      sum
  0cvg      53       8        4946     529      272      1508     28017
AMOScmp
- version: June 12 2007
- untrimmed reads => 358 ctg
- 53 zero cvg regions, max is almost 5K
AMOScmp-alignmentTrimmed
- reads are trimmed according to alignment coords
            #elem    min      max      mean     median   n50      sum
  ctg       46       736      468060   112792   49000    355370   5188425
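"Trimmed according to alignment coords" means that each read keeps only the part that aligned to the reference. Compare the untrimmed run above (358 ctg, 53 zero cvg regions) with this one (46 ctg). The helper below is a hypothetical illustration of that idea, not AMOScmp's implementation. It assumes 1-based inclusive alignment coordinates on the read, possibly reversed for reverse-strand hits.

```python
def trim_to_alignment(read_seq, aln_start, aln_end):
    """Keep only the part of a read covered by its alignment to the reference.

    aln_start/aln_end are 1-based, inclusive positions on the read; they may be
    given in either order (e.g. for reverse-strand alignments).
    """
    lo, hi = sorted((aln_start, aln_end))
    if lo < 1 or hi > len(read_seq):
        raise ValueError("alignment coordinates fall outside the read")
    return read_seq[lo - 1:hi]
```

For example, `trim_to_alignment("NNACGTACGTNN", 3, 10)` keeps the eight aligned bases and drops the unaligned ends.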
CA
- runCA-OBT.pl script
- version 5.1
- Output:
            #elem    min      max      mean     median   n50      sum
  ctg       204      1000     468299   26560    1310     189443   5418252
  deg       145      266      32820    1756     804      21949    254596
  ctg+deg   349      266      468299   16255    1146     181331   5672848
  scf       186      1000     671877   29170    1274     294809   5425585
  singleton 2418
Ctg 0 cvg regions:
All have high GC% (>52.08%); probably cloning vector
Deg 0 cvg regions: actually they align to pXO1
  #id                          len    gc%
  deg7180000001258.1-297       297    31.65
  deg7180000001300.420-6468    6049   32.05
  deg7180000001254.1-28442     28442  33.01
  deg7180000001258.557-1060    504    34.52
  deg7180000001253.1-576       576    35.76
  deg7180000001300.6728-6824   97     36.08
  deg7180000001300.1-160       160    38.12
  ..
minimus2
Input:
            #elem    min      max      mean     median   n50      sum
  AMOS      46       736      468060   112792   49000    355370   5188425
  CA        349      266      468299   16255    1146     181331   5672848
  AMOS+CA   395      266      468299   27497    1210     215336   10861273
Output:
            #elem    min      max      mean     median   n50      sum
  ctg       47       931      468316   110527   37238    355370   5194753
  singl     282      24       32820    1602     1053     1341     451821
  ctg+singl 329      24       468316   17163    1103     296377   5646574
Vollum (DOE)
NCBI data:
            #elem    min      max      mean     median   n50      sum
  ctg       52       311      812727   105547   29178    400992   5488459
CA
            #elem    min      max      mean     median   n50      sum
  ctg       39       1073     1541457  136843   51912    422289   5336858
  deg       18       693      54711    7994     2244     54711    143898
  scf       25       1440     1593252  214001   94958    676360   5350037
!!! Bigger contigs
- No alignment breaks vs NCBI assembly
CA bog
Same number of scf & ctg; fewer & longer unitigs than CA
A0442 (LANL)
- no traces
- NCBI
            #elem    min      max      mean     median   n50      sum
  ctg       46       7229     1040654  116844   74960    223192   5374836
Some contigs align to Ames Ancestor pXO1 & pXO2.
There are some "unique" regions in A0442 not present in Ames Ancestor.