Bacillus anthracis
Latest revision as of 14:59, 23 September 2008
Background
- 89 known strains
- Most virulent: Ames (USA 2001), Vollum (WWII, biological weapon)
- Benign: Sterne (used as vaccine)
Virulence factors that distinguish Bacillus anthracis from Bacillus cereus are encoded on two plasmids, pXO1 (anthrax toxin) and pXO2 (capsule genes). The capsule protects against phagocytosis once the vegetative bacterium enters the bloodstream. The anthrax toxin consists of 3 components, a protective antigen (PA), lethal factor (LF), and edema factor (EF). PA/LF and PA/EF complexes are internalized by host cells where the LF (metalloprotease) and EF (calmodulin-dependent adenylate cyclase) components act. At high levels LF induces cell death and release of the bacterium while EF increases host susceptibility to infection and promotes fluid accumulation in the cells. (NCBI)
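- Ames, AmesAncestor, Sterne are 99.9% identical; no rearrangements
- The chromosome and plasmids don't seem to share sequence; only a few alignments, each < 2 Kb and at < 92% identity
- TIGR Publication: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=PubMed&Cmd=Retrieve&list_uids=12004073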
Genome Projects (listed by NCBI)
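NIH Genome Projects: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Bacillus%20anthracis
Center     Complete  Assembly  Progress  Total
TIGR/JCVI  2         8         1         11
LANL       0         6         0         6
DOE        1         0         0         1
NMRC       0         0         2         2
Total      3         14        3         20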
TIGR/JCVI Strains
Contigs   Traces   Status    Completed   Strain                         Notes
0         96,532   Progress  .           A0039
62        67,600   Assembly  2007/07/25  Tsiankovskii-I                 AA ; ??? possible update
1(+2)     101,379  Complete  2004/05/20  Ames Ancestor (Ames 0581)      AA ; Insignia; pXO1, pXO2
42        86,181   Assembly  2004/06/04  A1055                          AA ; Insignia
1(+2)469  83,552   Assembly  2005/05/16  A2012                          Insignia; The 1 contig contains 469 gaps; 65508 of the traces have no qualities; pXO1, pXO2
1         125,879  Complete  2002/05/16  Ames                           ??? complete but not in AA ; Insignia
49        0        Assembly  2004/06/07  Australia 94                   ??? no TRACES ; Insignia
30        90,308   Assembly  2004/06/04  CNEVA-9066                     Insignia
64        92,429   Assembly  2004/06/07  Kruger B                       AA ; Insignia
52        103,144  Assembly  2004/06/04  Vollum                         Insignia
44        95,078   Assembly  2004/06/07  Western North America USA6153  Insignia
LANL Strains
No traces in TA; none in Insignia
Contigs  Traces  Status    Completed   Strain
60       0       Assembly  2008/04/08  A0174
60       0       Assembly  2008/02/12  A0193
68       0       Assembly  2008/03/24  A0389
46       0       Assembly  2008/02/12  A0442
57       0       Assembly  2008/03/24  A0465
63       0       Assembly  2008/01/16  A0488
+ 2 plasmid genome projects (pXO1, pXO2) completed in 1999
DOE Strains
Contigs  Traces   Status    Completed   Strain  Notes
1        147,665  Complete  2004/06/24  Sterne  ??? complete but not in AA; Insignia; pXO1, pXO2
Naval Medical Research Center
Contigs  Traces  Status    Strain
0        0       Progress  34F2(NMRC)
0        0       Progress  34F2 delta gerH
Complete
Strain        Status    chromosome  pXO1    pXO2
A2012         Assembly  .           181677  94829
Ames          Complete  5227293     .       .
AmesAncestor  Complete  5227419     181677  94830
Sterne        Complete  5228663     181654  96231
Genome Projects (not listed by NCBI)
BCM
Data available "by request"
Strains  reads    cvg  ctgs   N50ctg  AssemblyDate
31-101   451,308  9.4  3,418  2,592   5-22-2006
500      363,269  7.6  5,578  1,744   5-22-2006
Strain Assemblies
A2012
- NCBI Genome
            RefId            Len      GC%
chromosome  NZ_AAAC02000001  5093554  35.36
pXO1        NC_003980        181677   32     # 100% identical to AmesAncestor pXO1
pXO2        NC_003981        94829    33     # 99.99% identical to AmesAncestor pXO2; 1 del
chromosome
     #elem  min  max     mean   median  n50    sum
ctg  425    94   132589  11885  6855    24366  5051252  # 42,346 N's
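The summary rows used throughout these notes (#elem, min, max, mean, median, n50, sum) can be recomputed from a plain list of sequence lengths. The following is a minimal Python sketch, assuming the usual N50 definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the summed length. The function name `length_stats` is illustrative and is not taken from any of the tools mentioned here.

```python
from statistics import median


def length_stats(lengths):
    """Return #elem, min, max, mean, median, n50 and sum for a list of lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    # N50 (assumed definition): walk from the longest sequence down and stop
    # once the running total reaches at least half of the summed length.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "elem": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "median": median(lengths),
        "n50": n50,
        "sum": total,
    }


if __name__ == "__main__":
    # Toy lengths, not data from any assembly on this page.
    print(length_stats([2, 3, 4, 5, 6]))
```

The tables on this page mostly show mean and median as whole numbers, so the values returned here may need rounding before they are compared.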
Traces:
- 18,045 reads have qual. & 65,507 don't
Libraries:
Lib            Mean   Stdev  Count
T13322         2000   600    32133
T13323         4000   1200   31455
1047127226559  2000   600    18036
T10914         3000   900    1719
T10930         10000  3000   150
GBZH           4500   .      29
...
Total                        83553
CA
Summary
     #elem  min   max     mean   median  n50    sum
scf  325    1001  245405  16824  1250    86284  5467834
ctg  476    844   122753  11450  2025    33522  5450075  # larger N50 than the NCBI assembly
deg  1760   172   4878    852    818     882    1498894
0cvg(no plasmids)
         #elem  min  max    mean   median  n50  sum
1con     223    1    783    91.28  33      252  20355
ctg-deg  973    1    10316  239    140     444  232946
0cvg(including plasmids)
         #elem  min  max   mean  median  n50  sum
ctg-deg  841    1    3183  183   82      402  154114
!!! there are some regions in the CA assembly not present in the reference
CA bog
     #elem  min  max     mean   median  n50    sum
scf  334    930  282449  16611  1227    92676  5547969
ctg  513    643  105647  10776  1630    36335  5528273
deg  1898   172  8320    875    827     883    1659854
bog vs unitigger:
       count  avg  max  N50  totalBases
scaff  +      -    +    +    +
ctg    +      -    -    +    +
deg    +      +    +    .    +
sur    +      -    +    .    +
utg    -      +    +    .    .
sing   -      .    .    .    .
AMOScmp-alignmentTrimmed
- Ref: Ames
     #elem  min  max     mean   median  n50    sum
ctg  123    122  215418  42281  24322   98944  5200569
!!! larger contigs than NCBI/CA assembly
- Ref : A2012 pXO1 => 1ctg
- Ref : A2012 pXO2 => 1ctg & 2 snps
Ames
- Complete
NC_003997.3 5227293 35.38 Bacillus anthracis str. Ames, complete genome
- not in AA
AMOScmp-alignmentTrimmed
- no 0 cvg regions when the factory-trimmed reads are aligned to it
- -D LAYERR=90 => 1 piece
ref=5227293 bp
assembly=5227311
amosvalidate=>1555 snps
nucmer align of assembly to ref & filter -q => 287 snps
- Many stretched & misoriented mates
- Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/Ames/Assembly/2008_0820_AMOScmp-alignmentTrimmed
CA
Output:
     #elem  min   max     mean   median  n50     sum      snps
scf  59     1021  893362  83396  1970    593436  4920387
ctg  67     1021  736364  73430  2483    280568  4919826  496
deg  245    70    141988  2016   671     34515   493924
- There are many stretched mates; no compressed ones !!!
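One way to label a mate pair as stretched or compressed is to compare its observed insert size with the library mean and standard deviation (for example 2000 and 600 for library T13322 listed under A2012). The sketch below is only an illustration: the cutoff of 3 standard deviations and the function name `classify_mate` are assumptions, and the notes do not say which threshold the validation tools applied.

```python
def classify_mate(observed, lib_mean, lib_stdev, k=3.0):
    """Label a mate pair by its observed insert size.

    k is the number of standard deviations tolerated on either side of the
    library mean; k=3 is an assumed default, not a value taken from the notes.
    """
    if observed > lib_mean + k * lib_stdev:
        return "stretched"
    if observed < lib_mean - k * lib_stdev:
        return "compressed"
    return "ok"


if __name__ == "__main__":
    # Insert sizes below are made-up examples for a 2000 +/- 600 library.
    for size in (100, 2500, 5000):
        print(size, classify_mate(size, 2000, 600))
```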
0cvg
         #elem  min  max   mean  median  n50  sum
1con     10     9    804   147   80      804  1467
ctg-deg  224    1    2197  604   633     715  135220
Ctg 0 cvg regions:
#id               len     gc%    start  end    len   cvg  notes
ctg7180000001099  2637    35.04  439    2637   2198  0    33.97% gc ; blastn: 96% identity to Bacillus cereus ; blastx: alpha/beta hydrolase & protein disulfide isomerase hits
ctg7180000001288  28653   36.17  19125  19233  108   0    52.29% gc ; blastn: 100% identity to Ames Ancestor & Sterne
ctg7180000001288  28653   36.17  28636  28653  17    0    aligned to other Bacillus anthracis strains
ctg7180000001300  156978  35.35  1      161    160   0    46.58% gc ; cloning vector
ctg7180000001302  70072   34.28  69464  70072  608   0    55.83% gc ; cloning vector
Deg 0 cvg regions:
all have high GC% (>48.24%); probably cloning vector
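The GC% values used to flag these regions (well above the 35.38% reported for the Ames chromosome) can be computed directly from the region sequence. This is a minimal Python sketch; the function name `gc_percent` is illustrative, and skipping N and other ambiguity codes in the denominator is an assumption about how the percentages were derived.

```python
def gc_percent(seq):
    """Return GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return 0.0  # empty or all-N sequence
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence, not taken from any contig on this page.
    print(round(gc_percent("GGCCAATTNN"), 2))
```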
Ref breaks:
NC_003997.3  145564
NC_003997.3  627742
NC_003997.3  1151234
NC_003997.3  2085561
NC_003997.3  3515380
CA bog
     #elem  min   max      mean   median  n50     sum
scf  94     1006  1158003  55534  14912   636683  5220223
ctg  103    1006  594903   50630  15054   280568  5214932
deg  501    70    15945    1044   609     1432    522839
0cvg
         #elem  min  max   mean   median  n50  sum
1con     7      9    120   72.14  75      112  505
ctg-deg  236    3    2197  607    632     713  143262
Ref breaks:
NC_003997.3  145564
NC_003997.3  627742
NC_003997.3  1151234
NC_003997.3  2085561
NC_003997.3  3515380
bog vs unitigger:
       count  avg  max  N50  totalBases
scaff  +      -    +    -    +
ctg    +      -    -    +    +
deg    +      -    -    .    +
sur    +      -    -    .    +
utg    +      -    +    .    .
sing   -      .    .    .    .
Ames Ancestor
- Complete & in AA
NC_007530.2  5227419  35.38  Bacillus anthracis str. 'Ames Ancestor', complete genome
NC_007322.2  181677   32.53  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence
NC_007323.3  94830    33.04  Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence
total        5503926
- Downloaded from AA and converted to bank
- Location: /fs/szasmg3/dpuiu/Bacilus_anthracis/AmesAncestor/CS--AI-293
CA
         #elem  min   max      mean    median  n50      sum
ctg      13     1316  2750171  413057  187947  2750171  5369744
deg      11     271   62530    13876   1627    62530    152635
ctg+deg  24     271   2750171  230099  37452   2750171  5522379
scf      11     1316  2750171  488185  212112  2750171  5370036
Ref: no alignment breaks, no 0cvg regions
Sterne
- Complete but not in AA.
            RefId        Len      GC%
chromosome  NC_005945.1  5228663  35.38
pXO1        NC_001496.1  181654   32
pXO2        NC_002146.1  96231    33
!!! the plasmids are not listed with the genome project
- Traces available in TA. Looks like some reads are missing; I'm getting many 0 cvg regions when aligning the reads to the finished genome
      #elem  min  max   mean  median  n50   sum
0cvg  53     8    4946  529   272     1508  28017
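Zero-coverage regions like those counted above can be derived from the reference intervals covered by the read alignments. The sketch below is a minimal Python illustration, assuming 1-based inclusive (start, end) coordinates on a single reference sequence; the function name `zero_coverage_regions` and the input layout are hypothetical and do not reflect how the original analysis was scripted.

```python
def zero_coverage_regions(ref_len, alignments):
    """Return the reference intervals not covered by any alignment.

    alignments: iterable of (start, end) pairs, 1-based and inclusive,
    with start <= end. The result uses the same convention.
    """
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted(alignments):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


if __name__ == "__main__":
    # Toy coordinates on a 100 bp reference, not real alignment data.
    for start, end in zero_coverage_regions(100, [(10, 40), (30, 60), (80, 95)]):
        print(start, end, end - start + 1)
```

The lengths of the returned intervals are the values that feed the min, max and sum columns of a 0cvg summary row.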
AMOScmp
- version: June 12 2007
- untrimmed reads => 358 ctg
- 53 zero cvg regions, max is almost 5K
AMOScmp-alignmentTrimmed
- reads are trimmed according to alignment coords
     #elem  min  max     mean    median  n50     sum
ctg  46     736  468060  112792  49000   355370  5188425
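Trimming a read to its alignment coordinates amounts to keeping only the part of the read that aligned to the reference. The following Python sketch shows the idea under stated assumptions: coordinates are 1-based and inclusive on the read, and a reverse-strand hit may be reported with start greater than end. The function name `trim_to_alignment` is hypothetical and is not part of AMOScmp.

```python
def trim_to_alignment(read_seq, aln_start, aln_end):
    """Keep only the aligned part of a read.

    aln_start and aln_end are 1-based inclusive read coordinates; they are
    reordered if given in reverse (assumed convention for reverse-strand hits).
    """
    lo, hi = sorted((aln_start, aln_end))
    if lo < 1 or hi > len(read_seq):
        raise ValueError("alignment coordinates fall outside the read")
    return read_seq[lo - 1:hi]


if __name__ == "__main__":
    # Toy read with two unaligned bases at each end.
    print(trim_to_alignment("NNACGTACGTNN", 3, 10))
```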
CA
- runCA-OBT.pl script
- version 5.1
- Output:
           #elem  min   max     mean   median  n50     sum
ctg        204    1000  468299  26560  1310    189443  5418252
deg        145    266   32820   1756   804     21949   254596
ctg+deg    349    266   468299  16255  1146    181331  5672848
scf        186    1000  671877  29170  1274    294809  5425585
singleton  2418
Ctg 0 cvg regions:
all have high GC% (>52.08%); probably cloning vector
Deg 0 cvg regions: actually they align to pXO1
#id                         len    gc%
deg7180000001258.1-297      297    31.65
deg7180000001300.420-6468   6049   32.05
deg7180000001254.1-28442    28442  33.01
deg7180000001258.557-1060   504    34.52
deg7180000001253.1-576      576    35.76
deg7180000001300.6728-6824  97     36.08
deg7180000001300.1-160      160    38.12
..
minimus2
Input:
         #elem  min  max     mean    median  n50     sum
AMOS     46     736  468060  112792  49000   355370  5188425
CA       349    266  468299  16255   1146    181331  5672848
AMOS+CA  395    266  468299  27497   1210    215336  10861273
Output:
           #elem  min  max     mean    median  n50     sum
ctg        47     931  468316  110527  37238   355370  5194753
singl      282    24   32820   1602    1053    1341    451821
ctg+singl  329    24   468316  17163   1103    296377  5646574
Vollum (DOE)
NCBI data:
     #elem  min  max     mean    median  n50     sum
ctg  52     311  812727  105547  29178   400992  5488459
CA
     #elem  min   max      mean    median  n50     sum
ctg  39     1073  1541457  136843  51912   422289  5336858
deg  18     693   54711    7994    2244    54711   143898
scf  25     1440  1593252  214001  94958   676360  5350037
!!! Bigger contigs
- No alignment breaks vs NCBI assembly
CA bog
Same number of scf & ctg; fewer & longer unitigs than CA
A0442 (LANL)
- no traces
- NCBI
     #elem  min   max      mean    median  n50     sum
ctg  46     7229  1040654  116844  74960   223192  5374836
Some contigs align to Ames Ancestor pXO1 & pXO2
There are some "unique" regions in A0442 not present in Ames Ancestor