Clostridium botulinum: Difference between revisions
		
		
		
		
| (19 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
| = Hall strain A (ATCC 3502) = | |||
| == Data sources == | == Data sources == | ||
| Line 8: | Line 10: | ||
| * [ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB.dbs Complete Genome] | * [ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB.dbs Complete Genome] | ||
|    chromosome: 3,886,916 bp 28.24 GC% |    chromosome: 3,886,916 bp 28.24 GC% | ||
|    plasmid:  |    plasmid:    16,344 bp 26.80 GC% | ||
|   genes:      3,616 | |||
|    [[Media:CB-CB.png|Mummerplot: Complete Genome vs Complete Genome]] |    [[Media:CB-CB.png|Mummerplot: Complete Genome vs Complete Genome]] | ||
| * [ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB_shotgun.dbs 07/16/2001 Traces] | * [ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB_shotgun.dbs 07/16/2001 Traces] | ||
|   Justin Parkhill : "vector and quality trimming, as well as contamination checks has been done on all traces" | |||
|    63,115 Sanger reads |    63,115 Sanger reads | ||
| Line 21: | Line 26: | ||
|      no mate pairing  : can be inferred from names (.p1c, .q1c => 27,331 mates); however there seem to be many errors (links from chromosome to the plasmid) |      no mate pairing  : can be inferred from names (.p1c, .q1c => 27,331 mates); however there seem to be many errors (links from chromosome to the plasmid) | ||
|      no library info  : assumed there was only one library used |      no library info  : assumed there was only one library used | ||
|      there are 124 regions in the reference which are not covered by reads |      there are 124 regions in the reference which are not covered by reads | ||
|      17K reads missing from Sanger ftp |      17K reads missing from Sanger ftp | ||
| Line 39: | Line 42: | ||
|    Total                 78975 |    Total                 78975 | ||
| NCBI :   |   77250 reads aligned by nucmer -c 30 to the reference | ||
|   reads were trimmed based on alignment | |||
|   avgReadLen=503 | |||
|   avgReadClr=499 | |||
| * [ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB.pep 3,616 predicted genes] | |||
| '''NCBI:'''  | |||
| * [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=21011 Genome project] | |||
|   Name           Length  %GC | |||
|   AM412317.1     3886916 28.24  # chromosome | |||
|   AM412318.1     16344   26.80  # plasmid pBOT3502 | |||
|   3574 chromosome genes | |||
|    114 chromosome rRNA's | |||
|     18 plasmid genes  | |||
|    Reads have not been submitted to TA |    Reads have not been submitted to TA | ||
| * [http://www.genome.org/cgi/reprint/gr.6282807v1 Paper: Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes; Genome Res. published online May 22, 2007] | * [http://www.genome.org/cgi/reprint/gr.6282807v1 Paper: Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes; Genome Res. published online May 22, 2007] | ||
| Line 52: | Line 72: | ||
| == Assembly  == | == Assembly  == | ||
| '''Location:''' | |||
|   /fs/szasmg/Bacteria/C_botulinum | |||
|   /fs/szdata/ncbi/genomes/Bacteria/Clostridium_botulinum_A/  | |||
| '''2007_0725_WGA''' | '''2007_0725_WGA''' | ||
|      on the 63,115 Sanger reads | |||
|      runCA-OBT.pl (default params)   |      runCA-OBT.pl (default params)   | ||
|      location: 2007_0725_WGA |      location: 2007_0725_WGA | ||
|      => 109 scaffolds, 243 contigs |      => 109 scaffolds, 243 contigs, 3,823,075 bp | ||
|      => library insert estimates mean=1840.917 stdev=866.039 |      => library insert estimates mean=1840.917 stdev=866.039 | ||
| '''2007_0801_AMOScmp-relaxed''' | '''2007_0801_AMOScmp-relaxed''' | ||
|    on the 63,115 Sanger reads | |||
|     MINCLUSTER=30 , MAXTRIM=50 |     MINCLUSTER=30 , MAXTRIM=50 | ||
|     => 2 scaffolds, 148 contigs |     => 2 scaffolds, 148 contigs, 3,883,789 bp | ||
|    [[Media:CB.2007_0801_AMOScmp-relaxed.qc|CB.qc]] |    [[Media:CB.2007_0801_AMOScmp-relaxed.qc|CB.qc]] | ||
|    [[Media:CB.2007_0801_AMOScmp-relaxed.chromo.png|CB.chromo.png]] |    [[Media:CB.2007_0801_AMOScmp-relaxed.chromo.png|CB.chromo.png]] | ||
| Line 68: | Line 93: | ||
|    [[Media:CB-scaff.2007_0801_AMOScmp-relaxed.png|CB-scaff.png]] |    [[Media:CB-scaff.2007_0801_AMOScmp-relaxed.png|CB-scaff.png]] | ||
| ''' | ---- | ||
|    /fs/ | |||
| '''2007_0830_WGA''' | |||
|   on the 78,975  Sanger reads; no OBT | |||
|   => 81 scaff, 106 contigs, 3,873,432 bp | |||
| '''2007_0830_AMOScmp-relaxed''' | |||
|   on the 78,975  Sanger reads | |||
|   => 2 scaff,  24 contigs, 3,902,812 bp | |||
| '''2007_0831_AMOScmp-relaxed''' | |||
|   on the 78,975  Sanger reads | |||
|   => 2 scaff,  22 contigs, 3,902,971 bp | |||
| '''2007_0906_AMOScmp-nucmer -> best''' | |||
|   on the 78,975  Sanger reads | |||
|   reads have been trimmed to their maximum alignment coordinates | |||
|   => 2 scaff, 2 contigs, 3,087 singletons ; 3,903,275 bp  | |||
|   1              3886795 28.25 (121 bp shorter than the reference) | |||
|   2                16344 26.80 | |||
|   Gene mappings: | |||
|     /fs/szasmg/Bacteria/C_botulinum/2007_0906_AMOScmp-nucmer/CB.ptt | |||
|  Hawkeye screen captures: Cbot:yellow; Cbot_ends:pink; J:green; CBOT:red; CbBAC:blue | |||
|   [[Media:CB.2007_0906_AMOScmp-nucmer.chromo.png|CB.chromo.png]] | |||
|   [[Media:CB.2007_0906_AMOScmp-nucmer.chromo.lib.png|CB.chromo.lib.png]] | |||
|   [[Media:CB.2007_0906_AMOScmp-nucmer.plasmid.png|CB.plasmid.png]] | |||
|   [[Media:CB.2007_0906_AMOScmp-nucmer.plasmid.lib.png|CB.plasmid.lib.png]] | |||
| = Other strains = | |||
| Summary: | |||
|   ~ 20 strains in NCBI Taxonomy | |||
|   9 genome projects | |||
|   8 complete genomes | |||
|   3 assemblies in NCBI AA (all TIGR/JCVI) | |||
|   6 trace sets (5 TIGR/JCVI , 1 Sanger) | |||
| == Data sources == | |||
| '''NCBI:'''  | |||
| * [http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1491 Tax Browser] | |||
| * [http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=search&term=Clostridium%20botulinum Genome Projects] | |||
| * [ftp://ftp.ncbi.nih.gov/pub/TraceDB/ NCBI_TA_FTP] | |||
| Genome projects:	 | |||
|   1: A str. ATCC 19397 [LANL/JCVI/DOE] complete ; uploaded to Insignia | |||
|   2*: A str. ATCC 3502 [Sanger] complete ; uploaded to Insignia | |||
|   3: A str. Hall [LANL/JCVI/DOE] complete ; uploaded to Insignia | |||
|   4: Bf [JCVI/MSC] 70 contigs ; in AA | |||
|   5: C str. Eklund [JCVI/MSC] 76 contigs  ; in AA | |||
|   6: F str. Langeland plasmid pCLI [LANL/JCVI/DOE] complete ; uploaded to Insignia | |||
|   7: G [JCVI/MSC] 64 contigs  ; not in AA | |||
|   8: NCTC 2916 [JCVI/MSC] 70 contigs  ; in AA | |||
|   9: str. Iwanei E [JCVI/MSC] 66 contigs ; not in AA | |||
| New (Aug 19th 2008): | |||
|   10: A2 str. Kyoto-F [LANL/JCVI/DOE] progress | |||
|   11: A3 str. Loch Maree [LANL/JCVI/DOE] complete   | |||
|   12: B str. Eklund 17B [LANL/JCVI/DOE] complete  | |||
|   13: B1 str. Okra [LANL/JCVI/DOE] complete  | |||
|   14: Ba4 str. 657 [LANL/JCVI/DOE] progress | |||
|   15: E1 str. 'BoNT E Beluga' [LANL/JCVI/DOE] progress | |||
|   16: E3 str. Alaska E43 [LANL/JCVI/DOE] complete !!! not in Insignia | |||
| File locations: | |||
|   /fs/szasmg2/Bacteria/C_botulinum/ | |||
| NCBI AA assemblies: | |||
|   [[Media:C_botulinum.AA.qc.combine|qc stats]] | |||
| CBCB CA3 assemblies: | |||
|                         Placed  Deg  Total | |||
|   4: Bf                     52   12     64   better than AA(fewer, avg contig len is larger) | |||
|   5: C str. Eklund          51    3     54   better than AA(fewer, avg contig len is larger) | |||
|   7: G                      47    7     54   fewer contigs than AA | |||
|   8: NCTC 2916              55   11     66   better than AA(fewer, avg contig len is larger) | |||
|   9: str. Iwanei E          44   10     55   fewer contigs than AA | |||
|   [[Media:C_botulinum.2007_1005_WGA.qc.combine|qc stats]] | |||
| Other links: | |||
|   * [http://www.cfsan.fda.gov/~mow/chap2.html FDA] | |||
|   "Clostridium botulinum is an anaerobic, Gram-positive, spore-forming rod that produces a potent neurotoxin.  | |||
|   The spores are heat-resistant and can survive in foods that are incorrectly or minimally processed.  | |||
|   Seven types (A, B, C, D, E, F and G) of botulism are recognized, based on the antigenic specificity of the toxin produced by each strain.  | |||
|   Types A, B, E and F cause human botulism.  | |||
|   Types C and D cause most cases of botulism in animals.  | |||
|   Animals most commonly affected are wild fowl and poultry, cattle, horses and some species of fish.  | |||
|   Although type G has been isolated from soil in Argentina, no outbreaks involving it have been recognized." | |||
| == Insignia uploads == | |||
| Assemblies selected: | |||
|     /fs/szasmg2/Bacteria/C_botulinum/A_ATCC_3502/best/ | |||
|     /fs/szasmg2/Bacteria/C_botulinum/Bf/best/ | |||
|     /fs/szasmg2/Bacteria/C_botulinum/C_str__eklund/best/ | |||
|     /fs/szasmg2/Bacteria/C_botulinum/G/best/ | |||
|     /fs/szasmg2/Bacteria/C_botulinum/NCTN_2916/best/ | |||
|     /fs/szasmg2/Bacteria/C_botulinum/str__iwanei_e/best/ | |||
Latest revision as of 17:03, 8 December 2008
Hall strain A (ATCC 3502)
Data sources
Sanger:
  chromosome: 3,886,916 bp 28.24 GC%
  plasmid:       16,344 bp 26.80 GC%
  genes:          3,616
Mummerplot: Complete Genome vs Complete Genome
Justin Parkhill : "vector and quality trimming, as well as contamination checks has been done on all traces"
63,115 Sanger reads
Read problems:
  no quality      : default 20 assigned to all the bases
  no mate pairing : can be inferred from names (.p1c, .q1c => 27,331 mates); however, there seem to be many errors (links from the chromosome to the plasmid)
  no library info : assumed there was only one library used
  there are 124 regions in the reference which are not covered by reads
  17K reads missing from Sanger ftp
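The count of reference regions not covered by reads can be derived from the read alignment intervals. A minimal Python sketch, assuming the alignments are available as 1-based inclusive (start, end) pairs on a single reference sequence; the function name and the toy coordinates are hypothetical and are not taken from the original analysis:

```python
def uncovered_regions(ref_len, intervals):
    """Return 1-based inclusive (start, end) gaps not covered by any interval."""
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted(intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


# Toy example with made-up coordinates on a 100 bp reference
example_gaps = uncovered_regions(100, [(5, 20), (15, 40), (60, 90)])
print(len(example_gaps), example_gaps)
```

Applied per replicon (chromosome and plasmid separately), the length of the returned list would give the number of uncovered regions.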
78,975 Sanger reads
  Cbot[1-9]*.[pq][12]   68028   # article: insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb
  CbBAC1*.s1c             305
  CbBAC4*.[pq]1c          430
  CbBAC7*.[spq]1c         474
  Cbot_ends*.[pq]1c      1604   # article: 19 kb inserts (2 kb stdev) ; based on nucmer alignments: 9 kb inserts (2 kb stdev)
  CBOT[1-9]*.[pqw]        509   # 415 primer walks
  CBOTC                   166   # all primer walks
  J*.[pqs]               7459
  Total                 78975
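Since the reads come without mate information, pairs have to be inferred from the read names. A rough Python sketch of one way to do this, assuming names have the form <clone>.<direction><suffix>, with "p" and "q" marking the two ends of the same clone (as in .p1c / .q1c); the regular expression, the function name and the example read names are hypothetical:

```python
import re
from collections import defaultdict

# Assumed naming scheme: <clone>.<direction><suffix>, where "p" and "q" are the
# two ends of one clone. Other directions (e.g. "s", "w") are ignored here.
NAME_RE = re.compile(r"^(?P<clone>.+)\.(?P<dir>[pq])(?P<suffix>\w*)$")


def infer_mates(read_names):
    """Return sorted (p_read, q_read) pairs that share a clone name and suffix."""
    by_clone = defaultdict(dict)
    for name in read_names:
        m = NAME_RE.match(name)
        if m is None:
            continue  # no mate information can be inferred from this name
        key = (m.group("clone"), m.group("suffix"))
        # keep the first read seen for each direction of a clone
        by_clone[key].setdefault(m.group("dir"), name)
    return sorted((d["p"], d["q"]) for d in by_clone.values() if "p" in d and "q" in d)


# Made-up read names for illustration
names = ["Cbot1a01.p1c", "Cbot1a01.q1c", "Cbot1a02.p1c", "CbBAC1a01.s1c"]
print(infer_mates(names))
```

Pairs inferred this way are only as reliable as the naming; mates that align to different replicons (chromosome vs. plasmid) would be candidates for the pairing errors noted above.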
  77250 reads aligned by nucmer -c 30 to the reference
  reads were trimmed based on alignment
  avgReadLen=503
  avgReadClr=499
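Alignment-based trimming can be pictured as setting each read's clear range to the outermost coordinates covered by its alignments to the reference. This is a minimal Python sketch of that interpretation, not the script actually used; it assumes the alignment coordinates on the read are 1-based and inclusive, and the function name is hypothetical:

```python
def clear_range(alignments):
    """Return the widest (start, end) span on the read covered by its alignments.

    `alignments` holds (read_start, read_end) pairs; reverse-strand hits may be
    given with start > end, so each pair is normalized first.
    """
    if not alignments:
        return None  # unaligned read: nothing to base the trimming on
    starts = [min(s, e) for s, e in alignments]
    ends = [max(s, e) for s, e in alignments]
    return min(starts), max(ends)


# Made-up alignment coordinates for one read
print(clear_range([(10, 480), (495, 12)]))
```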
NCBI:
  Name           Length  %GC
  AM412317.1     3886916 28.24  # chromosome
  AM412318.1     16344   26.80  # plasmid pBOT3502
  3574 chromosome genes
   114 chromosome rRNA's
    18 plasmid genes
Reads have not been submitted to TA
The initial genome assembly was obtained from:
- 69,632 paired-end sequences (giving 9.15-fold coverage) derived from four genomic shotgun libraries (all in pUC18 with insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb) using dye terminator chemistry on ABI3700 automated sequencers;
- 1,604 paired-end sequences from one pBACe3.6 library with insert sizes of 15–23 kb (a clone coverage of 3.9-fold), used as a scaffold;
- 9,343 directed sequencing reads were generated during finishing.
(Total 80,579 reads => 17,464 missing from ftp site)
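The totals above follow directly from the read counts quoted from the paper and the number of reads in the first FTP download. A small Python check; the variable names are illustrative only:

```python
# Read counts quoted from the paper
paired_end_shotgun = 69_632
bac_end = 1_604
finishing = 9_343

# Reads in the first Sanger FTP download
reads_on_ftp = 63_115

total_reads = paired_end_shotgun + bac_end + finishing
missing = total_reads - reads_on_ftp
print(total_reads, missing)  # 80579 17464
```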
Assembly
Location:
  /fs/szasmg/Bacteria/C_botulinum
  /fs/szdata/ncbi/genomes/Bacteria/Clostridium_botulinum_A/
2007_0725_WGA
  on the 63,115 Sanger reads
  runCA-OBT.pl (default params)
  location: 2007_0725_WGA
  => 109 scaffolds, 243 contigs, 3,823,075 bp
  => library insert estimates mean=1840.917 stdev=866.039
2007_0801_AMOScmp-relaxed
  on the 63,115 Sanger reads
  MINCLUSTER=30 , MAXTRIM=50
  => 2 scaffolds, 148 contigs, 3,883,789 bp
  CB.qc
  CB.chromo.png
  CB.plasmid.png
  CB-scaff.png
2007_0830_WGA
  on the 78,975 Sanger reads; no OBT
  => 81 scaff, 106 contigs, 3,873,432 bp
2007_0830_AMOScmp-relaxed
  on the 78,975 Sanger reads
  => 2 scaff, 24 contigs, 3,902,812 bp
2007_0831_AMOScmp-relaxed
  on the 78,975 Sanger reads
  => 2 scaff, 22 contigs, 3,902,971 bp
2007_0906_AMOScmp-nucmer -> best
  on the 78,975 Sanger reads
  reads have been trimmed to their maximum alignment coordinates
  => 2 scaff, 2 contigs, 3,087 singletons ; 3,903,275 bp
  1              3886795 28.25 (121 bp shorter than the reference)
  2                16344 26.80
  Gene mappings:
    /fs/szasmg/Bacteria/C_botulinum/2007_0906_AMOScmp-nucmer/CB.ptt
  Hawkeye screen captures: Cbot:yellow; Cbot_ends:pink; J:green; CBOT:red; CbBAC:blue
  CB.chromo.png
  CB.chromo.lib.png
  CB.plasmid.png
  CB.plasmid.lib.png
Other strains
Summary:
  ~ 20 strains in NCBI Taxonomy
  9 genome projects
  8 complete genomes
  3 assemblies in NCBI AA (all TIGR/JCVI)
  6 trace sets (5 TIGR/JCVI , 1 Sanger)
Data sources
NCBI:
Genome projects:
  1: A str. ATCC 19397 [LANL/JCVI/DOE] complete ; uploaded to Insignia
  2*: A str. ATCC 3502 [Sanger] complete ; uploaded to Insignia
  3: A str. Hall [LANL/JCVI/DOE] complete ; uploaded to Insignia
  4: Bf [JCVI/MSC] 70 contigs ; in AA
  5: C str. Eklund [JCVI/MSC] 76 contigs ; in AA
  6: F str. Langeland plasmid pCLI [LANL/JCVI/DOE] complete ; uploaded to Insignia
  7: G [JCVI/MSC] 64 contigs ; not in AA
  8: NCTC 2916 [JCVI/MSC] 70 contigs ; in AA
  9: str. Iwanei E [JCVI/MSC] 66 contigs ; not in AA
New (Aug 19th 2008):
  10: A2 str. Kyoto-F [LANL/JCVI/DOE] progress
  11: A3 str. Loch Maree [LANL/JCVI/DOE] complete
  12: B str. Eklund 17B [LANL/JCVI/DOE] complete
  13: B1 str. Okra [LANL/JCVI/DOE] complete
  14: Ba4 str. 657 [LANL/JCVI/DOE] progress
  15: E1 str. 'BoNT E Beluga' [LANL/JCVI/DOE] progress
  16: E3 str. Alaska E43 [LANL/JCVI/DOE] complete !!! not in Insignia
File locations:
/fs/szasmg2/Bacteria/C_botulinum/
NCBI AA assemblies:
qc stats
CBCB CA3 assemblies:
                        Placed  Deg  Total
  4: Bf                     52   12     64   better than AA (fewer contigs, avg contig len is larger)
  5: C str. Eklund          51    3     54   better than AA (fewer contigs, avg contig len is larger)
  7: G                      47    7     54   fewer contigs than AA
  8: NCTC 2916              55   11     66   better than AA (fewer contigs, avg contig len is larger)
  9: str. Iwanei E          44   10     55   fewer contigs than AA
qc stats
Other links:
* FDA
  "Clostridium botulinum is an anaerobic, Gram-positive, spore-forming rod that produces a potent neurotoxin.
  The spores are heat-resistant and can survive in foods that are incorrectly or minimally processed.
  Seven types (A, B, C, D, E, F and G) of botulism are recognized, based on the antigenic specificity of the toxin produced by each strain.
  Types A, B, E and F cause human botulism.
  Types C and D cause most cases of botulism in animals.
  Animals most commonly affected are wild fowl and poultry, cattle, horses and some species of fish.
  Although type G has been isolated from soil in Argentina, no outbreaks involving it have been recognized."
Insignia uploads
Assemblies selected:
  /fs/szasmg2/Bacteria/C_botulinum/A_ATCC_3502/best/
  /fs/szasmg2/Bacteria/C_botulinum/Bf/best/
  /fs/szasmg2/Bacteria/C_botulinum/C_str__eklund/best/
  /fs/szasmg2/Bacteria/C_botulinum/G/best/
  /fs/szasmg2/Bacteria/C_botulinum/NCTN_2916/best/
  /fs/szasmg2/Bacteria/C_botulinum/str__iwanei_e/best/