Clostridium botulinum

Latest revision as of 17:03, 8 December 2008

Hall strain A (ATCC 3502)

Data sources

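Sanger:

  • Genome Project: http://www.sanger.ac.uk/Projects/C_botulinum/
  • Naming Conventions: http://www.sanger.ac.uk/Software/formats/glossary/naming.shtml
  • Complete Genome: ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB.dbs

 chromosome: 3,886,916 bp 28.24 GC%
 plasmid:    16,344 bp 26.80 GC%
 genes:      3,616
 Mummerplot: Complete Genome vs Complete Genome

  • 07/16/2001 Traces: ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB_shotgun.dbs

 Justin Parkhill : "vector and quality trimming, as well as contamination checks has been done on all traces"
 63,115 Sanger reads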
 Read problems:
   no quality       : default 20 assigned to all the bases
   no mate pairing  : can be inferred from names (.p1c, .q1c => 27,331 mates); however there seem to be many errors (links from chromosome to the plasmid)
   no library info  : assumed there was only one library used
   there are 124 regions in the reference which are not covered by reads
   17K reads missing from Sanger ftp
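
A minimal sketch of how mate pairs might be inferred from read names, as noted under "no mate pairing" above. It assumes the text before the last "." is the template name and that a suffix starting with "p" or "q" marks the two ends; the read names below are made up for illustration and are not taken from the Sanger files.

```python
from collections import defaultdict


def pair_mates(read_names):
    """Pair reads that share a template name and carry a p* and a q* suffix."""
    ends = defaultdict(dict)
    for name in read_names:
        template, _, suffix = name.rpartition(".")
        # Keep only the first p-end and first q-end seen for each template.
        if template and suffix[:1] in ("p", "q"):
            ends[template].setdefault(suffix[0], name)
    return sorted((e["p"], e["q"]) for e in ends.values() if "p" in e and "q" in e)


# Hypothetical read names: one complete pair, one unpaired p read, one .s1c read.
names = ["Cbot1a01.p1c", "Cbot1a01.q1c", "Cbot1a02.p1c", "CbBAC1b03.s1c"]
mates = pair_mates(names)
print(mates)
```

Pairs built this way would still need checking against the reference, since the notes above report links from the chromosome to the plasmid.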
  • 08/30/2007 Traces: ftp://ftp.sanger.ac.uk/pub/pathogens/cb/CB_shotgun_all.dbs

 78,975 Sanger reads
 Cbot[1-9]*.[pq][12]    68028    #article: insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb
 CbBAC1*.s1c             305
 CbBAC4*.[pq]1c          430
 CbBAC7*.[spq]1c         474
 Cbot_ends*.[pq]1c      1604     #article: 19 kb inserts (2 kb stdev) ; based on nucmer alignments: 9 kb inserts (2 kb stdev)
 CBOT[1-9]*.[pqw]        509     #415 primer walks
 CBOTC                   166     #all primer walks
 J*.[pqs]               7459
 Total                 78975
 77250 reads aligned by nucmer -c 30 to the reference
 reads were trimmed based on alignment
 avgReadLen=503
 avgReadClr=499
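
The per-pattern counts above can be re-added as a quick check; a small sketch using only the numbers listed on this page. The sum agrees with the 78,975 reads in the header line.

```python
# Read counts per name pattern, copied from the listing above.
counts = {
    "Cbot[1-9]*.[pq][12]": 68028,
    "CbBAC1*.s1c": 305,
    "CbBAC4*.[pq]1c": 430,
    "CbBAC7*.[spq]1c": 474,
    "Cbot_ends*.[pq]1c": 1604,
    "CBOT[1-9]*.[pqw]": 509,
    "CBOTC": 166,
    "J*.[pqs]": 7459,
}

total = sum(counts.values())
aligned = 77250  # reads aligned by nucmer -c 30, as reported above

print(total)
print(f"{aligned / total:.1%} of reads aligned to the reference")
```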

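NCBI:

  • Genome project: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=21011
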
 Name           Length  %GC
 AM412317.1     3886916 28.24  # chromosome
 AM412318.1     16344   26.80  # plasmid pBOT3502

 3574 chromosome genes
  114 chromosome rRNAs
   18 plasmid genes
 Reads have not been submitted to the NCBI Trace Archive (TA)

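Paper: Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes; Genome Res. published online May 22, 2007 (http://www.genome.org/cgi/reprint/gr.6282807v1)

According to the paper, the initial genome assembly was obtained from: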

  • 69,632 paired-end sequences (giving 9.15-fold coverage) derived from four genomic shotgun libraries (all in pUC18 with insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb) using dye terminator chemistry on ABI3700 automated sequencers;
  • 1,604 paired-end sequences from one pBACe3.6 library with insert sizes of 15–23 kb (a clone coverage of 3.9-fold), used as a scaffold;
  • 9,343 directed sequencing reads generated during finishing.

(Total 80,579 reads => 17,464 missing from the 63,115-read set on the ftp site)
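
The read totals above can be checked with a few lines of arithmetic. The sketch below uses only numbers from this page; the last step assumes that fold coverage is roughly reads x mean read length / genome length, which is an approximation and not a formula taken from the paper.

```python
# Read counts quoted from the paper.
shotgun = 69_632
bac_ends = 1_604
finishing = 9_343
paper_total = shotgun + bac_ends + finishing

# Reads available in the two ftp sets described on this page.
missing_first_set = paper_total - 63_115
missing_second_set = paper_total - 78_975

# Assumption: coverage ~= reads * mean_read_length / genome_length.
genome_bp = 3_886_916 + 16_344  # chromosome + plasmid
implied_read_len = 9.15 * genome_bp / shotgun

print(paper_total, missing_first_set, missing_second_set)
print(f"implied mean shotgun read length: {implied_read_len:.0f} bp")
```

The implied read length can be compared with the avgReadLen reported for the trimmed reads.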

Assembly

Location:

 /fs/szasmg/Bacteria/C_botulinum
 /fs/szdata/ncbi/genomes/Bacteria/Clostridium_botulinum_A/ 

2007_0725_WGA

   on the 63,115 Sanger reads
   runCA-OBT.pl (default params) 
   location: 2007_0725_WGA
   => 109 scaffolds, 243 contigs, 3,823,075 bp
   => library insert estimates mean=1840.917 stdev=866.039

2007_0801_AMOScmp-relaxed

  on the 63,115 Sanger reads
  MINCLUSTER=30 , MAXTRIM=50
  => 2 scaffolds, 148 contigs, 3,883,789 bp
 CB.qc
 CB.chromo.png
 CB.plasmid.png
 CB-scaff.png

2007_0830_WGA

 on the 78,975  Sanger reads; no OBT
 => 81 scaff, 106 contigs, 3,873,432 bp

2007_0830_AMOScmp-relaxed

 on the 78,975  Sanger reads
 => 2 scaff,  24 contigs, 3,902,812 bp

2007_0831_AMOScmp-relaxed

 on the 78,975  Sanger reads
 => 2 scaff,  22 contigs, 3,902,971 bp

2007_0906_AMOScmp-nucmer -> best

 on the 78,975  Sanger reads
 reads have been trimmed to their maximum alignment coordinates
 => 2 scaff, 2 contigs, 3,087 singletons ; 3,903,275 bp 
 1              3886795 28.25 (121 bp shorter than the reference)
 2                16344 26.80
 Gene mappings:
   /fs/szasmg/Bacteria/C_botulinum/2007_0906_AMOScmp-nucmer/CB.ptt
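
To compare the assemblies listed above side by side, a small sketch that tabulates scaffold and contig counts and the assembled length as a fraction of the reference (chromosome plus plasmid). All numbers are copied from this page; the table layout is only illustrative.

```python
reference_bp = 3_886_916 + 16_344  # chromosome + plasmid

# (name, scaffolds, contigs, assembled bp) as reported above.
assemblies = [
    ("2007_0725_WGA", 109, 243, 3_823_075),
    ("2007_0801_AMOScmp-relaxed", 2, 148, 3_883_789),
    ("2007_0830_WGA", 81, 106, 3_873_432),
    ("2007_0830_AMOScmp-relaxed", 2, 24, 3_902_812),
    ("2007_0831_AMOScmp-relaxed", 2, 22, 3_902_971),
    ("2007_0906_AMOScmp-nucmer", 2, 2, 3_903_275),
]

for name, scaffolds, contigs, bp in assemblies:
    print(f"{name:28s} {scaffolds:4d} {contigs:4d} {bp / reference_bp:8.2%}")

# The assembly with the fewest contigs, and the chromosome length difference.
fewest = min(assemblies, key=lambda a: a[2])[0]
chromosome_gap = 3_886_916 - 3_886_795
print(fewest, chromosome_gap)
```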

Hawkeye screen captures: Cbot:yellow; Cbot_ends:pink; J:green; CBOT:red; CbBAC:blue
 CB.chromo.png
 CB.chromo.lib.png
 CB.plasmid.png
 CB.plasmid.lib.png

Other strains

Summary:

 ~ 20 strains in NCBI Taxonomy
 9 genome projects
 8 complete genomes
 3 assemblies in NCBI Assembly Archive (AA) (all TIGR/JCVI)
 6 trace sets (5 TIGR/JCVI , 1 Sanger)

Data sources

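NCBI:

  • Tax Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1491
  • Genome Projects: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome&cmd=search&term=Clostridium%20botulinum
  • NCBI_TA_FTP: ftp://ftp.ncbi.nih.gov/pub/TraceDB/
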
Genome projects:

 1: A str. ATCC 19397 [LANL/JCVI/DOE] complete ; uploaded to Insignia
 2*: A str. ATCC 3502 [Sanger] complete ; uploaded to Insignia
 3: A str. Hall [LANL/JCVI/DOE] complete ; uploaded to Insignia
 4: Bf [JCVI/MSC] 70 contigs ; in AA
 5: C str. Eklund [JCVI/MSC] 76 contigs  ; in AA
 6: F str. Langeland plasmid pCLI [LANL/JCVI/DOE] complete ; uploaded to Insignia
 7: G [JCVI/MSC] 64 contigs  ; not in AA
 8: NCTC 2916 [JCVI/MSC] 70 contigs  ; in AA
 9: str. Iwanei E [JCVI/MSC] 66 contigs ; not in AA

New (Aug 19th 2008):

 10: A2 str. Kyoto-F [LANL/JCVI/DOE] progress
 11: A3 str. Loch Maree [LANL/JCVI/DOE] complete  
 12: B str. Eklund 17B [LANL/JCVI/DOE] complete 
 13: B1 str. Okra [LANL/JCVI/DOE] complete 
 14: Ba4 str. 657 [LANL/JCVI/DOE] progress
 15: E1 str. 'BoNT E Beluga' [LANL/JCVI/DOE] progress
 16: E3 str. Alaska E43 [LANL/JCVI/DOE] complete !!! not in Insignia

File locations:

 /fs/szasmg2/Bacteria/C_botulinum/

NCBI AA assemblies:

 qc stats

CBCB CA3 assemblies:

                       Placed  Deg  Total  Notes
 4: Bf                     52   12     64  better than AA (fewer contigs, larger avg contig length)
 5: C str. Eklund          51    3     54  better than AA (fewer contigs, larger avg contig length)
 7: G                      47    7     54  fewer contigs than AA
 8: NCTC 2916              55   11     66  better than AA (fewer contigs, larger avg contig length)
 9: str. Iwanei E          44   10     55  fewer contigs than AA
 qc stats
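
A quick consistency check on the table above, as a sketch. It assumes that Total is meant to equal Placed + Deg (the page does not say so explicitly) and takes the AA contig counts from the genome projects list.

```python
# (strain, AA contigs, placed, deg, total) copied from this page.
rows = [
    ("Bf", 70, 52, 12, 64),
    ("C str. Eklund", 76, 51, 3, 54),
    ("G", 64, 47, 7, 54),
    ("NCTC 2916", 70, 55, 11, 66),
    ("str. Iwanei E", 66, 44, 10, 55),
]

for name, aa, placed, deg, total in rows:
    print(f"{name:15s} AA={aa:3d} CA={total:3d} fewer by {aa - total}")

# Rows where the assumed relation Placed + Deg == Total does not hold.
mismatches = [name for name, aa, placed, deg, total in rows if placed + deg != total]
print("Placed + Deg != Total:", mismatches)
```

Any strain reported in the last line has a Placed, Deg or Total value worth rechecking against the qc stats.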

Other links:

 * FDA: http://www.cfsan.fda.gov/~mow/chap2.html
 "Clostridium botulinum is an anaerobic, Gram-positive, spore-forming rod that produces a potent neurotoxin.
 The spores are heat-resistant and can survive in foods that are incorrectly or minimally processed. 
 Seven types (A, B, C, D, E, F and G) of botulism are recognized, based on the antigenic specificity of the toxin produced by each strain. 
 Types A, B, E and F cause human botulism. 
 Types C and D cause most cases of botulism in animals. 
 Animals most commonly affected are wild fowl and poultry, cattle, horses and some species of fish. 
 Although type G has been isolated from soil in Argentina, no outbreaks involving it have been recognized."

Insignia uploads

Assemblies selected:

   /fs/szasmg2/Bacteria/C_botulinum/A_ATCC_3502/best/
   /fs/szasmg2/Bacteria/C_botulinum/Bf/best/
   /fs/szasmg2/Bacteria/C_botulinum/C_str__eklund/best/
   /fs/szasmg2/Bacteria/C_botulinum/G/best/
   /fs/szasmg2/Bacteria/C_botulinum/NCTN_2916/best/
   /fs/szasmg2/Bacteria/C_botulinum/str__iwanei_e/best/