Clostridium botulinum

From Cbcb
Revision as of 15:57, 5 September 2007 by Dpuiu (talk | contribs) (→‎Assembly)

Data sources

Sanger:

 Hall strain A (ATCC 3502)
 chromosome: 3,886,916 bp 28.24 GC%
 plasmid:    16,344 bp 26.80 GC%
 genes:      3,616
 Mummerplot: Complete Genome vs Complete Genome
 63,115 Sanger reads
 Read problems:
   no quality       : default 20 assigned to all the bases
   no mate pairing  : can be inferred from names (.p1c, .q1c => 27,331 mates); however there seem to be many errors (links from chromosome to the plasmid)
   no library info  : assumed there was only one library used
   no trimming info : almost all reads have "CONTAINED" alignments to the reference
                      CLR=1,len(read)
   there are 124 regions in the reference which are not covered by reads
   17K reads missing from Sanger ftp
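The mate pairing described above can be inferred from read names roughly as follows. This is a minimal sketch, not the script actually used: it assumes every read name has the form `<template>.<suffix>`, with `.p1c` and `.q1c` marking the two ends of one clone. The function name and the example read names are hypothetical.

```python
from collections import defaultdict

# Assumption: ".p1c" is the forward read and ".q1c" the reverse read
# of the same template (clone).
FORWARD, REVERSE = "p1c", "q1c"


def infer_mates(read_names):
    """Return sorted (forward, reverse) name pairs for templates with both ends."""
    ends = defaultdict(dict)
    for name in read_names:
        template, _, suffix = name.rpartition(".")
        # Skip names without a template part or with any other suffix.
        if template and suffix in (FORWARD, REVERSE):
            ends[template][suffix] = name
    return sorted(
        (e[FORWARD], e[REVERSE])
        for e in ends.values()
        if FORWARD in e and REVERSE in e
    )


# Hypothetical read names, for illustration only.
names = ["Cbot1a01.p1c", "Cbot1a01.q1c", "Cbot1a02.p1c", "CBOTC01.w1"]
pairs = infer_mates(names)
print(pairs)
```

Pairs inferred this way still need checking against the reference, since name-based pairing alone cannot catch the chromosome-to-plasmid links noted above.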
 78,975  Sanger reads
 Cbot[1-9]*.[pq][12]    68028    #article: insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb
 CbBAC1*.s1c             305
 CbBAC4*.[pq]1c          430
 CbBAC7*.[spq]1c         474
 Cbot_ends*.[pq]1c      1604     #article: 19 kb inserts (2kb stdev) ; based on nucmer alignments: 9kb inserts (2kb stdev)
 CBOT[1-9]*.[pqw]        509     #415 primer walks
 CBOTC                   166     #all primer walks
 J*.[pqs]               7459
 Total                 78975
 77250 reads aligned by nucmer -c 30 to the reference
 reads were trimmed based on alignment
 avgReadLen=503
 avgReadClr=499
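An insert-size estimate such as the nucmer-based one quoted for the Cbot_ends library can be computed from the reference coordinates of aligned mate pairs. The sketch below assumes each pair has already been reduced to the outermost reference coordinates of its two reads, and that both mates map to the same sequence. The function name and the coordinates are hypothetical, chosen only to illustrate the calculation.

```python
import statistics


def insert_size_stats(mate_spans):
    """Return (mean, stdev) of insert sizes.

    mate_spans: iterable of (start, end) reference coordinates, one tuple
    per mate pair, covering the outer ends of both reads (1-based, inclusive).
    """
    sizes = [abs(end - start) + 1 for start, end in mate_spans]
    return statistics.mean(sizes), statistics.stdev(sizes)


# Hypothetical mate-pair spans, for illustration only.
spans = [(1000, 9999), (50000, 60999), (200000, 206999)]
mean_size, stdev_size = insert_size_stats(spans)
print(mean_size, stdev_size)
```

Pairs whose mates align to different sequences (chromosome versus plasmid) have no meaningful span and would need to be filtered out before this step.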

NCBI:

 Name           Length  %GC
 AM412317.1     3886916 28.24  # chromosome
 AM412318.1     16344   26.80  # plasmid pBOT3502

 3574 chromosome genes
  114 chromosome rRNAs
   18 plasmid genes 
 Reads have not been submitted to the NCBI Trace Archive (TA)

The initial genome assembly was obtained from:

  • 69,632 paired-end sequences (giving 9.15-fold coverage) derived from four genomic shotgun libraries (all in pUC18 with insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb) using dye terminator chemistry on ABI3700 automated sequencers;
  • 1,604 paired-end sequences from one pBACe3.6 library with insert sizes of 15–23 kb (a clone coverage of 3.9-fold), which were used as a scaffold;
  • 9,343 directed sequencing reads were generated during finishing.

(Total 80,579 reads => 17,464 missing from ftp site)
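The read totals and coverage figures above can be cross-checked with some quick arithmetic. This is a rough sketch under stated assumptions: the genome size is taken as chromosome plus plasmid, the BAC insert mean as 19 kb (the midpoint of 15–23 kb), and the implied read length is simply back-calculated from the quoted 9.15-fold coverage. The article may have computed these differently.

```python
# Reference sizes from the Data sources section (chromosome + plasmid).
GENOME_BP = 3_886_916 + 16_344

shotgun, bac_ends, finishing = 69_632, 1_604, 9_343
total_reads = shotgun + bac_ends + finishing

# Reads reported in the article but absent from the first ftp download.
missing = total_reads - 63_115

# Mean read length implied by 9.15-fold shotgun coverage (back-calculated).
implied_read_len = 9.15 * GENOME_BP / shotgun

# Clone coverage: one clone per pair of BAC end reads, assumed 19 kb mean insert.
bac_clones = bac_ends // 2
clone_cov = bac_clones * 19_000 / GENOME_BP

print(total_reads, missing)
print(round(implied_read_len), round(clone_cov, 1))
```

Under these assumptions the clone coverage comes out close to the quoted 3.9-fold, and the implied read length is of the same order as the avgReadLen=503 measured on the downloaded reads.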

Assembly

Location:

 /fs/szasmg/Bacteria/C_botulinum
 /fs/szdata/ncbi/genomes/Bacteria/Clostridium_botulinum_A/ 

2007_0725_WGA

   on the 63,115 Sanger reads
   runCA-OBT.pl (default params) 
   location: 2007_0725_WGA
   => 109 scaffolds, 243 contigs, 3,823,075 bp
   => library insert estimates mean=1840.917 stdev=866.039

2007_0801_AMOScmp-relaxed

  on the 63,115 Sanger reads
  MINCLUSTER=30 , MAXTRIM=50
  => 2 scaffolds, 148 contigs, 3,883,789 bp
 CB.qc
 CB.chromo.png
 CB.plasmid.png
 CB-scaff.png

2007_0830_WGA

 on the 78,975  Sanger reads; no OBT
 => 81 scaffolds, 106 contigs, 3,873,432 bp

2007_0830_AMOScmp-relaxed

 on the 78,975  Sanger reads
 => 2 scaffolds, 24 contigs, 3,902,812 bp

2007_0831_AMOScmp-relaxed

 on the 78,975  Sanger reads
 => 2 scaffolds, 22 contigs, 3,902,971 bp

2007_0904_AMOScmp-nucmer -> best

 on the 78,975  Sanger reads
 reads have been trimmed to their maximum alignment coordinates
 => 2 scaffolds, 2 contigs, 3,903,275 bp
 1              3886795 28.25 (121 bp shorter than the reference)
 2                16344 26.80
 Gene mappings:
   /fs/szasmg/Bacteria/C_botulinum/2007_0904_AMOScmp-nucmer/nucmer-genes/CB.ptt
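Trimming reads "to their maximum alignment coordinates" can be pictured as taking, for each read, the outermost read positions covered by any of its alignments to the reference. The sketch below is illustrative only and is not the script used for this assembly. It assumes alignments are given as (read id, query start, query end) in 1-based read coordinates, with start greater than end for reverse-strand hits. The function name and the example data are hypothetical.

```python
def clear_ranges(alignments):
    """Return {read_id: (clr_begin, clr_end)} spanning all alignments of each read.

    alignments: iterable of (read_id, q_start, q_end), 1-based inclusive
    read coordinates; reverse-strand hits may have q_start > q_end.
    """
    clr = {}
    for read_id, q_start, q_end in alignments:
        lo, hi = min(q_start, q_end), max(q_start, q_end)
        if read_id in clr:
            # Extend the existing range to cover this alignment as well.
            lo = min(lo, clr[read_id][0])
            hi = max(hi, clr[read_id][1])
        clr[read_id] = (lo, hi)
    return clr


# Hypothetical alignments, for illustration only.
hits = [("r1", 5, 480), ("r1", 470, 510), ("r2", 600, 12)]
ranges = clear_ranges(hits)
print(ranges)
```

Taking the union of all alignments keeps as much sequence as possible per read, which fits the "maximum" wording; a stricter variant would keep only the single longest alignment.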