Clostridium botulinum

From Cbcb

Revision as of 15:57, 20 August 2007

Data sources

Sanger:

 Hall strain A (ATCC 3502)
 chromosome: 3,886,916 bp 28.24 GC%
 plasmid:      16,344 bp 26.80 GC%
 Mummerplot: Complete Genome vs Complete Genome
 63,115 Sanger reads
 Read problems:
   no quality       : default 20 assigned to all the bases
   no mate pairing  : can be inferred from names (.p1c, .q1c => 27,331 mates); however there seem to be many errors (links from chromosome to the plasmid)
   no library info  : assumed there was only one library used
   no trimming info : almost all reads have "CONTAINED" alignments to the reference
                      CLR=1,len(read)
   there are 124 regions in the reference which are not covered by reads
   17K reads missing from Sanger ftp
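
The mate-pair inference noted above can be sketched as follows. This is only an illustration, not the script actually used: it assumes each read name has the form `<template>.<suffix>`, with `.p1c` and `.q1c` marking the two ends of one clone, and the read names and the `placement` mapping are made up for the example.

```python
from collections import defaultdict

# Assumption: read names look like "<template>.<suffix>", where .p1c and .q1c
# are the two ends of the same clone. Sample names below are hypothetical.
FORWARD, REVERSE = "p1c", "q1c"

def infer_mates(read_names):
    """Pair reads that share a template and carry opposite end suffixes."""
    ends = defaultdict(dict)
    for name in read_names:
        template, _, suffix = name.rpartition(".")
        if template and suffix in (FORWARD, REVERSE):
            # Keep the first read seen for each end of a template.
            ends[template].setdefault(suffix, name)
    return sorted(
        (by_end[FORWARD], by_end[REVERSE])
        for by_end in ends.values()
        if FORWARD in by_end and REVERSE in by_end
    )

def cross_replicon_links(mates, placement):
    """Return mates whose two reads align to different reference sequences."""
    return [
        (a, b)
        for a, b in mates
        if a in placement and b in placement and placement[a] != placement[b]
    ]

reads = ["cb1a01.p1c", "cb1a01.q1c", "cb1a02.p1c", "cb1b07.q1c"]
mates = infer_mates(reads)

# Hypothetical read placements on the reference.
placement = {"cb1a01.p1c": "chromosome", "cb1a01.q1c": "plasmid"}
suspect = cross_replicon_links(mates, placement)
print(mates, suspect)
```

Mates flagged by `cross_replicon_links` correspond to the chromosome-to-plasmid links described above and are candidates for removal before assembly.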

NCBI :

 Reads have not been submitted to the Trace Archive (TA)

The initial genome assembly was obtained from:

  • 69,632 paired-end sequences (giving 9.15-fold coverage) derived from four genomic shotgun libraries (all in pUC18, with insert sizes of 1.5–2.0 kb, 2.0–2.2 kb, 2.2–2.5 kb, and 2.5–4.0 kb) using dye terminator chemistry on ABI3700 automated sequencers;
  • 1,604 paired-end sequences from one pBACe3.6 library with insert sizes of 15–23 kb (a clone coverage of 3.9-fold), used as a scaffold;
  • 9,343 directed sequencing reads generated during finishing.

(Total 80,579 reads => 17,464 missing from ftp site)
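
The totals above can be checked with a few lines of arithmetic. The counts are the ones quoted on this page; the dictionary keys are just labels chosen for the example.

```python
# Read counts quoted for the initial assembly (labels are illustrative).
published_reads = {
    "shotgun paired-end": 69_632,
    "pBACe3.6 paired-end": 1_604,
    "directed finishing": 9_343,
}
ftp_reads = 63_115  # Sanger reads available from the ftp site

total = sum(published_reads.values())
missing = total - ftp_reads
print(f"total={total:,} missing={missing:,}")  # total=80,579 missing=17,464
```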

Assembly

2007_0725_WGA

   create a .frg file
   runCA-OBT.pl (default params) 
   location: 2007_0725_WGA
   => 109 scaffolds, 243 contigs
   => library insert estimates mean=1840.917 stdev=866.039
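
As a rough illustration of what the insert estimates imply, the sketch below derives an accepted insert-size range from the mean and standard deviation and tests an observed mate distance against it. The mean ± 3 stdev cutoff is an assumption made for this example, not a documented setting of the assembler, and the function names are hypothetical.

```python
MEAN, STDEV = 1840.917, 866.039  # library insert estimates reported above

def insert_range(mean, stdev, k=3.0):
    """Return (low, high) bounds as mean +/- k*stdev, clamped at zero.

    The factor k=3 is an assumption chosen for illustration.
    """
    return max(0.0, mean - k * stdev), mean + k * stdev

def is_consistent(observed, mean=MEAN, stdev=STDEV, k=3.0):
    """True if an observed mate distance falls inside the accepted range."""
    low, high = insert_range(mean, stdev, k)
    return low <= observed <= high

low, high = insert_range(MEAN, STDEV)
print(f"accepted insert range: {low:.0f}-{high:.0f} bp")
print(is_consistent(2_000), is_consistent(20_000))
```

Under these numbers a 15–23 kb pBACe3.6 insert would fall well outside the range, which may be one way the single-library assumption leads to inconsistent mate links.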

2007_0801_AMOScmp-relaxed

  MINCLUSTER=30 , MAXTRIM=50
  => 2 scaffolds, 148 contigs
 CB.qc
 CB.chromo.png
 CB.plasmid.png
 CB-scaff.png

Location:

 /fs/szasmg/Bacteria/C_botulinum