Francisella tularensis holarctica OSU18

Revision as of 15:13, 9 August 2007 by Dpuiu

Center: Baylor Status: Complete

Data

Reference:

 NC_008369.1

Paper:

Chromosome Rearrangement and Diversification of Francisella tularensis Revealed by the Type B (OSU18) Genome Sequence; J. Bacteriol. 2006

Genome sequencing and assembly. Sequencing and assembly of the F. tularensis subsp. holarctica strain OSU18 genome were accomplished by the whole-genome shotgun (WGS) method, similar to a previously described method (22). Briefly, the WGS clones were sequenced using ABI 3730 sequencers, and the sequence bases were called using the Applied Biosystems sequencing analysis software KB Basecaller. The WGS reads were assembled by using Atlas (11) and Phrap (7). The initial WGS assembly resulted in 132 contigs in 33 scaffolds with approximately 26× sequence coverage. Gaps between contigs and scaffolds were closed by sequencing PCR products that spanned gaps or by sequencing small insert libraries generated from the PCR products. Low-quality regions were resequenced using clones or PCR products spanning the regions to ensure that the Phrap quality score for each base was equal to or greater than 30. This relatively deep data set should enable further studies involving new sequencing, comparative genomics, and proteomics strategies and technologies. Included among these strategies and technologies are (i) using sequencing reads to scan for possible phase variation in Francisella cultures, (ii) using WGS clones as gene expression constructs for peptide array-based antigen screens, and (iii) using the deep coverage of sequencing reads as a representative data set for comparing existing sequencing methodologies to new technologies in various stages of development.

Traces: from the NCBI Trace Archive (TA)

Libraries:

 CENTER  PROJECT      STRAIN  SEQ_LIB_ID         TYPE  SIZE   STDEV  COUNT    Location  Comment
 BCM     BFTB         OSU-18  BFTBP              WGS   2,000  1,000  58,053   TA        acc=3379596; INSERT_SIZE,STD=2690,643 (WGA)
 BCM     BFTD         OSU-18  BFTDP              WGS   2,000  1,000  10,409   TA        acc=3379603; INSERT_SIZE,STD=3675,1407 (WGA)
 BCM     4WG_FTOS     OSU18   4WGS_FTOSA         454   .      .      310,747  TA
 BCM     4WG_FTUL.OS  OSU18   4WG_FTUL.OS_000pA  454   .      .      216,495  SRA       ?
 StrainSubtotal                                                      595,704            Baylor site: 65,131 Sanger & 317,789 454 Reads
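
As a quick sanity check on the table above, the per-library counts can be summed and compared against the subtotal row. This is only a sketch: the counts are copied from the COUNT column, and the variable names are illustrative, not part of any CBCB tool.

```python
# Read counts copied from the COUNT column of the library table.
counts = {
    "BFTBP": 58_053,               # Sanger WGS
    "BFTDP": 10_409,               # Sanger WGS
    "4WGS_FTOSA": 310_747,         # 454
    "4WG_FTUL.OS_000pA": 216_495,  # 454
}

# Per-technology totals as archived (TA/SRA).
sanger_total = counts["BFTBP"] + counts["BFTDP"]
pyro_total = counts["4WGS_FTOSA"] + counts["4WG_FTUL.OS_000pA"]
total = sum(counts.values())

print(f"Sanger reads: {sanger_total:,}")
print(f"454 reads:    {pyro_total:,}")
print(f"All reads:    {total:,}")
```

The sum agrees with the StrainSubtotal row. The archived per-technology totals differ from the numbers quoted for the Baylor site in the Comment column, so the two sources should not be assumed to describe the same read sets.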

Assembly

Location:

 /fs/szasmg/Bacteria/F_tularensis_holarctica_OSU18/

Steps:

 1. The complete genome sequence was downloaded from NCBI: NC_008369.1
 2. Reads were downloaded from TA and formatted using tarchive2ca 
 3. Only the 2 Sanger libraries for this project were considered
      BFTBP: #reads=58051 , insert_mean=2000, insert_stdev=666 
      BFTDP: #reads=10409 , insert_mean=2000, insert_stdev=666 
 4. The reads were retrimmed using veraTrim (-T 10 -M 100 -E 500)
 5. runCA-OBT.pl was used to assemble all the reads 
    location: 2007_0724_WGA-default/ 
    => 160 scaffolds, 163 contigs, 23X coverage
 6. The library sizes were updated using the WGA estimates
      BFTBP: insert_mean=2690.042, insert_stdev=643.126
      BFTDP: insert_mean=3675.914, insert_stdev=1225
 7. The WGA was aligned to the reference using nucmer; one rearrangement, one
    deletion and several SNPs were noticed
 8. The reads were assembled using AMOScmp (default parameters) 
    location: 2007_0724_AMOSCMP-default/
    => 1 scaffold, 22 contigs
    2 regions with pile-ups of misoriented reads were noticed
 9. The assembly was aligned to itself; 950 bp inverted repeats were identified as 
    flanking the problem regions; the coordinates are:
     16336-21562   (5  KB)
     167086-184936 (17 KB)
 10. The 2 regions were flipped; the new reference is called NC_008369.2
 11. The clear ranges of reads in several small contigs (from step 8) were 
     extended to their OBT trimming points
 12. AMOScmp was rerun using more relaxed parameters: 
       nucmer      MINCLUSTER=30 
       casm-layout MAXTRIM=50
    location: 2007_0731_AMOSCMP-veraTrim-updateDst-relaxed-updateClr-fixRef2->best
    => 1 scaffold, 8 contigs 
    Francisella_tularensis.qc
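
The flipping in step 10 can be sketched in Python as follows. This is a minimal illustration, not the script that was actually used. It assumes that "flipped" means reverse-complementing each region in place and that the coordinates in step 9 are 1-based and inclusive; the function names are hypothetical.

```python
# Complement table for unambiguous bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_regions(seq: str, regions: list[tuple[int, int]]) -> str:
    """Reverse-complement each region of seq in place.

    regions holds (start, end) pairs, assumed 1-based and inclusive,
    and must not overlap. The sequence length is unchanged.
    """
    pieces = []
    prev_end = 0  # 0-based index just past the last flipped region
    for start, end in sorted(regions):
        if start < 1 or end > len(seq) or start > end or start - 1 < prev_end:
            raise ValueError(f"bad or overlapping region: {start}-{end}")
        pieces.append(seq[prev_end:start - 1])                    # untouched
        pieces.append(reverse_complement(seq[start - 1:end]))     # flipped
        prev_end = end
    pieces.append(seq[prev_end:])                                 # tail
    return "".join(pieces)


# Toy example: flip positions 5-7 of a 10 bp sequence.
toy = "AAAACCCGGT"
flipped = flip_regions(toy, [(5, 7)])
print(flipped)

# Coordinates from step 9, to be applied to the reference sequence
# once it has been loaded as a single string.
OSU18_REGIONS = [(16336, 21562), (167086, 184936)]
```

Because the problem regions are flanked by inverted repeats, reverse-complementing them in place keeps the reference length and all coordinates outside the regions unchanged.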
 

Final assembly location:

 /fs/szasmg/Bacteria/F_tularensis_holarctica_OSU18/best