Francisella tularensis holarctica OSU18
Center: Baylor Status: Complete
Chromosome Rearrangement and Diversification of Francisella tularensis Revealed by the Type B (OSU18) Genome Sequence; J Bacteriol 2006
Genome sequencing and assembly. Sequencing and assembly of the F. tularensis subsp. holarctica strain OSU18 genome were accomplished by the whole-genome shotgun (WGS) method, similar to a previously described method (22). Briefly, the WGS clones were sequenced using ABI 3730 sequencers, and the sequence bases were called using the Applied Biosystems sequencing analysis software KB Basecaller. The WGS reads were assembled by using Atlas (11) and Phrap (7). The initial WGS assembly resulted in 132 contigs in 33 scaffolds with approximately 26× sequence coverage. Gaps between contigs and scaffolds were closed by sequencing PCR products that spanned gaps or by sequencing small insert libraries generated from the PCR products. Low-quality regions were resequenced using clones or PCR products spanning the regions to ensure that the Phrap quality score for each base was equal to or greater than 30. This relatively deep data set should enable further studies involving new sequencing, comparative genomics, and proteomics strategies and technologies. Included among these strategies and technologies are (i) using sequencing reads to scan for possible phase variation in Francisella cultures, (ii) using WGS clones as gene expression constructs for peptide array-based antigen screens, and (iii) using the deep coverage of sequencing reads as a representative data set for comparing existing sequencing methodologies to new technologies in various stages of development.
NCBI TA Libraries:
CENTER  PROJECT      STRAIN  SEQ_LIB_ID         TYPE  SIZE   STDEV  COUNT    Location  Comment
BCM     BFTB         OSU-18  BFTBP              WGS   2,000  1,000  58,053   TA        acc=3379596; INSERT_SIZE,STD=2690,643 (WGA)
BCM     BFTD         OSU-18  BFTDP              WGS   2,000  1,000  10,409   TA        acc=3379603; INSERT_SIZE,STD=3675,1407 (WGA)
BCM     4WG_FTOS     OSU18   4WGS_FTOSA         454   .      .      310,747  TA
BCM     4WG_FTUL.OS  OSU18   4WG_FTUL.OS_000pA  454   .      .      216,495  SRA ?
StrainSubtotal                                                      595,704
Baylor site: 65,131 Sanger & 317,789 454 Reads
1. The complete genome sequence was downloaded from NCBI: NC_008369.1
2. Reads were downloaded from TA and formatted using tarchive2ca
3. Only the 2 Sanger libraries for this project were considered
   BFTBP: #reads=58051, insert_mean=2000, insert_stdev=666
   BFTDP: #reads=10409, insert_mean=2000, insert_stdev=666
4. The reads were retrimmed using veraTrim (-T 10 -M 100 -E 500)
5. runCA-OBT.pl was used to assemble all the reads
   location: 2007_0724_WGA-default/
   => 160 scaffolds, 163 contigs, 23X coverage
6. The library sizes were updated using the WGA estimates
   BFTBP: insert_mean=2690.042, insert_stdev=643.126
   BFTDP: insert_mean=3675.914, insert_stdev=1225
7. The WGA was aligned to the reference using nucmer; one rearrangement, one deletion and several SNPs were noticed
8. The reads were assembled using AMOScmp (default parameters)
   location: 2007_0724_AMOSCMP-default/
   => 1 scaffold, 22 contigs
   2 misoriented read pile regions were noticed
9. The assembly was aligned to itself; 950 bp inverted repeats were identified flanking the problem regions; the coordinates are:
   16336-21562 (5 KB)
   167086-184936 (17 KB)
10. The 2 regions were flipped; the new reference is called NC_008369.2
11. The read clear ranges in several small contigs (from step 8) were extended to their OBT trimming points
12. AMOScmp was rerun using more relaxed parameters: nucmer MINCLUSTER=30, casm-layout MAXTRIM=50
    location: 2007_0731_AMOSCMP-veraTrim-updateDst-relaxed-updateClr-fixRef2->best
    => 1 scaffold, 8 contigs
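The region flip in steps 9 and 10 can be sketched in Python as below. This is only an illustration, not the script that was actually used: it assumes the listed coordinates are 1-based and inclusive, and that "flipped" means each region was reverse-complemented in place. The names reverse_complement, flip_regions and REGIONS are hypothetical.

```python
# Minimal sketch of flipping (reverse-complementing) regions of a reference.
# Assumptions: coordinates are 1-based inclusive; regions do not overlap.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Coordinates from step 9 (assumed 1-based, inclusive).
REGIONS = [(16336, 21562), (167086, 184936)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def region_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def flip_regions(seq: str, regions) -> str:
    """Reverse-complement each region in place; sequence length is unchanged."""
    out = seq
    for start, end in regions:
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"region {start}-{end} is outside the sequence")
        out = out[:start - 1] + reverse_complement(out[start - 1:end]) + out[end:]
    return out


if __name__ == "__main__":
    # Region sizes implied by the step 9 coordinates.
    for start, end in REGIONS:
        print(f"{start}-{end}: {region_length(start, end)} bp")
```

With these assumptions the two regions are 5,227 bp and 17,851 bp long, consistent with the 5 KB and 17 KB noted in step 9. Applying flip_regions to the NC_008369.1 sequence (read from FASTA into a single string) with REGIONS would give a candidate for the corrected reference; flipping the same regions a second time restores the input, which is a quick sanity check.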
Final assembly location: