Salmonella: Difference between revisions
Revision as of 16:30, 20 November 2007
Data
From Washington Univ in St. Louis
Strains:
Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150: B_SPA
Salmonella typhimurium LT2: B_STM
Other data:
NCBI:
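Genome Projects: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Salmonella%20enterica%20subsp.%20enterica%20serovar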
1 Salmonella enterica subsp. enterica serovar 4,[5],12:i:- str. CVM23701 [TIGR]
2 Salmonella enterica subsp. enterica serovar Agona str. SL483 [J. Craig Venter Institute]
3 Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 [Chang Gung Memorial Hospital]
4 Salmonella enterica subsp. enterica serovar Dublin [University of Illinois at Urbana-Champaign]
5 Salmonella enterica subsp. enterica serovar Dublin str. CT_02021853 [TIGR]
6 Salmonella enterica subsp. enterica serovar Enteritidis str. LK5 [University of Illinois at Urbana-Champaign]
7 Salmonella enterica subsp. enterica serovar Heidelberg str. SL476 [J. Craig Venter Institute]
8 Salmonella enterica subsp. enterica serovar Heidelberg str. SL486 [TIGR/JCVI/J. Craig Venter Institute]
9 Salmonella enterica subsp. enterica serovar Javiana str. GA_MM04042433 [J. Craig Venter Institute]
10 Salmonella enterica subsp. enterica serovar Kentucky str. CDC 191 [J. Craig Venter Institute]
11 Salmonella enterica subsp. enterica serovar Kentucky str. CVM29188 [TIGR]
12 Salmonella enterica subsp. enterica serovar Newport str. SL254 [TIGR/J. Craig Venter Institute]
13 Salmonella enterica subsp. enterica serovar Newport str. SL317 [J. Craig Venter Institute]
14 Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150 [Washington University (WashU)]
15 Salmonella enterica subsp. enterica serovar Paratyphi C strain RKS4594 [Peking University Health Science Center]
16 Salmonella enterica subsp. enterica serovar Pullorum [University of Illinois at Urbana-Champaign]
17 Salmonella enterica subsp. enterica serovar Saintpaul str. SARA23 [TIGR]
18 Salmonella enterica subsp. enterica serovar Saintpaul str. SARA29 [TIGR]
19 Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 [TIGR]
20 Salmonella enterica subsp. enterica serovar Schwarzengrund str. SL480 [J. Craig Venter Institute]
21 Salmonella enterica subsp. enterica serovar Typhi Ty2 [University of Wisconsin-Madison, USA]
22 Salmonella enterica subsp. enterica serovar Typhi str. CT18 [Sanger Institute]
23 Salmonella typhimurium DT104 [Sanger Institute]
24 Salmonella typhimurium LT2 [Washington University (WashU)]
25 Salmonella typhimurium SL1344 [Sanger Institute]
26 Salmonella typhimurium TR7095 [Washington University (WashU)]
Goals:
1. Validate the assemblies
2. Submit traces to NCBI TA
   Problems:
   * some traces were edited (phd.2, phd.3, ...); should these edits appear in the SCF files?
3. Convert assemblies to XML format and submit them to NCBI AA
File locations:
/fs/ftp-cbcb/pub/data/dsommer/
/fs/sztmpscratch/dsommer/backup_sal
/fs/szasmg/Bacteria/Salmonella/
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/
SPA
NCBI
Genome Taxonomy (TaxID: 295319)
Traces:
All directories: 103971 (unique)
B_SPA: 102405 (unique) => 1566 missing
~ 10X coverage
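The count of missing traces above is the difference 103971 - 102405 = 1566. A minimal sketch of how the missing names themselves could be listed, assuming two hypothetical text files with one read name per line (the file names and the `missing_reads` helper are illustrative, not part of the original workflow):

```shell
# missing_reads ALL SUBSET
# Print the read names listed in ALL that do not occur in SUBSET.
# Both inputs are assumed to hold one read name per line (hypothetical files).
missing_reads() {
    all_sorted=$(mktemp) || return 1
    sub_sorted=$(mktemp) || return 1
    sort -u "$1" > "$all_sorted"
    sort -u "$2" > "$sub_sorted"
    # comm -23 keeps only the lines unique to the first sorted file
    comm -23 "$all_sorted" "$sub_sorted"
    rm -f "$all_sorted" "$sub_sorted"
}

# The arithmetic behind the figure quoted above
echo $((103971 - 102405))   # prints 1566
```

Piping the result through `wc -l` should reproduce the missing-trace count if the two lists correspond to the two totals above.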
The *.b1 and *.g1 reads seem to be mated!
Mate pairs:
p(.*).[bg]1
oyg(.*).[bg]1
P_AA(.*).[bg]1
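As an illustration of the three naming patterns above, the following sketch sorts a read name into a library by its prefix. The `classify_read` helper and the example read names are hypothetical; the library labels (small, medium, large) are the ones used in the mates file further down this page.

```shell
# classify_read NAME
# Map a read name to a library using the prefix patterns listed above.
# Shell glob patterns are used here instead of regular expressions.
classify_read() {
    case "$1" in
        p*.[bg]1)    echo small ;;
        oyg*.[bg]1)  echo medium ;;
        P_AA*.[bg]1) echo large ;;
        *)           echo unknown ;;
    esac
}

# Example with made-up read names
for r in p0001.b1 oyg0001.g1 P_AA0001.b1; do
    printf '%s\t%s\n' "$r" "$(classify_read "$r")"
done
```

The match is case-sensitive, so the lowercase `p` prefix and the uppercase `P_AA` prefix do not collide.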
WUSTL assemblies:
1. ace.83: (best assembly of reads)
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/B_SPA.fasta.screen.ace.83
$ grep ^CO *ace.83 | grep -v COMM | wc -l
571 # total number of contigs
Longest contig:
$ cat B_SPA.fasta.screen.ace.83
AS 571 89509 # 571 contigs, 89509 reads
...
CO Contig1368 4813926 88824 1869182 C
Contig1368 is 4,813,926 bp (GDE format), 4,579,713 bp (FASTA format)
Ends don't overlap
There are misoriented reads at the ends (=> circular)
Contains 88824 reads
Other Salmonella strains are ~ 4.8M
Problem:
* Collapsed repeat: high coverage and misoriented mates in the 2076881-2079555 region
* Expanded into a 3-copy tandem repeat in the finished assembly
* 3 copies also in CA
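The contig count and the longest contig can also be pulled from the CO lines of the ace file in one pass. This is a sketch that assumes the usual ace layout `CO <name> <padded bases> <reads> <base segments> <U|C>`; the helper names are hypothetical.

```shell
# count_contigs ACE: number of contig (CO) records in an ace file.
count_contigs() {
    awk '$1 == "CO" { n++ } END { print n + 0 }' "$1"
}

# longest_contig ACE: print "name padded_length reads" for the contig
# with the largest padded length. Matching on the exact first field "CO"
# skips COMMENT lines without a second grep.
longest_contig() {
    awk '$1 == "CO" && $3 + 0 > max + 0 { max = $3; name = $2; reads = $4 }
         END { if (name != "") print name, max, reads }' "$1"
}
```

Run against B_SPA.fasta.screen.ace.83, these would be expected to report 571 contigs and Contig1368, matching the figures above. The length on the CO line counts pad characters, which is presumably why it is larger than the FASTA length.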
2. Finished assembly: (assembly of contigs)
File: finished.fasta.screen.ace.0
1 contig
4,585,228 bp (FASTA format): 5,515 bp longer than the longest ace.83 contig (Contig1368); ends don't overlap
11 long reads (contig reads)
Estimate lib insert sizes:
$ toAmos -ace B_SPA.fasta.screen.ace.83
$ grep -c ^rds B_SPA.afg # check if links were created
$ more toAmos.error # check if there were any conversion errors
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
$ bank2contig B_SPA.bnk > B_SPA.contig
$ cat B_SPA.contig | grep ^# | grep -v ^## | sort # look at distances between mated reads
Create mate pair file (Bambus format, tab delimited)
$ cat B_SPA.mates
library small 2000 4000 (p).*
pair (p.*)\.b1$ (p.*)\.g1$
library medium 4500 5500 (oyg).*
pair (oyg.*)\.b1$ (oyg.*)\.g1$
library large 35000 45000 (P_AA).*
pair (P_AA.*)\.b1$ (P_AA.*)\.g1$
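When looking at the distances between mated reads, it can help to check each observed distance against the library ranges in the mates file above. The sketch below hard-codes those three ranges; the `insert_ok` helper is hypothetical and the distances in the example are made up.

```shell
# insert_ok LIBRARY DISTANCE
# Succeed (exit 0) if DISTANCE lies within the min/max insert size
# given for LIBRARY in B_SPA.mates; exit 2 for an unknown library.
insert_ok() {
    lib=$1
    dist=$2
    case "$lib" in
        small)  min=2000;  max=4000 ;;
        medium) min=4500;  max=5500 ;;
        large)  min=35000; max=45000 ;;
        *)      return 2 ;;
    esac
    [ "$dist" -ge "$min" ] && [ "$dist" -le "$max" ]
}

# Example with made-up distances
if insert_ok small 3100; then echo "small 3100: within range"; fi
if ! insert_ok medium 9000; then echo "medium 9000: outside range"; fi
```

A run of out-of-range distances in one region would be consistent with the collapsed repeat noted for ace.83.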
Rerun conversion utilities:
$ toAmos -m B_SPA.mates -ace B_SPA.fasta.screen.ace.83 -o B_SPA.afg
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
CBCB assemblies:
1. CA default params
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/83/CA-qual
87 scaff, 194 contigs, 19K singletons, 4,425,716 bp
2. CA genomeSize=3M
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/83/CA-qual-3M
75 scaff, 183 contigs, 19K singletons, 4,515,434 bp
No rearrangements compared to finished genome
Significant number of SNPs
3. AMOSCmp Ref=finished assembly; 89,509 reads; .ace.83 trimming => 31 contigs; 4,579,852 bp
4. AMOSCmp Ref=finished assembly; 101,621 reads (.fasta.screen); nucmer trimming => 8 contigs; 4,583,946 bp
5. merge of 9 contigs using slice tools
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/final2/9r12345678-circ-rev-tr-recall.*
Steps:
* Recruit unassembled reads to span the Contig8.4 - Contig6.6 gap and assemble them into a new contig.
* The 9 overlapping contigs (8 provided by Damon + 1 I assembled) were merged into one piece using the slice tools (zipclap program).
* The new contig was circularized, reversed and rotated to align to the published one.
* I also recalled the consensus because the process introduced some ambiguity codes.
* The new contig sequence is 70 bp shorter (4,585,158 bp vs 4,585,228 bp), but it aligns in one piece to the published contig.
6. merge of 9 contigs using slice tools (best)
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/final3/9r12345678-circ-rev-tr.*
Steps:
* Same as 5, but a modified version of "modContig --circularize" was called.
* The circularization step did not recall the consensus.
* Recall was not used in the end.
* The new contig sequence is 5 bp shorter (4,585,223 bp vs 4,585,228 bp), but it aligns in one piece to the published contig.
* show-snps 1con-9r12345678-circ-rev-tr.delta | grep -c 9r12345678$ => 46 SNPs
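The length comparisons in steps 5 and 6 come down to counting bases in each FASTA file and subtracting. A minimal sketch, assuming single-record FASTA files (the `fasta_length` helper is hypothetical and sums over every record if there are several):

```shell
# fasta_length FILE
# Total number of sequence characters in a FASTA file, ignoring header
# lines and any whitespace or carriage returns inside sequence lines.
fasta_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Differences quoted in steps 5 and 6
echo $((4585228 - 4585158))   # prints 70
echo $((4585228 - 4585223))   # prints 5
```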