Salmonella
Data
From Washington Univ in St. Louis
Strains:
Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150: B_SPA
Salmonella typhimurium LT2: B_STM
Other data:
NCBI:
Genome Projects:
1. Salmonella enterica subsp. enterica serovar 4,[5],12:i:- str. CVM23701 [TIGR]
2. Salmonella enterica subsp. enterica serovar Agona str. SL483 [J. Craig Venter Institute]
3. Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 [Chang Gung Memorial Hospital]
4. Salmonella enterica subsp. enterica serovar Dublin [University of Illinois at Urbana-Champaign]
5. Salmonella enterica subsp. enterica serovar Dublin str. CT_02021853 [TIGR]
6. Salmonella enterica subsp. enterica serovar Enteritidis str. LK5 [University of Illinois at Urbana-Champaign]
7. Salmonella enterica subsp. enterica serovar Heidelberg str. SL476 [J. Craig Venter Institute]
8. Salmonella enterica subsp. enterica serovar Heidelberg str. SL486 [TIGR/JCVI/J. Craig Venter Institute]
9. Salmonella enterica subsp. enterica serovar Javiana str. GA_MM04042433 [J. Craig Venter Institute]
10. Salmonella enterica subsp. enterica serovar Kentucky str. CDC 191 [J. Craig Venter Institute]
11. Salmonella enterica subsp. enterica serovar Kentucky str. CVM29188 [TIGR]
12. Salmonella enterica subsp. enterica serovar Newport str. SL254 [TIGR/J. Craig Venter Institute]
13. Salmonella enterica subsp. enterica serovar Newport str. SL317 [J. Craig Venter Institute] (in TA but not AA; 63 contigs; should be submitted to AA!!)
14. Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150 [Washington University (WashU)]
15. Salmonella enterica subsp. enterica serovar Paratyphi C strain RKS4594 [Peking University Health Science Center]
16. Salmonella enterica subsp. enterica serovar Pullorum [University of Illinois at Urbana-Champaign]
17. Salmonella enterica subsp. enterica serovar Saintpaul str. SARA23 [TIGR]
18. Salmonella enterica subsp. enterica serovar Saintpaul str. SARA29 [TIGR]
19. Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 [TIGR]
20. Salmonella enterica subsp. enterica serovar Schwarzengrund str. SL480 [J. Craig Venter Institute]
21. Salmonella enterica subsp. enterica serovar Typhi Ty2 [University of Wisconsin-Madison, USA]
22. Salmonella enterica subsp. enterica serovar Typhi str. CT18 [Sanger Institute]
23. Salmonella typhimurium DT104 [Sanger Institute]
24. Salmonella typhimurium LT2 [Washington University (WashU)]
25. Salmonella typhimurium SL1344 [Sanger Institute]
26. Salmonella typhimurium TR7095 [Washington University (WashU)]
TA:
1. salmonella_enterica_subsp__enterica_serovar_4__5__12_i___str__cvm23701
2. salmonella_enterica_subsp__enterica_serovar_agona_str__sl483
3. salmonella_enterica_subsp__enterica_serovar_dublin_str__ct_02021853
4. salmonella_enterica_subsp__enterica_serovar_hadar_str__ri_05p066 : not in Genome Projects/AA (JCVI MSC)
5. salmonella_enterica_subsp__enterica_serovar_heidelberg_str__sl476
6. salmonella_enterica_subsp__enterica_serovar_heidelberg_str__sl486
7. salmonella_enterica_subsp__enterica_serovar_javiana_str__ga_mm04042433
8. salmonella_enterica_subsp__enterica_serovar_kentucky_str__cdc_191
9. salmonella_enterica_subsp__enterica_serovar_kentucky_str__cvm29188
10. salmonella_enterica_subsp__enterica_serovar_newport_str__sl254
11. salmonella_enterica_subsp__enterica_serovar_newport_str__sl317
12. salmonella_enterica_subsp__enterica_serovar_saintpaul_str__sara23
13. salmonella_enterica_subsp__enterica_serovar_saintpaul_str__sara29
14. salmonella_enterica_subsp__enterica_serovar_schwarzengrund_str__cvm19633
15. salmonella_enterica_subsp__enterica_serovar_schwarzengrund_str__sl480
16. salmonella_enterica_subsp__enterica_serovar_virchow_str__sl491 : not in Genome Projects/AA (JCVI MSC)
17. salmonella_enterica_subsp__enterica_serovar_weltevreden_str__hi_n05_537 : not in Genome Projects/AA (JCVI MSC)
AA:
1. Salmonella enterica subsp. enterica serovar 4,[5],12:i:- str. CVM23701 TIGR 2740 440534 4,895,918 113 53,284 8.0X
2. Salmonella enterica subsp. enterica serovar Agona str. SL483 JCVI 2924 454166 4,835,750 56 51,307 9.5X
3. Salmonella enterica subsp. enterica serovar Dublin str. CT_02021853 TIGR 2741 439851 4,885,976 142 50,129 7.8X
4. Salmonella enterica subsp. enterica serovar Hadar str. RI_05P066 JCVI 2995 465516 4,793,325 50 50,470 9.6X
5. Salmonella enterica subsp. enterica serovar Heidelberg str. SL476 JCVI 2927 454169 5,083,392 49 54,058 9.2X
6. Salmonella enterica subsp. enterica serovar Heidelberg str. SL486 JCVI 2925 454164 4,728,232 48 53,785 10.1X
7. Salmonella enterica subsp. enterica serovar Javiana str. GA_MM04042433 JCVI 2921 454167 4,553,049 74 52,375 9.9X
8. Salmonella enterica subsp. enterica serovar Kentucky str. CDC 191 JCVI 2922 454231 4,696,566 53 51,826 9.6X
9. Salmonella enterica subsp. enterica serovar Kentucky str. CVM29188 TIGR 2737 439842 5,000,919 75 55,311 9.1X
10. Salmonella enterica subsp. enterica serovar Newport str. SL254 JCVI 2926 423368 4,831,246 2 50,473 8.8X
11. Salmonella enterica subsp. enterica serovar Saintpaul str. SARA23 TIGR 2735 439846 4,785,870 143 50,936 8.6X
12. Salmonella enterica subsp. enterica serovar Saintpaul str. SARA29 TIGR 2739 439847 4,928,961 182 50,405 7.9X
13. Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 TIGR 2738 439843 4,734,042 160 49,533 7.4X
14. Salmonella enterica subsp. enterica serovar Schwarzengrund str. SL480 JCVI 2923 454165 4,761,576 67 50,418 9.1X
15. Salmonella enterica subsp. enterica serovar Virchow str. SL491 JCVI 2996 465517 4,858,188 73 54,841 10.3X
16. Salmonella enterica subsp. enterica serovar Weltevreden str. HI_N05-537 JCVI 2994 465518 5,047,463 81 54,390 9.8X
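The TA and AA lists above can be cross-checked mechanically. The sketch below assumes that a TA directory name is the organism name lowercased with every non-alphanumeric character replaced by "_"; this rule is inferred from the names listed here and is not a documented NCBI convention. The function names and the three-strain sample are illustrative only.

```python
import re

def ta_style_name(organism_name):
    """Assumed rule: lowercase, then replace each non-alphanumeric character with '_'."""
    return re.sub(r"[^a-z0-9]", "_", organism_name.lower())

def missing_from_aa(ta_names, aa_organisms):
    """Return TA entries that have no counterpart in the AA organism list."""
    aa_as_ta = {ta_style_name(name) for name in aa_organisms}
    return sorted(set(ta_names) - aa_as_ta)

# Small sample taken from the lists above (not the full tables).
PREFIX = "Salmonella enterica subsp. enterica serovar "
aa_sample = [PREFIX + "Newport str. SL254", PREFIX + "Hadar str. RI_05P066"]
ta_sample = [
    "salmonella_enterica_subsp__enterica_serovar_hadar_str__ri_05p066",
    "salmonella_enterica_subsp__enterica_serovar_newport_str__sl254",
    "salmonella_enterica_subsp__enterica_serovar_newport_str__sl317",
]

print(missing_from_aa(ta_sample, aa_sample))
```

On this sample the only TA entry without an AA counterpart is Newport SL317, which matches the note in the Genome Projects list.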
TIGR/JCVI
MSC
Goals:
1. Validate the assemblies
2. Submit traces to NCBI TA
Problems:
* some traces were edited (phd.2, phd.3, ...); should these edits appear in the SCF files?
3. Convert assemblies to XML format and submit them to NCBI AA
File locations:
/fs/ftp-cbcb/pub/data/dsommer/
/fs/sztmpscratch/dsommer/backup_sal
/fs/szasmg/Bacteria/Salmonella/
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/
SPA
NCBI
Genome Taxonomy (TaxID: 295319)
Traces:
All directories: 103971 (unique)
B_SPA: 102405 (unique) => 1566 missing
~10X coverage
The *.b1 and *.g1 reads seem to be mated!
Mate pairs:
p(.*)\.[bg]1
oyg(.*)\.[bg]1
P_AA(.*)\.[bg]1
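As a quick way to test the mating hypothesis, reads can be grouped by clone name using the three prefixes above. This is a minimal sketch: the anchored regular expression, the escaped dot before the suffix, and the sample read names are assumptions for illustration, not names taken from the trace directories.

```python
import re
from collections import defaultdict

# Prefixes (p, oyg, P_AA) come from the notes; the suffix is .b1 or .g1.
MATE_RE = re.compile(r"^(?P<clone>(?:p|oyg|P_AA).*)\.(?P<end>[bg])1$")

def pair_reads(read_names):
    """Return sorted (b1_read, g1_read) tuples for clones that have both ends."""
    ends = defaultdict(dict)
    for name in read_names:
        match = MATE_RE.match(name)
        if match:
            ends[match.group("clone")][match.group("end")] = name
    return sorted(
        (clone_ends["b"], clone_ends["g"])
        for clone_ends in ends.values()
        if "b" in clone_ends and "g" in clone_ends
    )

# Hypothetical read names, for illustration only.
sample = ["pab01.b1", "pab01.g1", "oygx.b1", "P_AA12.g1", "P_AA12.b1", "zzz.b1"]
for forward, reverse in pair_reads(sample):
    print(forward, reverse)
```

Reads whose partner is absent (here `oygx.b1`) or whose name matches none of the prefixes (`zzz.b1`) are left unpaired.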
WUSTL assemblies:
1. ace.83: (best assembly of reads)
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/B_SPA.fasta.screen.ace.83
$ grep ^CO *ace.83 | grep -v COMM | wc -l
571 # total number of contigs
Longest contig:
$ cat B_SPA.fasta.screen.ace.83
AS 571 89509 # 571 contigs, 89509 reads
...
CO Contig1368 4813926 88824 1869182 C
* Contig1368 is 4,813,926 bp (GDE format), 4,579,713 bp (FASTA format)
* Ends don't overlap
* There are misoriented reads at the ends (=> circular)
* Contains 88824 reads
* Other Salmonella strains are ~4.8M
Problem:
* Collapsed repeat: high coverage, misoriented mates in the 2076881-2079555 region
* Expanded into a 3-copy tandem repeat in the finished assembly
* 3 copies also in CA
2. Finished assembly: (assembly of contigs)
File: finished.fasta.screen.ace.0
* 1 contig
* 4,585,228 bp (FASTA format): 5,515 bp longer than ace.83 contig 571; ends don't overlap
* 11 long reads (contig reads)
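The contig count and longest-contig lookup done with grep on the ace.83 file can also be sketched in one pass over the ACE header lines. This assumes the standard ACE layout, `AS <contigs> <reads>` and `CO <name> <padded bases> <reads> <base segments> <U|C>`; the function name is hypothetical and the sample lines are the ones quoted in the notes.

```python
def ace_contig_summary(lines):
    """Collect the AS header and all CO records from ACE-format lines."""
    header = None
    contigs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "AS":
            header = (int(fields[1]), int(fields[2]))  # (contigs, reads)
        elif len(fields) == 6 and fields[0] == "CO":
            # (name, padded length, number of reads); exact field match skips COMMENT lines
            contigs.append((fields[1], int(fields[2]), int(fields[3])))
    return header, contigs

sample_lines = [
    "AS 571 89509",
    "CO Contig1368 4813926 88824 1869182 C",
    "COMMENT{",
]
header, contigs = ace_contig_summary(sample_lines)
longest = max(contigs, key=lambda record: record[1])
print(header, longest)

# Length difference between the finished contig and the ace.83 contig (FASTA lengths).
print(4585228 - 4579713)
```

For a real file, the lines can be streamed with `open(path)` instead of the in-memory sample.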
Estimate lib insert sizes:
$ toAmos -ace B_SPA.fasta.screen.ace.83
$ grep -c ^rds B_SPA.afg # check if links were created
$ more toAmos.error # check if there were any conversion errors
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
$ bank2contig B_SPA.bnk > B_SPA.contig
$ cat B_SPA.contig | grep ^# | grep -v ^## | sort # look at distances between mated reads
Create mate pair file (Bambus format, tab delimited)
$ cat B_SPA.mates
library small 2000 4000 (p).*
pair (p.*)\.b1$ (p.*)\.g1$
library medium 4500 5500 (oyg).*
pair (oyg.*)\.b1$ (oyg.*)\.g1$
library large 35000 45000 (P_AA).*
pair (P_AA.*)\.b1$ (P_AA.*)\.g1$
Rerun conversion utilities:
$ toAmos -m B_SPA.mates -ace B_SPA.fasta.screen.ace.83 -o B_SPA.afg
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
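Before rerunning the conversion, the library definitions can be sanity-checked against a few observed mate distances. The sketch below parses only the `library` lines of the mates file and tests whether a distance falls inside the declared range. Splitting on whitespace rather than strictly on tabs, the function names, and the sample read names and distances are all assumptions for illustration.

```python
import re

# Library lines copied from the B_SPA.mates listing in these notes.
MATES_TEXT = r"""
library small 2000 4000 (p).*
library medium 4500 5500 (oyg).*
library large 35000 45000 (P_AA).*
"""

def parse_libraries(text):
    """Return a list of (name, min_size, max_size, compiled_regex)."""
    libraries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "library":
            name, low, high = fields[1], int(fields[2]), int(fields[3])
            if low > high:
                raise ValueError(f"library {name}: min {low} exceeds max {high}")
            libraries.append((name, low, high, re.compile(fields[4])))
    return libraries

def check_distance(read_name, distance, libraries):
    """Return (library_name, in_range) for the first library matching the read."""
    for name, low, high, pattern in libraries:
        if pattern.match(read_name):
            return name, low <= distance <= high
    return None, False

libraries = parse_libraries(MATES_TEXT)
print(check_distance("oygab12.b1", 5000, libraries))
print(check_distance("P_AA01.g1", 20000, libraries))
```

A read from the large library whose mate sits 20,000 bp away is reported as out of range, which is the kind of signal expected around a collapsed repeat.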
CBCB assemblies:
1. CA default params
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/83/CA-qual
87 scaff, 194 contigs, 19K singletons, 4,425,716 bp
2. CA genomeSize=3M
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/83/CA-qual-3M
75 scaff, 183 contigs, 19K singletons, 4,515,434 bp
* No rearrangements compared to finished genome
* Significant number of SNPs
3. AMOSCmp Ref=finished assembly; 89,509 reads; .ace.83 trimming => 31 contigs; 4,579,852 bp
4. AMOSCmp Ref=finished assembly; 101,621 reads (.fasta.screen); nucmer trimming => 8 contigs; 4,583,946 bp
5. merge of 9 contigs using slice tools
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/final2/9r12345678-circ-rev-tr-recall.*
Steps:
* Recruit unassembled reads to span the Contig8.4 - Contig6.6 gap and assemble them into a new contig.
* The 9 overlapping contigs (8 provided by Damon + 1 I assembled) were merged into one piece using the slice tools (zipclap program).
* The new contig was circularized, reversed and rotated to align to the published one.
* I also recalled the consensus due to some ambiguity codes introduced in the process.
* The new contig sequence is 70 bp shorter (4,585,158 bp vs 4,585,228 bp), but it aligns in one piece to the published contig.
6. merge of 9 contigs using slice tools (best)
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/final3/9r12345678-circ-rev-tr.*
Steps:
* Same as 5, but a modified version of "modContig --circularize" was called.
* The circularization step did not recall the consensus.
* Recall was not used in the end.
* The new contig sequence is 5 bp shorter (4,585,223 bp vs 4,585,228 bp), but it aligns in one piece to the published contig.
* show-snps 1con-9r12345678-circ-rev-tr.delta | grep -c 9r12345678$ => 46 SNPs
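The "reversed and rotated" step can be illustrated on a toy sequence. This sketch is not the modContig implementation: it treats the contig as a circular string, tries both strands, and rotates so that the sequence starts at a chosen anchor taken from the start of the published contig. The function names, the anchor approach and the 10 bp example are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def rotate(seq, start):
    """Rotate a circular sequence so that position `start` becomes position 0."""
    return seq[start:] + seq[:start]

def rotate_to_anchor(contig, anchor):
    """Try both strands; return the contig rotated to begin with `anchor`, or None."""
    for candidate in (contig, reverse_complement(contig)):
        # Append the first len(anchor)-1 bases so an anchor spanning the junction is found.
        position = (candidate + candidate[: len(anchor) - 1]).find(anchor)
        if position != -1:
            return rotate(candidate, position)
    return None

published = "GATTACAGGC"       # toy stand-in for the published contig
merged = "TGTAATCGCC"          # same circle, opposite strand, different start
print(rotate_to_anchor(merged, published[:5]))
```

Rotation and reverse complementing never change the sequence length, so the 70 bp and 5 bp differences reported above come from the merge and consensus steps, not from reorienting the contig.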