Salmonella: Difference between revisions
Revision as of 18:56, 23 October 2007
Data
From Washington Univ in St. Louis
Strains:
Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150 : B_SPA
Salmonella typhimurium LT2 : B_STM
Goals:
1. Validate the assemblies
2. Convert assemblies to NCBI AA format and submit them
File locations:
/fs/ftp-cbcb/pub/data/dsommer/
/fs/szasmg/Bacteria/Salmonella/
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/
SPA
Traces:
All directories: 103971 (unique)
B_SPA : 102405 (unique) => 1566 missing
The *.b1 and *.g1 reads seem to be mated!
Mate pairs:
p(.*).[bg]1
oyg(.*).[bg]1
P_AA(.*).[bg]1
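A quick way to check the "seem to be mated" observation is to list the templates for which both a .b1 and a .g1 read exist. The following is only a sketch: the function name `mated_templates` is made up for illustration, and it assumes the read names are available one per line (for example from a listing of the trace directories).

```shell
# Print each template name (read name minus the .b1/.g1 suffix) that has
# both ends present. Input: one read name per line on stdin.
# mated_templates is a hypothetical helper, not part of any assembler package.
mated_templates() {
    grep -E '^(p|oyg|P_AA).*\.[bg]1$' \
        | LC_ALL=C sort -u \
        | sed -E 's/\.[bg]1$//' \
        | LC_ALL=C sort \
        | uniq -d
}
```

After the `sort -u`, a template can only appear twice if both of its ends are present, so `uniq -d` reports exactly the mated templates. Piping the result through `wc -l` gives the number of mate pairs.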
Assembly 83 : (best assembly of reads)
/fs/szasmg/Bacteria/Salmonella/S_enterica_paratyphi_A/edit_dir/B_SPA.fasta.screen.ace.83
$ grep ^CO *ace.83 | grep -v COMM | wc -l
571 # total number of contigs
Longest contig:
$ cat B_SPA.fasta.screen.ace.83
AS 571 89509 # 571 contigs, 89509 reads
...
CO Contig1368 4813926 88824 1869182 C
Contig1368 is 4,813,926 bp (GDE format), 4,579,713 bp (FASTA format)
Ends don't overlap
There are misoriented reads at the ends (=> circular)
Contains 88824 reads
Other Salmonella strains are ~4.8 Mbp
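The two lengths can be recomputed directly from the ace file. The sketch below assumes the usual ace layout (a `CO name bases reads segments U|C` header, followed by the padded consensus with `*` as the pad character, terminated by a blank line). That the larger figure counts pads and the smaller one does not is an assumption here, not something the notes state. `ace_contig_lengths` is a hypothetical helper name.

```shell
# For every contig print: name, padded length (from the CO header),
# unpadded length (consensus bases excluding '*' pads), read count.
# ace_contig_lengths is a hypothetical helper; $1 is the ace file.
ace_contig_lengths() {
    awk '
        /^CO / { name = $2; padded = $3; reads = $4; unpadded = 0; in_seq = 1; next }
        in_seq && NF == 0 { print name, padded, unpadded, reads; in_seq = 0; next }
        in_seq { seq = $0; gsub(/\*/, "", seq); unpadded += length(seq) }
        END { if (in_seq) print name, padded, unpadded, reads }
    ' "$1"
}
```

To get the longest contig: `ace_contig_lengths B_SPA.fasta.screen.ace.83 | sort -k2,2nr | head -1`. Matching on `CO ` with the trailing space also avoids the COMM lines that the grep above filters out.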
Finished assembly: (assembly of contigs)
File: finished.fasta.screen.ace.0
1 contig
4,585,228 bp (FASTA format) : 5,515 bp longer than ace.83 contig 571
11 long reads (contig reads)
Estimate lib insert sizes:
$ toAmos -ace B_SPA.fasta.screen.ace.83
$ grep -c ^rds B_SPA.afg # check if links were created
$ more toAmos.error # check if there were any conversion errors
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
$ bank2contig B_SPA.bnk > B_SPA.contig
$ cat B_SPA.contig | grep ^# | grep -v ^## | sort # look at distances between mated reads
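Instead of eyeballing the sorted list, the distance between the two ends of each template can be computed. This is a rough sketch under several assumptions: read lines in the .contig file look like `#name(offset) ...`, contig header lines start with `##`, and mates are named `<template>.b1` / `<template>.g1`. `mate_distances` is a hypothetical helper name. The value printed is the difference between the two read offsets, so it only approximates the insert size (the length of the second read is not added).

```shell
# Print "template offset_difference" for every template whose two ends
# (.b1 and .g1) are placed in the same contig.
# mate_distances is a hypothetical helper; $1 is a TIGR-style .contig file.
mate_distances() {
    awk '
        /^##/ { split("", first); next }        # new contig: forget earlier reads
        /^#/ {
            rec = substr($1, 2)                 # "name(offset)"
            open = index(rec, "(")
            if (open == 0) next
            name = substr(rec, 1, open - 1)
            offset = substr(rec, open + 1) + 0  # numeric prefix of "offset)"
            if (name !~ /\.[bg]1$/) next
            tmpl = name
            sub(/\.[bg]1$/, "", tmpl)
            if (tmpl in first) {
                d = offset - first[tmpl]
                if (d < 0) d = -d
                print tmpl, d
            } else {
                first[tmpl] = offset
            }
        }
    ' "$1"
}
```

For example, `mate_distances B_SPA.contig | grep '^oyg' | sort -k2,2n` lists the distances for one clone-name prefix, which gives a feel for the min/max values to put in the library definitions.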
Create mate pair file (Bambus format, tab delimited)
$ cat B_SPA.mates
library	small	2000	4000	(p).*
pair	(p.*)\.b1$	(p.*)\.g1$
library	medium	4500	5500	(oyg).*
pair	(oyg.*).b1$	(oyg.*).g1$
library	large	35000	45000	(P_AA).*
pair	(P_AA.*).b1$	(P_AA.*).g1$
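Since the file has to be tab delimited, it is easy to break it by pasting spaces. A minimal sanity check, assuming `library` records have 4 or 5 tab-separated fields (name, min, max, optional regex) and `pair` records have 3; the field counts are an assumption based on the records shown here, and `check_mates_file` is a hypothetical helper name.

```shell
# Report malformed records in a tab-delimited mates file.
# Returns non-zero if any problem was found.
# check_mates_file is a hypothetical helper; $1 is the mates file.
check_mates_file() {
    awk -F '\t' '
        /^#/ || NF == 0 { next }                 # skip comments and blank lines
        $1 == "library" {
            if (NF < 4 || NF > 5 || $3 + 0 > $4 + 0) {
                print "line " NR ": bad library record"; bad = 1
            }
            next
        }
        $1 == "pair" {
            if (NF != 3) { print "line " NR ": bad pair record"; bad = 1 }
            next
        }
        { print "line " NR ": unknown record type (spaces instead of tabs?)"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: `check_mates_file B_SPA.mates && echo OK`.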
Rerun conversion utilities:
$ toAmos -m B_SPA.mates -ace B_SPA.fasta.screen.ace.83 -o B_SPA.afg
$ bank-transact -b B_SPA.bnk -m B_SPA.afg -c