Salmonella

Revision as of 00:30, 22 October 2007 by Dpuiu (talk | contribs) (→‎SPA)

Data

Strains:

 Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150: B_SPA
 Salmonella typhimurium LT2                                            : B_STM

File locations:

 /fs/ftp-cbcb/pub/data/dsommer/
 /fs/szasmg/Bacteria/Salmonella/
 /tmp/B_SPA (host: sycamore)

SPA

  1. Traces:
 All directories: 103971 (unique)
 B_SPA : 102405  (unique) => 1566 missing
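
The missing-trace count is the difference between the two unique counts. The sketch below shows one way the missing names themselves could be listed with comm. The helper name missing_reads, the file names and the read names are all made up for illustration; the real trace lists are not part of this page.

```shell
# Hypothetical helper: print names found in the first list but not in the second.
missing_reads() {
    all_sorted=$(mktemp)
    asm_sorted=$(mktemp)
    sort -u "$1" > "$all_sorted"
    sort -u "$2" > "$asm_sorted"
    comm -23 "$all_sorted" "$asm_sorted"   # -23 keeps lines unique to the first file
    rm -f "$all_sorted" "$asm_sorted"
}

# Toy input with made-up read names (not taken from the real trace directories).
demo_dir=$(mktemp -d)
printf '%s\n' pAAA01.b1 pAAA01.g1 oygBBB02.b1 > "$demo_dir/all_traces.txt"
printf '%s\n' pAAA01.b1 oygBBB02.b1 > "$demo_dir/assembled.txt"
missing_reads "$demo_dir/all_traces.txt" "$demo_dir/assembled.txt"

# The counts reported above: 103971 unique traces, 102405 in B_SPA.
echo $((103971 - 102405))
```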

Best assembly:

 B_SPA/edit_dir/B_SPA.fasta.screen.ace.83

File location:

 /tmp/B_SPA/edit_dir/B_SPA.fasta.screen.ace.83

Longest contig:

 CO Contig1368 4813926 88824 1869182 C 
 !!! This contig is about 4.8 Mbp; other Salmonella genomes are also about 4.8 Mbp

The *.b1 and *.g1 reads appear to be mated.

Mate pairs:

 p(.*).[bg]1
 oyg(.*).[bg]1
 P_AA(.*).[bg]1
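
As a rough check that these naming patterns really describe mates, the read names can be reduced to their shared template name and the templates that occur with both suffixes listed. This is only a sketch: the helper name paired_templates and the sample read names are hypothetical, and the dot before the suffix is escaped here so that it matches a literal ".".

```shell
# Hypothetical helper: list templates that have both a .b1 and a .g1 read.
paired_templates() {
    # Drop the .b1/.g1 suffix for the three prefixes, then keep names seen more than once.
    sort -u "$1" |
        sed -E -n 's/^(P_AA|oyg|p)(.*)\.[bg]1$/\1\2/p' |
        LC_ALL=C sort |
        uniq -d
}

# Toy input with made-up read names.
names_file=$(mktemp)
printf '%s\n' pAAA01.b1 pAAA01.g1 oygBBB02.b1 P_AACC03.b1 P_AACC03.g1 other.x1 > "$names_file"
paired_templates "$names_file"
```

With the toy input, only the two templates that have both reads are printed; oygBBB02 has a single read and other.x1 matches no pattern.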

Estimate library insert sizes:

 $ toAmos -ace B_SPA.fasta.screen.ace.83 -o B_SPA.afg
 $ grep -c ^rds B_SPA.afg         # check if links were created
 $ more toAmos.error              # check if there were any conversion errors
 $ bank-transact -b B_SPA.bnk -m B_SPA.afg -c
 $ bank2contig B_SPA.bnk > B_SPA.contig
 $ grep '^#' B_SPA.contig | grep -v '^##' | sort
 # look at the distances between mated reads within each contig
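
The distance check can be sketched with awk rather than done by eye. This assumes the read lines in the .contig file look like "#name(offset) [RC] N bases, X checksum. {l r} <a b>", with the contig coordinates between "<" and ">", and that contig headers start with "##". The function name mate_distances and the toy records are hypothetical.

```shell
# Hypothetical helper: print "template<TAB>span" for mates placed in the same contig.
mate_distances() {
    awk '
    /^##/ { contig = substr($1, 3); next }            # contig header line
    /^#/ {
        name = substr($1, 2); sub(/\(.*/, "", name)   # strip "#" and "(offset)"
        tmpl = name
        if (!sub(/\.[bg]1$/, "", tmpl)) next          # skip reads that are not .b1/.g1
        range = $0
        sub(/^.*</, "", range); sub(/>.*$/, "", range) # keep the text between < and >
        split(range, pos, " ")
        lo = (pos[1] + 0 < pos[2] + 0) ? pos[1] + 0 : pos[2] + 0
        hi = (pos[1] + 0 < pos[2] + 0) ? pos[2] + 0 : pos[1] + 0
        if (tmpl in seen_lo) {
            if (seen_ctg[tmpl] != contig) next        # mates in different contigs
            a = (lo < seen_lo[tmpl]) ? lo : seen_lo[tmpl]
            b = (hi > seen_hi[tmpl]) ? hi : seen_hi[tmpl]
            printf "%s\t%d\n", tmpl, b - a + 1
        } else {
            seen_lo[tmpl] = lo; seen_hi[tmpl] = hi; seen_ctg[tmpl] = contig
        }
    }' "$1"
}

# Toy contig file with made-up reads; sequence lines are ignored by the parser.
toy_contig=$(mktemp)
cat > "$toy_contig" <<'EOF'
##Contig1 2 5000 bases, 00000000 checksum.
ACGT
#pAAA01.b1(0) [] 800 bases, 00000000 checksum. {1 800} <1 800>
ACGT
#pAAA01.g1(2200) [RC] 800 bases, 00000000 checksum. {800 1} <2201 3000>
ACGT
EOF
mate_distances "$toy_contig"
```

The printed span is measured from the outer end of one mate to the outer end of the other, so it approximates the insert size for that template.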

Create mate pair file (Bambus format, tab delimited):

 $ cat B_SPA.mates
   library small   2000    4000    .*
   pair    p(.*).b1$       p(.*).g1$
   
   library medium  4500    5500    .*
   pair    oyg(.*).b1$     oyg(.*).g1$
  
   library large   35000   45000   .*
   pair    P_AA(.*).b1$    P_AA(.*).g1$
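
Because the file must be tab delimited, it may be safer to generate it with printf than to type it in an editor that could expand tabs. The sketch below writes the same three libraries and then checks the field counts. The function names write_mates and check_mates are hypothetical, and the check only tests tab-separated field counts, not the regular expressions.

```shell
# Hypothetical helper: write the mate file with real tab characters.
write_mates() {
    {
        printf 'library\t%s\t%s\t%s\t%s\n' small 2000 4000 '.*'
        printf 'pair\t%s\t%s\n\n' 'p(.*).b1$' 'p(.*).g1$'
        printf 'library\t%s\t%s\t%s\t%s\n' medium 4500 5500 '.*'
        printf 'pair\t%s\t%s\n\n' 'oyg(.*).b1$' 'oyg(.*).g1$'
        printf 'library\t%s\t%s\t%s\t%s\n' large 35000 45000 '.*'
        printf 'pair\t%s\t%s\n' 'P_AA(.*).b1$' 'P_AA(.*).g1$'
    } > "$1"
}

# Hypothetical helper: succeed only if every non-blank line has the expected field count.
check_mates() {
    awk -F '\t' '
        $1 == "library" && NF != 5 { bad = 1 }
        $1 == "pair"    && NF != 3 { bad = 1 }
        NF > 0 && $1 != "library" && $1 != "pair" { bad = 1 }
        END { exit bad + 0 }' "$1"
}

# Written to a temporary file so an existing B_SPA.mates is not overwritten.
mates_file=$(mktemp)
write_mates "$mates_file"
check_mates "$mates_file" && echo "mate file looks tab delimited"
```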

Rerun the conversion utilities with the mate file:

 $ toAmos -m B_SPA.mates -ace B_SPA.fasta.screen.ace.83 -o B_SPA.afg 
 $ bank-transact -b B_SPA.bnk -m B_SPA.afg -c