Dpuiu Assemblathon

From Cbcb
Revision as of 14:55, 20 December 2010

Links

Assemblers

* CA             /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/Linux-amd64/bin/runCA 
* Newbler
* Velvet
* SOAPdenovo
* Maq

CBCB genomes

  • A bacterial genome. Instead of E. coli, we can use S. aureus USA300, which has sequence data in SRA from 454 and Illumina, paired and unpaired. Daniela has already assembled it using CA, Newbler, Velvet, SOAPdenovo, and Maq (using its comparative assembly mode, where it aligns reads to a reference).
  • A medium-sized eukaryote. I'd like to use the Argentine ant or the Bombus impatiens bee - I've just written to Gene Robinson to ask about the bee.
  • Another eukaryote, ideally a larger one. Human would be great, but we just don't have enough time to do multiple human assemblies. So maybe another insect, or perhaps a plant if we can find one for which data is available.

If we can agree on the data sets, then the next step would be to design the experiment - decide in advance which assemblers to run and how many ways to try each one. I'm thinking we should also trim all the data with Quake.

Argentine ant

Bee, Bombus impatiens

Data

  • 497,318,144 Illumina 124bp reads
  • 8 libraries; inserts:
    • 400bp
    • 3k (outie)
    • 8k (outie)
  • Traces

Adapters: in 3k & 8k libraries

C CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
3 CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
5 GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
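
A minimal sketch of how a read could be clipped at one of the adapters listed above. It is not the procedure that was actually used on the 3k and 8k libraries: it only handles exact matches (real trimmers tolerate mismatches), and the function name `clip_at_adapter` and the `min_overlap` parameter are made up for illustration. The labels C, 3 and 5 are copied from the list as-is.

```python
# Adapter sequences copied from the list above; the keys are the labels used there.
ADAPTERS = {
    "C": "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA",
    "3": "CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    "5": "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG",
}


def clip_at_adapter(read, adapter, min_overlap=10):
    """Return the read truncated where the adapter starts (exact matching only).

    Hypothetical helper. min_overlap is an assumed threshold for accepting
    a partial adapter at the 3' end of the read.
    """
    # Case 1: the whole adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter; try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adapter found: leave the read untouched.
    return read


if __name__ == "__main__":
    adapter = ADAPTERS["3"]
    print(clip_at_adapter("TTTT" + adapter + "GG", adapter))
    print(clip_at_adapter("ACGTACGTAC" + adapter[:15], adapter))
```

Lowering `min_overlap` clips more aggressively but also removes more genuine sequence that happens to look like the start of an adapter.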

Location:

/fs/szattic-asmg4/Bees/Bombus_impatiens/s_[12356789]_[12]_sequence.txt                             # original fastq files

/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[129]_[012]_sequence.cor.rev.txt        # adaptor free corrected reads (long inserts)
/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[35678]_[012]_sequence.cor.txt          # corrected reads (short inserts)
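
The bracket patterns in the paths above can be expanded without touching the filesystem. The sketch below only interprets the patterns as shell-style globs; calling the first bracket field a "lane" is an assumption based on the usual s_N_M_sequence.txt naming, and the helper `matching_lanes` is hypothetical. The meaning of the 0 in `[012]` is not interpreted here.

```python
from fnmatch import fnmatchcase

# File-name parts of the three path patterns listed above.
PATTERNS = {
    "original": "s_[12356789]_[12]_sequence.txt",
    "long_insert_corrected": "s_[129]_[012]_sequence.cor.rev.txt",
    "short_insert_corrected": "s_[35678]_[012]_sequence.cor.txt",
}


def matching_lanes(pattern):
    """Return the sorted digits accepted by the first bracket field of the pattern."""
    # Everything after "_sequence" is a literal suffix such as ".txt" or ".cor.txt".
    suffix = pattern.split("_sequence", 1)[1]
    lanes = set()
    for lane in range(10):
        for read in range(10):
            name = f"s_{lane}_{read}_sequence{suffix}"
            if fnmatchcase(name, pattern):
                lanes.add(lane)
    return sorted(lanes)


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(label, matching_lanes(pattern))
```

Under this reading, the long-insert and short-insert patterns together cover the same eight first-field values as the original files, with 4 absent from all three.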

Bacterium, Staph aureus USA300

 Complete genome        : NC_010079       2872915bp Staphylococcus aureus subsp. aureus USA300_TCH1516
 In progress genome     : NZ_AASB00000000 2810505bp Staphylococcus aureus subsp. aureus USA300_TCH959, 256 contigs
 454 FLX                : Staphylococcus aureus subsp. aureus USA300_TCH959 HMP0023  http://www.ncbi.nlm.nih.gov/sra/SRX002327?report=full 
 Illumina 101bp paired  : Staphylococcus aureus subsp. aureus USA300_TCH1516         http://www.ncbi.nlm.nih.gov/sra/SRX007714?report=full

454FLX

  • SFF data
                       reads   min    q1     q2     q3     max        mean       n50        sum            
 sff                   334241  36     201    256    277    362        235        264        78432227      
  • sffToCa Output
 LibraryName           numActiveFRG   numDeletedFRG  numMatedFRG  readLength  clearLength  
 454p                  325555         25272          115924       56301686    51225615     
  • DeNovo
 # ctg stats
 .                     ctgs min  q1     q2      q3       max       mean    n50       sum
 CA.6.1.bog            18   238  21014  135971  239466   567548*   155836  277888*   2805055
 newbler.2.5p1.deNovo  100  103  295    3287    39467    229053    27879   78379     2787870
 # scf stats
 CA.6.1.bog            6    284  21014  173065  1032129  1458733*  467554  1458733*  2805325
 newbler.2.5p1.deNovo  8    2475 20731  110137  1030785  1408642   349895  1408642   2799157
  • Reference-based (S. aureus USA300)
 .                     ctgs min  q1     q2      q3       max       mean    n50       sum
 newbler.2.3.refMapper 206  103  556    3098    15366    117487    12749   40687     2626469
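
For reference, a minimal sketch of how the count, min, max, mean, n50 and sum columns in the tables above could be computed from a list of sequence lengths. It uses the common N50 definition (the length at which the running total of lengths, sorted longest first, reaches half of the sum); whether the script that produced these tables uses exactly this convention is an assumption. The quartile columns (q1, q2, q3) are left out because their convention is not stated, and the function names are illustrative.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 of an empty list is undefined")


def length_stats(lengths):
    """Summary statistics for a non-empty list of contig, scaffold or read lengths."""
    ordered = sorted(lengths)
    total = sum(ordered)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": total / len(ordered),
        "n50": n50(ordered),
        "sum": total,
    }


if __name__ == "__main__":
    # Toy lengths, not taken from any assembly on this page.
    print(length_stats([2, 3, 4, 5, 6]))
```

With this definition the n50 equals the max whenever the longest sequence alone exceeds half of the sum, which is consistent with the scf rows above, where max and n50 coincide.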

Bacterium, E coli

 Complete genome        : NC_000913      4639675bp  Escherichia coli str. K-12 substr. MG1655

 454 FLX                : Escherichia coli str. K-12 substr. MG1655                  http://www.ncbi.nlm.nih.gov/sra/SRX000348?report=full 
 Illumina 101bp paired  : Escherichia coli str. K-12 substr. MG1655                  http://www.ncbi.nlm.nih.gov/sra/SRX016044?report=full

Human, a single chromosome, medium-sized.