Dpuiu Assemblathon

Revision as of 20:32, 15 December 2010

Links

Assemblers

* CA             /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/Linux-amd64/bin/runCA 
* Newbler
* Velvet
* SOAPdenovo
* Maq

CBCB genomes

  • A bacterial genome. Instead of E. coli, we can use S. aureus USA300, which has 454 and Illumina sequence data in SRA, both paired and unpaired. Daniela has already assembled it using CA, Newbler, Velvet, SOAPdenovo, and Maq (in its comparative assembly mode, where it aligns reads to a reference).
  • A medium-sized eukaryote. I'd like to use the Argentine ant or the Bombus impatiens bee - I've just written to Gene Robinson to ask about the bee.
  • Another eukaryote, ideally a larger one. Human would be great, but we just don't have enough time to do multiple human assemblies. So maybe another insect, or perhaps a plant if we can find one for which data is available.

If we can agree on the data sets, then the next step would be to design the experiment - decide in advance which assemblers to run and how many ways to try each one. I'm thinking we should also trim all the data with Quake.

Argentine ant

Bee, Bombus impatiens

Data

  • 497,318,144 Illumina 124bp reads
  • 8 libraries; inserts:
    • 400bp
    • 3k (outie)
    • 8k (outie)
  • Traces
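As a quick sanity check on the size of this data set, the read count and read length listed above can be turned into a total base count. This is only a sketch: the `coverage` helper takes the genome size as a parameter, since no genome size estimate is given on this page, and both function names are made up for illustration.

```python
# Read count and read length come from the data summary above.
READ_COUNT = 497_318_144
READ_LENGTH_BP = 124


def total_bases(read_count: int, read_length_bp: int) -> int:
    """Total sequenced bases, assuming every read has the same length."""
    return read_count * read_length_bp


def coverage(total_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage for a genome size supplied by the caller."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bp / genome_size_bp


if __name__ == "__main__":
    print(total_bases(READ_COUNT, READ_LENGTH_BP))
```

The raw figure overstates usable coverage, because adapter removal and error correction shorten or discard some reads.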

Adapters (in the 3k and 8k libraries):

C CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
3 CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
5 GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
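A minimal sketch of how reads could be clipped at these adapters, assuming exact matching only. It is not the procedure actually used to produce the adapter-free reads; the function name `clip_adapter` and the `min_overlap` threshold are hypothetical, and the dictionary keys simply reuse the labels C, 3 and 5 from the list above.

```python
# Adapter sequences copied from the list above, keyed by their labels.
ADAPTERS = {
    "C": "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA",
    "3": "CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    "5": "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG",
}


def clip_adapter(read: str, adapter: str, min_overlap: int = 8) -> str:
    """Return the read truncated at the first exact adapter occurrence.

    If the full adapter is not present, also look for an adapter prefix of
    at least min_overlap bases sitting at the 3' end of the read.
    Mismatches and indels are not handled (assumption: exact matches only).
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


def clip_all(read: str, min_overlap: int = 8) -> str:
    """Apply clip_adapter for every adapter in ADAPTERS, in label order."""
    for label in sorted(ADAPTERS):
        read = clip_adapter(read, ADAPTERS[label], min_overlap)
    return read
```

A real trimmer would also need to tolerate sequencing errors inside the adapter, which exact string matching does not.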

Location:

/fs/szattic-asmg4/Bees/Bombus_impatiens/s_[12356789]_[12]_sequence.txt                             # original fastq files

/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[129]_[012]_sequence.cor.rev.txt        # adapter-free corrected reads (long inserts)
/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[35678]_[012]_sequence.cor.txt          # corrected reads (short inserts)
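The bracketed parts of the file names above are shell-style character classes. The sketch below shows which lane and read numbers each pattern would select, using Python's `fnmatch` on a generated list of candidate names. The candidate list is hypothetical and does not check which files actually exist on disk.

```python
from fnmatch import fnmatchcase

# File name patterns copied from the locations above (directory part omitted).
PATTERNS = {
    "original": "s_[12356789]_[12]_sequence.txt",
    "long_insert_corrected": "s_[129]_[012]_sequence.cor.rev.txt",
    "short_insert_corrected": "s_[35678]_[012]_sequence.cor.txt",
}

SUFFIXES = ("sequence.txt", "sequence.cor.rev.txt", "sequence.cor.txt")


def candidate_names():
    """Hypothetical names for lanes 1-9 and read indices 0-2, every suffix."""
    return [
        f"s_{lane}_{read}_{suffix}"
        for lane in range(1, 10)
        for read in range(0, 3)
        for suffix in SUFFIXES
    ]


def expand(pattern: str):
    """Candidate names that the shell-style pattern would match."""
    return [name for name in candidate_names() if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(label, len(expand(pattern)))
```

On the file system itself, `glob.glob` with the full path pattern would list the files that are really present.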

Bacterium, Staph aureus USA300

 Complete genome        : NC_010079       2872915bp Staphylococcus aureus subsp. aureus USA300_TCH1516
 In progress genome     : NZ_AASB00000000 2810505bp Staphylococcus aureus subsp. aureus USA300_TCH959, 256 contigs
 454 FLX                : Staphylococcus aureus subsp. aureus USA300_TCH959 HMP0023  http://www.ncbi.nlm.nih.gov/sra/SRX002327?report=full 
 Illumina 101bp paired  : Staphylococcus aureus subsp. aureus USA300_TCH1516         http://www.ncbi.nlm.nih.gov/sra/SRX007714?report=full

Bacterium, E coli

 Complete genome        : NC_000913      4639675bp  Escherichia coli str. K-12 substr. MG1655

 454 FLX                : Escherichia coli str. K-12 substr. MG1655                  http://www.ncbi.nlm.nih.gov/sra/SRX000348?report=full 
 Illumina 101bp paired  : Escherichia coli str. K-12 substr. MG1655                  http://www.ncbi.nlm.nih.gov/sra/SRX016044?report=full

Human, a single chromosome, medium-sized.