Dpuiu Assemblathon: Difference between revisions
Revision as of 14:52, 20 December 2010
Links
Assemblers
- CA /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/Linux-amd64/bin/runCA
- Newbler
- Velvet
- SOAPdenovo
- Maq
CBCB genomes
- A bacterial genome. Instead of E. coli, we can use S. aureus USA300, which has sequence data in SRA from 454 and Illumina, paired and unpaired. Daniela has already assembled it using CA, Newbler, Velvet, SOAPdenovo, and Maq (using its comparative assembly mode, where it aligns reads to a reference).
- A medium-sized eukaryote. I'd like to use the Argentine ant or the Bombus impatiens bee - I've just written to Gene Robinson to ask about the bee.
- Another eukaryote, ideally a larger one. Human would be great, but we just don't have enough time to do multiple human assemblies. So maybe another insect, or perhaps a plant if we can find one for which data is available.
If we can agree on the data sets, then the next step would be to design the experiment - decide in advance which assemblers to run and how many ways to try each one. I'm thinking we should also trim all the data with Quake.
Argentine ant
Bee, Bombus impatiens
Data
- 497,318,144 Illumina 124bp reads
- 8 libraries; inserts:
- 400bp
- 3k (outie)
- 8k (outie)
- Traces
Adapters: in 3k & 8k libraries
C  CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
3  CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
5  GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
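To illustrate what removing these adapters from the 3k and 8k reads involves, here is a minimal Python sketch. It is not the procedure used to produce the adaptor-free files listed under Location; it assumes exact matching only (no mismatches or indels), and `clip_adapter` and `min_overlap` are hypothetical names chosen for this example.

```python
def clip_adapter(read, adapter, min_overlap=8):
    """Return the read with the adapter and everything after it removed.

    Assumption: exact matches only. A full adapter is searched anywhere in
    the read; a partial adapter (its first k bases, k >= min_overlap) is
    searched only at the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for the longest adapter prefix that the read ends with.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# First adapter sequence from the list above.
ADAPTER = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

if __name__ == "__main__":
    insert = "ACGTACGTAC"  # made-up insert sequence, for illustration only
    print(clip_adapter(insert + ADAPTER + "TTTT", ADAPTER))
    print(clip_adapter(insert + ADAPTER[:12], ADAPTER))
```

Real reads carry sequencing errors, so a production trimmer would normally allow mismatches; the exact-match rule here only keeps the sketch short.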
Location:
/fs/szattic-asmg4/Bees/Bombus_impatiens/s_[12356789]_[12]_sequence.txt # original fastq files
/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[129]_[012]_sequence.cor.rev.txt # adaptor free corrected reads (long inserts)
/fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[35678]_[012]_sequence.cor.txt # corrected reads (short inserts)
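The bracket expressions in these paths are shell-style character classes. As a sketch of which file names each pattern can match, the Python snippet below expands them against candidate lane numbers 1 to 9 and read indices 0 to 2. The candidate ranges and the helper name `expand` are assumptions made for this example; the snippet never touches the file system, so it says nothing about which files actually exist.

```python
from fnmatch import fnmatchcase


def expand(pattern, template, lanes=range(1, 10), reads=range(0, 3)):
    """List candidate file names (built from template) that match pattern.

    Assumption: lanes 1-9 and read indices 0-2 cover every name of interest.
    """
    names = [template.format(lane=lane, read=read)
             for lane in lanes for read in reads]
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    original = expand("s_[12356789]_[12]_sequence.txt",
                      "s_{lane}_{read}_sequence.txt")
    long_inserts = expand("s_[129]_[012]_sequence.cor.rev.txt",
                          "s_{lane}_{read}_sequence.cor.rev.txt")
    short_inserts = expand("s_[35678]_[012]_sequence.cor.txt",
                           "s_{lane}_{read}_sequence.cor.txt")
    for label, names in [("original", original),
                         ("long inserts", long_inserts),
                         ("short inserts", short_inserts)]:
        print(label, len(names), names[0], "...", names[-1])
```

The first pattern covers eight lane numbers (lane 4 is excluded) with read indices 1 and 2, while the corrected-read patterns also admit a read index of 0.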
Bacterium, Staph aureus USA300
Complete genome : NC_010079 2872915bp Staphylococcus aureus subsp. aureus USA300_TCH1516
In progress genome : NZ_AASB00000000 2810505bp Staphylococcus aureus subsp. aureus USA300_TCH959, 256 contigs
454 FLX : Staphylococcus aureus subsp. aureus USA300_TCH959 HMP0023 http://www.ncbi.nlm.nih.gov/sra/SRX002327?report=full
Illumina 101bp paired : Staphylococcus aureus subsp. aureus USA300_TCH1516 http://www.ncbi.nlm.nih.gov/sra/SRX007714?report=full
454FLX
- sffToCa Output
LibraryName  numActiveFRG  numDeletedFRG  numMatedFRG  readLength  clearLength
454p         325555        25272          115924       56301686    51225615
- DeNovo
# ctg stats
.                     ctgs  min   q1     q2      q3       max       mean    n50       sum
CA.6.1.bog            18    238   21014  135971  239466   567548*   155836  277888*   2805055
newbler.2.5p1.deNovo  100   103   295    3287    39467    229053    27879   78379     2787870
# scf stats
CA.6.1.bog            6     284   21014  173065  1032129  1458733*  467554  1458733*  2805325
newbler.2.5p1.deNovo  8     2475  20731  110137  1030785  1408642   349895  1408642   2799157
- Reference based (S. aureus USA300)
.                      ctgs  min  q1   q2    q3     max     mean   n50    sum
newbler.2.3.refMapper  206   103  556  3098  15366  117487  12749  40687  2626469
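The columns in the stats tables above (count, min, quartiles, max, mean, n50, sum) can be recomputed from a list of contig or scaffold lengths. The Python sketch below shows one way to do it. It assumes N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the sum, and it assumes simple index-based quartiles. The script that produced the tables is not given here and may define the quartiles differently; `n50` and `summarize` are hypothetical helper names.

```python
def n50(lengths):
    """Length at which the longest-first running total reaches half the sum."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty list")


def summarize(lengths):
    """Return the table columns for one assembly as a dict.

    Assumption: quartiles are read off the sorted list by index
    (n//4, n//2, 3n//4), and the mean is rounded to an integer.
    """
    ordered = sorted(lengths)
    n = len(ordered)
    total = sum(ordered)
    return {
        "ctgs": n,
        "min": ordered[0],
        "q1": ordered[n // 4],
        "q2": ordered[n // 2],
        "q3": ordered[(3 * n) // 4],
        "max": ordered[-1],
        "mean": round(total / n),
        "n50": n50(ordered),
        "sum": total,
    }


if __name__ == "__main__":
    # Toy lengths, for illustration only (not taken from any assembly).
    print(summarize([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Under this definition, the CA scaffold row is self-consistent: the largest scaffold (1458733) alone exceeds half of the total (2805325), so its n50 equals its max.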
Bacterium, E coli
Complete genome : NC_000913 4639675bp Escherichia coli str. K-12 substr. MG1655
454 FLX : Escherichia coli str. K-12 substr. MG1655 http://www.ncbi.nlm.nih.gov/sra/SRX000348?report=full
Illumina 101bp paired : Escherichia coli str. K-12 substr. MG1655 http://www.ncbi.nlm.nih.gov/sra/SRX016044?report=full