Dpuiu Assemblathon: Difference between revisions
		
		
		
		
| (20 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
| = Links = | = Links = | ||
| * [http://assemblathon.org/ The Assemblathon] University of California, Santa Cruz & UC Davis;  synthetic & real genome. | |||
| * [http://cnag.bsc.es/ dnGASP] De Novo Genome Assembly Assessment Project (dnGASP):  Centro Nacional de Análisis Genómico in Barcelona, Spain, synthetic genome | |||
| * [http://gage.cbcb.umd.edu/ GAGE] | |||
| * [http://www.genomeweb.com/informatics/us-european-teams-launch-parallel-challenges-improve-computational-methods-genom genomeweb announcement] | * [http://www.genomeweb.com/informatics/us-european-teams-launch-parallel-challenges-improve-computational-methods-genom genomeweb announcement] | ||
| = GAGE = | = GAGE = | ||
| Line 19: | Line 16: | ||
| # Which assembly software should I use?        | # Which assembly software should I use?        | ||
| # What parameters should I use when I run the software? | # What parameters should I use when I run the software? | ||
| = Read correction = | |||
| * quake | |||
|   echo frag_1.fastq      frag_2.fastq      >  genome.ls | |||
|   echo shortjump_1.fastq shortjump_2.fastq >> genome.ls | |||
|   echo longjump_1.fastq  longjump_2.fastq  >> genome.ls  | |||
|   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/quake.py -f genome.ls  -k 18 -p 20 >&! quake.log | |||
| = Assemblers = | = Assemblers = | ||
| * [http://www.broadinstitute.org/software/allpaths-lg/blog/ Allpaths-LG ]   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/allpaths3-35218/ | * [http://www.broadinstitute.org/software/allpaths-lg/blog/ Allpaths-LG ]     | ||
| * [http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page CA]             /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/ |   paths:  | ||
| * [http://www.ebi.ac.uk/~zerbino/velvet/ Velvet]         /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/velvet_1.0.13/ |     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/allpaths3-35218/ | ||
| * [http://soap.genomics.org.cn/soapdenovo.html SOAPdenovo]     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SOAPdenovo-V1.05/ |     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/ | ||
| * MSR-CA         Maryland Super-Reads + Celera Assembler.   | |||
| * [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABYSS]          /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/abyss-1.2.7/ |   RunAllPaths3G \ | ||
| * [https://github.com/jts/sga/wiki SGA]            /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/sga/src/SGA/                      # Version: 0.9.8 |      PRE=$PWD REFERENCE_NAME=. DATA_SUBDIR=. RUN=allpaths SUBDIR=run1.orig THREADS=$P | ||
| * [http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page CA]               | |||
|   paths:  | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/ | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/ | |||
|   runCA \ | |||
|      -d . \ | |||
|      -p asm \ | |||
|      -s /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/runCA.parallel.spec \ | |||
|      doOverlapBasedTrimming=0 ovlOverlapper=ovl unitigger=bog bogBreakAtIntersections=0 bogBadMateDepth=1000 \ | |||
|      *.frg | |||
| * [http://www.ebi.ac.uk/~zerbino/velvet/ Velvet]           | |||
|   paths:  | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/velvet_1.0.13/ | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/ | |||
|   velveth . $K -fastq \  | |||
|     -shortPaired  frag_12.fastq \ | |||
|     -shortPaired2 shortjump_12.rev.fastq \ | |||
|     -shortPaired3 longjump_12.fastq | |||
|   velvetg . -exp_cov auto  | |||
|     -ins_length  $MEA_FRAG      -ins_length_sd  $STD_FRAG \ | |||
|     -ins_length2 $MEA_SHORTJUMP -ins_length2_sd $STD_SHORTJUMP \ | |||
|     -ins_length3 $MEA_LONGJUMP  -ins_length3_sd $STD_LONGJUMP \ | |||
|     -scaffolding yes -exportFiltered yes -unused_reads yes | |||
| * [http://soap.genomics.org.cn/soapdenovo.html SOAPdenovo]      | |||
|   paths: | |||
|      /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SOAPdenovo-V1.05/ | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/ | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/GapCloser/ | |||
|   echo "[LIB]\navg_ins=$MEA_FRAG\nreverse_seq=0\nasm_flags=1\nrank=1\nq1=frag_1.fastq\nq2=frag_2.fastq\n" >! SOAPdenovo.config | |||
|   echo "[LIB]\navg_ins=$MEA_SHORTJUMP\nreverse_seq=1\nasm_flags=2\nrank=2\nq1=shortjump_1.fastq\nq2=shortjump_2.fastq\n" >> SOAPdenovo.config | |||
|   echo "[LIB]\navg_ins=$MEA_LONGJUMP\nreverse_seq=0\nasm_flags=2\nrank=4\nq1=longjump_1.fastq\nq2=longjump_2.fastq\n" >> SOAPdenovo.config | |||
|   SOAPdenovo all -K $K -p $P -s ./SOAPdenovo.config -o asm | |||
|   GapCloser -b SOAPdenovo.config -a asm.scafSeq -o asm2.scafSeq -t $P -p 31 | |||
| * [http://www.genome.umd.edu/SR_CA_MANUAL.htm MSR-CA         Maryland Super-Reads + Celera Assembler.] | |||
|   paths: | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SR-CA-1.1/CA/Linux-amd64/bin/ | |||
| * [http://www.bcgsc.ca/platform/bioinfo/software/abyss ABYSS]            | |||
|   paths: | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/abyss-1.2.7/ | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin | |||
|   abyss-pe  \ | |||
|     k=$K n=5 name=asm lib='frag short' frag=frag_12.fastq short=short_12.fastq aligner=bowtie | |||
| * [https://github.com/jts/sga/wiki SGA]              | |||
|   paths: | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/sga/src/SGA/                      # Version: 0.9.8 | |||
|     /fs/szdevel/core-cbcb-software/Linux-x86_64/bin | |||
|   sga preprocess -p 1 frag_?.fastq > frag.pp.fa  | |||
|   sga index -t $P frag.pp.fa  | |||
|   sga correct -k $K -t $P frag.pp.fa -o frag.pp.ec.fa   | |||
|   sga index -t $K frag.pp.ec.fa  | |||
|   sga filter frag.pp.ec.fa  | |||
|   sga overlap -t $P frag.pp.ec.filter.pass.fa | |||
|   sga assemble frag.pp.ec.filter.pass.asqg.gz | |||
| = CBCB genomes = | = CBCB genomes = | ||
| Line 106: | Line 182: | ||
|   CA.quakeCor                /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/CA.s_1-8.cor.redo2/                               # Celera Assembly |   CA.quakeCor                /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/CA.s_1-8.cor.redo2/                               # Celera Assembly | ||
|   SOAPdenovo.quakeCor(K=47)  /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.cor/                         # SOAPdenovo assembly (2011) K=47 quake corrected reads |   SOAPdenovo.quakeCor(K=47)  /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.cor/                         # SOAPdenovo assembly (2011) K=47 quake corrected reads | ||
|  #SOAPdenovo.orig(K=47)      /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.orig/                       # SOAPdenovo assembly (2011) K=47 original reads    | |||
|  #SOAPdenovo.quakeCor(K=31)  /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.s_1-9.cor/                            # SOAPdenovo assembly (2010) K=31 quake corrected reads (prev assembler version) | |||
|  MSR-CA                     /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/MSR-CA  ; genome9.umd.edu:/genome9/raid/alekseyz/GAGE/bombus/assembly/CA       # MSR-CA | |||
| == Staph aureus USA300 == | == Staph aureus USA300 == | ||
| Line 418: | Line 496: | ||
| * Location | * Location | ||
|    /fs/szattic-asmg7/argentine_ant/Illumina/ |    /fs/szattic-asmg7/argentine_ant/Illumina/ | ||
| = UC Assemblaton1 = | |||
| * [http://www.drive5.com/evolver/ Evolver] | |||
| * [https://github.com/jstjohn/SimSeq Read simulator] | |||
| * [http://korflab.ucdavis.edu/Datasets/Assemblathon/Assemblathon1/ Data download] | |||
| * speciesA.diploid.fa len | |||
|   chr0_1         76252953 | |||
|   chr0_2         76285600 | |||
|   chr1_1         18509915 | |||
|   chr1_2         18539192 | |||
|   chr2_1         17699484 | |||
|   chr2_2         17710169 | |||
| = UC Assemblaton1 = | |||
| ... | |||
Latest revision as of 17:07, 1 August 2011
Links
- The Assemblathon University of California, Santa Cruz & UC Davis; synthetic & real genome.
- dnGASP De Novo Genome Assembly Assessment Project (dnGASP): Centro Nacional de Análisis Genómico in Barcelona, Spain, synthetic genome
- GAGE
- genomeweb announcement
GAGE
- Location
http://gage.cbcb.umd.edu/ -> /fs/web-cbcb-new/html/gage
- Answer the following questions:
- How much sequencing coverage do I need for my genome project?
- What can I expect the resulting assembly to look like?
- Which assembly software should I use?
- What parameters should I use when I run the software?
Read correction
- quake
 echo frag_1.fastq      frag_2.fastq      >  genome.ls
 echo shortjump_1.fastq shortjump_2.fastq >> genome.ls
 echo longjump_1.fastq  longjump_2.fastq  >> genome.ls
 /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/quake.py -f genome.ls -k 18 -p 20 >&! quake.log
Assemblers
- Allpaths-LG
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/allpaths3-35218/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/
 RunAllPaths3G \
    PRE=$PWD REFERENCE_NAME=. DATA_SUBDIR=. RUN=allpaths SUBDIR=run1.orig THREADS=$P
- CA
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/
 runCA \
    -d . \
    -p asm \
    -s /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/runCA.parallel.spec \
    doOverlapBasedTrimming=0 ovlOverlapper=ovl unitigger=bog bogBreakAtIntersections=0 bogBadMateDepth=1000 \
    *.frg
- Velvet
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/velvet_1.0.13/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/
 velveth . $K -fastq \
   -shortPaired  frag_12.fastq \
   -shortPaired2 shortjump_12.rev.fastq \
   -shortPaired3 longjump_12.fastq
 velvetg . -exp_cov auto \
   -ins_length  $MEA_FRAG      -ins_length_sd  $STD_FRAG \
   -ins_length2 $MEA_SHORTJUMP -ins_length2_sd $STD_SHORTJUMP \
   -ins_length3 $MEA_LONGJUMP  -ins_length3_sd $STD_LONGJUMP \
   -scaffolding yes -exportFiltered yes -unused_reads yes
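The Velvet input above includes shortjump_12.rev.fastq. A minimal sketch of how such a file might be produced, assuming that ".rev" means the outie jumping reads were reverse-complemented into innie orientation (the page does not state this explicitly). The helper name revcomp_fastq is hypothetical, and the sketch only handles plain 4-line FASTQ records.

```shell
#!/bin/sh
# revcomp_fastq: read 4-line FASTQ records on stdin, write them to stdout
# with each sequence reverse-complemented and each quality string reversed.
# Hypothetical helper; not the script used to build the files on this page.
revcomp_fastq() {
    awk '
    # reverse a string
    function rev(s,    i, out) {
        out = ""
        for (i = length(s); i > 0; i--) out = out substr(s, i, 1)
        return out
    }
    # complement A/C/G/T (either case); other symbols such as N pass through
    function comp(s,    i, c, p, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            p = index("ACGTacgt", c)
            out = out (p ? substr("TGCAtgca", p, 1) : c)
        }
        return out
    }
    NR % 4 == 2 { print comp(rev($0)); next }   # sequence line
    NR % 4 == 0 { print rev($0); next }         # quality line
    { print }                                   # header and "+" lines
    '
}
```

Usage would look like `revcomp_fastq < shortjump_12.fastq > shortjump_12.rev.fastq`, where the input file name is an assumption based on the naming used elsewhere on this page.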
- SOAPdenovo
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SOAPdenovo-V1.05/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/GapCloser/
 echo "[LIB]\navg_ins=$MEA_FRAG\nreverse_seq=0\nasm_flags=1\nrank=1\nq1=frag_1.fastq\nq2=frag_2.fastq\n" >! SOAPdenovo.config
 echo "[LIB]\navg_ins=$MEA_SHORTJUMP\nreverse_seq=1\nasm_flags=2\nrank=2\nq1=shortjump_1.fastq\nq2=shortjump_2.fastq\n" >> SOAPdenovo.config
 echo "[LIB]\navg_ins=$MEA_LONGJUMP\nreverse_seq=0\nasm_flags=2\nrank=4\nq1=longjump_1.fastq\nq2=longjump_2.fastq\n" >> SOAPdenovo.config
 SOAPdenovo all -K $K -p $P -s ./SOAPdenovo.config -o asm
 GapCloser -b SOAPdenovo.config -a asm.scafSeq -o asm2.scafSeq -t $P -p 31
- MSR-CA: Maryland Super-Reads + Celera Assembler.
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SR-CA-1.1/CA/Linux-amd64/bin/
- ABYSS
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/abyss-1.2.7/
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin
 abyss-pe \
   k=$K n=5 name=asm lib='frag short' frag=frag_12.fastq short=short_12.fastq aligner=bowtie
- SGA
 paths:
   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/sga/src/SGA/     # Version: 0.9.8
   /fs/szdevel/core-cbcb-software/Linux-x86_64/bin
 sga preprocess -p 1 frag_?.fastq > frag.pp.fa
 sga index -t $P frag.pp.fa
 sga correct -k $K -t $P frag.pp.fa -o frag.pp.ec.fa
 sga index -t $P frag.pp.ec.fa
 sga filter frag.pp.ec.fa
 sga overlap -t $P frag.pp.ec.filter.pass.fa
 sga assemble frag.pp.ec.filter.pass.asqg.gz
CBCB genomes
- A bacterial genome. Instead of E. coli, we can use S. aureus USA300, which has sequence data in SRA from 454 and Illumina, paired and unpaired. Daniela has already assembled it using CA, Newbler, Velvet, SOAPdenovo, and Maq (using its comparative assembly mode, where it aligns to a reference).
- A medium-sized eukaryote. I'd like to use the Argentine ant or the Bombus impatiens bee - I've just written to Gene Robinson to ask about the bee.
- Another eukaryote, ideally a larger one. Human would be great, but we just don't have enough time to do multiple human assemblies. So maybe another insect, or perhaps a plant if we can find one for which data is available.
If we can agree on the data sets, then the next step would be to design the experiment - decide in advance which assemblers to run and how many ways to try each one. I'm thinking we should also trim all the data with Quake.
Bombus impatiens
Data
- Estimated haploid genome size: 250M
- 497,318,144 Illumina 124bp reads (246X cvg)
- Reads:
 .      readLen  orientation  insLen  #reads       readCvg  comments
 frag   124      innie        400     303,118,594  150X     6 libs
 short  124      outie        3-8K    194,199,550  96X      2 libs
- Issue: Adapters: in 3k & 8k libraries
 C  CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
 3  CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
 5  GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
- Read directories:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_[12356789]_[12]_sequence.txt                        # original fastq files
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[129]_[012]_sequence.cor.rev.txt   # adaptor free corrected reads (long inserts)
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[35678]_[012]_sequence.cor.txt     # corrected reads (short inserts)
- Original read files:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_1_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_1_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_2_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_2_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_3_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_3_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_5_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_5_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_6_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_6_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_7_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_7_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_8_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_8_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_9_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_9_2_sequence.txt
- Quake corrected files:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_1_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_1_2_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_2_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_2_2_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_3_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_3_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_5_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_5_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_6_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_6_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_7_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_7_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_8_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_8_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_9_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_9_2_sequence.cor.rev.txt
- k_unitig corrected files: (in progress --Dpuiu 10:38, 5 April 2011 (EDT))
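The coverage figures in the Data list above (246X overall, 150X and 96X per library type) follow from reads x read length / genome size. A minimal sketch of that arithmetic, assuming the estimated 250M haploid genome size and uniform 124bp reads quoted above; the function name coverage is hypothetical. The page appears to report the values truncated to whole numbers, while this sketch prints one decimal.

```shell
#!/bin/sh
# coverage READS READLEN GENOME_SIZE
# Print fold coverage = READS * READLEN / GENOME_SIZE with one decimal.
# Hypothetical helper for checking the numbers quoted on this page.
coverage() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * l / g }'
}

coverage 497318144 124 250000000   # all Bombus impatiens reads
coverage 303118594 124 250000000   # frag libraries
coverage 194199550 124 250000000   # short (3-8K) libraries
```

The same function can be applied to the other data sets on this page, for example `coverage 1294104 101 2903081` for the S. aureus frag library.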
Assembly
- Bombus_impatiens.assembly.summary
- Assembly directories:
 CA.quakeCor                /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/CA.s_1-8.cor.redo2/            # Celera Assembly
 SOAPdenovo.quakeCor(K=47)  /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.cor/      # SOAPdenovo assembly (2011) K=47 quake corrected reads
 #SOAPdenovo.orig(K=47)     /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.orig/     # SOAPdenovo assembly (2011) K=47 original reads
 #SOAPdenovo.quakeCor(K=31) /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.s_1-9.cor/          # SOAPdenovo assembly (2010) K=31 quake corrected reads (prev assembler version)
 MSR-CA                     /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/MSR-CA ; genome9.umd.edu:/genome9/raid/alekseyz/GAGE/bombus/assembly/CA   # MSR-CA
Staph aureus USA300
Data
- Complete genome:
 id             len     description
 NC_010079      2872915 Staphylococcus aureus subsp. aureus USA300_TCH1516, complete genome
 NC_010063.1    27041   Staphylococcus aureus subsp. aureus USA300_TCH1516 plasmid pUSA300HOUMR, complete sequence
 NC_012417.1    3125    Staphylococcus aureus subsp. aureus USA300_TCH1516 plasmid pUSA01-HOU, complete sequence
                2903081 total 
- Reads (90X):
 .          readLen  insLen  orientation  #reads     readCvg  SRA runs
 frag       101      180     innie        1,294,104  45X      SRR022868
 shortjump  37       3500    outie        3,494,070  45X      SRR022865

 SRP001086  Staphylococcus aureus Sequencing on Illumina
 SRX007714  pair lib
 SRX007711  jumping lib
- Read directories:
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/
- Original read files:
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/frag_1.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/frag_2.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/short_1.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/short_2.fastq
- Quake corrected files:
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/frag_1.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/frag_2.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/short_1.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/short_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/frag_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/frag_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/short_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/short_2.cor.fasta
- k_unitig corrected files:
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/frag_1.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/frag_2.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/short_1.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/short_2.cor.seq
Assembly
- Staphylococcus_aureus.genome.summary
- Assembly directories:
 allpaths.orig                 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpaths
 CA.orig                       /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/CA.orig
 CA.quakeCor                   /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/CA.quakeCor.k18
 CA.allpathsCor                /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/CA.allpathsCor
 CA.SuperReads                 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/CA.SuperReads.latest
 SOAPdenovo.orig(K=31)         /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K31.orig
 SOAPdenovo.orig(K=47)         /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K47.orig
 SOAPdenovo.quakeCor(K=31)     /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K31.quakeCor.k18
 SOAPdenovo.quakeCor(K=47)     /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K47.quakeCor.k18
 SOAPdenovo.allpathsCor(K=31)  /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K31.allpathsCor
 SOAPdenovo.allpathsCor(K=47)  /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SOAPdenovo.K47.allpathsCor
 velvet.orig                   /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/velvet.orig
 velvet.quakeCor               /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/velvet.quakeCor.k18
 velvet.allpathsCor            /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/velvet.allpathsCor
 ABYSS.quakeCor                /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/ABYSS.K31.quakeCor.k18
 SGA.orig                      /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/SGA.orig
- SOAPdenovo v1.05 :
- new quake version did not help much (quake-0.2.2 vs davek44-error_correction-28dbe11)
- SOAPdenovo map -K 37+ : fails on quakeCor.k18 corrected reads
- "according" to kmerFreq , should probably not use -K >47
- longer kmer => longer scaffolds (K=63 : largest N50scf)
- longer kmer => shorter contigs (K=31 : largest N50ctg)
- K40+ too large: no "valley" in the kmerFreq histogram
 
 paste SOAPdenovo.K??.quakeCor.k18/genome.K??.kmerFreq | nl0 | head
 paste SOAPdenovo.K??.allpathsCor/genome.K??.kmerFreq  | nl0 | more
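The notes above compare N50scf and N50ctg across kmer sizes. As a reminder of what is being compared, here is a minimal sketch of the common N50 definition: the largest length L such that sequences of length L or more hold at least half of the total bases. It assumes one sequence length per line on stdin. The function name n50 is hypothetical, and this is not the script that produced the assembly summaries on this page.

```shell
#!/bin/sh
# n50: read one sequence length per line on stdin and print the N50.
# Lengths are sorted in decreasing order, then accumulated until the
# running sum reaches half of the total.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (2 * run >= total) { print len[i]; exit }
            }
        }'
}

# Toy example with six contig lengths (made-up numbers).
printf '%s\n' 20 80 40 70 30 50 | n50
```

In the toy example the total is 290, and the two longest contigs (80 + 70 = 150) already cover half of it, so the reported value is the length of the second one.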
Rhodobacter sphaeroides
Data
- Complete genome: 2 chromosomes, 5 plasmids
 id             len     description
 CP000143       3188609 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence.
 CP000144       943016  Rhodobacter sphaeroides 2.4.1 chromosome 2, complete sequence.
 DQ232586       114045  Rhodobacter sphaeroides 2.4.1 plasmid A, partial sequence.
 CP000145       114178  Rhodobacter sphaeroides 2.4.1 plasmid B, complete sequence.
 CP000146       105284  Rhodobacter sphaeroides 2.4.1 plasmid C, complete sequence.
 CP000147       100828  Rhodobacter sphaeroides 2.4.1 plasmid D, complete sequence.
 DQ232587       37100   Rhodobacter sphaeroides 2.4.1 plasmid E, partial sequence.
                4603060 total 
- Reads (90X):
 .          readLen  insLen  orientation  #reads     readCvg  SRA runs
 frag       101      180     innie        2,050,868  45X      SRR081522
 shortjump  101      3500    outie        2,050,868  45X      SRR034528
- SRA traces
 SRX033397  pair lib    ; readLen=101 ; insMea=180
 SRX016063  jumping lib ; readLen=101 ; insMea~=3455; ~15% of the mates are short inserts (~250bp)
- Original read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminap/frag_1.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminap/frag_2.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminaj/short_1.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminaj/short_2.fastq
- Quake corrected read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/frag_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/frag_2.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/short_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/short_2.cor.fastq
- QuakeIter2 corrected read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/frag_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/frag_2.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/short_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/short_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/frag_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/frag_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/short_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/short_2.cor.fasta
- k_unitig corrected files:
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/frag_1.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/frag_2.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/short_1.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/short_2.cor.seq
Assembly
- Assembly directories:
 allpaths.orig                 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpaths
 CA.orig                       /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/CA.orig
 CA.quakeCor                   /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/CA.quakeCor.k18
 CA.allpathsCor                /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/CA.allpathsCor
 CA.SuperReads                 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/CA.SuperReads.latest
 SOAPdenovo.orig(K=31)         /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.orig/K31
 SOAPdenovo.orig(K=47)         /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.orig
 SOAPdenovo.quakeCor(K=31)     /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.quakeCor.k18/K31
 SOAPdenovo.quakeCor(K=47)     /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.quakeCor.k18
 SOAPdenovo.allpathsCor(K=31)  /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.allpathsCor/K31
 SOAPdenovo.allpathsCor(K=47)  /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SOAPdenovo.allpathsCor
 velvet.orig                   /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/velvet.orig
 velvet.quakeCor               /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/velvet.quakeCor
 velvet.allpathsCor            /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/velvet.allpathsCor
 ABYSS.quakeCor                /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/ABYSS.K31.quakeCor.k18
 SGA.orig                      /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/SGA.quakeCor.k18
Human, a single chromosome, medium-sized
Data
- Latest online assembly
 ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/seq/
 NC_000014.8  107,349,540  # total, with telomeric N's
               88,289,540  # clean
- Human bowtie indexes
/fs/szdata/bowtie_indexes/h_sapiens_37_asm
- Chr14 filtered reads (69.3X):
 .          readLen  insLen       orientation  #reads      readCvg
 frag       101      155          innie        36,504,800  42
 shortjump  101      2283-2803    outie        22,669,408  26
 longjump   76-101   35295-35318  innie        2,405,064   1.3
- Illumina reads (all genome)
 Human NA12878 Genome on Illumina
 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/litesra/SRP/SRP003/SRP003680/
 ginko:/scratch1/Human_NA12878_on_Illumina/

 Lib        #Spots  #Bases  #Reads     #Mates     ReadLen  InsMea  InsStd  InsMin  InsMax  TrimReadLen                        Comments
 #Fragment (mean insert size: 155bp, SD 26), 101 bp read length
 SRR067787  82.4M   16.6G   652448124  324283604  101      155     26      77      458     -                                  Human HapMap individual NA12878 HiSeq 2000
 SRR067789  82.6M   16.7G   654133372  324876520  101      155     26      77      458     -
 SRR067780  83.3M   16.8G   660001672  328021140  101      155     26      77      458     -
 SRR067791  83.0M   16.8G   657963460  327205952  101      155     26      77      458     -
 SRR067793  77.0M   15.5G   609634756  303094956  101      155     26      77      458     -
 SRR067784  83.3M   16.8G   660118460  328244560  101      155     26      77      458     -
 SRR067785  81.6M   16.5G   646350512  321174108  101      155     26      77      458     -
 SRR067792  83.8M   16.9G   663997828  330084304  101      155     26      77      458     -
 SRR067577  46.3M   9.3G    367673108  183472948  101      155     26      77      458     -                                  Human HapMap individual NA12878 Illumina GAII
 SRR067579  46.0M   9.3G    365743380  182532676  101      155     26      77      458     -
 SRR067578  46.5M   9.4G    369557476  184410788  101      155     26      77      458     -
 #Jumping1 (mean insert size: 2283bp, SD 221), 101 bp read length
 SRR067771  81.5M   16.5G   644846296  320822716  101      2283    221     1620    2586    -                                  Human HapMap individual NA12878 HiSeq 2000
 SRR067777  82.6M   16.7G   653163608  325232944  101      2283    221     1620    2586    -
 SRR067781  82.1M   16.6G   649748720  323656576  101      2283    221     1620    2586    -
 SRR067776  79.9M   16.1G   632590344  315165892  101      2283    221     1620    2586    -
 #Jumping2 (mean insert size: 2803bp, SD 271), 101 bp read length
 SRR067773  93.1M   18.8G   736456192  366884512  101      2803    271     1990    3106    -                                  Human HapMap individual NA12878 HiSeq 2000
 SRR067779  94.0M   19.0G   743564440  370214028  101      2803    271     1990    3106    -
 SRR067778  97.3M   19.6G   767984324  381879652  101      2803    271     1990    3106    -
 SRR067786  94.6M   19.1G   747631104  372002548  101      2803    271     1990    3106    -
 #Fosmid1 (mean insert size: 35295bp, SD 2703), 76 bp read length
 SRR068214  13.1M   2.0G    104505420  52087176   76       35295   2703    27186   35523   36(trim 20bp at 5',20bp at 3')     Human HapMap individual NA12878 Illumina GAII
 SRR068211  4.8M    736.9M  38612196   19252408   76       35295   2703    27186   35523   36(trim 20bp at 5',20bp at 3')     Human HapMap individual NA12878 Illumina GAII
 #Fosmid2 (mean insert size: 35318bp, SD 2759), 101 bp read length
 SRR068335  67.4M   13.6G   533805860  265481252  101      35318   2759    27041   35621   61(trim 20bp at 5',20bp at 3')     Human HapMap individual NA12878 HiSeq 2000
- Comments
- Human chromosome 14. The chromosome may change, but this is a new data set with 100X coverage in 100bp and 76bp reads, just assembled by the Broad group using Allpaths-LG and Soap. We've downloaded the data and Todd is going to create a data set representing just chr 14, to make it feasible. We'll then try to assemble that data w/all 3 assemblers: CA, SOAP, Allpaths-LG.
 
- Illumina chr14 reads (aligned with bowtie & corrected)
 /fs/szattic-asmg8/treangen/*fastq
 hard to align: bowtie -5 20 -3 20 -e 1000 ...
 jumping reads: only the pairs aligned within the correct mean and stdev were selected; these libraries usually have a high % of short inserts!
- Original read files:
 /fs/szattic-asmg8/treangen/chr14_fragment_1.fastq
 /fs/szattic-asmg8/treangen/chr14_fragment_2.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_1.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_2.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_1.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_2.fastq
- Quake corrected files:
 /fs/szattic-asmg8/treangen/chr14_fragment_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_fragment_2.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_2.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_fragment_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_fragment_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_shortjump_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_shortjump_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_longjump_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_longjump_2.cor.fasta
Assembly
- Assembly directories
 allpaths                      /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpaths
 CA.allpathsCor                /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/CA.allpathsCor , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/CA.allpathsCor
 CA.quakeCor                   /fs/szattic-asmg8/tmagoc/GAGE/human
 CA.SuperReads                 ginkgo:/scratch1/dpuiu/HTS/Homo_sapiens/Assembly/CA.SuperReads
 SOAPdenovo.orig(K=47)         /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.orig/
 SOAPdenovo.quakeCor(K=31)     /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.quakeCor/K31 , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.quakeCor/K31
 SOAPdenovo.quakeCor(K=47)     /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.quakeCor/ , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.quakeCor
 SOAPdenovo.allpathsCor(K=31)  /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.allpathsCor/K31 , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.allpathsCor/K31
 SOAPdenovo.allpathsCor(K=47)  /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.allpathsCor , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/SOAPdenovo.allpathsCor
 velvet.quakeCor               /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/velvet.quakeCor
 ABYSS.quakeCor                /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/ABYSS.K31.quakeCor.K18 , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/ABYSS.K31.quakeCor.K18
 SGA.orig                      /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/SGA.orig , /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/SGA.orig
Allpaths-lg
- Read counts
 .                         orig      cor                cor(paired,all >64bp)
 chr14_fragment_12.fastq   36504800  35571477 (97.44%)  34268444 (10+bp ovl F/R)
 chr14_shortjump_12.fastq  22669408  11255320 (49.64%)  11255320
 chr14_longjump_12.fastq   2405064   187398 (7.79%)     187398
- Assembly stats:
 .        elem  min    q1     q2     q3      max       mean     n50       sum
 scf      418   96     131    256    1236    81646936  209781   81646936  87688255
 scf10K+  17    10330  11780  26536  269876  81646936  5135452  81646936  87302692
 ctg      4722  96     2342   9101   24174   240773    17887    36530     84461065
- Runtime
 1104299.893u 126549.756s 18:50:05.80 1815.2% 0+0k 0+0io 8463pf+0w
 18hr 50min                      : multiprocessor
 1104299/(3600*24) = 12.78 days  : singleprocessor
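The runtime line above looks like csh/tcsh `time` output: user CPU seconds, system CPU seconds, elapsed wall-clock time, then CPU utilization. A minimal sketch of the conversions behind the "12.78 days" figure, with hypothetical helper names (cpu_days, hms_to_seconds, cpu_percent). The percentage is rounded here, so its last digit can differ from what `time` prints.

```shell
#!/bin/sh
# cpu_days SECONDS: convert CPU seconds to days (two decimals).
cpu_days() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / (3600 * 24) }'
}

# hms_to_seconds H:MM:SS[.ss]: convert elapsed time to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.1f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# cpu_percent USER_SECONDS SYS_SECONDS WALL_SECONDS:
# (user + system) / wall, as a whole percentage.
cpu_percent() {
    awk -v u="$1" -v s="$2" -v w="$3" 'BEGIN { printf "%.0f\n", 100 * (u + s) / w }'
}

wall=$(hms_to_seconds 18:50:05.80)
cpu_days 1104299.893                       # single-processor equivalent, in days
cpu_percent 1104299.893 126549.756 "$wall" # average CPU utilization, in percent
```

A utilization of roughly 1800% corresponds to about 18 cores kept busy on average over the 18hr 50min run.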
Argentine ant
Data
 .         #reads       readLen  readCvg
 Shotgun:  39,741,216   75       12
 3kb:      46,435,880   75       13
 8kb:      43,839,748   75       13
 Total:    130,016,844  75       40
- Location
/fs/szattic-asmg7/argentine_ant/Illumina/
UC Assemblathon1
- Evolver
- Read simulator
- Data download
- speciesA.diploid.fa len
 chr0_1         76252953
 chr0_2         76285600
 chr1_1         18509915
 chr1_2         18539192
 chr2_1         17699484
 chr2_2         17710169
UC Assemblathon1
...