Dpuiu Assemblathon
Revision as of 18:11, 2 June 2011
Links
- The Assemblathon: University of California, Santa Cruz & UC Davis; synthetic & real genome.
- De Novo Genome Assembly Assessment Project (dnGASP): Centro Nacional de Análisis Genómico in Barcelona, Spain, synthetic genome.
- Genome Assembly Gold-Standard Evaluation (GAGE): UC Berkeley and the University of Maryland
GAGE
- Location
http://gage.cbcb.umd.edu/ -> /fs/web-cbcb-new/html/gage
- Answer the following questions:
- How much sequencing coverage do I need for my genome project?
- What can I expect the resulting assembly to look like?
- Which assembly software should I use?
- What parameters should I use when I run the software?
Assemblers
 * Allpaths-LG  /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/allpaths3-35218/
 * CA           /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/wgs-6.1/
 * Velvet       /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/velvet_1.0.13/
 * SOAPdenovo   /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/SOAPdenovo_Release1.04
 * MSR-CA       Maryland Super-Reads + Celera Assembler.
 * ABYSS        /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/abyss-1.2.7/
 * SGA          /fs/szdevel/core-cbcb-software/Linux-x86_64/packages/sga/src/SGA/   # Version: 0.9.8
CBCB genomes
- A bacterial genome. Instead of E. coli, we can use S. aureus USA300, which has 454 and Illumina sequence data in SRA, both paired and unpaired. Daniela has already assembled it using CA, Newbler, Velvet, SOAPdenovo, and Maq (in its comparative assembly mode, which aligns reads to a reference).
- A medium-sized eukaryote. I'd like to use the Argentine ant or the Bombus impatiens bee; I've just written to Gene Robinson to ask about the bee.
- Another eukaryote, ideally a larger one. Human would be great, but we don't have enough time to do multiple human assemblies. So maybe another insect, or perhaps a plant if we can find one with available data.
If we can agree on the data sets, the next step is to design the experiment: decide in advance which assemblers to run and how many ways to try each one. I'm thinking we should also trim all the data with Quake.
Argentine ant
Bombus impatiens
Data
- Estimated haploid genome size: 250 Mbp
- 497,318,144 Illumina 124bp reads (246X coverage)
- Reads:
 .      readLen  orientation  insLen  #reads       readCvg  comments
 frag   124      innie        400     303,118,594  150X     6 libs
 short  124      outie        3-8K    194,199,550  96X      2 libs
- Issue: adapters in the 3k and 8k libraries:
 C  CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
 3  CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
 5  GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG
- Read directories:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_[12356789]_[12]_sequence.txt                        # original fastq files
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[129]_[012]_sequence.cor.rev.txt    # adapter-free corrected reads (long inserts)
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_[35678]_[012]_sequence.cor.txt      # corrected reads (short inserts)
- Original read files:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_1_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_1_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_2_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_2_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_3_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_3_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_5_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_5_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_6_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_6_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_7_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_7_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_8_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_8_2_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_9_1_sequence.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/s_9_2_sequence.txt
- Quake corrected files:
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_1_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_1_2_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_2_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_2_2_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_3_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_3_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_5_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_5_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_6_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_6_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_7_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_7_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_8_1_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_8_2_sequence.cor.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_9_1_sequence.cor.rev.txt
 /fs/szattic-asmg4/Bees/Bombus_impatiens/error_free/fastq/s_9_2_sequence.cor.rev.txt
- k_unitig corrected files: (in progress --Dpuiu 10:38, 5 April 2011 (EDT))
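One simple way to handle the adapter issue in the 3k/8k libraries is to clip each read at the first place where one of the C, 3 or 5 sequences listed above starts. Below is a minimal Python sketch of that approach. It assumes exact matching of a 12 bp seed, taken from the first 12 bases of each adapter or of its reverse complement. The seed length and all function names are illustrative. This is not necessarily how the adapter-free .cor.rev.txt files were produced. Dedicated trimmers also tolerate sequencing errors and adapter fragments shorter than the seed at the 3' end.

```python
from typing import List, Tuple

# Sequences listed for the 3k and 8k libraries (labels C, 3, 5 as on this page).
ADAPTERS = {
    "C": "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA",
    "3": "CGGCATTCCTGCTGAACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    "5": "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_seeds(seed_len: int = 12) -> List[str]:
    """First seed_len bases of each adapter, on both strands.
    Exact seed matching is a simplifying assumption of this sketch."""
    seeds = []
    for adapter in ADAPTERS.values():
        seeds.append(adapter[:seed_len])
        seeds.append(revcomp(adapter)[:seed_len])
    return seeds


def clip_at_adapter(seq: str, qual: str, seed_len: int = 12) -> Tuple[str, str]:
    """Clip a read and its quality string at the earliest exact seed hit.
    Reads without any hit are returned unchanged."""
    cut = len(seq)
    for seed in adapter_seeds(seed_len):
        pos = seq.find(seed)
        if pos != -1:
            cut = min(cut, pos)
    return seq[:cut], qual[:cut]


read = "ACGTACGTAC" + "CGTAATAACTTCGTATAG" + "TTTT"
print(clip_at_adapter(read, "I" * len(read)))  # ('ACGTACGTAC', 'IIIIIIIIII')
```

Clipping the quality string together with the sequence keeps each FASTQ record consistent. Reads that become too short after clipping would normally be dropped before assembly.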
Assembly
- Bombus_impatiens.assembly.summary
- Assembly directories:
 /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/CA.s_1-8.cor.redo2/        # best Celera Assembly
 /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.s_1-9.cor/       # best SOAPdenovo assembly (2010)
 /fs/szattic-asmg5/Bees/Bombus_impatiens/Assembly/SOAPdenovo.K47.s_1-9.cor    # best SOAPdenovo assembly (2011)
Staph aureus USA300
Data
- Complete genome:
 id             len     description
 NC_010079      2872915 Staphylococcus aureus subsp. aureus USA300_TCH1516, complete genome
 NC_010063.1    27041   Staphylococcus aureus subsp. aureus USA300_TCH1516 plasmid pUSA300HOUMR, complete sequence
 NC_012417.1    3125    Staphylococcus aureus subsp. aureus USA300_TCH1516 plasmid pUSA01-HOU, complete sequence
                2903081 total 
- Reads (90X):
 .          readLen  insLen  orientation  #reads     readCvg  SRA runs
 frag       101      180     innie        1,294,104  45X      SRR022868
 shortjump  37       3500    outie        3,494,070  45X      SRR022865
 SRP001086  Staphylococcus aureus Sequencing on Illumina
 SRX007714  pair lib
 SRX007711  jumping lib
- Read directories:
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/
- Original read files:
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/frag_1.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminap100/frag_2.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/short_1.fastq
 /nfshomes/dpuiu/HTS/Staphylococcus_aureus/Data/Illuminaj/short_2.fastq
- Quake corrected files:
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/frag_1.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/frag_2.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/short_1.cor.fastq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/quake/short_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/frag_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/frag_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/short_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/allpathsCor/short_2.cor.fasta
- k_unitig corrected files:
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/frag_1.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/frag_2.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/short_1.cor.seq
 /nfshomes/dpuiu/GAGE/Staphylococcus_aureus/Illumina.180_45X.3500_45X/k_unitig/short_2.cor.seq
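The readCvg values in the S. aureus Reads table match the usual definition: coverage = #reads * readLen / genome length. Below is a quick Python check using the 2,903,081 bp total from the genome table. The helper name is illustrative.

```python
def read_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


GENOME_LEN = 2_903_081  # chromosome + 2 plasmids ("total" row of the genome table)

# name: (#reads, readLen), copied from the Reads table
STAPH_LIBS = {
    "frag": (1_294_104, 101),
    "shortjump": (3_494_070, 37),
}

total = 0.0
for name, (n_reads, read_len) in STAPH_LIBS.items():
    cov = read_coverage(n_reads, read_len, GENOME_LEN)
    total += cov
    print(f"{name:10s} {cov:5.1f}X")
print(f"{'total':10s} {total:5.1f}X")
```

This prints about 45X for each library and roughly 90X combined, matching the table. The same formula applied to the chr14 fragment and short-jump reads (101 bp, 88,289,540 clean bases) gives about 42X and 26X.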
Assembly
- Staphylococcus_aureus.genome.summary
- Assembly directories:
 ~dpuiu/GAGE/Staphylococcus_aureus/
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X
 /fs/szattic-asmg5/dpuiu/HTS/Staphylococcus_aureus/Illumina.180_45X.3500_45X/
- SOAPdenovo v1.05:
- The new Quake version did not help much (quake-0.2.2 vs. davek44-error_correction-28dbe11).
- SOAPdenovo map with -K 37 or larger fails on the quakeCor.k18 corrected reads.
- Judging by kmerFreq, we should probably not use -K > 47.
- Longer k-mer => longer scaffolds (K=63: largest scaffold N50).
- Longer k-mer => shorter contigs (K=31: largest contig N50).
- K >= 40 is too large: there is no "valley" in the kmerFreq histogram.
 
 paste SOAPdenovo.K??.quakeCor.k18/genome.K??.kmerFreq | nl0 | head
 paste SOAPdenovo.K??.allpathsCor/genome.K??.kmerFreq | nl0 | more
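Large K can lose the "valley" for a simple reason. For reads of length L at base coverage C, the expected k-mer coverage is about C*(L-K+1)/L. As K grows, the peak of genuine k-mers therefore moves down toward the error spike at low multiplicity until the two merge. Below is a minimal Python sketch for locating the valley and the peak. It assumes the histogram is already loaded into a list where hist[i] is the number of distinct k-mers seen i times; the layout of the kmerFreq files is not checked here. The function names and the example numbers are made up.

```python
from typing import Optional, Sequence


def find_valley(hist: Sequence[int], start: int = 1) -> Optional[int]:
    """Multiplicity at which the histogram first turns upward again.

    hist[i] = number of distinct k-mers seen i times. Walking right from
    `start`, the first i with hist[i + 1] > hist[i] is taken as the valley.
    Returns None if the counts never rise again (no valley)."""
    for i in range(start, len(hist) - 1):
        if hist[i + 1] > hist[i]:
            return i
    return None


def find_peak(hist: Sequence[int], valley: int) -> int:
    """Multiplicity with the most distinct k-mers at or beyond the valley."""
    return max(range(valley, len(hist)), key=lambda i: hist[i])


hist = [0, 1000, 300, 80, 40, 60, 120, 200, 150, 90, 30]  # made-up example
valley = find_valley(hist)
print(valley, find_peak(hist, valley))  # 4 7
print(find_valley([0, 1000, 500, 200, 100, 50, 20]))  # None: no valley
```

On real data a noisy high-multiplicity tail can create spurious minima. Smoothing the counts or requiring a minimum peak height would make this more robust.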
Rhodobacter sphaeroides
Data
- Complete genome: 2 chromosomes, 5 plasmids
 id             len     description
 CP000143       3188609 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence.
 CP000144       943016  Rhodobacter sphaeroides 2.4.1 chromosome 2, complete sequence.
 DQ232586       114045  Rhodobacter sphaeroides 2.4.1 plasmid A, partial sequence.
 CP000145       114178  Rhodobacter sphaeroides 2.4.1 plasmid B, complete sequence.
 CP000146       105284  Rhodobacter sphaeroides 2.4.1 plasmid C, complete sequence.
 CP000147       100828  Rhodobacter sphaeroides 2.4.1 plasmid D, complete sequence.
 DQ232587       37100   Rhodobacter sphaeroides 2.4.1 plasmid E, partial sequence.
                4603060 total 
- Reads (90X):
 .          readLen  insLen  orientation  #reads     readCvg  SRA runs
 frag       101      180     innie        2,050,868  45X      SRR081522
 shortjump  101      3500    outie        2,050,868  45X      SRR034528
- SRA traces
 SRX033397  pair lib;    readLen=101; insMea=180
 SRX016063  jumping lib; readLen=101; insMea~=3455; ~15% of the mates are short inserts (~250bp)
- Original read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminap/frag_1.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminap/frag_2.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminaj/short_1.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Data/Illuminaj/short_2.fastq
- Quake corrected read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/frag_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/frag_2.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/short_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/short_2.cor.fastq
- QuakeIter2 corrected read files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/frag_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/frag_2.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/short_1.cor.fastq
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/quake/iter2_dk/short_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/frag_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/frag_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/short_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/allpathsCor/short_2.cor.fasta
- k_unitig corrected files:
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/frag_1.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/frag_2.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/short_1.cor.seq
 /nfshomes/dpuiu/GAGE/Rhodobacter_sphaeroides//Illumina.180_45X.3500_45X/k_unitig/short_2.cor.seq
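A figure like "~15% of the mates are short inserts" for SRX016063 can be estimated by aligning both mates to the reference and classifying each pair by orientation and span. Below is a minimal Python sketch. It assumes the alignments have already been reduced to a leftmost position, strand and aligned length per mate. The Hit type, the function names and the 1,000 bp cut-off are hypothetical. The sketch relies on the observation that short-insert contaminants in an outie jumping library usually appear as innie pairs with a small span.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One mate's alignment (hypothetical, pre-digested input format)."""
    pos: int      # leftmost aligned reference coordinate, 0-based
    strand: str   # '+' or '-'
    length: int   # aligned length in bp


def classify_pair(a: Hit, b: Hit) -> Tuple[str, int]:
    """Orientation and span of two mates aligned to the same sequence.

    orientation: 'innie' (-> <-), 'outie' (<- ->) or 'same' (same strand);
    span: distance from the leftmost to the rightmost aligned base."""
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    span = max(left.pos + left.length, right.pos + right.length) - left.pos
    if left.strand == right.strand:
        return "same", span
    return ("innie" if left.strand == "+" else "outie"), span


def short_insert_fraction(pairs: Iterable[Tuple[Hit, Hit]], max_short: int = 1000) -> float:
    """Fraction of pairs that look like short-insert contaminants:
    innie orientation and span <= max_short (the cut-off is an assumption)."""
    counts: Counter = Counter()
    for a, b in pairs:
        orientation, span = classify_pair(a, b)
        is_short = orientation == "innie" and span <= max_short
        counts["short" if is_short else "other"] += 1
    total = counts["short"] + counts["other"]
    return counts["short"] / total if total else 0.0


pairs = [
    (Hit(1000, "-", 101), Hit(4300, "+", 101)),  # outie, span 3401: a real jump
    (Hit(5150, "-", 101), Hit(5000, "+", 101)),  # innie, span 251: short insert
]
print(short_insert_fraction(pairs))  # 0.5
```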
Assembly
- Assembly directories:
 /fs/szattic-asmg5/dpuiu/HTS/Rhodobacter_sphaeroides/Illumina.180_45X.3500_45X/
 genome9.umd.edu:/genome9/raid/alekseyz/rhodobacter/assembly/complete_assembly/   # SuperRead technique shown by Aleksey (May 3 2011)
 genome9.umd.edu:/home/dpuiu/GAGE/Rhodobacter_sphaeroides/Assembly/SuperReads/
Human, a single chromosome, medium-sized
Data
- Latest online assembly
 ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/seq/
 NC_000014.8  107,349,540  # total, with telomeric N's
              88,289,540   # clean
- Human bowtie indexes
/fs/szdata/bowtie_indexes/h_sapiens_37_asm
- Chr14 filtered reads (69.3X):
 .          readLen  insLen       orientation  #reads      readCvg
 frag       101      155          innie        36,504,800  42
 shortjump  101      2283-2803    outie        22,669,408  26
 longjump   76-101   35295-35318  innie        2,405,064   1.3
- Illumina reads (all genome)
 Human NA12878 Genome on Illumina
 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/litesra/SRP/SRP003/SRP003680/
 ginko:/scratch1/Human_NA12878_on_Illumina/
 #Fragment (mean insert size: 155bp, SD 26), 101 bp read length
 Lib        #Spots  #Bases  #Reads     #Mates     ReadLen  InsMea  InStd  InsMin  InsMax  TrimReadLen                     Comments
 SRR067787  82.4M   16.6G   652448124  324283604  101      155     26     77      458                                     Human HapMap individual NA12878 HiSeq 2000
 SRR067789  82.6M   16.7G   654133372  324876520  101      155     26     77      458
 SRR067780  83.3M   16.8G   660001672  328021140  101      155     26     77      458
 SRR067791  83.0M   16.8G   657963460  327205952  101      155     26     77      458
 SRR067793  77.0M   15.5G   609634756  303094956  101      155     26     77      458
 SRR067784  83.3M   16.8G   660118460  328244560  101      155     26     77      458
 SRR067785  81.6M   16.5G   646350512  321174108  101      155     26     77      458
 SRR067792  83.8M   16.9G   663997828  330084304  101      155     26     77      458
 SRR067577  46.3M   9.3G    367673108  183472948  101      155     26     77      458                                     Human HapMap individual NA12878 Illumina GAII
 SRR067579  46.0M   9.3G    365743380  182532676  101      155     26     77      458
 SRR067578  46.5M   9.4G    369557476  184410788  101      155     26     77      458
 #Jumping1 (mean insert size: 2283bp, SD 221), 101 bp read length
 SRR067771  81.5M   16.5G   644846296  320822716  101      2283    221    1620    2586                                    Human HapMap individual NA12878 HiSeq 2000
 SRR067777  82.6M   16.7G   653163608  325232944  101      2283    221    1620    2586
 SRR067781  82.1M   16.6G   649748720  323656576  101      2283    221    1620    2586
 SRR067776  79.9M   16.1G   632590344  315165892  101      2283    221    1620    2586
 #Jumping2 (mean insert size: 2803bp, SD 271), 101 bp read length
 SRR067773  93.1M   18.8G   736456192  366884512  101      2803    271    1990    3106                                    Human HapMap individual NA12878 HiSeq 2000
 SRR067779  94.0M   19.0G   743564440  370214028  101      2803    271    1990    3106
 SRR067778  97.3M   19.6G   767984324  381879652  101      2803    271    1990    3106
 SRR067786  94.6M   19.1G   747631104  372002548  101      2803    271    1990    3106
 #Fosmid1 (mean insert size: 35295bp, SD 2703), 76 bp read length
 SRR068214  13.1M   2.0G    104505420  52087176   76       35295   2703   27186   35523   36(trim 20bp at 5',20bp at 3')  Human HapMap individual NA12878 Illumina GAII
 SRR068211  4.8M    736.9M  38612196   19252408   76       35295   2703   27186   35523   36(trim 20bp at 5',20bp at 3')  Human HapMap individual NA12878 Illumina GAII
 #Fosmid2 (mean insert size: 35318bp, SD 2759), 101 bp read length
 SRR068335  67.4M   13.6G   533805860  265481252  101      35318   2759   27041   35621   61(trim 20bp at 5',20bp at 3')  Human HapMap individual NA12878 HiSeq 2000
- Comments
- Human chromosome 14. The chromosome may change, but this is a new data set with 100X coverage in 100bp and 76bp reads, just assembled by the Broad group using Allpaths-LG and SOAPdenovo. We've downloaded the data, and Todd is going to create a data set representing just chr14 to make the assembly feasible. We'll then try to assemble that data with all three assemblers: CA, SOAPdenovo, and Allpaths-LG.
 
- Illumina chr14 reads (aligned with bowtie & corrected)
 /fs/szattic-asmg8/treangen/*fastq
 hard to align: bowtie -5 20 -3 20 -e 1000 ...
 jumping reads: only the ones aligned within the correct mean/stdev were selected; these libraries usually have a high % of short inserts!!!
- Original read files:
 /fs/szattic-asmg8/treangen/chr14_fragment_1.fastq
 /fs/szattic-asmg8/treangen/chr14_fragment_2.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_1.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_2.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_1.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_2.fastq
- Quake corrected files:
 /fs/szattic-asmg8/treangen/chr14_fragment_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_fragment_2.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_shortjump_2.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_1.cor.fastq
 /fs/szattic-asmg8/treangen/chr14_longjump_2.cor.fastq
- Allpaths-LG corrected files:
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_fragment_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_fragment_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_shortjump_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_shortjump_2.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_longjump_1.cor.fasta
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor/chr14_longjump_2.cor.fasta
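Below is a minimal Python sketch of the insert-size selection described for the jumping reads. It assumes each pair's insert size has already been computed from its bowtie alignment. It also assumes that the InsMin/InsMax columns of the NA12878 library table are the accepted window; the page does not say which window was actually used. In every row of that table, InsMin equals InsMea - 3*InStd, while InsMax is not symmetric around the mean. LIBS and keep_in_window are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (InsMea, InStd, InsMin, InsMax) per library group, copied from the table above.
LIBS: Dict[str, Tuple[int, int, int, int]] = {
    "Fragment": (155, 26, 77, 458),
    "Jumping1": (2283, 221, 1620, 2586),
    "Jumping2": (2803, 271, 1990, 3106),
    "Fosmid1": (35295, 2703, 27186, 35523),
    "Fosmid2": (35318, 2759, 27041, 35621),
}


def keep_in_window(inserts: Iterable[Tuple[str, int]], lib: str) -> Iterator[str]:
    """Yield ids of pairs whose aligned insert size lies in [InsMin, InsMax] of `lib`.
    `inserts` holds (pair_id, insert_size) tuples (hypothetical input)."""
    _, _, lo, hi = LIBS[lib]
    for pair_id, size in inserts:
        if lo <= size <= hi:
            yield pair_id


aligned = [("p1", 2300), ("p2", 250), ("p3", 1620), ("p4", 2600)]
print(list(keep_in_window(aligned, "Jumping1")))  # ['p1', 'p3']
```

A symmetric mean +/- k*stdev window would be an equally simple variant. With k=3 it reproduces InsMin but gives a larger upper bound than the InsMax column.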
Assembly
Allpaths-LG
- Read counts
 .                         orig      cor                cor (paired, all >64bp)
 chr14_fragment_12.fastq   36504800  35571477 (97.44%)  34268444 (10+bp ovl F/R)
 chr14_shortjump_12.fastq  22669408  11255320 (49.64%)  11255320
 chr14_longjump_12.fastq   2405064   187398 (7.79%)     187398
- Assembly stats:
 .        elem  min    q1     q2     q3      max       mean     n50       sum
 scf      418   96     131    256    1236    81646936  209781   81646936  87688255
 scf10K+  17    10330  11780  26536  269876  81646936  5135452  81646936  87302692
 ctg      4722  96     2342   9101   24174   240773    17887    36530     84461065
- Runtime:
 1104299.893u 126549.756s 18:50:05.80 1815.2% 0+0k 0+0io 8463pf+0w
 18hr 50min                          : multiprocessor
 1104299/(3600*24) = 12.78 days      : single processor
- Assembly directories
 /scratch1/dpuiu/HTS/Homo_sapiens/Assembly/allpaths              # original
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpaths      # final contigs, scaffolds
 /fs/szattic-asmg5/dpuiu/HTS/Homo_sapiens/Assembly/allpathsCor   # corrected reads
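The scf/ctg rows of the assembly stats report count, min, quartiles, max, mean, N50 and total length. Below is a minimal Python sketch of the non-quartile columns, given a list of sequence lengths. The function names are illustrative. The quartile columns are left out because the page doesn't say how they were computed, and the scf10K+ cut-off is assumed to mean ">= 10,000 bp".

```python
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    acc = 0
    for length in sorted(lengths, reverse=True):
        acc += length
        if 2 * acc >= total:
            return length
    return 0  # empty input


def summarize(lengths: Sequence[int], min_len: int = 0) -> Dict[str, int]:
    """elem/min/max/mean/n50/sum over sequences of at least min_len bp."""
    kept = [x for x in lengths if x >= min_len]
    if not kept:
        return {"elem": 0, "min": 0, "max": 0, "mean": 0, "n50": 0, "sum": 0}
    total = sum(kept)
    return {
        "elem": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": round(total / len(kept)),
        "n50": n50(kept),
        "sum": total,
    }


print(n50([100, 200, 300, 400]))  # 300
print(summarize([100, 200, 300, 400], min_len=250))
```

In the Allpaths-LG stats, one 81,646,936 bp scaffold accounts for more than half of the 87.7 Mbp total. That single scaffold therefore sets the N50, which is why n50 equals max in both scaffold rows.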
CA
- Directories:
 /fs/szattic-asmg4/tmagoc/GAGE/human/CA/second/   # ??? stats don't match