Short read sequencing: Difference between revisions
Revision as of 14:54, 6 October 2008
Articles
07/11/2008 De novo short read assembly
Velvet
- Velvet: Algorithms for De Novo Short Read Assembly Using De Bruijn Graphs Genome Res. Mar 2008
- Project Web Site at EBI. Current version: Velvet 0.6 (02/06/2008)
Edena
- De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer Genome Res. Apr 2008
- Project Web Site. Current version: 2.1.1 (03/17/2008)
ALLPATHS
- ALLPATHS : De novo assembly of whole-genome shotgun microreads Genome Res. Mar 2008
Others
- SSAKE
- VCAKE
- SHARCGS
- SeqMan Genome Assembler (SMGA) by DNAStar (commercial)
07/04/2008 4th of July: no meeting
06/27/2008 Consensus calling and Structural variation
- Consensus generation and variant detection by Celera Assembler.
- Detecting Structural Variations, Brudno et al.
06/20/2008 Read Mapping Software
BLAT
- BLAT—The BLAST-Like Alignment Tool, Genome Research 2002
- FAQ
- Can align any type of reads
- Can do nt:aa translation
- Command: blat
MAQ
- Maq Sourceforge
- Maq Poster from Sanger
- Supports Illumina-Solexa and AB SOLiD reads, not 454 or capillary reads
- Uses FASTQ format
- Command: maq map ...
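Since MAQ takes FASTQ input, it can help to sanity-check a read file before mapping. The following is a minimal sketch only, assuming the common four-lines-per-record FASTQ layout (header starting with `@`, sequence, separator starting with `+`, quality string). The function name `read_fastq` is illustrative and is not part of MAQ.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumption: sequences and qualities are not wrapped across lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Small in-memory example with two made-up reads.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
records = list(read_fastq(example))
```

Reading by fixed groups of four lines avoids misinterpreting a quality string that happens to begin with `@` as a new record header.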
RMAP
- RMAP : designed for Illumina-Solexa
- Command: rmap
SHRiMP
- Web site
- Commands: rmapper-cs , rmapper-ls, ...
SeqMap
- SeqMap developed at Stanford
- allows up to five mixed substitutions and inserted/deleted nucleotides in the mapping
- allows sequences to contain N’s, and to have unequal lengths
./seqmap
Usage: seqmap <number of mismatches> <probe FASTA file name> <transcript FASTA file name> <output file name> [options]
Parameters:
  <number of mismatches>        maximum edit distance allowed
  <probe FASTA file name>       probe/tag/read sequences
  <transcript FASTA file name>  reference sequences
  <output file name>            name of the output file
  ...
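The positional argument order in the usage message above can be captured in a small helper for scripted runs. This is a sketch under stated assumptions: the file names (`reads.fa`, `reference.fa`, `hits.txt`) and the helper name `seqmap_command` are hypothetical, and no options beyond the four positional arguments shown in the usage text are assumed.

```python
from typing import List, Sequence


def seqmap_command(
    max_mismatches: int,
    probes_fasta: str,
    reference_fasta: str,
    output_path: str,
    extra_options: Sequence[str] = (),
    executable: str = "./seqmap",
) -> List[str]:
    """Build an argument list following the usage message:
    seqmap <number of mismatches> <probe FASTA> <transcript FASTA> <output> [options]
    """
    if max_mismatches < 0:
        raise ValueError("number of mismatches must be non-negative")
    # extra_options is passed through untouched; this sketch assumes no option names.
    return [
        executable,
        str(max_mismatches),
        probes_fasta,
        reference_fasta,
        output_path,
        *extra_options,
    ]


# Hypothetical file names, for illustration only.
cmd = seqmap_command(2, "reads.fa", "reference.fa", "hits.txt")
# To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets it be passed directly to `subprocess.run` without shell quoting concerns.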
SOAP
- Web site (China)
- SOAP: short oligonucleotide alignment program, Bioinformatics Jan 2008
- Commands: soap, soap.contig, soap_dealign, soap.huge, soap.short
- Can use quality values, trim reads, use paired ends, and do RNA alignments
SSAHA
- Web site (Sanger)
- Focused on exact, nearly exact matches
- May not find all exact matches (to be confirmed)
- Example: with Solexa 33 bp reads, ~30% of reads are not found
06/13/2008 Genome Resequencing
- The complete genome of an individual by massively parallel DNA sequencing (J.Watson's genome) Nature April 2008
- J.Watson's genome (supplementary info)
Links
Data
Solexa
- Strep suis Solexa at Sanger 36bp, ~49X coverage
- Staphylococcus aureus strain MW2 (edena paper) 35bp, ~47X coverage
- Pseudomonas aeruginosa: 33bp, ~43X coverage
- Pseudomonas syringae: 32bp, ~31X coverage
- 1000 Genomes (June 14th 2008): 47bp
Accession  #Runs  Instrument                 Center  Study                        [Individual]
SRA000303  41     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12878
SRA000304  49     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12891
SRA000305  56     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12892
SRA000307  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA10851
SRA000308  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11993
SRA000309  3      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11995
SRA000310  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12006
SRA000311  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12044
SRA000312  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12156
SRA000313  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12414
SRA000314  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12776
SRA000315  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12828
SRA000316  12     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12878
SRA000317  8      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12891
SRA000318  14     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12892
SRA000319  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12004
June 14th 2008 to Sept 19th 2008
SRA001100  23  Illumina Genome Analyzer     BGI    1000Genomes Project Pilot 2  NA19240
...
SRA002029  1   Illumina Genome Analyzer II  WUGSC  1000Genomes Project Pilot 2  NA19239
/fs/szdata/Solexa/1000genomes
- Example SRR001113.seq :
7,058,926 47 bp sequences
2,402,398 contain at least 1 '.'
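Counts like the ones above can be reproduced with a short script. This is a sketch only: the layout of the `.seq` files is not described on this page, so the code assumes one read per line with optional FASTA-style `>` header lines, and the name `count_ambiguous` is illustrative.

```python
import io
from typing import Iterable, Tuple


def count_ambiguous(lines: Iterable[str], placeholder: str = ".") -> Tuple[int, int]:
    """Return (total reads, reads containing at least one placeholder character).

    Assumptions: each non-header line holds one complete read, and header
    lines (if any) start with '>'.
    """
    total = 0
    flagged = 0
    for line in lines:
        seq = line.strip()
        if not seq or seq.startswith(">"):
            continue  # skip blank lines and headers
        total += 1
        if placeholder in seq:
            flagged += 1
    return total, flagged


# Made-up in-memory example: three reads, two of which contain '.'.
demo = io.StringIO(">a\nACGT\n>b\nAC.T\n>c\n....\n")
total_reads, reads_with_dot = count_ambiguous(demo)
```

With the figures quoted for SRR001113.seq, 2,402,398 of 7,058,926 reads is about 34% of reads containing at least one uncalled base.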
454
- 1000 Genomes
June 14th 2008
Accession  #Runs  Instrument  Center  Study                        [Individual]
SRA000302  121    454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001032  2      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001036  1      454 GS FLX  BCM     1000Genomes Project Pilot 1  NA12812
SRA001094  1      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
June 14th 2008 to Sept 19th 2008
SRA001037  2  454 GS FLX  BCM  1000Genomes Project Pilot 1  NA12812
...
SRA001819  1  454 GS FLX  BCM  1000Genomes Project Pilot 2  NA12878
Refseq
- /fs/szdata/genomes/human_ncbi_build36/ NCBI build36.1 May 2006 (Current build is 36.3 March 2008)
- /fs/szdata/genomes/human_celera_2001_Orig/
Software @ CBCB
Under /fs/sz-user-supported/Linux-x86_64/bin/
De novo assembly
- edena
- ssake
- velveth, velvetg
Read mapping
- blat
- maq
- soap