Short read sequencing
Revision as of 15:22, 4 December 2008
Articles
Velvet
- Velvet: Algorithms for De Novo Short Read Assembly Using De Bruijn Graphs Genome Res. Mar 2008
- Project Web Site at EBI. Current version: 0.6 (Velvet 0.6, 02/06/2008)
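Velvet's paper builds assemblies on de Bruijn graphs. The sketch below shows only the textbook form of that data structure, not Velvet's implementation: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and a contig is read off by walking a non-branching path. It assumes error-free reads on the forward strand only (Velvet also handles reverse complements and sequencing errors), and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix.

    Simplification: forward strand only, no reverse complements, no error handling.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` along a non-branching path: stop when a node
    has zero or several successors, or the next node has several predecessors."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:  # branch point or cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping error-free reads taken from the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(walk_unambiguous(de_bruijn_graph(reads, k=4), "ATG"))  # prints ATGGCGTGCA
```

With real data, repeats longer than k-1 bases create branches, which is why the walk stops at any node with more than one successor or predecessor.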
Edena
- De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer Genome Res. Apr 2008
- Project Web Site. Current version: 2.1.1 (03/17/2008)
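As we understand its paper, Edena takes an overlap-based approach that accepts only exact overlaps between reads. A brute-force sketch of that first step, finding for every ordered pair of reads the longest exact suffix-prefix overlap of at least `min_overlap` bases, could look like this. It is quadratic in the number of reads, so an assembler working on millions of reads needs an index instead; the function name is hypothetical.

```python
def exact_overlaps(reads, min_overlap):
    """Return (i, j, length) for each ordered pair of distinct reads where the
    longest suffix of reads[i] that equals a prefix of reads[j] has at least
    min_overlap bases. Full-length (containment) matches are not reported."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    overlaps = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            # Try the longest candidate first so the first hit is the best one.
            for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                if a[-length:] == b[:length]:
                    overlaps.append((i, j, length))
                    break
    return overlaps


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(exact_overlaps(reads, min_overlap=3))  # prints [(0, 1, 4), (1, 2, 4)]
```

The resulting (i, j, length) triples are the edges of an overlap graph; laying out reads along its unambiguous paths gives contigs.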
ALLPATHS
- ALLPATHS: De novo assembly of whole-genome shotgun microreads Genome Res. Mar 2008
Others
- SSAKE
- VCAKE
- SHARCGS
- SeqMan Genome Assembler (SMGA) by DNAStar (commercial)
- Euler-SR
Consensus calling and Structural variation
- Consensus generation and variant detection by Celera Assembler.
- Detecting Structural Variations, Brudno et al.
Read Mapping Software
BFAST
- Need to e-mail the author to get the code
BLAT
- BLAT—The BLAST-Like Alignment Tool, Genome Research 2002
- FAQ
- Can align any type of reads
- Can do nt:aa translation
- Command: blat
blat -noHead -t=dna -q=dna -tileSize=10 -stepSize=3 Pa.1con Pa.seq Pa.blat
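In the command above, `-tileSize` is the length of the exact k-mer match that seeds an alignment and `-stepSize` is the spacing between tiles taken from the target (Pa.1con). The sketch below illustrates only that seeding idea, under those assumptions: it indexes target tiles every `step_size` bases, looks up every overlapping k-mer of the query, and counts hits per diagonal (target offset minus query offset). Real BLAT goes on to cluster hits and extend them into full alignments; the function names are hypothetical.

```python
from collections import defaultdict


def build_tile_index(target, tile_size, step_size):
    """Map each tile (k-mer) taken every step_size bases of the target to its positions."""
    index = defaultdict(list)
    for pos in range(0, len(target) - tile_size + 1, step_size):
        index[target[pos:pos + tile_size]].append(pos)
    return index


def diagonal_hits(query, index, tile_size):
    """Look up every overlapping k-mer of the query in the tile index and count
    hits per diagonal (target_pos - query_pos); a well-supported diagonal is a
    candidate ungapped placement of the query."""
    hits = defaultdict(int)
    for qpos in range(len(query) - tile_size + 1):
        for tpos in index.get(query[qpos:qpos + tile_size], ()):
            hits[tpos - qpos] += 1
    return dict(hits)


target = "AAAACCCCGGGGTTTTACGT"
query = target[5:15]  # "CCCGGGGTTT", taken from offset 5
index = build_tile_index(target, tile_size=4, step_size=3)
print(diagonal_hits(query, index, tile_size=4))  # prints {5: 2}
```

Indexing the target sparsely (step 3 here) keeps the index small, while scanning every query offset still catches any exact match that is long enough to fully contain one indexed tile.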
MAQ
- Maq Sourceforge
- Maq Poster from Sanger
- For Illumina-Solexa and AB SOLiD reads, not 454 or capillary reads
- Uses FASTQ format
- Command: maq map ...
- Does ungapped alignment on unpaired reads
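SOLEXA:
maq.pl easyrun -d . ref.1con reads.fastq  # runs the whole Maq pipeline in one step
SOLID:
solid2fastq.pl reads_ shortname  # SOLiD reads -> FASTQ
maq fastq2bfq shortname.fastq shortname.bfq  # FASTQ -> binary BFQ
maq fasta2csfa ref.fasta > ref.csfa  # reference -> color-space FASTA
maq fasta2bfa ref.csfa ref.csbfa  # color-space reference -> binary BFA
maq fasta2bfa ref.fasta ref.bfa  # nucleotide reference -> binary BFA
maq map -c aln.cs.map ref.csbfa shortname.bfq 2> aln.log  # map reads in color space
maq csmap2nt aln.nt.map ref.bfa aln.cs.map  # convert alignments to nucleotide space
maq assemble cns.cns ref.bfa aln.nt.map 2> cns.log  # call the consensus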
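MAQ reads FASTQ, where each record is four lines: `@name`, the bases, a `+` separator, and one quality character per base. A minimal parser, assuming unwrapped records and Sanger-style Phred+33 qualities (older Solexa pipeline files used a different offset and scale and would need converting first), could look like this. `read_fastq` and `error_probability` are hypothetical helpers, not part of MAQ.

```python
def read_fastq(lines):
    """Yield (name, sequence, phred_qualities) from 4-line FASTQ records.

    Assumes no line wrapping and Phred+33 (Sanger) quality characters.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError("truncated FASTQ record")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator line, got {plus!r}")
        if len(qual) != len(seq):
            raise ValueError("sequence and quality strings differ in length")
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def error_probability(q):
    """Phred quality Q corresponds to an estimated base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


records = list(read_fastq(["@r1\n", "ACGT\n", "+\n", "II5#\n"]))
print(records)  # prints [('r1', 'ACGT', [40, 40, 20, 2])]
```

Passing an open file object works the same way, since iterating a file yields its lines.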
RMAP
- RMAP: designed for Illumina-Solexa reads
- Command: rmap
rmap -m 3 -w 33 -c Pa.1con Pa.seq -o Pa.rmap
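The RMAP command above limits the number of mismatches per read (we read `-m 3` as "at most three mismatches", which is an assumption about the flag). The underlying criterion, an ungapped placement with at most m mismatches, can be shown with a brute-force sketch; RMAP and the other mappers listed here use indexing to avoid scanning every reference position, and `mismatch_hits` is a hypothetical name.

```python
def mismatch_hits(read, reference, max_mismatches):
    """Return (position, mismatches) for every ungapped placement of `read` on the
    forward strand of `reference` with at most max_mismatches mismatches.

    Brute force: O(len(reference) * len(read)); no reverse strand, no indels.
    """
    hits = []
    width = len(read)
    for pos in range(len(reference) - width + 1):
        mismatches = 0
        for a, b in zip(read, reference[pos:pos + width]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this placement can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


print(mismatch_hits("ACGA", "ACGTACGTTT", max_mismatches=1))  # prints [(0, 1), (4, 1)]
```

Reads with several qualifying positions, as in this example, are the multi-mapping reads that mappers must either report, pick among, or discard.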
SHRiMP
- Web site
- Commands: rmapper-cs (color space), rmapper-ls (letter space), ...
SeqMap
- SeqMap developed at Stanford
- allows up to five mixed substitutions and inserted/deleted nucleotides in the mapping
- allows sequences to contain N’s, and to have unequal lengths
./seqmap
Usage: seqmap <number of mismatches> <probe FASTA file name> <transcript FASTA file name> <output file name> [options]
Parameters:
  <number of mismatches>        maximum edit distance allowed
  <probe FASTA file name>       probe/tag/read sequences
  <transcript FASTA file name>  reference sequences
  <output file name>            name of the output file
...
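SeqMap's first argument is the maximum edit distance allowed, i.e. the combined number of substitutions, insertions and deletions. The sketch below checks whether a read and one candidate reference segment are within edit distance k, using the standard dynamic-programming recurrence and stopping early once every cell of a row exceeds k. It only scores a given pair; finding candidate segments is a separate indexing step that SeqMap performs in its own way, and the function name is hypothetical.

```python
def edit_distance_within(a, b, k):
    """Return the edit (Levenshtein) distance between a and b if it is <= k, else None.

    Substitutions, insertions and deletions each cost 1.
    """
    if abs(len(a) - len(b)) > k:
        return None  # the length difference alone needs more than k indels
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
            )
        if min(cur) > k:
            return None  # row minima never decrease, so the final distance exceeds k
        prev = cur
    return prev[-1] if prev[-1] <= k else None


print(edit_distance_within("ACGT", "AGT", 1))   # prints 1 (one deletion)
print(edit_distance_within("ACGT", "TTTT", 2))  # prints None (distance is 3)
```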
SHORE
- SHORE: http://1001genomes.org/downloads/
SOAP
- Web site (China)
- Format of output
- SOAP: short oligonucleotide alignment program, Bioinformatics Jan 2008
- Commands: soap, soap.contig, soap_dealign, soap.huge, soap.short
- Can use base qualities, trim reads, use paired-end reads, and do RNA alignments
soap -v 5 -d Pa.1con -a Pa.seq -o Pa.soap
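The notes above list read trimming among SOAP's features without saying how it is done. As a generic illustration only, not SOAP's algorithm, here is the common quality-based variant: drop bases from the 3' end while their Phred quality is below a threshold, then discard reads that become too short. The threshold, minimum length and function name are hypothetical.

```python
def trim_3prime(seq, quals, min_quality, min_length=1):
    """Remove 3' bases whose Phred quality is below min_quality.

    Returns the trimmed (sequence, qualities), or None if fewer than
    min_length bases remain.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and qualities must have the same length")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], quals[:end]


print(trim_3prime("ACGTAC", [30, 30, 30, 10, 5, 2], min_quality=20))
# prints ('ACG', [30, 30, 30])
```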
SOCS
- ABI color space
socs socs.pref

more socs.pref
Req.fa
Seq_F3.csfasta
Seq_F3_QV.qual
out_prefix
2
1000
2
false
true
0
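SOCS works on ABI SOLiD color space, where each color encodes a pair of adjacent bases. With the usual 2-bit codes A=0, C=1, G=2, T=3, the color of a base pair is the XOR of the two codes, and reads are written as a leading primer base followed by color digits, as in a `.csfasta` file. The sketch below encodes and decodes that representation; the helper names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colors(seq):
    """Encode bases as color space: keep the first base, then one color digit
    per adjacent pair, computed as code(a) XOR code(b)."""
    return seq[0] + "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def from_colors(cs):
    """Decode a color-space string such as 'T3131' back to bases, starting from
    the leading base; each base is the previous base's code XOR the color."""
    bases = [cs[0]]
    for color in cs[1:]:
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


print(to_colors("TACGT"))    # prints T3131
print(from_colors("T3131"))  # prints TACGT
# One wrong color corrupts every base decoded after it:
print(from_colors("T3331"))  # prints TATAC
```

A true SNP, by contrast, changes two adjacent colors (`to_colors("TAGGT")` gives `T3201` versus `T3131`), which is why color-space aligners usually compare reads to the reference in color space rather than decoding the reads first.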
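SOLiD
- SOLiD System Analysis Pipeline Tool (Corona Lite): http://solidsoftwaretools.com/gf/project/corona/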
SSAHA
- Web site (Sanger)
- Focused on exact and nearly exact matches
- Does not appear to find all exact matches (unconfirmed)
- Example: with 33-bp Solexa reads, ~30% of the reads are not found
ZOOM
- ZOOM: http://bioinformatics.oxfordjournals.org/cgi/content/full/24/21/2431
Genome Resequencing
- The complete genome of an individual by massively parallel DNA sequencing (J.Watson's genome) Nature April 2008
- J.Watson's genome (supplementary info)
Links
Data
Solexa
- Strep suis Solexa at Sanger 36bp, ~49X coverage
- Staphylococcus aureus strain MW2 (edena paper) 35bp, ~47X coverage
- Pseudomonas aeruginosa: 33bp, ~43X coverage
- Pseudomonas syringae: 32bp, ~31X coverage
- 1000 Genomes (June 14th 2008): 47bp
Accession  #Runs  Instrument                 Center  Study                        [Individual]
SRA000303  41     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12878
SRA000304  49     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12891
SRA000305  56     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12892
SRA000307  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA10851
SRA000308  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11993
SRA000309  3      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11995
SRA000310  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12006
SRA000311  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12044
SRA000312  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12156
SRA000313  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12414
SRA000314  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12776
SRA000315  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12828
SRA000316  12     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12878
SRA000317  8      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12891
SRA000318  14     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12892
SRA000319  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12004
June 14th 2008: Sept 19th 2008
SRA001100  23     Illumina Genome Analyzer     BGI     1000Genomes Project Pilot 2  NA19240
...
SRA002029  1      Illumina Genome Analyzer II  WUGSC   1000Genomes Project Pilot 2  NA19239
/fs/szdata/Solexa/1000genomes
- Example SRR001113.seq:
7,058,926 sequences of 47 bp; 2,402,398 of them contain at least one '.' (uncalled base)
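Coverage figures like those listed above follow from c = N * L / G (N reads of length L over a genome of G bases), and the SRR001113.seq example counts reads containing '.'. A small summary routine that produces both kinds of number for a read file might look like this; it assumes a hypothetical layout with one sequence per line, so the parsing would need adjusting for the real .seq format, and the function name is hypothetical too.

```python
def summarize_reads(lines, genome_size=None):
    """Count reads, bases and reads containing at least one '.' (no-call).

    Assumes one read sequence per line (hypothetical layout). If genome_size is
    given, also reports coverage = total bases / genome size, i.e. N * L / G
    when all N reads have length L.
    """
    n_reads = total_bases = with_dot = 0
    for line in lines:
        seq = line.strip()
        if not seq:
            continue
        n_reads += 1
        total_bases += len(seq)
        if "." in seq:
            with_dot += 1
    summary = {"reads": n_reads, "bases": total_bases, "reads_with_dot": with_dot}
    if genome_size:
        summary["coverage"] = total_bases / genome_size
    return summary


print(summarize_reads(["ACGT.\n", "ACGTA\n", "..GTA\n"], genome_size=5))
# prints {'reads': 3, 'bases': 15, 'reads_with_dot': 2, 'coverage': 3.0}
```

For the counts quoted for SRR001113.seq, 7,058,926 reads of 47 bp amount to 7,058,926 * 47 = 331,769,522 bases, and 2,402,398 / 7,058,926 is about 34% of the reads.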
454
- 1000 Genomes
June 14th 2008
Accession  #Runs  Instrument  Center  Study                        [Individual]
SRA000302  121    454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001032  2      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001036  1      454 GS FLX  BCM     1000Genomes Project Pilot 1  NA12812
SRA001094  1      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
June 14th 2008: Sept 19th 2008
SRA001037  2      454 GS FLX  BCM     1000Genomes Project Pilot 1  NA12812
...
SRA001819  1      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
Refseq
- /fs/szdata/genomes/human_ncbi_build36/ NCBI build36.1 May 2006 (Current build is 36.3 March 2008)
- /fs/szdata/genomes/human_celera_2001_Orig/
Software @ CBCB
Under /fs/sz-user-supported/Linux-x86_64/bin/
De novo assembly
- edena
- ssake
- velveth, velvetg
Read mapping
- blat
- maq
- soap