Short read sequencing
Latest revision as of 15:38, 4 December 2008
DELETED !!!
Consensus calling and Structural variation
- Consensus generation and variant detection by Celera Assembler: http://www.ncbi.nlm.nih.gov/pubmed/18321888
- Detecting Structural Variations, Brudno et al.: http://compbio.cs.toronto.edu/structvar/
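Read Mapping Software
- Short-read sequence alignment programs (Wikipedia): http://en.wikipedia.org/wiki/Sequence_alignment_software#Short-Read_Sequence_Alignment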
BFAST
- Code is available only by e-mailing the author
BLAT
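- BLAT—The BLAST-Like Alignment Tool, Genome Research 2002: http://www.genome.org/cgi/reprint/GR-2292Rv1
- FAQ: http://genome.ucsc.edu/FAQ/FAQblat
- Can align any type of reads
- Can do translated (nucleotide-to-amino-acid) alignments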
- Command: blat
blat -noHead -t=dna -q=dna -tileSize=10 -stepSize=3 Pa.1con Pa.seq Pa.blat
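By default, blat writes alignments in PSL format, and -noHead suppresses the PSL header. Pa.blat should therefore hold one tab-separated, 21-column line per alignment. The minimal Python sketch below reads such lines and keeps one hit per read. Ranking hits by matches minus mismatches is a simplistic choice of this sketch, not blat's own scoring. The names PslHit, parse_psl and best_hit_per_read are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class PslHit(NamedTuple):
    """The few PSL columns this sketch uses (0-based column index in comments)."""
    matches: int      # col 0: matching bases
    mismatches: int   # col 1: mismatching bases
    strand: str       # col 8: '+' or '-'
    q_name: str       # col 9: read name
    q_size: int       # col 10: read length
    t_name: str       # col 13: reference sequence name
    t_start: int      # col 15: alignment start on the reference (0-based)
    t_end: int        # col 16: alignment end on the reference


def parse_psl(lines: Iterable[str]) -> List[PslHit]:
    """Parse header-less PSL lines (as written by blat -noHead)."""
    hits = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21:
            continue  # skip blank or truncated lines
        hits.append(PslHit(int(f[0]), int(f[1]), f[8], f[9], int(f[10]),
                           f[13], int(f[15]), int(f[16])))
    return hits


def best_hit_per_read(hits: Iterable[PslHit]) -> Dict[str, PslHit]:
    """For each read, keep the hit with the largest matches - mismatches (simplistic score)."""
    best: Dict[str, PslHit] = {}
    for h in hits:
        cur = best.get(h.q_name)
        if cur is None or h.matches - h.mismatches > cur.matches - cur.mismatches:
            best[h.q_name] = h
    return best
```

Usage, assuming the output file from the command above: `with open("Pa.blat") as fh: best = best_hit_per_read(parse_psl(fh))`.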
MAQ *
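- Maq Sourceforge: http://maq.sourceforge.net/
- Maq poster from Sanger: http://www.sanger.ac.uk/Users/lh3/maq-poster.pdf
- For Illumina-Solexa and AB SOLiD reads, not 454 or capillary reads
- Uses FASTQ format
- Command: maq map ...
- Does ungapped alignment on unpaired (single-end) reads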
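Maq reads FASTQ, so a small reader is handy for inspecting input before mapping. This sketch assumes four-line records and Sanger-style Phred+33 quality characters. Older Solexa pipeline output used a different quality encoding and would need converting first. read_fastq is a hypothetical helper and is not part of Maq.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, phred_scores) from 4-line FASTQ records.

    Assumes Sanger-style qualities: Phred score = ASCII code - 33.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it, "").strip()
        plus = next(it, "")
        qual = next(it, "").strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield header[1:].strip(), seq, [ord(c) - 33 for c in qual]
```

Because the function accepts any iterable of lines, an open file works directly: `with open("reads.fastq") as fh: records = list(read_fastq(fh))`.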
SOLEXA:
maq.pl easyrun -d . ref.1con reads.fastq
SOLiD:
solid2fastq.pl reads_ shortname
maq fastq2bfq shortname.fastq shortname.bfq
maq fasta2csfa ref.fasta > ref.csfa
maq fasta2bfa ref.csfa ref.csbfa
maq fasta2bfa ref.fasta ref.bfa
maq map -c aln.cs.map ref.csbfa shortname.bfq 2> aln.log
maq csmap2nt aln.nt.map ref.bfa aln.cs.map
maq assemble cns.cns ref.bfa aln.nt.map 2> cns.log
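The SOLiD recipe maps in color space (ref.csbfa, maq map -c) and only afterwards converts alignments back to nucleotides (csmap2nt). Each SOLiD color encodes a pair of adjacent bases. With the 2-bit codes A=0, C=1, G=2, T=3, the color is the XOR of the two codes. The sketch below shows this encoding in two forms: for a reference sequence, and for a read with a leading primer base as in csfasta files. It illustrates the idea behind fasta2csfa and is not Maq's implementation. Emitting '.' for non-ACGT bases is an assumption of the sketch, and the function names are hypothetical.

```python
# 2-bit base codes; a SOLiD color is the XOR of the codes of two adjacent bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colors(seq: str) -> str:
    """Return one color digit per adjacent base pair ('.' if either base is not ACGT)."""
    seq = seq.upper()
    return "".join(
        str(CODE[a] ^ CODE[b]) if a in CODE and b in CODE else "."
        for a, b in zip(seq, seq[1:])
    )


def to_csfasta_read(seq: str, primer: str = "T") -> str:
    """csfasta-style read: the primer base followed by the colors of primer + seq."""
    return primer + encode_colors(primer + seq)
```

For example, "ACGT" encodes to the colors "131", and the same sequence as a read with primer T becomes "T3131".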
RMAP
- RMAP (http://rulai.cshl.edu/rmap/): designed for Illumina-Solexa reads
- Command: rmap
rmap -m 3 -w 33 -c Pa.1con Pa.seq -o Pa.rmap
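In the rmap command above, -m 3 presumably sets the maximum number of mismatches and -w 33 the read width. A standard way to find ungapped hits with at most m mismatches is the pigeonhole principle. Split the read into m + 1 pieces; any hit with at most m mismatches must contain at least one piece exactly. Exact piece matches therefore give candidate positions, which are then verified. The sketch below is a brute-force illustration of that filter. It uses str.find instead of an index and searches the forward strand only. It is not claimed to be RMAP's implementation, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_bounds(read_len: int, max_mm: int) -> List[Tuple[int, int]]:
    """Split [0, read_len) into max_mm + 1 contiguous pieces of near-equal length."""
    n = max_mm + 1
    base, extra = divmod(read_len, n)
    bounds, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def map_read(read: str, ref: str, max_mm: int) -> List[Tuple[int, int]]:
    """All (start, mismatches) for ungapped forward-strand hits with <= max_mm mismatches."""
    w = len(read)
    if w <= max_mm:
        raise ValueError("read must be longer than max_mm")
    candidates: Set[int] = set()
    for s, e in seed_bounds(w, max_mm):
        piece = read[s:e]
        pos = ref.find(piece)
        while pos != -1:  # every exact occurrence of this piece
            start = pos - s  # implied start of the whole read
            if 0 <= start <= len(ref) - w:
                candidates.add(start)
            pos = ref.find(piece, pos + 1)
    # Verification step: count mismatches over the full read length.
    hits = []
    for start in sorted(candidates):
        mm = hamming(read, ref[start:start + w])
        if mm <= max_mm:
            hits.append((start, mm))
    return hits
```

With -m 3 and 33 bp reads, this would split each read into four pieces of 9, 8, 8 and 8 bases.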
SHRiMP
- Web site: http://compbio.cs.toronto.edu/shrimp/
- Commands: rmapper-cs, rmapper-ls, ...
SeqMap
- SeqMap, developed at Stanford: http://biogibbs.stanford.edu/~jiangh/SeqMap/
- Allows up to five mismatches in total, which may mix substitutions and inserted/deleted nucleotides
- Allows sequences to contain Ns and to have unequal lengths
./seqmap
Usage: seqmap <number of mismatches> <probe FASTA file name> <transcript FASTA file name> <output file name> [options]
Parameters:
  <number of mismatches>        maximum edit distance allowed
  <probe FASTA file name>       probe/tag/read sequences
  <transcript FASTA file name>  reference sequences
  <output file name>            name of the output file
...
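SeqMap's first argument is a maximum edit distance, so a hit may mix substitutions with inserted or deleted bases. The sketch below uses Sellers' dynamic-programming variant of edit distance, in which an alignment may start and end anywhere in the reference. It reports the reference end positions that lie within that distance of the read. This illustrates the criterion only and is not SeqMap's algorithm. Treating N as matching any base is an assumption of the sketch, and approx_occurrences is a hypothetical name.

```python
from typing import List, Tuple


def approx_occurrences(read: str, ref: str, max_edits: int) -> List[Tuple[int, int]]:
    """Return (end, distance) pairs, where the hit is ref[start:end] for some start
    and distance is the smallest edit distance of read to a substring ending there."""
    m = len(read)
    prev = list(range(m + 1))  # column for an empty reference prefix: i read bases unmatched
    hits = []
    for j, rc in enumerate(ref, start=1):
        cur = [0] * (m + 1)  # row 0 stays 0: an alignment may start anywhere in ref
        for i in range(1, m + 1):
            qc = read[i - 1]
            cost = 0 if qc == rc or "N" in (qc, rc) else 1  # assumption: N matches anything
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # reference base not in the read
                         cur[i - 1] + 1)      # read base not in the reference
        if cur[m] <= max_edits:
            hits.append((j, cur[m]))
        prev = cur
    return hits
```

The cost is O(len(read) × len(ref)) per read, which is fine for illustration but far too slow for genome-scale mapping.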
SHORE
- SHORE (1001 Genomes downloads): http://1001genomes.org/downloads/
SOAP *
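- Web site (China): http://soap.genomics.org.cn/
- Format of output: http://soap.genomics.org.cn/#Formatofoutput
- SOAP: short oligonucleotide alignment program, Bioinformatics Jan 2008: http://soap.genomics.org.cn/SOAP_paper.pdf
- Commands: soap, soap.contig, soap_dealign, soap.huge, soap.short
- Can use base qualities, trim reads, align paired-end reads, and do RNA alignments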
soap -v 5 -d Pa.1con -a Pa.seq -o Pa.soap
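Read trimming is one of the features listed above. The sketch below shows a generic 3'-end quality trim: it drops trailing bases below a Phred cutoff and discards reads that become too short. The cutoffs and the function name trim_3prime are assumptions, and SOAP's own trimming strategy may differ.

```python
from typing import List, Tuple


def trim_3prime(seq: str, quals: List[int], min_qual: int = 20,
                min_len: int = 20) -> Tuple[str, List[int]]:
    """Drop trailing bases with Phred quality below min_qual.

    Returns ("", []) if fewer than min_len bases remain after trimming.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    if end < min_len:
        return "", []
    return seq[:end], quals[:end]
```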
SOCS
- For ABI SOLiD color-space reads
socs socs.pref
more socs.pref
Req.fa
Seq_F3.csfasta
Seq_F3_QV.qual
out_prefix
2
1000
2
false
true
0
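Reads in a file such as Seq_F3.csfasta are in color space: a primer base followed by one color digit per base transition. Decoding starts from the primer base and XORs 2-bit codes (A=0, C=1, G=2, T=3). This also suggests why color-space reads are usually aligned in color space rather than decoded first: one wrong or missing color changes every base after it. The sketch below is minimal and assumes '.' marks a missing color; decode_csfasta_read is a hypothetical helper.

```python
BASES = "ACGT"  # index in this string is the 2-bit code: A=0, C=1, G=2, T=3


def decode_csfasta_read(cs: str) -> str:
    """Translate 'primer base + color digits' into bases, e.g. 'T3131' -> 'ACGT'.

    Assumption: '.' marks a missing color; the base it would produce, and every
    base after it, is unknown and written as 'N'.
    """
    if not cs:
        return ""
    prev = cs[0].upper()
    out = []
    for c in cs[1:]:
        if c in "0123" and prev in BASES:
            prev = BASES[BASES.index(prev) ^ int(c)]
        else:
            prev = "N"
        out.append(prev)
    return "".join(out)
```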
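SOLiD
- SOLiD System Analysis Pipeline Tool (Corona Lite): http://solidsoftwaretools.com/gf/project/corona/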
SSAHA
- Web site (Sanger): http://www.sanger.ac.uk/Software/analysis/SSAHA/
- Focused on exact and nearly exact matches
- Apparently does not find all exact matches (unconfirmed)
- Example: with 33 bp Solexa reads, ~30% of reads are not found
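SSAHA (Sequence Search and Alignment by Hashing Algorithm) looks up reads in a hash table of reference k-mers. The sketch below is a generic hash index, not SSAHA's code. It records the positions of k-mers sampled every `step` bases and looks up exact hits by the read's first k-mer. With step=1 every exact occurrence is found. With larger steps the index is smaller, but a read whose first k-mer falls off the sampling grid is missed. This is one general way a hashing scheme can lose exact matches; whether it explains the SSAHA observation above is not established here. The names and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(ref: str, k: int, step: int = 1) -> Dict[str, List[int]]:
    """Map each k-mer of ref, sampled every `step` positions, to its start offsets."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(0, len(ref) - k + 1, step):
        index[ref[i:i + k]].append(i)
    return dict(index)


def exact_hits(read: str, ref: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Exact occurrences of read, found via its first k-mer.

    Only guaranteed to be complete when the index was built with step=1.
    """
    if len(read) < k:
        raise ValueError("read shorter than k")
    return [p for p in index.get(read[:k], []) if ref[p:p + len(read)] == read]
```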
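ZOOM
- ZOOM paper (Bioinformatics): http://bioinformatics.oxfordjournals.org/cgi/content/full/24/21/2431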
Genome Resequencing
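- The complete genome of an individual by massively parallel DNA sequencing (J. Watson's genome), Nature April 2008: http://www.nature.com/nature/journal/v452/n7189/pdf/nature06884.pdf
- J. Watson's genome (supplementary info): http://www.nature.com/nature/journal/v452/n7189/extref/nature06884-s1.pdf
- Whole-genome sequencing and variant discovery in C. elegans, Nature Methods Jan 2008: http://www.nature.com/nmeth/journal/v5/n2/full/nmeth.1179.html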
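Links
- 1000 Genomes: http://www.1000genomes.org/page.php?page=home
- Ben's web site 1: http://www.cbcb.umd.edu/~langmead/solexa_1000genomes.html
- Ben's web site 2: http://www.cbcb.umd.edu/~langmead/solexa_format.html
- ChIP-Seq (Wikipedia): http://en.wikipedia.org/wiki/Chip-Sequencing
- SRA: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=table&f=run&m=data&s=run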
Data
Solexa
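- Strep suis Solexa reads at Sanger (ftp://ftp.sanger.ac.uk/pub/PRODUCTION_SOFTWARE/data_sets/suis_solexa/): 36 bp, ~49X coverage
- Staphylococcus aureus strain MW2, from the Edena paper (http://www.genomic.ch/edena/mw2Reads.seq.gz): 35 bp, ~47X coverage
- Pseudomonas aeruginosa: 33 bp, ~43X coverage
- Pseudomonas syringae: 32 bp, ~31X coverage
- 1000 Genomes (June 14th 2008): 47 bp reads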
Accession  #Runs  Instrument                 Center  Study                        [Individual]
SRA000303  41     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12878
SRA000304  49     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12891
SRA000305  56     Solexa 1G Genome Analyzer  BI      1000Genomes Project Pilot 2  NA12892
SRA000307  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA10851
SRA000308  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11993
SRA000309  3      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA11995
SRA000310  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12006
SRA000311  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12044
SRA000312  2      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12156
SRA000313  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12414
SRA000314  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12776
SRA000315  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12828
SRA000316  12     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12878
SRA000317  8      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12891
SRA000318  14     Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 2  NA12892
SRA000319  1      Solexa 1G Genome Analyzer  SC      1000Genomes Project Pilot 1  NA12004
June 14th 2008 to Sept 19th 2008:
SRA001100  23  Illumina Genome Analyzer     BGI    1000Genomes Project Pilot 2  NA19240
...
SRA002029  1   Illumina Genome Analyzer II  WUGSC  1000Genomes Project Pilot 2  NA19239
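The coverage figures above follow the usual average-depth relation C = N × L / G. Here N is the number of reads, L the read length and G the genome length, which must be supplied by the user. The small helpers below work in both directions; their names are illustrative.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average depth C = N * L / G."""
    return num_reads * read_len / genome_len


def reads_for_coverage(target: float, read_len: int, genome_len: int) -> int:
    """Smallest read count N with N * L / G >= target."""
    return math.ceil(target * genome_len / read_len)
```

For example, reaching 49X with 36 bp reads on a hypothetical 1 Mb genome needs 1,361,112 reads.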
/fs/szdata/Solexa/1000genomes
- Example, SRR001113.seq:
7,058,926 sequences of 47 bp
2,402,398 contain at least one '.' (uncalled base)
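The counts above mean that about 34% of the reads (2,402,398 / 7,058,926) contain at least one uncalled base. The sketch below reproduces such counts. It assumes one read per line with the sequence as the last whitespace-separated field, which may not match the exact layout of the .seq files. count_uncalled is a hypothetical name.

```python
from typing import Iterable, Tuple


def count_uncalled(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total reads, reads containing at least one '.')."""
    total = with_dot = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seq = fields[-1]  # assumption: the sequence is the last field on the line
        total += 1
        if "." in seq:
            with_dot += 1
    return total, with_dot
```

Usage: `with open("SRR001113.seq") as fh: total, with_dot = count_uncalled(fh)`.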
454
- 1000 Genomes
June 14th 2008
Accession  #Runs  Instrument  Center  Study                        [Individual]
SRA000302  121    454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001032  2      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
SRA001036  1      454 GS FLX  BCM     1000Genomes Project Pilot 1  NA12812
SRA001094  1      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
June 14th 2008 to Sept 19th 2008:
SRA001037  2      454 GS FLX  BCM     1000Genomes Project Pilot 1  NA12812
...
SRA001819  1      454 GS FLX  BCM     1000Genomes Project Pilot 2  NA12878
Refseq
- /fs/szdata/genomes/human_ncbi_build36/: NCBI build 36.1, May 2006 (the current build is 36.3, March 2008)
- /fs/szdata/genomes/human_celera_2001_Orig/