Short read sequencing
Latest revision as of 15:38, 4 December 2008
DELETED !!!
Consensus calling and Structural variation
- Consensus generation and variant detection by Celera Assembler.
- Detecting Structural Variations, Brudno et al.
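Read Mapping Software
- Short-read sequence alignment programs, Wikipedia list (http://en.wikipedia.org/wiki/Sequence_alignment_software#Short-Read_Sequence_Alignment)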
BFAST
- Need to e-mail the author to get the code
BLAT
- BLAT—The BLAST-Like Alignment Tool, Genome Research 2002
- FAQ
- Can align any type of reads
- Can do nt:aa translation
- Command: blat
blat -noHead -t=dna -q=dna -tileSize=10 -stepSize=3 Pa.1con Pa.seq Pa.blat
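The -tileSize=10 and -stepSize=3 options in the command above control how BLAT cuts the target into k-mer "tiles" for its index: each tile is tileSize bases long and consecutive tiles start stepSize bases apart. A minimal Python sketch of that tiling, for illustration only; the function name is hypothetical and this is not BLAT's actual indexing code:

```python
def tiles(seq, tile_size=10, step_size=3):
    """Yield (start, kmer) for every full-length tile of seq.

    Hypothetical helper: tiles are tile_size long and start every
    step_size bases, so they overlap when step_size < tile_size.
    """
    for start in range(0, len(seq) - tile_size + 1, step_size):
        yield start, seq[start:start + tile_size]


if __name__ == "__main__":
    for start, kmer in tiles("ACGTACGTACGTACGT"):
        print(start, kmer)
```

A smaller step size gives more tiles, which should make the search more sensitive at the cost of a larger index.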
MAQ *
- Maq Sourceforge
- Maq Poster from Sanger
- Handles Illumina-Solexa and AB SOLiD reads, not 454 or capillary reads
- Uses FASTQ format
- Command: maq map ...
- does ungapped alignment on unpaired reads
  SOLEXA
  maq.pl easyrun -d . ref.1con reads.fastq
  SOLID
  solid2fastq.pl reads_ shortname
  maq fastq2bfq shortname.fastq shortname.bfq
  maq fasta2csfa ref.fasta > ref.csfa
  maq fasta2bfa ref.csfa ref.csbfa
  maq fasta2bfa ref.fasta ref.bfa
  maq map -c aln.cs.map ref.csbfa shortname.bfq 2> aln.log
  maq csmap2nt aln.nt.map ref.bfa aln.cs.map
  maq assemble cns.cns ref.bfa aln.nt.map 2> cns.log
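The SOLiD steps above convert the reference to color space (maq fasta2csfa) before mapping, because SOLiD reads encode transitions between adjacent bases rather than the bases themselves. As a rough illustration of the standard two-base encoding, here is a minimal Python sketch. The function name is hypothetical, the use of '.' for pairs containing a non-ACGT base is an assumption, and this is not Maq's code or its exact output format:

```python
# 2-bit base codes; the SOLiD color of two adjacent bases is the XOR
# of their codes (same base -> 0, A<->C or G<->T -> 1, and so on).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Return one color (0-3) per adjacent base pair of seq.

    Assumption: pairs containing a non-ACGT base are written as '.'.
    """
    seq = seq.upper()
    colors = []
    for left, right in zip(seq, seq[1:]):
        if left in BASE_CODE and right in BASE_CODE:
            colors.append(str(BASE_CODE[left] ^ BASE_CODE[right]))
        else:
            colors.append(".")
    return "".join(colors)


if __name__ == "__main__":
    print(to_color_space("ATGGC"))
```

A sequence of n bases yields n-1 colors, which is why color-space reads need a known leading base to be translated back to nucleotides.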
RMAP
- RMAP : designed for Illumina-Solexa
- Command: rmap
rmap -m 3 -w 33 -c Pa.1con Pa.seq -o Pa.rmap
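In the command above, -m 3 and -w 33 look like the maximum number of mismatches and the read width (an assumption based on the values used here; check the RMAP documentation). The idea of ungapped mapping with a mismatch limit can be sketched in Python as a naive scan. All names are hypothetical, and RMAP itself uses a much faster method than this:

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_ungapped(read, ref, max_mismatches=3):
    """Return (position, mismatches) for every ungapped placement of
    read on ref with at most max_mismatches mismatches.

    Naive O(len(ref) * len(read)) scan, for illustration only.
    """
    hits = []
    width = len(read)
    for pos in range(len(ref) - width + 1):
        d = hamming(read, ref[pos:pos + width])
        if d <= max_mismatches:
            hits.append((pos, d))
    return hits


if __name__ == "__main__":
    print(map_ungapped("ACGA", "TTACGTTT", max_mismatches=1))
```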
SHRiMP
- Web site
- Commands: rmapper-cs , rmapper-ls, ...
SeqMap
- SeqMap developed at Stanford
- allows up to five mixed substitutions and inserted/deleted nucleotides in the mapping
- allows sequences to contain N’s, and to have unequal lengths
  ./seqmap
  Usage: seqmap <number of mismatches> <probe FASTA file name> <transcript FASTA file name> <output file name> [options]
  Parameters:
  <number of mismatches>                          maximum edit distance allowed
  <probe FASTA file name>                         probe/tag/read sequences
  <transcript FASTA file name>                    reference sequences
  <output file name>                              name of the output file
  ...
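The usage text above describes the first argument as the maximum edit distance allowed, which counts substitutions, insertions, and deletions together. A minimal Python sketch of that distance (the textbook Levenshtein dynamic program) is given here to make the definition concrete. The function name is hypothetical and this is not SeqMap's mapping algorithm:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of substitutions,
    insertions, and deletions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGGT"))   # one substitution
    print(edit_distance("ACGT", "ACT"))    # one deletion
```

Under this definition a read with one mismatch and one inserted base has distance 2 from the reference, so it would need a limit of at least 2 to be reported.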
SHORE
- SHORE (http://1001genomes.org/downloads/)
SOAP *
- Web site (China)
- Formatofoutput
- SOAP: short oligonucleotide alignment program, Bioinformatics Jan 2008
- Commands: soap, soap.contig, soap_dealign, soap.huge, soap.short
- Can use quality values, trim reads, use paired ends, and do RNA alignments
soap -v 5 -d Pa.1con -a Pa.seq -o Pa.soap
SOCS
- ABI color space
  socs socs.pref
  more socs.pref
  Req.fa
  Seq_F3.csfasta
  Seq_F3_QV.qual
  out_prefix
  2
  1000
  2
  false
  true
  0
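SOLiD
- SOLID System Analysis Pipeline Tool (Corona Lite) (http://solidsoftwaretools.com/gf/project/corona/)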
SSAHA
- Web site(Sanger)
- Focused on exact and nearly exact matches
- May not find all exact matches (not confirmed)
- Example: with 33 bp Solexa reads, ~30% of reads are not found
ZOOM
- ZOOM (http://bioinformatics.oxfordjournals.org/cgi/content/full/24/21/2431)
Genome Resequencing
- The complete genome of an individual by massively parallel DNA sequencing (J.Watson's genome) Nature April 2008
- J.Watson's genome (supplementary info)
- Whole-genome sequencing and variant discovery in C. elegans Nature Jan 2008
Links
Data
Solexa
- Strep suis Solexa at Sanger 36bp, ~49X coverage
- Staphylococcus aureus strain MW2 (edena paper) 35bp, ~47X coverage
- Pseudomonas aeruginosa: 33bp, ~43X coverage
- Pseudomonas syringae: 32bp, ~31X coverage
- 1000 Genomes (June 14th 2008): 47bp
  Accession   #Runs   Instrument                  Center   Study                         [Individual]
  SRA000303   41      Solexa 1G Genome Analyzer   BI       1000Genomes Project Pilot 2   NA12878
  SRA000304   49      Solexa 1G Genome Analyzer   BI       1000Genomes Project Pilot 2   NA12891
  SRA000305   56      Solexa 1G Genome Analyzer   BI       1000Genomes Project Pilot 2   NA12892
  SRA000307   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA10851
  SRA000308   2       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA11993
  SRA000309   3       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA11995
  SRA000310   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12006
  SRA000311   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12044
  SRA000312   2       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12156
  SRA000313   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12414
  SRA000314   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12776
  SRA000315   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12828
  SRA000316   12      Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 2   NA12878
  SRA000317   8       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 2   NA12891
  SRA000318   14      Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 2   NA12892
  SRA000319   1       Solexa 1G Genome Analyzer   SC       1000Genomes Project Pilot 1   NA12004
June 14th 2008 to Sept 19th 2008
  SRA001100   23      Illumina Genome Analyzer      BGI      1000Genomes Project Pilot 2   NA19240
  ...
  SRA002029   1       Illumina Genome Analyzer II   WUGSC    1000Genomes Project Pilot 2   NA19239
/fs/szdata/Solexa/1000genomes
- Example SRR001113.seq :
  7,058,926 47 bp sequences
  2,402,398 contain at least 1 '.'
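The counts above mean that roughly a third of the reads in SRR001113.seq contain at least one uncalled base. A minimal Python sketch of how such a tally could be made is given here. It assumes one read sequence per line, which is a guess since the .seq layout is not described on this page, and the function name is hypothetical:

```python
def count_reads_with_no_calls(reads, no_call="."):
    """Return (total_reads, reads_containing_no_call).

    reads is any iterable of sequence strings, for example an open file
    with one read per line (assumed layout). Blank lines are skipped.
    """
    total = 0
    flagged = 0
    for read in reads:
        read = read.strip()
        if not read:
            continue
        total += 1
        if no_call in read:
            flagged += 1
    return total, flagged


if __name__ == "__main__":
    print(count_reads_with_no_calls(["ACGT", "AC.T", "....", ""]))
    # Fraction for the counts quoted above (about 0.34):
    print(2_402_398 / 7_058_926)
```

To run it on a real file, pass the open file object, e.g. `with open(path) as fh: count_reads_with_no_calls(fh)`, where `path` is a placeholder for the read file.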
454
- 1000 Genomes
June 14th 2008
  Accession   #Runs   Instrument   Center   Study                         [Individual]
  SRA000302   121     454 GS FLX   BCM      1000Genomes Project Pilot 2   NA12878
  SRA001032   2       454 GS FLX   BCM      1000Genomes Project Pilot 2   NA12878
  SRA001036   1       454 GS FLX   BCM      1000Genomes Project Pilot 1   NA12812
  SRA001094   1       454 GS FLX   BCM      1000Genomes Project Pilot 2   NA12878
June 14th 2008 to Sept 19th 2008
  SRA001037   2       454 GS FLX   BCM      1000Genomes Project Pilot 1   NA12812
  ...
  SRA001819   1       454 GS FLX   BCM      1000Genomes Project Pilot 2   NA12878
Refseq
- /fs/szdata/genomes/human_ncbi_build36/ NCBI build36.1 May 2006 (Current build is 36.3 March 2008)
- /fs/szdata/genomes/human_celera_2001_Orig/