Bos taurus redo: Difference between revisions
Revision as of 15:10, 4 January 2009
BCM
NCBI Data
- Genome Projects
- TA search
- Avg LEN=984
- Avg CLIP (CLB intersect CLV)=760
- Avg CLV=997 (3.66M reads) !!! greater than Avg LEN (984)
- Avg QUAL=38.96 (27.51 for the 2.59M reads not in the UMD assembly)
- Reads with QUAL=0: 650,133
- Avg UMDoverlapper CLIP=778 (3.53M reads)
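
The "CLB intersect CLV" figure above is an interval intersection per read. A minimal sketch of that step, assuming each clip range is stored as a 1-based inclusive (left, right) pair; the function names and the coordinate convention are hypothetical and not taken from the pipeline used here:

```python
from typing import Optional, Tuple

# (left, right), 1-based inclusive. This convention is an assumption.
Range = Tuple[int, int]


def intersect_clips(clb: Range, clv: Range) -> Optional[Range]:
    """Return the overlap of two clip ranges, or None if they are disjoint."""
    left = max(clb[0], clv[0])
    right = min(clb[1], clv[1])
    return (left, right) if left <= right else None


def clip_length(r: Optional[Range]) -> int:
    """Number of bases in a clip range; 0 for an empty intersection."""
    return 0 if r is None else r[1] - r[0] + 1


if __name__ == "__main__":
    clip = intersect_clips((10, 800), (35, 997))
    print(clip, clip_length(clip))
```

Averaging `clip_length` over all reads would give a number comparable to the Avg CLIP line, provided the same coordinate convention is used.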
CENTER_NAME counts
   COUNT  CENTER_NAME
35629020  BCM           Baylor College of Medicine
  737900  NISC          NIH Intramural Sequencing Center
  652614  BCCAGSC       British Columbia Cancer Agency Genome Sciences Centre   # TA query_tracedb CENTER_NAME = "BCCAGSC" => 652,510
  378871  MARC          USDA, ARS, US Meat Animal Research Center
  114753  UIUC          University of Illinois at Urbana-Champaign   # TA query_tracedb CENTER_NAME = "UIUC" => 106,368
  107367  BARC          USDA, ARS, Beltsville Agricultural Research Center
   65171  TIGR          The Institute for Genomic Research
   53556  GSC           Genoscope
   43033  CENARGEN      Embrapa Genetic Resources and Biotechnology
   18623  SC            The Sanger Center
   15301  UOKNOR        University of Oklahoma Norman Campus, Advanced Center for Genome Technology
   10651  TIGR_JCVIJTC  The Institute for Genomic Research, Traces generated at JCVIJTC   # TA query_tracedb CENTER_NAME="JCVI"
    2485  UIACBCB       University of Iowa Center for Bioinformatics and Computational Biology (UIACBCB)
      49  WUGSC         Washington University, Genome Sequencing Center   # TA query_tracedb CENTER_NAME = "WUGSC" => 9
37829394  total         # TA query_tracedb SPECIES_CODE = "BOS TAURUS" => 37,788,710
TRACE_TYPE_CODE counts
   COUNT  CENTER_NAME   TRACE_TYPE_CODE  #LIBS(all)  #LIBS(10K+ reads)
24863599  BCM           WGS              89          31
10748529  BCM           SHOTGUN          10          10
  737900  NISC          SHOTGUN          4           3
  125597  BCCAGSC       CLONEEND
  114753  UIUC          CLONEEND
   65171  TIGR          CLONEEND
   53556  GSC           CLONEEND
   26246  CENARGEN      WGS
   25454  BARC          CLONEEND
   16892  BCM           CLONEEND         1           1                  # lib VBBAA: mean=167000 std=25000
   16787  CENARGEN      CLONEEND
   15150  UOKNOR        SHOTGUN
   10651  TIGR_JCVIJTC  CLONEEND
     151  UOKNOR        FINISHING
      49  WUGSC         CLONEEND
36809945  total

  527017  BCCAGSC       EST
  207204  MARC          EST
  171667  MARC          PCR
   81913  BARC          EST
    2485  UIACBCB       EST
 1019449  total
STRATEGY & TRACE_TYPE_CODE counts
   COUNT  CENTER_NAME   STRATEGY       TRACE_TYPE_CODE
12545304  BCM           .              WGS
11425910  BCM           WGA            WGS
 5223683  BCM           CLONE          SHOTGUN
 4479883  BCM           POOLCLONE      SHOTGUN
 1044963  BCM           .              SHOTGUN
  892385  BCM           SNP            WGS
  737900  NISC          CLONE          SHOTGUN
  125597  BCCAGSC       CLONEEND       CLONEEND
  114753  UIUC          CLONEEND       CLONEEND
   65171  TIGR          CLONEEND       CLONEEND
   53556  GSC           CLONEEND       CLONEEND
   26246  CENARGEN      .              WGS
   25454  BARC          .              CLONEEND
   16892  BCM           CLONEEND       CLONEEND
   16787  CENARGEN      CLONEEND       CLONEEND
   12195  UOKNOR        .              SHOTGUN
   10651  TIGR_JCVIJTC  CLONEEND       CLONEEND
    2955  UOKNOR        CLONE          SHOTGUN
     151  UOKNOR        .              FINISHING
      49  WUGSC         CLONEEND       CLONEEND

  527017  BCCAGSC       EST            EST
  145820  MARC          EST            EST
  117958  MARC          COMPARATIVE    PCR
   81913  BARC          EST            EST
   61384  MARC          CLONE          EST
   53709  MARC          Re-Sequencing  PCR
   18623  SC            EST            EST
    2485  UIACBCB       .              EST
3' VECTOR TRIMMED counts
CENTER_NAME   TRACE_TYPE_CODE     TOTAL  3'CLV<LEN  QUAL==0        UMD.FRG
BCM           WGS              24863599   10968979  551114        24050767
BCM           SHOTGUN          10748529    5052692  23419         10068499
NISC          SHOTGUN            737900      28972  0               735488
BCCAGSC       CLONEEND           125597     125484  8926            113790
UIUC          CLONEEND           114753      90243  0               106247
TIGR          CLONEEND            65171      46389  0                64903
GSC           CLONEEND            53556      53556  53556 (all)          0   # !!! all have 0 quals and were excluded
CENARGEN      WGS                 26246      26246  0                25976
BARC          CLONEEND            25454      25454  0                25387
BCM           CLONEEND            16892       6751  0                16863
CENARGEN      CLONEEND            16787      16787  0                16628
UOKNOR        SHOTGUN             15150       2885  12195                0
TIGR_JCVIJTC  CLONEEND            10651        339  0                10644
UOKNOR        FINISHING             151          0  151                151
WUGSC         CLONEEND               49          0  0                    0
BCCAGSC       EST                527017     524173  772                  0
MARC          EST                207204     207204  0                    0
MARC          PCR                171667     171667  0                    0
BARC          EST                 81913      78597  0                    0
SC            EST                 18623       7350  0                    0
UIACBCB       EST                  2485       2485  0                    0
Local Data
Files & Dirs
/fs/szasmg3/bos_taurus/data/
/fs/szasmg2/Drosophila/D_pseudoobscura/Vectors
/nfshomes/dpuiu/db/UniVec
Software
Figaro
- trims vector only at 5' end
- calls Lucy for quality trimming
Lucy
- both vector sequence and splice sites are required
Atlas
- web site
- atlas-screen-trim-file: "calls cross_match and atlas-screen-window to create trimmed reads file (scan in from each end of read looking for 50-base windows of high quality and no vector)"
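
The window scan quoted above can be sketched as follows. This is an illustration of the idea only, not the Atlas code: the function name, the mean-quality criterion, and the threshold of 20 are assumptions, and the vector mask is taken as given (in Atlas it would come from cross_match).

```python
from typing import List, Optional, Tuple


def window_trim(
    quals: List[int],
    is_vector: List[bool],
    window: int = 50,            # window size from the quoted description
    min_mean_qual: float = 20.0  # hypothetical threshold, not from Atlas
) -> Optional[Tuple[int, int]]:
    """Return a 0-based half-open (start, end) clear range, or None.

    Scans in from each end for the first window that has no vector
    bases and a mean quality of at least min_mean_qual.
    """
    n = len(quals)
    if n < window or len(is_vector) != n:
        return None

    def good(i: int) -> bool:
        # A window is good if it is vector-free and of high mean quality.
        if any(is_vector[i:i + window]):
            return False
        return sum(quals[i:i + window]) / window >= min_mean_qual

    # First good window scanning from the 5' end.
    left = next((i for i in range(n - window + 1) if good(i)), None)
    if left is None:
        return None
    # First good window scanning from the 3' end; exists because left does.
    right = next(i for i in range(n - window, -1, -1) if good(i))
    return (left, right + window)


if __name__ == "__main__":
    quals = [5] * 10 + [40] * 100 + [5] * 10
    vector = [True] * 20 + [False] * 100
    print(window_trim(quals, vector))
```

Because the criterion is a window mean, a few low-quality bases at the edge of a good window stay inside the clear range; a stricter per-base criterion would trim them.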
Contaminant search
nucmer alignment of the reads' CLIPPING range against UniVec & E. coli K12
UniVec
Ref
             #seqs  min  max    mean  median  n50  sum
UniVec       2861   12   48551  231   99      781  660,151
UniVec_Core  1348   12   48551  243   98      967  327,641
Hits: alignment length
bp   #reads   min  max   mean   median  n50  sum
19   4548466  19   1045  28.37  23      27   129025025
20   3684852  20   1045  30.56  25      28   112616359
30   1097357  30   1045  48.04  38      43   52714583
40   484661   40   1045  66.36  47      53   32163896
100  54334    100  1045  198    116     223  10772815   # many are ESTs
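
Each row of a hits table like the one above is a summary of alignment lengths at or above a minimum length. A minimal sketch of how such a row could be computed, assuming the lengths have already been extracted from the nucmer output into a list and that n50 uses the usual definition (the length at which the running sum of lengths, longest first, reaches half the total); function names are hypothetical:

```python
from statistics import mean, median
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def n50(lengths: List[int]) -> int:
    """Length L such that items of length >= L cover at least half the sum."""
    if not lengths:
        raise ValueError("n50 of an empty list is undefined")
    total = sum(lengths)
    running = 0
    for x in sorted(lengths, reverse=True):
        running += x
        if 2 * running >= total:
            return x
    raise AssertionError("unreachable for a non-empty list")


def summarize(lengths: List[int], min_len: int) -> Optional[Dict[str, Number]]:
    """Summary row for all alignment lengths >= min_len, or None if empty."""
    kept = [x for x in lengths if x >= min_len]
    if not kept:
        return None
    return {
        "bp": min_len,
        "count": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": mean(kept),
        "median": median(kept),
        "n50": n50(kept),
        "sum": sum(kept),
    }


if __name__ == "__main__":
    lengths = [19, 20, 25, 30, 100, 200]  # toy values, not real data
    for cutoff in (19, 20, 30, 40, 100):
        print(summarize(lengths, cutoff))
```

The cutoffs in the loop mirror the bp column of the table; the toy lengths are made up for the demonstration.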
Ecoli
Ref:
K12 4,639,675 bp
Hits: alignment length
bp   #reads  min  max   mean   median  n50  sum
19   275109  19   1223  30.66  19      20   8435470
20   102550  20   1223  50.29  21      161  5156849
30   19032   30   1223  178    37      706  3381214
40   9234    40   1223  329    171     738  3034293
100  6781    100  1223  424    223     749  2876432
200  4378    200  1223  575    696     771  2516916
BCM vectors
     #seqs  min   max    mean  median  n50    sum
BCM  14     2580  33180  9379  5821    32705  131312