Bos taurus 3.0
Data download
- NCBI : ftp://ftp.ncbi.nih.gov/pub/TraceDB/bos_taurus/
- 91 volumes: 87 with quality values and 4 without
- 14 centers
Centers:
    TRACE_COUNT  CENTER_NAME
 1     35629020  BCM           Baylor College of Medicine
 2       737900  NISC          NIH Intramural Sequencing Center
 3       652614  BCCAGSC       British Columbia Cancer Agency Genome Sciences Center
 4       378871  MARC          USDA, ARS, US Meat Animal Research Center
 5       114753  UIUC          University of Illinois at Urbana-Champaign
 6       107367  BARC          USDA, ARS, Beltsville Agricultural Research Center
 7        65171  TIGR          The Institute for Genomic Research
 8        53556  GSC           Genoscope
 9        43033  CENARGEN      Embrapa Genetic Resources and Biotechnology
10        18623  SC            The Sanger Center
11        15301  UOKNOR        University of Oklahoma Norman Campus, Advanced Center for Genome Technology
12        10651  TIGR_JCVIJTC  The Institute for Genomic Research, Traces generated at JCVIJTC
13         2485  UIACBCB       University of Iowa Center for Bioinformatics and Computation Biology (UIACBCB)
14           49  WUGSC         Washington University, Genome Sequencing Center
       37829394  total
Trace summary
    TRACE_COUNT  CENTER_NAME   TRACE_TYPE_CODE
 1     24863599  BCM*          WGS
 2     10748529  BCM*          SHOTGUN
 3       737900  NISC          SHOTGUN
 4       125597  BCCAGSC       CLONEEND
 5       114753  UIUC          CLONEEND
 6        65171  TIGR          CLONEEND
 7        53556  GSC           CLONEEND
 8        26246  CENARGEN      WGS
 9        25454  BARC          CLONEEND
10        16892  BCM*          CLONEEND
11        16787  CENARGEN      CLONEEND
12        15150  UOKNOR        SHOTGUN
13        10651  TIGR_JCVIJTC  CLONEEND
14          151  UOKNOR        FINISHING
15           49  WUGSC         CLONEEND
       36820485  total

16       527017  BCCAGSC       EST
17       207204  MARC          EST
18       171667  MARC          PCR
19        81913  BARC          EST
20        18623  SC            EST
21         2485  UIACBCB       EST
        1008909  total
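As a quick consistency check, the two subtotals in the trace summary should add up to the per-center total (37829394). A minimal sketch using only the counts listed above; the variable names are chosen here for illustration:

```python
# Trace counts copied from the trace summary, rows 1-15 (WGS, SHOTGUN, CLONEEND, FINISHING).
genomic = [24863599, 10748529, 737900, 125597, 114753, 65171, 53556,
           26246, 25454, 16892, 16787, 15150, 10651, 151, 49]
# Rows 16-21 (EST and PCR traces).
est_pcr = [527017, 207204, 171667, 81913, 18623, 2485]

genomic_total = sum(genomic)
est_pcr_total = sum(est_pcr)
grand_total = genomic_total + est_pcr_total

print(genomic_total, est_pcr_total, grand_total)
```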
Data processing
Vector trimming
For each library:
- Identify high-frequency (overrepresented) k-mers (8,24 bp)
- Seed and extend candidate vector sequences based on these k-mers
- Align the candidate vector sequences to UniVec to identify the vector name and complete sequence
- Create a Lucy vector/splice site file
- Run Lucy using the vector/splice site file => read CLV
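The first step of the list above, finding overrepresented k-mers in a library, can be sketched as follows. This is an illustration only, not the script that was actually used: the function name, the `min_fraction` cutoff, and the toy sequences are all assumptions, since the page does not say how "high frequency" was defined.

```python
from collections import Counter

def overrepresented_kmers(reads, k=24, min_fraction=0.5):
    """Return k-mers that occur in at least min_fraction of the reads.

    Each k-mer is counted once per read, so a repeat inside a single
    read does not inflate its frequency. min_fraction is a hypothetical
    cutoff chosen for this sketch.
    """
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        # Set of distinct k-mers in this read (empty if the read is shorter than k).
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    cutoff = min_fraction * len(reads)
    return {kmer: n for kmer, n in counts.items() if n >= cutoff}

# Toy library: two reads start with the same made-up 24 bp "vector" prefix.
vector = "GATCCTCTAGAGTCGACCTGCAGG"
reads = [
    vector + "ACGTTGCA",
    vector + "TTGACCGT",
    "CCCCAAAAGGGGTTTTACGTACGTACGTACGT",
]
hits = overrepresented_kmers(reads, k=24, min_fraction=0.5)
print(hits)
```

In a real library the surviving k-mers would then be used as seeds for extension and aligned to UniVec, as described in the remaining steps.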
Preliminary Assembly
- Assembly version: wgs-5.2
- Use only quality traces
- set read CLV to Lucy CLV
- set non random flag = 1 on SHOTGUN reads (non WGS)
- obtMerThreshold = 200 (default 1000)
- OBT = 1
Preliminary Assembly processing
- Extract OBT CLR for quality reads => final CLR
- Trim quality-less reads based on their alignments to contigs: set the new CLR to the alignment CLR, or to 50..min(len,600)
- Extract library insert estimates; merge libraries sequenced by the same center that have similar mean/std; assign new library ids
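The clear-range rule for quality-less reads can be written as a small function. This sketch assumes the default range 50..min(len,600) applies when a read has no alignment to a contig; the page does not state the condition explicitly, nor what happens to reads shorter than 50 bp. The function and parameter names are hypothetical.

```python
def clear_range(read_len, alignment=None, start=50, max_end=600):
    """Pick a clear range (CLR) for a read without quality values.

    alignment: (begin, end) of the read's alignment to a contig, or None.
    Assumption: the default range start..min(read_len, max_end) is used
    only when no alignment is available.
    """
    if alignment is not None:
        return alignment
    return (start, min(read_len, max_end))

print(clear_range(850, alignment=(12, 730)))  # aligned read keeps its alignment CLR
print(clear_range(850))                       # unaligned, long read
print(clear_range(400))                       # unaligned, short read
```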
Final Assembly
Assembly processing
Assembly Summary
.                            ctg+deg    <2Kbp   >=2Kbp    min       max    mean     med      n50          sum
=============================================================================================================
Chr1..29,X                     72481    20864    51617     65   1160130   36423   12940    97255   2639986644
ChrU                            3285     2404      881    224    179692    2890    1338     5425      9496583
Chr                            75766    23268    52498     65   1160130   34969   11207    96955   2649483227
contigs.haplotype-variants     40611    36984     3627    263     97877    1476    1205     1372     59958728
deg.unplaced.less_2K          224933   224933        0     65      1996     972     983      990    218837572
ChrY-contigs                     314      266       48    224     26490    2210     973     6539       694140
ChrY-contigs.SHOTGUN_ONLY        144      140        4    804      4224     993     882      888       143047
delete.notPrimates                97       96        1    263      5310    1031     996     1004       100066
trim                              61       21       40    213    205361   38577   11681   126330      2353214
=============================================================================================================
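For reference, the n50 column is usually computed as sketched below: the length L such that sequences of length >= L contain at least half of the total bases. The page does not say which script produced the table, so this follows the common definition rather than the exact tool used; the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

example = [100, 80, 60, 40, 20]   # toy contig lengths, 300 bp in total
print(n50(example))
```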