Bos taurus redo: Difference between revisions

From Cbcb

Revision as of 14:39, 4 January 2009

BCM

NCBI Data

  • Genome Projects
  • TA search
  • Avg LEN=984
  • Avg CLIP (CLB intersect CLV)=760
  • Avg CLV=997 (3.66M reads) !!! exceeds Avg LEN (984)
  • Avg QUAL=38.96 (27.51 for the 2.59M reads not in the UMD assembly)
  • 0 QUAL reads 650,133
  • Avg UMDoverlapper CLIP=778 (3.53M reads)
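
The "CLB intersect CLV" clip above can be read as the overlap of two clear ranges. A minimal sketch, assuming each range is a 0-based, half-open (left, right) pair and that reads with an empty overlap are skipped; the function names and the example reads are hypothetical and are not taken from the project scripts:

```python
from statistics import mean

def intersect_clip(clb, clv):
    """Intersect two (left, right) clear ranges; return None if they do not overlap."""
    left = max(clb[0], clv[0])
    right = min(clb[1], clv[1])
    return (left, right) if left < right else None

def average_clip_length(reads):
    """Mean length of the intersected range over reads where it is non-empty."""
    lengths = []
    for clb, clv in reads:
        clip = intersect_clip(clb, clv)
        if clip is not None:
            lengths.append(clip[1] - clip[0])
    return mean(lengths) if lengths else 0.0

# Hypothetical reads: ((CLB left, CLB right), (CLV left, CLV right))
example_reads = [
    ((20, 800), (35, 997)),   # overlap (35, 800)
    ((0, 650), (40, 700)),    # overlap (40, 650)
    ((500, 600), (0, 400)),   # no overlap, skipped
]
avg_clip = average_clip_length(example_reads)
```

Taking the larger left coordinate and the smaller right coordinate keeps only bases that both clips accept.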

CENTER_NAME counts

 COUNT           CENTER_NAME     
 35629020        BCM             Baylor College of Medicine
 737900          NISC            NIH Intramural Sequencing Center
 652614          BCCAGSC         British Columbia Cancer Agency Genome Sciences Centre                           # TA query_tracedb CENTER_NAME = "BCCAGSC" => 652,510 
 378871          MARC            USDA, ARS, US Meat Animal Research Center
 114753          UIUC            University of Illinois at Urbana-Champaign                                      # TA query_tracedb CENTER_NAME = "UIUC" => 106,368
 107367          BARC            USDA, ARS, Beltsville Agricultural Research Center
 65171           TIGR            The Institute for Genomic Research
 53556           GSC             Genoscope
 43033           CENARGEN        Embrapa Genetic Resources and Biotechnology
 18623           SC              The Sanger Center
 15301           UOKNOR          University of Oklahoma Norman Campus, Advanced Center for Genome Technology
 10651           TIGR_JCVIJTC    The Institute for Genomic Research, Traces generated at JCVIJTC                 # TA query_tracedb CENTER_NAME="JCVI"
 2485            UIACBCB         University of Iowa Center for Bioinformatics and Computational Biology (UIACBCB)
 49              WUGSC           Washington University, Genome Sequencing Center                                 # TA query_tracedb CENTER_NAME = "WUGSC" => 9
 37829394        total           total                                                                           # TA query_tracedb SPECIES_CODE = "BOS TAURUS" => 37,788,710 
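
The total row can be re-derived from the per-center rows. A small sketch using the counts copied from the table above; the variable names are illustrative only:

```python
# Per-center read counts copied from the CENTER_NAME table.
center_counts = {
    "BCM": 35629020,
    "NISC": 737900,
    "BCCAGSC": 652614,
    "MARC": 378871,
    "UIUC": 114753,
    "BARC": 107367,
    "TIGR": 65171,
    "GSC": 53556,
    "CENARGEN": 43033,
    "SC": 18623,
    "UOKNOR": 15301,
    "TIGR_JCVIJTC": 10651,
    "UIACBCB": 2485,
    "WUGSC": 49,
}

total = sum(center_counts.values())

# Share of all reads contributed by each center, largest first.
shares = sorted(
    ((name, count / total) for name, count in center_counts.items()),
    key=lambda item: item[1],
    reverse=True,
)
```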


TRACE_TYPE_CODE counts

 COUNT         CENTER_NAME     TRACE_TYPE_CODE        #LIBS(all)     #LIBS(10K+ reads)
 24863599      BCM             WGS                    89             31
 10748529      BCM             SHOTGUN                10             10
 737900        NISC            SHOTGUN                4              3
 125597        BCCAGSC         CLONEEND
 114753        UIUC            CLONEEND
 65171         TIGR            CLONEEND
 53556         GSC             CLONEEND
 26246         CENARGEN        WGS
 25454         BARC            CLONEEND
 16892         BCM             CLONEEND               1              1      VBBAA   mean=167000  std=25000
 16787         CENARGEN        CLONEEND
 15150         UOKNOR          SHOTGUN
 10651         TIGR_JCVIJTC    CLONEEND
 151           UOKNOR          FINISHING
 49            WUGSC           CLONEEND
 36809945      total

 527017        BCCAGSC         EST
 207204        MARC            EST
 171667        MARC            PCR
 81913         BARC            EST
 18623         SC              EST
 2485          UIACBCB         EST
 1019449       total

STRATEGY & TRACE_TYPE_CODE counts

 COUNT           CENTER_NAME     STRATEGY        TRACE_TYPE_CODE
 12545304        BCM             .               WGS
 11425910        BCM             WGA             WGS
 5223683         BCM             CLONE           SHOTGUN
 4479883         BCM             POOLCLONE       SHOTGUN
 1044963         BCM             .               SHOTGUN
 892385          BCM             SNP             WGS
 737900          NISC            CLONE           SHOTGUN
 125597          BCCAGSC         CLONEEND        CLONEEND
 114753          UIUC            CLONEEND        CLONEEND 
 65171           TIGR            CLONEEND        CLONEEND
 53556           GSC             CLONEEND        CLONEEND
 26246           CENARGEN        .               WGS
 25454           BARC            .               CLONEEND
 16892           BCM             CLONEEND        CLONEEND
 16787           CENARGEN        CLONEEND        CLONEEND
 12195           UOKNOR          .               SHOTGUN
 10651           TIGR_JCVIJTC    CLONEEND        CLONEEND
 2955            UOKNOR          CLONE           SHOTGUN
 151             UOKNOR          .               FINISHING
 49              WUGSC           CLONEEND        CLONEEND
 527017          BCCAGSC         EST             EST
 145820          MARC            EST             EST
 117958          MARC            COMPARATIVE     PCR
 81913           BARC            EST             EST
 61384           MARC            CLONE           EST
 53709           MARC            Re-Sequencing   PCR
 18623           SC              EST             EST
 2485            UIACBCB         .               EST

3' VECTOR TRIMMED counts

 CENTER_NAME     TRACE_TYPE_CODE TOTAL           3'CLV<LEN   QUAL==0          UMD.FRG
 BCM             WGS             24863599        10968979    551114           24050767
 BCM             SHOTGUN         10748529        5052692     23419            10068499
 NISC            SHOTGUN         737900          28972       0                735488
 BCCAGSC         CLONEEND        125597          125484      8926             113790
 UIUC            CLONEEND        114753          90243       0                106247
 TIGR            CLONEEND        65171           46389       0                64903
 GSC             CLONEEND        53556           53556       53556 (all)      0           !!! all have 0 quals and were excluded
 CENARGEN        WGS             26246           26246       0                25976
 BARC            CLONEEND        25454           25454       0                25387
 BCM             CLONEEND        16892           6751        0                16863
 CENARGEN        CLONEEND        16787           16787       0                16628
 UOKNOR          SHOTGUN         15150           2885        12195            0
 TIGR_JCVIJTC    CLONEEND        10651           339         0                10644
 UOKNOR          FINISHING       151             0           151              151
 WUGSC           CLONEEND        49              0           0                0

 BCCAGSC         EST             527017          524173      772              0
 MARC            EST             207204          207204      0                0
 MARC            PCR             171667          171667      0                0
 BARC            EST             81913           78597       0                0
 SC              EST             18623           7350        0                0
 UIACBCB         EST             2485            2485        0                0
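
One way the per-row columns above could be tallied from per-read records. This is a sketch under stated assumptions: "3'CLV<LEN" is taken to mean the right vector clip falls short of the read length, "QUAL==0" is taken to mean every quality value is zero, and the record field names are hypothetical:

```python
from collections import Counter

def tally(reads):
    """Group reads by (center, trace type) and count the flagged ones."""
    rows = {}
    for read in reads:
        key = (read["center"], read["trace_type"])
        row = rows.setdefault(key, Counter())
        row["TOTAL"] += 1
        # Assumption: vector was trimmed at the 3' end if the clip stops before the read end.
        if read["clv_right"] < read["length"]:
            row["3'CLV<LEN"] += 1
        # Assumption: a "0 QUAL" read has only zero quality values.
        if all(q == 0 for q in read["quals"]):
            row["QUAL==0"] += 1
    return rows

# Hypothetical records, not real Trace Archive data.
example = [
    {"center": "GSC", "trace_type": "CLONEEND", "length": 4, "clv_right": 3, "quals": [0, 0, 0, 0]},
    {"center": "GSC", "trace_type": "CLONEEND", "length": 5, "clv_right": 5, "quals": [30, 40, 40, 35, 20]},
    {"center": "BCM", "trace_type": "WGS", "length": 3, "clv_right": 2, "quals": [25, 30, 28]},
]
table = tally(example)
```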

Local Data

Files & Dirs

 /fs/szasmg3/bos_taurus/data/
 /fs/szasmg2/Drosophila/D_pseudoobscura/Vectors
 /nfshomes/dpuiu/db/UniVec

Software

Figaro

  • trims vector only at 5' end
  • call lucy trimming for qualities

Lucy

  • both vector sequence and splice sites are required

Atlas

  • web site
  • atlas-screen-trim-file : "calls cross_match and atlas-screen-window to create trimmed reads file (scan in from each end of read looking for 50-base windows of high quality and no vector); "
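
The window scan described for atlas-screen-trim-file can be illustrated roughly as follows. This is not the Atlas implementation: it assumes "high quality" means every base in the window is at or above a threshold, and the `min_qual` value and the per-base vector mask are hypothetical inputs:

```python
def trim_by_window(quals, vector_mask, window=50, min_qual=20):
    """Scan in from each end for a clean window; return (start, end) or None.

    quals: per-base quality values.
    vector_mask: per-base booleans, True where vector was detected.
    """
    n = len(quals)

    def is_clean(i):
        # A window is clean if all bases are high quality and none is vector.
        return (min(quals[i:i + window]) >= min_qual
                and not any(vector_mask[i:i + window]))

    starts = range(0, n - window + 1)
    left = next((i for i in starts if is_clean(i)), None)
    if left is None:
        return None  # no clean window anywhere in the read
    right = next(i for i in reversed(starts) if is_clean(i))
    return (left, right + window)

# Hypothetical read: low-quality ends, vector in the first 40 bases.
quals = [2] * 30 + [40] * 180 + [2] * 40
vector_mask = [True] * 40 + [False] * 210
clear_range = trim_by_window(quals, vector_mask)
```

In this example the left boundary is set by the vector at the 5' end and the right boundary by the quality drop at the 3' end.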

Contaminant search

UniVec

                 #seqs   min     max     mean    median  n50     sum
 UniVec          2861    12      48551   231     99      781     660,151
 UniVec_Core     1348    12      48551   243     98      967     327,641
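
The columns of this table are standard length statistics. A minimal sketch of how they could be computed from a list of sequence lengths, assuming the usual N50 definition (the largest length L such that sequences of length at least L cover half of the total); the helper names and the toy lengths are hypothetical:

```python
from statistics import mean, median

def n50(lengths):
    """Largest length L such that sequences >= L sum to at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

def summarize(lengths):
    """Return the same columns as the table: #seqs, min, max, mean, median, n50, sum."""
    return {
        "#seqs": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths)),
        "median": median(lengths),
        "n50": n50(lengths),
        "sum": sum(lengths),
    }

# Toy lengths, not the UniVec data.
stats = summarize([2, 3, 4, 5, 6, 7, 8, 9, 10])
```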

E. coli

K12 4,639,675 bp

BCM vectors

                 #seqs   min     max     mean    median  n50     sum
 BCM             14      2580    33180   9379    5821    32705   131,312