Bos taurus

Articles

NCBI Traces

 SPECIES_CODE = "BOS TAURUS"                                                          37,788,710 traces
 SPECIES_CODE = "BOS TAURUS" and CENTER_NAME = "BCM"                                  35,596,825 traces
 
 SPECIES_CODE = "BOS TAURUS" and CENTER_NAME = "BCM" and TRACE_TYPE_CODE = "WGS"      24,863,627 traces
 SPECIES_CODE = "BOS TAURUS" and CENTER_NAME = "BCM" and TRACE_TYPE_CODE = "SHOTGUN"  10,716,306 traces
 SPECIES_CODE = "BOS TAURUS" and CENTER_NAME = "BCM" and TRACE_TYPE_CODE = "CLONEEND"     16,892 traces

BCM Assembly

         #elem   min             max             mean            median          n50             sum
 contig  131620  91              326010          20755           10365           44270           2731814362
 placed  101579  91              250125          24286           13928           47485           2466971326
 chrom   30      44060403        161106243       87813777        84419198        106383598       2634413324

 Chr            Span      GC%
 chr1           161106243 40.76
 chr2           140800416 41.21
 chr3           127923604 42.29
 chr4           124454208 41.01
 chr5           125847759 42.02
 chr6           122561022 40.60
 chr7           112078216 42.39
 chr8           116942821 41.70
 chr9           108145351 40.53
 chr10          106383598 41.84
 chr11          110171769 43.16
 chr12          85358539  41.00
 chr13          84419198  44.00
 chr14          81345643  41.59
 chr15          84633453  42.34
 chr16          77906053  42.91
 chr17          76506943  42.70
 chr18          66141439  45.87
 chr19          65312493  46.32
 chr20          75796353  41.51
 chr21          69173390  43.20
 chr22          61848140  43.59
 chr23          53376148  43.75
 chr24          65020233  42.27
 chr25          44060403  47.13
 chr26          51750746  43.16
 chr27          48749334  42.19
 chr28          46084206  42.61
 chr29          51998940  44.34
 chrX           88516663  41.11
 chrM           16338     39.42
 chrUn          283544868  ?     #  11869 contigs
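
The `chrom` row of the BCM summary table can be recomputed from the per-chromosome spans listed above. The following is a minimal Python sketch, not the tool that produced the table. It assumes the 30 elements are chr1..chr29 plus chrX (chrM and chrUn left out), that `mean` is truncated to an integer, and that `median` is the upper median; these conventions are inferred from the numbers. `length_summary` is a hypothetical helper name.

```python
from statistics import median_high

# Chromosome spans (chr1..chr29, then chrX) copied from the table above.
CHROM_SPANS = [
    161106243, 140800416, 127923604, 124454208, 125847759, 122561022,
    112078216, 116942821, 108145351, 106383598, 110171769, 85358539,
    84419198, 81345643, 84633453, 77906053, 76506943, 66141439,
    65312493, 75796353, 69173390, 61848140, 53376148, 65020233,
    44060403, 51750746, 48749334, 46084206, 51998940, 88516663,
]


def length_summary(lengths):
    """Return #elem, min, max, mean, median, n50 and sum for a list of lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: walking from the longest sequence down, the length at which the
    # running sum first reaches half of the total.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "elem": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": total // len(ordered),   # assumption: truncated mean
        "median": median_high(ordered),  # assumption: upper median
        "n50": n50,
        "sum": total,
    }


if __name__ == "__main__":
    print(length_summary(CHROM_SPANS))
```

With these conventions the result agrees with the `chrom` row (n50 106383598, sum 2634413324).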

Files:

 /fs/szasmg3/bos_taurus/BOSTAU4

Children's Hospital Oakland Research Institute

  • Bovine BAC Library (male):
  • 6 finished BACs
  • NCBI links
 [2]
 [3] 
 [4]
 [5]
 [6]
 [7]

UMD2.0 Assembly

  • qc
          #elem
 scf      134612
 ctg      194643
  • Used UMDoverlapper to trim reads
          #elem       min     max     mean    median  n50     sum
 reads    35237868    68      1418    778     840     864     27406137041
  • Library re-estimates:
 paste *dst *mdi | perl -ane 'print "  $_" if($F[1]!=$F[4]);'
 35237870      150000  50000   35237870        162386  21158
 35237871      2496    1431    35237871        2409    1137
 35237873      3001    829     35237873        2973    770
 35237875      150001  50000   35237875        172955  43831
 35237876      1629    282     35237876        1595    245
 35237877      3063    1326    35237877        3193    1131
 35237878      6756    836     35237878        6701    793
 35237879      2569    293     35237879        2547    285
 35237880      150002  50000   35237880        160984  26638
 35237881      2749    446     35237881        2697    325
 35237883      3593    1213    35237883        3463    1232
 35237884      3165    700     35237884        3172    699
 35237885      3812    533     35237885        3804    537
 35237886      2754    1432    35237886        2701    1289
 35237887      4977    693     35237887        4968    694
 35237889      2710    1529    35237889        2566    1225
 35237890      150003  50000   35237890        161995  26438
  • AGP
 Chr     #Ctgs   sum
 Chr1    4617    156422777
 Chr2    3468    137970877
 Chr3    3260    119903216
 Chr4    3032    120499176
 Chr5    3103    119906797
 Chr6    3570    116708387
 Chr7    3049    109835480
 Chr8    2954    110918838
 Chr9    2584    104153020
 Chr10   2712    103370270
 Chr11   2778    105870899
 Chr12   2673    88593048
 Chr13   2147    83426589
 Chr14   2549    84346988
 Chr15   2655    84608865
 Chr16   2547    80726864
 Chr17   1885    71868308
 Chr18   2195    65032274 
 Chr19   1809    63177714
 Chr20   2146    70879676
 Chr21   1967    70124586
 Chr22   1628    60370627
 Chr23   1434    51154144
 Chr24   1380    61242035
 Chr25   1274    42286642
 Chr26   1668    51439476
 Chr27   1381    45311792
 Chr28   1186    45980083
 Chr29   1803    50591405
 ChrX    4883    136090029
 Chr1..29,X  74337  2612810882
 ChrU    113346  244744116
 ChrY    94      832527
 Chromosome mapped ctg/deg orientation:
 -       33990
 +       32686
 0       7661
 Chr1..30:
           elem       min        max        mean       med        n50        sum
 ctg       63006      88         840370     41151      20696      89067      2592807255  
 deg       11331      251        21929      1765       1330       1781       20003627    
 ChrU:
           elem       min        max        mean       med        n50        sum
 ctg       94017      89         166670     2362       1398       2537       222079922
 deg       19329      71         13330      1172       1031       1127       22664194
 Chr1..30 & ChrU:
           elem       min        max        mean       med        n50        sum
 ctg       157023     88         840370     17926      1787       81230      2814887177
 deg       30660      71         21929      1391       1112       1346       42667821
  • haplotype-variants
           elem       min        max        mean       med        n50        sum            
 ctg+deg   12375      73         28074      1956       1429       2005       24209396       
 ctg       7499       73         28074      2426       1728       2671       18193542        
 deg       4876       147        6807       1233       1139       1213       6015854
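
The library re-estimate listing above comes from a `paste ... | perl` one-liner that prints only the rows where the second and fifth columns differ. The Python sketch below does the same comparison and adds the relative change of the mean. It assumes each row is `id mean stddev id mean stddev`, with the first triple being the input estimate and the second the re-estimate; that reading is inferred from the "Library re-estimates" label, and `changed_libraries` is a hypothetical name.

```python
def changed_libraries(lines):
    """Return (library_id, old_mean, new_mean, percent_change) for each
    library whose re-estimated mean differs from the original mean."""
    changed = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        old_id, old_mean, _old_sd, new_id, new_mean, _new_sd = fields
        if old_id != new_id:
            raise ValueError(f"library ids out of step: {old_id} vs {new_id}")
        old_mean, new_mean = int(old_mean), int(new_mean)
        if old_mean != new_mean:
            pct = 100.0 * (new_mean - old_mean) / old_mean
            changed.append((old_id, old_mean, new_mean, round(pct, 2)))
    return changed


if __name__ == "__main__":
    # Two rows copied from the listing above.
    rows = [
        "35237870      150000  50000   35237870        162386  21158",
        "35237871      2496    1431    35237871        2409    1137",
    ]
    for row in changed_libraries(rows):
        print(*row, sep="\t")
```

On the rows shown, the 150 kb libraries move up by several percent, while the small-insert libraries shift by only a few percent in either direction.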

Submission

  • Title: "A whole-genome assembly of the cow, Bos taurus"
  • Authors:
 Steven Salzberg
 Aleksey Zimin
 Arthur Delcher
 Liliana Florea
 David Kelley
 Finian Hanrahan
 Guillaume Marcais
 Geo Pertea
 Michael Roberts
 Michael Schatz
 Curt Van Tassell
 James Yorke
 Poorani S.
  • Assembler:
 Celera Assembler and UMD Overlapper.
  • Sequencing Center:
 Baylor College of Medicine. 
  • Source of DNA used for sequencing:

The source of the BAC library DNA was Hereford bull L1 Domino 99375, registration number 41170496. Dr. Michael MacNeil's laboratory, USDA-ARS, Miles City, MT provided the blood. The DNA for the whole genome shotgun sequences was provided by Dr. Timothy Smith's laboratory, U.S. Meat Animal Research Center, Clay Center, NE from white blood cells from L1 Dominette 01449, American Hereford Association registration number 42190680 (a daughter of L1 Domino 99375). A skin cell fibroblast cell line from the same animal is available from Dr. Carol Chitko-McKown's laboratory, although there is no sequence from that cell line.

  • Sequence modifiers:
 [organism=Bos taurus][breed=Hereford][tech=wgs][chromosome=...]
  • Submission: Use sequin
 /nfshomes/dpuiu/szdevel/sequin.8.10/sequin
  • Sequence:
 Contig length summary:
           #seqs   min     max     mean    median  n50     sum
 all       210657  71      840370  13709   1523    78511   2887902366
 placed    75775   88      840370  34512   13416   88287   2615171268
 unplaced  134882  71      166670  2022    1322    1742    272731098

Duplicates

 deg0003136509,7180003440308 : both unplaced
 deg0003084562,7180002954167 : both unplaced

Contaminants

NCBI (1st batch)

Through "Foreign Contamination Screen" : http://www.ncbi.nlm.nih.gov/projects/WGS/screens/DAAA01_120508/

  • list.exclude_contigs 4,813 ctgs (3,939 vector + 394 Ecoli + 452 other) (73 mito, 43 deg, 8 Acinetobacter baumannii)
  • list.trim_contigs 19,049 ctgs (18,336 vector + 289 Ecoli + 397 other)

Steven's search against Ecoli MG1655

  • 12/11/2008 : Found 121 ctgs that align to Ecoli
 /fs/szasmg3/bos_taurus/Bos_taurus_UMD_2.0/salzberg/Eco-vs-cow.mum 
 /fs/szasmg3/bos_taurus/Bos_taurus_UMD_2.0/salzberg/EcoK12.fna 

NCBI (2nd batch)

Overall

  • Counts
 Bos_taurus.UMD2.exclude.count
 Bos_taurus.UMD2.trim.count

Summary

                             #ctgs   min     max     mean    median  n50     total_bp
 UMD_Freeze2.0_contam        210657  71      840370  13709   1523    78508   2887902366
 UMD_Freeze2.0               187683  71      840370  15225   1609    79580   2857554998
 difference                   22974

Contaminant region summary:

            elem       min        max        mean       med        n50        sum
 exclude    4817       316        16661      1510       1485       1514       7276894
 trim       30325      48         2479       354        319        446        10745455
 all        35142      48         16661      512        362        674        18022349
 
 Ecoli      746        54         16661      1125       1111       1264       839899          
 vector     33540      49         3128       487        346        603        16340397       
 other      910        53         13090      1006       1037       1329       916117

Exclude sequences (example):

 7180003318605   16661   Escherichia coli str. K12 substr  # DH10B
 7180003320028   13090   Acinetobacter baumannii
 7180003316967   7473    Escherichia coli str. K12 substr  # DH10B
 7180003313366   7098    Acinetobacter baumannii
 7180003195772   4993    Serratia marcescens
 7180003288790   4668    Klebsiella pneumoniae
 7180003310064   4371    Escherichia coli                  # all 3
 7180003262150   4275    Escherichia coli
 7180003288789   3565    Serratia marcescens
 join100003627   3563    Acinetobacter baumannii
 ...
 7180003289260   3128    Escherichia coli or vector  
 7180003292886   2957    vector
 7180003166540   2081    mitochondrion
 7180003310112   1977    contaminants
 7180002995790   1711    bacterial insertion sequence
 7180003221530   1647    Bacillus cereus ATCC 10987
 7180003259696   1597    Pseudomonas aeruginosa PAO1
 7180003239826   1378    Macaca mulatta

Problems:

1: Vectors

2: Ecoli

  • There are 22 Ecoli strains * 3 Ecoli K12 substrains
  • MG1655 was the first one completed; DH10B and W3110 were completed recently (?)
  • Contain unique seqs as long as 28K
 cat /fs/szdata/genomes/ncbi/Bacteria/Escherichia_coli_K_12*/*fna | infoseq -description
 NC_010473.1    4686137 50.78  Escherichia coli str. K-12 substr. DH10B, complete genome
 NC_000913.2    4639675 50.79  Escherichia coli str. K-12 substr. MG1655, complete genome
 AC_000091.1    4646332 50.80  Escherichia coli str. K-12 substr. W3110, complete genome
  • Out of 746 UMD2 Ecoli seqs, 636 (723 with maxmatch) aligned to Ecoli.all

3: Other

  • contaminants: 289 ; mostly Ecoli
  • phage: 16 ; 1 aligns to Ecoli, all <1108bp
  • IS: 398; all <1711 bp; 14 align to Ecoli.all & 1 to UniVec_Core; a few, when BLASTed at NCBI, aligned to mammals!
  • mitochondrion: 74 seqs: all align to ~/db/bos_taurus.mitochondrion
  • Others: 130; (Acinetobacter baumannii ...)
  • Files:
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.exclude.count     (4813 exclude sequence counts)
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.contaminant.list  (35142 contaminant sequences: exclude+trim) 
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.contaminant.fasta
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.contaminant.infoseq  35142
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.Ecoli.infoseq        746
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/bos_taurus.UMD2.vector.infoseq       33540
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/odd-contaminants.infoseq             14
 /fs/szasmg3/dpuiu/bos_taurus/submission/contaminants/NCBI/nucmer/bos_taurus.UMD2.vector.no_UniVec_Core.no_Ecoli.all.blastn_hits.fasta # 100 vector seqs not in UniVec (pPAC7 ...) -> ~dpuiu/db/OtherVec
  • Dirs:
 /fs/szasmg3/dpuiu/bos_taurus/submission/decontam

Notes:

  • The 4,814 ctgs were aligned to UniVec (-c 20 ; delta-filter -q)
    • 4,247 ctgs aligned to 89 vector seqs
    • top ref hits:
 gnl|uv|J01749.1   Cloning vector pBR322
 gnl|uv|J01636.1   E.coli lactose operon with lacI, lacZ, lacY and lacA genes
 gnl|uv|AF102576.1 Cloning vector pSOS
 gnl|uv|L08959.1   pUC8 cloning vector 
 gnl|uv|L08931.1   pMAC7-8 cloning vector for site-directed mutagenesis
 gnl|uv|L09145.1   pUR222 cloning vector
 gnl|uv|U47102.2   Cloning vector pALTER<R>-Ex1
 ...
  • The 4,814 ctgs were aligned to EcoliK12
    • 4,299 aligned 200bp+ to Ecoli
    • 3,877 aligned 100% to region 365521_365744 (224bp)
 >NC_000913.2_365521_365744 Escherichia coli K12, complete genome
 CATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATAC
 GAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAA
 TTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAAT
 GAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGC
  • The 4,814 ctgs were assembled based on 11,912 reads
 BCM          SHOTGUN    11247 (out of 10M reads)
 BCM          WGS        415   (out of 24M reads)
 NISC         SHOTGUN    213   (out of 0.7M reads)
 BARC         CLONEEND   14    ...
 BCM          CLONEEND   11
 BCCAGSC      CLONEEND   7
 TIGR         CLONEEND   4
 TIGR_JCVIJTC CLONEEND   1
  • Avg UMD clipping range of the 11,912 reads is 840bp (vs 778 avg for the 35.2M assembled reads)
  • Other: /fs/ftp-cbcb/pub/data/Bos_taurus/Bos_taurus_UMD_2.0/odd-contaminants.fa
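
The read breakdown in the notes above suggests that the excluded contigs come mostly from BCM SHOTGUN reads. A rough Python sketch of that comparison follows. The contaminant read counts are copied from the notes; the library sizes are the BCM trace counts from the NCBI Traces section, used here as an assumed stand-in for the number of reads that actually entered the assembly. `per_million` is a hypothetical helper.

```python
# Reads behind the excluded contigs, by (center, trace type), from the notes above.
CONTAM_READS = {
    ("BCM", "SHOTGUN"): 11247,
    ("BCM", "WGS"): 415,
    ("NISC", "SHOTGUN"): 213,
    ("BARC", "CLONEEND"): 14,
    ("BCM", "CLONEEND"): 11,
    ("BCCAGSC", "CLONEEND"): 7,
    ("TIGR", "CLONEEND"): 4,
    ("TIGR_JCVIJTC", "CLONEEND"): 1,
}

# Assumption: BCM trace counts from the NCBI Traces section approximate
# the size of each read set. Other centers are not listed there.
LIBRARY_SIZES = {
    ("BCM", "SHOTGUN"): 10716306,
    ("BCM", "WGS"): 24863627,
    ("BCM", "CLONEEND"): 16892,
}


def per_million(center, trace_type):
    """Contaminant reads per million reads of the given center and type."""
    key = (center, trace_type)
    return 1e6 * CONTAM_READS[key] / LIBRARY_SIZES[key]


if __name__ == "__main__":
    print("total contaminant reads:", sum(CONTAM_READS.values()))
    for center, trace_type in LIBRARY_SIZES:
        rate = per_million(center, trace_type)
        print(f"{center}\t{trace_type}\t{rate:.1f} per million")
```

Under that assumption the SHOTGUN rate is more than fifty times the WGS rate, which fits contamination that originates in the clone-based reads rather than in the whole-genome shotgun reads.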

Local files

  • Freeze dir files
 /fs/szasmg3/bos_taurus/Bos_taurus_UMD_2.0/contigs.unplaced.fa  : sequences
 /fs/szasmg3/bos_taurus/Bos_taurus_UMD_2.0/bos_taurus.agp       : all scaffolds
 
 /fs/szasmg3/bos_taurus/UMD_Freeze2.0/reads.placed.gz: 31,942,023 reads (read_id, read_clr, ctg_id, scf_id, ctg_pos, scf_pos)


 /fs/ftp-cbcb/pub/data/assembly/Bos_taurus/Bos_taurus_UMD_2.0
  • Files uploaded
 Ftp server: ftp-private.ncbi.nlm.nih.gov
 Account: cbcb_trc
 Dir: uploads/
 Local files: /fs/szasmg3/dpuiu/bos_taurus/submission/ftp/   : 22 *sqn + 1 agp

Contaminant search

Ecoli

UniVec_Core

UMD2.other

  • 83(82) ctgs align to 65 ref sequences
  • 10 ctgs are Acinetobacter baumannii
 pwd
 /fs/szasmg3/dpuiu/bos_taurus/submission/nucmer_contaminant
 
 join UMD2.contaminant.other-ctg.ref_hits ~/db/bos_taurus.UMD2.contaminant.infoseq | sort -nk3 -r | head
 7180003370686_12513_13066 553 16 554 phage
 7180003320028 13090 10 13090 Acinetobacter baumannii
 7180003341208_1_647 646 8 647 phage
 ...
 
 contigs          <2000      >2000      min        max        mean       med        n50        sum
 82               20         62         709        397429     65384      44841      138429     5361543
 
 alignments       <200       >200       min        max        mean       med        n50        sum            
 103              33         70         105        3312       467        276        688        48109

Files:

/fs/szasmg3/dpuiu/bos_taurus/submission/nucmer_contaminant/UMD2.contaminant.Acinetobacter-ctg.qry_hits   # 10 UMD2.0 Acinetobacter ctg ids
/fs/szasmg3/dpuiu/bos_taurus/submission/nucmer_contaminant/Acinetobacter.all-ctg.filter-q.qry_hits       # 22 UMD2.0 Acinetobacter ctg ids ; 7 in common with the 10 above
# 25 Acinetobacter ctgs
       ctgid         ctglen
    1  7180003321583 96481
    2  7180003308373 9419
    3  7180003319195 8955
    4  7180003317370 8045
    5  7180003290024 5922
    6  join100003699 5649
    7  7180003288988 3618
    8  7180003308907 3157
    9  7180003234806 3100
   10  7180003319189 2966
   11  7180003202299 2653
   12  7180003217023 2213
   13  7180003219002 2161
   14  7180003219018 2010
   15  7180003292866 1767
   16  7180003215440 1617
   17  7180003235746 1573
   18  7180003235747 1524
   19  7180003234890 1422
   20  7180003219292 1329
   21  7180003221397 1308
   22  7180003221476 1243
   23  7180003235699 1139
   24  7180003214110 1100
   25  deg0003235855 1062
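
The count of 25 Acinetobacter contigs is consistent with merging the two hit lists above: 10 ids and 22 ids with 7 in common give 10 + 22 - 7 = 25. A small Python sketch of that merge follows; the ids in the example are made-up placeholders, not real contig ids, and `merge_hit_lists` is a hypothetical helper.

```python
def merge_hit_lists(first_ids, second_ids):
    """Combine two lists of contig ids; report the shared and the merged sets."""
    first, second = set(first_ids), set(second_ids)
    return {
        "common": sorted(first & second),  # ids present in both lists
        "all": sorted(first | second),     # ids present in either list
    }


if __name__ == "__main__":
    # Placeholder ids only: 10 in the first list, 22 in the second, 7 shared.
    first = [f"ctg{i:02d}" for i in range(0, 10)]
    second = [f"ctg{i:02d}" for i in range(3, 25)]
    merged = merge_hit_lists(first, second)
    print(len(first), len(second), len(merged["common"]), len(merged["all"]))
```

In practice the two inputs would be the id columns of the `qry_hits` files listed above, one id per line.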

Other issues

Segmental duplications

  • David Kelley seminar
  • UMD1.6
    • inclusions: 384 (1.1Mbp)
    • joins: 1090 (1.1Mbp)