Turkey: Difference between revisions

= Assembly2.0 =


== Original (CA) ==
Reads:
   TotalUsableReads=151,843,863 (151M)
   AvgClearRange=102
   ContigReads=139021843(91.56%)
   DegenContigReads=8392124(5.53%)
   SurrogateReads=1317962(0.87%)
   SingletonReads=3314375(2.18%)
   Cvg=15X
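
The read percentages and the Cvg=15X figure above can be checked with a few lines of arithmetic. This is a minimal sketch, not the script that produced the numbers: the page does not say which genome size was used as the denominator, so the scaffold sum from the assembly stats (1,022,394,764 bp) is assumed here, and the function names are illustrative only.

```python
# Values copied from the "Reads:" block; the denominator is an assumption.
TOTAL_USABLE_READS = 151_843_863
AVG_CLEAR_RANGE = 102             # average usable bases per read
ASSUMED_GENOME_BP = 1_022_394_764  # scf "sum" column, assumed denominator


def coverage(reads: int, read_len: int, genome_bp: int) -> float:
    """Sequencing depth = total usable bases / genome size."""
    return reads * read_len / genome_bp


def read_fraction(count: int, total: int = TOTAL_USABLE_READS) -> float:
    """Percentage of usable reads falling in one category."""
    return 100.0 * count / total


cvg = coverage(TOTAL_USABLE_READS, AVG_CLEAR_RANGE, ASSUMED_GENOME_BP)
contig_pct = read_fraction(139_021_843)
print(f"coverage ~{cvg:.1f}X, contig reads {contig_pct:.2f}%")
```

Under that assumption the depth comes out slightly above 15, which is consistent with the rounded Cvg=15X.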


Stats(all):
   .                    elem        min    q1    q2    q3    max        mean      n50        sum
   scf                  27,007      66    1354  1988  4793  9558742    37856      1538143    1,022,394,764
   ...
   deg                  440,796    64    102    256    485    8055      312        483        137,835,235


== Preliminary ==

Stats:
   .                    elem      min    q1       q2        q3        max        mean      n50       sum
   Ch1..30,40,41        32        531    6400446  15119779  34928883  184590300  28263595  70426150  904,435,047
   gaps                 147792    100    100      100       100       2999       268       860       39,738,918

Stats(placed):
   .                    elem        min    q1    q2    q3    max        mean      n50        sum
   scf                  2,504      1001  5868  35589  272564 9558742    362085    1830406    906,662,877
   ...
   ctg+deg              147,824    64    520    2783  8197  91891      5849      13426      864,696,129


Stats(unplaced): preliminary AGP
   .                    elem       min   q1     q2     q3     max       mean      n50        sum
   scf                  24,503     66    1323   1847   3835   325966    4723      11202      115,731,887
   ctg                  33,911     64    1242   1693   2609   42695     2351      2774       79,744,229
   deg                  404,724    64    100    249    482    3868      307       481        124,348,229

== Final ==

* More ctgs placed based on synteny.
* Alignments to chicken (delta-filter -1):
   1      150471
  -1      32587
* Many scaffolds seem to be interleaved

Stats:
                       elem      min    q1      q2       q3       max        mean      n50        sum
   Ch1..30,40,41        32        531    7024757 18811362 37793329  207174646  31576111   75696247   1,010,435,575
   Ch1..30,40,41,Un     33        531    7024757 18811362 37793329  207174646  32954439   75696247   1,087,496,503
   [[Media:turkey.len|turkey.len]]

Stats(placed): final AGP, placed by synteny (Aleksey & Daniela's)
   .                    elem      min    q1    q2    q3    max        mean      n50        sum
   ctg                  131,217    64    1651  3975  9289  91891      6866      12989      901,044,472
   ...
   ctg+deg              162,643    64    731    2602  7576  91891      5609      12829      912,285,854


More stats:
   total genome size with gaps     :     1087496503   1010435575
   total genome size without gaps  :     941191869    912285854
   where:
     all: Chr1..41,Un
     placed: Chr1..41
 
  N50 contig size(CA ctgs):              12435       
   N50 scaffold size(original CA scaff):  1538143     
 
   total bases mapped to chromosomes     : 912285854 (Chr1..41)
   total unmapped                        : 28906015 (ChrU)
 
   size and number of contigs in each chromosome:
  chr      #ctg/deg  len(noGaps)  len(withGaps)
   1        31920    186281234    207174646
  2       17221    108330071    119814280
  3        15247    92546836    102780271
  4        10336    69043870    75696247
  5        8892      57589156    63943857
  6        7680      49575076    55000907
  7        6634      36192137    39986770
  8        5331      34152018    37933571
  9        2583      18366421    20063553
  10      4455      29082703    31790800
  11      2962      22664575    24752353
  12      2854      19182682    21170715
  13      2810      18912345    21086818
  14      2657      19298732    21185158
  15      2671      17107111    18811362
  16      2421      14623454    16273683
  17      1858      12183352    13504974
  18       57        118600      132921
  19      1687      9654238      10789531
  20      2039      10407256    11885725
  21      1562      9611963      10683868
  22      5729      14123046    16000480
  23      1066      6510119      7383190
  24      710      3881523      4300864
  25      954      5025869      5613781
  26      1522      6146115      7024757
  27      195      777582      887413
  28      778      4125373      4632725
  29      688      3036456      3487800
  30      1067      3660581      4277653
  40      1        531          531
  41      16056    30074829    32364371
  Un      14048    28906015    77060928
  total    176691    941191869    1087496503
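
The summary columns used throughout this page (elem, min, q1, q2, q3, max, mean, n50, sum) can be reproduced from the len(withGaps) column of the table above. The sketch below is an assumption about how such a summary is computed, not the source of the actual summary tool: it takes quartiles as the sorted values at index n/4, n/2 and 3n/4, truncates the mean, and defines N50 as the length at which the running total, from longest to shortest, first reaches half of the sum. The function name is illustrative.

```python
# len(withGaps) for Chr1..30, 40, 41, copied from the table above (Un excluded).
PLACED_LEN_WITH_GAPS = [
    207174646, 119814280, 102780271, 75696247, 63943857, 55000907,
    39986770, 37933571, 20063553, 31790800, 24752353, 21170715,
    21086818, 21185158, 18811362, 16273683, 13504974, 132921,
    10789531, 11885725, 10683868, 16000480, 7383190, 4300864,
    5613781, 7024757, 887413, 4632725, 3487800, 4277653,
    531, 32364371,
]


def summarize(lengths):
    """Return the summary columns for a non-empty list of lengths."""
    s = sorted(lengths)
    n = len(s)
    total = sum(s)
    running = 0
    n50 = None
    for length in reversed(s):       # longest first
        running += length
        if 2 * running >= total:     # reached half of the total span
            n50 = length
            break
    return {
        "elem": n, "min": s[0],
        "q1": s[n // 4], "q2": s[n // 2], "q3": s[(3 * n) // 4],
        "max": s[-1], "mean": total // n, "n50": n50, "sum": total,
    }


stats = summarize(PLACED_LEN_WITH_GAPS)
print(stats)
```

With these assumptions the result agrees with the Ch1..30,40,41 row of the final Stats table for elem, min, q1, q2, max, mean, n50 and sum. The computed q3 is 37933571 (Chr8), whereas the Stats row lists 37793329, so one of the two values may have been mistyped or taken from a slightly different AGP.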
 
  Reads and bases
  ctg          H  120498260  8874803391
  ctg          F  18491893  5298292983
  ctg          C  17546      7111026
  ctg          0  14144      4334955
 
  deg          H  6775791    493312431
  deg          F  1614232    440598789
  deg          C  971        275722
  deg          0  1130      253396
 
  placed_ctg    H  118224274  8707749456
  placed_ctg    F  18072645  5176540153
  placed_ctg    C  17515      7100114
  placed_ctg    0  14058      4313107
  placed_deg    H  1401655    101867758
  placed_deg    F  212050    49638442
  placed_deg    C  675        195293
   placed_deg    0  653        150688


---
Files:
   /fs/szattic-asmg4/turkey/Assembly2.0/
  /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/
== Chr_111909 ==
--[[User:Dpuiu|Dpuiu]] 22:28, 19 November 2009 (EST)
* Aleksey tried to fix the contig rearrangements & scaffold overlaps
  .          elem      min    q1    q2    q3    max        mean      n50        sum           
  ctg        154342    64    1463  3214  8113  91891      6170      12340      952327586     
  deg        13627      64    181    453    685    8055      460        656        6270299       
  ctg+deg    167969    64    1218  2747  7462  91891      5706      12242      958597885     
Files:
  /nfshomes/alekseyz/Chr_111909/Chr.all.agp
== Chr_112409 ==
--[[User:Dpuiu|Dpuiu]] 15:39, 24 November 2009 (EST)
* Alignments to chicken (delta-filter -1): still many inversions
  1      137445
  -1      22445
* Inverted scaffold examples:
cd ~dpuiu/turkey/
join2.pl Alignment2.0/chicken-turkey.scf/Chr.scf.dir Assembly2.0/Chr_112409/Chr.scf.dir | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1]  ne $F[3]);' | sort -nk3 -r | pretty | grep -v -f turkey.scf.split.112409
#scfid      alignDir alignCount AgpDir AgpCount  markerDir markerCount 
7180002103721    -1  602            1  395    -1        116  # flipped erroneously (395 ctg, 3.3Mbp scaffold)
7180002103327    -1  299            1  221    1        68
7180002103550    -1  298            1  241    1        65
7180002103191    -1  280            1  258    1        69
7180002103618    -1  267            1  426    -1,1      144  # half rev, half fwd
7180002103677    -1  246            1  186    -1,-1    53  # aligns in 2 separate regions of Chr1
...
7180002103609    1  228            -1  166    -1
7180002103567    -1  224            1  241    1
7180002103561    1  223            -1  597    -1
7180002103695    1  210            -1  286    -1
7180002103421    -1  201            1  203    1
7180002103478    -1  181            1  217    -1            # flipped erroneously (217 ctg, 1.4Mbp scaffold)
7180002103668    -1  176            1  171    -1            # flipped erroneously (171 ctg, 1.7Mbp scaffold)
7180002103762    -1  161            1  257    ?
7180002103538    -1  154            1  134    1
7180002102914    1  147            -1  74      -1
7180002103634    -1  142            1  95      -1            # flipped erroneously (95 ctg, 6.9Mbp scaffold)
7180002103116    1  141            -1  97      -1
7180002103453    -1  129            1  49      ?
7180002102994    1  128            -1  109      -1
* About 70 scaffolds (40Mbp) seem "clearly" inverted
join2.pl ~dpuiu/turkey/Assembly2.0/Chr_112409/Chr.scf.dir BACs/Chr.scf.dir | grep -v -f turkey.scf.split.112409 | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] and $F[1] ne $F[3]);' | join2.pl -f \
        ~dpuiu/turkey/Assembly2.0/turkey.posmap.scflen | sort -nk6 -r | pretty | getSummary.pl -i 5
elem      min    q1    q2    q3    max        mean      n50        sum
70        89456  305981 483779 692778 3317675    576698    692778    40368888
* Scaffolds don't seem to be interleaved any more
* Stats
  .            elem    min    q1      q2        q3        max        mean      n50      sum       
  ctg.all      152641  64      1356    3154      8130      91891      6131      12520    935915009
  ctg.placed    144893  64      1388    3361      8485      91891      6330      12751    917287101
 
  chr.all      33      242906  6750934  18242820  38656374  204065997  32193660  74864811  1062390784 
  chr.placed    32      242906  6750934  18242820  38723638  204065997  32509112  74864811  1040291584 
  chr      #ctg/deg  len(noGaps)  len(withGaps)
  1        26557    181826552    204065997
  2        14384    106718223    116966045
  3        12649    91132767    100405573
  4        9170      68844569    74864811
  5        7553      56965239    62524249
  6        6534      48705183    53257597
  7        4755      35338084    38723638
  8        4751      35279744    38656374
  9        2286      18014631    19388932
  10      3733      28668829    31125850
  11      2720      22659912    24221968
  12      2372      18944919    20663392
  13      2354      18696996    20109273
  14      2367      19181786    20812949
  15      2265      16791072    18242820
  16      1967      14411805    15988588
  17      1635      12015459    13277650
  18      51        139801      244178
  19      1399      9478246      10526513
  20      1424      9943105      11078077
  21      1328      9405728      10459872
  22      1865      13252797    14786889
  23      937      6420024      7113901
  24      569      3613335      4158826
  25      834      4963017      5560155
  26      1040      5925429      6750934
  27      161      687724      943818
  28      717      4244239      4894166
  29      803      3649262      4826720
  30      693      3524564      4396719
  W        50        108225      242906
  Z        24970    47735835    81012204
  Un      7748      18627908    22099200
  total    152641    935915009    1062390784
Files:
  /nfshomes/alekseyz/Chr_111909/Chr.all.agp
  /fs/szasmg3/dpuiu/turkey/Assembly2.0/Chr_112409/
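
To compare the orientation counts reported for the final AGP (150471 forward, 32587 reverse) with those for Chr_112409 (137445 forward, 22445 reverse), it helps to express them as the fraction of alignments in reverse orientation. This is a small worked calculation using only the counts quoted on this page; the function name is illustrative.

```python
def reverse_fraction(forward: int, reverse: int) -> float:
    """Fraction of delta-filter -1 alignments that are in reverse orientation."""
    return reverse / (forward + reverse)


final_agp = reverse_fraction(150_471, 32_587)
chr_112409 = reverse_fraction(137_445, 22_445)
print(f"final AGP: {final_agp:.1%}  Chr_112409: {chr_112409:.1%}")
```

The reverse fraction drops from roughly 18% to roughly 14%, which fits the note that many inversions still remain after the Chr_112409 fixes.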
=== Table 13 ===
* From the article
* 34 predicted rearrangements between the turkey and chicken genomes; 6 look wrong, 6 questionable, 22 probably right
  GGA  GGA start      GGA end        MGA*    Nature of the rearrangement                                                          Notes
  1    9,713,416      10,050,000      MGA1    segment relocated to chr1:74570000                                                  translocated segment is internal to direct repeat of SEMA3 genes
  1    75,800,000      76,000,000      MGA1    small inversion                                                                      possible unequal recombination within KCN gene cluster
  1    104,450,000    104,459,439    MGA1    possible very small intrachromosomal translocation                                  the genetic map places this short segment near 1q telomere
  #1    125,900,000    126,300,000    MGA1    small interchromosomal translocation                                                insertion of GGA4:25,500,000-25,550,000 at repetitive locus (see also below)
  #1    156,600,000    156,600,001    MGA1    small interchromosomal translocation                                                may be misplacement of Ctg13.1004 in GGA seq or LINE-based translocation of a small segment from GGA4:73,089,000-73,090,000
  1    172,822,000    172,900,000    MGA1    possible small inversion                                                            may be mis-assembly of GGA ctg3.1161
  2    54,870,224      56,560,442      MGA3    inversion with 56.560 Mb coordinate being telomeric in MGA3                          (together one inversion and two translocations or assembly errors)
  2    54,398,341      54,413,232      MGA3    small translocation or mis-assembly of GGA seq., inverted rel. to GGA seq. coord.    (together one inversion and two translocations or assembly errors)
  2    54,641,337      54,845,268      MGA3    probably inverted relative to GGA sequence coordinates                              (together one inversion and two translocations or assembly errors)
  #2    54,290,000      54,330,000      MGA3    small translocation or mis-assembly of GGA seq., orientation uncertain              (together one inversion and two translocations or assembly errors)
  #2    54,452,395      54,545,188      MGA3    probably inverted relative to GGA sequence coordinates                              (together one inversion and two translocations or assembly errors)
  2    53,804,240      54,263,147      MGA3    inverted relative to GGA seq. coordinates with 53.8 Mb joined to 56.6 Mb in MGA      (together one inversion and two translocations or assembly errors)
  3    6,218          2,344,838      MGA2    inversion, telomeric                                                                FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
  3    5,605,686      11,605,484      MGA2    inversion, (agrees with genetic map)                                                FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
  #4    25,500,000      25,550,000      MGA4    small interchromosomal translocation to about 125.90 Mb orthologous coord. on MGA1  see also chr1:125900000
  ?4    35,150,000      35,160,000      MGA4    likely small duplication                                                            part of this segment duplicated at around 35,828,000
  #4    73,080,000      73,090,000      MGA4    small interchromosomal translocation to about 156.60 Mb orthologous coord. on MGA1  may be misplacement of Ctg13.1004 in seq or LINE-based translocation of a small segment, see also GGA1:156,600,000
  5    1              270,229        MGA5    local small inversion with respect to p arm which as a whole is inverted            local inversion with respect to p arm which as a whole is inverted
  5    1              7,248,180      MGA5    inversion of p arm      p arm likely inverted based on genetic map of Nte0897, MNT-193
  6    1,576,787      13,080,207      MGA8    multiple inversions:  predicted order is ...                                        can be explained by a series of 4-5 consecutive inversions, including possible unequal recombination between SLC16A9 or, less likely, protocadherin genes
  7    1              7,248,180      MGA7    inversion of p arm
  ?8ran 64,951          407,592        MGA10  GGA8_random sequences likely telomeric on MGA10                                      (and probably GGA8)     
  8    44,817          10,199,568      MGA10  inversion of p arm                                                                  possible unequal recombination between AMY genes
  8    8,992,540      9,170,000      MGA10  local small inversion with respect to p arm which as a whole is inverted            probable inversion but might be mis-orientation of GGA sequence contigs
  9    1,528,027      4,372,460      MGA11  inversion                                                                            telomeric inversion
  10    1,907,125      3,642,461      MGA12  no internal centromere observed in turkey                                            centromere misplaced in chicken or moved to telomere in turkey
  11    75,337          3,280,000      MGA13  no internal centromere observed in turkey                                            inversion of GGA 11p, FISH CONFIRMED
  12    95,816          940,546        MGA14  may be inverted, orientation uncertain                                              may be fused to a repeat of 2.15-2.3 Mb region of GGA12
  ?12  1,050,000      1,100,000      MGA14  possible small intrachromosomal translocation to telomere                            small segment may be now at MGA telomere
  ?12  1,128,610      1,134,284      MGA14  possible very small intrachromosomal translocation                                  small segment now between about 2,632,117-2,703,753 in GGA coordinates on q arm
  12    1,164,577      1,399,694      MGA14  inversion  (1164577 joined to 1599552)                                              centromere either misplaced in GGA or moved telomeric or between 940,546 and 1,399,694
  13    8,233,861      8,511,782      MGA15  small inversion
  14    14,370,000      15,070,000      MGA16  inversion                                                                            FISH CONFIRMED
  18    5,062,096      9,882,412      MGA20  inversion                                                                            unequal recombination between NME paralogs, FISH confirmed
  ?28  1,550,000      1,620,000      MGA30  apparent duplication with extra copy at about 1.05 Mb in MGA                        unclear if these are rearrangements or assembly errors


= Scaffold alignment to chicken =
...
* Ctg stats (ctgs in aligned scaff)
                       elem        min    q1    q2    q3    max        mean      n50        sum
   aligned              139790      64    1580  3665  8822  91891      6585      12739      920,634,899
   unaligned            5873        64    1148  1399  1887  22071      1756      1766      10,318,453

* Alignment stats
...
   len(filter-1)        163390      12    1191  2673  6437  134409    5188      10410      847,715,057
   %id(filter-1)        163390      11.24  81.10  84.82  87.68  100.00    83        85        .
  len(subset 10)*      2952        12    771    2202  5413  45094      4040      8052      11926507
  id%(subset 10)      2952        33.22  80.56  84.51  87.68  100.00    83        84        .


* turkey scf vs chicken & turkey chr : 15% of the scaffold sequence seem to align in opposite orientation !!! Could the scaffold be misoriented by mistake?
...
   
  24            Chr26      862
  25            Chr27      3
   
  26            Chr28      592
...
   
  W            Chr41      24
  W            Chr40      ?
   
  E22C19W28_E50C23  ChrUn  7l
...
                     elem      min    q1      q2      q3      max        mean      n50        sum
   1+markers          23077      76    6408    11837  19433  91891      14425      19768      332,889,618
= Scf splits (Daniela) =
1. Input format
  cat BACs/BAC_map_final.txt | grep 7180002103762 | pretty
  CH260094G18_SP6      3_1.3  3  2205409  150000  7180002076309  7180002103762  3285  322415
  78TKNMI001N01_SP6    3_1.3  3  2287385  150000  7180002058027  7180002103762  2224  329223
  ...
  CH260099O02_SP6      3_1.5  3  3910434  150000  7180002058147  7180002103762  4524  1655352
  CH260096N05_T7      6_3    6  26824213  150000  7180002058054  7180002103762  12808  693787
  ..
  CH260026H13_SP6      6_3    6  29907224  266336  7180002057998  7180002103762  634    33979
2. find scaffolds with markers from multiple chromosomes
  cat BACs/BAC_map_final.txt | awk '{print $7,$3}' | count.pl -m 2 | awk '{print $1,$2}' | paste.pl
  ...
  7180002103762 3 6
  ...
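
Step 2 above can also be sketched without the local helper scripts. The Python below is a minimal approximation, assuming the BAC map is whitespace-separated with the chromosome in column 3 and the scaffold id in column 7 (as in the awk command), and assuming that `count.pl -m 2` keeps scaffold/chromosome pairs seen at least twice; that reading of the option is a guess. The sample rows are the ones shown in step 1.

```python
from collections import Counter, defaultdict

# Rows copied from the BAC_map_final.txt excerpt in step 1.
SAMPLE_ROWS = """\
CH260094G18_SP6 3_1.3 3 2205409 150000 7180002076309 7180002103762 3285 322415
78TKNMI001N01_SP6 3_1.3 3 2287385 150000 7180002058027 7180002103762 2224 329223
CH260099O02_SP6 3_1.5 3 3910434 150000 7180002058147 7180002103762 4524 1655352
CH260096N05_T7 6_3 6 26824213 150000 7180002058054 7180002103762 12808 693787
CH260026H13_SP6 6_3 6 29907224 266336 7180002057998 7180002103762 634 33979
"""


def multi_chr_scaffolds(lines, min_markers=2):
    """Map scaffold id -> chromosomes supported by >= min_markers markers,
    keeping only scaffolds supported on more than one chromosome."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated rows
        chrom, scf = fields[2], fields[6]
        counts[(scf, chrom)] += 1

    by_scf = defaultdict(list)
    for (scf, chrom), n in counts.items():
        if n >= min_markers:
            by_scf[scf].append(chrom)
    return {scf: sorted(chroms) for scf, chroms in by_scf.items()
            if len(chroms) > 1}


candidates = multi_chr_scaffolds(SAMPLE_ROWS.splitlines())
print(candidates)
```

On the excerpt this reports scaffold 7180002103762 with markers on chromosomes 3 and 6, matching the line shown in step 2.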


= Scf splits (Aleksey) =
     1  7180002103685 6 156 161 jumps from chr6 to chr1 4049114-4201400
     2  7180002103648 1 45 79 1187881-1198679
     ...
     24  7181002103752
     25  7181002103771


=  Zebrafinch chr sample vs Chicken chr =
...
   
   E22C19W28_E50C23*  chrLGE22  3
= Synteny =
MSU:
  "We do see a couple of very small translocations between chromosomes 1 and 4, but these are so small that they could be errors in the chicken assembly or, more likely, paralogous sequences that perhaps were two copies in the last common ancestor and chicken kept one and turkey the other. We don't see translocations between chromosomes Z and 1, so I expect that these alignments are due to a repetitive element (CR1 being the most likely), but the Z assembly is tentative even in chicken, so it's hard to be sure."
From the spreadsheet:
  chickenChr    turkeyChr
  4        chr1 12.2 1-12.2  25,500,000 25,550,000
  4              chr1 18.2 1-18.2 73,080,000 73,090,000
From the *merge2.anc
  4                  Chr1  94230402  207174646  73196453  73204143  177454210  177447336  3184    2528      5            -1      250.65
  4                  Chr1  94230402  207174646  86530225  86583469  117075107  116976224  11548    58133    9            -1      203.6
Syntenic regions:
            chickenRegions  turkeyRegions  chickenChr  turkeyChr
  all      209166          311363                                    # nucmer -l 12 -c 65 -g 1000 -b 1000                     
  filter-1  183058          259760        142        186            # delta-filter -1
  filter    170658          239592        125        129            # filter-anc.pl -maxDist 200000 -W 20 -p 0.1
  merge0    3260            2250          125        130            # merge-anc.pl  -maxDist 200000
  merge1    1573            1368          110        93            # merge-anc.pl  -maxDist 200000  -minCount 8  -minLen 10000
  merge2    376            488            49          47            # merge-anc.pl  -maxDist 1000000 -minCount 20 -minLen 100000
= Problems =
== ctg7180001625741 ==
* 1 ctg scaff: 7180002083787(1.4Kbp)
* Single links to 2 diff scaff: 7180002103637 & 7180002103666
* Synteny info (Daniela)
  cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.ctg/turkey.ctg.posmap.merge | grep -C 20 7180001625741
  #                chickenChr                                  turkeyChr
  7180002057801    6                36246991  36257816  -1  Chr8  35888371  35899195  r  U  100
  7180001625741    6                36269529  36271001  -1  .      .          .          .  .  .
  ...
  7180002074579    6                36382217  36386350  -1  Chr8  35899296  35903428  r  N  20910
* Synteny info (Aleksey)
  cat /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/place_by_sinteny/contigs.chicken.order.with_AGP.valid.txt | grep -C 1 7180001625741 | pretty
  #                                                chickenChr  turkeyChr
  1      1790  36269140  36267359  7180001578245  chr6        Chr7  20109532  20111855  2324  -  7180001578245
  307    1472  36270694  36269529  7180001625741  chr6        ChrUn  32131240  32132711  1472  0  7180001625741*
  2343    5341  36282706  36279707  7180001914610  chr6        Chr7  36045401  36052860  7460  -  7180001914610
  cat turkey.posmap.ctgscf | grep 7180002103637 | egrep -n '7180001578245|7180001914610'
  ...
  302:7180001914610      7180002103637  2403512 2410972 f
  391:7180001578245      7180002103637  3013067 3015391 f
  ...
  463:
Scf 7180002103637 aligns to both chicken Chr6 & Chr7
  cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.scf/turkey.scf-chicken.filter-1.merge0.anc | grep 7180002103637
  7180002103637    7                3817505  38384769  2          2285066  23413757  21084724  651501  705639  284          -1      23.41
  7180002103637    6                3817505  37400442  2285109    2410972  36410067  36277407  37596    43600    21          -1      38.69
  7180002103637    7                3817505  38384769  2410993    3817505  21084676  19700317  384609  354790  151          -1      23.49
  grep 7180002103637 /fs/szasmg3/dpuiu/turkey/BACs/BAC_map_final.txt | pretty | sort -nk9 | nl
    1  CH260098J15_SP6      7_10  7  21623779  150000  7180001914412  7180002103637  2833  2833
    2  78TKNMI023L02_SP6    7_10  7  21655259  150000  7180001914413  7180002103637  462    8036
    3  78TKNMI020I14_T7    7_10  7  21786579  150000  7180001914413  7180002103637  6568  14142
    ...
    91  78TKNMI028M05_T7    8_13  8  34451891  150000  7180001914600  7180002103637  4694  2314126
    92  CH260110M21_T7      8_13  8  34375922  150000  7180001914602  7180002103637  4382  2344157
    93  CH260102C12_T7      8_13  8  34561953  150000  7180001914608  7180002103637  7429  2400413
    ...
  155  CH260102B06_SP6      7_10  7  18173403  150000  7180001914714  7180002103637  578    3753466
  156  CH260091G02_T7      7_10  7  18147518  150000  7180001914716  7180002103637  14232  3777334
  157  78TKNMI020K20_T7    7_10  7  17944915  150000  7180001914719  7180002103637  5969  3809779
* Solution:
  Chr7.agp:10573:    Chr7      36935151        36938433        10573  W      7180001538614  1      3283    +      #      chr7/v3.6/scaffolds/scaffold_0.3
  Chr7.agp.bak:10609: Chr7  37075393        37078675        10609  W      7180001538614  1      3283    +      #      chr7/v3.6/scaffolds/scaffold_0.3
  Chr8.agp:10281:    Chr8      36818118        36822250        10281  W      7180002074579  1      4133    -      #      chr8/v3.6/scaffolds/scaffold_0.8
  Chr8.agp.bak:10245: Chr8  36677876        36682008        10245  W      7180002074579  1      4133    -      #      chr8/v3.6/scaffolds/scaffold_0.8
== 9 more problems ==
Turkey marker counts:
  scfId      turkeyChr #markers
  7180002103213    28  3 # 100K  on Chr28
  7180002103213    9  9
  7180002103555    20  7  # found before ; 100K  on Chr8
  7180002103555    8  3
  7180002103653    1  2 # 100K on Chr1
  7180002103653    5  71
  7180002103669    10  161 # 130K in the middle on Chr11
  7180002103669    11  3
  7180002103694    1  53 # 60K in the middle on Chr3
  7180002103694    3  7
  7180002103720    1  59 # found before ;  160K in the middle of Chr8  # "very messy"
  7180002103720    7  63
  7180002103720    8  8
  7180002103742    1  2 # 40K on Chr1
  7180002103742    2  23
  7180002103744    1  2 # 60K on Chr1
  7180002103744    19  3
  7180002103750    2  115 # 50K in the middle on Chr3
  7180002103750    3  2
Alignment to chicken chromosomes:
  scfId            chickenChr        scfLen  chrLen    scfStart    scfEnd  chrStart  chrEnd    scfSnp  chrSnp  #alignm.      chrDir  scfIntercept
  7180002103213    4                426424  94230402  6            299125  492126    808691    84749    99619    33            1        -0.49
  7180002103213    26                426424  5102438    299637      426422  1866683    1733616    25738    31146    16            -1      2.16
  7180002103555    18                462038  10925261  1            370822  8723614    8393969    118299  280534  19            -1      8.72
  7180002103555    6                462038  37400442  370843      462032  20662598  20558365  31608    42763    22            -1      21.03
  7180002103653    1                2021582  200994015  288          61479    168123022  168180581  29306    25853    22            1        -168.12
  7180002103653    5                2021582  62238931  65643        2021301  48278635  50168489  555769  666443  202            1        -48.21
  7180002103669    8                3819803  30671729  1            1573516  22762147  21163160  362616  390538  146            -1      22.76
  7180002103669    9                3819803  25554352  1582769      1673152  20499440  20382802  40878    42100    14            -1      22.08
  7180002103669    8                3819803  30671729  1674402      3819472  21100173  18954403  504851  532551  145            -1      22.77
  7180002103694    1                1815438  200994015  23814        1010176  175716942  174763467  370937  334310  93            -1      175.74
  7180002103694    2                1815438  154873767  1083861      1163613  145943508  145868623  25220    19821    12            -1      147.02
  7180002103694    1                1815438  200994015  1164196      1783528  174719693  174136372  206925  166944  91            -1      175.88
  7180002103720    4                3387095  94230402  3120        22799    74444867  74427082  10167    7883    9              -1      74.44
  7180002103720    7                3387095  38384769  23768        689878  25625928  24976086  154053  144800  49            -1      25.64
  7180002103720    6                3387095  37400442  707721      849237  9020691    9164801    21129    30710    10            1        -8.31
  7180002103720    7                3387095  38384769  849503      1867680  24927870  23935979  247702  228094  87            -1      25.77
  7180002103720    1                3387095  200994015  1896368      3387092  142436224  143961082  406262  431752  212            1        -140.53
  7180002103742    3                1122157  113657789  33          1003474  77462460  78481139  264247  275185  122            1        -77.46
  7180002103742    1                1122157  200994015  1051349      1117652  5090648    5024622    27877    16946    13            -1      6.14
  7180002103744    17                283784  11182526  4            124728  2142615    2019855    26734    33044    12            -1      2.14
  7180002103744    1                283784  200994015  213512      283782  119477488  119414846  20891    50004    14            -1      119.69
  7180002103750    3                2462253  113657789  236          601601  48166755  48796598  146263  177067  84            1        -48.16
  7180002103750    2                2462253  154873767  632754      702802  74042519  73959216  28350    39498    20            -1      74.67
  7180002103750    3                2462253  113657789  702856      2462253  48823530  50636809  416701  636563  219            1        -48.12
= Annotation =
* ftp://ftp.sanger.ac.uk/pub/searle/umd/turkey
* http://birdbase.net/cgi-bin/gbrowse/turkeygenome/#search
  15,093 - protein coding gene loci
    611  - noncoding RNA genes
  15,704 - total number, protein and RNA gene loci.
= Submission =
* [https://netfiles.umn.edu/xythoswfs/webui/_xy-11544920_1-t_Tk0AQByW Nature draft]
* [ftp://ftp.cbcb.umd.edu/pub/data/turkey/Assembly2.0/ CBCB ftp]
* Local dirs:
  /fs/ftp-cbcb/pub/data/turkey/                              # assemblies, FASTA, AGP ...
  /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/Alignments/  # alignments to chicken

Latest revision as of 19:32, 20 January 2010

Data

Chicken (Gallus gallus)

Stats:

.                                                elem       min    q1       q2       q3       max        mean       n50        sum            
Chr1..28,32,MT,W,Z,E22C19W28_E50C23,E64          34         1028   4512026  12968165 30671729 200994015  30377803   94230402   1,032,845,329
gaps(N's)                                        524913     1      30       64       254      1504285    268        792        141,055,297
chicken.len
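
The columns used in the stats tables on this page (elem, min, q1, q2, q3, max, mean, n50, sum) can be reproduced from a plain list of sequence lengths such as chicken.len. The sketch below is only an illustration, not the script that produced these tables: the function name `summarize_lengths`, the quartile convention (value at a floor index of the sorted list), and the N50 convention (length at which the running sum, taken from the longest sequence down, first reaches half of the total) are all assumptions.

```python
def summarize_lengths(lengths):
    """Return elem/min/q1/q2/q3/max/mean/n50/sum for a list of lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    s = sorted(lengths)
    n = len(s)
    total = sum(s)

    # Assumed quartile convention: value at index floor(n * k / 4).
    def quartile(k):
        return s[min(n - 1, (n * k) // 4)]

    # Assumed N50 convention: walk from the longest sequence down and
    # stop once the running sum reaches half of the total.
    running = 0
    n50 = s[-1]
    for length in reversed(s):
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "elem": n, "min": s[0], "q1": quartile(1), "q2": quartile(2),
        "q3": quartile(3), "max": s[-1], "mean": total // n,
        "n50": n50, "sum": total,
    }


# Toy input, not real chromosome lengths.
stats = summarize_lengths([2, 3, 4, 5, 6, 10])
print(stats)
```

Other tools may use slightly different quartile or N50 conventions, so small differences from the tabulated values would not be surprising.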

Files:

 /fs/szasmg3/dpuiu/chicken/

Zebrafinch (Taeniopygia guttata)

Chr stats:

.                                                elem       min    q1      q2       q3       max        mean       n50        sum            
all(random duplication)                          70         9909   369730  2517995  16419078 175225315  17616947   73657157   1,233,186,341  
all(gaps)                                        107061     25     100     100      100      500000     92         100        9,879,775
  
Chr1,1A,1B,2,3,4,4A,5..28,LG2,LG5,LGE22,M,Un,Z   37         9909   4907541 15652063 36305782 175225315  32343381   73657157   1,196,705,108
zebrafinch.len

Files:

 /fs/szasmg3/dpuiu/zebrafinch/

Turkey (Meleagris gallopavo)

Files:

 /fs/szasmg3/dpuiu/turkey/

Assembly2.0

Original (CA)

Reads:

 TotalUsableReads=151,843,863 (151M)
 AvgClearRange=102
 ContigReads=139021843(91.56%)
 DegenContigReads=8392124(5.53%)
 SurrogateReads=1317962(0.87%)
 SingletonReads=3314375(2.18%)
 Cvg=15X
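
As a quick sanity check, the read percentages and the coverage figure above can be recomputed from the raw counts. This is a sketch under one stated assumption: the genome size used as the coverage divisor is not given on this page, so the total scaffold span from the Stats table (1,022,394,764 bp) is used here as a stand-in.

```python
total_reads = 151_843_863      # TotalUsableReads
avg_clear_range = 102          # AvgClearRange, in bp
read_classes = {
    "ContigReads": 139_021_843,
    "DegenContigReads": 8_392_124,
    "SurrogateReads": 1_317_962,
    "SingletonReads": 3_314_375,
}

# Assumption: divisor is the total scaffold span, not a value stated
# in the notes as the genome size behind Cvg=15X.
assumed_genome_size = 1_022_394_764

# Share of usable reads in each class, as a percentage.
percentages = {
    name: round(100.0 * count / total_reads, 2)
    for name, count in read_classes.items()
}

# Coverage = total clear-range bases / genome size.
coverage = total_reads * avg_clear_range / assumed_genome_size

for name, pct in percentages.items():
    print(f"{name}={pct}%")
print(f"Cvg={coverage:.2f}X")
```

With this divisor the coverage comes out slightly above 15X, consistent with the rounded figure in the notes.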

Stats:

 .                    elem        min    q1     q2     q3     max        mean       n50        sum
 scf                  27,007      66     1354   1988   4793   9558742    37856      1538143    1,022,394,764
 ctg                  145,663     64     1512   3433   8500   91891      6391       12594      930,953,352
 deg                  440,796     64     102    256    485    8055       312        483        137,835,235

Preliminary

Stats:

.                     elem       min    q1       q2       q3       max        mean       n50        sum            
 Ch1..30,40,41        32         531    6400446  15119779 34928883 184590300  28263595   70426150   904,435,047
 gaps                 147792     100    100      100      100      2999       268        860        39,738,918

Stats(placed):

 .                    elem        min    q1     q2     q3     max        mean       n50        sum
 scf                  2,504       1001   5868   35589  272564 9558742    362085     1830406    906,662,877
 ctg                  111,752     64     1919   4886   10524  91891      7616       13635      851,209,123
 deg                  36,072      64     144    331    530    8055       373        521        13,487,006
 ctg+deg              147,824     64     520    2783   8197   91891      5849       13426      864,696,129 

Final

  • More ctgs placed based on synteny.
  • Alignments to chicken (delta-filter -1):
 1       150471
 -1      32587
  • Many scaffolds seem to be interleaved

Stats:

                      elem       min    q1      q2       q3        max        mean       n50        sum
 Ch1..30,40,41        32         531    7024757 18811362 37793329  207174646  31576111   75696247   1,010,435,575
 Ch1..30,40,41,Un     33         531    7024757 18811362 37793329  207174646  32954439   75696247   1,087,496,503
 turkey.len

Stats(placed):

 .                    elem       min    q1     q2     q3     max        mean       n50        sum            
 ctg                  131,217    64     1651   3975   9289   91891      6866       12989      901,044,472      
 deg                  31,426     64     128    283    530    8055       357        540        11,241,382      
 ctg+deg              162,643    64     731    2602   7576   91891      5609       12829      912,285,854

More stats:

                                        all          placed
 total genome size with gaps     :      1087496503   1010435575
 total genome size without gaps  :      941191869    912285854

 where:
   all: Chr1..41,Un
   placed: Chr1..41
 N50 contig size(CA ctgs):              12435        
 N50 scaffold size(original CA scaff):  1538143      
 total bases mapped to chromosomes:     912285854 (Chr1..41)
 total unmapped:                        28906015 (ChrUn)
 size and number of contigs in each chromosome:
 chr      #ctg/deg  len(noGaps)  len(withGaps)
 1        31920     186281234    207174646
 2        17221     108330071    119814280
 3        15247     92546836     102780271
 4        10336     69043870     75696247
 5        8892      57589156     63943857
 6        7680      49575076     55000907
 7        6634      36192137     39986770
 8        5331      34152018     37933571
 9        2583      18366421     20063553
 10       4455      29082703     31790800
 11       2962      22664575     24752353
 12       2854      19182682     21170715
 13       2810      18912345     21086818
 14       2657      19298732     21185158
 15       2671      17107111     18811362
 16       2421      14623454     16273683
 17       1858      12183352     13504974
 18       57        118600       132921
 19       1687      9654238      10789531
 20       2039      10407256     11885725
 21       1562      9611963      10683868
 22       5729      14123046     16000480
 23       1066      6510119      7383190
 24       710       3881523      4300864
 25       954       5025869      5613781
 26       1522      6146115      7024757
 27       195       777582       887413
 28       778       4125373      4632725
 29       688       3036456      3487800
 30       1067      3660581      4277653
 40       1         531          531
 41       16056     30074829     32364371
 Un       14048     28906015     77060928
 total    176691    941191869    1087496503
 Reads and bases
 ctg           H  120498260  8874803391
 ctg           F  18491893   5298292983
 ctg           C  17546      7111026
 ctg           0  14144      4334955
 
 deg           H  6775791    493312431
 deg           F  1614232    440598789
 deg           C  971        275722
 deg           0  1130       253396
 
 placed_ctg    H  118224274  8707749456
 placed_ctg    F  18072645   5176540153
 placed_ctg    C  17515      7100114
 placed_ctg    0  14058      4313107

 placed_deg    H  1401655    101867758
 placed_deg    F  212050     49638442
 placed_deg    C  675        195293
 placed_deg    0  653        150688

Files:

 /fs/szattic-asmg4/turkey/Assembly2.0/
 /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/

Chr_111909

--Dpuiu 22:28, 19 November 2009 (EST)

  • Aleksey's attempt to fix the contig rearrangements and scaffold overlaps:
 .           elem       min    q1     q2     q3     max        mean       n50        sum            
 ctg         154342     64     1463   3214   8113   91891      6170       12340      952327586      
 deg         13627      64     181    453    685    8055       460        656        6270299        
 ctg+deg     167969     64     1218   2747   7462   91891      5706       12242      958597885      

Files:

 /nfshomes/alekseyz/Chr_111909/Chr.all.agp

Chr_112409

--Dpuiu 15:39, 24 November 2009 (EST)

  • Alignments to chicken (delta-filter -1): still many inversions
 1       137445
 -1      22445
  • Inverted scaffold examples:
cd ~dpuiu/turkey/
join2.pl Alignment2.0/chicken-turkey.scf/Chr.scf.dir Assembly2.0/Chr_112409/Chr.scf.dir | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1]   ne $F[3]);' | sort -nk3 -r | pretty | grep -v -f turkey.scf.split.112409 

#scfid      alignDir alignCount AgpDir AgpCount  markerDir markerCount  
7180002103721    -1  602             1   395     -1        116  # flipped erroneously (395 ctg,3.3Mbp scaffold)
7180002103327    -1  299             1   221     1         68
7180002103550    -1  298             1   241     1         65
7180002103191    -1  280             1   258     1         69
7180002103618    -1  267             1   426     -1,1      144  # half rev, half fwd
7180002103677    -1  246             1   186     -1,-1     53   # aligns in 2 separate regions of Chr1
...
7180002103609    1   228             -1  166     -1
7180002103567    -1  224             1   241     1
7180002103561    1   223             -1  597     -1
7180002103695    1   210             -1  286     -1
7180002103421    -1  201             1   203     1
7180002103478    -1  181             1   217     -1             # flipped erroneously (217 ctg,1.4Mbp scaffold)
7180002103668    -1  176             1   171     -1             # flipped erroneously (171 ctg,1.7Mbp scaffold) 
7180002103762    -1  161             1   257     ?
7180002103538    -1  154             1   134     1
7180002102914    1   147             -1  74      -1
7180002103634    -1  142             1   95      -1             # flipped erroneously (95 ctg,6.9Mbp scaffold) 
7180002103116    1   141             -1  97      -1
7180002103453    -1  129             1   49      ?
7180002102994    1   128             -1  109      -1
  • About 70 scaffolds (40Mbp) seem "clearly" inverted
join2.pl ~dpuiu/turkey/Assembly2.0/Chr_112409/Chr.scf.dir BACs/Chr.scf.dir | grep -v -f turkey.scf.split.112409 | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] and $F[1] ne $F[3]);' | join2.pl -f \
       ~dpuiu/turkey/Assembly2.0/turkey.posmap.scflen | sort -nk6 -r | pretty | getSummary.pl -i 5
elem       min    q1     q2     q3     max        mean       n50        sum
70         89456  305981 483779 692778 3317675    576698     692778     40368888
  • Scaffolds don't seem to be interleaved any more
  • Stats
 .             elem    min     q1       q2        q3        max        mean      n50       sum         
 ctg.all       152641  64      1356     3154      8130      91891      6131      12520     935915009
 ctg.placed    144893  64      1388     3361      8485      91891      6330      12751     917287101
 
 chr.all       33      242906  6750934  18242820  38656374  204065997  32193660  74864811  1062390784   
 chr.placed    32      242906  6750934  18242820  38723638  204065997  32509112  74864811  1040291584  
 chr      #ctg/deg  len(noGaps)  len(withGaps)
 1        26557     181826552    204065997
 2        14384     106718223    116966045
 3        12649     91132767     100405573
 4        9170      68844569     74864811
 5        7553      56965239     62524249
 6        6534      48705183     53257597
 7        4755      35338084     38723638
 8        4751      35279744     38656374
 9        2286      18014631     19388932
 10       3733      28668829     31125850
 11       2720      22659912     24221968
 12       2372      18944919     20663392
 13       2354      18696996     20109273
 14       2367      19181786     20812949
 15       2265      16791072     18242820
 16       1967      14411805     15988588
 17       1635      12015459     13277650
 18       51        139801       244178
 19       1399      9478246      10526513
 20       1424      9943105      11078077
 21       1328      9405728      10459872
 22       1865      13252797     14786889
 23       937       6420024      7113901
 24       569       3613335      4158826
 25       834       4963017      5560155
 26       1040      5925429      6750934
 27       161       687724       943818
 28       717       4244239      4894166
 29       803       3649262      4826720
 30       693       3524564      4396719
 W        50        108225       242906
 Z        24970     47735835     81012204
 Un       7748      18627908     22099200
 total    152641    935915009    1062390784

Files:

 /nfshomes/alekseyz/Chr_111909/Chr.all.agp
 /fs/szasmg3/dpuiu/turkey/Assembly2.0/Chr_112409/
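
The inverted-scaffold check in this section joins two per-scaffold orientation tables (orientation from the chicken alignments and orientation in the AGP) and keeps the scaffolds where the two disagree. `join2.pl`, `p` and `pretty` are local helper scripts whose exact behavior is not documented here, so the sketch below is an assumed equivalent in Python rather than a reimplementation. The input layout (`scfid f|r count` per line) is inferred from the `sed` substitutions in the pipeline, and the function names are hypothetical.

```python
DIRECTIONS = {"f": 1, "r": -1}


def parse_dir_table(lines):
    """Parse 'scfid f|r count' rows into {scfid: (direction, count)}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or fields[1] not in DIRECTIONS:
            continue  # skip malformed rows
        table[fields[0]] = (DIRECTIONS[fields[1]], int(fields[2]))
    return table


def find_orientation_conflicts(align_table, agp_table):
    """Scaffolds present in both tables whose directions disagree,
    sorted by alignment count, largest first."""
    conflicts = []
    for scf_id in align_table.keys() & agp_table.keys():
        align_dir, align_count = align_table[scf_id]
        agp_dir, agp_count = agp_table[scf_id]
        if align_dir != agp_dir:
            conflicts.append(
                (scf_id, align_dir, align_count, agp_dir, agp_count))
    conflicts.sort(key=lambda row: row[2], reverse=True)
    return conflicts


# First two rows come from the table above; "scf_same" is a
# hypothetical scaffold whose two orientations agree.
align = parse_dir_table([
    "7180002103721 r 602",
    "7180002103327 r 299",
    "scf_same f 10",
])
agp = parse_dir_table([
    "7180002103721 f 395",
    "7180002103327 f 221",
    "scf_same f 12",
])
conflicts = find_orientation_conflicts(align, agp)
for row in conflicts:
    print(*row)
```

The sketch does not reproduce the `grep -v -f turkey.scf.split.112409` step, which excludes scaffolds that were already split.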

Table 13

  • From the article
  • 34 predicted rearrangements between the turkey and chicken genomes; 6 look wrong, 6 questionable, 22 probably right
 GGA   GGA start       GGA end         MGA*    Nature of the rearrangement                                                          Notes
 1     9,713,416       10,050,000      MGA1    segment relocated to chr1:74570000                                                   translocated segment is internal to direct repeat of SEMA3 genes
 1     75,800,000      76,000,000      MGA1    small inversion                                                                      possible unequal recombination within KCN gene cluster
 1     104,450,000     104,459,439     MGA1    possible very small intrachromosomal translocation                                   the genetic map places this short segment near 1q telomere
 #1    125,900,000     126,300,000     MGA1    small interchromosomal translocation                                                 insertion of GGA4:25,500,000-25,550,000 at repetitive locus (see also below)
 #1    156,600,000     156,600,001     MGA1    small interchromosomal translocation                                                 may be misplacement of Ctg13.1004 in GGA seq or LINE-based translocation of a small segment from GGA4:73,089,000-73,090,000
 1     172,822,000     172,900,000     MGA1    possible small inversion                                                             may be mis-assembly of GGA ctg3.1161
 2     54,870,224      56,560,442      MGA3    inversion with 56.560 Mb coordinate being telomeric in MGA3                          (together one inversion and two translocations or assembly errors)
 2     54,398,341      54,413,232      MGA3    small translocation or mis-assembly of GGA seq., inverted rel. to GGA seq. coord.    (together one inversion and two translocations or assembly errors)
 2     54,641,337      54,845,268      MGA3    probably inverted relative to GGA sequence coordinates                               (together one inversion and two translocations or assembly errors)
 #2    54,290,000      54,330,000      MGA3    small translocation or mis-assembly of GGA seq., orientation uncertain               (together one inversion and two translocations or assembly errors)
 #2    54,452,395      54,545,188      MGA3    probably inverted relative to GGA sequence coordinates                               (together one inversion and two translocations or assembly errors)
 2     53,804,240      54,263,147      MGA3    inverted relative to GGA seq. coordinates with 53.8 Mb joined to 56.6 Mb in MGA      (together one inversion and two translocations or assembly errors)
 3     6,218           2,344,838       MGA2    inversion, telomeric                                                                 FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
 3     5,605,686       11,605,484      MGA2    inversion, (agrees with genetic map)                                                 FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
 #4    25,500,000      25,550,000      MGA4    small interchromosomal translocation to about 125.90 Mb orthologous coord. on MGA1   see also chr1:125900000
 ?4    35,150,000      35,160,000      MGA4    likely small duplication                                                             part of this segment duplicated at around 35,828,000
 #4    73,080,000      73,090,000      MGA4    small interchromosomal translocation to about 156.60 Mb orthologous coord. on MGA1   may be misplacement of Ctg13.1004 in seq or LINE-based translocation of a small segment, see also GGA1:156,600,000
 5     1               270,229         MGA5    local small inversion with respect to p arm which as a whole is inverted             local inversion with respect to p arm which as a whole is inverted
 5     1               7,248,180       MGA5    inversion of p arm                                                                   p arm likely inverted based on genetic map of Nte0897, MNT-193
 6     1,576,787       13,080,207      MGA8    multiple inversions:  predicted order is ...                                         can be explained by a series of 4-5 consecutive inversions, including possible unequal recombination between SLC16A9 or, less likely, protocadherin genes
 7     1               7,248,180       MGA7    inversion of p arm
 ?8ran 64,951          407,592         MGA10   GGA8_random sequences likely telomeric on MGA10                                      (and probably GGA8)      
 8     44,817          10,199,568      MGA10   inversion of p arm                                                                   possible unequal recombination between AMY genes
 8     8,992,540       9,170,000       MGA10   local small inversion with respect to p arm which as a whole is inverted             probable inversion but might be mis-orientation of GGA sequence contigs
 9     1,528,027       4,372,460       MGA11   inversion                                                                            telomeric inversion
 10    1,907,125       3,642,461       MGA12   no internal centromere observed in turkey                                            centromere misplaced in chicken or moved to telomere in turkey
 11    75,337          3,280,000       MGA13   no internal centromere observed in turkey                                            inversion of GGA 11p, FISH CONFIRMED
 12    95,816          940,546         MGA14   may be inverted, orientation uncertain                                               may be fused to a repeat of 2.15-2.3 Mb region of GGA12
 ?12   1,050,000       1,100,000       MGA14   possible small intrachromosomal translocation to telomere                            small segment may be now at MGA telomere
 ?12   1,128,610       1,134,284       MGA14   possible very small intrachromosomal translocation                                   small segment now between about 2,632,117-2,703,753 in GGA coordinates on q arm
 12    1,164,577       1,399,694       MGA14   inversion  (1164577 joined to 1599552)                                               centromere either misplaced in GGA or moved telomeric or between 940,546 and 1,399,694
 13    8,233,861       8,511,782       MGA15   small inversion
 14    14,370,000      15,070,000      MGA16   inversion                                                                            FISH CONFIRMED
 18    5,062,096       9,882,412       MGA20   inversion                                                                            unequal recombination between NME paralogs, FISH confirmed
 ?28   1,550,000       1,620,000       MGA30   apparent duplication with extra copy at about 1.05 Mb in MGA                         unclear if these are rearrangements or assembly errors

Scaffold alignment to chicken

  • Parameters:
 nucmer -l 12 -c 65 -g 1000 -b 1000
 delta-filter -1
  • Scf stats
                      elem        min    q1     q2     q3     max        mean       n50        sum      
 aligned              22,045      66     1450   2276   5670   9558742    45827      1562815    1,010,256,240  
 unaligned            4,962       73     1159   1411   1935   119729     2446       2654       12,138,524

 1+alignments/scf     22045       1      1      1      2      1660       6          136        153866

 2+alignments/2+chr   50         11625  55577  1398890 3387095 7409211   1883381    4298282    94,169,060
  • Ctg stats (ctgs in aligned scaff)
                      elem        min    q1     q2     q3     max        mean       n50        sum      
 aligned              139790      64     1580   3665   8822   91891      6585       12739      920,634,899
 unaligned            5873        64     1148   1399   1887   22071      1756       1766       10,318,453
  • Alignment stats
 .                    elem        min    q1     q2     q3     max        mean       n50        sum
 len(all)             202105      11     681    1895   5189   134408     4315       10045      872,231,977 
 len(filter-1)        163390      12     1191   2673   6437   134409     5188       10410      847,715,057
 %id(filter-1)        163390      11.24  81.10  84.82  87.68  100.00     83         85         .
  • Turkey scf vs chicken and turkey chr: the scaffold sequence aligning in the opposite orientation is about 15% of the amount aligning in the same orientation (roughly 13% of the aligned total). Could these scaffolds have been misoriented by mistake?
 .                    elem       min    q1     q2     q3     max        mean       n50        sum
 opposite             1527       925    2604   7579   32323  6964320    78342      1018939    119629225
 same                 2619       97     2591   11510  128530 9558742    306323     1873938    802261737
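
The opposite-orientation share can be computed from the two "sum" values in the table above. The sketch assumes that the sum column is in bp and that the two rows do not overlap. It reports two ratios because a figure of about 15% corresponds to opposite/same, while opposite/(opposite + same) is closer to 13%.

```python
opposite_bp = 119_629_225   # "opposite" row, sum column
same_bp = 802_261_737       # "same" row, sum column

aligned_bp = opposite_bp + same_bp

# Share of all oriented scaffold sequence that is in opposite orientation.
share_of_aligned = opposite_bp / aligned_bp

# Opposite-orientation sequence relative to same-orientation sequence.
ratio_to_same = opposite_bp / same_bp

print(f"opposite / (opposite + same) = {100 * share_of_aligned:.1f}%")
print(f"opposite / same              = {100 * ratio_to_same:.1f}%")
```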

Mapping

  • (200+ alignments)
chickenChr    turkeyChr  #alignments
1             Chr1       35025

2             Chr3       18143 :  Chr6 followed by Chr3
2             Chr6       7612

3             Chr2       17765  : Chr2 5' flipped

4             Chr4       11226  : Chr6 followed by Chr4
4             Chr9       2132

5             Chr5       8516

6             Chr8       4552

7             Chr7       4394

8             Chr10      3654 : Chr10 5' flipped

9             Chr11      2729

10            Chr12      2500

11            Chr13      2629

12            Chr14      2158 

13            Chr15      2136

14            Chr16      2109

15            Chr17      1524

17            Chr19      1285

18            Chr20      1374 : Chr20 3' flipped

19            Chr21      1155

20            Chr22      1828

21            Chr23      887

22            Chr24      511

23            Chr25      751

24            Chr26      862

25            Chr27      3

26            Chr28      592

27            Chr29      568

28            Chr30      553

Z             Chr41      4178
Z             Chr1       404
 
W             Chr41      24
W             Chr40      ?

E22C19W28_E50C23  ChrUn  7l

E64               ChrUn  20
  • Scaffolds with multiple alignment blocks:
    • 44 on different Chr
    • 30 on same chr; 11 appear to be partially flipped
    nl scfid         chickenChr
    1  7180002103050 2
    2  7180002103154 6
    3  7180002103203 3 25 # new
    4  7180002103204 10 28
    5  7180002103206 18
    6  7180002103213 4 26 # new
    7  7180002103242 5    # partially flipped 
    8  7180002103280 1 8
    9  7180002103298 7
   10  7180002103329 6    # partially flipped 
   11  7180002103402 8    # partially flipped 
   12  7180002103421 9
   13  7180002103425 2 7  # new
   14  7180002103431 6    # partially flipped 
   15  7180002103433 1    # partially flipped 
   16  7180002103480 5 6
   17  7180002103500 12 13 # new
   18  7180002103519 3 9
   19  7180002103555 6 18  # new
   20  7180002103557 8
   21  7180002103561 3
   22  7180002103574 1
   23  7180002103597 2 17  # new
   24  7180002103605 2 3   # new
   25  7180002103608 8
   26  7180002103614 2
   27  7180002103617 2     # partially flipped 
   28  7180002103618 1     # partially flipped 
   29  7180002103619 11    # partially flipped 
   30  7180002103620 4
   31  7180002103621 1 2 28
   32  7180002103627 1
   33  7180002103637 6 7   # new
   34  7180002103638 2 18  # new
   35  7180002103642 4
   36  7180002103648 1 3
   37  7180002103653 1 5   # new
   38  7180002103663 6
   39  7180002103668 1
   40  7180002103669 8 9   # new
   41  7180002103670 1 4   # new
   42  7180002103672 2 3
   43  7180002103675 1     # partially flipped 
   44  7180002103677 1
   45  7180002103679 2     # partially flipped 
   46  7180002103681 1 5
   47  7180002103682 1 21
   48  7180002103683 4 17
   49  7180002103684 13    # partially flipped 
   50  7180002103685 1 2
   51  7180002103686 1 3
   52  7180002103688 3 8
   53  7180002103693 12 15
   54  7180002103694 1 2   # new
   55  7180002103695 3
   56  7180002103698 2 12
   57  7180002103702 6 11
   58  7180002103714 4 5
   59  7180002103715 1 2 4
   60  7180002103717 2 10
   61  7180002103720 1 6 7 # new
   62  7180002103723 4 6
   63  7180002103725 1 14
   64  7180002103728 1 9   # new
   65  7180002103736 1 5   # new
   66  7180002103740 7
   67  7180002103742 1 3   # new
   68  7180002103743 6 8 17 
   69  7180002103744 1 17  # new
   70  7180002103750 2 3   # new
   71  7180002103752 9 18
   72  7180002103762 2
   73  7180002103771 1 3 19
   74  7180002103798 7 26  # new

Scaffold alignment to zebrafinch

  • Parameters:
 nucmer -l 12 -c 65 -g 1000 -b 1000
 delta-filter -1
  • Alignment stats (44 scf : subset 10)
 .                    elem        min    q1     q2     q3     max        mean       n50        sum
 len(subset 10)*      5286        12     233    485    860    12853      675        1033       3570025
 %id(subset 10)       5286        40.99  74.20  78.57  85.63  100.00     80         79         .

Chromosome alignment to chicken

  • Parameters:
 nucmer -l 12 -c 65 -g 1000 -b 1000
 delta-filter -1 # not yet
  • Alignment stats
 .                    elem        min    q1     q2     q3     max        mean       n50        sum
 len(all)             185138      11     600    2011   5567   134408     4407       10093      815928282
 len(delta-filter -r) 155094      11     1065   2783   6592   134408     5165       10302      801185719
 len(delta-filter -1) 148515      11     1144   2953   6836   134408     5341       10421      793361287

BACs.old

  • Markers:
 37918 : total CH260's
 8558  : assembled in scaffolds
 8641  : total 78TKNMI
  • Scf stats:
                    elem       min    q1      q2      q3      max        mean       n50        sum
 1+markers          1228       1001   24541   247381  879303  9558742    696129     1984837    854,846,919
 0markers           25779      66     1338    1911    4245    1214147    6499       26354      167,547,845

 1+markers/scf      1228       1      1       2       7       110        6          19         8,262

 2+markers/2+chr    38         671404 1525677 2968427 4298282 7409211    3084380    4013969    117,206,475

BACs

  • Scf len stats:
                    elem       min    q1      q2      q3      max        mean       n50        sum 
 1+markers          2478       1001   6013    36597   278486  9558742    365837     1830406    906,544,909
 0 markers          24529      66     1323    1848    3839    325966     4722       11201      115,849,855
  
 2+markers/2+chr    60         283784 1158965 2021582 3549120 7409211    2457241    3411361    147,434,495
 3+markers/2+chr    38         426424 1609106 2833228 4013969 7409211    3061980    3819803    116,355,251


  • Ctg len stats:
                    elem       min    q1      q2      q3      max        mean       n50        sum 
 1+markers          23077      76     6408    11837   19433   91891      14425      19768      332,889,618

Scf splits (Daniela)

1. Input format

 cat BACs/BAC_map_final.txt | grep 7180002103762 | pretty

 CH260094G18_SP6      3_1.3  3  2205409   150000  7180002076309  7180002103762  3285   322415
 78TKNMI001N01_SP6    3_1.3  3  2287385   150000  7180002058027  7180002103762  2224   329223
 ...
 CH260099O02_SP6      3_1.5  3  3910434   150000  7180002058147  7180002103762  4524   1655352
 CH260096N05_T7       6_3    6  26824213  150000  7180002058054  7180002103762  12808  693787
 ..
 CH260026H13_SP6      6_3    6  29907224  266336  7180002057998  7180002103762  634    33979

2. Find scaffolds with markers from multiple chromosomes

 cat BACs/BAC_map_final.txt | awk '{print $7,$3}' | count.pl -m 2 | awk '{print $1,$2}' | paste.pl 
 ...
 7180002103762 3 6
 ...
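
The same selection can be sketched in Python. `count.pl` and `paste.pl` are local scripts, so their behavior is assumed here: `count.pl -m 2` is read as "keep scaffold/chromosome pairs supported by at least two markers", and `paste.pl` as "collect the chromosomes of each scaffold on one line". Column 7 is taken as the scaffold id and column 3 as the chromosome, as in the `awk` call; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def multi_chromosome_scaffolds(rows, min_markers=2, min_chromosomes=2):
    """Map scaffold id -> chromosomes, for scaffolds whose markers
    fall on at least `min_chromosomes` chromosomes."""
    pair_counts = Counter()
    for row in rows:
        fields = row.split()
        if len(fields) < 7:
            continue  # skips "..." placeholders and short lines
        pair_counts[(fields[6], fields[2])] += 1

    chroms_by_scf = defaultdict(list)
    for (scf_id, chromosome), count in sorted(pair_counts.items()):
        if count >= min_markers:  # assumed meaning of "-m 2"
            chroms_by_scf[scf_id].append(chromosome)

    return {
        scf_id: chroms
        for scf_id, chroms in chroms_by_scf.items()
        if len(chroms) >= min_chromosomes
    }


# Rows copied from the input-format example above.
rows = [
    "CH260094G18_SP6 3_1.3 3 2205409 150000 7180002076309 7180002103762 3285 322415",
    "78TKNMI001N01_SP6 3_1.3 3 2287385 150000 7180002058027 7180002103762 2224 329223",
    "...",
    "CH260099O02_SP6 3_1.5 3 3910434 150000 7180002058147 7180002103762 4524 1655352",
    "CH260096N05_T7 6_3 6 26824213 150000 7180002058054 7180002103762 12808 693787",
    "..",
    "CH260026H13_SP6 6_3 6 29907224 266336 7180002057998 7180002103762 634 33979",
]
result = multi_chromosome_scaffolds(rows)
print(result)
```

Chromosome names are sorted as strings, so "10" would sort before "2"; that only affects display order.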

Scf splits (Aleksey)

    1  7180002103685 6 156 161 jumps from chr6 to chr1 4049114-4201400
    2  7180002103648 1 45 79 1187881-1198679
    3  7180002103620 241786-307810                                                         # aligns to one chicken chr
    4  7180002103280 56334-114382
    5  7180002103762 386780-485750                                                         # aligns to one chicken chr
    6  7180002103638 111865-184832
    7  7180002103743 707755-712324
    8  7180002103743 1618441-1646472
    9  7180002103743 1895159-1956617
   10  7180002103683 3122611-3324351
   11  7180002103642 536597-587034                                                         # aligns to one chicken chr
   12  7180002103204 94910-122663
   13  7180002103681 5 33 57 jumps from chr5 to chr1 943178-1075454 map looks ok
   14  7180002103715 9 243 270 jumps from chr3 to chr9 547913-610659 map looks ok
   15  7180002103725 1 129 187 jumps from chr16 to chr1 1904425-2067581, map looks ok
   16  7180002103728 11 83 131 jumps from chr11 to chr1 2456073-2532176, map looks ok
   17  7180002103698 3 240 266 jumps from chr14 to chr3 588551-618407, map looks ok
   18  7180002103686 1 34 41 jumps from chr2 to chr1 292876-340742, map looks ok
   19  7180002103621 3 40 57 jumps from chr1 to chr3 707868-766695, map looks ok
   20  7180002103720 7 63 130 jumps from chr7 to chr13 1890283-1900965, map looks ok
   21  7180002103682 23 68 75 jumps from chr1 to chr23 270646-281964, map looks ok
   22  7180002103605 2 43 60 jumps from chr2 to chr3 1059724-1121629, map looks ok
   23  7180002103688 10 131 162 jumps from chr10 to chr2 3129178-3331813, map looks ok
   24  7180002103672 6 31 55 jumps from chr2 to chr6, 800904-850720, map looks ok
   25  7180002103771 2 13 26 jumps from chr21 to chr2 516684-703439, map looks ok
   26  7180002103519 11 52 62 jumps from chr11 to chr2 1685597-1695161, map looks ok
   27  7180002103597 3 120 150 jumps from chr3 to chr19 2839516-3067987, map looks ok
   28  7180002103717 3 61 96 jumps from chr3 to chr12, 2101452-2251116, map looks ok
   29  7180002103743 10 101 257 jumps from chr8 to chr10, 3601251-3670472, map looks ok
   30  7180002103743 jump from chr10 to chr19, 6212398-6251410 map looks ok
   31  7180002103714 4 95 146 jumps from chr5 to chr4 1553913-1593600, map looks ok
   32  7180002103723 4 133 179 jumps from chr9 to chr4 1656209-1721059, map looks ok
   33  7180002103752 20 100 166 jumps from chr11 to chr20, 1951227-2017628,map looks ok
   34  7180002103480 5 79 119 jumps from chr8 to chr5, 1086539-1133932, map looks ok
   35  7180002103702 13 124 145 jumps from chr8 to chr13 935622-1070705, map looks ok
   36  7180002103693 14 73 84 jumps from chr17 to chr14, 477273-532094, map looks ok
   37  7180002103614                                                                          # aligns to one chicken chr
   38  7180002103677                                                                          # aligns to one chicken chr

Split ids: cat Chr_preliminary.agp | grep W | grep -v ChrUn | awk '{print $11}' | grep ^7181 | sort -u | nl

    1  7181002103204
    2  7181002103280
    3  7181002103480
    4  7181002103519
    5  7181002103620
    6  7181002103621
    7  7181002103648
    8  7181002103672
    9  7181002103681
   10  7181002103682
   11  7181002103683
   12  7181002103685
   13  7181002103686
   14  7181002103688
   15  7181002103693
   16  7181002103698
   17  7181002103702
   18  7181002103714
   19  7181002103715
   20  7181002103717
   21  7181002103723
   22  7181002103725
   23  7181002103743
   24  7181002103752
   25  7181002103771
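
The shell pipeline above can also be sketched in Python, which may be easier to reuse when the id list feeds later steps. This is a minimal sketch: it only mirrors the filters of the one-liner (lines containing "W", not containing "ChrUn", field 11 starting with 7181) and assumes whitespace-separated AGP lines. The function name `split_ids` and the demo lines are hypothetical and are not taken from Chr_preliminary.agp.

```python
def split_ids(agp_lines, prefix="7181", column=11):
    """Mimic: grep W | grep -v ChrUn | awk '{print $11}' | grep ^7181 | sort -u"""
    ids = set()
    for line in agp_lines:
        # grep W / grep -v ChrUn: plain substring tests, as in the shell version
        if "W" not in line or "ChrUn" in line:
            continue
        fields = line.split()
        # awk '{print $11}' followed by grep ^7181
        if len(fields) >= column and fields[column - 1].startswith(prefix):
            ids.add(fields[column - 1])
    # sort -u: unique ids in lexicographic order
    return sorted(ids)


# Hypothetical demo lines (made up for illustration only).
demo = [
    "Chr1 1 100 1 W ctgA 1 100 + # 7181002103204",
    "Chr1 101 200 2 N 100 fragment yes",
    "ChrUn 1 50 1 W ctgB 1 50 + # 7181002103999",
    "Chr2 1 80 1 W ctgC 1 80 + # 7180002103605",
    "Chr2 81 160 2 W ctgD 1 80 + # 7181002103204",
]
print(split_ids(demo))
```

On a real file the call would be `split_ids(open("Chr_preliminary.agp"))`; numbering the result (the `nl` step) is left to the caller.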

Zebrafinch chr sample vs Chicken chr

  • Sample 1 Kbp every 1 Mbp in each Zebrafinch chr
 ChickenChr          ZebraChr  count(>2)       
 1                   chr1      406
 1*                  chr1A     287
 1*                  chr1B     124  # not sampled

 2                   chr2      589

 3                   chr3      436

 4                   chr4      217
 4*                  chr4A     77

 5                   chr5      244

 6                   chr6      132

 7                   chr7      155

 8                   chr8      116

 9                   chr9      103

 10                  chr10     108

 11                  chr11     105

 12                  chr12     88

 13                  chr13     75

 14                  chr14     65

 15                  chr15     56

 16                  nothing

 17                  chr17     49

 18                  chr18     45

 19                  chr19     53

 20                  chr20     63

 21                  chr21     26

 22                  chr22     11

 23                  chr23     20

 24                  chr24     32

 26                  chr26     14

 27                  chr27     13

 28                  chr28     14

 Z                   chrZ      165

 W                   chrZ      30    # not sampled

 E64                 nothing

 E22C19W28_E50C23*   chrLGE22  3
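
The sampling step behind this table can be sketched as follows. This is only an illustration under stated assumptions: each 1 Kbp window is taken at the start of every 1 Mbp interval (the notes do not say where inside the interval the sample was taken), coordinates are 0-based and half-open, and the function name `sample_windows` is hypothetical.

```python
def sample_windows(seq_len, window=1_000, step=1_000_000):
    """Return (start, end) windows of `window` bp, one every `step` bp.

    Coordinates are 0-based, half-open. A window is emitted only if it
    fits completely inside the sequence.
    """
    return [(start, start + window)
            for start in range(0, seq_len - window + 1, step)]


# Example: a 2.5 Mbp sequence yields three 1 Kbp samples.
for start, end in sample_windows(2_500_000):
    print(start, end)
```

Each window can then be cut out of the chromosome sequence (for example `seq[start:end]`) and aligned to the chicken chromosomes; the counting threshold in the table header is not reproduced here.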

Synteny

MSU:

 "We do see a couple of very small translocations between chromosomes 1 and 4, but these are so small that they could be errors in the chicken assembly or, more likely, paralogous sequences that perhaps were two copies in the last common ancestor and chicken kept one and turkey the other. We don't see translocations between chromosomes Z and 1, so I expect that these alignments are due to a repetitive element (CR1 being the most likely), but the Z assembly is tentative even in chicken, so it's hard to be sure."

From the spreadsheet:

 chickenChr  turkeyChr
 4           chr1       12.2  1-12.2  25,500,000  25,550,000
 4           chr1       18.2  1-18.2  73,080,000  73,090,000

From the *merge2.anc:

 4                   Chr1   94230402   207174646  73196453   73204143   177454210  177447336  3184     2528      5            -1       250.65
 4                   Chr1   94230402   207174646  86530225   86583469   117075107  116976224  11548    58133     9            -1       203.6


Syntenic regions:

           chickenRegions  turkeyRegions  chickenChr  turkeyChr
 all       209166          311363                                    # nucmer -l 12 -c 65 -g 1000 -b 1000                       
 filter-1  183058          259760         142         186            # delta-filter -1
 filter    170658          239592         125         129            # filter-anc.pl -maxDist 200000 -W 20 -p 0.1 
 merge0    3260            2250           125         130            # merge-anc.pl  -maxDist 200000 
 merge1    1573            1368           110         93             # merge-anc.pl  -maxDist 200000  -minCount 8  -minLen 10000
 merge2    376             488            49          47             # merge-anc.pl  -maxDist 1000000 -minCount 20 -minLen 100000
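
The effect of the -maxDist, -minCount and -minLen options can be illustrated with a small sketch. This is not the logic of merge-anc.pl, whose source is not shown here. It is one plausible reading of the option names: neighbouring anchors on the same chromosome pair and direction are chained when the gap on the reference is at most maxDist, and chains with too few anchors or too short a span are dropped. The function name, the tuple layout and the demo coordinates are all hypothetical.

```python
def merge_anchors(anchors, max_dist, min_count=1, min_len=0):
    """Chain anchors given as (ref_chr, qry_chr, direction, ref_start, ref_end).

    Assumption: only reference coordinates are used to decide adjacency,
    and coordinates are inclusive.
    """
    merged = []
    for ref_chr, qry_chr, direction, start, end in sorted(anchors):
        key = (ref_chr, qry_chr, direction)
        last = merged[-1] if merged else None
        if last and last["key"] == key and start - last["end"] <= max_dist:
            # Close enough to the previous anchor: extend the current chain.
            last["end"] = max(last["end"], end)
            last["count"] += 1
        else:
            merged.append({"key": key, "start": start, "end": end, "count": 1})
    # Drop chains that are supported by too few anchors or are too short.
    return [m for m in merged
            if m["count"] >= min_count and m["end"] - m["start"] + 1 >= min_len]


# Hypothetical anchors, for illustration only.
demo_anchors = [
    ("4", "Chr1", -1, 1000, 2000),
    ("4", "Chr1", -1, 2500, 4000),
    ("4", "Chr1", -1, 900000, 901000),
    ("6", "Chr8", 1, 10, 500),
]
print(len(merge_anchors(demo_anchors, max_dist=1000)))
print(len(merge_anchors(demo_anchors, max_dist=1000, min_count=2, min_len=1000)))
```

Under this reading, raising maxDist, minCount and minLen leaves fewer, larger regions, which is consistent with the drop in counts from merge0 to merge2 in the table.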

Problems

ctg7180001625741

  • 1 ctg scaff: 7180002083787(1.4Kbp)
  • Single links to 2 diff scaff: 7180002103637 & 7180002103666
  • Synteny info (Daniela)
 cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.ctg/turkey.ctg.posmap.merge | grep -C 20 7180001625741
 #                chickenChr                                  turkeyChr
 7180002057801    6                 36246991   36257816   -1  Chr8   35888371   35899195   r  U  100
 7180001625741    6                 36269529   36271001   -1  .      .          .          .  .  .
 ...
 7180002074579    6                 36382217   36386350   -1  Chr8   35899296   35903428   r  N  20910
  • Synteny info (Aleksey)
 cat /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/place_by_sinteny/contigs.chicken.order.with_AGP.valid.txt | grep -C 1 7180001625741 | pretty
 #                                                chickenChr  turkeyChr
 1       1790  36269140  36267359  7180001578245  chr6        Chr7   20109532  20111855  2324  -  7180001578245
 307     1472  36270694  36269529  7180001625741  chr6        ChrUn  32131240  32132711  1472  0  7180001625741*
 2343    5341  36282706  36279707  7180001914610  chr6        Chr7   36045401  36052860  7460  -  7180001914610
 cat turkey.posmap.ctgscf | grep 7180002103637 | egrep -n '7180001578245|7180001914610'
 ...
 302:7180001914610       7180002103637   2403512 2410972 f
 391:7180001578245       7180002103637   3013067 3015391 f
 ...
 463: 

Scf 7180002103637 aligns to both chicken Chr6 and Chr7

 cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.scf/turkey.scf-chicken.filter-1.merge0.anc | grep 7180002103637
 7180002103637    7                 3817505  38384769   2          2285066  23413757   21084724   651501   705639   284          -1       23.41
 7180002103637    6                 3817505  37400442   2285109    2410972  36410067   36277407   37596    43600    21           -1       38.69
 7180002103637    7                 3817505  38384769   2410993    3817505  21084676   19700317   384609   354790   151          -1       23.49
 grep 7180002103637 /fs/szasmg3/dpuiu/turkey/BACs/BAC_map_final.txt | pretty | sort -nk9 | nl
    1  CH260098J15_SP6      7_10  7  21623779   150000  7180001914412  7180002103637  2833   2833
    2  78TKNMI023L02_SP6    7_10  7  21655259   150000  7180001914413  7180002103637  462    8036
    3  78TKNMI020I14_T7     7_10  7  21786579   150000  7180001914413  7180002103637  6568   14142
    ...
   91  78TKNMI028M05_T7     8_13  8  34451891   150000  7180001914600  7180002103637  4694   2314126
   92  CH260110M21_T7       8_13  8  34375922   150000  7180001914602  7180002103637  4382   2344157
   93  CH260102C12_T7       8_13  8  34561953   150000  7180001914608  7180002103637  7429   2400413
    ...
  155  CH260102B06_SP6      7_10  7  18173403   150000  7180001914714  7180002103637  578    3753466
  156  CH260091G02_T7       7_10  7  18147518   150000  7180001914716  7180002103637  14232  3777334
  157  78TKNMI020K20_T7     7_10  7  17944915   150000  7180001914719  7180002103637  5969   3809779
  • Solution:
 Chr7.agp:10573:     Chr7      36935151        36938433        10573   W       7180001538614   1       3283    +       #       chr7/v3.6/scaffolds/scaffold_0.3
 Chr7.agp.bak:10609: Chr7  37075393        37078675        10609   W       7180001538614   1       3283    +       #       chr7/v3.6/scaffolds/scaffold_0.3
 Chr8.agp:10281:     Chr8      36818118        36822250        10281   W       7180002074579   1       4133    -       #       chr8/v3.6/scaffolds/scaffold_0.8
 Chr8.agp.bak:10245: Chr8  36677876        36682008        10245   W       7180002074579   1       4133    -       #       chr8/v3.6/scaffolds/scaffold_0.8

9 more problems

Turkey marker counts:

  scfId      turkeyChr #markers
  7180002103213    28  3 # 100K  on Chr28
  7180002103213    9   9

  7180002103555    20  7  # found before ; 100K  on Chr8
  7180002103555    8   3

  7180002103653    1   2 # 100K on Chr1
  7180002103653    5   71

  7180002103669    10  161 # 130K in the middle on Chr11
  7180002103669    11  3

  7180002103694    1   53 # 60K in the middle on Chr3
  7180002103694    3   7

  7180002103720    1   59 # found before ;  160K in the middle of Chr8  # "very messy"
  7180002103720    7   63
  7180002103720    8   8

  7180002103742    1   2 # 40K on Chr1
  7180002103742    2   23

  7180002103744    1   2 # 60K on Chr1
  7180002103744    19  3

  7180002103750    2   115 # 50K in the middle on Chr3
  7180002103750    3   2
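
A table like the one above can be screened automatically. The sketch below takes (scfId, turkeyChr, #markers) rows and reports, for every scaffold with markers on more than one chromosome, the chromosome holding most markers and the remaining ones. The function name `minority_placements` is hypothetical; the demo rows are copied from the table above.

```python
from collections import defaultdict


def minority_placements(rows):
    """Map scfId -> (majority chromosome, sorted list of other chromosomes)."""
    per_scf = defaultdict(dict)
    for scf, chrom, n_markers in rows:
        per_scf[scf][chrom] = per_scf[scf].get(chrom, 0) + n_markers
    report = {}
    for scf, counts in per_scf.items():
        if len(counts) < 2:
            continue  # all markers agree on a single chromosome
        major = max(counts, key=counts.get)
        report[scf] = (major, sorted(c for c in counts if c != major))
    return report


# Rows taken from the marker table above.
marker_rows = [
    ("7180002103213", "28", 3),
    ("7180002103213", "9", 9),
    ("7180002103720", "1", 59),
    ("7180002103720", "7", 63),
    ("7180002103720", "8", 8),
]
print(minority_placements(marker_rows))
```

Note that a small marker count is not always the misplaced part: for 7180002103555 the notes flag a 100K piece on Chr8 while Chr20 holds most markers, so the report is only a starting point for manual review.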

Alignment to chicken chromosomes:

  scfId            chickenChr        scfLen   chrLen     scfStart     scfEnd   chrStart   chrEnd     scfSnp   chrSnp   #alignm.       chrDir   scfIntercept
  7180002103213    4                 426424   94230402   6            299125   492126     808691     84749    99619    33             1        -0.49
  7180002103213    26                426424   5102438    299637       426422   1866683    1733616    25738    31146    16             -1       2.16

  7180002103555    18                462038   10925261   1            370822   8723614    8393969    118299   280534   19             -1       8.72
  7180002103555    6                 462038   37400442   370843       462032   20662598   20558365   31608    42763    22             -1       21.03

  7180002103653    1                 2021582  200994015  288          61479    168123022  168180581  29306    25853    22             1        -168.12
  7180002103653    5                 2021582  62238931   65643        2021301  48278635   50168489   555769   666443   202            1        -48.21

  7180002103669    8                 3819803  30671729   1            1573516  22762147   21163160   362616   390538   146            -1       22.76
  7180002103669    9                 3819803  25554352   1582769      1673152  20499440   20382802   40878    42100    14             -1       22.08
  7180002103669    8                 3819803  30671729   1674402      3819472  21100173   18954403   504851   532551   145            -1       22.77

  7180002103694    1                 1815438  200994015  23814        1010176  175716942  174763467  370937   334310   93             -1       175.74
  7180002103694    2                 1815438  154873767  1083861      1163613  145943508  145868623  25220    19821    12             -1       147.02
  7180002103694    1                 1815438  200994015  1164196      1783528  174719693  174136372  206925   166944   91             -1       175.88

  7180002103720    4                 3387095  94230402   3120         22799    74444867   74427082   10167    7883     9              -1       74.44
  7180002103720    7                 3387095  38384769   23768        689878   25625928   24976086   154053   144800   49             -1       25.64
  7180002103720    6                 3387095  37400442   707721       849237   9020691    9164801    21129    30710    10             1        -8.31
  7180002103720    7                 3387095  38384769   849503       1867680  24927870   23935979   247702   228094   87             -1       25.77
  7180002103720    1                 3387095  200994015  1896368      3387092  142436224  143961082  406262   431752   212            1        -140.53

  7180002103742    3                 1122157  113657789  33           1003474  77462460   78481139   264247   275185   122            1        -77.46
  7180002103742    1                 1122157  200994015  1051349      1117652  5090648    5024622    27877    16946    13             -1       6.14

  7180002103744    17                283784   11182526   4            124728   2142615    2019855    26734    33044    12             -1       2.14
  7180002103744    1                 283784   200994015  213512       283782   119477488  119414846  20891    50004    14             -1       119.69

  7180002103750    3                 2462253  113657789  236          601601   48166755   48796598   146263   177067   84             1        -48.16
  7180002103750    2                 2462253  154873767  632754       702802   74042519   73959216   28350    39498    20             -1       74.67
  7180002103750    3                 2462253  113657789  702856       2462253  48823530   50636809   416701   636563   219            1        -48.12
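
The candidate breakpoints in these scaffolds can be read off the table by ordering the alignment blocks along each scaffold and noting where the chicken chromosome changes. A minimal sketch, assuming rows of (scfId, chickenChr, scfStart, scfEnd); the function name `chromosome_switches` is hypothetical, and the demo rows are copied from the table above.

```python
def chromosome_switches(blocks):
    """Return (scfId, chr_before, chr_after, end_of_before, start_of_after)."""
    by_scf = {}
    for scf, chrom, start, end in blocks:
        by_scf.setdefault(scf, []).append((start, end, chrom))
    switches = []
    for scf, items in by_scf.items():
        items.sort()  # order the blocks along the scaffold
        for (s1, e1, c1), (s2, e2, c2) in zip(items, items[1:]):
            if c1 != c2:
                # The breakpoint lies between the end of one block
                # and the start of the next.
                switches.append((scf, c1, c2, e1, s2))
    return switches


# Rows taken from the alignment table above.
alignment_rows = [
    ("7180002103213", "4", 6, 299125),
    ("7180002103213", "26", 299637, 426422),
    ("7180002103669", "8", 1, 1573516),
    ("7180002103669", "9", 1582769, 1673152),
    ("7180002103669", "8", 1674402, 3819472),
]
for switch in chromosome_switches(alignment_rows):
    print(*switch)
```

For 7180002103669 this brackets the piece aligning to chicken chromosome 9 between scaffold positions 1573516 and 1674402, which sits in the middle of the scaffold as noted in the marker table.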

Annotation

 15,093 - protein coding gene loci
   611  - noncoding RNA genes
 15,704 - total number, protein and RNA gene loci.

Submission

 /fs/ftp-cbcb/pub/data/turkey/                               # assemblies, FASTA, AGP ...
 /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/Alignments/  # alignments to chicken