Turkey
...
= Assembly2.0 =
== Original (CA) ==
Reads:
...
Cvg=15X
Stats:
. elem min q1 q2 q3 max mean n50 sum
scf 27,007 66 1354 1988 4793 9558742 37856 1538143 1,022,394,764
...
deg 440,796 64 102 256 485 8055 312 483 137,835,235
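Each stats table lists, per sequence set, the number of elements, the minimum, the quartiles (q1 to q3), the maximum, the mean, the N50 and the total length. Below is a minimal Python sketch of how one such row can be computed from a list of lengths. It is not the getSummary.pl used on this page. The nearest-rank quartile rule and the truncated integer mean are assumptions. N50 follows the usual definition: sort the sequences from longest to shortest, and N50 is the length at which they first cover half of the total.

```python
import math
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length at which the running total, longest first, first reaches half the sum."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 needs a non-empty list of lengths")


def nearest_rank(sorted_lengths: List[int], fraction: float) -> int:
    """Quantile by the nearest-rank rule (an assumed convention)."""
    rank = max(1, math.ceil(fraction * len(sorted_lengths)))
    return sorted_lengths[rank - 1]


def summarize(lengths: List[int]) -> Dict[str, int]:
    """One row of an 'elem min q1 q2 q3 max mean n50 sum' table."""
    s = sorted(lengths)
    return {
        "elem": len(s),
        "min": s[0],
        "q1": nearest_rank(s, 0.25),
        "q2": nearest_rank(s, 0.50),
        "q3": nearest_rank(s, 0.75),
        "max": s[-1],
        "mean": sum(s) // len(s),  # truncated; the page's tool may round instead
        "n50": n50(s),
        "sum": sum(s),
    }


print(summarize([10, 20, 30, 40, 50]))
```

For the toy input, n50 is 40 rather than the maximum, because 50 + 40 = 90 is the first running total that covers half of 150.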
== Preliminary ==
Stats:
. elem min q1 q2 q3 max mean n50 sum
Ch1..30,40,41 32 531 6400446 15119779 34928883 184590300 28263595 70426150 904,435,047
gaps 147792 100 100 100 100 2999 268 860 39,738,918
Stats(placed):
. elem min q1 q2 q3 max mean n50 sum
scf 2,504 1001 5868 35589 272564 9558742 362085 1830406 906,662,877
...
ctg+deg 147,824 64 520 2783 8197 91891 5849 13426 864,696,129
== Final ==
* More ctgs placed based on synteny.
* Alignments to chicken (delta-filter -1):
1 150471
-1 32587
* Many scaffolds seem to be interleaved
Stats:
. elem min q1 q2 q3 max mean n50 sum
Ch1..30,40,41 32 531 7024757 18811362 37793329 207174646 31576111 75696247 1,010,435,575
Ch1..30,40,41,Un 33 531 7024757 18811362 37793329 207174646 32954439 75696247 1,087,496,503
[[Media:turkey.len|turkey.len]]
Stats(placed):
. elem min q1 q2 q3 max mean n50 sum
ctg 131,217 64 1651 3975 9289 91891 6866 12989 901,044,472
...
ctg+deg 162,643 64 731 2602 7576 91891 5609 12829 912,285,854
More stats:
. all placed
total genome size with gaps : 1087496503 1010435575
total genome size without gaps : 941191869 912285854
where:
all: Chr1..41,Un
placed: Chr1..41
N50 contig size (CA ctgs): 12435
N50 scaffold size (original CA scaff): 1538143
total bases (no gaps) in Chr1..41,Un: 941191869 (Chr1..41 only: 912285854)
total unmapped: 28906015 (ChrUn)
size and number of contigs in each chromosome:
chr #ctg/deg len(noGaps) len(withGaps)
1 31920 186281234 207174646
2 17221 108330071 119814280
3 15247 92546836 102780271
4 10336 69043870 75696247
5 8892 57589156 63943857
6 7680 49575076 55000907
7 6634 36192137 39986770
8 5331 34152018 37933571
9 2583 18366421 20063553
10 4455 29082703 31790800
11 2962 22664575 24752353
12 2854 19182682 21170715
13 2810 18912345 21086818
14 2657 19298732 21185158
15 2671 17107111 18811362
16 2421 14623454 16273683
17 1858 12183352 13504974
18 57 118600 132921
19 1687 9654238 10789531
20 2039 10407256 11885725
21 1562 9611963 10683868
22 5729 14123046 16000480
23 1066 6510119 7383190
24 710 3881523 4300864
25 954 5025869 5613781
26 1522 6146115 7024757
27 195 777582 887413
28 778 4125373 4632725
29 688 3036456 3487800
30 1067 3660581 4277653
40 1 531 531
41 16056 30074829 32364371
Un 14048 28906015 77060928
total 176691 941191869 1087496503
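The two length columns give each chromosome's gap content directly: len(withGaps) − len(noGaps). Subtracting the Un row from the total reproduces the "placed" figures in More stats: 941191869 − 28906015 = 912285854 and 1087496503 − 77060928 = 1010435575. Below is a small Python sketch of that arithmetic. It parses rows in the "chr #ctg/deg len(noGaps) len(withGaps)" layout above. The function names are illustrative, not existing scripts.

```python
from typing import Dict, Tuple

Row = Tuple[int, int, int]  # (#ctg/deg, len(noGaps), len(withGaps))


def parse_chr_table(text: str) -> Dict[str, Row]:
    """Parse 'chr #ctg/deg len(noGaps) len(withGaps)' rows; other lines are skipped."""
    rows: Dict[str, Row] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[1].isdigit():
            continue  # header line or anything that is not a data row
        name, n_ctg, no_gaps, with_gaps = fields
        rows[name] = (int(n_ctg), int(no_gaps), int(with_gaps))
    return rows


def gap_bases(rows: Dict[str, Row], name: str) -> int:
    """Number of gap (N) bases inserted into one chromosome."""
    _, no_gaps, with_gaps = rows[name]
    return with_gaps - no_gaps


def placed_lengths(rows: Dict[str, Row]) -> Tuple[int, int]:
    """(noGaps, withGaps) for the placed chromosomes: the total row minus the Un row."""
    total, un = rows["total"], rows["Un"]
    return total[1] - un[1], total[2] - un[2]


TABLE = """
chr #ctg/deg len(noGaps) len(withGaps)
1 31920 186281234 207174646
Un 14048 28906015 77060928
total 176691 941191869 1087496503
"""

rows = parse_chr_table(TABLE)
print(gap_bases(rows, "1"))   # 20893412
print(placed_lengths(rows))   # (912285854, 1010435575)
```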
Reads and bases
ctg H 120498260 8874803391
ctg F 18491893 5298292983
ctg C 17546 7111026
ctg 0 14144 4334955
deg H 6775791 493312431
deg F 1614232 440598789
deg C 971 275722
deg 0 1130 253396
placed_ctg H 118224274 8707749456
placed_ctg F 18072645 5176540153
placed_ctg C 17515 7100114
placed_ctg 0 14058 4313107
placed_deg H 1401655 101867758
placed_deg F 212050 49638442
placed_deg C 675 195293
placed_deg 0 653 150688
---
Files:
/fs/szattic-asmg4/turkey/Assembly2.0/
/fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/
== Chr_111909 ==
--[[User:Dpuiu|Dpuiu]] 22:28, 19 November 2009 (EST)
* Aleksey tried to fix the contig rearrangements and scaffold overlaps
. elem min q1 q2 q3 max mean n50 sum
ctg 154342 64 1463 3214 8113 91891 6170 12340 952327586
deg 13627 64 181 453 685 8055 460 656 6270299
ctg+deg 167969 64 1218 2747 7462 91891 5706 12242 958597885
Files:
/nfshomes/alekseyz/Chr_111909/Chr.all.agp
== Chr_112409 ==
--[[User:Dpuiu|Dpuiu]] 15:39, 24 November 2009 (EST)
* Alignments to chicken (delta-filter -1): still many inversions
1 137445
-1 22445
* Inverted scaffold examples:
cd ~dpuiu/turkey/
# keep scaffolds whose chicken-alignment direction ($F[1]) differs from their AGP direction ($F[3])
join2.pl Alignment2.0/chicken-turkey.scf/Chr.scf.dir Assembly2.0/Chr_112409/Chr.scf.dir | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] ne $F[3]);' | sort -nk3 -r | pretty | grep -v -f turkey.scf.split.112409
#scfid alignDir alignCount AgpDir AgpCount markerDir markerCount
7180002103721 -1 602 1 395 -1 116 # flipped erroneously (395 ctg, 3.3Mbp scaffold)
7180002103327 -1 299 1 221 1 68
7180002103550 -1 298 1 241 1 65
7180002103191 -1 280 1 258 1 69
7180002103618 -1 267 1 426 -1,1 144 # half rev, half fwd
7180002103677 -1 246 1 186 -1,-1 53 # aligns in 2 separate regions of Chr1
...
7180002103609 1 228 -1 166 -1
7180002103567 -1 224 1 241 1
7180002103561 1 223 -1 597 -1
7180002103695 1 210 -1 286 -1
7180002103421 -1 201 1 203 1
7180002103478 -1 181 1 217 -1 # flipped erroneously (217 ctg, 1.4Mbp scaffold)
7180002103668 -1 176 1 171 -1 # flipped erroneously (171 ctg, 1.7Mbp scaffold)
7180002103762 -1 161 1 257 ?
7180002103538 -1 154 1 134 1
7180002102914 1 147 -1 74 -1
7180002103634 -1 142 1 95 -1 # flipped erroneously (95 ctg, 6.9Mbp scaffold)
7180002103116 1 141 -1 97 -1
7180002103453 -1 129 1 49 ?
7180002102994 1 128 -1 109 -1
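The pipeline above amounts to four steps: an inner join on scaffold id, removal of scaffolds listed as already split, a direction comparison, and a descending sort on the alignment count. Below is a Python sketch of that logic. It assumes each Chr.scf.dir file has already been read into a scaffold → (direction, count) map. It is not the code of join2.pl, p or pretty, which are local helper scripts. One difference: grep -v -f drops any line that contains a listed pattern as a substring, while the sketch excludes exact scaffold ids. The id "scfA" in the example is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Direction codes: f/r as in the AGP-derived file, 1/-1 as after the sed step.
DIRECTION = {"f": 1, "r": -1, "1": 1, "-1": -1}

Entry = Tuple[str, int]  # (direction, count)


def flipped_scaffolds(
    align: Dict[str, Entry],
    agp: Dict[str, Entry],
    exclude: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, int, int, int, int]]:
    """Rows (scfid, alignDir, alignCount, AgpDir, AgpCount) whose directions disagree,
    sorted by alignment count, largest first (like sort -nk3 -r)."""
    rows = []
    for scf, (align_dir, align_count) in align.items():
        if scf not in agp or scf in exclude:
            continue  # inner join on scaffold id, then drop scaffolds already split
        agp_dir, agp_count = agp[scf]
        a, g = DIRECTION[align_dir], DIRECTION[agp_dir]
        if a != g:
            rows.append((scf, a, align_count, g, agp_count))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


align = {"7180002103327": ("-1", 299), "7180002103721": ("-1", 602), "scfA": ("1", 10)}
agp = {"7180002103327": ("f", 221), "7180002103721": ("f", 395), "scfA": ("f", 5)}
for row in flipped_scaffolds(align, agp):
    print(*row)
```

With the toy maps, "scfA" agrees in both files and is dropped. The two real ids come out in the same order as in the table above.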
* About 70 scaffolds (40Mbp) seem "clearly" inverted
join2.pl ~dpuiu/turkey/Assembly2.0/Chr_112409/Chr.scf.dir BACs/Chr.scf.dir | grep -v -f turkey.scf.split.112409 | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] and $F[1] ne $F[3]);' | join2.pl -f \
~dpuiu/turkey/Assembly2.0/turkey.posmap.scflen | sort -nk6 -r | pretty | getSummary.pl -i 5
elem min q1 q2 q3 max mean n50 sum
70 89456 305981 483779 692778 3317675 576698 692778 40368888
* Scaffolds don't seem to be interleaved any more
* Stats
. elem min q1 q2 q3 max mean n50 sum
ctg.all 152641 64 1356 3154 8130 91891 6131 12520 935915009
ctg.placed 144893 64 1388 3361 8485 91891 6330 12751 917287101
chr.all 33 242906 6750934 18242820 38656374 204065997 32193660 74864811 1062390784
chr.placed 32 242906 6750934 18242820 38723638 204065997 32509112 74864811 1040291584
chr #ctg/deg len(noGaps) len(withGaps)
1 26557 181826552 204065997
2 14384 106718223 116966045
3 12649 91132767 100405573
4 9170 68844569 74864811
5 7553 56965239 62524249
6 6534 48705183 53257597
7 4755 35338084 38723638
8 4751 35279744 38656374
9 2286 18014631 19388932
10 3733 28668829 31125850
11 2720 22659912 24221968
12 2372 18944919 20663392
13 2354 18696996 20109273
14 2367 19181786 20812949
15 2265 16791072 18242820
16 1967 14411805 15988588
17 1635 12015459 13277650
18 51 139801 244178
19 1399 9478246 10526513
20 1424 9943105 11078077
21 1328 9405728 10459872
22 1865 13252797 14786889
23 937 6420024 7113901
24 569 3613335 4158826
25 834 4963017 5560155
26 1040 5925429 6750934
27 161 687724 943818
28 717 4244239 4894166
29 803 3649262 4826720
30 693 3524564 4396719
W 50 108225 242906
Z 24970 47735835 81012204
Un 7748 18627908 22099200
total 152641 935915009 1062390784
Files:
/nfshomes/alekseyz/Chr_111909/Chr.all.agp
/fs/szasmg3/dpuiu/turkey/Assembly2.0/Chr_112409/
=== Table 13 ===
* From the article
* 34 predicted rearrangements between the turkey and chicken genomes; 6 look wrong, 6 questionable, 22 probably right
* GGA = chicken (Gallus gallus) chromosome, MGA = turkey (Meleagris gallopavo) chromosome
GGA GGA start GGA end MGA* Nature of the rearrangement Notes
1 9,713,416 10,050,000 MGA1 segment relocated to chr1:74570000 translocated segment is internal to direct repeat of SEMA3 genes
1 75,800,000 76,000,000 MGA1 small inversion possible unequal recombination within KCN gene cluster
1 104,450,000 104,459,439 MGA1 possible very small intrachromosomal translocation the genetic map places this short segment near 1q telomere
#1 125,900,000 126,300,000 MGA1 small interchromosomal translocation insertion of GGA4:25,500,000-25,550,000 at repetitive locus (see also below)
#1 156,600,000 156,600,001 MGA1 small interchromosomal translocation may be misplacement of Ctg13.1004 in GGA seq or LINE-based translocation of a small segment from GGA4:73,089,000-73,090,000
1 172,822,000 172,900,000 MGA1 possible small inversion may be mis-assembly of GGA ctg3.1161
2 54,870,224 56,560,442 MGA3 inversion with 56.560 Mb coordinate being telomeric in MGA3 (together one inversion and two translocations or assembly errors)
2 54,398,341 54,413,232 MGA3 small translocation or mis-assembly of GGA seq., inverted rel. to GGA seq. coord. (together one inversion and two translocations or assembly errors)
2 54,641,337 54,845,268 MGA3 probably inverted relative to GGA sequence coordinates (together one inversion and two translocations or assembly errors)
#2 54,290,000 54,330,000 MGA3 small translocation or mis-assembly of GGA seq., orientation uncertain (together one inversion and two translocations or assembly errors)
#2 54,452,395 54,545,188 MGA3 probably inverted relative to GGA sequence coordinates (together one inversion and two translocations or assembly errors)
2 53,804,240 54,263,147 MGA3 inverted relative to GGA seq. coordinates with 53.8 Mb joined to 56.6 Mb in MGA (together one inversion and two translocations or assembly errors)
3 6,218 2,344,838 MGA2 inversion, telomeric FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
3 5,605,686 11,605,484 MGA2 inversion, (agrees with genetic map) FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
#4 25,500,000 25,550,000 MGA4 small interchromosomal translocation to about 125.90 Mb orthologous coord. on MGA1 see also chr1:125900000
?4 35,150,000 35,160,000 MGA4 likely small duplication part of this segment duplicated at around 35,828,000 may be misplacement of Ctg13.1004 in seq or
#4 73,080,000 73,090,000 MGA4 small interchromosomal translocation to about 156.60 Mb orthologous coord. on MGA1 LINE-based translocation of a small segment, see also GGA1:156,600,000
5 1 270,229 MGA5 local small inversion with respect to p arm which as a whole is inverted local inversion with respect to p arm which as a whole is inverted
5 1 7,248,180 MGA5 inversion of p arm p arm likely inverted based on genetic map of Nte0897, MNT-193
6 1,576,787 13,080,207 MGA8 multiple inversions: predicted order is ... can be explained by a series of 4-5 consecutive inversions, including possible unequal recombination between SLC16A9 or, less likely, protocadherin genes
7 1 7,248,180 MGA7 inversion of p arm
?8ran 64,951 407,592 MGA10 GGA8_random sequences likely telomeric on MGA10 (and probably GGA8)
8 44,817 10,199,568 MGA10 inversion of p arm possible unequal recombination between AMY genes
8 8,992,540 9,170,000 MGA10 local small inversion with respect to p arm which as a whole is inverted probable inversion but might be mis-orientation of GGA sequence contigs
9 1,528,027 4,372,460 MGA11 inversion telomeric inversion
10 1,907,125 3,642,461 MGA12 no internal centromere observed in turkey centromere misplaced in chicken or moved to telomere in turkey
11 75,337 3,280,000 MGA13 no internal centromere observed in turkey inversion of GGA 11p, FISH CONFIRMED
12 95,816 940,546 MGA14 may be inverted, orientation uncertain may be fused to a repeat of 2.15-2.3 Mb region of GGA12
?12 1,050,000 1,100,000 MGA14 possible small intrachromosomal translocation to telomere small segment may be now at MGA telomere
?12 1,128,610 1,134,284 MGA14 possible very small intrachromosomal translocation small segment now between about 2,632,117-2,703,753 in GGA coordinates on q arm
12 1,164,577 1,399,694 MGA14 inversion (1164577 joined to 1599552) centromere either misplaced in GGA or moved telomeric or between 940,546 and 1,399,694
13 8,233,861 8,511,782 MGA15 small inversion
14 14,370,000 15,070,000 MGA16 inversion FISH CONFIRMED
18 5,062,096 9,882,412 MGA20 inversion unequal recombination between NME paralogs, FISH confirmed
?28 1,550,000 1,620,000 MGA30 apparent duplication with extra copy at about 1.05 Mb in MGA unclear if these are rearrangements or assembly errors
= Scaffold alignment to chicken =
...
W Chr41 24
W Chr40 ?
E22C19W28_E50C23 ChrUn 7l
...
elem min q1 q2 q3 max mean n50 sum
1+markers 23077 76 6408 11837 19433 91891 14425 19768 332,889,618
= Scf splits (Daniela) =
1. Input format
cat BACs/BAC_map_final.txt | grep 7180002103762 | pretty
CH260094G18_SP6 3_1.3 3 2205409 150000 7180002076309 7180002103762 3285 322415
78TKNMI001N01_SP6 3_1.3 3 2287385 150000 7180002058027 7180002103762 2224 329223
...
CH260099O02_SP6 3_1.5 3 3910434 150000 7180002058147 7180002103762 4524 1655352
CH260096N05_T7 6_3 6 26824213 150000 7180002058054 7180002103762 12808 693787
...
CH260026H13_SP6 6_3 6 29907224 266336 7180002057998 7180002103762 634 33979
2. Find scaffolds with markers from multiple chromosomes
# column 7 = scaffold id, column 3 = turkey chromosome of the marker
cat BACs/BAC_map_final.txt | awk '{print $7,$3}' | count.pl -m 2 | awk '{print $1,$2}' | paste.pl
...
7180002103762 3 6
...
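Step 2 pairs each marker's scaffold (column 7) with its turkey chromosome (column 3). It keeps the pairs that are seen often enough and lists the remaining chromosomes per scaffold. count.pl and paste.pl are local scripts. The Python sketch below assumes that count.pl -m 2 keeps (scaffold, chromosome) pairs seen at least twice and that paste.pl collapses rows sharing the first column. Unlike the pipeline, the sketch also filters down to scaffolds that end up with more than one chromosome. The sample lines are the ones shown in step 1.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def multi_chromosome_scaffolds(lines: Iterable[str], min_count: int = 2) -> Dict[str, List[str]]:
    """Scaffolds whose markers hit more than one chromosome, each hit by >= min_count markers."""
    pairs: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip '...' and other short lines
        pairs[(fields[6], fields[2])] += 1  # same pair as awk '{print $7,$3}'
    chromosomes: Dict[str, List[str]] = defaultdict(list)
    for (scf, chrom), n in sorted(pairs.items()):  # chromosome names sort as text
        if n >= min_count:
            chromosomes[scf].append(chrom)
    return {scf: chroms for scf, chroms in chromosomes.items() if len(chroms) > 1}


MAP_LINES = [
    "CH260094G18_SP6 3_1.3 3 2205409 150000 7180002076309 7180002103762 3285 322415",
    "78TKNMI001N01_SP6 3_1.3 3 2287385 150000 7180002058027 7180002103762 2224 329223",
    "CH260099O02_SP6 3_1.5 3 3910434 150000 7180002058147 7180002103762 4524 1655352",
    "CH260096N05_T7 6_3 6 26824213 150000 7180002058054 7180002103762 12808 693787",
    "CH260026H13_SP6 6_3 6 29907224 266336 7180002057998 7180002103762 634 33979",
]
print(multi_chromosome_scaffolds(MAP_LINES))  # {'7180002103762': ['3', '6']}
```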
= Scf splits (Aleksey) =
1 7180002103685 6 156 161 jumps from chr6 to chr1 4049114-4201400
2 7180002103648 1 45 79 1187881-1198679
...
E22C19W28_E50C23* chrLGE22 3
= Synteny =
MSU:
"We do see a couple of very small translocations between chromosomes 1 and 4, but these are so small that they could be errors in the chicken assembly or, more likely, paralogous sequences that perhaps were two copies in the last common ancestor and chicken kept one and turkey the other. We don't see translocations between chromosomes Z and 1, so I expect that these alignments are due to a repetitive element (CR1 being the most likely), but the Z assembly is tentative even in chicken, so it's hard to be sure."
From the spreadsheet:
chickenChr turkeyChr
4 chr1 12.2 1-12.2 25,500,000 25,550,000
4 chr1 18.2 1-18.2 73,080,000 73,090,000
From the *merge2.anc
4 Chr1 94230402 207174646 73196453 73204143 177454210 177447336 3184 2528 5 -1 250.65
4 Chr1 94230402 207174646 86530225 86583469 117075107 116976224 11548 58133 9 -1 203.6
Syntenic regions:
. chickenRegions turkeyRegions chickenChr turkeyChr
all 209166 311363 # nucmer -l 12 -c 65 -g 1000 -b 1000
filter-1 183058 259760 142 186 # delta-filter -1
filter 170658 239592 125 129 # filter-anc.pl -maxDist 200000 -W 20 -p 0.1
merge0 3260 2250 125 130 # merge-anc.pl -maxDist 200000
merge1 1573 1368 110 93 # merge-anc.pl -maxDist 200000 -minCount 8 -minLen 10000
merge2 376 488 49 47 # merge-anc.pl -maxDist 1000000 -minCount 20 -minLen 100000
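The merge0 to merge2 rows coarsen the filtered anchors into progressively larger syntenic blocks by running merge-anc.pl with -maxDist, -minCount and -minLen. That script is not shown here, so the Python sketch below is only one plausible reading of those options:

- Anchors are grouped by chicken chromosome, turkey chromosome and orientation.
- Consecutive anchors are merged when the gap on both genomes is at most max_dist.
- A block is kept if it contains at least min_count anchors and spans at least min_len bases on the chicken side.

The Anchor class and all function names are hypothetical, and the coordinates in the example are made up.

```python
from dataclasses import dataclass, replace
from itertools import groupby
from typing import List, Optional


@dataclass
class Anchor:
    c_chr: str      # chicken chromosome
    t_chr: str      # turkey chromosome
    c_start: int
    c_end: int
    t_start: int    # turkey coordinates stored with start <= end
    t_end: int
    strand: int     # 1 = same orientation, -1 = inverted
    count: int = 1  # number of raw anchors merged into this block


def turkey_gap(block: Anchor, nxt: Anchor) -> int:
    # On the reverse strand, turkey coordinates decrease as chicken coordinates increase.
    if block.strand == 1:
        return nxt.t_start - block.t_end
    return block.t_start - nxt.t_end


def merge_anchors(anchors: List[Anchor], max_dist: int,
                  min_count: int = 1, min_len: int = 0) -> List[Anchor]:
    group_key = lambda a: (a.c_chr, a.t_chr, a.strand)
    ordered = sorted(anchors, key=lambda a: (group_key(a), a.c_start))
    blocks: List[Anchor] = []
    for _, group in groupby(ordered, key=group_key):
        current: Optional[Anchor] = None
        for a in group:
            if (current is not None
                    and a.c_start - current.c_end <= max_dist
                    and abs(turkey_gap(current, a)) <= max_dist):
                current.c_end = max(current.c_end, a.c_end)
                current.t_start = min(current.t_start, a.t_start)
                current.t_end = max(current.t_end, a.t_end)
                current.count += a.count
            else:
                if current is not None:
                    blocks.append(current)
                current = replace(a)  # copy, so the caller's anchors are not modified
        if current is not None:
            blocks.append(current)
    return [b for b in blocks if b.count >= min_count and b.c_end - b.c_start >= min_len]


anchors = [
    Anchor("4", "Chr1", 100, 200, 1000, 1100, 1),
    Anchor("4", "Chr1", 300, 400, 1200, 1300, 1),
    Anchor("4", "Chr1", 5000, 5100, 9000, 9100, 1),
    Anchor("6", "Chr8", 100, 200, 1200, 1300, -1),
    Anchor("6", "Chr8", 300, 400, 1000, 1100, -1),
]
for block in merge_anchors(anchors, max_dist=500, min_count=2):
    print(block)
```

Raising max_dist and the count and length thresholds together shrinks the number of blocks, which is the trend the merge0 to merge2 rows show.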
= Problems =
== ctg7180001625741 ==
* Single-contig scaffold: 7180002083787 (1.4Kbp)
* Single links to 2 different scaffolds: 7180002103637 and 7180002103666
* Synteny info (Daniela)
cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.ctg/turkey.ctg.posmap.merge | grep -C 20 7180001625741
# chickenChr turkeyChr
7180002057801 6 36246991 36257816 -1 Chr8 35888371 35899195 r U 100
7180001625741 6 36269529 36271001 -1 . . . . . .
...
7180002074579 6 36382217 36386350 -1 Chr8 35899296 35903428 r N 20910
* Synteny info (Aleksey)
cat /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/place_by_sinteny/contigs.chicken.order.with_AGP.valid.txt | grep -C 1 7180001625741 | pretty
# chickenChr turkeyChr
1 1790 36269140 36267359 7180001578245 chr6 Chr7 20109532 20111855 2324 - 7180001578245
307 1472 36270694 36269529 7180001625741 chr6 ChrUn 32131240 32132711 1472 0 7180001625741*
2343 5341 36282706 36279707 7180001914610 chr6 Chr7 36045401 36052860 7460 - 7180001914610
cat turkey.posmap.ctgscf | grep 7180002103637 | egrep -n '7180001578245|7180001914610'
...
302:7180001914610 7180002103637 2403512 2410972 f
391:7180001578245 7180002103637 3013067 3015391 f
...
463:
Scf 7180002103637 aligns to both chicken Chr6 and Chr7
cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.scf/turkey.scf-chicken.filter-1.merge0.anc | grep 7180002103637
7180002103637 7 3817505 38384769 2 2285066 23413757 21084724 651501 705639 284 -1 23.41
7180002103637 6 3817505 37400442 2285109 2410972 36410067 36277407 37596 43600 21 -1 38.69
7180002103637 7 3817505 38384769 2410993 3817505 21084676 19700317 384609 354790 151 -1 23.49
grep 7180002103637 /fs/szasmg3/dpuiu/turkey/BACs/BAC_map_final.txt | pretty | sort -nk9 | nl
1 CH260098J15_SP6 7_10 7 21623779 150000 7180001914412 7180002103637 2833 2833
2 78TKNMI023L02_SP6 7_10 7 21655259 150000 7180001914413 7180002103637 462 8036
3 78TKNMI020I14_T7 7_10 7 21786579 150000 7180001914413 7180002103637 6568 14142
...
91 78TKNMI028M05_T7 8_13 8 34451891 150000 7180001914600 7180002103637 4694 2314126
92 CH260110M21_T7 8_13 8 34375922 150000 7180001914602 7180002103637 4382 2344157
93 CH260102C12_T7 8_13 8 34561953 150000 7180001914608 7180002103637 7429 2400413
...
155 CH260102B06_SP6 7_10 7 18173403 150000 7180001914714 7180002103637 578 3753466
156 CH260091G02_T7 7_10 7 18147518 150000 7180001914716 7180002103637 14232 3777334
157 78TKNMI020K20_T7 7_10 7 17944915 150000 7180001914719 7180002103637 5969 3809779
* Solution:
Chr7.agp:10573: Chr7 36935151 36938433 10573 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3
Chr7.agp.bak:10609: Chr7 37075393 37078675 10609 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3
Chr8.agp:10281: Chr8 36818118 36822250 10281 W 7180002074579 1 4133 - # chr8/v3.6/scaffolds/scaffold_0.8
Chr8.agp.bak:10245: Chr8 36677876 36682008 10245 W 7180002074579 1 4133 - # chr8/v3.6/scaffolds/scaffold_0.8
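The Solution lines are AGP component records. They were printed by grep, which adds a "file:line:" prefix, and each carries a trailing "# ..." note. In the AGP specification the nine columns of a component line are object, object_beg, object_end, part_number, component_type, component_id, component_beg, component_end and orientation. The object span and the component span must have the same length. The Python sketch below parses such a W (WGS contig) line and checks that rule. Stripping the grep prefix and the note is specific to this page, and the helper names are illustrative.

```python
import re
from typing import NamedTuple


class AgpComponent(NamedTuple):
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    component_id: str
    component_beg: int
    component_end: int
    orientation: str


GREP_PREFIX = re.compile(r"^\S+:\d+:\s*")  # e.g. "Chr7.agp:10573: " added by grep


def parse_w_line(line: str) -> AgpComponent:
    """Parse one AGP 'W' line, ignoring a grep prefix and a trailing '# ...' note."""
    body = GREP_PREFIX.sub("", line.split("#", 1)[0]).split()
    if len(body) != 9 or body[4] != "W":
        raise ValueError(f"not a 9-column AGP W line: {line!r}")
    return AgpComponent(body[0], int(body[1]), int(body[2]), int(body[3]), body[4],
                        body[5], int(body[6]), int(body[7]), body[8])


def spans_match(c: AgpComponent) -> bool:
    """Object and component spans must have equal length (1-based, inclusive)."""
    return c.obj_end - c.obj_beg == c.component_end - c.component_beg


old = parse_w_line("Chr7.agp.bak:10609: Chr7 37075393 37078675 10609 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3")
new = parse_w_line("Chr7.agp:10573: Chr7 36935151 36938433 10573 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3")
print(spans_match(old), spans_match(new), old.obj_beg - new.obj_beg)  # True True 140242
```

The last value is how far contig 7180001538614 moved on Chr7 between the .bak and the new AGP.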
== 9 more problems ==
Turkey marker counts:
scfId turkeyChr #markers
7180002103213 28 3 # 100K on Chr28
7180002103213 9 9
7180002103555 20 7 # found before ; 100K on Chr8
7180002103555 8 3
7180002103653 1 2 # 100K on Chr1
7180002103653 5 71
7180002103669 10 161 # 130K in the middle on Chr11
7180002103669 11 3
7180002103694 1 53 # 60K in the middle on Chr3
7180002103694 3 7
7180002103720 1 59 # found before ; 160K in the middle of Chr8 # "very messy"
7180002103720 7 63
7180002103720 8 8
7180002103742 1 2 # 40K on Chr1
7180002103742 2 23
7180002103744 1 2 # 60K on Chr1
7180002103744 19 3
7180002103750 2 115 # 50K in the middle on Chr3
7180002103750 3 2
Alignment to chicken chromosomes:
scfId chickenChr scfLen chrLen scfStart scfEnd chrStart chrEnd scfSnp chrSnp #alignm. chrDir scfIntercept
7180002103213 4 426424 94230402 6 299125 492126 808691 84749 99619 33 1 -0.49
7180002103213 26 426424 5102438 299637 426422 1866683 1733616 25738 31146 16 -1 2.16
7180002103555 18 462038 10925261 1 370822 8723614 8393969 118299 280534 19 -1 8.72
7180002103555 6 462038 37400442 370843 462032 20662598 20558365 31608 42763 22 -1 21.03
7180002103653 1 2021582 200994015 288 61479 168123022 168180581 29306 25853 22 1 -168.12
7180002103653 5 2021582 62238931 65643 2021301 48278635 50168489 555769 666443 202 1 -48.21
7180002103669 8 3819803 30671729 1 1573516 22762147 21163160 362616 390538 146 -1 22.76
7180002103669 9 3819803 25554352 1582769 1673152 20499440 20382802 40878 42100 14 -1 22.08
7180002103669 8 3819803 30671729 1674402 3819472 21100173 18954403 504851 532551 145 -1 22.77
7180002103694 1 1815438 200994015 23814 1010176 175716942 174763467 370937 334310 93 -1 175.74
7180002103694 2 1815438 154873767 1083861 1163613 145943508 145868623 25220 19821 12 -1 147.02
7180002103694 1 1815438 200994015 1164196 1783528 174719693 174136372 206925 166944 91 -1 175.88
7180002103720 4 3387095 94230402 3120 22799 74444867 74427082 10167 7883 9 -1 74.44
7180002103720 7 3387095 38384769 23768 689878 25625928 24976086 154053 144800 49 -1 25.64
7180002103720 6 3387095 37400442 707721 849237 9020691 9164801 21129 30710 10 1 -8.31
7180002103720 7 3387095 38384769 849503 1867680 24927870 23935979 247702 228094 87 -1 25.77
7180002103720 1 3387095 200994015 1896368 3387092 142436224 143961082 406262 431752 212 1 -140.53
7180002103742 3 1122157 113657789 33 1003474 77462460 78481139 264247 275185 122 1 -77.46
7180002103742 1 1122157 200994015 1051349 1117652 5090648 5024622 27877 16946 13 -1 6.14
7180002103744 17 283784 11182526 4 124728 2142615 2019855 26734 33044 12 -1 2.14
7180002103744 1 283784 200994015 213512 283782 119477488 119414846 20891 50004 14 -1 119.69
7180002103750 3 2462253 113657789 236 601601 48166755 48796598 146263 177067 84 1 -48.16
7180002103750 2 2462253 154873767 632754 702802 74042519 73959216 28350 39498 20 -1 74.67
7180002103750 3 2462253 113657789 702856 2462253 48823530 50636809 416701 636563 219 1 -48.12
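In these alignment tables, scfIntercept appears to equal (scfStart − chrDir·chrStart)/10^6, truncated to two decimals. For 7180002103213 on chicken chr26, for example, (299637 + 1866683)/10^6 = 2.166, which is printed as 2.16. This relationship is inferred from the rows here and in the 7180002103637 example above; it is not documented. Read that way, the value is the intercept b of the diagonal scf = chrDir·chr + b. Two blocks on the same chromosome, with the same direction and nearly the same intercept, therefore lie on one unbroken diagonal. Below is a Python sketch of that reading. The 0.1 Mbp tolerance and the helper names are arbitrary assumptions.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    chicken_chr: str
    scf_start: int
    chr_start: int
    chr_dir: int  # 1 or -1


def scf_intercept(scf_start: int, chr_start: int, chr_dir: int) -> float:
    """(scfStart - chrDir*chrStart) in Mbp, truncated toward zero to 2 decimals."""
    n = scf_start - chr_dir * chr_start
    hundredths = abs(n) // 10_000  # exact integer truncation, no float rounding
    return (hundredths if n >= 0 else -hundredths) / 100


def collinear(a: Block, b: Block, tol_mbp: float = 0.1) -> bool:
    """Same chromosome, same direction and intercepts within tol_mbp."""
    if a.chicken_chr != b.chicken_chr or a.chr_dir != b.chr_dir:
        return False
    ia = scf_intercept(a.scf_start, a.chr_start, a.chr_dir)
    ib = scf_intercept(b.scf_start, b.chr_start, b.chr_dir)
    return abs(ia - ib) <= tol_mbp


# Scaffold 7180002103669: chr8, then a short chr9 piece, then chr8 again.
blocks: List[Block] = [
    Block("8", 1, 22762147, -1),
    Block("9", 1582769, 20499440, -1),
    Block("8", 1674402, 21100173, -1),
]
print([scf_intercept(b.scf_start, b.chr_start, b.chr_dir) for b in blocks])  # [22.76, 22.08, 22.77]
print(collinear(blocks[0], blocks[2]), collinear(blocks[0], blocks[1]))      # True False
```

Under this reading, the two chr8 blocks of 7180002103669 are collinear, and the chr9 block sits inside them. That fits the marker note that this scaffold has a piece in the middle placed on another chromosome.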
= Annotation =
* ftp://ftp.sanger.ac.uk/pub/searle/umd/turkey
* http://birdbase.net/cgi-bin/gbrowse/turkeygenome/#search
15,093 - protein coding gene loci
611 - noncoding RNA genes
15,704 - total protein-coding and RNA gene loci
= Submission =
* [https://netfiles.umn.edu/xythoswfs/webui/_xy-11544920_1-t_Tk0AQByW Nature draft]
* [ftp://ftp.cbcb.umd.edu/pub/data/turkey/Assembly2.0/ CBCB ftp]
* Local dirs:
/fs/ftp-cbcb/pub/data/turkey/ # assemblies, FASTA, AGP ...
/fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/Alignments/ # alignments to chicken
Latest revision as of 19:32, 20 January 2010
Data
Chicken (Gallus gallus)
Stats:
. elem min q1 q2 q3 max mean n50 sum
Chr1..28,32,MT,W,Z,E22C19W28_E50C23,E64 34 1028 4512026 12968165 30671729 200994015 30377803 94230402 1,032,845,329
gaps(N's) 524913 1 30 64 254 1504285 268 792 141,055,297
chicken.len
Files:
/fs/szasmg3/dpuiu/chicken/
Zebrafinch (Taeniopygia guttata)
Chr stats:
. elem min q1 q2 q3 max mean n50 sum
all(random duplication) 70 9909 369730 2517995 16419078 175225315 17616947 73657157 1,233,186,341
all(gaps) 107061 25 100 100 100 500000 92 100 9,879,775
Chr1,1A,1B,2,3,4,4A,5..28,LG2,LG5,LGE22,M,Un,Z 37 9909 4907541 15652063 36305782 175225315 32343381 73657157 1,196,705,108
zebrafinch.len
Files:
/fs/szasmg3/dpuiu/zebrafinch/
Turkey (Meleagris gallopavo)
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=search&term=Meleagris
- http://www.biolbull.org/cgi/content/abstract/61/2/157
- http://www.ncbi.nlm.nih.gov/sites/gquery?term=Meleagris+gallopavo[organism]
Files:
/fs/szasmg3/dpuiu/turkey/
Assembly2.0
Original (CA)
Reads:
TotalUsableReads=151,843,863 (151M)
AvgClearRange=102
ContigReads=139021843(91.56%)
DegenContigReads=8392124(5.53%)
SurrogateReads=1317962(0.87%)
SingletonReads=3314375(2.18%)
Cvg=15X
Stats:
. elem min q1 q2 q3 max mean n50 sum
scf 27,007 66 1354 1988 4793 9558742 37856 1538143 1,022,394,764
ctg 145,663 64 1512 3433 8500 91891 6391 12594 930,953,352
deg 440,796 64 102 256 485 8055 312 483 137,835,235
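Each read percentage is that count's share of TotalUsableReads; for example, 139,021,843 / 151,843,863 = 91.56%. Cvg=15X is consistent with the number of reads times AvgClearRange divided by a genome size of about 1 Gbp. The Python sketch below uses the CA scaffold sum (1,022,394,764) as the genome size. Which size the original coverage figure used is not stated, so that choice is an assumption.

```python
def percent(part: int, total: int) -> float:
    """Share of total, in percent."""
    return 100 * part / total


def coverage(n_reads: int, avg_read_len: float, genome_size: int) -> float:
    """Mean depth = sequenced bases / genome size."""
    return n_reads * avg_read_len / genome_size


TOTAL_USABLE_READS = 151_843_863
print(round(percent(139_021_843, TOTAL_USABLE_READS), 2))          # 91.56 (ContigReads)
print(round(coverage(TOTAL_USABLE_READS, 102, 1_022_394_764), 1))  # 15.1
```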
Preliminary
Stats(placed):
. elem min q1 q2 q3 max mean n50 sum
scf 2,504 1001 5868 35589 272564 9558742 362085 1830406 906,662,877
ctg 111,752 64 1919 4886 10524 91891 7616 13635 851,209,123
deg 36,072 64 144 331 530 8055 373 521 13,487,006
ctg+deg 147,824 64 520 2783 8197 91891 5849 13426 864,696,129
Final
Stats(placed):
. elem min q1 q2 q3 max mean n50 sum
ctg 131,217 64 1651 3975 9289 91891 6866 12989 901,044,472
deg 31,426 64 128 283 530 8055 357 540 11,241,382
ctg+deg 162,643 64 731 2602 7576 91891 5609 12829 912,285,854
More stats:
total genome size with gaps : 1087496503 1010435575 total genome size without gaps : 941191869 912285854 where: all: Chr1..41,Un placed: Chr1..41
N50 contig size(CA ctgs): 12435 N50 scaffold size(original CA scaff): 1538143
total bases mapped to chromosomes: . 941191869 (Chr1..41) total unmapped : 28906015 (ChrU)
size and number of contigs in each chromosome: chr #ctg/deg len(noGaps) len(withGaps) 1 31920 186281234 207174646 2 17221 108330071 119814280 3 15247 92546836 102780271 4 10336 69043870 75696247 5 8892 57589156 63943857 6 7680 49575076 55000907 7 6634 36192137 39986770 8 5331 34152018 37933571 9 2583 18366421 20063553 10 4455 29082703 31790800 11 2962 22664575 24752353 12 2854 19182682 21170715 13 2810 18912345 21086818 14 2657 19298732 21185158 15 2671 17107111 18811362 16 2421 14623454 16273683 17 1858 12183352 13504974 18 57 118600 132921 19 1687 9654238 10789531 20 2039 10407256 11885725 21 1562 9611963 10683868 22 5729 14123046 16000480 23 1066 6510119 7383190 24 710 3881523 4300864 25 954 5025869 5613781 26 1522 6146115 7024757 27 195 777582 887413 28 778 4125373 4632725 29 688 3036456 3487800 30 1067 3660581 4277653 40 1 531 531 41 16056 30074829 32364371 Un 14048 28906015 77060928 total 176691 941191869 1087496503
Reads and bases ctg H 120498260 8874803391 ctg F 18491893 5298292983 ctg C 17546 7111026 ctg 0 14144 4334955 deg H 6775791 493312431 deg F 1614232 440598789 deg C 971 275722 deg 0 1130 253396 placed_ctg H 118224274 8707749456 placed_ctg F 18072645 5176540153 placed_ctg C 17515 7100114 placed_ctg 0 14058 4313107 placed_deg H 1401655 101867758 placed_deg F 212050 49638442 placed_deg C 675 195293 placed_deg 0 653 150688
--- Files:
/fs/szattic-asmg4/turkey/Assembly2.0/ /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/
Chr_111909
--Dpuiu 22:28, 19 November 2009 (EST)
- Aleksey try to fix the contig rearrangements & scaff overlaps
. elem min q1 q2 q3 max mean n50 sum ctg 154342 64 1463 3214 8113 91891 6170 12340 952327586 deg 13627 64 181 453 685 8055 460 656 6270299 ctg+deg 167969 64 1218 2747 7462 91891 5706 12242 958597885
Files:
/nfshomes/alekseyz/Chr_111909/Chr.all.agp
Chr_112409
--Dpuiu 15:39, 24 November 2009 (EST)
- Alignments to chicken (delta-filter -1): still many inversions
1 137445 -1 22445
- Inverted scaffold examples:
cd ~dpuiu/turkey/ join2.pl Alignment2.0/chicken-turkey.scf/Chr.scf.dir Assembly2.0/Chr_112409/Chr.scf.dir | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] ne $F[3]);' | sort -nk3 -r | pretty | grep -v -f turkey.scf.split.112409 #scfid alignDir alignCount AgpDir AgpCount markerDir markerCount 7180002103721 -1 602 1 395 -1 116 # flipped errorneously (395 ctg,3.3Mbp scaffold) 7180002103327 -1 299 1 221 1 68 7180002103550 -1 298 1 241 1 65 7180002103191 -1 280 1 258 1 69 7180002103618 -1 267 1 426 -1,1 144 # half rev, half fwd 7180002103677 -1 246 1 186 -1,-1 53 # aligns in 2 separate regions of Chr1 ... 7180002103609 1 228 -1 166 -1 7180002103567 -1 224 1 241 1 7180002103561 1 223 -1 597 -1 7180002103695 1 210 -1 286 -1 7180002103421 -1 201 1 203 1 7180002103478 -1 181 1 217 -1 # flipped errorneously (217 ctg,1.4Mbp scaffold) 7180002103668 -1 176 1 171 -1 # flipped errorneously (171 ctg,1.7Mbp scaffold) 7180002103762 -1 161 1 257 ? 7180002103538 -1 154 1 134 1 7180002102914 1 147 -1 74 -1 7180002103634 -1 142 1 95 -1 # flipped errorneously (95 ctg,6.9Mbp scaffold) 7180002103116 1 141 -1 97 -1 7180002103453 -1 129 1 49 ? 7180002102994 1 128 -1 109 -1
- About 70 scaffolds (40Mbp) seem "clearly" inverted
join2.pl ~dpuiu/turkey/Assembly2.0/Chr_112409/Chr.scf.dir BACs/Chr.scf.dir | grep -v -f turkey.scf.split.112409 | sed 's/f/1/' | sed 's/r/-1/;' | p 'print $_ if($F[1] and $F[1] ne $F[3]);' | join2.pl -f \ ~dpuiu/turkey/Assembly2.0/turkey.posmap.scflen | sort -nk6 -r | pretty | getSummary.pl -i 5 elem min q1 q2 q3 max mean n50 sum 70 89456 305981 483779 692778 3317675 576698 692778 40368888
- Scaffolds don't seem to be interleaved any more
- Stats
. elem min q1 q2 q3 max mean n50 sum ctg.all 152641 64 1356 3154 8130 91891 6131 12520 935915009 ctg.placed 144893 64 1388 3361 8485 91891 6330 12751 917287101 chr.all 33 242906 6750934 18242820 38656374 204065997 32193660 74864811 1062390784 chr.placed 32 242906 6750934 18242820 38723638 204065997 32509112 74864811 1040291584
chr #ctg/deg len(noGaps) len(withGaps) 1 26557 181826552 204065997 2 14384 106718223 116966045 3 12649 91132767 100405573 4 9170 68844569 74864811 5 7553 56965239 62524249 6 6534 48705183 53257597 7 4755 35338084 38723638 8 4751 35279744 38656374 9 2286 18014631 19388932 10 3733 28668829 31125850 11 2720 22659912 24221968 12 2372 18944919 20663392 13 2354 18696996 20109273 14 2367 19181786 20812949 15 2265 16791072 18242820 16 1967 14411805 15988588 17 1635 12015459 13277650 18 51 139801 244178 19 1399 9478246 10526513 20 1424 9943105 11078077 21 1328 9405728 10459872 22 1865 13252797 14786889 23 937 6420024 7113901 24 569 3613335 4158826 25 834 4963017 5560155 26 1040 5925429 6750934 27 161 687724 943818 28 717 4244239 4894166 29 803 3649262 4826720 30 693 3524564 4396719 W 50 108225 242906 Z 24970 47735835 81012204 Un 7748 18627908 22099200 total 152641 935915009 1062390784
Files:
/nfshomes/alekseyz/Chr_111909/Chr.all.agp /fs/szasmg3/dpuiu/turkey/Assembly2.0/Chr_112409/
Table 13
- From the article
- 34 predicted rearrangements between the turkey and chicken genomes ; 6 look wrong, 6 questionable, 22 probably right
GGA | GGA start | GGA end | MGA* | Nature of the rearrangement | Notes
1 | 9,713,416 | 10,050,000 | MGA1 | segment relocated to chr1:74570000 | translocated segment is internal to direct repeat of SEMA3 genes
1 | 75,800,000 | 76,000,000 | MGA1 | small inversion | possible unequal recombination within KCN gene cluster
1 | 104,450,000 | 104,459,439 | MGA1 | possible very small intrachromosomal translocation | the genetic map places this short segment near 1q telomere
#1 | 125,900,000 | 126,300,000 | MGA1 | small interchromosomal translocation | insertion of GGA4:25,500,000-25,550,000 at repetitive locus (see also below)
#1 | 156,600,000 | 156,600,001 | MGA1 | small interchromosomal translocation | may be misplacement of Ctg13.1004 in GGA seq or LINE-based translocation of a small segment from GGA4:73,089,000-73,090,000
1 | 172,822,000 | 172,900,000 | MGA1 | possible small inversion | may be mis-assembly of GGA ctg3.1161
2 | 54,870,224 | 56,560,442 | MGA3 | inversion with 56.560 Mb coordinate being telomeric in MGA3 | (together one inversion and two translocations or assembly errors)
2 | 54,398,341 | 54,413,232 | MGA3 | small translocation or mis-assembly of GGA seq., inverted rel. to GGA seq. coord. | (together one inversion and two translocations or assembly errors)
2 | 54,641,337 | 54,845,268 | MGA3 | probably inverted relative to GGA sequence coordinates | (together one inversion and two translocations or assembly errors)
#2 | 54,290,000 | 54,330,000 | MGA3 | small translocation or mis-assembly of GGA seq., orientation uncertain | (together one inversion and two translocations or assembly errors)
#2 | 54,452,395 | 54,545,188 | MGA3 | probably inverted relative to GGA sequence coordinates | (together one inversion and two translocations or assembly errors)
2 | 53,804,240 | 54,263,147 | MGA3 | inverted relative to GGA seq. coordinates with 53.8 Mb joined to 56.6 Mb in MGA | (together one inversion and two translocations or assembly errors)
3 | 6,218 | 2,344,838 | MGA2 | inversion, telomeric | FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
3 | 5,605,686 | 11,605,484 | MGA2 | inversion, (agrees with genetic map) | FISH CONFIRMED, order is telo-cen-[2.4-0.0Mb]-[2.4-5.6Mb]-[11.605-5.605Mb]-[13.16Mb-------]
#4 | 25,500,000 | 25,550,000 | MGA4 | small interchromosomal translocation to about 125.90 Mb orthologous coord. on MGA1 | see also chr1:125900000
?4 | 35,150,000 | 35,160,000 | MGA4 | likely small duplication | part of this segment duplicated at around 35,828,000 may be misplacement of Ctg13.1004 in seq or
#4 | 73,080,000 | 73,090,000 | MGA4 | small interchromosomal translocation to about 156.60 Mb orthologous coord. on MGA1 | LINE-based translocation of a small segment, see also GGA1:156,600,000
5 | 1 | 270,229 | MGA5 | local small inversion with respect to p arm which as a whole is inverted | local inversion with respect to p arm which as a whole is inverted
5 | 1 | 7,248,180 | MGA5 | inversion of p arm | p arm likely inverted based on genetic map of Nte0897, MNT-193
6 | 1,576,787 | 13,080,207 | MGA8 | multiple inversions: predicted order is ... | can be explained by a series of 4-5 consecutive inversions, including possible unequal recombination between SLC16A9 or, less likely, protocadherin genes
7 | 1 | 7,248,180 | MGA7 | inversion of p arm |
?8ran | 64,951 | 407,592 | MGA10 | GGA8_random sequences likely telomeric on MGA10 (and probably GGA8) |
8 | 44,817 | 10,199,568 | MGA10 | inversion of p arm | possible unequal recombination between AMY genes
8 | 8,992,540 | 9,170,000 | MGA10 | local small inversion with respect to p arm which as a whole is inverted | probable inversion but might be mis-orientation of GGA sequence contigs
9 | 1,528,027 | 4,372,460 | MGA11 | inversion | telomeric inversion
10 | 1,907,125 | 3,642,461 | MGA12 | no internal centromere observed in turkey | centromere misplaced in chicken or moved to telomere in turkey
11 | 75,337 | 3,280,000 | MGA13 | no internal centromere observed in turkey | inversion of GGA 11p, FISH CONFIRMED
12 | 95,816 | 940,546 | MGA14 | may be inverted, orientation uncertain | may be fused to a repeat of 2.15-2.3 Mb region of GGA12
?12 | 1,050,000 | 1,100,000 | MGA14 | possible small intrachromosomal translocation to telomere | small segment may be now at MGA telomere
?12 | 1,128,610 | 1,134,284 | MGA14 | possible very small intrachromosomal translocation | small segment now between about 2,632,117-2,703,753 in GGA coordinates on q arm
12 | 1,164,577 | 1,399,694 | MGA14 | inversion (1164577 joined to 1599552) | centromere either misplaced in GGA or moved telomeric or between 940,546 and 1,399,694
13 | 8,233,861 | 8,511,782 | MGA15 | small inversion |
14 | 14,370,000 | 15,070,000 | MGA16 | inversion | FISH CONFIRMED
18 | 5,062,096 | 9,882,412 | MGA20 | inversion | unequal recombination between NME paralogs, FISH confirmed
?28 | 1,550,000 | 1,620,000 | MGA30 | apparent duplication with extra copy at about 1.05 Mb in MGA | unclear if these are rearrangements or assembly errors
Scaffold alignment to chicken
- Parameters:
nucmer -l 12 -c 65 -g 1000 -b 1000
delta-filter -1
- Scf stats
. elem min q1 q2 q3 max mean n50 sum
aligned 22,045 66 1450 2276 5670 9558742 45827 1562815 1,010,256,240
unaligned 4,962 73 1159 1411 1935 119729 2446 2654 12,138,524
1+alignments/scf 22045 1 1 1 2 1660 6 136 153866
2+alignments/2+chr 50 11625 55577 1398890 3387095 7409211 1883381 4298282 94,169,060
- Ctg stats (ctgs in aligned scaff)
. elem min q1 q2 q3 max mean n50 sum
aligned 139790 64 1580 3665 8822 91891 6585 12739 920,634,899
unaligned 5873 64 1148 1399 1887 22071 1756 1766 10,318,453
- Alignment stats
. elem min q1 q2 q3 max mean n50 sum
len(all) 202105 11 681 1895 5189 134408 4315 10045 872,231,977
len(filter-1) 163390 12 1191 2673 6437 134409 5188 10410 847,715,057
%id(filter-1) 163390 11.24 81.10 84.82 87.68 100.00 83 85 .
- Turkey scf vs chicken & turkey chr: about 13% of the scaffold sequence in the table below (119.6 of 921.9 Mbp) seems to align in the opposite orientation. Could these scaffolds have been misoriented by mistake?
. elem min q1 q2 q3 max mean n50 sum
opposite 1527 925 2604 7579 32323 6964320 78342 1018939 119629225
same 2619 97 2591 11510 128530 9558742 306323 1873938 802261737
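The opposite/same split can be turned into percentages using only the sum column of the table. The function below has a hypothetical name. It prints two ratios: opposite/(opposite+same) and opposite/same. For the two rows above it gives about 13% and about 15%.

```bash
# opposite_share: read the "opposite" and "same" rows of the orientation table
# (label elem min q1 q2 q3 max mean n50 sum) and print two percentages taken
# from the sum column ($10): opposite/(opposite+same) and opposite/same.
opposite_share() {
  awk '$1 == "opposite" { o = $10 }
       $1 == "same"     { s = $10 }
       END { printf "%.2f %.2f\n", 100 * o / (o + s), 100 * o / s }'
}
```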
Mapping
- (200+ alignments)
chickenChr turkeyChr #alignments
1 Chr1 35025
2 Chr3 18143 : Chr6 followed by Chr3
2 Chr6 7612
3 Chr2 17765 : Chr2 5' flipped
4 Chr4 11226 : Chr6 followed by Chr4
4 Chr9 2132
5 Chr5 8516
6 Chr8 4552
7 Chr7 4394
8 Chr10 3654 : Chr10 5' flipped
9 Chr11 2729
10 Chr12 2500
11 Chr13 2629
12 Chr14 2158
13 Chr15 2136
14 Chr16 2109
15 Chr17 1524
17 Chr19 1285
18 Chr20 1374 : Chr20 3' flipped
19 Chr21 1155
20 Chr22 1828
21 Chr23 887
22 Chr24 511
23 Chr25 751
24 Chr26 862
25 Chr27 3
26 Chr28 592
27 Chr29 568
28 Chr30 553
Z Chr41 4178
Z Chr1 404
W Chr41 24
W Chr40 ?
E22C19W28_E50C23 ChrUn 7l
E64 ChrUn 20
- Scaffolds with multiple alignment blocks:
- 44 on different Chr
- 30 on same chr; 11 appear to be partially flipped
nl scfid chickenChr
1 7180002103050 2
2 7180002103154 6
3 7180002103203 3 25 # new
4 7180002103204 10 28
5 7180002103206 18
6 7180002103213 4 26 # new
7 7180002103242 5 # partially flipped
8 7180002103280 1 8
9 7180002103298 7
10 7180002103329 6 # partially flipped
11 7180002103402 8 # partially flipped
12 7180002103421 9
13 7180002103425 2 7 # new
14 7180002103431 6 # partially flipped
15 7180002103433 1 # partially flipped
16 7180002103480 5 6
17 7180002103500 12 13 # new
18 7180002103519 3 9
19 7180002103555 6 18 # new
20 7180002103557 8
21 7180002103561 3
22 7180002103574 1
23 7180002103597 2 17 # new
24 7180002103605 2 3 # new
25 7180002103608 8
26 7180002103614 2
27 7180002103617 2 # partially flipped
28 7180002103618 1 # partially flipped
29 7180002103619 11 # partially flipped
30 7180002103620 4
31 7180002103621 1 2 28
32 7180002103627 1
33 7180002103637 6 7 # new
34 7180002103638 2 18 # new
35 7180002103642 4
36 7180002103648 1 3
37 7180002103653 1 5 # new
38 7180002103663 6
39 7180002103668 1
40 7180002103669 8 9 # new
41 7180002103670 1 4 # new
42 7180002103672 2 3
43 7180002103675 1 # partially flipped
44 7180002103677 1
45 7180002103679 2 # partially flipped
46 7180002103681 1 5
47 7180002103682 1 21
48 7180002103683 4 17
49 7180002103684 13 # partially flipped
50 7180002103685 1 2
51 7180002103686 1 3
52 7180002103688 3 8
53 7180002103693 12 15
54 7180002103694 1 2 # new
55 7180002103695 3
56 7180002103698 2 12
57 7180002103702 6 11
58 7180002103714 4 5
59 7180002103715 1 2 4
60 7180002103717 2 10
61 7180002103720 1 6 7 # new
62 7180002103723 4 6
63 7180002103725 1 14
64 7180002103728 1 9 # new
65 7180002103736 1 5 # new
66 7180002103740 7
67 7180002103742 1 3 # new
68 7180002103743 6 8 17
69 7180002103744 1 17 # new
70 7180002103750 2 3 # new
71 7180002103752 9 18
72 7180002103762 2
73 7180002103771 1 3 19
74 7180002103798 7 26 # new
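Labels like "on different Chr", "on same chr" and "partially flipped" can be derived mechanically from per-block alignment records. The sketch below makes two assumptions: each input line is one alignment block, and it holds the scaffold id, the chicken chr and the direction (1 or -1). One possible source is columns 1, 2 and 12 of an .anc file laid out like the 7180002103637 rows further down this page. The input layout and the function name are hypothetical.

```bash
# classify_scf: one line per alignment block on stdin: scfId chickenChr dir (1 or -1).
# Prints scfId and a label: different_chr, same_chr, or same_chr_flipped
# (single chromosome, blocks in both orientations). Feed it only scaffolds
# with 2+ blocks; a single-block scaffold is reported as same_chr.
classify_scf() {
  awk '
    !(($1, $2) in seen) { seen[$1, $2] = 1; nchr[$1]++ }   # count distinct chromosomes per scaffold
    { if ($3 + 0 > 0) fwd[$1] = 1; else rev[$1] = 1 }      # remember which orientations occur
    END {
      for (s in nchr) {
        if (nchr[s] > 1) label = "different_chr"
        else if ((s in fwd) && (s in rev)) label = "same_chr_flipped"
        else label = "same_chr"
        print s, label
      }
    }' | LC_ALL=C sort
}
```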
Scaffold alignment to zebrafinch
- Parameters:
nucmer -l 12 -c 65 -g 1000 -b 1000
delta-filter -1
- Alignment stats (44 scf : subset 10)
. elem min q1 q2 q3 max mean n50 sum
len(subset 10)* 5286 12 233 485 860 12853 675 1033 3570025
%id(subset 10) 5286 40.99 74.20 78.57 85.63 100.00 80 79 .
Chromosome alignment to chicken
- Parameters:
nucmer -l 12 -c 65 -g 1000 -b 1000
delta-filter -1 # not yet
- Alignment stats
. elem min q1 q2 q3 max mean n50 sum
len(all) 185138 11 600 2011 5567 134408 4407 10093 815928282
len(delta-filter -r) 155094 11 1065 2783 6592 134408 5165 10302 801185719
len(delta-filter -1) 148515 11 1144 2953 6836 134408 5341 10421 793361287
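Most tables on this page summarize lengths with the columns elem/min/q1/q2/q3/max/mean/n50/sum. The script that produced them is not shown. The function below is a hypothetical stand-in that computes the same nine columns from one integer length per line. It assumes nearest-rank quartiles and an integer-truncated mean. It uses the usual N50 definition: sort lengths longest first, and N50 is the length at which the running total first reaches half of the sum. Because of these assumptions, its last digits may not match the published rows.

```bash
# lenstats: read one integer length per line on stdin and print
# "elem min q1 q2 q3 max mean n50 sum".
# Quartiles use the nearest-rank rule and the mean is truncated to an integer;
# both are assumptions, the original stats script may round differently.
lenstats() {
  sort -n | awk '
    { a[++n] = $1; sum += $1 }
    function nearest_rank(p,   r) {       # 1-based nearest-rank percentile
      r = int(p * n); if (r < p * n) r++; if (r < 1) r = 1
      return a[r]
    }
    END {
      if (n == 0) exit
      half = sum / 2; cum = 0
      for (i = n; i >= 1; i--) {         # walk from the longest element down
        cum += a[i]
        if (cum >= half) { n50 = a[i]; break }
      }
      printf "%d %.0f %.0f %.0f %.0f %.0f %.0f %.0f %.0f\n", n, a[1], nearest_rank(0.25),
             nearest_rank(0.5), nearest_rank(0.75), a[n], int(sum / n), n50, sum
    }'
}
```

Any column of lengths can be piped in. For example, `awk '{print $2}' lengths.txt | lenstats` works on a hypothetical two-column name/length file.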
BACs.old
- Markers:
37918 : total CH260's
8558 : assembled in scaffolds
8641 : total 78TKNMI
- Scf stats:
. elem min q1 q2 q3 max mean n50 sum
1+markers 1228 1001 24541 247381 879303 9558742 696129 1984837 854,846,919
0markers 25779 66 1338 1911 4245 1214147 6499 26354 167,547,845
1+markers/scf 1228 1 1 2 7 110 6 19 8,262
2+markers/2+chr 38 671404 1525677 2968427 4298282 7409211 3084380 4013969 117,206,475
BACs
- Scf len stats:
. elem min q1 q2 q3 max mean n50 sum
1+markers 2478 1001 6013 36597 278486 9558742 365837 1830406 906,544,909
0 markers 24529 66 1323 1848 3839 325966 4722 11201 115,849,855
2+markers/2+chr 60 283784 1158965 2021582 3549120 7409211 2457241 3411361 147,434,495
3+markers/2+chr 38 426424 1609106 2833228 4013969 7409211 3061980 3819803 116,355,251
- Ctg len stats:
. elem min q1 q2 q3 max mean n50 sum
1+markers 23077 76 6408 11837 19433 91891 14425 19768 332,889,618
Scf splits (Daniela)
1. Input format
cat BACs/BAC_map_final.txt | grep 7180002103762 | pretty
CH260094G18_SP6 3_1.3 3 2205409 150000 7180002076309 7180002103762 3285 322415
78TKNMI001N01_SP6 3_1.3 3 2287385 150000 7180002058027 7180002103762 2224 329223
...
CH260099O02_SP6 3_1.5 3 3910434 150000 7180002058147 7180002103762 4524 1655352
CH260096N05_T7 6_3 6 26824213 150000 7180002058054 7180002103762 12808 693787
...
CH260026H13_SP6 6_3 6 29907224 266336 7180002057998 7180002103762 634 33979
2. Find scaffolds with markers from multiple chromosomes
cat BACs/BAC_map_final.txt | awk '{print $7,$3}' | count.pl -m 2 | awk '{print $1,$2}' | paste.pl
...
7180002103762 3 6
...
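count.pl and paste.pl are local helper scripts, and their options are not documented here. The sketch below does the same job as step 2 using only awk and sort. It assumes the BAC_map_final.txt column order shown in step 1: chromosome in column 3, scaffold id in column 7. The function name is hypothetical.

```bash
# multi_chr_scf [FILE]: BAC_map_final.txt-style rows (chr in column 3,
# scaffold id in column 7); prints "scfId chrA chrB ..." for every scaffold
# whose markers map to two or more chromosomes (chromosomes in first-seen order).
multi_chr_scf() {
  awk '!seen[$7, $3]++ { chrs[$7] = chrs[$7] " " $3; n[$7]++ }   # first time this (scf, chr) pair is seen
       END { for (s in n) if (n[s] >= 2) print s chrs[s] }' "${1:--}" | LC_ALL=C sort
}
```

For example, `multi_chr_scf BACs/BAC_map_final.txt`.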
Scf splits (Aleksey)
1 7180002103685 6 156 161 jumps from chr6 to chr1, 4049114-4201400
2 7180002103648 1 45 79 1187881-1198679
3 7180002103620 241786-307810 # aligns to one chicken chr
4 7180002103280 56334-114382
5 7180002103762 386780-485750 # aligns to one chicken chr
6 7180002103638 111865-184832
7 7180002103743 707755-712324
8 7180002103743 1618441-1646472
9 7180002103743 1895159-1956617
10 7180002103683 3122611-3324351
11 7180002103642 536597-587034 # aligns to one chicken chr
12 7180002103204 94910-122663
13 7180002103681 5 33 57 jumps from chr5 to chr1, 943178-1075454, map looks ok
14 7180002103715 9 243 270 jumps from chr3 to chr9, 547913-610659, map looks ok
15 7180002103725 1 129 187 jumps from chr16 to chr1, 1904425-2067581, map looks ok
16 7180002103728 11 83 131 jumps from chr11 to chr1, 2456073-2532176, map looks ok
17 7180002103698 3 240 266 jumps from chr14 to chr3, 588551-618407, map looks ok
18 7180002103686 1 34 41 jumps from chr2 to chr1, 292876-340742, map looks ok
19 7180002103621 3 40 57 jumps from chr1 to chr3, 707868-766695, map looks ok
20 7180002103720 7 63 130 jumps from chr7 to chr13, 1890283-1900965, map looks ok
21 7180002103682 23 68 75 jumps from chr1 to chr23, 270646-281964, map looks ok
22 7180002103605 2 43 60 jumps from chr2 to chr3, 1059724-1121629, map looks ok
23 7180002103688 10 131 162 jumps from chr10 to chr2, 3129178-3331813, map looks ok
24 7180002103672 6 31 55 jumps from chr2 to chr6, 800904-850720, map looks ok
25 7180002103771 2 13 26 jumps from chr21 to chr2, 516684-703439, map looks ok
26 7180002103519 11 52 62 jumps from chr11 to chr2, 1685597-1695161, map looks ok
27 7180002103597 3 120 150 jumps from chr3 to chr19, 2839516-3067987, map looks ok
28 7180002103717 3 61 96 jumps from chr3 to chr12, 2101452-2251116, map looks ok
29 7180002103743 10 101 257 jumps from chr8 to chr10, 3601251-3670472, map looks ok
30 7180002103743 jumps from chr10 to chr19, 6212398-6251410, map looks ok
31 7180002103714 4 95 146 jumps from chr5 to chr4, 1553913-1593600, map looks ok
32 7180002103723 4 133 179 jumps from chr9 to chr4, 1656209-1721059, map looks ok
33 7180002103752 20 100 166 jumps from chr11 to chr20, 1951227-2017628, map looks ok
34 7180002103480 5 79 119 jumps from chr8 to chr5, 1086539-1133932, map looks ok
35 7180002103702 13 124 145 jumps from chr8 to chr13, 935622-1070705, map looks ok
36 7180002103693 14 73 84 jumps from chr17 to chr14, 477273-532094, map looks ok
37 7180002103614 # aligns to one chicken chr
38 7180002103677 # aligns to one chicken chr
Split ids:
cat Chr_preliminary.agp | grep W | grep -v ChrUn | awk '{print $11}' | grep ^7181 | sort -u | nl
1 7181002103204
2 7181002103280
3 7181002103480
4 7181002103519
5 7181002103620
6 7181002103621
7 7181002103648
8 7181002103672
9 7181002103681
10 7181002103682
11 7181002103683
12 7181002103685
13 7181002103686
14 7181002103688
15 7181002103693
16 7181002103698
17 7181002103702
18 7181002103714
19 7181002103715
20 7181002103717
21 7181002103723
22 7181002103725
23 7181002103743
24 7181002103752
25 7181002103771
Zebrafinch chr sample vs Chicken chr
- Sample 1 Kbp every 1 Mbp in zebrafinch chr
ChickenChr ZebraChr count(>2)
1 chr1 406
1* chr1A 287
1* chr1B 124 # not sampled
2 chr2 589
3 chr3 436
4 chr4 217
4* chr4A 77
5 chr5 244
6 chr6 132
7 chr7 155
8 chr8 116
9 chr9 103
10 chr10 108
11 chr11 105
12 chr12 88
13 chr13 75
14 chr14 65
15 chr15 56
16 nothing
17 chr17 49
18 chr18 45
19 chr19 53
20 chr20 63
21 chr21 26
22 chr22 11
23 chr23 20
24 chr24 32
26 chr26 14
27 chr27 13
28 chr28 14
Z chrZ 165
W chrZ 30 # not sampled
E64 nothing
E22C19W28_E50C23* chrLGE22 3
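The exact way the 1 Kbp samples were drawn is not recorded here. One possible approach assumes a two-column "chrName length" file for the zebra finch chromosomes. It emits BED intervals every 1 Mbp and then extracts them with `bedtools getfasta`. The function and file names below are hypothetical.

```bash
# sample_bed STEP SIZE: read "chrName chrLength" lines on stdin and print
# BED intervals (0-based, half-open, tab-separated) of SIZE bp every STEP bp;
# a final window that would run past the chromosome end is skipped.
sample_bed() {
  awk -v step="$1" -v size="$2" 'BEGIN { OFS = "\t" }
    { for (s = 0; s + size <= $2; s += step) print $1, s, s + size }'
}
```

Usage sketch: `sample_bed 1000000 1000 < zebrafinch.len > sample.bed`, then `bedtools getfasta -fi zebrafinch.fa -bed sample.bed -fo sample.fa`.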
Synteny
MSU:
"We do see a couple of very small translocations between chromosomes 1 and 4,but these are so small that they could be errors in the chicken assembly or, more likely, paralogous sequences that perhaps were two copies in the last common ancestor and chicken kept one and turkey the other. We don't see translocations between chromosomes Z and 1, so I expect that these alignments are due to a repetitive element (CR1 being the most likely), but the Z assembly is tentative even in chicken, so it's hard to be sure."
From the spreadsheet:
chickenChr turkeyChr
4 chr1 12.2 1-12.2 25,500,000 25,550,000
4 chr1 18.2 1-18.2 73,080,000 73,090,000
From the *merge2.anc
4 Chr1 94230402 207174646 73196453 73204143 177454210 177447336 3184 2528 5 -1 250.65
4 Chr1 94230402 207174646 86530225 86583469 117075107 116976224 11548 58133 9 -1 203.6
Syntenic regions:
chickenRegions turkeyRegions chickenChr turkeyChr
all 209166 311363 # nucmer -l 12 -c 65 -g 1000 -b 1000
filter-1 183058 259760 142 186 # delta-filter -1
filter 170658 239592 125 129 # filter-anc.pl -maxDist 200000 -W 20 -p 0.1
merge0 3260 2250 125 130 # merge-anc.pl -maxDist 200000
merge1 1573 1368 110 93 # merge-anc.pl -maxDist 200000 -minCount 8 -minLen 10000
merge2 376 488 49 47 # merge-anc.pl -maxDist 1000000 -minCount 20 -minLen 100000
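merge-anc.pl is a local script, and its exact rules are not documented on this page. The function below illustrates what its -maxDist, -minCount and -minLen options plausibly control. It chains anchors that share the same chromosome pair and orientation, as long as the gap on the first sequence stays within maxDist. It then keeps only chains with enough anchors and enough span. This is a hypothetical simplification, not a reimplementation: it ignores the coordinates on the second sequence, and both the input layout and the function name are assumptions.

```bash
# merge_anchors MAXDIST MINCOUNT MINLEN: stdin rows "chrA chrB startA endA dir"
# (hypothetical layout). Anchors on the same chromosome pair and orientation are
# chained while the gap on chrA is <= MAXDIST; chains with >= MINCOUNT anchors
# spanning >= MINLEN bp are printed as "chrA chrB start end dir count".
merge_anchors() {
  sort -k1,1 -k2,2 -k5,5n -k3,3n | awk -v maxDist="$1" -v minCount="$2" -v minLen="$3" '
    function flush() { if (n >= minCount && e - s + 1 >= minLen) print a, b, s, e, d, n }
    {
      if (n > 0 && $1 == a && $2 == b && $5 == d && $3 - e <= maxDist) {
        if ($4 + 0 > e + 0) e = $4        # extend the current chain
        n++
      } else {
        if (n > 0) flush()                # close the previous chain
        a = $1; b = $2; s = $3; e = $4; d = $5; n = 1
      }
    }
    END { if (n > 0) flush() }'
}
```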
Problems
ctg7180001625741
- 1-ctg scaffold: 7180002083787 (1.4 Kbp)
- Single links to 2 different scaffolds: 7180002103637 & 7180002103666
- Synteny info (Daniela)
cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.ctg/turkey.ctg.posmap.merge | grep -C 20 7180001625741 # chickenChr turkeyChr
7180002057801 6 36246991 36257816 -1 Chr8 35888371 35899195 r U 100
7180001625741 6 36269529 36271001 -1 . . . . . .
...
7180002074579 6 36382217 36386350 -1 Chr8 35899296 35903428 r N 20910
- Synteny info (Aleksey)
cat /fs/ftp-cbcb/pub/data/turkey/Assembly2.0/place_by_sinteny/contigs.chicken.order.with_AGP.valid.txt | grep -C 1 7180001625741 | pretty # chickenChr turkeyChr
1 1790 36269140 36267359 7180001578245 chr6 Chr7 20109532 20111855 2324 - 7180001578245
307 1472 36270694 36269529 7180001625741 chr6 ChrUn 32131240 32132711 1472 0 7180001625741*
2343 5341 36282706 36279707 7180001914610 chr6 Chr7 36045401 36052860 7460 - 7180001914610
cat turkey.posmap.ctgscf | grep 7180002103637 | egrep -n '7180001578245|7180001914610'
...
302:7180001914610 7180002103637 2403512 2410972 f
391:7180001578245 7180002103637 3013067 3015391 f
...
463:
Scf 7180002103637 aligns both to Chr6 & Chr7
cat /fs/szasmg3/dpuiu/turkey/Alignment2.0/chicken-turkey.scf/turkey.scf-chicken.filter-1.merge0.anc | grep 7180002103637
7180002103637 7 3817505 38384769 2 2285066 23413757 21084724 651501 705639 284 -1 23.41
7180002103637 6 3817505 37400442 2285109 2410972 36410067 36277407 37596 43600 21 -1 38.69
7180002103637 7 3817505 38384769 2410993 3817505 21084676 19700317 384609 354790 151 -1 23.49
grep 7180002103637 /fs/szasmg3/dpuiu/turkey/BACs/BAC_map_final.txt | pretty | sort -nk9 | nl
1 CH260098J15_SP6 7_10 7 21623779 150000 7180001914412 7180002103637 2833 2833
2 78TKNMI023L02_SP6 7_10 7 21655259 150000 7180001914413 7180002103637 462 8036
3 78TKNMI020I14_T7 7_10 7 21786579 150000 7180001914413 7180002103637 6568 14142
...
91 78TKNMI028M05_T7 8_13 8 34451891 150000 7180001914600 7180002103637 4694 2314126
92 CH260110M21_T7 8_13 8 34375922 150000 7180001914602 7180002103637 4382 2344157
93 CH260102C12_T7 8_13 8 34561953 150000 7180001914608 7180002103637 7429 2400413
...
155 CH260102B06_SP6 7_10 7 18173403 150000 7180001914714 7180002103637 578 3753466
156 CH260091G02_T7 7_10 7 18147518 150000 7180001914716 7180002103637 14232 3777334
157 78TKNMI020K20_T7 7_10 7 17944915 150000 7180001914719 7180002103637 5969 3809779
- Solution:
Chr7.agp:10573: Chr7 36935151 36938433 10573 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3
Chr7.agp.bak:10609: Chr7 37075393 37078675 10609 W 7180001538614 1 3283 + # chr7/v3.6/scaffolds/scaffold_0.3
Chr8.agp:10281: Chr8 36818118 36822250 10281 W 7180002074579 1 4133 - # chr8/v3.6/scaffolds/scaffold_0.8
Chr8.agp.bak:10245: Chr8 36677876 36682008 10245 W 7180002074579 1 4133 - # chr8/v3.6/scaffolds/scaffold_0.8
9 more problems
Turkey marker counts:
scfId turkeyChr #markers
7180002103213 28 3 # 100K on Chr28
7180002103213 9 9
7180002103555 20 7 # found before; 100K on Chr8
7180002103555 8 3
7180002103653 1 2 # 100K on Chr1
7180002103653 5 71
7180002103669 10 161 # 130K in the middle on Chr11
7180002103669 11 3
7180002103694 1 53 # 60K in the middle on Chr3
7180002103694 3 7
7180002103720 1 59 # found before; 160K in the middle of Chr8 # "very messy"
7180002103720 7 63
7180002103720 8 8
7180002103742 1 2 # 40K on Chr1
7180002103742 2 23
7180002103744 1 2 # 60K on Chr1
7180002103744 19 3
7180002103750 2 115 # 50K in the middle on Chr3
7180002103750 3 2
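Counts like these could be produced directly from BAC_map_final.txt. The sketch below assumes the same column layout as in the Scf splits example: turkey chromosome in column 3, scaffold id in column 7. The function name is hypothetical.

```bash
# marker_counts [FILE]: count markers per (scaffold, chromosome) pair in a
# BAC_map_final.txt-style file (chr in column 3, scaffold in column 7);
# prints "scfId turkeyChr #markers" sorted by scaffold, then chromosome.
marker_counts() {
  awk '{ c[$7 " " $3]++ } END { for (k in c) print k, c[k] }' "${1:--}" |
    LC_ALL=C sort -k1,1 -k2,2
}
```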
Alignment to chicken chromosomes:
scfId chickenChr scfLen chrLen scfStart scfEnd chrStart chrEnd scfSnp chrSnp #alignm. chrDir scfIntercept
7180002103213 4 426424 94230402 6 299125 492126 808691 84749 99619 33 1 -0.49
7180002103213 26 426424 5102438 299637 426422 1866683 1733616 25738 31146 16 -1 2.16
7180002103555 18 462038 10925261 1 370822 8723614 8393969 118299 280534 19 -1 8.72
7180002103555 6 462038 37400442 370843 462032 20662598 20558365 31608 42763 22 -1 21.03
7180002103653 1 2021582 200994015 288 61479 168123022 168180581 29306 25853 22 1 -168.12
7180002103653 5 2021582 62238931 65643 2021301 48278635 50168489 555769 666443 202 1 -48.21
7180002103669 8 3819803 30671729 1 1573516 22762147 21163160 362616 390538 146 -1 22.76
7180002103669 9 3819803 25554352 1582769 1673152 20499440 20382802 40878 42100 14 -1 22.08
7180002103669 8 3819803 30671729 1674402 3819472 21100173 18954403 504851 532551 145 -1 22.77
7180002103694 1 1815438 200994015 23814 1010176 175716942 174763467 370937 334310 93 -1 175.74
7180002103694 2 1815438 154873767 1083861 1163613 145943508 145868623 25220 19821 12 -1 147.02
7180002103694 1 1815438 200994015 1164196 1783528 174719693 174136372 206925 166944 91 -1 175.88
7180002103720 4 3387095 94230402 3120 22799 74444867 74427082 10167 7883 9 -1 74.44
7180002103720 7 3387095 38384769 23768 689878 25625928 24976086 154053 144800 49 -1 25.64
7180002103720 6 3387095 37400442 707721 849237 9020691 9164801 21129 30710 10 1 -8.31
7180002103720 7 3387095 38384769 849503 1867680 24927870 23935979 247702 228094 87 -1 25.77
7180002103720 1 3387095 200994015 1896368 3387092 142436224 143961082 406262 431752 212 1 -140.53
7180002103742 3 1122157 113657789 33 1003474 77462460 78481139 264247 275185 122 1 -77.46
7180002103742 1 1122157 200994015 1051349 1117652 5090648 5024622 27877 16946 13 -1 6.14
7180002103744 17 283784 11182526 4 124728 2142615 2019855 26734 33044 12 -1 2.14
7180002103744 1 283784 200994015 213512 283782 119477488 119414846 20891 50004 14 -1 119.69
7180002103750 3 2462253 113657789 236 601601 48166755 48796598 146263 177067 84 1 -48.16
7180002103750 2 2462253 154873767 632754 702802 74042519 73959216 28350 39498 20 -1 74.67
7180002103750 3 2462253 113657789 702856 2462253 48823530 50636809 416701 636563 219 1 -48.12
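The scfIntercept column is not explained on this page. All 24 rows above are reproduced by (scfStart - chrDir * chrStart) / 10^6, truncated rather than rounded to two decimals. That value would be the offset, in Mbp, of a line scfPos ≈ chrDir * chrPos + intercept. This formula was inferred from the numbers, not taken from the script that produced them. The function name is hypothetical.

```bash
# scf_intercept: rows in the column order of the table above
# (scfId chickenChr scfLen chrLen scfStart scfEnd chrStart chrEnd scfSnp chrSnp #alignm. chrDir ...);
# prints scfId, chickenChr, chrDir and the recomputed intercept in Mbp.
# The formula is an inference from the table, not the original code.
scf_intercept() {
  awk '{
    d = $5 - $12 * $7                          # scfStart - chrDir * chrStart, in bp
    print $1, $2, $12, int(d / 10000) / 100    # Mbp, truncated (not rounded) to 2 decimals
  }'
}
```

Blocks from one collinear segment share almost the same intercept. For example, 7180002103669 has two chicken chr8 blocks at 22.76 and 22.77, with the chr9 block at 22.08 sitting between them on the scaffold.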
Annotation
- ftp://ftp.sanger.ac.uk/pub/searle/umd/turkey
- http://birdbase.net/cgi-bin/gbrowse/turkeygenome/#search
15,093 - protein-coding gene loci
611 - noncoding RNA genes
15,704 - total protein-coding and RNA gene loci
Submission
- Nature draft
- CBCB ftp
- Local dirs:
/fs/ftp-cbcb/pub/data/turkey/ # assemblies, FASTA, AGP ...
/fs/ftp-cbcb/pub/data/turkey/Assembly2.0/final/Alignments/ # alignments to chicken