Megachile rotundata: Difference between revisions
Jump to navigation
Jump to search
(→Stats) |
(→Stats) |
||
Line 580: | Line 580: | ||
=== Stats === | === Stats === | ||
. elem min q1 q2 q3 max mean n50 sum | . elem min q1 q2 q3 max mean n50 sum |
Revision as of 02:31, 15 October 2010
Data
Original Traces
- 8(+2) pairs of data files (paired ends)
lib insert mates reads readLen ~coverage(500M genome) adaptor s_2_3kbp 3000 21,563,283 43,126,566 124 11 yes s_2_8kbp 8000 198,377 396,754 124 0.1 yes s_2_5kbp 5000 36,218,589 72,437,178 35 5 no s_3 475 35548153 71,096,306 124 18 no s_4 475 35471044 70,942,088 124 18 s_5 475 35616846 71,233,692 124 18 s_6 475 35303840 70,607,680 124 18 s_7 475 34893313 69,786,626 124 18 total . 234813445 469,626,890 . 98* s_2_1.1kb 1100 32,634,858 65,269,716 100 13 no s_2_1.4kb 1500 50,861,645 101,723,290 100 20.3 no
- Location
/fs/szattic-asmg5/Bees/Megachile_rotundata/*txt /fs/szattic-asmg5/Bees/Megachile_rotundata/newLibrary/*txt
- Ftp
ftp.biotec.illinois.edu ftp://username@ftp.biotec.illinois.edu login: generobi Password: GRbeehi3
Corrected/Addaptor-free Traces
- Mated ones
lib insert mates reads repeatReads s_2_3kb 3000 4,823,235 (22%orig) 9,646,470 4,349,208 (45%) s_2_8kb 8000 111,267 (56%orig) 222,534 167,246 (75%) s_2_5kb 5000 36,218,589 72,437,178 35 # same as original s_3 475 33,024,597(92%orig) 66,049,194 35,777,342 (54%) s_4 475 33,237,593 66,475,186 36,247,656 s_5 475 33,150,790 66,301,580 36,350,706 s_6 475 33,223,371 66,446,742 36,102,470 s_7 475 32,647,890 65,295,780 36,102,370 total . 170,218,743 340,437,486
s_2_1.1kbp 1100 21,502,608 43,005,216 33,900,260 s_2_1.4kbp 14000 29,125,202 58,250,404 47,076,236
Corrected/Adaptor-free Traces (Daniela's selection)
lib mates reads repeatReads s_2_3kb.trim64 2,888,124(13.39% orig) 5,776,248 ~0 # these reads aligned to the all read SOAPdenovo assembly; ~ 20% of the mates have 1 mate aligned to linker & ~ 80% of mates are linker free s_2_8kb.trim64 4,883(0.01% orig) 9,766 ~0 # these reads aligned to the all read SOAPdenovo assembly
- repeatReads:
- at least one of the mate contains a perfect match of one of the 15 frequent 22mers listed below
- 32.5%GC in repeatREads vs ~ 35.5%GC in uniqueReads
- Location
/fs/szattic-asmg5/Bees/Megachile_rotundata/error_correction/large_libs/s_?_?_?kb.sequence.cor.all.txt # large insert libs ; inverted compared to the original (outies => innies) /fs/szattic-asmg5/Bees/Megachile_rotundata/error_free/s_?_?_sequence.cor.txt # short insert libs /fs/szattic-asmg5/Bees/Megachile_rotundata/frg/ # frg files to assemble
Adaptors
>circularizarion CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA >circularizarion.revcomp TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG
Frequent kmers
- 22mers which seem to appear in tandem
~ %reads s_2_3kb s_2_8kb s_[4567] s_2_1k ------------------------- 1 AATCATACAATCACAATCATAC|GTATGATTGTGATTGTATGATT 12.04 20.1 9.99 21.18 2 CAATCACAATCATACAATCACA|TGTGATTGTATGATTGTGATTG 10.5 17.87 8.3 2.33 3 AATAATATGAGTTAGATTGATA|TATCAATCTAACTCATATTATT 7.94 11.77 21.47 9.12 4 AGTAATTGTCGTTCTATCGATC|GATCGATAGAACGACAATTACT 5.08 7.47 13.04 6.33 5 ATATAAGCATAATATGGCTAAT|ATTAGCCATATTATGCTTATAT 5.01 7.55 15.15 4.19 6 CACACAATCACACAATCACACA|TGTGTGATTGTGTGATTGTGTG 4.72 8.57 2.32 7 ATTACTCTTATTATTATCAATC|GATTGATAATAATAAGAGTAAT 4.62 6.67 11.8 8 TCACACAATCACAATCACACAA|TTGTGTGATTGTGATTGTGTGA 3.76 7.01 1.54 9 ACAATTACTATACTTATTACTC|GAGTAATAAGTATAGTAATTGT 2.94 4.39 8.46 10 AGACAGAGACAGAGACAGAGAC|GTCTCTGTCTCTGTCTCTGTCT 2.17 5.66 1.03 1.03 11 CACAATCACGATCACACAATCA|TGATTGTGTGATCGTGATTGTG 1.43 2.25 0.5 12 CTGTCTCTGTCTGTCTCTGTCT|AGACAGAGACAGACAGAGACAG 1.34 3.77 0.68 13 CAGCGGATATGTGCGAATTAGA|TCTAATTCGCACATATCCGCTG 0.8 0.54 0.73 14 CTGAGCACAATTCAACACCACA|TGTGGTGTTGAATTGTGCTCAG 0.58 0.35 0.68 15 AACCTAACCTAACCTAACCTAA|TTAGGTTAGGTTAGGTTAGGTT 0.06 0.15 0.03 total 31 55 50 52
- Location
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Data/error_free/noRepeats/ # repeat free FASTQ reads & FRG files /fs/szattic-asmg5/Bees/Megachile_rotundata/error_free_repeats/ # repeat ids & unique FASTQ reads
Assemblies
- CA Version: 6.1 (09/01/2010) /fs/szdevel/dpuiu/SourceForge/wgs-6.1/Linux-amd64/bin/runCA
- SOAP version 1.04: /nfshomes/dpuiu/szdevel/SOAPdenovo_Release1.04/
CA noOBT ; partial s_2_3kb, s_2_8kb, s_3
- Data : 3 libs : ~ 16X cvg
- Files
/fs/szattic-asmg5/Bees/Megachile_rotundata/Assembly/assemblyAdaptorFree/longLibrariesAdaptorFree/s_2_?kb_?.filter.fastq # inverted /fs/szattic-asmg5/Bees/Megachile_rotundata/error_free_better/s_3_?_sequence.cor.txt
Gatekeeper
LibraryName numActiveFRG numDeletedFRG numMatedFRG readLength clearLength GLOBAL 72,995,448 0 70632632 8307194830 8278360381 LegacyUnmatedReads 0 0 0 0 0 s_2_3kb 9,166,343 0 8736228 942501164 914798596 s_2_8kb 210,266 0 199620 21669112 20742291 s_3 63,618,839 0 61696784 7343024554 7342819494
UID IID mateUID mateIID libUID libIID isDel isNonRandom Orient Length clrBeginLATEST clrEndLATEST 110000000001 1 120000000001 2 s_2_3kb 1 0 0 I 75 0 75 120000000001 2 110000000001 1 s_2_3kb 1 0 0 I 123 0 123 110000000003 3 120000000003 4 s_2_3kb 1 0 0 I 90 0 90 120000000003 4 110000000003 3 s_2_3kb 1 0 0 I 123 40 123 ... 110009166343 9166343 0 0 s_2_3kb 1 0 0 U 76 11 76 210009166344 9166344 220009166344 9166345 s_2_8kb 2 0 0 I 123 21 123 ... 210009376609 9376609 0 0 s_2_8kb 2 0 0 U 88 0 88 320009376610 9376610 0 0 s_3 3 0 0 U 72 0 72 ... 310072995448 72995448 0 0 s_3 3 0 0 U 68 0 68
BOG/ tigStore
- Number of tigs in the store
tigStore -g asm.gkpStore -t asm.tigStore 2 -D unitiglist | tail -1 | awk '{print $1}' # 36318422
- Single read tigs
tigStore -g asm.gkpStore -t asm.tigStore 2 -U -d layout | grep -c '^data.num_frags 1$' # 34985292 ts2lay | grep -B 9 -A 3 '^data.num_frags 1$'
Stats
. elem min q1 q2 q3 max mean n50 sum #repeats comments scf 20,827 122 3228 6374 13700 202495* 11508 20462* 239696810 SOAPdenovo: max=1102803 , N50=26876 ctg 37,494 65 2185 3998 7706 191323* 6380 10151* 239226293 206 SOAPdenovo: max=121554 , N50=3138 deg 1,136,469 64 123 143 184 5031 160 164 181954480 807132 utg 1,437,146 64 123 143 195 67048 308 870 443759899 readsTotal 72,995,448 readsInContigs 27,837,956 readsInDegenerates 9,627,122 singletons 34,881,692 (47%) readsWithOuttieMate 3,028,956(4.15%) ???
Placed reads . badLong badOuttie badSame bothDegen bothSurrogate diffScaffold good notMated oneChaff oneDegen oneSurrogate s_2_3kb 534 2,998,286 458 1614846 9892 21872 27308 267328 979980 760044 65268 s_2_8kb 4 26,864 10 38636 114 294 178 5044 35465 7848 1022 s_3 11072 3,806 1104 2369982 61236 53370 23058022 1208112 3967689 371538 87260
Chaff reads . bothChaff notMated oneChaff s_2_3kb 1,277,760 162,787 979,980 s_2_8kb 53,588 5,602 35,465 s_3 27,684,878 713,943 3,967,689
Issues
- reads are renamed : HWI-EAS385_0062:2:1:1036:15608#GCCAAT/1 => UID:110000000001 => IID:1
- reads < 64bp are deleted from the beginning : ID mapping ???
- lib s_2 orientation ??? Too many badOuttie's
Location
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT-partial.1/
CA noOBT ; partial : s_2_3kb, s_2_8kb, s_3 ; no repeats
- Reads that contain at least one of the 15 most frequent 22mers are deleted from the input set
Gatekeeper
LibraryName numActiveFRG numDeletedFRG numMatedFRG readLength clearLength GLOBAL 33811292 0 32385550 3781368484 3772837304 LegacyUnmatedReads 0 0 0 0 0 s_2_3kb 5103461 0 4928188 518415011 510187775 s_2_8kb 53215 0 51436 5395700 5289866 s_3 28654616 0 27405926 3257557773 3257359663
Overlapper
- Dirty 3' ends for the s_2_* reads
totalOvl avgOvl s_2_3kb 5' 4955294 9 s_2_3kb 3' 4955294 7 s_2_8kb 5' 51050 10 s_2_8kb 3' 51050 7 s_3 5' 27721948 9 s_3 3' 27721948 9
Bog
cat 4-unitigger/asm.cga.0 | head Global Arrival Rate: 0.125220 There were 1,983,199 unitigs generated. Unitig Length 65407 - 67872: 4 50209 - 58608: 5 40073 - 49263: 27 30132 - 39913: 72 20030 - 29892: 319 10001 - 19992: 1979 9007 - 9999: 673 8000 - 8999: 934 7000 - 7999: 1332 6000 - 6998: 2048 5000 - 5999: 3103 4000 - 4999: 4898 3000 - 3999: 8120 2000 - 2999: 14634 1000 - 1999: 26621 900 - 999: 4116 800 - 899: 4457 700 - 799: 5042 600 - 699: 6146 500 - 599: 8107 400 - 499: 11901 300 - 399: 19373 200 - 299: 64394 100 - 199: 1173219 90 - 99: 161987 80 - 89: 189874 70 - 79: 132943 64 - 69: 82098
UTG
- The default unitigger tried as well. Fails with the following message:
unitigger: AS_FGB_io.C:338: void add_overlap_to_graph(Aedge, Tfragment*, Tedge*, IntFragment_ID*, VarArrayIntEdge_ID*, int, int, int, IntEdge_ID*, IntEdge_ID*, IntEdge_ID*): Assertion `ialn > iahg' failed. Failure message: failed to unitig
Stats
- Larger max scf & ctg !!! (compared with "CA noOBT partial" that assembled the repeats as well)
. elem min q1 q2 q3 max mean n50 sum scf 21041 65 3174 6334 13482 337719* 11376 20153* 239366537 ctg 37668 65 2181 3963 7687 191376* 6343 10083* 238928665 deg 380596 64 107 126 170 4688 163 160 62151395 # ~22% of deg align with 1 mismatch to kmers utg 652051 64 115 133 225 67870 491 2469 320381694 readsTotal 33,811,292 readsInContigs 27,753,101 (82.08%) readsInDegenerates 4,004,853 (11.84%) singletons 1,276,811 (3.78%) # about 12% of singletons align with 1 mismatch to the frequent kmers
Placed reads . badLong badOuttie badSame bothDegen bothSurrogate diffScaffold good notMated oneChaff oneDegen oneSurrogate 1 582 2992742 410 773006 13486 20146 26870 159957 107838 753916 78412 2 1228 2 9486 84 124 62 1550 12940 7750 860 3 11354 3884 1074 2266364 90824 56066 23001316 1165421 346169 416116 156218
~/bin/asm2mdi.pl < asm.asm s_2_3kb 16 87 ??? s_2_8kb 8000 800 s_3 337 27
Location
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT-partial.1.noRepeats/
CA noOBT ; partial : s_2_3kb & s_2_8kb (trim 64), s_3 no repeats
s_2_3kb & s_2_8kb reads filtering:
- trimmed to the first 64bp
- aligned to one of the SOAPdenovo assemblies (soap2)
- filtered only the mated reads and the single reads with mates aligned to different scaffolds.
The main goal was to get rid of the short inserts present in the long insert libraries (confuse the Celera scaffolder). At the same time I got rid of the linkers (since the linker did not get assembled by SOAPdenovo & all these reads align to the assembly) ...
LibraryName numActiveFRG numDeletedFRG numMatedFRG readLength clearLength GLOBAL 34220457 0 32793056 3613771597 3613573487 s_3 28654616 0 27405926 3257557773 3257359663 s_2_1_3kb.trim64 5556371 0 5377908 355607744 355607744 s_2_1_8kb.trim64 9470 0 9222 606080 606080
Stats
. elem min q1 q2 q3 max mean n50 sum scf 6,407 70 3609 10907 37417 854,387 38538 126946 246918298 ctg 46,442 64 882 2719 6385 184,930 5235.23 11002 243134341 deg 217,175 64 109 134 187 17,896 174.34 184 37861920 utg 564,404 64 77 124 226 82,947 539.31 3168 304389506 singletonReads 1,125,209
Location
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT.partial.1.trim64/
CA noOBT ; partial : s_2_3kb, s_2_8kb, s_3 ; no repeats; reverse
- Reads that contain at least one of the 15 most frequent 22mers are deleted from the input set
- s_2_3kb & s_2_8kb libraries were reversed (since most of the reads in "CA noOBT partial" were outies)
- fewer bad mates
- Smaller contigs & scaffolds
CA noOBT ; partial : s_2_3kb, s_2_8kb, s_3 ; no repeats; no bad links
- Reads that contain at least one of the 15 most frequent 22mers are deleted from the input set
- s_2_3kb & s_2_8kb libraries : all mates listed as bad got "broken"
CA OBT ; partial : s_2_3kb, s_2_8kb, s_3 ; no repeats ; doDeduplication
- smaller contigs, scaffolds
CA noOBT ; partial s_3 , s_4 ; no repeats
- Reads that contain at least one of the 15 most frequent 22mers are deleted from the input set
Gatekeeper
LibraryName numActiveFRG numDeletedFRG numMatedFRG readLength clearLength GLOBAL 56839966 0 54032986 6425669702 6425397526 LegacyUnmatedReads 0 0 0 0 0 s_3 28654616 0 27405926 3257557773 3257359663 s_4 28185350 0 26627060 3168111929 3168037863
Stats
. elem min q1 q2 q3 max mean n50 sum scf 12908 148 4003 8811 22202 511831 19416 42207 250623581 ctg 23116 64 2888 5959 13088 255301 10828 20109 250302752 deg 274961 64 124 139 182 4652 172 164 47499725 utg 574873 64 123 132 202 128547 563 4478 324059992
. elem min q1 q2 q3 max mean n50 sum SLK 243 -50905 -7731 -448 -152 126 -5863 126 -1424832 CLK 516105 -245237 -59 36 82 208 -1112 208 -574243691 ULK 1123683 -150489 -83 6 71 209 -329 209 -369827832 CTP 10012 -19762 -20 -9 10 9876 -2 9876 -25270
Location
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT-partial.2.noRepeats
CA noOBT ; partial : s_2_3kb & s_2_8kb (trim 64), s_3 ,s_4 no repeats
. elem min q1 q2 q3 max mean n50 sum scf 5159 64 3501 9986 45329 1493864 49143.80 180907 253532854 ctg 23598 64 1641 4533 12544 298755 10661.13 26042 251581405 deg 258950 64 121 133 176 13861 162.26 161 42018113 utg 656004 64 84 124 162 128674 499.73 5935 327823531
Location:
ginkgo:/scratch1/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT-partial.2.noRepeats.trim64/
CA noOBT **
- Data : 7 libs : ~ 74X cvg
Gatekeeper
LibraryName numActiveFRG numDelFRG numMatedFRG readLength clearLengt #repeats GLOBAL 326,236,387 0 315518526 37451489553 37418130441 s_2_3kb 9107424 0 9107424 942165284 910444046 # s_2_8kb 209336 0 209336 21814418 20787384 # s_3 63618839 0 61696784 7343024554 7342819494 # s_4 63544688 0 61255960 7291557748 7291478152 # s_5 63370860 0 61084368 7271218123 7271051639 # s_6 63780887 0 61685156 7359094156 7359012512 # s_7 62604353 0 60479498 7222615270 7222537214 #
Meryl
meryl -Dh -s 0-mercounts/asm-C-ms22-cm0 Found 30570218845 mers. Found 271464470 distinct mers. Found 11164787 unique mers. Largest mercount is 87984949; 1896 mers are too big for histogram. 1 11164787 0.0411 0.0004 2 9376915 0.0757 0.0010 3 3714582 0.0894 0.0013 ... 54 5344148 0.6573 0.1788 ... 87984949 1 # AATCATACAATCACAATCATAC
Overlap
- job count :
cat 1-overlapper/ovlopts.pl | grep ^\"h | wc -l 924
- Failures: 709 jobs failed; runCA 6.1 could not restart overlap properly !!!
cat 1-overlap/overlap*out | grep "^Could not" | sort -u Could not malloc memory (1305184948 bytes)
- Only ~ 60% of the reads had overlaps
Bog
cat 4-unitigger/asm.cga.0 Global Arrival Rate: 0.443659 There were 158,805,551 unitigs generated. Unitig Length Global Arrival Rate: 0.443659 # ??? <=> 200X cvg 100071 - 168549: 21 90845 - 99102: 15 80566 - 88867: 17 70006 - 79485: 39 60191 - 69891: 51 50210 - 59643: 98 40106 - 49917: 191 30015 - 39986: 448 20006 - 29992: 1068 10000 - 19995: 4187 9001 - 9999: 942 8001 - 8998: 1202 7000 - 7999: 1489 6000 - 6999: 1927 5000 - 5999: 2379 4000 - 4999: 3266 3000 - 3999: 4580 2000 - 2999: 6979 1000 - 1999: 9654 900 - 999: 1176 800 - 899: 1346 700 - 799: 1658 600 - 699: 2405 500 - 599: 4742 400 - 499: 13047 300 - 399: 26578 200 - 299: 361389 100 - 199: 135260255 90 - 99: 7874207 80 - 89: 7147630 70 - 79: 5128367 63 - 69: 2427507
138,219,089 out of 158,805,551 contain one of the frequent kmers
CGW
- Monitor cgw
ps -C cgw PID PPID %MEM RSZ %CPU STIME TIME CMD 8563 8560 95.2 251872528 88.2 13:24 01:47:56 /fs/szdevel/dpuiu/SourceForge/wgs-6.1/Linux-amd64/bin/cgw ...
top -b -p 8563 -d 10 | grep dpuiu > cgw.resource_usage.log
- Failure 1:
tail 7-0-CGW/cgw.out ... Processed 158,288,858 unitigs with 326,296,236 fragments #Bumble bee : Processed 61,930,044 unitigs with 301,738,113 fragments * Loaded dist s_2_3kb,1 (3000 +/- 300) * Loaded dist s_2_8kb,2 (8000 +/- 800) * Loaded dist s_3,3 (475 +/- 47.5) ... * Splitting chimeric input unitigs LIB 1 mu = 15.318100 sigma = 89.035478 LIB 2 mu = 8000.000000 sigma = 800.000000 LIB 3 mu = 337.817628 sigma = 26.699549 ... minLength = 460 minSplit = -429 Splitting unitig 47689 into as many as 3 unitigs at intervals: 22905,22906 .. Splitting unitig 158234882 into as many as 3 unitigs at intervals: 124,136 * BuildGraphEdgesDirectly
Fix (partial): add "-I" flag to cgw in runCA cat 7-0-CGW/cgw.out ... *** BuildGraphEdgesDirectly Operated on 171664374 fragments
- Failure 2:
tail 7-0-CGW/cgw.out **** Calling CheckEdgesAgainstOverlapper **** **** Survived CheckEdgesAgainstOverlapper with 0 failures**** * Allocating Contig Graph with 158289029 nodes and 14055921 edges Could not calloc memory (25326244640 * 1 bytes = 25326244640) cgw: AS_UTL_alloc.C:55: void* safe_calloc(size_t, size_t): Assertion `p != __null' failed. Fix : delete single fragment unitigs (tigStore) and the fragments asseociated with them (gkpStore)
158,288,858 unitigs total 154,631,861 unitigs to delete (single frg unitigs) 3,656,997 unitigs to keep 326,236,387 frg total 154,631,861 frg to delete (single frg unitigs) 171,604,526 frg to keep
Stats
. elem min q1 q2 q3 max mean n50 sum scf 11,768 109 4232 8819 22525 741575** 21676 51546 255094767 ctg 16,852 64 3501 7433 17418 317387** 15122 31217 254844680 ctg100+ 16,822 100 3514 7453 17442 317387 15149 31217 254842109 deg 3,388,183 64 125 138 168 4608 150 148 511490229 utg 3,657,090 64 124 138 169 168543 216 179 792973816 totalReads 326,236,387 usableReads 171,664,375 singletonReads 1
- Mate stats (%)
lib badLong badOuttie badSame bothDegen bothSurrogate diffScaffold good notMated oneDegen oneSurrogate 1 0.01 53.9 0 15.57 0.36 0.24 0.4 18.33 9.45 1.68 2 0 1.1 0 26.09 0.31 0.09 0.03 61.57 9.42 1.32 3 0.02 0 0 7.94 0.5 0.11 69.91 19.88 1.03 0.48 4 0.02 0 0 7.78 0.48 0.11 69 21.01 1.01 0.47 5 0.02 0 0 7.71 0.47 0.11 69.02 21.08 1 0.47 6 0.02 0 0 7.72 0.48 0.11 69.76 20.32 1 0.47 7 0.02 0 0 7.74 0.48 0.11 69.2 20.89 1 0.46
lib mates ULK CLK SLK 1 2494975 961180 521360 14 2 14560 14483 12375 4 3 13510875 2120862 1169185 2502 4 13100370 2048500 1128523 2392 5 12974843 2017970 1110081 2467 6 13304733 2059656 1125182 2481 7 12774596 1986181 1095046 2386
cat SLK.asm | grep ^mea | sed 's/mea://' | getSummary.pl -t SLK | pretty -int -o cat CLK.asm | grep ^mea | sed 's/mea://' | getSummary.pl -t CLK | pretty -int -o cat ULK.asm | grep ^mea | sed 's/mea://' | getSummary.pl -t ULK | pretty -int -o cat SCF.asm | grep mea | grep -v mea:0.000 | sed 's/mea://' | getSummary.pl -t CTP | pretty -int -o
. elem min q1 q2 q3 max mean n50 sum SLK 981 -53589 -484 -328 -183 4607 -1311 4607 -1286505 CLK 2714158 -488421 -42 34 73 7856 -682 7856 -1853032141 ULK 3576611 -188986 -71 24 69 7856 -338 7856 -1209800223 CTP 4987 -33482 -19 -6 15 10646 1 10646 8602
Location
mulberry:/scratch2/dpuiu/Megachile_rotundata/Assembly/wgs-noOBT/ /fs/szattic-asmg5/Bees/Megachile_rotundata/Assembly/wgs-noOBT/
- Ftp
ftp://ftp.cbcb.umd.edu/pub/data/assembly/Megachile_rotundata.wgs.1/ /fs/ftp-cbcb/pub/data/assembly/Megachile_rotundata.wgs.1
- Try:
- bog
-b Break promisciuous unitigs at unitig intersection points => delete -m 7 Break a unitig if a region has more than 7 bad mates => increase to 1000
- cgw :
-m <min> Number of mate samples to recompute an insert size, default is 100 => increase to ?
SOAPdenovo (Tanja) **
cat *.ContigIndex | grep -v ^E | grep -v ^i | count.pl -i 1 | getSummary.pl -j 1 -t "contigs" cat *.ContigIndex | grep -v ^E | grep -v ^i | count.pl -i 1 | getSummary.pl -j 1 -min 100 -t "contigs(>100bp)" grep "^>" *.scaf | getSummary.pl -i 2 -t scaf
- Stats
. elem min q1 q2 q3 max mean n50 sum scaf 7863 102 903 3272 17692 2,338,728** 37825 240,706 297423517 # N50 for Bee was 1.17M contigs 9742349 31 32 33 37 114,832 60 44 585430821 contigs(>100bp) 177327 100 131 261 1398 114,832 1333 3897** 236496823 # N50 for Bee was 7K
- Location
/fs/szattic-asmg5/Bees/Megachile_rotundata/Assembly/assembly5kbForAll
SOAPdenovo (Daniela)
Stats
. elem min q1 q2 q3 max mean n50 sum scaff 25,119 351 1896 4444 10914 1,102,803 11041 26876 277,338,897 scaff+singleCtgs 51,551 100 121 489 4320 1,102,830 5516 25940 284,368,749 scaffContigs(all) 263,977 33 78 149 878 121,523 914 3273 241,406,149 # <= scafffasta2fasta.pl contigs(all) 6,917,796 31 32 34 40 121,554 70 73 487,401,812 contigs(>100bp) 210,666 100 124 222 1174 121,554 1108 3138 233,563,401 reads 340,437,486 readsOnContigs 171,212,613
Alignments
- Align reads to the scaffolds
soap2-index asm.K31.scafSeq mkdir soap2-index mv asm.K31.scafSeq.index.* soap2-index/ soap2 -D soap2-index/asm.K31.scafSeq.index -a s_2_1_1.1kb_sequence.txt -b s_2_2_1.1kb_sequence.txt -l 32 -p 16 -v 2 -m 800 -x 1400 -o s_2_1.1kb.mated.soap2 -2 s_2_1.1kb.single.soap2 soap2 -D soap2-index/asm.K31.scafSeq.index -a s_2_1_3kb_sequence.txt -b s_2_2_3kb_sequence.txt -l 32 -p 16 -v 2 -m 2000 -x 4000 -o s_2_3kb.mated.soap2 -2 s_2_3kb.single.soap2 -R soap2 -D soap2-index/asm.K31.scafSeq.index -a s_2_1_5kb_sequence.txt -b s_2_2_5kb_sequence.txt -l 32 -p 16 -v 2 -m 4000 -x 6000 -o s_2_5kb.mated.soap2 -2 s_2_5kb.single.soap2 -R soap2 -D soap2-index/asm.K31.scafSeq.index -a s_2_1_8kb_sequence.txt -b s_2_2_8kb_sequence.txt -l 32 -p 16 -v 2 -m 6000 -x 10000 -o s_2_8kb.mated.soap2 -2 s_2_8kb.single.soap2 -R soap2 -D soap2-index/asm.K31.scafSeq.index -a s_3_1_sequence.txt -b s_3_2_sequence.txt -l 32 -p 16 -v 2 -m 200 -x 400 -o s_3.mated.soap2 -2 s_3.single.soap2 ... mates mated single single.diffScaff single.sameScaff s_2_1.1kb 32,634,858 1,466,112 8,014,900 131,084 585,668 s_2_3kb 21,563,283 1,545,114 8,449,321 341,974 1,203,570 s_2_3kb.trim64 21,563,283 4,291,750* 12,156,958 1,484,498* 3,457,492 s_2_3kb.filter 4,823,235 3,730 3,618,426 38,562 3,017,172 s_2_5kb 36,218,589 5,639,332 44,533,553 4,784,348 30,621,038 s_2_8kb 198,377 1,068 32,280 1,168 2,842 s_2_8kb.trim64 198,377 5,508* 48,877 4,258* 3,532 s_2_8kb.filter 111,267 20 33,819 372 27,300
s_3 35,548,153 15,521,020 4,589,276 37,564 651,924
Location
mulberry:/scratch2/dpuiu/Megachile_rotundata/Assembly/SOAPdenovo/
SOAPdenovo ; partial : s_[34567] ; no repeats **
- Slightly better results than we got when the s_2_?kb libs were used
Stats
. elem min q1 q2 q3 max mean n50 sum scf 24602 333 1724 4380 11049 1,103,462 11135 27887 273963709 contigs(all) 2515516 31 33 36 52 148,198 131 1880 330512911 contigs(100bp+) 184395 100 127 235 1316 148,198** 1263 3730 232932308
Location
ginkgo: /scratch1/dpuiu/Megachile_rotundata/Assembly/SOAPdenovo-partial.3.noRepeats/
SOAPdenovo ; s_2_3kb & s_ 2_8kb soap-aligned & trim64; s_[34567] no repeat **
Stats
. elem min q1 q2 q3 max mean n50 sum scf 9963 351 898 2365 13580 1,884,770 28737 191220 286316389 contigs(all) 2515514 32 33 36 52 148,198 131 1881 330514291 contigs(100bp+) 184382 100 127 235 1316 148,198 1263 3730 232931762
Location
mulberry: /scratch2/dpuiu/Megachile_rotundata/Assembly/SOAPdenovo.noRepeats.trim64/