Kalanchoe

From Cbcb
Revision as of 15:09, 13 April 2011 by Dpuiu (talk | contribs)

Data

  • 300M genome
  • ~5x 454 from a variety of library sizes
  • ~20x Illumina (quality values on the Illumina scale)
  • Location:
 /fs/szattic-asmg7/Kalenchoe_genome/

454

 LIB          reads  mates       meanIns         stdIns          20mers                  ids
 fff01.frg.gz 545792 0                                           AT                      GLMTKIY01
 fff02.frg.gz 459461 0                                           AT                      GLMTKIY02
 fff03.frg.gz 477691 0                                           AC                      GLZRKVN01
 fff04.frg.gz 610848 0                                           AAACCCTAAACCCTAAACCCTA  GKFZ9MZ01
 fff05.frg.gz 450912 0                                           AAAACCCATAAAGTTGTTATTT  GKFZ9MZ02
 fff06.frg.gz 548462 0                                           AACAAGGCACACAGGGGATAGG  GKH094001
 fff11.frg.gz 418299 118317      20k,17072       4268            CG                      GMF8K3302    
 fff12.frg.gz 807808 273276      8k,6609         1652            AT                      GK7ZAL002    
 fff13.frg.gz 638072 205830      8k,6571         1642            ACGTACGTACGTACGTACGTAC  GLC77YN02    
 fff14.frg.gz 771593 231598      3k,2749         687             AT                      GK7ZAL001    
 fff15.frg.gz 634113 165697      3k,2768         692             AT                      GLC77YN01    
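As a rough sanity check on the ~5x figure, the read counts in the table can be summed and compared against the 300M genome size. The sketch below is only arithmetic under stated assumptions: the reads column is taken to count individual reads, and this page gives no read lengths, so the script reports the mean read length that ~5x would imply rather than a measured coverage.

```python
# Read counts copied from the "reads" column of the 454 table.
READS_454 = {
    "fff01": 545792, "fff02": 459461, "fff03": 477691,
    "fff04": 610848, "fff05": 450912, "fff06": 548462,
    "fff11": 418299, "fff12": 807808, "fff13": 638072,
    "fff14": 771593, "fff15": 634113,
}

GENOME_SIZE = 300_000_000  # "300M genome" from the Data section
TARGET_COVERAGE = 5        # "~5x 454" from the Data section


def implied_mean_read_length(read_counts, genome_size, coverage):
    """Mean read length (bp) needed to reach `coverage` with these reads."""
    total_reads = sum(read_counts.values())
    return coverage * genome_size / total_reads


total = sum(READS_454.values())
mean_len = implied_mean_read_length(READS_454, GENOME_SIZE, TARGET_COVERAGE)
print(f"total 454 reads: {total}")
print(f"implied mean read length for ~{TARGET_COVERAGE}x: {mean_len:.0f} bp")
```

If the actual mean read length of the libraries is known, the same formula can be inverted (reads × length / genome size) to get the real coverage.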
  • linker issues
'titanium' == TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG and
              CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
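
The two sequences listed above are reverse complements of each other, which can be checked directly. The sketch below does that check and shows one naive way to split a read at a linker. The helper names (`reverse_complement`, `split_at_linker`) are illustrative, not part of any assembler, and the exact-match search is an assumption made for brevity: real 454 reads contain sequencing errors, so an actual pipeline would need approximate matching. The page does not say what the linker issues were, so this is not a fix for them.

```python
# The two 'titanium' linker sequences quoted on this page.
LINKER_A = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
LINKER_B = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_linker(read, linkers=(LINKER_A, LINKER_B)):
    """Split a read around an exact linker match.

    Linkers are tried in the order given; returns (left, right) for the
    first linker found in the read, or None if neither occurs.
    """
    for linker in linkers:
        pos = read.find(linker)
        if pos != -1:
            return read[:pos], read[pos + len(linker):]
    return None


# Check that the two linker forms are reverse complements of each other.
print(reverse_complement(LINKER_A) == LINKER_B)
```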

Illumina

Assembly

Blogs

 Assemble the Illumina lane with SOAPdenovo, then
 assemble the 454 lanes together with the resulting SOAP contigs using Newbler.
 SOAP uses a de Bruijn graph data structure that is well suited to short Illumina reads but is not flexible enough to handle 454 reads.
 Newbler, by contrast, is based on the overlap-layout approach, which works well with long reads.
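
To make the de Bruijn idea in the quote concrete, here is a minimal toy sketch of how such a graph is built from reads: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is not SOAPdenovo's implementation. The function name, the toy reads, and the choice of k are all made up for illustration, and real assemblers also handle reverse complements, sequencing errors, and k-mer coverage.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; reads shorter
        # than k contribute no k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


toy_reads = ["ACGTAC", "GTACGG"]  # made-up reads, not project data
graph = de_bruijn_edges(toy_reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Because every read is cut into fixed-size k-mers, the long-range information in longer reads is largely lost at this step, which is one way to read the quote's point about 454 data.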
  • CABOG ... throws a lot of errors on our architecture for some reason