Cbcb:Pop-Lab:Chris-Report: Difference between revisions

 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
== June 12, 2009 ==
===Tasks===
'''Intergenic space and gene ontology work'''
*Investigated GO annotation tools that Bo has linked me to.
*I have been working on the Snail, ''Lottia gigantea'', genome.
*Found the top 10% intergenic space with relative ease; however, there was no GO annotation file for the genes.  Most of the genomes of interest will not have a GO annotation file (shocker, I know).
*BLAST'd the sequences using Blast2GO, '''ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data.''' http://www.blast2go.org/
*Blast2GO has a pipeline version, so I have been planning a way to take new incomplete genomes and have them run through this annotation pipeline.
*BLAST took around 8 hours for ~2200 sequences using an online database.  I will run another group on the CBCB's BLAST.
*GO annotation using Blast2GO took around 24 hours.
*I plan on running HMMs this weekend for the approximately 200 sequences with no suitable hits.
'''Partition'''
*Drastically overhauling the backend of my partition program to make it more extensible/robust.
*Adding XML support to make parsing easier and increase possible functionality.
*Need to focus on getting James something usable ASAP.
===Summer Goals===
*Finish analyzing the intergenic space of the incomplete genomes.  Determine future possibilities/publications.
*Modify/extend partition program to incorporate into James' metagenomic pipeline.
*Include viral metagenomic data into GeneParser.
== Mar 2, 2009 ==
== Mar 2, 2009 ==
'''Intergenic Space and Gene Ontology work with Cristian'''   
===Tasks===
'''Intergenic space and gene ontology work with Cristian'''   
*Approach
*Approach
   *Get the gene-spacing information - typically a simple parse of GFF files, if they are available.
   Get the gene-spacing information - typically a simple parse of GFF files, if they are available.
   *Get the gene function information - use GO if available.
   Get the gene function information - use GO if available.
   *Rank the genes based on 5' spacing size.
   Rank the genes based on 5' spacing size.
   *Take 10% longest, 10% shortest and middle 20% and find out what they do with GeneMerge
   Take 10% longest, 10% shortest and middle 20% and find out what they do with GeneMerge
*Completed genomes
*Completed genomes
**Anemone, ''Nematostella vectensis''
**Anemone, ''Nematostella vectensis''
Line 11: Line 33:
**Water flea, ''Daphnia pulex''
**Water flea, ''Daphnia pulex''
*Incomplete genomes that require blast
*Incomplete genomes that require blast
**Gastropod Snail, Lottia gigantea
**Gastropod Snail, ''Lottia gigantea''
**Polychaete Worm, Capitella sp
**Polychaete Worm, ''Capitella sp''
*Future genomes
*Future genomes
**Leech, ''Helobdella robusta''
**Leech, ''Helobdella robusta''
Line 18: Line 40:
**Sea Slug, ''Aplysia californica''
**Sea Slug, ''Aplysia californica''
**Snail, ''Biomphalaria glabrata''
**Snail, ''Biomphalaria glabrata''
**Slime-mold, ''Dictyostelium purpureum QSDP1''
*Waiting to hear back from Cristian about blast value cut-offs for incomplete genomes.
'''Partitioning System'''
*Splitting contigs based on "subgroup" information - essentially breaking up an entire assembly into multiple "sub-assemblies", each containing just the reads from a single subgroup.
'''Conserved genomic elements in bacteria'''
*Update elements based on Adam's changes to Insignia.
*Find something to write about.
===Interesting Stuff===
*New insights into aging based on transcription factors, [http://med.stanford.edu/news_releases/2008/july/aging-worm.html Prevailing theory of aging challenged in Stanford worm study]

Latest revision as of 00:47, 13 June 2009

June 12, 2009

Tasks

Intergenic space and gene ontology work

  • Investigated GO annotation tools that Bo has linked me to.
  • I have been working on the Snail, Lottia gigantea, genome.
  • Found the top 10% intergenic space with relative ease; however, there was no GO annotation file for the genes. Most of the genomes of interest will not have a GO annotation file (shocker, I know).
  • BLAST'd the sequences using Blast2GO, ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data. http://www.blast2go.org/
  • Blast2GO has a pipeline version, so I have been planning a way to take new incomplete genomes and have them run through this annotation pipeline.
  • BLAST took around 8 hours for ~2200 sequences using an online database. I will run another group on the CBCB's BLAST.
  • GO annotation using Blast2GO took around 24 hours.
  • I plan on running HMMs this weekend for the approximately 200 sequences with no suitable hits.

Partition

  • Drastically overhauling the backend of my partition program to make it more extensible/robust.
  • Adding XML support to make parsing easier and increase possible functionality.
  • Need to focus on getting James something usable ASAP.

Summer Goals

  • Finish analyzing the intergenic space of the incomplete genomes. Determine future possibilities/publications.
  • Modify/extend partition program to incorporate into James' metagenomic pipeline.
  • Include viral metagenomic data into GeneParser.

Mar 2, 2009

Tasks

Intergenic space and gene ontology work with Cristian

  • Approach
    • Get the gene-spacing information - typically a simple parse of GFF files, if they are available.
    • Get the gene function information - use GO if available.
    • Rank the genes based on 5' spacing size.
    • Take the 10% longest, 10% shortest, and middle 20%, and find out what they do with GeneMerge.
  • Completed genomes
    • Anemone, Nematostella vectensis
    • Frog, Xenopus tropicalis
    • Water flea, Daphnia pulex
  • Incomplete genomes that require blast
    • Gastropod Snail, Lottia gigantea
    • Polychaete Worm, Capitella sp
  • Future genomes
    • Leech, Helobdella robusta
    • Flatworm (Planaria), Schmidtea mediterranea
    • Sea Slug, Aplysia californica
    • Snail, Biomphalaria glabrata
    • Slime-mold, Dictyostelium purpureum QSDP1
  • Waiting to hear back from Cristian about blast value cut-offs for incomplete genomes.
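
The ranking approach listed above (parse gene positions from GFF, rank by 5' spacing, then pull the 10% longest, 10% shortest and middle 20%) could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names are made up, the raw GFF attribute column stands in for a gene ID, and the 5' spacing is approximated as the gap to the neighbouring gene on the upstream side (nested or overlapping genes are clamped to 0, and genes with no upstream neighbour on their contig are skipped).

```python
from collections import defaultdict

def parse_gff_genes(lines):
    """Collect (seqid, start, end, strand, gene_id) for 'gene' features.
    The raw attribute column is used as the gene id (an assumption)."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8]))
    return genes

def five_prime_spacing(genes):
    """Map gene id -> gap (bp) to the neighbouring gene on its 5' side."""
    by_seq = defaultdict(list)
    for gene in genes:
        by_seq[gene[0]].append(gene)
    spacing = {}
    for seq_genes in by_seq.values():
        seq_genes.sort(key=lambda g: g[1])  # order by start coordinate
        for i, (_, start, end, strand, gid) in enumerate(seq_genes):
            if strand == "+" and i > 0:
                # upstream of a + strand gene is the previous gene
                spacing[gid] = max(0, start - seq_genes[i - 1][2] - 1)
            elif strand == "-" and i + 1 < len(seq_genes):
                # upstream of a - strand gene is the next gene
                spacing[gid] = max(0, seq_genes[i + 1][1] - end - 1)
    return spacing

def select_groups(spacing, tail=0.10, middle=0.20):
    """Split ranked genes into shortest, middle and longest groups."""
    ranked = sorted(spacing, key=spacing.get)  # shortest spacing first
    n = len(ranked)
    k = int(n * tail)
    m = int(n * middle)
    mid_start = (n - m) // 2
    return {
        "shortest": ranked[:k],
        "middle": ranked[mid_start:mid_start + m],
        "longest": ranked[n - k:],
    }
```

Each of the three gene lists would then be handed to GeneMerge together with the GO annotations; that step is not shown here.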

Partitioning System

  • Splitting contigs based on "subgroup" information - essentially breaking up an entire assembly into multiple "sub-assemblies", each containing just the reads from a single subgroup.
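
As a rough illustration of the splitting idea, assuming the assembly is available as a mapping from contig to read IDs and each read carries a subgroup label (both data structures, the function name, and the "unassigned" bucket are hypothetical, not the partition program's real interface):

```python
from collections import defaultdict

def partition_contigs(contig_reads, read_subgroup):
    """Split an assembly into per-subgroup sub-assemblies.

    contig_reads:  contig id -> list of read ids placed in that contig
    read_subgroup: read id -> subgroup label
    Returns subgroup -> {contig id: [reads from that subgroup only]}.
    """
    sub_assemblies = defaultdict(dict)
    for contig, reads in contig_reads.items():
        for read in reads:
            # reads without a label go into a catch-all bucket (assumption)
            group = read_subgroup.get(read, "unassigned")
            sub_assemblies[group].setdefault(contig, []).append(read)
    return dict(sub_assemblies)
```

A contig whose reads come from several subgroups appears once in each of the corresponding sub-assemblies, carrying only that subgroup's reads.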

Conserved genomic elements in bacteria

  • Update elements based on Adam's changes to Insignia.
  • Find something to write about.

Interesting Stuff

  • New insights into aging based on transcription factors: Prevailing theory of aging challenged in Stanford worm study (http://med.stanford.edu/news_releases/2008/july/aging-worm.html)