NCBI submission: Difference between revisions

From Cbcb
Dpuiu (talk | contribs)

Revision as of 17:48, 21 February 2008

NCBI

BankIt

Sequin: standalone application

WGS

Genome submission

genomesubmit

TA submission

TA

Compressed archive containing 
  3 files: TRACEINFO.xml, MD5, README
  traces/ directory
  SCF format traces under traces/ or traces/*/
 
Each archive is a gzip file of 1-4 GB; include the center's name and the date in the file name.
Archives are accepted only by upload to the NCBI FTP server.
  server: ftp-trace.ncbi.nih.gov
  login: 
  passwd: 
  center: UMD
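
The packaging described above can be sketched in shell. This is a minimal sketch under stated assumptions, not a procedure confirmed by NCBI: the function name make_trace_archive, the file-name pattern <center>-<YYYYMMDD>.tar.gz, and the idea that the MD5 file holds md5sum output for every other file are all illustrative guesses.

```shell
#!/bin/sh
# Hypothetical helper: package a trace submission directory.
# Assumptions (not confirmed by the source): the MD5 file lists md5sum
# output for each file, and the archive is named <center>-<YYYYMMDD>.tar.gz.
make_trace_archive() {
    dir=$1        # directory holding TRACEINFO.xml, README and traces/
    center=$2     # center name, e.g. UMD
    stamp=$(date +%Y%m%d)
    archive="$center-$stamp.tar.gz"

    # Refuse to package an incomplete submission.
    for f in TRACEINFO.xml README; do
        [ -f "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
    done
    [ -d "$dir/traces" ] || { echo "missing $dir/traces/" >&2; return 1; }

    # Checksum every file except the MD5 file itself, paths relative to $dir.
    ( cd "$dir" && find . -type f ! -name MD5 -exec md5sum {} + > MD5 ) || return 1

    # Create the gzip-compressed tar archive in the current directory
    # and print its name.
    tar czf "$archive" -C "$dir" . || return 1
    echo "$archive"
}
```

Run it from outside the submission directory, for example make_trace_archive submission UMD, so that the archive is not written into the directory being packaged.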

Scripts:

 /nfshomes/dpuiu/Archives/JCVI/bin/phred2xmlTrace.pl

SRA submission

 server: ftp-trace.ncbi.nlm.nih.gov
 login:         cbcb_trc
 password:      t@@GeaYF
 
 Center_name (acronym): CBCB 
 Full name: Center for Bioinformatics and Computational Biology,
 University of Maryland
 Short reads: uploaded to short_read/ 
 Sanger reads: uploaded to uploads/
 Validation table: http://www.ncbi.nlm.nih.gov/Traces/field_matrix_current.xls
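
The routing of reads to upload directories listed above can be sketched as a small shell helper. This is only a sketch: the function names sra_upload_dir and sra_upload and the SRA_PASSWORD environment variable are hypothetical, and curl is an assumed FTP client (the page does not say which client was used).

```shell
#!/bin/sh
# Hypothetical helper: pick the FTP directory for a read type,
# following the mapping listed above.
sra_upload_dir() {
    case "$1" in
        short)  echo "short_read/" ;;
        sanger) echo "uploads/" ;;
        *)      echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: upload one file with curl (assumed client).
# The password is read from the SRA_PASSWORD environment variable
# (an assumption) instead of being written into the script.
sra_upload() {
    kind=$1       # "short" or "sanger"
    file=$2       # local file to upload
    dest=$(sra_upload_dir "$kind") || return 1
    # With -T and a URL ending in "/", curl keeps the local file name.
    curl --user "cbcb_trc:$SRA_PASSWORD" -T "$file" \
        "ftp://ftp-trace.ncbi.nlm.nih.gov/$dest"
}
```

For example, sra_upload short run1.tar.gz would send the file to short_read/ on the server.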

AA submission

AA

 Compressed archive containing 2 files: ASSEMBLY.xml, MD5
 Accepted only by uploading to NCBI FTP server.
   server: ftp-private.ncbi.nlm.nih.gov
   login: umd_trc
   passwd: 
   center: UMD   
   description: University of Maryland
 ASSEMBLY XML Schema png 
 ASSEMBLY XML Schema xsd 

Use the XContig package scripts.

Files:

.contig      : contigs & underlying reads 
.seq         : read sequences
.qual        : read qualities
.ti2seq_name : map from trace identifier (TI) to read name (the idmap file in the coninfo)

Steps:

1. makeConinfo ASSEMBLY.coninfo
 $ more ASSEMBLY.coninfo
 <coninfo>
 <meta name='center'>UMD</meta>
 <meta name='db'>Xoo</meta>
 <meta name='desc'>Xanthomonas oryzae pv. oryzae strain PXO99A</meta>
 <meta name='object'>ASSEMBLY</meta>
 <meta name='species_code'>Xanthomonas oryzae pv. oryzae strain PXO99A</meta>
 <meta name='structure'>Chromosome</meta>
 <meta name='subtype'>NEW</meta>
 <meta name='taxid'>360094</meta>
 <contig id="1106158952778_stitched" conformation="CIRCULAR" subtype="NEW"/>
 <file src="Xoo.contig"/>
 <seq src="Xoo.seq"/>
 <qual src="Xoo.qual"/>
 <idmap  src="Xoo.ti2seq_name" direction="FORWARD"/>
 </coninfo>
2. buildAssemblyArchive ASSEMBLY.coninfo --prompt --subname umd-20070816-125223
 Problems in the generated ASSEMBLY.xml (fix by hand, then recompute the md5sum):
    * submitter_reference="tigr...." : replace tigr with umd
    * conformation: always LINEAR    : replace LINEAR with CIRCULAR
    * taxid: not recognized          : replace <taxid>id</taxid> with <organism descriptor="TAXID">id</organism>
 $ md5sum umd-20070816-125223/ASSEMBLY.xml
 $ edit umd-20070816-125223/MANIFEST         # update ASSEMBLY.xml md5sum 
 
 $ ls -1 umd-20070816-125223*
 umd-20070816-125223.tar.gz
 umd-20070816-125223/
  1106158952778_stitched_20070817-141849.con       # Contig consensus
  1106158952778_stitched_20070817-141849.congap    # Contig gaps
  ASSEMBLY.xml                                     # Assembly XML
  MANIFEST                                         # MD5 sums
 $ tar czvf umd-20070816-125223.tar.gz umd-20070816-125223/
3. validate:
 oXygen: software used by NCBI; license required
 xmllint: open source
 $ xmllint --schema ASSEMBLY.xsd umd-20070816-125223/ASSEMBLY.xml > /dev/null
 umd-20070816-125223/ASSEMBLY.xml validates
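
The three manual fixes listed under step 2 could be scripted with sed. This is a minimal sketch, assuming the generated ASSEMBLY.xml uses the literal forms quoted in that list (submitter_reference="tigr...", conformation="LINEAR", <taxid>id</taxid>); the function name fix_assembly_xml is hypothetical and the real file may differ. Replacing every LINEAR with CIRCULAR is only appropriate when all contigs are circular, as in this single-chromosome example.

```shell
#!/bin/sh
# Hypothetical helper: apply the three fixes from step 2 to ASSEMBLY.xml.
# Assumes the attribute and element forms quoted in the problem list.
fix_assembly_xml() {
    xml=$1
    sed -E \
        -e 's/submitter_reference="tigr/submitter_reference="umd/g' \
        -e 's/conformation="LINEAR"/conformation="CIRCULAR"/g' \
        -e 's|<taxid>([0-9]+)</taxid>|<organism descriptor="TAXID">\1</organism>|g' \
        "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
}
```

After running fix_assembly_xml umd-20070816-125223/ASSEMBLY.xml, the md5sum still has to be recomputed and copied into MANIFEST, and the file revalidated with xmllint, as in steps 2 and 3.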