Francisella tularensis tularensis FSC033

From Cbcb
Latest revision as of 15:22, 26 November 2007

Center: Broad

 Status: Assembly (15 contigs)

Data

Reference:

 NZ_AAYE00000000
 Name              Length %GC
 NZ_AAYE01000001.1 101124 33.65
 NZ_AAYE01000002.1 46675  32.87
 NZ_AAYE01000003.1 1600   34.25
 NZ_AAYE01000004.1 295522 31.87
 NZ_AAYE01000005.1 650364 31.73
 NZ_AAYE01000006.1 2400   37.29
 NZ_AAYE01000007.1 132212 32.96
 NZ_AAYE01000008.1 23680  31.04
 NZ_AAYE01000009.1 201    45.27      high GC%: 16S-23S rRNA (megablast)
 NZ_AAYE01000010.1 571    46.06      high GC%
 NZ_AAYE01000011.1 61231  32.30
 NZ_AAYE01000012.1 249955 31.91
 NZ_AAYE01000013.1 137017 32.24
 NZ_AAYE01000014.1 91009  32.87
 NZ_AAYE01000015.1 50644  33.42
 Total             1844205          (Schu S4 complete genome: 1,892,819 bp => ~48,614 bp in gaps)
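The gap estimate in the Total row is simply the Schu S4 genome length minus the summed contig lengths. A minimal sketch that recomputes it from the table, assuming FSC033 and Schu S4 have roughly the same genome size; the 40% cutoff used to flag "high GC%" contigs is a hypothetical choice, since the table does not state one.

```python
# Contig table of NZ_AAYE00000000 as listed above: (accession, length in bp, %GC).
REFERENCE_CONTIGS = [
    ("NZ_AAYE01000001.1", 101124, 33.65),
    ("NZ_AAYE01000002.1", 46675, 32.87),
    ("NZ_AAYE01000003.1", 1600, 34.25),
    ("NZ_AAYE01000004.1", 295522, 31.87),
    ("NZ_AAYE01000005.1", 650364, 31.73),
    ("NZ_AAYE01000006.1", 2400, 37.29),
    ("NZ_AAYE01000007.1", 132212, 32.96),
    ("NZ_AAYE01000008.1", 23680, 31.04),
    ("NZ_AAYE01000009.1", 201, 45.27),
    ("NZ_AAYE01000010.1", 571, 46.06),
    ("NZ_AAYE01000011.1", 61231, 32.30),
    ("NZ_AAYE01000012.1", 249955, 31.91),
    ("NZ_AAYE01000013.1", 137017, 32.24),
    ("NZ_AAYE01000014.1", 91009, 32.87),
    ("NZ_AAYE01000015.1", 50644, 33.42),
]

# Length of the complete Schu S4 genome quoted in the Total row.
SCHU_S4_LENGTH = 1_892_819

# Hypothetical cutoff for "high GC%"; not taken from the table.
HIGH_GC_CUTOFF = 40.0


def summarize_reference(contigs, complete_length, gc_cutoff=HIGH_GC_CUTOFF):
    """Return (total contig length, estimated bases in gaps, high-GC accessions)."""
    total = sum(length for _, length, _ in contigs)
    # Assumes FSC033 and Schu S4 have about the same genome size.
    gap_estimate = complete_length - total
    high_gc = [name for name, _, gc in contigs if gc > gc_cutoff]
    return total, gap_estimate, high_gc


if __name__ == "__main__":
    total, gaps, high_gc = summarize_reference(REFERENCE_CONTIGS, SCHU_S4_LENGTH)
    print(f"total={total:,} bp, gaps~{gaps:,} bp, high GC: {', '.join(high_gc)}")
```

This prints total=1,844,205 bp, gaps~48,614 bp, and flags the two short contigs (NZ_AAYE01000009.1, NZ_AAYE01000010.1) that the table marks as high GC%.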

Traces: from the NCBI Trace Archive (TA)

Libraries:

 CENTER  PROJECT  STRAIN  LIB      TYPE  SIZE   STDEV  COUNT    Location
 WIBR    G907     033     G907A1   WGS   4,500  450    768      TA
 WIBR    G907     033     G907A2   WGS   4,500  450    768      TA
 WIBR    G907     033     G907A3   WGS   4,500  450    20,736   TA
 WIBR    G907     033     G907A4   WGS   4,500  450    768      TA
 WIBR    G919     033     454-410  454   .      .      278,235  TA
 StrainSubtotal                                        301,275
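The StrainSubtotal is the sum of the COUNT column. A small sketch that tallies reads per library type and checks that subtotal; the tuple layout is an assumption made for illustration, and the 454 library's "." insert size and stdev are stored as None.

```python
from collections import Counter

# Library table above: (library, type, nominal insert size, stdev, read count).
# None stands for the "." entries of the 454 library.
LIBRARIES = [
    ("G907A1", "WGS", 4500, 450, 768),
    ("G907A2", "WGS", 4500, 450, 768),
    ("G907A3", "WGS", 4500, 450, 20736),
    ("G907A4", "WGS", 4500, 450, 768),
    ("454-410", "454", None, None, 278235),
]


def reads_by_type(libraries):
    """Sum read counts per library type (WGS vs 454)."""
    counts = Counter()
    for _lib, lib_type, _size, _stdev, count in libraries:
        counts[lib_type] += count
    return counts


if __name__ == "__main__":
    counts = reads_by_type(LIBRARIES)
    subtotal = sum(counts.values())
    for lib_type, n in counts.items():
        print(f"{lib_type}: {n:,} reads ({100 * n / subtotal:.1f}%)")
    print(f"StrainSubtotal: {subtotal:,}")
```

The WGS libraries contribute 23,040 reads and the 454 run 278,235, which together give the 301,275 reported above.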
 
 NC_006570-NZ_AAYE00000000.png: mummerplot of Francisella tularensis subsp. tularensis Schu S4 (complete genome) vs. Francisella tularensis subsp. tularensis FSC033; a repeat appears to be collapsed in NZ_AAYE00000000

Assembly

Locations:

 /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/

2007_0807_AMOScmp

 Uses default trimming
 AMOScmp
 => 13 scaff, 32 contigs, 1,830,923 bp

2007_0807_WGA

 Only WGS reads
 runCA-OBT.pl => new read clr, new library insert estimates
 clr got shorter after running OBT (usually it is the opposite)
 stage           #traces  min     median  max     sum             mean    stdev   n50
 CLIPPING        22713    1       868     1018    19613029        863.52  68.27   871
 OBT             22207    71      785     907     17128190        771.3   75.98   789
 lib: G907A3 mean=4274.536 stdev=110.196 (probably an underestimate due to low coverage)
 => 12 scaff, 23 contigs, 1,882,120 bp
 MaxSurrogateLength=28659
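To quantify how much OBT shortened the clear ranges, the CLIPPING and OBT rows can be compared directly. A minimal sketch using only the #traces and sum columns (the mean column is recomputed as sum / #traces); it also expresses the G907A3 re-estimate as a shift from the nominal 4,500 +/- 450 bp. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClrSummary:
    """One row of the clear-range table: trace count and total clear-range bases."""
    traces: int
    total_bases: int

    @property
    def mean_length(self) -> float:
        return self.total_bases / self.traces


CLIPPING = ClrSummary(traces=22713, total_bases=19_613_029)
OBT = ClrSummary(traces=22207, total_bases=17_128_190)


def clr_change(before: ClrSummary, after: ClrSummary):
    """Traces dropped, bases removed, and percent of clear-range bases removed."""
    dropped = before.traces - after.traces
    removed = before.total_bases - after.total_bases
    return dropped, removed, 100.0 * removed / before.total_bases


def insert_shift(nominal: float, estimated: float, nominal_stdev: float):
    """Shift of an estimated library mean from its nominal size, in bp and in nominal stdevs."""
    shift = estimated - nominal
    return shift, shift / nominal_stdev


if __name__ == "__main__":
    dropped, removed, pct = clr_change(CLIPPING, OBT)
    print(f"mean clr: {CLIPPING.mean_length:.2f} -> {OBT.mean_length:.2f} bp")
    print(f"{dropped} traces dropped, {removed:,} clr bases removed ({pct:.2f}%)")
    # G907A3: nominal 4,500 +/- 450 bp vs. the OBT estimate of 4274.536 bp
    shift, in_sd = insert_shift(4500, 4274.536, 450)
    print(f"G907A3 shift: {shift:.1f} bp ({in_sd:.2f} nominal stdev)")
```

By this arithmetic OBT dropped 506 traces and removed about 2.48 Mbp (~12.7%) of clear-range sequence, and the G907A3 estimate sits about half a nominal stdev below 4,500 bp.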

2007_0807_AMOScmp-OBT

 use read clr from 2007_0807_WGA
 => 13 scaff, 23 contigs, 1,834,339 bp

2007_0809_AMOScmp-OBT-454

 use read clr from 2007_0807_WGA
 include the 454 reads
 => 15 scaff, 18 contigs, 1,844,162 bp
 scaff_5   largest scaff: one region with no 454 coverage
 scaff_4   2nd largest scaff: several regions with no 454 coverage
 scaff_12  3rd largest scaff: the first ~32 kb have coverage twice as deep as the rest;
           this region aligns to the 28 kb WGA surrogate
   show-coords 12-wga.delta -L 1000
   /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/2007_0809_AMOScmp-OBT-454/nucmer/12.fasta /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/2007_0809_AMOScmp-OBT-454/nucmer/wga.scaffold.fasta
   [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
  ===============================================================================================================================
    179    28709  |   169258   140728  |    28531    28531  |   100.00  |   249990   169258  |    11.41    16.86  | 12 7180000000303
    179   249803  |   393212   143584  |   249625   249629  |    99.99  |   249990   393212  |    99.85    63.48  | 12 7180000000310   [CONTAINED]
 ...
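The length and coverage columns of this show-coords output can be recomputed from the coordinates, which is a quick way to confirm what each row says. A sketch assuming the default '|'-separated layout shown above, with [LEN] = |E - S| + 1 on 1-based inclusive coordinates and [COV] = [LEN] / sequence length x 100; in show-coords output a query range printed with S2 > E2 marks a reverse-strand match. The Alignment class and parse_row helper are hypothetical names.

```python
from dataclasses import dataclass

# The two alignment rows printed by show-coords above.
SHOW_COORDS_ROWS = """\
    179    28709  |   169258   140728  |    28531    28531  |   100.00  |   249990   169258  |    11.41    16.86  | 12 7180000000303
    179   249803  |   393212   143584  |   249625   249629  |    99.99  |   249990   393212  |    99.85    63.48  | 12 7180000000310   [CONTAINED]
"""


@dataclass
class Alignment:
    ref_tag: str
    qry_tag: str
    len_ref: int        # aligned bases on the reference ([LEN 1])
    len_qry: int        # aligned bases on the query ([LEN 2])
    cov_ref: float      # percent of the reference covered ([COV R])
    cov_qry: float      # percent of the query covered ([COV Q])
    reversed_qry: bool  # query coordinates run backwards (S2 > E2)
    flags: list         # trailing annotations such as [CONTAINED]


def parse_row(line: str) -> Alignment:
    """Parse one '|'-separated show-coords row and recompute its length/coverage columns."""
    cols = line.split("|")
    s1, e1 = map(int, cols[0].split())
    s2, e2 = map(int, cols[1].split())
    seq_len_ref, seq_len_qry = map(int, cols[4].split())
    ref_tag, qry_tag, *flags = cols[6].split()
    len_ref = abs(e1 - s1) + 1  # 1-based, inclusive coordinates
    len_qry = abs(e2 - s2) + 1
    return Alignment(
        ref_tag=ref_tag,
        qry_tag=qry_tag,
        len_ref=len_ref,
        len_qry=len_qry,
        cov_ref=100.0 * len_ref / seq_len_ref,
        cov_qry=100.0 * len_qry / seq_len_qry,
        reversed_qry=s2 > e2,
        flags=flags,
    )


if __name__ == "__main__":
    for row in SHOW_COORDS_ROWS.splitlines():
        a = parse_row(row)
        print(a.qry_tag, a.len_ref, f"{a.cov_ref:.2f}", f"{a.cov_qry:.2f}", a.reversed_qry, a.flags)
```

The recomputed values match the printed columns (e.g. 28,531 bp covering 11.41% of scaff_12), and both matches are on the reverse strand of the WGA scaffolds.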

2007_0810_AMOScmp-OBT-Schu4

 use read clr from 2007_0807_WGA
 Schu S4 used as reference (there appear to be 2 inversions)
 => 26 contigs, 1,873,282 bp

2007_0810_AMOScmp-OBT-454-Schu4 -> best

 use read clr from 2007_0807_WGA
 include the 454 reads
 Schu S4 used as reference (there appear to be 2 inversions)
 => 10 contigs, 1,890,475 bp
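The runs above can be compared side by side against the 1,892,819 bp Schu S4 genome. A sketch, assuming FSC033 and Schu S4 have similar genome sizes and that every contig is placed in some scaffold (so gaps inside scaffolds = contigs - scaffolds); scaffold counts are None where the page does not report them. The ranking key at the end (fewest contigs, then smallest shortfall) is a hypothetical criterion, not necessarily the reason the last run is marked "best".

```python
# Assembly runs from this page: (run, scaffolds or None if not reported, contigs, total bp).
RUNS = [
    ("2007_0807_AMOScmp", 13, 32, 1_830_923),
    ("2007_0807_WGA", 12, 23, 1_882_120),
    ("2007_0807_AMOScmp-OBT", 13, 23, 1_834_339),
    ("2007_0809_AMOScmp-OBT-454", 15, 18, 1_844_162),
    ("2007_0810_AMOScmp-OBT-Schu4", None, 26, 1_873_282),
    ("2007_0810_AMOScmp-OBT-454-Schu4", None, 10, 1_890_475),
]

SCHU_S4_LENGTH = 1_892_819  # complete Schu S4 genome


def compare_runs(runs, reference_length):
    """Yield (run, contigs, bp short of Schu S4, gaps inside scaffolds or None)."""
    for name, scaffolds, contigs, total in runs:
        # A scaffold of k contigs has k - 1 internal gaps, so the total is contigs - scaffolds.
        internal_gaps = None if scaffolds is None else contigs - scaffolds
        yield name, contigs, reference_length - total, internal_gaps


if __name__ == "__main__":
    rows = list(compare_runs(RUNS, SCHU_S4_LENGTH))
    for name, contigs, short_by, gaps in rows:
        print(f"{name:34} {contigs:3} contigs  {short_by:7,} bp short  internal gaps: {gaps}")
    # Hypothetical ranking: fewest contigs first, then smallest shortfall.
    best = min(rows, key=lambda r: (r[1], r[2]))
    print("fewest contigs:", best[0])
```

Under these assumptions 2007_0810_AMOScmp-OBT-454-Schu4 is only 2,344 bp shorter than Schu S4, versus 61,896 bp for the first AMOScmp run.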