Francisella tularensis tularensis FSC033: Difference between revisions

 
Latest revision as of 15:22, 26 November 2007

Center: Broad

 Status: Assembly (15 contigs)

Data

Reference:

 NZ_AAYE00000000
 Name              Length %GC
 NZ_AAYE01000001.1 101124 33.65
 NZ_AAYE01000002.1 46675  32.87
 NZ_AAYE01000003.1 1600   34.25
 NZ_AAYE01000004.1 295522 31.87
 NZ_AAYE01000005.1 650364 31.73
 NZ_AAYE01000006.1 2400   37.29
 NZ_AAYE01000007.1 132212 32.96
 NZ_AAYE01000008.1 23680  31.04
 NZ_AAYE01000009.1 201    45.27      high GC%: 16S-23S rRNA(megablast)
 NZ_AAYE01000010.1 571    46.06      high GC%
 NZ_AAYE01000011.1 61231  32.30
 NZ_AAYE01000012.1 249955 31.91
 NZ_AAYE01000013.1 137017 32.24
 NZ_AAYE01000014.1 91009  32.87
 NZ_AAYE01000015.1 50644  33.42
 Total             1844205          Schu S4 (complete genome) is 1,892,819 bp => ~48,614 bp in gaps
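
The gap estimate in the Total row is just the Schu S4 genome length minus the summed contig lengths. A minimal sketch of that arithmetic follows; the variable and function names are made up for illustration, and the numbers are copied from the table above.

```python
# Contig lengths copied from the NZ_AAYE00000000 table (NZ_AAYE01000001.1 .. 15.1).
contig_lengths = [
    101124, 46675, 1600, 295522, 650364, 2400, 132212, 23680,
    201, 571, 61231, 249955, 137017, 91009, 50644,
]

# Length of the complete Schu S4 genome, as quoted in the Total row.
SCHU_S4_LENGTH = 1_892_819


def gap_estimate(lengths, reference_length):
    """Return (total contig bp, reference bp not accounted for by the contigs)."""
    total = sum(lengths)
    return total, reference_length - total


total_bp, gap_bp = gap_estimate(contig_lengths, SCHU_S4_LENGTH)
print(f"{len(contig_lengths)} contigs, {total_bp:,} bp, ~{gap_bp:,} bp in gaps")
```

This treats Schu S4 and FSC033 as the same length, which is only an approximation, so the result is a rough figure rather than a measured gap size.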

Traces: from NCBI TA

Libraries:

 CENTER  PROJECT  STRAIN  LIB      TYPE  SIZE   STDEV  COUNT    Location
 WIBR    G907     033     G907A1   WGS   4,500  450    768      TA
 WIBR    G907     033     G907A2   WGS   4,500  450    768      TA
 WIBR    G907     033     G907A3   WGS   4,500  450    20,736   TA
 WIBR    G907     033     G907A4   WGS   4,500  450    768      TA
 WIBR    G919     033     454-410  454   .      .      278,235  TA
 StrainSubtotal                                        301,275
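
As a quick consistency check on the StrainSubtotal row, the per-library read counts can be summed directly. This is only a sketch; the dictionary and variable names are illustrative, and the counts are copied from the table above.

```python
# Read counts per library, copied from the Libraries table.
library_counts = {
    "G907A1": 768,
    "G907A2": 768,
    "G907A3": 20736,
    "G907A4": 768,
    "454-410": 278235,
}

# The four G907* libraries are the WGS (Sanger) libraries; 454-410 is the 454 run.
wgs_reads = sum(n for lib, n in library_counts.items() if lib.startswith("G907"))
all_reads = sum(library_counts.values())

print(f"WGS reads: {wgs_reads:,}")
print(f"All reads: {all_reads:,}")
```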
 
 NC_006570-NZ_AAYE00000000.png Mummerplot  Francisella tularensis subsp. tularensis Schu S4 (complete genome) vs Francisella tularensis subsp. tularensis FSC033; looks like the repeat is collapsed in NZ_AAYE00000000

Assembly

Locations:

 /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/

2007_0807_AMOScmp

 Uses default trimming
 AMOScmp
 =>  13 scaff, 32 contigs, 1,830,923 bp

2007_0807_WGA

 Only WGS reads
 runCA-OBT.pl => new read clr, new library insert estimates 
 clr got shorter after running OBT (usually the opposite happens)
 .               #traces  min     median  max     sum             mean    stdev   n50
 CLIPPING        22713    1       868     1018    19613029        863.52  68.27   871
 OBT             22207    71      785     907     17128190        771.3   75.98   789
 lib: G907A3 mean=4274.536 stdev=110.196 (probably an underestimate due to low coverage)
 => 12 scaff, 23 contigs, 1,882,120 bp 
 MaxSurrogateLength=28659
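
The columns of the CLIPPING/OBT table (min, median, max, sum, mean, stdev, n50) can be recomputed from a list of clear-range lengths. The sketch below is not the script that produced the table: the function names are hypothetical, N50 is taken as the length at which the longest reads cumulatively reach half of all bases, and sample standard deviation is an assumption, since the page does not say which variant was used.

```python
import statistics


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def summarize(lengths):
    """Summary statistics matching the columns of the clear-range table."""
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "sum": sum(lengths),
        "mean": statistics.mean(lengths),
        "stdev": statistics.stdev(lengths),  # sample stdev (assumption)
        "n50": n50(lengths),
    }


# Toy clear-range lengths, for illustration only (not real FSC033 data).
toy = [2, 3, 4, 5, 6]
print(summarize(toy))
```

The mean column in the table can be checked from the sum and the trace count alone, for example 19613029 / 22713 for the CLIPPING row.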

2007_0807_AMOScmp-OBT

 use  read clr from 2007_0807_WGA 
 => 13 scaff, 23 contigs, 1,834,339 bp

2007_0809_AMOScmp-OBT-454

 use  read clr from 2007_0807_WGA 
 include the 454 reads
 => 15 scaff, 18 contigs, 1,844,162 bp
 scaff_5   Largest scaff : one region with no 454 coverage
 scaff_4   2nd largest scaff : several regions with no 454 coverage
 scaff_12  3rd largest scaff : the first ~32 kb of this scaff has coverage twice as deep as the rest;
           this region aligns to the 28 kb WGA surrogate
   show-coords 12-wga.delta -L 1000
   /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/2007_0809_AMOScmp-OBT-454/nucmer/12.fasta /fs/szasmg/Bacteria/F_tularensis_tularensis_FSC033/2007_0809_AMOScmp-OBT-454/nucmer/wga.scaffold.fasta
   [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
  ===============================================================================================================================
    179    28709  |   169258   140728  |    28531    28531  |   100.00  |   249990   169258  |    11.41    16.86  | 12 7180000000303
    179   249803  |   393212   143584  |   249625   249629  |    99.99  |   249990   393212  |    99.85    63.48  | 12 7180000000310   [CONTAINED]
 ...
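
The [LEN] and [COV] columns in the show-coords output follow from the coordinates: alignment length is end minus start plus one (with the ends swapped for a reverse-strand hit), and coverage is alignment length as a percentage of sequence length. A small sketch that re-derives the values for the two rows shown; the helper names are illustrative and the numbers are copied from the output above.

```python
def aligned_length(start, end):
    """Alignment span in bp; works for forward and reverse-strand coordinates."""
    return abs(end - start) + 1


def coverage_pct(aligned_len, seq_len):
    """Percentage of a sequence covered by one alignment, to two decimals."""
    return round(100.0 * aligned_len / seq_len, 2)


# (S1, E1, S2, E2, LEN R, LEN Q, query tag), copied from the two rows above.
rows = [
    (179, 28709, 169258, 140728, 249990, 169258, "7180000000303"),
    (179, 249803, 393212, 143584, 249990, 393212, "7180000000310"),
]

for s1, e1, s2, e2, len_r, len_q, tag in rows:
    len1 = aligned_length(s1, e1)  # span on scaff 12 (reference)
    len2 = aligned_length(s2, e2)  # span on the WGA scaffold (query)
    print(tag, len1, len2, coverage_pct(len1, len_r), coverage_pct(len2, len_q))
```

The first row is the 28,531 bp hit at the start of scaff 12, i.e. the region described above as aligning to the WGA surrogate.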

2007_0810_AMOScmp-OBT-Schu4

 use  read clr from 2007_0807_WGA 
 Schu S4 used as reference (looks like there are 2 inversions)
 26 contigs, 1,873,282 bp

2007_0810_AMOScmp-OBT-454-Schu4 -> best

 use  read clr from 2007_0807_WGA 
 include the 454 reads
 Schu S4 used as reference (looks like there are 2 inversions)
 10 contigs, 1,890,475 bp
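
To compare the runs side by side, the contig counts and total lengths reported above can be tabulated against the Schu S4 genome length. This is only a sketch: the page does not state the criterion behind "-> best", so the two measures used here (fewest contigs, total length relative to Schu S4) are assumptions, and the names in the code are illustrative.

```python
SCHU_S4_LENGTH = 1_892_819  # complete Schu S4 genome, bp (from the Data section)

# (assembly name, contigs, total bp), copied from the entries above.
assemblies = [
    ("2007_0807_AMOScmp", 32, 1_830_923),
    ("2007_0807_WGA", 23, 1_882_120),
    ("2007_0807_AMOScmp-OBT", 23, 1_834_339),
    ("2007_0809_AMOScmp-OBT-454", 18, 1_844_162),
    ("2007_0810_AMOScmp-OBT-Schu4", 26, 1_873_282),
    ("2007_0810_AMOScmp-OBT-454-Schu4", 10, 1_890_475),
]


def pct_of_reference(total_bp, reference_length=SCHU_S4_LENGTH):
    """Assembly length as a percentage of the reference length, to two decimals."""
    return round(100.0 * total_bp / reference_length, 2)


for name, contigs, total_bp in assemblies:
    print(f"{name:34s} {contigs:3d} contigs  {total_bp:>9,} bp  "
          f"{pct_of_reference(total_bp):6.2f}% of Schu S4")

# Assembly with the fewest contigs (one possible, assumed, notion of "best").
fewest = min(assemblies, key=lambda a: a[1])
print("fewest contigs:", fewest[0])
```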