Cbcb:Pop-Lab:OTUs How do I create OTUs from 16S rRNA sequence data?
James Robert White, whitej@umd.edu
To begin:
You need a FASTA file of 16S rRNA sequences.
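As a starting point, here is a minimal sketch of loading such a file into memory. The function name read_fasta is hypothetical, and the sketch assumes that the first whitespace-delimited word of each header line is a unique sequence name; a dedicated library would be a more robust choice for real data.

```python
def read_fasta(handle):
    """Read FASTA records from an iterable of lines into a {name: sequence} dict.

    Assumption: the first word after '>' is a unique sequence identifier.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("header line without a sequence name")
            name = words[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    # Join multi-line sequences into single strings
    return {n: "".join(parts) for n, parts in records.items()}
```

Any iterable of lines works as input, so an open file object can be passed directly, e.g. read_fasta(open("seqs.fasta")) for a hypothetical file name.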
1. Make sure all sequences are in the same, correct orientation. You may need to reverse-complement some of them.
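Reverse-complementing a sequence can be sketched as follows. The function name reverse_complement is hypothetical; the sketch assumes DNA output (U is read as T) and handles the standard IUPAC ambiguity codes. It does not decide which sequences are in the wrong orientation; that still has to be determined separately.

```python
# Translation table for IUPAC nucleotide codes (U is treated like T on input)
_SRC = "ACGTURYKMSWBDHVN"
_DST = "TGCAAYRMKSWVHDBN"
_COMPLEMENT = str.maketrans(_SRC + _SRC.lower(), _DST + _DST.lower())


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence as DNA."""
    # Complement every base, then reverse the string
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside the table, such as gap symbols, are passed through unchanged.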
2. Perform a multiple sequence alignment. This can be done using several programs such as ClustalW, MUSCLE, NAST, or MAFFT.
3. Save the alignment in a format that your downstream tools can read. Each MSA program has options for outputting an alignment in different formats (e.g. FASTA, ClustalW, PHYLIP).
4. Trim the MSA so that every sequence spans the entire trimmed alignment. You may need to remove short or poor-quality sequences altogether.
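One way to do the trimming is to keep only the columns that lie between the latest start and the earliest end among all sequences. The sketch below assumes the alignment is held as a {name: aligned sequence} dict and that '-' and '.' mark gaps; trim_alignment is a hypothetical name. Sequences that are much shorter than the rest should be removed first, otherwise they shrink the window for everyone.

```python
def trim_alignment(aln, gap_chars="-."):
    """Trim aligned sequences to the column range spanned by every sequence."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    start, end = 0, lengths.pop()
    for name, seq in aln.items():
        if not seq.strip(gap_chars):
            raise ValueError(f"{name} contains only gaps")
        first = len(seq) - len(seq.lstrip(gap_chars))  # index of first residue
        last = len(seq.rstrip(gap_chars))              # one past the last residue
        start, end = max(start, first), min(end, last)
    if start >= end:
        raise ValueError("no column is spanned by every sequence")
    return {name: seq[start:end] for name, seq in aln.items()}
```

Internal gaps are kept; only the leading and trailing overhangs are removed.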
5. Create a distance matrix from your MSA. The distances must include an evolutionary distance correction such as Jukes-Cantor or Olsen. This can be done in several ways:
- DNADIST from the PHYLIP package. (Jukes-Cantor, Kimura 2-parameter, or Felsenstein 84 corrections)
- ARB has functions for loading an MSA and creating distance matrices. (Jukes-Cantor, Felsenstein 84, and Olsen corrections)
The distance matrix will be in one of two formats: square or lower triangular. Be sure to know which one you have.
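To make the correction and the lower-triangular layout concrete, here is a minimal sketch, not a replacement for DNADIST or ARB. The Jukes-Cantor distance is d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. The function names are hypothetical, the sketch assumes that columns with a gap or ambiguity code in either sequence are skipped, and the output is only PHYLIP-like (strict PHYLIP pads names to 10 characters).

```python
import math


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    a_seq = seq_a.upper().replace("U", "T")
    b_seq = seq_b.upper().replace("U", "T")
    compared = mismatches = 0
    for a, b in zip(a_seq, b_seq):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    if mismatches == 0:
        return 0.0
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def format_lower_triangle(aln):
    """Format all pairwise distances as a lower-triangular matrix."""
    names = list(aln)
    lines = [str(len(names))]
    for i, name in enumerate(names):
        # Row i holds the distances to the i sequences listed before it
        row = [f"{jukes_cantor(aln[name], aln[names[j]]):.4f}" for j in range(i)]
        lines.append("\t".join([name] + row))
    return "\n".join(lines)
```

A square matrix would instead list all n distances on every row, including the zero on the diagonal.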
6. The distance matrix serves as input to an OTU program such as DOTUR. DOTUR clusters sequences into OTUs according to their pairwise distances. The default clustering algorithm is complete linkage (furthest neighbor).
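The idea behind complete linkage can be sketched in a few lines. This is a naive illustration, not DOTUR's implementation: the function name is hypothetical, distances are assumed to be supplied as a dict keyed by frozenset pairs of sequence names, and clusters are merged while the largest distance between their members stays at or below the cutoff.

```python
def complete_linkage_otus(names, dist, cutoff):
    """Cluster sequences into OTUs by complete linkage (furthest neighbor).

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest member distance
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break  # closest pair of clusters is already too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

With furthest neighbor, every pair of sequences inside an OTU is within the cutoff of each other, which is why a sequence that is close to only one member of a cluster is not merged into it.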
7. A large number of files are created by DOTUR, but the following are most important:
- *.list - the actual list of OTUs. Each line is tab delimited and describes one distance threshold. The first element is the distance threshold, the second element is the number of OTUs, and each subsequent element is one OTU. Members of each OTU are delimited by commas.
- *ace.ltt - ACE diversity estimates for different thresholds.
- *chao.ltt - Chao1 diversity estimates for different thresholds.
- *shannon.ltt - Shannon diversity estimates for different thresholds.
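The *.list layout described above can be read with a short parser. This is a sketch based only on that description; parse_list_line is a hypothetical name, and the threshold is kept as a string on the assumption that it may not always be a plain number.

```python
def parse_list_line(line):
    """Parse one line of a *.list file into (threshold, list of OTUs).

    Each OTU is returned as a list of its member sequence names.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least a threshold and an OTU count")
    threshold, n_otus = fields[0], int(fields[1])
    # Remaining fields are OTUs; members within an OTU are comma separated
    otus = [field.split(",") for field in fields[2:] if field]
    if len(otus) != n_otus:
        raise ValueError(f"expected {n_otus} OTUs, found {len(otus)}")
    return threshold, otus
```

For example, a line containing 0.03, 2, seqA,seqB and seqC separated by tabs yields the threshold "0.03" and two OTUs, one with two members and one singleton.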