Wednesday, May 15, 2013
Microbial Eukaryotes: Underrepresented & Ignored
Dr. Holly Bik, a postdoctoral researcher at UC Davis, gave a presentation last week on the difficulties and importance of studying microbial eukaryotes. Microbial eukaryotes are typically ignored in analyses of environmental sequence data, primarily because eukaryotic ribosomal RNA (rRNA) sequences are more difficult to analyze than bacterial and archaeal rRNA. This is mostly because sequence analysis techniques for bacteria and Archaea have been in development since the 1970s. Dr. Bik, who did her Ph.D. research on nematodes, has developed a particular interest in these microbial worms and used them as an example of why it is important to study all microbial eukaryotes.
Nematodes are a “hyper-diverse” taxon. No one is quite sure how many species there are on Earth, but estimates range from 1 million to 100 million. Despite their diversity and abundance (one square meter of dirt can contain anywhere from 100,000 to 84,000,000 nematodes), only 28,000 species are genetically characterized, most of them parasitic taxa or model organisms (such as C. elegans). This is astonishing when you consider that nematodes are found literally everywhere. An article published in March demonstrates that nematodes are present not only in soil and marine environments but also in human-constructed ecosystems such as tap water (Buse et al.). Dr. Bik stated, “People should be thinking about eukaryotes because they are clearly important, not just in natural ecosystems, but also in ecosystems like the built environment.”
The study of microbes tends to ask the same three questions regardless of the environment being sampled or the taxa being studied. The first is about biodiversity: what is the composition of microbes in the area being sampled, and does the sample area have high or low biodiversity? The second is about phylogeography: scientists want to know about species patterns over space and time and to think about these patterns in an evolutionary context. The third is about the environmental impact on an ecosystem, such as the human skin or gut microbiome. Again, these are very general questions that are also asked when studying bacteria and Archaea, but a few issues make eukaryotic sequences more difficult to analyze.
Collecting, extracting, amplifying and sequencing eukaryotic rRNA is relatively easy compared to obtaining meaningful results from the sequence data. One issue is that the software and other analysis tools aren’t available (that is, a computational pipeline that incorporates composition, phylogenetic and pattern data into one picture). Another issue is that eukaryotic data is more difficult to work with than bacterial and archaeal data because the rRNA gene isn’t present in only one copy, as it is in bacteria or Archaea. Eukaryotic rRNA genes are present in tens to hundreds of copies in each genome. This makes it nearly impossible to quantify species based on rRNA. In addition, the many copies within a single genome are not identical: there is intragenomic variation among them, which raises a computational issue about how species are classified. Dr. Bik uses Operational Taxonomic Units (OTUs) to attempt to overcome this problem.
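To make the copy-number problem concrete, here is a minimal sketch of the arithmetic. It assumes, as a simplification, that the number of rRNA reads from a taxon scales with its cell count multiplied by its rRNA copy number. The taxon names, cell counts and copy numbers are hypothetical and were not part of the talk.

```python
def read_fractions(cells, copies):
    """Expected share of rRNA reads per taxon, assuming reads scale
    with (number of cells) x (rRNA copies per genome)."""
    weights = {taxon: cells[taxon] * copies[taxon] for taxon in cells}
    total = sum(weights.values())
    return {taxon: weight / total for taxon, weight in weights.items()}


# Hypothetical sample: two taxa that are equally abundant as cells...
cells = {"taxon_A": 100, "taxon_B": 100}
# ...but carry very different numbers of rRNA copies per genome.
copies = {"taxon_A": 10, "taxon_B": 90}

fractions = read_fractions(cells, copies)
for taxon, share in fractions.items():
    print(f"{taxon}: {share:.0%} of rRNA reads")
```

Although both taxa make up half of the cells, taxon_B accounts for 90% of the reads in this toy case. Without knowing each taxon's copy number, read counts alone cannot be converted back into abundances.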
OTUs organize sequence data into biologically similar groups or “units”. Biologically similar sequences cluster together on a similarity plot, and the clusters represent the species composition of a sample. Dr. Bik also mentioned that this method has limitations, such as the interpretation of outliers. Outliers on a sequence similarity plot can arise from a number of issues, and the common practice has been to throw such data points out. Dr. Bik says this isn’t always the best thing to do, since these data points are a real representation of biological similarity. So how do we incorporate these outliers? It turns out you can adjust the sequence similarity cutoff percentage. You can have OTUs with 99%, 97% or even 95% similarity (although lower similarity percentages aren’t exactly representative of a “species”). But even when you have assigned distinct OTUs, taxonomic assignments are difficult because they depend on reference databases, which are incomplete for eukaryotes. Dr. Bik proposes a phylogenetic approach to solve this issue.
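The effect of the similarity cutoff can be illustrated with a small sketch. It uses a simple greedy clustering in which each sequence joins the first OTU whose founding sequence it matches closely enough. This is an illustration only and not necessarily the method Dr. Bik uses: it assumes pre-aligned sequences of equal length, and the sequences and helper names (is_similar, greedy_otus, mutate) are made up for the example.

```python
def is_similar(a, b, cutoff_pct):
    """True if two pre-aligned, equal-length sequences are identical
    at cutoff_pct percent of positions or more."""
    if len(a) != len(b):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches * 100 >= cutoff_pct * len(a)


def greedy_otus(seqs, cutoff_pct):
    """Assign each sequence to the first OTU whose centroid (its first
    member) is similar enough; otherwise start a new OTU."""
    otus = []
    for seq in seqs:
        for otu in otus:
            if is_similar(seq, otu[0], cutoff_pct):
                otu.append(seq)
                break
        else:
            otus.append([seq])
    return otus


def mutate(seq, positions):
    """Return a copy of seq with a different base at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = "C" if chars[p] == "A" else "A"
    return "".join(chars)


# Hypothetical 100-base reads: a reference plus variants that differ
# from it at 1, 3 and 5 positions, and one unrelated sequence.
base = "ACGT" * 25
reads = [
    base,
    mutate(base, [10]),
    mutate(base, [10, 20, 30]),
    mutate(base, [10, 20, 30, 40, 50]),
    "T" * 100,
]

otu_counts = {cutoff: len(greedy_otus(reads, cutoff)) for cutoff in (99, 97, 95)}
print(otu_counts)
```

The same five reads fall into four OTUs at 99%, three at 97% and two at 95%, so the "species composition" of a sample depends directly on the cutoff chosen.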
At UC Davis, Dr. Bik and her colleagues have been taking environmental sequences and a phylogenetic reference tree and testing for the most probable placement of each sequence on the tree. PhyloSift is the pipeline used at UC Davis for this type of analysis (http://phylosift.wordpress.com/). Because PhyloSift is such a “mathematically robust pipeline”, it raises an important question: how do we visualize this data? Most of these programs require a lot of manipulation just to make sense of the output. Dr. Bik is very interested in developing such visualization tools and spoke about building a program similar to kayak.com for viewing patterns and filtering data in order to learn about the biology of a data set.
Eukaryotic microbes are severely underrepresented in the microbial world, and almost entirely ignored. Dr. Bik brought up some interesting points on why this is the case, and also on why it should not be. There is much we don’t know about eukaryotic microbes, but we do know that they play an important role in numerous environments because of their abundance and diversity. Studying these eukaryotes and interpreting the data is very difficult for a number of reasons. To analyze sequence data better, we need visualization tools that let you view patterns and filter data. Dr. Bik raised a very important issue in the microbial world, and I think it’s time that more people pay attention to and get involved in studying microbial eukaryotes.