Dr. Holly Bik, a postdoctoral researcher at UC Davis,
gave a presentation last week on the difficulties and importance of studying
microbial eukaryotes. Microbial eukaryotes are typically ignored in
environmental sequence data analyses primarily because eukaryotic ribosomal RNA
(rRNA) sequences are more difficult to analyze than bacterial and Archaeal rRNA.
This is mostly because sequence analysis techniques for bacteria and Archaea
have been under development since the 1970s. Dr. Bik, who did her Ph.D. research on
nematodes, has developed a particular interest in these microscopic worms, and
used them as an example of the importance of studying all microbial
eukaryotes.
Nematodes are a “hyper-diverse” taxon.
No one is quite sure how many species exist on
Earth, but estimates range from 1 to 100 million. Despite their
diversity and abundance (one square meter of dirt can contain anywhere from
100,000 to 84,000,000 nematodes), only 28,000 species are genetically
characterized, most being parasitic taxa or model organisms (e.g., C. elegans). This is astonishing
when you consider that nematodes are found nearly everywhere. An article
published in March demonstrates that nematodes are present not only in soil
and marine environments but also in human-constructed ecosystems such as tap
water (Buse et al.). Dr. Bik stated, “People should be thinking about
eukaryotes because they are clearly important, not just in natural ecosystems,
but also in ecosystems like the built environment.”
The study of microbes tends to ask
the same three questions regardless of the environment being sampled or the taxa being
studied. The first is about biodiversity, i.e., what is the composition of microbes
in the area being sampled, and does the sample area have high or low
biodiversity? The second is about phylogeography:
scientists want to know about species patterns over space and time and to
think about these patterns in an evolutionary context. The third is
what is the environmental impact on an ecosystem, such as the human skin or gut
microbiome? Again, these are very general questions that are also asked when
studying bacteria and Archaea, but there are a few issues that make it more
difficult to analyze eukaryotic sequences.
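To make the "high or low biodiversity" question concrete, here is a minimal sketch of one common way to put a number on it, the Shannon diversity index. The talk did not name a specific metric, so the choice of index is my assumption, and the read counts below are invented for illustration.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts per taxon in two samples (made-up numbers).
even_sample = [25, 25, 25, 25]   # four taxa, equally abundant
skewed_sample = [97, 1, 1, 1]    # four taxa, one dominates

print(round(shannon_index(even_sample), 3))
print(round(shannon_index(skewed_sample), 3))
```

Both samples contain four taxa, but the evenly distributed one scores higher, which is the sense in which a sample can have "high" or "low" biodiversity even when the list of taxa is the same.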
Collecting, extracting, amplifying
and sequencing eukaryotic rRNA is relatively easy compared to obtaining
meaningful results out of the sequence data. One issue is that the software and
other analysis tools aren’t available (i.e., a computational pipeline that
incorporates composition, phylogenetic, and pattern data into one picture).
Another issue is that eukaryotic data are more difficult to work with than
bacterial and archaeal data because the rRNA genes aren’t present in only one or a few copies, as in
bacteria or Archaea. Eukaryotic rRNA genes are present in tens to hundreds of copies in
each genome. This makes it nearly impossible to quantify species based on rRNA.
Also, the many copies within a single genome are not identical: there is
intragenomic variation among them, which creates a computational
problem for how species are classified. Dr. Bik uses Operational Taxonomic
Units (OTUs) to attempt to overcome this problem.
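A rough sketch of why copy number makes quantification so hard: if the number of rRNA reads scales with cells times gene copies per genome, two equally abundant species can look wildly different in the data. The species names, cell counts, and copy numbers here are hypothetical, and the proportionality itself is a simplifying assumption.

```python
def read_proportions(cells, copies_per_genome):
    """Expected share of rRNA reads per species, assuming reads are
    proportional to (number of cells) x (rRNA gene copies per genome)."""
    gene_copies = {sp: cells[sp] * copies_per_genome[sp] for sp in cells}
    total = sum(gene_copies.values())
    return {sp: n / total for sp, n in gene_copies.items()}

# Hypothetical community: both species have the same number of cells,
# but very different rRNA copy numbers (invented values).
cells = {"species_A": 100, "species_B": 100}
copies = {"species_A": 10, "species_B": 90}

props = read_proportions(cells, copies)
for species, share in props.items():
    print(species, share)
```

Under these made-up numbers, species_B accounts for nine times as many reads as species_A even though the two are equally abundant, so read counts alone say little about how many individuals of each species are present.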
OTUs organize sequence data into
biologically similar groups or “units”. Biologically similar sequences are
found clustered together on a similarity plot to represent the species
composition of a sample. Dr. Bik also mentioned that there are limitations to
this method, such as the interpretation of outliers. Outliers on a sequence
similarity plot can arise from a number of issues, and the common practice has
been to throw such data points out, which Dr. Bik says isn’t always the best
thing to do, since these data points are a real representation of biological
similarity. So how do we incorporate these outliers? It turns out you can
adjust the sequence similarity cutoff percentage. You can have OTUs defined at
99%, 97%, or even 95% similarity (although lower similarity
percentages aren’t exactly representative of a “species”). But even when you
have assigned distinct OTUs, taxonomic assignments are difficult because they
depend on reference databases, which are incomplete for eukaryotes. Dr. Bik
proposes a phylogenetic approach to solve this issue.
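To show how the similarity cutoff changes the picture, here is a toy sketch of greedy OTU clustering. This is a simplified stand-in that I am assuming for illustration, not the method Dr. Bik described; real analyses use dedicated clustering tools on properly aligned reads. The sequences are synthetic, and the helper names (identity, cluster_otus, mutate) are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_otus(seqs, cutoff):
    """Greedy clustering: each sequence joins the first OTU whose
    representative (its first member) it matches at >= cutoff identity;
    otherwise it founds a new OTU."""
    otus = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu[0]) >= cutoff:
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus

def mutate(seq, k):
    """Return seq with its first k bases substituted (toy mutation model)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] for c in seq[:k]) + seq[k:]

# Synthetic 100-base sequences differing from the first one at 0, 1, 3, 5, and 10 positions.
base = "ACGT" * 25
seqs = [mutate(base, k) for k in (0, 1, 3, 5, 10)]

for cutoff in (0.99, 0.97, 0.95):
    print(cutoff, len(cluster_otus(seqs, cutoff)))
```

With this toy data set, the same five sequences fall into four OTUs at 99%, three at 97%, and two at 95%, so a sequence that looks like an outlier at a strict cutoff gets absorbed into a neighboring OTU at a looser one.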
At UC Davis, Dr. Bik and her
colleagues have been taking environmental sequences and a phylogenetic
reference tree and testing for the most probable placement of each sequence on
the tree. PhyloSift is the pipeline used at UC Davis for this type of
analysis (http://phylosift.wordpress.com/).
PhyloSift, being such a “mathematically robust pipeline,” raises an important
question: how do we visualize these data? Most of these programs require a lot
of manipulation just to make sense of the data. Dr. Bik is very
interested in developing visualization tools and spoke about developing a
program similar to kayak.com to view patterns and filter data in a way
that reveals the biology of a data set.
Eukaryotic microbes are severely
underrepresented in studies of the microbial world, and almost entirely ignored. Dr. Bik
brings up some interesting points on why this is the case, but also why it
should not be. There is much we don’t know about eukaryotic microbes,
but we do know that they play an important role in numerous environments
because of their abundance and diversity. Studying these eukaryotes and interpreting
the data is very difficult for a number of reasons. To better
analyze sequence data, visualization tools that let you view patterns and
filter data are needed. Dr. Bik brought up a very important issue in the
microbial world, and I think it’s time that more people pay attention to and get
involved in studying microbial eukaryotes.