Science and EcoMinga: How to compare the species counts of two or more sites

Sobralia luerorum from our Cerro Candelaria Reserve is on the cover of the Dec 2012 issue of the journal Ecology. It is a rare species easily missed in diversity surveys. Many species like this will be missed when sampling a tropical forest. In this issue of Ecology, Anne Chao and I derive mathematical techniques for taking these missing species into account when comparing the diversities of two or more sites. Photo: Lou Jost /EcoMinga.

Conservation biologists and ecologists often need to find out how diverse a site is, and how a site’s diversity compares with the diversity of other sites. The answers can help conservationists decide which site to buy, and can help ecologists and evolutionary biologists understand the causes of biological diversity. The question “How diverse is a site?” seems simple enough, but it turns out not to be simple at all. For starters, there are lots of ways to quantify diversity. I’ve spent a lot of time elsewhere writing about what I think is the best way to quantify diversity, but here I want to avoid that issue and just say that for conservation purposes, a simple count of the number of species living in a site is a good enough measure of diversity. To compare diversities of two or more sites, then, all we have to do is count the number of species present at each site, right?

Sounds easy, but this is the tropics. A person could spend a lifetime sampling a site and still not find all the insects or fungi or orchids that live there. My friends Phil DeVries, Tom Walla, and Harold Greeney spent ten years trapping butterflies in the Ecuadorian Amazon, and even after all that time, they were still finding species they had not previously sampled. I lived for two years in the same forest they studied, looking hard for new birds nearly every day as part of my guiding work. Even after those two years, I was still finding birds I hadn’t seen before. There is no hope that a reasonably sized random sample of a diverse taxonomic group will contain all the species in that forest.

That means we have to compare sites based on incomplete samples that miss many species. How can we make fair comparisons between sites based on such samples? Biologists thought they knew the answer: “rarefaction”. They reasoned it would be unfair to compare a sample of 1000 individuals from one site with a sample of 100 individuals from another site. In very rich forests, the larger sample will nearly always have more species, just because the sample is larger. To make fair comparisons, biologists decided that they should only compare samples of equal size. (This same idea is widespread in almost every field, not just biology.) If the sample from one site had 1000 individuals and the sample from the other had 100 individuals, biologists would take the sample of 1000 individuals and draw random subsamples of 100 individuals from it. They would repeat the subsampling process many times, and average the species counts of the subsamples. The resulting mean species count for samples of size 100 could then be compared with the species count from the actual sample of 100 individuals from the other site.
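The subsampling procedure just described can be sketched in a few lines of Python. This is only an illustration: the abundance lists are made up, and the function names `rarefy_by_subsampling` and `rarefy_exact` are hypothetical. The second function uses the standard closed-form expectation for sampling without replacement, which should give the same answer as averaging many random subsamples.

```python
import random
from math import comb
from statistics import mean


def rarefy_by_subsampling(abundances, subsample_size, n_repeats=1000, seed=0):
    """Mean species count over repeated random subsamples drawn without replacement."""
    rng = random.Random(seed)
    # Expand the per-species counts into one species label per individual.
    individuals = [species for species, count in enumerate(abundances)
                   for _ in range(count)]
    if not 0 < subsample_size <= len(individuals):
        raise ValueError("subsample_size must be between 1 and the sample size")
    counts = [len(set(rng.sample(individuals, subsample_size)))
              for _ in range(n_repeats)]
    return mean(counts)


def rarefy_exact(abundances, subsample_size):
    """Expected species count in a subsample (hypergeometric expectation)."""
    n = sum(abundances)
    # A species with x individuals is missed with probability C(n-x, m) / C(n, m).
    return sum(1 - comb(n - x, subsample_size) / comb(n, subsample_size)
               for x in abundances if x > 0)


# Hypothetical data: individuals counted per species at two sites.
big_sample = [500, 200, 100, 80, 50, 30, 20, 10, 5, 2, 1, 1, 1]  # 1000 individuals
small_sample = [30, 25, 20, 15, 10]                               # 100 individuals

simulated = rarefy_by_subsampling(big_sample, 100)
exact = rarefy_exact(big_sample, 100)
print("Big sample rarefied to 100 individuals (simulated):", simulated)
print("Big sample rarefied to 100 individuals (exact):", exact)
print("Species observed in the small sample:", len(small_sample))
```

The last two printed numbers are what traditional rarefaction would compare.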

Rarefaction of samples to a common size became the standard procedure when comparing sites, and it remains so today. Unfortunately, it is not as fair as it seems. A sample of size 100 from a site that has twenty species might very well be complete, containing every species living at the site, while a sample of size 100 from a site that has five hundred species will be very incomplete. When we compare the species counts of two equally large samples like this, we might be comparing a nearly complete sample to a very incomplete one. This will generally cause us to underestimate the difference in diversity between the sites, because the count from the richer site falls much further short of the truth than the count from the poorer site.
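A rough calculation makes the problem concrete. The sketch below assumes, purely for illustration, that all species at a site are equally common and that individuals are drawn independently; real communities are far more uneven, which only makes matters worse. The function name `expected_species_seen` is hypothetical.

```python
def expected_species_seen(n_species, sample_size):
    """Expected number of species in a sample, assuming equally common species
    and independently drawn individuals (a simplifying assumption)."""
    # Probability that one particular species is absent from the whole sample.
    p_missed = (1 - 1 / n_species) ** sample_size
    return n_species * (1 - p_missed)


poor_site = expected_species_seen(20, 100)   # site with 20 species
rich_site = expected_species_seen(500, 100)  # site with 500 species

true_ratio = 500 / 20
observed_ratio = rich_site / poor_site

print("Expected species seen at the 20-species site:", round(poor_site, 1))
print("Expected species seen at the 500-species site:", round(rich_site, 1))
print("True richness ratio:", true_ratio)
print("Ratio seen in equal-sized samples:", round(observed_ratio, 1))
```

Under these assumptions the sample from the poorer site is essentially complete while the sample from the richer site contains less than a fifth of its species, so the ratio seen in the samples is much smaller than the true ratio.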

The solution is to compare samples that are equally complete instead of equally large. John Alroy proposed this in the paleontology literature in 2009, and I proposed it in the ecological literature in 2010, with input from my friend and colleague Anne Chao (we were unaware of Alroy’s proposal at the time). Anne, who has made many contributions to diversity estimation over the last thirty years, and I have just published an article in the Dec 2012 issue of Ecology, developing new methods for rarefying samples so that they are equally complete. We also show how to extrapolate the species counts, to estimate how many species we should expect in a sample bigger (hence more complete) than the one we actually made.
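Here is a minimal sketch of what rarefying to equal completeness could look like. Completeness is measured by sample coverage, the share of individuals in the community that belong to species present in the sample. The sketch assumes an interpolation formula for the estimated coverage of a subsample of size m taken from a sample of size n; readers should check the article for the exact estimators before relying on it. The data and the function names are hypothetical.

```python
from math import comb


def expected_coverage(abundances, m):
    """Estimated coverage of a random subsample of m individuals (0 < m < n).
    Assumed interpolation formula; see the article for the exact estimator."""
    n = sum(abundances)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    return 1 - sum((x / n) * comb(n - x, m) / comb(n - 1, m)
                   for x in abundances if x > 0)


def expected_species(abundances, m):
    """Expected species count in a random subsample of m individuals."""
    n = sum(abundances)
    return sum(1 - comb(n - x, m) / comb(n, m) for x in abundances if x > 0)


def size_for_coverage(abundances, target):
    """Smallest subsample size whose estimated coverage reaches the target,
    or None if the sample never gets that complete."""
    n = sum(abundances)
    for m in range(1, n):
        if expected_coverage(abundances, m) >= target:
            return m
    return None


# Hypothetical samples of 100 individuals each.
poor_site = [40, 30, 20, 10]                                      # 4 species
rich_site = [20, 15, 10, 8, 6, 5, 4, 3, 3, 2, 2, 2] + [1] * 20    # 32 species

target = 0.75  # a coverage level that both samples can reach
m_poor = size_for_coverage(poor_site, target)
m_rich = size_for_coverage(rich_site, target)

print("Subsample size needed at the poor site:", m_poor)
print("Subsample size needed at the rich site:", m_rich)
print("Species expected at equal coverage (poor):", expected_species(poor_site, m_poor))
print("Species expected at equal coverage (rich):", expected_species(rich_site, m_rich))
```

The two samples end up rarefied to very different sizes, and the species counts compared are those expected at the same level of completeness.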

It may seem that we would need to know the true number of species at a site before we could judge how complete a sample is, but there is a kind of mathematical magic (developed by Alan Turing and I.J. Good) which lets us estimate the completeness of a sample just by looking at the sample itself. We explain it all in our article, so if you are interested in more details, read it at:
http://www.esajournals.org/doi/pdf/10.1890/11-1952.1
The article also proves some neat theorems about this method. For example, we prove that this method is always more efficient (i.e. needs less sampling effort) than the traditional method, when the goal is to rank sites according to their true species counts.
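The core of the Turing and Good idea fits in a few lines. In its simplest form, the share of the community belonging to species we have not yet seen is estimated by the share of the sample made up of singletons (species represented by exactly one individual). The sketch below uses that basic form with made-up counts; the article works with a more refined estimator, and the function name `turing_coverage` is hypothetical.

```python
def turing_coverage(abundances):
    """Basic Turing/Good estimate of sample coverage: 1 - (singletons / n)."""
    n = sum(abundances)
    if n == 0:
        raise ValueError("the sample is empty")
    singletons = sum(1 for x in abundances if x == 1)
    return 1 - singletons / n


# Hypothetical samples: individuals counted per species.
no_singletons = [40, 30, 20, 10]   # every species seen many times
many_singletons = [5, 3, 1, 1]     # two of the four species seen only once

print("Estimated coverage, no singletons:", turing_coverage(no_singletons))
print("Estimated coverage, many singletons:", turing_coverage(many_singletons))
```

A sample with no singletons is estimated to be essentially complete, while a sample full of singletons is telling us that many more species remain to be found.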

This article is open-access, which was a nice surprise for us, since most articles in Ecology are accessible only by subscription.

Here is the abstract, slightly edited for clarity:

Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size

Anne Chao and Lou Jost

We propose an integrated sampling, rarefaction, and extrapolation methodology to compare species richnesses (species counts) of a set of communities based on samples of equal completeness (as measured by sample coverage) instead of equal size. Traditional rarefaction or extrapolation to equal-sized samples can misrepresent the relationships between the richnesses of the communities being compared, because a sample of a given size may be sufficient to fully characterize the lower diversity community, but insufficient to characterize the richer community. Thus, the traditional method generally underestimates the difference between community richnesses. We derived a new analytic method for seamless coverage-based rarefaction and extrapolation. We show that this method yields less biased comparisons of richness between communities, and manages this with less total sampling effort. When this approach is integrated with an adaptive coverage-based stopping rule during sampling, samples may be compared directly without rarefaction, so no extra data is taken and none is thrown away. Even if this stopping rule is not used during data collection, coverage-based rarefaction throws away less data than traditional size-based rarefaction, and more efficiently finds the correct ranking of communities according to their true richnesses. Several hypothetical and real examples demonstrate these advantages.

This issue’s cover photo is Sobralia luerorum, taken in our Cerro Candelaria Reserve. It is an example of the rare species that are typically missed in random samples. To illustrate that point more forcefully, our article contains a photo of the many new species of Teagueia orchids that my students and I discovered over the last twelve years in and around what are now our reserves. I’ll write much more about those in the future.

These are all new species of Teagueia orchids discovered in the last twelve years in and around the EcoMinga reserves. Only one Teagueia species was known previously from the area. Photo: Lou Jost /EcoMinga.

We hope this article will help improve decision-making processes in conservation biology around the world, but especially in the tropics where incomplete samples are the norm.

Anne’s work, and our publication costs, were supported by the “Decomposition and estimation of biodiversity” project of the Taiwan National Science Council. My work was supported in part by a grant from John Moore to the Population Biology Foundation.

7 thoughts on “Science and EcoMinga: How to compare the species counts of two or more sites”

  1. This is amazing and very helpful. Thanks for posting this, Lou. I will immediately put this to use in our grassland inventory in Aysen, Chile!

  2. As a coauthor of the paper mentioned in this post, I would like to thank Lou for this wonderful post. His explanation of our new method is very intuitive and easy to understand. I would like to add that the new method can be extended to incorporate species abundances and taxonomic or phylogenetic distances, so the proposed method will be very useful in biodiversity studies.
    I have long appreciated Lou’s great and devoted efforts (via the EcoMinga Foundation) to establish several reserves protecting threatened areas of ecological importance in Ecuador. The beautiful orchids, birds and animals in the reserves have brought much inspiration to our scientific work.

  3. Pingback: Estimating the diversity of an ecosystem based on an incomplete sample | Fundacion EcoMinga

  4. The literature left me very confused about how to compare different sites: at the same sample coverage or at the same sample size. Your paper gave me a good direction. I find it very useful, and I like how your and Anne Chao’s papers read like a dialogue with the reader, so clear and concise. I have been hoping for a while to see some updates on your other website (http://www.loujost.com/), but I guess your focus is more on this blog at the moment. I am definitely going to follow the updates. Thank you!

    • Thanks for the kind words, and yes, I have not been updating my website in the last few years. Now my server changed the security codes without telling me and I can’t get into the site to update it. That should be resolved soon. Thanks for following this blog!
