Conservation biologists and ecologists often need to find out how diverse a site is, and how a site’s diversity compares with the diversity of other sites. The answers can help conservationists decide which site to buy, and can help ecologists and evolutionary biologists understand the causes of biological diversity. The question “How diverse is a site?” seems simple enough, but it turns out not to be simple at all. For starters, there are lots of ways to quantify diversity. I’ve spent a lot of time elsewhere writing about what I think is the best way to quantify diversity, but here I want to avoid that issue and just say that for conservation purposes, a simple count of the number of species living in a site is a good enough measure of diversity. To compare diversities of two or more sites, then, all we have to do is count the number of species present at each site, right?
Sounds easy, but this is the tropics. A person could spend a lifetime sampling a site and still not find all the insects or fungi or orchids that live there. My friends Phil DeVries, Tom Walla, and Harold Greeney spent ten years trapping butterflies in the Ecuadorian Amazon, and even after all that time, they were still finding new species not previously sampled. I lived for two years in the same forest they studied, looking hard for new birds nearly every day as part of my guiding work. Even after two years of this, I was still finding birds I hadn’t seen before. There is no hope that a reasonably sized random sample of a diverse taxonomic group will contain all the species in that forest.
That means we have to compare sites based on incomplete samples that miss many species. How can we make fair comparisons between sites based on such samples? Biologists thought they knew the answer: “rarefaction”. They reasoned it would be unfair to compare a sample of 1000 individuals from one site with a sample of 100 individuals from another site. In very rich forests, the larger sample will nearly always have more species, just because the sample is larger. To make fair comparisons, biologists decided that they should only compare samples of equal sizes. (This same idea is widespread in almost every other field, not just biology.) If the sample from one site had 1000 individuals and the sample from the other had 100 individuals, biologists would take the sample of 1000 individuals and make subsamples of 100 individuals from it. They would repeat the subsampling process many times, and average the species counts for each subsample of size 100. The resulting mean species count for samples of size 100 could then be compared with the species count from the actual sample of 100 individuals from the other site.
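The subsampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: it assumes a sample is recorded as a list of per-species abundances, and the function names and example numbers are hypothetical. Alongside the repeated random subsampling, it computes the exact expected species count for a subsample of size m (the standard hypergeometric rarefaction formula), which is the value the random averaging converges to.

```python
import random
from math import comb


def rarefy_by_subsampling(abundances, m, reps=1000, seed=0):
    """Average species count over `reps` random subsamples of m individuals,
    each drawn without replacement from the observed sample."""
    rng = random.Random(seed)
    # Expand the abundance list into one species label per individual.
    individuals = [sp for sp, count in enumerate(abundances) for _ in range(count)]
    total = 0
    for _ in range(reps):
        total += len(set(rng.sample(individuals, m)))
    return total / reps


def rarefy_exact(abundances, m):
    """Expected species count in a random subsample of m individuals.
    Species i is missed with probability C(n - x_i, m) / C(n, m)."""
    n = sum(abundances)
    return sum(1 - comb(n - x, m) / comb(n, m) for x in abundances if x > 0)


# Hypothetical sample: 100 individuals spread over 12 species.
big_sample = [50, 20, 10, 5, 5, 3, 2, 1, 1, 1, 1, 1]
print(rarefy_by_subsampling(big_sample, 20))  # close to the exact value below
print(rarefy_exact(big_sample, 20))
```

The exact formula gives the same answer as infinitely many random subsamples, so in practice there is no need to simulate.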
Rarefaction of samples to a common size became the standard procedure when comparing sites, and it remains so today. Unfortunately, it is not as fair as it seems. A sample of size 100 from a site that has twenty species might very well be complete, containing every species living at the site, while a sample of size 100 from a site that has five hundred species will be very incomplete. When we compare the species counts of two equally large samples like this, we might be comparing a nearly complete sample to a very incomplete one. This will generally cause us to underestimate the difference in diversity between the sites.
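A quick calculation shows how large the effect can be. The sketch below assumes, purely for illustration, hypothetical communities in which all species are equally abundant and individuals are drawn with replacement (so each draw is independent). Real communities are uneven, so the numbers are only a rough guide; the function names are made up for this example.

```python
def expected_richness(S, n):
    """Expected number of species seen among n individuals drawn with
    replacement from S equally abundant species."""
    return S * (1 - (1 - 1 / S) ** n)


def expected_coverage(S, n):
    """Expected share of the community's individuals that belong to species
    already present in the sample (the sample's completeness)."""
    return 1 - (1 - 1 / S) ** n


for S in (20, 500):
    print(S, round(expected_richness(S, 100), 1), round(expected_coverage(S, 100), 2))
```

With samples of 100 individuals, the 20-species community yields about 19.9 species (about 99% coverage), while the 500-species community yields about 90.7 species (about 18% coverage). The true richnesses differ by a factor of 25, but the equal-size samples differ by a factor of less than 5.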
The solution is to compare samples that are equally complete instead of equally large. John Alroy proposed this in the paleontology literature in 2009, and I proposed it in the ecological literature in 2010, with input from my friend and colleague Anne Chao (we were unaware of Alroy’s proposal at the time). Anne, who has made many contributions to diversity estimation over the last thirty years, and I have just published an article in the December 2012 issue of Ecology that develops new methods for rarefying samples so that they are equally complete. We also show how to extrapolate species counts, estimating how many species we should expect in a sample larger (and hence more complete) than the one we actually collected.
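The article gives its own extrapolation estimator. As a rough sketch of the idea only, the code below uses one extrapolation formula that is common in the species-richness literature for abundance samples; treating it as similar in spirit to the article's method is an assumption. It needs only the sample size n, the observed species count, and the numbers of singletons (f1) and doubletons (f2), combined with a Chao1-type estimate of how many species the sample missed. Function names and data are hypothetical.

```python
def undetected_species(f1, f2):
    """Chao1-type estimate of species missing from the sample
    (assumption: the simple f1^2 / (2 f2) form, with a fallback when f2 = 0)."""
    if f2 > 0:
        return f1 * f1 / (2 * f2)
    return f1 * (f1 - 1) / 2


def extrapolate_richness(abundances, m):
    """Expected species count if the sample of n individuals were enlarged
    by m more individuals (m >= 0)."""
    n = sum(abundances)
    s_obs = sum(1 for x in abundances if x > 0)
    f1 = sum(1 for x in abundances if x == 1)
    f2 = sum(1 for x in abundances if x == 2)
    f0 = undetected_species(f1, f2)
    if f0 == 0:
        return float(s_obs)  # no evidence of unseen species: nothing to add
    # Treat the f0 unseen species as equally abundant, each with estimated
    # relative abundance f1 / (n*f0 + f1); the bracket is the chance that a
    # given one turns up among m extra individuals.
    return s_obs + f0 * (1 - (1 - f1 / (n * f0 + f1)) ** m)


sample = [10, 5, 3, 2, 2, 1, 1, 1]  # hypothetical: 25 individuals, 8 species
for m in (0, 25, 100):
    print(m, round(extrapolate_richness(sample, m), 2))
```

The extrapolated count rises toward the observed count plus the estimated number of unseen species; extrapolations far beyond the size of the actual sample are generally less reliable.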
It may seem that we would need to know the true number of species at a site before we could judge how complete a sample is, but there is a kind of mathematical magic (developed by Alan Turing and I.J. Good) which lets us estimate the completeness of a sample just by looking at the sample itself. We explain it all in our article, so if you are interested in more details, read it at:
The article also proves some neat theorems about this method. For example, we prove that this method is always more efficient (i.e. needs less sampling effort) than the traditional method, when the goal is to rank sites according to their true species counts.
This article is open-access, which was a nice surprise for us, since most articles in Ecology are accessible only by subscription.
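The basic Turing and Good idea of judging completeness from the sample itself is simple enough to show in a few lines. It says that the fraction of the community made up of species not yet in the sample is approximately f1/n, where f1 is the number of species represented by exactly one individual (singletons) and n is the number of individuals sampled, so the estimated coverage is 1 - f1/n. The sketch below shows only this basic form; the function name and field records are hypothetical.

```python
from collections import Counter


def good_turing_coverage(abundances):
    """Estimated sample coverage: 1 - f1/n, where f1 is the number of
    singleton species and n the number of individuals sampled."""
    n = sum(abundances)
    f1 = sum(1 for x in abundances if x == 1)
    return 1 - f1 / n


# Field records as one species name per individual (hypothetical data).
records = ["A"] * 10 + ["B"] * 5 + ["C"] * 3 + ["D"] * 2 + ["E"] * 2 + ["F", "G", "H"]
abundances = list(Counter(records).values())
print(good_turing_coverage(abundances))  # estimate is 1 - 3/25
```

Here three of the 25 individuals are singletons, so roughly 12% of the individuals in the community are estimated to belong to species this sample has not yet found.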
Here is the abstract, slightly edited for clarity:
Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size
Anne Chao and Lou Jost
We propose an integrated sampling, rarefaction, and extrapolation methodology to compare species richnesses (species counts) of a set of communities based on samples of equal completeness (as measured by sample coverage) instead of equal size. Traditional rarefaction or extrapolation to equal-sized samples can misrepresent the relationships between the richnesses of the communities being compared, because a sample of a given size may be sufficient to fully characterize the lower-diversity community, but insufficient to characterize the richer community. Thus, the traditional method generally underestimates the difference between community richnesses. We derived a new analytic method for seamless coverage-based rarefaction and extrapolation. We show that this method yields less biased comparisons of richness between communities, and manages this with less total sampling effort. When this approach is integrated with an adaptive coverage-based stopping rule during sampling, samples may be compared directly without rarefaction, so no extra data is taken and none is thrown away. Even if this stopping rule is not used during data collection, coverage-based rarefaction throws away less data than traditional size-based rarefaction, and more efficiently finds the correct ranking of communities according to their true richnesses. Several hypothetical and real examples demonstrate these advantages.
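Coverage-based rarefaction can be sketched by combining two formulas: the expected species count of a random subsample of m individuals, and an estimate of that subsample's coverage. The coverage estimator used here, 1 minus the sum over species of (x_i/n) C(n - x_i, m) / C(n - 1, m) for m < n, is an assumption for this sketch (at m = n - 1 it reduces to the Turing and Good value 1 - f1/n); the article gives the exact estimators and also handles extrapolation beyond the observed sample. Each sample is rarefied to the smallest size that reaches a common target coverage, here the lower of the two samples' estimated coverages. Function names and both example samples are hypothetical.

```python
from math import comb


def rarefied_richness(abundances, m):
    """Expected species count in a random subsample of m individuals."""
    n = sum(abundances)
    return sum(1 - comb(n - x, m) / comb(n, m) for x in abundances if x > 0)


def rarefied_coverage(abundances, m):
    """Estimated coverage of a random subsample of m individuals, 1 <= m < n
    (assumed estimator; equals 1 - f1/n when m = n - 1)."""
    n = sum(abundances)
    return 1 - sum(x / n * (comb(n - x, m) / comb(n - 1, m)) for x in abundances if x > 0)


def size_for_coverage(abundances, target):
    """Smallest subsample size whose estimated coverage reaches `target`,
    or None if the sample is not that complete (extrapolation would be needed)."""
    n = sum(abundances)
    for m in range(1, n):
        if rarefied_coverage(abundances, m) >= target:
            return m
    return None


# Two hypothetical samples: a poor, well-sampled site and a rich, undersampled one.
site_a = [40, 30, 20, 10, 5, 3, 2]
site_b = [6, 5, 4, 4, 3, 3, 2, 2, 2, 2] + [1] * 20
target = min(rarefied_coverage(s, sum(s) - 1) for s in (site_a, site_b))
for name, s in (("A", site_a), ("B", site_b)):
    m = size_for_coverage(s, target)
    print(name, m, round(rarefied_richness(s, m), 1))
```

Site A, which has no singletons, is cut down to a handful of individuals, while site B keeps nearly its whole sample, so the two species counts are compared at equal estimated completeness rather than equal size.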
This issue’s cover photo is Sobralia luerorum, taken in our Cerro Candelaria Reserve. It is an example of the rare species that are typically missed in random samples. To illustrate that point more forcefully, our article contains a photo of the many new species of Teagueia orchids that my students and I discovered over the last twelve years in and around what are now our reserves. I’ll write much more about those in the future.
We hope this article will help improve decision-making processes in conservation biology around the world, but especially in the tropics where incomplete samples are the norm.
Anne’s work, and our publication costs, were supported by the “Decomposition and estimation of biodiversity” project of the Taiwan National Science Council. My work was supported in part by a grant from John Moore to the Population Biology Foundation.