# Shannon diversity index: Pi relates to all found species, or number of species by site?

Biology Asked by maycca on January 5, 2021

I am trying to understand Shannon’s species diversity index so that I can compare species diversity between different sites. For this, I need to calculate the ‘relative abundance of species within the community’ [details here][1]. What I do not understand is whether this relative abundance is calculated relative to the species found at a single site, or relative to all species found across all sites.

In my example, site1 has 6 species (a, b, c, d, e, f) and site2 has 2 species (g, h). Should I calculate each species' proportion per site relative to 6 and 2, or relative to 8 (6 + 2, as the species do not overlap between the sites)? This is a dummy example; in real data, species will likely reoccur across sites.

• If calculated by the total number of species (8): Shannon H = 1.11 and 0.820;
• If calculated by the number of species per site (6 and 2): Shannon H = 1.29 and 1.26.

Which one is correct? The general trend is the same (site1 is more diverse than site2), but the differences between the indices seem quite large.

Below is a dummy example that calculates Shannon’s H step by step with dplyr. The commented-out mutate calls switch whether the distinct number of species and the summed area are calculated by site or over the total.

    library(dplyr)

    # Two sites: s1 has 6 species (a-f), s2 has 4 species (e-h).
    # 'area' is the abundance measure for each species at each site.
    d <- data.frame(site    = c(rep("s1", 6), rep("s2", 4)),
                    species = c(letters[1:6], letters[5:8]),
                    area    = c(59, 12, 11, 10, 5, 3,
                                10, 5, 21, 20))

    # Compute the number of distinct species and the summed area.
    # Placed before group_by(), these two mutate() calls pool all sites;
    # the commented-out copies after group_by() compute them per site instead.
    d %>%
      mutate(species_n   = n_distinct(species)) %>%   # should this mutate() come before or after group_by()?
      mutate(species_sum = sum(area)) %>%
      group_by(site) %>%
      # mutate(species_n   = n_distinct(species)) %>%
      # mutate(species_sum = sum(area)) %>%
      mutate(spec_pi       = area / species_sum) %>%  # relative abundance p_i
      mutate(pi_ln         = abs(log(spec_pi))) %>%   # -ln(p_i), valid because p_i <= 1
      mutate(sp_shannon    = spec_pi * pi_ln) %>%     # p_i * -ln(p_i)
      mutate(shannon_comun = sum(sp_shannon))         # Shannon H, summed within each site


> I am trying to dig into Shannon's species diversity index to compare the species diversity between different sites.

Shannon's diversity index is a measure of alpha diversity, or diversity within a community. If you are considering each site a different community (which it sounds like you are), then you would calculate Shannon diversity for each site separately, without consideration of the other sites.

> What I do not understand is if this relative abundance relates to all species in a sample, or all found species?

Relative abundance is not a function of the number of species found, but of the total number of observations collected (similar to sampling effort) at a given site. So, if you made 100 total observations at a site and 10 of those belonged to Species X, the relative (proportional) abundance of Species X is 10 / 100 = 0.1, regardless of the total number of different species present, and regardless of any observations from other sites.
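To make this concrete, here is a minimal base R sketch, assuming the dummy data frame from the question. The helper name `shannon_h` is hypothetical (it does not come from any package), and the sketch uses the natural logarithm, as the question's code does. Each site's proportions are taken against that site's own total only.

```r
# Dummy data from the question: abundance ('area') per species per site.
d <- data.frame(site    = c(rep("s1", 6), rep("s2", 4)),
                species = c(letters[1:6], letters[5:8]),
                area    = c(59, 12, 11, 10, 5, 3,
                            10, 5, 21, 20))

# Hypothetical helper (not from any package): Shannon H for one community.
shannon_h <- function(abundance) {
  p <- abundance / sum(abundance)  # relative abundance within this community only
  p <- p[p > 0]                    # drop absent species so 0 * log(0) cannot yield NaN
  -sum(p * log(p))                 # natural-log Shannon index
}

# One value per site; observations from other sites never enter the calculation.
h_by_site <- tapply(d$area, d$site, shannon_h)
round(h_by_site, 2)
```

With this data the per-site values round to 1.29 for s1 and 1.26 for s2, which matches the "by site" result reported in the question.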

I can't comment on your code, because I don't use the tidyverse. I guess I'm (apparently) old school and still use Vegan and LabDSV for community analysis (I may be biased, since I learned community ecology from the author of the LabDSV package). But in general, for any measure of alpha diversity, each community gets its own value, and you would not consider any observations from other sites in that calculation, regardless of whether there is species overlap between communities.

For a more direct comparison of diversity between sites, you would use a beta-diversity index like Bray-Curtis dissimilarity, which compares each community to every other community in your analysis.
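As a rough illustration, Bray-Curtis dissimilarity between two communities is the sum of absolute abundance differences divided by the sum of all abundances. The base R sketch below assumes the question's dummy data; the helper name `bray_curtis` is hypothetical and written out by hand rather than taken from a package.

```r
# Dummy data from the question: abundance ('area') per species per site.
d <- data.frame(site    = c(rep("s1", 6), rep("s2", 4)),
                species = c(letters[1:6], letters[5:8]),
                area    = c(59, 12, 11, 10, 5, 3,
                            10, 5, 21, 20))

# Site-by-species abundance matrix; species absent from a site get 0.
comm <- xtabs(area ~ site + species, data = d)

# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors.
# 0 means identical composition, 1 means no shared species.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

bc <- bray_curtis(comm["s1", ], comm["s2", ])
bc
```

For these two sites the value is 140 / 156, or about 0.90, so the communities are quite dissimilar even though their Shannon values are close.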

Correct answer by MikeyC on January 5, 2021