Cross Validated Asked by sleepy on March 2, 2021
This is undoubtedly a basic question but I suffer from being in the situation where I do not even know what to google so I can’t solve this one myself. On the data below I want to test the hypothesis that species distributions between communities are different than would be expected from a random distribution of each species across the study site.
I have a list of species counts in different communities, and I have the proportions of these communities across the study site.
Can I calculate the expected distribution for each community by multiplying the total count of each species by the proportion of that community across the study site (species1_total*studysite_c1)? In my mind this is a rational way to calculate the likely distribution of each species in each community were they randomly situated across the study site.
Can I then do a chi-squared test on this data, where species1_total*studysite_c1 is the expected value and species1_c1 is the actual value?
              c1    c2    c3    c4    c5    c6    c7    c8    c9   c10  total
species1       0    38     0     6    94     2     0     0    12     6    158
species2       1     7     0     0     0     0     0     0     0     1      9
species3       3    30     0     0     1     1     0     0    11     3     49
species4       7     5     1     3    11     0     0     0     1     2     30
species5       5     2     0     0     0     4     0     0     9     0     20
species6      24    78     0     0     7     2     5     0    19   242    377
species7       3    13     0     0     0     3     0     0    28     9     56
species8       0    29     0     0     4    16     0     0     2     2     53
species9      44    66    13     0     1     0     0     0    37    10    171
species10      0    20     0     0     3     4     0     0     6     0     33
species11      1     0     0     0     0     0     0     8     0     0      9
species12      0     0     0     0     0     0     0     0     5     0      5
study site  0.22  0.40  0.01  0.01  0.03  0.01  0.00  0.00  0.07  0.25      1
I guess you are on the right track, but I am not familiar with your data and study site, so I can't be sure. I can be sure that your terminology is not quite right. You can't use the numbers in your last row, study site, as expected counts because they are estimated probabilities adding to $1.$
study.site = c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
sum(study.site)
[1] 1
One-category chi-squared test in R. In the R procedure chisq.test, there is provision for a parameter p of probabilities against which counts x are to be compared.
Thus, suppose I have a fair die with faces re-labeled so that there are two 1's and faces 2 through 5. Then the probabilities of outcomes should be p.d = c(1/3, 1/6, 1/6, 1/6, 1/6). Suppose also that I have counts x from 58 rolls of this relabeled die. Then I should expect chisq.test not to reject the null hypothesis that p.d has the correct probabilities. Indeed, this is what happens below. The P-value is higher than 5%.
x = c(24,7,6,14,7)
p.d = c(2,1,1,1,1)/6
chisq.test(x, p=p.d)
Chi-squared test for given probabilities
data: x
X-squared = 5.931, df = 4, p-value = 0.2044
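To make the connection between probabilities and expected counts explicit, here is a minimal sketch that reproduces the statistic above by hand: the expected counts are the total number of rolls times the hypothesized probabilities, and the statistic sums (observed - expected)^2 / expected. The names n, expected, stat, df and p.value are illustrative, not part of the original answer.

```r
x   <- c(24, 7, 6, 14, 7)      # observed counts from the example above
p.d <- c(2, 1, 1, 1, 1) / 6    # hypothesized probabilities

n        <- sum(x)             # total number of rolls in x
expected <- n * p.d            # expected counts = total * probability

# Pearson chi-squared statistic and its degrees of freedom
stat <- sum((x - expected)^2 / expected)
df   <- length(x) - 1

# Upper-tail probability of the chi-squared distribution
p.value <- pchisq(stat, df, lower.tail = FALSE)

round(c(statistic = stat, df = df, p.value = p.value), 4)
```

The same recipe applies to the species data: the expected count for a community is the species total times that community's proportion of the study site.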
Not enough data for Species 1. So if I guess correctly what you have done to get the vector study.site, and if the counts in species.1 are indeed not randomly distributed, I might expect chisq.test to reject.
However, there is a difficulty. You have only 158 specimens in Species 1, with none at all in many communities.
sum(species.1)
[1] 158
This means you do not have enough data for the chi-squared test to work properly. In particular, R is finding 'expected counts' for various communities, and too many of them are below the minimum required (some authors say all should be above 5, others say most should be above 5 and all should be above 3.) The technical difficulty is that the chi-squared statistic has only approximately a chi-squared distribution, and a good approximation requires a certain amount of data.
species.1 = c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)
chisq.test(species.1, study.site)
Pearson's Chi-squared test
data: species.1 and study.site
X-squared = 38.333, df = 30, p-value = 0.1414
Warning message:
In chisq.test(species.1, study.site) :
Chi-squared approximation may be incorrect
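One caveat about the call above: the second positional argument of chisq.test is y, not p, so chisq.test(species.1, study.site) cross-tabulates the two vectors as factors and tests them for independence. That is where df = 30 comes from: 6 distinct counts and 7 distinct proportions give (6 - 1)(7 - 1) = 30. A minimal sketch of the goodness-of-fit version passes the probabilities by name. It assumes, purely for illustration, that the two communities with a stated proportion of 0.00 (c7 and c8), which also contain no Species 1 specimens, can be dropped, because an expected count of zero leaves the statistic undefined. The names expected, n.small, keep, obs, p0 and fit are mine, not from the original.

```r
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.1  <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)

# Expected counts under the null: species total times community proportion
expected <- sum(species.1) * study.site
n.small  <- sum(expected < 5)      # number of expected counts below 5

# Assumption: drop communities whose stated proportion is 0,
# after checking that no specimens were observed there
keep <- study.site > 0
stopifnot(all(species.1[!keep] == 0))
obs <- species.1[keep]
p0  <- study.site[keep]

# Goodness-of-fit test: probabilities are passed as p =, not positionally.
# The small-expected-count warning is silenced here but still applies.
fit <- suppressWarnings(chisq.test(obs, p = p0))

round(fit$expected, 2)
fit$parameter                      # degrees of freedom: kept categories - 1
```

For Species 1 the expected count in c5 is 158 × 0.03 = 4.74 against 94 observed, so the statistic from this call is very large and its conclusion need not match the output shown above. The concern about small expected counts still holds for this version.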
Combine communities or species? A common remedy for such sparse data is to combine categories (communities). If some communities are adjacent, then it might make sense to combine them. You might also consider whether it is appropriate to combine counts for several species, especially if some species are similar to others.
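As a rough sketch of what combining categories looks like in code, the block below pools every community whose expected count for Species 1 is below 5 into a single "other" category and then runs the goodness-of-fit test with p =. Grouping purely by small expected counts is an assumption made for illustration; a real analysis would merge communities that are adjacent or ecologically similar. The names small, obs.merged, p.merged, expected.merged and fit.merged are hypothetical.

```r
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.1  <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)
names(study.site) <- names(species.1) <- paste0("c", 1:10)

# Flag communities whose expected count (total * proportion) is below 5
expected <- sum(species.1) * study.site
small    <- expected < 5

# Pool the flagged communities into one "other" category,
# summing both the observed counts and the proportions
obs.merged <- c(species.1[!small],  other = sum(species.1[small]))
p.merged   <- c(study.site[!small], other = sum(study.site[small]))

# Expected counts after pooling, then the goodness-of-fit test
expected.merged <- sum(obs.merged) * p.merged
fit.merged      <- chisq.test(obs.merged, p = p.merged)

round(expected.merged, 2)
fit.merged$parameter    # degrees of freedom: merged categories - 1
```

Pooling trades resolution for validity: the test no longer says which of the pooled communities drives a difference, only whether the merged distribution departs from the site proportions.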
Simulated P-value for sparse data. Another remedy, for the implementation of chisq.test in R, is to let the program simulate a P-value, but we still don't get a rejection with simulation.
chisq.test(species.1, study.site, sim=T)
Pearson's Chi-squared test
with simulated p-value
(based on 2000 replicates)
data: species.1 and study.site
X-squared = 38.333, df = NA, p-value = 0.1644
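For readers wondering what a simulated P-value means, here is a minimal sketch of the idea for the goodness-of-fit case, using the relabeled-die counts from earlier: draw many samples of the same size from the hypothesized probabilities, compute the chi-squared statistic for each, and report the share of simulated statistics at least as large as the observed one. This follows the spirit of what chisq.test does with simulation but is not a copy of its internals; the seed and the names B, sims, sim.stats and p.sim are assumptions for illustration.

```r
set.seed(2021)                  # arbitrary seed, only for reproducibility

x   <- c(24, 7, 6, 14, 7)       # observed counts (relabeled die)
p.d <- c(2, 1, 1, 1, 1) / 6     # hypothesized probabilities
B   <- 2000                     # number of simulated samples

n        <- sum(x)
expected <- n * p.d
stat     <- sum((x - expected)^2 / expected)

# Each column of sims is one multinomial sample of size n under the null
sims <- rmultinom(B, size = n, prob = p.d)

# Chi-squared statistic for every simulated sample
# (expected recycles down each column, one value per category)
sim.stats <- colSums((sims - expected)^2 / expected)

# Monte Carlo P-value, counting the observed statistic as one of the draws
p.sim <- (1 + sum(sim.stats >= stat)) / (B + 1)
p.sim
```

Because no chi-squared approximation is involved, this approach does not depend on expected counts being large, which is why it is offered as a remedy for sparse data.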
Somewhat better results with higher counts. Trying again for Species 6, which has more specimens. This time we reject at the 10% level, but not at the 5% level.
species.6 = c(24, 78, 0, 0, 7, 2, 5, 0, 19, 242)
chisq.test(species.6, study.site, sim=T)
Pearson's Chi-squared test
with simulated p-value
(based on 2000 replicates)
data: species.6 and study.site
X-squared = 54.444, df = NA, p-value = 0.07696
Correct answer by BruceET on March 2, 2021