Cross Validated Asked by sleepy on March 2, 2021

This is undoubtedly a basic question but I suffer from being in the situation where I do not even know what to google so I can’t solve this one myself. On the data below I want to test the hypothesis that species distributions between communities are different than would be expected from a random distribution of each species across the study site.

I have a list of species counts in different communities, and I have the proportions of these communities across the study site.

Can I calculate the expected distribution for each community by multiplying the total count of each species by the proportion of that community across the study site (species1_total*studysite_c1)? In my mind this is a rational way to calculate the likely distribution of each species in each community were they randomly situated across the study site.

Can I then do a chi-squared test on these data, where species1_total*studysite_c1 is the expected value and species1_c1 is the observed value?

```
              c1    c2    c3    c4    c5    c6    c7    c8    c9   c10  total
species1       0    38     0     6    94     2     0     0    12     6    158
species2       1     7     0     0     0     0     0     0     0     1      9
species3       3    30     0     0     1     1     0     0    11     3     49
species4       7     5     1     3    11     0     0     0     1     2     30
species5       5     2     0     0     0     4     0     0     9     0     20
species6      24    78     0     0     7     2     5     0    19   242    377
species7       3    13     0     0     0     3     0     0    28     9     56
species8       0    29     0     0     4    16     0     0     2     2     53
species9      44    66    13     0     1     0     0     0    37    10    171
species10      0    20     0     0     3     4     0     0     6     0     33
species11      1     0     0     0     0     0     0     8     0     0      9
species12      0     0     0     0     0     0     0     0     5     0      5
study site  0.22  0.40  0.01  0.01  0.03  0.01  0.00  0.00  0.07  0.25      1
```

I guess you are on the right track, but I am not familiar with
your data and study site, so I can't be sure. I can be sure
that your terminology is not quite right. You can't use the
numbers in your last row `study site` as *expected counts*
because they are estimated probabilities adding to $1.$

```
study.site = c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
sum(study.site)
[1] 1
```
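To get expected counts from these probabilities, multiply them by the total for one species, which is the calculation proposed in the question. A minimal sketch for Species 1 (the name `expected.1` is mine, not from the question):

```r
# Community proportions across the study site (last row of the table)
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
# Observed counts of Species 1 in communities c1..c10
species.1 <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)

# Expected count per community if specimens were spread in proportion
# to each community's share of the site: species total times proportion
expected.1 <- sum(species.1) * study.site
round(expected.1, 2)
```

The expected counts add up to the species total, and the two communities with a proportion of `0.00` get an expected count of zero.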

**One-category chi-squared test in R.** The R procedure `chisq.test`
provides a parameter `p` of probabilities against
which counts `x` are to be compared.

Thus, suppose I have a fair die with faces relabeled so that there are
two `1`'s and faces `2` through `5`. Then the probabilities of
outcomes should be `p.d = c(1/3, 1/6, 1/6, 1/6, 1/6)`.
Suppose also that I have counts `x` from 58 rolls of this relabeled die.
Then I should expect `chisq.test` not to reject the null hypothesis
that `p.d` has the correct probabilities. Indeed, this is what happens
below: the P-value is higher than 5%.

```
x = c(24,7,6,14,7)
p.d = c(2,1,1,1,1)/6
chisq.test(x, p=p.d)
Chi-squared test for given probabilities
data: x
X-squared = 5.931, df = 4, p-value = 0.2044
```
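The statistic reported above can be reproduced by hand, which shows what `chisq.test` does with `p`. A minimal sketch using only the data above (the names `E`, `stat`, `dof` and `p.value` are mine):

```r
x   <- c(24, 7, 6, 14, 7)        # observed counts per face label
p.d <- c(2, 1, 1, 1, 1) / 6      # null probabilities per face label

E    <- sum(x) * p.d             # expected counts under the null
stat <- sum((x - E)^2 / E)       # Pearson chi-squared statistic
dof  <- length(x) - 1            # number of categories minus one
p.value <- pchisq(stat, dof, lower.tail = FALSE)

round(c(statistic = stat, p.value = p.value), 4)
```

This should agree with the `X-squared` and `p-value` printed by `chisq.test` above.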

**Not enough data for Species 1.** So if I guess correctly what you have done to get the vector
`study.site`, and if the counts in `species.` are indeed not
randomly distributed, I might expect `chisq.test` to reject.
However, there is a difficulty. You have only 158 specimens
in Species 1, with none at all in many communities.

```
species.1 = c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)  # Species 1 counts, c1 to c10
sum(species.1)
[1] 158
```

This
means you do not have enough data for the chi-squared test
to work properly. In particular, R is finding 'expected counts'
for various communities, and too many of them are below the
minimum required (some authors say all should be above 5, others
say most should be above 5 and all should be above 3.)
The technical difficulty is that the chi-squared statistic
has only *approximately* a chi-squared distribution, and
a good approximation requires a certain amount of data.

```
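species.1 = c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)
# Caution: the second positional argument of chisq.test is `y`, not `p`,
# so this call cross-tabulates the two vectors and tests independence
# (hence df = 30). A goodness-of-fit call passes p = study.site by name.
chisq.test(species.1, study.site)
Pearson's Chi-squared test
data: species.1 and study.site
X-squared = 38.333, df = 30, p-value = 0.1414
Warning message:
In chisq.test(species.1, study.site) :
Chi-squared approximation may be incorrect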
```
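One caution about the call above: the second positional argument of `chisq.test` is `y`, so `study.site` is treated there as a second variable rather than as null probabilities. For a goodness-of-fit test the probabilities go in by name as `p`. A hedged sketch follows. Dropping the communities whose proportion is `0.00` is my assumption (an expected count of zero leaves the statistic undefined), and it is only reasonable when no specimens were found in them:

```r
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.1  <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)

# Assumption: keep only communities with a nonzero proportion
keep <- study.site > 0
# Guard: nothing may have been observed in the dropped communities
stopifnot(all(species.1[!keep] == 0))

# Goodness-of-fit test of the counts against the given probabilities
gof.1 <- chisq.test(species.1[keep], p = study.site[keep])
gof.1$expected   # expected counts under the null
gof.1
```

R will likely still warn that the approximation may be incorrect, because several expected counts are small, and the result need not match the output shown above.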

**Combine communities or species?** A common remedy for such sparse data is to combine categories (communities).
If some communities are adjacent, then it might make sense to combine them. You might also consider whether it is appropriate
to combine counts for several species, especially if some
species are similar to others.
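As a rough sketch of the pooling idea, the code below lumps every community whose expected count for Species 1 is under 5 into a single "other" category. Pooling by expected count is my assumption for illustration only; which communities it makes sense to merge depends on the study site. The names `sparse`, `obs.pooled` and `p.pooled` are mine.

```r
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.1  <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)

# Flag communities whose expected count falls below 5
expected.1 <- sum(species.1) * study.site
sparse     <- expected.1 < 5

# Keep the well-populated communities and add one pooled category
obs.pooled <- c(species.1[!sparse], sum(species.1[sparse]))
p.pooled   <- c(study.site[!sparse], sum(study.site[sparse]))

sum(obs.pooled) * p.pooled            # expected counts after pooling
chisq.test(obs.pooled, p = p.pooled)  # goodness-of-fit on pooled categories
```

Pooling costs degrees of freedom and hides differences among the merged communities, so it is worth checking that the merged group is meaningful.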

**Simulated P-value for sparse data.** Another remedy, for the implementation of `chisq.test` in R,
is to let the program simulate a P-value, but we still don't
get a rejection with simulation.

```
# Note: study.site is matched positionally to `y` here, not to `p`
chisq.test(species.1, study.site, simulate.p.value = TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
data: species.1 and study.site
X-squared = 38.333, df = NA, p-value = 0.1644
```
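For intuition, the simulation can be sketched by hand for a goodness-of-fit test with given probabilities: draw multinomial samples under the null and see how often the simulated statistic is at least as large as the observed one. This is a sketch under assumptions (zero-proportion communities dropped; the seed and the names `stat.obs`, `stat.sim` and `p.sim` are mine), not the exact internals of `chisq.test`:

```r
set.seed(2021)  # arbitrary seed, only for reproducibility

study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.1  <- c(0, 38, 0, 6, 94, 2, 0, 0, 12, 6)

# Assumption: drop communities with zero proportion (and zero counts)
keep <- study.site > 0
x <- species.1[keep]
p <- study.site[keep]

E        <- sum(x) * p                 # expected counts under the null
stat.obs <- sum((x - E)^2 / E)         # observed Pearson statistic

B    <- 2000                           # number of simulated samples
sims <- rmultinom(B, size = sum(x), prob = p)  # one sample per column
stat.sim <- colSums((sims - E)^2 / E)  # statistic for each sample

# Monte Carlo P-value, counting the observed sample as one replicate
p.sim <- (1 + sum(stat.sim >= stat.obs)) / (B + 1)
p.sim
```

Passing `simulate.p.value = TRUE` together with `p = ...` to `chisq.test` should give a comparable answer without the manual loop.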

**Somewhat better results with higher counts.** Trying again for Species 6, which has more specimens, we
reject at the 10% level, but not at the 5% level.

```
species.6 = c(24, 78, 0, 0, 7, 2, 5, 0, 19, 242)
# Note: study.site is matched positionally to `y` here, not to `p`
chisq.test(species.6, study.site, simulate.p.value = TRUE)
Pearson's Chi-squared test
with simulated p-value
(based on 2000 replicates)
data: species.6 and study.site
X-squared = 54.444, df = NA, p-value = 0.07696
```
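One thing worth checking before a goodness-of-fit test for Species 6: a community with a proportion of `0.00` cannot contain specimens under the null, so any count there conflicts with the rounded proportions. A small sketch (the name `conflict` is mine):

```r
study.site <- c(0.22, 0.40, 0.01, 0.01, 0.03, 0.01, 0.00, 0.00, 0.07, 0.25)
species.6  <- c(24, 78, 0, 0, 7, 2, 5, 0, 19, 242)

# Communities with zero null probability but a positive observed count
conflict <- which(study.site == 0 & species.6 > 0)
conflict              # index of the offending community, if any
species.6[conflict]   # how many specimens were observed there
```

If `conflict` is not empty, unrounded proportions would be needed for those communities before the test with `p` is meaningful.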

Correct answer by BruceET on March 2, 2021
