Bioinformatics Asked on January 25, 2021
I used the below code to do CAP analysis to link microbial data with elements, but not sure how to do statistical analysis to show which element variation is responsible most for the difference in microbial composition. I can only able to see two axis in figure so how I can get complete information about the variation based on these mineral element and individually like which mineral responsible most for microbial diversity?
# CAP ordinate
cap_ord <- ordinate(
physeq = phyloseq_prop,
method = "CAP",
distance = bray_not_na,
formula = ~ ~ P + Mn + Cu +Zn + Rb + Mg )
# CAP plot
cap_plot <- plot_ordination(
physeq =phyloseq_prop ,
ordination = cap_ord,
color = "SampleType",
axes = c(1,2)
) +
aes(shape = Region) +
geom_point(aes(colour = SampleType), alpha = 0.4, size = 4) +
geom_point(colour = "grey90", size = 1.5) +
scale_color_manual(values = c("#CBD588", "#5F7FC7", "orange","#DA5724", "#508578", "#CD9BCD",
"#AD6F3B", "#673770","#D14285","#652926", "#C84248",
"#8569D5")
)
# Now add the environmental variables as arrows
arrowmat <- vegan::scores(cap_ord, display = "bp")
# Add labels, make a data.frame
arrowdf <- data.frame(labels = rownames(arrowmat), arrowmat)
# Define the arrow aesthetic mapping
arrow_map <- aes(xend = CAP1,
yend = CAP2,
x = 0,
y = 0,
shape = NULL,
color = NULL,
label = labels)
label_map <- aes(x = 1.3 * CAP1,
y = 1.3 * CAP2,
shape = NULL,
color = NULL,
label = labels)
arrowhead = arrow(length = unit(0.02, "npc"))
pdf("cap2.pdf", height=8, width=8)
# Make a new graphic
cap_plot +
geom_segment(
mapping = arrow_map,
size = .5,
data = arrowdf,
color = "gray",
arrow = arrowhead
) +
geom_text(
mapping = label_map,
size = 4,
data = arrowdf,
show.legend = FALSE
)
dev.off();
cap_ord
Call: capscale(formula = distance ~ P + Mn + Cu +Zn + Rb + Mg, data = data)
Inertia Proportion Rank
Total 2.069e+01 1.000e+00
Constrained 5.240e+00 2.533e-01 10
Unconstrained 1.545e+01 7.468e-01 97
Imaginary -1.090e-03 -5.268e-05 1
Inertia is squared Bray distance
Eigenvalues for constrained axes:
CAP1 CAP2 CAP3 CAP4 CAP5 CAP6 CAP7 CAP8 CAP9 CAP10
2.5733 0.8789 0.5647 0.3797 0.3037 0.1873 0.1243 0.1030 0.0800 0.0453
Eigenvalues for unconstrained axes:
MDS1 MDS2 MDS3 MDS4 MDS5 MDS6 MDS7 MDS8
4.790 1.343 1.138 0.916 0.784 0.649 0.518 0.501
(Showing 8 of 97 unconstrained eigenvalues)
Many thanks
I don't think that you can say that the 25% can be said to relate to the chemical variable specifically; rather, that 25% is the variance that is explained by all constrained axes. While these axes are drawn in such a way that they maximize variance associated with your independent variables, I do not think that they are directly interpretable as being 100% due to the elements.
The number of constrained axes is a parameter chosen as an input to the CAPscale method:
An extremely important point is to determine how many PCO axes should be retained; i.e., what should be the choice for $m$? One might be concerned that the method could suffer from a poor choice for $m$. If $m$ is too small, then there may be some ecologically important information in the data that will not be included in the canonical analysis. On the other hand, if $m$ is too large relative to $N$, then a misleading canonical plot could result. For example, we cannot set $m = N - 1$, for then the squared canonical correlations would all be equal to 1.0, because we have $N$ points!
So the value of the variance there will vary depending on how many constrained axes you use. If I understand right, you could obtain 100% distance variance explained by choosing as many constrained axes as you have observations.
I think that it is worth looking into how others have solved this problem. For example, one has to first decide whether or not this 25% is an unexpectedly large number, i.e. compute p-values (I am not crazy about p-values in most contexts but here they could serve as a helpful heuristic), as suggested by the CAPscale paper:
As a further option, one may test the hypothesis of either (1) no significant differences in multivariate location among groups (CDA), or (2) no significant relationship with quantitative environmental or other variables (CCorA). This is done by using the trace statistic (sum of canonical eigenvalues (sum of squared canonical correlations; see the Appendix) and obtaining a P value by permutation (e.g., Anderson 2001b).
This seems to be directly implemented as the anova.cca()
function in the vegan package (I believe that they have decided that CAP is a flavor of CCA for that package). I understand the motivation to get % variance as it's intuitive, but in complex statistical analysis of distances it is a little bit less clear what it really means. Some kind of comparison to a null hypothesis is probably necessary to say whether it is meaningful.
For how others have approached similar questions you can see this discussion. For other implementations that may be more user-friendly, you might consider the vegan package, which is a richer toolbox for ordination in ecology (even if it is less focused on metagenomics). Here is the CAPscale function in vegan. The phyloseq authors have correctly decided to build large pieces of phyloseq out of vegan, so I think that you can pass phyloseq back into vegan functions and e.g. use those analyses directly on your cap_ord
object. But see that link for more info.
For a little more easy-to-understand background on statistics of distance methods (particularly in microbiomes), you might see here.
Correct answer by Maximilian Press on January 25, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP