Bioinformatics Asked on March 7, 2021
I am struggling with a random forest analysis and would be thankful for help with the code.
I have root, soil, and leaf samples from two regions (bau and mau), collected in two seasons (Wet and Dry).
I would like to run a random forest analysis at the genus or family level to identify the taxa that contribute to the differences, for example among root samples by region as well as by season.
Here is my code, but I am getting an error.
library(randomForest)
library(knitr)
#### RANDOM FOREST ANALYSIS #####
#### Prepare data ####
#Load OTU table
OTU_table=t(read.table("asv.table.txt", row.names=1,sep="\t", header=T, blank.lines.skip=F, check.names=F))
table(apply(OTU_table,1,sum)) #verify rarefaction
#Load metadata
Meta=read.table("metadata.txt", header=T, row.names=1, stringsAsFactors=F, na.strings="NA",check.names=FALSE)
Meta$sampleid=rownames(Meta)
#Load taxonomy
taxo=read.table("taxonomy.txt", row.names=1, sep="\t", header=F ,stringsAsFactors=F,quote="")
rownames(taxo)=paste("a.",row.names(taxo),sep="")
#### Run models ####
#1. Root only
# 1.1. both region
# 1.2. bau only
# 1.3. mau only
#2. Soil only
# 2.1. both region
# 2.2. bau only
# 2.3. mau only
#Params RF
NTREE=1000 # Number of Trees
NbVar=1000 # Number of variables tested at each split
#### Root ONLY 1-3 ####
# 1. BOTH Region
#Subset of data
RootSamples=as.character(Meta[Meta$Compartment=="Root","sampleid"])
Root_OTU_table=OTU_table[RootSamples,]
#Model with microbiome based on Season, region
whole_root_pred=data.frame(Season=Meta[RootSamples,"Season"],Region=Meta[RootSamples,"Region"],a=Root_OTU_table)
head(whole_root_pred)
# Season Region a.d2ec9f3b77975c0f457e4b7413b217ff
# a.3147790f0d5a78316fb9dd64f53b9473 a.97aecc1f35cc1f50db507ad71dd22367
# a.bfad6370d28182cc6304844e9bec7fb6 a.5fa2a987221a1d9ca416148570c18086
**RF_model_Root_all=randomForest(y=?,sampsize=c(143,143),strata=?,x=whole_Root_pred,importance = T,proximity = T,ntree =
NTREE,mtry = NbVar)**
print(RF_model_Root_all)
#plot summary using the 5% most important OTUs ERROR ON LAST LINE
imp=data.frame(importance(RF_model_Root_all))
imp$genus=as.character(taxo[rownames(imp),"Genus"])
Best=imp[imp$MeanDecreaseAccuracy>quantile(x = imp$MeanDecreaseAccuracy,.95),]
bymedian <- with(Best, reorder(genus, -MeanDecreaseAccuracy, median))
pdf(width = 20,height = 10,file=paste(pathforplots,"Variable_Importance_Root_BothRegion_raref.pdf",sep=""))
par(mar=c(15,5,1,1))
boxplot(Best$MeanDecreaseAccuracy ~ bymedian, data = Best,
xlab = "", ylab = "Variable Importance",
main = paste("Root in Both Countries; Error Rate=",round(RF_model_Feces_all$err.rate[NTREE,"OOB"],3),sep=""), varwidth = TRUE,
col = "lightgray",las=2)
dev.off()
Many thanks
This question will be a little hard to answer without more information.
For example, we will need to see your dataset (whole_root_pred) to decide why Stunting_Root is NULL.
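One common way to end up with NULL in R is to ask a data frame for a column it does not contain. The sketch below illustrates this with a small made-up stand-in for whole_root_pred (the four rows are invented; only the column names Season and Region come from the question). It is a way to check what is actually in the data frame, not a diagnosis of the real data.

```r
# Hypothetical stand-in for whole_root_pred; the values are made up.
whole_root_pred <- data.frame(
  Season = c("Wet", "Dry", "Wet", "Dry"),
  Region = c("bau", "bau", "mau", "mau"),
  stringsAsFactors = FALSE
)

# `$` on a column that is not in the data frame returns NULL and raises no error.
missing_col <- whole_root_pred$Stunting_Root
print(is.null(missing_col))

# Checks worth running before fitting a model:
has_season <- "Season" %in% names(whole_root_pred)          # column is present
has_column <- "Stunting_Root" %in% names(whole_root_pred)   # column is absent
print(c(Season = has_season, Stunting_Root = has_column))

# A response passed as y= should have one value per sample (row of x).
print(length(whole_root_pred$Season) == nrow(whole_root_pred))
```

If the intended response is not listed in names(whole_root_pred), it has to be created or joined in from the metadata before it can be passed to randomForest.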
Some questions:
What is Stunting_Root as a variable? It is currently not clear whether it is, for example, a column of your data frame or just uninitialized. In R, asking a data frame for a column that does not exist returns NULL, which would explain your problem. randomForest might not know to look for strata inside your data frame, for example. Is it in your data frame?
Why are you using ? as a response? I'm not an expert, but I believe that is an illegal character in R.
Answered by Maximilian Press on March 7, 2021
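To make the points above concrete, here is a minimal sketch of a stratified randomForest call in which y and strata are explicit factor vectors rather than placeholders. Everything in the data is an assumption: the counts are synthetic, the sample and ASV names are invented, and Season is used as the response only for illustration (Region could be used the same way). Note that x holds only the ASV columns; putting Season or Region into x while predicting them would leak the answer into the predictors.

```r
library(randomForest)

set.seed(1)

# Synthetic stand-in for the root samples: 40 samples x 50 ASVs (made-up counts).
n_samples <- 40
n_asv <- 50
Root_OTU_table <- matrix(
  rpois(n_samples * n_asv, lambda = 20),
  nrow = n_samples,
  dimnames = list(paste0("S", seq_len(n_samples)),
                  paste0("a.asv", seq_len(n_asv)))
)
Meta <- data.frame(
  Season = rep(c("Wet", "Dry"), each = n_samples / 2),
  Region = rep(c("bau", "mau"), times = n_samples / 2),
  row.names = rownames(Root_OTU_table)
)

# Give the first ASV a seasonal signal so the model has something to find.
wet <- Meta$Season == "Wet"
Root_OTU_table[wet, 1] <- Root_OTU_table[wet, 1] + 40

# The response must be a factor for classification, aligned to the rows of x.
y_season <- factor(Meta[rownames(Root_OTU_table), "Season"])

# One sampsize entry per stratum, no larger than the smallest class.
n_per_class <- min(table(y_season))

RF_model_Root_all <- randomForest(
  x = Root_OTU_table,                        # predictors only (ASV counts)
  y = y_season,                              # response vector, not a column name
  strata = y_season,                         # stratify the bootstrap by class
  sampsize = rep(n_per_class, nlevels(y_season)),
  ntree = 500,
  mtry = floor(sqrt(ncol(Root_OTU_table))),  # must not exceed ncol(x)
  importance = TRUE,
  proximity = TRUE
)
print(RF_model_Root_all)

# Top 5% of ASVs by mean decrease in accuracy, plus the final OOB error rate.
imp <- data.frame(importance(RF_model_Root_all))
Best <- imp[imp$MeanDecreaseAccuracy > quantile(imp$MeanDecreaseAccuracy, 0.95), ]
oob_error <- RF_model_Root_all$err.rate[RF_model_Root_all$ntree, "OOB"]
print(Best)
```

With the real data, the same pattern would apply to the question's Root_OTU_table and Meta objects. Two details in the posted script are also worth checking against this sketch: the data frame is created as whole_root_pred but passed as whole_Root_pred (R names are case-sensitive), and the plot title reads the error rate from RF_model_Feces_all, which is never defined in the code shown.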