Data Science Asked by Taku Charles-Noel Endo on March 8, 2021
I fitted a model with a log10 transformation applied to the dependent variable, and I am having trouble manually calculating the R2 for both the training and test datasets. The model looks like this:
Model <- lm(log10(Total_LT) ~ ThreeComb + Ship_Qtr, data = Train_Data)
Additionally, here is the summary of the model:
Residuals:
     Min       1Q   Median       3Q      Max
-0.47904 -0.09681 -0.00449  0.09272  0.63265
Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         1.178008   0.007786 151.302  < 2e-16 ***
ThreeCombAIR Site A Product C       0.221098   0.042209   5.238 1.85e-07 ***
ThreeCombAIR Site B Product B       0.467222   0.050400   9.270  < 2e-16 ***
ThreeCombAIR Site C Product B      -0.020639   0.013471  -1.532 0.125716
ThreeCombFASTBOAT Site A Product A  0.357324   0.015775  22.652  < 2e-16 ***
ThreeCombFASTBOAT Site A Product C  0.397101   0.015291  25.970  < 2e-16 ***
ThreeCombGROUND Site D Product B   -0.084635   0.010842  -7.806 1.08e-14 ***
ThreeCombOCEAN Site A Product A     0.470911   0.014879  31.648  < 2e-16 ***
ThreeCombOCEAN Site A Product B     0.582689   0.025467  22.880  < 2e-16 ***
ThreeCombOCEAN Site A Product C     0.474703   0.061184   7.759 1.56e-14 ***
ThreeCombOCEAN Site B Product B     0.414655   0.016140  25.691  < 2e-16 ***
Ship_QtrQ2                         -0.039806   0.009264  -4.297 1.84e-05 ***
Ship_QtrQ4                         -0.040277   0.012147  -3.316 0.000935 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1489 on 1535 degrees of freedom
Multiple R-squared: 0.6803, Adjusted R-squared: 0.6778
F-statistic: 272.2 on 12 and 1535 DF, p-value: < 2.2e-16
Now I am trying to test this model by calculating R-squared manually, like this:
Train_Data$Residual <- Model$residuals
Train_R2 <- 1 - (sum((Train_Data$Residual)^2) / sum((Train_Data$Total_LT - mean(Train_Data$Total_LT))^2))
Here is the output that I get for my R2:
[1] 0.9999015
To validate my model, I also calculated the R2 for the test dataset like this:
Test_Data$Predicted <- predict(Model, newdata = Test_Data)
Test_Data$Residual <- Test_Data$Total_LT - Test_Data$Predicted
Test_R2 <- 1 - (sum((Test_Data$Residual)^2)/ sum((Test_Data$Total_LT - mean(Test_Data$Total_LT))^2))
And I get this R2 for the test dataset:
[1] -1.964802
I think this is caused by the log10 transformation that I applied in my model. What can I do to make the R2 for both the training and test datasets close to 0.68, as reported in the model summary?
By the way, I tried the same thing without the log10 transformation and got a very good R2.
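One way to check the scale hypothesis is the minimal sketch below. It runs on synthetic data, so the simulated values, the `r2` helper, and the names `Train_R2`, `Test_R2_Log`, and `Test_R2_Orig` are assumptions for illustration and do not come from the real dataset. The idea is that the residuals of this model and the output of `predict()` are both on the log10 scale, so the observed values in the R2 formula should be on that scale too, or the predictions should be back-transformed first.

```r
# Synthetic stand-in for Train_Data / Test_Data (hypothetical values).
set.seed(42)
n <- 400
dat <- data.frame(
  Ship_Qtr = factor(sample(c("Q1", "Q2", "Q4"), n, replace = TRUE))
)
# Simulate a positive response whose log10 is linear in the predictor.
qtr_effect <- c(Q1 = 0, Q2 = -0.04, Q4 = 0.3)
log_lt <- 1.2 + unname(qtr_effect[as.character(dat$Ship_Qtr)]) + rnorm(n, sd = 0.15)
dat$Total_LT <- 10^log_lt

train_idx  <- sample(n, 300)
Train_Data <- dat[train_idx, ]
Test_Data  <- dat[-train_idx, ]

Model <- lm(log10(Total_LT) ~ Ship_Qtr, data = Train_Data)

# R2 = 1 - SSE / SST, with observed and predicted on the SAME scale.
r2 <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

# Training R2 on the log10 scale (the scale the model was fitted on).
Train_R2 <- r2(log10(Train_Data$Total_LT), fitted(Model))

# Test R2 on the log10 scale: predict() returns log10(Total_LT).
Test_Pred_Log <- predict(Model, newdata = Test_Data)
Test_R2_Log   <- r2(log10(Test_Data$Total_LT), Test_Pred_Log)

# Test R2 on the original scale: back-transform the predictions first.
Test_R2_Orig <- r2(Test_Data$Total_LT, 10^Test_Pred_Log)

print(c(summary   = summary(Model)$r.squared,
        train     = Train_R2,
        test_log  = Test_R2_Log,
        test_orig = Test_R2_Orig))
```

Because the model includes an intercept, the training R2 computed on the log10 scale should agree with `summary(Model)$r.squared`. The original-scale value measures fit to `Total_LT` itself rather than to its logarithm, so it need not match the figure in the summary.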