Using Keras in R to train a neural network: my model has very low accuracy but the prediction is good, and I don't know why

Question

I used the classic MNIST dataset, which has 784 pixel columns and 1 label column (digits 0 to 9), and I want to map each image to its corresponding seven-segment display representation. The following is my code.

    # Convert the labels (digits) in train_set and test_set to seven-segment display
    train_set$a <- ifelse(train_set$V1 %in% c(0,2,3,5,6,7,8,9),1,0)
    train_set$b <- ifelse(train_set$V1 %in% c(0,1,2,3,4,7,8,9),1,0)
    train_set$c <- ifelse(train_set$V1 %in% c(0,1,3,4,5,6,7,8,9),1,0)
    train_set$d <- ifelse(train_set$V1 %in% c(0,2,3,5,6,8),1,0)
    train_set$e <- ifelse(train_set$V1 %in% c(0,2,6,8),1,0)
    train_set$f <- ifelse(train_set$V1 %in% c(0,4,5,6,8,9),1,0)
    train_set$g <- ifelse(train_set$V1 %in% c(2,3,4,5,6,8,9),1,0)

    test_set$a <- ifelse(test_set$V1 %in% c(0,2,3,5,6,7,8,9),1,0)
    test_set$b <- ifelse(test_set$V1 %in% c(0,1,2,3,4,7,8,9),1,0)
    test_set$c <- ifelse(test_set$V1 %in% c(0,1,3,4,5,6,7,8,9),1,0)
    test_set$d <- ifelse(test_set$V1 %in% c(0,2,3,5,6,8),1,0)
    test_set$e <- ifelse(test_set$V1 %in% c(0,2,6,8),1,0)
    test_set$f <- ifelse(test_set$V1 %in% c(0,4,5,6,8,9),1,0)
    test_set$g <- ifelse(test_set$V1 %in% c(2,3,4,5,6,8,9),1,0)

    # Split the given train data to train_x and train_y
    # Reshaping the training pixels and labels data to arrays
    train_x <- as.matrix(train_set[, 2:785])
    train_x <- array_reshape(train_x, c(nrow(train_x), 784))
    train_y <- as.matrix(train_set[, 786:792])
    train_y <- array_reshape(train_y, c(nrow(train_y), 7))

    # Split the given test data to test_x and test_y
    # Reshaping the testing pixels and labels data to arrays
    test_x <- as.matrix(test_set[, 2:785])
    test_x <- array_reshape(test_x, c(nrow(test_x), 784))
    test_y <- as.matrix(test_set[, 786:792])
    test_y <- array_reshape(test_y, c(nrow(test_y), 7))

    # Normalize inputs from 0-255 to 0-1
    train_x <- train_x / 255
    test_x <- test_x / 255

    # Build the Model
    image_size <- 784  # 28*28
    num_classes <- 7   # 7 segment display of the digits

    model <- keras_model_sequential()
    model %>%
      # Hidden Layers
      layer_dense(units = 512, activation = 'relu', input_shape = c(image_size)) %>%
      layer_dropout(rate = 0.25) %>%
      layer_dense(units = 256, activation = 'relu') %>%
      layer_dropout(rate = 0.5) %>%
      # Output Layer
      layer_dense(units = num_classes, activation = 'sigmoid')

    # Summary of the model
    summary(model)

    # Compile the neural network
    model %>% compile(
      loss = 'binary_crossentropy',
      optimizer = 'adam',
      metrics = c('accuracy')
    )

    # Modeling on Training Dataset
    model %>% fit(
      train_x, train_y,
      epochs = 5,
      batch_size = 128,
      validation_data = list(test_x, test_y)
    )

    # Prediction
    pred <- predict_proba(model, test_x)
    pred <- round(as.data.frame(pred))
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==1 & pred$V4==1 & pred$V5==1 & pred$V6==1 & pred$V7==0,0,NA)
    test_set$predict <- ifelse(pred$V1==0 & pred$V2==1 & pred$V3==1 & pred$V4==0 & pred$V5==0 & pred$V6==0 & pred$V7==0,1,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==0 & pred$V4==1 & pred$V5==1 & pred$V6==0 & pred$V7==1,2,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==1 & pred$V4==1 & pred$V5==0 & pred$V6==0 & pred$V7==1,3,test_set$predict)
    test_set$predict <- ifelse(pred$V1==0 & pred$V2==1 & pred$V3==1 & pred$V4==0 & pred$V5==0 & pred$V6==1 & pred$V7==1,4,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==0 & pred$V3==1 & pred$V4==1 & pred$V5==0 & pred$V6==1 & pred$V7==1,5,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==0 & pred$V3==1 & pred$V4==1 & pred$V5==1 & pred$V6==1 & pred$V7==1,6,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==1 & pred$V4==0 & pred$V5==0 & pred$V6==0 & pred$V7==0,7,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==1 & pred$V4==1 & pred$V5==1 & pred$V6==1 & pred$V7==1,8,test_set$predict)
    test_set$predict <- ifelse(pred$V1==1 & pred$V2==1 & pred$V3==1 & pred$V4==0 & pred$V5==0 & pred$V6==1 & pred$V7==1,9,test_set$predict)

    confusionMatrix(factor(test_set$predict), factor(test_set$V1))

During training, the model only ever reports around 20% to 30% accuracy. However, when I use the model to predict and transform the outputs back into digit labels, the accuracy is quite good, roughly 85% every time. I don't know which part of my model is wrong. Can someone help me out? Much appreciated!

The accuracy of my model: [image]

The accuracy of my prediction: [image]

The dataset can be downloaded here: https://www.kaggle.com/zalando-research/fashionmnist

Here is the seven-segment display chart: [image]
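As a side note, the repeated ifelse() calls above can be condensed into one lookup table. The following is a minimal sketch in base R, not the original poster's code: segment_codes, encode_segments, and decode_segments are hypothetical names, and the bit patterns are copied from the rules in the question.

```r
# Seven-segment codes for digits 0-9; columns are segments a-g.
# Bit patterns are copied from the ifelse() rules in the question.
segment_codes <- matrix(
  c(1, 1, 1, 1, 1, 1, 0,   # 0
    0, 1, 1, 0, 0, 0, 0,   # 1
    1, 1, 0, 1, 1, 0, 1,   # 2
    1, 1, 1, 1, 0, 0, 1,   # 3
    0, 1, 1, 0, 0, 1, 1,   # 4
    1, 0, 1, 1, 0, 1, 1,   # 5
    1, 0, 1, 1, 1, 1, 1,   # 6
    1, 1, 1, 0, 0, 0, 0,   # 7
    1, 1, 1, 1, 1, 1, 1,   # 8
    1, 1, 1, 0, 0, 1, 1),  # 9
  nrow = 10, byrow = TRUE,
  dimnames = list(as.character(0:9), letters[1:7])
)

# Hypothetical helper: digits (0-9) -> n x 7 matrix of segment bits.
encode_segments <- function(digits) {
  segment_codes[as.character(digits), , drop = FALSE]
}

# Hypothetical helper: n x 7 matrix of 0/1 bits -> digits.
# Rows that match no digit pattern come back as NA.
decode_segments <- function(bits) {
  keys <- apply(segment_codes, 1, paste, collapse = "")
  observed <- apply(bits, 1, paste, collapse = "")
  as.integer(names(keys))[match(observed, keys)]
}
```

Under these assumptions, encode_segments(train_set$V1) would stand in for the seven label columns, and decode_segments(round(pred)) for the chain of ifelse() calls in the prediction step.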

Burger · Answer

It looks like you are mixing two different accuracy concepts, hence the difference in values:

Your network is currently set up to predict a value between 0 and 1 for each of the 7 outputs (activation = 'sigmoid'), so a single prediction might look like [0.9 0.4 0.3 ... 0.2]. If you use this set-up with the metric 'accuracy', Keras will infer that you want to calculate binary accuracy, which scores each output value on its own against a 0.5 threshold. This is not the same as categorical accuracy, which checks whether the single predicted class of a sample matches its label.

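When you run the prediction part, you are in effect measuring categorical accuracy: you decode the seven outputs back into one digit per image and compare that digit with the true label.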
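To make the difference concrete, here is a small base-R sketch with made-up numbers (four samples, seven sigmoid outputs each). It is only an illustration of the two ways of counting, not a reproduction of what Keras computes internally; all variable names are hypothetical.

```r
# Toy targets: seven-segment codes for digits 0, 1, 7 and 8.
y_true <- rbind(
  c(1, 1, 1, 1, 1, 1, 0),
  c(0, 1, 1, 0, 0, 0, 0),
  c(1, 1, 1, 0, 0, 0, 0),
  c(1, 1, 1, 1, 1, 1, 1)
)

# Made-up sigmoid outputs for the same four samples.
y_prob <- rbind(
  c(0.9, 0.8, 0.9, 0.7, 0.6, 0.9, 0.2),  # all seven segments right
  c(0.1, 0.9, 0.8, 0.2, 0.1, 0.3, 0.6),  # segment g wrong
  c(0.8, 0.9, 0.7, 0.1, 0.2, 0.6, 0.1),  # segment f wrong
  c(0.9, 0.9, 0.9, 0.8, 0.7, 0.9, 0.9)   # all seven segments right
)

# Threshold each output at 0.5, as round() does in the question.
y_pred <- ifelse(y_prob > 0.5, 1, 0)

# Per-output (binary) accuracy: share of individual segment bits that match.
binary_accuracy <- mean(y_pred == y_true)

# Per-sample accuracy: a sample counts only if all seven bits match,
# which is what decoding to a digit and comparing labels measures.
exact_match_accuracy <- mean(rowSums(y_pred == y_true) == ncol(y_true))

print(c(binary = binary_accuracy, exact_match = exact_match_accuracy))
```

Two wrong bits out of 28 barely move the per-output figure, yet they cost two of the four samples in the per-sample figure, so the two numbers are not comparable.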

To fix your problem, change the following:

- Set the final layer's activation function to 'softmax'.
- Use the 'categorical_crossentropy' loss function.

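In this case, Keras should recognize that you want to measure categorical accuracy. Note that softmax with categorical cross-entropy expects one-hot targets (exactly one class per sample), so this set-up applies when the model predicts the 10 digit classes rather than the 7 segment bits.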
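The two changes above can be sketched roughly as follows. This is a hedged sketch, not a tested drop-in replacement: it assumes the R keras package used in the question, it assumes the model predicts the 10 digit classes directly, and one_hot_digits and build_digit_classifier are hypothetical helper names. The layer sizes and dropout rates are copied from the question.

```r
# Base-R stand-in for one-hot encoding: digits 0-9 -> n x 10 matrix
# with a single 1 per row (keras::to_categorical() does the same job).
one_hot_digits <- function(digits, num_classes = 10) {
  out <- matrix(0, nrow = length(digits), ncol = num_classes)
  out[cbind(seq_along(digits), digits + 1)] <- 1
  out
}

# Hypothetical builder. Nothing from keras is evaluated until the
# function is called, so defining it does not require keras to be loaded.
build_digit_classifier <- function(image_size = 784, num_classes = 10) {
  model <- keras::keras_model_sequential()
  model <- keras::layer_dense(model, units = 512, activation = "relu",
                              input_shape = c(image_size))
  model <- keras::layer_dropout(model, rate = 0.25)
  model <- keras::layer_dense(model, units = 256, activation = "relu")
  model <- keras::layer_dropout(model, rate = 0.5)
  # Softmax output: one probability per digit class, summing to 1.
  model <- keras::layer_dense(model, units = num_classes,
                              activation = "softmax")
  keras::compile(model,
                 loss = "categorical_crossentropy",
                 optimizer = "adam",
                 metrics = c("accuracy"))
  model
}
```

With this set-up, the targets passed to fit() would be one_hot_digits(train_set$V1) instead of the seven segment columns, and the predicted digit is the position of the largest output minus one.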
