Data Science: asked by jakes on November 18, 2020
I'm trying to build a CNN for a two-label (multi-label) classification problem. Unfortunately, I can't share my model architecture, but I compiled the model with:
model.compile(optimizer=optimizers.Adam(lr=1e-3),
              loss='binary_crossentropy',
              metrics=['accuracy'])
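For reference, I believe the following would be an explicit equivalent of the compile call above (just a sketch, assuming tf.keras 2.x, where, as far as I understand, the string 'accuracy' combined with binary_crossentropy is resolved to a binary accuracy metric):

from tensorflow.keras import optimizers, metrics

# Same compile call with the metric spelled out explicitly;
# BinaryAccuracy thresholds the sigmoid outputs at 0.5 by default.
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=[metrics.BinaryAccuracy()])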
Once training finished, I plotted the log-loss and accuracy on the training vs. validation dataset. The log-loss looks like this:
[plot: training vs. validation log-loss]
so in my layman's eyes this looks perfectly fine. On the other hand, the accuracy scores (see the plot below; the labels are misleading) are erratic, and Keras reports an accuracy of 62.8% on the training set and 62.5% on the validation set at the end of the last epoch.
[plot: training vs. validation accuracy]
This seems disturbingly low to me, and the strong fluctuations on the validation set are concerning as well. I use a batch size of 32, which may partly explain these fluctuations, but I don't think it accounts for the whole variance (although I may be entirely wrong). However, when I calculated accuracy with sklearn's accuracy_score function on my validation set:
from sklearn.metrics import accuracy_score

# Predicted probabilities for the two labels, thresholded at 0.5
Y_prob = model.predict(X_test)
Y_pred = (Y_prob > 0.5).astype(float)

accuracy_score(Y_test, Y_pred)
I got a value of 0.99811817183879.
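To make the numbers more comparable, this is the kind of cross-check I have in mind (a rough sketch; the element-wise accuracy and the model.evaluate call are my own additions, meant to approximate what Keras reports during training):

from sklearn.metrics import accuracy_score

# Subset (exact-match) accuracy: a sample only counts as correct if
# both labels are predicted correctly; this is what accuracy_score
# computes for a two-column indicator target like mine.
subset_acc = accuracy_score(Y_test, Y_pred)

# Element-wise accuracy: the fraction of individual label predictions
# that are correct, which should be closer to Keras's binary accuracy.
elementwise_acc = (Y_pred == Y_test).mean()

# Keras's own evaluation on the same arrays, to compare directly with
# the number printed at the end of training.
val_loss, val_acc = model.evaluate(X_test, Y_test, verbose=0)

print(subset_acc, elementwise_acc, val_acc)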
The contingency tables for each of the two labels look like this:
import pandas as pd

# Contingency table for the first label, normalized over all cells
pd.crosstab(Y_test[:, 0], Y_pred[:, 0], normalize=True)

col_0       0.0       1.0
row_0
0.0    0.969804  0.000680
1.0    0.000680  0.028837
# Contingency table for the second label
pd.crosstab(Y_test[:, 1], Y_pred[:, 1], normalize=True)

col_0       0.0       1.0
row_0
0.0    0.981739  0.000296
1.0    0.000227  0.017738
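For what it's worth, I believe the same per-label tables could also be produced directly with scikit-learn (a sketch using multilabel_confusion_matrix; the division by the total count mimics normalize=True in crosstab):

from sklearn.metrics import multilabel_confusion_matrix

# One 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]]
cms = multilabel_confusion_matrix(Y_test, Y_pred)

for i, cm in enumerate(cms):
    print(f"label {i}:")
    print(cm / cm.sum())  # normalized over all cells, like the crosstabs above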
I'm totally confused by this discrepancy. Therefore, I'd like to ask: what explains the difference between the accuracy Keras reports during training and the accuracy I get from sklearn's accuracy_score on the same validation data, and which of the two numbers should I trust?