Keras multi-label time-series classification considering time-series as an input image vector

Question

I am trying to build a multi-class classifier using Keras. I am not quite sure I have implemented it correctly.
The data looks like this:

         label  time-series variables [0:25728]

index    0      1     2     3     4     ........  25728
   0     1      2.5   3.2   1.6   1.05  ........  2.54
   1     5      3.2   1.6   1.5   1.49  ........  1.41
   2     1      2.3   3.2   1.5   1.52  ........  2.11
   3     3      0.2   3.1   1.5   1.89  ........  0.81
   4     8      1.2   1.1   0.2   1.19  ........  3.71
   .     5      .     .     .     .     ........  .
   .     7      .     .     .     .     ........  .
1323     5      .     .     .     .     ........  .

Here is the code.
I split the data so that 68 % is used for training, then reshape each 1D series into a 2D array.
Since 384 × 67 = 25728, each sample (which has a single label) becomes a 384 × 67 "image".

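import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, Conv2D, Dense,
                                     GlobalAveragePooling2D, MaxPooling2D)

# `path` (data directory) and `nb_epochs` are assumed to be defined before this point

def readucr(filename):
    # Column 0 holds the label, the remaining columns hold the time series
    data = np.loadtxt(filename, delimiter=',')
    Y = data[:, 0]
    X = data[:, 1:]
    return X, Y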

x_train, a = readucr(path+'p2_TRAIN')
x_test, b = readucr(path+'p2_TEST')
df_train_y = pd.read_csv(path+'p2_TRAIN',header=None)
df_test_y = pd.read_csv(path+'p2_TEST',header=None)

x_train = x_train[:,0:25728]
x_test = x_test[:,0:25728]

# Fit the scaler on the training data only, then apply the same scaling to the test data
scaler = MinMaxScaler(feature_range=(0, 1))
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

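# Reshape each 25728-step series into a 384 x 67 "image" (384 * 67 = 25728)
x_train = x_train.reshape(x_train.shape[0], 384, 67)
x_test = x_test.reshape(x_test.shape[0], 384, 67)

train_label_y = df_train_y[0].values
test_label_y = df_test_y[0].values
# Integer division so batch_size is always an int
batch_size = min(x_train.shape[0] // 10, 10)

# One-hot encode the labels; num_classes=9 matches the Dense(9) output layer and keeps
# the train and test encodings the same width even if a class is missing from one split
y_train = keras.utils.to_categorical(train_label_y, num_classes=9)
y_test = keras.utils.to_categorical(test_label_y, num_classes=9)

# Add a trailing channel axis: (samples, 384, 67) -> (samples, 384, 67, 1)
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.reshape(x_test.shape + (1,))

input_shape = x_train.shape[1:]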
model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3), padding='same',
                 input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(8, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(GlobalAveragePooling2D())
model.add(Dense(9, activation='softmax'))

optimizer = keras.optimizers.Adam()
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=nb_epochs, verbose=1)
# evaluate() returns [loss, accuracy] on the held-out test set
score = model.evaluate(x_test, y_test)
print("Accuracy: %.2f%%" % (score[1] * 100))
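To see what the reshape step in the code above does to the time axis, here is a small NumPy sketch. It uses a hypothetical toy series of 12 steps reshaped to 3 × 4, which is the same operation as 25728 → 384 × 67, just smaller. NumPy's default row-major reshape fills the image one row at a time, so each row holds consecutive time steps, while vertically adjacent pixels are a full row length apart in time.

```python
import numpy as np

# Toy stand-in for one sample: 12 time steps instead of 25728
series = np.arange(12)

# Same operation as x.reshape(384, 67), scaled down to 3 x 4 (row-major / C order)
image = series.reshape(3, 4)
print(image)

# Along a row, neighbouring pixels are consecutive time steps...
row_steps = np.diff(image[0])
# ...while vertically adjacent pixels are a full row (4 steps) apart in time
col_steps = np.diff(image[:, 0])
print(row_steps, col_steps)

# Full-size shape check: two dummy samples, reshaped and given a channel axis
assert 384 * 67 == 25728
full = np.zeros((2, 25728)).reshape(2, 384, 67, 1)
print(full.shape)
```

For the real data this means a 3 × 3 kernel sees 3 consecutive steps horizontally but mixes values 67 steps apart vertically; whether that structure is meaningful depends on the signal.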

It gives 96.16% accuracy, but I don't believe the result.
I want to predict the labels.

How can I predict labels?
What am I doing wrong?

Please help! Thank you.

Thomas Cleberg · Answer

model.predict(X)

returns an array of class probabilities with one row per sample. The discrete predicted class for each sample is the index of the largest probability in its row:

np.argmax(model.predict(X), axis=1)
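A minimal sketch of the row-wise argmax, using a small made-up probability array in place of the output of model.predict (the values are purely illustrative). Without axis=1, np.argmax works on the flattened array and returns a single index rather than one label per sample.

```python
import numpy as np

# Made-up stand-in for model.predict(X): 3 samples x 4 classes, each row sums to 1
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
    [0.60, 0.20, 0.10, 0.10],
])

# Row-wise argmax: one predicted class index per sample
pred = np.argmax(probs, axis=1)
print(pred)   # [1 3 0]

# Without axis, argmax flattens the array and returns a single position
flat = np.argmax(probs)
print(flat)   # 7: an index into the flattened 12-element array, not a label
```

Because the question one-hot encodes the raw integer labels with to_categorical, column k corresponds to label k, so the row-wise argmax can be compared directly with the original label values.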

Accuracy is only one measure of a classifier's fit, and it can be strongly affected by properties of the evaluation set such as class imbalance. Does one class represent a large proportion of your observations? If so, the accuracy figure could be misleading you about the model's performance.
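One quick imbalance check is to compare the reported accuracy against a majority-class baseline: the accuracy obtained by always predicting the most frequent class. The sketch below uses a hypothetical label array; in the question's code, the integer test labels (test_label_y) would play this role. The helper name majority_baseline is made up for illustration.

```python
import numpy as np

def majority_baseline(labels):
    """Return (count per class, accuracy of always predicting the most common class)."""
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels)
    baseline = counts.max() / labels.size
    return counts, baseline

# Hypothetical test labels, heavily dominated by class 5
test_labels = np.array([5, 5, 5, 5, 5, 5, 5, 5, 1, 3])
counts, baseline = majority_baseline(test_labels)
print(counts)     # [0 1 0 1 0 8]
print(baseline)   # 0.8
```

If the baseline on the real test labels is close to the reported 96.16%, that accuracy says little about how well the network separates the minority classes.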

Otherwise, with the predictions gathered as shown above, you can further analyze the performance of your network using confusion matrices, multiclass log loss (cross-entropy) and/or other metrics.
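A sketch of those metrics with scikit-learn, on made-up labels and probabilities. In practice y_true would be the integer test labels and probs the output of model.predict(x_test); the small arrays here are illustrative only.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, log_loss

# Made-up ground truth and predicted probabilities for 3 classes
y_true = np.array([0, 1, 2, 2, 1])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
    [0.1, 0.8, 0.1],
])
y_pred = probs.argmax(axis=1)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Multiclass log loss (cross-entropy) computed from the probabilities
print(log_loss(y_true, probs))

# Per-class precision, recall and F1
print(classification_report(y_true, y_pred))
```

The confusion matrix shows which classes get mixed up (here one class-2 sample is predicted as class 1), which a single accuracy number hides.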
