Optimizing CNN network

Data Science Asked by Carlton Banks on April 3, 2021

I am currently trying to recreate the results of this paper, in which they do feature extraction from a spectrogram of log mel-filterbank energies.


Since the paper doesn’t state what kind of features I am seeking, I am currently trying to extract features and match them to MFCC features. The paper describes a technique called LWS (limited weight sharing), in which the spectrogram’s frequency axis is divided into sections, and each section does not share its weights with the others.

So I’ve divided my input image into 13 sections to get one output feature from each (6, 3, 3) input image: 6 is the number of rows, 3 because each column represents the [static, delta, delta_delta] data of the given log mel-filterbank energy, and the last 3 is the number of color channels.

If I had used 13 filterbanks and made the plot, each (1, 3, 3) matrix would result in one feature, but that seemed a bit too good to be true, so I decided to use 78 filterbanks and divide them into 13 sections, so that one feature can be extracted from each matrix of size (6, 3, 3).
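
For reference, here is a minimal NumPy sketch of that sectioning, assuming the input is stored as a (78, 3, 3) array with static/delta/delta_delta columns (the array name and random data are just an example):

import numpy as np

# Hypothetical log mel-filterbank "image": 78 bands x 3 columns x 3 channels
spectrogram = np.random.rand(78, 3, 3)

# Split the frequency axis into 13 non-overlapping sections of 6 bands each,
# so every section is a (6, 3, 3) patch that should yield one feature.
sections = [spectrogram[i * 6:(i + 1) * 6] for i in range(13)]
assert all(section.shape == (6, 3, 3) for section in sections)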

I am training the network with this model structure:

from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Flatten, Dense


def create_model(init_mode='normal', activation_mode='softsign',
                 optimizer_mode='Adamax', activation_mode_conv='softsign'):
    model = Sequential()

    # Zero-pad the small (6, 3, 3) input so the 3x3 convolutions have room to operate
    model.add(ZeroPadding2D((6, 4), input_shape=(6, 3, 3)))
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    print(model.output_shape)
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    model.add(Flatten())
    print(model.output_shape)
    # Fully connected layers down to the single regression output
    model.add(Dense(output_dim=32, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=13, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, init=init_mode, activation=activation_mode))
    # print(model.summary())
    model.compile(loss='mean_squared_error', optimizer=optimizer_mode)

    return model

For some reason, this model keeps giving me very bad results.
I keep getting a loss of 216, which is nearly 3 times the data range.

I did a grid search to find out which parameters (activation function, init_mode, epochs and batch_size) would be best; those are the defaults chosen in the function above (even though there wasn't much change in the outcome).
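
The question doesn't show the search itself; below is a minimal sketch of how such a grid search is typically wired up with scikit-learn's GridSearchCV and the KerasRegressor wrapper. All grid values are hypothetical, and X_train / y_train stand in for the training data:

from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

# Wrap create_model so scikit-learn can cross-validate its hyperparameters
estimator = KerasRegressor(build_fn=create_model, verbose=0)
param_grid = {
    'activation_mode': ['softsign', 'relu', 'tanh'],
    'init_mode': ['normal', 'uniform'],
    'batch_size': [16, 32],
    'nb_epoch': [50, 100],   # 'epochs' in Keras 2
}
grid = GridSearchCV(estimator=estimator, param_grid=param_grid)
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_params_)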

What can I do to get better results?
Is the CNN network poorly designed?

2 Answers

There are some suggestions which I think can improve the CNN's performance:

  1. The input size is only (6, 3), so it is not preferable to use MaxPooling layers.
  2. Use padding='same' in all the convolution layers, so the convolved output has the same dimensions as the input.
  3. Use ReLU or LeakyReLU as the activation function.
  4. Use the Adam optimizer and tune the learning rate (see the snippet after the example below).
  5. Convert some of the Dense layers to 1x1 convolutions with the number of filters equal to the number of units in the dense layer, as suggested by sh37211.

e.g.

from keras.layers import Input, Convolution2D, Flatten, Dropout, Dense
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Model

no_of_filters = [32, 64, 64, 32]
kernel_size = [3, 3, 3, 1]

input = Input(shape=(6, 3, 3), name="input")
layer_output = [input]
for i in range(4):
    # 'same' padding keeps the small spatial size through every convolution
    convolution = Convolution2D(no_of_filters[i], kernel_size[i], kernel_size[i],
                                border_mode='same')(layer_output[-1])
    activation = LeakyReLU()(convolution)
    layer_output.append(activation)

flatten = Flatten()(layer_output[-1])
flatten_dropout = Dropout(0.5)(flatten)
fc = Dense(output_dim=13)(flatten_dropout)
activation = LeakyReLU()(fc)
output = Dense(output_dim=1, activation='tanh')(activation)

model = Model(input=input, output=output)
model.compile(loss='mean_squared_error', optimizer='adam')
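
For suggestion 4, the learning rate can be tuned by passing an explicit Adam instance instead of the string 'adam' (the value below is only an illustrative starting point, not a recommendation from the answer):

from keras.optimizers import Adam

# Example of an explicitly tuned learning rate (Adam's default is 0.001)
model.compile(loss='mean_squared_error', optimizer=Adam(lr=0.0005))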

Answered by Bhagyesh Vikani on April 3, 2021

A suggestion I'd give is to change the CNN layers. Your convolution layers are all stacked sequentially with the same filter size.

What you can try is changing the filter size, using multiple filter sizes to catch features of different sizes. For this, try this model:

from keras.layers import Input, Convolution1D, MaxPooling1D, Flatten, Dense, merge
from keras.models import Model
from keras.constraints import maxnorm

# input_shape, filter_sizes, no_of_filters and output_shape are placeholders:
# set them to match your data.
main_input = Input(shape=input_shape, name="main_input")
flattened_outputs = []
for i in filter_sizes:
    # One parallel branch per filter size, each catching features of a different width
    conv_filter_i = Convolution1D(no_of_filters, i, border_mode='same',
                                  activation='relu', W_constraint=maxnorm(3))(main_input)
    pooling_i = MaxPooling1D(pool_length=2)(conv_filter_i)
    flattened_i = Flatten()(pooling_i)
    flattened_outputs.append(flattened_i)
merged_conv_outputs = merge(flattened_outputs, mode='concat')
softmax = Dense(output_shape, activation='softmax')(merged_conv_outputs)
model = Model(input=main_input, output=softmax)
model.compile(loss='mean_squared_error', optimizer='adam')

Note: change the dimensions as necessary.

  1. Apart from this, I suggest you use a Dropout layer; it helps a lot in my personal experience.
  2. Also, do use the Adam optimizer.
  3. Experiment with various pooling mechanisms (see the sketch below). Remember that there is a lot of experimentation that has to go on, so try various other parameters and filter sizes too.
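
As one sketch of point 3, the MaxPooling1D step in a branch of the model above can be swapped for average pooling to compare pooling mechanisms (the variable names follow the example above):

from keras.layers import AveragePooling1D

# Drop-in replacement for the MaxPooling1D step in one branch of the model above
pooling_i = AveragePooling1D(pool_length=2)(conv_filter_i)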

Answered by Hima Varsha on April 3, 2021
