TransWikia.com

Scalar input to neural network whose existence is conditional

Data Science Asked by mimo on April 13, 2021

I am developing a neural network to decide whether comments posted on a blog-type website are appropriate (to reject spam, poorly written comments, etc.). I use Keras with TensorFlow and define a number of scalar features, such as message length, number of words, and the fraction of words that appear in an English dictionary. Each feature is normalized by an appropriate quantity. These numbers are the input to a neural network defined as follows:

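from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# n_inputs, hidden_size and num_classes are set elsewhere; f1_score is a
# custom metric function defined by me (not shown here).
inp = Input(shape=(n_inputs,))
hidden_1 = Dense(hidden_size, activation='relu')(inp)
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1)
out = Dense(num_classes, activation='softmax')(hidden_2)
model = Model(inputs=inp, outputs=out)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=[f1_score])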
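For concreteness, the scalar features described above might be computed along the lines of the following sketch. The tokenizer regex, the toy dictionary and the normalization constants `MAX_CHARS` and `MAX_WORDS` are all assumptions, since the question does not say how these features or their normalization were actually implemented.

```python
import re

# Hypothetical normalization constants: the question only says the features
# are "normalized by appropriate quantities", so these values are placeholders.
MAX_CHARS = 2000
MAX_WORDS = 400


def comment_features(text: str, english_words: set) -> list:
    """Return [length, word count, fraction of dictionary words], each in [0, 1]."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    in_dict = sum(1 for w in words if w in english_words)
    frac_english = in_dict / n_words if n_words else 0.0
    return [
        min(len(text) / MAX_CHARS, 1.0),   # clipped message length
        min(n_words / MAX_WORDS, 1.0),     # clipped word count
        frac_english,                      # already a fraction in [0, 1]
    ]


# Usage with a toy dictionary (a real one would hold many thousands of words).
english = {"great", "post", "thanks", "buy", "cheap", "now"}
print(comment_features("Buy cheap pills now!!!", english))
```

Each call yields one row of the `n_inputs`-wide input matrix fed to the network.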

I would like to add some inputs that do not always exist. For example, a comment may follow an earlier comment by the same user; in that case the delay since that previous comment is a relevant quantity, because a spammer may post several comments in quick succession.
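A minimal sketch of how such a conditional feature arises: the delay is only defined when the same user has an earlier comment, and is `None` otherwise. The `(user_id, posted_at)` representation and the function name `delays_since_previous` are hypothetical; in practice the previous comment would likely be looked up in the site's database.

```python
from datetime import datetime


def delays_since_previous(comments):
    """Given (user_id, posted_at) pairs, return for each comment the number of
    seconds since the same user's previous comment, or None when it is that
    user's first comment (the case where the feature does not exist)."""
    result = [None] * len(comments)
    last_seen = {}  # user_id -> timestamp of that user's latest comment so far
    # Visit comments in chronological order so "previous" is well defined,
    # but write each result back at the comment's original position.
    for i in sorted(range(len(comments)), key=lambda k: comments[k][1]):
        user, posted_at = comments[i]
        if user in last_seen:
            result[i] = (posted_at - last_seen[user]).total_seconds()
        last_seen[user] = posted_at
    return result


comments = [
    ("alice", datetime(2021, 4, 13, 10, 0, 0)),
    ("bob", datetime(2021, 4, 13, 10, 0, 5)),
    ("alice", datetime(2021, 4, 13, 10, 0, 30)),
]
print(delays_since_previous(comments))  # [None, None, 30.0]
```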

  1. How can I use this quantity when it does not always exist?
  2. Would it be good to use two inputs: a binary flag set when the post is not the user's first one, and a second number giving the delay since the previous comment, filled with an arbitrary value (0, 1, 0.5, or the mean delay over my whole dataset?) when the post is the user's first?
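The two-input encoding proposed in question 2 can be sketched as follows, assuming the delays arrive as a list with `None` for first comments. The function name `encode_delay` is hypothetical, and the mean is used as the default fill value only as one of the options the question lists.

```python
import numpy as np


def encode_delay(delays, fill_value=None):
    """Encode optional delays (None = user's first comment) as two columns,
    [has_previous, delay], replacing each missing delay with fill_value.

    If fill_value is None it defaults to the mean of the observed delays.
    Compute it once on the training data and pass the same value at
    prediction time so both are encoded identically.
    """
    has_previous = np.array([d is not None for d in delays], dtype=np.float32)
    observed = [d for d in delays if d is not None]
    if fill_value is None:
        fill_value = float(np.mean(observed)) if observed else 0.0
    delay = np.array([fill_value if d is None else d for d in delays], dtype=np.float32)
    return np.stack([has_previous, delay], axis=1), fill_value


X_delay, train_fill = encode_delay([None, 30.0, 90.0])
# train_fill == 60.0; X_delay rows are [0, 60], [1, 30], [1, 90]
```

The two columns would be appended to the existing scalar features (e.g. with `np.hstack`), so `n_inputs` grows by 2. Because the flag is an input, the first `Dense` layer can in principle learn a weight on it that offsets whatever constant fills the delay for first posts, so the exact fill value matters less than using it consistently. The delay should still be normalized like the other features; a log transform is one common choice for heavy-tailed durations, but that is an assumption here, not something the question specifies.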

Edit

I think this is related to the problem of missing data; see this thread.
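Framed as missing data, a common recipe is mean imputation plus a missing-indicator column for each feature that can be absent. scikit-learn offers this as `SimpleImputer(strategy="mean", add_indicator=True)`; the NumPy sketch below shows roughly what that does, with `NaN` marking a missing value. The class name is hypothetical, and unlike scikit-learn it does not handle a column that is entirely missing during `fit`. It generalizes the single-delay case to several conditional features and keeps the fill values learned on the training set separate from their later use.

```python
import numpy as np


class MeanImputerWithIndicator:
    """Mean imputation plus one 0/1 indicator column for every feature that
    had missing (NaN) values during fit."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.means_ = np.nanmean(X, axis=0)                       # per-feature mean, ignoring NaN
        self.missing_cols_ = np.where(np.isnan(X).any(axis=0))[0]  # features that can be absent
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        mask = np.isnan(X)
        filled = np.where(mask, self.means_, X)                    # replace NaN with training mean
        indicators = mask[:, self.missing_cols_].astype(float)     # 1.0 where the value was missing
        return np.hstack([filled, indicators])


X_train = np.array([
    [0.10, np.nan],  # e.g. [normalized length, delay]; a first comment has no delay
    [0.30, 30.0],
    [0.50, 90.0],
])
imputer = MeanImputerWithIndicator().fit(X_train)
X_train_ready = imputer.transform(X_train)
```

The same fitted `imputer` would then be applied to validation data and to new comments at prediction time.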
