Data Science Asked by mimo on April 13, 2021
I am developing a neural network to determine whether comments posted on a blog-type website are appropriate or not (to reject spam, poorly written comments, etc.). I use Keras with TensorFlow, feeding it a number of scalar features such as message length, number of words, and the fraction of words found in an English dictionary. Each feature is normalized by an appropriate quantity. These numbers are the input to a neural network defined as follows:
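from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# n_inputs, hidden_size, num_classes and the custom f1_score metric are defined elsewhere
inp = Input(shape=(n_inputs,))
hidden_1 = Dense(hidden_size, activation='relu')(inp)
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1)
out = Dense(num_classes, activation='softmax')(hidden_2)
model = Model(inputs=inp, outputs=out)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=[f1_score])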
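For context, the scalar features described above could be computed roughly as follows. This is only a minimal sketch: the tiny word list, the normalization caps (MAX_CHARS, MAX_WORDS) and the function name comment_features are illustrative assumptions, not the actual pipeline.

```python
# Hypothetical stand-in for a real English dictionary (assumption).
ENGLISH_WORDS = {"this", "is", "a", "great", "post", "thanks"}

# Assumed caps used to scale each feature into [0, 1].
MAX_CHARS = 1000
MAX_WORDS = 200


def comment_features(text):
    """Return [length, word count, dictionary fraction], each in [0, 1]."""
    words = text.lower().split()
    n_words = len(words)
    # Strip common punctuation before the dictionary lookup.
    in_dict = sum(1 for w in words if w.strip(".,!?;:") in ENGLISH_WORDS)
    return [
        min(len(text) / MAX_CHARS, 1.0),
        min(n_words / MAX_WORDS, 1.0),
        in_dict / n_words if n_words else 0.0,
    ]


features = comment_features("This is a great post, thanks!")
```

Each comment then becomes one fixed-length row, and n_inputs is simply the length of that row.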
I would like to add some inputs that do not always exist. For example, a comment may be posted after another comment from the same user. In that case, the delay between the comment and the previous one would be a relevant quantity to consider, since a user may be spamming by quickly sending several comments in a row.
Edit
I think this is related to the problem of missing data; see this thread.
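One common way to treat a feature that does not always exist is to pair an imputed value with a binary "is present" indicator, so the network can learn to ignore the placeholder when the indicator is 0. The sketch below illustrates that idea for the delay feature. The function names, the one-hour scale and the example numbers are all hypothetical, and this is not necessarily the approach discussed in the linked thread.

```python
from typing import List, Optional, Tuple


def encode_optional_delay(delay_seconds: Optional[float],
                          scale: float = 3600.0) -> Tuple[float, float]:
    """Return (normalized_delay, has_previous_comment).

    scale is an assumed normalization constant (one hour).
    """
    if delay_seconds is None:
        # No previous comment: placeholder value, indicator set to 0.
        return 0.0, 0.0
    # Clip so that very long delays saturate at 1.0.
    return min(delay_seconds / scale, 1.0), 1.0


def build_feature_row(base_features: List[float],
                      delay_seconds: Optional[float]) -> List[float]:
    """Append the two delay-related inputs to the always-present features."""
    value, present = encode_optional_delay(delay_seconds)
    return list(base_features) + [value, present]


rows = [
    build_feature_row([0.4, 0.1, 0.9], 30.0),  # posted 30 s after a previous comment
    build_feature_row([0.7, 0.3, 0.5], None),  # no previous comment from this user
]
n_inputs = len(rows[0])  # would be passed to Input(shape=(n_inputs,))
```

With this encoding the input layer keeps a fixed size: the optional quantity costs two inputs instead of one, and the indicator distinguishes "no previous comment" from a genuinely short delay.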