
Image recognition of selfie images

Data Science, asked on April 12, 2021

I developed an Android app that lets anyone upload pictures of encyclopedic things (bridges, museums, dishes, landscapes, paintings, etc) to Wikimedia Commons.

Unfortunately, 5% of the users find it funny to upload their own selfie. So I want to programmatically check whether the picture is a selfie or not, and if it probably is, warn them that selfies are off-topic.

As a data set, I have:

  • 1000 pictures that I consider undesirable selfies. It is in part subjective, but such pictures usually show one or two human faces taken at arm's length, against random backgrounds.
  • 1000 pictures that are not selfies (bridges, museums, dishes, etc., anything really). Tricky: this set also includes pictures of famous people; they are usually easy to distinguish from selfies because the subjects are at a greater distance. If you see an extended arm, you can be sure it is a selfie.

All pictures are taken with smartphones (hundreds of different models); they are JPG files of 2 MB to 5 MB in various sizes and aspect ratios, in portrait or landscape mode.

I must use only open source, and the resulting detection code must run in less than a second on low-end Android phones.

What approach and steps does this task call for?

2 Answers

Implementation of image recognition techniques: there are lots of open source libraries available for image recognition and classification. You can use the TensorFlow library for image recognition and integrate it with your Android application.
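Given the requirement to run in under a second on low-end Android phones, a common route is to convert a trained Keras model to TensorFlow Lite for on-device inference. A minimal sketch, assuming a trained classifier saved at a hypothetical path:

import tensorflow as tf

# Assumed: a trained Keras classifier saved earlier at this (hypothetical) path
model = tf.keras.models.load_model("selfie_classifier.h5")

# Convert to TensorFlow Lite so the detector runs quickly on-device
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization for size/speed
tflite_model = converter.convert()

with open("selfie_classifier.tflite", "wb") as f:
    f.write(tflite_model)

The resulting .tflite file can then be bundled with the app and run through the TensorFlow Lite interpreter on Android.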

Answered by EMKAY on April 12, 2021

I'd go the transfer learning way. The idea is to take a net that has already been trained on a large data set and has developed a number of convolutional filters which can be reused. A few such pretrained nets are available in TensorFlow. You take a net pretrained on ImageNet, chop off the last layers responsible for classification on top of those filters, and substitute them with your own. That way you don't need much data to reach decent scores.

Load the base model without its original top layer; this also lets you provide your own input shape.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam

base_model = MobileNetV2(
    weights="imagenet",
    include_top=False,
    input_shape=(HEIGHT, WIDTH, DEPTH)  # e.g. (224, 224, 3)
)

You can choose whether to retrain the base net; freezing it keeps the pretrained weights fixed.

base_model.trainable = False

Now instantiate a new model from the base model and add your final layers.

model = Sequential([base_model])

model.add(Flatten())

model.add(Dense(units=1024))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(rate=0.25))

model.add(Dense(units=1024))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(rate=0.25))

# encoder is assumed to be a fitted sklearn LabelEncoder (two classes here: selfie / not selfie)
model.add(Dense(units=len(encoder.classes_)))
model.add(Activation('softmax'))

Then compile the model as usual.

model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(
        learning_rate=0.0001
    ),
    metrics=['accuracy']
)

Please note that you have to choose the right number of final dense layers and their sizes. You still need to fine-tune the setup: which activation function to use, whether to apply weight decay and at what rate, which optimizer, which learning rate, etc.
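For completeness, a minimal training sketch, assuming the 2000 images are split into train/validation folders with one subfolder per class (the directory names, image size, and epoch count are assumptions):

import tensorflow as tf

# Assumed layout: data/train/selfie, data/train/other, and the same under data/val
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32, label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=(224, 224), batch_size=32, label_mode="categorical")

# MobileNetV2 expects inputs scaled to [-1, 1]
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y))
val_ds = val_ds.map(lambda x, y: (preprocess(x), y))

model.fit(train_ds, validation_data=val_ds, epochs=10)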

Answered by Piotr Rarus on April 12, 2021
