How to remove background (watermark) logo from image

Question

I have been scratching my head for a while. What I have is a scanned PDF document with text and water marked logo at the back as in the below image.

I want to do OCR over this, which becomes very difficult because of the logo. All the ratchet I've done so far is for coloured images where they can find contrast difference. I've hit a wall when solving the same for an B&W image as shown. Would love any help/suggestions for an approach/method to achieve what I am looking for.

Gerry P · Answer

Take a look at the python cv2 module. It has functions that should enable you to remove the watermark. If you have a separate image of the watermark and it is always in the same coordinate location in each image you should be able to subtract it from the images

Answered by Gerry P on June 12, 2021

tehem · Answer

Before we delve deeper, is the watermark like a light gray watermark or is it dark like the text? Because it could be as simple as thresholding on some grays cake value to filter it out.
If it is dark like the text, I'm not 100% sure if this would work, but you could build a small application to create a dataset of such images where you superimpose some watermark over the text. After which you could train a style-gan like model on the dataset to transform the image and clean up the text. Anybody more experienced with GANs should be able to confirm if this approach is worthwhile.

How to remove background (watermark) logo from image

2 Answers

Add your own answers!

Ask a Question