Data Science Asked by hirschme on January 24, 2021
My understanding is that Multiple Instance Learning (MIL) addresses weakly supervised problems where, instead of having a label for each data instance, we have a label for a "bag" of instances. For example, in image recognition a bag could be a full image, a data instance could be any region or patch of that image, and the label could be "face". This label may refer to just one data instance in the image (the key instance, i.e. the specific patch that contains a face).
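To make the bag/instance relationship concrete, here is a minimal sketch of how I understand a bag label to relate to its (hidden) instance labels. It assumes the standard MIL assumption, namely that a bag is positive if and only if at least one of its instances is positive. The function name `bag_label_from_instances` and the toy labels are made up for illustration.

```python
from typing import List


def bag_label_from_instances(instance_labels: List[int]) -> int:
    """Standard MIL assumption: the bag is positive (1) iff any instance is positive."""
    return int(any(instance_labels))


# Hypothetical bag of six patches; only patch 3 is the key instance (contains a face).
# In a real MIL dataset these per-instance labels are NOT available, only the bag label is.
hidden_instance_labels = [0, 0, 0, 1, 0, 0]

print(bag_label_from_instances(hidden_instance_labels))  # bag labeled "face"
print(bag_label_from_instances([0, 0, 0]))               # bag without a face
```

The point of the sketch is only that the bag label carries no information about which instance is the key one.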
In MIL, we take instances from labeled bags to avoid feeding the full image into a classifier. But how do we sample these instances from each bag? A bag may contain thousands of instances, which is the reason we don't want to process the full image in the first place. So we select only a sub-sample of k instances.
But how are these samples selected? This seems like a major complication, as most of the instances won't correspond to the key instances that the bag label refers to.
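To quantify why this worries me, here is a small sketch that computes the chance that a sub-sample misses every key instance. It assumes uniform random sampling without replacement, which is my assumption and not something stated in the papers below. The function name and the numbers (1000 instances, 1 key instance, k = 50) are hypothetical.

```python
from math import comb


def prob_sample_misses_all_keys(n_instances: int, n_key: int, k: int) -> float:
    """Probability that a uniform random sample of k instances (drawn without
    replacement from a bag of n_instances) contains none of the n_key key instances."""
    if k > n_instances - n_key:
        return 0.0  # the sample is too large to avoid every key instance
    # Ways to choose k non-key instances, divided by ways to choose any k instances.
    return comb(n_instances - n_key, k) / comb(n_instances, k)


# Hypothetical bag: 1000 patches, a single key patch, 50 patches sampled.
p_miss = prob_sample_misses_all_keys(n_instances=1000, n_key=1, k=50)
print(f"P(sample contains no key instance) = {p_miss:.2f}")
```

With a single key instance this reduces to 1 - k/N, so under these assumptions most sampled sub-bags from a positive image would contain no positive instance at all.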
Following are some publications using MIL for image classification, and the confusions arising from each one:
https://ieeexplore.ieee.org/document/7900139
–> Here they divide images into subsections or tiles. Each tile is a "data instance", and the full image it comes from is called the bag or set. So the bag (full image) has a label, but each data instance (image subsection or tile) does not. How data instances are sampled is neither clear nor explicitly stated. What is also very confusing is that in their pseudo-code they assign a label to each data instance (x_i, y_i), which seems to defeat the purpose of MIL, as this would be ordinary supervised learning. Clearly there is some confusion in my understanding of the algorithm and of the idea of MIL.
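For reference, this is how I picture the tiling step: a minimal sketch that cuts an image into tiles and attaches one label to the whole bag. It is not the paper's code. The non-overlapping grid, the function name `tile_image`, the toy 4x4 "image" and the label value are all my own assumptions.

```python
from typing import List

Image = List[List[int]]  # a grayscale image as a list of pixel rows


def tile_image(image: Image, tile: int) -> List[Image]:
    """Cut an image into non-overlapping tile x tile patches (row-major order).
    Pixels that do not fill a complete tile at the right/bottom edge are dropped."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


# Toy 4x4 image whose pixel values are just their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

bag = tile_image(image, tile=2)       # the bag: four 2x2 instances
label_of_bag = 1                      # hypothetical image-level label, e.g. "face"
training_example = (bag, label_of_bag)  # one label for the bag, none per tile

print(len(bag), bag[0], bag[3])
```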
https://arxiv.org/pdf/1802.04712.pdf
–> Here they again specify MIL for weakly-annotated data, where only a bag is labeled, but each data-instance within the bag does not have its own individual label (contradicting the process in the former paper).
They also validate the idea of avoiding processing of a full image, by partitioning into single tiles or data-instances.
"The MIL seems to perfectly fit medical imaging where processing a whole image consisting of billions of pixels is computationally infeasible. Moreover, in the medical domain it is very difficult to obtain pixel-level annotations, that drastically reduces number of available data. Therefore, it is tempting to divide a medical image into smaller patches that could be further considered as a bag with a single label"
So the alternative is using the data instances, which are cheaper to process compared to the whole image. But this must mean we use a sub-sample of data-instances, as using all data instances would be just as expensive as processing the entire image! So there must be a sampling mechanism of selecting which data instances to use. But again, no mention of it.
I am arriving at the conclusion that one of my misunderstandings is the following: in MIL, during training, we process ALL data instances within a bag/set. There is no sub-sampling involved. If a bag is an image containing thousands of tiles or data instances, then they will all be processed. This is probably one reason why attention models are popular in MIL, as they provide a way of detecting the key instances in the bag.
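To check this reading, here is a minimal sketch of one common form of attention pooling over a bag: every instance embedding h_k gets a score w^T tanh(V h_k), the scores are passed through a softmax, and the bag representation is the weighted sum of all embeddings. In a real model V and w would be learned; the values below, the function name `attention_pool` and the toy embeddings are hypothetical.

```python
import math
from typing import List, Tuple


def attention_pool(H: List[List[float]],
                   V: List[List[float]],
                   w: List[float]) -> Tuple[List[float], List[float]]:
    """Pool ALL K instance embeddings H (K x M) into one bag embedding.
    Scores are w^T tanh(V h_k), with V of shape L x M and w of length L."""
    scores = []
    for h in H:
        hidden = [math.tanh(sum(v * x for v, x in zip(v_row, h))) for v_row in V]
        scores.append(sum(w_l * t for w_l, t in zip(w, hidden)))
    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance embeddings.
    z = [sum(a * h[d] for a, h in zip(weights, H)) for d in range(len(H[0]))]
    return z, weights


# Hypothetical bag of three 2-d instance embeddings; the third stands out.
H = [[0.1, 0.0], [0.0, 0.2], [3.0, 3.0]]
V = [[1.0, 0.0], [0.0, 1.0]]  # hand-set, would normally be learned
w = [1.0, 1.0]                # hand-set, would normally be learned

z, weights = attention_pool(H, V, w)
print(weights)  # the largest weight points at the candidate key instance
```

No instance is dropped here: all of them contribute to the bag embedding, and the attention weights indicate which ones the model treats as key instances.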