I am studying support vector machines and different resources seem to define the margin differently. Some define the margin as 2 times the distance to the nearest datapoint.
Others define the margin based on the nearest datapoint in each direction. Why is there a difference?
Here is my guess:
Case 1:
If we only constrain the hyperplane so that it satisfies $y^{(i)}(w \cdot x^{(i)} + b) \geq 1$, then the hyperplane can be any hyperplane that separates the data. The distance from the hyperplane to the closest positive datapoint and the distance to the closest negative datapoint do not have to be equal. Hence we find the closest datapoint in either direction and calculate the margin as twice the distance to this point.
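To make case 1 concrete, here is a minimal sketch of how I picture the computation for a fixed hyperplane $(w, b)$; the dataset and values are made-up placeholders, not from any particular reference:

```python
import numpy as np

def margin_case1(w, b, X):
    """Case 1: margin = 2 * distance from the hyperplane to the single nearest datapoint."""
    # unsigned distance of each point x to the hyperplane w·x + b = 0
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    return 2.0 * distances.min()

# toy data (placeholder values): two positives, two negatives, and some separating hyperplane
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
w, b = np.array([1.0, 1.0]), 0.0
print(margin_case1(w, b, X))  # twice the distance to the closest point overall
```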
Case 2:
If we constrain the hyperplane so that it satisfies $y^{(i)}(w \cdot x^{(i)} + b) \geq 1$ for all datapoints and $y^{(i)}(w \cdot x^{(i)} + b) = 1$ for the support vectors (the nearest datapoints), then the hyperplane must separate all the data and the nearest datapoints on each side must be at the same distance. So the margin is now the sum of the distance to the nearest positive datapoint and the distance to the nearest negative datapoint.
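Again as a sketch with my own toy numbers: under this canonical constraint the case-2 margin can be computed as the sum of the two nearest distances, and it comes out equal to $2/\|w\|$:

```python
import numpy as np

def margin_case2(w, b, X, y):
    """Case 2: margin = distance to nearest positive point + distance to nearest negative point."""
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    return distances[y == 1].min() + distances[y == -1].min()

# toy data (placeholder values), with (w, b) scaled so that w·x + b = ±1 on the nearest points
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([0.5, 0.0]), 0.0
print(margin_case2(w, b, X, y))   # 4.0
print(2 / np.linalg.norm(w))      # also 4.0, i.e. 2 / ||w||
```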
Does the difference in margin definitions come from the additional constraint that the nearest datapoints satisfy $y^{(i)}(w \cdot x^{(i)} + b) = 1$, i.e. lie on the hyperplanes $w \cdot x + b = \pm 1$?
Additionally, is there a difference in the optimization procedure if we choose one margin definition over the other? For example, if we define our margin as in case 1, will we have to search for a new nearest datapoint after every gradient descent iteration?
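To illustrate what I mean by the last question, a naive loop over the case-1 definition would re-find the nearest datapoint after every update. This is purely illustrative pseudocode of what I have in mind, with the actual update step left as a dummy:

```python
import numpy as np

def case1_margin_and_nearest(w, b, X):
    """Return the case-1 margin and the index of the currently nearest datapoint."""
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    i = int(distances.argmin())
    return 2.0 * distances[i], i

# toy data and starting hyperplane (placeholder values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
w, b = np.array([1.0, 0.5]), 0.1

for step in range(3):
    margin, i = case1_margin_and_nearest(w, b, X)
    print(f"step {step}: margin = {margin:.3f}, nearest point index = {i}")
    # (w, b) would be updated here; the nearest point is then searched for again,
    # because the update can change which datapoint is closest to the hyperplane
    b -= 0.05  # dummy update just so the loop does something
```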