Cross Validated Asked on December 18, 2021
In http://cs229.stanford.edu/notes/cs229-notes-all/cs229-notes5.pdf pg 5, it states that forward search takes $O(p^2)$ (note the notes uses $n$ instead of $p$ for the number if independent variables).
Isn’t the number of calls actually $O(2^p cdot k$)? You’re consider all possible subsets of independent variables in forward search, of which there are $2^p$ of them. For each subset you perform a k-fold validation so $k$ learning calls for each subset.
How is it that it’s quadratic instead of exponential?
In the forward selection procedure described, the features are partitioned into selected vs. candidate sets. At the start, the selected set is empty, and all $p$ features are candidates. At each stage, each candidate feature is provisionally added as a(n additional) trial feature, and the best one is chosen to be kept.
So 1 new feature is removed from the candidate set after each iteration. This means the first iteration tests $p$ candidates, the second $p-1$, etc. And $p + (p-1) + ldots + 2 + 1 = O[p^2]$.
Once a feature is added to the selected set, it is never removed in subsequent iterations. So the procedure does not explore all subsets of features. Rather it is a greedy heuristic, with no guarantees of optimality. (I believe this is why techniques like LASSO have become more popular for feature selection.)
Answered by GeoMatt22 on December 18, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP