Why does forward selection only take $O(p^2)$ calls to the learning algorithm?

Question

In http://cs229.stanford.edu/notes/cs229-notes-all/cs229-notes5.pdf pg 5, it states that forward search takes $O(p^2)$ (note the notes uses $n$ instead of $p$ for the number if independent variables).
Isn't the number of calls actually $O(2^p cdot k$)? You're consider all possible subsets of independent variables in forward search, of which there are $2^p$ of them. For each subset you perform a k-fold validation so $k$ learning calls for each subset.
How is it that it's quadratic instead of exponential?

GeoMatt22 · Answer

In the forward selection procedure described, the features are partitioned into selected vs. candidate sets. At the start, the selected set is empty, and all $p$ features are candidates. At each stage, each candidate feature is provisionally added as a(n additional) trial feature, and the best one is chosen to be kept.
So 1 new feature is removed from the candidate set after each iteration.
This means the first iteration tests $p$ candidates, the second $p-1$, etc.
And $p + (p-1) + ldots + 2 + 1 = O[p^2]$.
Once a feature is added to the selected set, it is never removed in subsequent iterations. So the procedure does not explore all subsets of features. Rather it is a greedy heuristic, with no guarantees of optimality. (I believe this is why techniques like LASSO have become more popular for feature selection.)

Why does forward selection only take $O(p^2)$ calls to the learning algorithm?

One Answer

Add your own answers!

Ask a Question