Data transformations in hierarchical classification

Data Science Asked by matentzn on June 3, 2021

I am building a hierarchical text classifier using the Local Classifier Per Parent Node (LCPN) approach with the 'siblings' policy, as described in the survey paper "A survey of hierarchical classification across different application domains":

E.g., if we have the classes 1.1, 1.2, 2.1, 2.2, 2.3, then at the first level we use the whole training set to train a classifier to distinguish between class 1 (1.1, 1.2) and class 2 (2.1, 2.2, 2.3).
At the second level we use two multiclass classifiers: the first to classify between 1.1 and 1.2, using as training data only the examples belonging to those classes, and the second classifier for the rest.

Should any data transformation we apply (e.g., scaling, TF-IDF) happen at each level of the classifier?
That is, since at the first level the TF-IDF vectors are created by fitting on the whole training set, can we reuse them at the second level, or should we refit on the new training subsets?
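To make the setup concrete, here is a minimal sketch of what I am doing (toy corpus, illustrative class names, scikit-learn assumed), with the two options for the second level marked:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus; labels are hierarchical paths like "1.1" (all made up).
docs = ["cats purr", "dogs bark", "stocks rise", "bonds fall", "gold gains"]
labels = ["1.1", "1.2", "2.1", "2.2", "2.3"]
parents = [label.split(".")[0] for label in labels]  # "1" or "2"

# Level 1: fit TF-IDF and a classifier on the whole training set.
root_vec = TfidfVectorizer().fit(docs)
root_clf = LogisticRegression().fit(root_vec.transform(docs), parents)

# Level 2: one multiclass classifier per parent, trained only on the
# documents belonging to that parent's children (the 'siblings' policy).
node_clfs = {}
for parent in sorted(set(parents)):
    sub_docs = [d for d, p in zip(docs, parents) if p == parent]
    sub_labels = [l for l, p in zip(labels, parents) if p == parent]
    # Option A: reuse root_vec.transform(sub_docs) from level 1.
    # Option B: refit a new vectorizer on the subset (the question at hand):
    node_vec = TfidfVectorizer().fit(sub_docs)
    node_clfs[parent] = LogisticRegression().fit(
        node_vec.transform(sub_docs), sub_labels
    )
```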

2 Answers

It depends on the dataset, but generally you should fit again.

Why? If you don't refit at the second level when classifying 1.1 vs. 1.2, you carry over bias from the first level, where the transform was fitted to distinguish classes 1 and 2.

Why does it depend? If information is intertwined across all of the parent and child classes and you will reuse these models in the future, you could lose important information by refitting; in other words, you would only be overfitting to the current training subset (the 1.1 vs. 1.2 data).
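A quick way to see the effect (a sketch with made-up documents): both the vocabulary and the IDF weights change when you refit on only one parent's subset, because the document frequencies are computed over different corpora:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["cats purr softly", "dogs bark loudly", "stocks rise", "bonds fall"]
subset = docs[:2]  # say these are the documents under parent class 1

global_vec = TfidfVectorizer().fit(docs)    # fitted on the whole training set
local_vec = TfidfVectorizer().fit(subset)   # refitted on the subset

# The vocabulary shrinks to the subset's terms ('stocks', 'bonds' disappear)...
print(sorted(local_vec.vocabulary_))
# ...and the IDF of shared terms changes, since document frequencies differ.
print(global_vec.idf_[global_vec.vocabulary_["cats"]])
print(local_vec.idf_[local_vec.vocabulary_["cats"]])
```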

Answered by vienna_kaggling on June 3, 2021

It is generally best practice to perform all feature engineering before applying classifiers.

The two primary reasons are:

  1. Simplicity - If feature engineering is conditional on model performance then it is harder to find and debug edge cases.

  2. Handling of sampling issues - Especially with text, there are novel examples (e.g., words that appear during prediction but not during training). Fitting the feature engineering on as much data as possible increases the robustness of the transforms.
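Under this practice, a minimal sketch (toy data, scikit-learn assumed) is to fit the vectorizer once, up front, on the full training corpus, and have every local classifier reuse the same fitted transform so all nodes share one feature space:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["cats purr", "dogs bark", "stocks rise", "bonds fall", "gold gains"]
labels = ["1.1", "1.2", "2.1", "2.2", "2.3"]

# Feature engineering happens once, before any classifier is trained.
vec = TfidfVectorizer().fit(docs)

def train_node(node_docs, node_labels):
    """Train a local classifier in the shared, pre-fitted feature space."""
    return LogisticRegression().fit(vec.transform(node_docs), node_labels)

root = train_node(docs, [l.split(".")[0] for l in labels])  # class 1 vs. 2
node1 = train_node(docs[:2], labels[:2])   # children of class 1
node2 = train_node(docs[2:], labels[2:])   # children of class 2
```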

Answered by Brian Spiering on June 3, 2021
