Data Science Asked by tokestermw on November 29, 2020
As far as I know, to train learning to rank models, you need to have three things in the dataset: a label (relevance score), a group or query id, and a feature vector.
For example, the Microsoft Learning to Rank dataset uses this format (label, group id and features):
1 qid:10 1:0.031310 2:0.666667 ...
0 qid:10 1:0.078682 2:0.166667 ...
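As a minimal sketch of how lines in this format break down into the three parts, the snippet below parses them with plain Python and counts the rows per query id. The function name `parse_line` is made up for illustration, and the sample lines keep only the two features visible above (the elided ones are simply left out).

```python
from itertools import groupby

def parse_line(line):
    """Split one 'label qid:<id> idx:value ...' line into (label, qid, features)."""
    tokens = line.split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for tok in tokens[2:]:
        idx, value = tok.split(":", 1)
        features[int(idx)] = float(value)
    return label, qid, features

# Illustrative input: the two sample rows, truncated to the features shown.
sample = [
    "1 qid:10 1:0.031310 2:0.666667",
    "0 qid:10 1:0.078682 2:0.166667",
]
rows = [parse_line(line) for line in sample]

# Number of consecutive rows that share a query id.
group_sizes = [len(list(g)) for _, g in groupby(rows, key=lambda r: r[1])]
print(group_sizes)
```

Because `groupby` only merges adjacent rows, this count is meaningful only when all rows of a query are stored next to each other.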
I am trying out XGBoost, which uses GBMs to do pairwise ranking. They have an example for a ranking task that uses the C++ program to learn on the Microsoft dataset in the format above.
However, I am using their Python wrapper and cannot seem to find where I can input the group id (qid above). I can train the model using just the features and relevance scores, but I feel like I am missing something.
Here is a sample script.
import numpy as np
from xgboost import XGBRegressor

gbm = XGBRegressor(objective="rank:pairwise")
X = np.random.normal(0, 1, 1000).reshape(100, 10)
y = np.random.randint(0, 5, 100)
gbm.fit(X, y)  # no group id needed???
scores = gbm.predict(X)
print(scores)
# labels ordered by predicted score, highest first;
# should come out in descending order of relevance
print(y[scores.argsort()][::-1])
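According to the XGBoost documentation, XGBoost expects the examples of the same group to be consecutive rows, and a list with the size of each group (which you can set with the set_group method of DMatrix in Python).
Correct answer by amyrit on November 29, 2020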
set_group is very important to ranking, because only the scores in one group are comparable.
You can sort data according to their scores in their own group.
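Sorting within each group can be sketched in plain Python as below. The helper name `rank_within_groups` and the example scores and query ids are hypothetical; in practice the scores would come from the model's predictions.

```python
from collections import defaultdict

def rank_within_groups(scores, qids):
    """Return, per query id, the row indices ordered from highest to lowest score."""
    by_group = defaultdict(list)
    for row, (score, qid) in enumerate(zip(scores, qids)):
        by_group[qid].append((score, row))
    return {
        qid: [row for _, row in sorted(pairs, key=lambda p: p[0], reverse=True)]
        for qid, pairs in by_group.items()
    }

# Made-up predictions for two queries (ids 10 and 11).
scores = [0.2, 1.5, -0.3, 0.9, 0.1]
qids = [10, 10, 10, 11, 11]
ranking = rank_within_groups(scores, qids)
print(ranking)  # {10: [1, 0, 2], 11: [3, 4]}
```

Rows from different queries are never compared with each other, which matches the point that scores are only meaningful inside one group.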
For easy ranking, you can use my xgboostExtension.
Answered by bigdong on November 29, 2020