Cross Validated Asked on November 9, 2021
I would like to use recursive feature elimination (RFE, implemented via caret in R) to perform feature selection for about 40 test results with 2 possible outcomes. By default, RFE then selects features by either Accuracy or Kappa. I would instead like to pre-define a specificity threshold, since I explicitly care more about specificity than about sensitivity. Is there a way to specify this in the training? (A sketch of the kind of setup I mean follows the update below.)
Thank you!
Update
To be clearer: I have 527 different cases. Each case has 42 results (from a multiplex antigen panel, on a continuous scale) and is classified into one of 2 possible outcomes by a different test (126 positives and 401 negatives in the gold standard). Now I would like to select important features out of the 42 results to achieve a good prediction of the outcome (positive vs. negative). High specificity is especially important.
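For concreteness, here is a minimal sketch of the kind of caret rfe() call this refers to; `antigens` (a data frame of the 42 assay results), `outcome` (a factor with the positive class as its first level), and the subset sizes are hypothetical placeholders. Swapping in twoClassSummary and metric = "Spec" makes specificity the selection criterion, which maximizes it across subset sizes rather than enforcing a hard threshold:

```r
library(caret)

## hypothetical objects: `antigens` = data frame of 42 numeric assay results,
## `outcome` = factor with levels c("pos", "neg"); twoClassSummary treats the
## first factor level as the "event"
specFuncs <- caretFuncs
specFuncs$summary <- twoClassSummary          # report ROC, Sens and Spec per subset size

ctrl <- rfeControl(functions = specFuncs,
                   method = "repeatedcv",
                   number = 10,
                   repeats = 5)

set.seed(1)
rfe_fit <- rfe(x = antigens, y = outcome,
               sizes = c(4, 8, 12, 16, 20),   # candidate subset sizes (illustrative)
               rfeControl = ctrl,
               metric = "Spec",               # choose the subset size by specificity
               ## remaining arguments are passed to the inner train() call:
               method = "glm",
               trControl = trainControl(method = "cv", number = 5,
                                        classProbs = TRUE))
```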
I'm not sure that learning vector quantization (LVQ) is the best choice for this project. It requires some measure of similarity between cases, to match cases to prototype cases representing each of the classes. You don't say what similarity measure you use; it's often a Euclidean distance calculated over the multi-dimensional predictor space. Unless the distance measure is carefully chosen you might be throwing away information. LVQ can have some advantage for multiple-class problems and for interpreting models, but it has one serious drawback for a binary outcome: all it reports is a yes/no predicted class membership, not a probability of class membership.
As this post explains, even if your ultimate goal is classification it's best to use a criterion that is a proper scoring rule: a measure that is optimized when you have the correct probability model, so it requires a probability estimate of class membership for each case. Logistic regression effectively uses a log-loss scoring rule, but there is a large variety of rules. For example, the Brier score, another proper scoring rule, is the mean-squared error between the predicted class-membership probabilities and the actual 0/1 memberships.
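For concreteness, both rules are simple to compute from predicted probabilities. In this sketch `p_hat` is a hypothetical vector of predicted probabilities of the positive class and `y` the corresponding observed outcomes coded 0/1:

```r
## log loss (the scoring rule logistic regression effectively optimizes)
log_loss <- function(y, p_hat, eps = 1e-15) {
  p <- pmin(pmax(p_hat, eps), 1 - eps)   # guard against log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

## Brier score: mean-squared error between probabilities and 0/1 outcomes
brier <- function(y, p_hat) mean((p_hat - y)^2)

## toy usage with made-up numbers
set.seed(1)
y     <- rbinom(20, 1, 0.3)
p_hat <- runif(20)
c(logloss = log_loss(y, p_hat), brier = brier(y, p_hat))
```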
With 126 cases in the smaller class, the usual rule of thumb of roughly 15 events per parameter means you can probably get away with about 8 unpenalized predictors (126/15 ≈ 8) out of the 42 in your final model without overfitting, or with a larger number of predictors in a type of model that penalizes individual predictor contributions to avoid overfitting. There are many methods other than LVQ to choose from.
As a preliminary step you might just want to see whether any of your 42 predictors has a small range of values, relative to its measurement error, over all the cases, ignoring their apparent associations with outcome. Since your data aren't too badly imbalanced, that might be an efficient way to cut down on the number of candidate predictors, however you proceed, without biasing your results by "peeking" at the outcomes. A sketch of such an outcome-blind pre-filter follows; then consider some other possibilities.
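The sketch below assumes the hypothetical data frame `antigens` of 42 assay results and a hypothetical vector `meas_sd` of per-assay measurement SDs; the factor of 2, and the variance-only alternative via caret's nearZeroVar(), are illustrative choices rather than prescriptions:

```r
library(caret)

## outcome-blind pre-filter: keep assays whose spread across cases is
## comfortably larger than their (assumed known) measurement SD
spread <- apply(antigens, 2, sd)
keep   <- spread > 2 * meas_sd                     # illustrative factor of 2
antigens_kept <- antigens[, keep, drop = FALSE]

## a purely variance-based alternative if no error estimate is available
nzv <- nearZeroVar(antigens)
if (length(nzv) > 0) antigens_kept <- antigens[, -nzv, drop = FALSE]
```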
Logistic regression with variable selection by LASSO is one good possibility for this type of data, as it can give you a selection of specific predictors that together provide good probability estimates. So if for reasons like cost you want to cut way down from your 42 antigens, that could be a good choice. If there's no problem with analyzing a large number of antigens then you could consider logistic ridge regression instead, which keeps all of the predictors but differentially weights them according to their contributions to outcome while minimizing overfitting.
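As a sketch of both options with glmnet (one common implementation; `antigens` and `outcome` are the same hypothetical objects as above, and cross-validating on the deviance means selection is driven by the log loss, a proper scoring rule):

```r
library(glmnet)

x <- as.matrix(antigens)

## LASSO (alpha = 1): sets some coefficients exactly to zero, i.e. selects antigens
set.seed(1)
cv_lasso <- cv.glmnet(x, outcome, family = "binomial",
                      alpha = 1, type.measure = "deviance")
coef(cv_lasso, s = "lambda.1se")                   # nonzero rows = retained antigens

## ridge (alpha = 0): keeps all 42 antigens but shrinks their weights
cv_ridge <- cv.glmnet(x, outcome, family = "binomial",
                      alpha = 0, type.measure = "deviance")
p_hat <- predict(cv_ridge, newx = x, s = "lambda.min", type = "response")
```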
LASSO and ridge can be unwieldy if you need to consider interactions among the predictors rather than just their individual contributions to the probability estimates. Gradient-boosted trees are another possibility, in which you can include a large number of predictors and specify how many levels of interaction to consider, in a slow-learning process that can minimize overfitting. It's possible to get estimates of predictor importance from such models, which you could in principle use to help design an ultimate testing protocol with further experimental validation.
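A sketch along those lines with the gbm package, one of several gradient-boosting implementations; `antigens` is the hypothetical data frame from above and `outcome01` a hypothetical 0/1 recoding of the outcome:

```r
library(gbm)

dat <- data.frame(y = outcome01, antigens)

set.seed(1)
gbm_fit <- gbm(y ~ ., data = dat, distribution = "bernoulli",
               n.trees = 3000,
               interaction.depth = 2,   # how many levels of interaction to allow
               shrinkage = 0.01,        # small learning rate = slow learning
               cv.folds = 10)

best_iter <- gbm.perf(gbm_fit, method = "cv")           # tree count chosen by CV
summary(gbm_fit, n.trees = best_iter)                   # relative influence (importance)
p_hat <- predict(gbm_fit, newdata = dat, n.trees = best_iter, type = "response")
```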
Those are only a few possibilities; just make sure that the type of model returns probability estimates for the cases.
Once you have good probability estimates you can adjust the probability cutoff for the ultimate classification to match the relative costs of false-negative and false-positive decisions in your application. There's no need to use the cutoff of p = 0.5 that is so often an explicit or implicit default. If false positives are very costly to you, as your emphasis on specificity suggests, choose a higher probability cutoff so that fewer cases are called positive and more of the true negatives are correctly identified. But make that choice at the end, after you have a reliable probability model.
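As a sketch of that final step, using the hypothetical `p_hat` and `outcome` from the sketches above and an arbitrary illustrative specificity target of 0.95 (in practice the cutoff should be chosen on held-out or cross-validated predictions):

```r
## specificity at a cutoff = proportion of true negatives predicted below it
p_hat   <- as.vector(p_hat)
is_neg  <- outcome == "neg"
cutoffs <- seq(0.05, 0.95, by = 0.01)

specs <- sapply(cutoffs, function(ct) mean(p_hat[is_neg] < ct))

## smallest cutoff meeting the target keeps sensitivity as high as possible
cutoff_chosen <- min(cutoffs[specs >= 0.95])
```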
Answered by EdM on November 9, 2021