Data Science Asked by ken wang on December 2, 2020
I have a deep learning model (handed over from a former colleague).
For some reason, the train/dev sets are missing.
In my situation, I want to classify my dataset into 100 categories.
The dataset is extremely imbalanced.
The dataset contains tens of millions of records.
First, I ran the model and got predictions for the whole dataset.
Then, I sampled 100 records per category (according to the predicted label) and got a test set of 10,000 records.
Next, I labeled the ground truth of each record in the test set, calculated the precision, recall, and F1 for each category, and computed F1-micro and F1-macro.
How can I estimate the accuracy or other metrics on the whole dataset? Is it correct to use the weighted sum of each category's precision as the estimate, where each weight is that category's share of the predictions on the whole dataset?
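The weighted sum proposed in the question can be sketched as below. This is only an illustration, not a method confirmed by the answer: it assumes the 100 records per predicted category were drawn uniformly at random within each category, and the names `estimate_accuracy`, `pred_counts`, and `sample` are hypothetical.

```python
from collections import Counter


def estimate_accuracy(pred_counts, sample):
    """Weighted sum of per-category precision (illustrative sketch).

    pred_counts: dict mapping category -> number of predictions of that
                 category on the whole dataset (hypothetical input).
    sample:      list of (predicted, actual) pairs from the labeled test set.
    """
    total = sum(pred_counts.values())
    sampled = Counter()  # labeled records per predicted category
    correct = Counter()  # labeled records where prediction == ground truth

    for predicted, actual in sample:
        sampled[predicted] += 1
        if predicted == actual:
            correct[predicted] += 1

    estimate = 0.0
    for category, count in pred_counts.items():
        if sampled[category] == 0:
            raise ValueError(f"no labeled records for category {category!r}")
        precision = correct[category] / sampled[category]
        weight = count / total  # share of predictions on the whole dataset
        estimate += weight * precision
    return estimate
```

Because the sample was drawn per predicted category, per-category precision is what it measures directly. Per-category recall is not, since the sample does not reflect how often each actual label occurs in the whole dataset.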
Accuracy has a specific meaning in classification: the number of data points whose predicted labels exactly match their actual labels, divided by the total number of data points.
To calculate accuracy, you need the actual label for each data point. Data points without actual labels cannot be used in the analysis.
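As a minimal sketch of that definition, the function below computes accuracy over labeled data points only. The name `accuracy` and its list inputs are assumptions made for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of data points whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one labeled data point is required")
    # Count exact matches between predicted and actual labels.
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)
```

For example, `accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])` counts three matches out of four labeled points.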
Answered by Brian Spiering on December 2, 2020