pandas - How to interpret column matrix to find best model for imbalanced dataset?

Question


posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am working on a binary classification problem, and my dataset is imbalanced at roughly a 1:7 ratio: I have 1000 "1" labels and 6990 "0" labels.

Predicting the "1" label correctly is more important than predicting "0", but the model should still detect "0" labels as accurately as possible.

I applied sampling techniques and tried different models (XGBClassifier, LightGBM, SVM, KNN), and I got different confusion matrices. Some models detect the "1" label very well but detect the "0" label poorly. Others are only average at detecting both "1" and "0".

I know accuracy is not a good metric for an imbalanced dataset, so I used recall, F2 score, and AUC. But I am still confused about which model is best.

According to these results, which model is best?

question from:https://stackoverflow.com/questions/65908341/how-to-interpret-column-matrix-to-find-best-model-for-imbalanced-dataset

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:12:32+0000

One way to validate your models is k-fold cross-validation. Divide your data into 4 or 5 train-test pairs, evaluate each model on every test fold, and take the average of the results. That should give you a more reliable picture of how the different models perform.
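As a rough illustration of the averaging idea above, here is a minimal sketch using scikit-learn. Everything in it is an assumption rather than the asker's actual setup: the data is synthetic (generated at roughly a 1:7 ratio), the two models are stand-ins for XGBClassifier, LightGBM, SVM and KNN, and the names `scoring`, `models` and `evaluate` are made up for the example. It uses stratified folds so each split keeps about the same class ratio, and it reports the metrics mentioned in the question (recall, F2, AUC) plus recall on the "0" class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real data: roughly 1:7 positives to negatives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.875, 0.125], random_state=0
)

# Stratified folds keep the class ratio roughly the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Metrics from the question, plus recall on the "0" class.
scoring = {
    "recall_1": "recall",  # recall of the positive ("1") class
    "recall_0": make_scorer(recall_score, pos_label=0),
    "f2": make_scorer(fbeta_score, beta=2),
    "auc": "roc_auc",
}

# Placeholder models; swap in the classifiers you are comparing.
models = {
    "logreg": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "knn": KNeighborsClassifier(n_neighbors=5),
}


def evaluate(models, X, y):
    """Return the mean of each metric over the test folds, per model."""
    summary = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        summary[name] = {
            metric: float(np.mean(scores[f"test_{metric}"])) for metric in scoring
        }
    return summary


summary = evaluate(models, X, y)
for name, metrics in summary.items():
    print(name, {metric: round(value, 3) for metric, value in metrics.items()})
```

Comparing the averaged `recall_1` and `recall_0` side by side shows the trade-off between the two classes for each model, which a single confusion matrix from one split can exaggerate. If you resample the data, the resampling would need to be applied to the training folds only, otherwise the test folds no longer reflect the real class ratio.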
