Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - How to interpret confusion matrices to find the best model for an imbalanced dataset?

I am trying to do binary classification, and my dataset is imbalanced with a ratio of roughly 1:7. I have 1000 "1" labels and 6990 "0" labels.

Predicting the "1" labels correctly is more important than predicting the "0" labels, but the model should still detect as many "0" labels correctly as possible.

I used sampling techniques and tried different models (XGBClassifier, LightGBM, SVM, KNN), and I got different confusion matrices. Some models detect the "1" label very well but detect the "0" label poorly. Others are only average at detecting both "1" and "0".

I know accuracy is not a good metric for evaluating models on an imbalanced dataset, so I used recall, the F2 score, and the AUC score. But I am still unsure which model is best.
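For reference, recall and the F2 score follow directly from the confusion-matrix counts (AUC needs predicted scores rather than counts, so it is left out here). A minimal sketch in plain Python; the function name `summarize_confusion` and the counts in the usage line are made up for illustration and are not the results from this question:

```python
def summarize_confusion(tp, fp, fn, tn, beta=2.0):
    """Derive per-class metrics from binary confusion-matrix counts.

    tp/fn count the true "1" labels, tn/fp count the true "0" labels.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # share of "1" labels found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # share of "0" labels found
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    # F-beta weights recall beta times as much as precision (beta=2 gives F2).
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_beta": fbeta,
    }


# Hypothetical counts, chosen only to show the calculation.
metrics = summarize_confusion(tp=80, fp=100, fn=20, tn=600)
print(metrics)
```

Looking at recall (for "1") and specificity (for "0") side by side makes the trade-off described above explicit for each model.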

According to these results, which model is best?

[Image of the results: not available]

Question from: https://stackoverflow.com/questions/65908341/how-to-interpret-column-matrix-to-find-best-model-for-imbalanced-dataset


1 Reply


One way to validate your models is k-fold cross-validation. Divide your data into 4 or 5 train-test splits, evaluate each model on every split, and average the results. That should give you a more reliable picture of how the different models compare.
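The idea can be sketched with scikit-learn as follows. This is only an illustration under several assumptions: the data is synthetic (`make_classification` with roughly the 1:7 imbalance), the two logistic-regression variants stand in for the models from the question, and the folds are stratified so that each one keeps the class ratio, which the description above does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data with about one "1" label for every seven "0" labels.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.875, 0.125], random_state=0
)

# The metrics named in the question: recall, F2, and ROC AUC.
scoring = {
    "recall": "recall",
    "f2": make_scorer(fbeta_score, beta=2),
    "roc_auc": "roc_auc",
}

# 5 folds; stratification is an assumption added for the imbalanced labels.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Placeholder candidates; swap in XGBClassifier, LightGBM, SVM, KNN, etc.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "logreg_balanced": LogisticRegression(max_iter=1000, class_weight="balanced"),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, results[name])
```

If you resample the data (over- or under-sampling), apply it to the training part of each fold only; resampling before splitting leaks information into the test folds and inflates the scores.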

