Model Evaluation, Continued (Part 4): ROC Curve and AUC

Evaluating a Classification Model Using Predicted Probabilities

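The confusion matrix evaluates a model using predicted labels (0 or 1). However, the threshold that maps a predicted probability to a predicted label cannot always be fixed in advance. In that case, the ROC curve and AUC are useful because they evaluate the model directly from the predicted probabilities.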
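As a small illustration of why the threshold matters, the sketch below converts the same predicted probabilities into labels under three different thresholds. The probability values and the helper name `to_labels` are made up for illustration and are not part of the original post.

```python
import numpy as np

# Hypothetical predicted probabilities for the positive class (illustrative values)
proba = np.array([0.10, 0.35, 0.60, 0.80])

def to_labels(proba, threshold):
    """Map predicted probabilities to 0/1 labels: 1 if proba >= threshold."""
    return (proba >= threshold).astype(int)

# The same probabilities give different labels depending on the threshold
for t in (0.3, 0.5, 0.7):
    print(t, to_labels(proba, t))
```

Each threshold produces a different set of predicted labels, and therefore a different confusion matrix, from the same model output.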

The confusion matrix is covered here:
singapp.hatenablog.com

Receiver Operating Characteristic (ROC)

The ROC curve plots the pairs of False Positive Rate and True Positive Rate obtained at each threshold between 0 and 1.

False Positive Rate: the complement of specificity (FPR = 1 - True Negative Rate) → horizontal axis

　　　　　 $\displaystyle{FPR=\frac{FP}{(FP+TN)}}$

True Positive Rate: the same as Recall → vertical axis

　　　　　 $\displaystyle{TPR=\frac{TP}{(FN+TP)}}$
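Both rates can be computed directly from the confusion matrix at a single threshold. The following is a minimal sketch with made-up labels and scores (not data from this post), assuming a threshold of 0.5.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predicted probabilities (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.8, 0.9])

threshold = 0.5  # assumed threshold for this example
y_pred = (y_score >= threshold).astype(int)

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

fpr = fp / (fp + tn)  # fraction of actual negatives predicted positive
tpr = tp / (fn + tp)  # fraction of actual positives predicted positive (recall)
print('FPR = {:.2f}, TPR = {:.2f}'.format(fpr, tpr))
```

With these values one of the four negatives is misclassified and three of the four positives are found, so the script prints FPR = 0.25, TPR = 0.75. Repeating this for every threshold gives the points of the ROC curve.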

Structure

from sklearn.metrics import roc_curve
fpr_lr, tpr_lr, _ = roc_curve(y_test, y_score_lr)

Output

fpr_lr:　False Positive Rate

tpr_lr:　True Positive Rate
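To see what `roc_curve` returns, here is a sketch on a four-sample toy dataset (the values are illustrative, not from this post). The third return value, discarded as `_` in the Structure snippet, holds the thresholds at which each (FPR, TPR) pair was computed.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data: two negatives and two positives (illustrative values)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Each index i gives the (FPR, TPR) obtained when samples with
# score >= thresholds[i] are labeled positive.
for f, t, th in zip(fpr, tpr, thresholds):
    print('threshold={:.2f}  FPR={:.1f}  TPR={:.1f}'.format(th, f, t))
```

The first threshold is a value above the maximum score, so no sample is predicted positive and the curve starts at (0, 0). The last threshold is the minimum score, which labels every sample positive and ends the curve at (1, 1).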

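The closer the curve is to the top-left corner, the better the model (that is, the larger the area below and to the right of the curve, the larger the AUC).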
[Figure: ROC curve illustration]

Sample

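import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve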

cancer =  load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)


X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=0)
clf = SVC(kernel='rbf', gamma=0.001, C=1, probability=True)  # probability=True is required; without it predict_proba is unavailable
clf.fit(X_train, y_train)

y_score_lr = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1
fpr_lr, tpr_lr, _ = roc_curve(y_test, y_score_lr)  # thresholds are not used, so assign them to '_'

#graph plot
plt.figure(figsize=(10,6))
plt.plot(fpr_lr, tpr_lr, color ='red', label = 'ROC curve')
plt.plot([0,1], [0,1], color = 'black', linestyle ='--')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])

plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('Receiver Operating Characteristic')

plt.legend(loc='best')
plt.show()

[Figure: ROC curve produced by the code above]

Area Under Curve (AUC)

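The area under the ROC curve. It ranges from 0 to 1, and 0.5 corresponds to the diagonal of random guessing, drawn as the dashed line in the plot above.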
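Because the ROC curve is a piecewise-linear path through the (FPR, TPR) points, its area can be obtained with the trapezoidal rule. The sketch below, on illustrative toy data that is not from this post, computes the area by hand and compares it with `sklearn.metrics.auc` and with `roc_auc_score`, which computes the same quantity directly from labels and scores.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Toy data: two negatives and two positives (illustrative values)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = roc_curve(y_true, y_score)

# Trapezoidal rule: sum of (width * average height) over consecutive points
manual_auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print('manual       : {:.3f}'.format(manual_auc))
print('auc()        : {:.3f}'.format(auc(fpr, tpr)))
print('roc_auc_score: {:.3f}'.format(roc_auc_score(y_true, y_score)))
```

All three lines print 0.750 for this data. When only the score is needed and not the curve itself, `roc_auc_score(y_true, y_score)` skips the intermediate `roc_curve` call.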

Structure

from sklearn.metrics import auc
auc(x, y)  # x, y: coordinates of the curve; here x = FPR and y = TPR

Sample

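from sklearn.metrics import auc
roc_auc = auc(fpr_lr, tpr_lr)  # use a separate name so the imported auc() is not overwritten
print('AUC:{:.3f}'.format(roc_auc))

[Figure: printed AUC value]

Alternatively, show it in the plot together with the ROC curve

# Change one line of the ROC plotting code as follows
plt.plot(fpr_lr, tpr_lr, color='red', label='ROC curve (AUC = %.3f)' % roc_auc)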

[Figure: ROC curve with the AUC shown in the legend]

PyInv

Notes on programming and overseas investing
