Monthly DACON Psychological Disposition Prediction AI Competition
Can the Full AUC drop sharply even when each individual fold's AUC is similar?
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.789392  training's binary_logloss: 0.578752  valid_1's auc: 0.773208  valid_1's binary_logloss: 0.584832
[400]  training's auc: 0.800083  training's binary_logloss: 0.547232  valid_1's auc: 0.773773  valid_1's binary_logloss: 0.56219
[600]  training's auc: 0.810649  training's binary_logloss: 0.532177  valid_1's auc: 0.774668  valid_1's binary_logloss: 0.556256
[800]  training's auc: 0.821325  training's binary_logloss: 0.521558  valid_1's auc: 0.775217  valid_1's binary_logloss: 0.554556
[1000]  training's auc: 0.831154  training's binary_logloss: 0.512736  valid_1's auc: 0.775269  valid_1's binary_logloss: 0.554112
Early stopping, best iteration is:
[870]  training's auc: 0.825007  training's binary_logloss: 0.518329  valid_1's auc: 0.775402  valid_1's binary_logloss: 0.554351
Fold  1 AUC : 0.775402
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.791407  training's binary_logloss: 0.577465  valid_1's auc: 0.75649  valid_1's binary_logloss: 0.592267
[400]  training's auc: 0.802093  training's binary_logloss: 0.545747  valid_1's auc: 0.75837  valid_1's binary_logloss: 0.57137
[600]  training's auc: 0.812232  training's binary_logloss: 0.530778  valid_1's auc: 0.759681  valid_1's binary_logloss: 0.565757
[800]  training's auc: 0.822218  training's binary_logloss: 0.520356  valid_1's auc: 0.760617  valid_1's binary_logloss: 0.56406
[1000]  training's auc: 0.831826  training's binary_logloss: 0.511492  valid_1's auc: 0.761223  valid_1's binary_logloss: 0.563485
[1200]  training's auc: 0.84137  training's binary_logloss: 0.503284  valid_1's auc: 0.761405  valid_1's binary_logloss: 0.563413
[1400]  training's auc: 0.85006  training's binary_logloss: 0.495879  valid_1's auc: 0.761411  valid_1's binary_logloss: 0.563482
Early stopping, best iteration is:
[1235]  training's auc: 0.842915  training's binary_logloss: 0.50194  valid_1's auc: 0.761504  valid_1's binary_logloss: 0.563355
Fold  2 AUC : 0.761504
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.790601  training's binary_logloss: 0.577889  valid_1's auc: 0.759194  valid_1's binary_logloss: 0.590355
[400]  training's auc: 0.801342  training's binary_logloss: 0.546124  valid_1's auc: 0.761742  valid_1's binary_logloss: 0.567863
[600]  training's auc: 0.811515  training's binary_logloss: 0.531096  valid_1's auc: 0.762837  valid_1's binary_logloss: 0.561915
[800]  training's auc: 0.821298  training's binary_logloss: 0.520804  valid_1's auc: 0.763339  valid_1's binary_logloss: 0.560295
[1000]  training's auc: 0.831061  training's binary_logloss: 0.512052  valid_1's auc: 0.763698  valid_1's binary_logloss: 0.559725
[1200]  training's auc: 0.840207  training's binary_logloss: 0.504104  valid_1's auc: 0.763647  valid_1's binary_logloss: 0.559686
Early stopping, best iteration is:
[1033]  training's auc: 0.832642  training's binary_logloss: 0.510674  valid_1's auc: 0.763779  valid_1's binary_logloss: 0.559648
Fold  3 AUC : 0.763779
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.790684  training's binary_logloss: 0.577864  valid_1's auc: 0.760275  valid_1's binary_logloss: 0.58962
[400]  training's auc: 0.801148  training's binary_logloss: 0.546246  valid_1's auc: 0.762826  valid_1's binary_logloss: 0.567714
[600]  training's auc: 0.811247  training's binary_logloss: 0.53133  valid_1's auc: 0.764295  valid_1's binary_logloss: 0.561506
[800]  training's auc: 0.821473  training's binary_logloss: 0.520861  valid_1's auc: 0.764594  valid_1's binary_logloss: 0.559854
[1000]  training's auc: 0.831251  training's binary_logloss: 0.512031  valid_1's auc: 0.764988  valid_1's binary_logloss: 0.559128
[1200]  training's auc: 0.84027  training's binary_logloss: 0.504285  valid_1's auc: 0.765109  valid_1's binary_logloss: 0.558935
[1400]  training's auc: 0.848786  training's binary_logloss: 0.497049  valid_1's auc: 0.765401  valid_1's binary_logloss: 0.558775
[1600]  training's auc: 0.856597  training's binary_logloss: 0.490171  valid_1's auc: 0.765302  valid_1's binary_logloss: 0.558812
Early stopping, best iteration is:
[1529]  training's auc: 0.853847  training's binary_logloss: 0.492594  valid_1's auc: 0.765417  valid_1's binary_logloss: 0.558726
Fold  4 AUC : 0.765417
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.790286  training's binary_logloss: 0.578626  valid_1's auc: 0.768811  valid_1's binary_logloss: 0.58405
[400]  training's auc: 0.800174  training's binary_logloss: 0.547254  valid_1's auc: 0.769129  valid_1's binary_logloss: 0.560844
[600]  training's auc: 0.810632  training's binary_logloss: 0.532253  valid_1's auc: 0.770016  valid_1's binary_logloss: 0.554113
[800]  training's auc: 0.820693  training's binary_logloss: 0.521796  valid_1's auc: 0.770703  valid_1's binary_logloss: 0.551766
[1000]  training's auc: 0.830826  training's binary_logloss: 0.512781  valid_1's auc: 0.771093  valid_1's binary_logloss: 0.55095
[1200]  training's auc: 0.83967  training's binary_logloss: 0.504991  valid_1's auc: 0.770801  valid_1's binary_logloss: 0.550959
Early stopping, best iteration is:
[1029]  training's auc: 0.832206  training's binary_logloss: 0.511554  valid_1's auc: 0.77125  valid_1's binary_logloss: 0.550844
Fold  5 AUC : 0.771250
Full AUC score 0.770161
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.789698  training's binary_logloss: 0.578713  valid_1's auc: 0.773593  valid_1's binary_logloss: 0.584739
[400]  training's auc: 0.800657  training's binary_logloss: 0.547076  valid_1's auc: 0.774148  valid_1's binary_logloss: 0.562079
[600]  training's auc: 0.8117  training's binary_logloss: 0.531765  valid_1's auc: 0.775241  valid_1's binary_logloss: 0.555964
[800]  training's auc: 0.822934  training's binary_logloss: 0.520971  valid_1's auc: 0.775604  valid_1's binary_logloss: 0.554454
[1000]  training's auc: 0.833159  training's binary_logloss: 0.511646  valid_1's auc: 0.775816  valid_1's binary_logloss: 0.553796
[1200]  training's auc: 0.843186  training's binary_logloss: 0.503133  valid_1's auc: 0.775492  valid_1's binary_logloss: 0.553694
Early stopping, best iteration is:
[1000]  training's auc: 0.833159  training's binary_logloss: 0.511646  valid_1's auc: 0.775816  valid_1's binary_logloss: 0.553796
Fold  1 AUC : 0.775816
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.791629  training's binary_logloss: 0.577406  valid_1's auc: 0.756418  valid_1's binary_logloss: 0.592248
[400]  training's auc: 0.802457  training's binary_logloss: 0.545641  valid_1's auc: 0.758321  valid_1's binary_logloss: 0.571322
[600]  training's auc: 0.812912  training's binary_logloss: 0.530542  valid_1's auc: 0.759668  valid_1's binary_logloss: 0.565656
[800]  training's auc: 0.823868  training's binary_logloss: 0.519785  valid_1's auc: 0.760657  valid_1's binary_logloss: 0.563887
[1000]  training's auc: 0.833776  training's binary_logloss: 0.510649  valid_1's auc: 0.761322  valid_1's binary_logloss: 0.563189
[1200]  training's auc: 0.843575  training's binary_logloss: 0.502119  valid_1's auc: 0.761737  valid_1's binary_logloss: 0.562861
Early stopping, best iteration is:
[1197]  training's auc: 0.843438  training's binary_logloss: 0.50224  valid_1's auc: 0.761741  valid_1's binary_logloss: 0.562857
Fold  2 AUC : 0.761741
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.790821  training's binary_logloss: 0.577836  valid_1's auc: 0.75931  valid_1's binary_logloss: 0.590317
[400]  training's auc: 0.801874  training's binary_logloss: 0.546006  valid_1's auc: 0.76195  valid_1's binary_logloss: 0.567807
[600]  training's auc: 0.812314  training's binary_logloss: 0.530831  valid_1's auc: 0.762897  valid_1's binary_logloss: 0.561974
[800]  training's auc: 0.823388  training's binary_logloss: 0.520028  valid_1's auc: 0.763576  valid_1's binary_logloss: 0.560383
[1000]  training's auc: 0.833421  training's binary_logloss: 0.510932  valid_1's auc: 0.764037  valid_1's binary_logloss: 0.559766
[1200]  training's auc: 0.84327  training's binary_logloss: 0.502412  valid_1's auc: 0.763988  valid_1's binary_logloss: 0.559696
Early stopping, best iteration is:
[1017]  training's auc: 0.834267  training's binary_logloss: 0.510166  valid_1's auc: 0.764153  valid_1's binary_logloss: 0.55969
Fold  3 AUC : 0.764153
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.79091  training's binary_logloss: 0.577827  valid_1's auc: 0.760442  valid_1's binary_logloss: 0.589594
[400]  training's auc: 0.801624  training's binary_logloss: 0.54613  valid_1's auc: 0.762994  valid_1's binary_logloss: 0.567674
[600]  training's auc: 0.812017  training's binary_logloss: 0.531113  valid_1's auc: 0.764471  valid_1's binary_logloss: 0.56155
[800]  training's auc: 0.822886  training's binary_logloss: 0.520403  valid_1's auc: 0.764573  valid_1's binary_logloss: 0.560084
[1000]  training's auc: 0.833003  training's binary_logloss: 0.511128  valid_1's auc: 0.765087  valid_1's binary_logloss: 0.559365
[1200]  training's auc: 0.842694  training's binary_logloss: 0.502851  valid_1's auc: 0.765069  valid_1's binary_logloss: 0.559247
Early stopping, best iteration is:
[1080]  training's auc: 0.836971  training's binary_logloss: 0.507758  valid_1's auc: 0.765208  valid_1's binary_logloss: 0.559254
Fold  4 AUC : 0.765208
Training until validation scores don't improve for 200 rounds
[200]  training's auc: 0.790714  training's binary_logloss: 0.578604  valid_1's auc: 0.768595  valid_1's binary_logloss: 0.584183
Early stopping, best iteration is:
[28]  training's auc: 0.781558  training's binary_logloss: 0.661882  valid_1's auc: 0.768728  valid_1's binary_logloss: 0.660462
Fold  5 AUC : 0.768728
Full AUC score 0.762044
Hello. A question came up while I was working on this competition, and I'd like to hear other people's thoughts, so I'm posting it here.
After some simple preprocessing, I split the train data into validation folds and ran LightGBM, and something puzzled me.
Out of simple curiosity, I changed the LGBMClassifier max_depth parameter from 10 to -1 and compared the AUC scores.
The AUC score of each individual fold barely changed, but the Full AUC score dropped considerably.
What could cause a difference like this?
Above the separator line is the run with max_depth=10; below it is the run with max_depth=-1.
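For what it's worth, one mechanism that can produce exactly this pattern: the Full AUC is computed on the concatenated out-of-fold predictions, so it compares scores across folds, which the per-fold AUCs never do. If the five models emit probabilities on slightly different scales, within-fold ranking can stay intact while cross-fold ranking breaks. A minimal synthetic sketch (made-up numbers, not the competition data):

```python
# Synthetic illustration (assumed data, not the competition's): two folds
# whose within-fold rankings are equally good, but whose score scales differ.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Fold A: negatives around 0.40, positives around 0.60 -> clean separation
y_a = np.array([0] * 50 + [1] * 50)
s_a = np.concatenate([rng.normal(0.40, 0.05, 50), rng.normal(0.60, 0.05, 50)])

# Fold B: same separation quality, but every score shifted upward
y_b = np.array([0] * 50 + [1] * 50)
s_b = np.concatenate([rng.normal(0.55, 0.05, 50), rng.normal(0.75, 0.05, 50)])

auc_a = roc_auc_score(y_a, s_a)  # high
auc_b = roc_auc_score(y_b, s_b)  # high
# "Full" AUC on the concatenated OOF vector: fold B's negatives now
# outrank many of fold A's positives, so the combined score drops.
auc_full = roc_auc_score(np.concatenate([y_a, y_b]),
                         np.concatenate([s_a, s_b]))
print('Fold A AUC : %.6f' % auc_a)
print('Fold B AUC : %.6f' % auc_b)
print('Full AUC   : %.6f' % auc_full)  # lower than both folds
```

In the logs below the separator, the fold-5 model stopped after only 28 iterations, so its probabilities are much less spread out than the other folds', which could be producing this effect.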
To help readers follow along, I've attached my code below (the clf parameters were stripped out for brevity).
Unlike the code-sharing board, the discussion board doesn't seem to let you post code in a nicely formatted way; if anyone knows how, I'd appreciate a comment.
Code:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from lightgbm import LGBMClassifier

# RANDOM, train and test are defined earlier in the notebook.
folds = KFold(n_splits=5, shuffle=True, random_state=RANDOM)
feats = [f for f in train.columns if f not in ['voted']]
oof_preds = np.zeros(train.shape[0])
sub_preds = np.zeros(test.shape[0])

for n_fold, (train_idx, valid_idx) in enumerate(folds.split(train[feats], train['voted'])):
    train_x, train_y = train[feats].iloc[train_idx], train['voted'].iloc[train_idx]
    valid_x, valid_y = train[feats].iloc[valid_idx], train['voted'].iloc[valid_idx]

    clf = LGBMClassifier()
    clf.fit(train_x, train_y, eval_set=[(train_x, train_y), (valid_x, valid_y)],
            eval_metric='auc', verbose=200, early_stopping_rounds=200)

    # Out-of-fold predictions for this fold's validation split
    oof_preds[valid_idx] = clf.predict_proba(valid_x, num_iteration=clf.best_iteration_)[:, 1]
    # Average the test predictions over the folds
    sub_preds += clf.predict_proba(test, num_iteration=clf.best_iteration_)[:, 1] / folds.n_splits

    print('Fold %2d AUC : %.6f' % (n_fold + 1, roc_auc_score(valid_y, oof_preds[valid_idx])))

print('Full AUC score %.6f' % roc_auc_score(train['voted'], oof_preds))
Thanks for the comment. That's something I'll need to think about. In the first log, fold 5 used the value from iteration 1029, while in the second log it used the value from iteration 28, and that could make a meaningful difference in the combined out-of-fold predictions later. I'll adjust it and rerun to see the results. Thank you.
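One way to test that hypothesis (a sketch under assumptions, not the original method): since the fold-5 model stopped at iteration 28, its predicted probabilities may sit on a different scale from the other folds' models. Rank-transforming each fold's out-of-fold predictions is a monotone change, so every per-fold AUC is preserved, but cross-fold scale differences disappear; if the Full AUC recovers after this transform, scale mismatch between folds is the likely cause. The helper and its variable names (y, oof_preds, fold_indices) below are illustrative, not from the original code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_fold_rank_auc(y, oof_preds, fold_indices):
    """Full AUC after replacing each fold's scores with within-fold ranks.

    Ranking is monotone within a fold, so each per-fold AUC is unchanged;
    only cross-fold scale differences are removed.
    """
    ranked = np.empty_like(oof_preds, dtype=float)
    for idx in fold_indices:
        # argsort of argsort yields 0..n-1 ranks; normalise to [0, 1)
        ranked[idx] = np.argsort(np.argsort(oof_preds[idx])) / len(idx)
    return roc_auc_score(y, ranked)

# Toy check: two folds with identical within-fold separation, shifted scales.
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 50 + [0] * 50 + [1] * 50)
scores = np.concatenate([rng.normal(0.4, 0.05, 50), rng.normal(0.6, 0.05, 50),
                         rng.normal(0.6, 0.05, 50), rng.normal(0.8, 0.05, 50)])
fold_indices = [np.arange(0, 100), np.arange(100, 200)]

raw_full = roc_auc_score(y, scores)
rank_full = per_fold_rank_auc(y, scores, fold_indices)
print('Raw Full AUC    : %.6f' % raw_full)
print('Ranked Full AUC : %.6f' % rank_full)  # recovers if scales disagreed
```

If the gap closes on the real oof_preds, per-fold calibration (or simply letting fold 5 train longer) would be the thing to fix.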
In the second log, fold 5's training AUC drops sharply and its loss rises. Could that be the cause?
The loss went up, but the AUC itself wasn't degraded much.