Monthly Dacon 2: Celestial Object Type Classification

  • Prize: KRW 940,000 in total
  • 2020.02.01 ~ 2020.02.29 23:59
  • 657 teams

My first Model (Celestial Object Type Classification)

  • 2020.02.14 17:52
  • 280 views
  • Python
  • by hansung.dev
Dacon leaderboard submission scores by model: RandomForestClassifier 0.50522, XGBClassifier 0.5009626058, LGBMClassifier 1.0451859688, CatBoost 0.7275015319. For LGBM, a small tweak to the hyperparameters brought the score down to about 0.5. Tuning the hyperparameters with tools such as BayesianOptimization or Hyperopt should improve performance further (see the sketch in the LGBMClassifier section below).
Code

Model scores (Dacon leaderboard submission scores)

  • RandomForestClassifier : 0.50522
  • XGBClassifier : 0.5009626058
  • LGBMClassifier : 1.0451859688
  • CatBoost : 0.7275015319
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# SEED
RANDOM_SEED = 0
# Print Information
print('Seed: %i'%(RANDOM_SEED))
print('Pandas: %s'%(pd.__version__))
print('Numpy: %s'%(np.__version__))
#print('LightGBM: %s'%(lgb.__version__))
# print('Scikit-Learn: %s'%(sklearn.__version__))
Seed: 0
Pandas: 1.0.0
Numpy: 1.18.1

Load Files

data_path = '/Users/hansung.dev/Github/kaggle_Data/천체유형분류-월간데이콘2/'

train = pd.read_csv(data_path + 'data/train.csv', index_col=0)
test = pd.read_csv(data_path + 'data/test.csv', index_col=0)
sample_submission = pd.read_csv(data_path + 'data/sample_submission.csv', index_col=0)
# Convert the TYPE labels in TRAIN into numeric class indices aligned with the SAMPLE_SUBMISSION columns.
column_number = {}
for i, column in enumerate(sample_submission.columns):
    column_number[column] = i
    
def to_number(x, dic):
    return dic[x]

train['type_num'] = train['type'].apply(lambda x: to_number(x, column_number))
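For reference, the same mapping can be written in one line with pandas' Series.map; this is just an equivalent form, not part of the original notebook:

# Equivalent to the to_number/apply pattern above.
train['type_num'] = train['type'].map(column_number)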

Model, predict and solve

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.metrics import f1_score, confusion_matrix, precision_recall_curve, roc_curve
train_x = train.drop(columns=['type', 'type_num'], axis=1)
train_y = train['type_num']
test_x = test
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(
    train_x, train_y, test_size=0.2, random_state=RANDOM_SEED, stratify=train_y)
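A quick sanity check (not in the original notebook) that stratify preserved the class distribution across the two splits:

# Class proportions should be nearly identical in both splits.
split_check = pd.concat([
    y_train.value_counts(normalize=True).rename('train'),
    y_valid.value_counts(normalize=True).rename('valid'),
], axis=1)
print(split_check.head())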

Model, predict and solve : CatBoost - 0.7275015319

from catboost import CatBoostClassifier
%%time
cat_clf = CatBoostClassifier(iterations=100, random_state=RANDOM_SEED)
cat_clf.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)])
0:	learn: 2.6661732	test: 2.6661732	test1: 2.6655889	best: 2.6655889 (0)	total: 200ms	remaining: 19.8s
1:	learn: 2.4762134	test: 2.4762134	test1: 2.4751419	best: 2.4751419 (1)	total: 358ms	remaining: 17.5s
2:	learn: 2.3324057	test: 2.3324057	test1: 2.3315090	best: 2.3315090 (2)	total: 525ms	remaining: 17s
3:	learn: 2.2141264	test: 2.2141264	test1: 2.2134201	best: 2.2134201 (3)	total: 679ms	remaining: 16.3s
4:	learn: 2.1129994	test: 2.1129994	test1: 2.1123310	best: 2.1123310 (4)	total: 845ms	remaining: 16.1s
5:	learn: 2.0293120	test: 2.0293120	test1: 2.0285326	best: 2.0285326 (5)	total: 1s	remaining: 15.7s
6:	learn: 1.9535382	test: 1.9535382	test1: 1.9525887	best: 1.9525887 (6)	total: 1.19s	remaining: 15.7s
7:	learn: 1.8865950	test: 1.8865950	test1: 1.8855444	best: 1.8855444 (7)	total: 1.34s	remaining: 15.4s
8:	learn: 1.8270063	test: 1.8270063	test1: 1.8261054	best: 1.8261054 (8)	total: 1.5s	remaining: 15.2s
9:	learn: 1.7715183	test: 1.7715183	test1: 1.7705975	best: 1.7705975 (9)	total: 1.66s	remaining: 14.9s
10:	learn: 1.7208004	test: 1.7208004	test1: 1.7200709	best: 1.7200709 (10)	total: 1.83s	remaining: 14.8s
11:	learn: 1.6745966	test: 1.6745966	test1: 1.6739613	best: 1.6739613 (11)	total: 1.98s	remaining: 14.6s
12:	learn: 1.6298049	test: 1.6298049	test1: 1.6294606	best: 1.6294606 (12)	total: 2.17s	remaining: 14.5s
13:	learn: 1.5908225	test: 1.5908225	test1: 1.5905420	best: 1.5905420 (13)	total: 2.33s	remaining: 14.3s
14:	learn: 1.5538349	test: 1.5538349	test1: 1.5535396	best: 1.5535396 (14)	total: 2.5s	remaining: 14.2s
15:	learn: 1.5187010	test: 1.5187010	test1: 1.5184538	best: 1.5184538 (15)	total: 2.65s	remaining: 13.9s
16:	learn: 1.4869039	test: 1.4869039	test1: 1.4868436	best: 1.4868436 (16)	total: 2.82s	remaining: 13.8s
17:	learn: 1.4564343	test: 1.4564343	test1: 1.4564999	best: 1.4564999 (17)	total: 2.98s	remaining: 13.6s
18:	learn: 1.4261560	test: 1.4261560	test1: 1.4263994	best: 1.4263994 (18)	total: 3.14s	remaining: 13.4s
19:	learn: 1.3985289	test: 1.3985289	test1: 1.3987951	best: 1.3987951 (19)	total: 3.3s	remaining: 13.2s
20:	learn: 1.3732567	test: 1.3732567	test1: 1.3735297	best: 1.3735297 (20)	total: 3.46s	remaining: 13s
21:	learn: 1.3472254	test: 1.3472254	test1: 1.3475237	best: 1.3475237 (21)	total: 3.62s	remaining: 12.8s
22:	learn: 1.3246452	test: 1.3246452	test1: 1.3248761	best: 1.3248761 (22)	total: 3.77s	remaining: 12.6s
23:	learn: 1.3021305	test: 1.3021305	test1: 1.3023612	best: 1.3023612 (23)	total: 3.93s	remaining: 12.4s
24:	learn: 1.2800237	test: 1.2800237	test1: 1.2804253	best: 1.2804253 (24)	total: 4.09s	remaining: 12.3s
25:	learn: 1.2602678	test: 1.2602678	test1: 1.2607130	best: 1.2607130 (25)	total: 4.24s	remaining: 12.1s
26:	learn: 1.2405247	test: 1.2405247	test1: 1.2410091	best: 1.2410091 (26)	total: 4.4s	remaining: 11.9s
27:	learn: 1.2215805	test: 1.2215805	test1: 1.2220436	best: 1.2220436 (27)	total: 4.55s	remaining: 11.7s
28:	learn: 1.2037297	test: 1.2037297	test1: 1.2042218	best: 1.2042218 (28)	total: 4.71s	remaining: 11.5s
29:	learn: 1.1870548	test: 1.1870548	test1: 1.1875409	best: 1.1875409 (29)	total: 4.87s	remaining: 11.4s
30:	learn: 1.1710762	test: 1.1710762	test1: 1.1716153	best: 1.1716153 (30)	total: 5.03s	remaining: 11.2s
31:	learn: 1.1555681	test: 1.1555681	test1: 1.1560973	best: 1.1560973 (31)	total: 5.18s	remaining: 11s
32:	learn: 1.1408688	test: 1.1408688	test1: 1.1413838	best: 1.1413838 (32)	total: 5.34s	remaining: 10.9s
33:	learn: 1.1268232	test: 1.1268232	test1: 1.1273731	best: 1.1273731 (33)	total: 5.5s	remaining: 10.7s
34:	learn: 1.1119003	test: 1.1119003	test1: 1.1125074	best: 1.1125074 (34)	total: 5.66s	remaining: 10.5s
35:	learn: 1.0980565	test: 1.0980565	test1: 1.0988502	best: 1.0988502 (35)	total: 5.81s	remaining: 10.3s
36:	learn: 1.0856333	test: 1.0856333	test1: 1.0864826	best: 1.0864826 (36)	total: 5.97s	remaining: 10.2s
37:	learn: 1.0723788	test: 1.0723788	test1: 1.0734034	best: 1.0734034 (37)	total: 6.14s	remaining: 10s
38:	learn: 1.0594924	test: 1.0594924	test1: 1.0606074	best: 1.0606074 (38)	total: 6.3s	remaining: 9.85s
39:	learn: 1.0480244	test: 1.0480244	test1: 1.0492268	best: 1.0492268 (39)	total: 6.45s	remaining: 9.68s
40:	learn: 1.0368129	test: 1.0368129	test1: 1.0380792	best: 1.0380792 (40)	total: 6.61s	remaining: 9.52s
41:	learn: 1.0259426	test: 1.0259426	test1: 1.0272218	best: 1.0272218 (41)	total: 6.77s	remaining: 9.35s
42:	learn: 1.0158412	test: 1.0158412	test1: 1.0172001	best: 1.0172001 (42)	total: 6.93s	remaining: 9.18s
43:	learn: 1.0053516	test: 1.0053516	test1: 1.0068039	best: 1.0068039 (43)	total: 7.08s	remaining: 9.01s
44:	learn: 0.9955133	test: 0.9955133	test1: 0.9970482	best: 0.9970482 (44)	total: 7.24s	remaining: 8.85s
45:	learn: 0.9859104	test: 0.9859104	test1: 0.9875530	best: 0.9875530 (45)	total: 7.4s	remaining: 8.68s
46:	learn: 0.9773417	test: 0.9773417	test1: 0.9790714	best: 0.9790714 (46)	total: 7.56s	remaining: 8.52s
47:	learn: 0.9682551	test: 0.9682551	test1: 0.9700280	best: 0.9700280 (47)	total: 7.71s	remaining: 8.35s
48:	learn: 0.9598672	test: 0.9598672	test1: 0.9616754	best: 0.9616754 (48)	total: 7.88s	remaining: 8.2s
49:	learn: 0.9509392	test: 0.9509392	test1: 0.9528142	best: 0.9528142 (49)	total: 8.03s	remaining: 8.03s
50:	learn: 0.9422704	test: 0.9422704	test1: 0.9441952	best: 0.9441952 (50)	total: 8.2s	remaining: 7.88s
51:	learn: 0.9345377	test: 0.9345377	test1: 0.9364632	best: 0.9364632 (51)	total: 8.35s	remaining: 7.71s
52:	learn: 0.9257167	test: 0.9257167	test1: 0.9277736	best: 0.9277736 (52)	total: 8.52s	remaining: 7.55s
53:	learn: 0.9172973	test: 0.9172973	test1: 0.9194907	best: 0.9194907 (53)	total: 8.67s	remaining: 7.38s
54:	learn: 0.9099581	test: 0.9099581	test1: 0.9121863	best: 0.9121863 (54)	total: 8.83s	remaining: 7.23s
55:	learn: 0.9028473	test: 0.9028473	test1: 0.9051159	best: 0.9051159 (55)	total: 8.99s	remaining: 7.06s
56:	learn: 0.8956916	test: 0.8956916	test1: 0.8980575	best: 0.8980575 (56)	total: 9.15s	remaining: 6.9s
57:	learn: 0.8891572	test: 0.8891572	test1: 0.8915150	best: 0.8915150 (57)	total: 9.31s	remaining: 6.74s
58:	learn: 0.8832074	test: 0.8832074	test1: 0.8856073	best: 0.8856073 (58)	total: 9.47s	remaining: 6.58s
59:	learn: 0.8763245	test: 0.8763245	test1: 0.8787048	best: 0.8787048 (59)	total: 9.62s	remaining: 6.41s
60:	learn: 0.8704516	test: 0.8704516	test1: 0.8728743	best: 0.8728743 (60)	total: 9.78s	remaining: 6.25s
61:	learn: 0.8648263	test: 0.8648263	test1: 0.8673085	best: 0.8673085 (61)	total: 9.94s	remaining: 6.09s
62:	learn: 0.8582730	test: 0.8582730	test1: 0.8608005	best: 0.8608005 (62)	total: 10.1s	remaining: 5.93s
63:	learn: 0.8522552	test: 0.8522552	test1: 0.8547454	best: 0.8547454 (63)	total: 10.3s	remaining: 5.76s
64:	learn: 0.8466657	test: 0.8466657	test1: 0.8490947	best: 0.8490947 (64)	total: 10.4s	remaining: 5.61s
65:	learn: 0.8418386	test: 0.8418386	test1: 0.8442853	best: 0.8442853 (65)	total: 10.6s	remaining: 5.44s
66:	learn: 0.8370860	test: 0.8370860	test1: 0.8395578	best: 0.8395578 (66)	total: 10.7s	remaining: 5.29s
67:	learn: 0.8327654	test: 0.8327654	test1: 0.8352276	best: 0.8352276 (67)	total: 10.9s	remaining: 5.12s
68:	learn: 0.8270680	test: 0.8270680	test1: 0.8295614	best: 0.8295614 (68)	total: 11.1s	remaining: 4.96s
69:	learn: 0.8221842	test: 0.8221842	test1: 0.8247003	best: 0.8247003 (69)	total: 11.2s	remaining: 4.8s
70:	learn: 0.8174786	test: 0.8174786	test1: 0.8200348	best: 0.8200348 (70)	total: 11.4s	remaining: 4.64s
71:	learn: 0.8132565	test: 0.8132565	test1: 0.8158405	best: 0.8158405 (71)	total: 11.5s	remaining: 4.48s
72:	learn: 0.8092603	test: 0.8092603	test1: 0.8118828	best: 0.8118828 (72)	total: 11.7s	remaining: 4.32s
73:	learn: 0.8044471	test: 0.8044471	test1: 0.8070469	best: 0.8070469 (73)	total: 11.8s	remaining: 4.16s
74:	learn: 0.8002109	test: 0.8002109	test1: 0.8028017	best: 0.8028017 (74)	total: 12s	remaining: 4s
75:	learn: 0.7965655	test: 0.7965655	test1: 0.7992282	best: 0.7992282 (75)	total: 12.2s	remaining: 3.84s
76:	learn: 0.7926629	test: 0.7926629	test1: 0.7953798	best: 0.7953798 (76)	total: 12.3s	remaining: 3.68s
77:	learn: 0.7886065	test: 0.7886065	test1: 0.7913288	best: 0.7913288 (77)	total: 12.5s	remaining: 3.52s
78:	learn: 0.7847408	test: 0.7847408	test1: 0.7874891	best: 0.7874891 (78)	total: 12.6s	remaining: 3.36s
79:	learn: 0.7813292	test: 0.7813292	test1: 0.7840997	best: 0.7840997 (79)	total: 12.8s	remaining: 3.2s
80:	learn: 0.7768236	test: 0.7768236	test1: 0.7796202	best: 0.7796202 (80)	total: 13s	remaining: 3.04s
81:	learn: 0.7725442	test: 0.7725442	test1: 0.7754139	best: 0.7754139 (81)	total: 13.1s	remaining: 2.88s
82:	learn: 0.7696836	test: 0.7696836	test1: 0.7726035	best: 0.7726035 (82)	total: 13.3s	remaining: 2.72s
83:	learn: 0.7664060	test: 0.7664060	test1: 0.7693512	best: 0.7693512 (83)	total: 13.4s	remaining: 2.56s
84:	learn: 0.7625304	test: 0.7625304	test1: 0.7655066	best: 0.7655066 (84)	total: 13.6s	remaining: 2.4s
85:	learn: 0.7589366	test: 0.7589366	test1: 0.7619683	best: 0.7619683 (85)	total: 13.7s	remaining: 2.24s
86:	learn: 0.7558518	test: 0.7558518	test1: 0.7588860	best: 0.7588860 (86)	total: 13.9s	remaining: 2.08s
87:	learn: 0.7527659	test: 0.7527659	test1: 0.7558195	best: 0.7558195 (87)	total: 14.1s	remaining: 1.92s
88:	learn: 0.7500447	test: 0.7500447	test1: 0.7531353	best: 0.7531353 (88)	total: 14.2s	remaining: 1.76s
89:	learn: 0.7472323	test: 0.7472323	test1: 0.7503337	best: 0.7503337 (89)	total: 14.4s	remaining: 1.6s
90:	learn: 0.7436044	test: 0.7436044	test1: 0.7467512	best: 0.7467512 (90)	total: 14.5s	remaining: 1.44s
91:	learn: 0.7412852	test: 0.7412852	test1: 0.7444585	best: 0.7444585 (91)	total: 14.7s	remaining: 1.28s
92:	learn: 0.7387102	test: 0.7387102	test1: 0.7419615	best: 0.7419615 (92)	total: 14.8s	remaining: 1.12s
93:	learn: 0.7357762	test: 0.7357762	test1: 0.7390695	best: 0.7390695 (93)	total: 15s	remaining: 957ms
94:	learn: 0.7327220	test: 0.7327220	test1: 0.7360415	best: 0.7360415 (94)	total: 15.2s	remaining: 798ms
95:	learn: 0.7300092	test: 0.7300092	test1: 0.7333447	best: 0.7333447 (95)	total: 15.3s	remaining: 638ms
96:	learn: 0.7269955	test: 0.7269955	test1: 0.7303899	best: 0.7303899 (96)	total: 15.5s	remaining: 479ms
97:	learn: 0.7238659	test: 0.7238659	test1: 0.7273002	best: 0.7273002 (97)	total: 15.6s	remaining: 319ms
98:	learn: 0.7214367	test: 0.7214367	test1: 0.7248931	best: 0.7248931 (98)	total: 15.8s	remaining: 159ms
99:	learn: 0.7193626	test: 0.7193626	test1: 0.7228760	best: 0.7228760 (99)	total: 15.9s	remaining: 0us

bestTest = 0.7228759522
bestIteration = 99

CPU times: user 2min 54s, sys: 27.1 s, total: 3min 22s
Wall time: 16.2 s
<catboost.core.CatBoostClassifier at 0x1a22ff5910>
cat_pred = cat_clf.predict_proba(test_x)
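The CatBoost bestTest value (0.7229) closely tracks the leaderboard score (0.7275), which suggests the competition metric is multiclass logloss. If so, a local estimate can be read off the hold-out set before submitting — a minimal sketch, assuming the metric matches:

from sklearn.metrics import log_loss

# Local hold-out logloss to compare against the leaderboard before submitting.
print('CatBoost valid logloss: %.4f' % log_loss(y_valid, cat_clf.predict_proba(X_valid)))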

Model, predict and solve : XGBClassifier - 0.5009626058

from xgboost import XGBClassifier
%%time
xgb_clf = XGBClassifier(n_estimators=100, n_jobs=-1, random_state=RANDOM_SEED)
xgb_clf.fit(X_train,y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)])
[0]	validation_0-merror:0.230505	validation_1-merror:0.234256
[1]	validation_0-merror:0.224636	validation_1-merror:0.226956
[2]	validation_0-merror:0.216173	validation_1-merror:0.21898
[3]	validation_0-merror:0.215317	validation_1-merror:0.21728
[4]	validation_0-merror:0.213979	validation_1-merror:0.21648
[5]	validation_0-merror:0.214998	validation_1-merror:0.217505
[6]	validation_0-merror:0.213498	validation_1-merror:0.21608
[7]	validation_0-merror:0.211654	validation_1-merror:0.21403
[8]	validation_0-merror:0.208417	validation_1-merror:0.21058
[9]	validation_0-merror:0.207429	validation_1-merror:0.20943
[10]	validation_0-merror:0.206942	validation_1-merror:0.208755
[11]	validation_0-merror:0.206017	validation_1-merror:0.20798
[12]	validation_0-merror:0.203841	validation_1-merror:0.20588
[13]	validation_0-merror:0.203166	validation_1-merror:0.20493
[14]	validation_0-merror:0.202691	validation_1-merror:0.20448
[15]	validation_0-merror:0.201704	validation_1-merror:0.204205
[16]	validation_0-merror:0.200548	validation_1-merror:0.203205
[17]	validation_0-merror:0.200341	validation_1-merror:0.202705
[18]	validation_0-merror:0.199554	validation_1-merror:0.201805
[19]	validation_0-merror:0.199322	validation_1-merror:0.201705
[20]	validation_0-merror:0.198804	validation_1-merror:0.20148
[21]	validation_0-merror:0.198091	validation_1-merror:0.20098
[22]	validation_0-merror:0.197054	validation_1-merror:0.199705
[23]	validation_0-merror:0.196735	validation_1-merror:0.199405
[24]	validation_0-merror:0.193141	validation_1-merror:0.196105
[25]	validation_0-merror:0.19236	validation_1-merror:0.195255
[26]	validation_0-merror:0.191016	validation_1-merror:0.19378
[27]	validation_0-merror:0.190072	validation_1-merror:0.19258
[28]	validation_0-merror:0.189728	validation_1-merror:0.19238
[29]	validation_0-merror:0.189009	validation_1-merror:0.191955
[30]	validation_0-merror:0.188022	validation_1-merror:0.190855
[31]	validation_0-merror:0.186859	validation_1-merror:0.190255
[32]	validation_0-merror:0.186466	validation_1-merror:0.18968
[33]	validation_0-merror:0.185822	validation_1-merror:0.18908
[34]	validation_0-merror:0.185384	validation_1-merror:0.188955
[35]	validation_0-merror:0.184784	validation_1-merror:0.18838
[36]	validation_0-merror:0.184603	validation_1-merror:0.188155
[37]	validation_0-merror:0.184172	validation_1-merror:0.18763
[38]	validation_0-merror:0.18304	validation_1-merror:0.186405
[39]	validation_0-merror:0.18244	validation_1-merror:0.18598
[40]	validation_0-merror:0.181172	validation_1-merror:0.18468
[41]	validation_0-merror:0.180078	validation_1-merror:0.18398
[42]	validation_0-merror:0.179609	validation_1-merror:0.183505
[43]	validation_0-merror:0.178984	validation_1-merror:0.18293
[44]	validation_0-merror:0.178653	validation_1-merror:0.182605
[45]	validation_0-merror:0.177365	validation_1-merror:0.181055
[46]	validation_0-merror:0.176371	validation_1-merror:0.179829
[47]	validation_0-merror:0.175709	validation_1-merror:0.179329
[48]	validation_0-merror:0.174952	validation_1-merror:0.178579
[49]	validation_0-merror:0.174434	validation_1-merror:0.178104
[50]	validation_0-merror:0.173402	validation_1-merror:0.177429
[51]	validation_0-merror:0.172452	validation_1-merror:0.176454
[52]	validation_0-merror:0.171677	validation_1-merror:0.175979
[53]	validation_0-merror:0.170959	validation_1-merror:0.175179
[54]	validation_0-merror:0.17029	validation_1-merror:0.174479
[55]	validation_0-merror:0.169315	validation_1-merror:0.173304
[56]	validation_0-merror:0.168215	validation_1-merror:0.172704
[57]	validation_0-merror:0.167783	validation_1-merror:0.172054
[58]	validation_0-merror:0.167065	validation_1-merror:0.171504
[59]	validation_0-merror:0.166577	validation_1-merror:0.171404
[60]	validation_0-merror:0.166177	validation_1-merror:0.171004
[61]	validation_0-merror:0.165465	validation_1-merror:0.170279
[62]	validation_0-merror:0.16519	validation_1-merror:0.169704
[63]	validation_0-merror:0.164758	validation_1-merror:0.169404
[64]	validation_0-merror:0.163833	validation_1-merror:0.168579
[65]	validation_0-merror:0.163296	validation_1-merror:0.167979
[66]	validation_0-merror:0.162739	validation_1-merror:0.167404
[67]	validation_0-merror:0.162146	validation_1-merror:0.166679
[68]	validation_0-merror:0.161396	validation_1-merror:0.166029
[69]	validation_0-merror:0.161158	validation_1-merror:0.165604
[70]	validation_0-merror:0.160883	validation_1-merror:0.165354
[71]	validation_0-merror:0.160496	validation_1-merror:0.165104
[72]	validation_0-merror:0.160246	validation_1-merror:0.164779
[73]	validation_0-merror:0.159958	validation_1-merror:0.164654
[74]	validation_0-merror:0.159364	validation_1-merror:0.163829
[75]	validation_0-merror:0.158927	validation_1-merror:0.163554
[76]	validation_0-merror:0.158702	validation_1-merror:0.163429
[77]	validation_0-merror:0.158495	validation_1-merror:0.163179
[78]	validation_0-merror:0.158102	validation_1-merror:0.163179
[79]	validation_0-merror:0.157664	validation_1-merror:0.162754
[80]	validation_0-merror:0.157383	validation_1-merror:0.162679
[81]	validation_0-merror:0.157114	validation_1-merror:0.162104
[82]	validation_0-merror:0.156777	validation_1-merror:0.161679
[83]	validation_0-merror:0.156483	validation_1-merror:0.161454
[84]	validation_0-merror:0.15607	validation_1-merror:0.161304
[85]	validation_0-merror:0.155695	validation_1-merror:0.160679
[86]	validation_0-merror:0.15552	validation_1-merror:0.160779
[87]	validation_0-merror:0.155026	validation_1-merror:0.160504
[88]	validation_0-merror:0.154614	validation_1-merror:0.160154
[89]	validation_0-merror:0.154276	validation_1-merror:0.159704
[90]	validation_0-merror:0.154108	validation_1-merror:0.159729
[91]	validation_0-merror:0.153895	validation_1-merror:0.159379
[92]	validation_0-merror:0.153476	validation_1-merror:0.159029
[93]	validation_0-merror:0.153164	validation_1-merror:0.158279
[94]	validation_0-merror:0.153014	validation_1-merror:0.158054
[95]	validation_0-merror:0.152464	validation_1-merror:0.157629
[96]	validation_0-merror:0.152183	validation_1-merror:0.157504
[97]	validation_0-merror:0.152101	validation_1-merror:0.157229
[98]	validation_0-merror:0.151751	validation_1-merror:0.156979
[99]	validation_0-merror:0.151526	validation_1-merror:0.156829
CPU times: user 28min 46s, sys: 4.37 s, total: 28min 50s
Wall time: 1min 49s
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=-1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)
xgb_pred = xgb_clf.predict_proba(test_x)
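The fit above tracks merror (classification error) rather than the logloss the leaderboard appears to score. A hypothetical variant that tracks mlogloss and stops once the hold-out set stops improving — argument names follow the XGBoost sklearn API of this period (roughly 0.90), so treat it as a sketch rather than drop-in code:

# Hypothetical: optimize multiclass logloss directly, with early stopping.
xgb_clf = XGBClassifier(n_estimators=500, n_jobs=-1, random_state=RANDOM_SEED)
xgb_clf.fit(X_train, y_train,
            eval_set=[(X_valid, y_valid)],
            eval_metric='mlogloss',
            early_stopping_rounds=20,
            verbose=False)
# In this API the best round is exposed as best_ntree_limit after early stopping.
xgb_pred = xgb_clf.predict_proba(test_x, ntree_limit=xgb_clf.best_ntree_limit)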

Model, predict and solve : LGBMClassifier - 1.0451859688

from lightgbm import LGBMClassifier
%%time
lgbm_clf = LGBMClassifier(n_estimators=100, n_jobs=-1, random_state=RANDOM_SEED)
lgbm_clf.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)])
[1]	training's multi_logloss: 1.89091	valid_1's multi_logloss: 1.90049
[2]	training's multi_logloss: 1.66878	valid_1's multi_logloss: 1.67833
[3]	training's multi_logloss: 1.52494	valid_1's multi_logloss: 1.54437
[4]	training's multi_logloss: 1.38843	valid_1's multi_logloss: 1.41067
[5]	training's multi_logloss: 1.28532	valid_1's multi_logloss: 1.30838
[6]	training's multi_logloss: 1.19183	valid_1's multi_logloss: 1.22099
[7]	training's multi_logloss: 1.11423	valid_1's multi_logloss: 1.15023
[8]	training's multi_logloss: 1.0527	valid_1's multi_logloss: 1.08569
[9]	training's multi_logloss: 0.997608	valid_1's multi_logloss: 1.04213
[10]	training's multi_logloss: 0.937073	valid_1's multi_logloss: 0.983177
[11]	training's multi_logloss: 0.895093	valid_1's multi_logloss: 0.946914
[12]	training's multi_logloss: 0.849116	valid_1's multi_logloss: 0.907706
[13]	training's multi_logloss: 0.81832	valid_1's multi_logloss: 0.884298
[14]	training's multi_logloss: 0.774846	valid_1's multi_logloss: 0.837762
[15]	training's multi_logloss: 0.749543	valid_1's multi_logloss: 0.806518
[16]	training's multi_logloss: 0.735271	valid_1's multi_logloss: 0.792319
[17]	training's multi_logloss: 0.702614	valid_1's multi_logloss: 0.76935
[18]	training's multi_logloss: 0.686772	valid_1's multi_logloss: 0.7535
[19]	training's multi_logloss: 0.667261	valid_1's multi_logloss: 0.73718
[20]	training's multi_logloss: 0.625317	valid_1's multi_logloss: 0.702836
[21]	training's multi_logloss: 0.620514	valid_1's multi_logloss: 0.704648
[22]	training's multi_logloss: 0.619669	valid_1's multi_logloss: 0.69429
[23]	training's multi_logloss: 0.566004	valid_1's multi_logloss: 0.652636
[24]	training's multi_logloss: 0.585664	valid_1's multi_logloss: 0.669663
[25]	training's multi_logloss: 0.569495	valid_1's multi_logloss: 0.658281
[26]	training's multi_logloss: 0.57752	valid_1's multi_logloss: 0.672334
[27]	training's multi_logloss: 0.535454	valid_1's multi_logloss: 0.638513
[28]	training's multi_logloss: 0.553835	valid_1's multi_logloss: 0.673696
[29]	training's multi_logloss: 0.528833	valid_1's multi_logloss: 0.64648
[30]	training's multi_logloss: 0.578538	valid_1's multi_logloss: 0.697522
[31]	training's multi_logloss: 0.517586	valid_1's multi_logloss: 0.658096
[32]	training's multi_logloss: 0.548796	valid_1's multi_logloss: 0.655248
[33]	training's multi_logloss: 0.47528	valid_1's multi_logloss: 0.612226
[34]	training's multi_logloss: 0.537562	valid_1's multi_logloss: 0.666986
[35]	training's multi_logloss: 0.505483	valid_1's multi_logloss: 0.651686
[36]	training's multi_logloss: 0.597609	valid_1's multi_logloss: 0.757709
[37]	training's multi_logloss: 0.541511	valid_1's multi_logloss: 0.68749
[38]	training's multi_logloss: 0.556333	valid_1's multi_logloss: 0.709643
[39]	training's multi_logloss: 0.615706	valid_1's multi_logloss: 0.761898
[40]	training's multi_logloss: 0.63206	valid_1's multi_logloss: 0.797695
[41]	training's multi_logloss: 0.580079	valid_1's multi_logloss: 0.739282
[42]	training's multi_logloss: 0.565135	valid_1's multi_logloss: 0.747775
[43]	training's multi_logloss: 0.581355	valid_1's multi_logloss: 0.753417
[44]	training's multi_logloss: 0.579443	valid_1's multi_logloss: 0.748452
[45]	training's multi_logloss: 1.02092	valid_1's multi_logloss: 1.15759
[46]	training's multi_logloss: 0.476178	valid_1's multi_logloss: 0.629517
[47]	training's multi_logloss: 1.18072	valid_1's multi_logloss: 1.36374
[48]	training's multi_logloss: 0.544892	valid_1's multi_logloss: 0.718467
[49]	training's multi_logloss: 0.893746	valid_1's multi_logloss: 1.06303
[50]	training's multi_logloss: 0.602278	valid_1's multi_logloss: 0.764528
[51]	training's multi_logloss: 0.481347	valid_1's multi_logloss: 0.658217
[52]	training's multi_logloss: 0.495474	valid_1's multi_logloss: 0.668389
[53]	training's multi_logloss: 0.475143	valid_1's multi_logloss: 0.655506
[54]	training's multi_logloss: 1.07671	valid_1's multi_logloss: 1.26056
[55]	training's multi_logloss: 0.613199	valid_1's multi_logloss: 0.803841
[56]	training's multi_logloss: 0.636192	valid_1's multi_logloss: 0.832045
[57]	training's multi_logloss: 0.673969	valid_1's multi_logloss: 0.852192
[58]	training's multi_logloss: 0.581995	valid_1's multi_logloss: 0.79244
[59]	training's multi_logloss: 0.577157	valid_1's multi_logloss: 0.799209
[60]	training's multi_logloss: 0.523105	valid_1's multi_logloss: 0.750686
[61]	training's multi_logloss: 0.51754	valid_1's multi_logloss: 0.749769
[62]	training's multi_logloss: 0.554517	valid_1's multi_logloss: 0.800669
[63]	training's multi_logloss: 0.5394	valid_1's multi_logloss: 0.799536
[64]	training's multi_logloss: 0.572897	valid_1's multi_logloss: 0.82795
[65]	training's multi_logloss: 0.637722	valid_1's multi_logloss: 0.885321
[66]	training's multi_logloss: 0.557729	valid_1's multi_logloss: 0.807432
[67]	training's multi_logloss: 0.693756	valid_1's multi_logloss: 0.967331
[68]	training's multi_logloss: 0.596841	valid_1's multi_logloss: 0.862969
[69]	training's multi_logloss: 0.560605	valid_1's multi_logloss: 0.830251
[70]	training's multi_logloss: 0.543307	valid_1's multi_logloss: 0.823477
[71]	training's multi_logloss: 0.621091	valid_1's multi_logloss: 0.890499
[72]	training's multi_logloss: 0.596961	valid_1's multi_logloss: 0.875604
[73]	training's multi_logloss: 0.627647	valid_1's multi_logloss: 0.917155
[74]	training's multi_logloss: 0.65678	valid_1's multi_logloss: 0.934713
[75]	training's multi_logloss: 0.657683	valid_1's multi_logloss: 0.958588
[76]	training's multi_logloss: 0.711232	valid_1's multi_logloss: 0.999443
[77]	training's multi_logloss: 0.775786	valid_1's multi_logloss: 1.1036
[78]	training's multi_logloss: 0.64225	valid_1's multi_logloss: 0.970808
[79]	training's multi_logloss: 0.675105	valid_1's multi_logloss: 1.01566
[80]	training's multi_logloss: 0.656443	valid_1's multi_logloss: 1.01565
[81]	training's multi_logloss: 0.687079	valid_1's multi_logloss: 1.02421
[82]	training's multi_logloss: 0.677322	valid_1's multi_logloss: 1.0196
[83]	training's multi_logloss: 0.683808	valid_1's multi_logloss: 1.04098
[84]	training's multi_logloss: 0.859556	valid_1's multi_logloss: 1.23005
[85]	training's multi_logloss: 0.789302	valid_1's multi_logloss: 1.15631
[86]	training's multi_logloss: 0.859019	valid_1's multi_logloss: 1.22701
[87]	training's multi_logloss: 0.718086	valid_1's multi_logloss: 1.07822
[88]	training's multi_logloss: 0.674162	valid_1's multi_logloss: 1.03994
[89]	training's multi_logloss: 0.746968	valid_1's multi_logloss: 1.12364
[90]	training's multi_logloss: 0.686721	valid_1's multi_logloss: 1.05724
[91]	training's multi_logloss: 0.653534	valid_1's multi_logloss: 1.04505
[92]	training's multi_logloss: 0.737548	valid_1's multi_logloss: 1.14149
[93]	training's multi_logloss: 0.651477	valid_1's multi_logloss: 1.06173
[94]	training's multi_logloss: 0.678207	valid_1's multi_logloss: 1.08973
[95]	training's multi_logloss: 0.69253	valid_1's multi_logloss: 1.10297
[96]	training's multi_logloss: 0.645114	valid_1's multi_logloss: 1.05593
[97]	training's multi_logloss: 0.690641	valid_1's multi_logloss: 1.10184
[98]	training's multi_logloss: 0.639622	valid_1's multi_logloss: 1.06443
[99]	training's multi_logloss: 0.635666	valid_1's multi_logloss: 1.06294
[100]	training's multi_logloss: 0.680217	valid_1's multi_logloss: 1.11163
CPU times: user 2min, sys: 815 ms, total: 2min 1s
Wall time: 7.74 s
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
               random_state=0, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
lgbm_pred = lgbm_clf.predict_proba(test_x)
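The log above shows valid_1's multi_logloss bottoming out around iteration 33 (~0.612) and then degrading to ~1.11, which lines up with the 1.045 leaderboard score: the model keeps training well past its best point. Early stopping would keep the best iteration — a sketch using the LightGBM fit arguments of this period:

# Hypothetical: stop once the hold-out logloss stops improving.
lgbm_clf = LGBMClassifier(n_estimators=500, n_jobs=-1, random_state=RANDOM_SEED)
lgbm_clf.fit(X_train, y_train,
             eval_set=[(X_valid, y_valid)],
             eval_metric='multi_logloss',
             early_stopping_rounds=20,
             verbose=False)
lgbm_pred = lgbm_clf.predict_proba(test_x, num_iteration=lgbm_clf.best_iteration_)

And picking up the tuning idea from the summary, a minimal Hyperopt sketch — the search space and evaluation budget are illustrative assumptions, not the settings that produced the ~0.5 score:

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.metrics import log_loss

# Illustrative search space over a few influential LGBM parameters.
space = {
    'num_leaves': hp.quniform('num_leaves', 16, 128, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.3)),
    'min_child_samples': hp.quniform('min_child_samples', 10, 100, 1),
}

def objective(params):
    clf = LGBMClassifier(n_estimators=200,
                         num_leaves=int(params['num_leaves']),
                         learning_rate=params['learning_rate'],
                         min_child_samples=int(params['min_child_samples']),
                         random_state=RANDOM_SEED, n_jobs=-1)
    clf.fit(X_train, y_train)
    # Minimize hold-out multiclass logloss, matching the leaderboard metric.
    return {'loss': log_loss(y_valid, clf.predict_proba(X_valid)), 'status': STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=30, trials=Trials())
print(best)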

Model, predict and solve : RandomForestClassifier - 0.50522

rf_clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=RANDOM_SEED)
rf_clf.fit(train_x, train_y)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=-1, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)
rf_pred = rf_clf.predict_proba(test_x)
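Note that unlike the three boosters, the random forest above is fit on all of train_x, so X_valid is not an honest hold-out for it. One quick alternative is an out-of-bag estimate — a sketch, not the submitted code, using scikit-learn's oob_decision_function_:

from sklearn.metrics import log_loss

# Hypothetical check: out-of-bag probabilities give a leakage-free logloss
# estimate without a separate split. (Rows that were never out-of-bag come
# back as NaN; unlikely with 100 trees, but worth guarding in real use.)
rf_oob = RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                random_state=RANDOM_SEED, oob_score=True)
rf_oob.fit(train_x, train_y)
print('RF OOB logloss: %.5f' % log_loss(train_y, rf_oob.oob_decision_function_))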

Submission

#RandomForestClassifier
submission = pd.DataFrame(data=rf_pred, columns=sample_submission.columns, index=sample_submission.index)
submission.to_csv(data_path + 'submission/submission_16th-Dacon-Basemodel_#4-RandomForestClassifier.csv', index=True)
#LGBMClassifier
submission = pd.DataFrame(data=lgbm_pred, columns=sample_submission.columns, index=sample_submission.index)
submission.to_csv(data_path + 'submission/submission_16th-Dacon-Basemodel_#4-LGBMClassifier.csv', index=True)
#XGBClassifier
submission = pd.DataFrame(data=xgb_pred, columns=sample_submission.columns, index=sample_submission.index)
submission.to_csv(data_path + 'submission/submission_16th-Dacon-Basemodel_#4-XGBClassifier.csv', index=True)
#CatBoostClassifier
submission = pd.DataFrame(data=cat_pred, columns=sample_submission.columns, index=sample_submission.index)
submission.to_csv(data_path + 'submission/submission_16th-Dacon-Basemodel_#4-CatBoostClassifier.csv', index=True)
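The four submission blocks differ only in the prediction array and file name; the same step as one loop, shown purely as a tidier equivalent:

# Writes the same four files as the cells above.
predictions = {
    'RandomForestClassifier': rf_pred,
    'LGBMClassifier': lgbm_pred,
    'XGBClassifier': xgb_pred,
    'CatBoostClassifier': cat_pred,
}
for name, pred in predictions.items():
    out = pd.DataFrame(data=pred, columns=sample_submission.columns,
                       index=sample_submission.index)
    out.to_csv(data_path + 'submission/submission_16th-Dacon-Basemodel_#4-%s.csv' % name,
               index=True)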
3 comments
  • 회기역싸무라이 2020.02.14 21:53

    Thanks for sharing.

  • 정언용 2020.02.15 13:24

    Thanks for sharing.

  • 세오자카종신 2020.02.16 00:23

    (This comment has been deleted.)
