Phone Churn Classification AI Hackathon

I'm a beginner, three months into AI. I gave this a rough first try, but my score isn't improving. Any advice would be appreciated ㅜ

2023.03.14 09:23 2,058 Views

Feature engineering is really hard.

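# Totals across the day (주간), evening (저녁), and night (밤) periods
train_x['총통화시간'] = train_x['주간통화시간'] + train_x['저녁통화시간'] + train_x['밤통화시간']  # total call time
train_x['총통화횟수'] = train_x['주간통화횟수'] + train_x['저녁통화횟수'] + train_x['밤통화횟수']  # total call count
train_x['총통화요금'] = train_x['주간통화요금'] + train_x['저녁통화요금'] + train_x['밤통화요금']  # total call charge

# Per-call averages (these become inf or NaN on rows where 총통화횟수 is 0)
train_x['평균통화시간'] = train_x['총통화시간'] / train_x['총통화횟수']  # average time per call
train_x['평균통화요금'] = train_x['총통화요금'] / train_x['총통화횟수']  # average charge per call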
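
One thing to watch in the averages above is a row whose 총통화횟수 is 0, since the division then yields inf or NaN. The following is only a sketch: it assumes the column names from the post, uses a hypothetical helper name `add_call_features`, and runs on made-up toy rows. Wrapping the steps in a function also makes it easy to apply the same features to both the train and test sets.

```python
import numpy as np
import pandas as pd

PERIODS = ["주간", "저녁", "밤"]  # day, evening, night


def add_call_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with total and per-call average features added."""
    out = df.copy()
    for kind in ["통화시간", "통화횟수", "통화요금"]:  # time, count, charge
        out["총" + kind] = sum(out[p + kind] for p in PERIODS)
    # Replace a zero call count with NaN so the division does not produce inf.
    calls = out["총통화횟수"].replace(0, np.nan)
    # Filling with 0.0 is one possible choice for "no calls"; it is an assumption.
    out["평균통화시간"] = (out["총통화시간"] / calls).fillna(0.0)
    out["평균통화요금"] = (out["총통화요금"] / calls).fillna(0.0)
    return out


# Toy rows (made-up values, not competition data); the second row has no calls.
toy = pd.DataFrame({
    "주간통화시간": [100.0, 0.0], "저녁통화시간": [50.0, 0.0], "밤통화시간": [50.0, 0.0],
    "주간통화횟수": [10, 0], "저녁통화횟수": [5, 0], "밤통화횟수": [5, 0],
    "주간통화요금": [20.0, 0.0], "저녁통화요금": [10.0, 0.0], "밤통화요금": [10.0, 0.0],
})
features = add_call_features(toy)
print(features[["총통화시간", "총통화횟수", "평균통화시간", "평균통화요금"]])
```

Calling `add_call_features(train_x)` and then the same function on the test frame would keep the two feature sets consistent.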

Here is the corr (correlation matrix) result:

              가입일   음성사서함이용    주간통화시간    주간통화횟수    주간통화요금    저녁통화시간    저녁통화횟수  \
가입일      1.000000  0.069771  0.190535  0.128138  0.118500  0.223365  0.119532
음성사서함이용  0.069771  1.000000  0.045449  0.031312  0.070361  0.067955  0.039927
주간통화시간   0.190535  0.045449  1.000000  0.213815  0.336852  0.493299  0.239448
주간통화횟수   0.128138  0.031312  0.213815  1.000000  0.100227  0.265814  0.147605
주간통화요금   0.118500  0.070361  0.336852  0.100227  1.000000  0.219930  0.098014
저녁통화시간   0.223365  0.067955  0.493299  0.265814  0.219930  1.000000  0.280098
저녁통화횟수   0.119532  0.039927  0.239448  0.147605  0.098014  0.280098  1.000000
저녁통화요금   0.208486  0.065085  0.483473  0.247990  0.216840  0.806827  0.278620
밤통화시간    0.256722  0.068964  0.508642  0.289059  0.219115  0.605980  0.277744
밤통화횟수    0.111628  0.052887  0.238676  0.109052  0.126778  0.276678  0.135633
밤통화요금    0.064803  0.025955  0.177415  0.102214  0.075288  0.210535  0.094207
상담전화건수   0.073547  0.010326  0.158215  0.072544  0.102932  0.189158  0.080626
총통화시간    0.267436  0.072386  0.821856  0.306306  0.315776  0.831932  0.318683
총통화횟수    0.184679  0.063807  0.355703  0.644852  0.167047  0.422846  0.661199
총통화요금    0.174589  0.083076  0.463367  0.183239  0.886018  0.471509  0.187481
평균통화시간  -0.000449 -0.016269  0.139136 -0.428207  0.025694  0.097329 -0.426316
평균통화요금   0.004575  0.012031  0.090765 -0.374065  0.438109  0.054311 -0.376002


           저녁통화요금     밤통화시간     밤통화횟수     밤통화요금    상담전화건수     총통화시간     총통화횟수  \
가입일      0.208486  0.256722  0.111628  0.064803  0.073547  0.267436  0.184679
음성사서함이용  0.065085  0.068964  0.052887  0.025955  0.010326  0.072386  0.063807
주간통화시간   0.483473  0.508642  0.238676  0.177415  0.158215  0.821856  0.355703
주간통화횟수   0.247990  0.289059  0.109052  0.102214  0.072544  0.306306  0.644852
주간통화요금   0.216840  0.219115  0.126778  0.075288  0.102932  0.315776  0.167047
저녁통화시간   0.806827  0.605980  0.276678  0.210535  0.189158  0.831932  0.422846
저녁통화횟수   0.278620  0.277744  0.135633  0.094207  0.080626  0.318683  0.661199
저녁통화요금   1.000000  0.580330  0.272802  0.204482  0.189227  0.743614  0.410954
밤통화시간    0.580330  1.000000  0.286981  0.305315  0.191133  0.838063  0.438840
밤통화횟수    0.272802  0.286981  1.000000  0.083941  0.083491  0.320567  0.639344
밤통화요금    0.204482  0.305315  0.083941  1.000000  0.043592  0.275359  0.144106
상담전화건수   0.189227  0.191133  0.083491  0.043592  1.000000  0.215052  0.121657
총통화시간    0.743614  0.838063  0.320567  0.275359  0.215052  1.000000  0.486048
총통화횟수    0.410954  0.438840  0.639344  0.144106  0.121657  0.486048  1.000000
총통화요금    0.521415  0.437950  0.205534  0.442940  0.149811  0.551529  0.296200
평균통화시간   0.055688  0.098578 -0.410309  0.039865  0.007628  0.136011 -0.650152
평균통화요금   0.082869  0.029019 -0.352775  0.189473  0.017994  0.071698 -0.566896


            총통화요금    평균통화시간    평균통화요금
가입일      0.174589 -0.000449  0.004575
음성사서함이용  0.083076 -0.016269  0.012031
주간통화시간   0.463367  0.139136  0.090765
주간통화횟수   0.183239 -0.428207 -0.374065
주간통화요금   0.886018  0.025694  0.438109
저녁통화시간   0.471509  0.097329  0.054311
저녁통화횟수   0.187481 -0.426316 -0.376002
저녁통화요금   0.521415  0.055688  0.082869
밤통화시간    0.437950  0.098578  0.029019
밤통화횟수    0.205534 -0.410309 -0.352775
밤통화요금    0.442940  0.039865  0.189473
상담전화건수   0.149811  0.007628  0.017994
총통화시간    0.551529  0.136011  0.071698
총통화횟수    0.296200 -0.650152 -0.566896
총통화요금    1.000000  0.049198  0.435640
평균통화시간   0.049198  1.000000  0.826315
평균통화요금   0.435640  0.826315  1.000000
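
A matrix this wide is hard to scan by eye, so it can help to list only the strongly correlated pairs. The sketch below is one possible way to do that. The helper name `strong_pairs` and the 0.8 threshold are assumptions for illustration, and the small matrix is a three-column excerpt of the output above typed in by hand.

```python
import numpy as np
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """List feature pairs whose absolute correlation exceeds threshold."""
    # Keep only the upper triangle so each pair appears once
    # and the diagonal of 1.0 values is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs.abs() > threshold].sort_values(ascending=False)


# Excerpt of the matrix above (three columns only).
cols = ["저녁통화시간", "저녁통화요금", "밤통화시간"]
corr = pd.DataFrame(
    [[1.000000, 0.806827, 0.605980],
     [0.806827, 1.000000, 0.580330],
     [0.605980, 0.580330, 1.000000]],
    index=cols, columns=cols,
)
result = strong_pairs(corr)
print(result)
```

On the full data, `strong_pairs(train_x.corr())` would also flag the totals against their own components, which is expected because each total is a sum of those columns.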

joon-ai
2023.03.14 14:01

You should put in real effort rather than a rough try.

벵자민
2023.03.14 14:36

This comment has been deleted.

-_._-
2023.03.14 14:36

Point taken, I'll do better... ㅠㅠ

비공전함
2023.03.14 14:06

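Because a classification problem assigns the target to labels such as 0, 1 or 0, 1, 2, 3, looking at correlation with the target is less meaningful than you might expect. Correlations among the other variables are a different matter. So rather than relying on corr, I recommend building a model and evaluating it repeatedly with k-fold cross-validation.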
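
The k-fold suggestion could look roughly like the sketch below, which uses scikit-learn. Everything specific in it is an assumption: the data is synthetic (from `make_classification`), the random forest is just a placeholder model, and `f1_macro` stands in for whatever metric the competition actually uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption); swap in train_x and the churn label.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")

print("fold scores:", scores.round(3))
print("mean:", scores.mean().round(3), "std:", scores.std().round(3))
```

Re-running this after each feature change gives a direct read on whether the new feature helps, which a correlation table cannot show.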

suinsuin
2023.03.14 22:43

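You wrote that "because a classification problem assigns the target to labels such as 0, 1 or 0, 1, 2, 3, looking at correlation with the target is less meaningful than you might expect." Why is it not meaningful? I don't quite follow.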

비공전함
2023.03.15 08:48

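Take a regression target as an example: its values are actual quantities such as 0.8, 0.88, 0.9, and they have a natural order. Correlation with those values is meaningful because each number is the quantity itself, not a label we assigned arbitrarily. In classification, the classes are usually encoded as 0, 1, 2, 3 by our own arbitrary labeling. The correlation you compute is then with the label codes 0, 1, 2, 3 themselves, not with the classes they stand for. The correlation is sometimes significant, but in most cases it is not. In this competition, if you run corr you will see that the correlation between the labeled target and the variables is quite low. But if you drop variables from a classification problem simply because their correlation is low, you will often see performance get worse.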
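
A toy illustration of the last point, that a low correlation with the label does not make a variable useless. The data here is entirely hypothetical and unrelated to the competition. The label is a deterministic function of the feature, yet the linear correlation is essentially zero because the relationship is not monotonic.

```python
import numpy as np

# Hypothetical feature: 601 evenly spaced values from -3.00 to 3.00.
x = np.arange(-300, 301) / 100.0
# Hypothetical label: 1 when the feature is far from zero in either direction.
y = (np.abs(x) > 1.0).astype(int)

# Pearson correlation between the raw feature and the 0/1 label.
corr_raw = np.corrcoef(x, y)[0, 1]
# The same correlation after a simple transformation of the feature.
corr_abs = np.corrcoef(np.abs(x), y)[0, 1]

print(f"corr(x, y)   = {corr_raw:.3f}")
print(f"corr(|x|, y) = {corr_abs:.3f}")
```

A tree-based model can split on both sides of such a feature, so dropping it on the basis of `corr` alone would discard all of the signal in this toy case.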

suinsuin
2023.03.16 23:06

Thank you for the answer!

오븐
2023.03.14 20:23

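Raising the score depends on many factors working together beyond feature engineering: model selection, hyperparameter tuning, preprocessing, scaling, and more. I recommend keeping these in mind and trying them in combination.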
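
One way to try scaling, a model, and hyperparameters together is a scikit-learn `Pipeline` searched with `GridSearchCV`. The sketch below is only an illustration under assumptions: the data is synthetic, logistic regression is a placeholder model, the grid values are arbitrary, and `f1_macro` stands in for the competition metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption), not the competition data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)

# Scaling lives inside the pipeline, so it is refit on each training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative values
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1_macro")
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

Keeping the scaler inside the pipeline avoids leaking validation-fold statistics into training, which would otherwise make the cross-validation score look better than it is.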

Mark2do
2023.03.23 14:32

This comment has been deleted.