Credit Card Customer Segmentation AI Competition

Algorithm | Monthly Dacon | Structured | Finance | Classification | F1 Score

  • Prize: DASCHOOL Pro Subscription
  • 2025.03.10 ~ 2025.04.30 09:59
  • 1,472 Users Completed

 

[Private 5th] XGB + Feature Engineering (laddering technique)

2025.05.01 23:15 · 350 Views

2025.05.02: Added the code that was needed (a100, data merge).
The optimal values mostly seem to come out around trials 14 and 66.
Tabular data is always XGB ~
Thank you.

| Step | Description | Details | Key Libraries |
|---|---|---|---|
| ✅ Step 1 | Setup and data loading | Set paths, load the Parquet data | `pandas`, `os` |
| ✅ Step 2 | Basic preprocessing | Missing-value imputation, outlier treatment (Winsorization), label encoding | `SimpleImputer`, `winsorize`, `LabelEncoder` |
| ✅ Step 3 | Define feature-engineering functions | Log transforms, aggregate features, spending-pattern/risk indicators, etc. | `numpy`, `KMeans`, `QuantileTransformer` |
| ✅ Step 4 | Run feature engineering | Apply the functions to train/test | - |
| ✅ Step 5 | Feature selection | Apply LightGBM-based SelectFromModel | `lightgbm`, `SelectFromModel` |
| ✅ Step 6 | Define the Optuna objective | Subsampling, KFold, SMOTE, target encoding, F1 evaluation | `Optuna`, `SMOTE`, `TargetEncoder` |
| ✅ Step 7 | Run Optuna tuning | Load or create the Optuna DB, extract the best parameters | `Optuna`, `sqlite` |
| ✅ Step 8 | Train the final model | KFold training on the full data + prediction + confusion-matrix visualization | `xgboost`, `matplotlib`, `seaborn` |
| ✅ Step 9 | Create the final submission file | Save mode-based (majority-vote) predictions for the test data | `pandas` |
| ✅ Step 10 | Save models and objects | Save the models, encoders, imputer, and feature list | `joblib` |
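
To make the table concrete, the sketches below show one plausible shape for each step; they are reconstructions from the table only, so every file path, column name, and hyperparameter is an assumption rather than the author's actual code. Step 1, paths and Parquet loading (if the data ships as several Parquet files, they would need to be merged after loading):

```python
import os
import pandas as pd

# Placeholder path; point this at the competition data directory.
DATA_DIR = "./data"

# Load the train/test Parquet files (file names are assumptions).
train = pd.read_parquet(os.path.join(DATA_DIR, "train.parquet"))
test = pd.read_parquet(os.path.join(DATA_DIR, "test.parquet"))
print(train.shape, test.shape)
```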
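
A sketch of Step 2's basic preprocessing with the libraries named in the table. The target column name (`Segment`) and the train-only winsorization are assumptions:

```python
import pandas as pd
from scipy.stats.mstats import winsorize
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

def basic_preprocess(train: pd.DataFrame, test: pd.DataFrame, target: str = "Segment"):
    num_cols = train.select_dtypes(include="number").columns.drop(target, errors="ignore")
    cat_cols = train.select_dtypes(exclude="number").columns.drop(target, errors="ignore")

    # Impute missing numeric values with medians learned on train only.
    imputer = SimpleImputer(strategy="median")
    train[num_cols] = imputer.fit_transform(train[num_cols])
    test[num_cols] = imputer.transform(test[num_cols])

    # Clip the top/bottom 1% of each numeric train column to tame outliers.
    for col in num_cols:
        train[col] = winsorize(train[col].to_numpy(), limits=(0.01, 0.01))

    # Label-encode categoricals; unseen test categories map to -1.
    encoders = {}
    for col in cat_cols:
        le = LabelEncoder()
        train[col] = le.fit_transform(train[col].astype(str))
        mapping = {c: i for i, c in enumerate(le.classes_)}
        test[col] = test[col].astype(str).map(mapping).fillna(-1).astype(int)
        encoders[col] = le

    return train, test, imputer, encoders

train, test, imputer, encoders = basic_preprocess(train, test)
```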
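
Steps 3–4 mention log transforms, aggregate/spending-pattern features, `KMeans`, and a `QuantileTransformer`. Below is one way that could look; the monetary column names (`credit_limit`, `total_spend`), the cluster count, and the specific ratios are invented for illustration, and the helpers are fitted on train only and reused on test:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import QuantileTransformer

AMOUNT_COLS = ["credit_limit", "total_spend"]   # hypothetical monetary columns

def engineer_features(df, kmeans=None, qt=None):
    """Add log, ratio, cluster, and quantile features; fit helpers on train only."""
    df = df.copy()

    # Log-transform heavily skewed monetary columns.
    for col in AMOUNT_COLS:
        df[f"log_{col}"] = np.log1p(df[col].clip(lower=0))

    # Simple spending-pattern / risk-style indicators.
    df["spend_to_limit"] = df["total_spend"] / (df["credit_limit"] + 1e-6)
    df["spend_rank"] = df["total_spend"].rank(pct=True)

    # Unsupervised cluster label as an aggregate segment-style feature.
    log_cols = [f"log_{c}" for c in AMOUNT_COLS]
    if kmeans is None:
        kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(df[log_cols])
    df["kmeans_cluster"] = kmeans.predict(df[log_cols])

    # Map a skewed feature onto a Gaussian-like scale.
    if qt is None:
        qt = QuantileTransformer(output_distribution="normal", random_state=42).fit(df[["total_spend"]])
    df["total_spend_qt"] = qt.transform(df[["total_spend"]])

    return df, kmeans, qt

# Step 4: apply to train first (fitting the helpers), then reuse them on test.
train_fe, kmeans, qt = engineer_features(train)
test_fe, _, _ = engineer_features(test, kmeans=kmeans, qt=qt)
```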
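
Step 5's LightGBM-based `SelectFromModel` could look like the following; the target column name and the median-importance threshold are assumptions:

```python
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import LabelEncoder

TARGET = "Segment"                                   # hypothetical target column name

# Integer-encode the target and split features/labels.
target_le = LabelEncoder()
y_train = pd.Series(target_le.fit_transform(train_fe[TARGET]))
X_train = train_fe.drop(columns=[TARGET])
X_test = test_fe[X_train.columns]

# Keep features whose LightGBM importance is above the median importance.
selector = SelectFromModel(LGBMClassifier(n_estimators=300, random_state=42), threshold="median")
selector.fit(X_train, y_train)

selected_cols = X_train.columns[selector.get_support()]
X_train_sel, X_test_sel = X_train[selected_cols], X_test[selected_cols]
print(f"Kept {len(selected_cols)} of {X_train.shape[1]} features")
```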
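
A sketch of the Step 6 objective: stratified K-fold with target encoding and SMOTE applied inside each fold, scored by mean macro F1. The hyperparameter ranges are my own (the table's "subsampling" appears here as XGBoost's `subsample`), sklearn >= 1.4 is assumed for multiclass `TargetEncoder` support, and `te_cols` is a convenience built from the Step 2 sketch:

```python
import numpy as np
import optuna
import xgboost as xgb
from imblearn.over_sampling import SMOTE
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import TargetEncoder

# Label-encoded categoricals (from the Step 2 sketch) that survived selection.
te_cols = [c for c in encoders if c in X_train_sel.columns]

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 300, 1500),
        "max_depth": trial.suggest_int("max_depth", 4, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 1.0),
    }
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for tr_idx, va_idx in skf.split(X_train_sel, y_train):
        X_tr, X_va = X_train_sel.iloc[tr_idx], X_train_sel.iloc[va_idx]
        y_tr, y_va = y_train.iloc[tr_idx], y_train.iloc[va_idx]

        # Target-encode the categorical columns using only the training fold.
        if te_cols:
            te = TargetEncoder(random_state=42)
            tr_cat = te.fit_transform(X_tr[te_cols], y_tr)
            va_cat = te.transform(X_va[te_cols])
            X_tr = np.hstack([X_tr.drop(columns=te_cols).to_numpy(), tr_cat])
            X_va = np.hstack([X_va.drop(columns=te_cols).to_numpy(), va_cat])

        # Oversample minority segments on the training fold only.
        X_tr, y_tr = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

        model = xgb.XGBClassifier(**params, tree_method="hist", random_state=42)
        model.fit(X_tr, y_tr)
        scores.append(f1_score(y_va, model.predict(X_va), average="macro"))
    return float(np.mean(scores))
```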
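
Step 7 persists the study in SQLite so it can be created once and reloaded later; the study name, DB path, and trial budget below are placeholders. The note above about good values appearing around trials 14 and 66 suggests the search tends to converge fairly early within such a budget.

```python
import optuna

study = optuna.create_study(
    study_name="xgb_segmentation",          # placeholder study name
    storage="sqlite:///optuna_xgb.db",      # created on first run, reloaded afterwards
    direction="maximize",                   # maximize mean macro F1
    load_if_exists=True,
)
study.optimize(objective, n_trials=100)

best_params = study.best_params
print("Best macro F1:", study.best_value)
print("Best params:", best_params)
```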
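
Steps 8–9 combined: K-fold training of the final XGBoost models on the full data, an out-of-fold confusion matrix, and a submission built from the per-fold majority vote (mode). The submission file and column names are guesses, and for brevity this sketch skips the fold-wise target encoding/SMOTE used during tuning:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from scipy.stats import mode
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
oof_pred = np.zeros(len(X_train_sel), dtype=int)
test_preds, models = [], []

for tr_idx, va_idx in skf.split(X_train_sel, y_train):
    model = xgb.XGBClassifier(**best_params, tree_method="hist", random_state=42)
    model.fit(X_train_sel.iloc[tr_idx], y_train.iloc[tr_idx])
    oof_pred[va_idx] = model.predict(X_train_sel.iloc[va_idx])
    test_preds.append(model.predict(X_test_sel))
    models.append(model)

# Out-of-fold confusion matrix as a sanity check on per-segment errors.
cm = confusion_matrix(y_train, oof_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted"); plt.ylabel("Actual"); plt.show()

# Step 9: majority vote (mode) across the fold models for each test row.
final_pred = mode(np.vstack(test_preds), axis=0).mode.ravel()
submission = pd.read_csv("sample_submission.csv")        # placeholder file name
submission["Segment"] = target_le.inverse_transform(final_pred)
submission.to_csv("submission.csv", index=False)
```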
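
Finally, Step 10: persisting the fitted objects with joblib so inference can be rerun without retraining. The object names match the placeholders used in the sketches above, not the author's actual variables.

```python
import joblib

joblib.dump(models, "xgb_fold_models.pkl")                  # fold models from Step 8
joblib.dump(encoders, "label_encoders.pkl")                 # categorical encoders from Step 2
joblib.dump(target_le, "target_label_encoder.pkl")          # target encoder for inverse_transform
joblib.dump(imputer, "imputer.pkl")                         # fitted SimpleImputer
joblib.dump(list(selected_cols), "selected_features.pkl")   # feature list from Step 5
```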
