Credit Card Customer Segmentation AI Competition

Algorithm | Monthly Dacon | Structured | Finance | Classification | F1 Score

  • Prize : DASCHOOL Pro Subscription
  • 2025.03.10 ~ 2025.04.30 09:59
  • 1,600 Users Completed

 

[Private 7th] XGBoost_single_model

2025.05.08 10:05 · 809 Views

First of all, I apologize for submitting late; I failed to check the schedule in time.

1. Data Load and Merge
  1.1 Data type optimization
    Before merging, an optimize_types() function downcasts int64 → int32, float64 → float32, and object → category, sharply reducing the memory footprint of each DataFrame
  1.2 Sequential merging with intermediate checkpoints
    Instead of merging all the data at once, the target DataFrames are merged sequentially, one at a time
    After each merge completes, the result is written to a parquet file and read back, which prevents memory leaks and cumulative memory growth
  1.3 Using gc
    Objects used in a merge are removed with del, and gc.collect() is called to manually reclaim unneeded memory and maximize memory efficiency (a sketch of the whole flow follows below)
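
Below is a minimal sketch of the load-and-merge flow described in section 1. The post only names optimize_types(); the file list, the join key "ID", and the checkpoint paths are placeholders, since the actual schema is not shown.

```python
import gc
import pandas as pd

def optimize_types(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns and convert object columns to category to reduce memory."""
    for col in df.columns:
        dtype = str(df[col].dtype)
        if dtype == "int64":            # assumes values fit into int32
            df[col] = df[col].astype("int32")
        elif dtype == "float64":
            df[col] = df[col].astype("float32")
        elif dtype == "object":
            df[col] = df[col].astype("category")
    return df

# Placeholder table list and join key; the real tables/keys are not given in the post.
base = optimize_types(pd.read_parquet("base.parquet"))
extra_tables = ["table_a.parquet", "table_b.parquet", "table_c.parquet"]

for i, path in enumerate(extra_tables):
    right = optimize_types(pd.read_parquet(path))
    base = base.merge(right, on="ID", how="left")   # merge one table at a time

    # Checkpoint: write the merged result to parquet, drop in-memory objects,
    # force garbage collection, then read the checkpoint back.
    base.to_parquet(f"merged_{i}.parquet")
    del right, base
    gc.collect()
    base = pd.read_parquet(f"merged_{i}.parquet")
```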

2. Data preprocessing and feature engineering

3. Data encoding

4. Class weights computed as values inversely proportional to each class's share of samples (a sketch follows below)
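
The exact weighting formula is not given in the post; the sketch below assumes the common "balanced" scheme n_samples / (n_classes × class_count), and the target column name "Segment" is a placeholder.

```python
import pandas as pd

def compute_class_weights(y: pd.Series) -> dict:
    """Weight each class inversely to its frequency (balanced weighting)."""
    counts = y.value_counts()
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Per-row weights usable as XGBoost's sample_weight argument;
# "Segment" is a hypothetical target column name.
# class_weights = compute_class_weights(train["Segment"])
# sample_weight = train["Segment"].map(class_weights)
```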

5. Feature selection
  5.1 Applying the computed class weights, draw a 100,000-row sample and derive feature importances with an XGBoost model
  5.2 Keep only the key variables that account for 90% of the total feature importance
  5.3 This lowers model complexity while improving generalization (see the sketch after this list)
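
A sketch of the importance-based selection in step 5, assuming the xgboost scikit-learn API. The 100,000-row sample, the class-weight-derived sample weights, and the 90% cutoff come from the post; the estimator settings and variable names are assumptions, and X is assumed to be fully numeric after the encoding step.

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

def select_features_by_importance(X: pd.DataFrame, y: pd.Series, sample_weight: pd.Series,
                                  threshold: float = 0.90, n_sample: int = 100_000,
                                  seed: int = 42) -> list:
    """Fit XGBoost on a subsample and keep the features covering `threshold` of total importance."""
    idx = X.sample(n=min(n_sample, len(X)), random_state=seed).index   # 100k-row subsample
    model = XGBClassifier(n_estimators=200, tree_method="hist", random_state=seed)
    model.fit(X.loc[idx], y.loc[idx], sample_weight=sample_weight.loc[idx])  # y: label-encoded target

    # Sort importances in descending order and keep features up to the 90% cumulative mark.
    importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
    cumulative = importances.cumsum() / importances.sum()
    n_keep = int(np.searchsorted(cumulative.values, threshold) + 1)
    return importances.index[:n_keep].tolist()
```

In this scheme the unselected features would be dropped before the final training run (section 6), which matches the complexity reduction described in 5.3.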

6. Model training

7. Submission
