🍅 Let's Solve It Fresh, DKT - Fresh Tomato 🍅

bcaitech1/p4-dkt-freshtomato

⭕❌ Deep Knowledge Tracing


Task Description

Task

Given data containing the list of problems each student has solved and whether each answer was correct, we trace the student's knowledge state and predict whether the student will answer a particular problem correctly in the future. These predictions make it possible to offer each student personalized education.

Metric

AUROC, Accuracy
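
For reference, the two metrics can be computed from 0/1 correctness labels and predicted probabilities as in the minimal sketch below. It is not taken from this repository: the function names, the 0.5 threshold, and the toy data are assumptions made for illustration, and the pairwise AUROC is written for clarity rather than speed.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions matching the 0/1 labels after thresholding.

    The 0.5 threshold is an assumption for this sketch.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, probs):
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = answered correctly, 0 = answered incorrectly.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2]
acc = accuracy(y_true, y_prob)
auc = auroc(y_true, y_prob)
```

Accuracy depends on the chosen threshold, while AUROC depends only on how the predictions are ranked.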


Pipeline

freshtomato_pipeline

Command Line Interface

1️⃣ Train/Valid Ratio 9:1 (random split)

Train Phase

>>> cd code
>>> python train.py --wandb_project_name [PROJECT_NAME] --wandb_run_name [RUN_NAME] --model [MODEL]

Inference Phase

>>> cd code
>>> python inference.py --wandb_run_name [RUN_NAME] --model [MODEL]
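
A 9:1 random split can be sketched as follows. This is an illustration under stated assumptions, not the code in train.py: the unit being split (hypothetical user IDs here), the seed, and the helper name `random_split` are all invented for the example.

```python
import random


def random_split(items, valid_ratio=0.1, seed=42):
    """Shuffle a copy of `items` and return (train, valid) at the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_valid = max(1, round(len(shuffled) * valid_ratio))
    return shuffled[n_valid:], shuffled[:n_valid]


user_ids = list(range(100))  # hypothetical user IDs
train_users, valid_users = random_split(user_ids)
```

Splitting by user rather than by individual interaction keeps each student's history on one side of the split, which is a common choice for sequence data.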

2️⃣ K-fold

Train Phase

>>> cd code
>>> python train_kfold.py --wandb_project_name [PROJECT_NAME] --wandb_run_name [RUN_NAME] --model [MODEL] --kfold 10

Inference Phase

>>> cd code
>>> python inference_kfold.py --wandb_run_name [RUN_NAME] --model [MODEL] --kfold 10
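
The idea behind `--kfold 10` can be sketched as below: the data is shuffled once and cut into k folds, and each fold serves as the validation set exactly once. The helper `kfold_indices`, the seed, and the sample count are assumptions for illustration and are not taken from train_kfold.py.

```python
import random


def kfold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, valid_idx) pairs; every sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives fold sizes that differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, valid_idx


splits = list(kfold_indices(n_samples=25, k=10))
```

One model is trained per fold. A common way to use the k models at inference time is to average their predicted probabilities, although this section does not state how inference_kfold.py combines them.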

3️⃣ Stratified K-fold

Train Phase

>>> cd code
>>> python train_stfkfold.py --wandb_project_name [PROJECT_NAME] --wandb_run_name [RUN_NAME] --model [MODEL] --kfold 10

Inference Phase

>>> cd code
>>> python inference_kfold.py --wandb_run_name [RUN_NAME] --model [MODEL] --kfold 10
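
Stratified K-fold differs from plain K-fold in that each fold keeps roughly the same class proportions as the full dataset. The sketch below shows one way to do this by dealing each stratum's members round-robin across the folds. Which variable train_stfkfold.py stratifies on is not stated in this README, so the 0/1 labels here are a hypothetical stand-in, as are the helper name and seed.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(strata, k=10, seed=42):
    """Yield (train_idx, valid_idx) with each stratum spread evenly over k folds."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, s in enumerate(strata):
        by_stratum[s].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for s in sorted(by_stratum):
        members = by_stratum[s]
        rng.shuffle(members)
        # Deal members round-robin, continuing where the previous stratum stopped
        # so that leftovers do not always land in the first folds.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)

    for i in range(k):
        valid_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, valid_idx


# Hypothetical stratification labels: 70 ones and 30 zeros.
labels = [1] * 70 + [0] * 30
splits = list(stratified_kfold_indices(labels, k=10))
```

With these labels every validation fold holds 7 ones and 3 zeros, matching the 70:30 ratio of the full set.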

Implemented models

  • LSTM (lstm)
  • LSTM + Attention (lstmattn)
  • Bert (bert)
  • GRUATTN (gruattn)
  • ATTNGRU (attngru)
  • Saint (saint)
  • Saint_custom (saintcustom)
  • LastQuery (lastquery)
  • BaseCNN (cnn)
  • DeepCNN (deepcnn)

Directory structure

├── README.md                 - README file
│
├── requirements.md           - Required libraries
│
├── dkt/                      - DL team utility modules
│   ├── criterion.py
│   ├── custom_model.py
│   ├── dataloader.py
│   ├── feature.py
│   ├── model.py
│   ├── optimizer.py
│   ├── scheduler.py
│   ├── trainer.py
│   └── utils.py
│
├── code/                     - DL team code folder
│   ├── args.py
│   ├── inference.py
│   ├── inference_kfold.py
│   ├── train.py
│   ├── train_kfold.py
│   └── train_stfkfold.py
│
├── notebook_pycaret          - ML team code folder
│   ├── Add_Feature_with_Groupby.ipynb
│   ├── get_logs연습해보기.ipynb
│   ├── kaggle_riid_전처리.ipynb
│   ├── LGBM_Valid원하는대로구축성공.ipynb
│   ├── Optuna_LightGBM_문제시간간격후처리X.ipynb
│   ├── output파일_best랑비교해보기.ipynb
│   └── PermutationImportance.ipynb
│
└── notebooks
    ├── baseline.ipynb
    ├── EDA_Minyong.ipynb
    ├── EDA-arabae.ipynb
    ├── hard_and_soft_ensemble.ipynb
    ├── output_confidence.ipynb
    └── Riiid_pretrain.ipynb

🍅 Members

강민용 T1001 [Github] [Blog]

  • EDA
  • GRU + Attention & SAINT Modeling
  • User Data Augmentation, Pseudo Labeling
  • Deep Learning code improvements
  • Summarized and shared the DKT, DKT+, and DKVMN papers

김진현 T1248 [Github] [Blog]

  • EDA
  • Saint, Saint+ Modeling
  • Feature Searching (per-item difficulty / KnowledgeTag)
  • Ensemble (Hard + Soft voting)

문재훈 T1058 [Github] [Blog]

  • ML (with Customized Optuna & Pycaret)
  • Validation strategy (HoldOut Set, Customized CV)
  • Efficient Feature Engineering (with Pandas method)
  • Feature Selection (with Permutation Importance)
  • Riiid Dataset Processing for Pre-training

배아라 T1084 [Github] [Blog]

  • LSTM, LSTM+Attention, BERT, CNN, Last Query, SAINT λ“± λ‹€μ–‘ν•œ λͺ¨λΈ κ΅¬ν˜„ 및 μ‹€ν—˜
  • User별 Feature Engineering
  • Deep Learning Code κ°œμ„ 
  • Riiid 데이터λ₯Ό ν™œμš©ν•œ pre-train μ‹œλ„
  • Ensemble (soft-voting, weighted soft-voting)

이정현 T1160 [Github] [Blog]

  • EDA
  • Experiments with RNN-family models (LSTM, LSTM+Attention)
  • Summarized the DKT and DKVMN papers
  • Riiid Competition Data Analysis for Transfer Learning
  • Meeting notes

최유라 T1212 [Github] [Blog]

  • EDA - ν•™μŠ΅ 데이터, ν…ŒμŠ€νŠΈ 데이터 뢄포 νŒŒμ•…
  • Feature engineering - 풀이 μ‹œκ°„, 정확도 평균 feature μΆ”κ°€
  • ML λͺ¨λΈ ν•™μŠ΅ - LightGBM, XGBoost, CatBoost
  • Validation set μ°ΎκΈ°
