
Final dh #5

Open: wants to merge 65 commits into base: master

Changes from all commits (65 commits):
f610ed7  Create README.md (sangki930, May 25, 2021)
97f4701  add feature (tofulim, May 27, 2021)
d2521e1  models commit (sangki930, May 28, 2021)
b0e15a1  model recommit (sangki930, May 28, 2021)
4913de4  [dh] new feature commit (tofulim, May 28, 2021)
1357646  [dh] k-fold commit (tofulim, Jun 2, 2021)
4ab9a96  [dh] make time2global feature (tofulim, Jun 3, 2021)
312c880  commit test (sangki930, Jun 3, 2021)
1160816  [sangki930] branch commit (sangki930, Jun 3, 2021)
af9892a  [dh] commit to merge (tofulim, Jun 4, 2021)
e527d8c  Update README.md (sangki930, Jun 4, 2021)
23bf831  Merge pull request #2 from bcaitech1/new_branch_01 (sangki930, Jun 4, 2021)
5a4307a  pplz (tofulim, Jun 4, 2021)
4831d39  [dh] test (tofulim, Jun 4, 2021)
0d933bd  [dh] oh yeah (tofulim, Jun 4, 2021)
7a99180  add master (PrimeOfMine, Jun 4, 2021)
0f34d9b  dh explane (tofulim, Jun 6, 2021)
b110420  Merge branch 'comb_main' into sangki930 (tofulim, Jun 6, 2021)
cf7a7e5  Merge pull request #3 from bcaitech1/sangki930 (tofulim, Jun 6, 2021)
85449dd  test_sangki (sangki930, Jun 6, 2021)
8cbe13d  sangki commit (sangki930, Jun 6, 2021)
401ef43  cm (tofulim, Jun 6, 2021)
c55eb22  Merge branch 'comb_main' of https://github.com/bcaitech1/p4-dkt-olleh… (tofulim, Jun 6, 2021)
4f73e48  [dh] cont fix.. (tofulim, Jun 7, 2021)
c813636  cm (tofulim, Jun 7, 2021)
02ab294  [dh] continuous fix (tofulim, Jun 7, 2021)
eff3dea  cm (tofulim, Jun 7, 2021)
088158b  [dh] submit fix, lstmattn fix (tofulim, Jun 7, 2021)
db91d72  [dh] change setting and split model.py to each architecture (tofulim, Jun 9, 2021)
03b01a9  [dh] change setting (tofulim, Jun 10, 2021)
b091795  [dh] make new branch to merge feat (tofulim, Jun 10, 2021)
324463c  cm (tofulim, Jun 10, 2021)
975b8f6  [dh] add presicion,recall,f1 metric (tofulim, Jun 11, 2021)
e68165a  [dh] cont/cate mid check (tofulim, Jun 11, 2021)
0f8e947  [dh] mid check (tofulim, Jun 11, 2021)
19428b2  [dh] push for compare (tofulim, Jun 12, 2021)
40fb49c  [dh] apply on model (tofulim, Jun 12, 2021)
047c851  fixed untracked files (tofulim, Jun 13, 2021)
aaab502  [dh] model final fix (tofulim, Jun 13, 2021)
105a1d3  [dh] final model fix (tofulim, Jun 13, 2021)
edee0f0  [dh] lgbm change (tofulim, Jun 14, 2021)
c8a7246  [dh] lgbm change (tofulim, Jun 14, 2021)
f3d3de8  [dh] cm (tofulim, Jun 14, 2021)
e701e79  [dh] cm (tofulim, Jun 14, 2021)
9a1fe4b  edit for k-fold (PrimeOfMine, Jun 14, 2021)
731b63e  add comments (PrimeOfMine, Jun 14, 2021)
8c0194c  debugging (PrimeOfMine, Jun 14, 2021)
c5f44df  [dh] fix & pull (tofulim, Jun 14, 2021)
0be294a  Merge branch 'final_dh' of https://github.com/bcaitech1/p4-dkt-ollehd… (tofulim, Jun 14, 2021)
f610a72  [dh] use test file (tofulim, Jun 15, 2021)
8f39e38  [dh] final push (tofulim, Jun 15, 2021)
0488c36  [dh] push (tofulim, Jun 15, 2021)
d6c3d6e  [dh] ffffinal commit (tofulim, Jun 15, 2021)
552e5ce  Update README.md (tofulim, Jun 20, 2021)
5eccb79  Update README.md (tofulim, Jun 20, 2021)
cba42b1  Update README.md (tofulim, Jul 20, 2021)
12a85d2  Update README.md (tofulim, Jul 20, 2021)
0a26104  Update README.md (tofulim, Jul 24, 2021)
0b1e351  Update README.md (tofulim, Jul 24, 2021)
6bd2e44  Update README.md (tofulim, Jul 24, 2021)
1512d6d  Update README.md (tofulim, Jul 25, 2021)
c98b8ae  Update README.md (tofulim, Jul 25, 2021)
ca4a390  Create README.md (tofulim, Jul 25, 2021)
e9c5699  Update README.md (tofulim, Jul 26, 2021)
7b94b5f  Update README.md (tofulim, Jul 27, 2021)
8 changes: 6 additions & 2 deletions .gitignore
@@ -2,11 +2,15 @@
models
output
wandb
src
output

#files
.args.py
# add when you clone this file
conf.yml
# conf.yml
# README.md
.gitignore
# .gitignore
NanumGothic.ttf
#etc
/.ipynb_checkpoints
30 changes: 23 additions & 7 deletions README.md
@@ -1,16 +1,32 @@
# pstage_04_dkt
# pstage_04_dkt (Deep Knowledge Tracing)
- Period: 2021.05.24 ~ 2021.06.15
- Task: track each student's knowledge state and predict whether the last problem in the problem list is answered correctly (AUC: 0.8362, final rank 7th of 15 teams)
![task img](https://user-images.githubusercontent.com/52443401/126865028-66d9f100-e1c3-4633-8790-86c1f7d84f47.JPG)
- Approach summary: observing that a student's performance differs by subject and by test sheet, we split each student into several virtual students when extracting statistics, and focused on short-term behavior
- Models used: LGBM, LSTM, LSTM with attention, Bert, Saint, LastQuery
### Important Techniques
- k-fold (with user split)
- More convenient experimentation through a config.yml file
- Modified the NN models so categorical/continuous features can be plugged in freely
- Created solve_time-related features
- user_month_split

### Important Feature
- user's last order time
![feature importance](https://user-images.githubusercontent.com/52443401/126864608-e6af562b-e2b0-4ad7-9c2f-7a86bbac5b98.png)


## How to run via the config file
### 1. config setting
Choose the model and hyperparameters
### 1. config.yml setting
Choose the model, hyperparameters, and other technique options

### 2. $ python3 train / inference .py
Same as before

### 3. $ python3 whole-in-one.py
Runs training and inference in one step (a sketch follows this diff)
Note that lgbm needs no separate inference; everything is handled in the train step
On execution, the hyperparameters and features used for training are saved in the folder as JSON

### 4. $ python3 submit.py
Given a key and a file path, submits directly from the server without downloading (see the second sketch after this diff)

## Combining lgbm
### 1. __feature_engineering
Given a key and a file path, submits the submission csv directly from the server without needing to download it
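The whole-in-one flow above can be pictured with a short sketch. This is not the repository's actual script; chaining the two existing entry scripts with subprocess and the `used_config.json` file name are assumptions for illustration:

```python
import json
import subprocess

import yaml

# Minimal sketch of a whole-in-one.py-style flow (assumed, not the PR's code):
# run training, then inference, then record the run's settings as JSON.
with open("conf.yaml") as f:
    conf = yaml.safe_load(f)

subprocess.run(["python3", "train.py"], check=True)

# lgbm predicts inside the train step, so the separate inference run is skipped
if conf.get("model") != "lgbm":
    subprocess.run(["python3", "inference.py"], check=True)

# persist the hyperparameters/features used for this run, for reproducibility
with open("used_config.json", "w") as f:
    json.dump(conf, f, ensure_ascii=False, indent=2)
```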
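Likewise, submit.py's behavior can be sketched as a small HTTP upload. The endpoint URL, header, and form field below are placeholders; the real competition API is not shown in this PR:

```python
import requests

# Hypothetical sketch of a submit.py-style helper: upload the submission csv
# straight from the server. URL and field names are placeholders.
def submit(key: str, file_path: str) -> None:
    with open(file_path, "rb") as f:
        resp = requests.post(
            "https://example.com/api/submissions",       # placeholder endpoint
            headers={"Authorization": f"Bearer {key}"},  # key passed by the user
            files={"file": f},
        )
    resp.raise_for_status()
    print("submitted:", resp.status_code)

if __name__ == "__main__":
    submit("YOUR_KEY", "output/output.csv")  # example values, not real ones
```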
22 changes: 15 additions & 7 deletions args.py
@@ -2,27 +2,34 @@
import argparse



def parse_args(mode='train'):
    parser = argparse.ArgumentParser()


    parser.add_argument('--task_name', default='lstm_attn', type=str, help='task_name')
    parser.add_argument('--seed', default=42, type=int, help='seed')

    parser.add_argument('--device', default='gpu', type=str, help='cpu or gpu')
    parser.add_argument('--device', default='cpu', type=str, help='cpu or gpu')


    parser.add_argument('--data_dir', default='/opt/ml/input/data/train_dataset', type=str, help='data directory')
    parser.add_argument('--asset_dir', default='asset/', type=str, help='data directory')

    parser.add_argument('--infer', default=False, help='inference or not')
    parser.add_argument('--Tfixup', default=True, help='T-Fixup or not')

    parser.add_argument('--file_name', default='train_data.csv', type=str, help='train file name')
    # parser.add_argument('--file_name', default='test_jongho.csv', type=str, help='train file name')


    parser.add_argument('--model_dir', default='models/', type=str, help='model directory')
    parser.add_argument('--model_name', default='model.pt', type=str, help='model file name')

    parser.add_argument('--output_dir', default='output/', type=str, help='output directory')
    parser.add_argument('--test_file_name', default='test_data.csv', type=str, help='test file name')
    # parser.add_argument('--test_file_name', default='test2.csv', type=str, help='test file name')

    parser.add_argument('--max_seq_len', default=20, type=int, help='max sequence length')
    parser.add_argument('--max_seq_len', default=24, type=int, help='max sequence length')

    parser.add_argument('--num_workers', default=1, type=int, help='number of workers')

    # model
@@ -42,9 +49,10 @@ def parse_args(mode='train'):
    parser.add_argument('--log_steps', default=50, type=int, help='print log per n steps')


    ### important ###
    parser.add_argument('--model', default='lstmattn', type=str, help='model type')
    parser.add_argument('--optimizer', default='adam', type=str, help='optimizer type')

    parser.add_argument('--model', default='lstm', type=str, help='model type')
    parser.add_argument('--optimizer', default='adamP', type=str, help='optimizer type')

    parser.add_argument('--scheduler', default='plateau', type=str, help='scheduler type')

    args = parser.parse_args()
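Since the PR moves experiment control into conf.yaml, the argparse defaults above are presumably overridden by the YAML values at startup. A minimal sketch of that pattern, assuming PyYAML; the override loop is illustrative, not code from this diff:

```python
import yaml

from args import parse_args

# Sketch: let conf.yaml values override the argparse defaults defined above.
args = parse_args()
with open("conf.yaml") as f:
    conf = yaml.safe_load(f)

for key, value in conf.items():
    setattr(args, key, value)  # e.g. args.model, args.max_seq_len, args.optimizer

print(args.model, args.optimizer, args.max_seq_len)
```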
90 changes: 90 additions & 0 deletions conf.yaml
@@ -0,0 +1,90 @@
model : bert # {lstm, lstmattn, bert, lgbm, lstmroberta, lastquery, saint, lstmalbertattn}

# (non-generalized): uses only the base features
#  - lstm, lstmattn, bert
# (generalized): also uses the added columns as categorical features

wandb :
  using: True
  project: DKT

  ## enter your own wandb entity (ID) here
  entity: vail131
  tags:
    - baseline

## main params
task_name: bert_seo1try_user_split
seed: 42
device: cuda

data_dir: /opt/ml/input/data/train_dataset

# For training files
file_name: train_time_finalfix.csv
test_train_file_name : test_time_finalfix2.csv

# For predicting files
test_file_name: test_time_finalfix.csv

asset_dir: asset/
model_dir: models/
output_dir: output/

max_seq_len: 128
num_workers: 1

## K-fold params
use_kfold : False            # run k-fold with n_fold folds (GroupKFold sketch after this file)
use_stratify : False
n_fold : 4
split_by_user : False        # split the k-fold dataset by user
user_split_augmentation : True
use_total_data : True
## model
hidden_dim : 512
n_layers : 2
n_heads : 2
drop_out: 0.2

#train
n_epochs: 20
batch_size: 128
lr: 0.0001
clip_grad : 10
patience : 5
log_steps : 50
split_ratio : 0.8

# important
optimizer : adamW
scheduler: plateau


# used only by lgbm (see the lgb.train sketch after this file)
lgbm:
  model_params: {
    'objective': 'binary',        # binary classification
    'boosting_type': 'gbdt',
    'metric': 'auc',              # evaluation metric
    'feature_fraction': 0.4,      # feature sampling ratio (originally 0.8)
    'bagging_fraction': 0.6,      # data sampling ratio (originally 0.8)
    'bagging_freq': 1,
    'n_estimators': 10000,        # number of trees
    'early_stopping_rounds': 100,
    'seed': 42,
    'verbose': -1,
    'n_jobs': -1,
  }

verbose_eval : 100            # originally 100
num_boost_round : 500
early_stopping_rounds : 100


## args for LGBM feature engineering
make_sharing_feature : True   # extract statistics features from train + test (except the last rows)
use_test_data : True          # use test_data for training

# use the distance feature
use_distance : False
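For the use_kfold / split_by_user options above, a user-level split keeps all of one user's interactions inside a single fold. A minimal sketch of that idea with scikit-learn's GroupKFold; the CSV path and userID column come from the config and the competition data, but this loop is not the repository's code:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Sketch of split_by_user-style k-fold: grouping rows by userID ensures one
# user's interactions never straddle the train/valid boundary.
df = pd.read_csv("/opt/ml/input/data/train_dataset/train_data.csv")

gkf = GroupKFold(n_splits=4)  # n_fold from conf.yaml
for fold, (tr_idx, va_idx) in enumerate(gkf.split(df, groups=df["userID"])):
    train_df, valid_df = df.iloc[tr_idx], df.iloc[va_idx]
    print(f"fold {fold}: {train_df['userID'].nunique()} train users / "
          f"{valid_df['userID'].nunique()} valid users")
```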
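The lgbm block maps onto a LightGBM 3.x-era lgb.train call, where num_boost_round, early_stopping_rounds, and verbose_eval were still accepted as keyword arguments. A sketch under that assumption, with synthetic data standing in for the engineered features:

```python
import lightgbm as lgb
import numpy as np

# Sketch: feeding conf.yaml's lgbm settings to lgb.train (LightGBM 3.x API).
# Random data stands in for the real engineered DKT features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

train_set = lgb.Dataset(X[:800], label=y[:800])
valid_set = lgb.Dataset(X[800:], label=y[800:])

params = {'objective': 'binary', 'metric': 'auc', 'boosting_type': 'gbdt',
          'feature_fraction': 0.4, 'bagging_fraction': 0.6, 'bagging_freq': 1,
          'seed': 42, 'verbose': -1}

model = lgb.train(params, train_set, valid_sets=[valid_set],
                  num_boost_round=500, early_stopping_rounds=100,
                  verbose_eval=100)
print("best iteration:", model.best_iteration)
```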
68 changes: 0 additions & 68 deletions conf.yml

This file was deleted.

1 change: 1 addition & 0 deletions dkt/__init__.py
@@ -0,0 +1 @@
from .metric import *
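The new dkt/__init__.py re-exports everything from dkt/metric.py. Judging from the commit "[dh] add presicion,recall,f1 metric", that module plausibly computes AUC plus thresholded classification metrics; the sketch below is a guess at its shape, with the function name and the 0.5 threshold as assumptions:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)

# Hypothetical sketch of dkt/metric.py; names and threshold are assumptions.
def get_metric(targets, preds):
    auc = roc_auc_score(targets, preds)
    labels = (np.asarray(preds) >= 0.5).astype(int)  # threshold probabilities
    acc = accuracy_score(targets, labels)
    precision, recall, f1, _ = precision_recall_fscore_support(
        targets, labels, average="binary")
    return auc, acc, precision, recall, f1
```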
2 changes: 2 additions & 0 deletions dkt/criterion.py
@@ -4,4 +4,6 @@

def get_criterion(pred, target):
    loss = nn.BCELoss(reduction="none")
    # loss = nn.CrossEntropyLoss(reduction="none")

    return loss(pred, target)
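Because reduction="none" keeps the element-wise losses, the caller must reduce them itself; for variable-length DKT sequences that is typically done under a padding mask. A sketch of that assumed usage (the mask handling is not part of this diff):

```python
import torch

from dkt.criterion import get_criterion

# Sketch: reduce the element-wise BCE under a padding mask (assumed usage).
preds = torch.sigmoid(torch.randn(2, 5))         # BCELoss expects probabilities
targets = torch.randint(0, 2, (2, 5)).float()
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]]).float()   # 0 marks padded positions

loss = get_criterion(preds, targets)             # shape (batch, seq_len)
loss = (loss * mask).sum() / mask.sum()          # mean over real positions
print(loss.item())
```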