Minyong
MignonDeveloper edited this page May 30, 2021 · 22 revisions
- LSTMATTN -> changed only lr from 1e-4 to 1e-5 and clip_grad from 10 to 20; with every other setting kept identical, swapped only the model to LSTM+ATTN
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-5
- num_workers: 4
- clip_grad: 20
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7836 / lb_auc: / diff:
- valid_acc: 0.7201 / lb_acc: / diff:
"0 Fold Best Valid ACC": 0.7208053691275168,
"1 Fold Best Valid ACC": 0.7194630872483222,
"2 Fold Best Valid ACC": 0.7217741935483871,
"3 Fold Best Valid ACC": 0.7298387096774194,
"4 Fold Best Valid ACC": 0.728494623655914,
"5 Fold Best Valid ACC": 0.6975806451612904,
"6 Fold Best Valid ACC": 0.7271505376344086,
"7 Fold Best Valid ACC": 0.717741935483871,
"8 Fold Best Valid ACC": 0.7190860215053764,
"9 Fold Best Valid ACC": 0.7190860215053764,
"Average ACC": 0.7201021144547882,
"0 Fold Best Valid AUC": 0.7940149733854565,
"1 Fold Best Valid AUC": 0.7981395822423925,
"2 Fold Best Valid AUC": 0.7903639715184154,
"3 Fold Best Valid AUC": 0.7757039071818412,
"4 Fold Best Valid AUC": 0.7903733107741698,
"5 Fold Best Valid AUC": 0.7640702809358264,
"6 Fold Best Valid AUC": 0.7918058500595462,
"7 Fold Best Valid AUC": 0.7672836204902869,
"8 Fold Best Valid AUC": 0.7751429252885609,
"9 Fold Best Valid AUC": 0.7899304424612222,
"Average AUC": 0.7836828864337717
- GRUATTN -> with every other setting kept identical, swapped only the model to GRU+ATTN
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-5
- num_workers: 4
- clip_grad: 20
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7905 / lb_auc: / diff:
- valid_acc: 0.7262 / lb_acc: / diff:
"0 Fold Best Valid ACC": 0.7315436241610739,
"1 Fold Best Valid ACC": 0.738255033557047,
"2 Fold Best Valid ACC": 0.7311827956989247,
"3 Fold Best Valid ACC": 0.7163978494623656,
"4 Fold Best Valid ACC": 0.7352150537634409,
"5 Fold Best Valid ACC": 0.6895161290322581,
"6 Fold Best Valid ACC": 0.739247311827957,
"7 Fold Best Valid ACC": 0.7258064516129032,
"8 Fold Best Valid ACC": 0.7271505376344086,
"9 Fold Best Valid ACC": 0.728494623655914,
"Average ACC": 0.7262809410406293,
"0 Fold Best Valid AUC": 0.7998571902543167,
"1 Fold Best Valid AUC": 0.8169910397393378,
"2 Fold Best Valid AUC": 0.7981349622293707,
"3 Fold Best Valid AUC": 0.7708750496981964,
"4 Fold Best Valid AUC": 0.8003224803511131,
"5 Fold Best Valid AUC": 0.7663180566497301,
"6 Fold Best Valid AUC": 0.79837772678421,
"7 Fold Best Valid AUC": 0.7753093200740054,
"8 Fold Best Valid AUC": 0.7898944051344691,
"9 Fold Best Valid AUC": 0.789293495175848,
"Average AUC": 0.7905373726090598
- GRUATTN -> layer=1, max_seq_len=200
- batch size: 32
- layer: 1
- max_seq_len: 200
- hidden_dim: 512
- seed: 42
- lr: 1e-5
- num_workers: 4
- clip_grad: 20
- scheduler:
ReduceLROnPlateau(optimizer, patience=2, factor=0.85, mode="max", verbose=True)
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7963 / lb_auc: / diff:
- valid_acc: 0.7331 / lb_acc: / diff:
"0 Fold Best Valid ACC": 0.7395973154362416,
"1 Fold Best Valid ACC": 0.7315436241610739,
"2 Fold Best Valid ACC": 0.7365591397849462,
"3 Fold Best Valid ACC": 0.728494623655914,
"4 Fold Best Valid ACC": 0.7486559139784946,
"5 Fold Best Valid ACC": 0.7029569892473119,
"6 Fold Best Valid ACC": 0.75,
"7 Fold Best Valid ACC": 0.7204301075268817,
"8 Fold Best Valid ACC": 0.7405913978494624,
"9 Fold Best Valid ACC": 0.7325268817204301,
"Average ACC": 0.7331355993360755,
"0 Fold Best Valid AUC": 0.8063341171039915,
"1 Fold Best Valid AUC": 0.8101399313434572,
"2 Fold Best Valid AUC": 0.8134673076227996,
"3 Fold Best Valid AUC": 0.7824664763075144,
"4 Fold Best Valid AUC": 0.8026000882121139,
"5 Fold Best Valid AUC": 0.7773401079799651,
"6 Fold Best Valid AUC": 0.8014349202660702,
"7 Fold Best Valid AUC": 0.7790240518038853,
"8 Fold Best Valid AUC": 0.7973026691433156,
"9 Fold Best Valid AUC": 0.7938389826214723,
"Average AUC": 0.7963948652404584
- GRUATTN -> scheduler: CosineAnnealingWarmupRestarts, patience: 7
- batch size: 32
- layer: 1
- max_seq_len: 200
- hidden_dim: 512
- seed: 42
- lr: 1e-5
- patience: 7
- num_workers: 4
- clip_grad: 20
- scheduler:
CosineAnnealingWarmupRestarts(optimizer, first_cycle=20, warmup_steps=5, cycle_mult=1.0, max_lr=args.lr, min_lr=args.lr/100, gamma=0.5)
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7983 / lb_auc: 0.7896 / diff: 0.009
- valid_acc: 0.7325 / lb_acc: 0.7231 / diff: 0.009
"0 Fold Best Valid ACC": 0.7355704697986577,
"1 Fold Best Valid ACC": 0.7409395973154362,
"2 Fold Best Valid ACC": 0.7432795698924731,
"3 Fold Best Valid ACC": 0.728494623655914,
"4 Fold Best Valid ACC": 0.7365591397849462,
"5 Fold Best Valid ACC": 0.706989247311828,
"6 Fold Best Valid ACC": 0.7432795698924731,
"7 Fold Best Valid ACC": 0.7325268817204301,
"8 Fold Best Valid ACC": 0.7298387096774194,
"9 Fold Best Valid ACC": 0.728494623655914,
"Average ACC": 0.7325972432705492,
"0 Fold Best Valid AUC": 0.8052810755449129,
"1 Fold Best Valid AUC": 0.808067143771455,
"2 Fold Best Valid AUC": 0.8152239129649039,
"3 Fold Best Valid AUC": 0.7842050095781978,
"4 Fold Best Valid AUC": 0.806989002407757,
"5 Fold Best Valid AUC": 0.774456305697497,
"6 Fold Best Valid AUC": 0.8072225288291167,
"7 Fold Best Valid AUC": 0.7838083950046254,
"8 Fold Best Valid AUC": 0.8009815046364891,
"9 Fold Best Valid AUC": 0.7974073350270341,
"Average AUC": 0.798364221346199
1. Check whether the current approach (seed 42, kfold=10) shows a meaningful relationship between LB and validation scores, and rethink the validation strategy
- The current approach clearly shows sufficient correlation between the two. However, the gap is still about 0.009, so this needs continued attention.
- Still, under the current validation strategy the LB score rises substantially whenever the validation score rises, so it is good enough for now. Keep thinking about it!
2. Add the other features assigned to me from the discussion board
3. Estimating each user's ability by weighting the correct-answer rate needs more thought: adding it actually lowered the valid score
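One way to keep an eye on item 1 is to tabulate the validation/LB gap per run and compute a correlation coefficient. The sketch below uses a selection of the 10-fold runs on this page that report both AUC values; the run labels are shorthand invented here, and five points (one of them far from the rest) are weak evidence, so the coefficient should be read loosely.

```python
import math

# (valid_auc, lb_auc) pairs copied from 10-fold runs on this page.
runs = {
    "LSTM Ensemble, 1 continuous feature": (0.7652, 0.7557),
    "LSTM Ensemble, 2 continuous features": (0.7671, 0.7609),
    "LSTM Ensemble, 3 continuous features": (0.7691, 0.7611),
    "LSTM Ensemble, 3 features, train+test": (0.7619, 0.7612),
    "GRUATTN, cosine warmup scheduler": (0.7983, 0.7896),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Gap between validation and leaderboard AUC for each run.
gaps = {name: round(valid - lb, 4) for name, (valid, lb) in runs.items()}
valid_scores = [valid for valid, _ in runs.values()]
lb_scores = [lb for _, lb in runs.values()]
r = pearson(valid_scores, lb_scores)

for name, gap in gaps.items():
    print(f"{name}: gap {gap}")
print(f"Pearson r between valid_auc and lb_auc: {r:.3f}")
```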
- LSTM
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-4
- categorical feature: "testId", "knowledgeTag", "assessmentItemID", how many times the user has encountered the same test
- continuous feature: correct-answer rate of the current question
- valid_auc: 0.7967 / lb_auc: 0.7559 / 0.04 diff -> a more accurate validation is needed
- valid_acc: 0.7209 / lb_acc: 0.6801 / 0.04 diff
- LSTM Ensemble
- Same architecture as above; the data was split into 10 folds with sklearn's KFold and the fold models were ensembled by soft voting
- cv_auc(mean): 0.7652 / lb_auc: 0.7557 / 0.01 diff
- cv_acc(mean): 0.7117 / lb_acc: 0.6774 / 0.04 diff
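Soft voting here means averaging the predicted probabilities of the fold models for each sample, then scoring the average. A minimal sketch of that step; the probabilities are made up, and the 0.5 cut-off used to turn probabilities into labels for accuracy is an assumption that these notes do not state.

```python
def soft_vote(fold_probs):
    """Average per-sample probabilities across fold models.

    fold_probs holds one list of probabilities per model, all the same length.
    """
    n_models = len(fold_probs)
    return [sum(column) / n_models for column in zip(*fold_probs)]

# Hypothetical predictions from three fold models for four samples.
fold_probs = [
    [0.9, 0.2, 0.7, 0.4],
    [0.8, 0.3, 0.6, 0.3],
    [0.7, 0.1, 0.5, 0.2],
]
avg = soft_vote(fold_probs)
# AUC is computed on `avg` directly; accuracy needs hard labels.
labels = [int(p >= 0.5) for p in avg]
```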
- LSTM Ensemble
- kfold: 10
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-4
- scheduler: Plateau, patience=2, factor=0.85
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(2): overall correct-answer rate of the current question, and of the current test
- valid_auc: 0.7671 / lb_auc: 0.7609 / 0.0062 diff -> the gap shrank a bit
- valid_acc: 0.7093 / lb_acc: 0.6989 / 0.01 diff
- LSTM Ensemble with train & test dataset (same as #3 but with the dataset expanded to the full set, which at first looked like it lowered the score slightly). Correction: this run was done wrong. The features generated during training should have been loaded before inference, but I extracted them from the test dataset by mistake. Needs a rerun.
- kfold: 10
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-4
- scheduler: Plateau, patience=2, factor=0.85
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(2): overall correct-answer rate of the current question, and of the current test
- valid_auc: 0.7618 / lb_auc: 0.7577 / 0.0041 diff -> the gap shrank, but so did the score
- valid_acc: 0.7069 / lb_acc: 0.6694 / 0.027 diff
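The mistake described in this run comes down to fitting the correct-answer-rate features on one dataset and forgetting to reuse them at inference. A sketch of the intended fit-then-apply pattern follows; the function names, the `(item_id, correct)` row layout, and the fallback to the global rate for unseen questions are all assumptions made for illustration.

```python
from collections import defaultdict

def fit_item_accuracy(train_rows):
    """Learn each question's correct-answer rate from (item_id, correct) rows."""
    totals = defaultdict(lambda: [0, 0])  # item_id -> [correct count, attempts]
    for item, correct in train_rows:
        totals[item][0] += correct
        totals[item][1] += 1
    rates = {item: c / n for item, (c, n) in totals.items()}
    attempts = sum(n for _, n in totals.values())
    global_rate = sum(c for c, _ in totals.values()) / attempts
    return rates, global_rate

def apply_item_accuracy(items, rates, global_rate):
    """Look up the stored rates; unseen questions fall back to the global rate."""
    return [rates.get(item, global_rate) for item in items]

# Toy training rows: q1 answered correctly once out of two, q2 twice out of two.
train = [("q1", 1), ("q1", 0), ("q2", 1), ("q2", 1)]
rates, global_rate = fit_item_accuracy(train)

# At inference the stored rates are reused, never recomputed from test rows.
test_features = apply_item_accuracy(["q1", "q2", "q3"], rates, global_rate)
```

Saving `rates` next to the model checkpoint would make it harder to recompute them from the wrong dataset by accident.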
- LSTM Ensemble
- kfold: 10
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-4
- scheduler: Plateau, patience=2, factor=0.85
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7691 / lb_auc: 0.7611 / diff: 0.008
- valid_acc: 0.7114 / lb_acc: 0.6774 / diff: 0.034
- LSTM Ensemble with train & test dataset (same as #5 but with the dataset expanded to the full set!)
- kfold: 10
- batch size: 32
- layer: 2
- max_seq_len: 100
- hidden_dim: 512
- seed: 42
- lr: 1e-4
- scheduler: Plateau, patience=2, factor=0.85
- categorical feature(4): "testId", "knowledgeTag", "assessmentItemID_post3", how many times the user has encountered the same test
- continuous feature(3): overall correct-answer rate of the current question, of the current test, and of the current knowledgeTag
- valid_auc: 0.7619 / lb_auc: 0.7612 / diff: 0.0007
- valid_acc: 0.7047 / lb_acc: 0.6828 / diff: 0.0219
- Per-fold scores -> much more similar across folds than expected!
"0 Fold Best Valid ACC": 0.7181208053691275,
"1 Fold Best Valid ACC": 0.7087248322147651,
"2 Fold Best Valid ACC": 0.7056451612903226,
"3 Fold Best Valid ACC": 0.6989247311827957,
"4 Fold Best Valid ACC": 0.7056451612903226,
"5 Fold Best Valid ACC": 0.6895161290322581,
"6 Fold Best Valid ACC": 0.7150537634408602,
"7 Fold Best Valid ACC": 0.6908602150537635,
"8 Fold Best Valid ACC": 0.7083333333333334,
"9 Fold Best Valid ACC": 0.706989247311828,
"Average ACC": 0.7047813379519376,
"0 Fold Best Valid AUC": 0.7751395640696451,
"1 Fold Best Valid AUC": 0.777768080525979,
"2 Fold Best Valid AUC": 0.7647522319008204,
"3 Fold Best Valid AUC": 0.7428344236816424,
"4 Fold Best Valid AUC": 0.7715884687967723,
"5 Fold Best Valid AUC": 0.7452858144392487,
"6 Fold Best Valid AUC": 0.773194004705609,
"7 Fold Best Valid AUC": 0.7403771103145236,
"8 Fold Best Valid AUC": 0.765053231087244,
"9 Fold Best Valid AUC": 0.763251036848848,
"Average AUC": 0.7619243966370333
1. <s>Use the train + test dataset prepared by a teammate to build more accurate features</s>
2. Think of different ways to get a more accurate validation. (Splitting on whether each user answered their last question correctly looks reasonable to me!)
3. Make testid, knowledgeTag, and assessmentItem lighter / or rethink the categorical preprocessing
- assessmentItem is structured as test ID + question number, so there is no need to keep the leading part -> provide only the trailing 3-digit question number to make it lighter.
4. Add the other features assigned to me from the discussion board
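The idea in item 3 (keeping only the trailing 3-digit question number, which is what the `assessmentItemID_post3` feature name suggests) can be sketched as a simple string slice. The IDs below are hypothetical and only follow the "test ID + question number" structure described above.

```python
def item_post3(assessment_item_id):
    """Return the last three characters of an item ID (the question number)."""
    return assessment_item_id[-3:]

# Hypothetical IDs: two tests, each with its own question numbering.
item_ids = ["A060001001", "A060001002", "A060002001", "A060002002"]
post3 = [item_post3(item_id) for item_id in item_ids]

# The categorical vocabulary shrinks because tests share question numbers.
vocab_before = len(set(item_ids))
vocab_after = len(set(post3))
```

The test identity dropped by the slice is still carried by the separate "testId" feature, so the embedding table for questions gets smaller without losing which test a question belongs to.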