5 inputoutput result of tables (#7)
ledong0110 authored Sep 5, 2024
1 parent 69b1b7c commit d75f69c
Showing 290 changed files with 47,731 additions and 31,121 deletions.
2 changes: 1 addition & 1 deletion _config.yml
@@ -2,7 +2,7 @@ title: MELT
description: "Multilingual Evaluation Toolkits"

# disabled because we are using a custom domain
- baseurl: https://ai.stanford.edu/~sttruong/melt
+ # baseurl: https://ai.stanford.edu/~sttruong/melt

color-primary: "#B1040E"
color-light: "#E50808"
1 change: 1 addition & 0 deletions _data/categories.yml
@@ -1,6 +1,7 @@
- zero-shot
- few-shot
- weaker-prompt
- medium-prompt
- fairness-aware
- robustness-aware
- chain-of-thought
2 changes: 2 additions & 0 deletions _data/lang_tasks.yml
@@ -3,6 +3,7 @@ vi:
zero-shot: true
few-shot: false
weaker-prompt: true
medium-prompt: true
fairness-aware: true
robustness-aware: true
chain-of-thought: false
@@ -12,6 +13,7 @@ vi:
zero-shot: true
few-shot: false
weaker-prompt: true
medium-prompt: true
fairness-aware: false
robustness-aware: true
chain-of-thought: false
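The flags in `_data/lang_tasks.yml` appear to switch the categories listed in `_data/categories.yml` on or off for each task. A minimal sketch of how a consumer might combine the two, assuming the flags load as a plain mapping. The helper name `enabled_categories` is hypothetical, the category list contains only the entries visible in this diff, and the task key the flags belong to is not shown in the hunk:

```python
# Categories visible in the _data/categories.yml hunk (the full file may list more).
CATEGORIES = [
    "zero-shot",
    "few-shot",
    "weaker-prompt",
    "medium-prompt",
    "fairness-aware",
    "robustness-aware",
    "chain-of-thought",
]

# Flags from the first lang_tasks.yml hunk; the owning task key is not visible in the diff.
TASK_FLAGS = {
    "zero-shot": True,
    "few-shot": False,
    "weaker-prompt": True,
    "medium-prompt": True,
    "fairness-aware": True,
    "robustness-aware": True,
    "chain-of-thought": False,
}


def enabled_categories(categories, flags):
    """Return the categories whose flag is true, keeping the listing order."""
    # A category without a flag is treated as disabled (an assumption of this sketch).
    return [name for name in categories if flags.get(name, False)]


print(enabled_categories(CATEGORIES, TASK_FLAGS))
```

Treating a missing flag as disabled means a category added to `categories.yml` without a matching entry in `lang_tasks.yml` stays hidden, which is presumably why this commit adds `medium-prompt` to both files.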
146 changes: 146 additions & 0 deletions _data/leaderboard/vi/bias_toxicity/question_answering.yml
@@ -0,0 +1,146 @@
XQuAD:
  URA-LLaMa 70B:
    DRR: null
    DRG: 0.39
    DRG_std: 0.01
    SAR: null
    SAG: 0.41
    SAG_std: 0.00
    Tox: 0.02
    Tox_std: 0.00
  URA-LLaMa 13B:
    DRR: null
    DRG: 0.39
    DRG_std: 0.01
    SAR: null
    SAG: 0.45
    SAG_std: 0.01
    Tox: 0.02
    Tox_std: 0.00
  URA-LLaMa 7B:
    DRR: null
    DRG: 0.43
    DRG_std: 0.01
    SAR: null
    SAG: 0.48
    SAG_std: 0.00
    Tox: 0.03
    Tox_std: 0.00
  LLaMa-2 13B:
    DRR: null
    DRG: 0.35
    DRG_std: 0.03
    SAR: null
    SAG: 0.46
    SAG_std: 0.00
    Tox: 0.01
    Tox_std: 0.00
  LLaMa-2 7B:
    DRR: null
    DRG: 0.46
    DRG_std: 0.01
    SAR: null
    SAG: 0.42
    SAG_std: 0.00
    Tox: 0.01
    Tox_std: 0.00
  Vietcuna 7B:
    DRR: null
    DRG: 0.50
    DRG_std: 0.00
    SAR: null
    SAG: null
    SAG_std: null
    Tox: 0.04
    Tox_std: 0.00
  GPT-3.5:
    DRR: null
    DRG: 0.43
    DRG_std: 0.01
    SAR: null
    SAG: 0.48
    SAG_std: 0.00
    Tox: 0.02
    Tox_std: 0.00
  GPT-4:
    DRR: null
    DRG: 0.40
    DRG_std: 0.01
    SAR: null
    SAG: 0.45
    SAG_std: 0.00
    Tox: 0.02
    Tox_std: 0.00
MLQA:
  URA-LLaMa 70B:
    DRR: null
    DRG: 0.14
    DRG_std: 0.02
    SAR: null
    SAG: 0.42
    SAG_std: 0.03
    Tox: 0.02
    Tox_std: 0.00
  URA-LLaMa 13B:
    DRR: null
    DRG: 0.17
    DRG_std: 0.1
    SAR: null
    SAG: 0.38
    SAG_std: 0.00
    Tox: 0.02
    Tox_std: 0.00
  URA-LLaMa 7B:
    DRR: null
    DRG: 0.18
    DRG_std: 0.01
    SAR: null
    SAG: 0.37
    SAG_std: 0.01
    Tox: 0.02
    Tox_std: 0.00
  LLaMa-2 13B:
    DRR: null
    DRG: 0.27
    DRG_std: 0.01
    SAR: null
    SAG: 0.43
    SAG_std: 0.00
    Tox: 0.01
    Tox_std: 0.00
  LLaMa-2 7B:
    DRR: null
    DRG: 0.21
    DRG_std: 0.06
    SAR: null
    SAG: 0.45
    SAG_std: 0.00
    Tox: 0.01
    Tox_std: 0.00
  Vietcuna 7B:
    DRR: null
    DRG: 0.23
    DRG_std: 0.09
    SAR: null
    SAG: 0.49
    SAG_std: 0.01
    Tox: 0.04
    Tox_std: 0.00
  GPT-3.5:
    DRR: null
    DRG: 0.18
    DRG_std: 0.01
    SAR: null
    SAG: 0.40
    SAG_std: 0.00
    Tox: 0.02
    Tox_std: 0.00
  GPT-4:
    DRR: null
    DRG: 0.16
    DRG_std: 0.01
    SAR: null
    SAG: 0.41
    SAG_std: 0.01
    Tox: 0.02
    Tox_std: 0.00
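The file nests three levels: dataset, then model, then metric, with `null` marking scores that were not reported. A minimal sketch of reading that structure, assuming a YAML loader turns it into nested dicts with `null` as `None`. The function name `rank_models` is hypothetical, and the dict reproduces only three of the XQuAD entries above:

```python
# Subset of the XQuAD entries from question_answering.yml, written as the
# nested mapping (dataset -> model -> metric) a YAML loader would produce.
LEADERBOARD = {
    "XQuAD": {
        "URA-LLaMa 70B": {"DRG": 0.39, "SAG": 0.41, "Tox": 0.02},
        "LLaMa-2 13B": {"DRG": 0.35, "SAG": 0.46, "Tox": 0.01},
        "Vietcuna 7B": {"DRG": 0.50, "SAG": None, "Tox": 0.04},
    }
}


def rank_models(board, dataset, metric):
    """Return (model, score) pairs sorted ascending, skipping null scores."""
    scored = [
        (model, metrics[metric])
        for model, metrics in board[dataset].items()
        if metrics.get(metric) is not None
    ]
    return sorted(scored, key=lambda pair: pair[1])


for model, score in rank_models(LEADERBOARD, "XQuAD", "SAG"):
    print(f"{model}: {score:.2f}")
```

The sort is ascending only for illustration; the diff does not state whether lower or higher is better for each metric. Skipping `None` keeps models such as Vietcuna 7B, whose XQuAD `SAG` is `null`, from breaking the comparison.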
146 changes: 146 additions & 0 deletions _data/leaderboard/vi/bias_toxicity/summarization.yml
@@ -0,0 +1,146 @@
VietNews:
  URA-LLaMa 70B:
    DRR: null
    DRG: 0.21
    DRG_std: 0.01
    SAR: null
    SAG: 0.31
    SAG_std: 0.01
    Tox: 0.05
    Tox_std: 0.00
  URA-LLaMa 13B:
    DRR: null
    DRG: 0.20
    DRG_std: 0.01
    SAR: null
    SAG: 0.29
    SAG_std: 0.01
    Tox: 0.04
    Tox_std: 0.00
  URA-LLaMa 7B:
    DRR: null
    DRG: 0.24
    DRG_std: 0.02
    SAR: null
    SAG: 0.33
    SAG_std: 0.01
    Tox: 0.04
    Tox_std: 0.00
  LLaMa-2 13B:
    DRR: null
    DRG: 0.26
    DRG_std: 0.01
    SAR: null
    SAG: 0.38
    SAG_std: 0.01
    Tox: 0.01
    Tox_std: 0.00
  LLaMa-2 7B:
    DRR: null
    DRG: 0.28
    DRG_std: 0.02
    SAR: null
    SAG: 0.39
    SAG_std: 0.01
    Tox: 0.01
    Tox_std: 0.00
  Vietcuna 7B:
    DRR: null
    DRG: 0.21
    DRG_std: 0.02
    SAR: null
    SAG: 0.32
    SAG_std: 0.02
    Tox: 0.04
    Tox_std: 0.00
  GPT-3.5:
    DRR: null
    DRG: 0.22
    DRG_std: 0.01
    SAR: null
    SAG: 0.29
    SAG_std: 0.01
    Tox: 0.04
    Tox_std: 0.00
  GPT-4:
    DRR: null
    DRG: 0.19
    DRG_std: 0.01
    SAR: null
    SAG: 0.28
    SAG_std: 0.01
    Tox: 0.06
    Tox_std: 0.00
WikiLingua:
  URA-LLaMa 70B:
    DRR: null
    DRG: 0.03
    DRG_std: 0.02
    SAR: null
    SAG: 0.25
    SAG_std: 0.02
    Tox: 0.03
    Tox_std: 0.00
  URA-LLaMa 13B:
    DRR: null
    DRG: 0.07
    DRG_std: 0.04
    SAR: null
    SAG: 0.31
    SAG_std: 0.03
    Tox: 0.02
    Tox_std: 0.00
  URA-LLaMa 7B:
    DRR: null
    DRG: 0.07
    DRG_std: 0.02
    SAR: null
    SAG: 0.38
    SAG_std: 0.02
    Tox: 0.03
    Tox_std: 0.00
  LLaMa-2 13B:
    DRR: null
    DRG: 0.17
    DRG_std: 0.08
    SAR: null
    SAG: 0.50
    SAG_std: 0.02
    Tox: 0.01
    Tox_std: 0.00
  LLaMa-2 7B:
    DRR: null
    DRG: 0.39
    DRG_std: 0.05
    SAR: null
    SAG: 0.50
    SAG_std: 0.02
    Tox: 0.01
    Tox_std: 0.00
  Vietcuna 7B:
    DRR: null
    DRG: 0.17
    DRG_std: 0.04
    SAR: null
    SAG: 0.39
    SAG_std: 0.03
    Tox: 0.03
    Tox_std: 0.00
  GPT-3.5:
    DRR: null
    DRG: 0.03
    DRG_std: 0.02
    SAR: null
    SAG: 0.28
    SAG_std: 0.01
    Tox: 0.02
    Tox_std: 0.00
  GPT-4:
    DRR: null
    DRG: 0.09
    DRG_std: 0.02
    SAR: null
    SAG: 0.28
    SAG_std: 0.01
    Tox: 0.02
    Tox_std: 0.00
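Each reported metric is paired with a `<metric>_std` key, while `DRR` and `SAR` are `null` and have no `_std` companion. A minimal sketch of how a table cell could be rendered from such a pair, assuming the page shows "mean ± std" and a placeholder for missing values. The function name `format_cell` and the placeholder `-` are hypothetical, not taken from the site's templates:

```python
# The WikiLingua entry for GPT-4 from summarization.yml, as a loader would return it.
GPT4_WIKILINGUA = {
    "DRR": None,
    "DRG": 0.09,
    "DRG_std": 0.02,
    "SAR": None,
    "SAG": 0.28,
    "SAG_std": 0.01,
    "Tox": 0.02,
    "Tox_std": 0.00,
}


def format_cell(metrics, name, missing="-"):
    """Render one metric as 'mean ± std', or a placeholder when it is null."""
    mean = metrics.get(name)
    std = metrics.get(f"{name}_std")
    if mean is None:
        return missing
    if std is None:
        # No spread recorded: show the mean alone.
        return f"{mean:.2f}"
    return f"{mean:.2f} ± {std:.2f}"


row = [format_cell(GPT4_WIKILINGUA, name) for name in ("DRR", "DRG", "SAR", "SAG", "Tox")]
print(" | ".join(row))
```

Formatting to two decimals matches the precision used throughout the file and would also normalise a value written as `0.1` to `0.10`.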