Reproduce the results of the code in the paper #30

Open
hsb1995 opened this issue Jun 27, 2024 · 4 comments

hsb1995 commented Jun 27, 2024

How do I reproduce the paper's results with this code? Following your example, I get the result below when I run "generate", but I still can't see the quantization effect reported in the paper on the downstream tasks.
[Screenshot of the output from running "generate"]

ridgerchu (Owner)

Hi, to properly evaluate the quantized model on downstream tasks, you should use the lm-evaluation-harness framework. This framework is designed to be compatible with Hugging Face Transformers and provides a standardized way to run various benchmarks.
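For reference, here is a minimal sketch of how such an evaluation could be driven from Python. It assumes the older v0.3-style lm-evaluation-harness API (the same evaluator.simple_evaluate and evaluator.make_table calls used in the script later in this thread); the "hf-causal" model type, the checkpoint path, and the task list are placeholders rather than values from this repository.

# Minimal sketch, assuming the older (v0.3-style) lm-evaluation-harness API.
# "hf-causal", the checkpoint path, and the task names are placeholders.
import sys

sys.path.append("./lm-evaluation-harness")
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                            # harness wrapper for HF causal LMs
    model_args="pretrained=/path/to/checkpoint",  # hypothetical local or Hub path
    tasks=["arc_easy", "hellaswag", "piqa"],      # example downstream tasks
    num_fewshot=0,
    batch_size=8,
    device="cuda:0",
    no_cache=True,
)
print(evaluator.make_table(results))              # per-task results table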


hsb1995 commented Jun 27, 2024

Thank you, Professor. I know the framework, but it still needs a quantized model before I can run the inference task with it. I went through the code in this project carefully and couldn't find the code that produces the final quantized model, or the quantized weights.

ridgerchu (Owner)

Hi, the model actually gets quantized when you run inference on it. If you check our code closely, you will find that the weights are quantized on the fly during inference.
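To make the idea concrete, here is an illustrative sketch of a linear layer that quantizes its weights on the fly inside forward(). This is not the repository's implementation; the class name and the simple ternary rounding scheme are assumptions used only to show the general pattern: the stored weights stay in full precision and a quantized copy is built at inference time.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OnTheFlyQuantLinear(nn.Linear):
    # Illustrative only -- not this repository's code.
    def forward(self, x):
        # Per-tensor scale from the mean absolute weight value.
        scale = self.weight.abs().mean().clamp(min=1e-5)
        # Round to {-1, 0, +1} and rescale; self.weight itself is never overwritten.
        w_q = (self.weight / scale).round().clamp(-1, 1) * scale
        return F.linear(x, w_q, self.bias)

# Usage: drop-in replacement for nn.Linear; quantization happens on each forward pass.
layer = OnTheFlyQuantLinear(16, 8)
y = layer(torch.randn(2, 16))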


hsb1995 commented Jun 28, 2024

Dear Professor, I followed your advice about the lm-evaluation-harness framework and tried to rewrite the code. The original repository runs evaluation through the lm_eval command, and I want to rewrite that so it runs as a standalone .py file, because I would really like to reproduce your results and learn from them. Could you please help me with my inference code below? It still doesn't work. I would be grateful if you could share your inference script!

import argparse
import fnmatch
import json
import logging
import os
import sys

import torch

sys.path.append("./lm-evaluation-harness")
import lm_eval.models
from lm_eval import evaluator, tasks, utils

try:
    import wandb

    wandb_installed = True

except ModuleNotFoundError:
    wandb_installed = False

logging.getLogger("openai").setLevel(logging.WARNING)

class MultiChoice:
    def __init__(self, choices):
        self.choices = choices

    # Simple wildcard support (linux filename patterns)
    def __contains__(self, values):
        for value in values.split(","):
            if len(fnmatch.filter(self.choices, value)) == 0:
                return False

        return True

    def __iter__(self):
        for choice in self.choices:
            yield choice

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--model_args", default="")
    parser.add_argument("--tasks", default=None, choices=MultiChoice(tasks.ALL_TASKS))
    parser.add_argument("--provide_description", action="store_true")
    parser.add_argument("--num_fewshot", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--exp_name", type=str, default=None, help="Experiment name if not load.")
    parser.add_argument("--device", type=str, default="cuda:0")
    parser.add_argument("--output_path", default=None)
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--decontamination_ngrams_path", default=None)
    parser.add_argument("--description_dict_path", default=None)
    parser.add_argument("--check_integrity", action="store_true")
    parser.add_argument("--log_wandb", action="store_true")
    parser.add_argument(
        "--model_seqlen",
        type=int,
        default=4096,
        choices=[2048, 4096],
        help="Model seqlen and calibration data context length.",
    )
    parser.add_argument("--load_weights", type=str, default=None, help="Path to load model weights.")

    return parser.parse_args()

# Returns a list containing all values of the source_list that
# match at least one of the patterns
def pattern_match(patterns, source_list):
    task_names = set()
    for pattern in patterns:
        for matching in fnmatch.filter(source_list, pattern):
            task_names.add(matching)
    return list(task_names)

def main():
    args = parse_args()

    assert not args.provide_description  # not implemented
    if args.log_wandb:
        assert args.exp_name or args.load_weights
        wandb.init(
            config={a: getattr(args, a) for a in dir(args) if not a.startswith("_")},
        )

    if args.limit:
        print("WARNING: --limit SHOULD ONLY BE USED FOR TESTING. REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.")

    if args.tasks is None:
        task_names = tasks.ALL_TASKS
    else:
        task_names = pattern_match(args.tasks.split(","), tasks.ALL_TASKS)

    print(f"Selected Tasks: {task_names}")

    description_dict = {}
    if args.description_dict_path:
        with open(args.description_dict_path, "r") as f:
            description_dict = json.load(f)

    if args.model_args is None:
        args.model_args = ""

    # Build the LM wrapper from the lm-evaluation-harness model registry.
    lm = lm_eval.models.get_model(args.model).create_from_arg_string(
        args.model_args, dict(batch_size=args.batch_size, device=args.device)
    )
    print("lm.device", lm.device)
    if hasattr(lm.model, "hf_device_map"):
        print("Model device map:\n", lm.model.hf_device_map)

    # Optionally overwrite the model weights with a locally saved state dict.
    if args.load_weights:
        print("Loading model weights ...")
        lm.model.load_state_dict(torch.load(args.load_weights))

    results = evaluator.simple_evaluate(
        model=lm,
        model_args=args.model_args,
        tasks=task_names,
        num_fewshot=args.num_fewshot,
        batch_size=args.batch_size,
        device=args.device,
        no_cache=True,
        limit=args.limit,
        description_dict=description_dict,
        decontamination_ngrams_path=args.decontamination_ngrams_path,
        check_integrity=args.check_integrity,
        log_wandb=args.log_wandb,
    )
    if not isinstance(results["config"]["model"], str):
        results["config"]["model"] = results["config"]["model"].model.config._name_or_path
    dumped = json.dumps(results, indent=2)
    print(dumped)

    if args.output_path:
        with open(args.output_path, "w") as f:
            f.write(dumped)

    print(
        f"{args.model} ({args.model_args}), limit: {args.limit}, provide_description: {args.provide_description}, "
        f"num_fewshot: {args.num_fewshot}, batch_size: {args.batch_size}"
    )
    print(evaluator.make_table(results))


if __name__ == "__main__":
    main()
