The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

This is the official repository of the COLM 2024 paper: The Larger the Better? Improved LLM Code-Generation via Budget Reallocation by Michael Hassid*, Tal Remez*, Jonas Gehring, Roy Schwartz and Yossi Adi.

Data

We release the Code Llama 7B generations for the HumanEval and MBPP benchmarks.

Below is an example of the data format (JSON file):

{
    "HumanEval/0": {
        "task_id": "HumanEval/0",
        "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
        "samples": [
            {
                "pass_at_1": 100.0,
                "generation": "   for i in range(len(numbers)):\n        for j in range(i + 1, len(numbers)):\n            if abs(numbers[i] - numbers[j]) <= threshold:\n                return True\n\n    return False\n"
            },
            {
                "pass_at_1": 100.0,
                "generation": "   for idx, i in enumerate(numbers):\n        for j in numbers[idx + 1:]:\n            if abs(i - j) <= threshold:\n                return True\n    return False\n"
            }
        ]
    }
}
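The following is a minimal sketch for reading a file in this format and turning each task's samples into the 0/1 list expected by the rank-score@k function below. It assumes `pass_at_1` is 100.0 for a passing sample and 0.0 for a failing one, as in the example above. The helper names are hypothetical, not part of the repository, and the file path is a placeholder. The lists come out in file order, so they still have to be reordered by a ranker before computing rank-score@k.

```python
import json
from typing import Dict, List


def load_generations(path: str) -> Dict[str, dict]:
    """Load a generations file in the format shown above (path is a placeholder)."""
    with open(path) as f:
        return json.load(f)


def binary_pass_lists(data: Dict[str, dict]) -> Dict[str, List[int]]:
    """Map each task_id to a 0/1 list with one entry per sample, in file order.

    Assumption: pass_at_1 is 100.0 for a passing sample and 0.0 otherwise.
    """
    return {
        task_id: [1 if s["pass_at_1"] > 0 else 0 for s in task["samples"]]
        for task_id, task in data.items()
    }


def mean_pass_at_1(data: Dict[str, dict]) -> float:
    """Average pass@1 (in percent): mean over samples per task, then over tasks."""
    per_task = [
        sum(s["pass_at_1"] for s in task["samples"]) / len(task["samples"])
        for task in data.values()
    ]
    return sum(per_task) / len(per_task)
```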

rank-score@k code

We also provide a Python function for computing the rank-score@k metric. See the paper for full details.
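As a reading aid, the function below evaluates the following closed form. This is our transcription of the code, not a quote from the paper, which remains the authoritative definition. Here $p_i \in \{0, 1\}$ is the pass score of the $i$-th ranked sample (rank 1 first):

```latex
\text{rank-score@}k \;=\; \frac{100}{\binom{n}{k}} \sum_{i=1}^{n-k+1} \binom{n-i}{k-1}\, p_i
```

One way to read it: $\binom{n-i}{k-1} / \binom{n}{k}$ is the probability that sample $i$ is the top-ranked member of a uniformly random subset of $k$ samples (the other $k-1$ members must come from the $n-i$ lower-ranked samples). The score is therefore the expected pass rate of the ranker's top choice when it only sees $k$ of the $n$ samples. For $k = 1$ it reduces to the mean pass rate, and for $k = n$ to $100\,p_1$.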

import math


def rank_score_at_k(n, k, pass_sorted):
    """
    :param n: total number of samples
    :param k: k in rank-score@k
    :param pass_sorted: a binary list of pass scores, sorted by the ranks a ranker assigned to the samples (top-ranked first)
    :return: rank-score@k, in percent
    """
    numerator_sum = 0
    # Sample i (1-indexed) is the top-ranked member of C(n-i, k-1) of the C(n, k) possible k-subsets.
    for i in range(1, n - k + 2):
        numerator_sum += math.comb(n - i, k - 1) * pass_sorted[i - 1]
    score = (numerator_sum / math.comb(n, k)) * 100
    return score
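To sanity-check the closed form, the sketch below compares it against a direct enumeration that averages, over every subset of k samples, the pass score of the subset's top-ranked sample. It repeats `rank_score_at_k` (with `pass_sorted` used consistently) so the snippet runs on its own. `rank_score_at_k_brute_force` is a hypothetical helper, not part of the repository; it is exponential in n and only meant for small inputs.

```python
import itertools
import math
from typing import Sequence


def rank_score_at_k(n: int, k: int, pass_sorted: Sequence[int]) -> float:
    """Closed form, same computation as the function above."""
    numerator_sum = 0
    for i in range(1, n - k + 2):
        numerator_sum += math.comb(n - i, k - 1) * pass_sorted[i - 1]
    return (numerator_sum / math.comb(n, k)) * 100


def rank_score_at_k_brute_force(k: int, pass_sorted: Sequence[int]) -> float:
    """Hypothetical reference implementation: enumerate every k-subset of
    sample indices and score it by its top-ranked member, i.e. the smallest
    index, since pass_sorted is in rank order. Small inputs only."""
    n = len(pass_sorted)
    subsets = list(itertools.combinations(range(n), k))
    total = sum(pass_sorted[min(subset)] for subset in subsets)
    return total / len(subsets) * 100


if __name__ == "__main__":
    passes = [0, 1, 1, 0, 1]  # toy ranking in which the top-ranked sample fails
    for k in range(1, len(passes) + 1):
        closed = rank_score_at_k(len(passes), k, passes)
        brute = rank_score_at_k_brute_force(k, passes)
        assert math.isclose(closed, brute), (k, closed, brute)
        print(f"k={k}: rank-score@k = {closed:.1f}")
```

With this toy list, k = 1 gives the plain mean pass rate (60) and k = n gives the top-ranked sample's score alone (0), which illustrates how a good ranker matters more as k grows.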

Citation

@article{hassid2024larger,
  title={The Larger the Better? Improved LLM Code-Generation via Budget Reallocation},
  author={Hassid, Michael and Remez, Tal and Gehring, Jonas and Schwartz, Roy and Adi, Yossi},
  journal={arXiv preprint arXiv:2404.00725},
  volume={arXiv:2404.00725},
  url={http://arxiv.org/abs/2404.00725},
  year={2024}
}
