A few small fixes for the untranslated datasets #163

Open
wants to merge 2 commits into
base: main

rowan-walshe

Hi, while working on #162, I was digging into some translation issues and ran across a few things in the untranslated datasets that I thought should probably be updated.

I've included two commits. The first makes a few minor changes to doctests: one is just a whitespace change, while the other fixes a doctest that was causing a translation failure. The second commit includes a few typehint changes, mostly for cases where Any was used but a more descriptive type could have been used instead.

While I think most of the changes are fine (totally unbiased, of course), I suspect the two most objectionable changes will be:

  1. Increasing the use of Tuple[T, ...]
    This indicates that a Tuple could have any length, with all elements being of type T. These changes generally replace instances of Any with either Tuple[Any, ...] or Tuple[some_type, ...]

    However, some translators, like the one for TypeScript, support Any but do not support the ellipsis notation in Tuples, which is why this change could be objectionable.

  2. MBPP_595
    I'm convinced it is impossible to pass this problem as it is currently written without cheating. It's typed to just return Any, and the docstring doesn't include any guidance about the expected output format. The most logical choices would be either int | None, or int with -1 used to represent that it's impossible. I don't believe anyone would guess that the expected output format is int | 'Not Possible'. So, while I would be very surprised if any of the translators support translating this typehint (Union[int, Literal['Not Possible']]), without this change I'm not sure how any model is expected to generate a correct solution for this problem.
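
To make the two typehints above concrete, here is a minimal sketch. The function names (`total`, `swaps_needed`, `is_variadic_tuple`) are hypothetical and are not taken from the datasets or from the translator code; they only illustrate what the hints mean and how a tool could detect the ellipsis form.

```python
from typing import Any, Literal, Tuple, Union, get_args, get_origin


# Hypothetical example: a tuple of any length whose elements are all ints.
def total(values: Tuple[int, ...]) -> int:
    return sum(values)


# Hypothetical stand-in for a problem whose tests expect either a count
# or the sentinel string 'Not Possible'.
def swaps_needed(a: str, b: str) -> Union[int, Literal['Not Possible']]:
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    if mismatches % 2:
        return 'Not Possible'
    return mismatches // 2


# Assumed helper: detects the Tuple[T, ...] form, which a translator
# without variadic-tuple support would have to reject or special-case.
def is_variadic_tuple(hint: Any) -> bool:
    args = get_args(hint)
    return get_origin(hint) is tuple and len(args) == 2 and args[1] is Ellipsis
```

With the explicit return type, the sentinel string is visible in the signature instead of being discoverable only from the hidden tests.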

For completeness here are the changes in translation ratio that I observed:
| Difference | Before | After | Lang | prompt-terminology | doctests | originals |
|---|---|---|---|---|---|---|
| -0.01 | 0.97 | 0.96 | ts | reworded | transform | mbpp-typed |
| -0.01 | 0.97 | 0.96 | ts | verbatim | keep | mbpp-typed |
| -0.01 | 0.97 | 0.96 | jl | reworded | transform | mbpp-typed |
| -0.01 | 0.97 | 0.96 | jl | verbatim | keep | mbpp-typed |
| -0.01 | 0.94 | 0.93 | go | reworded | transform | mbpp-typed |
| -0.01 | 0.94 | 0.93 | go | verbatim | keep | mbpp-typed |
| 0.01 | 0.98 | 0.99 | swift | reworded | transform | originals-with-cleaned-doctests |
| 0.01 | 0.98 | 0.99 | swift | verbatim | transform | originals-with-cleaned-doctests |
| 0.01 | 0.89 | 0.90 | ml | reworded | transform | mbpp-typed |
| 0.01 | 0.89 | 0.90 | ml | verbatim | keep | mbpp-typed |
| 0.01 | 0.89 | 0.90 | hs | reworded | transform | mbpp-typed |
| 0.01 | 0.89 | 0.90 | hs | verbatim | keep | mbpp-typed |
| 0.01 | 0.90 | 0.91 | ada | reworded | transform | mbpp-typed |
| 0.01 | 0.97 | 0.98 | ada | reworded | transform | originals-with-cleaned-doctests |
| 0.01 | 0.90 | 0.91 | ada | verbatim | keep | mbpp-typed |
| 0.01 | 0.97 | 0.98 | ada | verbatim | transform | originals-with-cleaned-doctests |

Full summary of changes
  • HumanEval 69 (cleaned-doctests): Removed an extra whitespace that was in one of the doctests
  • HumanEval 142 (cleaned-doctests): Fixed the doctests, which were causing a translation failure
  • MBPP 222 (typed): Changed Any to Tuple[Any, ...]
  • MBPP 262 (typed): Changed Any to Tuple[List[Any], List[Any]] to match the docstring and tests
  • MBPP 407 (typed): Changed Any to Union[int, bool].
    • Could be Union[int, Literal[False]] to be more accurate, but existing translators won't handle this
  • MBPP 413 (typed): Changed List[Any] to List[Union[str, int]], as the input was restricted to only contain strings and integers.
    • Note that based on the docstring, you could also change the input typehint to take List[Tuple[Any, ...]], and leave the return type as List[Any]
  • MBPP 446 (typed): Changed Any to Tuple[Any, ...]
  • MBPP 587 (typed): Changed Any to Tuple[int, ...]
  • MBPP 595 (typed): Changed Any to Union[int, Literal['Not Possible']]
    • Note that translators almost certainly don't currently support this typehint. That said, without the typehint, this problem should have been impossible without cheating. How are you supposed to know to return that string 🤦
  • MBPP 725 (typed): Changed List[Any] to List[str]
  • MBPP 726 (typed): Changed List[Any] to List[int]
  • MBPP 744 (typed): Changed Any to Tuple[Any, ...]
  • MBPP 754 (typed): Changed List[Any] to List[int]
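
As a small illustration of why the MBPP 407 hint matters, the sketch below uses a hypothetical function (`first_even`, not from the dataset) with the same `Union[int, bool]` return shape. In Python, `bool` is a subclass of `int`, so a caller has to compare against `False` by identity rather than rely on truthiness, since `0` is also falsy.

```python
from typing import List, Union


# Hypothetical example: returns the first even value, or False if none exists.
# Union[int, Literal[False]] would be more precise, as noted in the list above.
def first_even(values: List[int]) -> Union[int, bool]:
    for value in values:
        if value % 2 == 0:
            return value
    return False


# 0 is a valid result here, so "not result" would wrongly treat it as failure.
result = first_even([0, 1])
found = result is not False
```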
There are also a number of other things that I noticed but didn't change, mostly because I wasn't confident about the right fix, but that may still be worth looking into. Let me know if you'd like me to raise an issue to capture these.
  • MBPP 115 - Docstring states it takes a list of dictionaries. Test cases contain both list of dictionaries and dictionary not in a list.
  • MBPP 117 - Docstring states it takes a list of lists. Test cases contain a list of tuples
  • MBPP 401 - Docstring and test cases are list of lists. Function and parameter names indicate they should be tuples of tuples.
  • MBPP 417 - Function name implies tuples, everything else implies or uses lists
  • MBPP 431 - Bizarre tests that expect the return value to be True or None, not True or False. Typehint could be updated to Optional[Literal[True]], though the existing translators wouldn't support this.
  • MBPP 444 - Function name implies tuples, everything else implies or uses lists
  • MBPP 582 - Function name, argument name, and docstring state that the function should check if a dictionary is empty. But two of the three test cases use a set, not a dictionary, and the version with typehints specifies that it takes a Set, not a Dict.
  • MBPP 756 - Duplicate of 434. Should it be removed?
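
For the MBPP 431 item, the following sketch shows one way a "True or None" contract can arise and what `Optional[Literal[True]]` would express. The function `share_element` is a hypothetical illustration, not the dataset's reference solution.

```python
from typing import List, Literal, Optional


# Hypothetical example: returns True on a match and falls through otherwise,
# so the "no match" result is an implicit None rather than False.
def share_element(a: List[int], b: List[int]) -> Optional[Literal[True]]:
    for x in a:
        if x in b:
            return True
    return None


# A test written as "== False" does not hold for the no-match case,
# because None is not equal to False.
no_match = share_element([1], [2])
equals_false = no_match == False  # noqa: E712
```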
