A few small fixes for the untranslated datasets #163

rowan-walshe · 2024-11-20T18:52:42Z

Hi, while working on #162, I was digging into some translation issues and ran across a few things in the untranslated datasets that I thought should probably be updated.

I've included two commits. The first makes a few minor changes to doctests: one is just a whitespace change, while the other fixes a doctest that was causing a translation failure. The second commit changes a few typehints, mostly in cases where Any was used but a more descriptive type was available.

While I think most of the changes are fine (totally unbiased, of course), I suspect the two most objectionable changes will be:

Increasing the use of Tuple[T, ...]
This indicates that a tuple can have any length, with all elements being of type T. These changes generally replace instances of Any with either Tuple[Any, ...] or Tuple[some_type, ...].

However, some translators, like the one for TypeScript, support Any but do not support the ellipsis notation in tuples, which is why this change could be objectionable.
MBPP_595
I'm convinced it is impossible to pass this problem as it is currently written without cheating. It's typed to just return Any, and the docstring doesn't include any guidance about the expected output format. The most logical choices would be either int | None, or int with -1 used to represent that it's impossible. I don't believe anyone would guess that the expected output format is int | 'Not Possible'. So, while I would be very surprised if any of the translators support translating this typehint (Union[int, Literal['Not Possible']]), without this change I'm not sure how any model is expected to generate a correct solution for this problem.
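
To make the two typehints above concrete, here is a minimal sketch. The functions `total`, `first_pair`, and `halve_if_even` are hypothetical illustrations, not code from the datasets; `halve_if_even` only mimics the shape of the MBPP 595 return type, not the actual task.

```python
from typing import Literal, Tuple, Union


def total(values: Tuple[int, ...]) -> int:
    """Tuple[int, ...] accepts a tuple of any length, all elements int."""
    return sum(values)


def first_pair(values: Tuple[int, int]) -> int:
    """By contrast, Tuple[int, int] means exactly two ints."""
    return values[0] + values[1]


# Hypothetical stand-in for a problem that returns a string sentinel.
def halve_if_even(n: int) -> Union[int, Literal['Not Possible']]:
    if n % 2 != 0:
        return 'Not Possible'
    return n // 2
```

With the return type spelled out, the sentinel string is visible in the signature itself rather than only in the hidden tests.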

For completeness, here are the changes in translation ratio that I observed:

Difference	Before	After	Lang	prompt-terminology	doctests	originals
-0.01	0.97	0.96	ts	reworded	transform	mbpp-typed
-0.01	0.97	0.96	ts	verbatim	keep	mbpp-typed
-0.01	0.97	0.96	jl	reworded	transform	mbpp-typed
-0.01	0.97	0.96	jl	verbatim	keep	mbpp-typed
-0.01	0.94	0.93	go	reworded	transform	mbpp-typed
-0.01	0.94	0.93	go	verbatim	keep	mbpp-typed
0.01	0.98	0.99	swift	reworded	transform	originals-with-cleaned-doctests
0.01	0.98	0.99	swift	verbatim	transform	originals-with-cleaned-doctests
0.01	0.89	0.9	ml	reworded	transform	mbpp-typed
0.01	0.89	0.9	ml	verbatim	keep	mbpp-typed
0.01	0.89	0.9	hs	reworded	transform	mbpp-typed
0.01	0.89	0.9	hs	verbatim	keep	mbpp-typed
0.01	0.9	0.91	ada	reworded	transform	mbpp-typed
0.01	0.97	0.98	ada	reworded	transform	originals-with-cleaned-doctests
0.01	0.9	0.91	ada	verbatim	keep	mbpp-typed
0.01	0.97	0.98	ada	verbatim	transform	originals-with-cleaned-doctests

Full summary of changes

HumanEval 69 (cleaned-doctests): Removed an extra whitespace that was in one of the doctests
HumanEval 142 (cleaned-doctests): Fixed the doctests, which were causing a translation failure
MBPP 222 (typed): Changed Any to Tuple[Any, ...]
MBPP 262 (typed): Changed Any to Tuple[List[Any], List[Any]] to match the docstring and tests
MBPP 407 (typed): Changed Any to Union[int, bool].
- Could be Union[int, Literal[False]] to be more accurate, but existing translators won't handle this
MBPP 413 (typed): Changed List[Any] to List[Union[str, int]], as the input was restricted to only contain strings and integers.
- Note that based on the docstring, you could also change the input typehint to take List[Tuple[Any, ...]], and leave the return type as List[Any]
MBPP 446 (typed): Changed Any to Tuple[Any, ...]
MBPP 587 (typed): Changed Any to Tuple[int, ...]
MBPP 595 (typed): Changed Any to Union[int, Literal['Not Possible']]
- Note that translators almost certainly don't currently support this typehint. That said, without the typehint, this problem should have been impossible without cheating. How are you supposed to know to return that string 🤦
MBPP 725 (typed): Changed List[Any] to List[str]
MBPP 726 (typed): Changed List[Any] to List[int]
MBPP 744 (typed): Changed Any to Tuple[Any, ...]
MBPP 754 (typed): Changed List[Any] to List[int]
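
As a rough illustration of the Union[int, bool] versus Union[int, Literal[False]] trade-off and of the fixed-length tuple hint, here is a small sketch. `index_or_false` and `split_at` are hypothetical functions, not the actual MBPP 407 or 262 problems.

```python
from typing import Any, List, Literal, Tuple, Union

# bool is a subclass of int in Python, so every bool is already an int
# and Union[int, bool] accepts the same values as plain int.
assert issubclass(bool, int)
assert isinstance(False, int)


def index_or_false(items: List[int], target: int) -> Union[int, Literal[False]]:
    """Hypothetical: return the position of target, or False if absent."""
    return items.index(target) if target in items else False


def split_at(items: List[Any], n: int) -> Tuple[List[Any], List[Any]]:
    """Hypothetical: a fixed-length tuple of exactly two lists."""
    return items[:n], items[n:]
```

Note that `False == 0` is true in Python, so an equality-based test cannot tell a returned index of 0 from a returned False; only an identity check (`is False`) can.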

There are also a number of other things that I noticed but didn't change, mostly because I wasn't confident about the right fix, but that should probably still be looked into. Let me know if you'd like me to raise an issue to capture these.

MBPP 115 - Docstring states it takes a list of dictionaries. Test cases contain both a list of dictionaries and a dictionary not in a list.
MBPP 117 - Docstring states it takes a list of lists. Test cases contain a list of tuples.
MBPP 401 - Docstring and test cases are list of lists. Function and parameter names indicate they should be tuples of tuples.
MBPP 417 - Function name implies tuples, everything else implies or uses lists
MBPP 431 - Bizarre tests that expect the return value to be True or None, not True or False. Typehint could be updated to Optional[Literal[True]], though the existing translators wouldn't support this.
MBPP 444 - Function name implies tuples, everything else implies or uses lists
MBPP 582 - Function name, argument name, and docstring state that the function should check if a dictionary is empty. But two out of three test cases use a set, not a dictionary, and the version with typehints specifies that it takes a Set, not a Dict.
MBPP 756 - Duplicate of 434. Should it be removed?
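
For the True-or-None case noted under MBPP 431, a minimal sketch of how such a return type can arise. `shares_element` is a hypothetical function written for illustration, not the dataset's reference solution.

```python
from typing import List, Literal, Optional


def shares_element(a: List[int], b: List[int]) -> Optional[Literal[True]]:
    """Hypothetical: True on a match, implicit None otherwise."""
    for x in a:
        if x in b:
            return True
    # Falling off the end returns None, not False.
```

A solution that returns False in the no-match case would fail a test that compares against None, since `None != False` in Python.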

Fix imports for modified files

rowan-walshe added 2 commits November 19, 2024 16:16

Fix doc tests and remove trailing whitespaces from modified files

3c7ecb7

Improve typehints for a number of problems that are using Any

ce858d2
