Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define edit block dividers as constants to simplify experimentation #1817

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

akaihola
Copy link
Contributor

@akaihola akaihola commented Sep 29, 2024

This stlil doesn't make the dividers (<<<<<<< SEARCH, =======, and >>>>>>> REPLACE) user configurable, but at least it's now more straightforward to try out and evaluate different divider strings in this branch.

The patch in this PR should make it easier to fix:

Note that regular expression matchers are now the exact divider strings instead of the old

HEAD = r"<{5,9} SEARCH"
DIVIDER = r"={5,9}"
UPDATED = r">{5,9} REPLACE"

Maybe we could have them configured separately as well.

@akaihola akaihola mentioned this pull request Sep 29, 2024
@paul-gauthier
Copy link
Collaborator

The flexibility is needed for aider to work with the o1 models. They don't conform to the edit format and return variable numbers of the prefix characters.

@akaihola
Copy link
Contributor Author

akaihola commented Oct 5, 2024

The flexibility is needed for aider to work with the o1 models. They don't conform to the edit format and return variable numbers of the prefix characters.

Ok, I understand.

But I believe there's a bug which makes the regex built from DIVIDER = r"={5,9}" match not only lines like:

=====
======
=======
========
=========

but also lines with any number of equal signs, e.g. here:

This is a long reStructuredText heading with way more than nine equal signs
===========================================================================

or, as a matter of fact, lines with any trailing junk (like ========junkjunkjunk).

This is due to

DIVIDER = r"={5,9}"
divider_pattern = re.compile(DIVIDER)

and subsequent

divider_pattern.match(lines[i].strip())

only checking that the beginning of the line matches (in find_original_update_blocks()).

So what happes is:

>>> lines = [
...     "This is a long reStructuredText heading with way more than nine equal signs\n",
...     "===========================================================================\n",
... ]
>>> re.compile(r"={5,9}").match(lines[1].strip())
<re.Match object; span=(0, 9), match='========='>

This is the reason for #1803, and the easy fix is:

divider_pattern = re.compile(fr"{DIVIDER}[ ]*$")

although that will still cause problems with reStructuredText files which have 5-to-9-character titles. Thus, I experimented (successfully) with pseudo-XML separators (see #1803 comment), although something like ======= DIVIDER should help just as well.

@paul-gauthier
Copy link
Collaborator

I will apply the fix so that it doesn't match =====whatever.

@paul-gauthier
Copy link
Collaborator

That change is in the main branch now.

@CLAassistant
Copy link

CLAassistant commented Oct 16, 2024

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants