surprising behavior of replacement using number capture groups when combined with multiple search patterns #2935

Open
1 task done
hugues-aff opened this issue Nov 16, 2024 · 3 comments
Labels
question An issue that is lacking clarity on one or more points.

Comments

@hugues-aff

hugues-aff commented Nov 16, 2024

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 14.1.1 (rev 4649aa9700)

features:+pcre2
simd(compile):+NEON
simd(runtime):+NEON

How did you install ripgrep?

homebrew

What operating system are you using ripgrep on?

macOS 14.7

Describe your bug.

ripgrep's behavior is very surprising when multiple patterns are combined with a replacement string that uses numbered capture groups.

What are the steps to reproduce the behavior?

Run rg -e '(foo)' -e '(bar)' -r '$1' -o in a directory whose files contain the strings foo and bar.

What is the actual behavior?

Lines that contain foo get printed as foo, as expected.
Lines that contain bar get printed as empty, which is rather surprising since neither pattern can match the empty string.

If I switch the replacement to -r '$2', I see:

Lines that contain foo get printed as empty
Lines that contain bar get printed as bar

What is the expected behavior?

I would expect the replacement index to refer to the capture groups of whichever pattern actually matched, resulting in:

Lines that contain foo get printed as foo
Lines that contain bar get printed as bar

As a workaround, I can get this behavior right now with -r '$1$2', which works but is rather unintuitive.

My preferred solution would be to allow a 1:1 mapping from each -r to each -e. Failing that, the way capture groups are indexed when multiple patterns are given should either be changed to be more intuitive or, if breaking backwards compatibility is a concern, at least be documented.
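To make the expected behavior concrete, here is a minimal sketch in Python's `re` module. It is not how ripgrep works and not a proposed implementation; `per_pattern_replace` and `PATTERNS` are hypothetical names used only for illustration. Each pattern is compiled separately, so `\g<1>` (Python's spelling of `$1`) always refers to the first group of whichever pattern produced the match.

```python
import re

# Hypothetical semantics, not ripgrep's: every pattern keeps its own
# capture group numbering because it is compiled on its own.
PATTERNS = [re.compile(p) for p in ("(foo)", "(bar)")]


def per_pattern_replace(line, template):
    """Expand the template against whichever pattern produced each match."""
    found = []
    for pattern in PATTERNS:
        for m in pattern.finditer(line):
            found.append((m.start(), m.expand(template)))
    # Report the replacements in the order the matches appear in the line.
    return [text for _, text in sorted(found)]


for line in ["foo", "bar", "baz", "qux"]:
    print(line, per_pattern_replace(line, r"\g<1>"))
```

Under these assumed semantics, both the foo line and the bar line yield their own text for group 1. The sketch ignores overlapping matches between patterns, which a real implementation would have to resolve.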

@BurntSushi
Owner

Please provide an MRE.

@hugues-aff
Author

hugues.bruant@HBruant-M-C600F bugs % cat - >test <<EOF
heredoc> foo
heredoc> bar
heredoc> baz
heredoc> qux
heredoc> EOF
hugues.bruant@HBruant-M-C600F bugs % cat test
foo
bar
baz
qux
hugues.bruant@HBruant-M-C600F bugs % rg -e '(foo)' -e '(bar)' -r '$1' -o
test
1:foo
2:
hugues.bruant@HBruant-M-C600F bugs % rg -e '(foo)' -e '(bar)' -r '$2' -o
test
1:
2:bar
hugues.bruant@HBruant-M-C600F bugs % rg -e '(foo)' -e '(bar)' -r '$1$2' -o
test
1:foo
2:bar
hugues.bruant@HBruant-M-C600F bugs %

@BurntSushi
Owner

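Yeah, the issue here is that ripgrep doesn't really provide multi-pattern support everywhere. When you provide more than one pattern, it "just" stitches them all together into one regex. Capture groups are then numbered across that combined regex, which is why (bar) ends up as $2 rather than $1.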
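The effect of stitching patterns together can be reproduced outside ripgrep. The sketch below uses Python's `re` module rather than ripgrep's Rust regex engine, and it assumes the patterns are joined with `|`; the exact form ripgrep builds internally is not shown in this thread. The helper name `replace_matches` is hypothetical, and `\g<1>` is Python's spelling of `$1`. Groups are numbered left to right across the joined pattern, so `(bar)` is group 2 and group 1 expands to the empty string whenever `bar` is what matched.

```python
import re

# Assumption: the two -e patterns are joined into a single alternation.
patterns = ["(foo)", "(bar)"]
combined = re.compile("|".join(patterns))  # equivalent to (foo)|(bar)


def replace_matches(line, template):
    """Return the expanded template for each match, similar to -o with -r."""
    # Since Python 3.5, a group that did not take part in the match
    # expands to the empty string.
    return [m.expand(template) for m in combined.finditer(line)]


for template in (r"\g<1>", r"\g<2>", r"\g<1>\g<2>"):
    for line in ["foo", "bar", "baz", "qux"]:
        print(template, line, replace_matches(line, template))
```

This also shows why the `$1$2` workaround behaves as reported: for any single match exactly one of the two groups is non-empty, so concatenating them yields the matched text.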

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Nov 17, 2024