Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CompactStr #9229

Closed
wants to merge 8 commits into from
Closed

Use CompactStr #9229

wants to merge 8 commits into from

Conversation

MichaReiser
Copy link
Member

Summary

To get a comparison to SmolStr which seems to optimize for

  • O(1) clone
  • Storing strings longer than 23 that soley consist of Whitespace (not a use case for us)

I found the API a bit clumsy but maybe it's just me being confused by Deref.

Test Plan

cargo test

@@ -844,7 +844,7 @@ where
range: _,
}) => {
if let Expr::Name(ast::ExprName { id, ctx, range: _ }) = func.as_ref() {
if id == "locals" && ctx.is_load() {
if *id == "locals" && ctx.is_load() {
Copy link
Member Author

@MichaReiser MichaReiser Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this annoying. Not sure why &CompactStr == &'static str doesn't work.

@BurntSushi probably knows why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure off the top of my head, so let's run through it. x == y is syntactic sugar for std::cmp::PartialEq::eq(&x, &y). If id is a &CompactString and y is a &'static str, then &x is actually an &&CompactString and y is actually a &&str. So I believe this means we need to look for a PartialEq impl for &&CompactString (notice the double & there).

The only PartialEq impl for CompactString is impl<T: AsRef<str>> PartialEq<T> for CompactString, which on its own does not fit here because we're looking for &&CompactString.

At least, I think that's why. It's possible the compact-str crate could add more impls to alleviate this. But maybe not.

Auto-deref also comes into the picture here. See:

Copy link

codspeed-hq bot commented Dec 21, 2023

CodSpeed Performance Report

Merging #9229 will improve performances by 4.81%

Comparing compact_str (b60ac75) with main (c6d8076)

Summary

⚡ 2 improvements
✅ 28 untouched benchmarks

Benchmarks breakdown

Benchmark main compact_str Change
lexer[pydantic/types.py] 3.7 ms 3.6 ms +4.07%
lexer[large/dataset.py] 8.4 ms 8 ms +4.81%

@MichaReiser MichaReiser added performance Potential performance improvement parser Related to the parser labels Dec 21, 2023
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@BurntSushi
Copy link
Member

I'd also be interested in seeing how hipstr fairs. It appears to have some useful properties.

(Sorry, I know there are a billion "small string" crates...)

@LaBatata101
Copy link
Contributor

I'd also be interested in seeing how hipstr fairs. It appears to have some useful properties.

(Sorry, I know there are a billion "small string" crates...)

I can try it later today

@charliermarsh
Copy link
Member

Interesting, the benchmarks look better. (We almost never clone strings so perhaps O(1) clone is not an important tradeoff right now.)

@MichaReiser
Copy link
Member Author

MichaReiser commented Dec 21, 2023

I'd also be interested in seeing how hipstr fairs. It appears to have some useful properties.

(Sorry, I know there are a billion "small string" crates...)

Hipstr has interesting properties but it might not work for us without dropping wasm support (playground):

This crate is only supported on platforms where:

  • pointers have the same memory size has usize

source

Wasm is 32 bit if I remember it correctly.

@BurntSushi
Copy link
Member

Wasm is 32 bit if I remember it correctly.

Yeah but the pointers are 32-bit too right? So that should be fine. I think the note in hipstr is talking about platforms with tagged pointers (like CHERI), which are pretty obscure at present.

@LaBatata101
Copy link
Contributor

I'd also be interested in seeing how hipstr fairs. It appears to have some useful properties.

(Sorry, I know there are a billion "small string" crates...)

The problem with hipstr is that we would have to add lifetime annotations to the AST and Tok types.

@MichaReiser
Copy link
Member Author

We'll revisit after merging the hand-written parser.

@MichaReiser MichaReiser deleted the compact_str branch August 12, 2024 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants