Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify the self-referencing Parsed struct to use the ouroboros crate. #1127

Closed
wants to merge 1 commit into from

Conversation

moshevds
Copy link

@moshevds moshevds commented Feb 8, 2025

I came across this self referential struct while looking at the use of the unsafe keyword in various rust crates. In part I was trying to search for candidates to work on related to Google's Patch Rewards program.
The custom-built self-referencing that this PR replaces is a relatively straight-forward and easy to understand use of unsafe rust, but I think an argument can be made that it is better to refactor it to use ouroboros. It reduces the surface area of unsafe rust in askama.

Do you think this is worthwhile, or do you dislike such a change for other reasons? (Such as the added dependency perhaps.)

This change passses the existing testsuite, but I noticed that this code path is not covered very well. If this refactoring seems reasonable for askama, then perhaps I should add a number of testcases for this part of the codebase?

PS:
I doubt that this change will qualify for Patch Rewards program, but it seemed like an easy and useful fix to do anyway. How do you look towards removing the remaining uses of unsafe rust as well? All of them are relate to from_utf8_unchecked, which I think is even less of a issue, so I'm not sure if that is worth it. Although I noticed that some of those conversions rely on serde_json's behavior of never outputting non-utf8. Which it never does, but as far as I know that is not given as an external guarantee by serde_json, and the JSON spec weirdly actually allows non-utf8 for non-internet use. I have no idea yet how to refactor that code by the way, it may be hard to do while keeping the performance.

Removing this usage of unsafe rust is aimed at lowering the maintenance
burden and risk of potential future memory-safety bugs. Future changes
to this part of the code won't need to be (re)analyzed manually to
confirm that the safety guarantees continue to hold, and it will be
easier for third-party organisations to be confident that askama is
memory-safe.
@Kijewski
Copy link
Collaborator

Kijewski commented Feb 8, 2025

Thank you for your interest, but in this instance I don't think your proposed changes would be useful. For the Parsed implementation ouroboros would not do anything different than the current implementation.

As for from_utf8_unchecked, I'd need a good example when the current usage could be wrong. You could do much more good if you tried to move rust-lang/rust#110998 along. Once ascii strings are stable, we could remove most instances of from_utf8_unchecked.

@Kijewski Kijewski closed this Feb 8, 2025
@moshevds
Copy link
Author

moshevds commented Feb 8, 2025

Hi @Kijewski , I agree that the current usage is not wrong. The point is that having unsafe rust where it is not needed, typically makes the code harder to maintain and adjust. You mention that the proposed change with ouroboros does not do anything different, and indeed, I took care to make sure that it would not do so.

The purpose for suggesting this change is to harden the code against accidental mistakes and lower the maintenance burden. If future changes are made against the Parsed, Ast or Node impls, then the unsafe invariant might not be upheld. This is precisely one of the things that the Patch Rewards program targets, and of course is also something I personally find valuable to improve in the Rust ecosystem.

I don't want to second guess the approach askama takes with this, and I can certainly see that maintainability arguments can go both ways. For example, adding new dependencies does add a maintenance burden of its own, so then the question becomes whether removing the unsafe rust in this way is a net positive or net negative.

Do you think this change is a net negative? (or at least not a positive change?)

As for the from_utf8_unchecked, all current uses seem good to me, and those uses are also much more trivial to analyze than the self referential struct. I'm not sure whether I think those can be improved, but I did want to mention them.
The from_utf8_unchecked that worries me the most is where output from serde_json is used directly. As far as I know, serde_json gives no guarantee to its users that the output is always UTF-8. In practice this is true, as can be seen in the source for serde_json::to_string. But the JSON spec allows invalid UTF-8, and if serde_json were to ever add support for that, that alone could break the unsafe invariant that askama relies on.

I didn't want to imply that there is any current memory safety issue, and I don't want to take up valuable time if there is no interest in these kind of changes, I simply think that removing unneeded unsafe rust from this codebase can be worthwhile as an improvement in itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants