RFC 104: Validation on publish #104

gasman · 2024-11-08T20:25:26Z

A frequently-requested feature (see Wagtail issue #12505) is the ability to save draft versions of pages in an incomplete state that would not pass validation, but still enforce validation at the point that the page is published or submitted for approval. This is also a prerequisite for an effective auto-save implementation, as noted in RFC 99. This RFC sets out the requirements for such a feature, and an approach to implementing it.

lb- · 2024-12-02T06:45:53Z

text/104-validation-on-publish.md

+
+
+## Open Questions
+


Hey @gasman

I just wanted to bring over the idea/question from the RFC 99 #99 (comment)

Is it worth us considering a different approach, where these autosaves sit as revisions that are not applied to the model somehow, allowing us to skip validation for these only.

Autosave would create a 'unstable'/'dirty' revision, these would be attached to the user and of course the Page/Snippet instance.

When loading a page that has a dirty revision, it's used to populate the fields and it also notifies the user that we have loaded a previous 'unsaved change', the user can opt to go to the last saved state if they want (if it's available).

These unstable revisions would possibly not show up in revision reports by default.

When the user opts to actually save or publish, the current state approach would not change. It validates and creates a formal/stable revision. Because of the way revisions are saved (JSON), we can eventually opt in to some smarter live editing/revisions down the road with something like https://github.com/automerge/automerge

In addition, because we serialise things to JSON for revisions, we may find ourselves not having to solve a bunch of other edge cases such as nuanced validation scenarios.

The unstable revisions would only ever be used to show a new 'edit' view when the user refeshes / loads a previously un-saved model and would be given the opportunity to revert this.

Even for initial model creation, before saving anything, we could consider an unattached revision. This would still not validate fully until saved and could appear in the dashboard and page listings separate from actual saved items. We would have to consider a placeholder id or something that allows guests to get back to this, maybe it appears when they go to create a new page at the same point in the tree later, giving them an option to create a new one or continue where they left off.

This also gives a mechanism for users to just 'ignore' things they have started and do not want to accidentally save. It also avoids having to have rules for some fields such as title, we can just call it 'unsaved page' or 'unsaved snippet' if there's no title at the specific revision.

This approach is kind of borrowed from how Confluence works, there's a clear difference between an unsaved and saved page, even when editing one. I can close the tab for any unsaved work (within reason) and be confident I can get back to it later without issue.

It also means that developers can be confident that anything in the database will be consistently validated, including anything that may be published (from a full save) later in the future.

@lb-

That's a good point about having a clear difference between unsaved vs saved changes. I do agree that I think it's worth considering a different approach that doesn't try to fight against Django's validation too much. Instead, just skip the Django validation entirely (at least if the data is not valid), but still "save" the work (i.e. whatever is currently in the form) to the database somehow.

However, I'm not sure we should use our Revision model for that purpose.

I think we're dealing with at least two different concepts of data here: the "form state" (which might be invalid or "unstable") vs "a valid instance at a given point in time". Our Revision model is currently the latter. Mixing both concepts in Revision sounds like it would require a non-trivial amount of work that might end up muddying the waters.

Three concepts if you also count the "live" instance. To illustrate:

flowchart LR A -- "ModelForm(request.POST), {{ form }}, new FormData(<form>)" --> A["ModelForm, <form>, request.POST"] A -- "form.save() / form.instance" --> B["MyModel"] B -- "ModelForm(instance=instance)" --> A B -- ".save_revision()" --> C["Revision"] C -- ".as_object()" --> B

Loading

(Technically ModelForm, <form>, and request.POST are different concepts, but they work so closely together that we can consider them a single concept for our current focus.)

Current state of Revision

While we do use JSON in our Revision model, we can't just say that "oh it's JSON, we can put anything in it". It's worth noting how Revision currently works:

The Revision model is heavily tied to the model instance. To save a revision, you need to have a model instance; you cannot go from a ModelForm (i.e. what the user submitted) to a Revision directly. Vice versa, from a Revision you cannot go to a ModelForm directly; you need to construct the model instance first.

The heavy lifting between converting a model instance to serializable data (and back) is done by django-modelcluster. This makes use of Django's Field.value_to_string() and Field.to_python() methods.

This means the field values in the revisions must be able to be mapped to their Python object counterparts. For example, it must be possible to construct the string value of a DateTimeField in the revision data into a datetime object. So in order to store the value, you need to have a valid datetime object in the first place (i.e. you cannot have something like 2024-02-31).

Of course, this would break if you changed the model definition to have a different field type while keeping the same field name, but that's to be expected at the moment.

Also note that this is different to Django validations – you can still end up with a model instance that no longer passes validation, e.g. if there's a validator that ensures a date is in the future.

If we introduce the concept of "unstable" revisions for the purpose of storing the form state, this means we need to introduce a new code path that goes from ModelForm directly to Revision and vice versa – to skip to_python() and other validations.

Then, we'll need to make the distinction of stable vs unstable revisions. This is the bit I'm not sure about – what are the implications of this? An "unstable" revision would potentially raise an error when you call as_object() (e.g. ValidationError in to_python) even when no changes have been made to the model. Would we want to handle this change of behaviour, and signal it to the developers – or would we be better off using a different model?

Although now that I think about it, a revision from an older model state is pretty much similar to the "unstable" revision concept. If you change a model field e.g. from a TextField to a DateTimeField without updating the revisions, loading the editor could fail due to a ValidationError raised by as_object when it tries to convert the string to datetime using to_python(). This is similar to if we saved an "unstable" revision using a form state that contains an invalid datetime value for the field, and trying to call as_object on the revision.

So, I think having the ability to construct a ModelForm from saved data that does not necessarily pass to_python() is the main goal we'd like to achieve. This would allow us to fix an issue that happens when you try to revert to an older revision in Wagtail that's not compatible with the current model definition. Instead of crashing with an error as it does now, you would end up with an invalid form (with some empty or incorrect field values) that gets presented to the user.

FormState model

If we use a new model, e.g. FormState, we need to be able to:

Construct and save a FormState instance from a ModelForm instance. At the very least this should contain what gets passed to the data parameter of ModelForm.__init__, as well as the necessary pointers to the model (or model instance).

The data structure should be as close to the QueryDict from request.POST and request.GET as possible, including support for multiple values of the same key.

Bind a ModelForm with the data from a FormState instance, e.g. ModelForm(form_state.data).

This allows us to remember the state of the form for the user, regardless whether there are validation error or not, and it won't crash if the model changes – it would just result in an invalid form. Also, this could be more performant in the autosave situation as we won't have to run the validation process at all on every autosave call.

For the act of "Saving" i.e. clicking the "Save draft", "Publish" etc. buttons, the current state of approach will still be the same (as you said).

Optionally, we can add an as_form_state() method in Revision that gives us a FormState instance. In edit views, if revision.as_object() fails, we can still fall back to revision.as_form_state() to load the form's initial values (or bind it as data and do .is_valid() so that any errors are immediately rendered).

This can also be used to solve wagtail/wagtail#4521, as we can update our preview mechanism to load the data from the FormState model instead of the session. We could potentially store request.FILES too, which is currently unsupported by previews.

Moreover, we can make use of the FormState data to help use cases such as headless previews. (The headless preview uses a similar intermediary model, though it's closer to Revision.)

Not using Revision also means we can support snippets without revisions enabled.

We don't need to implement a way to save a non-validated ModelForm into Revision, as we only save Revision instances for validated forms (per the current approach).

The FormState approach can potentially be extended to plain Forms and not just ModelForms, provided we have a pointer to the form class. This could be useful to implement features such as "Saved views" for universal listings.

Revision model

Fitting this into the Revision model would mean that the Revision.as_form_state() method would be necessary (rather than optional) to implement.

Then, we need to implement a way to save a Revision from a ModelForm instance without going through a model instance (to avoid to_python and other validations), which can be a bit challenging. The structure of Revision is very different to what's in the ModelForm, especially in regards to clusterable models. In Revisions, such formsets would be nested in the object, whereas the non-validated ModelForm would just be in the shape of QueryDict.

May not be a completely bad idea, but something we need to be more careful about compared to using a separate FormState model given how closely tied to the content model the Revision currently is.

With this approach, we won't be able to cleanly support non-revisioned snippets and non-ModelForm cases too (not that it matters much though).

What does autosave look like? a.k.a. in summary

I think I agree with @lb-'s proposed UX for autosave here.

Your work (whatever is in the <form>, and consequently request.POST) always gets saved – but not necessarily to the content model, or to the revision. When you do click Save, that's when the current existing behaviour kicks in. You can be sure that the current behaviour is preserved when you do such actions, i.e. validation is done and a revision is saved.

We could potentially make it so that if the form is valid, autosave will also update the latest revision. But in the case where it isn't, instead of going through hoops of avoiding Django's validation, we should just save the form as-is (validation errors included), and let the user know that their work is not lost, but not yet captured as a revision.

Wow, thank you Sage. This is super helpful for me to understand this area better.

It makes sense that Revision may not be our 'box' for this, I did not even think about non-revision enabled models also. Glad to see that there may be some ways to solve for this direction and that others way smarter than me are adding input. Thanks @gasman and @laymonage for wanting to pull this together.

No further comments from me here, I will let others chime in with their thoughts.

laymonage

Thanks @gasman! Excellent write-up on the current state of validation and how autosave might come into play.

However I'm not convinced we should go through hoops of modifying the form classes and customising the validation etc.

For autosave to work, I think we'd get the best value out of it if developers don't have to learn and write yet another validation rules for their models. For users, the best value would be to make sure whatever they have worked on is not lost, which may contain invalid data when validated as a whole.

I think we should go with a different approach of capturing the current form state as-is, which can then be passed on to existing validation rules in the form at the point when a revision needs to be saved. See my comment for more details.

Thanks!

laymonage · 2024-12-04T16:25:42Z

text/104-validation-on-publish.md

+
+
+## Open Questions
+


@lb-

That's a good point about having a clear difference between unsaved vs saved changes. I do agree that I think it's worth considering a different approach that doesn't try to fight against Django's validation too much. Instead, just skip the Django validation entirely (at least if the data is not valid), but still "save" the work (i.e. whatever is currently in the form) to the database somehow.

However, I'm not sure we should use our Revision model for that purpose.

I think we're dealing with at least two different concepts of data here: the "form state" (which might be invalid or "unstable") vs "a valid instance at a given point in time". Our Revision model is currently the latter. Mixing both concepts in Revision sounds like it would require a non-trivial amount of work that might end up muddying the waters.

Three concepts if you also count the "live" instance. To illustrate:

flowchart LR A -- "ModelForm(request.POST), {{ form }}, new FormData(<form>)" --> A["ModelForm, <form>, request.POST"] A -- "form.save() / form.instance" --> B["MyModel"] B -- "ModelForm(instance=instance)" --> A B -- ".save_revision()" --> C["Revision"] C -- ".as_object()" --> B

Loading

(Technically ModelForm, <form>, and request.POST are different concepts, but they work so closely together that we can consider them a single concept for our current focus.)

Current state of Revision

While we do use JSON in our Revision model, we can't just say that "oh it's JSON, we can put anything in it". It's worth noting how Revision currently works:

The Revision model is heavily tied to the model instance. To save a revision, you need to have a model instance; you cannot go from a ModelForm (i.e. what the user submitted) to a Revision directly. Vice versa, from a Revision you cannot go to a ModelForm directly; you need to construct the model instance first.

The heavy lifting between converting a model instance to serializable data (and back) is done by django-modelcluster. This makes use of Django's Field.value_to_string() and Field.to_python() methods.

This means the field values in the revisions must be able to be mapped to their Python object counterparts. For example, it must be possible to construct the string value of a DateTimeField in the revision data into a datetime object. So in order to store the value, you need to have a valid datetime object in the first place (i.e. you cannot have something like 2024-02-31).

Of course, this would break if you changed the model definition to have a different field type while keeping the same field name, but that's to be expected at the moment.

Also note that this is different to Django validations – you can still end up with a model instance that no longer passes validation, e.g. if there's a validator that ensures a date is in the future.

If we introduce the concept of "unstable" revisions for the purpose of storing the form state, this means we need to introduce a new code path that goes from ModelForm directly to Revision and vice versa – to skip to_python() and other validations.

Then, we'll need to make the distinction of stable vs unstable revisions. This is the bit I'm not sure about – what are the implications of this? An "unstable" revision would potentially raise an error when you call as_object() (e.g. ValidationError in to_python) even when no changes have been made to the model. Would we want to handle this change of behaviour, and signal it to the developers – or would we be better off using a different model?

Although now that I think about it, a revision from an older model state is pretty much similar to the "unstable" revision concept. If you change a model field e.g. from a TextField to a DateTimeField without updating the revisions, loading the editor could fail due to a ValidationError raised by as_object when it tries to convert the string to datetime using to_python(). This is similar to if we saved an "unstable" revision using a form state that contains an invalid datetime value for the field, and trying to call as_object on the revision.

So, I think having the ability to construct a ModelForm from saved data that does not necessarily pass to_python() is the main goal we'd like to achieve. This would allow us to fix an issue that happens when you try to revert to an older revision in Wagtail that's not compatible with the current model definition. Instead of crashing with an error as it does now, you would end up with an invalid form (with some empty or incorrect field values) that gets presented to the user.

FormState model

If we use a new model, e.g. FormState, we need to be able to:

Construct and save a FormState instance from a ModelForm instance. At the very least this should contain what gets passed to the data parameter of ModelForm.__init__, as well as the necessary pointers to the model (or model instance).

The data structure should be as close to the QueryDict from request.POST and request.GET as possible, including support for multiple values of the same key.

Bind a ModelForm with the data from a FormState instance, e.g. ModelForm(form_state.data).

This allows us to remember the state of the form for the user, regardless whether there are validation error or not, and it won't crash if the model changes – it would just result in an invalid form. Also, this could be more performant in the autosave situation as we won't have to run the validation process at all on every autosave call.

For the act of "Saving" i.e. clicking the "Save draft", "Publish" etc. buttons, the current state of approach will still be the same (as you said).

Optionally, we can add an as_form_state() method in Revision that gives us a FormState instance. In edit views, if revision.as_object() fails, we can still fall back to revision.as_form_state() to load the form's initial values (or bind it as data and do .is_valid() so that any errors are immediately rendered).

This can also be used to solve wagtail/wagtail#4521, as we can update our preview mechanism to load the data from the FormState model instead of the session. We could potentially store request.FILES too, which is currently unsupported by previews.

Moreover, we can make use of the FormState data to help use cases such as headless previews. (The headless preview uses a similar intermediary model, though it's closer to Revision.)

Not using Revision also means we can support snippets without revisions enabled.

We don't need to implement a way to save a non-validated ModelForm into Revision, as we only save Revision instances for validated forms (per the current approach).

The FormState approach can potentially be extended to plain Forms and not just ModelForms, provided we have a pointer to the form class. This could be useful to implement features such as "Saved views" for universal listings.

Revision model

Fitting this into the Revision model would mean that the Revision.as_form_state() method would be necessary (rather than optional) to implement.

Then, we need to implement a way to save a Revision from a ModelForm instance without going through a model instance (to avoid to_python and other validations), which can be a bit challenging. The structure of Revision is very different to what's in the ModelForm, especially in regards to clusterable models. In Revisions, such formsets would be nested in the object, whereas the non-validated ModelForm would just be in the shape of QueryDict.

May not be a completely bad idea, but something we need to be more careful about compared to using a separate FormState model given how closely tied to the content model the Revision currently is.

With this approach, we won't be able to cleanly support non-revisioned snippets and non-ModelForm cases too (not that it matters much though).

What does autosave look like? a.k.a. in summary

I think I agree with @lb-'s proposed UX for autosave here.

Your work (whatever is in the <form>, and consequently request.POST) always gets saved – but not necessarily to the content model, or to the revision. When you do click Save, that's when the current existing behaviour kicks in. You can be sure that the current behaviour is preserved when you do such actions, i.e. validation is done and a revision is saved.

We could potentially make it so that if the form is valid, autosave will also update the latest revision. But in the case where it isn't, instead of going through hoops of avoiding Django's validation, we should just save the form as-is (validation errors included), and let the user know that their work is not lost, but not yet captured as a revision.

gasman · 2024-12-11T16:16:59Z

@tm-kn has raised the important question of whether this only covers text fields, or whether for example it would be possible to leave a foreign key field blank and still save as a draft.

The intention was very much for it to work on all field types, but I now realise there's one aspect where I've fallen into the trap of text-centric thinking: for non-text fields, this fails to satisfy the goal of "it should just work without the site implementor having to specifically accommodate it in their code". When bypassing the Django-level validation, a text field that's been left blank can still be saved to the database (and will be written as an empty string) but this is not the case for non-text fields unless null=True is specified. In other words, we're reliant on developers to remember to define a field as IntegerField(null=True) if they want it to be a required field on publish but still allow saving incomplete drafts (as distinct from IntegerField(null=True, blank=True) which would be a non-required field on publish). I don't think there's a good way around this, though.

laymonage · 2024-12-11T17:18:44Z

@gasman Yeah, I think there’s going to be a lot of edge cases to cover if we went down that path, e.g. database constraints.

Hence I think we should treat the form state as something else entirely and save it to the database as-is, skipping the entire validation chain for the model/form (until the form is actually saved).

RFC 104: Validation on publish

a648479

gasman added the 1:New label Nov 8, 2024

lb- reviewed Dec 2, 2024

View reviewed changes

laymonage reviewed Dec 5, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC 104: Validation on publish #104

RFC 104: Validation on publish #104

gasman commented Nov 8, 2024 •

edited

Loading

lb- Dec 2, 2024

laymonage Dec 4, 2024 •

edited

Loading

lb- Dec 6, 2024

laymonage left a comment

laymonage Dec 4, 2024 •

edited

Loading

gasman commented Dec 11, 2024

laymonage commented Dec 11, 2024

RFC 104: Validation on publish #104

Are you sure you want to change the base?

RFC 104: Validation on publish #104

Conversation

gasman commented Nov 8, 2024 • edited Loading

lb- Dec 2, 2024

Choose a reason for hiding this comment

laymonage Dec 4, 2024 • edited Loading

Choose a reason for hiding this comment

Current state of Revision

FormState model

Revision model

What does autosave look like? a.k.a. in summary

lb- Dec 6, 2024

Choose a reason for hiding this comment

laymonage left a comment

Choose a reason for hiding this comment

laymonage Dec 4, 2024 • edited Loading

Choose a reason for hiding this comment

Current state of Revision

FormState model

Revision model

What does autosave look like? a.k.a. in summary

gasman commented Dec 11, 2024

laymonage commented Dec 11, 2024

gasman commented Nov 8, 2024 •

edited

Loading

laymonage Dec 4, 2024 •

edited

Loading

Current state of `Revision`

`FormState` model

`Revision` model

laymonage Dec 4, 2024 •

edited

Loading

Current state of `Revision`

`FormState` model

`Revision` model