Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Requirement for Deterministic Serialization and Support Non-Deterministic Serializations #51

Closed
sargas opened this issue Jan 21, 2025 · 1 comment · Fixed by #52

Comments

@sargas
Copy link

sargas commented Jan 21, 2025

The restate documentation specifies how handlers are to be deterministic, keeping non-deterministic operations in restate.Run or restate.Rand. This documentation doesn't mention that the codec (JSONCodec by default) must also be deterministic.

This came up with the CloudEvents SDK, which has a MarshalJSON implementation that iterates through a stored map to add the keys/values to a JSON document. Since Go uses a randomized iteration order for maps, this means that converting CloudEvent's Event objects to JSON may result in different orderings.

This is reproducible with the following code:

func main() {
	if err := server.NewRestate().
		Bind(restate.Reflect(FirstService{})).
		Bind(restate.Reflect(SecondService{})).
		Start(context.Background(), ":9080"); err != nil {
		log.Fatal(err)
	}
}

type SecondService struct{}

func (SecondService) Process(_ restate.Context, _ cloudevents.Event) error {
	return nil
}

type FirstService struct{}

func (FirstService) Handle(ctx restate.Context) error {
	event, _ := restate.Run(ctx, func(ctx restate.RunContext) (cloudevents.Event, error) {
		event := cloudevents.NewEvent()
		event.SetID(uuid.NewString())
		event.SetSource("example/uri")
		event.SetType("example.type")
		event.SetTime(time.Now())
		event.SetExtension("a", "b")
		event.SetExtension("c", "d")
		event.SetExtension("e", "f")
		return event, nil
	})

	_, _ = restate.Service[restate.Void](ctx, "SecondService", "Process").Request(event)
	if rand.Float32() < 0.5 {
		return nil
	} else {
		return fmt.Errorf("failing with purposeful error")
	}
}

If the SetExtension lines are commented out, then sometimes requests will pass, and other times they will be retried until successful. If the SetExtension lines are kept; however, there is a chance that a retry will result in the call to restate.Service using a differently ordered JSON file - causing this error:

2025/01/20 22:30:16 ERROR Journal mismatch: Replayed journal entries did not correspond to the user code. The user code has to be deterministic! method=FirstService/Handle invocationID=inv_1gdbLDeS59C52GA3G1nv6lPHPgBoN7LaHn expectedType=*wire.CallEntryMessage expectedMessage="{\"service_name\":\"SecondService\",\"handler_name\":\"Process\",\"parameter\":\"eyJzcGVjdmVyc2lvbiI6IjEuMCIsImlkIjoiMmMzNzJlY2YtNzY2Zi00MzcwLWI2YmMtZGIxNTE1M2M2OTlkIiwic291cmNlIjoiZXhhbXBsZS91cmkiLCJ0eXBlIjoiZXhhbXBsZS50eXBlIiwidGltZSI6IjIwMjUtMDEtMjFUMDQ6MzA6MTYuNTc0NTgxNzM3WiIsImUiOiJmIiwiYSI6ImIiLCJjIjoiZCJ9\",\"Result\":null}" actualType=*wire.CallEntryMessage actualMessage="{\"service_name\":\"SecondService\",\"handler_name\":\"Process\",\"parameter\":\"eyJzcGVjdmVyc2lvbiI6IjEuMCIsImlkIjoiMmMzNzJlY2YtNzY2Zi00MzcwLWI2YmMtZGIxNTE1M2M2OTlkIiwic291cmNlIjoiZXhhbXBsZS91cmkiLCJ0eXBlIjoiZXhhbXBsZS50eXBlIiwidGltZSI6IjIwMjUtMDEtMjFUMDQ6MzA6MTYuNTc0NTgxNzM3WiIsImEiOiJiIiwiYyI6ImQiLCJlIjoiZiJ9\",\"Result\":{\"Value\":\"\"}}"

The expected parameter of this call are:

{"specversion":"1.0","id":"2c372ecf-766f-4370-b6bc-db15153c699d","source":"example/uri","type":"example.type","time":"2025-01-21T04:30:16.574581737Z","e":"f","a":"b","c":"d"}%

while the actual parameters are:

{"specversion":"1.0","id":"2c372ecf-766f-4370-b6bc-db15153c699d","source":"example/uri","type":"example.type","time":"2025-01-21T04:30:16.574581737Z","a":"b","c":"d","e":"f"}

Note that the only difference between these is the ordering of the e, a, or c fields.

This issue is to ask two related questions:

  • Is there any way to update the comparison for whats the "same" about a message to consider equivalent JSON? Perhaps by adding a mime type for the parameter in the journal message
  • Would there be appetite for a PR that optionally forces normalization of the JSON that goes through a codec? (e.g., arranges the fields within each nested object to be in alphabetic order). I'm not sure of a straightforward way to do that without parsing and serializing again though.
@jackkleeman
Copy link
Contributor

Hey @sargas ! Thanks for opening this issue.

Certainly it makes sense to document this requirement. I will do so

I feel a bit uncertain about allowing for byte differences in the serialised message, and indeed this requires storing some information about the nature of the data. Its quite expensive to do a more flexible equivalency check by default, imo, and would only benefit some rare edge cases

That said, I think force-normalizing JSON that goes through a codec is fine. I don't think this should be enabled by default, but it would be easy to have a user-provided codec which does this, or it could be a non-default JSON codec thats bundled with the library. Indeed, the only way I can think of doing it is to parse into a interface{} and then reserialise, which will be pretty slow :(. Maybe there is some fancy json library that could do it faster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants