Provide schema for Persistence Network #1333

LadyCailin · 2022-11-08T12:10:14Z

Currently, all values in the Persistence Network (PN) are inherently typed, because only simple values can be stored. This works while only simple types are available to users. Once complex objects are added, though, the stored data will no longer inherently know its type, because some backing protocols will need to store complex objects as strings.

There are three ways to solve this.

1. Layer additional escaping on top of strings, to know when the item is a string vs. a more complex object.
2. Provide an internal schema, for instance an additional value in the DB that contains the DB schema (schema.key instead of storage.key).
3. Provide an external schema.

Each of these has pros and cons that need to be discussed.

Pros:

1. No additional configuration or user input is required.
2. No chance of already existing user values accidentally replicating the escaping mechanism.
3. No unexpected additional values being stored in the DB under a brand-new top-level key.

Cons:

1. Existing user values might accidentally replicate the escaping mechanism. An upgrade routine could solve this, but that is not ideal: some data sources may be offline, so the routine would also need a way to run offline. All strings (at least) would need to be rewritten anyway, since complex object serialization would almost certainly be based on the string type, at least for many data source types.
2. The new schema top-level key would almost surely be stored in the default namespace, which may be a different location than the value itself, and that is likely not what the user would want. We could put it in the storage namespace, but then we might clobber a user value, so that is not a reasonable solution either. We could also special-case this one namespace so that it automatically uses the same namespace as the associated key, but that is a new mechanism users would just have to be aware of, and it does not seem like a clean solution either. Another con: if the schema info and value info live in different files, it is more likely that the schema gets separated from the PN, and the value could then no longer be read in properly. This could be offset by the user-provided schema, where defined (see below), but that is not the point of that schema, so the two may conflict anyway. Further, the user-provided schema can use higher-level types (e.g. mixed), so it cannot be used to decide which type to deserialize to.
3. There is no obvious place to put an external schema. Further, it would ignore the strengths of the PN itself for data storage.
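To make the first con concrete, here is a minimal sketch of how an escaping layer can misread a value that was stored before the escaping existed. The `@obj:` marker, the JSON encoding, and the `encode`/`decode` names are all assumptions for illustration; nothing in this issue proposes a specific format.

```python
import json

# Hypothetical escape marker; not a proposed format.
ESCAPE_PREFIX = "@obj:"


def encode(value):
    """Turn a value into the string a string-only backend would store."""
    if isinstance(value, str):
        return value  # plain strings are stored unchanged (no upgrade routine)
    return ESCAPE_PREFIX + json.dumps(value)


def decode(stored):
    """Recover the original value from the stored string."""
    if stored.startswith(ESCAPE_PREFIX):
        return json.loads(stored[len(ESCAPE_PREFIX):])
    return stored


# A string the user stored before the escaping layer was introduced.
legacy_value = "@obj:[1, 2]"

# After the upgrade it is silently read back as a list, not a string.
misread = decode(legacy_value)
```

Avoiding this would mean escaping every existing string as well, which is the upgrade routine mentioned above.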

All in all, approach 2 seems to be the best to me. It would require an upgrade notice, and would likely be applied only as part of a major version bump, but users can use the existing data source migration tooling to correct the location of the values after the fact. Since the schema would only be used for complex values, it would initially not be used at all, which would give most users a chance to simply change the location, even if they have already upgraded.
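A rough sketch of approach 2, assuming a plain dict stands in for the backing store and JSON stands in for the complex object serialization. The `set_value`/`get_value` signatures here are simplified stand-ins, not the actual functions, and the recorded type names are hypothetical.

```python
import json


def set_value(db, key, value):
    """Store under storage.<key>; write schema.<key> only for complex values."""
    if isinstance(value, (dict, list)):
        db["storage." + key] = json.dumps(value)
        db["schema." + key] = type(value).__name__  # e.g. "list"
    else:
        # Simple values stay inherently typed, so no schema entry is needed.
        db["storage." + key] = value
        db.pop("schema." + key, None)


def get_value(db, key):
    """Read a value back, consulting the schema entry if one exists."""
    stored = db["storage." + key]
    type_name = db.get("schema." + key)
    if type_name is None:
        return stored
    # A real implementation would dispatch on type_name; JSON covers both here.
    return json.loads(stored)


db = {}
set_value(db, "scores", [1, 2])    # complex: gets a schema entry
set_value(db, "motto", "[1, 2]")   # a string that merely looks like a list
```

Because the type lives in a separate key, a string that happens to look like a serialized object is never mistaken for one, and a store holding only simple values never gains any schema.* keys.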

User Defined Schema

One additional feature to consider is that some keys may want a user-defined schema associated with them anyway. This would be useful for enforcing data types on certain keys. Users would need to provide a declarative schema (perhaps through annotations, or a separate configuration file type). It would supplement the built-in schema mechanism regardless of how that is implemented, and it would also let the compiler itself do static analysis, allowing the get/set_value functions to be properly typechecked.

The purpose of this schema is not to be confused with that of the built-in schema, however. The built-in schema is meant to parse the data in the key back into its original object type. If the currently stored data is, for instance, an int, and the user-defined schema is later edited to define it as an array, the stored value should still be parsed as an int; it would then cause a runtime cast exception, since the user schema defines it as an array, not an int. A separate utility for verifying DB values against the user schema could be implemented to help identify problem areas before runtime, but such mismatches would not normally be detected by the compiler.
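The split between the two schemas might look roughly like the sketch below. Everything in it is hypothetical: the dict-based schemas, the `CastError` class, and the `verify` helper are assumptions for illustration, and Python types stand in for the real type system.

```python
class CastError(TypeError):
    """Stand-in for the runtime cast exception described above."""


def get_value(db, builtin_schema, user_schema, key):
    # The built-in schema alone decides how the stored text is parsed.
    value = builtin_schema[key](db[key])
    # The user-defined schema is only an enforcement layer on top.
    declared = user_schema.get(key)
    if declared is not None and not isinstance(value, declared):
        raise CastError(f"{key}: stored {type(value).__name__}, "
                        f"declared {declared.__name__}")
    return value


def verify(db, builtin_schema, user_schema):
    """Offline utility: list keys whose stored value violates the user schema."""
    bad = []
    for key in db:
        try:
            get_value(db, builtin_schema, user_schema, key)
        except CastError:
            bad.append(key)
    return bad


db = {"player.score": "42"}              # stored as text by the backend
builtin_schema = {"player.score": int}   # recorded when the value was stored
user_schema = {"player.score": list}     # later edited by the user
```

With this data, the value is still parsed as an int, reading it raises `CastError`, and `verify` reports the key ahead of time without the compiler being involved.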

Discussion encouraged.

LadyCailin added the labels "engineering" (Internal changes to the system that aren't visible to end users) and "discussion wanted" (There are still undetermined aspects of this issue, please comment!) on Nov 8, 2022.