Provide schema for Persistence Network #1333
Labels
discussion wanted
There are still undetermined aspects of this issue, please comment!
engineering
Internal changes to the system that aren't visible to end users
Currently, all values in the PN are inherently typed, because only simple values can be stored. This works while only simple types are available to users, but once complex objects are added, the stored data will no longer inherently know its type, since some backing protocols will need to store complex objects as strings.
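To make the problem concrete, here is a minimal sketch in Python. It assumes a hypothetical backend that can only hold text, and it assumes JSON as the serialization format; neither is specified in this issue, and the `set_value`/`get_value` helpers below are stand-ins, not the real functions.

```python
import json

# Hypothetical string-only backend: every value is persisted as text.
store = {}

def set_value(key, value):
    # Strings are stored as-is; anything else is flattened to JSON text
    # (JSON is an assumption made for this sketch).
    store[key] = value if isinstance(value, str) else json.dumps(value)

def get_value(key):
    # With no recorded type, the reader can only hand back the raw text.
    return store[key]

set_value("storage.list", [1, 2, 3])    # a complex value
set_value("storage.text", "[1, 2, 3]")  # a string that merely looks like one

# Both keys now hold identical text, so the original type cannot be recovered
# from the stored data alone.
ambiguous = get_value("storage.list") == get_value("storage.text")
```

After this runs, `ambiguous` is true: nothing in the stored text distinguishes the array from the string, which is the gap a built-in schema would have to fill.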
There are three ways to solve this.
schema.key instead of storage.key).
Each of these has pros and cons that need to be discussed.
Pros:
Cons:
The schema top level key would almost surely be stored in the default namespace, which may be a different location than the value itself, and that is likely not what the user would want. We could put it in the storage namespace, but then we might clobber a user value, so this is not a reasonable solution either. We could also special-case the behavior of this one namespace so that it automatically uses the same namespace as the associated key, but that is a new mechanism users would just have to be aware of, and it does not seem like a clean solution either.
Another con: if the schema info and value info live in different files, it's more likely that the schema gets separated from the PN, and the value could then no longer be read in properly. This could be offset by the user-provided schema, where defined (see below), but that is not the point of that schema, so the two may be in conflict anyway. Further, the user-provided schema can use higher-level types (e.g. mixed), so it cannot be used to know which type to deserialize to anyway.
All in all, approach 2 seems to be the best to me. It would require an upgrade notice, and would likely be applied only as part of a major version bump, but users can use the existing tooling for data source migration to correct the location of the values after the fact. Since the schema would only be used for complex values, it would initially go unused, which would give most users a chance to simply change the location, even if they have already upgraded.
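The namespace concerns in the cons above can be sketched as follows. This is only an illustration: the filter table, the source names, and the `storage.schema.key` naming are all assumptions made for the example, not the PN's actual configuration or key layout.

```python
# Hypothetical filter table: top-level namespace -> data source name.
FILTERS = {"storage": "source_a"}
DEFAULT_SOURCE = "source_default"

def source_for(key):
    """Return the data source a dotted key would be routed to."""
    namespace = key.split(".", 1)[0]
    return FILTERS.get(namespace, DEFAULT_SOURCE)

# A separate top-level schema namespace falls through to the default source,
# so the schema entry and its value end up in different places.
value_source = source_for("storage.key")
schema_source = source_for("schema.key")
split_apart = value_source != schema_source

# Nesting the schema inside the value's namespace keeps them together, but
# the schema key can then collide with a key the user already wrote.
user_keys = {"storage.schema.key": "user data"}
would_clobber = "storage.schema.key" in user_keys
```

Both `split_apart` and `would_clobber` come out true here, which mirrors the two failure modes described: the schema drifting away from its value, or the schema overwriting user data.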
User Defined Schema
One additional feature that should be considered is that some keys may need a user defined schema associated with them anyway. This would be useful for enforcing data types on certain keys. It would require users to provide a declarative schema (perhaps through annotations, or a separate configuration file type), which would supplement the built-in schema mechanism regardless of how that is implemented. It would also give the compiler itself a mechanism for static analysis, allowing the get/set_value functions to be properly typechecked.
The purpose of this schema is not to be confused with that of the built-in schema, however. The built-in schema exists so that the data in a key can be parsed back into its original object type. If the currently stored data is, for instance, an int, and the user-defined schema is later edited to define it as an array, the stored value should still be parsed as an int. It would then cause a runtime cast exception, since the user schema defines it as an array, not an int. A separate utility for verifying DB values against the user schema could be implemented to help identify problem areas before runtime, but this kind of mismatch would not normally be detected by the compiler.
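The separation between the two schemas can be sketched like this. Everything here is hypothetical: the type tags, the `CastError` name, the `(tag, raw text)` storage layout, and the `verify` helper are assumptions chosen for illustration, and `get_value` is a stand-in for the real function.

```python
import json

class CastError(Exception):
    """Stand-in for the runtime cast exception described above."""

# Built-in schema (assumed layout): each value is stored with the tag of the
# type it was serialized from, and is always parsed back using that tag.
PARSERS = {"int": int, "array": json.loads, "string": str}
# Runtime types that each user-schema type name accepts in this sketch.
USER_TYPES = {"int": int, "array": list, "string": str}

stored = {"storage.count": ("int", "42")}  # key -> (built-in tag, raw text)

def parse(key):
    tag, raw = stored[key]
    return PARSERS[tag](raw)  # the user schema plays no part in parsing

def get_value(key, user_schema):
    value = parse(key)
    expected = user_schema.get(key)
    if expected is not None and not isinstance(value, USER_TYPES[expected]):
        raise CastError(f"{key}: stored {type(value).__name__}, schema says {expected}")
    return value

def verify(user_schema):
    """Offline check: list keys whose stored value violates the user schema."""
    return [key for key in stored
            if key in user_schema
            and not isinstance(parse(key), USER_TYPES[user_schema[key]])]
```

With a user schema of `{"storage.count": "int"}`, `get_value` returns the int 42. If the schema is later edited to `{"storage.count": "array"}`, the value is still parsed as an int, `get_value` raises `CastError`, and `verify` reports `storage.count` ahead of time.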
Discussion encouraged.