Skip to content

Augmenter

Joe Abbate edited this page May 22, 2013 · 1 revision

This is a placeholder for discussing a new Pyrseas tool (see issue #17).

The Pyrseas programs yamltodb and dbtoyaml are general purpose tools, in the vein of pg_dump and pg_restore: i.e. they have no interest in the objects in the database, they just process them as-is and they do not add or remove any code from the database.

There is a case for extending Pyrseas based on the following hypothesis:

  • It is better to enforce data integrity as close to the database as possible
  • It is better to use native Postgresql functionality where possible, e.g. domains, enums, foreign key and check constraints
  • Some common data integrity and business logic requirements cannot be enforced in the database without code, e.g. denormalizations (including copies, aggregates and calculations), audit columns and journal/history tables
  • These things are frequently not implemented in the database because it is not considered cost-effective
  • If these things could be implemented declaratively then it would be cost-effective to implement them in the database, and the data would be better protected as a result, and both the developer and the DBA would sleep better at night.

Augmenter (dbaugment) will be an add-on tool to Pyrseas. It will use a yaml input file very similar to the input file for yamltodb, but with additional syntax describing the business logic to be generated into the database. The output from Augmenter will be another yaml file in the format required for input to yamltodb. This allows Augmenter users to maintain a single yaml file (usually under version-control) defining both the database objects and some additional business logic, but does not affect current users of Pyrseas and does not duplicate any of the existing functionality of Pyrseas.

[jmafc] I envisioned something somewhat different, namely:

Dbaugment will accept two or three arguments. The first is a database name. The second argument would be a hand-coded YAML file containing the desired extensions. The third input would be an optional configuration file. Proposed syntax:

dbaugment dbname extend.yaml --config config.file

[rhunwicks] If you are going to allow the arguments for the schema and the extensions to both be yaml files, would it be a lot of extra work to support a single argument version that references a combined yaml file?

[jmafc] I changed the description above, but hadn't changed the syntax.

Simple use case: start off with a database with a film table as here. Assume for simplicity sake that "audit_trails: true" means only adding the "last_update" column as in the post. Then the extend.yaml file would just consist of:

schema public:
  table film:
     audit_trails: true

Dbaugment will then output the YAML spec as seen in the post above. This can then be input to yamltodb.

Although the configuration could be an intro part of extend.yaml, I think you'd want to use it multiple times, across databases or projects. I'm also undecided as to whether YAML is the best format for the config file.

Ideally, dbaugment should be idempotent, i.e., if it's re-run against the example database above, it should not regenerate the spec since it will see the table is already "audited." OTOH, if the config file now says that in addition to a timestamp, we also need a username, then it would add that.

[jmafc] End

[rhunwicks] I agree that idempotent is good.

My personal preference is for dbaugment to produce an input file for yamltodb, i.e. for it to be used serially rather than in parallel. My reasoning is:

  • If yamltodb is already idempotent then if dbaugment produces the yaml file that represents the correct database state, then dbaugment will also be idempotent
  • Similarly, if dbaugment is going to generate SQL DDL then it will have a large amount of code reuse from yamltodb to compare the input file to the database and generate ALTER ADD DROP statements as required. I guess that we would need to separate much of this functionality into a library that would be used by both tools. On the other hand, if dbaugment generates a yamltodb input file then yamltodb and dbtoyaml can remain unaltered without needing to duplicate code

[jmafc] I've convinced myself that YAML output is the way to go, not because of the code reuse (which wouldn't be a problem) but because generating SQL requires being aware of inter-object dependencies (something that yamltodb already handles).

  • I have a strong preference for a single definition file. I recognize that this is just a preference, probably the result of my many years doing "model driven development" using Oracle Designer. I like the idea of having a single definition for a table that tells me what columns it contains, and also what business logic it obeys. If I have separate input files for the table structure and the business logic I am concerned that users will:
    • look at the structure without realizing what additional columns there might be, e.g. audit columns
    • more importantly, there might be some columns defined in the structure that are subject to business logic that is not obvious - for example if we automate the denormalization of a grandparent primary key into a child table then my instinct is to show the column in the structure, but then I would want to know by looking at its yaml definition that it will be automated. Another example, which we haven't discussed yet is, is forcing a particular column to contain only upper or lowercase text. I know we can set a check constraint to make sure the data is valid, but we might also want to allow the user to enter the text as they prefer, but convert it internally before saving it.
    • similarly, I am worried that looking at the dbaugment file without the structure of the table to provide context will be less meaningful: for example, say we have a requirement that tables containing credit card numbers are always fully audited - then if I see the audit requirement in a separate file without being to seeing the credit_card column in the table, then I might not remember why it is there. I could add a comment to the dbaugment yaml file, but to some extent yaml is supposed to be self-documenting.
  • I am less concerned about storing the configuration data in a separate, possibly non-yaml, file. I can see the reasons for separating it, although personally I would rather have it all in one, so that:
    • I can see the business logic definition and easily see how it will be implemented, without having to open a separate file
    • if we allow named audit sections, for example, then the business logic definition will be meaningless without the corresponding audit config section
    • I want the config to be version-controlled alongside the database definition
    • I don't want to take the chance that someone alters the config in a shared file to meet a requirement on another database and consequently changes all the business logic in MY database, e.g. dropping a load of audit columns
  • I haven't thought it through completely, but instinctively it seems like separate files will make round tripping more difficult. If I have a separate structure file and logic file and I want to make a change in the database and then pull that back into version control using dbtoyaml I will suddenly have all the dbaugment functions and triggers in my structure config file, unless I have some mechanism for identifying them and ignoring them. If I don't remove them I could end up in a situation where doing a yamltodb to add a new table with no business logic cause the existing logic to be changed, because an old version of the functions and triggers is generated into the database.

Do you have any thoughts about my more detailed reasons, and/or want to explain your preferred approach in more detail? I recognize that ultimately it's your decision.

[jmafc] I'll be posting a prototype man page in a dbaugment branch here on GitHub, and will work on a small proof-of-concept dbaugment.py.

Note that I don't know/mind whether I want dbaugment to actually produce the intermediate file on disk, or just produce it on standard output so it can be piped into yamltodb, or even just consume the yamltodb classes internally. I guess some users will want to see, save and maybe edit the intermediate file, while others will just want the DDL.

[rhunwicks] End

Augmenter will support round-tripping: either by processing a dbtoyaml output file and converting the generated business logic back to the declarations; or by allowing a dbtoyaml output file to be used as its own input file, and only generating new or changed business logic. Previously generated business logic which has not been changed would be passed through to yamltodb unchanged.

Potential Functionality to be implemented by Augmenter includes:

  • [Denormalizations](Augmenter Denormalizations)
  • Audit Columns
  • History/Journal Tables

Augmenter will offer [configuration options](Augmenter Configuration) to allow the naming conventions for generated functions and triggers to be customized, and to control what user owns the generated objects.

The Audit Column and History Table functionality will require additional functions to support [session variables](Augmenter Session Variables).

Clone this wiki locally