Configuration Management

What is the Problem?

Considerable discussion has been devoted to the problem of managing configuration values in DSpace, but I suspect there is no common understanding of just what the problem (or problems) is, much less agreement on practical solutions. As always, mds stakes out an opinion here: this page describes both an analysis of the configuration issues (which entails a critique of other opinions) and the changes mds has put in place to address them. Alleged problems include:

Too much configuration data

Originally this manifested itself in a vastly unwieldy 'dspace.cfg' file, which was heading toward 6 to 8 thousand lines of properties. This made locating changes, analyzing diffs and similar tasks very painful. The modularization of config files has stopped the cancerous growth of that one file, but the cumulative line count of all '.cfg' files continues to grow apace. My feeling is that this isn't really a configuration management problem per se; it is a modularity problem. As long as the distribution continues as a monolith, all config files will be distributed. The real goal of modular configuration can only be achieved when the distribution itself is modular, i.e. when a user receives only what they need. Then the mass of config data will be drastically reduced.
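To make the point concrete, here is a minimal sketch of what modular configuration could look like once the distribution itself is modular: only the config files belonging to installed modules are ever read. It assumes a hypothetical layout in which each module ships a single `<module>.cfg` file in properties syntax; `ModularConfigLoader` and `loadInstalled` are illustrative names, not mds or DSpace APIs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical loader: reads only the config files of installed modules. */
final class ModularConfigLoader {

    private ModularConfigLoader() {}

    /**
     * @param available file name (e.g. "oai.cfg") to file contents, standing in
     *                  for everything a monolithic distribution ships
     * @param installed names of the modules actually installed (e.g. "oai")
     * @return merged settings, keyed "module.property" to avoid collisions
     */
    static Map<String, String> loadInstalled(Map<String, String> available, Set<String> installed) {
        Map<String, String> merged = new TreeMap<>();
        for (String module : installed) {
            String text = available.get(module + ".cfg");
            if (text == null) {
                continue; // installed module that ships no configuration
            }
            Properties props = new Properties();
            try {
                props.load(new StringReader(text));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Namespace each key by its module so two modules may reuse a name.
            for (String key : props.stringPropertyNames()) {
                merged.put(module + "." + key, props.getProperty(key));
            }
        }
        return merged;
    }
}
```

With `installed` set to just `"oai"`, a `sword.cfg` shipped alongside is simply never read, which is the reduction in config mass the paragraph anticipates.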

Tortured configuration syntax

I think there is universal agreement that much of the current configuration data stretches simple name=value property syntax way past the breaking point, and is very difficult to interpret and maintain. The reasons for this abuse are worth considering, however; I think they fall into distinct categories:

Lack of well-understood alternative models. No standard configuration interface is offered other than the property-based ConfigurationManager hammer, so everything is treated as a nail. In particular, DSpace stores what is really registry-like data (packager plugin configurations, for example) in configuration properties. Such data should instead be moved to an XML file, a database or similar, and accessed through more appropriate interfaces. mds removes a number of these registry-like entries.
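As an illustration of moving registry-like data out of properties, the sketch below parses plugin definitions from a small XML document into a typed lookup, using only the JDK's DOM parser. The element and attribute names (`plugin`, `interface`, `name`, `class`) and the `PluginRegistry` class are hypothetical, chosen only to show the shape of a more appropriate interface; they are not an mds or DSpace schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical registry of named plugins, loaded from XML rather than properties. */
final class PluginRegistry {

    // interface name -> (plugin name -> implementing class name)
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    static PluginRegistry fromXml(String xml) {
        PluginRegistry registry = new PluginRegistry();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList plugins = doc.getElementsByTagName("plugin");
            for (int i = 0; i < plugins.getLength(); i++) {
                Element p = (Element) plugins.item(i);
                registry.entries
                        .computeIfAbsent(p.getAttribute("interface"), k -> new HashMap<>())
                        .put(p.getAttribute("name"), p.getAttribute("class"));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Any of these means the registry cannot be trusted, so fail loudly.
            throw new IllegalStateException("invalid plugin registry", e);
        }
        return registry;
    }

    /** Looks up the implementation class registered under a name for an interface. */
    Optional<String> implementationOf(String iface, String name) {
        return Optional.ofNullable(entries.getOrDefault(iface, Map.of()).get(name));
    }
}
```

Each entry then reads like `<plugin interface="PackageIngester" name="METS" class="org.example.MetsIngester"/>` (class name hypothetical), and callers ask a typed question instead of splitting a property value by hand.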

Paucity of the property syntax itself. There are cases where name=value is not quite enough and one needs a slightly richer syntax, though far less than a complex XML schema. For this reason, mds has adopted a richer configuration syntax called HOCON (Human-Optimized Config Object Notation; the parallel with JSON is quite intentional), together with a supporting library to parse it. Fortunately, HOCON is (mostly) backward-compatible with simple properties files, while also being able to express configuration in a simplified JSON. This small amount of additional expressive power can eliminate most, if not all, of the ugliest constructs in DSpace configuration. See Typesafe Config for more details.
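To see why a little extra syntax matters, consider what plain properties force onto calling code. In the sketch below, a list has to be packed into one comma-separated string and split by hand, and a nested group of settings exists only as a shared key prefix that must be carved out by string matching. Typesafe Config exposes the HOCON equivalents directly through `Config.getStringList` and `Config.getConfig`; the `FlatProps` helper and the key names here are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hand-rolled workarounds that flat name=value syntax pushes onto callers. */
final class FlatProps {

    private final Properties props = new Properties();

    FlatProps(String text) {
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A "list" is just a string; every caller must agree on the delimiter. */
    List<String> list(String key) {
        List<String> out = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    /** A "nested object" is just a key prefix; strip it to get the sub-settings. */
    Map<String, String> group(String prefix) {
        Map<String, String> out = new TreeMap<>();
        String dotted = prefix + ".";
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(dotted)) {
                out.put(key.substring(dotted.length()), props.getProperty(key));
            }
        }
        return out;
    }
}
```

For comparison, HOCON can state the same data as `search.fields = [title, author, subject]` and `mail { server = "smtp.example.org", port = 25 }`, which the library hands back as a real list and a nested object with no delimiter or prefix conventions on the caller's side.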

Non-dynamic nature of configuration data

This concern is expressed in various ways, but a frequently proposed solution is a configuration UI in which one could change values at will. Such a UI could have several benefits, including better-contextualized help text. While sympathetic to many of these goals, I have difficulty imagining how such UIs (particularly as part of the DSpace administrative UI) could work effectively. The first question is what exactly these UIs would manage: there are up to four distinct manifestations of configuration data, and the relationships among them can be complicated.

First, there are the values held (cached) in memory once the configuration service loads. A UI could easily manipulate these, but doing so raises two problems: a changed value could invalidate any currently running thread that relies on it (or on related values), and the change would be entirely ephemeral, lost when the JVM exits.

Second, there are the values stored on disk on the deployed system, which the configuration service loads at start-up to produce the in-memory values. One could imagine a UI changing these, but problems arise here too. If only the on-disk value were changed, memory and disk would disagree, with unexpected behavior as a result. If the UI changed both at the same time, the thread-invalidation issue above would arise again. There are generally no good flat-file synchronization methods at hand, so multiple admins could corrupt the config files. And last, the new values would exist only on the deployed system (not the 'source' system), so they would either be lost when 'update-configs' ran or have to be copied back by hand (arghh).
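The first of these problems, memory and disk silently disagreeing, is easy to detect even if it is hard to prevent. A minimal sketch, assuming the in-memory configuration is kept as the `Properties` object originally loaded from the file: re-read the file and report every key that was added, removed or changed since start-up. `ConfigDrift`, `load` and `driftedKeys` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Reports keys on which a loaded configuration and the file on disk disagree. */
final class ConfigDrift {

    private ConfigDrift() {}

    /** Reads a properties file from disk. */
    static Properties load(Path file) {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Keys whose values differ, including keys present on only one side. */
    static Set<String> driftedKeys(Properties inMemory, Properties onDisk) {
        Set<String> keys = new TreeSet<>(inMemory.stringPropertyNames());
        keys.addAll(onDisk.stringPropertyNames());
        Set<String> drifted = new TreeSet<>();
        for (String key : keys) {
            // A key missing on one side yields null there, so it counts as drift.
            if (!Objects.equals(inMemory.getProperty(key), onDisk.getProperty(key))) {
                drifted.add(key);
            }
        }
        return drifted;
    }
}
```

Detection like this only reports the inconsistency; it does nothing about the thread-safety or concurrent-editing problems above.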

Third, there are the values stored on disk in the source directory, which get deployed to the location described above. It is harder to imagine a web UI updating these values, since they are likely not even on the same server as the deployed config files. Even if it could, there would now be three copies for the UI to manage, and things start to look precarious. True, one could imagine 'rsync'-like processes outside the UI that take care of this at shutdown (or periodically), but that is generally not the direction one wants data to move (from deployed to source).
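Any such 'rsync'-like process would first have to discover which deployed files no longer match their source copies, i.e. the edits at risk when 'update-configs' runs. A minimal sketch, assuming source and deployed configs are two directory trees with the same relative layout; it compares SHA-256 digests of each file, since digests are what one would exchange if the two trees lived on different machines. `DeployedDrift` is a hypothetical name, and the sketch only detects drift: it deliberately copies nothing back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists deployed config files that differ from, or have no counterpart in, the source tree. */
final class DeployedDrift {

    private DeployedDrift() {}

    static SortedSet<String> changedSinceDeploy(Path sourceRoot, Path deployedRoot) {
        SortedSet<String> changed = new TreeSet<>();
        try (Stream<Path> walk = Files.walk(deployedRoot)) {
            List<Path> deployedFiles = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path deployed : deployedFiles) {
                Path relative = deployedRoot.relativize(deployed);
                Path source = sourceRoot.resolve(relative);
                // Flag files missing from the source, or whose contents differ.
                if (!Files.isRegularFile(source) || !Arrays.equals(sha256(deployed), sha256(source))) {
                    changed.add(relative.toString().replace('\\', '/'));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return changed;
    }

    private static byte[] sha256(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

Even this read-only step already needs access to both trees, which illustrates why the source directory is awkward territory for a UI.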

Finally, there are often values stored in a version control system, which get checked out to populate the source directories mentioned above. A UI could not reach back into these systems without great complication, yet this is the place of record where one would ultimately want all config values to live.

Combine these concerns with the non-trivial task of creating a UI capable of managing disparate types of data, and the project seems rather ambitious.

mds and the Typesafe library impose immutability on in-memory values, but all values can be reloaded from disk on request, so changes can still be picked up.
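In Typesafe Config, a `Config` object is immutable, and a reload amounts to calling `ConfigFactory.invalidateCaches()` and then `ConfigFactory.load()` again to obtain a fresh object. The same pattern can be sketched with the JDK alone, which also shows why it sidesteps the thread-invalidation problem described earlier: readers take an immutable snapshot and keep using it, while a reload atomically publishes a new snapshot for subsequent readers. `ConfigHolder` and its methods are hypothetical names, not the mds API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds an immutable configuration snapshot that is replaced wholesale on reload. */
final class ConfigHolder {

    private final Supplier<Properties> source; // e.g. reads the deployed .cfg files
    private final AtomicReference<Map<String, String>> current = new AtomicReference<>();

    ConfigHolder(Supplier<Properties> source) {
        this.source = source;
        reload();
    }

    /** Readers call this once per unit of work and use the returned map throughout. */
    Map<String, String> snapshot() {
        return current.get();
    }

    /** Re-reads the source and publishes a new immutable snapshot; old ones are untouched. */
    void reload() {
        Properties props = source.get();
        Map<String, String> copy = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            copy.put(key, props.getProperty(key));
        }
        current.set(Map.copyOf(copy)); // unmodifiable, so no reader can alter it
    }
}
```

A thread handling one request would call `snapshot()` once and read related values from that same map, so it never observes half of an update; changes on disk take effect only when `reload()` is requested.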
