ArchivistUtility

[no longer maintained] Helps automate parts of the workflow for creating and maintaining metadata files

This program was originally written by Tonio Lowald for the University of Alabama Libraries in 2011. Archivist Utility is no longer maintained and may be used on an "AS IS" basis in accordance with the ECL-2.0 License.

The following is from the legacy Acumen project page:

Archivist Utility v0.2

Archivist Utility is a component of the Acumen project.

It is intended to ease and/or automate parts of the workflow for creating and maintaining metadata files (particularly automating the conversion of data from flat spreadsheets to valid XML by use of templates) and creating and maintaining XSL files to summarize and render metadata.

Archivist Utility offers two main windows. One is for dealing with XML metadata, the other for XSL transformations -- although the two may be merged in future.

As of now, the XML window is in a much more advanced state than the XSL window (ironic, since the latter is much more useful to me).

[Fix] Pretty Printer no longer adds whitespace to the content of tags.

Archivist Utility is also available for Mac OS X and Linux on request. There is also a command-line version which offers some of the functionality.

Downloads

Archivist Utility for Windows (zip archive)
Archivist Utility for Mac OS X (zip archive)
Archivist Utility for Linux (zip archive)

BOM-Skipping

Archivist Utility now skips leading bytes that have a value of 0 or greater than 127. In effect this skips over the Byte Order Marks that some programs (apparently a lot of Microsoft programs, for example) store at the beginning of plain text files to indicate their encoding. The problem files I was sent had a 3-byte UTF-8 BOM prepended, which was being treated as part of the first data column's name.
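The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the Real Studio code the tool uses; the function name skip_leading_bom_bytes is made up for this example.

```python
def skip_leading_bom_bytes(data: bytes) -> bytes:
    """Drop leading bytes whose value is 0 or greater than 127."""
    start = 0
    while start < len(data) and (data[start] == 0 or data[start] > 127):
        start += 1
    return data[start:]


# A tab-delimited header row saved with a 3-byte UTF-8 BOM (EF BB BF).
with_bom = b"\xef\xbb\xbfFilename\tTitle\n"
print(skip_leading_bom_bytes(with_bom))  # b'Filename\tTitle\n'
```

One consequence of the rule is that a first column name that genuinely starts with a non-ASCII character would lose its leading bytes too.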

Improved XSL Tool

I've completely rewritten the XSL window so it works as a live preview as you edit. Load XML and XSL files and preview the results. Save changes and the preview updates immediately.

Note (for Shawn): it uses the XSLT engine built into Real Studio so, for now, that's not XSLT2 or whatever Black Magic you use. I really ought to look into bundling something better into it.

I've just posted (as of 5/5/11) a new version of Archivist Utility. If you don't use the XSL preview tool then it probably won't affect you at all, but if you do then things are very different.

I don't know how (or even if) anyone else used the XSL preview tool, but I use it a lot for working on XSL. The way it works now is that you load an XML file into the XML pane and the XSL file into the XSL pane, and you can preview the output in the preview panes (raw output or web preview). Instead of assuming you'll want to edit the inputs (XML and XSL) inside the panels -- which is ridiculous, since I am obviously not going to write a nice XML editor from scratch any time soon -- it assumes you'll edit them in your favorite editor™, and when you save changes the tool detects them and updates the preview.

Drag-and-drop is supported too, so you should be able to drag XML and XSL files to the window (it doesn't matter which pane is active) and they'll be loaded into the correct panel automagically.

Template Language

Both utilities take spreadsheets and convert each row into an output (text) file (assumed to be XML for now) using the top row as a set of column identifiers. We've created a simple templating language to determine which bit of the spreadsheet row ends up where in the template, and also to identify errors in the spreadsheet or template. All template inserts are surrounded by matching {{ }}. Note that this means you can't put {{ and }} inside a template -- but this shouldn't be a huge problem.

Note: template commands are not case-sensitive, so {{for:foo}} and {{FOR:FOO}} and {{FoR:fOo}} are the same thing.

{{column_name}}

Inserts the contents of the column named column_name.

{{req:column_name}}

Inserts the contents of the column named column_name. Generates an error message if the column is absent or empty.
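As a rough illustration of the {{column_name}} and {{req:column_name}} inserts, here is a minimal Python sketch. It is not the tool's actual implementation: render_row and the error message wording are hypothetical, and it assumes column names are matched case-insensitively along with the commands.

```python
import re

INSERT = re.compile(r"\{\{(.*?)\}\}")


def render_row(template: str, row: dict[str, str]) -> tuple[str, list[str]]:
    """Expand {{column_name}} and {{req:column_name}} inserts for one row."""
    # Assumption: column names are compared case-insensitively.
    columns = {name.lower(): value for name, value in row.items()}
    errors: list[str] = []

    def expand(match: re.Match) -> str:
        parts = match.group(1).split(":")
        command = parts[0].strip().lower()
        if command == "req" and len(parts) == 2:
            name = parts[1].strip().lower()
            value = columns.get(name, "")
            if not value:
                errors.append(f"required column '{name}' is absent or empty")
            return value
        if len(parts) == 1:
            # A bare name inserts the column contents, or nothing.
            return columns.get(command, "")
        return match.group(0)  # leave other commands untouched

    return INSERT.sub(expand, template), errors


text, errors = render_row(
    "<title>{{REQ:Title}}</title><date>{{date}}</date>",
    {"Title": "Letter", "Date": "1862"},
)
print(text)    # <title>Letter</title><date>1862</date>
print(errors)  # []
```

Returning the errors alongside the text keeps the sketch free of global state; the real tool reports errors through its own log.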

{{if:condition:value_if_true:value_if_false}}

Evaluates condition (a boolean expression) and inserts value_if_true if the condition is true, and value_if_false otherwise.

condition is a boolean expression composed of column names, parentheses, and logical operators (&& and ||). If you're familiar with C-style boolean operations this will make sense, otherwise not. A column name evaluates to true if the column is present and non-empty, and false otherwise.

value_if_true and value_if_false may be column names (in which case the appropriate value is substituted) or string literals if quoted. For example, {{if:foo:bar}} inserts the contents of column bar if foo is present and non-empty, and nothing otherwise; {{if:foo:"hello":'good-bye'}} inserts the word "hello" (no quotes) if foo is present and non-empty, and "good-bye" otherwise. Note that there's no difference (for now!) between double and single quotes, but we may turn double-quotes into interpolated strings in the future (i.e. allow variable substitution into strings).
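For readers who want to see the condition rules spelled out, the following Python sketch evaluates such expressions with a small recursive-descent parser. It is an assumption-laden illustration rather than the tool's code: the names evaluate_condition, resolve_value and expand_if are hypothetical, && is given higher precedence than || (as in C), column names are assumed to contain no spaces, and malformed expressions are not handled.

```python
import re

TOKEN = re.compile(r"\s*(&&|\|\||\(|\)|[^\s&|()]+)")


def evaluate_condition(expression: str, row: dict[str, str]) -> bool:
    """Evaluate column names joined by &&, || and parentheses."""
    tokens = TOKEN.findall(expression)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def take():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def parse_or() -> bool:
        result = parse_and()
        while peek() == "||":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and() -> bool:
        result = parse_term()
        while peek() == "&&":
            take()
            right = parse_term()
            result = result and right
        return result

    def parse_term() -> bool:
        token = take()
        if token == "(":
            result = parse_or()
            take()  # consume the closing ")"
            return result
        # A column name is true when present and non-empty.
        return bool(row.get(token, ""))

    return parse_or()


def resolve_value(argument: str, row: dict[str, str]) -> str:
    """Quoted arguments are literals; anything else is a column name."""
    if len(argument) >= 2 and argument[0] == argument[-1] and argument[0] in "\"'":
        return argument[1:-1]
    return row.get(argument, "")


def expand_if(row: dict[str, str], condition: str, if_true: str, if_false: str = "") -> str:
    chosen = if_true if evaluate_condition(condition, row) else if_false
    return resolve_value(chosen, row)


row = {"foo": "x", "bar": "B", "baz": ""}
print(expand_if(row, "foo", "bar"))                       # B
print(expand_if(row, "baz", '"hello"', "'good-bye'"))     # good-bye
print(expand_if(row, "(foo || baz) && bar", '"yes"', '"no"'))  # yes
```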

{{for:column_name:separator}} {{each:column_name}} {{next:column_name}}

The contents of column_name (if present) are split into pieces by separator, and then the stuff between the for and next is repeated once for each piece.

Any occurrence of each (with the same column_name) in the loop body (i.e. the stuff between for and next) will be replaced by the piece.

e.g. if the column named foo contains This--is--a--test, then

{{for:foo:--}}<bar>{{each:foo}}</bar>{{next:foo}}

would generate

<bar>This</bar><bar>is</bar><bar>a</bar><bar>test</bar>
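The loop expansion can be sketched in Python as follows. This is a simplified illustration, not the tool's implementation: expand_loops is a hypothetical name, nested loops are not handled, and an absent or empty column is assumed to produce no output at all.

```python
import re

LOOP = re.compile(
    r"\{\{for:([^:{}]+):(.*?)\}\}(.*?)\{\{next:\1\}\}",
    re.IGNORECASE | re.DOTALL,
)


def expand_loops(template: str, row: dict[str, str]) -> str:
    """Expand {{for:name:sep}} ... {{next:name}} blocks for one row."""

    def expand(match: re.Match) -> str:
        name, separator, body = match.group(1), match.group(2), match.group(3)
        value = row.get(name, "")
        if not value:
            return ""  # absent or empty column: the body is never emitted
        pieces = value.split(separator) if separator else [value]
        each = re.compile(r"\{\{each:" + re.escape(name) + r"\}\}", re.IGNORECASE)
        # A function replacement keeps backslashes in the data literal.
        return "".join(each.sub(lambda _m, p=piece: p, body) for piece in pieces)

    return LOOP.sub(expand, template)


template = "{{for:foo:--}}<bar>{{each:foo}}</bar>{{next:foo}}"
print(expand_loops(template, {"foo": "This--is--a--test"}))
# <bar>This</bar><bar>is</bar><bar>a</bar><bar>test</bar>
```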

System Variables

Note: column names which begin with double underscore "__" are henceforth considered to be internal system variables, so don't use them in your spreadsheets!

There are some system variables available for use in templates to aid in maintenance. They work just like regular column names. These are:

__generator inserts the name, version, and build date of the program that created the file.
__timestamp inserts the date and time when the file was created.
__source inserts the name of the file and the row within that file from which the file was generated.
__encoding inserts the internet name of the text encoding used to read the source data file.
__unused lists the unused columns* (if any) and their contents.
__errors lists any errors* that occurred while creating the file.

Note: * only those errors and omissions thus far will be inserted, so these variables work best in the tail of the template.

To Do

[ ] Make comments and log consistent (unused columns should be a WARNING, not an error or whatever)
[ ] Improve UTF8 vs. ASCII vs. 8-bit encoding sniffing using the algorithm described towards the bottom of this page:

Without going into the full algorithm, just perform UTF-8 decoding on the file looking for an invalid UTF-8 sequence. The correct UTF-8 sequences look like this:

0xxxxxxx ASCII < 0x80 (128)
110xxxxx 10xxxxxx 2-byte >= 0x80
1110xxxx 10xxxxxx 10xxxxxx 3-byte >= 0x800
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 4-byte >= 0x10000

If the file is all ASCII, you cannot know if it was meant to be UTF-8 or any other character set compatible with ASCII in the lower 128.
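A minimal Python sketch of this check is shown below. Rather than walking the bit patterns by hand, it leans on Python's strict UTF-8 decoder, which rejects any sequence that does not fit the table above. The function name classify_encoding and the three labels it returns are hypothetical, chosen only for this example.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown-8bit' for a byte string."""
    if all(byte < 0x80 for byte in data):
        # Pure ASCII: could equally be any ASCII-compatible charset.
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "unknown-8bit"
    return "utf-8"


print(classify_encoding(b"plain text"))              # ascii
print(classify_encoding("café".encode("utf-8")))     # utf-8
print(classify_encoding("café".encode("latin-1")))   # unknown-8bit
```

A file that decodes cleanly as UTF-8 could in principle still be some other 8-bit encoding, so the result is a strong hint rather than proof.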

[ ] {{map:column_name:map_list_id}} and {{map:column_name:map_list:separator}} to validate vocabulary
[ ] {{map_strict:...}} as above but generates an error if element is not in the map
[ ] {{req:expression:regular_expression}} as per req but requires that the column fulfilling the condition match the expression
[ ] {{skip:on_error|on_warning}} skip export of this file

Example

{{req:Filename:^[a-z]{1}\d{4}_\d{7}_\d{7}(_\d{4}(_\d{3})?)?$}}

{{skip:on_error}}

would cause an error for non-matching entries and then prevent the resulting file from being created.

[ ] {{halt:on_error|on_warning}} stop batch process
[ ] map list entries are in the following format: each line starts with a valid vocab term, followed by a tab-delimited list of terms that get converted into that term. E.g. a line might contain "still image<tab>photograph<tab>image, still", indicating that "photograph" and "image, still" both get mapped to "still image". The map will try to guess the mapping for unlisted terms by (a) looking for a listed term containing the unmapped term (e.g. "photo" is contained in "photograph"), then (b) looking for a listed term that is contained by the unlisted term (e.g. "photographs" contains "photograph"); if either scores a hit, it will use that term BUT generate a warning. If it can't find a match this way it will generate an ERROR message, but pass the term through unchanged.
[ ] UI for viewing and modifying map lists (possibly allow automatic matches to be "sucked in" to a map list, e.g. if "photo" were guessed as a synonym of "still image" then it would be added to that term's list of mapped terms and no longer generate warnings).
[ ] update archutil
[ ] build archutil testing into Archivist Utility (i.e. when you do something in Archivist Utility in debug mode it should try to do the same thing in archutil via the shell and verify it behaves as expected).
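Since map lists are still a to-do item, the following Python sketch only shows how the proposed format and guessing rules might behave. Everything here is hypothetical: the function names parse_map_list and map_term, the message wording, and the choice to compare terms case-insensitively are assumptions made for the example.

```python
from typing import Optional


def parse_map_list(text: str) -> dict[str, str]:
    """Map every listed term (including the vocab term itself) to its vocab term."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        terms = [term.strip() for term in line.split("\t") if term.strip()]
        for term in terms:
            mapping[term.lower()] = terms[0]  # first term on the line is the vocab term
    return mapping


def map_term(term: str, mapping: dict[str, str]) -> tuple[str, Optional[str]]:
    """Return (mapped_term, message); message is None for an exact match."""
    key = term.strip().lower()
    if key in mapping:
        return mapping[key], None
    if key:
        # (a) a listed term that contains the unlisted term
        for listed, target in mapping.items():
            if key in listed:
                return target, f"WARNING: guessed '{term}' -> '{target}'"
        # (b) a listed term that is contained by the unlisted term
        for listed, target in mapping.items():
            if listed in key:
                return target, f"WARNING: guessed '{term}' -> '{target}'"
    # No match: report an error but pass the term through as-is.
    return term, f"ERROR: no mapping for '{term}'"


mapping = parse_map_list("still image\tphotograph\timage, still\n")
print(map_term("photograph", mapping))   # ('still image', None)
print(map_term("photo", mapping))        # ('still image', "WARNING: guessed 'photo' -> 'still image'")
print(map_term("map", mapping))          # ('map', "ERROR: no mapping for 'map'")
```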

Ideas

If we need complex mapping rules: {{map:column_name:/regular_expression/:map_list_id_1:map_list_id_2:...}} for really complex validation. Basically the regex outputs would be matched against map_lists (empty entries would be skipped), e.g. {{map:foo:/([\w]+);[\s]*([\w]+)/:bar:baz}} would expect two words separated by a semicolon (with optional whitespace after the semicolon) in foo, and generate an error if it didn't match this, or if the first word weren't in the map list named "bar", or if the second word weren't in the map list named "baz".
Similarly we could do regex processing on column contents with something like {{grep:column_name:regular_expression:replacement_expression}}.
{{FOR:column_name:separator:REQ}} -- exactly like {{FOR:column_name:separator}} but reports an error if column_name is missing or empty
GUI for spreadsheet creation. Let the spreadsheet view component support editing and adding new rows to spreadsheets. It would also use the currently loaded template (if any) to map fields (providing a popup menu or combo box).

Changes

All Tools

More sophisticated template features
Commands are now {{foo}} instead of %foo%
Strip quotation marks from cell contents in spreadsheet constructor (was LoadSpreadsheet)
"System Variables" now allow reporting of conversion details into output files

Archivist Utility

Automatic sniffing of file encoding (will respect files named *.utf8, but only if a contradictory BOM is not found)
Auto sniffing is the default
Spreadsheet class transparently implements trimming of cell contents, verification of row sizes, etc.
Provide UI for setting encoding options in Archivist Utility
Batch Processing Implemented
improved error reporting to show row that is the source of a problem
Can now filter and save error logs
huge amounts of testing done -- can now process entire data set with no major issues

Console Tools

archutil refactored for greater compatibility with Archivist Utility codebase and stability
Standalone encoding sniffer tool (it's now called uenc)
Documentation for Template Language (see above)

History

v0.2

Modified the template engine to (a) simply replace all tags (%foo%) with column contents, and then (b) recursively strip empty nodes from the output. This should make converting a "mockup" into a template as simple as replacing template content with %column_name%.
All the relevant code has been nicely refactored so implementing batch processing should be easier.
Added an XML prettifier which correctly indents XML output to make viewing output easier.

v0.1

Initial Release

Archivist Utility is a desktop application for batch-production of metadata XML files from spreadsheet (tab- or comma- delimited) text files using simple templates and also for the editing and testing of XML metadata files and XSL transformations.

The goal of Archivist Utility is to allow seamless previewing of results from given input. Eventually you should be able to edit files at any stage of the content pipeline and see the results instantly, i.e.

Spreadsheet (TXT/CSV) -> Metadata Template (XML) -> Metadata (XML) -> XSL Transformation (XSL) -> Final Output (XML/HTML)

At the moment this is broken into two windows:

XML Generator: Spreadsheet (TXT/CSV) -> Metadata Template (XML) -> Metadata (XML)

The assumed user for this is an archivist producing XML from spreadsheets wishing to verify that everything is working as expected.

XSL Editor: Metadata (XML) -> XSL Transformation (XSL) -> Final Output (XML/HTML)

The assumed user for this is either an XSL developer wishing to test XSL transforms on different input files, or an archivist trying to determine if XML metadata is being correctly transformed (e.g. into summary XML files or end-user HTML for rendering in a web browser).

It will be possible in future to see the entire workflow in a single window or perhaps to simply link the output of an XML Generator window to the input of an XSL Editor window, so that you can load a spreadsheet, select a row, and see it transformed into XML via template, and then XML or HTML via XSL -- live.

General Features

The program remembers its window layout (i.e. which windows were open, how they were positioned and sized, etc.) between sessions.

XML Templates

The templates are simply XML files with "%"-decorated tags indicating where a column value should be inserted, e.g.

<foo>%bar%</foo> indicates that the value in the column headed "bar" should be inserted in the <foo></foo> tag.

The template engine simply:

Looks for all the tags in the template, replacing them with corresponding column data if found (or nothing otherwise)
If a column is missing from the input data, a warning is flagged*
If a column is used more than once, a warning is flagged*
If a column in the source data is not used, an error is flagged (since this would result in data loss during conversion)*
The resulting XML is then parsed and any empty nodes are stripped (an empty node is defined as having no text content and no non-empty children).
And finally this XML is "prettified" (indented and cleaned up) and displayed*

Note: * these steps will be skipped during batch-processing when implemented.
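The substitute, strip, and prettify steps above can be approximated in Python with the standard library. This is a sketch under stated assumptions, not the original Real Studio code: the names render and strip_empty are hypothetical, the warning and error checks are left out, column values are XML-escaped before insertion, and elements are assumed not to have mixed content.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TAG = re.compile(r"%([^%<>]+)%")


def strip_empty(node: ET.Element) -> bool:
    """Remove empty descendants and report whether node itself is empty.

    Assumes no mixed content: text that follows a child element (its tail)
    is ignored, so it would be lost if that child is removed.
    """
    has_content = bool((node.text or "").strip())
    for child in list(node):
        if strip_empty(child):
            node.remove(child)
        else:
            has_content = True
    return not has_content


def render(template: str, row: dict[str, str]) -> str:
    # Step 1: replace each %column% tag with the column data, or nothing.
    filled = TAG.sub(lambda m: escape(row.get(m.group(1), "")), template)
    # Step 2: parse the result and strip empty nodes (the root is always kept).
    root = ET.fromstring(filled)
    strip_empty(root)
    # Step 3: prettify by indenting (ET.indent needs Python 3.9 or later).
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


template = "<record><title>%title%</title><notes><note>%note%</note></notes></record>"
print(render(template, {"title": "Letter", "note": ""}))
```

With an empty note column, the whole notes branch is dropped because it has no text content and no non-empty children, while title survives.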

