-
Notifications
You must be signed in to change notification settings - Fork 0
Relations
xml-to-postgres
can split out repeating elements or element sets into subtables linked back to the main table through the use of foreign keys. To make this work, xml-to-postgres
assumes that the first column you define for a table contains a unique key for the row. If such a key is not available or the row has a composite key, you can have xml-to-postgres
generate unique id numbers by defining a column with the 'seri' option (see configuration options).
Subtables are controlled by the 'file' and 'norm' options that can be set on columns, both of which should contain a filename where the data will be written to. The 'file' option produces a subtable with a one-to-many relation with its parent table, so the subtable will have as its first column the first column of the parent table for use as a foreign key. The 'norm' option (short for normalize) produces a subtable with a many-to-one relation to the parent table, so the parent table will have an auto-generated id for the subtable row acting as a foreign key. With 'norm', the subtable will be deduplicated, so even when an entry appears many times in the source XML, each will only appear once in the subtable data. The 'file' and 'norm' options can also be used together, producing a many-to-many relation. In this case the filename in the 'file' option will contain data for the crosslink table (the foreign keys for both the parent table and the subtable) and the filename in the 'norm' option will contain the deduplicated subtable.
All three possibilities above can optionally be combined with a 'cols' option to specify multiple columns to be included in the subtable. In this case, xml-to-postgres
once again expects the first defined column for each table to be a unique key. As a special case, you can have a many-to-many relation with additional attributes on the crosslink table by putting the 'file' option on one subtable and then the 'norm' option on another subtable below it (see below for an example of this).
These examples extend the basic example used elsewhere in this wiki. It defines a main table 'books' with a unique id as its first column (as required by many of the examples).
Here we've added the 'file' option to the authors column. This allows a book to have multiple authors, which will be written out to the separate file 'authors.dump'. This file will then contain two columns, first the 'id' column of the books table and then the column with the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: authors.dump
Here we've added the 'norm' option to the authors column. With this we're back to a book having only a single author, but the author names will be deduplicated so that each author only appears once in 'authors.dump'. To make this work, xml-to-postgres
will generate a unique id for each author and store that as a foreign key in the 'author' column of the books table. The 'authors.dump' file will again contain two columns, this time the id column will be its own unique id and then again the column with the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
norm: authors.dump
Here we combine these possibilities by adding both the 'file' and the 'norm' options to the author column. This produces a many-to-many relation between books and authors, as it is in reality. The 'book_author.dump' file will then contain two columns with foreign keys, the first column having the id of the book and the second column having the generated id for the author. The 'authors.dump' file will once again contain two columns with first its own unique id and then the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: book_author.dump
norm: authors.dump
To demonstrate subtables with multiple columns we'll need to extend the example XML a little bit, like this:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author role="Main author">
<name>Gambardella, Matthew</name>
<gender>male</gender>
</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author role="Main author">
<name>Ralls, Kim</name>
<gender>female</gender>
</author>
<author role="Editor">
<name>Gambardella, Matthew</name>
</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
Note how the element set for author has changed.
Here we return to the basic one-to-many relation, but this time we have multiple columns about each author. The file
option again causes these columns to be written out to a separate file, along with the foreign key for each book.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: authors.dump
cols:
- name: name
path: /name
- name: gender
path: /gender
This demonstrates the many-to-one relation with multiple columns. Because the example XML contains no unique id (primary key) for the authors, we use the seri
option to generate one. The result is that the generated ids appear in the second ('author') column of the books table and the first column of the authors table. Because one of the books has multiple authors, this configuration will generate a warning that only the first author is included. Normally you'd fix that by making it a many-to-many relation (see below), but this is just an example.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
norm: authors.dump
cols:
- name: id
seri: true
- name: name
path: /name
- name: gender
path: /gender
This example demonstrates a many-to-many relation with a 'role' column which describes the relation between book and author and as such needs to go into the link table. To make this happen we have a 'file' option for author on the first level which also contains the 'role' column. Then there is another level for 'author' which writes out the normalized 'authors.sql' file. Because the author elements in the XML don't include a unique id, we need to provide that through the use of a virtual id column with the 'seri' option to generate a serial.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: books_authors.dump
cols:
- name: role
path: /
attr: role
- name: author
path: /
norm: authors.sql
cols:
- name: id
seri: true
- name: name
path: /name
- name: gender
path: /gender