-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTableReg.tex
574 lines (473 loc) · 23.4 KB
/
TableReg.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\input gitmeta
\usepackage{tikz}
\lstloadlanguages{XML}
\lstset{flexiblecolumns=true,tagstyle=\ttfamily,
showstringspaces=False,basicstyle=\footnotesize}
\definecolor{termcolor}{rgb}{0.6,0.1,0.1}
\iftth
\def\vocterm#1{\emph{\color{termcolor}#1}}
\else
\def\vocterm{\startvocterm\realvocterm}
\def\realvocterm#1{\emph{\color{termcolor}#1}\endvocterm}
\begingroup
\gdef\breakablecolon{:\hskip0pt}
\catcode`\:=\active
\gdef\startvocterm{\begingroup
\catcode`\:=\active\let:=\breakablecolon}
\gdef\endvocterm{\endgroup}
\endgroup
\fi
\title{TableReg: Registering TAP-Queriable Tables Conforming to Standard
Schemas}
% see ivoatexDoc for what group names to use here; use \ivoagroup[IG] for
% interest groups.
\ivoagroup{Registry}
\author[http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/MarkusDemleitner]{Markus Demleitner}
\editor{Markus Demleitner}
\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
An increasingly popular pattern in the Virtual Observatory, pioneered
by Obscore, is to define a schema for one or more tables in a database
and then publish data or metadata by putting conforming tables into
TAP services. This document discusses how such resources
should be represented in the VO Registry to facilitate data discovery,
in particular global, all-VO dataset discovery.
It turns out that the existing registration patterns for Obscore,
RegTAP, and EPN-TAP require some adjustments. The document therefore
also proposes transition strategies for these.
\end{abstract}
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{https://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
Beginning with Obscore 1.0 \citep{2011ivoa.spec.1028T}, an increasing
number of Virtual Observatory standards at their core just define a
table schema -- understood here as a well-defined set of columns within
one or more relations -- and rely on the IVOA's Table Access Protocol
TAP \citep{2019ivoa.spec.0927D} to
let clients actually run queries. Standards of this type include:
\begin{itemize}
\item Obscore \citep{2017ivoa.spec.0509L} -- a table for metadata of
observational data products
\item RegTAP \citep{2019ivoa.spec.1011D} -- a 13-table schema with
metadata of VO resources
\item ObsLocTAP \citep{2021ivoa.spec.0724S} -- a table schema to
communicate plans for observations and metadata of completed
observations
\item EPN-TAP \citep{2022ivoa.spec.0822E} -- a table schema for solar
system data
\end{itemize}
More such standards are currently being developed. Examples include LineTAP
\citep{wd:linetap23} and Obscore extensions for radio and high-energy data.
Of course, resources complying to these standards must be made
discoverable to be useful. Both Obscore and RegTAP have employed the
\xmlel{dataModel} element specifically introduced into TAPRegExt
\citep{2012ivoa.spec.0827D} to declare the presence of tables adhering
to a standard schema in a TAP service.
In practice, however, the \xmlel{dataModel} scheme
has some severe shortcomings:
\begin{enumerate}
\item Lack of resource metadata: In resource records located during
discovery, the global VOResource metadata (title, authors, and perhaps
most importantly coverage in space, time and spectrum) is that of the
TAP service, which the standard tables may share with any number of
other tables. Hence, it will at best be overly general. More often
than not, it will be severely misleading.
\item Unclear relationships: In particular for Obscore -- rather
typically serving several data collections at once --, a serious
problem is that data collection records can only generically say
that they are served by the TAP service (cf.~\citet{2019ivoa.spec.0520D}
for the general scheme of relating data collections and services). From
that, clients cannot deduce whether the data is available through, say,
obscore, or only in some custom table.
\item Suitability: Adherence to a data model simply is not a property of a
TAP service. It is a property of a specific table or schema.
\end{enumerate}
In addition, when standards do not define ``singletons'' (i.e., a TAP
service can only contain one instance of each
complying table, and the tables' names are
fixed) but instead table schemas applicable to arbitrarily-named tables,
the discovery process also needs to yield table names, which is
impossible using \xmlel{dataModel}.
EPN-TAP was the first standard with such a requirement. As a
solution, it switched to using the table's \xmlel{utype} element in
the VODataService \citep{2021ivoa.spec.1102D} \xmlel{tableset} for
discovery.
This mode of discovery was subsequently also employed in ObsLocTAP and
LineTAP. It still has a drawback, though: Clients discovering a service
with a \xmlel{table} (or \xmlel{schema} in the case of RegTAP) with the
bespoke \xmlel{utype} have a hard time determining whether what they
have found is the data collection's record (and hence the global
metadata pertains to the data collection itself) or the record of the
TAP service serving the data collection (in which case the global
metadata pertains to the TAP service and will be essentially unrelated
to the data collection in question).
The registration scheme this note proposes
remedies that by prescribing that for resource records for
standard-compliant tables, the resource type \xmlel{vs:CatalogResource}
(against \xmlel{vs:CatalogService} of the actual TAP service) must be used.
In Sect.~\ref{sect:norms}, we discuss the proposed scheme and give
XML snippets illustrating the various components necessary for efficient
global discovery. One reason to
introduce the scheme is to enable expressing inclusion relationships
between resources for the benefit of clients doing global discovery.
Sect.~\ref{sect:rels} discusses this mechanism in more detail.
Sect.~\ref{sect:transition} addresses the question of how to transition
from what the standards currently say (and the services and clients
implement) to a VO adopting the scheme proposed here. The particular
example of Obscore is treated in Sect.~\ref{sect:obscore} and can
largely serve as boilerplate text for future standards of its kind.
\section{Registering and Discovering Standard Tables}
\label{sect:norms}
In the scheme proposed here, TAP-queriable resources conforming to
standard table schemas will be registered as
\xmlel{vs:CatalogResource}-typed Resources. If they consist of only a
single table or of multiple, independently publishable tables (e.g.,
Obscore, EPN-TAP), an IVOA identifier \citep{2016ivoa.spec.0523D} is
defined for this particular table type. This will refer to a StandardsRegExt
\citep{2012ivoa.spec.0508H} \xmlel{StandardKey}, generally in the
defining standard's resource record. This standard key SHOULD contain a
version tag precise to the minor version. Looking back, the convention
of long version tags may seem questionable, but it is at least
conceivable that clients may want to constrain their discovery to tables
containing additions made during a major version cycle.
For robustness and flexibility, clients should not use the information
on the minor version to infer presence or absence of certain features
(e.g., a column added in a minor update), however, but rather directly
check for that feature's presence, and standards should provide means
for doing so independently of the full version tag.
Hence, the VODataService \xmlel{table} element for a table conforming to
Obscore 1.2 would begin like this:
\begin{lstlisting}[language=XML]
<table>
<name>ivoa.obscore</name>
<title>Obscore Table in the Fictional Data Centre</title>
<description>
This is example metadata for use in the
TableReg specification.
</description>
<utype>ivo://ivoa.net/std/obscore#table-1.2</utype>
...
\end{lstlisting}
where Obscore's StandardsRegExt record contains a fragment:
\begin{lstlisting}[language=XML]
<key>
<name>table-1.2</name>
<description>The data model for a table conforming to version 1.2
of this specification. This implies a set of nn mandatory columns,
as well as the table's name, ivoa.obscore.
</description>
</key>
\end{lstlisting}
Standards defining full schemas, i.e., sets of interconnected tables
that only make sense together -- at the moment, RegTAP is our only
example -- will similarly define a StandardKey to use with a schema
element. The RegTAP registry record currently has:
\begin{lstlisting}[language=XML]
<key>
<name>1.1</name>
<description>The data model for the tables making up the relational
registriy in version 1.1. This key is used to locate TAP services
implementing RR in TAPRegExt dataModel or VODataService schema/@utype
elements.
</description>
</key>
\end{lstlisting}
Hence, the tableset of the record for the Heidelberg RegTAP service
\nolinkurl{ivo://org.gavo.dc/rr/q/create} contains:
\begin{lstlisting}[language=XML]
<schema>
<name>rr</name>
<title>GAVO RegTAP Service</title>
<description>
Tables containing the information in the IVOA Registry[...]
<utype>ivo://ivoa.net/std/RegTAP#1.1</utype>
<table>...
\end{lstlisting}
It is not recommended to follow the RegTAP's example of omitting an
indicator of what is referred to in the standard key's name. Hence,
prefer, for instance, \verb|schema-1.1| to \verb|1.1|; it is always
conceivable that additional versioned entities may require standard
keys.
Let us stress that for standard discovery, the minor
version must be ignored. Clients should be written such that if they
work for any 1.x version, they work for all 1.x versions, except of
course where compelling reasons exist to require features not present in
earlier minor versions.
\section{Relationships Between Tables and Other Resources}
\label{sect:rels}
An important reason to enumerate resources conforming to a schema is
global discovery of whatever is described in the table; in the case of
Obscore, this would be observational datasets.
\begin{figure}
\tikzfigure{fig-rel-sketch}
\caption{Benefits of making relationships between resources explicit.
The nodes in this graph correspond to services, the arrows point from a
service to a collective service also serving the data served by the
first service. By evaluating the relationships, a
client can deduce that the data published through the seven services
can be queried by running one Obscore query on the TOFU data
centre.}
\label{fig:rel-sketch}
\end{figure}
Furthermore, an important reason to define
registry records for such resources is
that data collections published using multiple standards or through
multiple services can
machine-readably declare that querying one such resource is enough to
cover the entire data available. In that way, clients
doing global discovery can skip services publishing data they already
queried using other services. Consider Fig.~\ref{fig:rel-sketch} for an
illustration of the dramatic savings enabled by making such
relationships explicit.
In VOResource, relationships are declared between registry resources
using the \xmlel{relationship} element, containing a
\xmlel{relationshipType} and one or more \xmlel{relatedResource}-s. For
relationships between data collections and services making them
queriable, the Endorsed Note on Discovering Data Collections
\citep{2019ivoa.spec.0520D} prescribes \vocterm{IsServedBy} from the IVOA
relationship type
vocabulary\footnote{\url{http://www.ivoa.net/rdf/voresoure/relationship_type}}.
We argue that this term can be re-used here, even though the
relationship's label might appear somewhat misplaced when a
CatalogService-typed record declares that it is served by a
CatalogResource-typed record; that is what happens if a SIA-published
data collection says it is also present in an obscore table.
More specifically, where data contained in or published through resource
$A$ is also contained in or published through a more general (in the sense
of: making other data collections queriable, too) resource $B$, the
resource record of record $A$ should declare a relationship to $B$ with
a \xmlel{relationshipType} of \vocterm{IsServedBy}.
Resources must not declare circular \vocterm{IsServedBy} relationships.
\section{Transitioning to a TableReg World}
\label{sect:transition}
There are already several standards registering TAP-published tables in
one way or another. For standards currently in Working Draft or later,
we give transition plans, both in terms of the standards process and the
actual records in the VO Registry, here. Numbers in the following
sections are for March 2024.
\subsection{Obscore}
To enable viable global dataset discovery, reparing the registration
pattern is most urgent for Obscore. Against that, the current,
\xmlel{dataModel}-based registration pattern is furthest away from what
is proposed here. Hence, it is probably unavoidable to issue a new
minor version of Obscore.
In order to provide a smooth transition, the current pattern of using an
ivoid in the embedding TAP service's \xmlel{dataModel} field in the TAP
capability must be retained.
In addition, a new standard for Obscore 1.2 will contain an adapted
version of the material in sections~\ref{sect:registering-obscore}
and~\ref{sect:finding-obscore}.
To enable viable global datatset discovery as soon as possible, once
sufficient consensus within the Registry and Data Model WGs has been
found, the Registry WG should contact the operators of
the obscore services currently active in order to advise them on
creating an additional Obscore registry record with the required
declarations.
\subsection{EPN-TAP}
Discovery of EPN-TAP 2.0 \citep{2022ivoa.spec.0822E} resources is
already based on giving a table utype in a tableset. It says:
\begin{quotation}
\noindent Normally, however, the
tableset will be contained in a VO\-Data\-Ser\-vice \xmlel{CatalogService}
record with a TAP capability, and this capability will be an auxiliary
capability as per DDC \citep{2019ivoa.spec.0520D}. For one-table
services, a full TAPRegExt \citep{2012ivoa.spec.0827D} capability is also
allowed.
\end{quotation}
Against the present proposal, this admits pure-TAP services and does not
require the use of \xmlel{vs:CatalogResource}-typed resource records
(although in practice the wide majority of EPN-TAP publishers chose
that resource type despite the recommendation in the standard, which
predates the availability of \xmlel{vs:CatalogResource}).
Leaving open the resource type complicates
the discovery pattern significantly, since clients have
to filter out duplicates when a table is present in both the TAP
service's and the resource's tableset. We hence propose to require the
use of \xmlel{vs:CatalogResource} in the future,
where we consider an adivsory erratum sufficient;
in the end, updates to the existing resource records need to be done in
a cooperation between the Registry WG and the data providers anyway.
In February 2024, there were 488 resources with tables with the EPN-TAP
table utype in their tableset. Of these,
\begin{itemize}
\item 475 have the epncore utype, 13 (from four different authorities)
the legacy vopdc utype (which is probably a good measure for how many
resource records may be hard to update)
\item 245 have a TAP auxiliary standard id, 243 a full TAP id.
\item 249 are of the type \xmlel{vs:CatalogService}, 239 of type
\xmlel{vs:Catalog\-Re\-source} (in a perfect system, there are zero or one
\xmlel{vs:CatalogService} records per \xmlel{vs:CatalogResource} record;
zero would be when the TAP services comes without a tableset).
\end{itemize}
Investigating more closely, it turns out that only four servers (counted by
access URL) chose the \xmlel{vs:CatalogService} resource type for
their table resource record, and only one TAP server is missing from the
registry. It seems entriely possible to rectify these problems on short
notice.
\subsection{ObsLocTAP}
ObsLocTAP \citep{2021ivoa.spec.0724S} already prescribes the pattern
proposed here. The discovery query given is missing a constraint on the
resource type, though. We believe this is easily repaired through an
erratum.
\subsection{RegTAP}
In its section~7, RegTAP~1.1 \citep{2019ivoa.spec.1011D} already offers
two registration patterns. One is based on \xmlel{dataModel} in the TAP
capabilities. In the next minor version, this mechanism will be
deprecated. Instead, the alternative scheme, based on
\xmlel{vg:Registry}-typed records, needs to be more fully specified and
advertised as the primary means of locating RegTAP searchable
registries. Also, since RegTAP is a complete schema, the utype to
search will sit on the \xmlel{schema} element.
Against the pattern given in sect.~\ref{sect:finding-obscore}, we would hence
give
\begin{lstlisting}[language=SQL]
SELECT DISTINCT table_name, access_url
FROM rr.res_table
NATURAL JOIN rr.capability
NATURAL JOIN rr.interface
NATURAL JOIN rr.resource
WHERE
table_utype LIKE 'ivo://ivoa.net/std/regtap#table-1.%'
AND standard_id LIKE 'ivo://ivoa.net/std/tap%'
AND intf_role='std'
AND res_type='vg:registry'
\end{lstlisting}
as the canonical discovery pattern.
Keeping \xmlel{vg:Registry} as resource type rather than moving to
\xmlel{vs:Ca\-ta\-log\-Resource} as proposed here is convenient, as it allows
the declaration of parameters of any OAI-PMH service that may accompany
the RegTAP endpoint. Accepting this slight inconsistency also seems
justified since registry discovery plays a minor role in user code.
\xmlel{vg:Registry} also does not admit the declaration of
\xmlel{coverage} as per VODataService 1.2. Given that
space-time coverage is not a useful concept at least for full
registries, and the discovery of partial searchable registries (which
conceivably could be profit from coverage declarations)
has not been a relevant use case so
far, this seems an acceptable deficit.
At this point, there is no particular urgency for this change;
given the (theoretical) equivalence of the RegTAP
services, it is unlikely richer metadata on the registry records will be
required any time soon.
\section{Obscore Tables in the Registry}
\label{sect:obscore}
This section is intended both as the blueprint for what Obscore 1.2
should say (in addition to the transitional \xmlel{dataModel}
declaration in the capabilities) and as a template on which to base
the Registry sections of similar standards.
\subsection{Registering Obscore Tables}
\label{sect:registering-obscore}
Obscore tables are registered using VODataService
\citep{2010ivoa.spec.1202P} tablesets, where the table utype is set to
$$\hbox{\verb|ivo://ivoa.net/std/obscore#table-1.2|}.$$
The tableset is contained in a resource record of the VODataService type
\xmlel{vs:CatalogResource}
with a TAP capability, where this capability is an auxiliary
capability as per DDC \citep{2019ivoa.spec.0520D}. The TAP service
serving the table must also be registered, and an \vocterm{IsServedBy}
relationship must be declared from the Obscore record to the TAP record.
When registry records for data collections published through the Obscore
table are also published -- and publishers are strongly urged to do
that --, an \vocterm{IsServedBy}
relationship must also be declared from the individual collections'
records to the Obscore record.
An example for a registry record in VOResource comes with
this document\footnote{\auxiliaryurl{example-record.xml}}.
The noteworthy points in the record are:
\begin{itemize}
\item A \xmlel{relationship} element referencing the main TAP service
through which the service is queriable as per DDC:
\begin{lstlisting}[language=XML,basicstyle=\footnotesize]
<relationship>
<relationshipType>IsServedBy</relationshipType>
<relatedResource ivo-id="ivo://org.gavo.dc/tap"
>GAVO Data Centre TAP service</relatedResource>
</relationship>
\end{lstlisting}
\item The declaration for the auxiliary capability, including the access
URL so clients do not need to follow the relationship just declared if
all they need is the access URL:
\begin{lstlisting}[language=XML,basicstyle=\footnotesize]
<capability standardID="ivo://ivoa.net/std/TAP#aux">
<interface role="std" version="1.1" xsi:type="vs:ParamHTTP">
<accessURL use="base">http://dc.zah.uni-heidelberg.de/tap</accessURL>
</interface>
</capability>
\end{lstlisting}
\item Most importantly, the declaration of the table utype that lets
clients discover that this particular table contains Obscore data:
\begin{lstlisting}[language=XML,basicstyle=\footnotesize]
<table>
<name>ivoa.obscore</name>
<title>GAVO Data Centre Obscore Table</title>
<description>The IVOA-defined obscore table, containing generic
metadata for datasets within this data centre.</description>
<utype>ivo://ivoa.net/std/obscore#table-1.1</utype>
...
</table>
\end{lstlisting}
\end{itemize}
\subsection{Discovering Obscore Tables}
\label{sect:finding-obscore}
Obscore clients in general are interested in TAP endpoints serving Obscore
tables as well as the table's metadata, such as its coverage in space,
time, and spectrum. By the registration pattern given in
Sect.~\ref{sect:registering-obscore}, this translates into resources with TAP
(auxiliary) capabilities that have a standard key for version 1 Obscore
in a table utype; this will normally also match the TAP service record
itself (as it generally also gives the tableset). Therefore, an
additional constraint on the record type is introduced.
Translated into RegTAP \citep{2019ivoa.spec.1011D}, the following query
would return TAP access URLs and the table names:
\begin{lstlisting}[language=SQL]
SELECT DISTINCT table_name, access_url
FROM rr.res_table
NATURAL JOIN rr.capability
NATURAL JOIN rr.interface
NATURAL JOIN rr.resource
WHERE
table_utype LIKE 'ivo://ivoa.net/std/obscore#table-1.%'
AND standard_id LIKE 'ivo://ivoa.net/std/tap%'
AND intf_role='std'
AND res_type='vs:catalogresource'
\end{lstlisting}
The regular expression in the utype match makes sure minor version
increments do not prevent resource discovery; by IVOA versioning rules,
all Obscore tables of minor version 1 can be operated by all Obscore
clients of version 1. We do not constrain the version of the TAP
service. Clients may want to adapt the TAP discovery pattern to match
their specific needs.
Clients not prepared to negotiate authentication should also add a
constraint \verb|AND authenticated_only=0| to avoid unnecessary attempts
to access protected resources.
Clients can add additional constraints (e.g., on publishers, coverage,
or, in VODataService 1.3 or later, data product types) to this basic
query as usual in VOResource, i.e., by \verb|NATURAL JOIN|-ing the
tables containing the columns and adding additional \verb|WHERE|
clauses. Constraints against the embedding TAP service, however,
require a more complex join through the \verb|rr.relationship| table;
this is not expected to be a common use case.
Incidentally, for Obscore 1.2, \verb|table_name| in this query will always be
\verb|ivoa.obscore|. This item hence is only relevant for standards
that allow for flexible table names.
\appendix
\section{Changes from Previous Versions}
No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
% NOTE: IVOA recommendations must be cited from docrepo rather than ivoabib
% (REC entries there are for legacy documents only)
\bibliography{ivoatex/ivoabib,ivoatex/docrepo,local}
\end{document}