Skip to content

Commit b7e014d

Browse files
committed
Add SOP for schema changes (DataBiosphere/azul#2884, PR #14)
2 parents d08fcfe + 482c6a9 commit b7e014d

File tree

2 files changed

+315
-3
lines changed

2 files changed

+315
-3
lines changed

docs/dcp2_operating_procedures.rst

+308
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,308 @@
1+
.. sectnum::
2+
:depth: 3
3+
:suffix: .
4+
5+
.. role:: raw-html(raw)
6+
:format: html
7+
.. |nn| replace:: :raw-html:`<blockquote>`
8+
.. |ne| replace:: :raw-html:`</blockquote>`
9+
10+
11+
12+
13+
==========================
14+
DCP/2 Operating Procedures
15+
==========================
16+
17+
|nn| Text in a block quote (like this) is not normative. It was included
18+
for informational purposes only and does not bear on the implementation. |ne|
19+
20+
.. contents::
21+
22+
23+
24+
25+
Changing the HCA metadata schema
26+
================================
27+
28+
The HCA metadata schema defines the shape of metadata documents being
29+
exchanged between components of the DCP/2. It therefore constitutes a
30+
de-facto contract between those components. The DCP consortium reviews schema
31+
changes collectively to ensure that all teams in the consortium approve of a
32+
schema change **before** it
33+
34+
- affects the master branch of the `metadata-schema`_ repository,
35+
36+
- is published on `schema.humancellatlas.org`_, or
37+
38+
- is used by metadata staged for import into the production instance of TDR
39+
40+
.. _schema.humancellatlas.org: https://schema.humancellatlas.org/a
41+
42+
.. _metadata-schema: https://github.com/HumanCellAtlas/metadata-schema
43+
44+
.. _schema-test-data: https://github.com/HumanCellAtlas/schema-test-data
45+
46+
.. _dcp2: https://github.com/HumanCellAtlas/dcp2
47+
48+
.. _DCP/2 system design: dcp2_system_design.rst
49+
50+
51+
Impact of Schema changes
52+
-------------------------
53+
54+
Changes to the HCA metadata schema, including seemingly minor ones, can have
55+
significant impact on the individual components of the DCP system. Even the
56+
addition of a property can, in some case, require significant work in
57+
metadata consumers like the Data Browser, especially if those consumers are
58+
required to interpret the added property. It would be difficult to explicitly
59+
define the criteria for classifing schema changes by impact. Instead, the
60+
impact is determined individually for every schema change. Changes that are
61+
deemed of low impact are fast-tracked through a lighter-weight review process.
62+
63+
64+
Sizing schema changes
65+
---------------------
66+
67+
A *schema change* is a carefully selected set of related modifications to one
68+
or more of the JSONSchema documents in the `metadata-schema`_ repository. If
69+
a schema change *can* be made smaller by excluding a particular change to a
70+
schema document without sacrificing the desired end-user value, it *should*
71+
be made smaller, so as to not overwhelm the reviewers with a mix of
72+
orthogonal concerns.
73+
74+
On the other hand, the set of changes shoudn't be too small either: all schema
75+
document changes that are necessary to achieve the desired end-user value
76+
should be combined in one schema change. If certain end user value is
77+
achieved in a series of many small schema changes, reviewers will likely not
78+
be able to see the big picture until they review the changes at the end of
79+
that series, at which point it may be too late to reverse course.
80+
81+
Some schema changes warrant an update to the `DCP/2 System Design`_
82+
specification. Whether they do or not is established collectively and early
83+
on during the review process.
84+
85+
86+
Responsiveness to submitter requirements
87+
----------------------------------------
88+
89+
90+
Schema changes are often motivated by the need to capture novel information
91+
about the experimental design, phenotypes or sample collection process as
92+
submitted by a participating laboratory, in a form that the current schema is
93+
not able to sufficiently capture. Participating labs often don't have the
94+
resources or tolerance for a prolonged submission process and are likely to
95+
walk away if it takes too long to incorporate their submission into the
96+
DCP/2.
97+
98+
The DCP/2 therefore needs to balance the celerity with which it adopts schema
99+
changes on one hand against the effort needed to make all components
100+
compatible with those changes on the other. The earlier a proposed change
101+
faces scrutiny by the entire DCP/2 consortium, the better will the consortium
102+
be able to negotiate the right balance, simply because the change will not
103+
yet have permeated through the system, making it easier to make adujustments
104+
to the proposed schema changes.
105+
106+
107+
Schema change authorship
108+
------------------------
109+
110+
Any member of the consortium can *sponsor* a schema change. The sponsor can
111+
delegate authorship of the corresponding PRs to other members of the DCP/2
112+
consortium. While the PRs that comprise a particular schema change don't all
113+
need to have the same author, the sponsor must ensure that all those PRs are
114+
semantically consistent with each other.
115+
116+
117+
118+
Review process overview
119+
-----------------------
120+
121+
The process of reviewing schema changes utilizes Github's pull request (PR)
122+
feature. Every schema change review begins with a PR against the `staging`
123+
branch of the `metadata-schema`_ repository. The `develop` and `integration`
124+
branches of that repository are not used anymore. Reviewers first determine
125+
126+
* the overall impact of the change
127+
128+
* whether the schema change is `sized <Sizing schema changes>`_ correctly
129+
130+
* whether the change requires updating the specification in the `dcp2`_
131+
repository
132+
133+
* whether it requires updating the test metadata in the `schema-test-data`_
134+
repository for other components to code unit tests against
135+
136+
If a change requires specification changes, the `metadata-schema`_ PR review is
137+
suspended and a PR is filed against the `dcp2`_ repository and reviewed there.
138+
Once that PR is approved, the `metadata-schema`_ PR is revised to match the
139+
approved specification after which it is reviewed again.
140+
141+
Schema changes that are sized correctly and have negligible impact to other
142+
components, are considered adopted by the DCP/2 as soon as the PR against the
143+
`metadata-schema`_ repository is approved. No further reviews of PRs against
144+
other branches or repositories are required.
145+
146+
Schema changes that are sized correctly, and that are of some impact to other
147+
components, may require more involved testing. There are two ways of testing
148+
schema changes:
149+
150+
A) (Meta)data exhibiting the proposed schema changes are first staged and then
151+
imported into the `dev` instance of TDR after which downstream components
152+
use the resulting `dev` snapshot to test the code changes that are required
153+
to support the schema change. The Ingest component can only populate
154+
staging areas from its staging instance, so the `metadata-schema`_ PR must
155+
first be approved provisionally [#provisionally]_, merged and deployed to
156+
the ``staging`` deployment of the Ingest component.
157+
158+
B) (Meta)data exhibiting the proposed schema changes are committed to a feature
159+
branch of the `schema-test-data`_ repository, and a PR is filed for that
160+
branch. The `schema-test-data`_ repository can only populated using the
161+
staging instance of the Ingest component, so the `metadata-schema`_ PR must
162+
first be approved provisionally [#provisionally]_, merged and deployed to
163+
the ``staging`` deployment of the Ingest component.
164+
165+
.. [#provisionally]
166+
the provision being that the schema change may need to be amended in a
167+
follow-up PR
168+
169+
The reviewers of a PR can request either testing strategy.
170+
171+
In some cases a lab submission not only involves a schema change but also
172+
introduces a novel way of linking the metadata enties into subgraphs. In those
173+
cases, reviewers are likely to request testing strategy B. A request for either
174+
strategy should be reasonably motivated. Possible reasons include
175+
176+
- the need to not only review schema changes but also review how those changes
177+
affect actual metadata documents
178+
179+
- the need to test future, unrelated code changes against the test metadata, so
180+
as to make sure that those future changes don't introduce regressions.
181+
Especially graph changes fall into that category.
182+
183+
Schema changes that are sized incorrectly need to be revised.
184+
185+
186+
Pull request reviews
187+
--------------------
188+
189+
Pull requests against the `dcp2`_, the `metadata-schema`_ or the
190+
`schema-test-data`_ repositories are reviewed and approved in exactly the same
191+
way, provided they are involved in a semantic change to either the DCP/2
192+
specification, a DCP/2 standard operating procedure or a metadata schema
193+
change:
194+
195+
As a PR author
196+
197+
1) Announce the PR on the ``#dcp2`` channel on Slack, @-mentioning all
198+
`Designated reviewers`_.
199+
200+
2) Request a review from all `Designated reviewers`_.
201+
202+
3) Wait one week.
203+
204+
4) If this is the first review cycle, remind about the PR on the ``#dcp2``
205+
channel on Slack, @-mentioning requested reviewers that haven't yet
206+
reviewed the PR.
207+
208+
5) If either
209+
210+
a) all reviewers approve of the PR without conditions or
211+
212+
b) two weeks have passed since step 1 and there are no binding reviews
213+
requesting changes,
214+
215+
merge the PR. You are done unless any of the approvals are conditional. A
216+
conditional approval is one that's dependent on successful testing using
217+
one of the strategies mentioned in `Review process overview`_. The
218+
condition has to be specified in the summary comment on approval. If
219+
testing fails, a reviewer may request amendmends to your changes. These
220+
requests are made as comments to the original, now merged PR. Open
221+
another PR with the requested amendmends and start at step 1.
222+
223+
6) If a reviewer requests that you first update the DCP/2 system design
224+
specifcation or standard operating procedures first, suspend work on this
225+
PR and open another PR against the `dcp2`_ repository. Start at step 1
226+
there. Once that PR has been approved, update this PR branch with any
227+
follow-up changes resulting from the `dcp2`_ PR review process and resume
228+
this PR at step 1.
229+
230+
7) Otherwise, respond to every review comment either by making a source code
231+
change that you think appropriately addresses the comment or by replying
232+
with a comment explaining your opposition to the change or asking for more
233+
information. When addressing a set of related review comments with a source
234+
code change, try to commit that change separately from those addressing
235+
other related sets of review comments. Avoid constantly squashing the PR
236+
branch unless doing so helps reviewers to better understand the branch
237+
history.
238+
239+
8) Start another cycle by requesting a review from the reviewers currently on
240+
the PR, even those that already approved the PR or just commented on it.
241+
Proceed to step 3.
242+
243+
Steps 1, 2 and 8 must be done on a weekday.
244+
245+
As a PR reviewer
246+
247+
1) Respond to review requests within one week of the request or
248+
249+
2) Name a delegate within one business day of the request. Do so by cancelling
250+
the request for review by you and requesting a review from the delegate
251+
instead. Make a normal (non-review) comment on the PR, announcing the
252+
delegation.
253+
254+
3) Review the PR in good faith. Ask specific questions or make specific
255+
suggestions. If you can't find anything objectionable in the PR, approve the
256+
PR as soon as possible. Don't get lost in details.
257+
258+
4) If you accept the PR author's response to a comment made by the reviewer,
259+
mark the comment thread as resolved on Github.
260+
261+
.. _dismissed: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/dismissing-a-pull-request-review
262+
263+
Once a designated reviewer delegated the review, none of the designated
264+
reviewer's comments or reviews are binding. It's acceptable to direct the
265+
delegate but that should be kept to a minimum and ideally be done outside of
266+
Github e.g., Slack or E-Mail.
267+
268+
A delegate reviewer can only delegate back to the designated reviewer that named
269+
them, and only after the PR author rerequested a review from the delegate.
270+
271+
Only comments and reviews by mediators, designated reviewers or their delegate
272+
are binding. Other reviews should be `dismissed`_ by the author. Other comments
273+
can be ignored.
274+
275+
Reviews by designated reviewers or their delegate can be `dismissed`_ by the
276+
`Mediators`_.
277+
278+
At no time during the life-time of the PR can there be more reviewers listed on
279+
the PR than there are entries in the lists of `Designated reviewers`_ and
280+
`Mediators`_.
281+
282+
283+
Discourse
284+
~~~~~~~~~
285+
286+
Review comments and replies by authors should be kept brief. Typically, review
287+
comments made during a review cycle are addressed by the author and marked
288+
resolved by the reviewer during the next cycle. Any disagreement that cannot be
289+
resolved in two cycles should be discussed in a conference call to which the the
290+
PR author invites the `Mediators`_ and all reviewers currently on the PR.
291+
292+
293+
Designated reviewers
294+
~~~~~~~~~~~~~~~~~~~~
295+
296+
- Enrique aka ``@ESapenaVentura`` (EBI)
297+
- Amnon aka ``@amnonkhen`` (EBI)
298+
- Ruchi aka ``@ruchim`` (Broad)
299+
- Kylee aka ``@kbergin`` (Broad)
300+
- Andrew aka ``@aherbst-broad`` (Broad)
301+
- Hannes aka ``@hannes-ucsc`` (UCSC)
302+
- Dave aka ``@NoopDog`` (UCSC)
303+
304+
305+
Mediators
306+
~~~~~~~~~
307+
308+
- Kathleen (Broad)

docs/dcp2_system_design.rst

+7-3
Original file line numberDiff line numberDiff line change
@@ -40,9 +40,13 @@ EBI Ingest (the primary channel for incorporating projects into the DCP/2), an
4040
adapter for processing analysis (meta)data from Terra workspaces, and adapters
4141
for high-level matrix data from a range of sources.
4242

43-
All metadata is in JSON format and complies with the HCA Metadata Schema.
44-
Aside from minor schema changes that were necessary for processing the staging
45-
areas, the evolution of the schema is currently on hold.
43+
All metadata is in JSON format and complies with the `HCA Metadata Schema`_.
44+
Changes to that schema are made according to standard `DCP/2 operating
45+
procedures`_.
46+
47+
.. _HCA Metadata Schema: https://github.com/HumanCellAtlas/metadata-schema
48+
49+
.. _DCP/2 operating procedures: dcp2_operating_procedures.rst
4650

4751
The DCP/2 only contains public (meta)data (not controlled access).
4852

0 commit comments

Comments
 (0)