Skip to content

Commit 43564cf

Browse files
committed
fixup! Add SOP for schema changes (DataBiosphere/azul#2884)
Added fast-track `develop` branch PRs
1 parent 75a55ae commit 43564cf

File tree

1 file changed

+138
-91
lines changed

1 file changed

+138
-91
lines changed

docs/dcp2_operating_procedures.rst

+138-91
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,6 @@
1+
.. sectnum::
2+
:depth: 3
3+
:suffix: .
14

25
.. role:: raw-html(raw)
36
:format: html
@@ -12,26 +15,25 @@ DCP/2 Operating Procedures
1215
==========================
1316

1417
|nn| Text in a block quote (like this) is not normative. It was included
15-
for informational purposes only and does not bear on the implementation.
16-
|ne|
18+
for informational purposes only and does not bear on the implementation. |ne|
1719

1820
.. contents::
1921

2022

2123

2224

23-
2425
Changing the HCA metadata schema
2526
================================
2627

27-
Changes to the HCA metadata schema, even seemingly minor ones, can have
28-
significant impact on the individual components of the DCP system. The DCP
29-
consortium therefore uses a review process that ensures all teams in the
30-
consortium approve of a schema change **before** it
28+
The HCA metadata schema defines the shape of metadata documents being
29+
exchanged between components of the DCP/2. It therefore constitutes a
30+
de-facto contract between those components. The DCP consortium reviews schema
31+
changes collectively to ensure that all teams in the consortium approve of a
32+
schema change **before** it
3133

3234
- affects the master branch of the `metadata-schema`_ repository,
3335

34-
- is published on `schema.humancellatlas.org`_, and
36+
- is published on `schema.humancellatlas.org`_, or
3537

3638
- is used by metadata staged for import into the production instance of TDR
3739

@@ -46,111 +48,147 @@ consortium approve of a schema change **before** it
4648
.. _DCP/2 system design: dcp2_system_design.rst
4749

4850

51+
Impact of Schema changes
52+
-------------------------
53+
54+
Changes to the HCA metadata schema, including seemingly minor ones, can have
55+
significant impact on the individual components of the DCP system. Even the
56+
addition of a property can, in some case, require significant work in
57+
metadata consumers like the Data Browser, especially if those consumers are
58+
required to interpret the added property. It would be difficult to explicitly
59+
define the criteria for classifing schema changes by impact. Instead, the
60+
impact is determined individually for every schema change. Changes that are
61+
deemed of low impactare fast-tracked through a lighter-weight review process.
62+
63+
4964
Sizing schema changes
5065
---------------------
5166

52-
A *schema change* is a set of related modifications to one or more of the
53-
JSONSchema documents in the `metadata-schema`_ repository. That set must be
54-
chosen carefully. If a schema change *can* be made smaller by excluding a
55-
particular change to a schema document without sacrificing the desired end-user
56-
value, it *should* be made smaller, so as to not overwhelm the reviewers with a
57-
mix of orthogonal concerns. On the other hand, the set of changes shoudn't be
58-
too small either: all schema document changes that are necessary to achieve the
59-
desired end-user value should be combined in one overall schema change. If
60-
certain end user value is achieved in a series of many small schema changes,
61-
reviewers will likely not be able to see the big picture until they review the
62-
changes at the end of that series at which point it may be too late to reverse
63-
course.
67+
A *schema change* is a carefully seleced set of related modifications to one
68+
or more of the JSONSchema documents in the `metadata-schema`_ repository. If
69+
a schema change *can* be made smaller by excluding a particular change to a
70+
schema document without sacrificing the desired end-user value, it *should*
71+
be made smaller, so as to not overwhelm the reviewers with a mix of
72+
orthogonal concerns.
73+
74+
On the other hand, the set of changes shoudn't be too small either: all schema
75+
document changes that are necessary to achieve the desired end-user value
76+
should be combined in one schema change. If certain end user value is
77+
achieved in a series of many small schema changes, reviewers will likely not
78+
be able to see the big picture until they review the changes at the end of
79+
that series, at which point it may be too late to reverse course.
6480

6581
Some schema changes warrant an update to the `DCP/2 System Design`_
66-
specification. Whether they do or not is established during the review process
67-
of *pull requests* (PRs) against the `metadata-schema` and `schema-test-data`
68-
repositories. Specification updates are done as a PR against the `dcp2`_
69-
repository.
82+
specification. Whether they do or not is established collectively and early
83+
on during the review process.
7084

7185

72-
Schema change authorship
73-
------------------------
86+
Responsiveness to submitter requirements
87+
----------------------------------------
7488

75-
Anyone can author a PR. The PRs for one schema change don't all need to be by
76-
the same author, as long as there are no competing and open PRs in the same
77-
repository and for the same schema change. One person *should* author the PRs
78-
against the `metadata-schema`_ and `dcp2`_ repositories. That person is referred
79-
to as the *schema change author*. The schema change author can delegate
80-
authorship of PRs to other members of the DCP consortium.
8189

90+
Schema changes are often motivated by the need to capture novel information
91+
about the experimental design, phenotypes or sample collection process as
92+
submitted by a participating laboratory, in a form that the current schema is
93+
not able to sufficiently capture. Participating labs often don't have the
94+
resources or tolerance for a prolonged submission process and are likely to
95+
walk away if it takes too long to incorporate their submission into the
96+
DCP/2.
8297

83-
Schema change procedure
98+
The DCP/2 therefore needs to balance the celerity with which it adopts schema
99+
changes on one hand against the effort needed to make all components
100+
compatible with those changes on the other. The earlier a proposed change
101+
faces scrutiny by the entire DCP/2 consortium, the better will the consortium
102+
be able to negotiate the right balance, simply because the change will not
103+
yet have permeated through the system, making it easier to make adujustments
104+
to the proposed schema changes.
105+
106+
107+
Review process overview
84108
-----------------------
85109

86-
The overall procedure for authoring and reviewing a schema change consists of
87-
the following steps:
110+
The process of reviewing schema changes utilizes Github's pull request
111+
(PR) feature. Every schema change begins with a PR against the `develop`
112+
branch of the `metadata-schema`_ repository. This is were the DCP/2
113+
consortium determines
88114

89-
1) If it is considered unnecessary to update the specification and a
90-
specification update wasn't explicitly requested, proceed to step 4.
115+
* the impact of the change
91116

92-
2) Prepare a PR against the `dcp2`_ repository, updating the `DCP/2 System
93-
Design`_.
117+
* whether the schema change is `sized <Sizing schema changes>`_ correctly
94118

95-
3) Initiate a `Pull request review`_ of your PR. Once the PR has been approved
96-
by the DCP, move on to the next step. Don't merge the PR just yet.
119+
* whether the change requires updating the specification in the `dcp2`_ repository
97120

98-
4) Prepare a PR against the ``develop`` branch of the `metadata-schema`_
99-
repository. It is OK to perform this step concurrently with the preceding
100-
steps.
121+
* whether it requires updating the test metadata in the `schema-test-data`_
122+
repository for other components to code unit tests against
101123

102-
5) Initiate a `Pull request review`_ of your PR. Once the PR has been approved
103-
by the DCP, move on to the next step. Don't merge the PR just yet.
124+
If a change requires specification changes, the `metadata-schema`_ PR review is
125+
suspended and a PR is filed against the `dcp2`_ repository and reviewed there.
126+
Once that PR is approved, the `metadata-schema`_ PR is revised and reviewed
127+
again.
104128

105-
Note that the above steps can be performed for several independent schema
106-
changes in parallel.
107-
108-
6) Merge the `metadata-schema`_ repository PR into the ``develop`` branch and
109-
from there into the ``staging`` branch.
129+
Schema changes that are sized correctly and have negligible impact to other
130+
components, are considered adopted by the DCP/2 as soon as the PR against the
131+
`develop` branch is approved. No further reviews of PRs against other branches or
132+
repositories are required.
110133

111-
7) Prepare a PR against the `Test metadata`_ in the `schema-test-data`_
112-
repository. Don't merge the PR just yet.
134+
Schema changes that are sized correctly, and that are of some impact to other
135+
components, may require more involved testing. There are two ways of testing
136+
schema changes:
113137

114-
8) Initiate a `Pull request review`_ of your PR. Once the PR has been approved
115-
by the DCP, move on to the next step. As part of the review, the Azul team
116-
updates their unit tests to validate the updated test metadata. If a test
117-
fails, they either improve the test or the team member reviewing the PR
118-
requests the necessary changes to the test metadata.
138+
A) (Meta)data exhibiting the proposed changes are first staged and then
139+
imported into the `dev` instance of TDR after which downstream components
140+
use the resulting `dev` snapshot to test the code changes that are required
141+
to support the schema change. The Ingest component can only populate
142+
staging areas from its staging instance, so the `develop` PR must be
143+
approved(under the provision that the schema change may need to be amended
144+
by another PR) and the changes are promoted to the staging branch.
119145

120-
9) Prepare a PR against the schema repository, targeting the master branch.
146+
B) (Meta)data exhibiting the proposed schema changes are committed to a feature
147+
branch of the `schema-test-data`_ repository, and a PR is filed for that
148+
branch. See also `Test metadata`_ for details. The `schema-test-data`_
149+
repository can only populated using the staging instance of the Ingest
150+
component, so the PR against the `develop` branch of the `metadata-schema`_
151+
repository must be approved (under the condition that the schema change may
152+
need to be amended in another PR) and the changes are promoted to the
153+
staging branch.
121154

122-
10) Initiate a `Pull request review`_ of your PR. The review should be quick
123-
because reviewers will already have seen the changes while reviewing the PR
124-
against the ``develop`` branch. Reviewers should not request any substantial
125-
changes at this stage unless last minute considerations absolutely require
126-
it. Once the PR has been approved by the DCP, move on to the next step.
155+
The reviewers of a PR can request either testing strategy. Strategy A will
156+
likely be requested for for most changes with some impact on other components.
127157

128-
11) Merge all PRs. You may squash and/or rebase a PR before merging it but that
129-
should not affect the diff between the base and head commits of the PR
130-
branch, except for housekeeping like resolving merge conflicts or adjusting
131-
the host name of schema URLs in PRs against the `schema-test-data`_
132-
repository. No semantic changes may be introduced to a PR after it has been
133-
approved by the DCP. All such housekeeping should be done on the PR branch
134-
prior to merging it.
158+
In some cases a lab submission not only involves a schema change but also
159+
introduces a novel way of linking the metadata enties into subgraphs. In those
160+
cases, reviewers are likely to request testing strategies B. A request for
161+
either strategy should be reasonably motivated. Possible reasons include
135162

163+
- the need to not only review schema changes but also review how those changes
164+
affect actual metadata documents
136165

137-
Test metadata
138-
-------------
166+
- the need to test future, unrelated code changes against the test metadata, so
167+
as to make sure that those future changes don't introduce regressions.
168+
Especially graph changes fall into that category.
139169

140-
Every change to the schemas in the `metadata-schema`_ repository must be
141-
accompanied by a matching change to the test metadata in the `schema-test-data`_
142-
repository. Both changes must be done as PRs following the `Pull request
143-
review`_ process.
170+
Schema changes that are sized incorrectly need to be revised.
144171

145-
The test metadata should exercise all graph shapes in actual use.
146172

147-
The schema references (``describedBy``) in all metadata documents in the main
148-
branch of `schema-test-data`_ repository must match the schemas on the
149-
`master` branch of the `metadata-schema`_ repository.
173+
Schema change authorship
174+
------------------------
150175

176+
Anyone can author a schema change. The PRs for one schema change don't all need
177+
to be by the same author, as long as there are no competing PRs pending in the
178+
same repository, for the same schema change. One person *should* author the PRs
179+
against the `metadata-schema`_ and `dcp2`_ repositories. That person is
180+
referred to as the *schema change author*. The schema change author can
181+
delegate authorship of PRs to other members of the DCP consortium.
151182

152-
Pull request review
153-
-------------------
183+
184+
Pull request reviews
185+
--------------------
186+
187+
Pull requests against the `dcp2`_, the `metadata-schema`_ or the
188+
`schema-test-data` repositories are reviewed and approved in exactly the same
189+
way, provided they are involved in a semantic change to either the DCP/2
190+
specification, a DCP/2 standard operating procedure or a metadata schema
191+
change:
154192

155193
As a PR author
156194

@@ -165,17 +203,26 @@ As a PR author
165203
channel on Slack, @-mentioning requested reviewers that haven't yet
166204
reviewed the PR.
167205

168-
5) The PR is considered *approved by the DCP** if either
206+
5) If either
169207

170-
a) all reviewers approve of the PR or
208+
a) all reviewers approve of the PR without conditions or
171209

172210
b) two weeks have passed since step 1 and there are no binding reviews
173-
requesting changes.
174-
175-
6) Otherwise, if this is a PR against the `metadata-schema`_ or
176-
`schema-test-data`_ repositories and a reviewer requests that you first
177-
update the DCP/2 system design specifcation first, close this PR and
178-
proceed with step 2 in section `Changing the HCA metadata schema`_.
211+
requesting changes,
212+
213+
merge the PR. You are done unless any of the approvals are conditional. A
214+
conditional approval is one that's dependent on successful testing using
215+
one of the strategies mentioned in `Review process overview_`. If testing
216+
fails, a reviewer may request amendmends to your changes. These requests
217+
are made as comments to the original, now merged PR. Open another PR with
218+
the requested amendmends and start at step 1.
219+
220+
6) If a reviewer requests that you first update the DCP/2 system design
221+
specifcation or standard operating procedures first, suspend work on this
222+
PR and open another PR against the `dpc2`_ repository. Start at step 1
223+
there. Once that PR has been approved, update this PR branch with any
224+
follow-up changes resulting from the `dcp2`_ PR review process and resume
225+
this PR at step 1.
179226

180227
7) Otherwise, respond to every review comment either by making a source code
181228
change that you think appropriately addresses the comment or by replying

0 commit comments

Comments
 (0)