Database schema optimizations #4731

SergeyGaluzo · 2024-11-19T02:45:58Z

Optimizations:

1. History/current resources split, with removal of unneeded indexes.
2. Raw resources stored in a separate table.
3. Resource id integer mapping.
4. Enabled writes/reads of raw resources to/from ADLS.
5. Moved string references to the StringReferenceSearchParams table and extended id length to 768 B.
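
For readers who want a concrete picture of the history/current split and the separate raw resource table, here is a minimal sketch in Python with SQLite. It is not the T-SQL in this PR: the table and column names (CurrentResources, HistoryResources, RawResources, ResourceSurrogateId) and the `upsert` helper are assumptions chosen for illustration, and the real schema has many more columns, indexes, and constraints.

```python
import sqlite3

# Hypothetical, heavily simplified tables; names and columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RawResources (
        ResourceSurrogateId INTEGER PRIMARY KEY,
        RawResource         BLOB NOT NULL
    );
    CREATE TABLE CurrentResources (
        ResourceTypeId      INTEGER NOT NULL,
        ResourceId          TEXT    NOT NULL,
        Version             INTEGER NOT NULL,
        ResourceSurrogateId INTEGER NOT NULL,
        PRIMARY KEY (ResourceTypeId, ResourceId)
    );
    CREATE TABLE HistoryResources (
        ResourceTypeId      INTEGER NOT NULL,
        ResourceId          TEXT    NOT NULL,
        Version             INTEGER NOT NULL,
        ResourceSurrogateId INTEGER NOT NULL,
        PRIMARY KEY (ResourceTypeId, ResourceId, Version)
    );
""")


def upsert(conn, type_id, resource_id, surrogate_id, raw):
    """Store a new version of a resource and return its version number."""
    key = (type_id, resource_id)
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT Version, ResourceSurrogateId FROM CurrentResources "
            "WHERE ResourceTypeId = ? AND ResourceId = ?", key).fetchone()
        version = 1
        if row is not None:
            old_version, old_surrogate_id = row
            # Only the small metadata row moves to history; the old raw
            # payload stays where it is in RawResources.
            conn.execute("INSERT INTO HistoryResources VALUES (?, ?, ?, ?)",
                         key + (old_version, old_surrogate_id))
            conn.execute("DELETE FROM CurrentResources "
                         "WHERE ResourceTypeId = ? AND ResourceId = ?", key)
            version = old_version + 1
        conn.execute("INSERT INTO RawResources VALUES (?, ?)",
                     (surrogate_id, raw))
        conn.execute("INSERT INTO CurrentResources VALUES (?, ?, ?, ?)",
                     key + (version, surrogate_id))
    return version


upsert(conn, 1, "patient-1", 100, b'{"v": 1}')
upsert(conn, 1, "patient-1", 101, b'{"v": 2}')
```

After the second call, the current table holds only version 2, the history table holds version 1, and both raw payloads sit untouched in RawResources. In this sketch the table for current resources needs no version column in its key, which hints at why the split can reduce index count.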

Deployment depends on a small schema change that creates the CurrentResource view (#4755).

https://microsofthealth.visualstudio.com/Health/_workitems/edit/135369

Question:
Can this PR be cut into 2 smaller ones?
Answer:
No, not if we want to avoid rewriting the Resource table twice. Here is why. There are 2 main schema optimizations: #1 moving historical resources into a separate table, which reduces the number of indexes and hence saves space, and #2 resource id integer mapping. Though these optimizations are independent, both require changes to the way resources are stored. #1 requires adding not only a history table but also an extra table to store raw resources (this reduces update latency, as the old raw resource stays in the same place). #2 requires a new ResourceIdIntMap table and also changes the main resource row, where the ResourceId string is replaced by its integer-mapped value, ResourceIdInt.
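
To illustrate optimization #2, the following is a rough sketch of a string-to-integer id map, again in Python with SQLite rather than the PR's T-SQL. Only the names ResourceIdIntMap, ResourceId, and ResourceIdInt come from the description above; the ResourceTypeId column, the helper functions, and the `MAX + 1` allocation are assumptions made for the example and are not how the PR allocates integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed layout of the mapping table.
    CREATE TABLE ResourceIdIntMap (
        ResourceTypeId INTEGER NOT NULL,
        ResourceIdInt  INTEGER NOT NULL,
        ResourceId     TEXT    NOT NULL,
        PRIMARY KEY (ResourceTypeId, ResourceIdInt),
        UNIQUE (ResourceTypeId, ResourceId)
    );
""")


def get_or_create_id_int(conn, type_id, resource_id):
    """Return the integer mapped to a string id, allocating one if needed."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT ResourceIdInt FROM ResourceIdIntMap "
            "WHERE ResourceTypeId = ? AND ResourceId = ?",
            (type_id, resource_id)).fetchone()
        if row is not None:
            return row[0]
        # Illustrative allocation only (assumption): next integer per type.
        next_int = conn.execute(
            "SELECT COALESCE(MAX(ResourceIdInt), 0) + 1 FROM ResourceIdIntMap "
            "WHERE ResourceTypeId = ?", (type_id,)).fetchone()[0]
        conn.execute("INSERT INTO ResourceIdIntMap VALUES (?, ?, ?)",
                     (type_id, next_int, resource_id))
        return next_int


def resolve_id(conn, type_id, id_int):
    """Translate a mapped integer back to the original string id."""
    row = conn.execute(
        "SELECT ResourceId FROM ResourceIdIntMap "
        "WHERE ResourceTypeId = ? AND ResourceIdInt = ?",
        (type_id, id_int)).fetchone()
    return None if row is None else row[0]


first = get_or_create_id_int(conn, 1, "0f8a4c2e-long-string-id")
again = get_or_create_id_int(conn, 1, "0f8a4c2e-long-string-id")
second = get_or_create_id_int(conn, 1, "another-long-string-id")
```

The point of the sketch is that resource rows and their indexes can carry a compact integer instead of a long string, while the string is stored once in the map.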

Question:
This PR has ~13K lines changed. Why is it that big?
Answer:
The FHIR solution does not use database projects:
#1 Currently, even a single-line change in a single stored procedure results in a full schema SQL script being added to the PR. This alone is 6.5K lines of code.
#2 Stored procedures that are modified, even slightly, have to be added in full to diff.sql (~2K lines).
Other reasons:
#1 Data migration requires intermediate stored procedures and views to access the old and new schemas. This almost doubles the total changed lines in diff.sql.
#2 FHIR uses a generator for database wrapper classes. This generator does not work with views, so, to bypass this limitation, the SQL scripts contain "artificial" table statements to generate the wrappers; these tables are immediately dropped and "replaced" by view statements (same names, so the table wrapper classes can be reused for views). This contributes significantly to the changed lines.
#3 There are ~300 lines of code specific to reading/writing raw resources from/to ADLS. This code is dormant until ADLS is enabled.
#4 There are ~936 lines of test code changes.
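
The table-then-view workaround from item #2 under "Other reasons" can be sketched as follows, using Python with SQLite as a stand-in for SQL Server. The column set, the underlying table names, and the body of the view are assumptions for illustration; reading column metadata with `PRAGMA table_info` merely plays the role of the wrapper generator here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed underlying tables after the history/current split.
    CREATE TABLE CurrentResources (ResourceId TEXT, Version INTEGER);
    CREATE TABLE HistoryResources (ResourceId TEXT, Version INTEGER);

    -- Step 1: an "artificial" table that exists only so that a
    -- table-based generator can read its column list.
    CREATE TABLE Resource (ResourceId TEXT, Version INTEGER, IsHistory INTEGER);
""")

# Stand-in for the wrapper generator: capture the column names of the table.
wrapper_columns = [row[1] for row in conn.execute("PRAGMA table_info(Resource)")]

conn.executescript("""
    -- Step 2: drop the placeholder and create a view with the same name and
    -- the same columns, so the generated wrapper still matches.
    DROP TABLE Resource;
    CREATE VIEW Resource AS
        SELECT ResourceId, Version, 0 AS IsHistory FROM CurrentResources
        UNION ALL
        SELECT ResourceId, Version, 1 AS IsHistory FROM HistoryResources;
""")

view_columns = [row[1] for row in conn.execute("PRAGMA table_info(Resource)")]

conn.execute("INSERT INTO CurrentResources VALUES ('patient-1', 2)")
conn.execute("INSERT INTO HistoryResources VALUES ('patient-1', 1)")
rows = conn.execute(
    "SELECT ResourceId, Version, IsHistory FROM Resource ORDER BY Version"
).fetchall()
```

Because each such object needs both a table statement and a view statement in the scripts, the workaround roughly doubles the SQL for every view, which is consistent with the line-count explanation above.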

* History separation V0
* Forgotten resource
* replace current by view
* TRUNCATE
* HardDelete
* delete history v0
* missed 64 and removed dead code from SQL query generator
* Removed history from current in merge
* Triggers
* tests and tran
* disable/enable indexes
* rollback hard delete and merge
* rollback delete invisible history
* TRUNCATE -> DELETE
* line
* leftovers
* PK check
* Cosmetic
* WHERE nanes
* Get resources with forced indexes
* Removed redundant where
* Adding feature flag for raw resource dedupping
* Enable invisible history by default
* parameters
* commit
* fixed typo
* 65 + more realistic diff
* adjusted update trigger
* right trigger
* next iteration
* special chars
* Added delete and update
* \r
* Adding script runner
* Added verification
* cosmetic
* HOLDLOCK hint
* removed incorrect comments
* RT
* lock timeout
* 180 and remove empty
* no changes in update resource search params
* blob rewriter tool
* Revert "no changes in update resource search params" This reverts commit b2e0c38.
* comment
* Generic disable indexes and update search params
* Dummy resources
* fixes
* Dummy records based on surr id
* adjust test to filter dummy rows
* exclude history and current
* Adding PerfTest V-1
* Get asyn wrapper without type string to id function
* tool
* not exists on data copy
* packages back
* Added comments.
* Correct filtered index on resource current
* Added redundant IsHistory=0 to index
* deduping
* Removed dummy records
* pp-p
* history clause
* deduping
* testing parameters
* get resource by type and surr id range with many versions
* skip large databases
* Added calls to fhir
* added conflict
* max retries = 3
* database pings
* examples
* reverse
* start closer
* put logic
* Create
* Move 65 to 84
* Fixes after merge
* Adding RawResources table
* change capture tweeks
* Test fix
* Added MI
* inline insert
* Repeat on updates

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Scripts/Sequences.sql

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Sprocs/CleanupResourceIdIntMap.sql

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Tables/0_Resource.sql

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Views/Resource.sql

...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs

src/Microsoft.Health.Fhir.SqlServer/Features/Storage/SqlStoreClient.cs

tools/Exporter/App.config

tools/Exporter/Program.cs

src/Microsoft.Health.Fhir.SqlServer/Features/Storage/SqlAdlsCient.cs

src/Microsoft.Health.Fhir.SqlServer/Features/Storage/SqlAdlsClient.cs

gunjitchhhatwal · 2025-01-14T17:40:56Z

5. Extended string reference id length.

@SergeyGaluzo - could you provide a description of this change?

@gunjitchhhatwal I updated #5 description.

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Sprocs/HardDeleteResource.sql

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Sprocs/DisableIndexes.sql

SergeyGaluzo · 2025-01-20T21:00:49Z

/azp run

azure-pipelines · 2025-01-20T21:01:06Z

Azure Pipelines successfully started running 1 pipeline(s).

SergeyGaluzo and others added 30 commits September 27, 2024 08:42

Raw resources in the Lake and Resource Id integer map

47c370f

perf tweeks

5aca029

Merge resource id map

e37ffd4

Fix after merge

5d8d03a

Fix after merge

cc94008

adjustments

d6ce380

tests

2950bcf

Merge from main

8e74731

Merge fixes

c3ccf69

Ignore NULL reference resource tyoe

b0ed1eb

test fix

51a942d

default number of columns = 13

38e869b

commented 84 diff

bb535af

set end date on enqueue

7060321

rename secondary to adls

427cdfa

defrag formatting

02a545f

set timeout to 0 for getting frag

fbbaf20

isnull on input param

3f5dcd9

Merge branch 'main' into users/sergal/rawmaplakeandhistoryandresourceidmap

e6e3fe9

after merge from main

5569ea1

Merge branch 'users/sergal/rawmaplakeandhistoryandresourceidmap' into users/sergal/newschemarefsep

2648ecd

Reference separation

037061d

file id

99c6620

small things

0b226bf

Do not write raw resource

7f286f9

Added missing file id in history insert

873583e

hard delete tests

ce28891

change capture

848195c

FK

61b1c06

minus data lake

99a8c7a

gunjitchhhatwal reviewed Jan 12, 2025

SergeyGaluzo added 3 commits January 12, 2025 21:17

Removed not used stored proc

609ff9e

reverted formatting change

983a5da

comments

feb8562

microsoft deleted a comment from azure-pipelines bot Jan 13, 2025

Removed item2

a309c92

fhibf reviewed Jan 13, 2025

src/Microsoft.Health.Fhir.SqlServer/Features/Storage/SqlAdlsCient.cs

Cient -> Client

f850188

github-advanced-security bot found potential problems Jan 13, 2025

removed local var

24430d0

gunjitchhhatwal reviewed Jan 14, 2025

src/Microsoft.Health.Fhir.SqlServer/Features/Schema/Sql/Sprocs/DisableIndexes.sql

SergeyGaluzo added 4 commits January 14, 2025 15:40

revert disable indexes

9e1d8a6

Fixing incorrect versioning on close duplicates

4740e20

85.diff

5da1fab

Comment

73afc65

microsoft deleted a comment from gunjitchhhatwal Jan 19, 2025

microsoft deleted a comment from azure-pipelines bot Jan 19, 2025

SergeyGaluzo added 7 commits January 22, 2025 15:12

Changing 1 minute in the future to 8 seconds

f65a0ed

Prepare for merge with 85

cc96b7a

merge 85

58d15f7

86.diff and updated 86

f11a5bf

merge from main

e8c1d3a

restored 85 project file

0521ff3

merge from main

59f2faf
