Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(Data/Multiset): split Multiset/Basic.lean into many files #22126

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Vierkantor
Copy link
Contributor

@Vierkantor Vierkantor commented Feb 20, 2025

This PR splits the very large file Mathlib.Data.Multiset.Basic into many files. In particular, my goal was to reduce the dependencies for Mathlib.Data.Finset.Defs.

The files created in this split are:

  • Multiset.Defs: definition of Multiset, predicates such as membership, order, subsets, remaining definitions required for Finset.Defs. (Some material from Multiset.Nodup was moved here.)
  • Multiset.ZeroCons: definition of the empty, singleton and cons operations, induction principles.
  • Multiset.Count: definition of count, extensionality principles.
  • Multiset.AddSub: definition of addition, erase and subtraction operations.
  • Multiset.Replicate: definition of replicate operation.
  • Multiset.MapFold: definition of map, foldl and foldr operations.
  • Multiset.Filter: definition of filter, filterMap operations.
  • Multiset.UnionIntersection: definition of union, intersection operations and distributive lattice instance.
  • Multiset.Basic: miscellaneous results.

Nearly each file in this list imports the one above it. I don't know if we can improve the linearity in this bit of the import graph much, but at least the transitive imports should improve a bit I expect.

The files with notable modifications are:

  • Multiset.Nodup: definition and basic properties of Multiset.Nodup is upstreamed to Multiset.Defs (for use in Finset.Defs).
  • Finset.Defs: downstreamed a few declarations to reduce dependencies to only Multiset.Defs.

I want to try two extra cleanups, which I'll leave to follow-up PRs in order to make the diff not too ridiculous:

  • Upstream more material from Multiset.Nodup
  • Reverse the direction of the Multiset.Filter -> Multiset.MapFold import (since there are quite a few results on map that are proved using filter).

Open in Gitpod

This PR splits the very large file `Mathlib.Data.Multiset.Basic` into many files. In particular, my goal was to reduce the dependencies for `Mathlib.Data.Finset.Defs`.

The files created in this split are:

* `Multiset.Defs`: definition of `Multiset`, predicates such as membership, order, subsets, remaining definitions required for `Finset.Defs`. (Some material from `Multiset.Nodup` was moved here.)
* `Multiset.ZeroCons`: definition of the empty, singleton and cons operations, induction principles.
* `Multiset.Count`: definition of `count`, extensionality principles.
* `Multiset.AddSub`: definition of addition, erase and subtraction operations.
* `Multiset.MapFold`: definition of `map`, `foldl` and `foldr` operations.
* `Multiset.Filter`: definition of `filter`, `filterMap` operations.
* `Multiset.UnionIntersection`: definition of union, intersection operations and distributive lattice instance.
* `Multiset.Basic`: miscellaneous results.

Nearly each file in this list imports the one above it. I don't know if we can improve the linearity in this bit of the import graph much, but at least the transitive imports should improve a bit I expect.

The files with notable modifications are:

* `Multiset.Nodup`: definition and basic properties of `Multiset.Nodup` is upstreamed to `Multiset.Defs` (for use in `Finset.Defs`).
* `Finset.Defs`: downstreamed a few declarations to reduce dependencies to only `Multiset.Defs`.

I want to try two extra cleanups, which I'll leave to follow-up PRs in order to make the diff not too ridiculous:

* Upstream more material from `Multiset.Nodup`
* Reverse the direction of the `Multiset.Filter` -> `Multiset.MapFold` import (since there are quite a few results on `map` that are proved using `filter`).
@Vierkantor Vierkantor added awaiting-CI longest-pole This PR aims to reduce the longest pole in Mathlib t-data Data (lists, quotients, numbers, etc) labels Feb 20, 2025
@github-actions github-actions bot added the large-import Automatically added label for PRs with a significant increase in transitive imports label Feb 20, 2025
Copy link

PR summary 41c8de5f01

Import changes exceeding 2%

% File
+2.02% Mathlib.Data.Finset.Basic
+2.11% Mathlib.Data.Finset.Dedup

Import changes for modified files

Dependency changes

File Base Count Head Count Change
Mathlib.Data.Multiset.Basic 372 303 -69 (-18.55%)
Mathlib.Data.Finset.Defs 376 363 -13 (-3.46%)
Mathlib.Data.Finset.Order 378 366 -12 (-3.17%)
Mathlib.Data.Multiset.NatAntidiagonal 377 375 -2 (-0.53%)
Import changes for all files
Files Import difference
There are 4154 files with changed transitive imports taking up over 181163 characters: this is too many to display!
You can run scripts/import_trans_difference.sh all locally to see the whole output.

Declarations diff

No declarations were harmed in the making of this PR! 🐙

You can run this locally as follows
## summary with just the declaration names:
./scripts/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/declarations_diff.sh long <optional_commit>

The doc-module for script/declarations_diff.sh contains some details about this script.


Decrease in tech debt: (relative, absolute) = (1.01, 0.02)
Current number Change Type
3364 -2 porting notes
41 -1 large files

Current commit 41c8de5f01
Reference commit 022c5bb1b6

You can run this locally as

./scripts/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@Vierkantor
Copy link
Contributor Author

The import bump being reported is due to the fact that we have 9 files for multisets instead of 1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.UnionInter is better in line with .AddSub, maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-CI large-import Automatically added label for PRs with a significant increase in transitive imports longest-pole This PR aims to reduce the longest pole in Mathlib t-data Data (lists, quotients, numbers, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants