diff --git a/docs/dev/sdd.md b/docs/dev/sdd.md
index 433cfc6..7c21936 100644
--- a/docs/dev/sdd.md
+++ b/docs/dev/sdd.md
@@ -10,18 +10,18 @@
   - [Objects](#objects)
     - [Counting the ways](#counting-the-ways)
     - [Context resolution](#context-resolution)
-    - [`Dict` mimicry](#dict-mimicry)
+    - [Dictionary mimicry](#dictionary-mimicry)
     - [Impedance mismatch](#impedance-mismatch)
   - [Parameters](#parameters)
     - [Arrays](#arrays)
     - [Tables](#tables)
-    - [Laziness](#laziness)
+    - [Lazies](#lazies)
     - [Signals](#signals)
     - [Units](#units)
   - [Code generation](#code-generation)
 - [IO](#io)
   - [Overview](#overview-1)
-    - [Unified access](#unified-access)
+    - [Decorators](#decorators)
     - [Converters](#converters)
     - [Codecs](#codecs)
     - [Parsers](#parsers)
@@ -49,11 +49,12 @@ like plotting, aggregations, input/output, etc.
 IO, for instance, is at the boundary of an application and
 should only affect the object model in rare instances.

-*any more?*
+2. ...

 ## Overview

-We propose a generic framework for hydrologic models.
+FloPy can provide a generic framework for hydrologic
+models.

 FloPy can consist of plugins, each defining a wrapper
 for a given hydrologic program. Programs are expected
@@ -61,19 +62,17 @@ to provide an unambiguous input specification.

 FloPy can provide a basic set of building blocks with
 which a program's input parameters can be defined and
-configured. This will consist of parameters and their
-contexts. The former are leaves, the latter nodes in
-an input context tree (this is made rigorous below).
+configured. This will consist of parameters in nested
+contexts. (This is made more rigorous below.)

 Provided an input specification for a program, FloPy
 can generate an object-oriented Python interface for
 it. This will consist of an **object model** (input
-data model) and **IO machinery** (data access layer)
-at minimum. (These are separate concerns and will be
-coupled only where necessary/appropriate.)
+data model) and **IO module** (data access layer).
 Once these exist, they *are* the specification —
-specification documents should be derivable in reverse.
+specification documents should be derivable from them
+in reverse.

 FloPy will provide a **plugin runtime** which accepts
 a program selection and an input configuration.
@@ -84,28 +83,28 @@ report its progress, and make its results available.
 ### Runtime

 FloPy will provide a plugin runtime whose purpose is to
-wrap and run arbitrary hydrologic programs. We consider
-the **simulation** the fundamental abstraction here; a
-simulation is a *plan for how to execute a program*. A
-simulation is *not* the execution of the program itself
-*nor* the harness which drives the program:
+wrap/run arbitrary hydrologic programs. **Simulation**
+is the fundamental abstraction: we could consider the
+simulation a *plan for how to execute a program*.

-This at odds with the standard terminology in MODFLOW 6,
-where a simulation *is precisely* the runtime. FloPy is
-an interface to modeling codes, and as such, adopts the
-view that one might as well call the thing that becomes
-the simulation the simulation; this seems like a benign
-and (maybe even appropriate) effacement of a meaningful
+This is at odds with the standard terminology in MODFLOW 6,
+where a simulation means the runtime itself. FloPy, as
+an interface to programs, could reasonably call the thing
+that becomes the simulation the simulation, a benign
+(and maybe even appropriate) effacement of a meaningful
 distinction for reasons of precedent and familiarity.

-A distinct abstraction can represent the "task" running
-the program, and a third can represent its output. `Run`
-for the former and `Result` for the latter maybe?
+A distinct abstraction could represent the "task" that
+runs the program. A third could represent its output.
+The latter should be derivable from the simulation, if
+results are available in a given workspace, so results
+can still be retrieved easily in a subsequent session,
+or by anyone else who has the workspace contents.
-All runs can have an autogenerated GUID and a name, with
-the name defaulting to the GUID if the run is anonymous.
+Runs could have an autogenerated GUID and an optional
+name. Anonymous runs' names could default to the GUID.

-Scheduling seems like it could benefit from asynchrony.
+Scheduling seems like it may benefit from asynchrony.
 While programs should ideally make maximal use of the
 resources provided to them, one might want to run more
 than one single-threaded program at once, without the
@@ -115,8 +114,8 @@ An awaitable (coroutine-based) API, returning futures
 instead of blocking, could allow an arbitrary number of
 concurrent runs.

-We can provide a traditional synchronous alternative
-which runs the simulation directly.
+If this is pursued, a synchronous alternative should
+be provided which runs programs directly as done now.

 ### Plugins

@@ -147,25 +146,23 @@ provide access to model results.
 We want:

 - an intuitive, consistent, & expressive interface
-to a broad range of programs.
+to a broad range of programs

 - a small core codebase and a largely autogenerated
-user-facing input data model.
+user-facing input data model

 - an unsurprising and uncomplicated core framework
-accessible to new contributors.
+accessible to new contributors

-- more consistent (and fewer) points of entry.
+- more consistent (and fewer) points of entry

-- easy access to a program's input specification.
+- easy access to a program's input specification

-- easy access to a simulation's configuration.
+- easy access to a simulation's input configuration

-- hierarchical namespacing, address resolution, and
-value lookups.
+- hierarchical namespacing and context resolution

-- context-aware parameters and automatic enforcement
-of program invariants.
+- automatic enforcement of program invariants

 ...and more.
@@ -179,18 +176,10 @@ The latter aim to make it easier to give a class a nice
 `dataclasses` is derived from an older project called
 [`attrs`](https://www.attrs.org/en/stable/) which has
 some extra powers, including hooks for validation and
-transformation, introspection tools, and more. Its age
-does not appear to be a problem; the developer remains
-active and it's on a regular release cadence with many
-active users.
+transformation, introspection tools, and more.

 Since `attrs` solves several of our problems at once,
-we aim to build a prototype of the core object model
-on it.
-
-Peripheral concerns (e.g. plotting/exporting) can be
-handled by mixins, so we can avoid polluting the core
-classes and also avoid the diamond problem.
+we aim to prototype the core object model on it.

 #### Context resolution

@@ -199,22 +188,23 @@ context. This will support hierarchical addressing, as
 is used for the MF6 memory manager. It may also inform
 certain user-facing operations (for instance, a method
 may work differently if a component is independent vs
-an element in a simulation).
+an element in a simulation). This should also help to
+provide nice string representations.

-It is also simply convenient to ask a component: what
-are you attached to? The component should be able to
-display a tree showing its own position in context.
+It will be convenient to ask a component: what are you
+attached to? The component should be able to produce a
+tree showing its own position in context.

 Parent pointers might be implemented as weak references
 to avoid memory leaks; e.g., if a component is removed
-from a simulation and the simulation is descarded, then
+from a simulation and the simulation is discarded, then
 we want the garbage collector to be free to collect it,
-and we want a finalizer callback to set the component's
-`parent` to None.
+with a finalization callback to set parent references
+to `None`.
+
+

-This should also help provide nice string representations.
-#### `Dict` mimicry
+#### Dictionary mimicry

 The dictionary is a ubiquitous data container, useful for
 e.g. passing keyword arguments, and for potential
@@ -245,60 +235,69 @@ extra column `file_name` or similar, which identifies
 the DFN file and the component it specifies).

 From this, FloPy must generate a nested object model.
-This means distinguishing scalars from composites. It
-means plugin developers need to think deeply about how
-they map a linguistically picky program's input to the
-FloPy data model. And it means we, as developers, need
-to think deeply about the FloPy data model.
+This means distinguishing scalars from composites in
+MF6's case and, in general, mapping an input specification
+of arbitrary structure and content to a program-agnostic data model.

 ### Parameters

-A parameter is a program input variable.
-
-Parameters are the core of the FloPy4 data model.
-
-Parameters are primitives or composites of such.
+A **parameter** is a program input variable.

-The data model should be agnostic to any program
-supported by FloPy4; plugins should hide details
-of the program's data representation and present
-the same core object model and parameter types.
+A parameter is a leaf in the **context tree**. The
+simulation is the root.

-As described above in the object model section: a
-simulation is the root context. A context contains
-parameters. A parameter can be a scalar or another
-context. A scalar is a leaf in the context tree.
+A parameter is a primitive value or a **composite**
+of such.

-Scalars are Python primitives. These are: int,
-float, boolean, string, path, array, or table.
+Primitive parameters are **scalar** (int, float, bool,
+string, path), **array-like**, or **tabular**.

-Ideally a data model would be dependency-agnostic,
+> [!NOTE]
+> Ideally a data model would be dependency-agnostic,
 but we view NumPy and Pandas as de facto standard
-library and accept them as array/table primitives.
-
-If we ever need to provide array/table abstractions
+library and accept them as array/table primitives.
+If there is ever a need to provide arrays/tables
 of our own, we could take inspiration from
 [astropy](https://github.com/astropy/astropy).

-A record is a context whose parameters are all
-scalars; no nested contexts. We will consider
-this a `Dict` for practical use though it will
-need implementing as an `attrs`-based class so
-its parameter spec is discoverable upon import.
-
-A list can contain a single parameter type or a
-union of parameter types.
-
-On this view, an MF6 keystring is a `typing.Union`
-of multiple records. The period block is a list of
-unions of records.
+Composite parameters are **record** and **union**
+(product and sum, respectively) types, as well as
+**lists** of scalars or records. A record is a
+named and ordered tuple of scalars.
+
+A record's parameters must all be scalars, except
+for its last parameter, which may be a sequence of
+scalars (such a record could be called *variadic*;
+it is a value constructor with unspecified arity).
+
+> [!NOTE]
+> A record is a `Dict` for practical purposes. It
+needs implementing as an `attrs`-based class so
+its parameter spec is discoverable upon import,
+though.
+
+A list may constrain its elements to parameters of
+a single scalar or record type, or may hold unions
+of such.
+
+> [!NOTE]
+> On this view an MF6 keystring is a `typing.Union`
+of records and a period block is a list of `Union`s
+of records.
+
+A context is a map of parameters. So is a record;
+the operative difference is that composites cannot
+contain nested contexts. A context is a non-leaf
+node in the tree which can contain both parameters
+and other contexts.

 We envision a nested hierarchy of `attrs`-based
 classes, all acting like dictionaries, making up
-the context tree. Each of these has parameters
-and/or other classes as members.
+the context tree. These will include composites, since
+strongly typed records and unions will be more
+convenient to work with.

-FloPy can thus define a parameter as:
+So, FloPy can define a parameter as:

 ```python
 from typing import Dict, List
@@ -306,7 +305,7 @@ from numpy.typing import ArrayLike
 from pandas import DataFrame

 Scalar = bool | int | float | str | Path
-Record = Dict[str, Scalar]
+Record = Dict[str, Scalar | List[Scalar]]
 List = List[Scalar | Record]
 Array = ArrayLike
 Table = DataFrame
@@ -315,7 +314,7 @@ Param = Scalar | Record | List | Array | Table

 This is proposed as a general foundation onto which
 it should be possible to map input specifications
-for a broad range of programs, not only MODFLOW 6.
+for a wide range of programs, not only MODFLOW 6.

 #### Arrays

@@ -347,7 +346,7 @@ We can store parameter specification information in
 the `DataFrame.attrs` property or by way of [custom
 accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#registering-custom-accessors).

-#### Laziness
+#### Lazies

 We recognize a distinction between two types of
 parameter: configuration and data. This isn't
@@ -489,7 +488,7 @@ sequenceDiagram
     MF6Parser-->DFN: defines grammar
 ```

-#### Unified access
+#### Decorators

 A small set of class decorators could provide unified
 access to IO for object model classes. Alternatively these could be mixins.
diff --git a/flopy4/attrs.py b/flopy4/attrs.py
new file mode 100644
index 0000000..d6d4aca
--- /dev/null
+++ b/flopy4/attrs.py
@@ -0,0 +1,154 @@
+from pathlib import Path
+from typing import (
+    Dict,
+    Iterable,
+    List,
+    Optional,
+    TypeVar,
+    Union,
+)
+
+from attrs import NOTHING, Attribute, define, field, fields
+from numpy.typing import ArrayLike
+from pandas import DataFrame
+
+# Core input data model. This enumerates the
+# types FloPy accepts in input data contexts.
+
+Scalar = Union[bool, int, float, str, Path]
+Record = Dict[str, Union[Scalar, List[Scalar]]]
+List = List[Union[Scalar, Record]]
+Array = ArrayLike
+Table = DataFrame
+Param = Union[Scalar, Record, List, Array, Table]
+
+
+# Wrap `attrs.field()` for input parameters.
+
+
+def param(
+    longname: Optional[str] = None,
+    description: Optional[str] = None,
+    deprecated: bool = False,
+    optional: bool = False,
+    default=NOTHING,
+    metadata=None,
+    validator=None,
+    converter=None,
+):
+    """
+    Define a program input parameter. Wraps `attrs.field()`
+    with a few extra metadata properties.
+    """
+    metadata = dict(metadata or {})  # copy; don't mutate the caller's dict
+    metadata["longname"] = longname
+    metadata["description"] = description
+    metadata["deprecated"] = deprecated
+    metadata["optional"] = optional
+    return field(
+        default=default,
+        validator=validator,
+        repr=True,
+        eq=True,
+        order=False,
+        hash=False,
+        init=True,
+        metadata=metadata,
+        converter=converter,
+    )
+
+
+def params(cls):
+    """
+    Return a dictionary of the class' input parameters.
+    Each parameter is returned as an `attrs.Attribute`.
+
+    Notes
+    -----
+    Wraps `attrs.fields()`. A parameter can be a value
+    itself or another nested context of parameters.
+    """
+    return {f.name: f for f in fields(cls)}
+
+
+# Wrap `attrs.define()` for input contexts.
+
+
+T = TypeVar("T")
+
+
+def context(
+    maybe_cls: Optional[type[T]] = None,
+    *,
+    frozen: bool = False,
+    multi: bool = False,
+):
+    """
+    Wrap `attrs.define()` for more opinionated input contexts.
+
+    Notes
+    -----
+    Contexts are parameter containers and can be nested to an
+    arbitrary depth.
+    """
+
+    def add_index(fields):
+        return [
+            Attribute.from_counting_attr(name="index", ca=field(), type=int),
+            *fields,
+        ]
+
+    def wrap(cls):
+        transformer = (lambda _, fields: add_index(fields)) if multi else None
+        return define(
+            cls,
+            field_transformer=transformer,
+            frozen=frozen,
+            weakref_slot=True,
+        )
+
+    if maybe_cls is None:
+        return wrap
+
+    return wrap(maybe_cls)
+
+
+def record(maybe_cls: Optional[type[T]] = None, *, frozen: bool = True):
+    """
+    Wrap `attrs.define()` for immutable records (tuples of parameters).
+
+    Notes
+    -----
+
+    Records are frozen by default.
+
+    A variadic record ends with a list. A `variadic` flag is attached
+    to record classes via introspection at import time.
+    """
+
+    def add_variadic(cls, fields):
+        # Variadic if the last field is a non-string iterable (e.g. a list).
+        # Subscripted generics can't go through issubclass(); use __origin__.
+        last = fields[-1].type if fields else None
+        origin = getattr(last, "__origin__", last)
+        variadic = (
+            isinstance(origin, type)
+            and issubclass(origin, Iterable)
+            and not issubclass(origin, (str, bytes))
+        )
+        setattr(cls, "variadic", variadic)
+        return fields
+
+    def wrap(cls):
+        return define(
+            cls,
+            auto_attribs=True,
+            field_transformer=add_variadic,
+            frozen=frozen,
+            weakref_slot=True,
+        )
+
+    if maybe_cls is None:
+        return wrap
+
+    return wrap(maybe_cls)
diff --git a/test/test_attrs.py b/test/test_attrs.py
new file mode 100644
index 0000000..4855fbc
--- /dev/null
+++ b/test/test_attrs.py
@@ -0,0 +1,143 @@
+import math
+from pathlib import Path
+from typing import List, Union
+
+import pytest
+
+from flopy4.attrs import Array, context, param, params, record
+
+# Records are product types: named, ordered tuples of scalars.
+# Records are immutable: they can't be changed, only evolved.
+
+
+@record
+class Record:
+    rk: bool = param(description="keyword in record")
+    ri: int = param(description="int in record")
+    rd: float = param(description="double in record")
+
+
+@record
+class VariadicRecord:
+    vrk: bool = param(description="keyword in record")
+    vrl: List[int] = param(description="list in record")
+
+
+@context
+class Block:
+    k: bool = param(description="keyword")
+    i: int = param(description="int")
+    d: float = param(description="double")
+    s: str = param(description="string", optional=False)
+    f: Path = param(description="filename", optional=False)
+    a: Array = param(description="array")
+    r: Record = param(
+        description="record",
+        optional=False,
+    )
+
+
+# Keystrings are sum types: discriminated unions of records.
+
+
+@record
+class All:
+    all: bool = param(
+        description="keyword to indicate save for all time steps in period."
+    )
+
+
+@record
+class First:
+    first: bool = param(
+        description="keyword to indicate save for first step in period."
+    )
+
+
+@record
+class Last:
+    last: bool = param(
+        description="keyword to indicate save for last step in period."
+    )
+
+
+@record
+class Frequency:
+    frequency: int = param(
+        description="save at the specified time step frequency."
+    )
+
+
+@record
+class Steps:
+    steps: List[int] = param(description="save for each step specified.")
+
+
+OCSetting = Union[All, First, Last, Frequency, Steps]
+
+
+@context(multi=True)
+class Period:
+    ocsetting: OCSetting = param(
+        description="keystring",
+        optional=False,
+    )
+
+
+def test_spec():
+    spec = params(Record)
+    assert len(spec) == 3
+    assert not Record.variadic
+
+    spec = params(VariadicRecord)
+    assert len(spec) == 2
+    assert VariadicRecord.variadic
+
+    spec = params(Block)
+    assert len(spec) == 7
+
+    k = spec["k"]
+    assert k.type is bool
+    assert k.metadata["description"] == "keyword"
+
+    i = spec["i"]
+    assert i.type is int
+    assert i.metadata["description"] == "int"
+
+    d = spec["d"]
+    assert d.type is float
+    assert d.metadata["description"] == "double"
+
+    s = spec["s"]
+    assert s.type is str
+    assert s.metadata["description"] == "string"
+
+    f = spec["f"]
+    assert f.type is Path
+    assert f.metadata["description"] == "filename"
+
+    a = spec["a"]
+    assert a.type is Array
+    assert a.metadata["description"] == "array"
+
+    r = spec["r"]
+    assert r.type is Record
+    assert r.metadata["description"] == "record"
+
+    spec = params(Period)
+    assert len(spec) == 2
+
+    index = spec["index"]
+    assert index.type is int
+
+    ocsetting = spec["ocsetting"]
+    assert ocsetting.type is OCSetting
+
+
+def test_usage():
+    r = Record(rk=True, ri=42, rd=math.pi)
+    assert r.ri == 42
+    # attrs does not type-check by default; this raises
+    # because the required `ri` and `rd` are missing.
+    with pytest.raises(TypeError):
+        Record(rk=None)
diff --git a/test/test_block.py b/test/test_block.py
deleted file mode 100644
index cd967b2..0000000
--- a/test/test_block.py
+++ /dev/null
@@ -1,191 +0,0 @@
-from pathlib import Path
-
-import numpy as np
-import pytest
-
-from flopy4.array import MFArray
-from flopy4.block import MFBlock
-from flopy4.compound import MFKeystring, MFRecord
-from flopy4.scalar import MFDouble, MFFilename, MFInteger, MFKeyword, MFString
-
-
-class TestBlock(MFBlock):
-    __test__ = False  # tell pytest not to collect
-
-    k = MFKeyword(description="keyword", type="keyword")
-    i = MFInteger(description="int", type="integer")
-    d = MFDouble(description="double", type="double")
-    s = MFString(description="string", optional=False, type="string")
-    f = MFFilename(description="filename", optional=False, type="filename")
-    a = MFArray(description="array", shape=(3,), type="array")
-    r = MFRecord(
-        params={
-            "rk": MFKeyword(),
-            "ri": MFInteger(),
-            "rd": MFDouble(),
-        },
-        description="record",
-        optional=False,
-        type="record",
-    )
-
-
-def test_members():
-    params = TestBlock.params
-    assert len(params) == 7
-
-    k = params.k
-    assert isinstance(k, MFKeyword)
-    assert k.description == "keyword"
-    assert k.optional
-
-    i = params.i
-    assert isinstance(i, MFInteger)
-    assert i.description == "int"
-    assert i.optional
-
-    d = params.d
-    assert isinstance(d, MFDouble)
-    assert d.description == "double"
-    assert d.optional
-
-    s = params.s
-    assert isinstance(s, MFString)
-    assert s.description == "string"
-    assert not s.optional
-
-    f = params.f
-    assert isinstance(f, MFFilename)
-    assert f.description == "filename"
-    assert not f.optional
-
-    a = params.a
-    assert isinstance(a, MFArray)
-    assert a.description == "array"
-    assert a.optional
-
-    r = params.r
-    assert isinstance(r, MFRecord)
-    assert r.description == "record"
-    assert not r.optional
-
-
-def test_load_write(tmp_path):
-    name = "options"
-    fpth = tmp_path / f"{name}.txt"
-    with open(fpth, "w") as f:
-        f.write(f"BEGIN {name.upper()}\n")
-        f.write(" K\n")
-        f.write(" I 1\n")
-        f.write(" D 1.0\n")
-        f.write(" S value\n")
-        f.write(f" F FILEIN {fpth}\n")
-        f.write(" R RK RI 2 RD 2.0\n")
-        # f.write(" RK RI 2 RD 2.0\n")
-        # f.write(" RK 2 2.0\n")
-        f.write(" A\n INTERNAL\n 1.0 2.0 3.0\n")
-        f.write(f"END {name.upper()}\n")
-
-    # test block load
-    with open(fpth, "r") as f:
-        block = TestBlock.load(f)
-
-    # check parameter specification
-    assert isinstance(TestBlock.k, MFKeyword)
-    assert TestBlock.k.name == "k"
-    assert TestBlock.k.block == "options"
-    assert TestBlock.k.description == "keyword"
-
-    assert isinstance(TestBlock.r, MFRecord)
-    assert TestBlock.r.name == "r"
-    assert len(TestBlock.r.params) == 3
-    assert isinstance(TestBlock.r.params["rk"], MFKeyword)
-    assert isinstance(TestBlock.r.params["ri"], MFInteger)
-    assert isinstance(TestBlock.r.params["rd"], MFDouble)
-
-    # check parameter values
-    assert block.k and block.value["k"]
-    assert block.i == block.value["i"] == 1
-    assert block.d == block.value["d"] == 1.0
-    assert block.s == block.value["s"] == "value"
-    assert block.f == block.value["f"] == fpth
-    assert np.allclose(block.a, np.array([1.0, 2.0, 3.0]))
-    assert np.allclose(block.value["a"], np.array([1.0, 2.0, 3.0]))
-    assert block.r == block.value["r"] == {"rd": 2.0, "ri": 2, "rk": True}
-
-    # test block write
-    fpth2 = tmp_path / f"{name}2.txt"
-    with open(fpth2, "w") as f:
-        block.write(f)
-    with open(fpth2, "r") as f:
-        lines = f.readlines()
-        assert "BEGIN OPTIONS \n" in lines
-        assert " K\n" in lines
-        assert " I 1\n" in lines
-        assert " D 1.0\n" in lines
-        assert " S value\n" in lines
-        assert f" F FILEIN {fpth}\n" in lines
-        assert " A\n" in lines
-        assert " INTERNAL\n" in lines
-        assert " 1.0 2.0 3.0\n" in lines
-        assert " R RK RI 2 RD 2.0\n" in lines
-        assert "END OPTIONS\n" in lines
-
-
-class IndexedBlock(MFBlock):
-    ks = MFKeystring(
-        params={
-            "first": MFKeyword(),
-            "frequency": MFInteger(),
-        },
-        description="keystring",
-        optional=False,
-    )
-
-
-def test_load_write_indexed(tmp_path):
-    block_name = "indexed"
-    fpth = tmp_path / f"{block_name}.txt"
-    with open(fpth, "w") as f:
-        f.write(f"BEGIN {block_name.upper()} 1\n")
-        f.write(" FIRST\n")
-        f.write(f"END {block_name.upper()}\n")
-        f.write("\n")
-        f.write(f"BEGIN {block_name.upper()} 2\n")
-        f.write(" FIRST\n")
-        f.write(" FREQUENCY 2\n")
-        f.write(f"END {block_name.upper()}\n")
-
-    with open(fpth, "r") as f:
-        period1 = IndexedBlock.load(f)
-        period2 = IndexedBlock.load(f)
-
-    # todo: go to 0-based indexing
-    assert period1.index == 1
-    assert period2.index == 2
-
-    # class attributes as param specification
-    assert isinstance(IndexedBlock.ks, MFKeystring)
-    assert IndexedBlock.ks.name == "ks"
-    assert IndexedBlock.ks.block == block_name
-
-    # instance attribute as shortcut to param value
-    assert period1.ks == {"first": True}
-    assert period2.ks == {"first": True, "frequency": 2}
-
-
-def test_set_value():
-    block = TestBlock(name="test")
-    block.value = {
-        "k": True,
-        "i": 42,
-        "d": 2.0,
-        "s": "hello world",
-    }
-    assert block.k
-
-
-def test_set_value_unrecognized():
-    block = TestBlock(name="test")
-    with pytest.raises(ValueError):
-        block.value = {"p": Path.cwd()}
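The design notes call for parent pointers implemented as weak references with a finalization callback, and the new `context()` and `record()` decorators pass `weakref_slot=True` to `attrs.define()` so their instances can be weakly referenced. The sketch below illustrates that parent-pointer idea using only the standard library. It is an illustration under assumptions, not FloPy API: `Node`, `attach()`, `parent`, and `path()` are hypothetical names, and a plain class stands in for the `attrs`-based contexts.

```python
import weakref
from typing import List, Optional


class Node:
    """Hypothetical context-tree node holding a weak reference to its parent."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.children: List["Node"] = []  # strong references downward
        self._parent: Optional[weakref.ref] = None  # weak reference upward
        if parent is not None:
            parent.attach(self)

    def attach(self, child: "Node") -> None:
        """Attach `child`; the parent keeps its children alive, not vice versa."""
        self.children.append(child)

        def on_finalize(_ref: weakref.ref, child: "Node" = child) -> None:
            # Runs when this parent is garbage-collected: detach the child.
            child._parent = None

        child._parent = weakref.ref(self, on_finalize)

    @property
    def parent(self) -> Optional["Node"]:
        # Dereference the weak reference (None if detached or collected).
        return self._parent() if self._parent is not None else None

    def path(self) -> str:
        """Hierarchical address built by walking parent pointers, e.g. 'sim/gwf'."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))
```

Holding children strongly and parents weakly means a discarded simulation is not kept alive by its own components. In CPython the callback runs as soon as the last strong reference to the parent disappears; other interpreters may defer it until a garbage-collection pass.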