Modularize compiler around the idea of a "standard library" using functors #1182

WardBrian · 2022-05-02T15:26:37Z

This is something I played around with on a slow day last week. I think it's an interesting idea, but not one I'm sold on myself. I'm opening it to get feedback and to discuss whether we want to move in this direction.

This idea came from a discussion with @mandel and @gbdrt at ProbProg 2021. They had been working on an effort to translate Stan programs to Pyro and NumPyro at https://github.com/deepppl, and I asked them what about the current implementation of stanc made it easier or harder to do this kind of work. One thing that came up was the reliance on the Stan math library signatures.

Current implementation

At the moment, we have a file src/middle/Stan_math_signatures.ml which defines a lookup table of all the C++ functions we'd like to expose at the language level. We've essentially been treating these as built in to Stan, but I'd argue this is actually very backend specific. The fact that all these functions need to be defined for your backend to work is part of what stalled the TensorFlow backend, and it created considerable work for the deepppl folks, who defined a name-matching Python library: https://github.com/deepppl/stan-num-pyro/blob/main/stan-pyro/stanpyro/stanlib.py. For functions which don't exist or which they don't support, they defined dummy signatures which throw a runtime exception.

If we're serious about stanc being architected to support alternative backends, doing something better here is pretty key. I would argue (and I think so would the name of the file) that, as a starting point, Stan_math_signatures belongs in the src/stan_math_backend folder, not the middle folder.

The changes

This PR is conceptually pretty simple. It defines a basic module interface for what we consider to be a "Stan standard library". This had previously been discussed in the review of #1115. The interface looks a lot like the Stan_math_signatures file, but made just a touch more generic.

Here it is in full:

module type Library = sig
  (** This module is used as a parameter for many functors which
    rely on information about a backend-specific Stan library. *)

  val function_signatures : (string, signature list) Hashtbl.t
  (** Mapping from names to signature(s) of functions *)

  val variadic_signatures : (string, variadic_signature) Hashtbl.t
  (** Mapping from names to description of a variadic function.
  Note that these function names cannot be overloaded, and usually require
  customized code-gen in the backend.
*)

  val distribution_families : string list

  val is_stdlib_function_name : string -> bool
  (** Equivalent to [Hashtbl.mem function_signatures s]*)

  val get_signatures : string -> signature list
  (** Equivalent to [Hashtbl.find_multi function_signatures s]*)

  val get_operator_signatures : Operator.t -> signature list
  val get_assignment_operator_signatures : Operator.t -> signature list
  val is_not_overloadable : string -> bool
  val is_variadic_function_name : string -> bool
  val is_special_function_name : string -> bool
  val special_function_returntype : string -> UnsizedType.returntype option

  val check_special_fn :
       is_cond_dist:bool
    -> Location_span.t
    -> Environment.originblock
    -> Environment.t
    -> Ast.identifier
    -> Ast.typed_expression list
    -> Ast.typed_expression
  (** This function is responsible for typechecking variadic function
    calls. It needs to live in the Library since this is usually
    bespoke per-function. *)

  val operator_to_function_names : Operator.t -> string list
  val string_operator_to_function_name : string -> string
  val deprecated_distributions : deprecation_info String.Map.t
  val deprecated_functions : deprecation_info String.Map.t
end

Then, every module which currently depends on Stan_math_signatures was made into a functor over this module. This means that rather than there being a module called Typechecker, there is now a functor called Typechecking which, when you "call" it with an instance of a standard library module (the above type), gives you back a typechecker module that works like the old one.
I tried to make as few functors as possible, so some modules just got a slight refactor. For example, SignatureMismatch was being supplied the list of signatures for everything except operators, which it was checking from the library. Now it is supplied the signatures for everything, so it doesn't need to depend on the library implementation.

At the call site, this looks basically the same as the non-functor status quo. E.g., in stanc.ml, we add the line module Typechecker = Typechecking.Make (Stan_math_library) and then proceed as before by calling Typechecker.check_prog etc.
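For anyone who wants to see the shape of this end to end, here is a minimal sketch. It is not the code in this PR: the Library signature is cut down to two functions, signatures are plain strings, Toy_library and check_call are hypothetical names, and it uses the plain OCaml Stdlib rather than Core.

```ocaml
(* A reduced, hypothetical stand-in for the Library signature above. *)
module type Library = sig
  val get_signatures : string -> string list
  val is_stdlib_function_name : string -> bool
end

(* Hypothetical toy backend: signatures are just strings here. *)
module Toy_library : Library = struct
  let table : (string, string list) Hashtbl.t = Hashtbl.create 16
  let () = Hashtbl.replace table "exp" ["real => real"; "vector => vector"]

  let get_signatures name =
    Option.value (Hashtbl.find_opt table name) ~default:[]

  let is_stdlib_function_name name = Hashtbl.mem table name
end

(* The functor: the typechecker only ever sees the Library interface. *)
module Typechecking = struct
  module Make (StdLib : Library) = struct
    (* Return the candidate signatures, or an error for unknown names. *)
    let check_call name =
      if StdLib.is_stdlib_function_name name then
        Ok (StdLib.get_signatures name)
      else Error (Printf.sprintf "Unknown function '%s'" name)
  end
end

(* Call site: one functor application, then use it like a normal module. *)
module Typechecker = Typechecking.Make (Toy_library)

let () =
  match Typechecker.check_call "exp" with
  | Ok sigs -> List.iter print_endline sigs
  | Error msg -> prerr_endline msg
```

Swapping in a different backend means changing only the argument to Typechecking.Make; nothing inside the functor body has to change.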

If anyone is wondering how to think about this, you can conceptually imagine Library as a Java interface, and functors like Typechecking as public class Typechecker<T extends Library>. This is not a good analogy for the complete ML module system, but for the specific use here of dependency injection it is more or less correct.

The Good

Generic over backends. A project like deepppl could promote the errors they throw from runtime errors to compile time typechecking errors.
Makes backend-specificity extremely apparent. If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".
But it does! It turns out the lowering of ~ statements is dependent on which exact signatures are defined! This refactor makes things like that extremely obvious, since the things that depend on the library look structurally different than those that don't (e.g. they're functors rather than normal modules).
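To make the ~ example concrete, here is a hypothetical sketch of why the lowering ends up depending on the library. The suffix list, the function names, and the two "backends" are assumptions for illustration, not the actual stanc code.

```ocaml
(* Hypothetical: suffixes tried in order when lowering `y ~ dist(...)`. *)
let candidate_suffixes = ["_lpdf"; "_lpmf"; "_log"]

(* The backend is represented only by its "is this name defined?" predicate.
   Returns the first candidate function name the backend defines, if any. *)
let lower_tilde ~(is_defined : string -> bool) (dist : string) : string option =
  List.find_opt is_defined (List.map (fun s -> dist ^ s) candidate_suffixes)

(* Two made-up backends which define different functions. *)
let cpp_like name = List.mem name ["normal_lpdf"; "poisson_lpmf"]
let other_backend name = List.mem name ["normal_lpdf"]

let () =
  let show = function Some f -> f | None -> "<no matching function>" in
  print_endline (show (lower_tilde ~is_defined:cpp_like "poisson"));
  print_endline (show (lower_tilde ~is_defined:other_backend "poisson"))
```

The same statement lowers to a function call under one backend and has no valid lowering under the other, which is the kind of compile-time failure a library-aware frontend could report instead of a runtime exception.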

The Bad

Diff size. This had to touch a lot of files, but mainly what it was doing was adding one level of indent to them to account for the new struct ... end syntax required to define a functor.
Fussy syntax and extra signatures. OCaml functors are syntactically heavy to define, even simple ones like this. They also require extra signatures, often placed in their own ..._intf.ml file (see .mli file usage #358 and discussion).

The Unsure-How-To-Feel

Increased indirection. This is good and bad in a way. Abstraction is what allows all the good things listed above, but it does increase the learning curve for the compiler and makes it a bit harder to navigate. For example, in the typechecker you can no longer right click in your editor and go to the place where the standard library is defined, since it could be any number of possible implementations of the Library type.
So far I've been imagining each backend defining different sets of signatures, with all of them being subsets of the biggest/most developed backend (e.g. C++). This would mean there is always some interface which could compile every .stan file. However, there is nothing stopping a backend from defining a function which does not exist in any other backend, so that no single backend's set contains all the others. This could be a pro or a con; I think it is subjective.
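If we did want to keep the "everything is a subset of the biggest backend" property, it would be easy to check. A rough sketch, where each backend is reduced to the set of function names it defines and every name below is made up for illustration:

```ocaml
module StringSet = Set.Make (String)

(* Hypothetical backends, each summarized by the function names it defines. *)
let cpp_like = StringSet.of_list ["exp"; "log"; "normal_lpdf"; "reduce_sum"]
let subset_backend = StringSet.of_list ["exp"; "log"; "normal_lpdf"]
let diverging_backend = StringSet.of_list ["exp"; "backend_only_fn"]

(* Names defined by [backend] but missing from [reference], in sorted order. *)
let extras ~reference backend =
  StringSet.elements (StringSet.diff backend reference)

let report name backend =
  match extras ~reference:cpp_like backend with
  | [] -> Printf.printf "%s: subset of the reference backend\n" name
  | fs -> Printf.printf "%s: extra functions: %s\n" name (String.concat ", " fs)

let () =
  report "subset_backend" subset_backend;
  report "diverging_backend" diverging_backend
```

A real check would have to compare full signatures rather than just names, since two backends could share a name but support different argument types.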

Submission Checklist

Run unit tests
Documentation
- OR, no user-facing changes were made

Release notes

Refactored internals of the compiler to be less tied to the specific functions exposed in the Stan Math C++ library.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

bob-carpenter · 2022-05-02T15:34:14Z

If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".

It'd certainly be nice if it didn't. I don't have a good sense of the weight of the downsides here, though.

WardBrian · 2022-05-03T14:43:50Z

Functors are pretty common in OCaml (e.g., they're how one would implement any of the common generic data structures like Set, Map, etc) but they can still take some time to get used to.

I've also been trying to use an alternative feature called "virtual libraries" which covers similar use cases while being much simpler (no module applications, etc), but I am not sure if it is possible to do it in this way due to some issues that arise with circular dependencies which functors cleverly avoid.

WardBrian · 2022-05-03T16:57:19Z

Scratch that, I believe I have it working with the much lighter-weight virtual library approach. This does lose the benefit that things which use the library look structurally different, but that is the very thing that makes functors "heavy" syntactically.

WardBrian · 2022-05-04T19:22:44Z

I'm going to close this in favor of #1184

WardBrian · 2022-05-12T16:30:20Z

I've re-opened this as an alternative to #1184 after today's language meeting. I still want to do another cleanup pass on it, like the one I did in #1184 after opening this.

codecov · 2022-11-08T22:39:56Z

Codecov Report

Merging #1182 (3823560) into master (4d6664c) will decrease coverage by 0.12%.
The diff coverage is 89.92%.

@@            Coverage Diff             @@
##           master    #1182      +/-   ##
==========================================
- Coverage   88.79%   88.67%   -0.12%     
==========================================
  Files          64       68       +4     
  Lines        9844     9970     +126     
==========================================
+ Hits         8741     8841     +100     
- Misses       1103     1129      +26

Impacted Files	Coverage Δ
...c/analysis_and_optimization/Dependence_analysis.ml	`100.00% <ø> (ø)`
...rc/analysis_and_optimization/Monotone_framework.ml	`91.20% <0.00%> (ø)`
...alysis_and_optimization/Monotone_framework_intf.ml	`100.00% <ø> (ø)`
src/analysis_and_optimization/Optimize.ml	`92.77% <ø> (+0.12%)`	⬆️
src/analysis_and_optimization/Pedantic_analysis.ml	`93.42% <ø> (ø)`
src/frontend/Std_library_utils.ml	`50.00% <50.00%> (ø)`
src/stan_math_backend/Stan_math_library.ml	`97.34% <84.12%> (ø)`
src/analysis_and_optimization/Memory_patterns.ml	`87.19% <86.59%> (-0.57%)`	⬇️
src/frontend/Info.ml	`89.23% <87.87%> (+0.16%)`	⬆️
...rc/analysis_and_optimization/Partial_evaluation.ml	`89.61% <89.61%> (ø)`
... and 13 more

WardBrian added 11 commits April 28, 2022 14:07

Start refactor

7779be0

Start parameterizing modules

1383260

Finish modularizing

a913d4f

Trivially make stan math signatures comply to the new api

5f0ac98

Parametrize optimizer, unit tests passing

1a3757f

Variadic typechecking, details. Tests passing

c7f1b14

Rename to avoid ambiguity with Core.Stdlib

29d0bf7

Update Stancjs to use functors

79acb81

Cleanup, move some signatures to _intf files, capitalize module types

72a10fe

Clean up Stan_math_library, document

0b69551

Comments

d3db3e0

WardBrian added cleanup Code simplification or clean-up robustness labels May 2, 2022

WardBrian mentioned this pull request May 3, 2022

Refactor "standard library" as a virtual module #1184

Closed

2 tasks

WardBrian closed this May 4, 2022

WardBrian reopened this May 12, 2022

Merge branch 'master' into modular-library-experiment

30e6e13

WardBrian added 2 commits May 12, 2022 12:35

Cleanups ported from virtual library attempt

dd4772f

Dune promote

dad6788

WardBrian changed the title ~~[WIP/Discussion] Modularize compiler around the idea of a "standard library"~~ Modularize compiler around the idea of a "standard library" using functors May 12, 2022

WardBrian requested a review from SteveBronder May 12, 2022 18:04

WardBrian added 4 commits May 17, 2022 12:45

Merge branch 'master' into modular-library-experiment

4ad727d

Merge branch 'master' into modular-library-experiment

d2341ca

Merge branch 'master' into modular-library-experiment

c89fa55

Merge branch 'master' into modular-library-experiment

2efa144

WardBrian added 4 commits June 1, 2022 16:08

Merge branch 'master' into modular-library-experiment

9067d01

Merge branch 'master' into modular-library-experiment

839103a

Merge branch 'master' into modular-library-experiment

21e265d

Merge branch 'master' into modular-library-experiment

f52a6f9

WardBrian mentioned this pull request Jun 17, 2022

Update architecture diagrams #1215

Merged

2 tasks

WardBrian added 5 commits July 11, 2022 10:54

Merge branch 'master' into modular-library-experiment

0dc4dfd

Merge branch 'master' into modular-library-experiment

8ea3fd3

Merge branch 'master' into modular-library-experiment

047704b

Empty commit

9d684d9

Merge branch 'master' into modular-library-experiment

87c4328

WardBrian mentioned this pull request Oct 14, 2022

Refactor: encapsulate variadic functions in typechecking #1259

Merged

2 tasks

WardBrian added 6 commits October 14, 2022 11:19

Merge branch 'master'

ef3079d

Merge branch 'master' into refactor/library-functors

e4d6872

Merge branch 'master' into refactor/library-functors

32a66f0

Merge branch 'master' into refactor/library-functors

7cbfd80

Merge branch 'master' into refactor/library-functors

c88ea99

Merge branch 'master' into refactor/library-functors

ee29f62

WardBrian added 9 commits December 12, 2022 16:05

Merge branch 'master' into refactor/library-functors

3d09b6c

Merge branch 'master' into refactor/library-functors

7d26f5e

Merge branch 'master' into refactor/library-functors

bdfd77e

Merge branch 'master' into refactor/library-functors

bf3b302

Merge branch 'master' into refactor/library-functors

a9af831

Merge branch 'master' into refactor/library-functors

b4c9d4a

Merge branch 'master' into refactor/library-functors

6ff32d1

Merge branch 'master' into refactor/library-functors

0ec6964

Merge branch 'master' into refactor/library-functors

3823560
