Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularize compiler around the idea of a "standard library" using functors #1182

Draft
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

WardBrian
Copy link
Member

@WardBrian WardBrian commented May 2, 2022

This is something I played around with on a slow day last week. I think it's an interesting idea, but not one I'm myself sold on. I'm opening it up to get feedback/discussion if we want to move in this direction.


This idea came from a discussion with @mandel and @gbdrt at ProbProg 2021. They had been working on an effort to translate Stan programs to Pyro and NumPyro at https://github.com/deepppl, and I asked them what about the current implementation of stanc made it easier or harder to do this kind of work. One thing that came up was the reliance on the Stan math library signatures.

Current implementation

At the moment, we have a file src/middle/Stan_math_signatures.ml which defines a lookup table of all the C++ functions we'd like to expose at the language level. We've essentially been treating these as built in to Stan, but I'd argue this is actually very backend specific. The fact that all these functions need to be defined for your backend to work is part of what stalled the Tensorflow backend, and it created a considerable effort for the deepppl folks, who defined a name-matching Python library: https://github.com/deepppl/stan-num-pyro/blob/main/stan-pyro/stanpyro/stanlib.py. For functions which don't exist or they don't support, they defined dummy signatures which throw a runtime exception.

If we're serious about stanc being architected to support alternative backends, doing something better here is pretty key. I would argue (and I think so would the name of the file) that, as a starting point, Stan_math_signatures belongs in the src/stan_math_backend folder, not the middle folder.

The changes

This PR is conceptually pretty simple. It defines a basic module interface for what we consider to be a "Stan standard library". This had previously been discussed in the review of #1115. The interface looks a lot like the Stan_math_signatures file, but genericified just a touch.

Here it is in full:

module type Library = sig
  (** This module is used as a parameter for many functors which
    rely on information about a backend-specific Stan library. *)

  val function_signatures : (string, signature list) Hashtbl.t
  (** Mapping from names to signature(s) of functions *)

  val variadic_signatures : (string, variadic_signature) Hashtbl.t
  (** Mapping from names to description of a variadic function.
  Note that these function names cannot be overloaded, and usually require
  customized code-gen in the backend.
*)

  val distribution_families : string list

  val is_stdlib_function_name : string -> bool
  (** Equivalent to [Hashtbl.mem function_signatures s]*)

  val get_signatures : string -> signature list
  (** Equivalent to [Hashtbl.find_multi function_signatures s]*)

  val get_operator_signatures : Operator.t -> signature list
  val get_assignment_operator_signatures : Operator.t -> signature list
  val is_not_overloadable : string -> bool
  val is_variadic_function_name : string -> bool
  val is_special_function_name : string -> bool
  val special_function_returntype : string -> UnsizedType.returntype option

  val check_special_fn :
       is_cond_dist:bool
    -> Location_span.t
    -> Environment.originblock
    -> Environment.t
    -> Ast.identifier
    -> Ast.typed_expression list
    -> Ast.typed_expression
  (** This function is responsible for typechecking varadic function
    calls. It needs to live in the Library since this is usually
    bespoke per-function. *)

  val operator_to_function_names : Operator.t -> string list
  val string_operator_to_function_name : string -> string
  val deprecated_distributions : deprecation_info String.Map.t
  val deprecated_functions : deprecation_info String.Map.t
end

Then, every module which currently depends on Stan_math_signatures was made into a functor over this module. This means that rather than there being a module called Typechecker, there is now a functor called Typechecking, which when you "call" it with an instance of a standard library module (the above type), it will give you back a typechecker module which works like the old one.
I tried to make as few functors as possible, so some modules just got a slight refactor (e.g. SignatureMismatch was being supplied the list of signatures for everything except operators, which it was checking from the library. Now it just gets supplied them for everything, so it doesn't need to depend on the library implemenation).

At the call site, this looks basically the same as the non-functor status quo. E.g., in stanc.ml, we add the line module Typechecker = Typechecking.Make (Stan_math_library) and then proceed as before by calling Typechecker.check_prog etc.

If anyone is wondering how to think about this, you can conceptually imagine Library as a Java interface, and functors like Typechecking as public class Typechecker<T extends Library>. This is not a good analogy for the complete ML module system, but for the specific use here of dependency injection it is more or less correct.

The Good

  • Generic over backends. A project like deepppl could promote the errors they throw from runtime errors to compile time typechecking errors.
  • Makes backend-specificity extremely apparent. If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".
    But it does! It turns out the lowering of ~ statements is dependent on which exact signatures are defined! This refactor makes things like that extremely obvious, since the things that depend on the library look structurally different than those that don't (e.g. they're functors rather than normal modules).

The Bad

  • Diff size. This had to touch a lot of files, but mainly what it was doing was adding one level of indent to them to account for the new struct ... end syntax required to define a functor.
  • Fussy syntax and extra signatures. OCaml functors are syntactically heavy to define, even simple ones like this. They also require extra signatures, often placed in their own ..._intf.ml file (see .mli file usage #358 and discussion)

The Unsure-How-To-Feel

  • Increased indirecton. This is good and bad in a way. Abstraction is what allows all the good things listed above, but does increase the learning curve for the compiler and makes it a bit harder to navigate, e.g. in the typechecker you can no longer right click in your editor and go to the place that the standard library is defined, since it could be any number of possible implementations of the Library type.
  • So far I've been imagining each backend defining different sets of signatures but them all being subsets of the biggest/most developed backend (e.g. C++). This would mean there is always some interface which could compile every .stan file. However, there is nothing stopping a backend from defining a function which does not exist in any other backend, making them disjoint. This could be a pro or con, I think it is subjective.

Submission Checklist

  • Run unit tests
  • Documentation
    • OR, no user-facing changes were made

Release notes

Refactored internals of the compiler to be less tied to the specific functions exposed in the Stan Math C++ library.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

@WardBrian WardBrian added cleanup Code simplification or clean-up robustness labels May 2, 2022
@bob-carpenter
Copy link
Contributor

If you had asked me last week "Does the AST to MIR translation depend on anything about the backend?", I would have answered "I sure hope it doesn't".

It'd certainly be nice if it didn't. I don't have a good sense of the weight of the downsides here, though.

@WardBrian
Copy link
Member Author

Functors are pretty common in OCaml (e.g., they're how one would implement any of the common generic data structures like Set, Map, etc) but they can still take some time to get used to.

I've also been trying to use an alternative feature called "virtual libraries" which covers similar use cases while being much simpler (no module applications, etc), but I am not sure if it is possible to do it in this way due to some issues that arise with circular dependencies which functors cleverly avoid.

@WardBrian
Copy link
Member Author

Scratch that, I believe I have it working with the much lighter-weight virtual library approach. This does lose the benefit that things that use the library look structurally different, but that is the very thing that makes functors "heavy" syntactically

@WardBrian
Copy link
Member Author

I'm going to close this in favor of #1184

@WardBrian WardBrian closed this May 4, 2022
@WardBrian WardBrian reopened this May 12, 2022
@WardBrian
Copy link
Member Author

I've re-opened this as an alternative to #1184 after today's language meeting. I want to do another pass on it of cleanup still, which I did do in #1184 after this

@WardBrian WardBrian changed the title [WIP/Discussion] Modularize compiler around the idea of a "standard library" Modularize compiler around the idea of a "standard library" using functors May 12, 2022
@WardBrian WardBrian mentioned this pull request Jun 17, 2022
2 tasks
@codecov
Copy link

codecov bot commented Nov 8, 2022

Codecov Report

Merging #1182 (3823560) into master (4d6664c) will decrease coverage by 0.12%.
The diff coverage is 89.92%.

@@            Coverage Diff             @@
##           master    #1182      +/-   ##
==========================================
- Coverage   88.79%   88.67%   -0.12%     
==========================================
  Files          64       68       +4     
  Lines        9844     9970     +126     
==========================================
+ Hits         8741     8841     +100     
- Misses       1103     1129      +26     
Impacted Files Coverage Δ
...c/analysis_and_optimization/Dependence_analysis.ml 100.00% <ø> (ø)
...rc/analysis_and_optimization/Monotone_framework.ml 91.20% <0.00%> (ø)
...alysis_and_optimization/Monotone_framework_intf.ml 100.00% <ø> (ø)
src/analysis_and_optimization/Optimize.ml 92.77% <ø> (+0.12%) ⬆️
src/analysis_and_optimization/Pedantic_analysis.ml 93.42% <ø> (ø)
src/frontend/Std_library_utils.ml 50.00% <50.00%> (ø)
src/stan_math_backend/Stan_math_library.ml 97.34% <84.12%> (ø)
src/analysis_and_optimization/Memory_patterns.ml 87.19% <86.59%> (-0.57%) ⬇️
src/frontend/Info.ml 89.23% <87.87%> (+0.16%) ⬆️
...rc/analysis_and_optimization/Partial_evaluation.ml 89.61% <89.61%> (ø)
... and 13 more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Code simplification or clean-up robustness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants