Merge branch 'gm/models' into 'master'

Initial functions

See merge request invenia/research/Models.jl!2

Showing 15 changed files with 546 additions and 86 deletions.
```
@@ -1,67 +1,4 @@
stages:
- test
- teardown

---
include:
- https://gitlab.invenia.ca/infrastructure/gitlab-ci-helper/raw/master/templates/hidden-jobs.yml
- https://gitlab.invenia.ca/infrastructure/gitlab-ci-helper/raw/master/templates/teardown.yml

"1.0 (Mac)":
tags:
- mac
- shell-ci
extends: .test_shell_1_0

"1.0 (Linux, 64-bit)":
tags:
- linux
- 64-bit
- docker-ci
extends: .test_docker_1_0

"1.0 (Linux, 32-bit)":
tags:
- linux
- 32-bit
- shell-ci
extends: .test_shell_1_0

"1.1 (Mac)":
tags:
- mac
- shell-ci
extends: .test_shell_1_1

"1.1 (Linux, 64-bit)":
tags:
- linux
- 64-bit
- docker-ci
extends: .test_docker_1_1

"1.1 (Linux, 32-bit)":
tags:
- linux
- 32-bit
- shell-ci
extends: .test_shell_1_1

"Nightly (Mac)":
tags:
- mac
- shell-ci
extends: .test_shell_nightly

"Nightly (Linux, 64-bit)":
tags:
- linux
- 64-bit
- docker-ci
extends: .test_docker_nightly

"Nightly (Linux, 32-bit)":
tags:
- linux
- 32-bit
- shell-ci
extends: .test_shell_nightly
- project: infrastructure/gitlab-ci-helper
file: /templates/julia.yml
```
```
@@ -1,2 +1,3 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Models = "e6388cff-ecff-480c-9b53-83211bf7812a"
```
```
@@ -1,19 +1,24 @@
using Documenter, Models
using Documenter
using Models
using Models.TestUtils

makedocs(;
modules=[Models],
format=Documenter.HTML(),
format=Documenter.HTML(;
prettyurls=false,
assets=[
"assets/invenia.css",
],
),
pages=[
"Home" => "index.md",
"Index" => "index.md",
"API" => "api.md",
"Design" => "design.md",
"TestUtils" => "testutils.md",
],
repo="https://gitlab.invenia.ca/invenia/Models.jl/blob/{commit}{path}#L{line}",
sitename="Models.jl",
authors="Invenia Technical Computing Corporation",
assets=[
"assets/invenia.css",
"assets/logo.png",
],
strict=true,
html_prettyurls=false,
checkdocs=:none,
checkdocs=:exports,
)
```
@@ -0,0 +1,25 @@
# API

## Abstract Types
```@docs
Template
Model
```

## Common API
```@docs
fit
predict
estimate_type
output_type
```

## Traits
```@docs
EstimateTrait
PointEstimate
DistributionEstimate
OutputTrait
SingleOutput
MultiOutput
```
@@ -0,0 +1,82 @@
## Design Documentation

This page details the key features of the design of BaselineModels.

BaselineModels exists to solve the issue highlighted by the following quote:

> ML researchers tend to develop general purpose solutions as self-contained packages.
> A wide variety of these are available as open-source packages ...
> Using generic packages often results in a glue-code system design pattern, in which a massive amount of supporting code is written to get data into and out of general-purpose packages.
> Glue-code is costly in the long term because it tends to freeze a system to the peculiarities of a specific package; testing alternatives may become prohibitively expensive....
> **An important strategy for combating glue-code is to wrap black-box packages into common API’s.**
> This allows supporting infrastructure to be more reusable and reduces the cost of changing packages.
-- [Sculley et al. 2015](https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems)

BaselineModels provides a common API for mostly preexisting models so that they can all be used in the same way.
As such, the most important thing is that the package itself has a common API.
Here are some facts about that API:

### Models and Templates

A **model** is an object that can be used to make predictions by calling `predict`.
A **template** is an object that can create a *model* by being `fit` to some data.

All information about how to perform `fit`, such as hyper-parameters, is stored inside the *template*.
This differs from some other APIs, which might, for example, pass those as keyword arguments to `fit`.
The template-based API is preferable because the signature of `fit` is always the same.
One does not have to carry around both a model type and a varying collection of keyword arguments, which would get complicated when composing wrapper models.

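
Here is a minimal sketch of why keeping hyper-parameters in the template helps composition. It declares local stand-ins for `Template` and `Model` so it runs without Models.jl or StatsBase, and it adds two hypothetical templates that do not exist in Models.jl or BaselineModels. `ShrunkMeanTemplate` stores its only hyper-parameter as a field. The wrapper `LogTemplate` fits any inner template on log-transformed outputs. The wrapper calls `fit` on its inner template without knowing what that template's hyper-parameters are.

```julia
# Local stand-ins for the abstract types Models.jl defines (assumption: not imported).
abstract type Template end
abstract type Model end

# Hypothetical base template: its hyper-parameter is a field, not a `fit` keyword.
struct ShrunkMeanTemplate <: Template
    shrinkage::Float64   # fraction by which the fitted means are pulled towards zero
end

struct ShrunkMeanModel <: Model
    means::Vector{Float64}   # one entry per output variate
end

# Same positional signature for every template: outputs, inputs, weights.
function fit(t::ShrunkMeanTemplate, outputs::AbstractMatrix, inputs::AbstractMatrix,
             weights::AbstractVector=ones(size(outputs, 2)))
    μ = (outputs * weights) ./ sum(weights)   # weighted mean of each variate (row)
    return ShrunkMeanModel((1 - t.shrinkage) .* μ)
end

# Predict the stored means for every observation (column) of `inputs`.
predict(m::ShrunkMeanModel, inputs::AbstractMatrix) = repeat(m.means, 1, size(inputs, 2))

# Hypothetical wrapper: fits any inner template on log-transformed outputs.
# It never needs to know which hyper-parameters the inner template has.
struct LogTemplate{T<:Template} <: Template
    inner::T
end

struct LogModel{M<:Model} <: Model
    inner::M
end

fit(t::LogTemplate, outputs::AbstractMatrix, inputs::AbstractMatrix,
    weights::AbstractVector=ones(size(outputs, 2))) =
    LogModel(fit(t.inner, log.(outputs), inputs, weights))

predict(m::LogModel, inputs::AbstractMatrix) = exp.(predict(m.inner, inputs))
```

For example, `fit(LogTemplate(ShrunkMeanTemplate(0.0)), [1.0 10.0 100.0], zeros(2, 3))` returns a model whose predictions are all approximately `10.0`, which is the geometric mean of the outputs.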
### `fit` and `predict`

```julia
model = StatsBase.fit(
    template,
    outputs::AbstractMatrix,  # always Variates x Observations
    inputs::AbstractMatrix,  # always Features x Observations
    weights=uweights(Float32, size(outputs, 2))
)::Model
```

```julia
outputs = StatsBase.predict(
    model,
    inputs::AbstractMatrix  # always Features x Observations
)::AbstractMatrix  # always Variates x Observations
```

`fit` takes a *template* and some *data* and returns a `Model` that has been fit to the data.
`predict` takes a `Model` (that has been `fit` from a *template*) and produces a predicted output.

Important facts about `fit` and `predict`:
- `outputs` and `inputs` always have observations as the second dimension, even for a [`SingleOutput`](@ref) model, which just means that `outputs` will be a `1 x num_obs` matrix. (See [Docs on Julia being column-major](https://docs.julialang.org/en/v1/manual/performance-tips/#Access-arrays-in-memory-order,-along-columns-1).)
- `fit` and `predict` must accept any `AbstractMatrix` for `inputs`, and `fit` must also accept any `AbstractMatrix` for `outputs`. If the underlying implementation needs a plain dense `Matrix`, then `fit`/`predict` should perform the conversion.
- `fit` always accepts a `weights` argument. If the underlying model does not support weighted fitting, then `fit` should throw an error if the weights passed in are not all equal.
- `fit`/`predict` take no keyword arguments, nor any arguments other than the ones shown.
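
The rules above can be sketched with a hypothetical `OLSTemplate`, an ordinary least-squares fit without an intercept. It is assumed not to support weighted fitting. The sketch again uses local stand-ins for `Template` and `Model`, and it defines plain functions instead of extending `StatsBase.fit`/`StatsBase.predict`. It keeps observations in columns and accepts any `AbstractMatrix` by converting to a dense `Matrix`. It always takes a `weights` argument and throws if those weights are not all equal.

```julia
using LinearAlgebra

# Local stand-ins for the abstract types Models.jl defines (assumption: not imported).
abstract type Template end
abstract type Model end

# Hypothetical template: least squares without an intercept, assumed to have
# no support for weighted fitting.
struct OLSTemplate <: Template end

struct OLSModel <: Model
    coefs::Matrix{Float64}   # Variates x Features
end

function fit(::OLSTemplate, outputs::AbstractMatrix, inputs::AbstractMatrix,
             weights::AbstractVector=ones(size(outputs, 2)))
    # Observations are always the second dimension.
    size(outputs, 2) == size(inputs, 2) == length(weights) ||
        throw(DimensionMismatch("outputs, inputs and weights disagree on the number of observations"))
    # No weighted fitting, so only uniform weights are acceptable.
    all(==(first(weights)), weights) ||
        throw(ArgumentError("OLSTemplate does not support non-uniform weights"))
    # Accept any AbstractMatrix, but solve on dense matrices.
    X = Matrix{Float64}(inputs)    # Features x Observations
    Y = Matrix{Float64}(outputs)   # Variates x Observations
    # Least-squares solve of X' * B ≈ Y'; B is Features x Variates.
    B = Matrix(X') \ Matrix(Y')
    return OLSModel(Matrix(B'))
end

# Variates x Features times Features x Observations gives Variates x Observations.
predict(m::OLSModel, inputs::AbstractMatrix) = m.coefs * Matrix{Float64}(inputs)
```

In a real downstream package these methods would instead extend `Models.fit` and `Models.predict`, which Models.jl imports from StatsBase. Each template and model would also define the trait functions.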

### Traits

This package largely avoids using complicated abstract types, or relying on a model having a particular abstract type.
Instead we use [traits](https://invenia.github.io/blog/2019/11/06/julialang-features-part-2/) to determine model behavior.

Here are the current model traits in use and their possible values:
- `estimate_type` - determines what kinds of estimates the model outputs.
    - `PointEstimate`: Predicts point-estimates of the most likely values.
    - `DistributionEstimate`: Estimates distributions over possible values.
- `output_type` - determines how many output variates a model can learn
    - `SingleOutput`: Fits and predicts on a single output only.
    - `MultiOutput`: Fits and predicts on multiple outputs at a time.

The traits always agree between the model and the template.
Every model and template should define all the listed traits.

This package implements traits so that the trait function returns an `abstract type` (rather than an instance).
That means to check a trait one uses `<:`:
```julia
if estimate_type(model) <: DistributionEstimate
    # handle models that predict distributions
end
```
and to dispatch on a trait one uses:
```julia
foo(::Type{<:DistributionEstimate}, ...)
```
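
Here is a self-contained sketch of traits that return types. The trait types are redeclared locally as abstract types under `EstimateTrait`/`OutputTrait` supertypes. That is an assumption about how Models.jl defines them. `GaussianTemplate`, `GaussianModel`, `is_probabilistic` and `describe` are hypothetical names. The sketch shows a template and model that agree on their traits. It checks a trait with `<:` and dispatches on `::Type{<:...}`.

```julia
# Local stand-ins for the trait types (assumption: declared as abstract types
# under common supertypes, mirroring the names used in this documentation).
abstract type EstimateTrait end
abstract type PointEstimate <: EstimateTrait end
abstract type DistributionEstimate <: EstimateTrait end

abstract type OutputTrait end
abstract type SingleOutput <: OutputTrait end
abstract type MultiOutput <: OutputTrait end

abstract type Template end
abstract type Model end

# Hypothetical template/model pair whose traits agree.
struct GaussianTemplate <: Template end
struct GaussianModel <: Model end

# Trait functions return a type, not an instance of it.
estimate_type(::Union{GaussianTemplate,GaussianModel}) = DistributionEstimate
output_type(::Union{GaussianTemplate,GaussianModel}) = SingleOutput

# Checking a trait: compare the returned type with `<:`.
is_probabilistic(x) = estimate_type(x) <: DistributionEstimate

# Dispatching on a trait: pass the trait type as an extra leading argument.
describe(x) = describe(estimate_type(x), x)
describe(::Type{<:PointEstimate}, x) = "point estimates"
describe(::Type{<:DistributionEstimate}, x) = "distributions"
```

`estimate_type` returns the type `DistributionEstimate` itself, so `estimate_type(GaussianModel()) isa DistributionEstimate` is `false`. The subtype check with `<:` is the one that works.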
````
@@ -1,8 +1,17 @@
# Models.jl
# Models

```@index
```
## Why does this package exist?

[Models.jl](https://gitlab.invenia.ca/invenia/research/Models.jl) defines the [`Template`](@ref) and [`Model`](@ref) types as well as a common API for constructing a generic model in downstream packages, including:

* Calling [`fit`](@ref) on a [`Template`](@ref).
* Calling [`predict`](@ref) on a [`Model`](@ref).
* Assigning traits such as [`EstimateTrait`](@ref) and [`OutputTrait`](@ref).
* Testing interfaces and downstream dependencies with [`TestUtils`](@ref).

For common examples of the interface being implemented see [BaselineModels.jl](https://gitlab.invenia.ca/invenia/research/BaselineModels.jl).

```@autodocs
Modules = [Models]
## Contents
```@contents
Pages = ["api.md", "testutils.md"]
```
````
@@ -0,0 +1,9 @@
# TestUtils

Provides test fakes, [`FakeTemplate`](@ref) and [`FakeModel`](@ref), that are useful for
testing downstream dependencies, and [`test_interface`](@ref) for testing that a model's API
has been correctly implemented.

```@autodocs
Modules = [Models.TestUtils]
```
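
This sketch illustrates the kind of property an interface test can check. It defines a hypothetical `check_interface` function and a trivial hypothetical `ZeroTemplate`/`ZeroModel` pair to run it against. It is not the implementation of `test_interface`, whose behavior is not shown in this diff. It only encodes rules stated in the design documentation: `fit` returns a `Model`, any `AbstractMatrix` is accepted, and `predict` returns a `Variates x Observations` matrix.

```julia
# Local stand-ins for the abstract types Models.jl defines (assumption: not imported).
abstract type Template end
abstract type Model end

# Hypothetical trivial template/model pair used to exercise the check.
struct ZeroTemplate <: Template end

struct ZeroModel <: Model
    num_variates::Int
end

fit(::ZeroTemplate, outputs::AbstractMatrix, inputs::AbstractMatrix,
    weights::AbstractVector=ones(size(outputs, 2))) = ZeroModel(size(outputs, 1))

predict(m::ZeroModel, inputs::AbstractMatrix) = zeros(m.num_variates, size(inputs, 2))

# Hypothetical interface check; NOT the real `Models.TestUtils.test_interface`.
function check_interface(template::Template; num_features=3, num_variates=1, num_obs=5)
    inputs = rand(num_features, num_obs)    # Features x Observations
    outputs = rand(num_variates, num_obs)   # Variates x Observations
    model = fit(template, outputs, inputs, ones(num_obs))
    model isa Model || error("fit must return a Model")
    # A non-dense AbstractMatrix (here a view) must also be accepted.
    predictions = predict(model, view(inputs, :, 1:num_obs))
    size(predictions) == (num_variates, num_obs) ||
        error("predict must return a Variates x Observations matrix")
    return true
end
```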
```
@@ -1,5 +1,49 @@
module Models

greet() = print("Hello World!")
import StatsBase: fit, predict

export Model, Template
export fit, predict, estimate_type, output_type
export EstimateTrait, PointEstimate, DistributionEstimate
export OutputTrait, SingleOutput, MultiOutput

"""
Template
A Template is an untrained [`Model`](@ref) that can be [`fit`](@ref) to data.
Defined as well are the traits:
- [`output_type`](@ref): SingleOutput or MultiOutput
- [`estimate_type`](@ref): PointEstimate or DistributionEstimate
"""
abstract type Template end

"""
Model
A Model is a trained [`Template`](@ref) with which one can [`predict`](@ref) on inputs.
Defined as well are the traits:
- [`output_type`](@ref): SingleOutput or MultiOutput
- [`estimate_type`](@ref): PointEstimate or DistributionEstimate
"""
abstract type Model end

"""
fit(::Template, output, input) -> Model
Fit the `Template` to the `output` and `input` data and return a trained `Model`.
"""
function fit end

"""
predict(::Model, input)
Predict targets for the provided `input` and `Model`.
Returns a predictive distribution or point estimates depending on the `Model`.
"""
function predict end

include("traits.jl")
include("test_utils.jl")

end # module
```