Merge pull request #29 from TARGENE/treatment_values
For 0.9 release
olivierlabayle authored Aug 21, 2024
2 parents fdfa3a6 + 8243542 commit 0161a34
Showing 37 changed files with 1,006 additions and 655 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/CI.yml
@@ -16,6 +16,7 @@ jobs:
matrix:
version:
- '1'
- '1.10'
os:
- ubuntu-latest
- macOS-latest
@@ -56,9 +57,9 @@ jobs:
- run: |
julia --project=docs -e '
using Documenter: DocMeta, doctest
using TargetedEstimation
DocMeta.setdocmeta!(TargetedEstimation, :DocTestSetup, :(using TargetedEstimation); recursive=true)
doctest(TargetedEstimation)'
using TMLECLI
DocMeta.setdocmeta!(TMLECLI, :DocTestSetup, :(using TMLECLI); recursive=true)
doctest(TMLECLI)'
- run: julia --project=docs docs/make.jl
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
16 changes: 16 additions & 0 deletions .github/workflows/TagBot.yml
@@ -4,6 +4,22 @@ on:
types:
- created
workflow_dispatch:
inputs:
lookback:
default: "3"
permissions:
actions: read
checks: read
contents: write
deployments: read
issues: read
discussions: read
packages: read
pages: read
pull-requests: read
repository-projects: read
security-events: read
statuses: read
jobs:
TagBot:
if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot'
9 changes: 4 additions & 5 deletions Project.toml
@@ -1,4 +1,4 @@
name = "TargetedEstimation"
name = "TMLECLI"
uuid = "2573d147-4098-46ba-9db2-8608d210ccac"
authors = ["Olivier Labayle"]
version = "0.9.0"
@@ -45,17 +45,16 @@ EvoTrees = "0.16.5"
GLMNet = "0.7"
JLD2 = "0.4.22"
JSON = "0.21.4"
MKL = "0.6"
MKL = "0.6, 0.7"
MLJ = "0.20.0"
MLJBase = "1.0.1"
MLJLinearModels = "0.10.0"
MLJModelInterface = "1.8.0"
MLJModels = "0.16"
MLJModels = "0.16, 0.17"
MLJXGBoostInterface = "0.3.4"
MultipleTesting = "0.6.0"
Optim = "1.7"
PackageCompiler = "2.1.16"
TMLE = "0.16.1"
Tables = "1.10.1"
YAML = "0.4.9"
julia = "1.7, 1"
julia = "1.10, 1"
70 changes: 6 additions & 64 deletions README.md
@@ -1,66 +1,8 @@
# TargetedEstimation
# TMLECLI

[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://targene.github.io/TargetedEstimation.jl/stable/)
![GitHub Workflow Status (with branch)](https://img.shields.io/github/actions/workflow/status/TARGENE/TargetedEstimation.jl/CI.yml?branch=main)
![Codecov](https://img.shields.io/codecov/c/github/TARGENE/TargetedEstimation.jl/main)
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/TARGENE/TargetedEstimation.jl)
[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://targene.github.io/TMLE-CLI.jl/stable/)
![GitHub Workflow Status (with branch)](https://img.shields.io/github/actions/workflow/status/TARGENE/TMLE-CLI.jl/CI.yml?branch=main)
![Codecov](https://img.shields.io/codecov/c/github/TARGENE/TMLE-CLI.jl/main)
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/TARGENE/TMLE-CLI.jl)

This package provides two command line interfaces used mainly in the context of TarGene:
1. `scripts/tmle.jl`: To run Targeted Maximum Likelihood Estimation
1. `scripts/sieve_variance.jl`: To run sieve variance correction to account for potential non iid data.

## Usage

The best way to use the command lines is to use the associated [docker image](https://hub.docker.com/r/olivierlabayle/targeted-estimation/tags). Command line arguments can be displayed by:

### tmle.jl

To display command line arguments:

```bash
julia --project=/TargetedEstimation.jl --startup-file=no scripts/tmle.jl --help
```

### sieve_variance.jl

This requires an HDF5 file output by `tmle.jl` and the Genetic Relationship Matrix output by the GCTA software.

To display command line arguments:

```bash
julia --project=/TargetedEstimation.jl --startup-file=no scripts/sieve_variance.jl --help
```

## Experiments

The `experiments` contains various experiments related to genetic association studies: GWAS' and PheWAS'.

### GWAS Runtime

The goal of this experiment is to estimate the running time of TMLE in a GWAS setting. Because the propensity score estimation runtime varies for various SNPs, this is done by running TMLE over 100 SNPs. We estimate the runtime for both a continuous and a binary target and for 4 nuisance parameters specifications: GLM, GLMNet, CrossValidatedXGBoost, Super Learning(GLMNet+CrossValidatedXGBoost). Cross validations selections are performed over 3-folds.

- Associated data: Restricted access. On the University of Edinburgh datastore, `/exports/igmm/datastore/ponting-lab/olivier/misc_datasets/gwas_sample_data.csv`

- Associated script: [experiments/gwas_runtime.jl](experiments/gwas_runtime.jl).

- Julia script usage: `julia --project --startup-file=no experiments/gwas_runtime.jl --help`

- Bash script (to submit jobs on the Eddie cluster):
- `qsub experiments/gwas_unit_binary.sh`
- `qsub experiments/gwas_unit_continuous.sh`

### PheWAS Runtime

The goal of this experiment is to estimate the running time of TMLE in a PheWAS setting. Since the propensity score is estimated only once, it is not driving runtime. The PheWAS is perfomed over more than 760 traits and for 4 nuisance parameters specifications: GLM, GLMNet, CrossValidatedXGBoost, Super Learning(GLMNet+CrossValidatedXGBoost). Cross validations selections are performed over 3-folds.

- Associated data: Restricted access. On the University of Edinburgh datastore, `/exports/igmm/datastore/ponting-lab/olivier/misc_datasets/sample_ukb_data.csv`

- Associated script: [experiments/phewas_runtime.jl](experiments/phewas_runtime.jl).

- Julia script usage: `julia --project --startup-file=no experiments/phewas_runtime.jl --help`

- Bash scripts (to submit jobs on the Eddie cluster):
- `qsub experiments/phewas_glm.sh`
- `qsub experiments/phewas_glmnet.sh`
- `qsub experiments/phewas_xgboost.sh`
- `qsub experiments/phewas_sl.sh`
Command Line Interface for Targeted Minimum-Loss Estimation of causal effects on Tabular datasets.
2 changes: 1 addition & 1 deletion deps/build_sysimage.jl
@@ -1,6 +1,6 @@
using PackageCompiler
PackageCompiler.create_sysimage(
["TargetedEstimation"],
["TMLECLI"],
cpu_target="generic",
sysimage_path="TMLESysimage.so",
precompile_execution_file="deps/execute.jl",
4 changes: 2 additions & 2 deletions deps/execute.jl
@@ -1,7 +1,7 @@
using TargetedEstimation
using TMLECLI

@info "Running precompilation script."
# Run workload
TEST_DIR = joinpath(pkgdir(TargetedEstimation), "test")
TEST_DIR = joinpath(pkgdir(TMLECLI), "test")
push!(LOAD_PATH, TEST_DIR)
include(joinpath(TEST_DIR, "runtests.jl"))
4 changes: 2 additions & 2 deletions docker/Dockerfile
@@ -14,9 +14,9 @@ RUN bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"

# Import the project

COPY . /TargetedEstimation.jl
COPY . /TMLECLI.jl

WORKDIR /TargetedEstimation.jl
WORKDIR /TMLECLI.jl

# Precompile the project
RUN julia --project -e'using Pkg; Pkg.instantiate(); Pkg.resolve(); Pkg.precompile()'
14 changes: 7 additions & 7 deletions docs/make.jl
@@ -1,18 +1,18 @@
using Documenter
using TargetedEstimation
using TMLECLI

DocMeta.setdocmeta!(TargetedEstimation, :DocTestSetup, :(using TargetedEstimation); recursive=true)
DocMeta.setdocmeta!(TMLECLI, :DocTestSetup, :(using TMLECLI); recursive=true)

makedocs(
authors="Olivier Labayle",
repo="https://github.com/TARGENE/TargetedEstimation.jl/blob/{commit}{path}#{line}",
sitename = "TargetedEstimation.jl",
repo="https://github.com/TARGENE/TMLE-CLI.jl/blob/{commit}{path}#{line}",
sitename = "TMLE-CLI.jl",
format = Documenter.HTML(;
prettyurls=get(ENV, "CI", "false") == "true",
canonical="https://TARGENE.github.io/TargetedEstimation.jl",
canonical="https://TARGENE.github.io/TMLE-CLI.jl",
assets=String["assets/logo.ico"],
),
modules = [TargetedEstimation],
modules = [TMLECLI],
pages=[
"Home" => "index.md",
"Command Line Interface" => ["cli.md", "tmle_estimation.md", "sieve_variance.md", "make_summary.md"],
@@ -25,7 +25,7 @@ makedocs(

@info "Deploying docs..."
deploydocs(;
repo="github.com/TARGENE/TargetedEstimation.jl",
repo="github.com/TARGENE/TMLE-CLI.jl",
devbranch="main",
push_preview=true
)
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,4 +1,4 @@
# TargetedEstimation.jl
# TMLE-CLI.jl

The goal of this package is to provide a standalone executable to run large-scale Targeted Minimum Loss-based Estimation ([TMLE](https://link.springer.com/book/10.1007/978-1-4419-9782-1)) on tabular datasets. To learn more about TMLE, please visit [TMLE.jl](https://targene.github.io/TMLE.jl/stable/), the companion package.

8 changes: 4 additions & 4 deletions docs/src/models.md
@@ -1,7 +1,7 @@
# Models

```@meta
CurrentModule = TargetedEstimation
CurrentModule = TMLECLI
```

Because [TMLE.jl](https://targene.github.io/TMLE.jl/stable/) is based on top of [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/), we can support any model respecting the MLJ interface. At the moment, we readily support all models from the following packages:
@@ -12,13 +12,13 @@ Because [TMLE.jl](https://targene.github.io/TMLE.jl/stable/) is based on top of
- [GLMNet](https://github.com/JuliaStats/GLMNet.jl): A Julia wrapper of the [glmnet](https://glmnet.stanford.edu/articles/glmnet.html) package. See the [GLMNet](@ref) section.
- [MLJModels](https://github.com/JuliaAI/MLJModels.jl): General utilities such as the `OneHotEncoder` or `InteractionTransformer`.

Further support for more packages can be added on request, please fill an [issue](https://github.com/TARGENE/TargetedEstimation.jl/issues).
Further support for more packages can be added on request, please fill an [issue](https://github.com/TARGENE/TMLE-CLI.jl/issues).

Also, because the estimator file used by the TMLE CLI is a pure Julia file, it can be used to install additional packages, which can in turn be used to define additional models.

Finally, we also provide some additional models described in [Additional models provided by TargetedEstimation.jl](@ref).
Finally, we also provide some additional models described in [Additional models provided by TMLE-CLI.jl](@ref).

## Additional models provided by TargetedEstimation.jl
## Additional models provided by TMLE-CLI.jl

### GLMNet

2 changes: 1 addition & 1 deletion docs/src/resampling.md
@@ -1,7 +1,7 @@
# Resampling Strategies

```@meta
CurrentModule = TargetedEstimation
CurrentModule = TMLECLI
```

We also provide additional resampling strategies compliant with the `MLJ.ResamplingStrategy` interface.
125 changes: 122 additions & 3 deletions docs/src/tmle_estimation.md
@@ -12,7 +12,126 @@ tmle tmle --help
tmle
```

## Note on TMLE Outputs
## Specifying Estimands

The easiest way to create an estimands' file is to use the companion Julia [TMLE.jl](https://targene.github.io/TMLE.jl/stable/) package and create a `Configuration` structure. This structure can be serialized to a file using any of `serialize` (Julia serialization format), `write_json` (JSON) or `write_yaml` (YAML).

Alternatively you can write this file manually. The following example illustrates the creation of three estimands in YAML format: an Average Treatment Effect (ATE), an Average Interaction Effect (AIE) and a Counterfactual Mean (CM).

```yaml
type: "Configuration"
estimands:
  - outcome_extra_covariates:
      - C1
    type: "AIE"
    treatment_values:
      T1:
        control: 0
        case: 1
      T2:
        control: 0
        case: 1
    outcome: Y1
    treatment_confounders:
      T2:
        - W21
        - W22
      T1:
        - W11
        - W12
  - outcome_extra_covariates: []
    type: "ATE"
    treatment_values:
      T1:
        control: 0
        case: 1
      T3:
        control: "CC"
        case: "AC"
    outcome: Y3
    treatment_confounders:
      T1:
        - W
      T3:
        - W
  - outcome_extra_covariates: []
    type: "CM"
    treatment_values:
      T1: "CC"
      T3: "AC"
    outcome: Y3
    treatment_confounders:
      T1:
        - W
      T3:
        - W
```
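
When writing such a file by hand, the shape of `treatment_values` depends on the estimand type: in the example above, ATE and AIE entries give a `control` and a `case` value per treatment, whereas CM entries give a single value. The following is a minimal sketch of a sanity check for that rule. It assumes the file has already been parsed into nested dictionaries, and `check_treatment_values` is a hypothetical helper that is not part of TMLECLI or TMLE.jl.

```julia
# Hypothetical helper (not part of TMLECLI or TMLE.jl): checks that the
# `treatment_values` of one parsed estimand match the shape used in the
# YAML example above.
function check_treatment_values(estimand::AbstractDict)
    kind = estimand["type"]
    for (treatment, value) in estimand["treatment_values"]
        if kind in ("ATE", "AIE")
            # Contrast-type estimands: one control and one case level per treatment.
            (value isa AbstractDict && haskey(value, "control") && haskey(value, "case")) ||
                throw(ArgumentError("$kind needs `control` and `case` values for $treatment"))
        elseif kind == "CM"
            # Counterfactual mean: a single level per treatment.
            value isa AbstractDict &&
                throw(ArgumentError("CM needs a single value for $treatment"))
        else
            throw(ArgumentError("unknown estimand type: $kind"))
        end
    end
    return true
end

# Usage with hand-built dictionaries mirroring two entries of the example.
ate = Dict("type" => "ATE",
           "treatment_values" => Dict("T1" => Dict("control" => 0, "case" => 1)))
cm = Dict("type" => "CM",
          "treatment_values" => Dict("T1" => "CC", "T3" => "AC"))
check_treatment_values(ate)
check_treatment_values(cm)
```

This only covers the three estimand types shown in the example. The authoritative validation is whatever TMLE.jl performs when it reads the file.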

## Specifying Estimators

There are two ways to specify the estimators: via a plain Julia file or via a configuration string.

### Estimators From A String

An estimator can be described along three main axes:

1. Whether it uses cross-validation (sample-splitting) or not.
2. The semi-parametric estimator type: TMLE, wTMLE, OSE.
3. The models used to learn the nuisance functions.

The estimator type and cross-validation scheme are described at once by any of the following short names:

| Estimator's Short Name | Estimator's Description |
| :--------: | :-------: |
| tmle | Canonical Targeted Minimum-Loss Estimator |
| wtmle | Canonical Targeted Minimum-Loss Estimator with weighted Fluctuation |
| ose | Canonical One-Step Estimator |
| cvtmle | Cross-Validated Targeted Minimum-Loss Estimator |
| cvwtmle | Cross-Validated Targeted Minimum-Loss Estimator with weighted Fluctuation |
| cvose | Cross-Validated One-Step Estimator |

The available models are:

| Model's Short Name | Model's Description |
| :--------: | :-------: |
| glm | A Generalised Linear Model |
| glmnet | A Cross-Validated Generalised Linear Model |
| xgboost | The default XGBoost model using the `hist` strategy. |
| tunedxgboost | A cross-validated grid of XGBoost models across (max_depth, eta) hyperparameters. |
| sl | A Super Learning strategy using a glmnet, a glm and a grid of xgboost models as in tunedxgboost. |

Then, a configuration string describes the estimators and models in the following way: `ESTIMATORS--Q_MODEL--G_MODEL`.

- The `ESTIMATORS` substring comprises one or more estimators separated by a single dash, e.g. `cvtmle-ose`. If multiple estimators are specified, they are used sequentially and an estimation result provides key-value pairs of `ESTIMATOR => ESTIMATE`.
- The optional `Q_MODEL` substring corresponds to the model used to learn the outcome models. It defaults to `glmnet`.
- The optional `G_MODEL` substring corresponds to the model used to learn the propensity score models. If it is not provided, it defaults to the model provided for `Q_MODEL`.

It is probably easier to understand with some examples.

#### Examples

- `tmle--sl--glm`: A single estimator (TMLE) using a Super Learner for the outcome models and a GLM for the propensity score models.
- `cvtmle-ose--xgboost`: Two estimators (CV-TMLE and OSE) using XGBoost for the outcome models and the default strategy for the propensity score models.
- `cvwtmle-cvose`: Two estimators (CV-wTMLE and CV-OSE) using default strategies for both outcome models and propensity score models.
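
To make the defaulting rules concrete, here is a minimal sketch of how such a string could be split into its components. `parse_estimator_string` and the two constants are hypothetical names used for illustration only; the actual parsing code in TMLECLI may differ.

```julia
# Short names taken from the two tables above.
const KNOWN_ESTIMATORS = ("tmle", "wtmle", "ose", "cvtmle", "cvwtmle", "cvose")
const KNOWN_MODELS = ("glm", "glmnet", "xgboost", "tunedxgboost", "sl")

# Hypothetical helper (not the TMLECLI implementation): splits a string of
# the form ESTIMATORS--Q_MODEL--G_MODEL and applies the documented defaults.
function parse_estimator_string(config::AbstractString)
    parts = split(config, "--")
    1 <= length(parts) <= 3 ||
        throw(ArgumentError("expected ESTIMATORS--Q_MODEL--G_MODEL, got: $config"))
    # Estimators are separated by a single dash.
    estimators = String.(split(parts[1], "-"))
    # Q_MODEL defaults to glmnet; G_MODEL defaults to whatever Q_MODEL is.
    q_model = length(parts) >= 2 ? String(parts[2]) : "glmnet"
    g_model = length(parts) == 3 ? String(parts[3]) : q_model
    all(in(KNOWN_ESTIMATORS), estimators) ||
        throw(ArgumentError("unknown estimator in: $(parts[1])"))
    (q_model in KNOWN_MODELS && g_model in KNOWN_MODELS) ||
        throw(ArgumentError("unknown model in: $config"))
    return (estimators=estimators, q_model=q_model, g_model=g_model)
end

# The three examples above.
parse_estimator_string("tmle--sl--glm")
parse_estimator_string("cvtmle-ose--xgboost")
parse_estimator_string("cvwtmle-cvose")
```

Under the stated rules, the second example resolves the propensity score model to `xgboost` and the third resolves both models to `glmnet`.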

#### Note on Cross-Validation

Some of the aforementioned estimators and models use cross-validation under the hood. In this case, a stratified 3-fold cross-validation is used, where the stratification occurs across both the outcome and treatment variables.
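
As an illustration of what stratifying across both outcome and treatment means, the sketch below assigns samples to folds so that each (outcome, treatment) combination is spread evenly across folds. `stratified_folds` is a hypothetical helper, not the resampling strategy shipped with the package, and it assumes discrete outcome values (how continuous outcomes are handled is not covered here).

```julia
using Random

# Hypothetical helper: returns a fold index in 1:nfolds for every sample,
# balancing each (outcome, treatment) stratum across the folds.
function stratified_folds(outcome, treatment; nfolds=3, rng=Random.default_rng())
    n = length(outcome)
    length(treatment) == n ||
        throw(DimensionMismatch("outcome and treatment must have the same length"))
    # Group sample indices by their (outcome, treatment) combination.
    strata = Dict{Any, Vector{Int}}()
    for i in 1:n
        push!(get!(strata, (outcome[i], treatment[i]), Int[]), i)
    end
    # Within each stratum, shuffle and deal samples to folds in round-robin order.
    folds = zeros(Int, n)
    for indices in values(strata)
        for (position, i) in enumerate(shuffle(rng, indices))
            folds[i] = mod1(position, nfolds)
        end
    end
    return folds
end

# Usage: 12 samples, 4 strata of 3 samples each.
y = repeat([0, 1], 6)
t = repeat([0, 0, 1, 1], 3)
folds = stratified_folds(y, t; rng=MersenneTwister(1))
```

With three samples per stratum and three folds, every fold receives exactly one sample from each stratum.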

#### Note on GLM and GLMNet

Linear models typically do not involve any interaction terms. Here, to add extra flexibility, both GLM and GLMNet comprise pairwise interaction terms between treatment variables and all other covariates.
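
The sketch below illustrates the idea on a plain column table. `add_treatment_interactions` is a hypothetical helper and not the transformer used by the package. It assumes all columns are already numeric and, as a simplifying assumption, it only pairs treatments with non-treatment columns.

```julia
# Hypothetical helper: appends one product column per (treatment, other column)
# pair to a NamedTuple of numeric columns.
function add_treatment_interactions(columns::NamedTuple, treatments)
    others = [name for name in keys(columns) if name ∉ treatments]
    interactions = [Symbol(t, :_x_, c) => columns[t] .* columns[c]
                    for t in treatments for c in others]
    return merge(columns, (; interactions...))
end

# Usage: one treatment T1 and two covariates W and C.
cols = (T1=[0.0, 1.0, 1.0], W=[2.0, 3.0, 4.0], C=[1.0, 1.0, 2.0])
augmented = add_treatment_interactions(cols, [:T1])
```

A linear model fitted on the augmented table can then let the effect of each covariate differ between treatment levels.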

### Estimators Via A Julia File

Building an estimator via a configuration string is quite flexible and should cover most use cases. However, in some cases you may want to have full control over the estimation procedure. This is possible by instead providing a Julia configuration file describing the estimators to be used. The file should define an `ESTIMATORS` NamedTuple describing the estimators to be used, and some examples can be found [here](https://github.com/TARGENE/TMLE-CLI.jl/tree/treatment_values/estimators-configs).

For further information, we recommend you have a look at both:

- [TMLE.jl](https://targene.github.io/TMLE.jl/stable/): The Julia package on which this command line interface is built.
- [MLJ](https://juliaai.github.io/MLJ.jl/dev/): The Julia package used for machine-learning throughout.

## Note on Outputs

We can output results in three different formats: HDF5, JSON and JLS. By default no output is written, so you need to specify at least one. An output is generated by specifying an output filename for it. For instance, `--outputs.json.filename=output.json` will output a JSON file. You can generate multiple formats at once, e.g. `--outputs.json.filename=output.json --outputs.hdf5.filename=output.hdf5` will output both JSON and HDF5 result files. Another important output option is the `pval_threshold`. Each estimation result is accompanied by an influence curve vector and, by default, these vectors are erased before saving the results because they typically take up too much space and are not usually needed. On some occasions you might want to keep them, which can be achieved by specifying the output's `pval_threshold`. For instance, `--outputs.hdf5.pval_threshold=1.` will keep all such vectors because all p-values lie between 0 and 1.
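
The effect of the threshold can be pictured with the following sketch. `EstimationResult` and `strip_influence_curves` are hypothetical names, and the rule "keep the influence curve when the p-value is at most the threshold" is an assumption inferred from the description above rather than the package's actual code.

```julia
# Hypothetical result type holding an estimate, its p-value and its influence curve.
struct EstimationResult
    estimate::Float64
    pvalue::Float64
    IC::Vector{Float64}
end

# Hypothetical helper: erases influence curves unless the p-value is at most
# `pval_threshold`. With `nothing` (the assumed default) all curves are erased.
function strip_influence_curves(results, pval_threshold::Union{Nothing, Real})
    return map(results) do r
        keep = pval_threshold !== nothing && r.pvalue <= pval_threshold
        keep ? r : EstimationResult(r.estimate, r.pvalue, Float64[])
    end
end

# Usage: only the first result passes a 0.05 threshold.
results = [
    EstimationResult(0.5, 0.01, [0.1, -0.1]),
    EstimationResult(0.0, 0.80, [0.2, -0.2]),
]
kept = strip_influence_curves(results, 0.05)
```

Setting the threshold to `1.0` keeps every vector, which matches the `--outputs.hdf5.pval_threshold=1.` example.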

@@ -36,8 +155,8 @@ In what follows, `Y` is an outcome of interest, `W` a set of confounding variabl

For all the following experiments:

- The Julia script can be found at [experiments/runtime.jl](https://github.com/TARGENE/TargetedEstimation.jl/tree/main/experiments/runtime.jl).
- The various estimators used below are further described in the [estimators-configs](https://github.com/TARGENE/TargetedEstimation.jl/tree/main/estimators-configs) folder.
- The Julia script can be found at [experiments/runtime.jl](https://github.com/TARGENE/TMLE-CLI.jl/tree/main/experiments/runtime.jl).
- The various estimators used below are further described in the [estimators-configs](https://github.com/TARGENE/TMLE.jl/tree/main/estimators-configs) folder.

### Multiple treatment contrasts

2 comments on commit 0161a34

@olivierlabayle
@JuliaRegistrator
Registration pull request created: JuliaRegistries/General/113582

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.9.0 -m "<description of version>" 0161a347aab06ec493fb71c11be6c3016fb0d0f6
git push origin v0.9.0

Also, note the warning: This looks like a new registration that registers version 0.9.0.
Ideally, you should register an initial release with 0.0.1, 0.1.0 or 1.0.0 version numbers
This can be safely ignored. However, if you want to fix this you can do so. Call register() again after making the fix. This will update the Pull request.
