Inference data loader #81

EthanMarx · 2023-11-27T22:41:12Z

Adds a simple inference dataloader that sequentially yields batches of `stride_size` samples from a single file.
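For readers unfamiliar with the access pattern, here is a minimal sketch of sequential, non-overlapping striding over a file of known length. It is not the `InferenceDataset` implementation from this PR: `iter_strides` is a hypothetical helper, and dropping a trailing partial stride is an assumption made only for this illustration. The numeric values are the ones used in the profiling script in this description.

```python
from typing import Iterator, Tuple


def iter_strides(num_samples: int, stride_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) sample indices for consecutive, non-overlapping strides.

    Assumption for this sketch: a trailing partial stride is dropped.
    """
    if stride_size <= 0:
        raise ValueError("stride_size must be positive")
    num_strides = num_samples // stride_size
    for i in range(num_strides):
        start = i * stride_size
        yield start, start + stride_size


# Values taken from the profiling script in this PR description.
sample_rate = 2048  # Hz
length = 20000  # seconds of data per channel
stride_size = 16 * sample_rate  # samples per batch

windows = list(iter_strides(sample_rate * length, stride_size))
print(len(windows), windows[0], windows[-1])
```

Each `(start, stop)` pair could then be used to slice every channel of an open HDF5 file, for example `f[channel][start:stop]` with h5py, so that only one stride of data is read from disk at a time.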

Was seeing a throughput of ~12,000 seconds of data per second of wall-clock time from the following simple profiling code:

```python
import time
from pathlib import Path
from tempfile import TemporaryDirectory

import h5py
import numpy as np
from ml4gw.dataloading import InferenceDataset

channels = ["H1", "L1"]
length = 20000  # seconds of data per channel
sample_rate = 2048  # Hz

update_length = 16  # seconds of data per batch
stride_size = int(update_length * sample_rate)  # samples per batch

with TemporaryDirectory(dir=Path.cwd()) as tmpdir:
    # write the file inside the temporary directory so it is cleaned up on exit
    fname = str(Path(tmpdir) / "a.h5")
    with h5py.File(fname, "w") as f:
        for channel in channels:
            f.create_dataset(
                channel, data=np.arange(sample_rate * length), chunks=None
            )

    dataset = InferenceDataset(
        fname=fname,
        channels=channels,
        stride_size=stride_size,
    )

    # time one full pass over the file, discarding the batches
    start = time.time()
    for x in dataset:
        continue
    stop = time.time()

    duration = stop - start
    print("Throughput (s/s) ", length / duration)
```

```
Throughput (s/s)  11789.3695364983
```
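To put the reported number in more concrete terms, the short calculation below converts it into total wall-clock time and an average time per batch. It uses only the values from the script and its printed output; the assumption that the file divides evenly into 16-second batches holds for these values (20000 / 16 = 1250).

```python
length = 20000  # seconds of data in the file
update_length = 16  # seconds of data per batch
throughput = 11789.3695364983  # reported seconds of data per wall-clock second

duration = length / throughput  # wall-clock seconds for one full pass
num_batches = length // update_length  # number of batches yielded
ms_per_batch = 1000 * duration / num_batches  # average milliseconds per batch

print(f"{duration:.2f} s total, {num_batches} batches, {ms_per_batch:.2f} ms per batch")
```

In other words, the full pass over the file takes under two seconds, at a little over a millisecond per 16-second batch on average.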

* bug fix for xy spins
* run pre-commit
* pass fRD and fDM through to phenom_d phase and amp functions
* switch to relative imports
* use chirp mass to mass ratio conversion function
* create tests for phenom_p
* lower tolerance for phenom_p
* comments and additional checks for fRD and fDM

* Added citation and license
* Updated readme
* Fixed pre-commit
* Add test coverage workflow
* Updated gwpy and added coverage
* Added coverage to workflow
* Add coverage to tox installs
* Correct command
* Put relative_files config in correct table
* Retry coverage table
* Include toml extra for coverage
* Include extra in tox table
* Do extra correctly(?)
* Add permissions to workflow
* Move coverage.yaml to correct location
* Make test coverage global
* Fix typo
* Testing
* Pass coverage file env var
* Fix typo
* Remove debugging lines
* Add .coveragerc file
* Added source specification
* Change how source is specified
* Test with directory
* Remove debugging
* Add coverage badge

…4GW#189)
* rough high/low pass filter implementation
* rename import
* internalize functions
* move filter to transforms and add docstrings
* move to filters.py
* create torch module
* add tests
* switch to scipy for filter coeff generation
* use union type for p3.9 compatibility
* add support for other filters
* hardcode output since torchaudio takes (b, a) as input
* export constants to be accessed as ml4gw.constants
* add phenom tests
* fix unassigned variable bug
* fix tolerance on phenom test for filters
* link scipy function
* tests for other filters (cheby1, cheby2, ellip, bessel)
* increase tolerance for phenom tests
* update Return documentation

* improve generator to use better waveform conditioning
* finalize waveform generator
* add conversion function to parameter sampler
* proper indexing when slicing
* add proper device handling
* add kwargs to phenomD and taylorf2 to soak up unused parameters
* update testing
* match conditioning with gwsignal
* add fft into generator
* add tests for waveform generator
* fix pre-commit
* remove coefficients tests since there is no lalsuite equivalent
* update number of samples in conftest
* adjust tests for off by one error
* add iirfilter for highpassing
* remove notebook
* use relative imports
* fix type hint
* bump test tolerances
* clean up tests and account for discrepancy in argma
* reduce number of samples to 1000
* increase tolerance
* remove unused high_pass_time_series

* fix flaky powerlaw test by removing initial guess
* add pytest-repeat to dev deps
* bump probability with which we want errors to not fall within tolerance
* allow just 1 sample to mismatch
* add TODO marker
* add num bad option to compare with numpy
* make filters a fixture
* add low_cutoff fixture
* cleanup iirfilter tests into fixtures

EthanMarx changed the base branch from main to dev November 27, 2023 23:46

William Benoit and others added 29 commits January 9, 2025 09:48

Integrate spline interp into qtransform

61ae3f3

Better case handling

715d536

Did time interpolation more efficiently

3ea3886

Make interpolation method an argument

6a43b58

Correct location of qtile stacking

910617c

Added error tests

e7534ec

Updates to IMRPhenomP api (ML4GW#167)

6b075bd

* consolidate phenomp and phenomd apis
* use chirp mass for initializing tensor size
* fix ordering of parameters in tests

add precessing spin conversion (ML4GW#168)

15acf2f

* add precessing spin conversion
* add conversion file
* restructure waveforms module
* update tests
* add back generator
* add more robust fref check

update minor version to 0.6.2

7e64865

Restrict scipy in env and for tox

368d0dd

fix batch inputs bug

13f61f6

Added demo notebook

ce0438b

Create tensor on proper device in IMRPhenomPv2 (ML4GW#183)

a3ba42d

* fix issue with device in phenom pv2

add read the docs file

b96f95b

Increment version to v0.6.3 (ML4GW#184)

cd372f1

* Increment version to `v0.6.3`
* Deprecate python 3.8
* Remove py38 from tox
* Remove python 3.8 from tox tests
* resync lock file
* fix full distance prior in tests
* bump close distance max to 400
* fix comment

Add option to lowpass filter during whitening

01a46c5

Adapted tests for lowpassing

d6fc836

Fix typo

55cf55a

Merge pull request ML4GW#185 from wbenoit26/lowpass-whitening

ded9196

Add lowpass option to whitening

use relative imports everywhere (ML4GW#190)

00e0f22

Renamed directory

da52ee0

Merge pull request ML4GW#180 from wbenoit26/add-tutorial-notebook

7f49127

Add demo notebook

Increased waveform tolerance (ML4GW#192)

03beee4

Add fnames_per_batch argument to HDF5Dataset (ML4GW#191)

b0548be

* add option for specifying fnames per batch
* add tests for fname limit
* add check for fnames > files_per_batch
* add remove print statement hook
* fix tests by increasing files sampled

Set random seed for unit tests

d2ee746

wbenoit26 and others added 24 commits February 6, 2025 07:56

Add seeding to fixture args

3629932

Use num_samples fixture everywhere

556cfe3

Add num_samples into generator test

e7743f5

Merge pull request ML4GW#196 from wbenoit26/seed-tests

097361d

Set random seed for unit tests

Update pyproject.toml to version 0.7.0 (ML4GW#198)

b9895af

Fixes issue where tensors in waveform generator not getting built on …

a55d194

…correct device (ML4GW#199)
* fix device issue in waveform generator
* update version to 0.7.1

Added lowpass option to SNR operations

85f60c9

Add additional tests for gw.py

4eb87a8

Add tests for raising value error

9f77be1

Fix typo

949a516

Merge pull request ML4GW#202 from wbenoit26/add-lowpass-option

6e14e7c

Add lowpass option for SNR calculation

Increment version

c01a5d9

Merge pull request ML4GW#203 from wbenoit26/v0.7.2

78f572a

Increment version

Moved scipy out of dev dependencies

6c25a35

Merge pull request ML4GW#204 from wbenoit26/move-scipy-dep

cd5631f

Moved scipy out of dev dependencies

Switched pyproject format

ee6312f

Got pre-commit passing

b657335

Update documentation

9f7c9e5

Get tests running

446e708

Merge pull request ML4GW#205 from wbenoit26/uv-ruff

cf70e30

Migrate to `uv` and `ruff`

initial commit of inference dataloader

2ff3f99

update documentation

f3882a9

add simple inference dataloader

05d4736

EthanMarx force-pushed the inference-data-loader branch from f441cad to 05d4736 Compare February 27, 2025 14:10
