Inference data loader #81

Open. EthanMarx wants to merge 53 commits into dev from inference-data-loader.

EthanMarx
Collaborator

@EthanMarx EthanMarx commented Nov 27, 2023

Adds a simple inference dataloader that sequentially yields batches of `stride_size` samples from a single file.
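For readers unfamiliar with the pattern, here is a minimal sketch of sequential, stride-based iteration. It is not the `InferenceDataset` implementation from this PR: the function name `iter_strides`, the dictionary input, and the choice to drop a trailing partial window are all assumptions made for illustration. It works on anything sliceable, so the toy lists below could in principle be replaced by datasets from an open HDF5 file.

```python
from typing import Dict, Iterator, List, Sequence


def iter_strides(
    data: Dict[str, Sequence],
    channels: List[str],
    stride_size: int,
) -> Iterator[List[Sequence]]:
    """Yield consecutive, non-overlapping windows of `stride_size` samples.

    Hypothetical helper: each yielded batch holds one window per channel,
    in the order given by `channels`.
    """
    if stride_size <= 0:
        raise ValueError("stride_size must be positive")
    # Only iterate over the span that every channel covers.
    num_samples = min(len(data[channel]) for channel in channels)
    # Assumption: a trailing window shorter than stride_size is dropped.
    num_batches = num_samples // stride_size
    for i in range(num_batches):
        start = i * stride_size
        stop = start + stride_size
        yield [data[channel][start:stop] for channel in channels]


# Toy data standing in for two detector channels.
toy = {"H1": list(range(10)), "L1": list(range(100, 110))}
batches = list(iter_strides(toy, ["H1", "L1"], stride_size=4))
print(len(batches))  # 2 full windows; the last 2 samples are dropped
print(batches[0])    # [[0, 1, 2, 3], [100, 101, 102, 103]]
```

Reading one contiguous slice per channel per step keeps file access sequential, which is the access pattern the description above refers to.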

I was seeing a throughput of ~12,000 seconds of data per second of wall time with the following simple profiling code:

import time
from pathlib import Path
from tempfile import TemporaryDirectory

import h5py
import numpy as np
from ml4gw.dataloading import InferenceDataset

channels = ["H1", "L1"]
length = 20000  # seconds of data per channel
sample_rate = 2048  # Hz

update_length = 16  # seconds of data per batch
stride_size = int(update_length * sample_rate)  # samples per batch

with TemporaryDirectory(dir=Path.cwd()) as tmpdir:
    # write the test file inside the temporary directory so it is cleaned up on exit
    fname = str(Path(tmpdir) / "a.h5")
    with h5py.File(fname, "w") as f:
        for channel in channels:
            f.create_dataset(channel, data=np.arange(sample_rate * length), chunks=None)

    dataset = InferenceDataset(
        fname=fname,
        channels=channels,
        stride_size=stride_size,
    )

    # time one full sequential pass over the file
    start = time.perf_counter()
    for x in dataset:
        continue

    stop = time.perf_counter()
    duration = stop - start
    print("Throughput (s/s) ", length / duration)

Throughput (s/s)  11789.3695364983
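To put that number in context, the short calculation below converts the reported throughput into wall time, per-batch latency, and an approximate data rate. The 8 bytes per sample figure is an assumption: it holds if `np.arange` produced `int64` values, which is the default on most 64-bit platforms but was not stated above.

```python
# Values taken from the profiling script and its printed result.
length = 20000                 # seconds of data per channel
sample_rate = 2048             # Hz
update_length = 16             # seconds of data per batch
num_channels = 2               # "H1" and "L1"
throughput = 11789.3695364983  # seconds of data per second of wall time

# Assumption: samples are stored as int64 (8 bytes each).
bytes_per_sample = 8

duration = length / throughput          # wall time for the full pass
num_batches = length // update_length   # batches yielded by the loop
time_per_batch_ms = 1000 * duration / num_batches
bytes_per_second = throughput * sample_rate * num_channels * bytes_per_sample

print(f"wall time:      {duration:.2f} s")
print(f"batches:        {num_batches}")
print(f"time per batch: {time_per_batch_ms:.2f} ms")
print(f"data rate:      {bytes_per_second / 1e6:.0f} MB/s")
```

Under these assumptions the full pass takes about 1.7 s across 1250 batches, or roughly 1.36 ms per batch, at a read rate on the order of 386 MB/s.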

@EthanMarx EthanMarx changed the base branch from main to dev November 27, 2023 23:46
William Benoit and others added 29 commits January 9, 2025 09:48
* consolidate phenomp and phenomd apis

* use chirp mass for initializing tensor size

* fix ordering of parameters in tests
* add precessing spin conversion

* add conversion file

* restructure waveforms module

* update tests

* add back generator

* add more robust fref check
* bug fix for xy spins

* run pre-commit

* pass fRD and fDM through to phenom_d phase and amp functions

* switch to relative imports

* use chirp mass to mass ratio conversion function

* create tests for phenom_p

* lower tolerance for phenom_p

* comments and additional checks for fRD and fDM
* fix issue with device in phenom pv2
* Increment version to `v0.6.3`

* Deprecate python 3.8

* Remove py38 from tox

* Remove python 3.8 from tox tests

* resync lock file

* fix full distance prior in tests

* bump close distance max to 400

* fix comment
* Added citation and license

* Updated readme

* Fixed pre-commit

* Add test coverage workflow

* Updated gwpy and added coverage

* Added coverage to workflow

* Add coverage to tox installs

* Correct command

* Put relative_files config in correct table

* Retry coverage table

* Include toml extra for coverage

* Include extra in tox table

* Do extra correctly(?)

* Add permissions to workflow

* Move coverage.yaml to correct location

* Make test coverage global

* Fix typo

* Testing

* Pass coverage file env var

* Fix typo

* Remove debugging lines

* Add .coveragerc file

* Added source specification

* Change how source is specified

* Test with directory

* Remove debugging

* Add coverage badge
* add option for specifying fnames per batch

* add tests for fname limit

* add check for fnames > files_per_batch

* add remove print statement hook

* fix tests by increasing files sampled
…4GW#189)

* rough high/low pass filter implementation

* rename import

* internalize functions

* move filter to transforms and add docstrings

* move to filters.py

* create torch module

* add tests

* switch to scipy for filter coeff generation

* use union type for p3.9 compatibility

* add support for other filters

* hardcode output since torchaudio takes (b, a) as input

* export constants to be accessed as ml4gw.constants

* add phenom tests

* fix unassigned variable bug

* fix tolerance on phenom test for filters

* link scipy function

* tests for other filters (cheby1, cheby2, ellip, bessel)

* increase tolerance for phenom tests

* update Return documentation
* improve generator to use better waveform conditioning

* finalize waveform generator

* add conversion function to parameter sampler

* proper indexing when slicing

* add proper device handling

* add kwargs to phenomD and taylorf2 to soak up unused parameters

* update testing

* match conditioning with gwsignal

* add fft into generator

* add tests for waveform generator

* fix pre-commit

* remove coefficients tests since there is no lalsuite equivalent

* update number of samples in conftest

* adjust tests for off by one error

* add iirfilter for highpassing

* remove notebook

* use relative imports

* fix type hint

* bump test tolerances

* clean up tests and account for discrepancy in argmax

* reduce number of samples to 1000

* increase tolerance

* remove unused high_pass_time_series
wbenoit26 and others added 24 commits February 6, 2025 07:56
* fix flaky powerlaw test by removing initial guess

* add pytest-repeat to dev deps

* bump probability with which we want errors to not fall within tolerance

* allow just 1 sample to mismatch

* add TODO marker

* add num bad option to compare with numpy

* make filters a fixture

* add low_cutoff fixture

* cleanup iirfilter tests into fixtures
…correct device (ML4GW#199)

* fix device issue in waveform generator

* update version to 0.7.1
Add lowpass option for SNR calculation
Moved scipy out of dev dependencies
@EthanMarx EthanMarx force-pushed the inference-data-loader branch from f441cad to 05d4736 February 27, 2025 14:10
3 participants