Release #76

Merged: 160 commits, merged on Jun 8, 2024
160 commits (the diff below shows the changes from 1 commit):
9a11358
WIP
Mar 22, 2024
9745b4c
WIP
Mar 22, 2024
8835b8f
Moved function to utils.io
Mar 22, 2024
9a2e1d8
Additional dependencies
Mar 22, 2024
6dd9be6
Removed Mixin
Mar 22, 2024
78c9f15
WIP Preprocess + property
Mar 22, 2024
55a6692
Refactored Descriptor class
Mar 22, 2024
5b6ec7e
Removed not used funcs
Mar 22, 2024
cbb494c
Added support to get torch or jax tensors
Mar 24, 2024
e22651f
Added support to get torch or jax tensors
Mar 24, 2024
0f59a04
undo change
Mar 24, 2024
29929cf
Added transform functionality test
Mar 24, 2024
736c12a
Added ACSF
Mar 25, 2024
73a3443
Conversion dataset to extxyz
Mar 25, 2024
1c77666
Updated code based on comments
Mar 25, 2024
327e363
Added list check from convert dict keys
Mar 25, 2024
5c16f79
First implementation XYZ reader
Mar 25, 2024
8503b98
Import fix
Mar 25, 2024
5139a57
test skipping if package not present
Mar 25, 2024
3fb274c
XYZDataset, FromFileDataset, Write interaction xyz
Mar 25, 2024
3b4823e
Fixes
Mar 25, 2024
356adb1
Many-body Tensor Representation
Mar 25, 2024
20caa4f
Included dscribe in main deps, removed torch and jax
Mar 25, 2024
38675f3
wip
Mar 26, 2024
4dd2af8
MTBR incorrect signature Fix (Thanks Danny)
Mar 26, 2024
fa141f5
Added jax/tensor support to interaction datasets
Mar 26, 2024
31beb71
refactor interaction and initial testing
Mar 26, 2024
dccf676
minor changes
Mar 26, 2024
2ab64aa
dummy modification
Mar 26, 2024
5197e32
removed redundant line
Mar 26, 2024
d10b15b
minor change
Mar 26, 2024
cb30c16
State Manager and Chain of Management
Mar 27, 2024
ce8e2b5
Addressed comments + xyz file tests
Mar 27, 2024
dc74dd6
create method enums and refactor the rest accordingly
prtos Mar 28, 2024
c34e53f
adding new files
prtos Mar 28, 2024
ef4581d
fix import issues and update API doc files
prtos Mar 28, 2024
571b706
WIP
Mar 28, 2024
e6d51b7
Separated Statistics and Atom Energies
Mar 28, 2024
05d39c9
ATOM_TABLE import fix
prtos Mar 28, 2024
189ab90
undo changes in interaction dataset, and minor change in shape
Mar 29, 2024
5ee18c2
QmInteractionMethod added
prtos Mar 29, 2024
69c081f
rename QmMethod classes
prtos Mar 29, 2024
7a20193
WIP
Mar 30, 2024
b97cf25
adding a missing file
prtos Apr 1, 2024
4eff5f4
pre-commit
Apr 1, 2024
282dc91
changed super class to BaseInteractionDataset
Apr 2, 2024
cef2b35
Parallelized function, better calculate specs, docstrings
Apr 2, 2024
dbdd985
descriptor tests
Apr 2, 2024
c7ab437
Regression + Statistics abstraction
Apr 2, 2024
5cebb80
Docs+names
Apr 2, 2024
e3c4fcf
added correction method
prtos Apr 2, 2024
fe4b86a
Regressor tests
Apr 2, 2024
b968257
Fixed test
Apr 2, 2024
096b1a0
:/
Apr 2, 2024
c0b6131
Except linalgerror
Apr 2, 2024
a0ad3ab
Merge branch 'refactoring_by_P' of https://github.com/OpenDrugDiscove…
prtos Apr 2, 2024
fe2d9bd
added correction method
prtos Apr 2, 2024
496267f
fixed pre-commits
Apr 2, 2024
b4bb0f3
missing import
Apr 2, 2024
085b933
str fix in cli
Apr 2, 2024
5fe28ce
updated python version since strenum from py>3.11
Apr 2, 2024
05a6dbf
py3.8 compatibility, manual fixes to atom energies
Apr 3, 2024
65a8a12
pkgutils
Apr 3, 2024
3b4199f
some debugging
Apr 3, 2024
2513695
Merge pull request #75 from OpenDrugDiscovery/corrections
FNTwin Apr 3, 2024
6eabc0d
Merge branch 'refactor' into pattern
Apr 3, 2024
2f6eddb
Solved merged issues, added NONE PotentialMethod
Apr 3, 2024
9574974
Merge pull request #72 from OpenDrugDiscovery/refactoring_by_P
FNTwin Apr 3, 2024
701ef1e
Merge branch 'release' into testing
Apr 3, 2024
afea053
further simplified and rebase
Apr 3, 2024
34922c3
Addressed comments, fixed NullEnergy e0s_matrix
Apr 4, 2024
71113c0
Updated stats vector shape to atleast_2d
Apr 4, 2024
955e787
Solved merge issues, added some fixes
Apr 4, 2024
1f9bb94
fixes to xyz
Apr 4, 2024
41a52d6
Added log message
Apr 4, 2024
247b0e1
Merge pull request #74 from OpenDrugDiscovery/pattern
FNTwin Apr 4, 2024
e331cc6
Merge branch 'release' into dataloader
Apr 4, 2024
1c10566
Updated array stuff for xyz dataset
Apr 4, 2024
f1769b3
Merge branch 'release' into dataloader
Apr 4, 2024
ba22ee1
fix bug during rebase and tests
Apr 4, 2024
ac593e3
array test debug
Apr 4, 2024
6f0d46f
undo test change and reset state
Apr 4, 2024
a8d0016
cleaner variant
Apr 4, 2024
ac299a5
Merge pull request #55 from OpenDrugDiscovery/dataloader
shenoynikhil Apr 5, 2024
40d900d
simplified component-wise-force stats calculation and bug-fix
Apr 5, 2024
a21cb18
Loading stats with the right format
Apr 5, 2024
23f8a8b
Bug fix in convert_array for interaction
Apr 5, 2024
d5a139b
Better stats conversion, fixed a reference leak
FNTwin Apr 5, 2024
908ec35
Test dataset
FNTwin Apr 5, 2024
55d9e68
Merge branch 'simplify' of https://github.com/OpenDrugDiscovery/openQ…
FNTwin Apr 5, 2024
ebc2adf
fixes
Apr 5, 2024
7ffd0b1
Merge remote-tracking branch 'origin/release' into testing
Apr 5, 2024
a9c8f66
removed ravel
FNTwin Apr 5, 2024
a71f4d7
Merge pull request #78 from OpenDrugDiscovery/simplify
FNTwin Apr 5, 2024
d15e9cf
Merge remote-tracking branch 'origin/release' into testing
Apr 5, 2024
ed8e264
Updated metcalf
Apr 5, 2024
18bc79c
bug fix and simplifying interaction dataset
Apr 6, 2024
2a6e3ef
Updated tests for interaction datasets
Apr 6, 2024
7493273
removed stale stats in dummy interaction
Apr 6, 2024
ed73e7d
changes based on comments
Apr 6, 2024
0359022
Clean metcalf
FNTwin Apr 6, 2024
33fa342
Simplification
FNTwin Apr 6, 2024
cd486a8
cleaned des
FNTwin Apr 6, 2024
80d7371
Simplified des dataset
FNTwin Apr 6, 2024
f3d205c
removed redundant dataset files
FNTwin Apr 6, 2024
da4fece
DES inheritance
FNTwin Apr 6, 2024
71ff741
Removed des and improved des naming
FNTwin Apr 6, 2024
f6e12e1
DES fixes
FNTwin Apr 6, 2024
3328a65
Removed comments
FNTwin Apr 6, 2024
8b28d59
X40 and L70
FNTwin Apr 6, 2024
8595fd8
Safe opening
FNTwin Apr 6, 2024
ca1b4af
Moved X40 in L7 and removed x40.py
FNTwin Apr 6, 2024
4bec82d
Moved Yaml utils to _utils.py, L7 + X40 interface
FNTwin Apr 7, 2024
a5ced0a
Merge testing + Add imports
FNTwin Apr 8, 2024
a21963e
Merge pull request #79 from OpenDrugDiscovery/interaction_impr
shenoynikhil Apr 8, 2024
3303f95
better convert function and n_body_first to ptr
Apr 12, 2024
c8d245f
Preprocess cli + optional upload to preprocess
FNTwin Apr 13, 2024
6f033cf
Updated splinter reading from -1 to nan
Apr 15, 2024
3d81df2
Merge branch 'testing' into local_fetch
Apr 16, 2024
6027ab0
Cli exception
Apr 16, 2024
970082d
Fixes to x40,l7 preproc
Apr 16, 2024
486a59f
atom.txt packaging
Apr 16, 2024
77e5e26
Added init exc F405, F401 to toml
Apr 16, 2024
69df015
Datasets from data generation
Apr 17, 2024
b0d8e0c
Fixes for uploading
Apr 18, 2024
798f861
Append to extend, metcalf
Apr 18, 2024
565dc26
Dummy fix
Apr 18, 2024
dcc1b6b
SpiceVL2
Apr 18, 2024
c88e18e
Merge pull request #81 from OpenDrugDiscovery/spicevl2
FNTwin Apr 18, 2024
7f7b651
WIP float64 conv
Apr 22, 2024
828e765
fix small bug with DES subsets
mcneela Apr 22, 2024
0b9404e
Updated to float64
Apr 22, 2024
95f926d
Interaction float32
Apr 22, 2024
d2cd5be
updated DES dataset subset handling
mcneela Apr 23, 2024
7a82b59
Updated spiceV2 subsets
Apr 23, 2024
82de349
Merge pull request #82 from OpenDrugDiscovery/subset-fix
mcneela Apr 23, 2024
603496b
Updated ani read_raw_energies
Apr 23, 2024
7eb6a1e
Fixes + MD22
Apr 24, 2024
2a292ab
Remove DataConfig WIP
Apr 24, 2024
be82226
Fixed gdml read_raw_entries
Apr 24, 2024
6a65923
Fixed comp6 read_raw_entries
Apr 24, 2024
50823d0
Added links
Apr 24, 2024
22d0bf5
Logging, fixes to qmugs
Apr 24, 2024
051c084
Removed wip files
Apr 26, 2024
703cca3
Removed dataloader, converted mmap test files
Apr 26, 2024
0c3f54a
Merge pull request #83 from OpenDrugDiscovery/float64
FNTwin Apr 26, 2024
cb6dbaf
Conversion of en fixes + datastructure for e0s_dict and retrieval of …
FNTwin May 1, 2024
da767ef
Tests
FNTwin May 1, 2024
b8791d6
Added _original_unit to xyz
FNTwin May 1, 2024
6abe893
Disable mkdocs
May 1, 2024
8d21e15
I hate mkdocs errors
May 1, 2024
46bc652
mkdocs action
May 1, 2024
2f4b692
Docstrings + naming
FNTwin May 2, 2024
dcd1bea
Merge pull request #85 from OpenDrugDiscovery/atom_ener_structure
FNTwin May 2, 2024
2669d21
docstring
FNTwin May 2, 2024
fcaed00
Issue fix
May 6, 2024
34f36b0
fix metcalf dataset energies and reupload
mcneela May 30, 2024
e85839a
Merge pull request #91 from OpenDrugDiscovery/fix-metcalf
mcneela Jun 1, 2024
318e9c5
Merge pull request #84 from OpenDrugDiscovery/downloader
prtos Jun 8, 2024
10dc009
Merge pull request #88 from OpenDrugDiscovery/forcemaskfix
prtos Jun 8, 2024
Fixes + MD22
FNTwin committed Apr 24, 2024
commit 7eb6a1e7d1791ecbd83eb56e68db8f31c54c6a40
openqdc/__init__.py: 2 changes (2 additions, 0 deletions)
@@ -40,6 +40,7 @@ def get_project_root():
     "PCQM_B3LYP": "openqdc.datasets.potential.pcqm",
     "PCQM_PM6": "openqdc.datasets.potential.pcqm",
     "RevMD17": "openqdc.datasets.potential.revmd17",
+    "MD22": "openqdc.datasets.potential.md22",
     "Transition1X": "openqdc.datasets.potential.transition1x",
     "MultixcQM9": "openqdc.datasets.potential.multixcqm9",
     "MultixcQM9_V2": "openqdc.datasets.potential.multixcqm9",
@@ -105,6 +106,7 @@ def __dir__():
     from .datasets.potential.gdml import GDML
     from .datasets.potential.geom import GEOM
     from .datasets.potential.iso_17 import ISO17
+    from .datasets.potential.md22 import MD22
     from .datasets.potential.molecule3d import Molecule3D
     from .datasets.potential.multixcqm9 import MultixcQM9, MultixcQM9_V2
     from .datasets.potential.nabladft import NablaDFT
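The first hunk registers `"MD22"` in a mapping from exported name to defining module. A mapping like this is the usual input to a PEP 562 module-level `__getattr__`, which imports a submodule only when the attribute is first accessed. The sketch below shows that pattern in isolation; it assumes openqdc's loader works along these lines (this diff does not show it), and it uses standard-library modules as stand-ins for the dataset modules.

```python
import importlib
import types

# Stand-in mapping: attribute name -> module that defines it.
# The analogous entry in the diff is "MD22": "openqdc.datasets.potential.md22".
LAZY_IMPORTS = {
    "OrderedDict": "collections",
    "sqrt": "math",
}


def make_lazy_module(name, mapping):
    """Build a module whose attributes are imported on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in mapping:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(mapping[attr]), attr)
        # Cache the resolved object so later lookups skip __getattr__ entirely.
        setattr(module, attr, value)
        return value

    # PEP 562: a __getattr__ in the module namespace runs when normal lookup fails.
    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("demo", LAZY_IMPORTS)
print(lazy.sqrt(16.0))  # 4.0
```

Keeping the second hunk's eager imports alongside the lazy mapping lets static tools see the names without paying the import cost at runtime; whether openqdc guards those imports this way is an assumption here.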
openqdc/datasets/base.py: 42 changes (42 additions, 0 deletions)
@@ -546,6 +546,48 @@ def wrapper(idx):
         datum["idxs"] = idxs
         return datum
 
+    @classmethod
+    def as_dataloader(
+        cls,
+        batch_size: int = 8,
+        energy_unit: Optional[str] = None,
+        distance_unit: Optional[str] = None,
+        array_format: str = "torch",
+        energy_type: str = "formation",
+        overwrite_local_cache: bool = False,
+        cache_dir: Optional[str] = None,
+        recompute_statistics: bool = False,
+        transform: Optional[Callable] = None,
+    ):
+        """
+        Return the dataset as a dataloader.
+
+        Parameters
+        ----------
+        batch_size : int, optional
+            Batch size, by default 8
+        For other parameters, see the __init__ method.
+        """
+        if not has_package("torch_geometric"):
+            raise ImportError("torch_geometric is required to use this method.")
+        assert array_format in ["torch", "jax"], f"Format {array_format} must be torch or jax."
+        from torch_geometric.data import Data
+        from torch_geometric.loader import DataLoader
+
+        return DataLoader(
+            cls(
+                energy_unit=energy_unit,
+                distance_unit=distance_unit,
+                array_format=array_format,
+                energy_type=energy_type,
+                overwrite_local_cache=overwrite_local_cache,
+                cache_dir=cache_dir,
+                recompute_statistics=recompute_statistics,
+                transform=lambda x: Data(**x) if transform is None else transform,
+            ),
+            batch_size=batch_size,
+        )
+
     def as_iter(self, atoms: bool = False, energy_method: int = 0):
         """
         Return the dataset as an iterator.
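In the `cls(...)` call above, `transform=lambda x: Data(**x) if transform is None else transform` parses as `lambda x: (Data(**x) if transform is None else transform)`, because `lambda` binds more loosely than a conditional expression. When a caller passes their own `transform`, the lambda therefore returns that callable instead of applying it. The sketch below reproduces this in plain Python; `Data` is a hypothetical stand-in (a `dict` subclass), not `torch_geometric.data.Data`, so it runs without PyTorch Geometric. Parenthesising the lambda gives what is presumably the intended behaviour.

```python
from typing import Callable, Optional


class Data(dict):
    """Hypothetical stand-in for torch_geometric.data.Data."""


def as_written(transform: Optional[Callable] = None) -> Callable:
    # Same expression as in the diff: the conditional sits INSIDE the lambda body.
    return lambda x: Data(**x) if transform is None else transform


def parenthesised(transform: Optional[Callable] = None) -> Callable:
    # Pick the default lambda or the user's callable before any sample is seen.
    return (lambda x: Data(**x)) if transform is None else transform


def double_energy(x):
    # Hypothetical user transform.
    return {**x, "energies": 2 * x["energies"]}


sample = {"energies": 1.5}

print(as_written(double_energy)(sample))     # the function object, not a dict
print(parenthesised(double_energy)(sample))  # {'energies': 3.0}
print(parenthesised()(sample))               # {'energies': 1.5}
```

Both versions agree when `transform` is `None`, which is likely why the default path would not expose the difference.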
openqdc/datasets/potential/__init__.py: 2 changes (2 additions, 0 deletions)
@@ -4,6 +4,7 @@
 from .gdml import GDML
 from .geom import GEOM
 from .iso_17 import ISO17
+from .md22 import MD22
 from .molecule3d import Molecule3D
 from .multixcqm9 import MultixcQM9, MultixcQM9_V2
 from .nabladft import NablaDFT
@@ -48,4 +49,5 @@
     "multixcqm9": MultixcQM9,
     "multixcqm9v2": MultixcQM9_V2,
     "revmd17": RevMD17,
+    "md22": MD22,
 }
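The second hunk adds `"md22": MD22` to a dictionary keyed by lower-case dataset names. A lookup helper over such a registry might normalise the user's string and list the valid keys on a miss. The sketch below is hypothetical: the registry holds stand-in classes and `get_dataset_class` is not an openqdc function.

```python
class RevMD17:
    """Stand-in; the real class lives in openqdc.datasets.potential.revmd17."""


class MD22(RevMD17):
    """Stand-in mirroring the diff, where MD22 subclasses RevMD17."""


# Lower-case keys, as in the registry hunk above.
REGISTRY = {
    "revmd17": RevMD17,
    "md22": MD22,
}


def get_dataset_class(name: str) -> type:
    """Hypothetical helper: case-insensitive lookup with an informative error."""
    key = name.lower()
    try:
        return REGISTRY[key]
    except KeyError:
        options = ", ".join(sorted(REGISTRY))
        raise KeyError(f"Unknown dataset {name!r}; available: {options}") from None


print(get_dataset_class("MD22").__name__)  # MD22
```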
openqdc/datasets/potential/dummy.py: 10 changes (10 additions, 0 deletions)
@@ -94,6 +94,16 @@ def preprocess_path(self, overwrite_local_cache=False):
 
         return p_join(get_project_root(), "tests", "files", self.__name__, "preprocessed")
 
+    # override
+    @property
+    def data_types(self):
+        return {
+            "atomic_inputs": np.float32,
+            "position_idx_range": np.int32,
+            "energies": np.float32,
+            "forces": np.float32,
+        }
+
     def is_preprocessed(self):
         return True
 
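The `Dummy` dataset overrides `data_types` so that its preprocessed test files are read as 32-bit floats, while other commits in this PR move energies to float64. Presumably those files are raw memory-mapped arrays (one commit mentions converting "mmap test files"); a raw buffer stores no dtype, so the reader must supply the one used at write time. The NumPy sketch below, with a temporary file and hypothetical values, shows what happens when float32 bytes are read back as float64.

```python
import os
import tempfile

import numpy as np


def roundtrip(write_dtype, read_dtype):
    """Write hypothetical values with one dtype and memory-map them with another."""
    values = np.array([1.0, 2.0, 3.0, 4.0], dtype=write_dtype)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "energies.mmap")
        values.tofile(path)  # raw bytes only: no dtype or shape is stored
        mapped = np.memmap(path, dtype=read_dtype, mode="r")
        result = np.array(mapped)  # copy out before the directory is removed
        del mapped
    return result


print(roundtrip(np.float32, np.float32))        # [1. 2. 3. 4.]
print(roundtrip(np.float32, np.float64).shape)  # (2,)
```

The mismatched read does not raise; it silently yields half as many, meaningless, values, which is why the declared dtype has to match the files.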
openqdc/datasets/potential/iso_17.py: 2 changes (1 addition, 1 deletion)
@@ -50,7 +50,7 @@ def __smiles_converter__(self, x):
         return "-".join(x.decode("ascii").split("_")[:-1])
 
     def read_raw_entries(self):
-        raw_path = p_join(self.root, "iso_17.h5")
+        raw_path = p_join(self.root, "iso_17.h5.gz")
         samples = read_qc_archive_h5(raw_path, "iso_17", self.energy_target_names, self.force_target_names)
 
         return samples
openqdc/datasets/potential/md22.py: 49 changes (49 additions, 0 deletions)
@@ -0,0 +1,49 @@
+from os.path import join as p_join
+
+import numpy as np
+
+from openqdc.datasets.potential.revmd17 import RevMD17, shape_atom_inputs
+
+trajectories = [
+    "Ac-Ala3-NHMe",
+    "DHA",
+    "stachyose",
+    "AT-AT",
+    "AT-AT-CG-CG",
+    "double-walled_nanotube",
+    "buckyball-catcher",
+]
+
+
+def read_npz_entry(filename, root):
+    data = np.load(create_path(filename, root))
+    nuclear_charges, coords, energies, forces = (
+        data["z"],
+        data["R"],
+        data["E"],
+        data["F"],
+    )
+    frames = coords.shape[0]
+    res = dict(
+        name=np.array([filename] * frames),
+        subset=np.array([filename] * frames),
+        energies=energies.reshape(-1, 1).astype(np.float64),
+        forces=forces.reshape(-1, 3, 1).astype(np.float32),
+        atomic_inputs=shape_atom_inputs(coords, nuclear_charges),
+        n_atoms=np.array([len(nuclear_charges)] * frames, dtype=np.int32),
+    )
+    return res
+
+
+def create_path(filename, root):
+    return p_join(root, filename + ".npz")
+
+
+class MD22(RevMD17):
+    __name__ = "md22"
+
+    def read_raw_entries(self):
+        entries_list = []
+        for trajectory in trajectories:
+            entries_list.append(read_npz_entry(trajectory, self.root))
+        return entries_list
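`read_npz_entry` expects each MD22 archive to expose `z` (atomic numbers), `R` (coordinates), `E` (energies) and `F` (forces). Note that `forces.reshape(-1, 3, 1)` merges the frame and atom axes, so forces end up with one row per atom across all frames, while energies keep one row per frame. The sketch below writes a small synthetic archive with those keys (2 frames of a 3-atom molecule with random values, not MD22 data) and applies the same reshapes. `shape_atom_inputs` comes from `revmd17.py` and is not shown in this diff, so the sketch leaves `atomic_inputs` out.

```python
import os
import tempfile

import numpy as np

frames, n_atoms = 2, 3  # hypothetical sizes for a tiny synthetic trajectory
rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "toy.npz")
    # Same keys that read_npz_entry indexes: z, R, E, F.
    np.savez(
        path,
        z=np.array([8, 1, 1]),
        R=rng.normal(size=(frames, n_atoms, 3)),
        E=rng.normal(size=(frames,)),
        F=rng.normal(size=(frames, n_atoms, 3)),
    )
    with np.load(path) as data:
        z, coords, E, F = data["z"], data["R"], data["E"], data["F"]

n_frames = coords.shape[0]
energies = E.reshape(-1, 1).astype(np.float64)   # one row per frame
forces = F.reshape(-1, 3, 1).astype(np.float32)  # one row per atom, frames stacked
n_atoms_per_frame = np.array([len(z)] * n_frames, dtype=np.int32)

print(energies.shape, forces.shape, n_atoms_per_frame.tolist())
# (2, 1) (6, 3, 1) [3, 3]
```

Because frames are stacked, a consumer needs `n_atoms` to recover which force rows belong to which frame.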
openqdc/datasets/potential/revmd17.py: 1 change (1 addition, 0 deletions)
@@ -79,6 +79,7 @@ class RevMD17(BaseDataset):
         PotentialMethod.PBE_DEF2_TZVP
         # "pbe/def2-tzvp",
     ]
+    __force_mask__ = [True]
 
     energy_target_names = [
         "PBE-TS Energy",
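`__force_mask__ = [True]` gives RevMD17, and therefore MD22 which subclasses it, one flag per entry of `__energy_methods__`, presumably marking that forces are available for that method; Spice uses the same one-element pattern in this diff. A hypothetical consistency check over these two class attributes could look like the sketch below. The classes are stand-ins, and the method string is a placeholder for openqdc's `PotentialMethod.PBE_DEF2_TZVP`.

```python
class BaseDataset:
    """Stand-in base class; both attributes default to empty lists."""

    __energy_methods__ = []
    __force_mask__ = []

    @classmethod
    def methods_with_forces(cls):
        # Pair each energy method with its mask flag; reject mismatched lengths.
        if len(cls.__energy_methods__) != len(cls.__force_mask__):
            raise ValueError(
                f"{cls.__name__}: {len(cls.__energy_methods__)} energy methods "
                f"but {len(cls.__force_mask__)} force-mask entries"
            )
        return [m for m, has_f in zip(cls.__energy_methods__, cls.__force_mask__) if has_f]


class RevMD17(BaseDataset):
    __energy_methods__ = ["pbe/def2-tzvp"]  # placeholder for the enum member
    __force_mask__ = [True]


class MD22(RevMD17):
    pass  # inherits both attributes, as the md22.py diff relies on


print(MD22.methods_with_forces())  # ['pbe/def2-tzvp']
```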
openqdc/datasets/potential/spice.py: 2 changes (1 addition, 1 deletion)
@@ -56,7 +56,7 @@ class Spice(BaseDataset):
     """
 
     __name__ = "spice"
-    __energy_methods__ = [PotentialMethod.WB97M_D3BJ_DEF2_TZVPPD] # "wb97m-d3bj/def2-tzvppd"]
+    __energy_methods__ = [PotentialMethod.WB97M_D3BJ_DEF2_TZVPPD]
     __force_mask__ = [True]
     __energy_unit__ = "hartree"
     __distance_unit__ = "bohr"
openqdc/raws/config_factory.py: 15 changes (15 additions, 0 deletions)
@@ -354,6 +354,21 @@ class DataConfigFactory:
         dataset_name="revmd17",
         links={"revmd17.zip": "https://figshare.com/ndownloader/articles/12672038/versions/3"},
     )
+    md22 = dict(
+        dataset_name="md22",
+        links={
+            f"{x}.npz": f"http://www.quantum-machine.org/gdml/repo/datasets/md22_{x}.npz"
+            for x in [
+                "Ac-Ala3-NHMe",
+                "DHA",
+                "stachyose",
+                "AT-AT",
+                "AT-AT-CG-CG",
+                "double-walled_nanotube",
+                "buckyball-catcher",
+            ]
+        },
+    )
 
     x40 = dict(
         dataset_name="x40",
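The `md22` entry builds `links` with a dictionary comprehension, so each local file name `<trajectory>.npz` maps to `md22_<trajectory>.npz` on quantum-machine.org. Those local names must match what `create_path(filename, root)` in `md22.py` opens, yet the seven trajectory names are written out separately in each file. The sketch below derives both from one list and checks that they agree; sharing a single list between the two modules is a suggestion, not what this PR does, and the local root path is hypothetical.

```python
from os.path import join as p_join

# The seven MD22 trajectory names, copied from the diff.
MD22_TRAJECTORIES = [
    "Ac-Ala3-NHMe",
    "DHA",
    "stachyose",
    "AT-AT",
    "AT-AT-CG-CG",
    "double-walled_nanotube",
    "buckyball-catcher",
]

BASE_URL = "http://www.quantum-machine.org/gdml/repo/datasets"

# Download config: local file name -> remote URL (same pattern as config_factory.py).
links = {f"{x}.npz": f"{BASE_URL}/md22_{x}.npz" for x in MD22_TRAJECTORIES}


def create_path(filename, root):
    # Same path logic as md22.py: <root>/<trajectory>.npz
    return p_join(root, filename + ".npz")


root = "/data/md22"  # hypothetical local root
expected = {create_path(x, root) for x in MD22_TRAJECTORIES}  # files the reader opens
downloaded = {p_join(root, name) for name in links}           # files the downloader writes
print(expected == downloaded, len(links))  # True 7
print(links["DHA.npz"])
```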