Use hdf5 references for arrays #118
Conversation
HDF5Handler intends to serve as an interface for writing raw HDF5 files to the NOMAD file system. It also implements methods for special handling of nxs-related data.
For me, building a class is not necessary, as this does not really represent an object; the read and write functions should simply be functions. In any case, I cannot see any major deviation from the HDF5Reference writer, except that it loops over given datasets. To be honest, I am really uncomfortable with the idea of giving the developer-user access to the file system, i.e. the use of context.raw_files should only be done in NOMAD, not in plugins. But this is only my opinion. I am not sure if there are more functionalities that will be added to the class, but as it stands, I am of the opinion that read and write functions suffice.
@ladinesa, the initial idea was to make these functions. But to have lazy writing, states like …
Yes, but you simply provide a list of datasets to write if you do not like to write them individually. It is just a simpler implementation.
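For illustration, the function-based alternative argued for above might look like the following minimal sketch. The names and signatures are hypothetical, and file access is shown with plain h5py rather than NOMAD's context.raw_files:

```python
import h5py
import numpy as np


def write_datasets(file_path: str, datasets: dict[str, np.ndarray]) -> None:
    """Write a mapping of HDF5 paths to arrays into the file in one call."""
    with h5py.File(file_path, 'a') as fh:
        for path, data in datasets.items():
            if path in fh:
                del fh[path]  # overwrite an existing dataset
            fh.create_dataset(path, data=data)


def read_dataset(file_path: str, path: str) -> np.ndarray:
    """Read a single dataset back from the file."""
    with h5py.File(file_path, 'r') as fh:
        return fh[path][()]
```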
Reviewer's Guide by Sourcery

This pull request implements a new HDF5Handler class to manage storage and retrieval of large array data in auxiliary HDF5 files within the ELN schema. It introduces lazy writing for datasets and attributes, handles unit conversion with pint.Quantity, and provides methods for writing to both .h5 and .nxs files. The schema plotting functionality is also adapted for HDF5 integration.

Sequence diagram for HDF5 data handling process:

```mermaid
sequenceDiagram
    participant Client
    participant Handler as HDF5Handler
    participant File as HDF5 File
    participant Archive as NOMAD Archive
    Client->>Handler: Initialize(filename, archive)
    Client->>Handler: add_dataset(path, data)
    Note over Handler: Stores data in memory
    Client->>Handler: add_attribute(path, attrs)
    Note over Handler: Stores attributes in memory
    Client->>Handler: write_file()
    Handler->>File: Create/Open file
    Handler->>File: Write datasets
    Handler->>File: Write attributes
    Handler->>Archive: Set HDF5 references
    Note over Handler: Reset internal storage
```
Class diagram for the new HDF5Handler and related classes:

```mermaid
classDiagram
    class HDF5Handler {
        +str data_file
        +EntryArchive archive
        +BoundLogger logger
        +list valid_dataset_paths
        +bool nexus
        -dict _hdf5_datasets
        -dict _hdf5_attributes
        +add_dataset(path: str, params: dict, validate_path: bool)
        +add_attribute(path: str, params: dict)
        +read_dataset(path: str)
        +write_file()
        -_write_nx_file()
        -_write_hdf5_file()
        -_remove_nexus_annotations(path: str) : str
        -_set_hdf5_reference(section, path: str, ref: str)
    }
    class DatasetModel {
        +Any data
        +Optional[str] archive_path
        +Optional[bool] internal_reference
    }
    class XRDResult {
        +HDF5Reference intensity
        +HDF5Reference two_theta
        +HDF5Reference q_norm
        +HDF5Reference omega
        +HDF5Reference phi
        +HDF5Reference chi
        +HDF5Reference integration_time
    }
    HDF5Handler ..> DatasetModel : uses
    XRDResult ..> HDF5Handler : uses
```
Hey @ka-sarthak - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider extracting common plotting logic from XRDResult1D and XRDResultRSM into a shared helper function to reduce code duplication
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
`src/nomad_measurements/utils.py` (outdated):

```python
if validate_path and self.valid_dataset_paths:
    if path not in self.valid_dataset_paths:
        self.logger.warning(f'Invalid dataset path "{path}".')
        return
```
suggestion (code-quality): Merge nested if conditions (`merge-nested-ifs`)

```python
if validate_path and self.valid_dataset_paths and path not in self.valid_dataset_paths:
    self.logger.warning(f'Invalid dataset path "{path}".')
    return
```
Explanation: Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels. Reading deeply nested code is confusing, since you have to keep track of which conditions relate to which levels. We therefore strive to reduce nesting where possible, and the situation where two `if` conditions can be combined using `and` is an easy win.
`src/nomad_measurements/xrd/schema.py` (outdated):

```python
result = XRDResultRSM()

if result is not None:
    result.scan_axis = metadata_dict.get('scan_axis', None)
```
suggestion (code-quality): Replace `dict.get(x, None)` with `dict.get(x)` (`remove-none-from-default-get`)

```python
result.scan_axis = metadata_dict.get('scan_axis')
```
Explanation: When using a dictionary's `get` method you can specify a default to return if the key is not found. This defaults to `None`, so it is unnecessary to specify `None` if this is the required behaviour. Removing the unnecessary argument makes the code slightly shorter and clearer.
`src/nomad_measurements/xrd/schema.py` (outdated):

```python
self.hdf5_handler.add_dataset(
    path='/ENTRY[entry]/experiment_result/intensity',
    params=dict(
        data=xrd_dict.get('intensity', None),
```
suggestion (code-quality): Replace `dict.get(x, None)` with `dict.get(x)` (`remove-none-from-default-get`)

```python
data=xrd_dict.get('intensity'),
```
Explanation: same as the previous `remove-none-from-default-get` suggestion.
`src/nomad_measurements/xrd/schema.py` (outdated):

```python
self.hdf5_handler.add_dataset(
    path='/ENTRY[entry]/experiment_result/two_theta',
    params=dict(
        data=xrd_dict.get('2Theta', None),
```
suggestion (code-quality): Replace `dict.get(x, None)` with `dict.get(x)` (`remove-none-from-default-get`)

```python
data=xrd_dict.get('2Theta'),
```
Explanation: same as the previous `remove-none-from-default-get` suggestion.
`src/nomad_measurements/xrd/schema.py` (outdated):

```python
self.hdf5_handler.add_dataset(
    path='/ENTRY[entry]/experiment_result/omega',
    params=dict(
        data=xrd_dict.get('Omega', None),
```
suggestion (code-quality): Replace `dict.get(x, None)` with `dict.get(x)` (`remove-none-from-default-get`)

```python
data=xrd_dict.get('Omega'),
```
Explanation: same as the previous `remove-none-from-default-get` suggestion.
```python
if isinstance(parsed_archive.data.results[0], XRDResult1D):
    assert parsed_archive.results.properties.structural.diffraction_pattern[
        0
    ].incident_beam_wavelength.magnitude * 1e10 == pytest.approx(1.540598, 1e-2)
```
issue (code-quality): Avoid conditionals in tests (`no-conditionals-in-tests`)
Explanation: Avoid complex code, like conditionals, in test functions. Google's software engineering guidelines say: "Clear tests are trivially correct upon inspection". To reach that, avoid complex code in tests:

- loops
- conditionals

Some ways to fix this:

- Use parametrized tests to get rid of the loop.
- Move the complex logic into helpers.
- Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off the screen. It doesn't take much logic to make a test more difficult to reason about. (Software Engineering at Google / Don't Put Logic in Tests)
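As a hedged illustration of the parametrization advice applied to the snippet above (the test file name, the `parse_file` helper, and the parameter entries are assumptions, not taken from the repository's test suite):

```python
import pytest


@pytest.mark.parametrize(
    'test_file, expected_wavelength_angstrom',
    [
        # Hypothetical entries: one per result type, replacing the
        # isinstance() branch with an explicit expectation per input.
        ('example_1d.xrdml', 1.540598),
    ],
)
def test_incident_beam_wavelength(test_file, expected_wavelength_angstrom):
    parsed_archive = parse_file(test_file)  # hypothetical parsing helper
    pattern = parsed_archive.results.properties.structural.diffraction_pattern[0]
    assert pattern.incident_beam_wavelength.magnitude * 1e10 == pytest.approx(
        expected_wavelength_angstrom, 1e-2
    )
```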
`src/nomad_measurements/utils.py` (outdated):

```python
units = self._hdf5_attributes[dataset_path].get('units', None)
if units:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (`use-named-expression`)

```python
if units := self._hdf5_attributes[dataset_path].get('units', None):
```
```python
new_key = self._remove_nexus_annotations(key)
tmp_dict[new_key] = value
self._hdf5_datasets = tmp_dict
tmp_dict = {}
```
issue (code-quality): Convert for loop into dictionary comprehension (`dict-comprehension`)
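The loop header is not visible in the fragment above; assuming it iterates over `self._hdf5_datasets.items()`, a comprehension version could read:

```python
# Rebuild the dataset mapping with nexus annotations stripped from the keys.
self._hdf5_datasets = {
    self._remove_nexus_annotations(key): value
    for key, value in self._hdf5_datasets.items()
}
```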
```python
new_path = ''
for part in path.split('/')[1:]:
    if re.match(pattern, part):
        new_path += '/' + part.split('[')[0].strip().lower()
    else:
        new_path += '/' + part
new_path = new_path.replace('.nxs', '.h5')
return new_path
```
suggestion (code-quality): We've found these issues:

- Use str.join() instead of for loop (`use-join`)
- Replace if statement with if expression (`assign-if-exp`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
- Use f-string instead of string concatenation (`use-fstring-for-concatenation`)

```python
new_path = ''.join(
    (
        '/' + part.split('[')[0].strip().lower()
        if re.match(pattern, part)
        else f'/{part}'
    )
    for part in path.split('/')[1:]
)
return new_path.replace('.nxs', '.h5')
```
```python
    ),
)
elif self.q_parallel is not None and self.q_perpendicular is not None:
    intensity = hdf5_handler.read_dataset(self.intensity)
```
issue (code-quality): Extract code out into method (`extract-method`)
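As a hedged sketch of what the extraction could look like (the helper name and the exact set of arrays are assumptions based on the visible fragment, not code from the PR):

```python
def _read_rsm_arrays(self, hdf5_handler):
    # Hypothetical helper: gather the arrays needed for RSM plotting in one
    # place, so each branch of the conditional only decides how to plot them.
    intensity = hdf5_handler.read_dataset(self.intensity)
    q_parallel = hdf5_handler.read_dataset(self.q_parallel)
    q_perpendicular = hdf5_handler.read_dataset(self.q_perpendicular)
    return intensity, q_parallel, q_perpendicular
```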
Hi @ka-sarthak, after checking with you, as mentioned, I created another PR for the nexus part against the branch use-hdf5-references-for-arrays: #147. I put you and @hampusnasstrom as reviewers for that.
LGTM!
Merging this branch into the branch associated with #113.
* updated plugin structure
* added pynxtools dependency
* Apply suggestions from code review (Co-authored-by: Sarthak Kapoor <[email protected]>, Co-authored-by: Hampus Näsström <[email protected]>)
* Add sections for RSM and 1D which uses HDF5 references
* Abstract out data interaction using setter and getter; allows to use same methods for classes with hdf5 refs
* Use arrays, not references, in the `archive.results` section
* Lock the state for using nexus file and corresponding references
* Populate results without references
* Make a general reader for raw files
* Remove nexus flags
* Add quantity for auxialiary file
* Fix rebase
* Make integration_time as hdf5reference
* Reset results (refactor)
* Add backward compatibility
* Refactor reader
* add missing imports
* AttrDict class
* Make concept map global
* Add function to remove nexus annotations in concept map
* Move try block inside walk_through_object
* Fix imports
* Add methods for generating hdf5 file
* Rename auxiliary file
* Expect aux file to be .nxs in the beginning
* Add attributes for hdf5: data_dict, dataset_paths
* Method for adding a quantity to hdf5_data_dict
* Abstract out methods for creating files based on hdf5_data_dict
* Add dataset_paths for nexus
* Some reverting back
* Minor fixes
* Refactor populate_hdf5_data_dict: store a reference to be made later
* Handle shift from nxs to hdf5
* Set hdf5 references after aux file is created
* Cleaning
* Fixing
* Redefine result sections instead of extending
* Remove plotly plots from ELN
* Read util for hdf5 ref
* Fixing
* Move hdf5 handling into a util class
* Refactor instance variables
* Reset data dicts and reference after each writing
* Fixing
* Overwrite dataset if it already exists
* Refactor add_dataset
* Reorganize and doctrings
* Rename variable
* Add read_dataset method
* Cleaning
* Adapting schema with hdf5 handler
* Cooments, minor refactoring
* Fixing; add `hdf5_handler` as an attribute for archive
* Reorganization
* Fixing
* Refactoring
* Cleaning
* Try block for using hdf5 handler: dont fail early, as later normalization steps will have the handler!
* Extract units from dataset attrs when reading
* Fixing
* Linting
* Make archive_path optional in add_dataset
* Rename class
* attrs for add_dataset; use it for units
* Add add_attribute method
* Refactor add_attribute
* Add plot attributes: 1D
* Refactor hdf5 states
* Add back plotly figures
* rename auxiliary file name if changed by handler
* Add referenced plots
* Allow hard link using internel reference
* Add sections for plots
* Comment out validation
* Add archive paths for the plot subsections
* Add back validation with flag
* Use nexus flag
* Add interpolated intensity data into h5 for qspace plots
* Use prefix to reduce len of string
* Store regularized linespace of q vectors; revise descriptions
* Remove plotly plots
* Bring plots to overview
* Fix tests
* Linting; remove attr arg from add_dataset
* Review: move none check into method
* Review: use 'with' for opening h5 file
* Review: make internal states as private vars
* Add pydantic basemodel for dataset
* Use data from variables if available for reading
* Review: remove lazy arg
* Move DatasetModel outside Handler class
* Remove None from get, as it is already a default
* Merge if conditions

---------

Co-authored-by: Andrea Albino <[email protected]>
Co-authored-by: Andrea Albino <[email protected]>
Co-authored-by: Hampus Näsström <[email protected]>
Adds a class `HDF5Handler` which can be used to create and manipulate an auxiliary HDF5 file from within an ELN schema. This file is intended to offload large arrays of data into the HDF5 file and store references in the archive. The following methods are defined for this class:

- `add_dataset`: adds a dataset to a specified path in the HDF5 file. Uses lazy writing and stores the data in instance variables; only when the write file methods are triggered is the data written into the file. Populates the instance variable `hdf5_datasets: dict`.
- `add_attribute`: populates the `attrs` dictionary of the HDF5 datasets and groups. Also uses lazy writing. Attributes are used to store information about units and the generation of plots. Populates the instance variable `hdf5_attributes: dict`.
- `read_dataset`: reads the dataset from the HDF5 file at the specified path; if units are available in the `attrs` dictionary, a corresponding `pint.Quantity` is returned. The method ensures that, before reading, all the data added using the `add_dataset` method is written to the file.
- `_write_hdf5_file`: goes through the instance variables `hdf5_datasets` and `hdf5_attributes` and populates a `.h5` file with data and attributes. It also automatically creates groups as required by the dataset paths. Once the data from the instance variables is written into the file, they are reset to empty dictionaries. Additionally, if the corresponding archive paths to the `HDF5Reference` quantities are available for the datasets, the archive quantities are populated with references.
- `_write_nx_file`: a similar method that generates a `.nxs` file instead of a `.h5` file.
- `write_file`: a wrapper for toggling between the above two methods.
- `_set_hdf5_reference`: a static method to set the value of the `HDF5Reference` quantity for a given section, section path, and HDF5 dataset path.
- `_remove_nx_annotations`: a static method that removes nexus-related annotations from dataset paths. With this, one can add nexus-specific paths in the `add_dataset` method and the handler can convert them into general paths if required (see the sketch below).
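Based on the implementation fragment quoted earlier in this review, the annotation removal behaves roughly as below; the standalone-function form and the regex `pattern` are assumptions for illustration:

```python
import re


def remove_nexus_annotations(path: str) -> str:
    """Strip nexus class annotations such as 'ENTRY[entry]' from a path."""
    pattern = r'.*\[.*\]'  # assumed: matches annotated parts like 'ENTRY[entry]'
    new_path = ''
    for part in path.split('/')[1:]:
        if re.match(pattern, part):
            new_path += '/' + part.split('[')[0].strip().lower()
        else:
            new_path += '/' + part
    return new_path.replace('.nxs', '.h5')


# '/ENTRY[entry]/experiment_result/intensity' -> '/entry/experiment_result/intensity'
```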
The handler is intended to be used in the following way; this is also the general outline of the implementation in the `schema.py` file:

1. Initialize the handler with a file name (with `.nxs` or `.h5` extension), the entry archive, a logger, and an optional list of valid dataset paths.
2. Add datasets; if the data is a `pint.Quantity`, it is stored as a numpy array and the unit is stored as an attribute of the dataset.
3. Add attributes such as `NXclass`, `axes`, and `signal` to generate plots using H5Web.
4. Read datasets back in the `normalize` method of sub-sections and parent sections to compute other properties.

A usage sketch follows below. Some additional classes and code have been added to adapt the schema's plotting functionality to the HDF5 implementation.
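Putting these steps together, usage inside a schema's `normalize` method might look like the following sketch. The constructor and method signatures mirror the class diagram and the snippets quoted in this review; the concrete paths, file name, and attribute values are illustrative assumptions:

```python
import numpy as np
import pint

from nomad_measurements.utils import HDF5Handler  # module path per this PR

ureg = pint.UnitRegistry()


def normalize(self, archive, logger):
    # 1. Initialize with a file name, the entry archive, and a logger.
    handler = HDF5Handler(filename='measurement.nxs', archive=archive, logger=logger)

    # 2. Lazily add a dataset; a pint.Quantity carries its unit into attrs.
    handler.add_dataset(
        path='/ENTRY[entry]/experiment_result/two_theta',
        params=dict(data=np.linspace(20, 80, 1000) * ureg.degree),
    )

    # 3. Attributes such as 'axes' and 'signal' drive H5Web plotting.
    handler.add_attribute(
        path='/ENTRY[entry]/experiment_result',
        params=dict(axes='two_theta', signal='intensity'),
    )

    # 4. Write everything out (this also sets HDF5 references in the archive),
    #    then read back as a pint.Quantity in later normalization steps.
    handler.write_file()
    two_theta = handler.read_dataset('/ENTRY[entry]/experiment_result/two_theta')
```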
Summary by Sourcery

Implement lazy loading of large arrays using `HDF5Reference`s and an auxiliary HDF5 file. Add an `HDF5Handler` class to manage the creation and manipulation of the HDF5 file, including adding datasets, attributes, and reading datasets. Adapt the schema's plotting functionality to work with the HDF5 implementation.

New Features:

- `HDF5Handler` class to manage large arrays in an auxiliary HDF5 file, which supports lazy loading and storing references in the archive.

Tests: