Create a MultiStore object and a Store-like object to access it #787

Merged on May 22, 2023 (19 commits)

Contributor

@sivonxay sivonxay commented May 18, 2023

The primary goal of the MultiStore object is to keep individual fireworks from each opening their own connections to Stores, which can add up to a large number of total connections. When running in rlaunch multi mode, these connections can be pooled with little difficulty, provided that a Store is shared across multiple processes. MongoClient already handles connection pooling, but it cannot be passed to child processes, since pickling it results in new connections being made. To circumvent this, a multiprocessing.Manager can be used to "host" the Store on the parent process, and the children can then access that Store.
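As a rough illustration of the "hosting" idea, here is a minimal standard-library sketch. It is not the code in this PR: `Counter`, `HostManager`, `get_counter`, and the authkey value are hypothetical stand-ins, and the manager server runs in a background thread so that the example fits in one process. A real child process would connect the same way, using the parent's address and authkey.

```python
import threading
from multiprocessing.managers import BaseManager


class Counter:
    """Hypothetical stand-in for a Store: state that lives only on the host."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._count += 1
            return self._count

    def value(self):
        return self._count


class HostManager(BaseManager):
    """Subclass so the registration below does not touch BaseManager itself."""


# The callable always returns the same instance, so every client shares it.
shared = Counter()
HostManager.register("get_counter", callable=lambda: shared)

# Host side: bind to a free local port and serve requests in the background.
host = HostManager(address=("127.0.0.1", 0), authkey=b"example-key")
server = host.get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: connect and obtain a proxy. Method calls on the proxy are
# executed by the host; the Counter object itself is never pickled.
client = HostManager(address=server.address, authkey=b"example-key")
client.connect()
proxy = client.get_counter()
proxy.increment()
proxy.increment()
remote_value = proxy.value()
```

Only method names, arguments, and return values cross the connection, which is why an object holding a live MongoClient could stay on the parent under this scheme.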

The caveats of this are:

  1. It is not guaranteed that all fireworks will need to save data to the same store. For example, if you were running an elastic calculation and an AIMD calculation, the data might need to go to different collections (or databases).
    • Instead of hosting a single Store, the MultiStore caches multiple stores. Because it contains many stores, query, update, and the other methods require an additional argument specifying which store to operate on. Thus, the MultiStore cannot inherit from Store and cannot be used as a drop-in Store. The StoreFacade provides a way to interact with the MultiStore as if it were an ordinary Store.
  2. The parent process must act on the child process's behalf, since the child cannot be given a reference to the store.
    • The MultiStore/StoreFacade accomplishes this by having the child pass all queries, updates, etc. to the parent via the StoreFacade.
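To make the two caveats concrete, here is a toy, single-process sketch of the cache-plus-facade pattern. The classes `ToyStore`, `ToyMultiStore`, and `ToyFacade`, along with the methods `ensure_store` and `call`, are hypothetical and only mimic the shape described above; they are not the maggma implementation.

```python
class ToyStore:
    """Hypothetical dict-backed store identified by a collection name."""

    def __init__(self, collection):
        self.collection = collection
        self._docs = {}

    def update(self, doc, key):
        self._docs[doc[key]] = dict(doc)

    def query_one(self, criteria):
        for doc in self._docs.values():
            if all(doc.get(k) == v for k, v in criteria.items()):
                return dict(doc)
        return None


class ToyMultiStore:
    """Caches several stores, so every call must name the target store."""

    def __init__(self):
        self._stores = {}

    def ensure_store(self, store):
        # Keep the first store seen for each collection; later ones reuse it.
        self._stores.setdefault(store.collection, store)
        return store.collection

    def call(self, name, method, *args, **kwargs):
        return getattr(self._stores[name], method)(*args, **kwargs)


class ToyFacade:
    """Looks like a single store, but forwards every call to the multistore."""

    def __init__(self, store, multistore):
        self._multistore = multistore
        self._name = multistore.ensure_store(store)

    def update(self, doc, key):
        return self._multistore.call(self._name, "update", doc, key)

    def query_one(self, criteria):
        return self._multistore.call(self._name, "query_one", criteria)


multistore = ToyMultiStore()
elastic = ToyFacade(ToyStore("elastic"), multistore)
aimd = ToyFacade(ToyStore("aimd"), multistore)
elastic_again = ToyFacade(ToyStore("elastic"), multistore)  # reuses the cached store

elastic.update({"task_id": 1, "kind": "elastic"}, key="task_id")
aimd.update({"task_id": 1, "kind": "aimd"}, key="task_id")
```

In this sketch, `elastic_again.query_one({"task_id": 1})` returns the document written through `elastic`, because both facades resolve to the same cached store, while `aimd` stays separate.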

Note: This might belong in another package, but I thought it was flexible and generic enough that it could be put into maggma.

These are the pull requests for Fireworks and Jobflow

Todo:

  • Need to figure out how to expose other attributes of the cached stores. For example, MongoStore.safe_update
  • Implement test over multiple processes

Contributor Checklist

  • I have run the tests locally and they passed.
  • I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR

@sivonxay sivonxay changed the title [WIP] Multistore [WIP] Create a Multistore object and a Store-like object to access it May 19, 2023
@codecov

codecov bot commented May 19, 2023

Codecov Report

Patch coverage: 94.81% and project coverage change: +0.29 🎉

Comparison is base (bd31bc8) 88.26% compared to head (8798d30) 88.56%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #787      +/-   ##
==========================================
+ Coverage   88.26%   88.56%   +0.29%     
==========================================
  Files          42       43       +1     
  Lines        3188     3323     +135     
==========================================
+ Hits         2814     2943     +129     
- Misses        374      380       +6     
Impacted Files                        Coverage Δ
src/maggma/stores/shared_stores.py    94.81% <94.81%> (ø)

... and 1 file with indirect coverage changes


@sivonxay
Contributor Author

I wrote a script to test whether the classes introduced here work as intended across multiple processes, but I'm not sure how to integrate it into the pytest suite, since it uses helper functions.

Here it is:

from multiprocessing import Pool

from maggma.stores import MemoryStore
from maggma.stores.shared_stores import MultiStore, MultiStoreManager, StoreFacade


def insert_function(i, port):
    # Connect to the manager hosted by the parent process
    _manager = MultiStoreManager(address=("127.0.0.1", port), authkey=b"abcd")
    _manager.connect()
    _ms = _manager.MultiStore()
    store = StoreFacade(MemoryStore(), _ms)

    # Write one document per process, keyed on the process index
    store.update({'process': i, 'a': i, 'b': 2 * i}, key='process')


def query_function(i, port):
    # Open a fresh connection to the same manager and read the document back
    _manager = MultiStoreManager(address=("127.0.0.1", port), authkey=b"abcd")
    _manager.connect()
    _ms = _manager.MultiStore()
    store = StoreFacade(MemoryStore(), _ms)

    return store.query_one({'process': i})


def insert_and_query(i, port):
    insert_function(i, port)
    return query_function(i, port)


if __name__ == "__main__":
    # Make the manager on the parent process
    ms = MultiStore()
    manager = MultiStoreManager.setup(ms)
    port = manager.address[1]

    # Start a pool of 3 worker processes
    with Pool(3) as pool:
        # Submit one task per process index and collect the results
        results = [pool.apply_async(insert_and_query, args=(i, port)) for i in range(3)]
        output = [result.get() for result in results]
        print(output)
        expected_output = [{'process': 0, 'a': 0, 'b': 0},
                           {'process': 1, 'a': 1, 'b': 2},
                           {'process': 2, 'a': 2, 'b': 4}]

        for out_dict, expected_out_dict in zip(output, expected_output):
            # Drop the store-generated _id (if present) before comparing
            out_dict.pop('_id', None)
            assert out_dict == expected_out_dict
        print(output == expected_output)

@sivonxay sivonxay marked this pull request as ready for review May 20, 2023 00:59
@sivonxay sivonxay changed the title [WIP] Create a Multistore object and a Store-like object to access it Create a MultiStore object and a Store-like object to access it May 20, 2023
@munrojm
Member

munrojm commented May 22, 2023

Hi @sivonxay, this looks great. Thanks for putting it together. No concerns on my end other than the tests, but I can take a look at that. I think this is general enough to be appropriate in maggma, as you say. Happy to merge now.

@munrojm munrojm merged commit 19ca910 into materialsproject:main May 22, 2023
@sivonxay
Contributor Author

Thanks Jason!

@sivonxay sivonxay deleted the multistore branch May 22, 2023 20:24