Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client load data directly from csvs. #484

Merged
merged 2 commits into from
Jul 9, 2024

Conversation

KFilippopolitis
Copy link
Contributor

@KFilippopolitis KFilippopolitis commented Jun 10, 2024

Update data processing for client so they load data from csv and not the database.
New mipdb version.

@KFilippopolitis KFilippopolitis changed the title New mipdb version. Client load data directly from csvs. Jun 10, 2024
@KFilippopolitis KFilippopolitis force-pushed the client_load_data_from_csv branch 10 times, most recently from 94bbbd0 to 0c81b32 Compare June 14, 2024 12:12
Copy link
Member

@ThanKarab ThanKarab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a little bit more work on passing the csv paths and how to distinguish on what to pass on each worker.

Why do we need to keep the workerId on the dataset location since we know what node gives us which dataset.

Let's discuss this on Monday, to explain it to me a bit.

exareme2/algorithms/flower/logistic_regression/client.py Outdated Show resolved Hide resolved
Comment on lines 48 to 62
max_retries = 6
for attempt in range(max_retries):
try:
fl.client.start_client(
server_address=os.environ["SERVER_ADDRESS"], client=client.to_client()
)
print("Connection successful on attempt", attempt + 1)
break
except Exception as e:
print(f"Connection attempt {attempt + 1} failed: {e}")
if attempt < max_retries - 1:
time.sleep(1)
else:
print("Max retries reached. Exiting.")
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a separate commit specifying the fix, the reason, etc.

Also max_retries could be an env variable, hard coded in the Dockerfile or somewhere else, whatever you prefer.

Finally this:

        except Exception as e:
            print(f"Connection attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                print("Max retries reached. Exiting.")
                raise

could be rewritten like:

        except Exception as e:
            if attempt >= max_retries - 1:
                print("Could not establish connection to the server.")
                raise ex
            print(f"Connection with the server failed. Attempt {attempt} failed: {e}")
            time.sleep(1) // USE GLOBAL VARIABLE
                

You can also start the range() from 1 instead of doing the +1 -1

class DatasetsInfo(ImmutableBaseModel):
"""
A dictionary representation of a dataset's information.
Key values are the names of the datasets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A key represents the code of a dataset.

exareme2/worker_communication.py Show resolved Hide resolved
@KFilippopolitis KFilippopolitis force-pushed the client_load_data_from_csv branch 3 times, most recently from 9f3fb59 to 97c0a64 Compare July 1, 2024 19:37
@KFilippopolitis KFilippopolitis force-pushed the client_load_data_from_csv branch 5 times, most recently from 920e911 to b257637 Compare July 2, 2024 18:58
@KFilippopolitis KFilippopolitis requested a review from ThanKarab July 4, 2024 14:30
Copy link
Member

@ThanKarab ThanKarab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! 💯

As discussed in the call we have 2 remaining things:

  1. Make the csv_path nullable
  2. In the mipdb, when importing from socket, the csv_path should be null.

@KFilippopolitis KFilippopolitis force-pushed the client_load_data_from_csv branch from b257637 to 2db7dcf Compare July 9, 2024 20:38
@KFilippopolitis KFilippopolitis merged commit 325476c into master Jul 9, 2024
6 of 7 checks passed
@KFilippopolitis KFilippopolitis deleted the client_load_data_from_csv branch July 9, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants