chore: allow users to pass schema in encrypted data-frames #676
if column_name not in column_names:
    # TODO: Is this check actually relevant ? Can't the schema provide more columns than the
    # one found in the data-frame ?
    raise ValueError(
Should we allow a schema with column names that do not match the ones found in the given data-frame?
IMO, raising the error as you are doing here is the correct behavior.
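For reference, the check under discussion can be sketched as below. This is only an illustration, assuming the schema is a dict keyed by column name; the helper name `check_schema_columns` and the error message are hypothetical and not taken from the PR.

```python
from typing import Dict, Iterable


def check_schema_columns(column_names: Iterable[str], schema: Dict) -> None:
    """Raise if the schema names a column that the data-frame does not contain.

    Hypothetical helper: `column_names` stands for the data-frame's columns
    (for example `list(pandas_dataframe.columns)`), `schema` is assumed to be
    a dict keyed by column name.
    """
    column_names = list(column_names)
    for column_name in schema:
        if column_name not in column_names:
            raise ValueError(
                f"Column name '{column_name}' found in the schema is not available "
                f"in the data-frame. Expected one of {column_names}."
            )


# A schema that only references existing columns passes silently
check_schema_columns(["age", "city"], {"city": {"Paris": 1, "Lyon": 2}})
```

Raising here means a typo in a schema key fails loudly instead of being silently ignored.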
@@ -189,10 +297,15 @@ def pre_process_dtypes(pandas_dataframe: pandas.DataFrame) -> Tuple[pandas.DataF
"supported."
)

# TODO: Should all non-integers columns be considered by the schema if not None ? Currently,
Should we raise an error or a warning if some non-integer columns of the data-frame are not covered by the given schema?
What would happen if they are missing from the given schema?
They are automatically computed.
I'm just wondering because it could be an easy mistake to forget some columns, and no error would be raised.
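One possible way to surface that mistake without changing the behavior is a warning. The sketch below is only an illustration: it assumes the schema is a dict keyed by column name and that the list of non-integer column names has already been computed from the data-frame's dtypes. Both function names are hypothetical and not part of the PR.

```python
import warnings
from typing import Dict, Iterable, List, Optional


def find_uncovered_columns(
    non_integer_columns: Iterable[str], schema: Optional[Dict]
) -> List[str]:
    """Return the non-integer columns that the schema does not mention."""
    if schema is None:
        # No schema given: everything is computed automatically, nothing to report
        return []
    return [name for name in non_integer_columns if name not in schema]


def warn_on_uncovered_columns(
    non_integer_columns: Iterable[str], schema: Optional[Dict]
) -> List[str]:
    """Warn (instead of raising) when some non-integer columns are not in the schema."""
    uncovered = find_uncovered_columns(non_integer_columns, schema)
    if uncovered:
        warnings.warn(
            f"Columns {uncovered} are not covered by the schema; "
            "they will be computed automatically.",
            UserWarning,
            stacklevel=2,
        )
    return uncovered
```

A warning keeps partial schemas usable while still making a forgotten column visible to the user.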
@@ -37,7 +41,9 @@ def keygen(self, keys_path: Optional[Union[Path, str]] = None):
        else:
            self.client.keygen(True)

    def encrypt_from_pandas(self, pandas_dataframe: pandas.DataFrame) -> EncryptedDataFrame:
    def encrypt_from_pandas(
        self, pandas_dataframe: pandas.DataFrame, schema: Optional[Dict] = None
A schema is optional. If set, it should follow a specific format.
If needed, we could also handle the output of get_schema (pandas data-frames) as an input here.
e81b54d to 3648432
4473764 to 760712f
760712f to 6421bc5
Coverage passed ✅
elif column.dtype == "object":
    unique_values = column.unique()

    # Only take strings into account and thus avoid NaN values
You could also use the old x != x trick to detect NaNs.
Ah yeah, right. I keep forgetting this trick, thanks!
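For readers unfamiliar with the trick: NaN is not equal to itself, so comparing a value with itself is enough to detect it. A minimal sketch, using a plain list as a stand-in for the output of `column.unique()` (the helper name `drop_nans` is hypothetical):

```python
from typing import Any, Iterable, List


def drop_nans(values: Iterable[Any]) -> List[Any]:
    """Keep every value that is equal to itself, which filters out NaN."""
    # Under IEEE 754, NaN compares unequal to everything, including itself
    return [value for value in values if value == value]


# Stand-in for the unique values of an object column with missing entries
unique_values = ["apple", float("nan"), "cherry"]
string_values = drop_nans(unique_values)
```

Note that `None` compares equal to itself, so this trick does not filter it out; an `isinstance(value, str)` check would be needed if the column can also contain `None`.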
LGTM, responded to the TODOs.
refs https://github.com/zama-ai/concrete-ml-internal/issues/4376