split S3 files into smaller files to send large union file #77
Open: yuyashiraki wants to merge 3 commits into facebookresearch:main from yuyashiraki:export-D39219674
facebook-github-bot added the CLA Signed and fb-exported labels on Sep 4, 2022
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
This pull request was exported from Phabricator. Differential Revision: D39219674
yuyashiraki pushed a commit to yuyashiraki/Private-ID that referenced this pull request on Sep 4, 2022

…esearch#77)

Summary: Pull Request resolved: facebookresearch#77

# Context
We found that the AWS-SDK S3 API fails when we try to write more than 5GB of data. This is blocking us from doing capacity testing for a larger FARGATE container.

In this diff, as mentioned in [the post](https://fb.workplace.com/groups/pidmatchingxfn/posts/493743615908631), we split the union file based on the number of rows.

# Description
We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. It is used to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and the files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, `num_split` is calculated from `s3api_max_rows` and passed for S3 only. Then copy_from_local() is called for each split file.

Differential Revision: D39219674

fbshipit-source-id: 82dc1788b0d4db5cf9c3de07178b52a8cc11633c
yuyashiraki force-pushed the export-D39219674 branch from 4ba74f7 to 4f28018 on September 4, 2022 22:34
Summary:

# What
* Add unit tests for the encrypt and create_id_map functions on the partner side.
* Add a create_key function to create fixed keys for testing.
* The encrypt and create_id_map functions both use partner.private_keys.1 to encrypt.
* self_permutation also needs to be fixed when we test create_id_map().

# Why
* We need to improve code coverage.

Differential Revision: https://internalfb.com/D39127178

fbshipit-source-id: 22acb4c9d2d642b8df1348547098a7539f6ce7df
Summary: Pull Request resolved: facebookresearch#76

# What
* Add unit tests for the save_id_map function on the partner side.
* The save_id_map function is called after create_id_map().
* Add a create_key function to create fixed keys for testing.
* The create_id_map function uses partner.private_keys.1 to encrypt.
* self_permutation also needs to be fixed when we test create_id_map().
* Create a temp file, pass its path to save_id_map(), and check that the string written to the file is correct.

# Why
* We need to improve code coverage.

Differential Revision: D39142927

fbshipit-source-id: 82884647935873fe1f2feef5b061f3cc5385bba2
…esearch#77)

Summary: Pull Request resolved: facebookresearch#77

# Context
We found that the AWS-SDK S3 API fails when we try to write more than 5GB of data. This is blocking us from doing capacity testing for a larger FARGATE container.

In this diff, as mentioned in [the post](https://fb.workplace.com/groups/pidmatchingxfn/posts/493743615908631), we split the union file based on the number of rows.

# Description
We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. It is used to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and the files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, `num_split` is calculated from `s3api_max_rows` and passed for S3 only. Then copy_from_local() is called for each split file.

Differential Revision: D39219674

fbshipit-source-id: 871df40d1a377ef8115422e39a868a26e09e027d
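As a rough illustration of the `num_split` calculation mentioned in the commit message above, the sketch below uses ceiling division over the row count. The function name `compute_num_split` and the choice to treat a `max_rows` of 0 as "do not split" are assumptions made for this example; the actual logic in rpc_server.rs and client.rs may differ.

```rust
/// Hypothetical helper (not taken from the PR): returns how many split files
/// are needed so that no file holds more than `max_rows` rows.
/// Returns `None` when `max_rows` is 0, which this sketch treats as
/// "splitting disabled".
fn compute_num_split(total_rows: usize, max_rows: usize) -> Option<usize> {
    if max_rows == 0 {
        return None;
    }
    // Ceiling division written so it cannot overflow for large row counts.
    let n = total_rows / max_rows + usize::from(total_rows % max_rows != 0);
    // Always produce at least one file, even for an empty input.
    Some(n.max(1))
}
```

With this definition, 10 rows and a limit of 3 rows per file give 4 split files, while a row count that is an exact multiple of the limit yields no extra file.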
yuyashiraki force-pushed the export-D39219674 branch from 4f28018 to 48c8aa6 on September 4, 2022 22:45
Summary:

Context
We found that the AWS-SDK S3 API fails when we try to write more than 5GB of data. This is blocking us from doing capacity testing for a larger FARGATE container.

In this diff, as mentioned in the post, we split the union file based on the number of rows.

Description
We have made the following changes.
- Added a new arg `s3api_max_rows` to the private-id-multi-key-client and private-id-multi-key-server binaries. It is used to split a file for S3 upload.
- Added an optional arg `num_split` to save_id_map() and writer_helper(). When `num_split` is specified, the arg `path` is used as a prefix and the files are saved as `{path}_0`, `{path}_1`, etc.
- In rpc_server.rs and client.rs, `num_split` is calculated from `s3api_max_rows` and passed for S3 only. Then copy_from_local() is called for each split file.

Differential Revision: D39219674
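The `{path}_0`, `{path}_1` naming scheme described above could look roughly like the following. This is a minimal sketch using only the Rust standard library. The names `split_path` and `write_splits` are hypothetical and do not reproduce the PR's save_id_map() or writer_helper(); rows are assumed to be already serialized as strings.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: builds the path of the i-th split file, `{path}_{i}`.
fn split_path(path: &str, index: usize) -> String {
    format!("{}_{}", path, index)
}

/// Hypothetical helper: writes `rows` into exactly `num_split` files named
/// `{path}_0`, `{path}_1`, ... and returns the paths that were written.
/// Rows are distributed in order; trailing files may be empty.
fn write_splits(rows: &[String], path: &str, num_split: usize) -> io::Result<Vec<String>> {
    assert!(num_split > 0, "num_split must be at least 1");
    // Rows per file, rounded up so every row lands in one of the files.
    let per_file = (rows.len() / num_split + usize::from(rows.len() % num_split != 0)).max(1);
    let mut chunks = rows.chunks(per_file);
    let mut paths = Vec::with_capacity(num_split);
    for i in 0..num_split {
        let p = split_path(path, i);
        let mut writer = BufWriter::new(File::create(&p)?);
        // An exhausted iterator yields an empty slice, giving an empty file.
        for row in chunks.next().unwrap_or_default() {
            writeln!(writer, "{}", row)?;
        }
        writer.flush()?;
        paths.push(p);
    }
    Ok(paths)
}
```

In the flow the description outlines, each returned path would then be uploaded on its own (the PR does this with copy_from_local()), so that no single S3 write exceeds the size limit.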