
emr-dynamodb-tools-*-jar-with-dependencies.jar export DynamoDB to a valid JSON format file in S3 #147

Open
alanamircruz opened this issue Apr 19, 2021 · 1 comment

Comments


alanamircruz commented Apr 19, 2021

Hello dear community and friends here at AWS Labs,

I hope everything is going great with you. I am facing an interesting issue: I cannot find a way to tell the export step in my Data Pipeline (EMR) to export the data from a DynamoDB table into a valid JSON file in S3, instead of a plain file with one JSON row per line.

Let me know if I can help add a valid JSON export data format alongside the current CSV, TSV, and other formats.

EMR Activity - Export (Step)
[screenshot of the export step configuration]

Current file generated in S3 through Data Pipeline (missing the surrounding array brackets, commas between rows, and a .json file extension)
[screenshot of the exported file contents]
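
To illustrate the difference, here is a minimal sketch (the field layout is made up for illustration): the export step currently writes newline-delimited JSON objects, while a valid JSON file would wrap those rows in an array with commas between them.

```python
import json

# Stand-in for the newline-delimited rows the export step currently writes to
# S3: one JSON object per line, no commas, no surrounding array brackets.
# The field layout is made up for illustration.
lines = [
    '{"id": {"s": "1"}, "name": {"s": "alpha"}}',
    '{"id": {"s": "2"}, "name": {"s": "beta"}}',
]

# Wrapping the parsed rows in a list and re-serializing yields the valid JSON
# array this issue is asking the export step to produce.
records = [json.loads(line) for line in lines]
print(json.dumps(records, indent=2))
```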

Thanks in advance for any info or insights,
Alan

PS: I checked the source code of the jar used by this step, but found nothing relevant.

@sivankumar86
Contributor

You can use a Glue ETL job, which wraps the DynamoDB connector and can convert the data to the desired format in the export job:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-dynamodb
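
For reference, a minimal sketch of such a Glue (PySpark) job, assuming hypothetical table, bucket, and key names: it reads the table through the Glue DynamoDB connector and writes a single valid JSON array to S3. Glue's built-in `json` sink emits one object per line, so this sketch collects the rows on the driver, which only suits tables small enough to fit in driver memory.

```python
import json

import boto3
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Hypothetical names used for illustration only.
TABLE_NAME = "my-dynamodb-table"
OUTPUT_BUCKET = "my-export-bucket"
OUTPUT_KEY = "exports/table-export.json"

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the DynamoDB table through the Glue DynamoDB connector
# (connection options are documented at the link above).
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": TABLE_NAME,
        "dynamodb.throughput.read.percent": "0.5",
    },
)

# Glue's "json" output format writes JSON Lines (one object per line), so to
# produce a single valid JSON array the rows are collected on the driver and
# written out manually.
rows = [json.loads(r) for r in dyf.toDF().toJSON().collect()]

boto3.client("s3").put_object(
    Bucket=OUTPUT_BUCKET,
    Key=OUTPUT_KEY,
    Body=json.dumps(rows, indent=2).encode("utf-8"),
    ContentType="application/json",
)
```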
