DataFlow Activity Support #140
Comments
Hi @deenairn, thank you for the feature request! In the coming weeks I'll set aside some time to play with DataFlow and see if it makes sense to add such a feature to the framework.
I think it would really enhance the value of the framework. I did start thinking about it, though, and realised this isn't a task I can just do in my spare time, as it could get complicated. Happy to help if I can.
@deenairn, could you share with us an example dataflow JSON definition?
@arjendev - this is a trivial example of an ADF data flow, querying the REST API of the public OData service https://services.odata.org/TripPinRESTierService (see the docs at https://www.odata.org/odata-services/).

Input: JSON, multiple layers of nesting.

There's no logic here, just flattening a bit of JSON data into tabular CSV data. In the simplest case, you could provide a bit of JSON with properties nested multiple layers deep, then check that it is reformatted to a CSV (or any tabular output) as expected. Once this sort of functionality is achieved, you could focus on more logic elements (under certain conditions, the data is filtered / reshaped based on data flow conditions). Does this make sense?
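To make the "flatten nested JSON into tabular CSV" idea above concrete, here is a minimal Python sketch. The field names are illustrative only, not taken from the actual TripPin schema, and this is just plain Python, not part of the testing framework:

```python
import csv
import io


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into a single-level dict
    with dot-separated keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested input, loosely in the shape of an OData entity
records = [{"UserName": "russellwhyte",
            "AddressInfo": {"City": {"Name": "Boise"}}}]

# Flatten each record, then write the result as CSV
rows = [flatten(r) for r in records]
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

A unit test for a data flow like this would simply compare the produced CSV text against an expected fixture.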
@deenairn, yes, thank you for sharing! Not being familiar with Data Flow Script (DFS) myself, it seems like a language that abstracts away the underlying Spark SQL runtime. Unfortunately, the Data Factory Testing Framework is currently only able to evaluate the Data Factory Expression Language (DFEL), which is built on top of the Logic Apps Expression Language. A testing framework for DFS thus requires a completely different approach, I am afraid. I am also not sure whether the DFS language specification is publicly available. Another approach would be to rebuild the language in Python, as we did before with this Testing Framework for the DFEL. However, given the more complicated nature of DFS compared to DFEL and the number of functions available, I am afraid it is a lot of work.
@arjendev - that's a shame; it would be a real help for our project. I did consider what it might take to implement and realised it would be a lot of work, unless I misunderstood what was necessary. I still think it would be a great addition if it were possible.
@deenairn - we've had a discussion with our v-team working on this framework, and unfortunately we have indeed decided not to pursue DataFlow support further at this time. If that changes in the future, we will let you know with a reply on this issue. Thank you for your time identifying issues and coming up with feature suggestions; much appreciated!
The framework looks like it does a great job of testing individual activities in Pipelines.
Feature Request
However, the ADF DataFlow Activity can contain some quite complex transformations, and it would be great if there were a way to substitute the DataFlow data source with a fixed string and then test the transformation against it (JSON for JSON data types, an array of arrays for delimited text or database types, XML for XML data types, etc.), so you can write a reasonable unit test for complex transformations of data.
For example, for a DataFlow that takes in a JSON data type in Blob Storage and outputs Delimited Text to Blob Storage, you could do something like:
set up the activity with simple JSON like:
[ { "name": "Donald", "age": 21 }]
assert that it returns what you expect by checking against a CSV like:

name,age
Donald,21
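The requested workflow above could look roughly like the following Python sketch. Note that `run_data_flow` is entirely hypothetical; no such API exists in the framework today, and here it is backed by a trivial stand-in that converts flat JSON records to CSV:

```python
import csv
import io
import json


def run_data_flow(source_json: str) -> str:
    """Hypothetical test hook: a real implementation would evaluate the
    DataFlow transformation against this fixed input instead of reading
    from Blob Storage. Here it just writes flat JSON records as CSV."""
    records = json.loads(source_json)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def test_simple_json_to_csv():
    # Fixed JSON source substituted for the real Blob Storage input
    source = '[ { "name": "Donald", "age": 21 }]'
    result = run_data_flow(source)
    # Assert the transformation output matches the expected CSV
    assert result.splitlines() == ["name,age", "Donald,21"]


test_simple_json_to_csv()
```

The key idea is that the data source is replaced by an in-memory fixture, so the transformation logic can be asserted on deterministically without any Azure resources.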