-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multi-http plugin #59
base: develop
Are you sure you want to change the base?
Add multi-http plugin #59
Conversation
Hi @VictorD-Veolia , Thanks for creating this pull request. Apologies for the delay Could you please create a short document with the use cases for this, but more importantly, how this plugin addresses performance? Making HTTP requests for every record could be quite expensive and that was the reason why this functionality wasn't built into CDF yet. If you can elaborate on how this is handled, that would be great. Thanks, |
Hi @bdmogal, The typical use case for this feature is for API that works with two calls.
If we want to retrieve the details of all users, we will need to do a GET to the first API, using the existing HTTP Source plugin, and then a second HTTP Get for every user of the retrieved list, using this multi-HTTP plugin. This usecase is pretty common, for example with the 360learning API : https://api.360learning.com/#24c393ca-8e38-4ba7-a55e-0894c308d72d. For the performance part, I have added an optional throttling feature to avoid overwhelming fragile APIs. This feature allows the user to limit the number of calls per seconds made with the plugin. |
i was wondering if there was any interest towards getting this merged in ? Definitely a very useful feature our team at LiveRamp could leverage as well |
…P plugins + rearrange configuration widget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach won't work nicely for long lists as transform is not supposed to produce huge output for every input record. This will produce OOMs as Emitter
is a push-interface, so it effectively caches everything that is emitted for given record.
Some more relevant details: |
Also from the architecture standpoint I don't think this is the proper way to go forward. Basically, the intent is to have a dataset as a list of parameters to a source. This should not be done on plugin level as we don't want to have a Transform variant of each Source. Such capability should be provided on ETL platform level |
Add a transform-type plugin that perform an HTTP request for every input record received. The URL used for the request can be dynamically modified depending on input record value through a placeholder mechanism.