Initial commit for GCS Batch Source plugin metadata feature. #1612
This change extends the functionality of the GCS Batch Source plugin. Two new plugin parameters were added.
If a customer sets values for these parameters, the plugin output schema is extended with additional fields of the corresponding names. In general, the plugin handles these parameters the same way as the "Path Field" parameter. Additionally, the "GET SCHEMA" functionality was fixed for the case when "Path Field", "Length Field", or "Modification Time Field" is set.
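To make the intended behavior easier to review, here is a minimal, self-contained sketch of how the output schema could be extended. It is not the plugin code: the class `MetadataSchemaSketch`, the methods `extendSchema` and `addIfSet`, the map-based schema representation, and the field types (`string` for the path, `long` for length and modification time) are all assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; it does not use the CDAP Schema API.
public class MetadataSchemaSketch {

  /**
   * Returns a copy of the base schema (field name -> type name) extended with
   * one field per configured metadata parameter. A null or empty parameter
   * value means "not configured" and adds nothing.
   */
  static Map<String, String> extendSchema(Map<String, String> baseSchema,
                                          String pathField,
                                          String lengthField,
                                          String modificationTimeField) {
    Map<String, String> result = new LinkedHashMap<>(baseSchema);
    addIfSet(result, pathField, "string");           // assumed type
    addIfSet(result, lengthField, "long");           // assumed type
    addIfSet(result, modificationTimeField, "long"); // assumed type
    return result;
  }

  // Adds the field only when a name was configured; rejects name clashes.
  private static void addIfSet(Map<String, String> schema, String name, String type) {
    if (name == null || name.isEmpty()) {
      return;
    }
    if (schema.containsKey(name)) {
      throw new IllegalArgumentException("Field '" + name + "' already exists in the schema");
    }
    schema.put(name, type);
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("body", "string");
    // Only the length and modification time parameters are set here.
    System.out.println(extendSchema(base, null, "fileLength", "fileModTime"));
  }
}
```

In this sketch an unset parameter leaves the schema unchanged, which mirrors how the description says the new parameters behave like "Path Field". Rejecting a name that clashes with an existing field is a design choice of the sketch, not something confirmed by this PR.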
This change is supplied in two PRs, the second is .
There are open questions regarding the proposed change:
Should they be fixed before the PR, or does it make no difference?
The File Batch Source plugin had a "GET SCHEMA" button for all formats in version 2.7.1, but the button is now available for the "delimited" format only. Is this regression in functionality intentional? (I ran into this behavior during testing.)
I noticed that the "length" and "modification time" field functionality was in fact implemented for all "AbstractFileSource" batch source plugins. Does it make sense to add these fields to the File Batch Source plugin as well?
Initially, all these changes were intended to create a customized GCS Source plugin that could be added to the current 6.5.1 CDAP cluster. Because some interfaces changed, backward compatibility had to be kept with the unchanged version of File Batch Source and with other possible plugins based on AbstractFileSource. Do we still need this compatibility in the upcoming 6.7.0? (For example, if we want to use CDAP 6.7.0 with an old plugin that is based on AbstractFileSource and uses the old PathTrackingInputFormat.createRecordReader() without length and modification time support.)