Feature scope
Targets (data type handling, batching, SQL object generation, tests, etc.)
Description
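Some targets in the wild already support a version of this, that is, loading every record into a single JSON column instead of unpacking it into typed columns. In particular, this is what target-bigquery has to say: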
It is the most versatile target for BigQuery. Extremely performant, resource efficient, and fast in all configurations enabling 20 different ingestion patterns. Denormalized variants indicate data is unpacked during load with a resultant schema in BigQuery based on the tap schema. Non-denormalized means we have a fixed schema which loads all data into an unstructured JSON column. They are both useful patterns. The latter allowing BigQuery to work with schemaless or rapidly changing sources such as MongoDB instantly, while the former is more performant and convenient to start modeling quickly.
With the stream maps example below as reference, it's a bit unclear what we need to do (in addition to addressing the needed improvements listed below). I can think of two options, not mutually exclusive:
Better stream maps documentation
A target-level configuration that bypasses stream maps
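The second option could look roughly like the sketch below: the target itself decides whether to load a record unpacked (one column per property) or packed into a single JSON column, with no stream maps involved. The setting name `pack_records_as_json` and the column name `_data` are hypothetical; they are not existing SDK or target-bigquery settings.

```python
from typing import Any


def prepare_row(record: dict[str, Any], config: dict[str, Any]) -> dict[str, Any]:
    """Shape one Singer RECORD for loading, based on a hypothetical setting."""
    if config.get("pack_records_as_json", False):
        # Fixed schema: the whole record becomes the value of one column,
        # which the target would then serialize into its JSON column type.
        return {"_data": dict(record)}
    # Denormalized: each top-level property keeps its own column.
    return dict(record)


def prepare_schema(schema: dict[str, Any], config: dict[str, Any]) -> dict[str, Any]:
    """Return the JSON Schema the target would use to create its table."""
    if config.get("pack_records_as_json", False):
        # The table layout no longer depends on the tap schema.
        return {"type": "object", "properties": {"_data": {"type": "object"}}}
    return schema


if __name__ == "__main__":
    row = {"code": "AD", "name": "Andorra"}
    print(prepare_row(row, {"pack_records_as_json": True}))
    print(prepare_row(row, {}))
```

Because the packed table schema is fixed, changes in the tap schema (for example from schemaless sources such as MongoDB) would not require table changes, which is the trade-off the target-bigquery description points out.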
Related to #699, #1350, #1883 and #1888. The stream maps approach would probably require a way to use wildcard selection to say "all streams should be mapped in this way".
Current support
This is almost supported by stream maps at the time of writing. Consider the countries sample in the SDK repo and the following stream maps configuration:
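A configuration of this kind can be sketched as follows. This is a hedged reconstruction, not the exact file used here: it assumes the sample's streams are named `continents` and `countries`, that the `record` expression refers to the whole incoming record, and that `__else__: null` drops all properties not listed explicitly. The script name `make_config.py` is arbitrary.

```python
import json

# Hypothetical sketch: assumes the countries sample exposes the streams
# "continents" and "countries"; adjust the names to the actual tap.
STREAM_NAMES = ["continents", "countries"]


def packing_map() -> dict:
    """Stream map that keeps a single `_data` property holding the whole record."""
    return {
        "_data": "record",  # expression evaluated against the incoming record
        "__else__": None,  # serialized as null: drops every unlisted property
    }


def build_config(stream_names: list[str]) -> dict:
    # Without wildcard support, every stream has to be listed by name.
    return {"stream_maps": {name: packing_map() for name in stream_names}}


if __name__ == "__main__":
    # Redirect to a file, e.g. `python make_config.py > stream_maps.json`.
    print(json.dumps(build_config(STREAM_NAMES), indent=2))
```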
Then run
poetry run python samples/sample_tap_countries --config stream_maps.json > mapped_countries.singer.jsonl
which contains the following messages:
SCHEMA message
RECORD message
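The declared type of `_data` can be checked directly in the output file. The sketch below relies only on the Singer message layout (`type`, `stream` and `schema` keys) and reports, per stream, the JSON Schema `type` declared for a given property; the helper name `property_types` and the script name `check_types.py` are illustrative.

```python
import json
import sys
from typing import Iterable


def property_types(lines: Iterable[str], prop: str = "_data") -> dict[str, object]:
    """Map each stream name to the JSON Schema `type` declared for `prop`."""
    types: dict[str, object] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") != "SCHEMA":
            continue  # only SCHEMA messages declare property types
        props = message["schema"].get("properties", {})
        if prop in props:
            # `type` may be a string or a list such as ["string", "null"].
            types[message["stream"]] = props[prop].get("type")
    return types


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_types.py mapped_countries.singer.jsonl
    with open(sys.argv[1], encoding="utf-8") as f:
        print(property_types(f))
```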
Needed improvements
Note that in the above SCHEMA message, the _data field is of type string. This is because record is not recognized as a special name and the type inference logic falls back to the default string type (sdk/singer_sdk/mapper.py, lines 370 to 384 at commit 759c77b).
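The fix amounts to giving `record` a non-string type during inference. The following is a simplified, hypothetical sketch of such inference, not the SDK's mapper.py code: it assumes expressions arrive as plain strings, treats a bare property name as copying that property's schema, maps `record` to an object type, and otherwise keeps the string fallback described above.

```python
from typing import Any

# Fallback used for any expression the sketch cannot type.
DEFAULT_SCHEMA: dict[str, Any] = {"type": "string"}


def infer_expression_schema(expr: str, source_properties: dict[str, Any]) -> dict[str, Any]:
    """Guess a JSON Schema for a stream map expression (simplified sketch)."""
    expr = expr.strip()
    if expr == "record":
        # The whole record is a mapping, so it should be typed as an object.
        return {"type": "object"}
    if expr in source_properties:
        # A bare property reference keeps the source property's schema.
        return dict(source_properties[expr])
    # Anything else falls back to string.
    return dict(DEFAULT_SCHEMA)
```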
This is relatively easy to address: #1882 (bug: A field referencing record in streams maps gets incorrectly mapped to a string type in the schema).
There isn't a way to apply the mapping to all streams. This would be helped by #699 (Generic StreamMap for all streams).
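Applying one mapping to every stream needs some form of pattern matching on stream names. Below is a hypothetical sketch using shell-style patterns through Python's `fnmatch` module; the `"*"` key and the precedence rule (an exact stream name first, then the first matching pattern in insertion order) are assumptions, not current SDK behavior.

```python
from fnmatch import fnmatchcase
from typing import Any, Optional


def resolve_stream_map(stream_name: str, stream_maps: dict[str, Any]) -> Optional[Any]:
    """Return the map that applies to `stream_name`, or None if nothing matches."""
    # An explicit entry for the stream always wins over a pattern.
    if stream_name in stream_maps:
        return stream_maps[stream_name]
    # Otherwise use the first pattern (in insertion order) that matches.
    for pattern, mapping in stream_maps.items():
        if fnmatchcase(stream_name, pattern):
            return mapping
    return None
```

For example, with `{"countries": {...}, "*": {"_data": "record", "__else__": None}}`, the `countries` stream keeps its own map while every other stream gets the packing map.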
Tasks
bug: A field referencing record in streams maps gets incorrectly mapped to a string type in the schema #1882