feat(targets): Support loading records into a single column/field in the target #1883

Closed
2 tasks done
edgarrmondragon opened this issue Jul 28, 2023 · 3 comments
Labels
kind/Feature New feature or request valuestream/SDK

Comments

@edgarrmondragon
Collaborator

edgarrmondragon commented Jul 28, 2023

Feature scope

Targets (data type handling, batching, SQL object generation, tests, etc.)

Description

Some targets in the wild already support a version of this. In particular, this is what target-bigquery has to say:

It is the most versatile target for BigQuery. Extremely performant, resource efficient, and fast in all configurations enabling 20 different ingestion patterns. Denormalized variants indicate data is unpacked during load with a resultant schema in BigQuery based on the tap schema. Non-denormalized means we have a fixed schema which loads all data into an unstructured JSON column. They are both useful patterns. The latter allowing BigQuery to work with schemaless or rapidly changing sources such as MongoDB instantly, while the former is more performant and convenient to start modeling quickly.

Current support

At the time of writing, stream maps almost support this. Consider the countries sample in the SDK repo and the following stream maps configuration:

{
    "stream_maps": {
        "countries": {
            "_data": "record",
            "__key_properties__": [],
            "__else__": null
        }
    },
    "stream_map_config": {}
}

Then run

poetry run python samples/sample_tap_countries --config stream_maps.json > mapped_countries.singer.jsonl

The output file contains the following messages:

  • SCHEMA message

    {
      "type": "SCHEMA",
      "stream": "countries",
      "schema": {
        "type": "object",
        "properties": {
          "_data": {
            "type": [
              "string",
              "null"
            ]
          }
        }
      },
      "key_properties": [],
      "bookmark_properties": []
    }
  • RECORD message

    {
      "type": "RECORD",
      "stream": "countries",
      "record": {
        "_data": {
          "code": "AD",
          "name": "Andorra",
          "native": "Andorra",
          "phone": "376",
          "continent": {
            "code": "EU",
            "name": "Europe"
          },
          "capital": "Andorra la Vella",
          "currency": "EUR",
          "languages": [
            {
              "code": "ca",
              "name": "Catalan"
            }
          ],
          "emoji": "\ud83c\udde6\ud83c\udde9"
        }
      },
      "time_extracted": "2023-07-28T20:03:49.448913+00:00"
    }
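One thing the two messages above make visible: the SCHEMA declares `_data` as a nullable string, while the RECORD carries a nested object in that field. The following is a minimal sketch of that mismatch using only the standard library. The helper `json_type_name` is a hypothetical name introduced for illustration and is not part of the SDK; the record is abbreviated from the sample above.

```python
import json

# Declared type for `_data`, copied from the SCHEMA message above.
declared_types = ["string", "null"]

# Abbreviated `_data` value from the RECORD message above.
data_value = {
    "code": "AD",
    "name": "Andorra",
    "continent": {"code": "EU", "name": "Europe"},
}


def json_type_name(value):
    """Return the JSON Schema type name for a decoded JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "integer"
    return "number"


# The raw value is an object, so it does not match the declared types.
raw_matches = json_type_name(data_value) in declared_types

# Serializing the value to a JSON string would make it match the declared types.
serialized = json.dumps(data_value)
serialized_matches = json_type_name(serialized) in declared_types

print(raw_matches, serialized_matches)
```

Whether the fix belongs in the schema (declare an object) or in the record (emit a JSON string) is a design decision this sketch does not settle.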

Needed improvements

Feature request

With this stream maps example as a reference, it's a bit unclear what we need to do (in addition to addressing the improvements above). I can think of two options, which are not mutually exclusive:

  • Better stream maps documentation
  • A target-level configuration that bypasses stream maps
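As a rough illustration of the second option, a target could wrap each incoming record into a single field before loading it. This is only a sketch under assumptions: the function names `wrap_record` and `wrapped_schema`, the `column` and `as_string` parameters, and the default column name `_data` (borrowed from the stream maps example above) are all hypothetical and do not correspond to an existing SDK setting.

```python
import json
from typing import Any, Dict


def wrap_record(
    record: Dict[str, Any],
    column: str = "_data",  # hypothetical default, taken from the example above
    as_string: bool = True,
) -> Dict[str, Any]:
    """Return a new record whose only field holds the whole original record."""
    payload = json.dumps(record, sort_keys=True) if as_string else record
    return {column: payload}


def wrapped_schema(column: str = "_data", as_string: bool = True) -> Dict[str, Any]:
    """Return a fixed JSON Schema describing the single-column layout."""
    column_type = ["string", "null"] if as_string else ["object", "null"]
    return {"type": "object", "properties": {column: {"type": column_type}}}


if __name__ == "__main__":
    original = {"code": "AD", "name": "Andorra", "languages": [{"code": "ca"}]}
    print(wrap_record(original))
    print(wrapped_schema())
```

Keeping the schema fixed regardless of the source stream is what lets a target accept schemaless or rapidly changing sources, at the cost of pushing the unpacking to query time.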

Tasks

  1. kind/Bug valuestream/SDK
  2. Accepting Pull Requests kind/Feature
@edgarrmondragon
Collaborator Author

cc @tayloramurphy

@pnadolny13
Contributor

Related to #699, #1350, #1883 and #1888. This approach would probably require a way to use wildcard selection to say "all streams should be mapped in this way".
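To make the wildcard idea concrete, here is a minimal sketch of how a stream name could be resolved against glob-style keys in a `stream_maps` dictionary. Treating keys such as `*` as patterns is an assumption made for illustration, and `resolve_stream_map` is a hypothetical helper, not an SDK function. Exact stream names take precedence over patterns in this sketch.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional


def resolve_stream_map(
    stream_name: str,
    stream_maps: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the mapping for a stream: exact key first, then the first matching glob."""
    if stream_name in stream_maps:
        return stream_maps[stream_name]
    for pattern, mapping in stream_maps.items():
        if fnmatchcase(stream_name, pattern):
            return mapping
    return None  # no mapping applies; the stream passes through unchanged


if __name__ == "__main__":
    # Hypothetical configuration: one mapping applied to every stream.
    stream_maps = {
        "*": {"_data": "record", "__key_properties__": [], "__else__": None},
    }
    print(resolve_stream_map("countries", stream_maps))
    print(resolve_stream_map("continents", stream_maps))
```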

@pnadolny13 pnadolny13 moved this from Discussed to To Discuss in Office Hours Jan 31, 2024
@edgarrmondragon edgarrmondragon moved this from To Discuss to Up Next in Office Hours Jan 31, 2024
@edgarrmondragon edgarrmondragon moved this from Up Next to Discussed in Office Hours Feb 14, 2024
@edgarrmondragon
Collaborator Author

This is supported! See MeltanoLabs/meltano-map-transform#255
