feat(targets): Support loading records into a single column/field in the target #1883

edgarrmondragon · 2023-07-28T20:36:11Z

Feature scope

Targets (data type handling, batching, SQL object generation, tests, etc.)

Description

Some targets in the wild already support a version of this. In particular, this is what target-bigquery has to say:

It is the most versatile target for BigQuery. Extremely performant, resource efficient, and fast in all configurations enabling 20 different ingestion patterns. Denormalized variants indicate data is unpacked during load with a resultant schema in BigQuery based on the tap schema. Non-denormalized means we have a fixed schema which loads all data into an unstructured JSON column. They are both useful patterns. The latter allowing BigQuery to work with schemaless or rapidly changing sources such as MongoDB instantly, while the former is more performant and convenient to start modeling quickly.

Current support

At the time of writing, this is almost supported by stream maps. Consider the countries sample in the SDK repo and the following stream maps configuration:

{
    "stream_maps": {
        "countries": {
            "_data": "record",
            "__key_properties__": [],
            "__else__": null
        }
    },
    "stream_map_config": {}
}
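Roughly, the configuration above adds a `_data` property whose value is the expression `record` (the whole incoming record), drops every other property (`"__else__": null`), and clears the key properties (`"__key_properties__": []`). Below is a minimal standalone emulation of that effect on a single record, for illustration only. `apply_stream_map` is a hypothetical helper, not part of the SDK. It supports only the pieces used here, whereas the SDK's real mapper evaluates arbitrary expressions.

```python
import copy
from typing import Any


def apply_stream_map(record: dict[str, Any], stream_map: dict[str, Any]) -> dict[str, Any]:
    """Apply a (very) reduced stream map to one record.

    Hypothetical mini-emulator: only the bare `record` expression,
    null (remove property) and "__else__": null are supported.
    """
    mapped: dict[str, Any] = {}
    for prop, expr in stream_map.items():
        if prop.startswith("__"):
            continue  # special keys such as __else__ are handled below
        if expr is None:
            continue  # a null expression removes the property
        if expr == "record":
            # The name `record` refers to the whole incoming record.
            mapped[prop] = copy.deepcopy(record)
        else:
            raise NotImplementedError(f"expression not supported here: {expr!r}")
    # Unless "__else__" is null, unmapped properties pass through unchanged.
    if not ("__else__" in stream_map and stream_map["__else__"] is None):
        for prop, value in record.items():
            if prop not in stream_map:
                mapped[prop] = value
    return mapped


countries_map = {"_data": "record", "__key_properties__": [], "__else__": None}
print(apply_stream_map({"code": "AD", "name": "Andorra"}, countries_map))
# {'_data': {'code': 'AD', 'name': 'Andorra'}}
```

The key point is that the original record ends up nested, unchanged, under a single property.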

Save this configuration as stream_maps.json, then run

poetry run python samples/sample_tap_countries --config stream_maps.json > mapped_countries.singer.jsonl

The resulting file contains messages like the following:

SCHEMA message

{
  "type": "SCHEMA",
  "stream": "countries",
  "schema": {
    "type": "object",
    "properties": {
      "_data": {
        "type": [
          "string",
          "null"
        ]
      }
    }
  },
  "key_properties": [],
  "bookmark_properties": []
}

RECORD message

{
  "type": "RECORD",
  "stream": "countries",
  "record": {
    "_data": {
      "code": "AD",
      "name": "Andorra",
      "native": "Andorra",
      "phone": "376",
      "continent": {
        "code": "EU",
        "name": "Europe"
      },
      "capital": "Andorra la Vella",
      "currency": "EUR",
      "languages": [
        {
          "code": "ca",
          "name": "Catalan"
        }
      ],
      "emoji": "\ud83c\udde6\ud83c\udde9"
    }
  },
  "time_extracted": "2023-07-28T20:03:49.448913+00:00"
}

Needed improvements

Note that in the SCHEMA message above, the _data field is typed as a (nullable) string, even though the RECORD message carries a JSON object in it. This is because record is not recognized as a special name, so the type inference logic falls back to the default string type:

sdk/singer_sdk/mapper.py, lines 370 to 384 in 759c77b:

default = default or StringType()
if expr.startswith("float("):
    return NumberType()
if expr.startswith("int("):
    return IntegerType()
if expr.startswith("str("):
    return StringType()
if expr[0] == "'" and expr[-1] == "'":
    return StringType()
return default
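Here is a standalone sketch of how that fallback could special-case `record`. It uses plain JSON Schema dicts instead of the SDK's `StringType`/`NumberType` classes so it runs without the SDK installed. `infer_expression_type` is a hypothetical name, the nullable types are an assumption, and this is not the actual change made for #1882. It also guards against empty expressions, which the excerpt above would index into.

```python
from typing import Any, Optional

_NULLABLE_STRING = {"type": ["string", "null"]}


def infer_expression_type(expr: str, default: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Guess a JSON Schema type for a stream map expression (illustrative only)."""
    default = default or dict(_NULLABLE_STRING)
    expr = expr.strip()
    if expr == "record":
        # The bare name `record` is the whole incoming record: an object, not a string.
        return {"type": ["object", "null"]}
    if expr.startswith("float("):
        return {"type": ["number", "null"]}
    if expr.startswith("int("):
        return {"type": ["integer", "null"]}
    if expr.startswith("str("):
        return dict(_NULLABLE_STRING)
    # Check the length first so empty or one-character expressions don't fail.
    if len(expr) >= 2 and expr[0] == "'" and expr[-1] == "'":
        return dict(_NULLABLE_STRING)
    return default
```

With this change, the countries configuration would produce `{"type": ["object", "null"]}` for `_data` instead of a string type.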

This is relatively easy to address; it is tracked in bug: A field referencing record in streams maps gets incorrectly mapped to a string type in the schema #1882

There isn't a way to apply the mapping to all streams at once. This would be helped by Generic StreamMap for all streams #699
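Applying one mapping to every stream mostly comes down to matching stream names against patterns. Below is a minimal sketch using the standard library's `fnmatch`. The glob-style keys echo the idea behind #1888 (glob patterns in source stream names), but `resolve_stream_map` and its precedence rule are assumptions for illustration, not the SDK's behavior. In this sketch, an exact stream name wins over a pattern, and otherwise the first matching pattern in config order applies.

```python
from fnmatch import fnmatchcase
from typing import Any, Optional


def resolve_stream_map(stream_name: str, stream_maps: dict[str, Any]) -> Optional[Any]:
    """Pick the map config that applies to `stream_name`, or None if nothing matches.

    Assumed precedence: an exact key wins; otherwise the first glob pattern
    (in config order) that matches the stream name.
    """
    if stream_name in stream_maps:
        return stream_maps[stream_name]
    for pattern, config in stream_maps.items():
        # Case-sensitive glob match, e.g. "*" or "countr*".
        if fnmatchcase(stream_name, pattern):
            return config
    return None


stream_maps = {
    "countries": {"_data": "record"},
    "*": {"_data": "record", "__key_properties__": [], "__else__": None},
}
print(resolve_stream_map("continents", stream_maps))  # falls back to the "*" entry
print(resolve_stream_map("countries", stream_maps))   # exact name wins
```

Making exact names win means a catch-all `*` entry can coexist with per-stream overrides.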

Feature request

Using this stream maps example as a reference, it's not entirely clear what else we need to do (beyond addressing the improvements above). I can think of two options, which are not mutually exclusive:

Better stream maps documentation
A target-level configuration that bypasses stream maps
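To make the second option more concrete, here is a minimal sketch of what a target-side setting could do to a stream's schema and records without any stream map. The setting names `load_as_single_column` and `single_column_name`, and the helper `to_single_column`, are hypothetical. They are not existing SDK or target options. Whether a target then serializes the column to a JSON string or passes the dict to a native JSON/VARIANT type is left to each target.

```python
import copy
from typing import Any


def to_single_column(
    schema: dict[str, Any], record: dict[str, Any], config: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Rewrite a stream's schema and one record according to a hypothetical setting."""
    if not config.get("load_as_single_column", False):
        return schema, record  # setting disabled: load columns as usual
    column = config.get("single_column_name", "_data")
    new_schema = {
        "type": "object",
        # Declare the column as an object (not a string) so the target can
        # map it to a native JSON/VARIANT type.
        "properties": {column: {"type": ["object", "null"]}},
    }
    new_record = {column: copy.deepcopy(record)}
    return new_schema, new_record


schema = {"type": "object", "properties": {"code": {"type": "string"}}}
new_schema, new_record = to_single_column(schema, {"code": "AD"}, {"load_as_single_column": True})
print(new_record)  # {'_data': {'code': 'AD'}}
```

Unlike the stream maps route, this keeps the behavior in the target's own config, so it would apply to every stream without per-stream mapping entries.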

Tasks

bug: A field referencing record in streams maps gets incorrectly mapped to a string type in the schema #1882

Generic StreamMap for all streams #699

edgarrmondragon · 2023-07-28T20:36:20Z

cc @tayloramurphy

pnadolny13 · 2023-08-03T15:04:16Z

Related to #699, #1350, #1883, and #1888. This approach would probably require a way to use wildcard selection to say "all streams should be mapped in this way".

edgarrmondragon · 2024-05-10T00:14:29Z

This is supported! See MeltanoLabs/meltano-map-transform#255

edgarrmondragon added kind/Feature and valuestream/SDK labels Jul 28, 2023

edgarrmondragon mentioned this issue Jul 28, 2023 in feat: Support writing full record to a single variant column MeltanoLabs/target-snowflake#100 (open)

edgarrmondragon added this to Office Hours Jul 31, 2023

github-project-automation bot moved this to To Discuss in Office Hours Jul 31, 2023

edgarrmondragon moved this from To Discuss to Up Next in Office Hours Aug 2, 2023

edgarrmondragon moved this from Up Next to Discussed in Office Hours Aug 2, 2023

pnadolny13 mentioned this issue Aug 3, 2023 in feat(mappers): Added support for glob patterns in source stream names #1888 (merged)

pnadolny13 moved this from Discussed to To Discuss in Office Hours Jan 31, 2024

edgarrmondragon moved this from To Discuss to Up Next in Office Hours Jan 31, 2024

edgarrmondragon moved this from Up Next to Discussed in Office Hours Feb 14, 2024

edgarrmondragon closed this as completed May 10, 2024
