fix(server/pypika): fix aliases generated for split step [TCTC-10029] [TCTC-10030] #2339

Merged 4 commits on Feb 4, 2025
5 changes: 4 additions & 1 deletion server/CHANGELOG.md
@@ -1,8 +1,11 @@
# Changelog (weaverbird python package)


## Unreleased

### Fixed

- Pypika: The split step can now be followed by other steps for Google Big Query

## [0.50.0] - 2025-02-03

### Changed
@@ -142,6 +142,8 @@ def split(

safe_offset = CustomFunction("SAFE_OFFSET", ["index"])

+ splitted_cols_aliases = []
+
# Sub-optimal, could do that in two sub_queries, one for splitting to a temp array col, and
# another one to select everything needed from the array col rather than splitting N times
def gen_splitted_cols():
@@ -155,13 +157,13 @@ def gen_splitted_cols():
#
# The IfNull is required because other backends use SPLIT_PART, which will return an
# empty string rather than NULL
- yield functions.IfNull(LiteralValue(f"{split_str}[{safe_offset_str}]"), "").as_(
- f"{step.column}_{i + 1}"
- )
+ splitted_col_alias = f"{step.column}_{i + 1}"
+ splitted_cols_aliases.append(splitted_col_alias)
+ yield functions.IfNull(LiteralValue(f"{split_str}[{safe_offset_str}]"), "").as_(splitted_col_alias)

splitted_cols = list(gen_splitted_cols())
query: QueryBuilder = prev_step_table.select(*columns, *splitted_cols)
- return StepContext(query, columns + splitted_cols)
+ return StepContext(query, columns + splitted_cols_aliases)

@classmethod
def _date_trunc(cls, date_part: str, target_column: Field) -> Term:
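
The change above swaps expression objects for alias strings in the returned context. Below is a minimal, pypika-free sketch of why that matters, assuming follow-up steps build their SELECT list from the column names stored in the context. `AliasedTerm` and `render_select` are hypothetical stand-ins (not weaverbird or pypika APIs), the SQL strings are only illustrative, and the `TypeError` is a device for the sketch: the diff does not show how the original failure surfaced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliasedTerm:
    """Hypothetical stand-in for an aliased SQL expression (not a pypika class)."""

    sql: str
    alias: str


def render_select(columns: list, table: str) -> str:
    """Build the SELECT of a follow-up step; assumes every entry is a plain column name."""
    for col in columns:
        if not isinstance(col, str):
            raise TypeError(f"expected a column name, got {type(col).__name__}")
    quoted = ", ".join(f"`{col}`" for col in columns)
    return f"SELECT {quoted} FROM `{table}`"


base_columns = ["name", "brewing_date"]
split_terms = [
    AliasedTerm("IFNULL(SPLIT(`brewing_date`, '-')[SAFE_OFFSET(0)], '')", "brewing_date_1"),
    AliasedTerm("IFNULL(SPLIT(`brewing_date`, '-')[SAFE_OFFSET(1)], '')", "brewing_date_2"),
]

# Before the fix: the context mixed column names with expression objects.
context_before = base_columns + split_terms
# After the fix: only the aliases are kept, so later steps see ordinary names.
context_after = base_columns + [term.alias for term in split_terms]

next_step_sql = render_select(context_after, "previous_step")
```

With `context_after`, the follow-up query can reference `brewing_date_1` and `brewing_date_2` like any other column; passing `context_before` to the same helper raises in this sketch.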
@@ -0,0 +1,53 @@
exclude:
- mongo
- pandas
- snowflake
step:
pipeline:
- name: convert
columns:
- brewing_date
dataType: text
- name: split
column: brewing_date
delimiter: '-'
numberColsToKeep: 2
- name: rename
toRename:
- - brewing_date_1
- renamed_column_1
- - brewing_date_2
- renamed_column_2
- name: select
columns:
- renamed_column_1
- renamed_column_2
expected:
schema:
pandas_version: 1.5.0
fields:
- name: renamed_column_1
type: string
- name: renamed_column_2
type: string
data:
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
- renamed_column_1: '2022'
renamed_column_2: '01'
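
For readers unfamiliar with the fixture format, here is a plain-Python sketch of what the split, rename and select steps of this pipeline are expected to produce for one row. The helper functions are hypothetical and are not weaverbird's implementation; the input value `2022-01-05` is an assumption (the fixture only lists the expected outputs), as is the ISO text form produced by the convert step.

```python
def split_step(row: dict, column: str, delimiter: str, number_cols_to_keep: int) -> dict:
    """Add `<column>_1` .. `<column>_N`; missing parts become "" (mirrors the IfNull in the diff)."""
    parts = row[column].split(delimiter)
    out = dict(row)
    for i in range(number_cols_to_keep):
        out[f"{column}_{i + 1}"] = parts[i] if i < len(parts) else ""
    return out


def rename_step(row: dict, to_rename: list) -> dict:
    """Rename keys according to (old, new) pairs, leaving other keys untouched."""
    mapping = dict(to_rename)
    return {mapping.get(key, key): value for key, value in row.items()}


def select_step(row: dict, columns: list) -> dict:
    """Keep only the requested columns, in the requested order."""
    return {col: row[col] for col in columns}


# Hypothetical input row, already converted to text.
row = {"brewing_date": "2022-01-05"}
row = split_step(row, "brewing_date", "-", 2)
row = rename_step(
    row,
    [("brewing_date_1", "renamed_column_1"), ("brewing_date_2", "renamed_column_2")],
)
result = select_step(row, ["renamed_column_1", "renamed_column_2"])
```

The rename step only works if the step after split can see `brewing_date_1` and `brewing_date_2` as ordinary column names, which is what this fixture exercises.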
@@ -0,0 +1,55 @@
exclude:
- mongo
- pandas
- snowflake
step:
pipeline:
- name: convert
columns:
- brewing_date
dataType: text
- name: split
column: brewing_date
delimiter: '-'
numberColsToKeep: 2
- name: replace
searchColumn: brewing_date_2
toReplace:
- - '01'
- 'HELLO'
- name: text
newColumn: 'bye'
text: 'bye'
- name: select
columns:
- brewing_date_2
- bye
expected:
schema:
pandas_version: 1.5.0
fields:
- name: brewing_date_2
type: string
- name: bye
type: string
data:
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
- brewing_date_2: 'HELLO'
bye: 'bye'
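
A similar plain-Python sketch for the replace and text steps of this second fixture, applied to a row that has already been split. The helpers are hypothetical, whole-value matching in `replace_step` is an assumption about the step's semantics, and the input row is made up to match the expected output.

```python
def replace_step(row: dict, search_column: str, to_replace: list) -> dict:
    """Replace the value of `search_column` when it equals one of the (old, new) pairs."""
    mapping = dict(to_replace)
    out = dict(row)
    out[search_column] = mapping.get(row[search_column], row[search_column])
    return out


def text_step(row: dict, new_column: str, text: str) -> dict:
    """Add a constant text column."""
    return {**row, new_column: text}


# Hypothetical row as it would look right after the split step.
row = {"brewing_date": "2022-01-05", "brewing_date_1": "2022", "brewing_date_2": "01"}
row = replace_step(row, "brewing_date_2", [("01", "HELLO")])
row = text_step(row, "bye", "bye")
result = {col: row[col] for col in ["brewing_date_2", "bye"]}
```

Here the replace step targets a column that only exists because of the split, so it depends on the split step exposing its aliases as names.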
@@ -1,26 +1,3 @@
"""
BigQuery free DBs have tables that expire after 60 days.
If the table "beers.beers_tiny" is expired, re-create it:
- open the BigQuery console https://console.cloud.google.com/bigquery?project=biquery-integration-tests&ws=!1m4!1m3!3m2!1sbiquery-integration-tests!2sbeers
- use "create table", choose "Upload" and use the `beers-bigquery.csv` file available [here](https://github.com/ToucanToco/weaverbird/pull/1835#issuecomment-1647810149)
- name the table "beers" and check "Edit text" for the schema
- fill the schema with:
```
price_per_l:FLOAT,
alcohol_degree:FLOAT,
name:STRING,
cost:FLOAT,
beer_kind:STRING,
volume_ml:FLOAT,
brewing_date:DATE,
nullable_name:STRING
```
- run the query:
```
CREATE TABLE `beers.beers_tiny` AS SELECT * FROM `beers.beers` ORDER BY brewing_date LIMIT 10
```
"""

import json
from io import StringIO
from os import environ
2 changes: 1 addition & 1 deletion server/uv.lock