Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#305 getAncestors Database functionality. #311

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
/*
* Copyright 2021 ABSA Group Limited
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

CREATE OR REPLACE FUNCTION runs.get_ancestors(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would name it to runs.get_partitioning_ancestors, otherwise the name is little ambiguous.

IN i_id_partitioning BIGINT,
IN i_limit INT DEFAULT 5,
IN i_offset BIGINT DEFAULT 0,
OUT status INTEGER,
OUT status_text TEXT,
OUT ancestorid BIGINT,
OUT partitioning JSONB,
OUT author TEXT,
OUT has_more BOOLEAN
) RETURNS SETOF record AS
$$
-------------------------------------------------------------------------------
--
-- Function: runs.get_ancestors(3)
-- Returns Ancestors' partition ID for the given id
ABLL526 marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- Returns Ancestors' partition ID for the given id
-- Returns the ids and partitionings of the ancestors of the given partitioin id

I think this explains the content better.

--
-- Parameters:
-- i_id_partitioning - id that we asking the Ancestors for
-- i_limit - (optional) maximum number of partitionings to return, default is 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important:
Don't we used 10 as the default limit in our functions?

-- i_offset - (optional) offset to use for pagination, default is 0
--
-- Returns:
-- status - Status code
-- status_text - Status message
-- ancestorid - ID of Ancestor partition
-- partitioning - partitioning data of ancestor
-- author - author of the Ancestor partitioning
-- has_more - Flag indicating if there are more partitionings available

-- Status codes:
-- 11 - OK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super important I suppose but still, according to https://github.com/AbsaOSS/fa-db/blob/master/core/src/main/scala/za/co/absa/db/fadb/status/README.md we would maybe want to use status 10 instead of 11.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as mentioned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see 11. 😉

-- 41 - Partitioning not found
-- 42 - Ancestor Partitioning not found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need for this status (and error one furthermore). If no ancestors found, it's OK, simple an empty list (particularly with paging).

--
-------------------------------------------------------------------------------
DECLARE
partitionCreateAt TIMESTAMP;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitioning

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as mentioned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the local variables start with _ by convention - avoids confusion with OUT parameters and column names.

_has_more BOOLEAN;

BEGIN
-- Check if the partitioning exists
PERFORM 1 FROM runs.partitionings WHERE id_partitioning = i_id_partitioning;
IF NOT FOUND THEN
status := 41;
status_text := 'Partitioning not found';
RETURN NEXT;
RETURN;
END IF;

-- Get the creation timestamp of the partitioning
SELECT created_at
FROM runs.partitionings
WHERE id_partitioning = i_id_partitioning
INTO partitionCreateAt;

-- Check if there are more partitionings than the limit
SELECT count(*) > i_limit
FROM flows.partitioning_to_flow PTF
WHERE PTF.fk_flow IN (
SELECT fk_flow
FROM flows.partitioning_to_flow
WHERE fk_partitioning = i_id_partitioning
)
LIMIT i_limit + 1 OFFSET i_offset
INTO _has_more;

-- Return the ancestors
RETURN QUERY
SELECT
11 AS status,
'OK' AS status_text,
P.id_partitioning AS ancestorid,
P.partitioning AS partitioning,
P.created_by AS author,
_has_more AS has_more
FROM
runs.partitionings P
INNER JOIN flows.partitioning_to_flow PF ON PF.fk_partitioning = P.id_partitioning
INNER JOIN flows.partitioning_to_flow PF2 ON PF2.fk_flow = PF.fk_flow
WHERE
PF2.fk_partitioning = i_id_partitioning
AND
P.created_at < partitionCreateAt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this condition?
Actually I think the whole query is incorrect, unfortunately.
It should be

FROM
            flows.partitioning_to_flow PF
                INNER JOIN flows.flows F ON F.id_flow = PF.id_flow
                INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
        WHERE
            PF.fk_partitioning = i_id_partitioning AND
            P.id_partitioning IS DISTINCT FROM i_id_partitioning 

GROUP BY P.id_partitioning
ORDER BY P.id_partitioning, P.created_at DESC
LIMIT i_limit
OFFSET i_offset;

IF FOUND THEN
Copy link
Collaborator

@salamonpavel salamonpavel Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You return status already from the query. And there is no reason to return 42. There are no records returned if ancestors don't exist. Have a look at runs.get_partitioning_checkpoints.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We then simply process the data as

if (results.nonEmpty && results.head.hasMore) ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense although. From runs.get_paritioning_checkpoint_v2 it has a similar logic to this.
What I will do is comment it out for now and determine if it is necessary.

status := 11;
status_text := 'OK';
ELSE
status := 42;
status_text := 'Ancestor Partitioning not found';
END IF;
RETURN NEXT;
RETURN;

END;
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

ALTER FUNCTION runs.get_ancestors(BIGINT, INT, BIGINT) OWNER TO atum_owner;
GRANT EXECUTE ON FUNCTION runs.get_ancestors(BIGINT, INT, BIGINT) TO atum_user;
Loading
Loading