Addition of skew, add_prefix, add_suffix, round, count, std functions #15

nipsn · 2023-12-05T08:52:47Z

Feature

Please insert link to associated issue here: Addition of skew, add_prefix, add_suffix, count, std functions #14

What does this change introduce?

An implementation of the skew function:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.skew.html#pandas.DataFrame.skew
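
For reviewers who want a reference value to compare against, here is a minimal pure-Python sketch of the adjusted Fisher-Pearson skewness, which is the statistic the linked pandas page describes as "unbiased skew". The function name `sample_skew` is hypothetical and not part of this PR. The handling of short or constant inputs is an assumption about the desired behaviour, not a statement of what the q implementation does.

```python
import math


def sample_skew(values):
    """Adjusted Fisher-Pearson skewness of a sequence of numbers."""
    n = len(values)
    if n < 3:
        # Assumption: the statistic is undefined below three observations.
        return math.nan
    mean = sum(values) / n
    # Second and third central moments (biased, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        # Assumption: constant data is reported as zero skew.
        return 0.0
    # Bias correction factor applied to the moment ratio m3 / m2^(3/2).
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
```

A symmetric sample such as `[1, 2, 3]` gives zero, while a sample with a long right tail such as `[1, 2, 3, 4, 10]` gives a positive value.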

An implementation of the add_prefix and add_suffix functions:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_prefix.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html

Observations:

In Pandas the index attribute does not change the result (only the columns are changed, not the indexes). In this implementation, prefixes or suffixes can be added to indexes in addition to columns.

An implementation of the round function:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html#pandas.DataFrame.round

Observations:

Accepting a list as a way of setting the decimals is not yet implemented.

An implementation of the count function:

Observations:

In accordance with the correspondence between q nulls and Python nulls established in the .pd conversion, the empty string is regarded as a null value for columns of type symbol.
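
The null rule above can be sketched in plain Python as follows. This is only an illustration of the counting semantics: the table is modelled as a dict of lists, Python strings stand in for q symbols, and the names `is_null` and `count_non_null` are hypothetical rather than taken from the PR.

```python
import math


def is_null(value):
    """Return True for values this sketch treats as null."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    # Assumption: strings stand in for symbols, so "" counts as null.
    return value == ""


def count_non_null(table):
    """Count the non-null entries in each column of a dict-of-lists table."""
    return {
        name: sum(not is_null(v) for v in column)
        for name, column in table.items()
    }
```

With a column `["a", "", "c"]`, the count is 2 because the empty entry is skipped in the same way a NaN would be in a float column.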

An implementation of the std function:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.std.html#pandas.DataFrame.std
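
As a reference for checking results, pandas computes the sample standard deviation by default (`ddof=1`), dividing by N - ddof. A minimal pure-Python sketch of that definition follows. The function below is illustrative only and does not reproduce the q code in this PR, and returning NaN when there are too few observations is an assumption.

```python
import math


def std(values, ddof=1):
    """Standard deviation with a configurable delta degrees of freedom."""
    n = len(values)
    if n - ddof <= 0:
        # Assumption: not enough observations yields NaN.
        return math.nan
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))
```

Passing `ddof=0` gives the population standard deviation instead, which is a common source of mismatches when comparing implementations.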

General

Code

Has all temporary code used during development been removed?
Has all commented out (unused) code been removed?
Where reasonable have you ensured there is no duplication of existing code?
If applicable for your use-case have you ensured that the code is performant?

Testing

Have unit tests been created or existing ones updated to test this new functionality?

Documentation

Has documentation been added for all public code?
Has a release note been included for the new feature?
Has any documentation which would benefit from this feature been updated to use the most up to date functionality?
If a new class has been added has a documentation stub .md file associated with it been created?
If any documentation page has been created has it been added to mkdocs.yml
Have you checked your changes with a spell checker? (US English)

nipsn · 2023-12-05T08:57:05Z

@cmccarthy1 @rianoc-kx
Please review our contributions and feel free to give any kind of feedback.
Thanks in advance.

cmccarthy1 · 2023-12-05T09:03:09Z

Thanks @nipsn we'll aim to review before the end of the week and get back to you with any feedback/updates! Thanks for the contribution

neutropolis · 2023-12-06T12:43:37Z

Hi @cmccarthy1, I just came back from leave. Please give me a few days to take a look at these changes before handing them back to you, just to check our progress. Thanks!

cmccarthy1 · 2023-12-07T11:30:47Z

src/pykx/pandas_api/pandas_indexing.py

+            elif axis == 1:
+                t = _pre_suf_fix_index(t, suffix, suf=True)


I don't think we should support the conversions of this type for indexes unless the columns in the index are symbol/string types. Adding a suffix to integer/guid/float etc would have a number of implications on the symbol count within the processes which would have a memory impact. Additionally it will update the type of the columns which is possibly fine but probably worth requiring a user to convert to the symbol types before doing the index in that case

We've decided to remove this logic and just fail with a nyi error for this particular situation.
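
The agreed behaviour can be pictured with a small pure-Python sketch: suffix the labels when they are all strings, and fail otherwise. The function name and the error message are hypothetical, and the real change lives in the q/Python code of `pandas_indexing.py`, which is not reproduced here.

```python
def add_suffix_to_labels(labels, suffix):
    """Append suffix to every label, refusing non-string labels."""
    if not all(isinstance(label, str) for label in labels):
        # Assumption: stands in for the "nyi" error mentioned above.
        raise NotImplementedError("nyi: suffixing non-string index labels")
    return [label + suffix for label in labels]
```

Failing early keeps the caller in control of any conversion to string-like types, which is the concern raised in the review comment.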

cmccarthy1 · 2023-12-07T11:48:14Z

src/pykx/pandas_api/pandas_meta.py

@@ -210,6 +234,27 @@ def abs(self, numeric_only=False):
            tab = _get_numeric_only_subtable(self)
        return q.abs(tab)

+    @api_return
+    def round(self, decimals: Union[int, Dict[str, int]] = 0):


This function needs a couple of changes before it can be merged. At present it only handles float values, but it should probably also handle real values (type -8h).

After further reviewing, we could see that round is tricky, and we've decided to remove it from this contribution.

cmccarthy1 · 2023-12-07T11:51:43Z

src/pykx/pandas_api/pandas_meta.py

+            generate_ops:{[vdic]
+                tuples:{flip(2;count[x])#key[x],value[x]}[vdic];
+                key[vdic]!({(({"F"$.Q.f[y]x}[;x[1]])';x[0])}')tuples};
+            get_float_cols:{(key[ct]@where 9=value[ct:abs type each first x])};


Using first to determine the type of a column isn't foolproof, as you may be dealing with a column that is a mixed list, for example:

([]10?1f;10?1f;1f,9?`a`b`c)

This would determine column x2 to be a float column rather than a mixed list of type 0h

A more general way of doing this would be the following and it allows for extension to other types (reals in this case)

get_cols:{metaTab:0!meta x;metaTab[`c]where metaTab[`t]in y}

This can then be called passing in "fe" along with the table to retrieve the float and real columns of the table

get_cols[table;"fe"]
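
The same pitfall can be shown by analogy in Python, as a rough sketch rather than a translation of the q code. Both helper names are hypothetical. Inspecting only the first element reports a uniform type even when later elements differ, whereas inspecting every element detects the mixed column.

```python
def type_from_first(column):
    """Guess a column's type from its first element only (fragile)."""
    return type(column[0]).__name__


def type_from_all(column):
    """Report a single type name only when every element shares it."""
    kinds = {type(value).__name__ for value in column}
    return kinds.pop() if len(kinds) == 1 else "mixed"
```

For a column such as `[1.0, "a", "b"]`, the first helper reports a float column while the second reports a mixed one, which mirrors the difference between checking `first` and consulting `meta`.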

Please, refer to previous comment.

cmccarthy1 · 2023-12-07T11:54:52Z

No issues @neutropolis I've added a couple of small comments to ponder but will hold off digging deeper until after you've reviewed further

neutropolis · 2023-12-19T10:30:24Z

Hi @cmccarthy1, sorry for the delay, it took us longer than expected to review the code. In essence, we've decided to remove round from this contribution and postpone it, since we've started to work on other functions from the pandas API that we hope to share with you in the near future. Beyond that, all methods and tests have been polished and we think the new version is simpler and more elegant. Unfortunately, there was some kind of problem while cherry-picking commits, which made it very hard to review the changes in the test file. Therefore, we are about to create a new PR (#16) and close the present one.

marcosvm13 and others added 5 commits December 4, 2023 17:28

add_prefix function and add_sufix function implementations

8fa87c0

add count implementation

9f1959f

add round implementation

1e1d4de

add std implementation

42470c8

add skew implementation

f51d92d

github-actions bot added documentation Improvements or additions to documentation python tests labels Dec 5, 2023

cmccarthy1 assigned cmccarthy1 and nipsn and unassigned cmccarthy1 Dec 5, 2023

cmccarthy1 requested review from cmccarthy1, rianoc-kx and kshepherdkx December 5, 2023 08:56

cmccarthy1 reviewed Dec 7, 2023


nipsn closed this Dec 19, 2023
