Addition of skew, add_prefix, add_suffix, round, count, std functions #15

Closed

Conversation

nipsn
Contributor

@nipsn nipsn commented Dec 5, 2023

Feature

What does this change introduce?

An implementation of the skew function:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.skew.html#pandas.DataFrame.skew
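
For readers unfamiliar with the statistic, pandas documents `DataFrame.skew` as an unbiased skewness estimate. A minimal pure-Python sketch of the adjusted Fisher-Pearson coefficient usually meant by that is shown here. `sample_skew` is a hypothetical helper for illustration only; it is not the q implementation from this PR, and treating constant data as zero skew is an assumption of the sketch.

```python
import math


def sample_skew(values):
    """Adjusted Fisher-Pearson skewness (G1) of a list of numbers."""
    n = len(values)
    if n < 3:
        return math.nan  # the estimator is undefined for fewer than 3 values
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # assumption: constant data is reported as zero skew
    g1 = m3 / m2 ** 1.5  # biased (population) skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample correction


symmetric = sample_skew([1, 2, 3, 4, 5])
right_tailed = sample_skew([1, 1, 1, 10])
```

Symmetric data gives a value of zero, while a long right tail gives a positive value.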

An implementation of the add_prefix and add_suffix functions:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_prefix.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html

Observations:

  • In Pandas the index attribute does not change the result (only the columns are changed, not the indexes). In this implementation, prefixes or suffixes can be added to indexes in addition to columns.
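
The column-only behaviour described in the observation above can be sketched in plain Python. This is only an illustration: the table is modelled as a dict of column name to values, and `add_prefix` / `add_suffix` here are stand-in functions, not the PyKX methods added by this PR.

```python
def add_prefix(table, prefix):
    """Return a new table whose column names start with `prefix`."""
    return {f"{prefix}{name}": values for name, values in table.items()}


def add_suffix(table, suffix):
    """Return a new table whose column names end with `suffix`."""
    return {f"{name}{suffix}": values for name, values in table.items()}


table = {"x": [1, 2], "y": [3.0, 4.0]}
prefixed = add_prefix(table, "col_")
suffixed = add_suffix(table, "_col")
```

Only the column names change; the values in each column are passed through untouched.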

An implementation of the round function:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html#pandas.DataFrame.round

Observations:

  • Accepting a list as a way of setting the decimals is not yet implemented.

An implementation of the count function:

Observations:

  • In accordance with the correspondence between q nulls and Python nulls established in the .pd conversion, the empty string is regarded as a null value for columns of type symbol.
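
A rough Python sketch of the counting rule described above, assuming a table modelled as a dict of columns. `is_null` and `count_non_null` are hypothetical names, and for simplicity the sketch treats the empty string as null in every column rather than only in symbol columns.

```python
import math


def is_null(value):
    """Assumption for this sketch: None, NaN and "" all count as null."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_non_null(table):
    """Number of non-null entries in each column."""
    return {name: sum(not is_null(v) for v in col) for name, col in table.items()}


table = {"sym": ["a", "", "c"], "px": [1.5, float("nan"), 2.0]}
counts = count_non_null(table)
```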

An implementation of the std function:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.std.html#pandas.DataFrame.std
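
As a point of reference, pandas normalizes `std` by N-1 by default (`ddof=1`). A minimal pure-Python sketch of that calculation follows; `sample_std` is a hypothetical helper for illustration, not the q code from this PR.

```python
import math


def sample_std(values, ddof=1):
    """Standard deviation with `ddof` delta degrees of freedom."""
    n = len(values)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]
population = sample_std(data, ddof=0)  # divides by N
sample = sample_std(data)              # divides by N-1
```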

General

  • Has an example been added to demo the new feature?
  • Have existing examples been updated or tested?
  • Have you added any new Environment Variables/Configuration Options? If yes please tick the boxes below as applicable
    • Addition to reimporter logic within src/pykx/pykx.q and src/pykx/reimporter.py
    • Have updated the src/pykx/util.py logic which is used for environment variable
  • If there have been any dependency updates have they been reflected in all files?
    • pyproject.toml
    • docs/getting-started/installing.md
    • conda-recipe/meta.yaml
    • README.md
  • If any examples have been updated has its associated .zip been updated?

Code

  • Has all temporary code used during development been removed?
  • Has all commented out (unused) code been removed?
  • Where reasonable have you ensured there is no duplication of existing code?
  • If applicable for your use-case have you ensured that the code is performant?

Testing

  • Have unit tests been created or existing ones updated to test this new functionality?

Documentation

  • Has documentation been added for all public code?
  • Has a release note been included for the new feature?
  • Has any documentation which would benefit from this feature been updated to use the most up to date functionality?
  • If a new class has been added has a documentation stub .md file associated with it been created?
  • If any documentation page has been created has it been added to mkdocs.yml?
  • Have you checked your changes with a spell checker? (US English)

@github-actions github-actions bot added documentation Improvements or additions to documentation python tests labels Dec 5, 2023
@cmccarthy1 cmccarthy1 assigned cmccarthy1 and nipsn and unassigned cmccarthy1 Dec 5, 2023
@nipsn
Contributor Author

nipsn commented Dec 5, 2023

@cmccarthy1 @rianoc-kx
Please review our contributions and feel free to give any kind of feedback.
Thanks in advance.

@cmccarthy1
Collaborator

Thanks @nipsn, we'll aim to review before the end of the week and get back to you with any feedback/updates! Thanks for the contribution.

@neutropolis
Contributor

Hi @cmccarthy1, I just came back from leave. Please give me a few days to take a look at these changes before handing them back to you, just to check our progress. Thanks!

Comment on lines +488 to +489
elif axis == 1:
    t = _pre_suf_fix_index(t, suffix, suf=True)
Collaborator

I don't think we should support conversions of this type for indexes unless the columns in the index are symbol/string types. Adding a suffix to integer/guid/float etc. would have a number of implications for the symbol count within the process, which would have a memory impact. Additionally, it would change the type of the columns, which is possibly fine, but it is probably worth requiring the user to convert to symbol types before modifying the index in that case.

Contributor

We've decided to remove this logic and just fail with a nyi error for this particular situation.
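
One way to picture the outcome of this thread is a guard that refuses to touch non-string index columns. The sketch below is an assumption-laden illustration: `add_suffix_to_index` and the dict-of-columns layout are hypothetical, it combines the reviewer's string-only suggestion with the nyi failure, and the actual PR may simply reject every index change.

```python
def add_suffix_to_index(index_columns, suffix):
    """Append `suffix` to every value of string-typed index columns.

    `index_columns` maps key column name -> list of values (hypothetical layout).
    """
    for name, values in index_columns.items():
        if not all(isinstance(v, str) for v in values):
            # Mirrors the "nyi" failure mentioned in the reply above
            raise NotImplementedError(
                f"nyi: cannot add a suffix to non-string index column {name!r}"
            )
    return {
        name: [f"{v}{suffix}" for v in values]
        for name, values in index_columns.items()
    }


renamed = add_suffix_to_index({"k": ["a", "b"]}, "_s")
```

Failing early keeps integer, guid, or float keys from being silently converted into new symbols.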

@@ -210,6 +234,27 @@ def abs(self, numeric_only=False):
tab = _get_numeric_only_subtable(self)
return q.abs(tab)

@api_return
def round(self, decimals: Union[int, Dict[str, int]] = 0):
Collaborator

This function needs a couple of changes before it can be merged. At present it only handles float values, but it should probably also handle real values (type -8h).

Contributor

After further review, we found that round is tricky, so we've decided to remove it from this contribution.

generate_ops:{[vdic]
tuples:{flip(2;count[x])#key[x],value[x]}[vdic];
key[vdic]!({(({"F"$.Q.f[y]x}[;x[1]])';x[0])}')tuples};
get_float_cols:{(key[ct]@where 9=value[ct:abs type each first x])};
Collaborator

Using first to determine the type of a column isn't foolproof, as you may be dealing with a column that is a mixed list, for example:

([]10?1f;10?1f;1f,9?`a`b`c)

This would determine column x2 to be a float column rather than a mixed list of type 0h.

A more general way of doing this is the following, which also allows for extension to other types (reals in this case):

get_cols:{metaTab:0!meta x;metaTab[`c]where metaTab[`t]in y}

This can then be called, passing in "fe" along with the table, to retrieve the float and real columns of the table:

get_cols[table;"fe"]
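
The pitfall described in this comment can also be shown with a small Python analogy. This is not q and not part of the PR; the two helper functions are hypothetical and the table is modelled as a dict of columns.

```python
def float_columns_by_first(table):
    """Fragile: classifies a column by looking only at its first value."""
    return [name for name, col in table.items()
            if col and isinstance(col[0], float)]


def float_columns_by_all(table):
    """Safer: a column qualifies only if every value is a float."""
    return [name for name, col in table.items()
            if col and all(isinstance(v, float) for v in col)]


# x2 starts with a float but is really a mixed column
table = {"x": [0.1, 0.2], "x1": [0.3, 0.4], "x2": [1.0, "a", "b"]}
fragile = float_columns_by_first(table)
safe = float_columns_by_all(table)
```

The first helper wrongly reports `x2` as a float column, while the second excludes it, which is the same distinction the `meta`-based `get_cols` draws in q.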

Contributor

Please refer to the previous comment.

@cmccarthy1
Collaborator

No issues @neutropolis. I've added a couple of small comments to ponder, but will hold off digging deeper until after you've reviewed further.

@neutropolis
Contributor

neutropolis commented Dec 19, 2023

Hi @cmccarthy1, sorry for the delay; it took us longer than expected to review the code. In essence, we've decided to remove round from this contribution and postpone it, since we've started to work on other functions from the pandas API that we hope to share with you in the near future. Beyond that, all methods and tests have been polished, and we think the new version is simpler and more elegant. Unfortunately, there was some kind of problem while cherry-picking commits, which made it very hard to review the changes in the test file. Therefore, we are about to create a new PR (#16) and close the present one.

@nipsn nipsn closed this Dec 19, 2023