Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FILLNULL command in PPL (#3032) #3075

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

normanj-bitquill
Copy link
Contributor

@normanj-bitquill normanj-bitquill commented Oct 16, 2024

Description

Adds the FILLNULL command for PPL. FILLNULL will replace NULL values in specified fields.

Related Issues

Resolves #3032
Based on this PR for Spark: opensearch-project/opensearch-spark#723

Check List

  • [Y] New functionality includes testing.
  • [Y] New functionality has been documented.
  • [Y] New functionality has javadoc added.
  • [Y] New functionality has a user manual doc added.
  • [N/A] API changes companion pull request created.
  • [Y] Commits are signed per the DCO using --signoff.
  • [?] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

An example query using fillnull.

    os> source=accounts | fields email, employer | fillnull with '<not found>' in email ;
    fetched rows / total rows = 4/4
    +-----------------------+----------+
    | email                 | employer |
    |-----------------------+----------|
    | [email protected]  | Pyrami   |
    | [email protected] | Netagy   |
    | <not found>           | Quility  |
    | [email protected]   | null     |
    +-----------------------+----------+

Copy link
Collaborator

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Could you add some integration tests? They live here.

YANG-DB
YANG-DB previously approved these changes Oct 18, 2024
Copy link

@jduo jduo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add an ExplainTest to validate that you get an Eval physical plan when using FILLNULL.

@normanj-bitquill
Copy link
Contributor Author

@MaxKsyunz I have added some integration tests.

@normanj-bitquill
Copy link
Contributor Author

@jduo I have added an explain test.

Comment on lines 54 to 63
os> source=accounts | fields email, host | fillnull using email = '<not found>', host = '<no host>' ;
fetched rows / total rows = 4/4
+-----------------------+------------+
| email | host |
|-----------------------+------------|
| [email protected] | pyrami.com |
| [email protected] | netagy.com |
| <not found> | |
| [email protected] | boink.com |
+-----------------------+------------+

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing <no host> from the right column, right? Same for the table above, it looks like one <not found> is missing?

Suggested change
os> source=accounts | fields email, host | fillnull using email = '<not found>', host = '<no host>' ;
fetched rows / total rows = 4/4
+-----------------------+------------+
| email | host |
|-----------------------+------------|
| [email protected] | pyrami.com |
| [email protected] | netagy.com |
| <not found> | |
| [email protected] | boink.com |
+-----------------------+------------+
os> source=accounts | fields email, host | fillnull using email = '<not found>', host = '<no host>' ;
fetched rows / total rows = 4/4
+-----------------------+------------+
| email | host |
|-----------------------+------------|
| [email protected] | pyrami.com |
| [email protected] | netagy.com |
| <not found> | <no host> |
| [email protected] | boink.com |
+-----------------------+------------+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Example 1, it looks like <not found> is still missing from the host column..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test has been changed to use employer rather than host. null values are replaced with <no employer>.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks.

@@ -558,6 +559,29 @@ public LogicalPlan visitAD(AD node, AnalysisContext context) {
return new LogicalAD(child, options);
}

/** Build {@link LogicalAD} for fillnull command. */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should link to LogicalEval instead of LogicalAD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -0,0 +1,67 @@
=============
fillnull
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update category.json in doctest to get this to run with the documentation test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

JSONObject result =
executeQuery(
String.format(
"source=%s | fields str2, num0 | fillnull with -1 in num0", TEST_INDEX_CALCS));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd show unchanged and filled column in the test (and in doctests and in PR description as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an integration test that updates one field, but not the other (testFillNullWithOtherField).

There is a test in the docs as well (the first query in fillnull.rst).

The description for this PR has been updated with a sample query.

}
]
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add eol please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

JSONObject result =
executeQuery(
String.format(
"source=%s | fields str2, num0 | fillnull using num0 = -1", TEST_INDEX_CALCS));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more interesting tests there:

source=%s | fields str2, num0 | fillnull using num0 = num1
source=%s | fields str2, num0 | fillnull using num0 = cos(num1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests like this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to list new file in table of content (see James's PR for example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


@RequiredArgsConstructor
@AllArgsConstructor
public class FillNull extends UnresolvedPlan {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding a doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Javadoc comment.

Signed-off-by: Norman Jordan <[email protected]>
Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanj-bitquill please lets align the docs with the existing ppl fillnull doc in spark


Syntax
============
fillnull "with" <expression> <field> ["," <field> ]...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanj-bitquill please review spark's ppl fillnull to make sure we are aligned with the syntax

@@ -35,6 +35,7 @@ NEW_FIELD: 'NEW_FIELD';
KMEANS: 'KMEANS';
AD: 'AD';
ML: 'ML';
FILLNULL: 'FILLNULL';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add FILLNULL to the keywordsCanBeId here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]Add fillnull command to PPL
6 participants