Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bigquery): Fixes parsing of null type values returned from BQ server #14384

Merged
merged 3 commits into from
Jun 27, 2024

Conversation

jsrinnn
Copy link
Collaborator

@jsrinnn jsrinnn commented Jun 27, 2024

This PR includes the following changes for json parsing related to null values, based on what is returned from BQ Server API

  1. BQ query response can contain null values for row/column data as shown below
"rows": [
    {
      "f": [
        {
          "v": null
        },
        {
          "v": "Test String 6"
        },
        {
          "v": null
        }
      ]
    },

Due to the above json get_to method crashes with a type exception . The fix checks for null value before calling get_to in json_utils.h

  1. From the client perspective, we need to be able to distingush an empty string from a null value so that client api can work properly hence an additional is_null member has been added to ColumnData struct.

  2. Unit test changes for all of the above.


This change is Reviewable

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Jun 27, 2024
Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider if absl::optional<std::string> is a better representation. If you have a good reason to prefer a separate is_null please let me know and I can review again.

@@ -316,6 +316,7 @@ bool operator==(Struct const& lhs, Struct const& rhs);

struct ColumnData {
std::string value;
bool is_null{false};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more idiomatic to use absl::optional<std::string> to represent this. At the very least it would avoid awkward situations like value == "foo" and is_null == true.

@@ -634,11 +634,11 @@ void to_json(nlohmann::json& j, ColumnData const& c) {
j = nlohmann::json{{"v", c.value}};
}
void from_json(nlohmann::json const& j, ColumnData& c) {
SafeGetTo(c.value, j, "v");
SafeGetToWithNullable(c.value, c.is_null, j, "v");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If c.value was absl::optional<std::string> you could just use overloading on SafeGetTo?

Copy link
Collaborator Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the main reason for the additional parameter is minimum disruption to the client code. Otherwise modifying std::string to absl::optional will break the client code. In the past when this has happened we had to do the following for unblocking

  1. Modify the client code to look at a non-stable version of google-cloud-cpp (i.e main branch)

  2. Make the Client code changes

  3. Wait till fix is available in google-cloud-cpp is stable release

  4. Revert the client code to a stable-version of google-cloud-cpp.

I am mainly trying to avoid the above.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @coryan)


google/cloud/bigquery/v2/minimal/internal/common_v2_resources.h line 319 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

I think it would be more idiomatic to use absl::optional<std::string> to represent this. At the very least it would avoid awkward situations like value == "foo" and is_null == true.

Yeah...I thought about that but modifying the return value breaks our client code and its a lot of back and forth to get everything working as mentioned below

So the main reason for the additional parameter is minimum disruption to the client code. Otherwise modifying std::string to absl::optional will break the client code. In the past when this has happened we had to do the following for unblocking

  1. Modify the client code to look at a non-stable version of google-cloud-cpp (i.e main branch)

  2. Make the Client code changes

  3. Wait till fix is available in google-cloud-cpp is stable release

  4. Revert the client code to a stable-version of google-cloud-cpp.

I am mainly trying to avoid the above.


google/cloud/bigquery/v2/minimal/internal/common_v2_resources.cc line 637 at r1 (raw file):

Previously, coryan (Carlos O'Ryan) wrote…

If c.value was absl::optional<std::string> you could just use overloading on SafeGetTo?

Please see my comment above for why I am not modifying the return value.

@coryan
Copy link
Contributor

coryan commented Jun 27, 2024

If we go with the is_null approach: how will you clean the code up?

Copy link

codecov bot commented Jun 27, 2024

Codecov Report

Attention: Patch coverage is 98.64865% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.06%. Comparing base (8d01ae3) to head (f0eab7d).
Report is 2 commits behind head on main.

Current head f0eab7d differs from pull request most recent head 7ac1eb2

Please upload reports for the commit 7ac1eb2 to get more accurate results.

Files Patch % Lines
...le/cloud/bigquery/v2/minimal/internal/json_utils.h 93.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14384   +/-   ##
=======================================
  Coverage   93.06%   93.06%           
=======================================
  Files        2191     2191           
  Lines      193219   193284   +65     
=======================================
+ Hits       179814   179882   +68     
+ Misses      13405    13402    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of options here:

  1. I'll be OOO fairly soon so I didn't want the team to be blocked hence the above approach seemed to be the least disruptive. I can merge this code now and then come back and do the cleanup of using absl::optional.

  2. Someone from my team can patch this PR and do the absl::optional approach.

I am leaning towards (1) so as not to add additional load on the team and since this is an internal library. But I can understand as code owners if you think otherwise.

Let me know what you think.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @coryan)

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file a bug for the cleanup and then we can merge.

Reviewed 6 of 6 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jsrinnn)

Copy link
Collaborator Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Filed a tracking bug(#14387) and added a TODO in the code.

Reviewable status: 5 of 6 files reviewed, all discussions resolved (waiting on @coryan)

@jsrinnn jsrinnn marked this pull request as ready for review June 27, 2024 18:49
@jsrinnn jsrinnn requested a review from a team as a code owner June 27, 2024 18:49
@jsrinnn jsrinnn force-pushed the bigquery_v2_client_lib_null_fix branch from 723947e to 7ac1eb2 Compare June 27, 2024 18:51
@jsrinnn jsrinnn enabled auto-merge (squash) June 27, 2024 18:52
@jsrinnn jsrinnn merged commit 113730d into googleapis:main Jun 27, 2024
66 of 67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants