Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSONB and KeyspaceMetadata fixes for Yugabyte #1675

Closed
wants to merge 3 commits into from

Conversation

jameshartig
Copy link
Contributor

  • Add JSONB column support
  • Fallback in metadata functions to fix KeyspaceMetadata
  • Added yugabyte tests

Ran the latest tests against yb 2.16.1.0-b50

I didn't include them in all and I'm not sure how we want to automate testing going forward.

@jameshartig jameshartig force-pushed the yb-fixes branch 3 times, most recently from fc5acec to b6aedec Compare February 13, 2023 22:25
@jameshartig
Copy link
Contributor Author

@martin-sucha can you take a look at this? Thanks!

@martin-sucha
Copy link
Contributor

Hi! Previously, @Zariel was against including ScyllaDB-specific code to this repository (see the Scylla pull requests), including Yugabyte-specific code seems like the same category of issue, so I guess we should agree first whether to include such code or not. It seems we should not prefer one DB over the other.

Personally I don't mind including database-specific code to the gocql/gocql repository, but there needs to be a clear way how to maintain it and test it. Ideally we should run integration tests in CI automatically.

Alternately, is there a way to include an extension mechanism so the required functionality could be provided by an external package instead?

@jameshartig
Copy link
Contributor Author

@martin-sucha Thanks for the quick reply. I definitely understand the concern. I think it makes sense to include changes per-database provided they aren't huge. I'm definitely fine maintaining the YB-specific code since we use it in production along with this repo. One issue I see is that I called the value TypeJsonb but maybe should use a more specific name like TypeYBJsonb.

I added a specific test file that can use the YB docker image to run tests and I confirmed that they pass. What's the path to adding that to CI?

The type might be a place where we could expose a RegisterType function that adds a custom type but it would need to handle registering functions as well for unmarshaling and marshaling. Also anyone using YB would need to copy/paste some code to handle that specific type which seems not ideal.

The metadata issues are just a quirk from YB and I don't think an extension could work for that. I also expect it to be fixed on YB's end eventually.

@jameshartig
Copy link
Contributor Author

@Zariel would appreciate your thoughts as well

@martin-sucha
Copy link
Contributor

I definitely understand the concern. I think it makes sense to include changes per-database provided they aren't huge.

Right, it seems similar to handling different Cassandra versions. The question is where is the line :)

I'm definitely fine maintaining the YB-specific code since we use it in production along with this repo.

Help with maintaining is always appreciated. Maybe we could even increase the bus factor here if you are willing to help with maintaining even the non-YB-specific code. What do you think?

One issue I see is that I called the value TypeJsonb but maybe should use a more specific name like TypeYBJsonb.

Yes, it seems that the YB prefix might be useful as the JSONB type is an extension of the cql protocol. Or maybe we could have a yugabyte package with YB-specific definitions? Although having a separate package just for a single definition seems like an overkill.

I was also thinking what would happen in case a different database uses the same type number for a different type. Should we check that the database is yugabyte before checking for the 0x0080 value or it is unlikely to happen in practice? Or maybe we should update the upstream Cassandra CQL documentation and mention the 0x0080 value there?

I added a specific test file that can use the YB docker image to run tests and I confirmed that they pass. What's the path to adding that to CI?

Currently the CI is broken because Ubuntu 18.04 machine is no longer available. I tried some upgrades in #1685 but it seems that the upgrading is not enough. I think it might be better to just write the Docker setup in Go from scratch than rely on ccm, see #1686.

The type might be a place where we could expose a RegisterType function that adds a custom type but it would need to handle registering functions as well for unmarshaling and marshaling. Also anyone using YB would need to copy/paste some code to handle that specific type which seems not ideal.

How is Jsonb type usually used? Do you unmarshal it to strings, []byte etc?

Would a type like

package yugabyte

import "errors"

const TypeJsonb = gocql.Type(0x0080)

var ErrNotJsonb = errors.New("type is not jsonb")

type Jsonb []byte

func (jb *Jsonb) UnmarshalCQL(info gocql.TypeInfo, data []byte) error {
	if info.Type() != TypeJsonb {
		return ErrNotJsonb
	}
	if data != nil {
		*jb = append((*v)[:0], data...)
	} else {
		*jb = nil
	}
}

work or is it too restrictive? The downside would be that the users would have to use type in all structs and it won't automatically work with maps.

In any case it seems registering types with current (un)marshal interfaces is not straightforward. Global registration would work but I'd rather not add more global state. Per-session registration would need a different MarshalCQL/UnmarshalCQL signature that gets a callback function to unmarshal sub-values, since the usual implementations currently call the global gocql.Marshal/gocql.Unmarhal. Or we'd need to hack it and pass the type map inside gocql.TypeInfo.

The metadata issues are just a quirk from YB and I don't think an extension could work for that. I also expect it to be fixed on YB's end eventually.

Yep, makes sense.

@jameshartig
Copy link
Contributor Author

Right, it seems similar to handling different Cassandra versions. The question is where is the line :)

Totally agree. Not sure where it makes sense to define the line other than I agree with your intuition to try and externalize as much of it as possible, whenever we reasonably can.

Help with maintaining is always appreciated. Maybe we could even increase the bus factor here if you are willing to help with maintaining even the non-YB-specific code. What do you think?

Definitely open to help maintain the repo as a whole. We rely on this library in production and I've already had several PRs merged over the last few years.

Should we check that the database is yugabyte before checking for the 0x0080 value or it is unlikely to happen in practice?

I wouldn't want to introduce another round-trip to determine if its YB and I'm not finding a way to determine, with confidence, that it's YB from the handshake or the frame headers.

At the expense of another public field, what if the ClusterConfig had a some sort of "flavor" or "modifier" field that you could set to Yugabyte() and that would set something on the configuration that would trigger these specific modifications. We could then leave all of the modifications in a yugabyte.go file and only hook into them if that field is set? 99% of the functionality works outside of the things in this PR so most users won't even need to set the field but if they needed to then they could opt into these "quirks". It does kind of suck that the fixes wouldn't be "automatic" but that would reduce the risk of cross-contamination.

Let me play around with an alternative way to implement this and see what you think.

How is Jsonb type usually used? Do you unmarshal it to strings, []byte etc?

Typically []byte which is why this PR is needed but you can also do something like:

type MyStruct struct {
...
}

func (s MyStruct) MarshalCQL(info gocql.TypeInfo) ([]byte, error) {
	return json.Marshal(s)
}

func (s *MyStruct) UnmarshalCQL(info gocql.TypeInfo, data []byte) error {
	return json.Unmarshal(data, &s)
}

Which wouldn't require the type to be registered since you're overriding the marshaling or unmarshaling. If we wanted to make this a first-class type then it would be useful to automatically do the marshaling/unmarshaling if the destination type isn't a byte array but that feels like it should be in a separate PR.

Global registration would work but I'd rather not add more global state.

Yes, this would also break uses that might have multiple instances of gocql across packages.

@jameshartig
Copy link
Contributor Author

@martin-sucha So the biggest issue I ran into when trying to remove the yugabyte-specific type and instead have some opt-in "extension" is the package-level Unmarshal and Marshal methods. Any ideas what I could do there? The 2 things I was thinking were:

  1. *Session would have Unmarshal and Marshal methods and we'd deprecate the top-level ones. Any unmarshalling or marshalling orginating from the session would use the session-level functions.
  2. Make YugabyteUnmarshal and YugabyteMarshal methods that you'd have to use instead if you're using Yugabyte but I doubt many people will realize that until they ran into an error and searched why they got that error. This would also pollute the package-level functions.

@jameshartig
Copy link
Contributor Author

Yugabyte merged a commit to bring the system_schema tables in alignment which means we don't need to do a bunch of this work. yugabyte/yugabyte-db@7fc3b76

We still need to handle the custom jsonb column. I'm going to see if there's a way for a library user to add a custom type.

@jameshartig
Copy link
Contributor Author

Superseded by #1855

@jameshartig jameshartig closed this Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants