Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds jv_unshare(), jv_is_unshared() #3109

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

MaxBrandtner
Copy link

Still need to add some more tests. I would especially apprecieate some feedback on jv_new(), as this can perhaps be handled more efficiently using jvp

src/jv.c Outdated
@@ -1862,6 +1862,63 @@ jv jv_object_iter_value(jv object, int iter) {
/*
* Memory management
*/
jv jv_new(jv input){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the point of the function is to make a deep copy that don't share any memory with the input i think it probably should have a more clear name (jv_copy_deep etc?) and also a comment clarifying it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emanuele6 proposed jv_clone(). I am quite open to different names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm seem to be how rust uses copy/clone, but i'm always confusing them :) a very clear name could be something like jv_unshare

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think proposed deepclone like in Java if I proposed anything at all; jv_clone would also be confusing because in Java clone only copies one level

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I guess jv_unshare() would be the best option. Then again apparently words don't have meaning in this industry so who knows if there is some conflict there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wader I've never heard unshare used for deepclone, but I think it's a very good name for it :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) made it up right now but mostly based on @MaxBrandtner use case for it. I think something deep copy or unshare works. but maybe deep copy is a more well known concept?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really quite like that idea. I shall implement it right away

@MaxBrandtner
Copy link
Author

Is there a way to make jv_unshare() more efficient using jvp?

src/jv.c Outdated
output_object,key,
jv_unshare(
jv_getpath(jv_copy(output_object), jv_copy(key))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much faster using jv_array_set/jv_object_set would be, could skip building a path array at least?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main reason I did it the way I did it was to save on code, but yes doing it specific for arrays and objects would probably be faster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, i would probably try benchmark some (hopefully) real world like use cases if performance is an issue

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it would probably be better to separate between JV_KIND_OBJECT AND JV_KIND_ARRAY

src/jv.c Outdated
if(jv_get_kind(input) == JV_KIND_OBJECT){
output_object = jv_object();
}else{
output_object = jv_array();
Copy link
Member

@wader wader May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use jv_array_sized as we know size?

@@ -1862,6 +1862,68 @@ jv jv_object_iter_value(jv object, int iter) {
/*
* Memory management
*/
jv jv_unshare(jv input){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how ugly but i guess one could save on jv_free calls by having different ownership rule for unshare? too big foot gun?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even within the implementation of jv_unshare() we benefit from it dereferencing its input. We could have jv_unshare() not consume its memory, but at the cost of having to write a lot of stuff to buffers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is jv_unshare() supposed to consume a reference to its argument? I think it should.

@MaxBrandtner
Copy link
Author

As it pertains to unit-tests. Especially with jv_paths() and jv_addpath() I would just add sufficiently complex examples to test that it produces the correct output for a use-case that covers the various edge-cases. As it pretains to jv_unshare() I am somewhat unsure how to go about implementing it. I would check for equality and then recursively check that it is not identical (using jv_identical()), but I don't know if jv_identical() would be accurate in the case of partial equality (aka what if the the value of an object is not identical, but because of a hypothetical bug the key were identical)

@MaxBrandtner
Copy link
Author

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

src/jv_aux.c Outdated
@@ -427,6 +427,70 @@ jv jv_setpath(jv root, jv path, jv value) {
return jv_set(root, pathcurr, jv_setpath(subroot, pathrest, value));
}

jv jv_addpath(jv root, jv path, jv add){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a comment on what this is supposed to do? Is it different from jv_setpath()?

src/jv_aux.c Outdated
@@ -538,6 +602,47 @@ static int string_cmp(const void* pa, const void* pb){
return r;
}

jv jv_paths(jv input){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a comment on what this is supposed to do? Is it output an array of paths to scalar values?

@nicowilliams
Copy link
Contributor

Can you update the PR's title to mention jv_unshare() instead of jv_new()?

Please update the description to explain what these functions do (or comment in the sources) and the motivation?

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

@wader
Copy link
Member

wader commented May 15, 2024

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

My fault, but i did suggested jv_copy_deep first and jv_unshare as an alternative :) I think jv_copy_deep might be a better and less confusing name. Probably also good to include in the comment that it can be used for making a separate non-shared value suitable for use in another thread.

@nicowilliams
Copy link
Contributor

nicowilliams commented May 15, 2024

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

jv_identical() checks that if a has a pointer then b has the same pointer. Clearly jv_identical() will never return true(ish) for values that have pointers where one is a jv_unshare() deep copy of the other. Use jv_equal() for that.

@nicowilliams
Copy link
Contributor

I'm quite happy to have jv_unshare(), by the way. I'm surprised it's not called jv_copy_deep() or jv_deepcopy() or something, but I guess jv_unshare() is kind of a better name.

My fault, but i did suggested jv_copy_deep first and jv_unshare as an alternative :) I think jv_copy_deep might be a better and less confusing name. Probably also good to include in the comment that it can be used for making a separate non-shared value suitable for use in another thread.

No, I rather like jv_unshare(), so let's keep it. At any rate we shouldn't bikeshed this too much. jv_new() was not a good name though, so thanks for suggesting a better name.

src/jq_test.c Outdated

assert(jv_equal(jv_copy(initial), jv_copy(new)));
assert(!jv_identical(jv_copy(initial), jv_copy(new)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to have a jv_unshared() function that returns true if the given value is recursively unshared in that every value down from it has jv_get_refcount() of 1. Such a jv_unshared() function would have to not consume a reference to its argument, naturally, or maybe it should check that the top-level value has 2 refcounts and all values below have 1 refcounts.

Then we could use jv_getpath() on initial to acquire an extra refcount on a value far from the initial root, and then we could check that neither initial nor that value obtained with jv_getpath() are jv_unshared().

Copy link
Author

@MaxBrandtner MaxBrandtner May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. It would make testing easier as well. (the name might just be a bit too similar so simmering like jv_is_unshared() might be better)

src/jq_test.c Outdated

{
assert(jv_equal(
jv_addpath(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think jv_addpath() is the same as jv_setpath(), so let's drop it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not

jv_setpath() replaces the entire path with the value you specify
jv_addpath() overwrites in the path you specify the keys specified in the replace value keeping keys that are not mentioned in the replace value intact

src/jq_test.c Outdated

{
assert(jv_equal(
jv_paths(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to use an assert() with jv_paths()? Why not just assert that the two values are equal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not saying I don't want jv_paths(), I'm just saying that I think we don't need jv_paths() for testing jv_unshared(), so jv_paths() -if we need it at all- can be in a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did it out of convenience since I iterate through every path in the unshared value to see whether it really has no shared memory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be faster to just recurse in jv_is_unshared().

@nicowilliams
Copy link
Contributor

Thank you, jv_unshared() will be useful. It's very useful in finding leaks in libjq-using programs/libraries. I've seen others resort to jv_parse(jv_string_value(jv_dump_string(...))) to do a deep copy exactly for the purpose of working around valgrind's (and other memory debuggers') inability to deal with reference counting. (I wish valgrind had macros that one could use to tell it that a reference count is being incremented or decremented. That way when there is a leak valgrind could print all the stack traces where a leaked value's reference count was incremented, then one could look around each for a missing jv_free(). Since valgrind has nothing of the sort, a deep copy operation can be substituted for jv_copy() in many cases to get valgrind to help as if it knew how to track reference count increment/decrements.

We'll probably not use jv_unshare() in-tree, except during development, and then it might prove invaluable.

I think jv_addpath() is probably the same as jv_setpath(), and if so -or if you can just use jv_setpath() anyways- let's drop it.

I also think jv_paths() is unnecessary in this PR for testing jv_unshare(), so let's drop jv_paths().

Lastly it'd be great if you could add a int jv_unshared(jv) predicate function that indicates whether the given jv is deeply unshared. I think such a predicate would be very useful for testing jv_unshare().

@MaxBrandtner
Copy link
Author

the checks to test whether the various subsections are not identical fail. Does jv_identical() not do what I think it does or what gives?

jv_identical() checks that if a has a pointer then b has the same pointer. Clearly jv_identical() will never return true(ish) for values that have pointers where one is a jv_unshare() deep copy of the other. Use jv_equal() for that.

and yet in the unit-test for jv_unshare() jv_identical() returns true once I start checking recursively and I don't know why

src/jv_aux.c Outdated
if(jv_get_kind(root) != JV_KIND_OBJECT && jv_get_kind(root) != JV_KIND_ARRAY){
jv_free(root);

if(!jv_equal(jv_copy(path), jv_array())){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much faster to check if path has length 0.

src/jv_aux.c Outdated
return add;
}

if(!jv_equal(jv_copy(path), jv_array())){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

src/jv_aux.c Outdated

jv_free(path);

return add;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this case.

Also, what if root is a jv_null()? Should we treat it as being a container-like value, like jq does?

src/jv_aux.c Outdated
}

if(!jv_equal(jv_copy(path), jv_array())){
return jv_setpath(root, path, jv_addpath(jv_getpath(jv_copy(root), jv_copy(path)), jv_array(), add));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difference between jv_setpath() and jv_addpath() now, but I'd still like a comment, and also if we don't need jv_addpath() then I'd rather not have it.

@nicowilliams
Copy link
Contributor

nicowilliams commented May 19, 2024

@MaxBrandtner jv_is_shared() is a better name, because that function can return faster than jv_is_unshared() can.

@MaxBrandtner MaxBrandtner changed the title adds jv_paths(), jv_addpath(), jv_new() adds jv_unshare(), jv_is_unshared() Jun 8, 2024
@MaxBrandtner
Copy link
Author

I've reduced the pr to jv_unshare() and jv_is_unshared().

jv_is_unshared() has some oddities (like internally calling jv_free() before jv_is_unshared() or checking for a refcnt 3 in the keys of jv_objects), but it works.

As it pertains to jv_unshare() it seems as if there was never an issue to begin with, but its test was simply incorrect

return 1;
}
if(jv_get_kind(a) == JV_KIND_OBJECT){
jv keys = jv_keys(jv_copy(a));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about the jv internals but i suspect that maybe jv_is_unshared is suitable to be implemented using a private new jvp_* helper function that uses some internal details, similar to how jvp_object_equal etc is implemented? then you can probably save on some allocation and refcounting.

@nicowilliams does that make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for my inactivity for the past month on the thread. While having a jvp internal representation would get rid of some of the ref-counting it will add more complexity. Then again if you think it's worth I could implement jvp_object_is_unshared(), jvp_array_is_unshared() (I would refrain from adding jvp functions for the non-recursive options as I don't think there is much benefit to be had in these senarios)

Or do you think I should add string, number, true, false, null, invalid variants as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants