Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust2Metta gnd conversion #544

Merged
merged 9 commits into from
Jan 26, 2024
Merged

Rust2Metta gnd conversion #544

merged 9 commits into from
Jan 26, 2024

Conversation

Necr0x0Der
Copy link
Collaborator

No description provided.

@Necr0x0Der Necr0x0Der marked this pull request as draft January 23, 2024 16:26
@Necr0x0Der
Copy link
Collaborator Author

@vsbogd , @luketpeterson , this is the first step of automatic conversion btw Rust and foreign (Python in this case) grounded types. This should break nothing but only extend interoperability. Some implementation details might be improved. Feel free to commit to this branch. While it is preferable to make start with an overall design, I'd prefer to make incremental conservative changes in this case. The next steps I consider is

  • to perform backward (from foreign, e.g. Python, atoms to Rust) conversion. There are multiple ways to do this. (a) do conversion to Rust every time when Python grounded op produces the result (I'm not ok with it); (b) have a generic ForeignGroundedObject type/trait with e.g. to_rust API function, which should be implemented on the foreign language side (like match_ for matchable objects, it will allow automatic conversion from any foreign object to Rust if it is supported) (I'm ok slightly in favour of it); (c) have specific type constructors (e.g. ForeignNumber, ForeignBool, or whatever they'll be called) so one can add them into IntoNumber / Number implementation.
  • move Bool / Number library from Python stdlib to optional extension if one wants to use native Python arithmetics, etc. Enabling this extension should work seamlessly with custom Python code and Rust functions not implemented in this extension.

Copy link
Contributor

@luketpeterson luketpeterson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superficial suggestions to align with the conventions across the rest of the C API.

c/src/metta.rs Outdated Show resolved Hide resolved
c/src/metta.rs Outdated Show resolved Hide resolved
c/src/metta.rs Outdated Show resolved Hide resolved
c/src/metta.rs Outdated
@@ -1412,6 +1412,24 @@ pub extern "C" fn grounded_number_to_longlong(n: *const atom_t, res: *mut c_long
}
}

#[no_mangle]
pub extern "C" fn grounded_bool_to_int(n: *const atom_t, res: *mut c_int) -> bool {
Copy link
Contributor

@luketpeterson luketpeterson Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atom_t and atom_ref_t are designed to be interchangeable, but across the C API, we always use atom_ref_t when taking a pointer to a constant atom so the user can call an accessor function with a reference to an atom that lives elsewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons @Necr0x0Der implemented it in this way is that I implemented it in this way earlier. And it is my fault, that I didn't add you to the review @luketpeterson. I agree with the comment and I think we need to make all three functions (two I introduced in #543 and one from this PR) look uniformly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3a49aff

c/src/metta.rs Outdated
@@ -1412,6 +1412,24 @@ pub extern "C" fn grounded_number_to_longlong(n: *const atom_t, res: *mut c_long
}
}

#[no_mangle]
pub extern "C" fn grounded_bool_to_int(n: *const atom_t, res: *mut c_int) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_int is no longer accurate in light of above change/ suggestion, changing to to_bool

Across the rest of the API, into implies consuming the source and converting, where get implies access. to is ambiguous and could be read either way. My personal preference is for grounded_bool_get_bool, but it's just a style thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On C API side it is possible to implement both consuming or getting API. I was going to implement consuming API, thus I would use signature grounded_bool_into_bool(b: *mut atom_t, res: *mut bool) -> bool {.

Unfortunately we cannot consume on a Python side. Thus the only viable option to use consuming API in Python is cloning atom before calling a function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grounded_bool_into_bool(b: *mut atom_t, res: *mut bool) -> bool

The convention we have for consuming is to pass by value. ie. grounded_bool_into_bool(b: atom_t, res: *mut bool) -> bool

However, I don't see the benefit of a consuming converter over a non-consuming accessor, followed by an explicit free if the user is done with the atom. Efficiency-wise they're basically identical.

It's really a style thing, but my philosophy with the C api is to keep the number of functions that consume their arguments to a minimum - if all else is equal.

The other reason for a non-consuming accessor is that it allows you to pass an atom_ref_t, like you might get from atom_vec_get, the space_t iterator's next function, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I would prefer consuming accessor is that if into function need to return a more complex object then getter may require copying when consumer can reuse the passed value. It allows caller to decide whether we he needs to copy object additionally or not. With non-consuming accessor one would need to copy value inside accessor.

On the other hand as we anyway has different formats for Python and Rust strings and for other more complex objects then even consuming accessor will need to copy values to convert it into another format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I would prefer consuming accessor is that if into function...

That reasoning makes sense in the general case. Although we don't have a specific internal object we're accessing aside from primitives that fit in a single CPU word and stings*.

Perhaps we could make a converting API if we find ourselves wrapping a structure that is larger and can be unwrapped without any transformation.

On the other hand as we anyway has different formats for Python and Rust strings and for other more complex objects then even consuming accessor will need to copy values to convert it into another format.

Exactly. Strings would be a place where a consuming accessor could be more efficient except for the fact that there is a potential allocator mismatch and the fact that Rust strings aren't NULL-terminated. So unfortunately I think a single-copy accessor for wrapped Rust Strings (e.g. write into buffer) is probably the best we can do.

Copy link
Collaborator

@vsbogd vsbogd Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 07af9a3

c/src/metta.rs Show resolved Hide resolved
python/hyperonpy.cpp Outdated Show resolved Hide resolved
@luketpeterson
Copy link
Contributor

luketpeterson commented Jan 24, 2024

There are multiple ways to do this.
(a) do conversion to Rust every time when Python grounded op produces the result (I'm not ok with it);
(b) have a generic ForeignGroundedObject type/trait with e.g. to_rust API function, which should be implemented on the foreign language side (like match_ for matchable objects, it will allow automatic conversion from any foreign object to Rust if it is supported) (I'm ok slightly in favour of it);
(c) have specific type constructors (e.g. ForeignNumber, ForeignBool, or whatever they'll be called) so one can add them into IntoNumber / Number implementation.

As discussed on the call, (a) is the simplest approach although it places a burden on the core (Rust) atom implementations to losslessly represent all data that the foreign type is capable of representing. I can see how we might not want to maintain compatibility to that level of accuracy.

If we want to implement (b), ie. store the native Python types inside the grounded atoms that originate from Python, and then convert to Rust types lazily, on-demand, then the implementation gets a little bit tricky. (still doable) The main complicating factor is the C API that sits between Python and Rust.

I think one implementation could roughly look like:

  • Each CGrounded atom (which sits behind a native Python atom) would contain a function pointer to a conversion function. This could be NULL for non-convertible atoms
  • The function would have take an "object" ptr (the pointer returned from atom_get_object) as input, and likely return a new Rust-backed atom as output. For example, an atom that was returned from longlong_to_grounded_number

As up-side to that design is that it places the responsibility for convertibility / promotion logic on the foreign and native atom implementations, which is right, IMO.

The downside is that manufacturing a new atom means a new allocation. So an operation that would normally be a simple memory access becomes much more expensive.

An alternative implementation could look like:

  • Each CGrounded atom would contain a reference to a table of specific accessor functions. For example get_value_as_longlong, etc.
  • Each foreign atom implementation could implement the relevant accessors, and leave the others as NULL
  • The Rust client could call the relevant accessors(s), either directly or behind an abstracted Rust conversion trait.

This approach avoids the unnecessary allocation, at the expense of requiring more API surface area.

@vsbogd
Copy link
Collaborator

vsbogd commented Jan 24, 2024

This approach avoids the unnecessary allocation, at the expense of requiring more API surface area.

@luketpeterson, do you mean unnecessary allocation is avoided when Python atom's internals can be reused to represent the value in Rust? Or do you mean that unnecessary GroundedAtom wrapper allocation is avoided?

@luketpeterson
Copy link
Contributor

luketpeterson commented Jan 24, 2024

@luketpeterson, do you mean unnecessary allocation is avoided when Python atom's internals can be reused to represent the value in Rust? Or do you mean that unnecessary GroundedAtom wrapper allocation is avoided?

I mean a C callback function that takes the form:

convert_atom: extern "C" fn(src: *const gnd_t) -> atom_t,

involves an unnecessary allocation because the returned atom_t is backed by a dynamic structure (currently in every case. Although I have a vision for a static fast-path someday, that isn't the reality today)

On the other hand:

get_value_as_f64: extern "C" fn(src: *const gnd_t, dst: *mut f64) -> bool,

doesn't involve any allocation. The downside is that we need to make an entry point for every native interchange type, and each foreign (Python) type needs a table of accessor function pointers. The same table can be shared among all Python objects of a given type, so it's not actually that bad.

I think this (second option) is the way to go.

Necr0x0Der and others added 7 commits January 24, 2024 12:38
Co-authored-by: luketpeterson <[email protected]>
Co-authored-by: luketpeterson <[email protected]>
Co-authored-by: luketpeterson <[email protected]>
Co-authored-by: luketpeterson <[email protected]>
Replace *const atom_t by *const atom_ref_t as functions input references
to the atoms. Add docstrings.
_into_ infix is used for consuming functions, _get_ for non-consuming
@Necr0x0Der Necr0x0Der marked this pull request as ready for review January 25, 2024 11:10
@vsbogd vsbogd requested a review from luketpeterson January 26, 2024 16:40
@vsbogd
Copy link
Collaborator

vsbogd commented Jan 26, 2024

@luketpeterson , please approve if you agree to merge it.

Copy link
Contributor

@luketpeterson luketpeterson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look good.

@luketpeterson luketpeterson merged commit 169da1d into main Jan 26, 2024
1 of 2 checks passed
@vsbogd vsbogd deleted the autoconv branch January 26, 2024 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants