Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewire python<->java logging to be generally friendlier to python #2763

Merged
merged 5 commits into from
Sep 1, 2022

Conversation

niloc132
Copy link
Member

@niloc132 niloc132 commented Aug 26, 2022

Fixes #2734, #2723

@niloc132
Copy link
Member Author

To fully fix the h2o case, jpy-consortium/jpy#85 will be required as well, as this apparently tries to log zero-byte strings.

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick first pass. I'm going to run and introspect this locally later. Generally looks good.

py/server/deephaven_internal/stream.py Outdated Show resolved Hide resolved
py/server/deephaven_internal/stream.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jmao-denver jmao-denver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor Pythonic/coding style issues

py/server/deephaven_internal/stream.py Outdated Show resolved Hide resolved
py/server/deephaven_internal/stream.py Outdated Show resolved Hide resolved
jmao-denver
jmao-denver previously approved these changes Aug 29, 2022
Copy link
Contributor

@jmao-denver jmao-denver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me. It might be nice if we could add some type hints in the Python code but as it stands, the built-in super class TextIOBase doesn't provide those either and the methods defined are so universally understood and expected, so skipping the type hints here doesn't really break any convention.

@Override
public void write(@NotNull byte[] b) throws IOException {
// noinspection PrimitiveArrayArgumentToVarargsMethod
rawIoBaseSupplier.get().callMethod("write", new String(b));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need / want to account for Charset here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be better if we could call org.jpy.PyObject#call(java.lang.Class<T>, java.lang.String, java.lang.Class<?>[], java.lang.Object[]) - this allows us to be more explicit w/ the parameter and return types.

As it stands now, all of the calls to callMethod return PyObject, which I think is extra garbage we shouldn't need to create.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that raises a question - why can jpy convert cleanly from py bytes to java byte[], but not vice versa? That's the real answer here, since it is already encoded (and we can't cleanly ask for the underlying encoding... since jpy might break it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, why be more explicit with the param types? Both call into PyLib.callAndReturnValue, but the variant that takes lots of Class<?>s does java-side type checks, rather than just passing the string instance - and there is no return type to be concerned with.

(Also, .call() vs .callMethod() has semantics that technically we don't want here, we actually want callMethod. However, as far as I can tell, jpy just throws away that extra flag with a "todo why isn't this used?" sort of note)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly concerned about creating extraneous PyObjects - "write" will return a PyObject that is a python int.

try (final PyObject ignored = rawIoBaseSupplier.get().callMethod("write", new String(b))) {
}

vs

rawIoBaseSupplier.get().call(int.class, "write", String.class, new String(b));

IMO, the second is much better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, these are not the same - call sets methodCall=false and its docs clarify that this is invoking a python callable object, not a method. All overrides do this.

In contrast, callMethod sets methodCall=true, and is intended for calling methods. We don't have (but arguably should?) an overload of callMethod like call does to support each arg type. I'd support going so far as to add overloads that return primitives or void, to avoid that extra cost you describe.

In this case, we want to call a method, we are not interacting with a callable object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the jpy implementation does not actually use that boolean internally - some sort of historic wart before my time.

Copy link
Member Author

@niloc132 niloc132 Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that's what I said above - in which case we should either figure out what issue might be there or historically have been there, or change the API. I'm just going off of the current stated contract of JPy, which is that this isn't to be used for method calls. But at the end of the day, we're using the wrong method that happens to do the right thing, for now, probably, to save on creating one object (that will either be called approx once per jvm, in the case of close(), or for write and maybe flush, are likely to be jitted to be inlined and have escap analysis handle anyway)?

I've you insist, I'll write the wrong code that currently does what we want but not what we're asking for, but this is an improvement to make to jpy and then improve our usage of it, so I'm objecting to deliberately doing the wrong thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed jpy-consortium/jpy#87.

org.jpy.PyObject#call(java.lang.Class<T>, java.lang.String, java.lang.Class<A0>, A0) doesn't expose the notion of "methodCall", or have any javadoc (my fault). I think your argument falls down a little bit though, b/c if the implementation of that method instead used true instead of false, the outcome would be exactly the same, and is arguably an implementation detail.

I do like to consider myself against unnecessary pre-optimization. With respect to PyObject though, it's not the allocation itself I'm usually worried about, but the implications that extraneous PyObjects may cause (PyObjectReferences growth, unnecessarily keeping python references alive). And specifically as a glue layer between streams, there's potential for this to be heavily used. In general, if there is a way to accomplish the same outcome but without creating a PyObject, that's the direction I'll lean.

All that said - I've argued my case, and can let the code stand as-is if that's still your preference.

Comment on lines 39 to 48
public void flush() throws IOException {
rawIoBaseSupplier.get().callMethod("flush");
}

@Override
public void close() throws IOException {
rawIoBaseSupplier.get().callMethod("close");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto w/ callMethods.

py/server/deephaven_internal/stream.py Show resolved Hide resolved
Comment on lines +31 to +37
write_func=lambda t: java_stream.write(bytes(t, encoding)),
flush_func=lambda: java_stream.flush(),
close_func=lambda: java_stream.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering - would it make sense to "simplify" and remove the lambdas?

TeeStream(
  orig_stream=py_stream,
  other_stream=java_stream,
  should_write_to_orig_stream=...)

The write implementation could check it's own encoding property as you have it written.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, I was imagining that while my current use case was limited to redirecting into or splitting into the JVM, we might case about generally teeing or redirecting output at a later date, and the io.IOBase type is not fun to work with - once I had the specifics worked out, I have a constructor that is easy to generalize, and two factory methods that are specific to their java use cases.

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like changes to the PyObject#call conventions.

@devinrsmith devinrsmith self-requested a review August 31, 2022 18:25
devinrsmith
devinrsmith previously approved these changes Aug 31, 2022
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to read latest comment.

@niloc132 niloc132 merged commit 2c59654 into deephaven:main Sep 1, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Sep 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logs in DHaaL are confusing
3 participants