
ocrd_cli_wrap_processor: always do initLogging #1296

Open
wants to merge 1 commit into master

Conversation

@bertsky (Collaborator) commented Nov 20, 2024

fixes #1223 and replaces #1295

@bertsky bertsky requested a review from kba November 20, 2024 19:10
@bertsky (Collaborator, Author) commented Nov 20, 2024

(Of course, as already commented in #1223, it may still be necessary to be careful with imports on the side of the processor implementations. For example, moving import tensorflow to setup.)

Alternatively, we could still do initLogging on the module level, though not in ocrd_utils.logging but rather ocrd.decorators, which arguably will only be imported by applications that do need log handlers.

@kba (Member) left a comment:

> (Of course, as already commented in #1223, it may still be necessary to be careful with imports on the side of the processor implementations. For example, moving import tensorflow to setup.)

I am wondering how future-proof these mechanisms are. There have been various workarounds to keep tensorflow and other libraries from logging, such as our tf_disable_interactive_logs and copy-and-pasted stanzas like this one in eynollah:

import os
import sys
import warnings

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
stderr = sys.stderr
sys.stderr = open(os.devnull, "w")
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.keras.models import load_model
sys.stderr = stderr
tf.get_logger().setLevel("ERROR")
warnings.filterwarnings("ignore")

And apart from being really intrusive and inconsistent, these workarounds could break at any point.

> Alternatively, we could still do initLogging on the module level, though not in ocrd_utils.logging but rather ocrd.decorators, which arguably will only be imported by applications that do need log handlers.

Module-level in ocrd.decorators.__init__.py makes sense, and a cursory glance over how eynollah and ocrd_kraken are structured makes it seem this would work. For ocrd_calamari, there is a top-level __init__.py which does from .recognize import CalamariRecognize, i.e. tensorflow is imported before ocrd.decorators, but it is guarded by tf_disable_interactive_logs.

So, as much as I dislike module-level function calls in general, in the interest of fewer surprises in the future, I am in favor of putting initLogging in ocrd.decorators.__init__.

@bertsky (Collaborator, Author) commented Nov 21, 2024

> There have been different workarounds to prevent tensorflow and others from logging, like our tf_disable_interactive_logs

Right, but that only covers the rogue print and write statements scattered across Keras and TF (so interactive_logs is actually a misnomer within Keras itself).

> and various copy-and-pasted stanzas like in eynollah:
>
> os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

That should not be necessary if we run our initLogging before TF can do theirs, since we already set the level for tensorflow to ERROR (i.e. 3).
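The reason preconfiguring works on the Python side can be shown with the stdlib alone: logging.getLogger returns one shared Logger object per name, so a level set before the library first obtains "its" logger is already in effect when it does. (This is a sketch of the Python-logging half only; TF_CPP_MIN_LOG_LEVEL additionally controls the native C++ logging and must be set in the environment before import.)

```python
import logging

# Configure the "tensorflow" logger before the library itself touches it.
logging.getLogger('tensorflow').setLevel(logging.ERROR)

# Simulate the library obtaining its logger later (as tf.get_logger() would):
tf_logger = logging.getLogger('tensorflow')

# Same singleton object, so the level set earlier is already applied.
assert tf_logger.level == logging.ERROR
```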

> stderr = sys.stderr
> sys.stderr = open(os.devnull, "w")

Wow, that is rather extreme; it speaks to the frustration we have had with getting logging set up correctly in the Python ecosystem.

> import tensorflow as tf
> from tensorflow.python.keras import backend as K
> from tensorflow.keras.models import load_model
> sys.stderr = stderr
> tf.get_logger().setLevel("ERROR")

This should also not be necessary anymore.

> warnings.filterwarnings("ignore")

That is something we could indeed add into ocrd_cli_wrap_processor in the non-processing contexts.
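A sketch of what that could look like, assuming a flag that marks non-processing invocations such as dump-json (the wrapper name and flag below are hypothetical, not the actual ocrd.decorators API):

```python
import warnings

# Hypothetical sketch: in non-processing contexts such as dump-json, stray
# warnings on the output streams would garble the machine-readable output
# (the problem reported in #1223), so suppress them up front.
def wrap_processor_sketch(dump_json=False):
    if dump_json:
        warnings.filterwarnings("ignore")   # keep output clean for JSON consumers
        return '{"executable": "ocrd-example"}'  # placeholder for the real dump
    return None
```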


Successfully merging this pull request may close these issues.

warnings in processor output garble dump-json
2 participants