[core] Fix gcs logging #48952

Open: wants to merge 15 commits into base: master

dentiny (Contributor) commented Nov 26, 2024

Same motivation as #48931, but different implementation.

TLDR for the problem:

  • The excessive logging is caused by a bug in setting the rotation size in spdlog on the C++ side; in addition, the log redirected from the Python side has no rotation support.
  • The proposed solution in this PR is to manage the whole log via spdlog and disable the redirection logic in Python.

Signed-off-by: hjiang <[email protected]>
dentiny requested a review from a team as a code owner November 26, 2024 22:15
dentiny added the "go" label (add ONLY when ready to merge, run all tests) Nov 26, 2024
Signed-off-by: hjiang <[email protected]>
python/ray/_private/services.py (outdated, resolved)
@@ -38,17 +41,40 @@ DEFINE_string(session_name,
"session_name: The session name (ClusterID) of the cluster.");
DEFINE_string(ray_commit, "", "The commit hash of Ray.");

namespace {
// GCS server output filename.
constexpr std::string_view kGcsServerLog = "gcs_server.out";
Contributor

get_log_file_handles can create names like gcs_server.2.out if gcs_server.out and gcs_server.1.out both exist. Does spdlog have something similar?

Contributor Author

All rotated logs are suffixed with an id, similar to what you described.
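
For reference, spdlog's rotating file sink places the rotation index before the file extension, so gcs_server.out rotates to gcs_server.1.out, gcs_server.2.out, and so on. The naming scheme can be sketched standalone as below. `SketchRotatedName` is a hypothetical helper written only for illustration (it is not spdlog's or Ray's API), and it assumes '/'-separated paths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of spdlog or Ray): mimics the
// "<basename>.<index><ext>" naming used for rotated log files.
std::string SketchRotatedName(const std::string &filename, std::size_t index) {
  // Precondition of this sketch: a non-empty file name.
  assert(!filename.empty());
  // Index 0 is the live log file and keeps its original name.
  if (index == 0) {
    return filename;
  }
  const std::size_t slash = filename.find_last_of('/');
  const std::size_t dot = filename.rfind('.');
  // Treat the dot as an extension separator only if it sits inside the final
  // path component and is not that component's first character.
  const bool has_ext = dot != std::string::npos && dot != 0 &&
                       (slash == std::string::npos || dot > slash + 1);
  if (!has_ext) {
    return filename + "." + std::to_string(index);
  }
  // Insert ".<index>" between the base name and the extension.
  return filename.substr(0, dot) + "." + std::to_string(index) +
         filename.substr(dot);
}
```

Calling `SketchRotatedName("gcs_server.out", 2)` yields the same gcs_server.2.out shape that get_log_file_handles produces on the Python side.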

src/ray/gcs/gcs_server/gcs_server_main.cc (outdated, resolved)
? std::numeric_limits<int64_t>::max()
: FLAGS_log_rotation_size;
RAY_CHECK_EQ(setenv(
"RAY_ROTATION_MAX_BYTES", std::to_string(log_rotation_max_size), /*overwrite=*/1));
Contributor

Can you explain the difference between RAY_ROTATION_MAX_BYTES and FLAGS_log_rotation_size? If we already have the former, do we only need to fix the existing behavior? I see gcs_server_main.cc already calls ray::RayLog::StartRayLog, so why does its log rotation not work?

Contributor

What if we skip everything else in this PR and only do these 2 things:

  1. remove python stdout/stderr redirection
  2. change ray_log_shutdown_raii from /*log_dir=*/"" to /*log_dir=*/FLAGS_log_dir

will the rotations automatically work?

Collaborator

Yes but the file name will be changed. We want to keep the existing gcs_server.out filename for backward compatibility.

Contributor Author

> will the rotations automatically work?

To answer your question: yes, passing the log directory is enough for log rotation to work.
But one motivation here is backward compatibility, namely keeping the gcs_server.out filename.
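
The hunk under discussion forwards the rotation size to the logger through the RAY_ROTATION_MAX_BYTES environment variable. A minimal sketch of that hand-off follows, under stated assumptions: both helper names are hypothetical, treating a non-positive flag value as "never rotate" is an assumption (the condition is cut off in the hunk above), and `setenv` is the POSIX call, which takes C strings and returns 0 on success.

```cpp
#include <stdlib.h>  // setenv, getenv (POSIX)

#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: map the command-line flag to an effective size cap.
// ASSUMPTION: a non-positive flag value means "never rotate".
std::int64_t SketchEffectiveRotationMaxBytes(std::int64_t flag_value) {
  return flag_value <= 0 ? std::numeric_limits<std::int64_t>::max()
                         : flag_value;
}

// Hypothetical helper: export the cap so the logging layer can read it.
// Returns true when setenv succeeds.
bool SketchExportRotationMaxBytes(std::int64_t max_bytes) {
  assert(max_bytes > 0);
  // setenv copies the value, so a temporary std::string is sufficient here.
  const std::string value = std::to_string(max_bytes);
  return setenv("RAY_ROTATION_MAX_BYTES", value.c_str(), /*overwrite=*/1) == 0;
}
```

A caller would combine the two, for example `SketchExportRotationMaxBytes(SketchEffectiveRotationMaxBytes(flag))`, and check the returned bool rather than the raw `setenv` result.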

src/ray/gcs/gcs_server/gcs_server_main.cc (outdated, resolved)
python/ray/_private/services.py (resolved)
Signed-off-by: hjiang <[email protected]>
rynewang (Contributor) commented Dec 2, 2024

Now we have 2 ways to specify a RayLog storage:

  • log_dir
  • log_file

and log_file has higher priority than log_dir.

This setup is a bit nuanced. Can we instead do something simpler, like this:

  1. log_dir controls whether spdlog logs or not
  2. log_file is a relative path (or just a stem name), only used as a log file name override

so that:

  1. log_dir non-empty, log_file empty -> writes to log_dir using spdlog default log names, subject to duplicate file name renaming and rotations
  2. log_dir non-empty, log_file non-empty -> writes to log_dir/log_file via spdlog, subject to duplicate file name renaming and rotations
  3. both empty -> no spdlog writes
  4. log_dir empty, log_file non-empty -> illegal, RAY_LOG(FATAL)

dentiny (Contributor Author) commented Dec 2, 2024

Updated, let me know if I understand correctly.

python/ray/_private/services.py (outdated, resolved)
src/ray/gcs/gcs_server/gcs_server_main.cc (outdated, resolved)
@@ -312,14 +311,45 @@ void RayLog::InitLogFormat() {
}
}

/*static*/ std::string RayLog::GetLogOutputFilename(const std::string &log_dir,
Contributor

why do we have a commented out static?

Contributor Author

Because it's a static function.

src/ray/util/logging.cc (outdated, resolved)
/*static*/ std::string RayLog::GetLogOutputFilename(const std::string &log_dir,
const std::string &log_file,
const std::string &app_name) {
// Case-1
Contributor

These Case-n comments are not informative. We can write a prologue like:

We combine log_dir and (log_file or app_name) into a final output file name. Rules:

1. both log_dir and log_file are empty: return "" meaning no log outputs.
2. both log_dir and log_file are NON empty: return f"{log_dir}/{log_file}".
3. log_dir is NON empty, log_file is empty: return f"{log_dir}/{app_name}_{pid}.log".
4. log_dir is empty, log_file is NON empty: check failure.

Then the "Case-n" comments would become meaningful.

Contributor Author

The Case-n comments correspond to the cases listed in the header file.

Contributor Author

Code reference here:

ray/src/ray/util/logging.h

Lines 268 to 273 in c49bae6

/// A few cases:
/// 1. If both folder and filename are empty, logging will be displayed to stdout.
/// 2. If both folder and filename specified, `folder/filename` will be used as output
/// file.
/// 3. If only folder filled, default filename by folder is used.
/// 4. It's illegal to only provide filename.
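
The four cases in this header comment can be sketched as a small standalone function. This is a minimal sketch under stated assumptions, not the PR's implementation: `SketchLogOutputFilename` is a hypothetical name, the pid is passed in explicitly to keep the example deterministic, the default name `<app_name>_<pid>.log` is taken from the reviewer's suggested prologue, and the illegal case throws an exception here where the thread proposes a check failure.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the function under review; the case numbers
// follow the header comment quoted above.
std::string SketchLogOutputFilename(const std::string &log_dir,
                                    const std::string &log_file,
                                    const std::string &app_name,
                                    long pid) {
  if (log_dir.empty()) {
    // Case 1: nothing specified, so there is no output file (stdout only).
    if (log_file.empty()) {
      return "";
    }
    // Case 4: a file name without a directory is illegal.
    // ASSUMPTION: this sketch throws; the thread suggests a check failure.
    throw std::invalid_argument("log_file requires a non-empty log_dir");
  }
  // Case 2: both specified, so the explicit file name wins.
  if (!log_file.empty()) {
    return log_dir + "/" + log_file;
  }
  // Case 3: only the directory is given, so fall back to a default name.
  assert(!app_name.empty());
  return log_dir + "/" + app_name + "_" + std::to_string(pid) + ".log";
}
```

With this shape, passing `log_dir` plus `gcs_server.out` as `log_file` keeps the existing file name, which is the backward-compatibility goal discussed earlier in the thread.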

Contributor

ok

/// \param log_dir Logging output directory name.
/// \param log_file Logging output file name.
///
/// Both [log_dir] and [log_file] are used to determine the output logging file.
Contributor

use backticks, not brackets

Contributor Author

Do we have any code style recommendation for this?

Contributor Author

Or is it the syntax expected by the documentation generator?

Contributor Author

I updated the comment, but I would still like to know about the coding standard / documentation style.

Contributor

No specific style rules, but I often use backticks and have never seen brackets in Ray.

Contributor Author

If it's for coding consistency I'm fine with it, but backticks are usually used in bash scripts for command substitution (e.g. make -j`nproc`). To reduce mixed usage, I usually use [] for function arguments and referenced function names.

dentiny requested a review from rynewang December 3, 2024 00:27