Merge branch 'master' into redis-username
rueian committed Nov 30, 2024
2 parents fbbfd35 + 19818cf commit bf735e9
Showing 583 changed files with 14,073 additions and 7,969 deletions.
2 changes: 1 addition & 1 deletion .buildkite/core.rayci.yml
@@ -321,7 +321,7 @@ steps:
commands:
- bazel run //ci/ray_ci:test_in_docker -- //... core
--run-flaky-tests --build-type clang
--parallelism-per-worker 2 --gpus 2
--gpus 4
--build-name coregpubuild
--only-tags multi_gpu
depends_on: coregpubuild
18 changes: 9 additions & 9 deletions .buildkite/others.rayci.yml
@@ -1,12 +1,8 @@
group: others
depends_on:
- forge
- oss-ci-base_build
steps:
#build
- name: doctestbuild
wanda: ci/docker/doctest.build.wanda.yaml

# dependencies
- label: ":tapioca: build: pip-compile dependencies"
key: pip_compile_dependencies
instance_type: small
@@ -19,10 +15,13 @@ steps:
- cp -f ./python/requirements_compiled.txt /artifact-mount/
soft_fail: true
job_env: oss-ci-base_test-py3.11
depends_on:
- oss-ci-base_test-multipy
depends_on: oss-ci-base_test-multipy

# docs
- name: doctestbuild
wanda: ci/docker/doctest.build.wanda.yaml
depends_on: oss-ci-base_build

# test
- label: doc tests
instance_type: large
commands:
@@ -40,6 +39,7 @@ steps:
--skip-ray-installation
depends_on: doctestbuild

# java
- label: ":java: java tests"
tags: java
instance_type: medium
@@ -48,7 +48,7 @@
- docker run -i --rm --volume /tmp/artifacts:/artifact-mount --shm-size=2.5gb
"$${RAYCI_WORK_REPO}":"$${RAYCI_BUILD_ID}"-corebuild /bin/bash -iecuo pipefail
"./java/test.sh"
depends_on: [ "corebuild", "forge" ]
depends_on: corebuild

# bot
- label: ":robot_face: CI weekly green metric"
1 change: 1 addition & 0 deletions .vale/styles/config/vocabularies/Data/accept.txt
@@ -7,6 +7,7 @@ Data('s)?
[Dd]iscretizer(s)?
dtype
[Gg]roupby
[Hh]udi
[Ii]ndexable
[Ii]ngest
[Ii]nqueue(s)?
47 changes: 40 additions & 7 deletions BUILD.bazel
@@ -620,6 +620,7 @@ ray_cc_library(
deps = [
":reporter_rpc",
":stats_metric",
"//src/ray/util:size_literals",
"@com_github_grpc_grpc//:grpc_opencensus_plugin",
],
)
@@ -1626,7 +1627,7 @@ ray_cc_test(
deps = [
":gcs_server_lib",
":gcs_test_util_lib",
"@com_google_googletest//:gtest_main",
"@com_google_googletest//:gtest",
],
)

@@ -1648,7 +1649,7 @@ ray_cc_test(
deps = [
":gcs_server_lib",
":gcs_test_util_lib",
"@com_google_googletest//:gtest_main",
"@com_google_googletest//:gtest",
],
)

@@ -1882,7 +1883,7 @@ ray_cc_test(
":gcs_table_storage_test_lib",
":gcs_test_util_lib",
":store_client_test_lib",
"@com_google_googletest//:gtest_main",
"@com_google_googletest//:gtest",
],
)

@@ -2402,11 +2403,43 @@ ray_cc_test(
)

ray_cc_test(
name = "gcs_export_event_test",
name = "gcs_job_manager_export_event_test",
size = "small",
srcs = glob([
"src/ray/gcs/gcs_server/test/export_api/*.cc",
]),
srcs = ["src/ray/gcs/gcs_server/test/export_api/gcs_job_manager_export_event_test.cc"],
tags = [
"no_windows",
"team:core"
],
deps = [
":gcs_server_lib",
":gcs_server_test_util",
":gcs_test_util_lib",
":ray_mock",
"@com_google_googletest//:gtest_main",
],
)

ray_cc_test(
name = "gcs_actor_manager_export_event_test",
size = "small",
srcs = ["src/ray/gcs/gcs_server/test/export_api/gcs_actor_manager_export_event_test.cc"],
tags = [
"no_windows",
"team:core"
],
deps = [
":gcs_server_lib",
":gcs_server_test_util",
":gcs_test_util_lib",
":ray_mock",
"@com_google_googletest//:gtest_main",
],
)

ray_cc_test(
name = "gcs_node_manager_export_event_test",
size = "small",
srcs = ["src/ray/gcs/gcs_server/test/export_api/gcs_node_manager_export_event_test.cc"],
tags = [
"no_windows",
"team:core"
2 changes: 1 addition & 1 deletion bazel/BUILD.jemalloc
@@ -15,7 +15,7 @@ configure_make(
out_shared_libs = ["libjemalloc.so"],
# See https://salsa.debian.org/debian/jemalloc/-/blob/c0a88c37a551be7d12e4863435365c9a6a51525f/debian/rules#L8-23
# for why we are setting "--with-lg-page" on non x86 hardware here.
configure_options = ["--disable-static", "--enable-prof"] +
configure_options = ["--disable-static", "--enable-prof", "--enable-prof-libunwind"] +
select({
"@platforms//cpu:x86_64": [],
"//conditions:default": ["--with-lg-page=16"],
8 changes: 5 additions & 3 deletions bazel/ray_deps_setup.bzl
@@ -212,12 +212,14 @@ def ray_deps_setup():

# OpenCensus depends on Abseil so we have to explicitly pull it in.
# This is how diamond dependencies are prevented.
#
# TODO(owner): Upgrade abseil to latest version after protobuf updated, which requires to upgrade `rules_cc` first.
auto_http_archive(
name = "com_google_absl",
sha256 = "5366d7e7fa7ba0d915014d387b66d0d002c03236448e1ba9ef98122c13b35c36",
strip_prefix = "abseil-cpp-20230125.3",
sha256 = "987ce98f02eefbaf930d6e38ab16aa05737234d7afbab2d5c4ea7adbe50c28ed",
strip_prefix = "abseil-cpp-20230802.1",
urls = [
"https://github.com/abseil/abseil-cpp/archive/20230125.3.tar.gz",
"https://github.com/abseil/abseil-cpp/archive/refs/tags/20230802.1.tar.gz",
],
)

1 change: 0 additions & 1 deletion ci/docker/ray-ml.cpu.base.wanda.yaml
@@ -3,7 +3,6 @@ froms: ["cr.ray.io/rayproject/ray-py$PYTHON_VERSION-cpu-base"]
dockerfile: docker/ray-ml/Dockerfile
srcs:
- python/requirements.txt
- python/requirements_compiled.txt
- python/requirements/ml/dl-cpu-requirements.txt
- python/requirements/ml/dl-gpu-requirements.txt
- python/requirements/ml/core-requirements.txt
1 change: 0 additions & 1 deletion ci/docker/ray-ml.cuda.base.wanda.yaml
@@ -3,7 +3,6 @@ froms: ["cr.ray.io/rayproject/ray-py$PYTHON_VERSION-cu$CUDA_VERSION-base"]
dockerfile: docker/ray-ml/Dockerfile
srcs:
- python/requirements.txt
- python/requirements_compiled.txt
- python/requirements/ml/dl-cpu-requirements.txt
- python/requirements/ml/dl-gpu-requirements.txt
- python/requirements/ml/core-requirements.txt
2 changes: 2 additions & 0 deletions ci/docker/ray.cpu.base.aarch64.wanda.yaml
@@ -1,6 +1,8 @@
name: "ray-py$PYTHON_VERSION-cpu-base-aarch64"
froms: ["ubuntu:22.04"]
dockerfile: docker/base-deps/Dockerfile
srcs:
- python/requirements_compiled.txt
build_args:
- PYTHON_VERSION
- BASE_IMAGE=ubuntu:22.04
2 changes: 2 additions & 0 deletions ci/docker/ray.cpu.base.wanda.yaml
@@ -1,6 +1,8 @@
name: "ray-py$PYTHON_VERSION-cpu-base"
froms: ["ubuntu:22.04"]
dockerfile: docker/base-deps/Dockerfile
srcs:
- python/requirements_compiled.txt
build_args:
- PYTHON_VERSION
- BASE_IMAGE=ubuntu:22.04
2 changes: 2 additions & 0 deletions ci/docker/ray.cuda.base.aarch64.wanda.yaml
@@ -1,6 +1,8 @@
name: "ray-py$PYTHON_VERSION-cu$CUDA_VERSION-base-aarch64"
froms: ["nvidia/cuda:$CUDA_VERSION-devel-ubuntu22.04"]
dockerfile: docker/base-deps/Dockerfile
srcs:
- python/requirements_compiled.txt
build_args:
- PYTHON_VERSION
- BASE_IMAGE=nvidia/cuda:$CUDA_VERSION-devel-ubuntu22.04
2 changes: 2 additions & 0 deletions ci/docker/ray.cuda.base.wanda.yaml
@@ -1,6 +1,8 @@
name: "ray-py$PYTHON_VERSION-cu$CUDA_VERSION-base"
froms: ["nvidia/cuda:$CUDA_VERSION-devel-ubuntu22.04"]
dockerfile: docker/base-deps/Dockerfile
srcs:
- python/requirements_compiled.txt
build_args:
- PYTHON_VERSION
- BASE_IMAGE=nvidia/cuda:$CUDA_VERSION-devel-ubuntu22.04
6 changes: 2 additions & 4 deletions ci/env/install-core-prerelease-dependencies.sh
@@ -5,7 +5,5 @@ set -e
# install all unbounded dependencies in setup.py for ray core
# TOOD(scv119) reenable grpcio once https://github.com/grpc/grpc/issues/31885 is fixed.
# TOOD(scv119) reenable jsonschema once https://github.com/ray-project/ray/issues/33411 is fixed.
for dependency in aiosignal frozenlist requests protobuf
do
python -m pip install -U --pre --upgrade-strategy=eager $dependency
done
DEPS=(aiosignal frozenlist requests protobuf)
python -m pip install -U --pre --upgrade-strategy=eager "${DEPS[@]}"
6 changes: 1 addition & 5 deletions cpp/src/ray/runtime/task/native_task_submitter.cc
@@ -105,11 +105,7 @@ ObjectID NativeTaskSubmitter::Submit(InvocationSpec &invocation,
scheduling_strategy,
"");
}
std::vector<ObjectID> return_ids;
for (const auto &ref : return_refs) {
return_ids.push_back(ObjectID::FromBinary(ref.object_id()));
}
return return_ids[0];
return ObjectID::FromBinary(return_refs[0].object_id());
}

ObjectID NativeTaskSubmitter::SubmitTask(InvocationSpec &invocation,
2 changes: 1 addition & 1 deletion doc/source/cluster/configure-manage-dashboard.md
@@ -5,7 +5,7 @@
Dashboard configurations may differ depending on how you launch Ray Clusters (e.g., local Ray Cluster v.s. KubeRay). Integrations with Prometheus and Grafana are optional for enhanced Dashboard experience.

:::{note}
Ray Dashboard is only intended for interactive development and debugging because the Dashboard UI and the underlying data are not accessible after Clusters are terminated. For production monitoring and debugging, users should rely on [persisted logs](../cluster/kubernetes/user-guides/logging.md), [persisted metrics](./metrics.md), [persisted Ray states](../ray-observability/user-guides/cli-sdk.rst), and other observability tools.
Ray Dashboard is useful for interactive development and debugging because when clusters terminate, the dashboard UI and the underlying data are no longer accessible. For production monitoring and debugging, you should rely on [persisted logs](../cluster/kubernetes/user-guides/persist-kuberay-custom-resource-logs.md), [persisted metrics](./metrics.md), [persisted Ray states](../ray-observability/user-guides/cli-sdk.rst), and other observability tools.
:::

## Changing the Ray Dashboard port
46 changes: 46 additions & 0 deletions doc/source/cluster/kubernetes/configs/loki.log.yaml
@@ -0,0 +1,46 @@
# Fluent Bit Config
config:
inputs: |
[INPUT]
Name tail
Path /var/log/containers/*.log
multiline.parser docker, cri
Tag kube.*
Mem_Buf_Limit 5MB
Skip_Long_Lines On
filters: |
[FILTER]
Name kubernetes
Match kube.*
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
outputs: |
[OUTPUT]
Name loki
Match *
Host loki-gateway
Port 80
Labels job=fluent-bit,namespace=$kubernetes['namespace_name'],pod=$kubernetes['pod_name'],container=$kubernetes['container_name']
Auto_Kubernetes_Labels Off
tenant_id test
---
# Grafana Datasource Config
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
editable: true
url: http://loki-gateway.default
jsonData:
timeout: 60
maxLines: 1000
httpHeaderName1: "X-Scope-OrgID"
secureJsonData:
httpHeaderValue1: "test"
2 changes: 1 addition & 1 deletion doc/source/cluster/kubernetes/configs/ray-cluster.gpu.yaml
@@ -12,7 +12,7 @@ spec:
######################headGroupSpec#################################
# head group template and specs, (perhaps 'group' is not needed in the name)
headGroupSpec:
# logical group name, for this called head-group, also can be functional
# logical group name, for this called headgroup, also can be functional
# pod type head or worker
# rayNodeType: head # Not needed since it is under the headgroup
# the following params are used to complete the ray start: ray start --head --block ...
@@ -35,11 +35,12 @@ kubectl get pods
# kuberay-operator-7fbdbf8c89-pt8bk 1/1 Running 0 27s
```

KubeRay offers multiple options for operator installations, such as Helm, Kustomize, and a single-namespaced operator. For further information, please refer to [the installation instructions in the KubeRay documentation](https://ray-project.github.io/kuberay/deploy/installation/).
KubeRay offers multiple options for operator installations, such as Helm, Kustomize, and a single-namespaced operator. For further information, see [the installation instructions in the KubeRay documentation](https://ray-project.github.io/kuberay/deploy/installation/).

(raycluster-deploy)=
## Step 3: Deploy a RayCluster custom resource

Once the KubeRay operator is running, we are ready to deploy a RayCluster. To do so, we create a RayCluster Custom Resource (CR) in the `default` namespace.
Once the KubeRay operator is running, you're ready to deploy a RayCluster. Create a RayCluster Custom Resource (CR) in the `default` namespace.

::::{tab-set}

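As context for the quickstart text above, a minimal RayCluster CR of the kind that step creates might look like the following sketch. The resource name, image tag, and worker group settings are illustrative assumptions, not taken from this commit.

```yaml
# Illustrative RayCluster CR deployed into the `default` namespace.
# Names, image tags, and replica counts are assumptions for illustration.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  namespace: default
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head               # Ray container listed first
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: small-group
      replicas: 1
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
```

Applying a manifest like this with `kubectl apply -f raycluster.yaml` mirrors the deployment step the quickstart describes; the official sample manifest lives in the KubeRay repository rather than in this diff.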
6 changes: 4 additions & 2 deletions doc/source/cluster/kubernetes/user-guides.md
@@ -15,7 +15,8 @@ user-guides/config
user-guides/configuring-autoscaling
user-guides/kuberay-gcs-ft
user-guides/gke-gcs-bucket
user-guides/logging
user-guides/persist-kuberay-custom-resource-logs
user-guides/persist-kuberay-operator-logs
user-guides/gpu
user-guides/tpu
user-guides/rayserve-dev-doc
@@ -45,7 +46,8 @@ at the {ref}`introductory guide <kuberay-quickstart>` first.
* {ref}`kuberay-gpu`
* {ref}`kuberay-tpu`
* {ref}`kuberay-gcs-ft`
* {ref}`kuberay-logging`
* {ref}`persist-kuberay-custom-resource-logs`
* {ref}`persist-kuberay-operator-logs`
* {ref}`kuberay-dev-serve`
* {ref}`kuberay-pod-command`
* {ref}`kuberay-pod-security`
Expand Down
2 changes: 1 addition & 1 deletion doc/source/cluster/kubernetes/user-guides/config.md
@@ -126,7 +126,7 @@ Here are some of the subfields of the pod `template` to pay attention to:
#### containers
A Ray pod template specifies at minimum one container, namely the container
that runs the Ray processes. A Ray pod template may also specify additional sidecar
containers, for purposes such as {ref}`log processing <kuberay-logging>`. However, the KubeRay operator assumes that
containers, for purposes such as {ref}`log processing <persist-kuberay-custom-resource-logs>`. However, the KubeRay operator assumes that
the first container in the containers list is the main Ray container.
Therefore, make sure to specify any sidecar containers
**after** the main Ray container. In other words, the Ray container should be the **first**
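To illustrate the container-ordering rule the `config.md` hunk above describes, a pod `template` might be laid out as in this sketch; the sidecar name and both images are assumptions, not content of this commit.

```yaml
# Sketch only: the main Ray container is listed first; any sidecars follow it.
template:
  spec:
    containers:
      - name: ray-head                  # main Ray container, must be first in the list
        image: rayproject/ray:2.9.0
      - name: fluentbit                 # example log-processing sidecar, placed after
        image: fluent/fluent-bit:2.2.0
```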
@@ -27,7 +27,7 @@ See {ref}`Ray Serve end-to-end fault tolerance documentation <serve-e2e-ft-guide

* Ray 2.0.0+
* KubeRay 0.6.0+
* Redis: single shard, one or multiple replicas
* Redis: single shard Redis Cluster or Redis Sentinel, one or multiple replicas

## Quickstart

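Since this merge targets the `redis-username` branch, the Redis requirement above is usually wired into a RayCluster roughly as follows. This is a sketch of the commonly documented annotation and environment-variable pattern for GCS fault tolerance; the Redis address, secret name, and image tag are assumptions rather than content of this diff.

```yaml
# Sketch of GCS fault tolerance backed by an external Redis (values are assumptions).
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-external-redis
  annotations:
    ray.io/ft-enabled: "true"           # enable GCS fault tolerance
spec:
  headGroupSpec:
    rayStartParams:
      redis-password: "$REDIS_PASSWORD"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            env:
              - name: RAY_REDIS_ADDRESS           # points the GCS at the external Redis
                value: redis:6379
              - name: REDIS_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: redis-password-secret
                    key: password
```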
@@ -1,6 +1,6 @@
(kuberay-logging)=
(persist-kuberay-custom-resource-logs)=

# Log Persistence
# Persist KubeRay custom resource logs

Logs (both system and application logs) are useful for troubleshooting Ray applications and Clusters. For example, you may want to access system logs if a node terminates unexpectedly.
