Latest Docker Image failing for A40 GPU #2763

Open

SMAntony opened this issue Nov 20, 2024 · 0 comments

System Info

While testing the latest TGI Docker image on 2x A40 GPUs, loading Llama-3.1-70B-Instruct with eetq quantization, I ran into a CUDA illegal memory access error.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the Docker container with the following launcher arguments:
    --model-id meta-llama/Llama-3.1-70B-Instruct --quantize eetq --max-total-tokens 5000 --num-shard 2 --max-input-tokens 3600 --max-batch-prefill-tokens 3600 --port 8010
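
For reference, the full docker run invocation was along these lines (the image tag, volume path, and token handling shown here are illustrative; the launcher arguments are exactly the ones above):

    docker run --gpus all --shm-size 1g -p 8010:8010 \
        -v $PWD/data:/data \
        -e HF_TOKEN=<your HF token> \
        ghcr.io/huggingface/text-generation-inference:latest \
        --model-id meta-llama/Llama-3.1-70B-Instruct --quantize eetq \
        --max-total-tokens 5000 --num-shard 2 \
        --max-input-tokens 3600 --max-batch-prefill-tokens 3600 --port 8010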

  2. The model loads and the webserver connects:

2024-11-20T14:37:16.307700574Z     shard_uds_path: "/tmp/text-generation-server",
2024-11-20T14:37:16.307705327Z     master_addr: "localhost",
2024-11-20T14:37:16.307709620Z     master_port: 29500,
2024-11-20T14:37:16.307713867Z     huggingface_hub_cache: None,
2024-11-20T14:37:16.307723983Z     weights_cache_override: None,
2024-11-20T14:37:16.307728183Z     disable_custom_kernels: false,
2024-11-20T14:37:16.307732404Z     cuda_memory_fraction: 1.0,
2024-11-20T14:37:16.307736494Z     rope_scaling: None,
2024-11-20T14:37:16.307740543Z     rope_factor: None,
2024-11-20T14:37:16.307744724Z     json_output: false,
2024-11-20T14:37:16.307750164Z     otlp_endpoint: None,
2024-11-20T14:37:16.307754647Z     otlp_service_name: "text-generation-inference.router",
2024-11-20T14:37:16.307758823Z     cors_allow_origin: [],
2024-11-20T14:37:16.307762890Z     api_key: None,
2024-11-20T14:37:16.307767914Z     watermark_gamma: None,
2024-11-20T14:37:16.307772014Z     watermark_delta: None,
2024-11-20T14:37:16.307776120Z     ngrok: false,
2024-11-20T14:37:16.307780153Z     ngrok_authtoken: None,
2024-11-20T14:37:16.307784313Z     ngrok_edge: None,
2024-11-20T14:37:16.307792724Z     tokenizer_config_path: None,
2024-11-20T14:37:16.307796893Z     disable_grammar_support: false,
2024-11-20T14:37:16.307801247Z     env: false,
2024-11-20T14:37:16.307805717Z     max_client_batch_size: 4,
2024-11-20T14:37:16.307810014Z     lora_adapters: None,
2024-11-20T14:37:16.307814093Z     usage_stats: On,
2024-11-20T14:37:16.307818180Z }
2024-11-20T14:37:16.307822804Z 2024-11-20T14:37:16.307146Z  INFO hf_hub: Token file not found "/data/token"
2024-11-20T14:37:18.096338967Z 2024-11-20T14:37:18.096215Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2024-11-20T14:37:18.096361238Z 2024-11-20T14:37:18.096237Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-20T14:37:18.096367908Z 2024-11-20T14:37:18.096240Z  INFO text_generation_launcher: Sharding model on 2 processes
2024-11-20T14:48:21.626673860Z 2024-11-20T14:48:21.626574Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-11-20T14:48:31.637392292Z 2024-11-20T14:48:31.636998Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-20T14:48:31.637450527Z 2024-11-20T14:48:31.637200Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-11-20T14:48:41.648744837Z 2024-11-20T14:48:41.648407Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-20T14:48:41.648799649Z 2024-11-20T14:48:41.648444Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-11-20T14:48:51.659683331Z 2024-11-20T14:48:51.659423Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-20T14:48:51.659742391Z 2024-11-20T14:48:51.659534Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-11-20T14:49:01.670074897Z 2024-11-20T14:49:01.669799Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-20T14:49:01.670802699Z 2024-11-20T14:49:01.670676Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2024-11-20T14:49:08.917281370Z 2024-11-20T14:49:08.916960Z  INFO text_generation_launcher: Using experimental prefill chunking = False
2024-11-20T14:49:09.885724200Z 2024-11-20T14:49:09.885562Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-11-20T14:49:09.978852158Z 2024-11-20T14:49:09.978651Z  INFO shard-manager: text_generation_launcher: Shard ready in 528.928728549s rank=0
2024-11-20T14:49:10.147624439Z 2024-11-20T14:49:10.147354Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-1
2024-11-20T14:49:10.180444763Z 2024-11-20T14:49:10.180174Z  INFO shard-manager: text_generation_launcher: Shard ready in 529.124012995s rank=1
2024-11-20T14:49:10.189117339Z 2024-11-20T14:49:10.188842Z  INFO text_generation_launcher: Starting Webserver
2024-11-20T14:49:10.253657653Z 2024-11-20T14:49:10.253383Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-11-20T14:49:10.292598176Z 2024-11-20T14:49:10.292416Z  INFO text_generation_launcher: Using optimized Triton indexing kernels.
2024-11-20T14:49:17.103097980Z 2024-11-20T14:49:17.102845Z  INFO text_generation_launcher: KV-cache blocks: 23677, size: 1
2024-11-20T14:49:17.160879057Z 2024-11-20T14:49:17.160595Z  INFO text_generation_launcher: Cuda Graphs are enabled for sizes [32, 16, 8, 4, 2, 1]
2024-11-20T14:49:19.251790998Z 2024-11-20T14:49:19.251316Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:137: Setting max batch total tokens to 23677
2024-11-20T14:49:19.251833781Z 2024-11-20T14:49:19.251381Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:166: Using backend V3
2024-11-20T14:49:19.251840354Z 2024-11-20T14:49:19.251425Z  INFO text_generation_router::server: router/src/server.rs:1730: Using the Hugging Face API
2024-11-20T14:49:19.251845201Z 2024-11-20T14:49:19.251471Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/data/token"
2024-11-20T14:49:19.417317634Z 2024-11-20T14:49:19.416925Z  INFO text_generation_router::server: router/src/server.rs:2427: Serving revision 945c8663693130f8be2ee66210e062158b2a9693 of model meta-llama/Llama-3.1-70B-Instruct
2024-11-20T14:49:23.180377525Z 2024-11-20T14:49:23.179916Z  INFO text_generation_router::server: router/src/server.rs:1863: Using config Some(Llama)
2024-11-20T14:49:23.411177365Z 2024-11-20T14:49:23.410741Z  WARN text_generation_router::server: router/src/server.rs:2003: Invalid hostname, defaulting to 0.0.0.0
2024-11-20T14:49:23.512079371Z 2024-11-20T14:49:23.511694Z  INFO text_generation_router::server: router/src/server.rs:2389: Connected
  3. Hit the webserver (as simple as visiting the URL) and it results in a CUDA illegal memory access error.
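
Even a trivial request is enough; something along these lines (port matching the --port flag above, prompt arbitrary):

    curl http://localhost:8010/generate \
        -X POST \
        -H 'Content-Type: application/json' \
        -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 16}}'

Note the trace below already shows the failure surfacing in the router's health-check prefill, so any request path appears to hit it. Error output:
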
2024-11-20T14:50:03.072008375Z 2024-11-20T14:50:03.071512Z ERROR health:health:prefill{id=18446744073709551615 size=1}:prefill{id=18446744073709551615 size=1}: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
2024-11-20T14:50:03.241251620Z 2024-11-20T14:50:03.240830Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
2024-11-20T14:50:03.241283441Z 2024-11-20 14:40:22.745 | INFO     | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
2024-11-20T14:50:03.241298591Z /opt/conda/lib/python3.11/site-packages/text_generation_server/layers/gptq/triton.py:242: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.241301894Z   @custom_fwd(cast_inputs=torch.float16)
2024-11-20T14:50:03.241304333Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.241306234Z   @custom_fwd
2024-11-20T14:50:03.241307682Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.241309522Z   @custom_bwd
2024-11-20T14:50:03.241310908Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.241312898Z   @custom_fwd
2024-11-20T14:50:03.241314325Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.241316296Z   @custom_bwd
2024-11-20T14:50:03.241317659Z [rank0]:[E1120 14:50:01.589782869 ProcessGroupNCCL.cpp:1515] [PG 0 (default_pg) Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
2024-11-20T14:50:03.241319896Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-11-20T14:50:03.241321391Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2024-11-20T14:50:03.241323663Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-11-20T14:50:03.241328079Z Exception raised from c10_cuda_check_implementation at /opt/conda/conda-bld/pytorch_1720538435607/work/c10/cuda/CUDAException.cpp:43 (most recent call first):
2024-11-20T14:50:03.241329731Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7370874abf86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.241331282Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x73708745ad10 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.241333711Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x737087587f08 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10_cuda.so)
2024-11-20T14:50:03.241335322Z frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7370377eabc6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241338142Z frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7370377efde0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241339673Z frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7370377f6a9a in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241341215Z frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7370377f8edc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241342555Z frame #7: <unknown function> + 0xd3b75 (0x7370909e0b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.241344806Z frame #8: <unknown function> + 0x94ac3 (0x737090b84ac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241346206Z frame #9: clone + 0x44 (0x737090c15a04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241349128Z terminate called after throwing an instance of 'c10::DistBackendError'
2024-11-20T14:50:03.241353537Z   what():  [PG 0 (default_pg) Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
2024-11-20T14:50:03.241354976Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-11-20T14:50:03.241356503Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2024-11-20T14:50:03.241357908Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-11-20T14:50:03.241360968Z Exception raised from c10_cuda_check_implementation at /opt/conda/conda-bld/pytorch_1720538435607/work/c10/cuda/CUDAException.cpp:43 (most recent call first):
2024-11-20T14:50:03.241362359Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7370874abf86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.241363736Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x73708745ad10 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.241365139Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x737087587f08 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10_cuda.so)
2024-11-20T14:50:03.241366487Z frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7370377eabc6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241368087Z frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7370377efde0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241369440Z frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7370377f6a9a in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241370839Z frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7370377f8edc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241372192Z frame #7: <unknown function> + 0xd3b75 (0x7370909e0b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.241373566Z frame #8: <unknown function> + 0x94ac3 (0x737090b84ac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241375177Z frame #9: clone + 0x44 (0x737090c15a04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241378091Z Exception raised from ncclCommWatchdog at /opt/conda/conda-bld/pytorch_1720538435607/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first):
2024-11-20T14:50:03.241379629Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7370874abf86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.241381126Z frame #1: <unknown function> + 0xe3ec34 (0x737037478c34 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.241382696Z frame #2: <unknown function> + 0xd3b75 (0x7370909e0b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.241384508Z frame #3: <unknown function> + 0x94ac3 (0x737090b84ac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241385886Z frame #4: clone + 0x44 (0x737090c15a04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.241387275Z  rank=0
2024-11-20T14:50:03.241389538Z 2024-11-20T14:50:03.240890Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=0
2024-11-20T14:50:03.252138023Z 2024-11-20T14:50:03.251958Z ERROR text_generation_launcher: Shard 0 crashed
2024-11-20T14:50:03.252164406Z 2024-11-20T14:50:03.251982Z  INFO text_generation_launcher: Terminating webserver
2024-11-20T14:50:03.252167387Z 2024-11-20T14:50:03.252001Z  INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-11-20T14:50:03.252576404Z 2024-11-20T14:50:03.252296Z  INFO text_generation_router::server: router/src/server.rs:2481: signal received, starting graceful shutdown
2024-11-20T14:50:03.391542207Z 2024-11-20T14:50:03.391170Z ERROR health:health:prefill{id=18446744073709551615 size=1}:prefill{id=18446744073709551615 size=1}: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
2024-11-20T14:50:03.643555814Z 2024-11-20T14:50:03.643097Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
2024-11-20T14:50:03.643620480Z 2024-11-20 14:40:22.733 | INFO     | text_generation_server.utils.import_utils:<module>:80 - Detected system cuda
2024-11-20T14:50:03.643626729Z /opt/conda/lib/python3.11/site-packages/text_generation_server/layers/gptq/triton.py:242: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.643632980Z   @custom_fwd(cast_inputs=torch.float16)
2024-11-20T14:50:03.643638110Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.643642966Z   @custom_fwd
2024-11-20T14:50:03.643648639Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.643653913Z   @custom_bwd
2024-11-20T14:50:03.643658580Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.643664583Z   @custom_fwd
2024-11-20T14:50:03.643669328Z /opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
2024-11-20T14:50:03.643673846Z   @custom_bwd
2024-11-20T14:50:03.643678130Z [rank1]:[E1120 14:50:01.584143246 ProcessGroupNCCL.cpp:1515] [PG 0 (default_pg) Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
2024-11-20T14:50:03.643683080Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-11-20T14:50:03.643687450Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2024-11-20T14:50:03.643692308Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-11-20T14:50:03.643704132Z Exception raised from c10_cuda_check_implementation at /opt/conda/conda-bld/pytorch_1720538435607/work/c10/cuda/CUDAException.cpp:43 (most recent call first):
2024-11-20T14:50:03.643708832Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x73bbc83b0f86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.643713345Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x73bbc835fd10 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.643718212Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x73bc1928ff08 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10_cuda.so)
2024-11-20T14:50:03.643722615Z frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x73bbc95eabc6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643727805Z frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x73bbc95efde0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643756878Z frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x73bbc95f6a9a in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643761742Z frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x73bbc95f8edc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643766588Z frame #7: <unknown function> + 0xd3b75 (0x73bc226c7b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.643784819Z frame #8: <unknown function> + 0x94ac3 (0x73bc2286bac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643789509Z frame #9: clone + 0x44 (0x73bc228fca04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643797982Z terminate called after throwing an instance of 'c10::DistBackendError'
2024-11-20T14:50:03.643802349Z   what():  [PG 0 (default_pg) Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
2024-11-20T14:50:03.643806939Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-11-20T14:50:03.643811385Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2024-11-20T14:50:03.643815842Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-11-20T14:50:03.643833622Z Exception raised from c10_cuda_check_implementation at /opt/conda/conda-bld/pytorch_1720538435607/work/c10/cuda/CUDAException.cpp:43 (most recent call first):
2024-11-20T14:50:03.643838087Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x73bbc83b0f86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.643842531Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x73bbc835fd10 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.643847179Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x73bc1928ff08 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10_cuda.so)
2024-11-20T14:50:03.643851539Z frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x73bbc95eabc6 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643855971Z frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x73bbc95efde0 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643860431Z frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x73bbc95f6a9a in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643864747Z frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x73bbc95f8edc in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643869141Z frame #7: <unknown function> + 0xd3b75 (0x73bc226c7b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.643873694Z frame #8: <unknown function> + 0x94ac3 (0x73bc2286bac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643877987Z frame #9: clone + 0x44 (0x73bc228fca04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643886361Z Exception raised from ncclCommWatchdog at /opt/conda/conda-bld/pytorch_1720538435607/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first):
2024-11-20T14:50:03.643890841Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x73bbc83b0f86 in /opt/conda/lib/python3.11/site-packages/torch/lib/libc10.so)
2024-11-20T14:50:03.643895274Z frame #1: <unknown function> + 0xe3ec34 (0x73bbc9278c34 in /opt/conda/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
2024-11-20T14:50:03.643899824Z frame #2: <unknown function> + 0xd3b75 (0x73bc226c7b75 in /opt/conda/bin/../lib/libstdc++.so.6)
2024-11-20T14:50:03.643904908Z frame #3: <unknown function> + 0x94ac3 (0x73bc2286bac3 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643925931Z frame #4: clone + 0x44 (0x73bc228fca04 in /lib/x86_64-linux-gnu/libc.so.6)
2024-11-20T14:50:03.643930477Z  rank=1
2024-11-20T14:50:03.643935968Z 2024-11-20T14:50:03.643172Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 6 rank=1
2024-11-20T14:50:03.852901332Z 2024-11-20T14:50:03.852675Z  INFO text_generation_launcher: webserver terminated
2024-11-20T14:50:03.852928944Z 2024-11-20T14:50:03.852707Z  INFO text_generation_launcher: Shutting down shards
2024-11-20T14:50:03.852955241Z Error: ShardFailed
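
As the trace itself suggests, a re-run with CUDA_LAUNCH_BLOCKING=1 may give a more precise stack; with Docker that would just be an extra environment flag on the command from step 1, e.g. (sketch; the logs above were produced without it):

    docker run --gpus all --shm-size 1g -p 8010:8010 -e CUDA_LAUNCH_BLOCKING=1 \
        ghcr.io/huggingface/text-generation-inference:latest \
        --model-id meta-llama/Llama-3.1-70B-Instruct --quantize eetq --max-total-tokens 5000 \
        --num-shard 2 --max-input-tokens 3600 --max-batch-prefill-tokens 3600 --port 8010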

Expected behavior

Expected the webserver URL to be reachable and inference to run properly.
