How much GPU memory is needed to run the example? #70

fangming-he · 2023-11-26T03:49:39Z

How much CUDA memory is required to run the example?

While running the example with the command "CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming",
the following error appears:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacty of 7.92 GiB of which 131.69 MiB is free. Including non-PyTorch memory, this process has 7.79 GiB memory in use. Of the allocated memory 7.03 GiB is allocated by PyTorch, and 131.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
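
For context, a rough back-of-the-envelope estimate of the memory taken by model weights alone. This is only a sketch: it assumes a 7B- or 13B-parameter model loaded in float16 (2 bytes per parameter), which may not match what the example script actually loads, and `weight_memory_gib` is a hypothetical helper, not part of the repository. Activations, the KV cache, and CUDA overhead come on top of this.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GiB.

    bytes_per_param=2 assumes float16/bfloat16 weights (an assumption).
    """
    return num_params * bytes_per_param / 1024**3


# Assumed model sizes; the parameter counts are nominal (7e9, 13e9).
for name, params in [("7B", 7e9), ("13B", 13e9)]:
    print(f"{name} in float16: ~{weight_memory_gib(params):.1f} GiB of weights")
```

Under these assumptions, even a 7B model needs about 13 GiB for weights, which is already more than the 7.92 GiB total reported in the error above.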

meganmou · 2024-03-02T22:24:12Z

Did you end up finding out the answer to this? I ran into the same issue with a 16 GB GPU trying to run on a GCP VM instance.

scatyf3 · 2024-03-08T06:38:20Z

I ran streamingLLM on an A100 (40GB), using Llama-2-13b and Aquila2-7B, but both ran out of memory :( I don't know what I did wrong.

fangming-he · 2024-03-08T06:45:13Z

Did you pass --enable_streaming?
With --enable_streaming and 32 GB of GPU memory, it should run fine.
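
To illustrate why streaming matters here, a minimal sketch of how the key/value cache grows with the number of cached tokens. It assumes Llama-2-13b's published shape (40 layers, hidden size 5120, standard multi-head attention) and float16 cache entries. `kv_cache_gib` is a hypothetical helper, and the 2004-token window (4 initial tokens plus 2000 recent ones) is an assumed setting for illustration, not something confirmed in this thread.

```python
def kv_cache_gib(
    num_tokens: int,
    num_layers: int = 40,      # assumed: Llama-2-13b
    hidden_size: int = 5120,   # assumed: Llama-2-13b
    bytes_per_value: int = 2,  # assumed: float16 cache
) -> float:
    """Approximate KV cache size in GiB for a single sequence.

    Each layer stores one key and one value vector of hidden_size
    elements per cached token, hence the factor of 2.
    """
    total_bytes = 2 * num_layers * hidden_size * bytes_per_value * num_tokens
    return total_bytes / 1024**3


# Bounded cache (assumed window of 4 + 2000 tokens) vs. an unbounded
# cache that has grown to 32,000 tokens.
print(f"windowed cache:  ~{kv_cache_gib(2004):.2f} GiB")
print(f"32k-token cache: ~{kv_cache_gib(32000):.2f} GiB")
```

Under these assumptions the bounded cache stays around 1.5 GiB, while an unbounded cache at 32,000 tokens would need roughly 24 GiB on top of the weights, which would be enough to exhaust a 40 GB card together with a 13B model in float16.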
