context-free grammars example does not work with vLLM integration #1233
Comments
I actually ran into a similar issue when trying to add CFG support to https://github.com/huggingface/text-generation-inference. Same error message with the same code path in the last 3 function calls (see trace below). Any hints would be appreciated.

2024-10-31T06:56:33.558057Z ERROR text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module> sys.exit(app())
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 311, in __call__ return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in __call__ return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 778, in main return _main(
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 216, in _main rv = self.invoke(ctx)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1688, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 116, in serve server.serve(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve asyncio.run(
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main)
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.11/site-packages/text_generation_server/interceptor.py", line 24, in intercept
return await response
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 218, in Decode
generations, next_batch, timings = self.model.generate_token(batch)
File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1968, in generate_token
) = batch.next_token_chooser(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/tokens.py", line 364, in __call__
_scores = self.grammar_processor(_scores, self.fsm_grammar_states)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/logits_process.py", line 597, in __call__
allowed_tokens = fsm.get_next_instruction(fsm_grammar_states[i]).tokens
File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 154, in get_next_instruction
valid_tokens = list(
File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 189, in iter_valid_token_ids
self._get_parser_state_token_applied(state, int(token_id))
File "/opt/conda/lib/python3.11/site-packages/outlines/fsm/guide.py", line 241, in _get_parser_state_token_applied
prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]
File "/opt/conda/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3999, in decode
return self._decode(
File "/opt/conda/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 654, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
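For reference, the failing call at the bottom of that trace can be reproduced in isolation with any fast transformers tokenizer. A minimal sketch (the model name is only an illustrative choice):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any fast tokenizer will do

ids = tok.encode("hello")  # a flat list of ints, e.g. [31373]
tok.decode(ids)            # flat list: decodes fine, returns "hello"
tok.decode([ids])          # nested list: raises a TypeError like the one above,
                           # "'list' object cannot be interpreted as an integer"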
It's hard to know if it's an issue on their end or ours. Running the same in
I found the same issue. I think it goes wrong because of this line (outlines/fsm/guide.py:241): the tokenizer does not expect a 2D list. Changing it to pass a flat list instead (see the sketch below) fixes it for me, but I stumble upon another issue after that (could be unrelated).
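The line in question is visible in the traceback earlier in this thread; the change described above is presumably along these lines (a guess based on the 2D-list observation, not a verified patch):

# outlines/fsm/guide.py:241 as it appears in the traceback above
prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]

# presumed change: pass a flat list of ids, which is what a plain
# transformers tokenizer's decode() expects; whether the trailing [0]
# is still wanted depends on whether this tokenizer's decode() returns
# a string or a list of strings
prev_token_str = self.tokenizer.decode([state.prev_token])[0]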
Hi everyone,

Error Message
Code to Reproduce

from vllm import LLM, SamplingParams
llm = LLM(
"neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8",
enable_prefix_caching=True,
block_size=64,
max_num_batched_tokens=15000,
gpu_memory_utilization=0.96,
max_model_len=15000,
use_v2_block_manager=True,
)
arithmetic_grammar = """
?start: expression
?expression: term (("+" | "-") term)*
?term: factor (("*" | "/") factor)*
?factor: NUMBER
| "-" factor
| "(" expression ")"
%import common.NUMBER
"""
from outlines import generate, models
model = models.VLLM(llm)
generator = generate.cfg(model, arithmetic_grammar)
sampling_params = SamplingParams(temperature=0.3, top_p=0.2, max_tokens=20)
sequence = generator(
"Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:",
sampling_params=sampling_params,
)

Expected Behavior

I expected the code to generate a sequence based on the defined grammar using the generate.cfg generator.

Actual Behavior

The code throws a TypeError instead.

Environment

Additional Context

Is the CFG Logits processor not yet supported for vLLM? Thank you!
Describe the issue as clearly as possible:
When running the provided arithmetic grammar example with vLLM, I get the error TypeError: Error in model execution: argument 'ids': 'list' object cannot be interpreted as an integer. I presume this comes from de-tokenization, but I am still not sure how to fix it. Any suggestions would be welcome, as we have used outlines with vLLM successfully on a number of other use cases and really like the tool!

Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Version information
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
absl-py==1.0.0
accelerate==0.31.0
aiohttp==3.8.5
aiohttp-cors==0.7.0
aiosignal==1.2.0
airportsdata==20241001
annotated-types==0.7.0
anyio==3.5.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
astor==0.8.1
asttokens==2.0.5
astunparse==1.6.3
async-timeout==4.0.2
attrs==24.2.0
audioread==3.0.1
azure-core==1.30.2
azure-cosmos==4.3.1
azure-identity==1.17.1
azure-storage-blob==12.19.1
azure-storage-file-datalake==12.14.0
backcall==0.2.0
bcrypt==3.2.0
beautifulsoup4==4.12.2
black==23.3.0
bleach==4.1.0
blinker==1.4
blis==0.7.11
boto3==1.34.39
botocore==1.34.39
Brotli==1.0.9
cachetools==5.4.0
catalogue==2.0.10
category-encoders==2.6.3
certifi==2023.7.22
cffi==1.15.1
chardet==4.0.0
charset-normalizer==2.0.4
circuitbreaker==1.4.0
click==8.0.4
cloudpathlib==0.16.0
cloudpickle==2.2.1
cmdstanpy==1.2.2
colorful==0.5.6
comm==0.1.2
confection==0.1.4
configparser==5.2.0
contourpy==1.0.5
cryptography==41.0.3
cycler==0.11.0
cymem==2.0.8
Cython==0.29.32
dacite==1.8.1
databricks-automl-runtime==0.2.21
databricks-feature-engineering==0.6.0
databricks-sdk==0.20.0
dataclasses-json==0.6.7
datasets==2.19.1
dbl-tempo==0.1.26
dbus-python==1.2.18
debugpy==1.6.7
decorator==5.1.1
deepspeed==0.14.4
defusedxml==0.7.1
Deprecated==1.2.14
dill==0.3.6
diskcache==5.6.3
distlib==0.3.8
distro==1.7.0
distro-info==1.1+ubuntu0.2
dm-tree==0.1.8
einops==0.8.0
entrypoints==0.4
evaluate==0.4.2
executing==0.8.3
facets-overview==1.1.1
Farama-Notifications==0.0.4
fastapi==0.115.4
fastjsonschema==2.20.0
fasttext==0.9.2
filelock==3.13.4
flash-attn==2.5.9.post1
Flask==2.2.5
flatbuffers==24.3.25
fonttools==4.25.0
frozenlist==1.3.3
fsspec==2023.5.0
future==0.18.3
gast==0.4.0
gguf==0.10.0
gitdb==4.0.11
GitPython==3.1.27
google-api-core==2.18.0
google-auth==2.21.0
google-auth-oauthlib==1.0.0
google-cloud-core==2.4.1
google-cloud-storage==2.10.0
google-crc32c==1.5.0
google-pasta==0.2.0
google-resumable-media==2.7.1
googleapis-common-protos==1.63.0
greenlet==2.0.1
grpcio==1.60.0
grpcio-status==1.60.0
gunicorn==20.1.0
gviz-api==1.10.0
gymnasium==0.28.1
h11==0.14.0
h5py==3.10.0
hjson==3.1.0
holidays==0.45
horovod @ git+https://github.com/wenfeiy-db/horovod.git@d510b1d385628f8ac5770199c0824fd5b7e01394
htmlmin==0.1.12
httpcore==1.0.5
httplib2==0.20.2
httptools==0.6.4
httpx==0.27.0
huggingface-hub==0.23.4
idna==3.4
ImageHash==4.3.1
imageio==2.31.1
imbalanced-learn==0.11.0
importlib-metadata==6.0.0
importlib_resources==6.4.0
interegular==0.3.3
ipyflow-core==0.0.198
ipykernel==6.25.1
ipython==8.15.0
ipython-genutils==0.2.0
ipywidgets @ https://databricks-build-artifacts-manual-staging.s3-accelerate.amazonaws.com/ipywidgets/ipywidgets-7.7.2-2databricksnojsdeps-py2.py3-none-any.whl?AWSAccessKeyId=AKIAX7HWM34HCSVHYQ7M&Expires=2028837235&Signature=gJ%2BjzENPoM6UKsDxe1M3VIrgWco%3D#sha256=903ead20c8d40de671853515fcad2f34b43ebf3eff80e4df3f876b8dd64c903b
isodate==0.6.1
itsdangerous==2.0.1
jax-jumpy==1.0.0
jedi==0.18.1
jeepney==0.7.1
Jinja2==3.1.2
jiter==0.6.1
jmespath==0.10.0
joblib==1.2.0
joblibspark==0.5.1
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-server==1.23.4
jupyter_client==7.4.9
jupyter_core==5.3.0
jupyterlab-pygments==0.1.2
keras==3.2.1
keyring==23.5.0
kiwisolver==1.4.4
langchain==0.1.20
langchain-community==0.0.38
langchain-core==0.1.52
langchain-text-splitters==0.0.2
langcodes==3.4.0
langsmith==0.1.63
language_data==1.2.0
lark==1.2.2
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy_loader==0.2
libclang==15.0.6.1
librosa==0.10.1
lightgbm==4.3.0
linkify-it-py==2.0.0
llvmlite==0.43.0
lm-format-enforcer==0.10.6
lxml==4.9.2
lz4==4.3.2
Mako==1.2.0
marisa-trie==1.1.1
Markdown==3.4.1
markdown-it-py==2.2.0
MarkupSafe==2.1.1
marshmallow==3.21.2
matplotlib==3.7.2
matplotlib-inline==0.1.6
mdit-py-plugins==0.3.0
mdurl==0.1.0
memray==1.13.4
mistral_common==1.4.4
mistune==0.8.4
ml-dtypes==0.3.2
mlflow-skinny==2.15.1
more-itertools==8.10.0
mosaicml-streaming==0.7.4
mpmath==1.3.0
msal==1.30.0
msal-extensions==1.2.0
msgpack==1.0.8
msgspec==0.18.6
multidict==6.0.2
multimethod==1.12
multiprocess==0.70.14
murmurhash==1.0.10
mypy-extensions==0.4.3
namex==0.0.8
nbclassic==0.5.5
nbclient==0.5.13
nbconvert==6.5.4
nbformat==5.7.0
nest-asyncio==1.5.6
networkx==3.1
ninja==1.11.1.1
nltk==3.8.1
notebook==6.5.4
notebook_shim==0.2.2
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.0
oci==2.126.4
openai==1.52.2
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.10.0.84
opentelemetry-api==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opt-einsum==3.3.0
optree==0.12.1
orjson==3.10.6
outlines==0.1.1
outlines_core==0.1.14
packaging==23.2
pandas==1.5.3
pandocfilters==1.5.0
paramiko==3.4.0
parso==0.8.3
partial-json-parser==0.2.1.1.post4
pathspec==0.10.3
patsy==0.5.3
petastorm==0.12.1
pexpect==4.8.0
phik==0.12.4
pickleshare==0.7.5
pillow==10.4.0
platformdirs==3.10.0
plotly==5.9.0
pmdarima==2.0.4
pooch==1.8.1
portalocker==2.10.1
preshed==3.0.9
prometheus-fastapi-instrumentator==7.0.0
prometheus_client==0.21.0
prompt-toolkit==3.0.36
prophet==1.1.5
proto-plus==1.24.0
protobuf==4.24.1
psutil==5.9.0
psycopg2==2.9.3
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==8.0.0
py-spy==0.3.14
pyairports==2.1.1
pyarrow==14.0.1
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.13.1
pyccolo==0.0.52
pycountry==24.6.1
pycparser==2.21
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.15.1
PyGObject==3.42.1
PyJWT==2.3.0
PyNaCl==1.5.0
pyodbc==4.0.38
pyOpenSSL==23.2.0
pyparsing==3.0.9
pyrsistent==0.18.0
pytesseract==0.3.10
python-apt==2.4.0+ubuntu3
python-dateutil==2.8.2
python-dotenv==1.0.1
python-editor==1.0.4
python-lsp-jsonrpc==1.1.1
python-snappy==0.6.1
pytz==2022.7
PyWavelets==1.4.1
PyYAML==6.0
pyzmq==23.2.0
ray==2.35.0
referencing==0.35.1
regex==2022.7.9
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.7.1
rpds-py==0.20.0
rsa==4.9
s3transfer==0.10.2
safetensors==0.4.2
scikit-image==0.20.0
scikit-learn==1.3.0
scipy==1.11.1
seaborn==0.12.2
SecretStorage==3.3.1
Send2Trash==1.8.0
sentence-transformers==2.7.0
sentencepiece==0.2.0
shap==0.44.0
simplejson==3.17.6
six==1.16.0
slicer==0.0.7
smart-open==5.2.1
smmap==5.0.0
sniffio==1.2.0
soundfile==0.12.1
soupsieve==2.4
soxr==0.3.7
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
spark-tensorflow-distributor==1.0.0
SQLAlchemy==1.4.39
sqlparse==0.4.2
srsly==2.4.8
ssh-import-id==5.11
stack-data==0.2.0
stanio==0.5.1
starlette==0.41.2
statsmodels==0.14.0
sympy==1.11.1
tangled-up-in-unicode==0.2.0
tenacity==8.2.2
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorboard_plugin_profile==2.15.1
tensorboardX==2.6.2.2
tensorflow==2.16.1
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.37.1
termcolor==2.4.0
terminado==0.17.1
textual==0.63.3
tf_keras==2.16.0
thinc==8.2.3
threadpoolctl==2.2.0
tifffile==2021.7.2
tiktoken==0.7.0
tinycss2==1.2.1
tokenize-rt==4.2.1
tokenizers==0.20.1
torch==2.4.0
torcheval==0.0.7
torchvision==0.19.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.7.1
transformers==4.46.0
triton==3.0.0
typeguard==2.13.3
typer==0.9.4
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2022.1
uc-micro-py==1.0.1
ujson==5.4.0
unattended-upgrades==0.1
urllib3==1.26.16
uvicorn==0.32.0
uvloop==0.21.0
virtualenv==20.24.2
visions==0.7.5
vllm==0.6.3
wadllib==1.3.6
wasabi==1.1.2
watchfiles==0.24.0
wcwidth==0.2.5
weasel==0.3.4
webencodings==0.5.1
websocket-client==0.58.0
websockets==13.1
Werkzeug==2.2.3
wordcloud==1.9.3
wrapt==1.14.1
xformers==0.0.27.post2
xgboost==2.0.3
xxhash==3.4.1
yarl==1.8.1
ydata-profiling==4.5.1
zipp==3.11.0
zstd==1.5.5.1