Batch size errors with chromadb #126

Closed
caufieldjh opened this issue Jan 31, 2025 · 2 comments

Comments

@caufieldjh
Member

@kevinschaper reports:

They tried to run curategpt ontology index -c terms_mp sqlite:obo:mp and got:

  File "/Users/kschaper/Monarch/include/emods/.venv/lib/python3.10/site-packages/chromadb/api/types.py", line 521, in validate_batch
    raise ValueError(
ValueError: Batch size 34394 exceeds maximum batch size 5461

Is this a solved problem? I thought I might as well try chromadb locally as a first attempt.

@justaddcoffee tried it too:

FWIW, I’m seeing this too when I run the above command (Python 3.11.9, curategpt==0.2.1)

(.venv) ~/curategpt $ curategpt ontology index -c terms_mp sqlite:obo:mp
Indexing 34394 objects
Traceback (most recent call last):
  File "/Users/jtr4v/curategpt/.venv/bin/curategpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/curategpt/cli.py", line 2346, in index_ontology_command
    db.insert(view.objects(), collection=collection, model=model)
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/curategpt/store/chromadb_adapter.py", line 141, in insert
    self._insert_or_update(objs, method_name="add", **kwargs)
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/curategpt/store/chromadb_adapter.py", line 209, in _insert_or_update
    method(
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 168, in add
    self._client._add(ids, self.id, embeddings, metadatas, documents, uris)
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/chromadb/rate_limiting/__init__.py", line 45, in wrapper
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/chromadb/api/segment.py", line 373, in _add
    validate_batch(
  File "/Users/jtr4v/curategpt/.venv/lib/python3.11/site-packages/chromadb/api/types.py", line 521, in validate_batch
    raise ValueError(
ValueError: Batch size 34394 exceeds maximum batch size 5461
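
The two numbers in the error message give a rough idea of how many separate insert calls would be needed. The following is a minimal arithmetic sketch; `batches_needed` is a hypothetical helper name, not part of curategpt or chromadb, and the limit of 5461 is simply the value reported in the traceback above (it may differ on other systems).

```python
import math


def batches_needed(n_objects: int, max_batch_size: int) -> int:
    """Return the minimum number of batches so none exceeds max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return math.ceil(n_objects / max_batch_size)


# Values taken from the traceback: 34394 objects, limit of 5461 per batch.
at_limit = batches_needed(34394, 5461)
# A more conservative batch size, well under the reported limit.
at_1000 = batches_needed(34394, 1000)
print(at_limit, at_1000)  # prints: 7 35
```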
@iQuxLE
Member

iQuxLE commented Jan 31, 2025

@kevinschaper @justaddcoffee @caufieldjh
Did you try a different batch size?

--batch-size 1000

It seems to be trying to insert all objects in a single batch.

EDIT:
There is currently no batch size option for this command. I'll make a PR.
In the meantime, try indexing with -m openai:
Or, locally, change this line to batch_size = 1000 for now (if you want to use the default sentence-transformer model).
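
The general idea behind the workaround, splitting the objects into fixed-size chunks before each insert call, can be sketched as follows. This is an illustration only, not the curategpt adapter code or the eventual fix: `batched`, `insert_in_batches`, and `FakeCollection` are hypothetical names, and `FakeCollection` is a stand-in that only mimics the size check seen in the traceback so the example runs without chromadb installed.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class FakeCollection:
    """Hypothetical stand-in for a collection that rejects oversized batches."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.ids: List[str] = []

    def add(self, ids: List[str], documents: List[str]) -> None:
        if len(ids) > self.max_batch_size:
            raise ValueError(
                f"Batch size {len(ids)} exceeds maximum batch size "
                f"{self.max_batch_size}"
            )
        self.ids.extend(ids)


def insert_in_batches(
    collection: FakeCollection,
    objects: Iterable[Dict[str, str]],
    batch_size: int = 1000,
) -> int:
    """Insert objects in chunks of batch_size; return the number of add calls."""
    calls = 0
    for batch in batched(objects, batch_size):
        collection.add(
            ids=[obj["id"] for obj in batch],
            documents=[obj["text"] for obj in batch],
        )
        calls += 1
    return calls


# 34394 placeholder objects, matching the count in the traceback.
objects = ({"id": f"obj-{i}", "text": f"term {i}"} for i in range(34394))
collection = FakeCollection(max_batch_size=5461)
n_calls = insert_in_batches(collection, objects, batch_size=1000)
print(n_calls, len(collection.ids))  # prints: 35 34394
```

Because `batched` consumes a plain iterator, the objects never need to be materialized as one large list, which is convenient when the source is a generator.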

iQuxLE added a commit to iQuxLE/curate-gpt that referenced this issue Feb 6, 2025
caufieldjh added a commit that referenced this issue Feb 7, 2025
fix batch error described in #126
@caufieldjh
Member Author

Should be fixed by #127
