Patch/fix import bleed (#1527)

* Feature/tweak actions (#1507)

* up

* tweak actions

* Sync JS SDK, Harmonize Python SDK KG Methods (#1511)

* Feature/move logging (#1492)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

* Feature/dev rebased (#1500)

* Feature/move logging (#1493)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* Update js package (#1498)

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Fix handling for R2R exceptions (#1501)

* fix doc test (#1502)

* Harmonize python SDK KG methods for optional params, add missing JS methods

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>

* Clean up pagination and offset around KG (#1519)

* Move to R2R light for integration testing (#1521)

* fix ollama pdf parser

---------

Co-authored-by: Nolan Tremelling <[email protected]>
emrgnt-cmplxty and NolanTrem authored Oct 30, 2024
1 parent 96e2367 commit 680c327
Showing 3 changed files with 18 additions and 14 deletions.
7 changes: 7 additions & 0 deletions py/core/configs/local_llm.toml
@@ -27,3 +27,10 @@ concurrent_request_limit = 2
 
 [orchestration]
 provider = "simple"
+
+
+[ingestion]
+vision_img_model = "ollama/llama3.2-vision"
+vision_pdf_model = "ollama/llama3.2-vision"
+[ingestion.extra_parsers]
+pdf = "basic"
24 changes: 11 additions & 13 deletions py/core/providers/ingestion/r2r/base.py
@@ -202,23 +202,21 @@ async def parse(  # type: ignore
         else:
             t0 = time.time()
             contents = ""
-            parser_overrides = ingestion_config_override.get(
-                "parser_overrides", {}
+
+            def check_vlm(model_name: str) -> bool:
+                return "gpt-4o" in model_name
+
+            is_not_vlm = not check_vlm(
+                ingestion_config_override.get("vision_pdf_model")
+                or self.config.vision_pdf_model
             )
-            if document.document_type.value in parser_overrides:
+
+            if document.document_type == DocumentType.PDF and is_not_vlm:
                 logger.info(
-                    f"Using parser_override for {document.document_type} with input value {parser_overrides[document.document_type.value]}"
+                    f"Reverting to basic PDF parser as the provided is not a proper VLM model."
                 )
-                # TODO - Cleanup this approach to be less hardcoded
-                if (
-                    document.document_type != DocumentType.PDF
-                    or parser_overrides[DocumentType.PDF.value] != "zerox"
-                ):
-                    raise ValueError(
-                        "Only Zerox PDF parser override is available."
-                    )
                 async for text in self.parsers[
-                    f"zerox_{DocumentType.PDF.value}"
+                    f"basic_{DocumentType.PDF.value}"
                 ].ingest(file_content, **ingestion_config_override):
                     contents += text + "\n"
             else:
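The behavioural change in this hunk is that the hardcoded parser_overrides/Zerox check is replaced by a VLM check: if the configured vision_pdf_model does not look like a VLM (the committed test is simply whether the name contains "gpt-4o"), the provider falls back to the basic PDF parser. A standalone sketch of that decision follows; the parser keys returned here are illustrative stand-ins, not R2R's actual parser registry entries.

# Standalone sketch of the fallback logic above.

def check_vlm(model_name: str) -> bool:
    # Mirrors the committed check: a simple substring test for "gpt-4o".
    return "gpt-4o" in model_name


def select_pdf_parser(vision_pdf_model: str) -> str:
    # Non-VLM models revert to the basic parser, matching f"basic_{DocumentType.PDF.value}".
    if check_vlm(vision_pdf_model):
        return "vlm_pdf"    # illustrative key for the VLM-backed path
    return "basic_pdf"      # illustrative key for the basic-parser fallback


print(select_pdf_parser("ollama/llama3.2-vision"))  # -> basic_pdf
print(select_pdf_parser("gpt-4o-mini"))             # -> vlm_pdf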
1 change: 0 additions & 1 deletion py/r2r.toml
@@ -15,7 +15,6 @@ require_email_verification = true
 default_admin_email = "[email protected]"
 default_admin_password = "change_me_immediately"
 
-
 [completion]
 provider = "litellm"
 concurrent_request_limit = 256
