-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(document-search): add support for images #121
Conversation
Code Coverage Summary
Diff against main
Results for commit: c5ca279 Minimum allowed coverage is ♻️ This comment has been updated with latest results |
Trivy scanning results. .venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets)Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0) MEDIUM: JWT (jwt-token) .venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt (secrets)Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1) CRITICAL: HuggingFace (hugging-face-access-token) .venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets)Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0) MEDIUM: Slack (slack-web-hook) |
|
||
from pydantic import BaseModel | ||
from typing_extensions import TypeVar | ||
|
||
ChatFormat = List[Dict[str, str]] | ||
ChatFormat = List[Dict[str, Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right - even when sending images the format is still dict of strings, isn't it? Besides, you don't seem to use ChatFormat
/ the Prompt
classes in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it because mypy was complaining about this part:
messages = [
{
"role": "user",
"content": [
{"type": "text", "text": f"{prompt}"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
},
],
}
]
return await self.model.client.call(messages, self.model_options)
there is a key with list instead of string. I'm not sure why it doesn't complain now when I reverted it to str 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it's back:
error: Argument 1 to "call" of "LiteLLMClient" has incompatible type "list[dict[str, Sequence[object]]]"; expected "list[dict[str, str]]" [arg-type]
Found 1 error in 1 file (checked 70 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank change it to list[dict[str, str | list[dict]]]
. Any
is very rarely the right answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it didn't help, but introduced one more error from mypy. I tried to fight with it a bit but without success and decided to simply add ignore line there (this is to be changed anyway to use our prompt template with image support)
if self.__class__ == UnstructuredDefaultProvider: | ||
chunked_elements = chunk_elements(elements, **self.chunking_kwargs) | ||
return [to_text_element(element, document_meta) for element in chunked_elements] | ||
image_elements = [e for e in elements if e.category == "Image"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably use Unstructured's ElementType.Image
instead of a raw string
async def _chunk_and_convert( | ||
self, elements: list[UnstructuredElement], document_meta: DocumentMeta, document_path: Path | ||
) -> list[Element]: | ||
if self.__class__ == UnstructuredDefaultProvider: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks confusing to me. Is this about having a behavior on the base class that the classes inheriting from it don't share?
Inheriting a class and having the behavior change magically, even though I haven't changed it myself, would be surprising and not something I would like to encounter as a programmer. Seems like this behavior should be in a method that inheriting class can overwrite if they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both inheriting classes need to do the same preprocessing:
image_elements = [e for e in elements if e.category == ElementType.IMAGE]
other_elements = [e for e in elements if e.category != ElementType.IMAGE]
chunked_other_elements = chunk_elements(other_elements, **self.chunking_kwargs)
text_elements: list[Element] = [to_text_element(element, document_meta) for element in chunked_other_elements]
I didn't want to duplicate this code in both classes nor introduce a third "in-between" class so I came up with this idea. Happy to change it if you have a better suggestion how to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definitely change this - it's a very surprising behavior and you don't want unexpected surprises from your code. Even though both classes that currently inherit from UnstructuredDefaultProvider
need this behavior, the base class should be ready to be subclassed any time for other reasons. It shouldn't contain code specific to functionality that is not related to its intended behavior and definitely shouldn't magically switch to it for all of its children.
This is an important principle: child classes are downstream from the base class and they can know and use the specifics of the base class. On the other hand, the base class is upstream from its children. It cannot know when it will be subclassed and for what reason, it cannot predict how many children it has and what functionality this children are intended to have. As a user of ragbits
, I should be able to subclass UnstructuredDefaultProvider
in order to tweak some specific behavior, without everything breaking with a cryptic "UnstructuredDefaultProvider doesn't support image conversion" exception.
Seems that UnstructuredPdfProvider
and UnstructuredImageProvider
are similar because they both deal with images - the only difference is where they get the images from.
You can reflect that in the inheritance structure. Maybe pdf provider is just an image provider enhanced with the ability to find images in PDFs? Than the inheritance could look like that:
UnstructuredDefaultProvider -> UnstructuredImageProvider -> UnstructuredPdfProvider
If this doesn't seem correct to you in this case, alternatively you could have an intermediate (possibly abstract) class that adds image-related functionality, inherits from UnstructuredDefaultProvider
and is a base for both UnstructuredImageProvider
and UnstructuredPdfProvider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that turned out to be a good idea, now pdf extends the images and there are only two small functions that needed to be implemented
@@ -70,13 +71,16 @@ def __init__( | |||
variable will be used. | |||
api_server: The API server URL to use for the Unstructured API. If not specified, the | |||
UNSTRUCTURED_SERVER_URL environment variable will be used. | |||
use_api: whether to use unstructured API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more detailed documentation would be helpful - what is the alternative that will be used when use_api
is set to False
?
async def _to_image_element( | ||
self, element: UnstructuredElement, document_meta: DocumentMeta, document_path: Path | ||
) -> ImageElement: | ||
image_coordinates = extract_image_coordinates(element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unpacking in a way that the coordinates have meaningful names might make the code clearer (when I see image_coordinates[0]
I'm not sure what I'm looking at)
) | ||
|
||
|
||
def set_or_raise(name: str, value: Optional[str], env_var: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name, the description, and the actual behavior don't match, at least when it comes to my mental model.
- I would expect a function named
set_or_raise
to set a value and if it can't - to raise an exception. I also wouldn't know that it has anything to do with environmental variables - I would expect a function described as "Checks if given environment variable is set and returns it or raises an error" to simply look for an env variable and if it's not present - to raise an exception
- In reality this is more like a fallback mechanism - first it tries to use the value given as an argument, if it's not present it tries to get it from the environment, and if it's not present there it throws an exception. The argument called "name" is not actually used anywhere other than the error message, and the error message makes it clear that the function is specifically about checking arguments. I would have never gotten all this without reading the code.
Maybe rename to something like check_required_argument(value: Optional[str], arg_name: str, fallback_env: str)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a function that I simply moved from another class, it came without documentation so I wrote something myself, but I guess it wasn't enough. Changed to your suggestion.
|
||
def __init__(self, model_name: str, model_options: LiteLLMOptions): | ||
self.model_name = model_name | ||
self.model: Optional[LiteLLM] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we usually refer to LLM
objects as llm
, not model
to_text_element, | ||
) | ||
|
||
DEFAULT_LLM_IMAGE_SUMMARIZATION_MODEL = "gpt-4o-mini" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a blocker for this PR but maybe this should use our built-in core_config.default_llm_factory
functionality? Or if not (because there is no guarantee that default_llm_factory
returns LLM that supports images) maybe we should have a separate config option for default visual LLM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue for that, as discussed.
No description provided.