support all non exportable google workspace docs #572
I think we don't need `docs_export_mimetype`; we can just have a conversion dict that maps the supported MIME types to their respective export MIME types.
Then just look up the available MIME types in `exportLinks`.
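A minimal sketch of that suggestion, not the code in this PR: the names `EXPORT_MIMETYPES` and `pick_export_link` and the specific export formats chosen are assumptions for illustration. The `export_links` argument stands for the `exportLinks` field that the Drive API returns for a file, which maps export MIME types to download URLs.

```python
from __future__ import annotations

# Hypothetical conversion dict: Google Workspace MIME type -> preferred export MIME type.
EXPORT_MIMETYPES = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.presentation": "application/pdf",
}


def pick_export_link(
    mime_type: str, export_links: dict[str, str] | None
) -> tuple[str, str] | None:
    """Return (export_mime_type, url), or None when no usable export exists."""
    export_links = export_links or {}  # treat a missing exportLinks field as empty
    preferred = EXPORT_MIMETYPES.get(mime_type)
    if preferred is not None and preferred in export_links:
        return preferred, export_links[preferred]
    return None
```

Returning `None` keeps the "not exportable" case explicit, so the caller decides whether to fall back to a direct download or raise an error.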
PR Summary
This PR enhances Google Drive document handling by improving support for non-exportable Google Workspace documents. Here's a concise summary of the key changes:
- Added static `docs_export_mimetype` dictionary in `/daras_ai_v2/gdrive_downloader.py` for mapping Google Workspace MIME types to export formats
- Modified `gdrive_download()` to handle export links more gracefully with optional parameter defaulting to empty dict
- Updated `doc_url_to_file_metadata()` in `/daras_ai_v2/vector_search.py` to handle export links as None instead of empty dict for better type clarity
- Added proper export link handling for Google Drive files vs non-Google Drive files
These changes improve the handling of Google Workspace documents while maintaining backward compatibility.
2 file(s) reviewed, 1 comment(s)
Update the type definition in class FileMetadata(models.Model):

    name = models.TextField(default="", blank=True)
    etag = models.CharField(max_length=255, null=True)
    mime_type = models.CharField(max_length=255, default="", blank=True)
    total_bytes = models.PositiveIntegerField(default=0, blank=True)
    export_links: dict[str, str] | None = None
PR Summary
(updates since last review)
Based on the recent changes and avoiding repetition from previous reviews, here's a focused summary of the latest modifications:
Changed FileMetadata model to improve type safety and export link handling in Google Workspace integration.
- Changed `export_links` in `files/models.py` from instance to class-level attribute with explicit type annotation
- Added nullable support for `export_links` (now `dict[str, str] | None` instead of empty dict)
- Added index on FileMetadata fields for improved query performance
- Renamed constant to `DOCS_EXPORT_MIMETYPES` following Python naming conventions
2 file(s) reviewed, 1 comment(s)
PR Summary
(updates since last review)
Based on the recent changes and avoiding repetition from previous reviews, here's a focused summary of the key concerns:
The PR introduces potential issues in Google Drive document handling and data management:
- Moving `export_links` to class-level attribute in `files/models.py` creates risk of data leaks between FileMetadata instances since class attributes are shared
- Silent fallback in `gdrive_download()` when mime_type not found in `DOCS_EXPORT_MIMETYPES` could cause runtime errors
- Missing error handling for failed exports in `gdrive_downloader.py`
- Type safety concerns with nullable `export_links` not being properly validated
These changes require careful review of the shared state and error handling implementations.
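To make the shared-state concern concrete, here is a small sketch using plain Python classes, not Django models. The class and function names are hypothetical. It shows that a mutable class-level default is shared across instances when mutated in place, whereas a `None` default that is rebound per instance is not.

```python
from __future__ import annotations


class SharedDefault:
    # One dict object, created once and shared by every instance.
    export_links: dict[str, str] = {}


class NoneDefault:
    # None is immutable; assigning on an instance creates an instance attribute.
    export_links: dict[str, str] | None = None


def demo_shared() -> bool:
    a, b = SharedDefault(), SharedDefault()
    a.export_links["text/plain"] = "https://example.com/a"  # mutates the class-level dict
    return "text/plain" in b.export_links  # b sees a's change


def demo_none() -> bool:
    c, d = NoneDefault(), NoneDefault()
    c.export_links = {"text/plain": "https://example.com/c"}  # rebinds on c only
    return d.export_links is None  # d is unaffected
```

Under these assumptions the leak only arises when a mutable default is mutated in place, so whether it applies here depends on how `export_links` is assigned in the actual code.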
2 file(s) reviewed, no comment(s)
Q/A checklist
You can visualize this using tuna:
To measure import time for a specific library:
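As a rough sketch, import time can also be measured from within Python by timing `importlib.import_module`. The helper name `measure_import_seconds` is an assumption for illustration, and the number is only meaningful in a fresh interpreter, since an already-imported module is returned from the `sys.modules` cache almost instantly.

```python
import importlib
import time


def measure_import_seconds(module_name: str) -> float:
    """Return the wall-clock seconds spent importing module_name."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is a stand-in; substitute the library you want to measure.
    print(f"json: {measure_import_seconds('json'):.4f}s")
```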
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
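A minimal illustration of the deferred-import pattern described above. Here `json` is only a stand-in for a slow-to-import library, and `parse_payload` is a hypothetical function name.

```python
def parse_payload(text: str) -> dict:
    # Deferred import: the import cost is paid on the first call to this
    # function, not when the enclosing module is loaded.
    import json  # stand-in for a library that is slow to import

    return json.loads(text)
```

After the first call the module is cached in `sys.modules`, so repeated calls only pay for a cheap lookup.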
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.