Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support all non exportable google workspace docs #572

Merged
merged 5 commits into from
Dec 23, 2024
Merged

Conversation

milovate
Copy link
Contributor

@milovate milovate commented Dec 20, 2024

Q/A checklist

  • If you add new dependencies, did you update the lock file?
poetry lock --no-update
  • Run tests
ulimit -n unlimited && ./scripts/run-tests.sh
  • Do a self code review of the changes - Read the diff at least twice.
  • Carefully think about the stuff that might break because of this change - this sounds obvious but it's easy to forget to do "Go to references" on each function you're changing and see if it's used in a way you didn't expect.
  • The relevant pages still run when you press submit
  • The API for those pages still work (API tab)
  • The public API interface doesn't change if you didn't want it to (check API tab > docs page)
  • Do your UI changes (if applicable) look acceptable on mobile?
  • Ensure you have not regressed the import time unless you have a good reason to do so.
    You can visualize this using tuna:
python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

@milovate milovate requested a review from devxpy December 20, 2024 13:31
@milovate milovate self-assigned this Dec 20, 2024
Copy link
Member

@devxpy devxpy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we dont need the docs_export_mimetype we can just have a conversion dict that maps the supported_mimetypes to their respective export mime type.

Then just lookup the available mimetypes in exportLinks

@milovate milovate requested a review from devxpy December 20, 2024 15:04
devxpy
devxpy previously approved these changes Dec 23, 2024
@devxpy devxpy self-requested a review December 23, 2024 03:25
@devxpy devxpy dismissed their stale review December 23, 2024 03:25

dict in params

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR enhances Google Drive document handling by improving support for non-exportable Google Workspace documents. Here's a concise summary of the key changes:

  • Added static docs_export_mimetype dictionary in /daras_ai_v2/gdrive_downloader.py for mapping Google Workspace MIME types to export formats
  • Modified gdrive_download() to handle export links more gracefully with optional parameter defaulting to empty dict
  • Updated doc_url_to_file_metadata() in /daras_ai_v2/vector_search.py to handle export links as None instead of empty dict for better type clarity
  • Added proper export link handling for Google Drive files vs non-Google Drive files

These changes improve the handling of Google Workspace documents while maintaining backward compatibility.

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

2 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

daras_ai_v2/gdrive_downloader.py Outdated Show resolved Hide resolved
@devxpy
Copy link
Member

devxpy commented Dec 23, 2024

update type definition in FileMetadata and remove __init__

class FileMetadata(models.Model):
    name = models.TextField(default="", blank=True)
    etag = models.CharField(max_length=255, null=True)
    mime_type = models.CharField(max_length=255, default="", blank=True)
    total_bytes = models.PositiveIntegerField(default=0, blank=True)

    export_links: dict[str, str] | None = None

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

(updates since last review)

Based on the recent changes and avoiding repetition from previous reviews, here's a focused summary of the latest modifications:

Changed FileMetadata model to improve type safety and export link handling in Google Workspace integration.

  • Changed export_links in files/models.py from instance to class-level attribute with explicit type annotation
  • Added nullable support for export_links (now dict[str, str] | None instead of empty dict)
  • Added index on FileMetadata fields for improved query performance
  • Renamed constant to DOCS_EXPORT_MIMETYPES following Python naming conventions

2 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

files/models.py Show resolved Hide resolved
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

(updates since last review)

Based on the recent changes and avoiding repetition from previous reviews, here's a focused summary of the key concerns:

The PR introduces potential issues in Google Drive document handling and data management:

  • Moving export_links to class-level attribute in files/models.py creates risk of data leaks between FileMetadata instances since class attributes are shared
  • Silent fallback in gdrive_download() when mime_type not found in DOCS_EXPORT_MIMETYPES could cause runtime errors
  • Missing error handling for failed exports in gdrive_downloader.py
  • Type safety concerns with nullable export_links not being properly validated

These changes require careful review of the shared state and error handling implementations.

2 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

@devxpy
Copy link
Member

devxpy commented Dec 23, 2024

LGTM

@milovate milovate merged commit 6e72eed into master Dec 23, 2024
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants