fix: pptx uploaded on google docs #540

milovate · 2024-11-30T03:40:32Z

Q/A checklist

If you add new dependencies, did you update the lock file?

poetry lock --no-update

Run tests

ulimit -n unlimited && ./scripts/run-tests.sh

Do a self code review of the changes - Read the diff at least twice.
Carefully think about the stuff that might break because of this change - this sounds obvious but it's easy to forget to do "Go to references" on each function you're changing and see if it's used in a way you didn't expect.
The relevant pages still run when you press submit
The API for those pages still works (API tab)
The public API interface doesn't change if you didn't want it to (check API tab > docs page)
Do your UI changes (if applicable) look acceptable on mobile?
Ensure you have not regressed the import time unless you have a good reason to do so.
You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

devxpy · 2024-12-02T16:28:34Z

I think we talked about exporting the file as presentation in the first place here

gooey-server/daras_ai_v2/gdrive_downloader.py

Lines 159 to 161 in 66a05d7

    
    elif "presentation" in f.path.segments:
        mime_type = "application/pdf"
        ext = ".pdf"

devxpy · 2024-12-02T16:35:21Z

I think we talked about exporting the file as presentation in the first place here

gooey-server/daras_ai_v2/gdrive_downloader.py

Lines 159 to 161 in 66a05d7

    elif "presentation" in f.path.segments:
        mime_type = "application/pdf"
        ext = ".pdf"

OK, never mind, I don't think that works for pptx files opened in Slides. But we can be smart about it by looking at the mime type returned by gdrive_metadata (which we already have here).
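A rough sketch of the mime-type check suggested above, not the code that was merged. It relies on Google-native files reporting a MIME type that starts with `application/vnd.google-apps.`, while an uploaded .pptx keeps its Office MIME type even when opened in Slides. The helper name `needs_export` and the shape of the metadata dict are assumptions made for illustration.

```python
# Hypothetical helper, not part of gdrive_downloader.py.
GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."
PPTX_MIME = (
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
)


def needs_export(metadata: dict) -> bool:
    """Return True when the file is a Google-native doc that must be exported.

    `metadata` is assumed to be the dict returned by a Drive files.get call,
    carrying at least a "mimeType" key.
    """
    return metadata.get("mimeType", "").startswith(GOOGLE_NATIVE_PREFIX)


# Illustrative metadata only; real responses carry more fields.
native_slides = {"mimeType": "application/vnd.google-apps.presentation"}
uploaded_pptx = {"mimeType": PPTX_MIME}

print(needs_export(native_slides), needs_export(uploaded_pptx))
```

With a check like this, the "presentation" URL segment no longer decides the download path on its own: an uploaded pptx would be fetched as-is instead of being sent through an export call.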

devxpy · 2024-12-02T16:39:02Z

We can also add exportLinks to the gdrive_metadata to export slides larger than 10 MB: https://stackoverflow.com/a/59168288/7061265
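A minimal sketch of how an export link might be picked out of the metadata, assuming `exportLinks` was included in the `fields` requested from the Drive API and comes back as a mapping from MIME type to URL. The function name `pick_export_link` and the sample URLs are made up for illustration.

```python
from typing import Optional

PPTX_MIME = (
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
)


def pick_export_link(metadata: dict, wanted_mime_type: str) -> Optional[str]:
    """Return the export URL for `wanted_mime_type`, or None if there is none.

    Hypothetical helper: `metadata` is assumed to be the files.get response.
    Files that are not Google-native have no "exportLinks" key at all.
    """
    export_links = metadata.get("exportLinks") or {}
    return export_links.get(wanted_mime_type)


# Placeholder metadata; the URLs are not real Drive export links.
sample_metadata = {
    "mimeType": "application/vnd.google-apps.presentation",
    "exportLinks": {
        "application/pdf": "https://example.invalid/export?format=pdf",
        PPTX_MIME: "https://example.invalid/export?format=pptx",
    },
}

print(pick_export_link(sample_metadata, PPTX_MIME))
```

Returning None rather than raising leaves the caller free to fall back to a regular media download when no link is available.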


devxpy · 2024-12-17T12:37:51Z

daras_ai_v2/gdrive_downloader.py

+
+
+def service_request(
+    service, file_id: str, f: furl, mime_type: str, retried_request=False


remove unused param

devxpy · 2024-12-17T12:39:40Z

daras_ai_v2/gdrive_downloader.py

+
+    request, mime_type = service_request(service, file_id, f, mime_type)
+    file_bytes, mime_type = download_blob_file_content(
+        service, request, file_id, f, mime_type, export_links


I don't think we need the download_blob_file_content to be a separate function here

devxpy · 2024-12-17T12:49:42Z

daras_ai_v2/gdrive_downloader.py

+def download_from_exportlinks(f: furl) -> bytes:
+    try:
+        r = requests.get(f)
+        f_bytes = r.content


Use raise_for_status() like we do everywhere else in the code.

This likely doesn't support downloading private docs without auth, though I may be wrong.
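A hedged sketch of what the two review points could look like together. The `headers` argument (for example an `Authorization: Bearer ...` header for private docs) and the injectable `get` parameter are assumptions added for illustration; they are not taken from the PR.

```python
import requests


def download_from_export_link(url: str, headers=None, get=requests.get) -> bytes:
    """Download `url` and return the body, raising on HTTP error statuses.

    `headers` and `get` are hypothetical parameters: `headers` lets a caller
    pass credentials, `get` lets a test swap in a fake transport.
    """
    r = get(url, headers=headers)
    # Raises requests.HTTPError on 4xx/5xx instead of silently returning
    # an error page as if it were the file content.
    r.raise_for_status()
    return r.content
```

Without `raise_for_status()`, a 403 on a private doc would come back as ordinary bytes and only fail later, when the content is parsed.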

devxpy · 2024-12-17T12:57:40Z

daras_ai_v2/gdrive_downloader.py

+    if (
+        mime_type
+        == "application/vnd.openxmlformats-officedocument.presentationml.presentation"
+    ):
+        # logger.debug(f"Downloading {str(f)!r} using export links")
+        f_url_export = export_links.get(mime_type, None)
+        if f_url_export:
+
+            f_bytes = download_from_exportlinks(f_url_export)
+        else:
+            request = service.files().get_media(
+                fileId=file_id,
+                supportsAllDrives=True,
+            )
+            downloader = MediaIoBaseDownload(file, request)
+
+            done = False
+            while done is False:
+                _, done = downloader.next_chunk()
+                # print(f"Download {int(status.progress() * 100)}%")
+            f_bytes = file.getvalue()
+
+    else:
+        done = False
+        while done is False:
+            _, done = downloader.next_chunk()
+            # print(f"Download {int(status.progress() * 100)}%")
+        f_bytes = file.getvalue()
+


This entire logic looks a bit repetitive. You wanna look at the diff really closely and figure out if the changes make sense.
I think you only need to change how the export google docs part was working:

But I can't really make sure by looking at the multiple diverging code paths here. The export_media() option is not needed if you use exportLinks I believe

milovate · 2024-12-17

Ah, yes, this makes way more sense. I wanted to use export_links only for pptx without breaking the existing logic, hence the mess.
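One way to cut the repetition flagged in this thread, sketched under assumptions: the chunk loop appears twice in the diff and could live in a single helper. `read_all_chunks` is a hypothetical name, and `FakeDownloader` only stands in for `MediaIoBaseDownload`, whose `next_chunk()` returns a `(status, done)` pair.

```python
import io


def read_all_chunks(downloader, buffer: io.BytesIO) -> bytes:
    """Drive `downloader` until it reports done, then return the buffer bytes."""
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return buffer.getvalue()


class FakeDownloader:
    """Stand-in for MediaIoBaseDownload, used only for this illustration."""

    def __init__(self, buffer: io.BytesIO, chunks):
        self._buffer = buffer
        self._chunks = list(chunks)

    def next_chunk(self):
        # Write one chunk and report whether any chunks remain.
        self._buffer.write(self._chunks.pop(0))
        return None, not self._chunks


buf = io.BytesIO()
data = read_all_chunks(FakeDownloader(buf, [b"ab", b"cd"]), buf)
print(data)
```

With the loop in one place, each branch only has to decide which request to build.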

milovate self-assigned this Nov 30, 2024

milovate requested a review from devxpy November 30, 2024 03:43

milovate added 3 commits December 14, 2024 11:07

add error handling for Google Drive downloads and update MIME type for presentations

2555cb4

fix :pptx export limit

9909115

fix: gdocs-docx upload

855adce

milovate force-pushed the pptx-gdocs branch 5 times, most recently from 510dd9e to 855adce Compare December 17, 2024 11:31

add export links to download pptx files

a2226b8

devxpy reviewed Dec 17, 2024


milovate added 2 commits December 18, 2024 13:05

refactor: export links to handle all google docs

a2b4f18

feat: add author notes

7032cb2

devxpy approved these changes Dec 18, 2024


fix: update error handling

cc67f67

milovate merged commit f2662b7 into master Dec 20, 2024
5 of 6 checks passed
