fix: pptx uploaded on google docs #540

Merged
milovate merged 7 commits into master from pptx-gdocs on Dec 20, 2024

Conversation

milovate
Contributor

Q/A checklist

  • If you add new dependencies, did you update the lock file?
poetry lock --no-update
  • Run tests
ulimit -n unlimited && ./scripts/run-tests.sh
  • Do a self code review of the changes - Read the diff at least twice.
  • Carefully think about the stuff that might break because of this change - this sounds obvious but it's easy to forget to do "Go to references" on each function you're changing and see if it's used in a way you didn't expect.
  • The relevant pages still run when you press submit
  • The API for those pages still work (API tab)
  • The public API interface doesn't change if you didn't want it to (check API tab > docs page)
  • Do your UI changes (if applicable) look acceptable on mobile?
  • Ensure you have not regressed the import time unless you have a good reason to do so.
    You can visualize this using tuna:
python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

milovate self-assigned this on Nov 30, 2024
milovate requested a review from devxpy on November 30, 2024 03:43
Member

devxpy commented Dec 2, 2024

I think we talked about exporting the file as presentation in the first place here

elif "presentation" in f.path.segments:
    mime_type = "application/pdf"
    ext = ".pdf"

Member

devxpy commented Dec 2, 2024

I think we talked about exporting the file as presentation in the first place here

elif "presentation" in f.path.segments:
    mime_type = "application/pdf"
    ext = ".pdf"

Ok, never mind, I don't think that works for pptx files in Slides. But we can be smart about it by looking at the mime type returned by gdrive_metadata (which we already have here).
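
A minimal sketch of the idea in the comment above, assuming gdrive_metadata is a dict that carries the Drive `mimeType` field. The helper name `pick_export_target` and its return shape are hypothetical and do not come from this PR; only the two mime type strings are standard values.

```python
# Hypothetical helper: names and return shape are illustrative, not from the PR.
GOOGLE_SLIDES = "application/vnd.google-apps.presentation"
PPTX = "application/vnd.openxmlformats-officedocument.presentationml.presentation"


def pick_export_target(gdrive_metadata: dict) -> tuple:
    """Return (mime_type, ext, needs_export) for a presentation file."""
    source_mime = gdrive_metadata.get("mimeType", "")
    if source_mime == GOOGLE_SLIDES:
        # Native Slides files have no binary content, so they must be exported.
        return "application/pdf", ".pdf", True
    if source_mime == PPTX:
        # An uploaded .pptx is a regular blob and can be downloaded as-is.
        return PPTX, ".pptx", False
    # Anything else: keep the reported type and download the raw bytes.
    return source_mime, "", False
```

Deciding on the metadata's mime type, rather than on "presentation" appearing in the URL path, is what lets an uploaded pptx and a native Slides file take different routes.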

Member

devxpy commented Dec 2, 2024

In addition, we can add exportLinks to the gdrive_metadata to export slides larger than 10 MB: https://stackoverflow.com/a/59168288/7061265
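
A rough sketch of how exportLinks could be requested, assuming `service` is a Drive v3 client as used elsewhere in this PR. The function name `fetch_export_link` and the exact `fields` string are assumptions made for illustration, not the code that was merged.

```python
from typing import Optional


def fetch_export_link(service, file_id: str, mime_type: str) -> Optional[str]:
    """Ask Drive for the file's exportLinks and pick the one for mime_type."""
    metadata = (
        service.files()
        .get(
            fileId=file_id,
            supportsAllDrives=True,
            # Request exportLinks explicitly alongside the basic fields.
            fields="name,mimeType,exportLinks",
        )
        .execute()
    )
    # exportLinks may be missing, e.g. for files that are plain blobs.
    return metadata.get("exportLinks", {}).get(mime_type)
```

Returning None when no link exists lets the caller fall back to the regular download path.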

milovate force-pushed the pptx-gdocs branch 5 times, most recently from 510dd9e to 855adce, on December 17, 2024 11:31


def service_request(
    service, file_id: str, f: furl, mime_type: str, retried_request=False
Member

remove unused param


request, mime_type = service_request(service, file_id, f, mime_type)
file_bytes, mime_type = download_blob_file_content(
    service, request, file_id, f, mime_type, export_links
Member

I don't think we need the download_blob_file_content to be a separate function here

def download_from_exportlinks(f: furl) -> bytes:
    try:
        r = requests.get(f)
        f_bytes = r.content
Member

use raise_for_status() like we do everywhere else in code

Member

this likely doesn't support downloading private docs without auth, I may be wrong
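
A sketch of how the two review comments above might be addressed together. It assumes the export link accepts an OAuth 2.0 bearer token, which this thread does not confirm. The `access_token` and `session` parameters are hypothetical additions, and the URL is taken as a plain string instead of a furl.

```python
from typing import Optional


def download_from_exportlinks(
    url: str, access_token: Optional[str] = None, session=None
) -> bytes:
    """Download an export link, failing loudly on HTTP errors."""
    if session is None:
        # Imported lazily, in line with the import-time item in the checklist.
        import requests

        session = requests
    headers = {}
    if access_token:
        # Assumption: a bearer token is enough to reach private documents.
        headers["Authorization"] = f"Bearer {access_token}"
    r = session.get(url, headers=headers)
    # Surface 4xx/5xx responses instead of returning an error page as bytes.
    r.raise_for_status()
    return r.content
```

Without a token the request is sent unauthenticated, so a private document would be expected to fail at `raise_for_status()` instead of silently returning the wrong content.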

Comment on lines 110 to 138
if (
    mime_type
    == "application/vnd.openxmlformats-officedocument.presentationml.presentation"
):
    # logger.debug(f"Downloading {str(f)!r} using export links")
    f_url_export = export_links.get(mime_type, None)
    if f_url_export:
        f_bytes = download_from_exportlinks(f_url_export)
    else:
        request = service.files().get_media(
            fileId=file_id,
            supportsAllDrives=True,
        )
        downloader = MediaIoBaseDownload(file, request)

        done = False
        while done is False:
            _, done = downloader.next_chunk()
            # print(f"Download {int(status.progress() * 100)}%")
        f_bytes = file.getvalue()

else:
    done = False
    while done is False:
        _, done = downloader.next_chunk()
        # print(f"Download {int(status.progress() * 100)}%")
    f_bytes = file.getvalue()

Member

devxpy commented Dec 17, 2024

This entire logic looks a bit repetitive. You wanna look at the diff really closely and figure out if the changes make sense.
I think you only need to change how the export google docs part was working:

[image]

But I can't really make sure by looking at the multiple diverging code paths here. The export_media() option is not needed if you use exportLinks I believe
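
One possible way to cut the repetition pointed out above, sketched on the assumption that both branches share the same chunked-download loop. `read_all_chunks` is a hypothetical name; only `next_chunk()` and `getvalue()` are taken from the snippet under review.

```python
import io


def read_all_chunks(downloader, buffer: io.BytesIO) -> bytes:
    """Drive a MediaIoBaseDownload-style object until it reports completion."""
    done = False
    while not done:
        # next_chunk() returns (status, done); the status is not needed here.
        _, done = downloader.next_chunk()
    return buffer.getvalue()
```

With a helper like this, the branches would differ only in how the request is built, or in whether an export link is fetched instead.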

Contributor Author

Ah, yes, this makes way more sense. I wanted to use export_links only for pptx without breaking the existing logic, hence the mess.

milovate merged commit f2662b7 into master on Dec 20, 2024
5 of 6 checks passed