Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix read write tmpfile on windows #323

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Blaxzter
Copy link

This is a low impact fix for reading still open temporary files on windows.
#303

@@ -351,11 +352,14 @@ def process_data_with_model(
) -> DocumentLayout:
"""Processes pdf file in the form of a file handler (supporting a read method) into a
DocumentLayout by using a model identified by model_name."""
with tempfile.NamedTemporaryFile() as tmp_file:
tmp_file.write(data.read())
tmp_file.flush() # Make sure the file is written out
Copy link
Contributor

@pawel-kmiecik pawel-kmiecik Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of creating a separate file, could you try with this smaller change:

with tempfile.NamedTemporaryFile() as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush() 
        tmp_file.seek(0)
        layout = process_file_with_model(...)

Unfortunately I don't have a windows machine to check myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, probably it's not this issue - I'd try to create tempfile with options:

tempfile.NamedTemporaryFile(delete=True, delete_on_close=False)

According to windows recommendations from here:
https://docs.python.org/3/library/tempfile.html

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, probably it's not this issue - I'd try to create tempfile with options:

tempfile.NamedTemporaryFile(delete=True, delete_on_close=False)

According to windows recommendations from here: https://docs.python.org/3/library/tempfile.html

Note that this only exists for Python 3.12, the docs for even Python 3.11 do not have the delete_on_close parameter: https://docs.python.org/3.11/library/tempfile.html#tempfile.NamedTemporaryFile

The reason I'm commenting is because I'm also running into this issue on windows

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And wound it defeat the purpose of a temp file to leave it on device after things have been done with it?
I guess you could do something like this if you dont want to use a temp folder

def process_data_with_model(
    data: BinaryIO,
    model_name: Optional[str],
    **kwargs,
) -> DocumentLayout:
    """Processes pdf file in the form of a file handler (supporting a read method) into a
    DocumentLayout by using a model identified by model_name."""
    file_name = ''
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush()  # Make sure the file is written out
        file_name = tmp_file.name
        
    try:
        layout = process_file_with_model(
            file_name,
            model_name,
            **kwargs,
        )
    finally:
        os.remove(file_name)

    return layout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants