Add option to set request parameters for MathpixPDFLoader #12773

Closed
juliuskittler opened this issue Nov 2, 2023 · 3 comments
Labels
Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases

Comments


juliuskittler commented Nov 2, 2023

Feature request

Status Quo

Currently, the MathpixPDFLoader sets its data property as follows:

    @property
    def data(self) -> dict:
        options = {"conversion_formats": {self.processed_file_format: True}}
        return {"options_json": json.dumps(options)}

However, the Mathpix API accepts a variety of additional useful request parameters (see https://docs.mathpix.com/#request-parameters-6 ), for instance enable_tables_fallback, numbers_default_to_math, and enable_spell_check.

It would be great if these parameters could be specified as keyword arguments directly in the MathpixPDFLoader.
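A minimal sketch of how such keyword arguments could be merged into the options built by the data property. The class name MathpixRequestOptions, the field extra_request_data, and the default format "md" are assumptions made for illustration; this is not the loader's actual code.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MathpixRequestOptions:
    """Hypothetical stand-in for the option-building part of the loader."""

    # Assumed default output format, chosen only for this example.
    processed_file_format: str = "md"
    # Hypothetical field: extra Mathpix request parameters supplied by the caller.
    extra_request_data: Dict[str, Any] = field(default_factory=dict)

    @property
    def data(self) -> Dict[str, str]:
        # Start from the existing conversion_formats option, then add the extras.
        options = {
            "conversion_formats": {self.processed_file_format: True},
            **self.extra_request_data,
        }
        return {"options_json": json.dumps(options)}


opts = MathpixRequestOptions(extra_request_data={"enable_tables_fallback": True})
payload = json.loads(opts.data["options_json"])
```

With this shape, callers who pass nothing get the same payload as before, so the change would be backwards compatible.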

Motivation

For instance, tables in PDFs are not always parsed properly, so in some cases it makes sense to set enable_tables_fallback=True.

Your contribution

TBD

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Nov 2, 2023

dosubot bot commented Nov 2, 2023

🤖

Hi @juliuskittler, good to see you again!

Thank you for your feature request. I agree that adding the ability to set request parameters for the MathpixPDFLoader would be a useful enhancement to the LangChain framework.

Based on your suggestion, we can modify the send_pdf method in the MathpixPDFLoader class to accept a dictionary of parameters and include these in the POST request to the Mathpix API. Here's a possible implementation:

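import json
from typing import Any, Dict, Optional

import requests

def send_pdf(self, params: Optional[Dict[str, Any]] = None) -> str:
    # Merge the extra request parameters into the options that `data` already builds.
    options = json.loads(self.data["options_json"])
    options.update(params or {})
    with open(self.file_path, "rb") as f:
        files = {"file": f}
        response = requests.post(
            self.url,
            headers=self._mathpix_headers,
            files=files,
            data={"options_json": json.dumps(options)},
        )
    response_data = response.json()
    if "pdf_id" in response_data:
        return response_data["pdf_id"]
    raise ValueError("Unable to send PDF to Mathpix.")

In this code, params is a dictionary that contains the request parameters you want to set. These parameters are merged into the options_json form field, next to conversion_formats, and sent in the POST request to the Mathpix API. Passing them through the params argument of requests.post instead would place them in the URL query string rather than in options_json.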

You can then use this method like this:

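# `loader` is assumed to be an existing MathpixPDFLoader instance
# whose send_pdf method has been modified as suggested above.
params = {
    'enable_tables_fallback': True,
    'numbers_default_to_math': True,
    'enable_spell_check': True
}
pdf_id = loader.send_pdf(params)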

This will send a POST request to the Mathpix API with the specified parameters.
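How the parameters are encoded on the wire is worth checking. The sketch below uses only the standard library to compare URL query-string encoding (which is how the params argument of requests.post encodes values) with the options_json form field shown in the Status Quo snippet. The "md" format is an assumption for illustration, and which of the two encodings the Mathpix endpoint honors is not confirmed in this thread.

```python
import json
from urllib.parse import urlencode

params = {"enable_tables_fallback": True, "numbers_default_to_math": True}

# Query-string encoding: each value is stringified, so True becomes "True".
query_string = urlencode(params)

# Form-field encoding as in the `data` property: one JSON document,
# so True is serialized as the JSON literal true.
form_data = {
    "options_json": json.dumps({"conversion_formats": {"md": True}, **params})
}

print(query_string)
print(form_data["options_json"])
```

The two encodings differ in more than placement: the query string carries the Python-style string "True", while the JSON field carries a real boolean.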

Please note that this is just a suggestion and the actual implementation may vary based on the LangChain codebase and the specific requirements of the project.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


baskaryan added a commit that referenced this issue Nov 29, 2023
…13950)

- **Description:** Support providing whatever extra parameters you want
to the Mathpix PDF loader API request.
  - **Issue:** #12773
  - **Dependencies:** None

---------

Co-authored-by: Bagatur <[email protected]>

dosubot bot commented Feb 1, 2024

Hi, @juliuskittler,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you opened requested the addition of an option to set request parameters for the MathpixPDFLoader, allowing users to customize API calls for improving the parsing of PDFs. In response, a detailed suggestion was provided on how to implement this feature, including modifying the send_pdf method in the MathpixPDFLoader class to accept a dictionary of parameters and include them in the POST request to the Mathpix API. The suggestion also included example usage of the proposed method.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 1, 2024
@baskaryan
Collaborator

Resolved in #13950

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 1, 2024