[FEATURE] new useful methods in ZugferdDocumentPdfReader for extraction of the XML #120

Closed
oschildt opened this issue Sep 20, 2024 · 8 comments

@oschildt

oschildt commented Sep 20, 2024

Describe the feature

I have implemented 2 new useful methods in ZugferdDocumentPdfReader for extraction of the attached XML.

This is useful for checking whether the software issuing the invoice has placed all the information that is visible in the PDF into the XML as well.
We have encountered cases where software places some data into the PDF but not into the XML of the ZUGFeRD invoice.

class ZugferdDocumentPdfReader
{
    public static function extractXMLFromFile(string $pdfFilename) : ?string
    {
        if (!file_exists($pdfFilename)) {
            throw new ZugferdFileNotFoundException($pdfFilename);
        }

        $pdfContent = file_get_contents($pdfFilename);

        if ($pdfContent === false) {
            throw new ZugferdFileNotReadableException($pdfFilename);
        }

        return static::extractXMLFromContent($pdfContent);
    }

    public static function extractXMLFromContent(string $pdfContent) : ?string
    {
        $pdfParser = new PdfParser();
        $pdfParsed = $pdfParser->parseContent($pdfContent);
        $filespecs = $pdfParsed->getObjectsByType('Filespec');

        $attachmentFound = false;
        $attachmentIndex = 0;
        $embeddedFileIndex = 0;
        $returnValue = null;

        try {
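            // Find the position of the Filespec whose filename matches a known invoice attachment name.
            foreach ($filespecs as $filespec) {
                $filespecDetails = $filespec->getDetails();
                if (in_array($filespecDetails['F'] ?? null, static::ATTACHMENT_FILENAMES, true)) {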
                    $attachmentFound = true;
                    break;
                }
                $attachmentIndex++;
            }

            if ($attachmentFound) {
                /**
                 * @var array<\Smalot\PdfParser\PDFObject> $embeddedFiles
                 */
                $embeddedFiles = $pdfParsed->getObjectsByType('EmbeddedFile');
                // Assumes the EmbeddedFile objects come in the same order as the Filespec objects.
                foreach ($embeddedFiles as $embeddedFile) {
                    if ($attachmentIndex === $embeddedFileIndex) {
                        $returnValue = $embeddedFile->getContent();
                        break;
                    }
                    $embeddedFileIndex++;
                }
            }
        } catch (\Exception $e) {
            $returnValue = null;
        }

        return $returnValue;
    }
}
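
To illustrate the use case described above, here is a minimal sketch of how the extracted XML could be checked against values read off the visible PDF. It is only an assumption about how such a check might look: `findValuesMissingFromXml()` is a hypothetical helper that is not part of the library, and the sample XML is a simplified stand-in for what `extractXMLFromContent()` would return, not a real ZUGFeRD document.

```php
<?php

declare(strict_types=1);

/**
 * Returns the expected values that do not occur in the text content of the XML.
 *
 * Hypothetical helper for illustration only, not part of the library.
 *
 * @param string   $xml            XML string, e.g. the result of the proposed extraction method
 * @param string[] $expectedValues Values visible in the PDF, e.g. the invoice number
 * @return string[] Values that could not be found in the XML
 */
function findValuesMissingFromXml(string $xml, array $expectedValues): array
{
    if ($xml === '') {
        throw new InvalidArgumentException('The extracted attachment is empty.');
    }

    $document = new DOMDocument();

    // Suppress parser warnings so malformed XML is reported through the exception below.
    if (@$document->loadXML($xml) === false) {
        throw new InvalidArgumentException('The extracted attachment is not well-formed XML.');
    }

    // textContent concatenates the text of all elements and ignores the markup.
    // Attribute values are not covered by this simple check.
    $text = $document->documentElement->textContent;

    return array_values(array_filter(
        $expectedValues,
        static fn (string $value): bool => strpos($text, $value) === false
    ));
}

// Simplified sample data standing in for the result of extractXMLFromContent().
$xml = <<<'XML'
<Invoice>
    <ID>R-2024-0815</ID>
    <BuyerName>Example GmbH</BuyerName>
</Invoice>
XML;

// The third value is shown in the PDF but was never written to the XML.
$missing = findValuesMissingFromXml($xml, ['R-2024-0815', 'Example GmbH', 'PO-4711']);

print_r($missing); // lists only 'PO-4711'
```

A plain substring search is deliberately crude: it does not care in which element a value appears, so a real check would more likely query specific elements. It is enough, though, to flag data that the issuing software printed on the PDF but left out of the XML.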
@oschildt oschildt added the enhancement New feature or request label Sep 20, 2024
@horstoeko
Owner

Hi @oschildt,

Many thanks for the issue. I'll take a look at it. Please check again whether your code suggestion fits into the current implementation. I had already adjusted something for you in this regard.

Best regards

@horstoeko horstoeko modified the milestones: 11/2024, 10/2024 Sep 20, 2024
@oschildt
Author

oschildt commented Sep 20, 2024 via email

@horstoeko
Owner

Hi @oschildt,

Would it perhaps be possible for you to provide a pull request based on the current implementation?

@oschildt
Author

oschildt commented Sep 20, 2024 via email

@horstoeko
Owner

Hi @oschildt,

Many thanks for that. My time is extremely limited at the moment. Please keep in mind that I work on this project entirely in my spare time; I also have a regular job, which unfortunately keeps me very busy at the moment.

Please don't forget to write tests for your implementation if necessary. Unfortunately, this is very often forgotten... :-)

Best regards

horstoeko pushed a commit that referenced this issue Sep 24, 2024
@horstoeko
Owner

horstoeko commented Sep 24, 2024

Hi @oschildt,

I implemented some additional methods. Please have a look at this pull request and give me feedback.

Kind regards

horstoeko pushed a commit that referenced this issue Sep 24, 2024
…racting the XML content -> better method names
horstoeko added a commit that referenced this issue Sep 24, 2024
…racting the XML content

#120 ZugferdDocumentPdfReader now contains additional methods for extracting the XML content
@oschildt
Author

oschildt commented Sep 25, 2024 via email

@horstoeko
Owner

Hi @oschildt,

Nice to hear from you, and thank you for your response. I will make a release in the next few days.

Kind regards

Repository owner locked as resolved and limited conversation to collaborators Sep 25, 2024