[FEATURE] new useful methods in ZugferdDocumentPdfReader for extraction of the XML #120

oschildt · 2024-09-20T09:27:24Z

Describe the feature

I have implemented two new methods in ZugferdDocumentPdfReader for extracting the embedded XML.

This is useful for checking whether the software that issued the invoice has put all the information shown in the PDF into the XML as well.
We have encountered cases where software places some data in the PDF but not in the XML of the Zugferd invoice.

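```php
namespace horstoeko\zugferd;

use horstoeko\zugferd\exception\ZugferdFileNotFoundException;
use horstoeko\zugferd\exception\ZugferdFileNotReadableException;
use Smalot\PdfParser\Parser as PdfParser;

class ZugferdDocumentPdfReader
{
    // ATTACHMENT_FILENAMES (the accepted file names of the embedded invoice XML)
    // is already defined in the existing class and is not repeated here.

    /**
     * Reads a PDF file and returns the embedded invoice XML, or null if none is found.
     */
    public static function extractXMLFromFile(string $pdfFilename): ?string
    {
        if (!file_exists($pdfFilename)) {
            throw new ZugferdFileNotFoundException($pdfFilename);
        }

        $pdfContent = file_get_contents($pdfFilename);

        if ($pdfContent === false) {
            throw new ZugferdFileNotReadableException($pdfFilename);
        }

        return static::extractXMLFromContent($pdfContent);
    }

    /**
     * Returns the embedded invoice XML from raw PDF content, or null if none is found.
     */
    public static function extractXMLFromContent(string $pdfContent): ?string
    {
        $pdfParser = new PdfParser();
        $pdfParsed = $pdfParser->parseContent($pdfContent);
        $filespecs = $pdfParsed->getObjectsByType('Filespec');

        $attachmentFound = false;
        $attachmentIndex = 0;
        $embeddedFileIndex = 0;
        $returnValue = null;

        try {
            // Find the position of the file specification whose name is a known invoice XML name
            foreach ($filespecs as $filespec) {
                $filespecDetails = $filespec->getDetails();
                if (in_array($filespecDetails['F'] ?? null, static::ATTACHMENT_FILENAMES)) {
                    $attachmentFound = true;
                    break;
                }
                $attachmentIndex++;
            }

            if ($attachmentFound) {
                // Assumes the n-th Filespec object corresponds to the n-th EmbeddedFile object
                /** @var array<\Smalot\PdfParser\PDFObject> $embeddedFiles */
                $embeddedFiles = $pdfParsed->getObjectsByType('EmbeddedFile');
                foreach ($embeddedFiles as $embeddedFile) {
                    if ($attachmentIndex === $embeddedFileIndex) {
                        $returnValue = $embeddedFile->getContent();
                        break;
                    }
                    $embeddedFileIndex++;
                }
            }
        } catch (\Exception $e) {
            $returnValue = null;
        }

        return $returnValue;
    }
}
```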
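The motivation above is to compare what the PDF shows with what the embedded XML actually contains. As a rough sketch of the XML side of that check, the string returned by the proposed extraction methods could be loaded with PHP's built-in DOM extension and queried with XPath. The helper `findMissingInvoiceValues`, the XPath expressions and the sample values are hypothetical illustrations and not part of horstoeko/zugferd; the namespace URIs are the ones used by the CII syntax of ZUGFeRD 2.x / Factur-X.

```php
<?php

/**
 * Hypothetical helper: returns the XPath expressions whose expected value
 * (e.g. read from the PDF) is not found in the embedded invoice XML.
 *
 * @param array<string,string> $expected XPath expression => expected text value
 * @return list<string>
 */
function findMissingInvoiceValues(string $xml, array $expected): array
{
    if (trim($xml) === '') {
        // No XML at all: every expected value is missing
        return array_keys($expected);
    }

    // Collect libxml errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        // Not well-formed XML: treat every expected value as missing
        return array_keys($expected);
    }

    $xpath = new DOMXPath($dom);
    // Namespace URIs of the CII syntax used by ZUGFeRD 2.x / Factur-X
    $xpath->registerNamespace('rsm', 'urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100');
    $xpath->registerNamespace('ram', 'urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100');

    $missing = [];
    foreach ($expected as $query => $value) {
        $nodes = $xpath->query($query);
        $found = false;
        if ($nodes !== false) {
            foreach ($nodes as $node) {
                if (trim($node->textContent) === $value) {
                    $found = true;
                    break;
                }
            }
        }
        if (!$found) {
            $missing[] = $query;
        }
    }

    return $missing;
}

// Usage with a minimal, hypothetical invoice XML
$xml = <<<XML
<rsm:CrossIndustryInvoice xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100" xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100">
  <rsm:ExchangedDocument><ram:ID>471102</ram:ID></rsm:ExchangedDocument>
</rsm:CrossIndustryInvoice>
XML;

$missing = findMissingInvoiceValues($xml, [
    '/rsm:CrossIndustryInvoice/rsm:ExchangedDocument/ram:ID' => '471102',
    '//ram:BuyerOrderReferencedDocument/ram:IssuerAssignedID' => 'PO-123',
]);
// $missing contains only the buyer order reference query
```

Values are compared after trimming whitespace only; amounts or dates that the PDF formats differently would need to be normalized before comparing.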

horstoeko · 2024-09-20T09:40:25Z

Hi @oschildt,

Many thanks for the issue. I'll take a look at it. Please check again whether your code suggestion fits into the current implementation. I have already adjusted something in this area for you.

Best regards

oschildt · 2024-09-20T09:56:40Z

Hi,

We have a document archiving system. It gathers documents from many sources: FTP, LAN, email, direct upload, etc. We have a document viewer and an E-Rechnung viewer. For Zugferd invoices we also allow viewing the embedded XML. This is useful for checking whether the software that issued the Zugferd invoice has put all the information shown in the PDF into the XML as well. We have encountered cases where software places some data in the PDF but not in the XML of the Zugferd invoice.

Best regards
Oleg

horstoeko · 2024-09-20T10:04:02Z

Hi @oschildt,

Would it perhaps be possible for you to provide a pull request based on the current implementation?

oschildt · 2024-09-20T10:11:08Z

Hi, I can do it, but later.

horstoeko · 2024-09-20T10:22:45Z

Hi @oschildt,

Many thanks for that. My time is extremely limited at the moment. Please keep in mind that I maintain this project entirely in my spare time; I also have a regular job, which unfortunately keeps me very busy at the moment.

Please don't forget to write tests for your implementation if necessary. Unfortunately, this is very often forgotten... :-)

Best regards

horstoeko · 2024-09-24T03:23:11Z

Hi @oschildt,

I have implemented some additional methods. Please have a look at this pull request and give me feedback.

Kind regards

oschildt · 2024-09-25T08:54:38Z

Hi, I see you have implemented the XML extraction along with unit tests. I have tested it and it works perfectly. Regards,

horstoeko · 2024-09-25T09:33:39Z

Hi @oschildt,

Nice to hear from you, and thank you for your response. I will make a release in the next few days.

Kind regards

oschildt added the enhancement New feature or request label Sep 20, 2024

oschildt assigned horstoeko Sep 20, 2024

horstoeko modified the milestones: 11/2024, 10/2024 Sep 20, 2024

horstoeko pushed a commit that referenced this issue Sep 24, 2024

#120 ZugferdDocumentPdfReader now contains additional methods for extracting the XML content (10fc7d3)

horstoeko mentioned this issue Sep 24, 2024

#120 ZugferdDocumentPdfReader now contains additional methods for extracting the XML content #124

Merged

horstoeko pushed a commit that referenced this issue Sep 24, 2024

#120 ZugferdDocumentPdfReader now contains additional methods for extracting the XML content -> better method names (c44f617)

horstoeko added a commit that referenced this issue Sep 24, 2024

#120 ZugferdDocumentPdfReader now contains additional methods for extracting the XML content (d52dfea)

horstoeko closed this as completed Sep 25, 2024

Repository owner locked as resolved and limited conversation to collaborators Sep 25, 2024
