Improve parsing and adding support for JsonSerialize
nyamsprod committed Sep 28, 2023
1 parent 2696c11 commit d23a7f8
Showing 6 changed files with 120 additions and 76 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ All Notable changes to `bakame/html-table` will be documented in this file.

- `Parser::tableXpathPosition`
- `Table` class which implements the `TabularDataReader` interface.
- `Parser::includeSection` and `Parser::excludeSection` to improve section parsing.

### Fixed

Expand All @@ -20,7 +21,7 @@ All Notable changes to `bakame/html-table` will be documented in this file.

### Removed

- None
- `Parser::(in|ex)cludeTableFooter` replaced by `Parser::(in|ex)cludeSection`

## [0.2.0](https://github.com/bakame-php/html-table/compare/0.1.0...0.2.0) - 2023-09-26

Expand Down
41 changes: 23 additions & 18 deletions README.md
Expand Up @@ -94,15 +94,15 @@ $parser = Parser::new()->tableXPathPosition("//main/div/table");
`Parser::tableXpathPosition` and `Parser::tablePosition` override each other. It is
recommended to use one or the other but not both at the same time.
### defaultCaption
### tableCaption
You can optionnally define a caption for your table if none is present of found during parsing.
You can optionally define a caption for your table if none is present or found during parsing.
```php
use Bakame\HtmlTable\Parser;
$parser = Parser::new()->defaultCaption('this is a generated caption');
$parser = Parser::new()->defaultCaption(null); // reset the default caption to null
$parser = Parser::new()->tableCaption('this is a generated caption');
$parser = Parser::new()->tableCaption(null); // remove any default caption set
```
### ignoreTableHeader and resolveTableHeader
Expand All @@ -112,8 +112,8 @@ Tells the parser to attempt or not table header resolution.
```php
use Bakame\HtmlTable\Parser;
$parser = Parser::new()->ignoreTableHeader(); // no header table will be calculated
$parser = Parser::new()->resolveTableHeader(3); // will attempt to resolve the table header
$parser = Parser::new()->ignoreTableHeader(); // no table header will be resolved
$parser = Parser::new()->resolveTableHeader(); // will attempt to resolve the table header
```
### tableHeaderPosition
Expand All @@ -124,7 +124,8 @@ Tells where to locate and resolve the table header
use Bakame\HtmlTable\Parser;
use Bakame\HtmlTable\Section;
$parser = Parser::new()->tableHeaderPosition(Section::thead, 3); // no header table will be calculated
$parser = Parser::new()->tableHeaderPosition(Section::thead, 3);
// header is the 4th row in the <thead> table section
```
Use the `Bakame\HtmlTable\Section` enum to designate which table section is used to resolve the header.
Expand All @@ -137,11 +138,11 @@ enum Section
case thead;
case tbody;
case tfoot;
case none;
case tr;
}
```
If `Section::none` is used, `tr` tags will be used independently of their section.
If `Section::tr` is used, `tr` tags will be used independently of their section.
The second argument is the table header offset; it defaults to `0` (ie: the first row).
### tableHeader
Expand All @@ -153,20 +154,23 @@ related configuration with this one
use Bakame\HtmlTable\Parser;
use Bakame\HtmlTable\Section;
$parser = Parser::new()->tableHeader(['rank', 'team', 'winner']); // no header table will be calculated
$parser = Parser::new()->tableHeader(['rank', 'team', 'winner']);
```
**If you specify a non empty array as the table header, it will take precedence over any other table header related options.**
**If you specify a non-empty array as the table header, it will take precedence over any other table header related options.**
### includeTableFooter and excludeTableFooter
**Because this is tabular data, each cell MUST be unique; otherwise an exception will be thrown.**
Tells whether the footer should be included when parsing the table content.
### includeSection and excludeSection
Tells which section should be parsed, based on the `Section` enum.
```php
use Bakame\HtmlTable\Parser;
use Bakame\HtmlTable\Section;
$parser = Parser::new()->includeTableFooter(); // tfoot is included during parsing
$parser = Parser::new()->excludeTableFooter(3); // tfoot is excluded during parsing
$parser = Parser::new()->includeSection(Section::tfoot); // tfoot is included during parsing
$parser = Parser::new()->excludeSection(Section::tr); // table direct tr children are not included during parsing
```
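The include/exclude behaviour can be sketched in isolation. The snippet below is a minimal, self-contained approximation, not the library's code: `SectionFilter` is a hypothetical stand-in for the section-related part of `Parser`, and the string values of the `Section` enum are an assumption inferred from the `Section::tryFrom(strtolower(...))` call in `src/Parser.php`.

```php
<?php

declare(strict_types=1);

// Backed enum mirroring the section names; the string values are assumed.
enum Section: string
{
    case thead = 'thead';
    case tbody = 'tbody';
    case tfoot = 'tfoot';
    case tr = 'tr';
}

// Hypothetical stand-in for the section handling inside Parser.
final class SectionFilter
{
    /** @param array<string, int> $includedSections */
    private function __construct(private readonly array $includedSections)
    {
    }

    public static function new(): self
    {
        // Same defaults as Parser::new() in this commit: tbody, tr and tfoot.
        return new self([
            Section::tbody->value => 1,
            Section::tr->value => 1,
            Section::tfoot->value => 1,
        ]);
    }

    public function includeSection(Section $section): self
    {
        $included = $this->includedSections;
        $included[$section->value] = 1;

        // Strict array comparison: an unchanged map returns the same instance.
        return $included === $this->includedSections ? $this : new self($included);
    }

    public function excludeSection(Section $section): self
    {
        $included = $this->includedSections;
        unset($included[$section->value]);

        return $included === $this->includedSections ? $this : new self($included);
    }

    public function isIncluded(?Section $section): bool
    {
        // A null section (unknown tag name) is never included.
        return array_key_exists($section?->value ?? '', $this->includedSections);
    }
}
```

In this sketch, `SectionFilter::new()->excludeSection(Section::tfoot)` yields a new instance, while excluding `tfoot` a second time returns the same object, which matches the immutable style used by the other `Parser` modifiers.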
### ignoreXmlErrors and failOnXmlErrors
Expand Down Expand Up @@ -203,13 +207,14 @@ otherwise an array list is provided.
### Default behaviour
By default, when calling the `Parser::new()` named constructor you will:
By default, when calling the `Parser::new()` named constructor the parser will:
- try to parse the first table found in the page
- expect the table header row to be the first `tr` found in the `thead` section of your table
- include the table `tfoot` section
- exclude the table `thead` section when extracting the table content.
- ignore XML errors.
- no formatter is attached to the parser.
- have no formatter attached.
- have no default caption to use.
### parseHtml and parseFile
Expand Down
85 changes: 51 additions & 34 deletions src/Parser.php
Expand Up @@ -48,6 +48,7 @@ final class Parser

/**
* @param array<string> $tableHeader
* @param array<string, int> $includedSections
*/
private function __construct(
private readonly string $expression,
Expand All @@ -56,7 +57,7 @@ private function __construct(
private readonly Section $tableHeaderSection,
private readonly int $tableHeaderOffset,
private readonly bool $throwOnXmlErrors,
private readonly bool $includeTableFooter,
private readonly array $includedSections,
private readonly ?Closure $formatter,
private readonly ?string $caption,
) {
Expand All @@ -71,7 +72,7 @@ public static function new(): self
Section::thead,
0,
false,
true,
[Section::tbody->value => 1, Section::tr->value => 1, Section::tfoot->value => 1],
null,
null,
);
Expand All @@ -90,7 +91,7 @@ public function tableXPathPosition(string $expression): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand Down Expand Up @@ -133,7 +134,7 @@ public function tableHeader(array $headerRow): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -151,7 +152,7 @@ public function ignoreTableHeader(): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -169,7 +170,7 @@ public function resolveTableHeader(): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -191,43 +192,49 @@ public function tableHeaderPosition(Section $section, int $offset = 0): self
$section,
$offset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
};
}

public function includeTableFooter(): self
public function includeSection(Section $section): self
{
return match ($this->includeTableFooter) {
true => $this,
false => new self(
$includedSections = $this->includedSections;
$includedSections[$section->value] = 1;

return match ($this->includedSections) {
$includedSections => $this,
default => new self(
$this->expression,
$this->tableHeader,
$this->ignoreTableHeader,
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
true,
$includedSections,
$this->formatter,
$this->caption,
),
};
}

public function excludeTableFooter(): self
public function excludeSection(Section $section): self
{
return match ($this->includeTableFooter) {
false => $this,
true => new self(
$includedSections = $this->includedSections;
unset($includedSections[$section->value]);

return match ($this->includedSections) {
$includedSections => $this,
default => new self(
$this->expression,
$this->tableHeader,
$this->ignoreTableHeader,
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
false,
$includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -245,7 +252,7 @@ public function failOnXmlErrors(): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
true,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -263,7 +270,7 @@ public function ignoreXmlErrors(): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
false,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$this->caption,
),
Expand All @@ -279,7 +286,7 @@ public function withFormatter(Closure $formatter): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$formatter,
$this->caption,
);
Expand All @@ -296,14 +303,14 @@ public function withoutFormatter(): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
null,
$this->caption,
),
};
}

public function defaultCaption(?string $caption = null): self
public function tableCaption(?string $caption = null): self
{
return match ($this->caption) {
$caption => $this,
Expand All @@ -314,7 +321,7 @@ public function defaultCaption(?string $caption = null): self
$this->tableHeaderSection,
$this->tableHeaderOffset,
$this->throwOnXmlErrors,
$this->includeTableFooter,
$this->includedSections,
$this->formatter,
$caption,
),
Expand Down Expand Up @@ -458,28 +465,38 @@ private function extractTableContents(DOMXPath $xpath, array $header): Iterator
$query = $xpath->query('//table');
/** @var DOMElement $table */
$table = $query->item(0);
$it = new ArrayIterator();
$iterator = new ArrayIterator();
$header = $this->tableHeader($header)->tableHeader;
$rowSpan = [];
foreach ($table->childNodes as $childNode) {
if (!$childNode instanceof DOMElement) {
continue;
}

$nodeName = strtolower($childNode->nodeName);
if ('tbody' === $nodeName || ('tfoot' === $nodeName && $this->includeTableFooter)) {
$rowSpanSection = [];
foreach ($childNode->childNodes as $tr) {
if (null !== ($record = $this->filterRecord($tr))) {
$it->append($this->formatRecord($this->extractRecord($record, $rowSpanSection), $header));
}
$section = Section::tryFrom(strtolower($childNode->nodeName));
if (!$this->isIncludedSection($section)) {
continue;
}

if (Section::tr === $section && null !== ($record = $this->filterRecord($childNode))) {
$iterator->append($this->formatRecord($this->extractRecord($record, $rowSpan), $header));
continue;
}

$rowSpanSection = [];
foreach ($childNode->childNodes as $tr) {
if (null !== ($record = $this->filterRecord($tr))) {
$iterator->append($this->formatRecord($this->extractRecord($record, $rowSpanSection), $header));
}
} elseif (null !== ($record = $this->filterRecord($childNode))) {
$it->append($this->formatRecord($this->extractRecord($record, $rowSpan), $header));
}
}

return $it;
return $iterator;
}

private function isIncludedSection(?Section $nodeName): bool
{
return array_key_exists($nodeName?->value ?? '', $this->includedSections);
}

private function filterRecord(DOMNode $tr): ?DOMElement
Expand Down
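The reworked traversal in `extractTableContents` can be approximated with plain `ext-dom` calls. This is a simplified sketch under stated assumptions, not the method itself: `extractRows` and `readCells` are hypothetical helpers, sections are passed as plain strings instead of the `Section` enum, and rowspan handling, header mapping and formatters are left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the cell texts, row by row, for the direct
 * children of the first <table> whose tag name is listed in $included.
 *
 * @param list<string> $included tag names such as 'tbody', 'tfoot' or 'tr'
 * @return list<list<string>>
 */
function extractRows(string $html, array $included): array
{
    $document = new DOMDocument();
    $document->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $table = $document->getElementsByTagName('table')->item(0);
    if (!$table instanceof DOMElement) {
        return [];
    }

    $rows = [];
    foreach ($table->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // skip text nodes and comments
        }

        $name = strtolower($child->nodeName);
        if (!in_array($name, $included, true)) {
            continue; // this section is not included
        }

        // A direct <tr> child is a row itself; other sections contain rows.
        $candidates = 'tr' === $name ? [$child] : iterator_to_array($child->childNodes);
        foreach ($candidates as $tr) {
            if ($tr instanceof DOMElement && 'tr' === strtolower($tr->nodeName)) {
                $rows[] = readCells($tr);
            }
        }
    }

    return $rows;
}

/** @return list<string> */
function readCells(DOMElement $tr): array
{
    $cells = [];
    foreach ($tr->childNodes as $cell) {
        if ($cell instanceof DOMElement && in_array(strtolower($cell->nodeName), ['td', 'th'], true)) {
            $cells[] = trim($cell->textContent);
        }
    }

    return $cells;
}
```

Filtering first and returning early with `continue` mirrors the flattened control flow introduced in this commit, where the previous nested `if`/`elseif` was replaced by guard clauses.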