Skip to content

Commit

Permalink
Merge pull request #120 from Lullabot/misc-improvements
Browse files Browse the repository at this point in the history
Misc improvements
  • Loading branch information
sidkshatriya authored Jul 5, 2016
2 parents be3dd21 + ace24a0 commit e110579
Show file tree
Hide file tree
Showing 12 changed files with 209 additions and 66 deletions.
18 changes: 13 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,20 @@ The AMP PHP Library is an open source and pure PHP Library that:
- Reports compliance of a whole/partial HTML document with the [AMP HTML specification](https://www.ampproject.org/). We implement an AMP HTML validator in pure PHP to report compliance of an arbitrary HTML document / HTML fragment with the AMP HTML standard. This validator is a ported subset of the [canonical validator](https://github.com/ampproject/amphtml/tree/master/validator) that is implemented in JavaScript
- Specifically, the PHP validator supports tag specification validation, attribute specification validation, CDATA validation, CSS validation, layout validation, template validation and attribute property-value pair validation. It will report tags and attributes that are missing, illegal, mandatory according to spec but not present, unique according to spec but multiply present, having wrong parents or ancestors or children and so forth.
- _Note_: while the AMP PHP library (already) supports many of the features and capabilities of the canonical validator, it is not intended to achieve parity in _every_ respect with the canonical validator. Even _within_ the features we support (e.g. CSS validation) there may be certain validation issues that we don't flag but the canonical validator does.
- Using the feedback given by the in-house PHP validator, the AMP PHP library tries to "correct" some issues found in the HTML to make it more AMP HTML compliant. This would, for example, involve removing:
- Illegal attributes e.g. `style` within `<body>` tag
- Illegal tags e.g. `<script>` within `<body>` tag
- Illegal property value pairs e.g. remove `minimum-scale=hello` from `<meta name="viewport" content="minimum-scale=hello">`
- Using the feedback given by the in-house PHP validator, the AMP PHP library tries to "correct" some issues found in the HTML to make it more AMP HTML compliant. This would, for example, involve:
- Removing illegal attributes e.g. `style` attribute within `<body>` tag
- Removing all kinds of illegal tags e.g. `<script>` within `<body>` tag, a tag with a disallowed ancestor, a duplicate unique tag etc.
- Removing illegal property value pairs e.g. removing `minimum-scale=hello` from `<meta name="viewport" content="minimum-scale=hello">`
- Adding or correcting the tags necessary for a minimally valid AMP document:
- `<head>`, `<body>`, `meta viewport`, `meta charset`, `<style>` and `<noscript>` tags
- The `link rel=canonical` tag if you let the library know the canonical path of the document
- Javascript `<script>` tags for the various AMP components and generic AMP Javascript `<script>` tag
- Boilerplate CSS
- If there are mutually exclusive attributes for a tag, removing all but one of them
- Fixing issues with `amp-img` tags that have problems like inconsistent units, invalid attributes, missing mandatory attributes, invalid implied or specified layouts.
- _Notes_:
- The "correction" of the input HTML to make it more compliant with the AMP HTML standard is currently basic. The library does a decent job of _removing_ bad things but does not _add_ tags, attributes or property-value pairs where it could "fix" things
- The library does a decent job of _removing_ bad things and in a few cases makes some corrections/additions to the HTML. As the library cannot understand the true _intention_ of the user, most of the validation problems in the HTML may need to be fixed manually by the human.
- In general, the library will try to fix validation errors in `<head>` and if its not successful in doing so, _remove_ those tags from `<head>`. Within `<body>` the AMP PHP library is less aggressive and in most cases will _not_ remove the tag from the document if the tag does not validate after it attempts any fixes on it.
- The library needs to be provided with well formed HTML / HTML5. Please don't give it faulty, incorrect html (e.g. non closed `<div>` tags etc). The correction it does is related to AMP HTML standard issues only. Use a HTML tidying library if you expect your HTML to be malformed.
- Converts some non-amp elements to their AMP equivalents automatically
- A `<img>` tag is converted to an `<amp-img>` tag
Expand Down
8 changes: 4 additions & 4 deletions src/Pass/AmpImgFixPass.php
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ function pass()
{
/** @var SValidationError $error */
foreach ($this->validation_result->errors as $error) {
if (in_array($error->code, [ValidationErrorCode::MANDATORY_ATTR_MISSING, ValidationErrorCode::IMPLIED_LAYOUT_INVALID, ValidationErrorCode::SPECIFIED_LAYOUT_INVALID]) &&
if (in_array($error->code, [ValidationErrorCode::INCONSISTENT_UNITS_FOR_WIDTH_AND_HEIGHT, ValidationErrorCode::MANDATORY_ATTR_MISSING, ValidationErrorCode::INVALID_ATTR_VALUE, ValidationErrorCode::IMPLIED_LAYOUT_INVALID, ValidationErrorCode::SPECIFIED_LAYOUT_INVALID]) &&
!$error->resolved &&
!empty($error->dom_tag) &&
$error->dom_tag->tagName == 'amp-img'
Expand All @@ -61,13 +61,13 @@ function pass()
}

$layout = ParsedTagSpec::parseLayout($amp_img_el->attr('layout'));
if ($error->code == ValidationErrorCode::MANDATORY_ATTR_MISSING &&
($layout !== AmpLayoutLayout::RESPONSIVE || !in_array($error->params[0], ['height', 'width']))
if (in_array($error->code, [ValidationErrorCode::INCONSISTENT_UNITS_FOR_WIDTH_AND_HEIGHT, ValidationErrorCode::MANDATORY_ATTR_MISSING, ValidationErrorCode::INVALID_ATTR_VALUE]) &&
($layout !== AmpLayoutLayout::RESPONSIVE || !in_array($error->params[0], ['height', 'width', 'amp-img']))
) {
continue;
}

$success = $this->setAmpImgAttributes($amp_img_el);
$success = $this->setResponsiveImgHeightAndWidth($amp_img_el);
if ($success) {
$error->addActionTaken(new ActionTakenLine('amp-img', ActionTakenType::AMP_IMG_FIX));
$error->resolved = true;
Expand Down
46 changes: 32 additions & 14 deletions src/Pass/ImgTagTransformPass.php
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
namespace Lullabot\AMP\Pass;

use Lullabot\AMP\Utility\ParseUrl;
use Lullabot\AMP\Validate\CssLengthAndUnit;
use Lullabot\AMP\Validate\GroupedValidationResult;
use Lullabot\AMP\Validate\Scope;
use Lullabot\AMP\Utility\ActionTakenLine;
Expand Down Expand Up @@ -74,9 +75,9 @@ function pass()
$new_dom_el = $this->cloneAndRenameDomElement($dom_el, 'amp-img');
$new_el = $el->prev();

$success = $this->setAmpImgAttributes($new_el);
$success = $this->setResponsiveImgHeightAndWidth($new_el);
// We were not able to get the image dimensions, abort conversion.
if(!$success) {
if (!$success) {
$this->addActionTaken(new ActionTakenLine('img', ActionTakenType::IMG_COULD_NOT_BE_CONVERTED, $lineno, $context_string));
// Abort the conversion and remove the new img tag
$new_el->remove();
Expand Down Expand Up @@ -163,21 +164,38 @@ protected function getImageUrl($src)

/**
* @param DOMQuery $el
* @return bool
*/
protected function setAmpImgAttributes(DOMQuery $el)
protected function setResponsiveImgHeightAndWidth(DOMQuery $el)
{
// If height or image is not set, get it from the image
if (!$el->attr('width') || !$el->attr('height')) {
$dimensions = $this->getImageWidthHeight($el->attr('src'));
if ($dimensions !== false) {
$el->attr('width', $dimensions['width']);
$el->attr('height', $dimensions['height']);
return true;
} else {
return false;
}
// Static cache
static $image_dimensions_cache = [];

$wcss = new CssLengthAndUnit($el->attr('width'), false);
$hcss = new CssLengthAndUnit($el->attr('height'), false);

if ($wcss->is_set && $wcss->is_valid && $wcss->is_set && $wcss->is_valid && $wcss->unit == $hcss->unit) {
return true;
}

$src = trim($el->attr('src'));
if (empty($src)) {
return false;
}

if (isset($image_dimensions_cache[$src])) {
$dimensions = $image_dimensions_cache[$src];
} else {
$dimensions = $this->getImageWidthHeight($src);
}

return true;
if ($dimensions !== false) {
$image_dimensions_cache[$src] = $dimensions;
$el->attr('width', $dimensions['width']);
$el->attr('height', $dimensions['height']);
return true;
} else {
return false;
}
}
}
4 changes: 0 additions & 4 deletions src/Pass/StandardFixPass.php
Original file line number Diff line number Diff line change
Expand Up @@ -96,11 +96,9 @@ public function pass()
if (empty($new_attr_value_trimmed)) { // There is nothing here now so we should just remove the attribute
$error->dom_tag->removeAttribute($error->attr_name);
$error->addActionTaken(new ActionTakenLine("In $tag_name.$error->attr_name the \"$error->segment\"", ActionTakenType::PROPERTY_REMOVED_ATTRIBUTE_REMOVED, $error->line));
$error->resolved = true;
} else {
$error->dom_tag->setAttribute($error->attr_name, $new_attr_value);
$error->addActionTaken(new ActionTakenLine("In $tag_name.$error->attr_name the \"$error->segment\"", ActionTakenType::PROPERTY_REMOVED, $error->line));
$error->resolved = true;
}
}

Expand All @@ -111,15 +109,13 @@ public function pass()
) {
$error->dom_tag->removeAttribute($error->attr_name);
$error->addActionTaken(new ActionTakenLine("$tag_name.$error->attr_name", ActionTakenType::ATTRIBUTE_REMOVED, $error->line));
$error->resolved = true;
}

// Tags
if (in_array($error->code, $this->remove_tags_for_codes) && !empty($error->dom_tag)) {
// Remove the offending tag
$error->dom_tag->parentNode->removeChild($error->dom_tag);
$error->addActionTaken(new ActionTakenLine($tag_name, ActionTakenType::TAG_REMOVED, $error->line));
$error->resolved = true;
}

}
Expand Down
2 changes: 1 addition & 1 deletion src/Utility/ActionTakenType.php
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,6 @@ class ActionTakenType
const TAG_REMOVED_FROM_HEAD_AFTER_REVALIDATE_FAILED = 'tag removed from head as it still does not validate. Could not fix tag validation problems.';
const ATTRIBUTE_REMOVED_MUTUALLY_EXCLUSIVE = 'attribute(s) removed as they were mutually exclusive.';
const ISSUE_RESOLVED = 'no further action required as this issue was resolved due to an earlier fix';
const AMP_IMG_FIX = 'tried to fix problems with amp-img by adjusting one or more of: height, width, and setting layout to responsive';
const AMP_IMG_FIX = 'tried to fix problems with amp-img by trying to fetch height, width from image directly and/or setting layout to responsive';
const AMP_IMG_FIX_RESPONSIVE = 'tried to fix problems with amp-img by setting layout to responsive';
}
1 change: 1 addition & 0 deletions tests/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Note: An explicit commit is noted as these files can change.
* `test-data/full-html/link_meta_values.html` https://github.com/ampproject/amphtml/blob/96335e0540532264b3b38d498070f17473692b21/validator/testdata/feature_tests/link_meta_values.html
* `test-data/full-html/mandatory-dimensions.html` https://github.com/ampproject/amphtml/blob/e1aa24df8432963423ee6cec1ce4d57529767e6e/validator/testdata/feature_tests/mandatory_dimensions.html
There is a trivial modification: `</amp-anum>` is changed to `</amp-anim>`. See https://github.com/ampproject/amphtml/issues/3609
Also all `<amp-img>` tags are closed. See https://github.com/ampproject/amphtml/issues/3713
* `test-data/full-html/minimum_valid_amp.html` https://github.com/ampproject/amphtml/blob/96335e0540532264b3b38d498070f17473692b21/validator/testdata/feature_tests/minimum_valid_amp.html
* `test-data/full-html/new_and_old_boilerplate_mixed.html` https://github.com/ampproject/amphtml/blob/3bd74c3915f9e29824eebedbf6f14f6c531a3449/validator/testdata/feature_tests/new_and_old_boilerplate_mixed.html
* `test-data/full-html/new_and_old_boilerplate_mixed2.html` https://github.com/ampproject/amphtml/blob/d17719548ca786ec8c3778743da7f2020c7d0460/validator/testdata/feature_tests/new_and_old_boilerplate_mixed2.html
Expand Down
22 changes: 22 additions & 0 deletions tests/test-data/fragment-html/img-test-fragment.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
<!-- Note: this image is in the public domain. https://commons.wikimedia.org/wiki/File:"Birdcatcher"_with_jockey_up.jpg -->

<!-- should transform to amp-img -->
<img src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg">

<!-- should transform to amp-img with responsive layout, preserving the width and height -->
<img src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg" width="125" height="96">

<!-- nonexistent image, should refuse to convert to amp-img -->
<img src="https://upload.wikimedia.org/wikipedia/commons/e/ee/non-existent-image1234.jpg">

<!-- should provide layout and height width and make layout responsive -->
<amp-img src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg"></amp-img>

<!-- since only width exists, overwrite with height and width from original image -->
<amp-img layout="responsive" src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg" width="500"></amp-img>

<!-- since height is illegal, overwrite with height and width from original image -->
<amp-img layout="responsive" src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg" width="625" height="auto"></amp-img>

<!-- since units are inconsistent, overwrite with height+width from original image -->
<amp-img layout="responsive" src="https://upload.wikimedia.org/wikipedia/commons/e/ee/%22Birdcatcher%22_with_jockey_up.jpg" width="625rem" height="480"></amp-img>
Loading

0 comments on commit e110579

Please sign in to comment.