Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a basic HTML compressor #956

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions hakyll.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,7 @@ Library
Hakyll.Web.CompressCss
Hakyll.Web.Feed
Hakyll.Web.Html
Hakyll.Web.Html.Compress
Hakyll.Web.Html.RelativizeUrls
Hakyll.Web.Meta.JSONLD
Hakyll.Web.Meta.OpenGraph
Expand Down
2 changes: 2 additions & 0 deletions lib/Hakyll.hs
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ module Hakyll
, module Hakyll.Web.CompressCss
, module Hakyll.Web.Feed
, module Hakyll.Web.Html
, module Hakyll.Web.Html.Compress
, module Hakyll.Web.Html.RelativizeUrls
, module Hakyll.Web.Meta.JSONLD
, module Hakyll.Web.Meta.OpenGraph
Expand Down Expand Up @@ -55,6 +56,7 @@ import Hakyll.Main
import Hakyll.Web.CompressCss
import Hakyll.Web.Feed
import Hakyll.Web.Html
import Hakyll.Web.Html.Compress
import Hakyll.Web.Html.RelativizeUrls
import Hakyll.Web.Meta.JSONLD
import Hakyll.Web.Meta.OpenGraph
Expand Down
101 changes: 101 additions & 0 deletions lib/Hakyll/Web/Html/Compress.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RecordWildCards #-}

--------------------------------------------------------------------------------
-- | This module exposes a function to compress the HTML output.
--
-- The compression is very basic, shaving off about 1-3% of a typical HTML output,
-- and it works as follows:
--
-- * Comments are removed.
-- * Several consecutive whitespaces are replaced by a single one, unless within a <pre> tag.
-- * Within a <pre> tag, @n@ consecutive whitespaces are replaced by a single @\t@ character.
-- This is useful if a page is heavy on code listings.
-- Don't forget to add @tab-size: n@ to your CSS!
--
-- Any of these steps can be disabled, see 'CompressHtmlOpts'.

module Hakyll.Web.Html.Compress
( CompressHtmlOpts(..)
, def
, compressHtml
, compressHtmlCompiler
) where

import Data.Char
import Data.Default
import qualified Data.Set as S
import Hakyll.Core.Compiler
import Hakyll.Core.Item
import Hakyll.Web.Html
import Text.HTML.TagSoup

-- | The configuration for the HTML compression.
data CompressHtmlOpts = CompressHtmlOpts
{ choRemoveComments :: Bool -- ^ Whether to remove comments.
, choCompressWhitespace :: Bool -- ^ Whether to remove excessive whitespaces.
, choTabSize :: Maybe Int -- ^ Replace this many spaces in <pre> with @\t@ (if 'choCompressWhitespace' is set).
}

instance Default CompressHtmlOpts where
def = CompressHtmlOpts
{ choRemoveComments = True
, choCompressWhitespace = True
, choTabSize = Nothing
}

-- | Compiler form of 'compressHtml'.
compressHtmlCompiler :: CompressHtmlOpts -> Item String -> Compiler (Item String)
compressHtmlCompiler opts item = pure $ compressHtml opts <$> item

-- | Compresses an HTML string according to the given configuration.
compressHtml :: CompressHtmlOpts -> String -> String
compressHtml CompressHtmlOpts{ .. } = withTagList go
where
go = foldr (.) id
$ [ f
| (True, f) <- [ (choRemoveComments, removeComments)
, (choCompressWhitespace, compressWS choTabSize)
]
]

removeComments :: [Tag String] -> [Tag String]
removeComments = filter (not . isTagComment)

compressWS :: Maybe Int -> [Tag String] -> [Tag String]
compressWS maybeTabSize = go mempty
where
go stack =
\case [] -> []
(tag@(TagClose n) : rest) -> tag : go (S.delete n stack) rest
(tag@(TagOpen n _) : rest) -> tag : go (S.insert n stack) rest
(tag@(TagText text) : rest)
-- all spaces within a <pre> are important, but we can replace them with tabs
| "pre" `S.member` stack -> case maybeTabSize of
Nothing -> tag : go stack rest
Just tabSize -> TagText (collapseIntoTabs (tabSize - 1) text) : go stack rest
| otherwise -> let text' = collapseSpaces text
in case text' of
[] -> go stack rest
Minoru marked this conversation as resolved.
Show resolved Hide resolved
_ -> TagText text' : go stack rest
(tag : rest) -> tag : go stack rest

collapseSpaces :: String -> String
collapseSpaces = go
where
go [] = []
go [c] = [c]
go (c1 : c2 : rest)
| isSpace c1 && isSpace c2 = go (c2 : rest)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make a distinction between various kinds of spaces, most notably no-break spaces. I can't find the reference in the spec, but browsers definitely avoid collapsing consecutive no-break spaces. Authors could rely on this behaviour, so can you please make it so no-breaking spaces are preserved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks for catching this! I was considering non-breaking spaces only as &nbsp;, and didn't think about isSpace swallowing those as well.

I can add a check against c1 not being one of U+00A0, U+2007, U+202F, or U+2060. I think that should do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently there are dozens of kinds of spaces in Unicode, so I think it's better to be restrictive here: check that c1 and c2 are one of space, \t, \v, \n, \r, or \f. This seems like a safe subset that can be expanded later.

| otherwise = c1 : go (c2 : rest)

collapseIntoTabs :: Int -> String -> String
collapseIntoTabs n = go
where
go [] = []
go (' ':cs)
| (pref, rest) <- splitAt n cs
, pref == pat = '\t' : go rest
go (c:cs) = c : go cs

pat = replicate n ' '