<!--
  Nick's web site: white-space normalization for Nokogiri.

  Copyright © 2019, 2022 Nick Bowler

  Nokogiri's pretty-printer seems a bit weird.  Regardless of the indentation
  setting, if an element has no child text nodes then it will be
  pretty-printed.  This works by adding arbitrary whitespace to that element,
  and then all of its children are eligible to be pretty-printed.

  If an element has any child text nodes at all, then it is not pretty-printed
  and neither are any of its descendants.

  In general, adding or removing whitespace from an XHTML document is unsafe
  (changes the meaning) around span-level elements, but it is OK around other
  elements.

  These templates exploit the Nokogiri behaviour in two ways:

    - by explicitly stripping whitespace wherever it is safe to do so,
      attempting to allow pretty printing as much as possible without changing
      the meaning of the document; and

    - by explicitly adding text nodes to suppress pretty printing in
      situations where it would otherwise change the meaning of the document.

  The text nodes which are added consist of U+2060 word joiner characters.
  These should be removed from the final document in a separate pass.

  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see <https://www.gnu.org/licenses/>.
-->
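<!--
  A minimal sketch of the separate clean-up pass for U+2060 word joiners.
  The original does not say how that pass is implemented; this assumes it is
  a standalone XSLT 1.0 stylesheet applied to the output of this one.  The
  first template copies every node and attribute unchanged.  The second uses
  translate() with an empty replacement string to delete each U+2060 from
  text nodes, and carries an explicit priority because node() and text()
  patterns otherwise have the same default priority:

    <xsl:stylesheet version='1.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
      <xsl:template match='node()|@*'>
        <xsl:copy><xsl:apply-templates select='node()|@*' /></xsl:copy>
      </xsl:template>

      <xsl:template match='text()' priority='1'>
        <xsl:value-of select='translate(., "&#x2060;", "")' />
      </xsl:template>
    </xsl:stylesheet>
-->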
<xsl:stylesheet version='1.0'
                xmlns='http://www.w3.org/1999/xhtml'
                xmlns:xhtml='http://www.w3.org/1999/xhtml'
                xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                xmlns:f='http://draconx.ca/my-functions'>

<xsl:import href='functions.xsl' />

<!--
  Adding arbitrary whitespace to <pre> is bad, so we inject U+2060 word
  joiners (zero-width, non-breaking) to prevent this.  This will render fine
  but the characters should be removed before final output to avoid problems
  with copy+paste.
-->
<xsl:template match='xhtml:pre'>
  <xsl:copy>
    <xsl:apply-templates select='node()|@*' />
    <xsl:text>&#x2060;</xsl:text>
  </xsl:copy>
</xsl:template>

<!--
  Likewise, adding spaces between consecutive span-level elements where
  none existed before won't go over well.
-->
<xsl:template name='glue-preceding-span'>
  <xsl:if test='f:element-is-span(preceding-sibling::node()[1])'>
    <xsl:text>&#x2060;</xsl:text>
  </xsl:if>
</xsl:template>

<xsl:template match='*[f:element-is-span()]'>
  <xsl:call-template name='glue-preceding-span' />
  <xsl:copy>
    <xsl:apply-templates select='node()|@*' />

    <!-- avoid breaking within a span element -->
    <xsl:text>&#x2060;</xsl:text>
  </xsl:copy>
</xsl:template>

<!--
  Manually strip whitespace-only text nodes so the pretty printer can do its
  thing on remaining elements.
-->
<xsl:template match='text()[normalize-space(.) = ""]'>
  <xsl:choose>
    <!-- preserve anything according to xml:space -->
    <xsl:when test='ancestor::*[@xml:space][1][@xml:space="preserve"]'>
      <xsl:copy />
    </xsl:when>
    <!-- preserve anything under <pre> -->
    <xsl:when test='ancestor::xhtml:pre'><xsl:copy /></xsl:when>
    <!-- preserve whitespace which is the only child node of an element -->
    <xsl:when test='count(../node()) = 1'><xsl:copy /></xsl:when>
    <!-- preserve whitespace between consecutive span-level elements
         which have at least one non-whitespace sibling text node -->
    <xsl:when test='f:element-is-span(preceding-sibling::node()[1])
                    and f:element-is-span(following-sibling::node()[1])
                    and ../text()[normalize-space(.) != ""]'>
      <xsl:copy />
    </xsl:when>
  </xsl:choose>
</xsl:template>

<!-- Clean up whitespace where harmless to do so -->
<xsl:template match='xhtml:p/node()[1][self::text()]'>
  <xsl:value-of select='f:strip-leading()' />
</xsl:template>
<xsl:template match='xhtml:p/node()[position()=last()][self::text()]'>
  <xsl:value-of select='f:strip-trailing()' />
</xsl:template>