Revised RFC with open questions
In HTML5, whitespace is significant because CSS rules can affect how it is rendered. This matters for wikitext parsing, since the parser has to decide how to handle whitespace in the source wikitext. Today, the PHP parser does nothing special with whitespace in most cases (with caveats that I'll skip here), but the parser tests mostly treat whitespace as significant. However, since the core PHP parser's output is post-processed by Tidy, we end up having to look at the combined effect of (PHP parser + Tidy). Tidy strips whitespace in *all* HTML tags. While this is buggy behavior from an HTML5 point of view, editors have come to rely on some aspects of this whitespace stripping. Most notably, the CSS for horizontal lists (hlist CSS), used both on its own and in navboxes, relies on whitespace in nested lists having been stripped out.
In a post-Tidy world of HTML5-based Remex and Parsoid, there is no longer any automatic whitespace stripping, and hlists and navboxes render differently as a result. There are at least 3 possibilities in front of us:
- get editors to fix their wikitext by eliminating whitespace where it is significant: this proposal is too invasive, since it requires regular editors to be aware of how other editors might style that content, especially when the nested lists come through template parameters. This would be acceptable if the impact were limited, but at first glance this pattern appears to be widely used.
- rely on JavaScript to strip whitespace for these inline-styled lists: this is potentially acceptable, but doesn't work when JavaScript is turned off.
- make the core wikitext spec whitespace-insensitive for all or for some subset of the native wikitext constructs.
As a first cut, it would be useful for the RFC discussion to evaluate the merits of these proposals; in practice, that means proposal 2 vs. proposal 3. I am equally in favour of both.
Assuming proposal 3 is the favoured approach, further decisions need to be made. There are 3 sub-proposals:
- 3a. All wikitext constructs AND HTML tags will be whitespace-insensitive (caveat: existing whitespace behavior in parser functions is unchanged). I personally do not favour this proposal because it introduces an egregious inconsistency with the HTML5 spec, i.e. HTML tags in wikitext documents would behave differently from everywhere else. It can also introduce subtle problems when HTML content from elsewhere is pasted in.
- 3b. Only native wikitext constructs (not HTML tags) will be whitespace-insensitive, i.e. * # : ; {| |} | ! || !! '' ''' [[..]]
- 3c. Whitespace within only the native block wikitext constructs will be trimmed, i.e. * # : ; {| |} | ! || !! = |+ (but not '' ''' [[..]]). Effectively, this means that table cells, table captions, table header cells, list items, and headings will have their whitespace trimmed.
I think only proposals 3b and 3c above are really viable. I personally prefer proposal 3c.
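To make the scope of proposal 3c concrete, here is a minimal Python sketch that normalizes a single line of wikitext the way 3c would treat it: whitespace around the content of list items, headings, table cells, table header cells and captions is dropped, while everything else, including inline markup such as '' '', is left untouched. The function name trim_line and the wikitext-to-wikitext framing are assumptions for illustration only; a real parser would trim during parsing rather than rewrite the source, and this sketch ignores cell attributes, multi-line cells, "; term : def" lines and nowiki.

```python
import re

# Hypothetical illustration of proposal 3c; not Parsoid or PHP parser code.
LIST_PREFIX = re.compile(r"^([*#:;]+)(.*)$")   # * # : ; list prefixes
HEADING = re.compile(r"^(={1,6})(.*?)\1\s*$")   # balanced == Heading ==


def trim_line(line: str) -> str:
    """Return `line` with whitespace around block-construct content removed."""
    if line.startswith(("{|", "|}", "|-")):
        # Table start/end and row markers carry attributes, not content.
        return line
    if line.startswith("|+"):
        # Table caption.
        return "|+" + line[2:].strip()
    if line.startswith("|"):
        # Data cells, possibly several on one line separated by ||.
        cells = line[1:].split("||")
        return "|" + "||".join(cell.strip() for cell in cells)
    if line.startswith("!"):
        # Header cells, possibly several on one line separated by !!.
        cells = line[1:].split("!!")
        return "!" + "!!".join(cell.strip() for cell in cells)
    m = HEADING.match(line)
    if m:
        return m.group(1) + m.group(2).strip() + m.group(1)
    m = LIST_PREFIX.match(line)
    if m:
        return m.group(1) + m.group(2).strip()
    # Not a block construct: inline whitespace stays significant.
    return line
```

Under this sketch, "* foo " and "*foo" render identically, as do "| a || b " and "|a||b", which is what would let editors keep readable spacing without affecting hlist-style CSS.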
Summary
It would be useful for TechCom to weigh in on the two open questions above (proposal 2 vs. proposal 3, and, if 3, which sub-proposal), both on its own and via an IRC discussion. We want to surface any concerns and unresolved issues with these proposals.
Original Description (with context)
In T155634: Tidy strips whitespace after HTML tags AND adds newlines between HTML tags, we discovered that both Parsoid and the Tidy replacements render differently from the existing output because of Tidy's HTML4-era behavior of stripping whitespace inside HTML tags, i.e. <some-html-tag> foo </some-html-tag> renders as <some-html-tag>foo</some-html-tag> after it goes through Tidy. In the HTML5 landscape, this is buggy behavior, because CSS rules like display:inline or white-space:nowrap render the two forms differently.
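The Tidy behavior described above can be approximated with two regular-expression substitutions, sketched below in Python. tidy_like_strip is a hypothetical helper, not part of Tidy or MediaWiki, and it is not Tidy's algorithm: it only models the leading/trailing case from the example (whitespace right after an opening tag and right before a closing tag), not Tidy's other rewrites such as adding newlines between tags.

```python
import re

# Whitespace immediately following an opening tag such as <li> or <span class="x">.
OPEN_TAG_WS = re.compile(r"(<[A-Za-z][^<>]*>)\s+")
# Whitespace immediately preceding a closing tag such as </li>.
CLOSE_TAG_WS = re.compile(r"\s+(</[A-Za-z][^<>]*>)")


def tidy_like_strip(html: str) -> str:
    """Hypothetical approximation of Tidy trimming whitespace inside tags."""
    html = OPEN_TAG_WS.sub(r"\1", html)
    return CLOSE_TAG_WS.sub(r"\1", html)
```

An HTML5 tree builder such as Remex keeps " foo " intact, so the same source yields a text node with leading and trailing spaces, which display:inline then renders as visible gaps.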
Our initial inclination was to break this Tidy behavior and require editors to fix pages that use whitespace-sensitive CSS rules so that they get the rendering Tidy used to provide. While sensible, this runs into two competing concerns, raised by @Izno and @Quiddity. (1) Some heavily used templates, like hlist and navbox, would require fixing up a lot of pages across all wikis, and automated fixups can be tricky. (2) Even if that were done, editors seem to prefer adding whitespace in their native wikitext markup for readability.
Given this situation, one option would be to treat native *block* wikitext constructs (lists, headings, tables) as insensitive to leading/trailing whitespace. That is, this whitespace would be stripped during parsing and would not make it into the final serialized HTML output. However, *inline* wikitext constructs (italics, bold, links) would continue to treat whitespace as significant. Separately, we would continue to treat HTML tags as whitespace-sensitive and not strip whitespace inside them; doing so would be reaching too far.
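The policy in the previous paragraph can be sketched as a post-parse pass over a DOM-like tree. The Element type, its from_wikitext flag and the BLOCK_TAGS set are hypothetical, chosen for illustration; this is not a description of how Parsoid or the PHP parser represent their output. Only elements produced by native block wikitext have their leading/trailing text trimmed, so inline constructs and literal HTML tags keep their whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    tag: str
    # True if produced by native wikitext syntax, False for a literal HTML tag.
    from_wikitext: bool
    children: List["Node"] = field(default_factory=list)


Node = Union[str, Element]

# Block constructs covered by the option: list items, headings, table parts.
BLOCK_TAGS = {"li", "dt", "dd", "h1", "h2", "h3", "h4", "h5", "h6",
              "td", "th", "caption"}


def strip_block_whitespace(node: Node) -> Node:
    """Trim leading/trailing text only in wikitext-generated block elements."""
    if isinstance(node, str):
        return node
    children = [strip_block_whitespace(child) for child in node.children]
    if node.from_wikitext and node.tag in BLOCK_TAGS and children:
        if isinstance(children[0], str):
            children[0] = children[0].lstrip()
        if isinstance(children[-1], str):
            children[-1] = children[-1].rstrip()
    return Element(node.tag, node.from_wikitext, children)


def serialize(node: Node) -> str:
    """Naive serializer for the example (no escaping, no attributes)."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"
```

For a list item like "* '' foo '' ", the space after the bullet and the trailing space are dropped, while the spaces inside the italics and inside a literal <li> foo </li> survive.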
This does not affect how editors write or use markup; it is a transparent step to reduce rendering differences with Parsoid and a Tidy replacement. We think this can be done relatively easily in both Parsoid and the PHP parser.
I am filing this ticket to gather any concerns or problems that we might be overlooking. This task will not block the pilot deployment of the Tidy replacement, but the final disabling of Tidy will be blocked on resolving this ticket one way or another.