
No syntax highlight for large JavaScript/CSS/Lua/API pages
Open, Stalled, Low, Public


Another practical use case would be https://en.wikipedia.org/wiki/MediaWiki:Gadget-popups.js , which is 250k of code on one page, or https://en.wikipedia.org/wiki/User:Cacycle/wikEd.js , which is a whopping 650k. I'm not sure if these are the master versions of these scripts, or built from some saner form. (One thing I expected to hit the limit but doesn't is Twinkle, which is reasonably spread over several files/pages.)

(edit: mid-air collision with MZMcBride)

Those are the master versions =/

The 100k limit also applies to large modules like https://en.wikipedia.org/wiki/Module:Citation/CS1 and https://en.wikipedia.org/wiki/Module:Convert. Both of these have also been copied to and are in use on other wikis. The old version of whatever highlighting code we had worked. The new and improved version should also work.

He7d3r renamed this task from No syntax highlight for large .js pages to No syntax highlight for large JavaScript/CSS/Lua pages.Jun 30 2015, 10:57 AM

The new and improved version should also work.

While a nice thing to strive for, it's not a goal in itself. The goal of having a library that is supported and in active development ranked much higher, and if there are some small trade-offs that we have to make, then so be it.

I vaguely remember that ori said that save times for large modules are significantly higher with Pygments. So removing this limit might increase save times for those pages beyond reasonable limits. Is that correct, @ori?

ori subscribed.

I vaguely remember that ori said that save times for large modules are significantly higher with Pygments. So removing this limit might increase save times for those pages beyond reasonable limits. Is that correct, @ori?

Yep.

I vaguely remember that ori said that save times for large modules are significantly higher with Pygments. So removing this limit might increase save times for those pages beyond reasonable limits.

What kind of page save time increase are we talking about? Writing a large article with a lot of links and templates also takes more time (to save, parse, and render the page), but users are typically willing to make this trade-off and it's easily explained.

The error reporting here isn't adequate, in my opinion. Maybe auto-adding yet another tracking category would be a reasonable solution here? That would at least give some indication in the user interface of what's going on.

I did some unscientific measurements on a local installation of MediaWiki, looking at the value of wgBackendResponseTime after purging a .js page (curl -s -L "http://localhost/w/index.php/...?action=purge" | grep BackendResponseTime), repeating each test a few times. This was done inside a VM on a rather slow machine with unoptimized MediaWiki installation, so it's likely to be several times faster running on WMF infrastructure. For example, MediaWiki:Gadget-popups.js has wgBackendResponseTime of ~300 ms right now (with Pygments enabled, but not actually highlighting things).

I changed the Pygments version of the code not to separately cache the highlighting results, to make the purging actually redo the highlighting (otherwise I'd have to edit the page every time), and set $wgGroupPermissions['*']['purge'] = true; to allow anonymous purge.
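The grep step of the measurement above could be sketched in Python roughly as follows. This is only a sketch: it assumes the page HTML embeds the value as a "wgBackendResponseTime":<integer> pair in its inline configuration, and the helper name and sample string are hypothetical.

```python
import re
from typing import Optional

# Assumption: the inline config blob contains "wgBackendResponseTime":<int>.
_PATTERN = re.compile(r'"wgBackendResponseTime"\s*:\s*(\d+)')


def backend_response_time_ms(html: str) -> Optional[int]:
    """Return the wgBackendResponseTime found in page HTML, or None if absent."""
    match = _PATTERN.search(html)
    return int(match.group(1)) if match else None


# Hypothetical fragment of a page's inline configuration.
sample = '{"wgPageName":"MediaWiki:Gadget-popups.js","wgBackendResponseTime":312}'
print(backend_response_time_ms(sample))
```

Repeating the request a few times and averaging the extracted values would mirror the "repeating each test a few times" part of the procedure.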

Page                                   Time, ms (no highlighting)   Time, ms (GeSHi)   Time, ms (Pygments)
MediaWiki:Gadget-popups.js (250 kB)    ~700                         ~1900              ~2800
User:Cacycle/wikEd.js (650 kB)         ~300                         ~7100              ~11000

So, according to my unscientific measurements, Pygments is at worst two times slower for large pages than GeSHi. I'm fairly sure the servers can take it :) (especially with the aforementioned additional caching of the Pygments version, making it impossible to generate large load by doing action=purge repetitively like in this test – one would have to edit a page, which would be immediately noticeable to everyone that something is amiss).
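The "at worst two times slower" reading can be checked with a few lines of arithmetic. The numbers below are copied from the table above; the dictionary layout and function name are just illustrative.

```python
# Timings copied from the table above (same unit in every column).
timings = {
    "MediaWiki:Gadget-popups.js": {"none": 700, "geshi": 1900, "pygments": 2800},
    "User:Cacycle/wikEd.js": {"none": 300, "geshi": 7100, "pygments": 11000},
}


def pygments_slowdown(row: dict) -> float:
    """Ratio of the Pygments timing to the GeSHi timing for one page."""
    return row["pygments"] / row["geshi"]


for page, row in timings.items():
    print(f"{page}: Pygments/GeSHi = {pygments_slowdown(row):.2f}")
```

Both ratios come out at roughly 1.5, so on these two pages Pygments stays under the factor of two mentioned above.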

Change 223053 had a related patch set uploaded (by Bartosz Dziewoński):
Remove the 100 kB code size limit for highlighting

https://gerrit.wikimedia.org/r/223053

So, according to my unscientific measurements, Pygments is at worst two times slower for large pages than GeSHi.

The GeSHi project is no longer maintained. Its implementation encourages bad performance.

These statements do not necessarily contradict each other :) Pygments needs only one CSS stylesheet for all languages, while GeSHi needs one stylesheet per language. So while actual highlighting might be slower (the difference is probably small for most cases, only visible to the naked eye on stupidly large pages like these), the Pygments version requires less code to be sent to the user and is lighter on ResourceLoader (only requires one module to be defined rather than hundreds), hopefully resulting in quicker-loading cached pages.

So long as pygments respects the timeouts that we set for external programs ($wgMaxShellTime, $wgMaxShellWallClockTime), then someone shouldn't be able to use this as a DoS beyond what we accept for normal pages.

Instead of https://gerrit.wikimedia.org/r/223053, I'd prefer that the code keeps enforcing a limit that's easy to configure, and we choose one that is big enough for most of our use cases. Then if we see it being abused, we can lower it quickly.

He7d3r renamed this task from No syntax highlight for large JavaScript/CSS/Lua pages to No syntax highlight for large JavaScript/CSS/Lua/API pages.Jul 7 2015, 12:09 PM
He7d3r updated the task description. (Show Details)

So long as pygments respects the timeouts that we set for external programs ($wgMaxShellTime, $wgMaxShellWallClockTime), then someone shouldn't be able to use this as a DoS beyond what we accept for normal pages.

Hmm, it currently doesn't. We use the Kzykhys Pygments wrapper, which uses Symfony ProcessBuilder for shelling out, which presumably has its own limits (at a quick glance there's some sort of 60 seconds timeout). It might be configurable.
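For illustration, here is the kind of wall-clock limit being discussed, as a minimal Python sketch. It is not the extension's PHP code and not the Symfony wrapper; the helper name and the time budgets are hypothetical, and the child commands are stand-ins for a highlighter process.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_wall_clock_limit(cmd: List[str], limit_seconds: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it times out or fails."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=limit_seconds, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return None
    except subprocess.CalledProcessError:
        return None
    return result.stdout


# Stand-ins for a slow and a fast external highlighter invocation.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('ok')"]
print(run_with_wall_clock_limit(slow, 0.5))
print(run_with_wall_clock_limit(fast, 10))
```

A caller that gets None back can fall back to showing the code without highlighting, which bounds how long any single input can keep a request busy.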

Instead of https://gerrit.wikimedia.org/r/223053, I'd prefer that the code keeps enforcing a limit that's easy to configure, and we choose one that is big enough for most of our use cases. Then if we see it being abused, we can lower it quickly.

Done, amended the patch.

The error reporting here isn't adequate, in my opinion. Maybe auto-adding yet another tracking category would be a reasonable solution here? That would at least give some indication in the user interface of what's going on.

Auto-adding a tracking category is happening now when the input is "too large." However, a generic tracking category is being used to cover both invalid language selections and overly large inputs, making the user experience even more awful.

It took me a few minutes to figure out that this edit was made not because <source lang="text"> is invalid markup, but because the input is larger than the secret and unreasonable threshold of 102,400 bytes.

Change 223053 abandoned by Bartosz Dziewoński:
Raise the code size limit for highlighting from 100 kB to 2000 kB

Reason:
I would like to pursue this at some point in the future.

https://gerrit.wikimedia.org/r/223053

I am not sure this is the right task.
For some days, the edit panel for Lua modules has been missing its syntax highlighting and its line numbers.
Perhaps since one of these MediaWiki versions on fr.wikisource:
1.29.0-wmf.18 (rMW9ab639f85dc6) UTC-20170403-15:20
1.29/wmf.19 (rMWe4784c6b8dd6) UTC-20170407-20:52
1.29.0-wmf.19 (rMWe4784c6b8dd6) UTC-20170407-20:52
1.29.0-wmf.19 (rMW676532717e58) UTC-20170408-00:13
My code is 1.2 MB on disk (580 k in UTF-8) and has been stable for 3 months.
When I reduce it to 70% of its size, the result is the same: no line numbers, no highlighting.

To reproduce, edit any module of any size on fr.wikisource.org or vi.wikipedia.org
using the latest Firefox 52.0.2 (64-bit) or Chromium 57.0.2987.98
on Ubuntu 16.04 LTS (updated daily) on an ASUS N56JR,
or Firefox 52.0.2 (32-bit) on Windows 8 on another ASUS model.

Do you see the same bug?
Where could this be coming from?

  • My user:vector.css or common.css? But I have them only in fr.wikisource.org and not in vi.wikipedia.org
  • A step to prepare T121470 Central Global Repository?
  • A step to prepare T156048 syntax highlighting?
  • A change in my Module:Central itself, which produces many viewers, always inside <div>...</div>? But I edit in the edit panel first, at the top of the page, and the module's output appears below, later. Also, in the page these <div> tags sit inside other tags, which prevent them from interfering.
  • A mix of these origins?

Edit panels for both small and large modules have lost their syntax highlighting and line numbers.
Tech News 2017-15 could explain this by the replacement of HTML_Tidy or by Extension:ParserMigration, see https://meta.wikimedia.org/wiki/Tech/News/2017/15

@Rical Are you talking about the edit mode or the read mode? Those use two completely different highlighters. This ticket is about the read mode.

Since my first post here, I have been talking about the edit mode.

In this mode, the content of the edit panel is in continuity with the other tags.
In other words, a page-level search looks in the title, in the edit panel, in the edit options, and in the content of the preview where the module inserts its results.

This happens before the first edit and between successive edits without saving.

I also see that Anomie merged task T162908, which could be related to my posts.

MediaWiki version 1.29.0-wmf.21 (rMWe826acd58b26) UTC-20170425-15:31 fixes this bug for Lua modules.
I am not closing the task because I don't know what happens for JavaScript/CSS/API pages.

@ori created https://people.wikimedia.org/~ori/bad.js as a test case, and 'Python' failed badly (likely Pygments with all of its regexes, but maybe one of its dependencies). That is an upstream failure. See his comment on https://gerrit.wikimedia.org/r/#/c/223053/ . But that was two years ago. Perhaps this should be retested, and a bug filed upstream if there is still a problem.

I've just retested this bug.

By using mediawiki-vagrant:
MediaWiki 1.31.0-alpha (827c6bf) 23:57, 5 January 2018
SyntaxHighlight 2.0 (578bb79) 11:38, 2 January 2018

Steps to reproduce:

  1. Create a page in the wiki containing the wikitext from https://fr.wiktionary.org/wiki/Utilisateur:LmaltierBot/pt1.py .
  2. Submit the page.
  3. The code doesn't get highlighted.

Screenshot at 2018-01-06 08-54-00.png (633×1 px, 114 KB)

As noted in previous comments, the cause is that const HIGHLIGHT_MAX_LINES = 1000; and const HIGHLIGHT_MAX_BYTES = 102400; are defined, so code with more than 1000 lines or larger than 100 kB does not get highlighted.
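The effect of those two constants can be sketched as a simple size check. This is a Python illustration only, not the extension's PHP code; the function name is hypothetical, and the exact way the extension counts lines and bytes may differ from what is assumed here.

```python
HIGHLIGHT_MAX_LINES = 1000
HIGHLIGHT_MAX_BYTES = 102400


def should_highlight(
    code: str,
    max_lines: int = HIGHLIGHT_MAX_LINES,
    max_bytes: int = HIGHLIGHT_MAX_BYTES,
) -> bool:
    """Return True only if the code is within both the line and byte limits."""
    # Assumption: size is measured in UTF-8 bytes, lines by line breaks.
    n_bytes = len(code.encode("utf-8"))
    n_lines = len(code.splitlines())
    return n_lines <= max_lines and n_bytes <= max_bytes


print(should_highlight("x = 1\n"))          # small snippet
print(should_highlight("a\n" * 1001))       # too many lines
print(should_highlight("a" * 102401))       # too many bytes
```

Passing larger values for max_lines and max_bytes corresponds to the "increasing the limit" experiment described in this comment.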

Increasing the limits and resubmitting the page makes the code get highlighted again.

Screenshot at 2018-01-06 09-02-55.png (639×1 px, 137 KB)

As for https://people.wikimedia.org/~ori/bad.js created by @ori , Pygments can now highlight it in under 10 seconds, see http://pygments.org/demo/6689276/ .

vagrant@vagrant:/vagrant$ ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -V
Pygments version 2.2.0, (c) 2006-2017 by Georg Brandl.
vagrant@vagrant:/vagrant$ wc bad.js
      0   17390 1000000 bad.js
vagrant@vagrant:/vagrant$ time { ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -o bad.html bad.js; }

real	0m8.078s
user	0m7.380s
sys	0m0.136s

@jayvdb yes I can.

I duplicated the contents of bad.js so that the file is now twice its previous size.

vagrant@vagrant:/vagrant$ du bad.js
1956	bad.js
vagrant@vagrant:/vagrant$ time { ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -o bad.html bad.js; } | mpstat 1 12
Linux 3.16.0-4-amd64 (vagrant) 	01/06/2018 	_x86_64_	(4 CPU)

07:56:16 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:56:17 AM  all   29.60    0.00    0.27    0.27    0.00    0.53    0.00    0.00    0.00   69.33
07:56:18 AM  all   20.11    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   79.62
07:56:19 AM  all   18.26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   81.74
07:56:20 AM  all   32.32    0.00    0.28    0.00    0.00    0.28    0.00    0.00    0.00   67.13
07:56:21 AM  all   16.67    0.00    0.56    0.00    0.00    0.00    0.00    0.00    0.00   82.78
07:56:22 AM  all   30.06    0.00    0.28    0.00    0.00    0.28    0.00    0.00    0.00   69.38
07:56:23 AM  all   20.27    0.00    0.53    0.00    0.00    0.00    0.00    0.00    0.00   79.20
07:56:24 AM  all   17.13    0.00    1.93    0.00    0.00    0.55    0.00    0.00    0.00   80.39
07:56:25 AM  all   35.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   65.00
07:56:26 AM  all   19.41    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   80.32
07:56:27 AM  all   34.14    0.00    0.54    0.00    0.00    0.00    0.00    0.00    0.00   65.32
07:56:28 AM  all   17.31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   82.69
Average:     all   24.23    0.00    0.41    0.02    0.00    0.14    0.00    0.00    0.00   75.20

real	0m14.305s
user	0m13.680s
sys	0m0.188s

But when I tried to submit it to MediaWiki, I got: Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 40466569 bytes) in /vagrant/mediawiki/includes/parser/Sanitizer.php on line 1553.

When the file size was still around 1 MB, I was able to submit it to MediaWiki.

This is the vagrant CPU usage while I was trying to submit bad.js (at 2 MB) to MediaWiki:

vagrant@vagrant:/vagrant$ mpstat 1 50
Linux 3.16.0-4-amd64 (vagrant) 	01/06/2018 	_x86_64_	(4 CPU)

05:53:25 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:53:27 AM  all   17.76    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.97
05:53:28 AM  all   32.08    0.00    2.96    1.62    0.00    0.27    0.00    0.00    0.00   63.07
05:53:29 AM  all   35.08    0.00    0.52    0.00    0.00    0.00    0.00    0.00    0.00   64.40
05:53:30 AM  all   18.70    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.03
05:53:31 AM  all   35.96    0.00    0.26    0.26    0.00    0.26    0.00    0.00    0.00   63.25
05:53:32 AM  all   27.32    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   72.41
05:53:33 AM  all   18.48    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.25
05:53:34 AM  all   38.08    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   61.64
05:53:35 AM  all   18.46    0.00    0.28    0.00    0.00    0.00    0.00    0.00    0.00   81.27
05:53:36 AM  all   20.74    0.00    3.46    1.60    0.00    0.27    0.00    0.00    0.00   73.94
05:53:37 AM  all   29.57    0.00    2.69    1.34    0.00    0.00    0.00    0.00    0.00   66.40
05:53:38 AM  all    2.80    0.00    1.02    1.78    0.00    0.00    0.00    0.00    0.00   94.40
^C
Average:     all   24.54    0.00    1.05    0.56    0.00    0.07    0.00    0.00    0.00   73.79
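As a back-of-the-envelope reading of the figures in the comment above, the following Python snippet converts the memory numbers from the fatal error into MiB and compares the two pygmentize timings. The input sizes are an assumption (1,000,000 bytes as reported by wc, and about twice that after duplication), and two data points are not enough to establish how Pygments scales in general.

```python
MIB = 1024 * 1024

# Figures copied from the transcripts and the error message above.
memory_limit_bytes = 134217728
failed_allocation_bytes = 40466569
real_seconds_1mb = 8.078   # bad.js at 1,000,000 bytes
real_seconds_2mb = 14.305  # bad.js after duplication (assumed ~2,000,000 bytes)

memory_limit_mib = memory_limit_bytes / MIB
failed_allocation_mib = failed_allocation_bytes / MIB
time_ratio = real_seconds_2mb / real_seconds_1mb

print(f"PHP memory limit: {memory_limit_mib:.0f} MiB")
print(f"Failed allocation: {failed_allocation_mib:.1f} MiB")
print(f"Time ratio for doubled input: {time_ratio:.2f}")
```

On these numbers, doubling the input took a bit less than twice as long on the command line, while the failure on submission came from PHP's 128 MiB memory limit rather than from the highlighter's run time.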

Change 852282 had a related patch set uploaded (by Alex44019; author: Alex44019):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Make the code size limit for highlighting configurable

https://gerrit.wikimedia.org/r/852282

The patch above makes the limit configurable via the $wgSyntaxHighlightMaxLines and $wgSyntaxHighlightMaxBytes variables. This shouldn't affect WMF wikis much, but gives more control to external sites, at the very least until the issue is resolved properly. Please review.

Change 852282 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Make the code size limit for highlighting configurable

https://gerrit.wikimedia.org/r/852282

TheDJ changed the task status from Open to Stalled.Nov 10 2023, 9:19 AM
TheDJ triaged this task as Low priority.

This is now a known limitation due to performance. While it affects quite a few files, there are good fallbacks and only the more tech-savvy editors are likely to encounter this. There is no planned work on this issue at the moment, but it can be revisited when new ideas and insights develop.