Add Index and Page namespaces to $wgContentNamespaces
Closed, Resolved, Public

Description

Now that T39483 has been fixed, it is possible to apply some settings to these namespaces. By definition, the Index and Page namespaces are content namespaces; applying this setting will make the configuration cleaner and the chance of mistakes smaller.
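To make the proposal concrete, here is a minimal PHP sketch of what adding Index and Page to $wgContentNamespaces by default amounts to: append the wiki's Index and Page namespace IDs to the list unless they are already present. The function name addProofreadContentNamespaces and the example IDs are hypothetical; this is an illustration, not the code from the ProofreadPage extension.

```php
<?php
// Hypothetical helper, not the actual ProofreadPage implementation:
// ensure a wiki's Page and Index namespace IDs are listed as content namespaces.
function addProofreadContentNamespaces(array $contentNamespaces, int $indexNs, int $pageNs): array
{
    foreach ([$pageNs, $indexNs] as $ns) {
        // Append only IDs not already present, so existing per-wiki lists stay valid
        if (!in_array($ns, $contentNamespaces, true)) {
            $contentNamespaces[] = $ns;
        }
    }
    return $contentNamespaces;
}

// Example: a wiki that already lists its Page namespace (IDs are illustrative)
$result = addProofreadContentNamespaces([0, 104], 106, 104);
echo implode(', ', $result), "\n"; // 0, 104, 106
```

Skipping IDs that are already listed is what lets a default like this coexist with the manual per-wiki settings until they are cleaned up.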

See Also:
T48196: Dump stats: wikisource stats are much too low
T46320: Clean Page and Index namespaces configuration for Wikisource

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:04 AM)
bzimport added a project: ProofreadPage.
bzimport set Reference to bz52709.
bzimport added a subscriber: Unknown Object (MLST).

This needs more discussion on Wikisource wikis?

(In reply to Glaisher from comment #1)

This needs more discussion on Wikisource wikis?

No: this is not a site request. Most/all wikisources already do so, but it's easy to forget it for new wikis.

This definitely needs attention, as the creation of orWS seems to have missed setting $wgContentNamespaces for CirrusSearch. So if the defaults cannot be fixed, we will need a separate bug to get the content namespaces of orWS added to content search.

As a separate point, getting a default setup for this is harder when we don't have a default/static namespace set for the WSes. If there is a bug for that, we may wish to add it as a blocker to this.

I could have sworn they finally made ns-250 & ns-252 the default namespaces for Index: and Page: just a week or two ago.

(won't be the first time I was wrong either)

nor the first time that I missed something. I look forward to seeing other comment about this.

Found it... T74525

...I think it only means that it'll be 250/252 unless overridden for the wiki (all wikis that currently have different IDs will keep them).

Should we do this within the extension itself or just on Wikisources?

The extension. The two namespaces definitely fit the general definition of "content namespace" from the docs; until proven otherwise, I think most wikis need to set them as content namespaces and will be grateful for the reduced configuration burden.

This looks to be related only to ProofreadPage, as the WSes would seem to have this set by default? Or does this still need to be done for the WSes, since any new WS needs its configuration added separately until this is done?

This looks to be related to ProofreadPage

Yes, see my January 10 comment.

This can be done without doing T74525 first, but it looks like several Wikisource projects currently do not have the Page/Index namespaces as content namespaces. For those wikis we will have to rebuild the search index and run updateArticleCount.php (the latter is run monthly anyway, so there is probably no need to worry much about it).
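As a rough illustration of why the stored article count has to be recomputed (which is what updateArticleCount.php does), here is a simplified PHP model of MediaWiki's counting rule, assuming the 'link' value of $wgArticleCountMethod (to my knowledge the default): a page counts if it is in a content namespace, is not a redirect, and contains at least one wikilink. The function countArticles and the sample pages are hypothetical; the real counting happens in MediaWiki core, not in this code.

```php
<?php
// Simplified, hypothetical model of article counting (assumes the 'link' method):
// content namespace + not a redirect + at least one wikilink.
function countArticles(array $pages, array $contentNamespaces): int
{
    $count = 0;
    foreach ($pages as $page) {
        if (in_array($page['ns'], $contentNamespaces, true)
            && !$page['redirect']
            && $page['hasLink']) {
            $count++;
        }
    }
    return $count;
}

// Made-up sample pages; 104/106 stand in for a wiki's Page/Index namespaces
$pages = [
    ['ns' => 0,   'redirect' => false, 'hasLink' => true],
    ['ns' => 0,   'redirect' => true,  'hasLink' => true],
    ['ns' => 104, 'redirect' => false, 'hasLink' => true],
    ['ns' => 104, 'redirect' => false, 'hasLink' => false],
    ['ns' => 106, 'redirect' => false, 'hasLink' => true],
];

echo countArticles($pages, [0]), "\n";           // 1
echo countArticles($pages, [0, 104, 106]), "\n"; // 3
```

The same pages yield a different total once the namespace list changes, which is why wikis whose configuration changed need their count rebuilt.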

Wikis that currently do not have the Page or Index namespace set as a content namespace:

angwikisource
azwikisource
bewikisource
bswikisource
cywikisource
eowikisource
fiwikisource
fowikisource
glwikisource
guwikisource
htwikisource
iswikisource
jawikisource
knwikisource
liwikisource
ltwikisource
mkwikisource
mrwikisource
orwikisource
sahwikisource
sawikisource
skwikisource
tawikisource
thwikisource
yiwikisource
zh_min_nanwikisource
test2wiki
bgwikisource
bnwikisource
cswikisource
hewikisource
kowikisource
nlwikisource
srwikisource
trwikisource

Change 240639 had a related patch set uploaded (by Glaisher):
Add Index and Page namespaces to $wgContentNamespaces by default

https://gerrit.wikimedia.org/r/240639

Change 240640 had a related patch set uploaded (by Glaisher):
Remove Page and Index namespaces from $wgContentNamespaces

https://gerrit.wikimedia.org/r/240640

We need someone to rebuild the search index for the wikis listed above once the ProofreadPage patch is merged and reaches the Wikisources. I don't know who can do this. Maybe @demon or @EBernhardson can help?

Excellent, I had forgotten that I had pushed for enWS to have its ContentNamespaces added separately, so having it automated is a good step forward and a more mature approach.

Change 240639 merged by jenkins-bot:
Add Index and Page namespaces to $wgContentNamespaces by default

https://gerrit.wikimedia.org/r/240639

Once that is rolled out, I suggest that we then remove the manual configuration in
https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php

...
// Note that changing this for wikis with CirrusSearch will remove pages in the
// affected namespace from search results until a full reindex is completed.
'wgContentNamespaces' => array(
 ...
'+arwikisource' => array( 102, 104 ),
'+aswikisource' => array( 102, 104, 106 ), // T45129, T72464
'+bgwikisource' => array( 100 ),
'+bnwikisource' => array( 100 ),
'+brwikisource' => array( 100, 102, 104 ),
'+cawikisource' => array( 102, 104, 106 ),
'+cswikisource' => array( 100 ),
'+dawikisource' => array( 102, 104, 106 ),
'+dewikisource' => array( 102, 104 ),
'+elwikisource' => array( 100, 102 ),
'+enwikisource' => array( 102, 104, 106, 114 ), // T52007
'+eswikisource' => array( 102, 104 ),
'+etwikisource' => array( 102, 104, 106 ),
'+fawikisource' => array( 102, 104 ),
'+frwikisource' => array( 102, 104, 112 ),
'+hewikisource' => array( 100, 106, 108, 110 ), // T98709
'+hrwikisource' => array( 100, 102, 104 ),
'+huwikisource' => array( 100, 104, 106 ),
'+hywikisource' => array( 100, 104, 106 ),
'+idwikisource' => array( 100, 102, 104 ),
'+itwikisource' => array( 102, 108, 110 ),
'+kowikisource' => array( 100 ),
'+lawikisource' => array( 102, 104, 106 ),
'+mlwikisource' => array( 100, 104, 106 ),
'+nlwikisource' => array( 102 ),
'+nowikisource' => array( 102, 104, 106 ),
'+plwikisource' => array( 100, 102, 104 ),
'+ptwikisource' => array( 102, 104, 106 ),
'+rowikisource' => array( 102, 104, 106 ), // Follow-up for T31190
'+ruwikisource' => array( 104, 106 ),
'+slwikisource' => array( 100, 104 ),
'+sourceswiki' => array( 104, 106 ),
'+srwikisource' => array( 100 ),
'+svwikisource' => array( 104, 106, 108 ),
'+tewikisource' => array( 102, 104, 106 ),
'+trwikisource' => array( 100 ),
'+ukwikisource' => array( 102, 114, 250, 252 ), // T52561, T53684
'+vecwikisource' => array( 100, 102, 104 ),
'+viwikisource' => array( 102, 104, 106 ),
'+zhwikisource' => array( 102, 104, 106, 114 ), // T66127
...

These are predominantly 104/106, though not exclusively, and this change will help the harmonisation (T74525).
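In wmf-config, as I understand it, a '+' prefix on a wiki key merges that wiki's list into the default value rather than replacing it. Once the extension supplies Page and Index itself, only the other namespaces in each entry would still need to be listed. The hypothetical helper below computes that remainder for one entry; the function name and the assumption that Page = 104 and Index = 106 on enwikisource are mine, not part of wmf-config.

```php
<?php
// Hypothetical cleanup helper (not part of wmf-config): given a wiki's '+'
// override list and the IDs of its Page and Index namespaces, return the
// entries that would still need to stay once ProofreadPage adds Page/Index.
function remainingOverrides(array $override, int $pageNs, int $indexNs): array
{
    // array_diff() preserves the original keys, so reindex with array_values()
    return array_values(array_diff($override, [$pageNs, $indexNs]));
}

// The enwikisource entry from the excerpt, assuming Page = 104 and Index = 106 there
$left = remainingOverrides([102, 104, 106, 114], 104, 106);
echo implode(', ', $left), "\n"; // 102, 114
```

Because the Page/Index IDs differ between wikis, the cleanup has to be done per entry rather than by dropping 104/106 everywhere.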

Should we do this within the extension itself or just on Wikisources?

The extension. The two namespaces definitely fit the general definition of "content namespace" from the docs; until proven otherwise, I think most wikis need to set them as content namespaces and will be grateful for the reduced configuration burden.

Whoa... back up a second. I can reluctantly accept the notion of the Index: namespace being classified as a "content namespace", but hardly the pages found in the Page: namespace. Normally, everything associated with what amounts to "a single work" found in the Page: namespace is ultimately transcluded to the main (or, more recently, the Translation:) namespace(s) to meet that "single work" premise in the end.

Plus, a contribution to a Page: namespace page reflects one of four possible states of "flux" in transcription and formatting (e.g. Proofread page) against content that is itself static, unlike a Wikipedia article, where the "flux" comes from contributions that refine forever-changing content.

If the issue is purely the lack of one or both of these namespaces as part of default search results, the answer surely can't be to "double count" the same [always being transcluded] content by reclassifying what currently amounts to draft or work space only?

This doesn't change the default search namespaces. That's controlled by another configuration variable.
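The distinction drawn in that reply can be illustrated with a hypothetical LocalSettings.php fragment: $wgContentNamespaces and $wgNamespacesToBeSearchedDefault are separate settings, so listing a namespace in the former does not by itself add it to the default search. The IDs 250 and 252 are the default Page/Index IDs discussed in T74525, used here only as examples. Per the wmf-config comment quoted above, on wikis with CirrusSearch a change to $wgContentNamespaces still requires a full reindex.

```php
<?php
// Hypothetical LocalSettings.php fragment; namespace IDs are examples only.
// 0 is the main namespace (NS_MAIN); 250 and 252 stand in for Page/Index.

// Which namespaces hold "content" (used e.g. for article counting).
$wgContentNamespaces = [0, 250, 252];

// Which namespaces are searched by default: a separate setting, so
// Page/Index are not searched by default unless they are listed here too.
$wgNamespacesToBeSearchedDefault = [
    0 => true,
];
```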

Why are Index and Page in $wgContentNamespaces? They are not content namespaces!

I disagree. If you look at $wgContentNamespaces at MediaWiki, you will see that it fits the definition and, more importantly, the desired functionality.

Why are Index and Page in $wgContentNamespaces? They are not content namespaces!

For future reference, please elaborate in such comments and provide arguments. Arguments help a discussion; short "I don't like it" comments help less. Thank you! :)

I disagree. If you look at $wgContentNamespaces at MediaWiki, you will see that it fits the definition and, more importantly, the desired functionality.

Good argument. Where exactly does the Manual:$wgContentNamespaces page state that a "page" used in the end for no other purpose than transclusion to the mainspace equates to being content-worthy? All that page talks about is tracking and statistics; again, to what end? To be able to "be searched"? No thanks: I don't want search results of partially transcribed works, or worse, partially un-proofread ones (does anyone?)

To me, the Page: namespace is no more than a temporary work space: a custom space. Its premise follows that 'custom nuance', as implied on this page in...

A custom namespace can be used to hold content that should not be shown on the search results page, for example pages that are used only for transclusion.

We're not really changing anything here for most Wikisources. Page/Index namespaces are already in $wgContentNamespaces for most of the Wikisources. We're just moving that config from Wikimedia-specific configuration to the extension to make it more manageable. I don't think there has been any issue with page and index namespaces being set as content namespaces at those wikis until now..?

... I don't think there has been any issue with page and index namespaces being set as content namespaces at those wikis until now..?

Dimes to dollars, no more than 3 or 4 Wikisources even know what they have or why. I'm strictly concerned with en.WS regardless.

This needs more discussion on Wikisource wikis?

Because? See the above.

Please explain why we need to reclassify a custom workspace where the end product is used for nothing else BUT transclusion to the main namespace and how that is "content".

Dimes to dollars, no more than 3 or 4 Wikisources even know what they have or why.

Multiple Wikisources had specific discussions and requests about this, you can probably re-find the discussions about it from WS:COORD and related updates I made in 2011.

I disagree. If you look at $wgContentNamespaces at MediaWiki, you will see that it fits the definition and, more importantly, the desired functionality.

Index is just a table of contents; Page is a workspace, similar to a draft namespace.

Disagree. Both are permanent pages, draft is not.

The pages should be included in edit counts; they should be counted statistically. It maintains our validation data; the pages contain content. Sure, this content should have lower priority than the main namespace, but it should still be findable and counted.

My reasoning for findability was expressed in my discussion about the biographical dictionaries. Those pages are pertinent to search irrespective of proofread status, especially at their rate of proofreading. They require visibility for referencing.

Disagree. Both are permanent pages, draft is not.

So? The point is neither space contains a full work -- they only help facilitate the eventual creation of "a single work" in the main namespace -- in our case via transclusion.

The pages should be included in edit counts; they should be counted statistically.

To what end? Some works are comprised of a handful of Page:s while others run into the hundreds; unless all the pages (i.e. the entire content) are organized, transcribed, proofread and then transcluded to the main namespace, there is no finished work -- just a work in progress.

It maintains our validation data; the pages contain content. Sure, this content should have lower priority than the main namespace, but it should still be findable and counted.

Correction: pages in the Page: namespace contain portions of content. Only when these portions are combined to reflect the entire content (e.g. are transcluded into a single main namespace title) do we have a hostable- & search- worthy work.

My reasoning for findability was expressed in my discussion about the biographical dictionaries. Those pages are pertinent to search irrespective of proofread status, especially at their rate of proofreading. They require visibility for referencing.

Sorry, I know not of any discussion about 'the biographical dictionaries' [pointer?] but I don't see how a specific type of work changes anything since the process for all works, regardless of type, is pretty much the same.

It would seem that you have a different interpretation of "content"; it doesn't specify complete works, and I have never argued such.

Content as defined by MW is at
https://www.mediawiki.org/wiki/Manual:$wgContentNamespaces

If you have an issue with the configuration, then it may be better to pull the conversation back from Phabricator and return it to the Wikisources. This ticket simply applied (standardised) to the remaining wikis what has long been applied to English Wikisource and a number of others.

Re my commentary about biographical dictionaries, it was at Scriptorium on English Wikisource.

If you have an issue with the configuration then it may be better to pull the conversation back from Phabricator and return it to the Wikisources.

Yes, please. Individual wikis can always ask for their own, different configuration.

It would seem that you have a different interpretation of "content"; it doesn't specify complete works, and I have never argued such.
Content as defined by MW is at
https://www.mediawiki.org/wiki/Manual:$wgContentNamespaces

No, but you keep pointing to that "definition", which ultimately has to do with the "counting" of articles (Wikipedia mainspace titles); for us, the equivalent is generally thought of as works (Wikisource mainspace titles).

What is so useful about inflating the count of 'real' content (titles in Author, Project & any other useful namespaces aside) with the addition of 'partial' content? So the same exact content can be counted twice: once in the Page: ns as a partial stand-alone, and again when it's part of the entire range of Page:s transcluded to the main ns in a finished work? What am I missing here?

If you have an issue with the configuration, then it may be better to pull the conversation back from Phabricator and return it to the Wikisources. This ticket simply applied (standardised) to the remaining wikis what has long been applied to English Wikisource and a number of others.

Can't say. I'm left with questions, because I don't see "it's been that way since..." or "others are doing it that way..." as justification of the premise to begin with.

Re my commentary about biographical dictionaries, it was at Scriptorium on English Wikisource.

Thanks. I'll look for it there.

Change 240640 merged by jenkins-bot:
Remove Page and Index namespaces from $wgContentNamespaces

https://gerrit.wikimedia.org/r/240640

Glaisher claimed this task.

The change has been on all wikis since Thursday and the indices were rebuilt yesterday. wmf-config and ProofreadPage have also been updated just now. If specific wikis want to opt out, they can go through the regular Wikimedia-Site-requests process.

Perhaps my comment about the importance of nsIndex and nsPage could be surprising, but IMHO they are the true content pages, while the transcluded ns0 text is merely one of many possible derived texts. They are the true digitization of the specific edition, while the transcluded ns0 text is something like a new, "original" edition of the work. NsPage is an NPOV kind of digitization, while ns0 transclusion is not.