Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
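The direction I'm imagining, as a very rough sketch rather than a working
tool (the index.php path is a guess, pandoc's MediaWiki reader is an
assumption, and templates/extension tags would need extra handling):

import subprocess
import urllib.parse
import urllib.request

# Assumed entry point of the wiki above; adjust to the real index.php path.
WIKI = "http://server.bluewatersys.com/w90n740/index.php"

def page_to_latex(title):
    # Fetch the raw wikitext of one page via MediaWiki's action=raw.
    url = WIKI + "?" + urllib.parse.urlencode({"title": title, "action": "raw"})
    wikitext = urllib.request.urlopen(url).read().decode("utf-8")
    # Hand it to pandoc (assumed to be installed) for MediaWiki -> LaTeX.
    result = subprocess.run(
        ["pandoc", "-f", "mediawiki", "-t", "latex"],
        input=wikitext, capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(page_to_latex("Main_Page"))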
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
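The sort of thing I have in mind, as a rough sketch only: stream a
pages-meta-history dump and flag revisions where a placeholder vanishes
(the placeholder filenames below are guesses, and checking what actually
replaced them would be a second pass):

import bz2
import sys
import xml.etree.ElementTree as ET

# Guessed names; the real list is whatever redirects to [[Wikipedia:Fromowner]].
PLACEHOLDERS = ("Replace this image male.svg", "Replace this image female.svg")

def local(tag):
    # Strip the export-schema namespace so this works across dump versions.
    return tag.rsplit("}", 1)[-1]

def has_placeholder(text):
    return any(p in text for p in PLACEHOLDERS)

def find_replacements(dump_path):
    title, prev_text = None, ""
    with bz2.open(dump_path, "rb") as stream:
        for _, elem in ET.iterparse(stream):
            tag = local(elem.tag)
            if tag == "title":
                title, prev_text = elem.text, ""
            elif tag == "revision":
                text_elem = next((c for c in elem if local(c.tag) == "text"), None)
                text = (text_elem.text or "") if text_elem is not None else ""
                rev_id = next((c.text for c in elem if local(c.tag) == "id"), None)
                if has_placeholder(prev_text) and not has_placeholder(text):
                    print("%s\trev %s" % (title, rev_id))
                prev_text = text
                elem.clear()  # keep memory flat while streaming

if __name__ == "__main__":
    find_replacements(sys.argv[1])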
(If the placeholders do work, then it'd also be useful convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
Hi
Sorry for my English :)
What I need is case insensitive titles. My solution for the problem was to
change the collation in MySQL from <utf8_bin> to <utf8_general_ci> in table
<page>, for field <page_title>.
But a bigger problem with links persists. In my case, if there is an article
<Frank Dreben>, the link [[Frank Dreben]] is treated as a link to an existing
article (GoodLink), but the link [[frank dreben]] is treated as a link to a
non-existent article, so this link opens editing of the existing article <Frank
Dreben>. What can be fixed so that the link [[frank dreben]] is treated as
a GoodLink?
I've spent some time in Parser.php, LinkCache.php, Title.php, Linker.php,
LinkBatch.php but found nothing useful. The last thing I tried was to do
strtoupper on the title every time the link cache array is filled, in
LinkCache.php. I also tried to do strtoupper on the title every time data is
fetched from the array.
I've tried to make titles in cache be case insensitive, but it didn't work
out, not sure why - it seems like when links are constructed (parser, title,
linker, etc) only LinkCache methods are used.
Could anybody point me in a direction to dig? :)
Hi,
I know a lot of people would love wysiwyg editing functionality, I know a
lot of people on this list are working on such a thing, and I know that a
problem seems to be the "grammar" involved. I heard on the grapevine that,
on this basis, it was "at least a year away". I don't want to know the
specifics of what that grammar is, or what its problems are. :-) But what
I'd like to know is:
* Would this issue be solved quicker by having a few funded developers
working on it? (If so, how many wo/man hours would be needed - what size of
a funding bid would be needed?)
* What does the existing fckeditor <http://mediawiki.fckeditor.net> *not* do
yet? (If the answer is: "too much - it only does
a tiny handful of things compared to what we need" - well then, that answers
my question. :-))
If you haven't already surmised, this question is from a technical dunce -
answers at that level would be greatly appreciated. :-)
Cheers,
Cormac
<rant>
I'm currently working on the Scott Foresman image donation, cutting
large scanned images into smaller, manually optimized ones.
The category containing the unprocessed images is
http://commons.wikimedia.org/wiki/Category:ScottForesman-raw
It's shameful. Honestly. Look at it. We're the world's #9 top web
site, and this is the best we can do?
Yes, I know that the images are large, both in dimensions
(~5000x5000px) and size (5-15MB each).
Yes, I know that ImageMagick has problems with such images.
But honestly, is there no open source software that can generate a
thumbnail from a 15MB PNG without nuking our servers?
In case it's not possible (which I doubt, since I can generate
thumbnails with ImageMagick from these on my laptop, one at a time;
maybe a slow-running thumbnail generator, at least for "usual" sizes,
on a dedicated server?), it's no use cluttering the entire page with
broken thumbnails.
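Roughly what I mean by a slow-running generator, as a sketch only - the
limits and sizes are made up, not something I've tested against the servers:

import subprocess

def make_thumb(src, dst, width=800):
    # Let ImageMagick do the work, but cap its pixel cache so one 15MB PNG
    # can't eat all the RAM on the box; the values are illustrative only.
    subprocess.run(
        [
            "convert",
            "-limit", "memory", "256MiB",
            "-limit", "map", "512MiB",
            src,
            "-thumbnail", "%dx%d" % (width, width),
            dst,
        ],
        check=True,
    )

# e.g. make_thumb("ScottForesman-raw-scan.png", "ScottForesman-raw-thumb.png")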
Where's the option for a list view? You know, a table with linked
title, size, uploader, date, no thumbnails? They're files, so why
don't we use things that have proven useful in a file system?
And then, of course:
"There are 200 files in this category."
That's two lines below the "(next 200)" link. At that point, we know
there are more than 200 images, but we forget about that two lines
further down?
Yes, I know that some categories are huge, and that it would take too
long to get the exact number.
But, would the exact number for large categories be useful? 500.000 or
500.001 entries, who cares? How many categories are that large anyway?
200 or 582 entries, now /that/ people might care about.
Why not at least try to get a number, set a limit to, say, 5001, and
* give the exact number if it's less than 5001 entries
* say "over 5000 entries" if it returns 5001
Yes, everyone's busy.
Yes, there are more pressing issues (SUL, stable versions, you name it).
Yes, MediaWiki wasn't developed as a media repository (tell me about it;-)
Yes, "sofixit" myself.
Still, I ask: is this the best we can do?
Magnus
</rant>
The most recent enwiki dump seems corrupt (CRC failure when bunzipping).
Another person (Nessus) has also noticed this, so it's not just me:
http://meta.wikimedia.org/wiki/Talk:Data_dumps#Broken_image_.28enwiki-20080…
Steps to reproduce:
lsb32@cmt:~/enwiki> md5sum enwiki-20080103-pages-meta-current.xml.bz2
9aa19d3a871071f4895431f19d674650 enwiki-20080103-pages-meta-current.xml.bz2
lsb32@cmt:~/enwiki> bzip2 -tvv enwiki-20080103-pages-meta-current.xml.bz2 &> bunzip.log
lsb32@cmt:~/enwiki> tail bunzip.log
[3490: huff+mtf rt+rld]
[3491: huff+mtf rt+rld]
[3492: huff+mtf rt+rld]
[3493: huff+mtf rt+rld]
[3494: huff+mtf rt+rld]
[3495: huff+mtf data integrity (CRC) error in data
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
lsb32@cmt:~/enwiki> bzip2 -V
bzip2, a block-sorting file compressor. Version 1.0.3, 15-Feb-2005.
Copyright (C) 1996-2005 by Julian Seward.
This program is free software; you can redistribute it and/or modify
it under the terms set out in the LICENSE file, which is included
in the bzip2-1.0 source distribution.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
LICENSE file for more details.
bzip2: I won't write compressed data to a terminal.
bzip2: For help, type: `bzip2 --help'.
lsb32@cmt:~/enwiki>
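A quick cross-check that doesn't depend on the bzip2 binary is to stream the
file through Python's bz2 module; rough sketch:

import bz2
import sys

def check_bz2(path, chunk_size=1 << 20):
    # Stream-decompress and discard; any CRC/format problem raises an error.
    total = 0
    try:
        with bz2.BZ2File(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (IOError, EOFError) as err:
        print("corrupt after ~%d decompressed bytes: %s" % (total, err))
        return False
    print("OK, %d decompressed bytes" % total)
    return True

if __name__ == "__main__":
    check_bz2(sys.argv[1])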
Erm... never, EVER! try to write a long e-mail on a Wii... Especially
when occasionally your Wi-Fi disconnects which ends up killing
everything you've written.
Well onto the topic...
I've noticed that the number of extensions using JS Libraries has
increased recently. Notably Semantic MediaWiki/Semantic Forms, and
SocialProfile. Additionally I was contracted to create a new mp3 playing
extension because all the current ones break the lines (The requester
wants to be able to let the music play inline, basically beside a normal
link to an audio file, instead of needing a plugin or something on their
computer, or a big player that takes up unneeded space)... So I found
the mp3inline <http://pjoe.net/software/mp3inline/> Wordpress plugin,
and intend to adapt some of it into a MediaWiki extension which will
automatically let audio links be playable inline with an icon located
cleanly beside the link. Of course, the note on this topic is that the
player uses Scriptaculous which is another JS Library which would be put
into MW.
Various extensions use different methods of including the libraries they
need. Mostly just requiring the user to find a way to put it in. However
SocialProfile includes a YUI extension which can be used. This extension
however is basically just a short bit that includes a single script
which is basically a minified part of the basic required YUI code, and
an unminified version of the animation package (Why they used the
minified version for one half, and the full version of another part is
beyond me though)...
The biggest issue with any of these that I see... Is load time. For all
of them you need to add a bunch of script tags to the page for them to
work, and suddenly you drastically increase the number of HTTP calls for
stuff on your site.
Since things are growing, I was thinking it would be a good idea to add
some stuff to core to allow extensions to add use of JS libraries in an
intuitive way. I started an allinone.php extension awhile ago (inspired
by Wikia's allinone.js idea) and was thinking I should probably rewrite
it and make something good for core.
The idea is to have a single script in the page which contains all of
the needed JS Libraries... And even wikibits.js inside it... All of them
minified to compact space... Of course, if you need to debug any errors
or anything, simply reload the page with &allinone=0 and the system
automatically includes the separate non-minified files in individual
script tags for debugging. Perhaps even a no-allinone preference for
those doing heavy debugging in areas where they have a post request they
can't add &allinone=0 to.
Additionally, the system would have an understanding of the structure of
a js library. Basically, a sort of definition module would be created
for each library that people may use (YUI, jQuery, Scriptaculous,
Prototype, etc...) which would outline things like the different parts
of the system (Core file, individual parts of the system like ui or
other things only needed sometimes, separation of full/minified files
(perhaps a notion of debug like what YUI has), and files like YUI's
utilities.js or yahoo-dom-event.js which are minified versions of a
grouping of various parts of the library.)
And using calls like, say... Making the thing handling this called
"JSLibs" just for this example... JSLibs::addUse( 'YUI', 'animation' );
which would note that YUI's animation bit is required for use in the
page. And so it'll automatically know that the 'yahoo' bit is also
needed, additionally if various other things like the dom, event, etc...
bits are needed it'll automatically use one of the combined files
instead of individual ones.
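To make that concrete, here's a language-neutral sketch of the resolution
logic in Python (the real implementation would of course be PHP in core,
and the part names, dependencies, and combined files are illustrative only):

# Each library ships a definition of its parts, their dependencies, and any
# combined files; the resolver prefers a combined file when it covers enough.
YUI = {
    "parts": {
        "yahoo": [],
        "dom": ["yahoo"],
        "event": ["yahoo"],
        "animation": ["yahoo", "dom", "event"],
    },
    "combined": {
        "yahoo-dom-event.js": {"yahoo", "dom", "event"},
    },
}

def resolve(lib, wanted):
    # Expand the requested parts to include all of their dependencies.
    needed = set()
    def add(part):
        if part not in needed:
            needed.add(part)
            for dep in lib["parts"][part]:
                add(dep)
    for part in wanted:
        add(part)
    return needed

def files_for(lib, wanted):
    # Pick combined files first, then individual files for whatever is left.
    needed = resolve(lib, wanted)
    files = []
    for combo, covers in lib["combined"].items():
        if covers <= needed:
            files.append(combo)
            needed -= covers
    files.extend(sorted("%s.js" % part for part in needed))
    return files

# files_for(YUI, {"animation"}) -> ['yahoo-dom-event.js', 'animation.js']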
Of course, there is a little bit of per-use optimization that things
using the libs need to do...
Primarily this is because some things are needed at some times, and not
at others... But if you don't define good times that it should be
included, then the number of varying types of allinone groups you have
increases and you end up with more stuff for the browser to cache and
more requests to the server.
So basically:
* Skins... For the JS Libraries that they require, they should include
the libraries all the time when inside of that skin. (There'll be code
to let Skins define what they need inside of the definition of where to
put the stuff)
* Site scripts... When JS Libraries are wanted for site scripting, the
stuff should be included using calls inside of LocalSettings.php and
included all the time.
* Extensions... It depends on what kind of extension...
** For low use things inside articles, like perhaps a TagCloud which is
likely only to be used on a few major pages, this should be only
included when needed (ie: The thing needing it is parsed into existence)
** For special page stuff, and things meant for only edit pages and the
like the libraries should always be included while on these pages, but
not in general while reading articles.
** For high use things, like SMW's attributes, factboxes, and such...
The libraries should be included 100% of the time... Of course, if you
really want you can put in some exclusions for when on special pages...
But these are used a high amount of times, and can add up the number of
variations easily.
If you don't understand what I mean... It occurs when multiple
extensions of different types are used...
For example... Say we had a low use tag cloud, and something like SMW
which included dynamic stuff every time an attribute was used...
If the tag cloud loaded only when needed, and SMW included only when an
attribute was used... then we'd have the variations:
* One for when tag cloud, and SMW attributes are used (main pages mostly)
* One for when tag cloud isn't used, but SMW attributes are used (most
article pages)
* One for when tag cloud is used, but SMW attributes are not (extremely
rare case)
* And one for when the tag cloud isn't used, and SMW attributes are not
(another rare case)
Those last two shouldn't exist... They only exist because one extension
didn't properly define when its stuff should be included.
If the example SMW had loaded its libraries 100% of the time when on
articles, because of its high use... Then there would only be two
variations: one with the tag cloud, and one without...
Another issue, is minification... Not everything comes with a minified
counterpart... I was hoping to make this something which could be done
automatically. However, I found out that most of the minification
programs that seem to be good run in other languages like Java, rather
than having PHP ports. So perhaps a toolserver service would be nice,
one allowing extension authors to submit a page of code in a form to the
toolserver, and have it return them a minified version using the best
program for the job, that way people developing scripts and stuff for
use can distribute the extension with pre-minified code, rather than
requiring the people using the extension to download something to minify
the code on their own.
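As a sketch, assuming something like the YUI Compressor jar on the server
side, the toolserver service would basically just wrap a call like this
behind a web form (jar name and paths are illustrative):

import subprocess

def minify_js(src_path, dst_path, jar="yuicompressor.jar"):
    # Shell out to the Java-based compressor and write the minified output.
    subprocess.run(
        ["java", "-jar", jar, "--type", "js", "-o", dst_path, src_path],
        check=True,
    )

# e.g. minify_js("wikibits.js", "wikibits.min.js")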
^_^ And yes, of course we'd have a minified version of wikibits.js... We
include it 100% of the time, why waste bytes on the comments and
whitespace? Especially when using a non-minified/minified split allows
us to put nice literate documentation inside of the code, while still
making end use of something extremely compact.
--
~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
Just a reminder that it's six weeks and ten changes since the last
interwiki update.
Does this bug https://bugzilla.wikimedia.org/show_bug.cgi?id=12763
about the updates need closing, since there was an update on 10 Feb, or
should we keep it open until the process becomes, say, an automated monthly task?
Keep up the good work
BozMo
>
> Message: 8
> Date: Fri, 12 Oct 2007 17:59:22 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Primary account for single user login
>
> Hoi,
> This issue has been decided. Seniority is not fair either; there are
> hundreds if not thousands of users that have done no or only a few edits and
> I would not consider it fair when a person with say over 10.000 edits should
> have to defer to these typically inactive users.
1. Yes, it's not fair, but this is a truth about the Wikimedia projects that one
has to admit. Imagine if all Wikimedia sites had had single user login
since they were first established: the one who registered first would own that
username for all Wikimedia sites.
2. A person with fewer edits is not necessarily less active than the
one with more edits. And according to
http://en.wikipedia.org/wiki/Wikipedia:Edit_count,
``Edit counts do not necessarily reflect the value of a user's contributions
to the Wikipedia project.''
What if some users have a lower edit count
* because they deliberately edit, preview, edit, and preview the articles,
over and over, before submitting the deliberated versions to the Wikimedia
sites?
* Some users edit, edit and edit the articles in their offline storage, over
and over, before submitting only the final versions to the Wikimedia sites.
While some users have a higher edit count
* because they often submit so many changes without previewing them first, and
have to correct the careless edits, over and over.
* Some users often submit so many minor changes, over and over, rather than
accumulating the changes into fewer edits.
* Some users do many robot routines by themselves, rather than letting
a real robot do those tasks.
* Some users often take part in many edit wars.
* Some users often take part in many arguments on many talk pages.
What if the users with a lower edit count try to increase their edit count
to take back the status of primary account?
What if they decide to change their editing habits, to increase their
edit count,
* by submitting many edits without a deliberate preview,
* by splitting the accumulated changes into many minor edits, and submitting
them separately,
* by stopping their robots, and doing those robot routines themselves,
* by joining edit wars?
3. According to 2) above, I think a better measurement of activeness is to
measure the time between the first edit and the last edit of that username.
The formula will look like this,
activeness = last edit time - first edit time
>
> A choice has been made and as always, there will be people that will find an
> un-justice. There were many discussions and a choice was made. It is not
> good to revisit things continuously, it is good to finish things so that
> there is no point to it any more.
>
> Thanks,
> GerardM
>
> On 10/12/07, Anon Sricharoenchai <anon.hui(a)gmail.com> wrote:
> >
> > According to the conflict resolution process, where the account with the
> > most edits is selected as the primary account for that username, this
> > may sound reasonable for a username that is owned by the same person
> > on all wikimedia sites.
> >
> > But the problem comes when the same username on those wikimedia
> > sites is owned by different people and the accounts are actively in use.
> > The active account that was registered first (seniority rule) should
> > rather be considered the primary account.
> > Since, I think, the person who registered first should own that username
> > on the unified wikimedia sites.
> >
> > Imagine if the wikimedia sites had been unified ever since the sites were
> > first established long ago (so that their accounts had never been
> > separated):
> > the person who registered first would own that username on all of the
> > wikimedia sites.
> > The person who came after would be unable to use the registered
> > username, and would have to choose an alternate username.
> > This logic should also apply to the current wikimedia sites, after they
> > have been unified.
> >
Hi,
I have written some extensions for forms that use template parameters.
I have found that it appears to be impossible to pass a template
parameter to an extension. I think this may be an upshot of the
design, but before I totally give up I thought I'd see if anyone here
had tried to do the same thing.
Ideally my extension on a template page would look like this:
<textbox>{{{name|Not set}}}</textbox>
then used like this:
{{form:dynamicform|name=Alex}}
I am embedding them into the webpage as divs, and then using JavaScript
to mark them up.
However, I cannot get the parser to expand the "name" parameter for my
extension. I get passed {{{name|Not set}}}. OK, so I'll call
parsetags, but that seems to only pick up "Not set" when called from
my extension, not "Alex".
My eventual solution was partially formed HTML:
<textbox id=name />{{{name}}} <close textbox/>
Note the closed extension tags. The extension hacks around with the
parser adding tokens and replacing them after strip tags. It creates
an open div tag, and then a close div tag using <close> extension.
This works and used properly produces well formed HTML, but is not
elegant and is prone to error. I was hoping the new parser ordering
might help my plight, but it didn't. Am I going about this the wrong
way?
Kind regards,
Alex
--
Alex Powell
Exscien Training Ltd
Tel: +44 (0) 1865 920024
Direct: +44 (0) 1865 920032
Mob: +44 (0) 7717 765210
skype: alexp700
mailto:alexp@exscien.com
http://www.exscien.com
Registered in England and Wales 05927635, Unit 10 Wheatley Business
Centre, Old London Road, Wheatley, OX33 1XW, England