Sadly, it appears that there are some hurtful vandals out there
who are attacking the people trying to counter them. For example,
User:Zoe has just posted that she's abandoning her efforts to counter
vandals; see: http://www.wikipedia.org/wiki/User%3AZoe
which begins: "I'm tired of fighting, I'm tired of arguing,
I'm tired of being called names."
The last straw seems to have been an edit by No-Fx to Zoe's
user page, in which No-Fx made it appear that Zoe was
"into oral sex".
I don't know enough about this situation to be sure that it is
an example of such an attack, but I am concerned about the long-term
dangers if this starts a trend.
Attacks on users and sysops - particularly highly dedicated ones -
are much more dangerous to the Wikipedia than
simple attacks on a few pages. If these kinds of attacks cause
people to stop weeding out bad pages or vandals for
fear of retribution, the project is doomed.
Is there any way the software could be modified to make it harder
for vandals to counter-attack the people who are trying to
remove vandalism?
At the least, why not let the User:NAME pages be editable ONLY
by NAME? The "User_talk:" spaces need to be editable in some way,
but I don't see a need for others to "fix" someone's User: space;
it's not critical that that content be fixed, and there are advantages
to having some areas that are "precious" to each user.
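The rule proposed above can be written down as a small check. This is only a sketch in Python, not the wiki's actual code: the function name can_edit is hypothetical, and it assumes that titles use the "User:NAME" form quoted above, that subpages such as "User:NAME/notes" belong to NAME, and that no title normalization (spaces vs. underscores, case) is needed.

```python
def can_edit(page_title: str, editor: str) -> bool:
    """Return True if `editor` may edit `page_title` under the proposed rule.

    Assumption: only the "User:" namespace is restricted; "User_talk:" and
    every other namespace stay open to everyone, as the proposal suggests.
    """
    namespace, sep, rest = page_title.partition(":")
    if sep and namespace == "User":
        # Treat "User:NAME/subpage" as owned by NAME (an assumption).
        owner = rest.split("/", 1)[0]
        return editor == owner
    return True
```

With this rule can_edit("User:Alice", "Bob") is False, while can_edit("User_talk:Alice", "Bob") is True. Whether sysops should be able to override the check is left open here, since the proposal does not say.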
Here's a more controversial idea: perhaps some
information relating to deletion of pages and banning of users
should be hidden from non-sysops. For example,
since "delete" can only be done by sysops, why not just tell
non-sysops that a deletion occurred, but not WHICH sysop did it?
By the same token, perhaps some discussion areas should be
only readable/writeable by sysops, in particular a discussion
area to discuss banning someone. Perhaps there could be a way
where anyone (non-sysop) could suggest that someone be banned,
without having their name revealed to non-sysops.
Since real deletes and banning can only be done by sysops anyway,
and sysops are trusted, there's no reason this information
MUST be public.
A related idea might be to modify the "talk" system so that it's
more like a bulletin board, with threaded messages and
a clear identification of who wrote each one (click on "reply" to
reply to that item). That way, any message is clearly
identified with its REAL author. A side-effect would be that
the attribution would happen automatically (no more
forgetting ~~~~). When people discuss things, they
can't make it appear that someone else made an outrageous/nasty
statement.
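One way to picture such a threaded talk system is the data structure below. It is a minimal sketch in Python under stated assumptions: the names Message, TalkThread, post and replies_to are invented for illustration, and the caller is assumed to pass in the name of the logged-in user, so the author is never typed into the message text.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    """One talk entry; frozen so the author cannot be changed afterwards."""
    msg_id: int
    author: str
    text: str
    parent_id: Optional[int] = None   # None marks the start of a thread


class TalkThread:
    def __init__(self) -> None:
        self._messages: Dict[int, Message] = {}
        self._ids = count(1)

    def post(self, logged_in_user: str, text: str,
             parent_id: Optional[int] = None) -> Message:
        """Store a message; attribution comes from the session, not the text."""
        if parent_id is not None and parent_id not in self._messages:
            raise KeyError(f"no message with id {parent_id}")
        msg = Message(next(self._ids), logged_in_user, text, parent_id)
        self._messages[msg.msg_id] = msg
        return msg

    def replies_to(self, msg_id: int) -> List[Message]:
        """Direct replies to a message, in posting order."""
        return [m for m in self._messages.values() if m.parent_id == msg_id]
```

Because every message is a separate record, a reply cannot rewrite the text or the author of the message it answers, which is the property the proposal is after.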
The goal here would be to prevent people from attacking each other,
or at least limit its effectiveness.
Thoughts?
After some delays and bug-hunting my script for the HTML static versions
is in acceptable shape.
Here you can see an example, built from a SQL file of some weeks ago:
(Don't try the Search box!!! I explain below)
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Please don't DOS the connection, it's not a very fast line.
Interested parties can find the script here:
http://www.arcetri.astro.it/~puglisi/wiki/wiki2static.txt
(renamed to .txt due to some server misconfig)
Use a wide terminal for this one. Everything (HTML code included) is in
one single file. The whitespace may appear weird because I use 4-space
tabs. There's no need to tell me you don't like the coding style, I
already know :-)))
Some issues:
- the topbar links do not work (known bug :-). The Edit link goes to the
online wikipedia site.
- interlanguage links are ignored
- some wiki markup is not recognized yet.
- no images are present (of course!)
- filenames should be OK for most filesystems not "8.3" limited
(max 63 chars, only a-z, 0-9 and underscore)
- despite the two-letter subdirectories, some of them have over 4,000
files in them!
- Time: the script takes more than 2 hours on my 1.3 GHz Athlon...
- Size: this dump is about 800MB. (tar.gz is just 110MB). I think
that I can bring it down to 600-650MB with a bit of trimming and
eliminating unnecessary redirects. BUT, without some form of compression,
the English wikipedia will soon overflow a single CD. Maybe we should
target DVDs? :-)
- Images: no images are present here. AFAIK, each of them has a SQL record
(that my script skips), but the actual image data is not included. How
many megabytes of images do we have? I think it will be impossible to store
the full images on a CD. Certainly it's possible on a DVD. Maybe a low-res
version could be included on a CD.
- Search: I tried a JavaScript search that worked well for small
databases: it's basically a big array of strings (article titles and
filenames) with a few lines of code that do a regexp match against them.
For full-sized databases like this one, the search page becomes an
8-megabyte monster that takes forever to process (IE grabs 100 MB of memory
and stops there, Opera is even worse). I'll see if I can find a different
solution.
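The filename rule in the list above (at most 63 characters, only a-z, 0-9 and underscore, files grouped in two-letter subdirectories) could look roughly like this. It is a sketch in Python, not the actual wiki2static code: the function name static_path is made up, and it assumes that the 63-character limit excludes the ".html" extension and that the subdirectory is the first two characters of the name.

```python
import re

MAX_NAME_LEN = 63   # limit quoted in the list above; assumed to exclude ".html"


def static_path(title: str) -> str:
    """Map an article title to 'xx/name.html' using only a-z, 0-9 and '_'."""
    name = title.strip().lower()
    # Collapse every run of other characters into a single underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name).strip("_")
    name = name[:MAX_NAME_LEN] or "_"      # never return an empty name
    subdir = name[:2].ljust(2, "_")        # pad one-letter names to two chars
    return f"{subdir}/{name}.html"
```

For example, static_path("Main Page") gives "ma/main_page.html", which matches the shape of the sample URL above. Different titles can collapse to the same name under this rule, so a real script would still need some way to tell them apart.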
Enough for now. While I carry on development, any input is welcome.
Ciao,
Alfio
Marco wrote:
> No, you lose much more. You can not easily combine the content of two
> "free" encyclopedias and get something that is "free". You can not copy
> images from the English Wikipedia to the German Wikipedia anymore because
> the "fair use" right works not this way in Germany.
What? Since when has the German Wikipedia moved to a German-based server?
Well, I know for a fact that it hasn't, so German law has no bearing on the
legality of having "fair use" (per US law) images on the German Wikipedia.
However, those people who are subject to German law may be legally barred from
uploading such images. But there are plenty of German-speaking Wikipedians
living outside of Germany to do this.
-- Daniel Mayer (aka mav)
I have added
*PageHistory
*UserContributions
*BackLinks
InterWiki prefixes, because we currently do not support parameters in the
[[Special:]] namespace, and this was the lazy way to provide a much needed
quickfix. Among other things, this allows us to put
#REDIRECT [[UserContributions:Username]]
on the user page of a known vandal, making it easier to fix his edits from
RC.
As you might expect, these InterWiki links point to en:. I pondered
picking a name like EnHistory, EnContris etc., but I wanted something
intuitive. If other languages want the same functionality, local equivalents
such as "SeitenHistorie", "BenutzerBeitraege" and "LinksAuf" can easily be
added.
Note that if we change the functionality of [[Special:]], things like
[[Special:MovePage->nul|Click here]] also become possible unless we
specifically forbid them.
Regards,
Erik
If you enter something like this (tested on the German Wikipedia):
"5 AND Dezember" you get:
1064: You have an error in your SQL syntax. Check the manual that
corresponds to your MySQL server version for the right syntax to use near
'AND (MATCH (si_title) AGAINST ('dezember')) ) AND cur_namespace":
SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex WHERE
cur_id=si_page AND ( AND (MATCH (si_title) AGAINST ('dezember')) ) AND
cur_namespace IN (0) LIMIT 0, 20
I see two ANDs one after the other, which means that "this->mTextcond" is
empty (in the source code). The error occurs with any single-character
search term, not only numbers.
Someone good at SearchEngine.php should take a look.
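To illustrate the kind of fix this needs, here is a sketch in Python of building the title condition so that a dropped term never leaves a dangling AND. It is not the SearchEngine.php code: the function name, the MIN_TERM_LEN threshold and the rule that short terms are skipped are assumptions made for the example, and the %s placeholders stand for whatever parameter binding the database layer provides.

```python
MIN_TERM_LEN = 2   # hypothetical: shorter terms are assumed to be skipped


def build_title_condition(query: str):
    """Return (sql_fragment, params) for a query such as '5 AND Dezember'.

    Operators are only emitted between two terms that were actually kept,
    so a skipped term cannot produce '( AND ...' as in the error above.
    """
    clauses, params = [], []
    pending_op = None
    for token in query.split():
        if token.upper() in ("AND", "OR"):
            # Remember the operator only if there is a left-hand clause.
            pending_op = token.upper() if clauses else None
            continue
        if len(token) < MIN_TERM_LEN:
            pending_op = None      # drop the term together with its operator
            continue
        if clauses:
            clauses.append(pending_op or "AND")
        clauses.append("(MATCH (si_title) AGAINST (%s))")
        params.append(token.lower())
        pending_op = None
    return " ".join(clauses), params
```

Here build_title_condition("5 AND Dezember") returns a single MATCH clause with the parameter "dezember". If every term is dropped the fragment is empty, and the caller then has to leave out the whole parenthesized group instead of emitting "AND ( )".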
--
Smurf
smurf(a)AdamAnt.mud.de
------------------------- Anthill inside! ---------------------------
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Is somebody deliberately turning on the InnoDB monitor, or is some
setting turning it on automatically?
It dumps data to the MySQL error log file every 15 seconds listing a
bunch of status and every transaction that's been done since the last
one, and that comes to several hundred megs of log file after a few
days, which can only be freed from the disk by deleting the log file
and restarting MySQL.
So if anyone knows a way to have it not start up, that would be nice. :)
- -- brion vibber (brion @ pobox.com)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
iD8DBQE+2Dt3xVlOmwh1xjgRAn5zAJ4pOiIZB7QCMZkcCBl2pQAJJq83eQCeN6RC
D9tybprG144oSWsj5oIQsvc=
=0Ebu
-----END PGP SIGNATURE-----
Hi Alfio,
I looked at your code. Nice job.
Superficially it may seem we did almost the same job.
But overlap is minimal. My perl script addresses a lot of issues that
only are relevant in a Palm/Pocket PC/TomeRaider environment.
Your version has quite some code which is specific for a static html
version.
Still there are some areas where we can be of help to each other.
You mentioned unicode support as an open issue. Coincidentally I was
looking into this issue the past few days, while preparing a TomeRaider
version of the Esperanto Wikipedia, which would be unreadable without
it.
You will also find the UTF-8 coding scheme on which this is based below.
Here is some Perl code to translate UTF-8 multibyte
sequences into HTML numeric character references of the form &#nnn;
# UTF-8 byte sequences -> HTML numeric character references &#nnnn;
$entry =~ s/([\x80-\xFF]+)/&UnicodeToHtml($1)/ge ;

# Convert a run of non-ASCII bytes, one UTF-8 sequence at a time.
sub UnicodeToHtml
{
  my $text = shift ;
  my $html = "" ;
  my ($c, $byte, $ord, $unicode, $bytes) ;
  for ($c = 0 ; $c < length ($text) ; $c++)
  {
    $byte = substr ($text,$c,1) ; # optimize with regexp ?
    $ord = ord ($byte) ;
    if ($ord < 128) # plain ascii character
    { $html .= $byte ; } # (will not occur in this script)
    else
    {
      # the lead byte determines the length of the sequence
      if ($ord < 224)
      { $bytes = 2 ; }
      elsif ($ord < 240)
      { $bytes = 3 ; }
      elsif ($ord < 248)
      { $bytes = 4 ; }
      elsif ($ord < 252)
      { $bytes = 5 ; }
      else
      { $bytes = 6 ; }
      $unicode = substr ($text,$c,$bytes) ;
      $html .= &UnicodeToHtmlTag ($unicode) ;
      $c += $bytes - 1 ;
    }
  }
  return ($html) ;
}

# Decode one UTF-8 sequence into a single &#nnnn; reference.
sub UnicodeToHtmlTag
{
  my $unicode = shift ;
  my $char = substr ($unicode,0,1) ;
  my $ord = ord ($char) ;
  my ($c, $value) ;
  if ($ord < 128) # plain ascii character
  { return ($unicode) ; } # (will not occur in this script)
  else
  {
    # strip the length marker bits from the lead byte
    if ($ord >= 252)
    { $value = $ord - 252 ; } # 1111110x
    elsif ($ord >= 248)
    { $value = $ord - 248 ; } # 111110xx
    elsif ($ord >= 240)
    { $value = $ord - 240 ; } # 11110xxx
    elsif ($ord >= 224)
    { $value = $ord - 224 ; } # 1110xxxx
    else
    { $value = $ord - 192 ; } # 110xxxxx
    # each continuation byte (10xxxxxx) adds six more bits
    for ($c = 1 ; $c < length ($unicode) ; $c++)
    { $value = $value * 64 + ord (substr ($unicode, $c,1)) - 128 ; }
    return ("\&\#" . $value . ";") ;
  }
}
Found this somewhere on the web:
#UTF-8 works as follows:
#ENCODING
# The following byte sequences are used to represent a character.
# The sequence to be used depends on the UCS code
# number of the character:
# 0x00000000 - 0x0000007F:
# 0xxxxxxx
#
# 0x00000080 - 0x000007FF:
# 110xxxxx 10xxxxxx
#
# 0x00000800 - 0x0000FFFF:
# 1110xxxx 10xxxxxx 10xxxxxx
#
# 0x00010000 - 0x001FFFFF:
# 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
#
# 0x00200000 - 0x03FFFFFF:
# 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
#
# 0x04000000 - 0x7FFFFFFF:
# 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
#
# The xxx bit positions are filled with the bits of the
# character code number in binary representation. Only the
# shortest possible multibyte sequence which can represent
# the code number of the character can be used.
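As a cross-check of the table above, the same decoding can be sketched in Python. This is an illustration only, not part of either script: the function name utf8_to_char_refs is made up, and it handles sequences of up to four bytes, which covers every code point up to 0x10FFFF. It does not reject overlong encodings.

```python
def utf8_to_char_refs(data: bytes) -> str:
    """Replace each multibyte UTF-8 sequence in `data` with a &#nnn; reference."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                      # 0xxxxxxx: plain ASCII, copy through
            out.append(chr(lead))
            i += 1
            continue
        if 0xC0 <= lead < 0xE0:              # 110xxxxx: 2-byte sequence
            length, value = 2, lead - 0xC0
        elif 0xE0 <= lead < 0xF0:            # 1110xxxx: 3-byte sequence
            length, value = 3, lead - 0xE0
        elif 0xF0 <= lead < 0xF8:            # 11110xxx: 4-byte sequence
            length, value = 4, lead - 0xF0
        else:                                # stray continuation or 5/6-byte lead
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        for b in tail:                       # 10xxxxxx: six payload bits each
            value = value * 64 + (b - 0x80)
        out.append(f"&#{value};")
        i += length
    return "".join(out)
```

For the Esperanto letter U+0109 (bytes C4 89) this yields "&#265;". In practice Python's built-in text.encode("ascii", "xmlcharrefreplace") produces the same references from an already decoded string; the manual version is shown only because it follows the bit layout in the table step by step.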
By the way I enjoyed your contribution about Ant Power.
If you have any questions or suggestions you can reach me at
xxx(a)chello.nl
!spam: read xxx as epzachte
Cheers, Erik Zachte
>> If you run IE6 and right click on any web page, you will get a drop
down menu with "encoding" as an entry. Follow the arrow to a long list
of encodings. In my case, I chose Japanese and it was installed on
demand, in under a minute. Then I left "Encoding" set to "Autoselect."
<<
I tried this but it did not work for me. I remember that when I
installed XP and then ran the 'Windows Update' wizard I clicked 'Remove'
for all foreign language packages (a little short-sighted, looking
back). Maybe this explains why. I could not find how to undo this.
There is also an "Enable Install On Demand (Explorer)" checkbox in
Explorer -> Options -> Advanced (unchecked by default, or because of my
actions above). Enabling this did not help me either.
Finally I found a link in the Wikipedia to "Alan Wood's Unicode
Resources": http://www.alanwood.net/unicode/ Lots of info and useful
links there.
He says that Microsoft has some very complete TrueType fonts. They are
only shipped with MS Office. I copied the Arial Unicode font
(Arialuni.ttf, 24 MB) from another machine running Office and all was
well.
Erik Zachte
Hi,
I just got a new computer with Windows XP. I, also, was wondering where
the old "Input Methods" for foreign languages were.
If you run IE6 and right click on any web page, you will get a drop down
menu with "encoding" as an entry. Follow the arrow to a long list of encodings.
In my case, I chose Japanese and it was installed on demand, in under a
minute. Then I left "Encoding" set to "Autoselect."
If you are aware of this already, apologies...
As Ever,
Ruth Ifcher
--
> On Tue, 27 May 2003 12:32:19 +0900, Guillaume Blanchard
> <gblanchard(a)arcsy.co.jp> gave utterance to the following:
>
> <older attribution for the >> was snipped by Guillaume>
> >> So...perhaps I understood nothing, but do you think
> >> Opera 5 is not accepting unicode because of missing
> >> polices or does it just not tolerate it at all ?
>
> >
> > I think there are both problem. Even if your browser can handle unicode,
> > you can't see caracters not defined in your font. I'm using MS Arial
> > Unicode with IE6.0 and I still not be able to see 100% of unicode
> > characters. In my case I think it's only a font problem. You can go to
> > this page and look at what percentage of caracters you can see :
> > http://www.columbia.edu/kermit/utf8.html (it's a UTF8 sample page).
> >
> Opera 5 has no unicode support - Opera 6 was the unicode rewrite.
> Both Opera (6+) and Mozilla support unicode natively - the only thing you
> have to do to get it working is to install an appropriate font.
> However, even if you have the font, IE doesn't display some writing systems
> until you "install support" by downloading a large patch to your operating
> system. (A fully multilingual installation of IE6 weighs in at around 85MB)
>
> --
> Richard Grevers
> I hate Victor Hugo said Les miserably