Hi,
regarding a current topic in Germany - the publication of the
timetable data of Deutsche Bahn (the German national railway company) and
their willingness to discuss it with other open-data supporters - it may
be a good idea to provide expiration dates for Wikidata records.
In their open letter to Mr. Kreil [1] they state that it could cause
problems for providing the timetable data openly if, for example, someone
uses outdated data.
Marco
[1] http://www.db-vertrieb.com/db_vertrieb/view/service/open_plan_b.shtml
Heya folks :)
We're now live on the Hebrew and Italian Wikipedia as well. Wohoooo!
More details are in this blog post:
http://blog.wikimedia.de/2013/01/30/wikidata-coming-to-the-next-two-wikiped…
(including planned dates for the next deployments!)
At the same time we updated the code on the Hungarian Wikipedia. These
are however only minor fixes and changes.
Thanks to everyone who helped and to the Hebrew and Italian Wikipedia
for joining the Hungarian Wikipedia in a very elite club of awesome :P
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
2013/1/25 Daniel Kinzler <daniel.kinzler(a)wikimedia.de>
> Hi!
>
> I thought about the RDF export a bit, and I think we should break this up
> into
> several steps for better tracking. Here is what I think needs to be done:
>
>
Daniel,
I am replying on Wikidata-l and adding Tpt (since he started working on
something similar), hoping to get more input on the open list.
I especially hope that Markus and maybe Jeroen can provide insight from
their experience with Semantic MediaWiki.
Just to reiterate internally: in my opinion we should learn from the
experience SMW has gained here, but we should not immediately try to create
common code for this case. The first step should be to create something
that works for Wikibase, and then analyze whether we can refactor some code
in both Wikibase and SMW and end up with a common library that both build
on. This will give us two running systems that can be tested against while
refactoring. Starting the other way around -- designing a common library
and developing it for both Wikibase and SMW, while keeping SMW's
constraints in mind -- would be much more expensive in terms of resources.
I guess we agree on the end result -- share as much code as possible. But
please let us not *start* with that goal; let us rather aim first at the
goal "Get an RDF export for Wikidata". (This is especially true because
Wikibase is basically reified all the way through, something SMW does not
have to deal with.)
In Semantic MediaWiki, the relevant parts of the code are (if I get it
right):
* SMWSemanticData is roughly what we call Wikibase::Entity.
* includes/export/SMW_ExportController.php - SMWExportController - the main
object responsible for creating serializations. It is used for
configuration, then calls the SMWExporter on the relevant data (which it
collects itself) and applies the chosen SMWSerializer to the returned
SMWExpData.
* includes/export/SMW_Exporter.php - SMWExporter - takes an SMWSemanticData
object and returns an SMWExpData object, which is optimized for export.
* includes/export/SMW_Exp_Data.php - SMWExpData - holds the data needed for
export.
* includes/export/SMW_Exp_Element.php - several classes used to represent
the data in SMWExpData. Note that there is some interesting interplay with
DataItems and DataValues here.
* includes/export/SMW_Serializer.php - SMWSerializer - abstract base class
for the different serializers.
* includes/export/SMW_Serializer_RDFXML.php - SMWRDFXMLSerializer -
responsible for creating the RDF/XML serialization.
* includes/export/SMW_Serializer_Turtle.php - SMWTurtleSerializer -
responsible for creating the Turtle serialization.
* special/URIResolver/SMW_SpecialURIResolver.php - SMWURIResolver - special
page that deals with content negotiation.
* special/Export/SMW_SpecialOWLExport.php - SMWSpecialOWLExport - special
page that serializes a single item.
* maintenance/SMW_dumpRDF.php - calls the serialization code to create a
dump of the whole wiki, or of certain entity types. Basically configures an
SMWExportController and lets it do its job.
There are some smart ideas in the way the ExportController and Exporter
are called by both the dump script and the single-item serializer, which
allow the export to scale to almost any size.
Remember that unlike SMW, Wikibase contains mostly reified knowledge. Here
is the spec of how to translate the internal Wikibase representation to
RDF: http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
The other major influence is obviously the MediaWiki API, with its (almost)
clean separation of results and serialization formats. While we can draw
inspiration here as well, the issue is that RDF is a graph-based model and
the MediaWiki API is really built for trees. Therefore I am afraid that we
cannot reuse much here.
Note that this does not mean that the API cannot be used to access the
data about entities, but merely that the API answers with tree-based
objects, most prominently the JSON objects described here:
http://meta.wikimedia.org/wiki/Wikidata/Data_model/JSON
So, after this lengthy prelude, let's get to the Todos that Daniel suggests:
> * A low-level serializer for RDF triples, with namespace support. Would be
> nice
> if it had support for different forms of output (xml, n3, etc). I suppose
> we can
> just use an existing one, but it needs to be found and tried.
>
>
Re reuse: the thing is that, to the best of my knowledge, PHP RDF packages
are quite heavyweight (because they also contain parsers, not just
serializers, and often enough SPARQL processors, support for blank nodes,
etc.), and it is rare that they support the kind of high-throughput
streaming we would require for the complete dump (there is obviously no
point in first putting all triples into a graph model and then calling the
model->serialize() method; that needs too much memory). There are also some
optimizations we can apply (reordering of triples, use of namespaces, some
assumptions about the whole dump, etc.). I will ask the Semantic Web
mailing list about this, but I don't have much hope.
The corresponding classes in SMW are the SMWSerializer classes.
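To make the streaming point concrete, here is a minimal sketch (in Python
rather than PHP, and not based on any existing package) of such a low-level
writer: triples go straight to the output stream and only the prefix map is
kept in memory, so a complete dump never has to be materialized as a graph.

import sys

class TripleWriter:
    """Minimal streaming Turtle writer; illustrative only."""

    def __init__(self, out, prefixes):
        self.out = out
        self.prefixes = prefixes

    def start(self):
        # Document "header": just the @prefix declarations.
        for prefix, uri in self.prefixes.items():
            self.out.write("@prefix %s: <%s> .\n" % (prefix, uri))
        self.out.write("\n")

    def triple(self, subject, predicate, obj):
        # Terms are assumed to be already formatted Turtle tokens
        # (prefixed names or quoted literals).
        self.out.write("%s %s %s .\n" % (subject, predicate, obj))

    def finish(self):
        # Turtle needs no footer; an RDF/XML serializer would close
        # the root element here instead.
        pass

if __name__ == "__main__":
    w = TripleWriter(sys.stdout,
                     {"wd": "http://www.wikidata.org/entity/",
                      "rdfs": "http://www.w3.org/2000/01/rdf-schema#"})
    w.start()
    w.triple("wd:Q8815", "rdfs:label", '"ASCII"@en')
    w.finish()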
> * A high level RDF serializer that process Entity objects. It should be
> possible
> to use this in streaming mode, i.e. it needs separate functions for
> generating
> the document header and footer in addition to the actual Entities.
>
>
This corresponds to the SMWExporter and parts of the SMWExportController
classes.
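As a rough sketch of what that streaming interface could look like (the
entity dictionary below is a simplified stand-in for the JSON model linked
above, not the real Wikibase classes, and the predicates are placeholders):

import sys

def entity_triples(entity):
    """Yield (subject, predicate, object) Turtle terms for one entity.
    Only labels are shown; descriptions, sitelinks and the reified
    statements would be emitted the same way."""
    subject = "wd:%s" % entity["id"]
    for lang, label in entity.get("labels", {}).items():
        yield (subject, "rdfs:label", '"%s"@%s' % (label, lang))

def serialize_entities(out, entities):
    # Document header: prefix declarations only.
    out.write("@prefix wd: <http://www.wikidata.org/entity/> .\n")
    out.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n")
    # Entities are handled one at a time, so even a full dump never
    # has to be held in memory at once.
    for entity in entities:
        for s, p, o in entity_triples(entity):
            out.write("%s %s %s .\n" % (s, p, o))
    # Turtle needs no document footer.

if __name__ == "__main__":
    serialize_entities(sys.stdout, [{"id": "Q8815", "labels": {"en": "ASCII"}}])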
> * Support for pulling in extra information on demand, e.g. back-links or
> property definitions.
>
>
SMWExportController provides most of these supporting tasks.
> * A maintenance script for generating dumps. It should at least be able to
> generate a dump of either all entities, or one kind of entity (e.g.
> items). And
> it should also be able to dump a given list of entities.
>
>
Surprisingly, creating a dump of all entities, or of one kind of entity, is
quite different from providing a dump of a given list of entities: whenever
you create a dump of everything, you can make some assumptions that save
you from keeping a lot of state. Therefore this item should be split into
two (or even three) subitems:
* A script to create a dump of all entities
* (A script to create a dump of all entities of a given kind)
* A script to create a dump of a list of entities
Personally I think the last item has a rather low priority, because it can
be so easily simulated.
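For illustration, a small sketch of how these entry points could share one
streaming code path, which is also why the list variant is so easy to
simulate (all function and parameter names here are made up):

import sys

def dump_entities(out, entity_ids, load_entity, serialize_entity):
    """Shared core: stream the given entities to `out`, one at a time."""
    for entity_id in entity_ids:
        out.write(serialize_entity(load_entity(entity_id)))

def dump_all(out, iter_all_entity_ids, load_entity, serialize_entity):
    # Full dump: every entity is seen exactly once, so no per-entity
    # bookkeeping ("did I emit this already?") is needed.
    dump_entities(out, iter_all_entity_ids(), load_entity, serialize_entity)

def dump_list(out, entity_ids, load_entity, serialize_entity):
    # Arbitrary list: duplicates, and references to entities outside the
    # list, have to be handled explicitly.
    dump_entities(out, sorted(set(entity_ids)), load_entity, serialize_entity)

if __name__ == "__main__":
    dump_list(sys.stdout, ["Q2", "Q1", "Q1"],
              load_entity=lambda eid: {"id": eid},
              serialize_entity=lambda e: "# entity %s\n" % e["id"])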
> * Special:EntityData needs a plug in interface so the RDF serializer can
> be used
> from there.
>
>
Or call the exporter. This special page corresponds to SMWSpecialOWLExport.
> * Special:EntityData should support format selection using file extension
> syntax
> (e.g. Q1234.n3 vs. Q1234.json).
>
>
That is a nice solution which works with Wikibase and was not available in
SMW.
> * Similarly, Special:EntityData should support a "pretty" syntax for
> showing
> specific revisions, e.g. Q1234.json@81762345.
>
>
I really never understood why you considered this one so important. Let's
keep it as an item, but for me its priority is really low.
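Just to show that the combined subpage syntax is easy to handle, a small
parsing sketch (the allowed suffixes and the default format are assumptions
here, not decisions):

import re

# Matches e.g. "Q1234", "Q1234.n3" and "Q1234.json@81762345":
# entity ID, optional format suffix, optional revision ID.
ENTITY_DATA_RE = re.compile(
    r"^(?P<id>[QP]\d+)(?:\.(?P<format>[a-z0-9]+))?(?:@(?P<revision>\d+))?$")

def parse_entity_data_title(title):
    m = ENTITY_DATA_RE.match(title)
    if m is None:
        return None
    return m.group("id"), m.group("format") or "json", m.group("revision")

assert parse_entity_data_title("Q1234.n3") == ("Q1234", "n3", None)
assert parse_entity_data_title("Q1234.json@81762345") == ("Q1234", "json", "81762345")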
> * Special:EntityData should support content negotiation (using redirects).
>
>
Basically what SMWURIResolver provides, but it can be a bit nicer here
thanks to the file extension suffixes.
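A sketch of the negotiation step itself, reusing the file-extension URLs
from above as redirect targets (the Accept-header mapping is purely
illustrative, not a description of how Special:EntityData will behave):

def negotiate_redirect(entity_id, accept_header):
    """Map an HTTP Accept header to a concrete data URL for
    Special:EntityData/<id>; the caller would answer with an HTTP 303
    redirect. Naive substring matching, no q-value handling."""
    preferred = [
        ("text/n3", "n3"),
        ("text/turtle", "ttl"),
        ("application/rdf+xml", "rdf"),
        ("application/json", "json"),
    ]
    for mime, extension in preferred:
        if mime in accept_header:
            return "/wiki/Special:EntityData/%s.%s" % (entity_id, extension)
    # Browsers and unknown clients fall back to JSON (or the HTML view).
    return "/wiki/Special:EntityData/%s.json" % entity_id

assert negotiate_redirect("Q1234", "text/n3,*/*;q=0.1") == "/wiki/Special:EntityData/Q1234.n3"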
> Did I miss anything?
>
>
Probably, just as I did.
I'd like to see if we get some input here; then we can extract the items
from it and start implementing them.
Already available is the following special page:
http://wikidata-test-repo.wikimedia.de/wiki/Special:EntityData/Q3
> -- daniel
>
> --
> Daniel Kinzler, Softwarearchitekt
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
>
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
Hi all!
Deployment will start in a moment, so I turned the test client back to
English. It has been updated with the latest stuff; the repo will be, too,
in a few minutes.
Happy testing,
Silke
--
Silke Meyer
Systemadministratorin und Projektassistenz Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt
für Körperschaften I Berlin, Steuernummer 27/681/51985.
As the owner of an interwiki bot, I now see:
Disabling an interwiki bot comes down to one line in the bot's source code,
depending on how often the owners update. So within 1-2 days many bots can
be disabled; the others should be blocked for a while.
But there is a problem now: other wikis still use classic interwiki links,
and on hu.wiki these links still remain - outdated, and in some cases
pointing to incorrect targets - and this causes interwiki conflicts on
other wikis, because bots read the links on this wiki but cannot edit them.
And the data on Wikidata is outdated, because many new articles are
created (moved and deleted) daily, but the most used platform,
pywikipedia, is not ready for Wikidata yet.
Bots should remove the links on non-conflicted pages on hu.wiki and update
Wikidata too, but no one has been able (or willing?) to write this feature
since 30 October :-(
In the next days more and more wikis will be "locked" for interwiki bots,
but these problems will remain for at least one week after this feature
exists (one week is needed for granting a bot flag on Wikidata - or should
global bots be allowed?)
JAnD
"Bináris" <wikiposta(a)gmail.com> schrieb:
>2013/1/28 Amir Ladsgroup <ladsgroup(a)gmail.com>
>
>> What is exact time of the next deployment (it and he)?
>>
>If you want to catch it, join #wikimedia-wikidata on IRC. It was great
>to
>follow it on D-day!
>
>
>> And what time you think is best to disable interwiki bots?
>>
>Xqt can modify the code, but pywiki is not deployed, it is updated by
>bot
>owners, so there is no chance to focus it on one hour. For this reason
>I
>would say to begin it after deployment of Wikibase as otherwise one
>should
>do it at least 1 or 2 days before which would cause a maintenance
>pause.
>Yes, people will try to remove iws and some of them will be put back by
>bots.
> Would it also make sense to write a bot putting the remaining iws to
> wikidata and removing them from the wiki if they can be replaced by them
> from wikidata?
> Marco
--
--
Ing. Jan Dudík
Hi!
At the moment, wikidata-test-client.wikimedia.de/wiki is configured to
act as Hebrew client to wikidata-test-repo.wikimedia.de/wiki. wmf8 can
be tested there until tomorrow's deployment. (Sorry for the remaining
auto-imported content.)
Best,
--
Silke Meyer
Systemadministratorin und Projektassistenz Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt
für Körperschaften I Berlin, Steuernummer 27/681/51985.
Heya folks :)
Is anyone interested in getting us some stats for the deployment on
the Hungarian Wikipedia? There is a database dump at
http://dumps.wikimedia.org/backup-index.html from the 22nd of January
that could be used. I'm interested in the effect Wikidata has had so far
on this one Wikipedia.
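One rough example of such a number, as a sketch that scans the
pages-articles dump and counts how many pages still carry local
interlanguage links in their wikitext (the filename and the short
language-prefix list are placeholders):

import bz2
import re
import xml.etree.ElementTree as ET

# pages-articles dump from dumps.wikimedia.org (assumed filename).
DUMP = "huwiki-20130122-pages-articles.xml.bz2"

# Very rough check for local interlanguage links such as [[en:ASCII]];
# a real analysis would use the full prefix list and filter by namespace.
IW_RE = re.compile(r"\[\[(en|de|fr|es|it|he|ru|pl)\s*:", re.IGNORECASE)

pages = 0
pages_with_local_links = 0
current_page_matches = False

with bz2.BZ2File(DUMP) as dump:
    for _event, elem in ET.iterparse(dump):
        tag = elem.tag.rsplit("}", 1)[-1]   # strip the XML namespace
        if tag == "text":
            current_page_matches = bool(elem.text and IW_RE.search(elem.text))
        elif tag == "page":
            pages += 1
            if current_page_matches:
                pages_with_local_links += 1
            current_page_matches = False
            elem.clear()                    # keep memory use flat

print("pages scanned:", pages)
print("pages still carrying local interlanguage links:", pages_with_local_links)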
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Spin off from the "Phase 1" thread.
2013/1/29 Magnus Manske <magnusmanske(a)googlemail.com>:
> Why not just block the bots on wikis that use wikidata?
This looks like the right thing to me, but I don't want to be too rude
to the bot operators and I do want the bots to keep doing useful
things.
Imagine the scenario:
* Wikidata Client is deployed to the Hebrew Wikipedia.
* I remove interlanguage links from the Hebrew Wikipedia article
[[ASCII]], for which an item is available in the Wikidata Repo
( https://www.wikidata.org/wiki/Q8815 ).
** The article is supposed to show the links brought from Wikidata now.
* After some time User:LovelyBot adds the links back.
* I block User:LovelyBot.
Now what do I say to User:Lovely?
A: Stop changing interlanguage links on the Hebrew Wikipedia. We have
Wikidata now.
B: Update your pywikipedia bot configuration (or version). We have
Wikidata now, and your bot must not touch articles that get the
interlanguage links from the Wikidata repo.
I prefer option B, but can pywikipediabot indeed identify that the
links in the article are coming from Wikidata? And are there interwiki
bots that are not using the pywikipediabot infrastructure?
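I don't know what pywikipediabot exposes for this, but as one possible
check, here is a sketch that asks the client wiki's API directly whether a
page is connected to a Wikidata item (it assumes the client records the
connection in a 'wikibase_item' page property, which the developers would
have to confirm):

import json
import urllib.parse
import urllib.request

def connected_wikidata_item(api_url, title):
    """Return the Wikidata item ID connected to `title`, or None.
    Assumes the client wiki exposes the connection via the
    'wikibase_item' page property."""
    params = urllib.parse.urlencode({
        "action": "query",
        "titles": title,
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "format": "json",
    })
    with urllib.request.urlopen(api_url + "?" + params) as response:
        data = json.loads(response.read().decode("utf-8"))
    for page in data["query"]["pages"].values():
        return page.get("pageprops", {}).get("wikibase_item")
    return None

# A bot could leave a page alone whenever this returns an item ID, e.g.:
# connected_wikidata_item("https://he.wikipedia.org/w/api.php", "ASCII")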
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore