Wikidata January 2013

wikidata@lists.wikimedia.org

50 participants
40 discussions

[Wikidata-l] Expiration date for data
by Marco Fleckinger 23 Mar '13

23 Mar '13

Hi, regarding an actual topic in Germany about publication of the timetable-data of Deutsche Bahn (German national railway company) and their willingness of a discussion with other Open-Data-Supporters it may be a good idea of providing an expiration dates for Wikidata-records. In their open letter to Mr. Kreil [1] they announced that it may cause problems providing the timetable-data in an open way if e.g. anybody uses old data. Marco [1] http://www.db-vertrieb.com/db_vertrieb/view/service/open_plan_b.shtml

17 35

[Wikidata-l] And we're live on the Hebrew and Italian Wikipedia \o/
by Lydia Pintscher 30 Jan '13

30 Jan '13

Heya folks :) We're now live on the Hebrew and Italian Wikipedia as well. Wohoooo! More details are in this blog post: http://blog.wikimedia.de/2013/01/30/wikidata-coming-to-the-next-two-wikiped… (including planned dates for the next deployments!) At the same time we updated the code on the Hungarian Wikipedia. These are however only minor fixes and changes. Thanks to everyone who helped and to the Hebrew and Italian Wikipedia for joining the Hungarian … [View More]

3 2

[Wikidata-l] Fwd: Todos for RDF export
by Denny Vrandečić 30 Jan '13

30 Jan '13

2013/1/25 Daniel Kinzler <daniel.kinzler(a)wikimedia.de> > Hi! > > I thought about the RDF export a bit, and I think we should break this up > into > several steps for better tracking. Here is what I think needs to be done: > > Daniel, I am answering to Wikidata-l, and adding Tpt (since he started working on something similar), hoping to get more input on the open list. I especially hope that Markus and maybe Jeroen can provide insight from the experience with … [View More]Semantic MediaWiki. Just to reiterate internally: in my opinion we should learn from the experience that SMW made here, but we should not immediately try to create common code for this case. First step should be to create something that works for Wikibase, and then analyze if we can refactor some code on both Wikibase and SMW and then have a common library that both build on. This will give us two running systems that can be tested against while refactoring. But starting the other way around -- designing a common library, developing it for both Wikibase and SMW, while keeping SMW's constraints in mind -- will be much more expensive in terms of resources. I guess we agree on the end result -- share as much code as possible. But please let us not *start* with that goal, but rather aim first at the goal "Get an RDF export for Wikidata". (This is especially true because of the fact that Wikibase is basically reified all the way through, something SMW does not have to deal with). In Semantic MediaWiki, the relevant parts of the code are (if I get it right): SMWSemanticData is roughly what we call Wikibase::Entity includes/export/SMW_ExportController.php - SMWExportController - main object responsible for creating serializations. Used for configuration, and then calls the SMWExporter on the relevant data (which it collects itself) and applies the defined SMWSerializer on the returned SMWExpData. includes/export/SMW_Exporter.php - SMWExporter - takes a SMWSemanticData object and returns a SMWExpData object, which is optimized for being exported includes/export/SMW_Exp_Data.php - SMWExpData - holds the data that is needed for export includes/export/SMW_Exp_Element.php - several classes used to represent the data in SMWExpData. Note that there is some interesting interplay happening with DataItems and DataValues here. includes/export/SMW_Serializer.php - SMWSerializer - abstract class for different serializers includes/export/SMW_Serializer_RDFXML.php - SMWRDFXMLSerializer - responsible to create the RDF/XML serialization includes/export/SWM_Serializer_Turtle.php - SMWTurtleSerializer - responsible to create the Turtle serialization special/URIResolver/SMW_SpecialURIResolver.php - SMWURIResolver - Special page that deals with content negotiation. special/Export/SMW_SpecialOWLExport.php - SMWSpecialOWLExport - Special page that serializes a single item. maintenance/SMW_dumpRDF.php - calling the serialization code to create a dump of the whole wiki, or of certain entity types. Basically configures a SMWExportController and let's it do its job. There are some smart ideas in the way that the ExportController and Exporter are being called by both the dump script as well as the single item serializer, and that allow it to scale to almost any size. Remember that unlike SMW, Wikibase contains mostly reified knowledge. Here is the spec of how to translate the internal Wikibase representation to RDF: http://meta.wikimedia.org/wiki/Wikidata/Development/RDF The other major influence is obviously the MediaWiki API, with its (almost) clean separation of results and serialization formats. Whereas we can also get inspired here, the issue is that RDF is a graph based model and the MediaWiki API is really built for a tree. Therefore I am afraid that we cannot reuse much here. Note that this does not mean that the API can not be used to access the data about entities, but merely that the API answers with tree-based objects, most prominently the JSON objects described here: http://meta.wikimedia.org/wiki/Wikidata/Data_model/JSON So, after this lengthy prelude, let's get to the Todos that Daniel suggests: * A low-level serializer for RDF triples, with namespace support. Would be > nice > if it had support for different forms of output (xml, n3, etc). I suppose > we can > just use an existing one, but it needs to be found and tried. > > Re reuse: the thing is that to the best of my knowledge PHP RDF packages are quite heavyweight (because they also contain parsers, not just serializers, and often enough SPARQL processors and support for blank nodes etc.), and it is rare that they support the kind of high-throughput streaming that we would require for the complete dump (i.e. there is obviously no point of first setting all triples into a graph model and then call the model->serialize() method, this needs too much memory). Also some optimizations that we can use (re ordering of triples, use of namespaces, some assumptions about the whole dump, etc.). I will ask the Semantic Web mailing list about that, but I don't have much hope. The corresponding classes in SMW are the SMWSerializer classes. > * A high level RDF serializer that process Entity objects. It should be > possible > to use this in streaming mode, i.e. it needs separate functions for > generating > the document header and footer in addition to the actual Entities. > > This corresponds to the SMWExporter and and parts of the SMWExportController classes. > * Support for pulling in extra information on demand, e.g. back-links or > prope3rty definitions. > > SMWExportController provides most of these supporting tasks. > * A maintenance script for generating dumps. It should at least be able to > generate a dump of either all entities, or one kind of entity (e.g. > items). And > it should also be able to dump a given list of entities. > > Surprisingly, creating a dump of all entities or of one kind of entities is quite different from providing a dump of a given list of entities, because whenever you create a dump of everything you can make some assumptions that save you from keeping a lot of state. Therefor this item should be split into two (or even three) subitems. * A script to create a dump of all entities * (A script to create a dump of all entities of a given kind) * A script to create a dump of a list of entities Personally I think the last item has a rather low priority, because it can be so easily simulated. > * Special:EntityData needs a plug in interface so the RDF serializer can > be used > from there. > > Or call the exporter. This special page corresponds to SMWSpecialOWLExport. > * Special:EntityData should support format selection using file extension > syntax > (e.g. Q1234.n3 vs. Q1234.json). > > That is a nice solution which works with Wikibase and was not available in SMW. > * Similarly, Special:EntityData should support a "pretty" syntax for > showing > specific revisions, e.g. Q1234.json@81762345. > > I really never understood why you considered this one so important. Let's keep it as an item, but for me the priority of it is really low. > * Special:EntityData should support content negotiation (using redirects). > > Basically what SMWURIResolver provides, but can be a bit nicer due to the file extension suffixes. > Did I miss anything? > > Probably, just as I did. I'd like to see if we get some input here, and then we can extract the items from it and start with implementing them. Already available is the following special page: http://wikidata-test-repo.wikimedia.de/wiki/Special:EntityData/Q3 > -- daniel > > -- > Daniel Kinzler, Softwarearchitekt > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. > > -- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. [View Less]

2 2

[Wikidata-l] Test is back to English
by Silke Meyer 30 Jan '13

30 Jan '13

Hi all! Deployment will start in a moment, so I turned the test client back to English. It has been updated with the latest stuff, repo will be, too, in a few minutes. Happy testing, Silke -- Silke Meyer Systemadministratorin und Projektassistenz Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. (030) 219 158 260 http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im … [View More]

1 0

Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia
by Jan Dudík 30 Jan '13

30 Jan '13

As owner of interwiki bot i now see: Disabling of interwiki bots is about one row in source code of bot, depend how ofter owners update. So 1-2 days can disable many bots, the others shoud be blocked for a while But there is problem now - other wikis still use classic intwrwiki links and on hu.wiki still remains these links - which are outdated and in some case with incorrect links too - and this causes interwiki conflicts on other wikis, because bots read links in this wiki, but cannot edit … [View More]

6 8

[Wikidata-l] wikidata-test-client is acting as a Hebrew client until deployment
by Silke Meyer 29 Jan '13

29 Jan '13

Hi! At the moment, wikidata-test-client.wikimedia.de/wiki is configured to act as Hebrew client to wikidata-test-repo.wikimedia.de/wiki. wmf8 can be tested there until tomorrow's deployment. (Sorry for the remaining auto-imported content.) Best, -- Silke Meyer Systemadministratorin und Projektassistenz Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. (030) 219 158 260 http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur … [View More]

1 0

[Wikidata-l] RDBMSes in Wikidata
by Sumana Harihareswara 29 Jan '13

29 Jan '13

(was "Database used by wikidata") We would LOVE for more developers or systems administrators to help support MediaWiki core on other RDBMSes; one of the most helpful things you can do is help look at the changes other developers are making that might affect your preferred RDBMS, and provide testing and code review. https://www.mediawiki.org/wiki/Code_review_guide For instance, here's a search for not-yet-merged commits that mention PostgreSQL in their … [View More]

1 0

[Wikidata-l] getting some stats for the Hungarian Wikipedia
by Lydia Pintscher 29 Jan '13

29 Jan '13

Heya folks :) Is anyone interested in getting us some stats for the deployment on the Hungarian Wikipedia? There is a database dump at http://dumps.wikimedia.org/backup-index.html from the 22nd of January that could be used. I'm interested in the effect Wikidata had so far on this one Wikipedia. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland … [View More]

6 13

[Wikidata-l] Interwiki bots - practical questions
by Amir E. Aharoni 29 Jan '13

29 Jan '13

Spin off from the "Phase 1" thread. 2013/1/29 Magnus Manske <magnusmanske(a)googlemail.com>: > Why not just block the bots on wikis that use wikidata? This looks like the right thing to me, but I don't want to be too rude to the bot operators and I do want the bots to keep doing useful things. Imagine the scenario: * Wikidata Client is deployed to the Hebrew Wikipedia. * I remove interlanguage links from the Hebrew Wikipedia article [[ASCII]], an item for which is available in the … [View More]

2 1

[Wikidata-l] Phase 1
by Jan Kučera 29 Jan '13

29 Jan '13

How is the Hungarian pilot running? When will you deploy Phase 1 to other wikis?

7 10

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Wikidata January 2013