Hello,
I am writing a Java program to extract the abstract of a Wikipedia page
given the title of the page. I have done some research and found
out that the abstract will be in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page, I
query the API in the following way:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…
and parse the XML data we get back, taking the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way to remove the infobox data
and get only the wikitext related to the page's abstract, or if there is an
alternative method by which I can get the abstract of the page directly.
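For reference, here is a minimal sketch of what I am doing, shown in Python
for brevity (the real program is in Java; I use format=json here instead of
XML, rvprop=content is my assumption for the truncated query above, and the
brace-counting infobox strip is only a naive heuristic, not a real wikitext
parser):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def lead_wikitext(title):
        # rvsection=0 limits the revision content to the lead section.
        params = {
            "action": "query", "prop": "revisions", "rvprop": "content",
            "rvsection": 0, "titles": title, "format": "json",
        }
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return next(iter(pages.values()))["revisions"][0]["*"]

    def strip_leading_templates(text):
        # Heuristic: drop leading {{...}} blocks (infoboxes etc.) by
        # counting braces.
        i = 0
        while text[i:i + 2] == "{{":
            depth, j = 0, i
            while j < len(text):
                if text[j:j + 2] == "{{":
                    depth, j = depth + 1, j + 2
                elif text[j:j + 2] == "}}":
                    depth, j = depth - 1, j + 2
                else:
                    j += 1
                if depth == 0:
                    break
            i = j
            while i < len(text) and text[i] in " \r\n":
                i += 1
        return text[i:]

    print(strip_leading_templates(lead_wikitext("Eiffel Tower"))[:300])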
Looking forward to your help.
Thanks in advance,
Aditya Uppu
Hi,
I would like to get the citation info for a page -- i.e., the bibliographic
information about the page itself -- in JSON, YAML, or BibTeX format. How
do I accomplish that? I see there is a CiteThisPage extension, but I can't
get the syntax right.
Fred
---
Install PageKicker's Algorithmic Publishing Toolkit:
git clone http://github.com/fredzannarbor/pagekicker-community.git
cd pagekicker-community
./simple-install.sh
# build your first book:
cd pagekicker-community/scripts
bin/builder.sh --seedsviacli "Algorithms; Thomas Cormen; Introduction to Algorithms" --booktitle "Algorithm Demo"
Thank you so much for your kind help and clear explanation.
The use case is indeed a Chinese-language project, and the examples provided were a nice illustration of how three kinds of possible outcomes show up in different versions. I would like to add another scenario in the hope of furthering the understanding of Wikimedia's language conversion. If you search for "川普" (Donald Trump in Traditional Chinese) at the OpenSearch API layer with redirects=resolve,[1] the first description should be "唐納·約翰·川普(英語:Donald John Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生於紐約市皇后區,為川普集團前任董事長兼總裁及川普娛樂公司的創辦人,他在全世界經營房地產、賭場和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。", which is Traditional Chinese; however, if the profile parameter is set to restrict,[2] the first description becomes "唐纳德·约翰·特朗普(英语:Donald John Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生于紐約市皇后区,为特朗普集團前任董事长兼總裁及特朗普娱乐公司的創辦人,他在全世界经营房地产、赌场和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。", which is Simplified Chinese.
This scenario indicates that language conversion happens not just at display time, but also at the API layer. Another interesting point: the API calls mentioned above [1][2] can even produce different outcomes on different machines.
The key question now is how such language conversion can happen at the API layer, and how it can be controlled.
-- Ben Yeh
[1] https://zh.wikipedia.org/w/api.php?action=opensearch&search=%E5%B7%9D%E6%99…
[2] https://zh.wikipedia.org/w/api.php?action=opensearch&search=%E5%B7%9D%E6%99…
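For reproducibility, here is a rough sketch of the two calls in Python (the
full query strings are in the truncated URLs [1] and [2]; the parameter
values below are as described above, and the OpenSearch response layout
[query, titles, descriptions, urls] is assumed):

    import requests

    API = "https://zh.wikipedia.org/w/api.php"
    base = {"action": "opensearch", "search": "川普", "format": "json"}

    # [1] with redirects=resolve
    r1 = requests.get(API, params=dict(base, redirects="resolve")).json()
    # [2] with the profile parameter set as described above
    r2 = requests.get(API, params=dict(base, profile="restrict")).json()

    # Index 2 holds the descriptions compared above.
    print(r1[2][0])
    print(r2[2][0])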
On Thursday, January 26, 2017 at 4:24:05 AM [Taipei], Trey Jones <tjones(a)wikimedia.org> wrote:
Let's see if I can help, either directly, or indirectly via Cunningham's Law.[1]
I'm reading this as you are searching a Chinese-language project (like zh.wikipedia.org), and getting results that are mixed Traditional and Simplified Chinese. If that's not the case, please elaborate!
My understanding, which is admittedly incomplete, is that the text for Chinese-language projects is stored however it was entered (Traditional or Simplified), and is converted at display time. If you look at the main page of zh.wikipedia.org[2] today without being logged in (or in a private browsing window), the featured article link has this text: "2007年欧洲冠军联赛決賽", which uses both 赛 and 賽, with 赛 being the Simplified version of Traditional 賽.[3] If you request the zh-cn version of the page,[4] the text is "2007年欧洲冠军联赛决赛", and both are Simplified "赛". If you request the zh-tw version of the page[5], the text is "2007年歐洲冠軍聯賽決賽", and both are Traditional "賽". So, I believe that explains why you are seeing mixed Traditional and Simplified results.
What to do about it? I can't get the Opensearch API to do the conversion in place, but there is a separate API that does the conversion: Parsing wikitext.[6] Unfortunately, I can only get the API to do the conversion (which is based on the uselang parameter) when I submit the text as wikitext,[7][8] which adds some additional tags and a long comment to the results. \u-formatted input doesn't work, and I can't get the conversion to work for json input (i.e., the result of the Opensearch call). That doesn't mean it isn't possible, just that I haven't figured it out.
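To make that concrete, here is a rough sketch of the parse-based conversion
in Python (parameter names per API:Parsing_wikitext;[6] this reflects what I
could get working, not necessarily the best way):

    import requests

    API = "https://zh.wikipedia.org/w/api.php"

    def to_variant(wikitext, variant):
        # Submit the string as wikitext; conversion follows uselang.
        params = {
            "action": "parse", "format": "json", "prop": "text",
            "contentmodel": "wikitext", "text": wikitext, "uselang": variant,
        }
        r = requests.post(API, data=params).json()
        # The result is HTML-wrapped and includes a long limit-report comment.
        return r["parse"]["text"]["*"]

    print(to_variant("川普", "zh-cn"))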
I hope that points you in the right direction, and maybe inspires someone who knows this stuff better than me to help out.
—Trey
[1] https://meta.wikimedia.org/wiki/Cunningham's_Law
[2] https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5
[3] https://en.wiktionary.org/wiki/%E8%B5%9B
[4] https://zh.wikipedia.org/zh-cn/Wikipedia:%E9%A6%96%E9%A1%B5
[5] https://zh.wikipedia.org/zh-tw/Wikipedia:%E9%A6%96%E9%A1%B5
[6] https://www.mediawiki.org/wiki/API:Parsing_wikitext
[7] https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=text&usela…
[8] https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=text&usela…
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Wed, Jan 25, 2017 at 11:22 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
+discovery list
On Wed, Jan 25, 2017 at 10:15 AM, Brad Jorsch (Anomie) <bjorsch(a)wikimedia.org> wrote:
On Wed, Jan 25, 2017 at 2:09 AM, <byeh(a)yahoo-inc.com> wrote:
While I was developing some services based on API:Opensearch, I found that the response to the same URL request can be either Simplified Chinese or Traditional Chinese. To be more specific, I would love to know how I can determine the response language from the API layer (or other factors that may have an impact), since the documentation of API:Opensearch doesn't seem to take language into consideration.
The OpenSearch Suggestions extension specification[1] does not allow for returning additional metadata, such as language, with the response. You may want to look at the prefixsearch query module[2] instead, which allows for returning the same results in a different format, although I don't know the details of how language variants are handled in the search output.
[1]: http://www.opensearch.org/Spec ifications/OpenSearch/Extensio ns/Suggestions/1.1
[2]: https://www.mediawiki.org/wiki /API:Prefixsearch
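For illustration, a minimal prefixsearch call might look like the following
Python sketch (search term reused from this thread; how variants surface in
the returned titles is exactly the open question):

    import requests

    API = "https://zh.wikipedia.org/w/api.php"
    params = {
        "action": "query", "list": "prefixsearch",
        "pssearch": "川普", "pslimit": 5, "format": "json",
    }
    # Each hit carries ns, title, and pageid; no language metadata.
    for hit in requests.get(API, params=params).json()["query"]["prefixsearch"]:
        print(hit["title"])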
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
Sorry if it turns out to be a stupid mistake, but I am at the end of my own
knowledge in solving this issue on my MediaWiki 1.28 installation:
I am using a python3 script to 1) get the token, and 2) log in via
'clientlogin', but I always receive the following:
"error": {
    "code": "badtoken",
    "info": "Invalid token",
The token is coming from 'action': 'query', 'meta': 'tokens',
'type': 'login', and I strip the last '\'.
This is my submission for Login:
Type: <class 'urllib.request.Request'>
Contents: {'_full_url': 'https://wiki.xxxxat.com/api.php',
'unredirected_hdrs': {}, '_tunnel_host': None, 'fragment': None,
'_data': b'password=Lxxxxxxxx7&action=clientlogin&rememberMe=1&username=Admin&returnurl=https%3A%2F%2Flinproxy.fan.workers.dev%3A443%2Fhttps%2Fwiki.xxxxat.com%2F&logintoken=8f2d58f9cc8336543c7f67cf8a5926d058a95ede%2B%5C&requestid=byMichael',
'origin_req_host': 'wiki.xxxxat.com', 'selector': '/api.php',
'headers': {'Content-type': 'application/x-www-form-urlencoded'},
'type': 'https', 'unverifiable': False, 'host': 'wiki.xxxxat.com'}
Following the same procedure via the ApiSandbox works fine.
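For reference, here is a minimal sketch of the flow I am attempting,
rewritten with the 'requests' library (the URL and credentials are
placeholders):

    import requests

    API = "https://wiki.example.com/api.php"  # placeholder
    s = requests.Session()  # one session, so cookies persist across both calls

    # Step 1: fetch the login token.
    tok = s.get(API, params={
        "action": "query", "meta": "tokens", "type": "login", "format": "json",
    }).json()["query"]["tokens"]["logintoken"]

    # Step 2: POST clientlogin with the token sent verbatim in the body;
    # requests does the URL-encoding.
    r = s.post(API, data={
        "action": "clientlogin",
        "username": "Admin",
        "password": "...",  # placeholder
        "logintoken": tok,
        "loginreturnurl": "https://wiki.example.com/",
        "format": "json",
    })
    print(r.json())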
Thank you for your help!
As was previously announced, passing the lgpassword or lgtoken parameters
to action=login in the query string rather than the POST body will begin to
return an error starting with 1.29.0-wmf.13. See
https://www.mediawiki.org/wiki/MediaWiki_1.29/Roadmap for the deployment
schedule.
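As a hedged illustration of the change (shown with Python 'requests'; the
endpoint and credentials are placeholders), the fix is simply to move the
sensitive fields from the query string into the POST body:

    import requests

    API = "https://example.org/w/api.php"  # placeholder

    # Rejected from 1.29.0-wmf.13 on: sensitive fields in the query string.
    # requests.post(API, params={"action": "login", "lgname": "Bot",
    #                            "lgpassword": "secret", "lgtoken": "TOKEN"})

    # Accepted: the same fields in the POST body.
    resp = requests.post(API, data={
        "action": "login", "lgname": "Bot",
        "lgpassword": "secret", "lgtoken": "TOKEN",  # placeholders
        "format": "json",
    })
    print(resp.json())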
On Mon, Oct 31, 2016 at 2:44 PM, Brad Jorsch (Anomie) <bjorsch(a)wikimedia.org
> wrote:
> Over the past 30 days, there has been exactly one hit to
> action=clientlogin with sensitive data in the query string, and none to
> action=createaccount, action=linkaccount, and action=changeauthenticationdata.
> Beginning in 1.29.0-wmf.1 (to be deployed this week) these actions will now
> begin throwing errors if sensitive fields are included in the query string.
>
> Over the past 30 days, logins have been attempted via action=login for 28
> different user names[1] with sensitive data (lgpassword or lgtoken) in the
> query string. This will continue to work for now; my current plan is to
> turn that warning into an error on February 15, 2017.
>
>
> [1]: I can't post the list publicly at this time. If you want to know if
> you're one of the 28, put your user agent into
> https://meta.wikimedia.org/wiki/Special:ApiFeatureUsage and look for
> "login-params-in-query-string".
>
>
> On Fri, Aug 19, 2016 at 3:24 PM, Brad Jorsch (Anomie) <
> bjorsch(a)wikimedia.org> wrote:
>
>> For improved safety, passwords and other sensitive fields for
>> authentication should not be included in the request URI during a POST.
>> Instead, they should be in the POST body where they are less likely to be
>> included in log files. With the merge of Gerrit change 305545,[1] the API
>> will now produce a warning if such fields are detected in the URI. This
>> should be deployed to WMF wikis with 1.28.0-wmf.16, see
>> https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap for the schedule.
>>
>> This affects the following modules and fields:
>> * action=login: 'lgpassword'
>> * action=clientlogin, action=createaccount, action=linkaccount, and
>> action=changeauthenticationdata: Any fields reported as "sensitive" by
>> action=query&meta=authmanagerinfo or by UI or REDIRECT responses.
>> Currently, this affects the 'password' and 'retype' fields.
>>
>> The 'lgtoken' field for action=login will now also issue a warning if
>> placed in the request URI. The error code for other tokens being in the
>> request URI has changed from 'mustposttoken' to 'mustpostparams'.
>>
>> To check if your client's user agent is detected making such submissions,
>> you can also use ApiFeatureUsage[2] and look for
>> '<action>-params-in-query-string' once 1.28.0-wmf.16 is rolled out to
>> wikis your client is logging in to.
>>
>> It is planned that these warnings will be changed to errors during 1.29.
>> Let's avoid a repeat of T142155;[3] update your code ASAP instead of
>> waiting until it breaks. Thanks.
>>
>> [1]: https://gerrit.wikimedia.org/r/#/c/305545/
>> [2]: https://meta.wikimedia.org/wiki/Special:ApiFeatureUsage
>> [3]: https://phabricator.wikimedia.org/T142155
>>
>> --
>> Brad Jorsch (Anomie)
>> Senior Software Engineer
>> Wikimedia Foundation
>>
>
>
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
>
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
Hello,
Almost two weeks ago, the Technical Collaboration team invited proposals
for the first edition of the Developer Wishlist survey!
We collected around 77 proposals that were marked as suitable for the
developer wishlist and met the defined scope and criteria
<https://www.mediawiki.org/wiki/Developer_Wishlist#Scope>. These proposals
fall into the following nine categories: Frontend, Backend, Code
Contribution (Process, Guidelines), Extensions, Technical Debt, Developer
Environment, Documentation, Tools (Phabricator, Gerrit) and Community
Engagement.
The voting phase starts now and will run until *February 14th, 23:59 UTC*. Click
on a category and show support for the proposals you care about most
<https://www.mediawiki.org/wiki/Developer_Wishlist>. Use the 'Vote' and
'Endorse' buttons next to a proposal to do so.
*What happens next?* The proposals that gather the most votes will be included
in the final results, which will be published on *Wednesday, February 15th*.
These proposals will also be considered in the Wikimedia Foundation's
annual plan for FY 2017-18.
Cheers,
Srishti
--
Srishti Sethi
Developer Advocate
Technical Collaboration team
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:SSethi_(WMF)