Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am writing an "auto-wikifier" tool using HTML and JavaScript. For each word in the text to be wikified, I need to obtain a list of pages that contain that word (so that the matching phrases in the text can be automatically wikified, if they are found). Is there a way to obtain a list of all Wikipedia pages that contain a specific word, using one of Wikipedia's APIs or web services?

function getMatchingPageTitles(theString){
    //get a list of all matching page titles for a specific string, using one of Wikipedia's APIs or web services
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
224 views
Welcome To Ask or Share your Answers For Others

1 Answer

First, I'm not sure I understand how would something like that be useful. (Wikipedia has articles for all the common words and I don't think links to them would be of any use.)

But if you really wanted to do something like this, I think a much better way would be to use the API to find out which words from your input text have articles.

For example, for the string I am writing an "auto-wikifier" tool, your query could look something like:

http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=I|am|writing|an|auto-wikifier|tool

And the answer is:

<api>
  <query>
    <normalized>
      <n from="am" to="Am" />
      <n from="writing" to="Writing" />
      <n from="an" to="An" />
      <n from="auto-wikifier" to="Auto-wikifier" />
      <n from="tool" to="Tool" />
    </normalized>
    <pages>
      <page ns="0" title="Auto-wikifier" missing="" />
      <page pageid="2513432" ns="0" title="Am" />
      <page pageid="2513422" ns="0" title="An" />
      <page pageid="25346998" ns="0" title="I" />
      <page pageid="30677" ns="0" title="Tool" />
      <page pageid="32977" ns="0" title="Writing" />
    </pages>
  </query>
</api>

Few notes:

  • The results are not in the order you specified them.
  • If a page doesn't exist, the result has missing="" attribute.
  • JSON and JSONP formats are available too, that might be more suitable for JavaScript.
  • The titles parameter has a limit of 50 per one query.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...