Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm new to LibreOffice Basic. I'm trying to write a macro in LibreOffice Calc that will read the name of a noble House of Westeros from a cell (e.g. Stark), and output the Words of that House by looking it up on the relevant page on A Wiki of Ice and Fire. It should work like this:

enter image description here

enter image description here

Here is the pseudocode:

Read HouseName from column A
Open HtmlFile at "http://www.awoiaf.westeros.org/index.php/House_" & HouseName
Iterate through HtmlFile to find line which begins "<table class="infobox infobox-body"" // Finds the info box for the page.
Read Each Row in the table until Row begins Words
Read the contents of the next <td> tag, and return this as a string.

My problem is with the second line, I don't know how to read a HTML file. How should I do this in LibreOffice Basic?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
204 views
Welcome To Ask or Share your Answers For Others

1 Answer

There are two mainly issues with this. 1. Performance Your UDF will need get the HTTP resource in every cell, in which it is stored. 2. HTML Unfortunately there is no HTML parser in OpenOffice or LibreOffice. There is only a XML parser. Thats why we cannot parse HTML directly with the UDF.

This will work, but slow and not very universal:

Public Function FETCHHOUSE(sHouse as String) as String

   sURL = "http://awoiaf.westeros.org/index.php/House_" & sHouse

   oSimpleFileAccess = createUNOService ("com.sun.star.ucb.SimpleFileAccess")
   oInpDataStream = createUNOService ("com.sun.star.io.TextInputStream")
   on error goto falseHouseName
   oInpDataStream.setInputStream(oSimpleFileAccess.openFileRead(sUrl))
   on error goto 0
   dim delimiters() as long
   sContent = oInpDataStream.readString(delimiters(), false)

   lStartPos = instr(1, sContent, "<table class=" & chr(34) & "infobox infobox-body" )
   if lStartPos = 0 then
     FETCHHOUSE = "no infobox on page"
     exit function
   end if   
   lEndPos = instr(lStartPos, sContent, "</table>")
   sTable = mid(sContent, lStartPos, lEndPos-lStartPos + 8)

   lStartPos = instr(1, sTable, "Words" )
   if lStartPos = 0 then
     FETCHHOUSE = "no Words on page"
     exit function
   end if        
   lEndPos = instr(lStartPos, sTable, "</tr>")
   sRow = mid(sTable, lStartPos, lEndPos-lStartPos + 5)

   oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
   oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
   oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
   oOptions.searchString = "<td[^<]*>"
   oTextSearch.setOptions(oOptions)
   oFound = oTextSearch.searchForward(sRow, 0, Len(sRow))
   If  oFound.subRegExpressions = 0 then 
     FETCHHOUSE = "Words header but no Words content on page"
     exit function   
   end if
   lStartPos = oFound.endOffset(0) + 1
   lEndPos = instr(lStartPos, sRow, "</td>")
   sWords = mid(sRow, lStartPos, lEndPos-lStartPos)

   FETCHHOUSE = sWords
   exit function

   falseHouseName:
   FETCHHOUSE = "House name does not exist"

End Function

The better way would be, if you could get the needed informations from a Web API that would offered from the Wiki. You know the people behind the Wiki? If so, then you could place this there as a suggestion.

Greetings

Axel


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...