Reading HTML into Python and querying with XPath

Really, just how hard can it be to read some HTML into Python and query it with XPath?

Well, as usual the Python manual is pretty patchy, and there are a myriad of third-party libraries in various states of disrepair. After a while, I got this to work with lxml:

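from lxml import etree

# Use the forgiving HTML parser; real-world pages are rarely well-formed XML.
parser = etree.HTMLParser()
tree = etree.parse("stuff.html", parser)

# Select the second cell of every row in the target table.
cells = tree.xpath("//table[@class='guildBattlesInner']/tbody/tr/td[2]")
for td in cells:
    # Print only cells whose class attribute mentions "highlight".
    if 'highlight' in td.get('class', ''):
        print(td.text)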
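
If you want to try this without a stuff.html lying around, here is a minimal sketch of the same idea that parses a string instead of a file. The sample markup and the helper name highlighted_cells are made up for illustration. It also assumes two small changes from the version above: the class test moves into the XPath itself via contains(), and //tr replaces /tbody/tr, since as far as I can tell lxml's HTML parser does not add a tbody the way browsers do, so the stricter path only matches when the page source really contains one.

```python
from lxml import etree

# Hypothetical sample markup, invented for this sketch.
SAMPLE_HTML = """
<html><body>
<table class="guildBattlesInner">
  <tbody>
    <tr><td>1</td><td class="highlight win">Red Octad</td></tr>
    <tr><td>2</td><td>Blue Octad</td></tr>
    <tr><td>3</td><td class="row highlight">Green Octad</td></tr>
  </tbody>
</table>
</body></html>
"""


def highlighted_cells(html_text):
    """Return the text of every highlighted second-column cell."""
    # fromstring() with an HTMLParser returns the root <html> element.
    root = etree.fromstring(html_text, etree.HTMLParser())
    # td[2] picks the second cell of each row; contains() then keeps
    # only cells whose class attribute includes the substring "highlight".
    query = ("//table[@class='guildBattlesInner']"
             "//tr/td[2][contains(@class, 'highlight')]")
    return [td.text for td in root.xpath(query)]


print(highlighted_cells(SAMPLE_HTML))
```

Note that contains() is a plain substring match, just like the find() test it replaces, so a class such as "nohighlight" would match too.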
