API to extract content from HTML & XML documents

Project description

API to extract data from HTML and XML documents.

Usage Example

Wrap a parsed lxml tree in XpathSelector, then query it with XPath expressions:

from selection import XpathSelector
from lxml.html import fromstring

html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
sel = XpathSelector(fromstring(html))
print(sel.select('//h1').text())       # text of the <h1> element
print(sel.select('//li').text_list())  # text of each <li> element
print(sel.select('//ul').attr('id'))   # value of the id attribute of <ul>
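
For readers without the package installed, the sketch below shows the three values the usage example above appears intended to extract. It deliberately does not use selection or lxml: it is a stand-in built on the standard library's xml.etree.ElementTree, which works here only because the sample markup happens to be well-formed XML. The variable names (root, h1_text, li_texts, ul_id) are illustrative assumptions, not part of the selection API.

```python
import xml.etree.ElementTree as ET

# Same markup as in the usage example; it is well-formed, so an XML parser accepts it.
html = '<div><h1>test</h1><ul id="items"><li>1</li><li>2</li></ul></div>'
root = ET.fromstring(html)

# ElementTree needs relative paths ('.//h1') where the example uses '//h1'.
h1_text = root.find('.//h1').text                      # text of the <h1> element
li_texts = [li.text for li in root.findall('.//li')]   # text of each <li> element
ul_id = root.find('.//ul').get('id')                   # id attribute of <ul>

print(h1_text)   # test
print(li_texts)  # ['1', '2']
print(ul_id)     # items
```

ElementTree supports only a subset of XPath and cannot parse malformed HTML, so this is a way to check expected values rather than a substitute for an lxml-based selector.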

Dependencies

  • lxml
