Querying¶
minestrone
allows searching through HTML via CSS selectors (similar to JQuery or other frontend libraries).
Note
Querying uses the select
method in Beautiful Soup
which delegates to SoupSieve
. More details about SoupSieve
is available in their documentation.
root_element¶
Gets the root element of the HTML.
from minestrone import HTML
html = HTML("""
<div>
<span>Dormouse</span>
</div>
""")
assert html.root_element.name == "div"
elements¶
Recursively get all elements in the HTML.
from minestrone import HTML
html = HTML("""
<div>
<span>Dormouse</span>
</div>
""")
assert [e.name for e in html.elements] == ["div", "span"]
query¶
Takes a CSS selector and returns an iterator of Element
items.
Query by element name¶
from minestrone import HTML
html = HTML("""
<h1>The Dormouse's Story</h1>
<p>There was a table...</p>
""")
for h1 in html.query("h1"):
assert str(h1) == "<h1>The Dormouse's Story</h1>"
Query by id¶
from minestrone import HTML
html = HTML("""
<ul>
<li><a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a></li>
<li><a href="http://example.com/lacie" class="sister" id="lacie">Lacie</a></li>
</ul>
""")
for a in html.query("a#elsie"):
assert str(a) == '<a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a>'
Query by class¶
from minestrone import HTML
html = HTML("""
<ul>
<li><a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a></li>
<li><a href="http://example.com/lacie" class="sister" id="lacie">Lacie</a></li>
</ul>
""")
elsie_link = next(html.query("ul li a.sister"))
assert str(elsie_link) == '<a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a>'
lacie_link = next(html.query("ul li a.sister"))
assert str(lacie_link) == '<a href="http://example.com/lacie" class="sister" id="lacie">Lacie</a>'
query_to_list¶
Exactly the same as query except it returns a list of Element
items instead of a generator. This is sometimes more useful than the query
above, but it can take more time to parse and more memory to store the data if the HTML document is large.
from minestrone import HTML
html = HTML("""
<ul>
<li><a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a></li>
<li><a href="http://example.com/lacie" class="sister" id="lacie">Lacie</a></li>
</ul>
""")
assert len(html.query_to_list("a")) == 2
assert str(html.query_to_list("a")[0]) == '<a href="http://example.com/elsie" class="sister" id="elsie">Elsie</a>'
assert html.query_to_list("a") == list(html.query("a"))