If you’ve ever worked with Parsel to scrape HTML, you’ve probably used XPath to extract parts of the page.
But here’s a trap that gets a lot of people (myself included):
```python
table.xpath("//tbody")
```
You’d expect this to return the <tbody> inside the current table, right?
But it might actually give you the <tbody> from a completely different table elsewhere on the page.
Let’s walk through why this happens — with a clear example — and how to fix it using relative XPath.
🧪 Example HTML with Two Tables
Here’s a simple HTML snippet with two tables:
```html
<html>
<body>
  <table id="first">
    <thead><tr><th>Item</th><th>Price</th></tr></thead>
    <tbody>
      <tr><td>Apple</td><td>$1</td></tr>
    </tbody>
  </table>
  <table id="second">
    <thead><tr><th>Name</th><th>Age</th></tr></thead>
    <tbody>
      <tr><td>Alice</td><td>30</td></tr>
      <tr><td>Bob</td><td>25</td></tr>
    </tbody>
  </table>
</body>
</html>
```
You want to get rows from only the second table (with id="second").
❌ Absolute XPath: The Wrong Way (Usually)
```python
from parsel import Selector

# `html` holds the two-table snippet above, as a string
selector = Selector(text=html)
table = selector.xpath('//table[@id="second"]')
tbody = table.xpath('//tbody')  # ⛔ This is the problem!
```
What’s wrong here?
- `//tbody` is an absolute XPath.
- It ignores the fact that you’re inside `table`.
- It starts from the top of the document and finds all `<tbody>` elements.
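Result? `//tbody` matches both `<tbody>` elements, the one in the first table and the one in the second, in document order. If you then call `.get()` or take the first match, you get the first table’s `<tbody>`, which is the wrong one entirely.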
✅ Relative XPath: The Right Way
```python
tbody = table.xpath('.//tbody')  # ✅ Note the dot!
```
- The dot `.` means: start from this node (in this case, the second table).
- `.//tbody` says: look inside this table and find all `<tbody>` elements underneath it.
This returns only the <tbody> for table#second, as expected.
💡 Why This Happens
XPath expressions behave differently depending on how you write them:
| Expression | Means |
|---|---|
| `//tbody` | Look for all `<tbody>` elements anywhere in the document (starts from the root) |
| `.//tbody` | Look for `<tbody>` elements inside the current node |
Even though you’re calling `.xpath()` on a specific node, an expression that starts with `//` is evaluated from the document root, so the search resets to the whole page.
That’s why using the dot . is so important when you want to limit your search to a specific part of the page.
✅ Real Example: Extracting Rows from a Specific Table
Here’s how you might use this properly in code:
```python
from parsel import Selector

# Read the two-table HTML from a local file
with open("two_tables.html", encoding="utf-8") as f:
    html = f.read()

selector = Selector(text=html)

# Get only the second table
table = selector.xpath('//table[@id="second"]')

# Use relative XPath to get rows inside this table
rows = table.xpath('.//tbody/tr')
for row in rows:
    cols = row.xpath('./td/text()').getall()
    print(cols)
```
Output:
```text
['Alice', '30']
['Bob', '25']
```
Perfect!
Summary: Absolute vs. Relative XPath
| XPath | Starts From | Use Case |
|---|---|---|
| `//tbody` | Entire HTML document | Use when you want to search globally |
| `.//tbody` | Current node | Use when you’re drilling into a specific element |
| `./td` | Current node | Get the `<td>` children of the current row |
Final Thoughts
If you’re chaining .xpath() calls in Parsel and wondering why you’re getting unexpected results, check whether you’re using absolute (//) or relative (.//) XPath.
Adding that little . makes all the difference.