XPath in Parsel: Absolute vs. Relative Paths

If you’ve ever worked with Parsel to scrape HTML, you’ve probably used XPath to extract parts of the page.

But here’s a trap that gets a lot of people (myself included):

table.xpath("//tbody")

You’d expect this to return the <tbody> inside the current table, right?
But it actually returns every <tbody> on the page, including ones from completely different tables.

Let’s walk through why this happens — with a clear example — and how to fix it using relative XPath.


🧪 Example HTML with Two Tables

Here’s a simple HTML snippet with two tables:

<html>
  <body>
    <table id="first">
      <thead><tr><th>Item</th><th>Price</th></tr></thead>
      <tbody>
        <tr><td>Apple</td><td>$1</td></tr>
      </tbody>
    </table>

    <table id="second">
      <thead><tr><th>Name</th><th>Age</th></tr></thead>
      <tbody>
        <tr><td>Alice</td><td>30</td></tr>
        <tr><td>Bob</td><td>25</td></tr>
      </tbody>
    </table>
  </body>
</html>

You want to get rows from only the second table (with id="second").


❌ Absolute XPath: The Wrong Way (Usually)

from parsel import Selector

# html is assumed to hold the two-table snippet above, as a string
selector = Selector(text=html)

table = selector.xpath('//table[@id="second"]')
tbody = table.xpath('//tbody')  # ⛔ This is the problem!

What’s wrong here?

  • //tbody is an absolute XPath.
  • It ignores the fact that you called .xpath() on table.
  • It starts from the top of the document and finds all <tbody> elements.

Result? It returns both <tbody> elements, one from the first table and one from the second. Call .get() on that and you get the first match in document order, which is the <tbody> of the wrong table.


✅ Relative XPath: The Right Way

tbody = table.xpath('.//tbody')  # ✅ Note the dot!

  • The dot . means: start from this node (in this case, the second table).
  • .//tbody says: look inside this table and find every <tbody> beneath it, at any depth.

This returns only the <tbody> for table#second, as expected.


💡 Why This Happens

XPath expressions behave differently depending on how you write them:

Expression   Meaning
//tbody      Every <tbody> element anywhere in the document (the search starts from the root)
.//tbody     Every <tbody> element inside the current node

Even though you’re calling .xpath() on a specific node, an expression that starts with / or // is absolute: it is evaluated from the document root, so the search covers the whole page again.

That’s why using the dot . is so important when you want to limit your search to a specific part of the page.


✅ Real Example: Extracting Rows from a Specific Table

Here’s how you might use this properly in code:

from parsel import Selector

with open("two_tables.html", encoding="utf-8") as f:  # file holding the snippet above
    html = f.read()
selector = Selector(text=html)

# Get only the second table
table = selector.xpath('//table[@id="second"]')

# Use relative XPath to get rows inside this table
rows = table.xpath('.//tbody/tr')

for row in rows:
    cols = row.xpath('./td/text()').getall()
    print(cols)

Output:

['Alice', '30']
['Bob', '25']

Perfect!


Summary: Absolute vs. Relative XPath

XPath      Starts from     Use case
//tbody    Document root   When you want to search globally
.//tbody   Current node    When you’re drilling into a specific element
./td       Current node    To get the <td> children of the current row

Final Thoughts

If you’re chaining .xpath() calls in Parsel and wondering why you’re getting unexpected results, check whether you’re using absolute (//) or relative (.//) XPath.

Adding that little . makes all the difference.