XPath in Parsel: Absolute and Relative Paths

If you’ve ever worked with Parsel to scrape HTML, you’ve probably used XPath to extract parts of the page.

But here’s a trap that gets a lot of people (myself included):

table.xpath("//tbody")

You’d expect this to return the <tbody> inside the current table, right?
In fact it returns every <tbody> in the document, including the ones from completely different tables elsewhere on the page.

Let’s walk through why this happens, with a clear example, and how to fix it using relative XPath.

🧪 Example HTML with Two Tables

Here’s a simple HTML snippet with two tables:

<html>
  <body>
    <table id="first">
      <thead><tr><th>Item</th><th>Price</th></tr></thead>
      <tbody>
        <tr><td>Apple</td><td>$1</td></tr>
      </tbody>
    </table>

    <table id="second">
      <thead><tr><th>Name</th><th>Age</th></tr></thead>
      <tbody>
        <tr><td>Alice</td><td>30</td></tr>
        <tr><td>Bob</td><td>25</td></tr>
      </tbody>
    </table>
  </body>
</html>

You want to get rows from only the second table (with id="second").

❌ Absolute XPath: The Wrong Way (Usually)

from parsel import Selector

# `html` is the two-table snippet shown above, stored as a string
selector = Selector(text=html)

table = selector.xpath('//table[@id="second"]')
tbody = table.xpath('//tbody')  # ⛔ This is the problem!

What’s wrong here?

//tbody is an absolute XPath.
It ignores the fact that you’re inside table.
It starts from the top of the document and finds all <tbody> elements.

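Result? It returns both <tbody> elements (from the first and the second table). If you then call .get() on that result, you get the first match in document order, which is the <tbody> of the first table: the wrong one entirely.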

✅ Relative XPath: The Right Way

tbody = table.xpath('.//tbody')  # ✅ Note the dot!

The dot . means: start from this node (in this case, the second table).
.//tbody says: look inside this table, and find all <tbody> elements underneath.

This returns only the <tbody> for table#second, as expected.

💡 Why This Happens

XPath expressions behave differently depending on how you write them:

Expression	Means
`//tbody`	Look for all `<tbody>` elements anywhere in the document (starts from the root)
`.//tbody`	Look for `<tbody>` elements inside the current node

Even though you’re calling .xpath() on a specific node, starting with // resets the search back to the whole page.

That’s why using the dot . is so important when you want to limit your search to a specific part of the page.

✅ Real Example: Extracting Rows from a Specific Table

Here’s how you might use this properly in code:

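from pathlib import Path

from parsel import Selector

# Read the example markup from disk; the file is closed automatically
html = Path("two_tables.html").read_text(encoding="utf-8")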
selector = Selector(text=html)

# Get only the second table
table = selector.xpath('//table[@id="second"]')

# Use relative XPath to get rows inside this table
rows = table.xpath('.//tbody/tr')

for row in rows:
    cols = row.xpath('./td/text()').getall()
    print(cols)

Output:

['Alice', '30']
['Bob', '25']

Perfect!

Summary: Absolute vs. Relative XPath

XPath	Starts From	Use Case
`//tbody`	Document root	Use when you want to search the whole page
`.//tbody`	Current Node	Use when you’re drilling into a specific element, at any depth
`./td`	Current Node	Use when you want only direct children, such as the cells of the current row

Final Thoughts

If you’re chaining .xpath() calls in Parsel and wondering why you’re getting unexpected results, check whether you’re using absolute (//) or relative (.//) XPath.

Adding that little . makes all the difference.