
Tutorial: Convert Scraping to Semantic

This tutorial shows how to replace a brittle scraping-based agent integration with AXAG semantic annotations — making your agent integrations reliable, typed, and safe.

Why does this matter?

Every time a website redesigns, scraping agents break. AXAG annotations let agents discover operations through a stable semantic contract instead of fragile CSS selectors.



📊 Impact Metrics

  • 🛡️ Reliability: ~40% → 99.9% (↑ 150% more reliable)
  • ⏱️ Setup Time: 2-4 hrs → 15 min (↓ 90% faster)
  • 🔧 Maintenance: Weekly → Zero (↓ 100% eliminated)
  • 📄 Code Lines: 200+ → 10 (↓ 95% less code)

⚖️ Feature Comparison

| Aspect | Scraping | AXAG Semantic |
| --- | --- | --- |
| 🛡️ Reliability | Breaks on redesign | Stable across redesigns |
| 🔒 Type Safety | No | Yes (JSON Schema) |
| 🔍 Parameter Discovery | Reverse-engineer DOM | Declared in manifest |
| ⚠️ Risk Awareness | None | Declared risk levels |
| 🔄 Idempotency | Unknown | Declared per action |
| 🔧 Maintenance | Per-site, per-redesign | Zero (site owner maintains) |
| Speed | Slow (parse full HTML) | Fast (direct tool call) |
| 🚨 Error Handling | Catch DOM exceptions | Structured error codes |



Step-by-Step Walkthrough

Step 1: The scraping approach (what we're replacing)

A typical scraping agent for a product search page:

❌ Fragile scraping approach
# ❌ Fragile scraping approach
from bs4 import BeautifulSoup
import requests

def search_products(query):
    # params= lets requests URL-encode the query safely
    response = requests.get("https://store.example.com/search", params={"q": query})
    soup = BeautifulSoup(response.text, 'html.parser')

    products = []
    for card in soup.select('.product-card'):  # Breaks if class changes
        name = card.select_one('.product-name').text  # Breaks if structure changes
        price_text = card.select_one('.price').text  # "$19.99" — needs parsing
        price = float(price_text.replace('$', '').replace(',', ''))
        products.append({'name': name, 'price': price})

    return products

Problems with scraping:

  • CSS selectors break with every redesign
  • Price parsing varies by locale
  • Pagination requires DOM interaction
  • No type safety or parameter validation
  • No safety metadata (risk level, idempotency)
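
To see the first failure mode concretely, here is a self-contained sketch (using a regex stand-in for a CSS selector; the class names are made up) of how a single renamed class silently breaks extraction:

```python
import re

# Markup before and after a hypothetical redesign
old_html = '<div class="product-name">Widget</div>'
new_html = '<div class="item-title">Widget</div>'  # class renamed

def extract_name(html):
    # Stand-in for soup.select_one('.product-name').text
    match = re.search(r'class="product-name">([^<]+)<', html)
    return match.group(1) if match else None

print(extract_name(old_html))  # Widget
print(extract_name(new_html))  # None: the scraper is now broken
```

Nothing in the new markup signals the breakage; the agent just starts returning empty results.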

Step 2: Annotate the search button

Add AXAG attributes to your existing HTML element — annotations are purely additive:

✅ Annotated search button
<button
  axag-intent="product.search"
  axag-entity="product"
  axag-action-type="read"
  axag-required-parameters='["query"]'
  axag-optional-parameters='["category","price_min","price_max","sort_by","page"]'
  axag-risk-level="none"
  axag-idempotent="true"
  axag-description="Search the product catalog"
>Search</button>
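
Because the attributes live in ordinary HTML, manifest generation can be understood as a plain attribute walk. The following is a minimal standard-library sketch of that idea; the real `axag-generate-manifest` tool may work differently, and the dict shape here is an assumption:

```python
import json
from html.parser import HTMLParser

class AxagCollector(HTMLParser):
    """Collects axag-* attributes from annotated elements."""
    def __init__(self):
        super().__init__()
        self.operations = []

    def handle_starttag(self, tag, attrs):
        op = {k[len("axag-"):]: v for k, v in attrs if k.startswith("axag-")}
        if "intent" in op:
            # Parameter lists are JSON-encoded attribute values
            for key in ("required-parameters", "optional-parameters"):
                if key in op:
                    op[key] = json.loads(op[key])
            self.operations.append(op)

collector = AxagCollector()
collector.feed('''<button
  axag-intent="product.search"
  axag-required-parameters='["query"]'
  axag-risk-level="none"
>Search</button>''')
print(collector.operations[0]["intent"])  # product.search
```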

Step 3: Generate the Semantic Manifest

Generate manifest from annotations
npx axag-generate-manifest --input src/ --output axag-manifest.json

Step 4: Generate the MCP Tool

Generate MCP tools from manifest
npx axag-generate-tools --manifest axag-manifest.json --output tools/
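
The generated tool enforces the declared parameter contract before any call reaches the site. The real generated code is not shown in this tutorial; as a hedged illustration, its validation step might amount to something like this (parameter names taken from the annotation in Step 2):

```python
# Hypothetical parameter check mirroring the button's annotations;
# the actual generated tool code may differ.
REQUIRED = ["query"]
OPTIONAL = ["category", "price_min", "price_max", "sort_by", "page"]

def validate_params(params):
    """Reject calls that would violate the declared contract up front."""
    missing = [p for p in REQUIRED if p not in params]
    unknown = [p for p in params if p not in REQUIRED + OPTIONAL]
    if missing:
        raise ValueError(f"missing required parameter(s): {missing}")
    if unknown:
        raise ValueError(f"unknown parameter(s): {unknown}")
    return params

print(validate_params({"query": "usb cable", "page": 2}))
```

With scraping there is no equivalent step: a misspelled query parameter simply produces a page the selectors cannot parse.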

Step 5: Agent uses the generated tool

✅ Semantic approach — structured tool call
# ✅ Semantic approach — no scraping needed
def search_products(query, category=None, price_max=None):
    result = mcp_client.call_tool("product_search", {
        "query": query,
        "category": category,
        "price_max": price_max,
    })
    return result  # Structured JSON, no parsing needed
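
The declared risk and idempotency metadata also lets an agent decide which operations are safe to run without asking the user. A sketch, assuming manifest entries carry the `risk-level`, `idempotent`, and `action-type` fields from Step 2 (the `order.checkout` entry below is hypothetical):

```python
def safe_to_autoexecute(op):
    # Only auto-run reads that are declared harmless and idempotent;
    # anything else should be confirmed with the user first.
    return (
        op.get("action-type") == "read"
        and op.get("risk-level") == "none"
        and op.get("idempotent") == "true"
    )

search_op = {"intent": "product.search", "action-type": "read",
             "risk-level": "none", "idempotent": "true"}
checkout_op = {"intent": "order.checkout", "action-type": "write",
               "risk-level": "high", "idempotent": "false"}

print(safe_to_autoexecute(search_op))    # True
print(safe_to_autoexecute(checkout_op))  # False
```

A scraping agent has no way to make this distinction: clicking a button looks the same whether it runs a search or places an order.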

Key Takeaway

Scraping is coupling to presentation. AXAG is coupling to intent.

When a site's CSS classes change, scraping agents break. When a site's purpose stays the same, AXAG agents keep working — because they're bound to semantic operations, not DOM structure.