Tutorial: Convert Scraping to Semantic
This tutorial shows how to replace a brittle scraping-based agent integration with AXAG semantic annotations — making your agent integrations reliable, typed, and safe.
Every time a website redesigns, scraping agents break. AXAG annotations let agents discover operations through a stable semantic contract instead of fragile CSS selectors.
Migration Checklist

- [ ] Identify all scraping targets (pages, selectors)
- [ ] Map each scraping target to an AXAG intent
- [ ] Add axag-* annotations to source HTML
- [ ] Generate the semantic manifest
- [ ] Generate MCP tools from the manifest
- [ ] Replace scraping code with tool calls
- [ ] Remove BeautifulSoup / Selenium / Puppeteer dependencies
- [ ] Add CI validation (npx axag-validate)
- [ ] Run integration tests against the generated tools
Step-by-Step Walkthrough
Step 1: The scraping approach (what we're replacing)
A typical scraping agent for a product search page:
```python
# ❌ Fragile scraping approach
import requests
from bs4 import BeautifulSoup

def search_products(query):
    response = requests.get(f"https://store.example.com/search?q={query}")
    soup = BeautifulSoup(response.text, 'html.parser')
    products = []
    for card in soup.select('.product-card'):  # Breaks if class changes
        name = card.select_one('.product-name').text  # Breaks if structure changes
        price_text = card.select_one('.price').text  # "$19.99" — needs parsing
        price = float(price_text.replace('$', '').replace(',', ''))
        products.append({'name': name, 'price': price})
    return products
```
Problems with scraping:
- CSS selectors break with every redesign
- Price parsing varies by locale
- Pagination requires DOM interaction
- No type safety or parameter validation
- No safety metadata (risk level, idempotency)
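The locale problem is easy to reproduce: the parsing logic in Step 1 strips "$" and "," and silently returns a wrong number for European-formatted prices, where "," is the decimal separator. A minimal demonstration (the function body is copied from the scraper above):

```python
def parse_price_us(text):
    # Price parsing as written in the scraper above:
    # strip the currency sign and thousands separators
    return float(text.replace('$', '').replace(',', ''))

print(parse_price_us("$1,299.99"))  # 1299.99, correct for en-US input
# German-formatted price: "." separates thousands, "," marks decimals
print(parse_price_us("1.299,99"))   # 1.29999, off by three orders of magnitude
```

No exception is raised, which makes this failure mode worse than a selector break: the agent receives a plausible-looking but wrong price.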
Step 2: Annotate the search button
Add AXAG attributes to your existing HTML element — annotations are purely additive:
```html
<button
  axag-intent="product.search"
  axag-entity="product"
  axag-action-type="read"
  axag-required-parameters='["query"]'
  axag-optional-parameters='["category","price_min","price_max","sort_by","page"]'
  axag-risk-level="none"
  axag-idempotent="true"
  axag-description="Search the product catalog"
>Search</button>
```
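Annotations like these are machine-readable by construction. The following stdlib-only sketch (not part of the AXAG toolchain; the collector class is hypothetical) shows how the axag-* attributes on the button above decode into a plain dict:

```python
import json
from html.parser import HTMLParser

HTML = """<button
  axag-intent="product.search"
  axag-entity="product"
  axag-action-type="read"
  axag-required-parameters='["query"]'
  axag-optional-parameters='["category","price_min","price_max","sort_by","page"]'
  axag-risk-level="none"
  axag-idempotent="true"
  axag-description="Search the product catalog"
>Search</button>"""

class AxagCollector(HTMLParser):
    """Collects axag-* attributes from every annotated element."""
    def __init__(self):
        super().__init__()
        self.operations = []

    def handle_starttag(self, tag, attrs):
        axag = {k[len("axag-"):]: v for k, v in attrs if k.startswith("axag-")}
        if axag:
            # Parameter lists are JSON-encoded attribute values
            for key in ("required-parameters", "optional-parameters"):
                if key in axag:
                    axag[key] = json.loads(axag[key])
            self.operations.append(axag)

collector = AxagCollector()
collector.feed(HTML)
op = collector.operations[0]
print(op["intent"])               # product.search
print(op["required-parameters"])  # ['query']
```

Nothing here depends on where the button sits in the DOM or what CSS classes it carries; the contract travels with the element itself.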
Step 3: Generate the semantic manifest

```bash
npx axag-generate-manifest --input src/ --output axag-manifest.json
```
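The manifest is the machine-readable contract generated from the annotations. Its exact schema is defined by the AXAG tooling; the excerpt below is only an illustrative guess, though every field value comes directly from the button in Step 2:

```json
{
  "operations": [
    {
      "intent": "product.search",
      "entity": "product",
      "actionType": "read",
      "requiredParameters": ["query"],
      "optionalParameters": ["category", "price_min", "price_max", "sort_by", "page"],
      "riskLevel": "none",
      "idempotent": true,
      "description": "Search the product catalog"
    }
  ]
}
```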
Step 4: Generate the MCP tool

```bash
npx axag-generate-tools --manifest axag-manifest.json --output tools/
```
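The generated tool carries the parameter and safety metadata forward. The sketch below is not the actual generated code, just an illustration of the contract a tool generated from this manifest could enforce (the function name matches the tool, but the error messages and return shape are assumptions):

```python
REQUIRED = {"query"}
OPTIONAL = {"category", "price_min", "price_max", "sort_by", "page"}

def product_search(arguments: dict) -> dict:
    """Hypothetical body of the generated 'product_search' MCP tool."""
    missing = REQUIRED - arguments.keys()
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    unknown = arguments.keys() - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # risk-level "none" and idempotent "true" mean no confirmation step
    # is needed; the backend call itself is elided in this sketch.
    return {"status": "ok", "arguments": arguments}

print(product_search({"query": "mug"})["status"])  # ok
```

Invalid calls fail at the tool boundary with a typed error, instead of producing an empty scrape result downstream.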
Step 5: Agent uses the generated tool
```python
# ✅ Semantic approach — no scraping needed
def search_products(query, category=None, price_max=None):
    result = mcp_client.call_tool("product_search", {
        "query": query,
        "category": category,
        "price_max": price_max,
    })
    return result  # Structured JSON, no parsing needed
```
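The snippet above assumes an `mcp_client` that is already connected to the generated server. A throwaway stand-in (purely illustrative; a real client comes from your MCP SDK) makes the contract concrete: structured arguments in, structured JSON out, no HTML anywhere:

```python
class FakeMCPClient:
    """Stand-in for a real MCP client, for illustration only."""
    def call_tool(self, name, arguments):
        # A real client would forward this to the server generated in Step 4
        if name != "product_search":
            raise KeyError(name)
        return {"products": [{"name": "Mug", "price": 19.99}], "page": 1}

mcp_client = FakeMCPClient()

def search_products(query, category=None, price_max=None):
    return mcp_client.call_tool("product_search", {
        "query": query,
        "category": category,
        "price_max": price_max,
    })

result = search_products("mug")
print(result["products"][0]["price"])  # 19.99
```

Swapping the stand-in for a real client changes one line of setup; the agent-facing function is untouched.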
Key Takeaway
Scraping is coupling to presentation. AXAG is coupling to intent.
When a site's CSS classes change, scraping agents break. When a site's purpose stays the same, AXAG agents keep working — because they're bound to semantic operations, not DOM structure.