The Data Flywheel: How ORBEXA Continuously Improves Commerce Data Quality
The Data Quality Bottleneck
Every conversation about AI agents in commerce eventually runs into the same wall: data quality. According to industry research, approximately 80% of the engineering work required to deploy AI agents in production is data engineering -- cleaning, normalizing, validating, and enriching the information that agents consume. The remaining 20% is the model itself.
This ratio surprises people. The popular narrative focuses on model capabilities: reasoning, planning, tool use. But in practice, a GPT-4-class model fed dirty data will hallucinate prices, recommend discontinued products, and confidently present wrong information. The model is not the bottleneck. The data is.
Poor data quality manifests in specific, measurable ways. A product listing with a missing currency code causes price comparison failures across regions. An image URL that returns a 404 makes an AI agent's recommendation look broken. An outdated inventory count leads to an agent-initiated purchase that gets cancelled, destroying user trust in a single interaction. A brand name spelled three different ways across a catalog -- "Apple", "APPLE", "apple inc." -- fragments the knowledge graph and reduces the accuracy of brand-level queries.
ORBEXA was built on the premise that structured commerce data is only as valuable as its accuracy, completeness, and freshness. The platform's data flywheel -- a continuous, self-improving pipeline -- is the core technical investment that makes everything else work.
The Six-Stage Pipeline
ORBEXA processes merchant data through a six-stage pipeline: Crawl, Extract, Validate, Score, Enrich, and Serve. Each stage has its own technical challenges, failure modes, and feedback mechanisms. The pipeline runs continuously, not as a batch job, though individual stages operate at different frequencies depending on the merchant's data velocity.
Stage 1: Crawl
Crawling is the process of fetching raw data from merchant sources. ORBEXA supports three primary ingestion paths: platform API integration (Shopify's Admin API, WooCommerce's REST API), direct DOM crawling for custom-built stores, and CSV/feed file imports for merchants who maintain product feeds.
The distributed crawling infrastructure manages adaptive frequency. A merchant with 50 SKUs that change weekly does not need the same crawl cadence as a fashion retailer with 10,000 SKUs and daily price updates. ORBEXA tracks change velocity per merchant and adjusts crawl intervals accordingly -- as frequently as every 15 minutes for high-velocity catalogs, and as infrequently as daily for stable ones.
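The adaptive cadence logic can be sketched as a simple feedback rule. This is a minimal illustration, not ORBEXA's actual tuning: the 10% change-rate threshold and the halve/double step are assumptions, while the 15-minute floor and daily ceiling come from the text above.

```python
from datetime import timedelta

# Bounds stated in the article; the step sizes below are illustrative.
MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(days=1)

def next_crawl_interval(changes_detected: int, products_crawled: int,
                        current: timedelta) -> timedelta:
    """Adapt the crawl interval to observed change velocity.

    If many products changed since the last crawl, tighten the interval;
    if nothing changed, back off. Result is clamped to [15 min, 1 day].
    """
    change_rate = changes_detected / max(products_crawled, 1)
    if change_rate > 0.10:          # high-velocity catalog: crawl sooner
        proposed = current / 2
    elif changes_detected == 0:     # stable catalog: back off
        proposed = current * 2
    else:
        proposed = current
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))
```

A fashion retailer with daily price updates would ratchet down toward the 15-minute floor, while a stable 50-SKU store drifts up to daily crawls.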
For DOM-based crawling, the system uses headless browser rendering to handle JavaScript-heavy storefronts that render product data client-side. This is significantly more resource-intensive than API-based ingestion, but it is necessary for the long tail of custom-built stores that do not expose structured APIs.
API-based integration is always preferred when available. Shopify's Admin API provides typed, paginated product data, with webhooks for real-time change notification. WooCommerce's REST API offers similar capabilities. These paths are faster, more reliable, and produce cleaner initial data than DOM parsing.
Stage 2: Extract
Extraction transforms raw crawled data into a normalized internal representation. This is where the diversity of e-commerce platforms creates significant engineering complexity.
A Shopify store represents a product variant as a JSON object with specific fields: variant.price, variant.sku, variant.inventory_quantity. A WooCommerce store uses a different schema entirely. A custom-built store might embed variant data in HTML data attributes, or generate it dynamically via JavaScript.
ORBEXA uses a hybrid extraction approach. Rule-based extractors handle known platforms -- Shopify, WooCommerce, BigCommerce, Magento -- where the data schema is documented and predictable. Machine learning models handle unknown or custom storefronts, using trained classifiers to identify price elements, product titles, image galleries, and variant selectors from raw DOM structure.
The extraction layer outputs a canonical product object: title, description, images (as validated URLs), price (as integer cents with currency code), availability status, variant dimensions, SKU identifiers, brand name, and category path. Every field has a defined type, and every field can be null -- the system never fabricates data to fill gaps.
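As a sketch, the canonical object described above might look like the following dataclass. The field names are illustrative, but the key properties match the text: price as integer cents with a currency code, and every field nullable so the system never fabricates data.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CanonicalProduct:
    """Illustrative shape of the canonical product object.

    Every field may be absent; missing data stays missing rather than
    being filled with fabricated values.
    """
    sku: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    images: list[str] = field(default_factory=list)    # validated URLs only
    price_cents: Optional[int] = None                  # integer minor units
    currency: Optional[str] = None                     # ISO 4217, e.g. "USD"
    available: Optional[bool] = None
    variant_dimensions: dict[str, str] = field(default_factory=dict)
    brand: Optional[str] = None
    category_path: list[str] = field(default_factory=list)
```

Storing prices as integer cents avoids floating-point rounding in comparisons and aggregations, which matters when agents compare prices across merchants.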
Stage 3: Validate
Validation catches errors that extraction cannot. It operates as a series of constraint checks against domain-specific rules.
Price validation rejects prices outside plausible ranges for a given category. A laptop priced at $0.99 or $999,999 triggers a flag. Currency consistency checks ensure that a merchant listing prices in USD on their storefront is not accidentally ingested with EUR values due to geolocation-based price switching.
Image URL validation performs a HEAD request against every image URL to confirm it returns a 200 status with an image content type. Broken images are flagged but not removed -- they may be temporarily unavailable rather than permanently gone. The system re-checks flagged images on subsequent crawls.
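The decision logic after the HEAD request is simple to state precisely. A sketch (the HTTP call itself would go through whatever client the pipeline uses; only the pass/flag rule from the text is shown here):

```python
def classify_image_check(status_code: int, content_type: str) -> str:
    """Outcome of an image HEAD check.

    A 200 response with an image/* content type passes. Anything else
    is 'flagged' rather than removed, so a temporarily unavailable
    image can recover on a subsequent re-check.
    """
    if status_code == 200 and content_type.startswith("image/"):
        return "ok"
    return "flagged"
```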
Category taxonomy mapping normalizes merchant-specific category names to a standard taxonomy. A merchant categorizing a product as "Men's > Casual Wear > Tees" gets mapped to a standardized hierarchy that aligns with Google's product taxonomy. This normalization is critical for cross-merchant search and comparison.
Completeness checks verify that required fields are present. A product without a title, without at least one image, or without a price is flagged as incomplete. These products are still served through the API -- agents may have context that fills the gaps -- but they receive lower quality scores.
Stage 4: Score
Every product in the system receives a composite quality score based on four dimensions, each rated 0-100.
Completeness (0-100) measures how many of the expected fields are populated. A product with title, description, three images, price, SKU, brand, category, and availability scores 100. A product with just title and price might score 35. The weighting is not uniform -- price and availability are weighted more heavily than description length because they have higher impact on agent decision-making.
Accuracy (0-100) captures confidence that the extracted data matches the source. For API-ingested data, accuracy starts at 95 (APIs can still have stale data). For DOM-extracted data, accuracy depends on extraction confidence scores from the ML models. A price extracted from a clearly labeled <span class="price"> element scores higher than one inferred from ambiguous DOM structure.
Freshness (0-100) decays over time since the last successful crawl. A product crawled within the last hour scores 100. At 24 hours, the score drops to around 70. At 7 days, it falls to 30. The decay curve is configurable per category -- perishable goods and flash-sale items decay faster than books or industrial equipment.
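The stated anchor points (100 within an hour, ~70 at 24 hours, 30 at 7 days) do not fit a single exponential, so a piecewise-linear interpolation between them is one plausible reading; the interpolation itself and the floor after 7 days are assumptions.

```python
# Anchor points from the article: (hours since crawl, score).
ANCHORS = [(1.0, 100.0), (24.0, 70.0), (168.0, 30.0)]

def freshness_score(hours_since_crawl: float) -> float:
    """Piecewise-linear decay through the article's anchor points."""
    if hours_since_crawl <= ANCHORS[0][0]:
        return 100.0
    for (h0, s0), (h1, s1) in zip(ANCHORS, ANCHORS[1:]):
        if hours_since_crawl <= h1:
            frac = (hours_since_crawl - h0) / (h1 - h0)
            return s0 + frac * (s1 - s0)
    return ANCHORS[-1][1]  # floor after 7 days (assumption)
```

Per-category decay rates would swap in different anchor tables: flash-sale items get a steeper curve, books a flatter one.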
Consistency (0-100) measures alignment across data points. If a product's title says "Red Running Shoes" but the color attribute is "Blue," consistency drops. If the price changed by 50% since the last crawl with no accompanying sale indicator, consistency flags the discrepancy for review.
The composite score is a weighted average: Completeness (25%), Accuracy (30%), Freshness (25%), Consistency (20%). This score is exposed in the API response, allowing consuming agents to make trust-adjusted decisions.
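The composite computation follows directly from the stated weights:

```python
# Weights as stated in the article; dimension scores are each 0-100.
WEIGHTS = {"completeness": 0.25, "accuracy": 0.30,
           "freshness": 0.25, "consistency": 0.20}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of the four quality dimensions."""
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())
```

A product scoring 80/90/60/70 across the four dimensions lands at 76, and that single number rides along in the API response for consuming agents.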
Stage 5: Enrich
Enrichment fills gaps in merchant data using external reference sources without fabricating information.
GTIN/UPC resolution cross-references product titles and SKUs against manufacturer databases to fill missing barcode identifiers. A merchant who lists "Sony WH-1000XM5" without a GTIN can have it resolved to the correct barcode through a lookup against Sony's published product catalog.
Brand name normalization collapses variant spellings into canonical forms. "Apple", "APPLE", "apple inc.", and "Apple Inc." all resolve to a single canonical brand entity with a consistent identifier. This normalization uses a maintained dictionary of known brands supplemented by fuzzy matching for emerging or niche brands.
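A minimal sketch of the dictionary-plus-fuzzy-fallback approach, using Python's standard `difflib` as a stand-in for whatever matcher ORBEXA actually uses; the tiny dictionary and the 0.85 similarity cutoff are assumptions for illustration.

```python
import difflib

# Illustrative canonical dictionary; the real one is maintained and larger.
CANONICAL_BRANDS = {
    "apple": "Apple",
    "apple inc.": "Apple",
    "sony": "Sony",
}

def normalize_brand(raw: str) -> str:
    """Resolve variant spellings to a canonical brand name.

    Exact dictionary lookup first; fuzzy matching catches near-miss
    spellings; unknown brands pass through unchanged (never fabricated).
    """
    key = raw.strip().lower()
    if key in CANONICAL_BRANDS:
        return CANONICAL_BRANDS[key]
    match = difflib.get_close_matches(key, CANONICAL_BRANDS, n=1, cutoff=0.85)
    if match:
        return CANONICAL_BRANDS[match[0]]
    return raw.strip()
```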
Category enrichment uses the product's attributes to suggest or refine category placement. A product with "Bluetooth" in its title, a weight under 500g, and a price between $20 and $200 has strong signals for the "Electronics > Audio > Headphones" category, even if the merchant did not categorize it explicitly.
Description augmentation does not rewrite merchant descriptions but can supplement them with structured attribute extraction. If a product description mentions "battery life: 30 hours" in free text, the enrichment layer extracts this as a structured attribute (batteryLife: "30h") that agents can query directly.
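A sketch of that kind of free-text attribute extraction; the single battery-life pattern here is illustrative, standing in for a larger library of attribute patterns.

```python
import re

# Illustrative pattern for "battery life: 30 hours" style free text.
BATTERY_RE = re.compile(r"battery\s+life:?\s*(\d+)\s*hours?", re.IGNORECASE)

def extract_attributes(description: str) -> dict[str, str]:
    """Pull structured attributes out of a free-text description
    without altering the merchant's prose."""
    attrs: dict[str, str] = {}
    m = BATTERY_RE.search(description)
    if m:
        attrs["batteryLife"] = f"{m.group(1)}h"
    return attrs
```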
Stage 6: Serve
The serving layer exposes enriched, scored product data through multiple protocol endpoints: MCP resources and tools, UCP REST APIs, and ACP JSON-RPC methods.
Serving is not a simple database read. ORBEXA employs a two-tier caching architecture. Redis acts as the L1 cache, holding frequently accessed product data in memory for sub-millisecond reads. PostgreSQL (hosted on Supabase) serves as the persistent store and L2 cache, handling the full catalog and historical data.
Cache invalidation is event-driven. When a crawl detects changes, affected cache entries are invalidated immediately rather than waiting for TTL expiration. This ensures that agents always receive the freshest available data, even for merchants with high-velocity catalogs.
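The event-driven invalidation can be sketched as a diff between crawl snapshots. The snapshot-as-content-hash representation and the "product:" key prefix are assumptions about the cache layout, not ORBEXA's documented schema.

```python
def keys_to_invalidate(previous: dict[str, str],
                       current: dict[str, str]) -> list[str]:
    """Diff two crawl snapshots (SKU -> content hash) and return the
    cache keys for products that changed or newly appeared, so only
    those entries are evicted instead of waiting for TTL expiry."""
    return [f"product:{sku}" for sku, digest in current.items()
            if previous.get(sku) != digest]
```

A caller would then hand these keys to the cache client (with redis-py, something like `redis.delete(*keys)`), leaving unchanged entries hot in the L1 cache.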
The serving layer also handles protocol-specific formatting. The same underlying product data is transformed into Schema.org-compliant JSON-LD for UCP responses, MCP resource objects for Claude and Cursor integrations, and ACP-formatted payloads for OpenAI agent interactions.
Anomaly Detection
The pipeline includes dedicated anomaly detection that runs continuously alongside the main processing stages.
Price anomalies are flagged when a product's price drops by more than 80% between crawls without a corresponding sale or clearance indicator. This catches both data extraction errors and potential fraud (merchants listing bait prices to attract agent traffic). Similarly, sudden price increases above 300% trigger review.
Review count spikes detect artificial inflation. If a product's review count jumps from 10 to 500 between crawls, the anomaly detector flags this for verification. Legitimate review surges (product going viral) are distinguished from suspicious patterns by correlating with sales velocity data when available.
Inventory inconsistencies flag products that show "in stock" status but have zero inventory count, or vice versa. These contradictions often indicate stale data from one source that has not yet been reconciled with another.
Schema Drift Detection
E-commerce stores redesign frequently. When a merchant changes their site structure -- new template, new CSS classes, new DOM hierarchy -- the rule-based extractors for that store can break silently. Schema drift detection monitors extraction success rates per merchant over time.
A sudden drop in extraction completeness for a specific merchant triggers an automatic re-evaluation. The system falls back to ML-based extraction while flagging the merchant for extractor rule updates. In most cases, the ML models can handle moderate schema drift without manual intervention. Significant structural changes require updated extraction rules, which are deployed as configuration changes without code releases.
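The drift signal itself can be sketched as comparing the latest extraction success rate against a trailing baseline; the window size and 30-point drop threshold here are illustrative, not ORBEXA's actual parameters.

```python
def detect_drift(success_rates: list[float], window: int = 5,
                 drop_threshold: float = 0.30) -> bool:
    """Flag a sudden drop in per-merchant extraction success rate.

    Compares the most recent rate against the average of the preceding
    `window` crawls; a drop larger than `drop_threshold` suggests the
    merchant changed their site structure.
    """
    if len(success_rates) <= window:
        return False  # not enough history for a baseline
    baseline = sum(success_rates[-window - 1:-1]) / window
    return baseline - success_rates[-1] > drop_threshold
```

A True result would trigger the fallback to ML-based extraction and flag the merchant's rule set for review.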
Feedback Loops
The pipeline is not one-directional. Two feedback mechanisms drive continuous improvement.
Consumer-reported issues. API consumers -- the AI agents and applications querying ORBEXA's endpoints -- can report data quality problems via a structured feedback endpoint. A report that a product's price is wrong triggers an immediate re-crawl and extraction validation. Aggregated reports across multiple consumers about the same product carry higher weight and can automatically adjust extraction confidence scores.
Extraction rule refinement. When the validation or anomaly detection stages repeatedly flag products from a specific merchant, the extraction rules for that merchant are automatically reviewed. Patterns in validation failures -- for example, prices consistently appearing 10x higher than expected -- often indicate a systematic extraction error (such as parsing a formatted price string incorrectly) that can be fixed across the entire merchant catalog at once.
The Technical Stack
ORBEXA's data pipeline runs on a stack chosen for reliability and operational simplicity.
PostgreSQL (Supabase) serves as the primary data store. Product data, merchant configurations, extraction rules, and quality scores all live in PostgreSQL. Supabase provides managed hosting with row-level security, real-time subscriptions for cache invalidation events, and Edge Functions for lightweight processing.
Redis provides the L1 caching layer. Hot product data is cached in Redis with TTLs calibrated to each product's freshness decay rate. The L2 PostgreSQL layer handles cache misses with single-digit millisecond latency for indexed queries.
Background job queues manage the asynchronous processing stages. Crawl jobs, extraction jobs, validation passes, and enrichment tasks all run as queued background jobs with retry logic, dead-letter handling, and rate limiting per merchant to avoid overloading source stores.
Business Impact
The flywheel effect is straightforward. Higher data quality produces higher agent confidence scores. Higher confidence means agents are more likely to recommend products from ORBEXA-connected merchants. More recommendations drive more merchant revenue. More revenue attracts more merchants to the platform. More merchants provide more data to refine the extraction and enrichment models.
This is not a theoretical loop. The quality score directly influences how AI agents rank and present products. An agent comparing two similar products -- one with a quality score of 92 and complete structured data, another with a score of 45 and missing fields -- will preferentially surface the higher-quality listing. Over time, this creates a measurable revenue differential between merchants with clean data and those without.
Data quality is not a feature. It is the product.