Open Trust Registry: Seven-Dimension Trust Scoring for 8,700+ Commerce Brands

The Trust Problem No One Is Talking About

Every second, AI agents are making purchasing decisions on behalf of consumers. OpenAI's Operator browses stores. Google's Project Mariner compares products. Perplexity's Buy with Pro completes checkouts. These agents process thousands of merchants per query cycle, but they all face the same fundamental problem: how do you determine, at machine speed, whether a merchant is trustworthy?

Human shoppers rely on intuition, brand familiarity, word-of-mouth, and scattered review signals. That model breaks down completely when the buyer is an algorithm. An AI agent cannot "feel" that a storefront looks shady. It cannot draw on years of personal experience with a brand. It needs structured, quantitative trust data delivered through a fast, reliable interface.

This is the problem the Open Trust Registry (OTR) was built to solve.

What the Open Trust Registry Is

The OTR is a continuously updated database that maintains trust profiles for over 8,700 commerce brands. Each brand is evaluated across seven dimensions, scored from 0 to 100, and the results are exposed through a machine-readable API that any AI agent can query in real time.

The OTR does not replace human judgment. It provides the structured data layer that AI agents need to make informed recommendations and purchasing decisions on behalf of consumers.

Seven Dimensions of Trust

The OTR's scoring system is built around seven distinct trust dimensions. The first five draw on publicly available data, although only four of them (Identity, Technical, PolicyScore, and WebPresence) carry weight in cold-start scoring. The last two, DataQuality and Fulfillment, require merchant authorization to access operational data.

1. Identity (Business Verification & Corporate History)

The Identity dimension verifies who the entity behind a domain actually is. The scoring algorithm evaluates SEC filings and stock exchange listings (NYSE/NASDAQ listings earn the highest identity scores), Wikidata entity matching, corporate registry verification, domain age and WHOIS data consistency, parent company relationships, headquarters data, and Tranco traffic ranking. A brand listed on NYSE with a Wikidata entry, founded over ten years ago, with a confirmed parent company and a top-1000 Tranco ranking can score up to 90 points on this dimension. A newly registered domain with no public business records scores near zero. Identity carries the highest weight (35%) in cold-start scoring.

2. Technical (Security Infrastructure)

The Technical dimension measures the security posture of the brand's web infrastructure through automated probing: SSL certificate type (EV certificates score highest at +25, DV/OV at +15), DMARC email authentication policy, SPF and DKIM records, HSTS enforcement, CAA records, security.txt presence, and MTA-STS deployment. A brand with EV SSL, DMARC reject policy, full email authentication, and HSTS can achieve a near-perfect technical score. This dimension carries 25% weight in cold-start scoring.

3. Compliance (Regulatory Adherence)

Compliance evaluates adherence to GDPR, CCPA, PCI DSS, SOC 2, and industry-specific regulations. In the cold-start phase, compliance evidence comes from AI-assisted analysis of the brand's public pages, although the dimension carries no weight in the cold-start composite. Industries with heavy regulatory obligations (banking, insurance, pharmaceuticals, healthcare, energy) receive a compliance floor score that reflects their legal requirements. After merchant authorization, direct compliance auditing becomes possible and Compliance carries 15% weight.

4. PolicyScore (Policy Completeness)

PolicyScore is a web-scanning dimension that checks whether the brand's website contains the essential consumer protection pages: privacy policy (with GDPR and CCPA provisions), refund/return policy (including return window duration), terms of service, and cookie consent mechanisms. This is not a checkbox exercise: the OTR's scanning engine fetches and analyzes these pages, checking for substantive content rather than mere existence. PolicyScore carries 20% weight in cold-start scoring and 10% in the authorized model.

5. WebPresence (Site Professionalism)

WebPresence evaluates the technical quality and AI-readiness of the brand's website: robots.txt presence and configuration, sitemap.xml availability, Schema.org JSON-LD structured data, Organization schema completeness, multi-language support (hreflang tags), mobile viewport configuration, favicon presence, and whether the page contains real content versus being an empty shell. This dimension directly measures how well-prepared a site is for AI agent interaction. WebPresence carries 20% weight in cold-start scoring and 5% in the authorized model.

6. DataQuality (Product Data Completeness) — Requires Merchant Authorization

Once a merchant connects their store to ORBEXA through the Shopify, WooCommerce, or Universal SDK integration, the DataQuality dimension activates. It evaluates the completeness and accuracy of the merchant's product catalog: whether products have structured data, pricing information, inventory sync, and rich media. In the authorized scoring model, DataQuality carries 20% weight.

7. Fulfillment (Operational Reliability) — Requires Merchant Authorization

The Fulfillment dimension measures a merchant's ability to deliver on its promises: shipping policy clarity, return policy enforcement, average delivery times, return window adherence, and order tracking capabilities. Like DataQuality, this dimension requires merchant API access and carries 20% weight in the authorized scoring model.

How Scoring Works: Cold-Start and Authorized Phases

The OTR operates a two-phase scoring model designed for progressive trust assessment.

Cold-Start Phase uses four publicly verifiable dimensions with the following weights: Identity (35%), Technical (25%), PolicyScore (20%), WebPresence (20%). Compliance, DataQuality, and Fulfillment carry zero weight because they require either AI-intensive scanning or merchant API access. This phase allows the OTR to score any commerce brand on the internet without requiring the brand's cooperation.

Authorized Phase activates when a merchant connects their store through ORBEXA's platform integrations. All seven dimensions carry weight: Identity (20%), Technical (10%), Compliance (15%), PolicyScore (10%), WebPresence (5%), DataQuality (20%), Fulfillment (20%). The shift is significant: direct operational data (DataQuality and Fulfillment, 40% combined) becomes a leading signal, while the four cold-start dimensions fall from 100% of the composite to 45%.
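The two weight sets can be read as a plain weighted sum over 0 to 100 dimension scores. The sketch below is illustrative only: the weights come from the text, while the dictionary keys, the function name, and the choice to treat a missing dimension as 0 are assumptions rather than the OTR's actual implementation.

```python
# Weights as stated in the article; the key names are illustrative.
COLD_START_WEIGHTS = {
    "identity": 0.35,
    "technical": 0.25,
    "policy_score": 0.20,
    "web_presence": 0.20,
}

AUTHORIZED_WEIGHTS = {
    "identity": 0.20,
    "technical": 0.10,
    "compliance": 0.15,
    "policy_score": 0.10,
    "web_presence": 0.05,
    "data_quality": 0.20,
    "fulfillment": 0.20,
}


def composite_score(dimensions: dict, authorized: bool) -> float:
    """Weighted sum of 0-100 dimension scores for the chosen phase."""
    weights = AUTHORIZED_WEIGHTS if authorized else COLD_START_WEIGHTS
    # Assumption: a dimension missing from the input contributes 0.
    return sum(w * dimensions.get(name, 0.0) for name, w in weights.items())


# Hypothetical dimension scores for one brand.
example = {
    "identity": 80, "technical": 60, "compliance": 50, "policy_score": 70,
    "web_presence": 40, "data_quality": 90, "fulfillment": 85,
}
cold = composite_score(example, authorized=False)  # only four dimensions count
full = composite_score(example, authorized=True)   # all seven dimensions count
```

With these made-up inputs the same brand scores higher once authorized, because its strong DataQuality and Fulfillment scores start to count.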

Scores are combined into a weighted composite that maps to badge tiers: Platinum (90+), Gold (80+), Silver (70+), Bronze (60+), and Unrated (below 60). Automated scoring is capped at 94 — scores of 95 and above are reserved for human review, preventing any algorithmic process from assigning maximum trust without human verification.
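As a rough illustration of the tier mapping and the automated cap, the following sketch uses the thresholds given above. The function names and the human_reviewed flag are hypothetical; how the OTR actually records a human review is not described here.

```python
AUTOMATED_CAP = 94  # per the article, 95 and above require human review

# Tier floors from the article, highest first.
BADGE_TIERS = [(90, "Platinum"), (80, "Gold"), (70, "Silver"), (60, "Bronze")]


def finalize_score(raw: float, human_reviewed: bool = False) -> float:
    """Clamp to 0-100, then apply the automated cap unless a human reviewed it."""
    clamped = max(0.0, min(100.0, raw))
    return clamped if human_reviewed else min(clamped, AUTOMATED_CAP)


def badge_for(score: float) -> str:
    """Map a composite score to its badge tier."""
    for floor, name in BADGE_TIERS:
        if score >= floor:
            return name
    return "Unrated"
```

Under this reading, an automated score can still reach Platinum (90 to 94), but never the top of the scale.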

Brand Fast-Track provides score bonuses for publicly listed companies and high-traffic brands (Tranco Top 1,000), recognizing that these entities carry inherent accountability through regulatory oversight and public scrutiny.

AI agents receive the full dimension breakdown alongside the composite score, enabling them to apply context-specific weighting. For example, an agent evaluating an unfamiliar merchant might weight Identity and Technical more heavily, while an agent comparing delivery options might emphasize Fulfillment.
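One way an agent could apply its own weighting to the dimension breakdown is a normalized weighted average. This is a sketch under assumptions: the article does not prescribe how agents should re-weight, and the function name, key names, and emphasis values below are invented for illustration.

```python
def reweighted_score(dimensions: dict, emphasis: dict) -> float:
    """Weighted average of dimension scores using agent-chosen relative weights."""
    total = sum(emphasis.values())
    if total <= 0:
        raise ValueError("emphasis weights must sum to a positive number")
    return sum(dimensions[name] * w for name, w in emphasis.items()) / total


# Hypothetical breakdown returned for one merchant.
breakdown = {"identity": 80, "technical": 60, "fulfillment": 90}

# An agent vetting an unfamiliar merchant leans on Identity and Technical.
vetting = reweighted_score(
    breakdown, {"identity": 3, "technical": 2, "fulfillment": 1}
)

# An agent comparing delivery options leans on Fulfillment.
delivery = reweighted_score(
    breakdown, {"identity": 1, "technical": 1, "fulfillment": 4}
)
```

Because the weights are normalized inside the function, an agent can express emphasis as simple ratios rather than percentages that must sum to 100.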

Technical Architecture

The OTR combines a multi-stage data pipeline with a serving layer designed for fast AI agent queries. The serving layer consists of:

PostgreSQL on Supabase serves as the primary data store for the otr_registry table with full trust profiles, scoring evidence, and historical data
In-memory caching (30-minute TTL for domain lookups, 15-minute TTL for aggregate stats) ensures fast repeated queries
Cloudflare CDN caching (6-hour TTL) reduces origin load for the public API endpoints
RESTful API (/api/otr/registry for paginated listings, /api/otr/verify/:domain for individual lookups, /api/otr/stats for aggregates)
Machine-readable endpoints (/.well-known/otr/registry.json and /.well-known/otr/verify) follow emerging standards for agent-discoverable trust data
/llms.txt — an AI agent guidance file that helps LLMs understand how to query the OTR
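The in-memory caching layer described above can be pictured as a small time-to-live cache. The sketch below is a generic TTL cache, not the OTR's code: the class name and the injectable clock are assumptions, and only the 30-minute and 15-minute TTL values come from the text.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal in-memory cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


# TTLs from the article: 30 minutes for domain lookups, 15 for aggregate stats.
domain_cache = TTLCache(ttl_seconds=30 * 60)
stats_cache = TTLCache(ttl_seconds=15 * 60)
```

A request handler would check the cache first and fall back to the database on a miss, storing the fresh result with set.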

The pipeline itself runs through six stages:

Discovery: identifies commerce domains via e-commerce fingerprint detection across Shopify, WooCommerce, Magento, BigCommerce, Salesforce Commerce, and SAP Commerce platforms
Audit: automated technical and policy scanning
Enrichment: Wikidata entity matching, Finnhub financial data, and corporate registry lookups, with Claude AI powering the entity matching
Backfill: fills data gaps from authoritative sources under strict "never fabricate" rules
Scoring: seven-dimension computation
Approval: human review for edge cases

The system enforces data integrity through what the codebase calls "backfill iron rules": AI is allowed to match data but never generate it; low-confidence matches below 90% are never auto-written; all changes are audit-logged; and existing non-null values are never overwritten.
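The backfill rules translate naturally into a guard function that decides whether a matched value may be written. The following is a minimal sketch, assuming a record is a plain dictionary and confidence is a number between 0 and 1; the function name, result type, and log format are hypothetical. It also logs skipped attempts, which goes slightly beyond the stated rule that all changes are logged.

```python
from dataclasses import dataclass
from typing import Any

MIN_CONFIDENCE = 0.90  # matches below 90% are never auto-written


@dataclass
class BackfillResult:
    written: bool
    reason: str


def apply_backfill(record: dict, field_name: str, candidate: Any,
                   confidence: float, audit_log: list) -> BackfillResult:
    """Write a matched value only if every backfill rule allows it."""
    if record.get(field_name) is not None:
        # Rule: existing non-null values are never overwritten.
        result = BackfillResult(False, "existing value kept")
    elif candidate is None:
        # Rule: data may be matched but never generated, so no match means no write.
        result = BackfillResult(False, "no matched value")
    elif confidence < MIN_CONFIDENCE:
        result = BackfillResult(False, "confidence below threshold")
    else:
        record[field_name] = candidate
        result = BackfillResult(True, "written")
    # Every decision is appended to the audit log, including skipped writes.
    audit_log.append({
        "field": field_name,
        "candidate": candidate,
        "confidence": confidence,
        "written": result.written,
        "reason": result.reason,
    })
    return result
```

Keeping the rules in one function makes the order of checks explicit: an existing value wins over any candidate, regardless of confidence.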

The Data Flywheel

Trust scores are not static snapshots. The OTR operates on a continuous update model where new data triggers re-evaluation. A brand that receives a surge of customer complaints will see its operational score adjusted within hours, not months. A company that resolves a compliance issue sees the change reflected in its next scoring cycle.

This creates a data flywheel: as more AI agents query the OTR and more transaction outcomes feed back into the system, the scoring models improve. Patterns emerge. Correlations between specific trust indicators and actual consumer outcomes sharpen the predictive accuracy of each dimension.

Why Openness Matters

The "Open" in Open Trust Registry is a deliberate architectural decision. The OTR publishes its scoring methodology, allows brands to view and contest their scores, and provides full dimension breakdowns so that both humans and AI agents can understand why a brand received a particular score.

This transparency serves multiple purposes:

Brands can identify specific areas for improvement rather than facing an opaque rating
AI agents can explain their recommendations to consumers with concrete trust data
Consumers can verify that agent recommendations are based on substantive evaluation, not hidden biases
The ecosystem benefits from a trust layer that is auditable and accountable

What This Means for the Agentic Commerce Ecosystem

As protocols like MCP (Anthropic), ACP (OpenAI/Stripe), and UCP (Google/Shopify) define how AI agents discover and transact with merchants, the OTR provides the trust evaluation layer that sits alongside these protocols. An agent connecting to a merchant through MCP can query the OTR to evaluate that merchant before recommending it. An agent processing a payment through ACP can check the merchant's trust scores before completing the transaction.
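A pre-recommendation check on the agent side might look like the sketch below. Only the /api/otr/verify/:domain path comes from this article. The base URL is a placeholder, the "score" and "flags" response fields are hypothetical (the response schema is not documented here), and the threshold of 70 is simply one possible agent policy matching the Silver tier.

```python
from urllib.parse import quote

# Placeholder host for illustration; substitute the real OTR origin.
OTR_BASE_URL = "https://otr.example"


def verify_url(domain: str, base_url: str = OTR_BASE_URL) -> str:
    """Build the per-domain lookup URL for the verify endpoint."""
    normalized = domain.strip().lower()
    return f"{base_url}/api/otr/verify/{quote(normalized, safe='')}"


def should_recommend(profile: dict, min_score: float = 70.0) -> bool:
    """Decide whether to recommend a merchant from its trust profile.

    Assumes hypothetical 'score' and 'flags' fields in the response.
    """
    has_flags = bool(profile.get("flags"))
    return profile.get("score", 0) >= min_score and not has_flags
```

An agent would fetch verify_url(domain) with its HTTP client of choice, parse the JSON body, and pass the result to should_recommend before surfacing the merchant to the consumer.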

Trust is not a feature of any single protocol. It is a cross-cutting concern that every agent needs regardless of which protocol it uses to interact with merchants. The OTR provides that layer as open, queryable infrastructure.

Current Coverage and Anti-Fraud

The OTR currently maintains trust profiles for over 8,700 brands across more than 40 industry categories, from technology and e-commerce to luxury goods, pharmaceuticals, and financial services. The badge distribution reflects the OTR's conservative scoring philosophy: approximately 15% of brands achieve Bronze or higher, with Gold and Platinum reserved for brands that demonstrate excellence across multiple dimensions. The majority of brands are in the Unrated tier, reflecting the reality that most commerce domains lack the public evidence needed for high trust scores without merchant authorization.

The discovery engine uses an e-commerce fingerprint detection system that identifies commerce sites by platform signatures (Shopify, WooCommerce, Magento, BigCommerce, Salesforce Commerce, SAP Commerce) and generic shopping signals (add-to-cart elements, Schema.org Product markup, payment-related meta tags). It also actively excludes non-commerce domains (search engines, social media, streaming services, government sites, CDN infrastructure).
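A heavily simplified version of fingerprint detection could combine substring checks for platform markers with patterns for generic shopping signals. Everything specific in this sketch is an assumption: the marker strings are common examples rather than the OTR's actual signatures, only three platforms are shown, and the exclusion list uses placeholder domains.

```python
import re

# Illustrative markers only; real platform signatures are more extensive.
PLATFORM_MARKERS = {
    "Shopify": ("cdn.shopify.com",),
    "WooCommerce": ("woocommerce",),
    "Magento": ("magento",),
}

# Generic shopping signals: Schema.org Product JSON-LD and add-to-cart elements.
PRODUCT_SCHEMA = re.compile(r'"@type"\s*:\s*"Product"')
CART_SIGNAL = re.compile(r"add[-_ ]to[-_ ]cart", re.IGNORECASE)

# Placeholder exclusion list standing in for search engines, social media, etc.
EXCLUDED_DOMAINS = {"search.example", "video.example"}


def detect_commerce(domain: str, html: str) -> dict:
    """Classify a page as commerce or not from simple textual fingerprints."""
    if domain.lower() in EXCLUDED_DOMAINS:
        return {"is_commerce": False, "platform": None, "signals": []}
    lowered = html.lower()
    platform = next(
        (name for name, markers in PLATFORM_MARKERS.items()
         if any(marker in lowered for marker in markers)),
        None,
    )
    signals = []
    if PRODUCT_SCHEMA.search(html):
        signals.append("schema_product")
    if CART_SIGNAL.search(html):
        signals.append("add_to_cart")
    return {
        "is_commerce": platform is not None or bool(signals),
        "platform": platform,
        "signals": signals,
    }
```

A production detector would likely inspect headers, scripts, and meta tags separately rather than scanning raw HTML, but the structure of the decision is the same.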

Anti-fraud detection runs in parallel with scoring. Suspicious patterns trigger automated flags: domains with high-risk TLDs (.xyz, .top, .club, etc.), domains younger than two years with insufficient trust signals, domains outside the Tranco Top 100K without compensating identity evidence, and fraud scores exceeding configurable thresholds.
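The four flag conditions can be expressed as independent checks that each append a flag. This is a sketch under assumptions: the TLD set contains only the three examples named above, the default fraud-score threshold is an arbitrary stand-in for the configurable value, and the boolean inputs and flag names are hypothetical.

```python
from typing import Optional

HIGH_RISK_TLDS = {"xyz", "top", "club"}  # examples from the text; full list unknown
MIN_DOMAIN_AGE_YEARS = 2
TRANCO_CUTOFF = 100_000
DEFAULT_FRAUD_THRESHOLD = 0.8  # hypothetical default for the configurable threshold


def fraud_flags(domain: str, age_years: float, tranco_rank: Optional[int],
                has_trust_signals: bool, has_identity_evidence: bool,
                fraud_score: float,
                threshold: float = DEFAULT_FRAUD_THRESHOLD) -> list:
    """Return the anti-fraud flags raised for a domain."""
    flags = []
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld in HIGH_RISK_TLDS:
        flags.append("high_risk_tld")
    if age_years < MIN_DOMAIN_AGE_YEARS and not has_trust_signals:
        flags.append("young_domain")
    # A missing rank is treated as being outside the Tranco Top 100K.
    outside_tranco = tranco_rank is None or tranco_rank > TRANCO_CUTOFF
    if outside_tranco and not has_identity_evidence:
        flags.append("low_traffic_no_identity")
    if fraud_score > threshold:
        flags.append("fraud_score_exceeded")
    return flags
```

Returning a list of named flags, rather than a single boolean, lets downstream reviewers see which pattern triggered the alert.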

The roadmap includes expanding coverage to 25,000 brands by end of 2026, deepening merchant-authorized scoring through platform integrations, and adding industry-specific scoring models for verticals with unique trust requirements.