Onchain Data at Scale

This Spaces session brought together Valerie (Walrus) with Matt (Allium), Steve (Tatum), and Albert (BlockPI) to unpack what on-chain data infrastructure looks like in practice, and why decentralized storage matters for institutions, developers, and AI agents. Allium positions itself as the institutional “system of record” for on-chain finance, emphasizing verifiable delivery and fine-grained, programmable access via Walrus and Seal on Sui. BlockPI detailed its bare-metal, globally distributed RPC and snapshot services, explaining how Walrus removes the classic tradeoff between low-cost bare metal and high-cost cloud for archive snapshots while boosting resiliency. Tatum outlined its data and RPC tooling, public-good raw datasets (blocks, transactions, logs) and historical prices on Walrus, and how these cut historical backfilling from weeks or months to hours, which is critical for analytics and AI. The panel explored the requirements for scaling agentic and AI workloads: semantic, high-quality data; verifiability; machine-speed economic interfaces; and reliable operator networks. They highlighted unexpected advantages of Walrus, including mature developer tooling and Sui-native blob lifecycle events that serve as verifiable “data-ready” signals. To close, each guest shared roadmap items (stablecoin rails, RWAs, richer notifications, expanded snapshots) and advised builders to pilot non-critical workloads on Walrus now to gain cost, availability, and trust advantages.

Onchain Data Infrastructure at Scale: Walrus + Allium, BlockPI, Tatum

Session overview

  • Host: Valerie (Walrus). Purpose: explore real-world on-chain data infrastructure, why decentralized storage matters, and why teams chose Walrus as their data layer for AI builders and institutions.
  • Guests:
    • Matt (Allium) — building the system of record for on-chain finance; enterprise-grade, enriched blockchain data across 130+ chains.
    • Albert (BlockPI; referred to in the session as Block Pile/Blockai/“block pie”) — global bare-metal Web3 infra provider offering RPC, snapshots, indexing, and advanced APIs.
    • Steve (Tatum; transcription alternates Shift/Sheeve) — developer tooling for on-chain data and RPC across 120+ networks; real-time alerts, gRPC, and curated datasets.
  • Core theme: moving from “trusting” centralized data delivery to “verifying” via decentralized storage and programmable access, to meet the needs of institutional workflows and AI agents.

What each project does

  • Allium (Matt)

    • Mission: “system of record” for on-chain finance, analogous to Bloomberg for crypto-native markets where assets, payments, and settlement move on-chain.
    • Problem: raw blockchain data is not human-readable, fragmented across hundreds of chains, missing business context; inconsistent standards; compliance/reporting blockers.
    • Solution: ingest major chains; enrich with entity labels, metadata, asset classifications; deliver via APIs, data-warehouse integrations (e.g., Snowflake, BigQuery), and web app.
    • Market context: stablecoins becoming real payment rails; tokenized equities/RWAs moving from pilots to production; workflows need canonical, auditable on-chain sources of truth.
  • BlockPI (Albert)

    • Global infra: bare-metal nodes across 30+ locations (NA/EU/APAC) with proprietary load-balancing for predictable performance.
    • Offerings: advanced APIs (archive access, real-time event streams, price feeds), indexing pipelines, Li.Fi integration (cross-chain aggregation), and dedicated nodes (isolated infra, no rate limits/caps).
  • Tatum (Steve)

    • Developer tooling: data insights (wallet balances/tx/activity; contract/tx insights), real-time alerts with advanced filtering, and RPCs across 120+ networks (recently expanded Sui with gRPC).
    • Value prop: simplify Web3 development via straightforward data APIs, notifications, and reliable MEV-secured infra; serving payment networks, banks, governments, protocols, and enterprises.

Why they chose Walrus as the data layer

  • Allium (Matt)

    • Philosophy: meet customers where they build; replicate data to major warehouses; deliver near workloads.
    • Bets on decentralized storage as a real trend: builders want storage with the same trust properties as on-chain applications.
    • Walrus selection: standout team/tech; rapid ecosystem growth; long-term partner.
    • Mutual benefits: Allium brings production-grade datasets (130+ chains) to Walrus; Walrus gives Allium a first-class decentralized distribution channel for customers preferring that trust model.
  • BlockPI (Albert)

    • Integrity & availability are non-negotiable; decentralized storage provides cryptographic verifiability and no single point of failure — critical for node snapshots.
    • Platform strength: Walrus on Sui (true parallel execution, massive concurrency) translates to reliability/scalability under high throughput and real-world conditions.
    • Cost efficiency: many decentralized storage networks are too expensive for large-scale archives; Walrus fees are low enough to make bulk snapshot packaging/storing economically sustainable.
  • Tatum (Steve)

    • Strategic alignment: shared vision on a decentralized data marketplace for high-quality datasets (for AI labs and agents).
    • Operator reliability: Walrus’s storage operator network proved reliable versus prior experiences on other protocols.
    • Security: strong encryption/programmable access with Seal.
    • Use cases: public goods (snapshots, raw chain data), agent-accessible datasets for better responses, and traditional indexing/backfill acceleration.

Deep dives

Verifiable data delivery and why it matters (Allium)

  • Challenge: delivering terabytes of indexed, historical data with high quality/availability usually requires centralized stitching and trust.
  • Walrus impact: network-level guarantees on data integrity and availability; shift from trusting to verifying.
  • Institutional workflows (regulatory reporting, reconciliation, risk, audit) require assurance that the dataset used is exactly what was published — cryptographic guarantees can exceed centralized delivery assurances.
  • Agents: increasingly autonomous (trading, portfolio balancing, routing payments, reacting to market events). Without verifiable data, trust gaps become liabilities. Walrus enables agents to verify authenticity locally without calling back to centralized authorities.
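The “verify locally” pattern above can be sketched in a few lines. This is a simplified stand-in, not Walrus’s actual commitment scheme (Walrus derives blob IDs from erasure-coded content commitments); here a plain SHA-256 digest published on-chain plays that role, and all names are illustrative.

```python
import hashlib

def verify_blob(payload: bytes, published_digest: str) -> bool:
    """Simplified stand-in for a content commitment: the agent recomputes
    the digest of the bytes it fetched and compares it with the digest
    published on-chain, so no callback to a centralized authority is
    needed."""
    return hashlib.sha256(payload).hexdigest() == published_digest

# An agent fetches a dataset chunk from any replica or mirror...
chunk = b'{"block": 19000000, "txs": 142}'
onchain_digest = hashlib.sha256(chunk).hexdigest()  # committed at publish time

assert verify_blob(chunk, onchain_digest)             # authentic copy accepted
assert not verify_blob(chunk + b"x", onchain_digest)  # tampered copy rejected
```

Because the check is purely local, it works identically for a regulator auditing a report and for an autonomous agent reacting to a market event.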

Backfilling without slow RPC (Tatum)

  • Traditional approach: RPC-based backfill — even with gRPC improvements, still slow for full histories.
  • New approach: publish raw blockchain datasets (e.g., blocks, transactions, contract logs) as ready-to-ingest files (e.g., Parquet) on Walrus.
  • Outcome: reduces time-to-ingest for indexing pipelines from weeks/months to hours (bounded by bandwidth), enabling rapid onboarding of new chains. Datasets are continually updated and publicly accessible; verifiable against RPC if desired.
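The core of this speed-up is that a backfill resolves a block range to a handful of bulk files instead of issuing one RPC call per block. A minimal sketch, with a hypothetical range-partitioned manifest (file names and layout are assumptions, not Tatum’s actual scheme):

```python
# Hypothetical manifest mapping block ranges to published dataset files.
MANIFEST = [
    {"file": "blocks_00000000_00999999.parquet", "start": 0, "end": 999_999},
    {"file": "blocks_01000000_01999999.parquet", "start": 1_000_000, "end": 1_999_999},
]

def files_for_range(start: int, end: int) -> list[str]:
    """Resolve a requested block range to the dataset files covering it;
    the backfill then downloads these files in bulk (bounded by
    bandwidth) rather than walking the chain block by block over RPC."""
    return [e["file"] for e in MANIFEST if e["start"] <= end and e["end"] >= start]

assert files_for_range(500, 600) == ["blocks_00000000_00999999.parquet"]
assert len(files_for_range(999_000, 1_000_100)) == 2  # range spans two files
```

Once downloaded, each file can still be spot-checked against RPC responses for the same blocks, as noted above.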

Snapshots and decentralized distribution at scale (BlockPI)

  • Prior options and tradeoffs:
    • Bare metal: low cost but operationally heavy — managing regional replicas.
    • Cloud (S3/GCP): convenient global distribution, but prohibitive storage and egress costs for archive-scale snapshots.
  • Walrus eliminates the tradeoff: upload once; retrieve anywhere; no regional replica management; cost-efficient. Enabled turning snapshots into a public service so every developer gains region-agnostic access and enterprise-grade distribution without the enterprise price.

SLA and resilience (BlockPI)

  • SLA scope: applies to RPC endpoints/services; snapshot download is best-effort.
  • Recovery flow: if a node is unhealthy/destroyed, remove from grid, download snapshot, sync to latest height; as long as other nodes are healthy, SLA for endpoints is maintained. Cloud excels for hot data, but for static snapshots, Walrus adds antifragility without hurting SLA — effectively enhancing resilience.
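The recovery flow described above can be modeled as a toy simulation (all names invented for illustration): an unhealthy node leaves the serving grid, restores from a snapshot, re-syncs, and rejoins, while the endpoint SLA holds as long as at least one healthy node remains.

```python
class NodePool:
    """Toy model of BlockPI-style recovery: snapshots make node
    replacement cheap, so endpoint availability survives the loss of
    any single node."""
    def __init__(self, nodes):
        self.healthy = set(nodes)
        self.recovering = set()

    def mark_unhealthy(self, node):
        self.healthy.discard(node)   # remove from the serving grid
        self.recovering.add(node)    # begin snapshot download + sync

    def finish_recovery(self, node):
        self.recovering.discard(node)
        self.healthy.add(node)       # rejoined at latest height

    def sla_met(self) -> bool:
        return len(self.healthy) > 0 # endpoints still served

pool = NodePool({"us-east", "eu-west", "ap-south"})
pool.mark_unhealthy("eu-west")
assert pool.sla_met()                # remaining nodes keep the SLA
pool.finish_recovery("eu-west")
assert pool.healthy == {"us-east", "eu-west", "ap-south"}
```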

Encryption and programmable access with Seal (Allium)

  • Business need: Allium’s value-add is proprietary enrichment/labeling over public chain data; public-by-default storage alone is insufficient.
  • Seal: on-chain, programmable access control (policies on Sui) enabling granular, enforceable permissions mirroring centralized patterns (tiers, dataset permissioning, RBAC, time-bounded trials) but enforced cryptographically on-chain.
  • Benefits: protects IP; gives customers stronger trust guarantees (license adherence via cryptography, not just promises); simplifies internal ops (no bespoke KMS/portals).
  • Agent economy: Seal enables scoped, revocable, time-bounded access for agents — superior to long-lived API keys.
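The scoped, revocable, time-bounded grants described above can be illustrated with a small registry. This is an analogue of the access pattern, not Seal’s actual API (Seal policies are enforced on-chain on Sui); the class and method names are invented.

```python
import time

class AccessRegistry:
    """Illustrative analogue of programmable access control: grants are
    scoped to (agent, dataset), expire on their own, and can be revoked
    at any time, unlike a long-lived API key."""
    def __init__(self):
        self.grants = {}  # (agent, dataset) -> expiry timestamp

    def grant(self, agent, dataset, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self.grants[(agent, dataset)] = now + ttl_seconds

    def revoke(self, agent, dataset):
        self.grants.pop((agent, dataset), None)

    def allowed(self, agent, dataset, now=None) -> bool:
        now = time.time() if now is None else now
        expiry = self.grants.get((agent, dataset))
        return expiry is not None and now < expiry

reg = AccessRegistry()
reg.grant("agent-7", "enriched-labels", ttl_seconds=3600, now=0)
assert reg.allowed("agent-7", "enriched-labels", now=10)        # within window
assert not reg.allowed("agent-7", "enriched-labels", now=7200)  # expired
reg.revoke("agent-7", "enriched-labels")
assert not reg.allowed("agent-7", "enriched-labels", now=10)    # revoked
```

The same shape maps onto the centralized patterns mentioned above: tiers and trials are just grants with different scopes and TTLs.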

AI and agent infrastructure: what’s needed to scale

  • Matt (Allium)

    • Clean semantic structure: entities, labels, classifications, relationships so models learn economic activity, not just raw logs.
    • Verifiability/provenance: provable training/serving datasets.
    • Machine-speed economics: agents must discover, pay (e.g., via stablecoins on-chain), and retrieve datasets inline. Mentions emerging standards (e.g., Coinbase’s x402, Tempo’s machine-payments protocol) enabling agent-native, pay-per-dataset models that align incentives and avoid enterprise-sales bottlenecks.
  • Steve (Tatum)

    • Reliability of storage operators and data providers is paramount.
    • Dataset choice depends on use case: latency-sensitive agents benefit from raw data training; user-facing assistants may rely on curated datasets.
  • Albert (BlockPI)

    • Data poisoning risk: verifiable ≠ truthful. A decentralized layer can faithfully store a fake snapshot. Missing piece: economic/reputation mechanisms that incentivize honest datasets and penalize bad actors (staking, slashing, reputations tied to contribution history and user payments/system incentives).
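The “machine-speed economics” point above amounts to a quote/pay/retry loop in the style of HTTP 402 payment protocols. A minimal simulation, with prices and the payment-proof format invented for illustration (no real network or settlement):

```python
def request_dataset(dataset, payment_proof=None):
    """Toy server side of a pay-per-dataset exchange: quote a price via
    a 402-style response, then serve once a payment proof is attached."""
    price = {"eth-blocks": 5}.get(dataset)
    if price is None:
        return {"status": 404}
    if payment_proof != f"paid:{price}":
        return {"status": 402, "price": price}   # payment required + quote
    return {"status": 200, "body": f"<{dataset} parquet bytes>"}

def agent_fetch(dataset):
    """Agent loop: request, read the quote, settle inline (e.g., in
    stablecoins on-chain), retry with proof."""
    resp = request_dataset(dataset)
    if resp["status"] == 402:
        proof = f"paid:{resp['price']}"
        resp = request_dataset(dataset, proof)
    return resp

assert agent_fetch("eth-blocks")["status"] == 200
```

The point of the loop is that no human touches the transaction: discovery, pricing, payment, and retrieval happen in one request cycle.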

Datasets currently distributed on Walrus

  • Tatum (Steve)

    • Chains: Ethereum, Bitcoin, BSC, Dogecoin, Litecoin (dataset suite expanding).
    • Contents: full raw on-chain data (blocks, transactions, contract logs where applicable) plus historical prices for 500+ top tokens.
    • Use cases: AI training (e.g., transaction classification), agent reasoning over wallets/customers, forensic/government investigations, and traditional analytics; all published as public goods with no gating.
  • Allium (Matt)

    • Indexed/enriched historical datasets spanning 130+ chains, delivered verifiably at scale via Walrus for institutional and agent use.
  • BlockPI (Albert)

    • Public snapshot service on Walrus; expanding chain coverage and snapshot variants (archive, full, pruned) to match different developer needs.

Unexpected Walrus capabilities that proved critical

  • Steve (Tatum)

    • Seal for encryption/access control.
    • Excellent developer tooling and support; straightforward integration accelerates global adoption.
  • Albert (BlockPI)

    • Smooth developer experience: clean SDKs; complexity (e.g., erasure coding, node distribution) abstracted away; high-quality docs/tutorials. MVP reached in days.
  • Matt (Allium)

    • Sui-native event model: every blob lifecycle event emits an on-chain event, enabling downstream systems to subscribe/verify independently. Provides a verifiable “data-ready” signal, removing webhook/polling layers and fitting agent-native workflows.
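A consumer of that “data-ready” signal can be sketched as a filter over the event stream. The event shape below is an assumption for illustration, not the exact Sui event schema:

```python
def ready_blobs(events):
    """Consume a stream of blob lifecycle events and yield blob ids once
    they are certified, replacing webhook/polling layers with an
    on-chain signal that downstream systems can verify independently."""
    for event in events:
        if event["type"] == "blob_certified":
            yield event["blob_id"]

stream = [
    {"type": "blob_registered", "blob_id": "0xaa"},
    {"type": "blob_certified", "blob_id": "0xaa"},
    {"type": "blob_certified", "blob_id": "0xbb"},
]
assert list(ready_blobs(stream)) == ["0xaa", "0xbb"]
```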

What’s next and advice for builders

  • Allium (Matt)

    • Focus areas:
      • Stablecoin payments: system-of-record for commerce/treasury operations using stablecoins at scale.
      • RWAs: support tokenized treasuries/equities/funds as they move to production; provide canonical, auditable on-chain truth.
      • Raise the bar on institutional-grade data: deeper coverage, richer enrichment, more semantic structure.
    • Vision: as more of global finance moves on-chain, be the trusted data layer underneath.
  • Tatum (Steve)

    • Data: adding enhanced datasets to Walrus that benefit the community.
    • Real-time notifications: more advanced filters (e.g., staking activities, prediction markets) to reduce developer-side processing.
    • Wallet-as-a-service exploration for enterprise-grade custody/payment/rails use cases.
    • Advice: start now. Walrus is reliable and cost-effective; current read economics are favorable. Great for small projects and production pipelines alike.
  • BlockPI (Albert)

    • Roadmap:
      • Expand snapshot coverage to more chains; offer multiple variants (archive/full/pruned) for different use cases.
      • Continuous RPC infra optimization: smarter load balancing, lower latency, higher throughput, lower cost.
      • Build value-added services via partnerships: advanced APIs, real-time indexing, data pipelines, custom query interfaces.
    • Advice: try an MVP or migrate non-critical workloads (snapshots, backups, middleware data) to Walrus; decentralized storage is easy to use. For production, plan adequate testing. Early adopters gain advantage as decentralized storage becomes the norm.

Key takeaways

  • Decentralized storage isn’t just a narrative — it solves concrete pain points: verifiable integrity/availability, lower cost at archive scale, and global distribution without complex replica ops.
  • Institutions and agents need verifiable, semantically rich data. Walrus plus curated/enriched datasets (Allium, Tatum) align with compliance, auditability, and autonomous decision-making.
  • Programmable access (Seal) is essential to protect proprietary enrichment and to enable scoped, revocable, time-bounded agent access — a prerequisite for the agent economy.
  • The Sui-native event model for data lifecycle is a quiet superpower: it enables verifiable, event-driven consumption patterns ideal for agents and high-scale integrations.
  • Reliability extends beyond storage: operator networks, continuous updates, and strong developer tooling are crucial for adoption and scale.
  • Quality assurance for decentralized AI datasets will likely require staking/slashing or reputation systems on top of verifiable storage to deter data poisoning.

Note: Speaker names and company spellings in the transcript had minor transcription inconsistencies (e.g., Valerie/Valeria; Steve/Shift; BlockPI/Block Pile/Blockai). The summary reflects the intended identities based on context.