Inside $AIDA’s AI Infrastructure

This Spaces session features Terry outlining the architecture and hardware strategy behind his AI platform. He frames the system in four layers (hardware, software, data, and people), then dives deep on hardware. Drawing on three decades in AI and a long partnership with NVIDIA (from the K40/K80 era through Pascal and into current DGX systems), Terry explains the training stack: a SuperPOD built from DGX nodes with Blackwell-class GPUs, enabling rapid training of discrete, domain-specific models (100–200B parameters) in about a week, rather than trillion-parameter general models. Inference is decoupled to a dedicated farm at the DFW11 data center with access to 10,000 GPUs, supporting agent interaction, RAG, and vector stores, and targeted for full availability by September. The company procures HPC through HPE (Unleash AI and PCAI programs) and will host it in Virginia. Sustainability is central: 100% renewable power at DFW11, continuous telemetry (PDU power draw, FLOPs per watt) down to the transaction level, and materials science for heat absorption. These metrics feed a cost-per-token model integrated with the Ida token economy and an agent marketplace. Terry emphasizes user ownership of models and data, critiques monolithic approaches, and commits to transparency, with future sessions covering the software, data, and people layers.

AI Platform Architecture Series — Hardware Deep Dive and Infrastructure Strategy

Participants and roles

  • Terry (speaker; platform lead/executive; real name shared by host)
  • Host (speaker; name not provided)

Key takeaways and highlights

  • The team frames AI architecture in four layers: hardware (compute muscle), software (intelligence/orchestration), data (knowledge/trust), and people (purpose/human layer). This session focused on hardware.
  • Strategy prioritizes domain-specific, privacy-preserving models (100–200B parameters) that can be trained rapidly (≈1 week) and retrained quickly, rather than pursuing massive general models (~2T parameters).
  • Training runs on high-performance compute (HPC) infrastructure; inference is offloaded to a separate “inference farm” designed for low latency and scale.
  • Partnerships: NVIDIA (Inception program, long-standing collaboration), Hewlett Packard Enterprise (HPE Unleash AI and PCAI programs, hardware procurement), and JDSS (data center partner for inference capacity at DFW11).
  • Sustainability is a core design constraint: 100% renewable power at the DFW11 facility, continuous telemetry-driven power/thermal optimization, and materials research to improve heat absorption and efficiency.
  • Operational economics will be measured via a FLOPs-per-watt-per-token framework, feeding directly into pricing and settlement in the agent marketplace using the Ida token. Terry clarified that “token” in this metric refers to LLM tokens (≈1,000 tokens ≈ 750 words) when measuring compute usage, distinct from the crypto token.
  • Transparency pledge: regular updates on partnerships, deployment timelines, and both successes and setbacks; weekly Spaces to deep-dive each architectural layer.

Architecture framing: four-layer model

  • Hardware (compute foundation): Physical infrastructure required to train and run models; the “muscle” powering mathematical computation.
  • Software (intelligence/orchestration): Training, agentization, workflows, and turning models into operational agents.
  • Data (knowledge/trust): Curated, secure inputs; memory and sensory inputs for models; basis for trustworthy outputs and domain alignment.
  • People (purpose/human layer): Ensuring model, data, and infrastructure choices deliver value to users and respect ownership and privacy.

Hardware strategy (focus of this session)

  • Philosophy and differentiation

    • Differentiates from large general-model providers (e.g., OpenAI, Anthropic, Gemini) by focusing on discrete, domain-specific models tailored to high-stakes, private data environments (e.g., healthcare, government) rather than monolithic general-purpose LLMs.
    • Domain-specific approach yields faster training cycles (≈1 week to train; ≈1 week to retrain as needed), lower cost, and tighter control over privacy and provenance.
    • Ownership stance: models and data belong to the user; infrastructure and deployment are designed to preserve that ownership and privacy.
  • Training infrastructure (HPC)

    • Core partner: NVIDIA; long-standing collaboration and membership in NVIDIA’s Inception program.
    • Current training setup described as a “SuperPOD” built from quad DGX H200 nodes with “B3” (Blackwell-class) chips, each box providing 8 GPUs with ~96 GB of memory per GPU and very high bandwidth for data movement.
    • Procurement and data center integration via HPE (premier partner in Unleash AI and PCAI programs), providing a scalable path to add capacity.
    • Physical deployment: HPC will be hosted in Virginia.
    • Multi-tenant and virtualized: platform partitions and virtualizes HPC resources to train many models concurrently (including customer-built models) rather than serializing on a single large job.
  • Inference infrastructure (inference farm)

    • Purpose-built for inference: separates inference from training to keep training capacity free and deliver low-latency serving.
    • Hosts agent interactions, retrieval-augmented generation (RAG), vector stores, and incremental knowledge updates to distilled models.
    • Capacity and partners: relationship with JDSS at the DFW11 data center (≈300,000 sq ft), with access to 10,000 GPUs for inference.
    • Power source: the DFW11 facility operates on 100% renewable energy.
    • Availability: the inference farm is online now, with full availability and scaling targeted for September (the team will provide ongoing timeline updates).
  • Model lifecycle and flow

    • Train domain-specific models on HPC against a defined loss function using private, compliant datasets.
    • Distill trained models for efficient serving and deploy them to the inference farm (a minimal distillation sketch follows this list).
    • Agents interact with distilled models for production workloads; training clusters remain focused on new training/retraining tasks.
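
A minimal sketch of the distillation step referenced above, assuming a standard response-based teacher/student setup in PyTorch; the temperature and loss weighting are illustrative assumptions, not details confirmed in the session.

    # Illustrative knowledge-distillation loss; hyperparameters and setup are
    # assumptions, not the platform's actual training recipe.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Blend a soft-target KL term (teacher -> student) with hard-label CE."""
        # Soften both distributions with the temperature, then match them.
        soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        kd = F.kl_div(soft_student, soft_teacher,
                      reduction="batchmean") * temperature ** 2
        # Standard cross-entropy against ground-truth labels keeps the student
        # anchored to the task itself.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

In a typical loop the frozen teacher (the full domain model trained on the HPC cluster) runs under torch.no_grad(), and only the smaller student destined for the inference farm receives gradient updates.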

Sustainability, telemetry, and operational efficiency

  • Real-time telemetry

    • Continuous monitoring across hardware and OS layers: collects sensor data, PDU power draw, and compute metrics (e.g., floating-point operations) per request/transaction.
    • Scheduler places workloads on appropriate hardware tiers to maximize performance per watt and minimize waste (see the placement sketch after this section).
  • Materials and thermal management

    • Working with materials scientists to line machines with heat-absorbing materials, improving thermal efficiency and reducing cooling overhead.
  • Critique of industry status quo

    • Concern that hyperscale AI deployments invest tens of billions of dollars but may be insufficiently attentive to sustainability, while delivering single general models rather than enabling user-owned models.
    • Belief that incumbent infrastructure choices may hinder a shift toward user-owned, domain-specific models.
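
One way the telemetry-driven placement described above could work is sketched below: the scheduler scores each hardware tier by estimated FLOPs per watt and picks the most efficient tier with capacity. The tier names, throughput and power figures, and the scoring rule are hypothetical, not measurements from the platform.

    # Hypothetical performance-per-watt placement sketch; tier specs and the
    # scoring heuristic are illustrative, not actual platform telemetry.
    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str
        tflops: float      # sustained throughput in TFLOPs (from telemetry)
        watts: float       # measured PDU draw at that throughput
        free_gpus: int     # GPUs currently available in the tier

    def place(job_gpus: int, tiers: list[Tier]) -> Tier | None:
        """Pick the tier with the best FLOPs-per-watt that can fit the job."""
        candidates = [t for t in tiers if t.free_gpus >= job_gpus]
        if not candidates:
            return None  # nothing fits; queue the job until capacity frees up
        return max(candidates, key=lambda t: t.tflops / t.watts)

    tiers = [
        Tier("training-hpc", tflops=2000.0, watts=10200.0, free_gpus=0),
        Tier("inference-farm", tflops=700.0, watts=2800.0, free_gpus=64),
    ]
    print(place(job_gpus=8, tiers=tiers).name)  # -> inference-farm

In practice the tflops and watts fields would be fed by the PDU and sensor telemetry described above rather than fixed constants.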

Economics and Ida token alignment

  • Measurement framework

    • The “FLOPs-per-watt-per-token” calculation connects compute intensity and power usage to usage-based economics (a back-of-the-envelope sketch follows this list).
    • Clarifies that “token” in this metric is an LLM token (≈1,000 tokens ≈ 750 words), not the crypto token.
  • Marketplace and pricing

    • In the agent marketplace, users can pay with the Ida token to use agents; the platform measures compute cost per request via the FLOP-per-watt-per-token metric to inform pricing and settlement.
  • Token-first vs build-first

    • Emphasizes a build-first approach: the team has been building the platform for years (≈10 years overall, ≈8 years of concentrated effort) rather than launching a token first and funding development afterward.
    • Acknowledges community-driven momentum in the token’s traction while reiterating the platform and infrastructure-first roadmap.
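
As a back-of-the-envelope illustration of how the FLOPs-per-watt-per-token framing could roll up into a per-token energy cost, the sketch below combines a request’s compute demand, the hardware’s efficiency, and a power price. Every figure is a hypothetical placeholder; the session did not disclose actual rates or measurements.

    # Hypothetical cost-per-token calculation; all numbers are placeholders.
    def cost_per_token(flops_per_request: float,
                       flops_per_joule: float,
                       tokens_per_request: int,
                       usd_per_kwh: float) -> float:
        """Convert compute demand and hardware efficiency into energy cost per token."""
        # A FLOPS-per-watt rating is equivalent to FLOPs per joule, so dividing
        # the request's FLOPs by it yields the energy consumed in joules.
        joules = flops_per_request / flops_per_joule
        kwh = joules / 3.6e6          # joules -> kilowatt-hours
        return kwh * usd_per_kwh / tokens_per_request

    # Example: a request needing 5e14 FLOPs on hardware sustaining 2e11 FLOPs/J,
    # producing 1,000 LLM tokens (~750 words), with power priced at $0.06/kWh.
    print(f"{cost_per_token(5e14, 2e11, 1_000, 0.06):.2e} USD per token")

With these placeholder numbers the example prints roughly 4.2e-08 USD per token, purely to show how the pieces combine.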

Example domain and privacy posture

  • Government/healthcare example: working with the U.S. Department of Health and Human Services’ Centers for Medicare & Medicaid Services (CMS) to assess medical claims and detect fraud, waste, and abuse.
  • Data is private and sensitive; the platform’s domain-specific, user-owned model stance is engineered to respect privacy while delivering targeted performance.

Roadmap and next sessions

  • This session: hardware deep dive and infrastructure strategy.
  • Upcoming deep dives:
    • Software/intelligence layer: converting computational capability into operational action, agent orchestration, training pipelines.
    • Data/knowledge layer: how decisions are informed, secured, and verified; ensuring trustworthy data for training and inference.
    • People/human layer: adoption, usability, and sustaining the ecosystem around user-owned AI.
  • Transparency commitment: public updates on partnerships, deployments, timelines, and any delays; ongoing “raw” video updates; weekly Spaces.

Host wrap-up

  • Host thanked Terry and the audience; noted this was the infrastructure session and invited listeners to next week’s Space for the next layer in the series.