Robert and Irena talk with Nils about robot fights

This Spaces session explores collaborative spatial computing and robotics with Nils, Robert, and Irina, focusing on the "real world web" that lets phones, glasses, and robots share a common understanding of physical spaces without centralizing data. Nils details a retail AR copilot that creates hyper-accurate shelf maps, generates sales heatmaps, and guides staff with route optimization; it is already producing a few million in revenue with a ~$150M ARR pilot pipeline. He contrasts a decentralized, venue-hosted visual positioning system with Niantic’s world-scale model, emphasizing privacy, efficiency, and a new node type that serves paths without exposing maps. The conversation covers SLAM advances, shared coordinate systems, and offloading compute from glasses to local nodes to overcome battery and latency limits, aiming for sub-4ms latency. Beyond navigation, they discuss robot orchestration across brands, China’s dominance in robotics, America’s software and cultural advantages, and the launch of VR-controlled robot fights in San Francisco (two Unitree G1s piloted by Justin Kan and Hydra Mil), which Nils is sponsoring.

Collaborative Spatial Computing, Decentralized VPS, and Robot Coordination

Participants

  • Nils: Founder/lead working on the Real World Web (collaborative spatial computing). Focus: decentralized visual positioning, shared SLAM, venue self-hosting, interoperability across devices and robots.
  • Robert: Host/interviewer, investor, and commentator on AI, robotics, and spatial computing ecosystems.
  • Irene/Irina: Co-host/interviewer (name referenced both ways in the conversation), probing comparisons and realism around cross-robot coordination and glasses adoption.

Executive Summary

  • Nils’ team is building the Real World Web: a decentralized, venue-owned visual positioning and spatial data layer that lets phones, AR glasses, and robots share a common understanding of physical spaces without centralizing sensitive map data.
  • Their leading product is a grocery retail AI copilot that generates hyper-accurate product location maps, derives performance heatmaps from sales data, and provides AR guidance and route optimization to reduce training burden and improve fulfillment speed. It already generates a few million dollars in annual revenue, with a pilot pipeline of over $150M ARR.
  • Technically, they address the core SLAM limitation (session-specific coordinate systems) by enabling persistent, shareable spatial maps and easy multi-device calibration. Accuracy is centimeter-level with RGB alone, approaching LiDAR quality.
  • Versus Niantic’s global VPS, Nils argues for a decentralized, venue-hosted model on privacy, ethics, and practicality: venues keep control of their data while still interoperating with visiting robots/AI devices through a protocol layer.
  • New privacy-preserving node type: robots can receive only a path to a target (e.g., ketchup) to navigate inside a store, without ever downloading the store’s map.
  • For AR glasses, mainstream 6DoF use is constrained by battery and compute limits. Nils advocates offloading compute to hyper-local edge resources (e.g., a Mac mini at home or venue compute on-site) to achieve sub-4ms latency and multi-user shared SLAM.
  • The near-term path favors “AI glasses” (contextual, multi-agent experiences) over full AR/VR on-face compute; Mentra’s open-source, fully programmable glasses and OS are highlighted.
  • Robot orchestration requires cross-brand communication standards and shared spatial semantics. Market incentives, particularly in Asia/China, are strong.
  • Macro context: China leads in robot adoption and hardware iteration speed; the US retains strengths in software and risk-on culture. Building an American “Shenzhen” would likely require government action and political will.
  • Cultural/innovation highlight: VR-mediated robot fights in San Francisco (REK/Rec) with Unitree G1 robots piloted by Justin Kan and UFC fighter Hydra Mil; Nils is the champion sponsor.

The Real World Web: Vision and Architecture

  • Vision: An “upside-down internet for robots” where physical venues become browsable, searchable, and navigable to AI and robotics, analogous to how the web made digital content discoverable to humans.
  • Challenge with SLAM: Simultaneous Localization and Mapping creates a fresh, device-specific coordinate system per session, making shared AR and robot navigation across devices complex.
  • 2021 Innovation: Nils’ team built a simple calibration technique that aligns multiple devices to a shared coordinate system. The team then raised $20M in seed funding to expand its techniques for persistent, shared spatial understanding.
  • Persistence and sharing: Standard phone AR performs SLAM but discards or isolates data. The Real World Web persists and shares spatial maps so robots/other devices can benefit from prior SLAM work in a venue.
  • Decentralized VPS: Each venue self-hosts its spatial data. A protocol allows devices/robots to discover and interoperate with the venue’s map without central authorities.
  • Privacy rationale: Visual positioning compares sensor views to a model of the space. Centralized VPS could “see through our eyes.” Decentralization keeps sensitive data with venues—no platform-level surveillance beyond what the venue already captures (e.g., CCTV).
  • Practical benefits: Localized maps are smaller and more efficient to store and query on phones/glasses, which reduces reliance on cloud round-trips and heavy downloads.
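The shared-coordinate idea above can be illustrated with a minimal sketch. The conversation does not describe how the team's calibration actually works, so this assumes one common approach: both devices observe the same physical marker, and that observation is used to map one session's coordinates into the other's. Poses are simplified to 2D `(x, y, theta)`, and all function names are hypothetical.

```python
import math

def compose(a, b):
    """Compose two 2D poses (x, y, theta): apply pose b inside the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )

def invert(p):
    """Inverse of a 2D pose, so that compose(p, invert(p)) is the identity pose."""
    x, y, t = p
    return (
        -(math.cos(t) * x + math.sin(t) * y),
        -(-math.sin(t) * x + math.cos(t) * y),
        -t,
    )

def a_from_b(marker_in_a, marker_in_b):
    """Transform taking session-B coordinates to session-A coordinates.

    Assumption: both sessions observed the same physical marker.
    """
    return compose(marker_in_a, invert(marker_in_b))

def transform_point(pose, point):
    """Apply a pose (rotation followed by translation) to a 2D point."""
    x, y, t = pose
    px, py = point
    return (
        x + math.cos(t) * px - math.sin(t) * py,
        y + math.sin(t) * px + math.cos(t) * py,
    )

# Device A sees the marker at (2, 1) facing along its x axis;
# device B sees the same marker at its own origin, rotated 90 degrees.
T = a_from_b((2.0, 1.0, 0.0), (0.0, 0.0, math.pi / 2))
point_in_a = transform_point(T, (1.0, 0.0))  # a point device B placed at (1, 0)
```

Once `T` is known, anything device B anchors in its own session can be expressed in device A's frame, which is the property that makes a per-session SLAM map shareable.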

Retail AI Copilot: Grocery Focus and Outcomes

  • Use case: Grocery retail workers capture “hyper-accurate where-it-really-is” product maps (not idealized planograms), then link to sales data to produce shelf heatmaps (high vs low performers).
  • Recommendations: Near-term AI will suggest shelf reshuffles to drive sales; today, heatmaps already inform merchandising decisions.
  • AR guidance for staff and shoppers: Staff benefit more due to high turnover. AR-guided route optimization accelerates picking for online orders and reduces training time.
  • Turnover economics: Over half of retail workers change jobs annually; Nils estimates Walmart spends ~$100M per year on first-day salaries, underscoring the value of guidance tools.
  • Traction: Launched earlier this year; a few million dollars in annual revenue; pilot pipeline >$150M ARR.
  • Hardware integration: Mentra’s open-source, fully programmable glasses (an alternative to Meta Ray-Bans) are being used to build a more capable copilot. The glasses-based copilot can auto-detect issues (e.g., empty shelves) and escalate them via the AI that “sees through your eyes.”
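As a rough illustration of the heatmap and route-optimization ideas in this section, the sketch below sums sales per shelf and orders a pick list with a greedy nearest-neighbor heuristic. The conversation does not say which algorithms the product uses, so the data shapes, function names, and the choice of heuristic are all assumptions.

```python
def shelf_heatmap(locations, sales):
    """Sum units sold per shelf.

    locations: sku -> shelf id (where the product really is, per the captured map)
    sales:     sku -> units sold (hypothetical export from the sales system)
    """
    heat = {}
    for sku, shelf in locations.items():
        heat[shelf] = heat.get(shelf, 0) + sales.get(sku, 0)
    return heat

def pick_route(start, stops):
    """Order stops (name -> (x, y)) by repeatedly walking to the nearest one.

    Uses Manhattan distance as a stand-in for aisle walking distance.
    Greedy nearest-neighbor is a simple heuristic, not an optimal route.
    """
    route, position, remaining = [], start, dict(stops)
    while remaining:
        name = min(
            remaining,
            key=lambda n: abs(remaining[n][0] - position[0])
            + abs(remaining[n][1] - position[1]),
        )
        position = remaining.pop(name)
        route.append(name)
    return route

heat = shelf_heatmap(
    {"ketchup": "A1", "mustard": "A1", "rice": "B2"},
    {"ketchup": 30, "mustard": 5, "rice": 2},
)
route = pick_route((0, 0), {"rice": (5, 5), "ketchup": (1, 0), "mustard": (2, 1)})
```

Here `heat` flags shelf A1 as the high performer, and `route` visits ketchup, then mustard, then rice. A production system would presumably route over the real aisle graph rather than straight-line distances.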

Robots on the Real World Web: Interoperability and Privacy

  • Live demo (Hong Kong): Phones place markers and film a venue; the Real World Web constructs a navigable spatial map from video; a humanoid robot then navigates the space despite never having been there, benefiting from phone-collected SLAM.
  • Natural interaction: Point with a phone or glance with glasses to command robots (“clean this up,” “lift this”). The network is the interoperability layer that bridges device and robot understanding.
  • New node type (privacy-preserving navigation): Robot requests a path-only solution to a coordinate (“Where is the ketchup?”); venue serves the path but never the full map. Robot can be policy-managed (e.g., banned for deviating off-path).
  • Venue discovery and authentication: A robot arriving at Home Depot queries the network for the local server address and auth; connects to a hyper-local map on Home Depot’s compute; enters the venue’s coordinate system so store management AI and the robot can converse consistently.
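The path-only node can be sketched as a venue-side service that keeps its map private and answers requests with waypoints alone. This is a hypothetical illustration, not the actual protocol: the class name, the grid representation, and the use of breadth-first search are assumptions made for the example.

```python
from collections import deque

class PathOnlyNode:
    """Hypothetical venue-side service: holds the map privately, returns only waypoints."""

    def __init__(self, grid, targets):
        self._grid = grid        # 0 = free cell, 1 = obstacle; never sent to clients
        self._targets = targets  # item name -> (row, col)

    def path_to(self, start, item):
        """Return (row, col) waypoints from start to the item, or None if unreachable."""
        goal = self._targets.get(item)
        if goal is None:
            return None
        rows, cols = len(self._grid), len(self._grid[0])
        parent = {start: None}
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            if cell == goal:
                path = []
                while cell is not None:  # walk parent links back to the start
                    path.append(cell)
                    cell = parent[cell]
                return path[::-1]
            r, c = cell
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and self._grid[nr][nc] == 0 and (nr, nc) not in parent:
                    parent[(nr, nc)] = cell
                    queue.append((nr, nc))
        return None

def off_path(position, path, tolerance=1):
    """True if position is more than `tolerance` cells (Manhattan) from every waypoint."""
    return all(
        abs(position[0] - r) + abs(position[1] - c) > tolerance for r, c in path
    )

store = PathOnlyNode(
    grid=[[0, 0, 0, 0],
          [1, 1, 0, 1],
          [0, 0, 0, 0]],
    targets={"ketchup": (2, 0)},
)
path = store.path_to((0, 0), "ketchup")
```

The robot only ever sees `path`; the grid stays on the venue's compute. A check like `off_path` is one way a venue could detect a robot deviating from the route it was served, which is the kind of policy enforcement mentioned above.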

Speed and Manipulation vs Navigation

  • Manipulation speed: Nils explicitly notes he is not a manipulation expert and avoids claims on tasks like high-speed screw driving.
  • Focus area: Navigation and spatial context—moving items from A to B (e.g., transporting sheet metal between workstations) and enabling robots to traverse complex spaces reliably.
  • Ecosystem view: Others are advancing manipulation with vision-language-action models; Nils’ team complements by providing robust spatial context.

Shared AR Experiences and Offloading Compute

  • Multi-user shared AR: The Real World Web enables sharing spatial data and coordinate systems across diverse devices (e.g., multiple Vision Pros) without a centralized middleman.
  • Compute offloading: 6DoF AR is battery- and compute-expensive, so SLAM is offloaded to local edge resources (e.g., a Mac mini at home) by streaming camera and IMU data to a SLAM server.
  • Capacity: A Mac mini can run up to four simultaneous SLAM sessions.
  • Latency: Sub-4ms is the target; cloud data centers generally can’t meet this for tight 6DoF loops, hence the reliance on hyper-local compute (on your Wi‑Fi/home network) and on-prem venue compute.
  • Mobility: At home, glasses use home compute; in venues (e.g., Home Depot), glasses connect to venue compute for low-latency spatial services.
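A back-of-the-envelope sketch shows why a 4 ms budget pushes compute close to the user. It uses the approximate speed of light in optical fiber (about 200 km per millisecond) and the four-sessions-per-Mac-mini figure from the notes above. The distances and processing times in the example are illustrative assumptions, and real networks add routing, queuing, and Wi‑Fi overhead on top of pure propagation delay.

```python
import math

FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per millisecond

def propagation_rtt_ms(distance_km):
    """Round-trip propagation delay only (no routing, queuing, or processing)."""
    return 2 * distance_km / FIBER_KM_PER_MS

def fits_budget(distance_km, processing_ms, budget_ms=4.0):
    """True if propagation plus server processing stays within the latency budget."""
    return propagation_rtt_ms(distance_km) + processing_ms <= budget_ms

def slam_hosts_needed(sessions, sessions_per_host=4):
    """Hosts required if each one runs up to four SLAM sessions (figure from the talk)."""
    return math.ceil(sessions / sessions_per_host)

cloud_ok = fits_budget(distance_km=500, processing_ms=2.0)   # distant data center
local_ok = fits_budget(distance_km=0.05, processing_ms=2.0)  # compute in the same building
hosts_for_ten_users = slam_hosts_needed(10)
```

With these assumed numbers, a data center 500 km away spends 5 ms on propagation alone and misses the budget before doing any work, while on-site compute leaves nearly the whole budget for SLAM processing.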

AR Glasses Landscape and Near-Term Path

  • Current devices: Meta’s latest consumer glasses are essentially HUDs (3DoF) rather than full spatial computers. Apple Vision Pro is highly capable but heavy; comfort decreases after ~1 hour.
  • Technical constraints: No imminent breakthroughs in on-face compute or battery sufficient to make full-day 6DoF AR glasses ubiquitous.
  • Strategy: Move compute off the face; rely on edge compute networks for shared AR.
  • Mentra OS: Designed for multi-agent AI use—simultaneous access to camera/microphone/context by specialized agents (translation, note-taking, reminders, lookup), enabling richer experiences (e.g., live translation plus note-taking).
  • Ergonomics: Historical lesson—people began wearing glasses all day when they weighed ~40g; new AR wearables should target similar comfort. Nils suggests current Meta Ray-Bans are too heavy for all-day use; Mentra plans to ship only glasses light enough for all-day wear.
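The multi-agent idea (several specialized agents consuming the same sensor stream at once) can be sketched as a simple fan-out hub. This is purely illustrative and is not Mentra's API: `SensorBus`, the event shape, and the placeholder agents are all hypothetical.

```python
class SensorBus:
    """Hypothetical fan-out hub: every registered agent receives every sensor event."""

    def __init__(self):
        self._agents = {}

    def register(self, name, handler):
        self._agents[name] = handler

    def publish(self, event):
        """Deliver one event to all agents; collect non-empty results by agent name."""
        results = {}
        for name, handler in self._agents.items():
            output = handler(event)
            if output is not None:
                results[name] = output
        return results

def translator(event):
    # Placeholder: a real agent would call a translation model here.
    if event["kind"] == "speech":
        return "[translated] " + event["text"]
    return None

notes = []

def note_taker(event):
    # Stores every utterance it hears, independently of the translator.
    if event["kind"] == "speech":
        notes.append(event["text"])
        return f"saved note {len(notes)}"
    return None

bus = SensorBus()
bus.register("translator", translator)
bus.register("note_taker", note_taker)
outputs = bus.publish({"kind": "speech", "text": "hola"})
```

Both agents react to the same utterance, which mirrors the "live translation plus note-taking" example: neither agent needs exclusive access to the microphone.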

Physical AI: From Agentic to Embodied

  • Macro trend: Jensen Huang’s framing—from generative AI to agentic AI, and now toward physical AI.
  • TAM expansion: Giving AI access to the physical world (70% of the economy is physical/venue-bound) roughly triples AI’s addressable market.
  • Co-pilots everywhere: Across manufacturing, logistics, retail, and beyond—AI co-pilots can enhance focus, motivation, and task guidance, even gamify workflows.
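The "roughly triples" claim follows from simple arithmetic, under one interpretation that is assumed here: AI today addresses only the non-physical share of the economy, and physical AI extends it to the whole. The function name is hypothetical.

```python
def tam_multiplier(physical_share):
    """Growth factor when AI's reach expands from the digital share to the whole economy.

    Assumption: today's addressable market is only the non-physical share.
    """
    digital_share = 1.0 - physical_share
    return 1.0 / digital_share

multiplier = tam_multiplier(0.70)  # 1 / 0.30, a bit more than 3x
```

With a 70% physical share, the multiplier is 1 / 0.30, or about 3.3, consistent with the "roughly triples" framing.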

Robot Orchestration and Cross-Brand Coordination

  • Requirement: Robots from different vendors (e.g., Unitree, Tesla) must discover each other, communicate, and collaborate across apps and ecosystems.
  • Shared semantics: A common language and shared spatial context are essential for multi-robot coordination.
  • Market forces: Nils believes capitalist incentives—especially visible in Asia—will drive standardization. China’s scale accelerates robot deployment and coordination needs.
  • Data points: Over half of all robots sold last year went to China; more than half of all robots on Earth are already in China; the US is barely top 10 in adoption.
  • Mobility example: Beijing traffic—massive productivity loss weekly; coordinated autonomy (vehicles cooperating and sharing data) could reclaim significant GDP.

US vs China: Capabilities, Ecosystems, and Roles

  • China’s advantage: Dense ecosystems—factories at massive scale across cities; Shenzhen’s deep electronics markets enable same-day prototyping; faster iteration cycles than waiting for multi-day shipments.
  • Anecdote: Nils needed a custom global shutter camera with specific specs; in Shenzhen, a one-off was manufactured and delivered within days for ~$300—illustrating hardware iteration velocity.
  • US strengths: Software excellence, risk-on capital, cultural willingness to try weird, new things—“memes of production” in San Francisco.
  • Path forward: Building an American “Shenzhen” likely requires government-led infrastructure investment and long-horizon planning (venture capital alone is ill-suited). The limiting factor is political will.
  • Debate: Robert is skeptical the US can catch up due to markets, space, and governance (eminent domain, prolonged public process). Nils counters that US capability remains high if political will materializes; China is a potential ally, not necessarily an adversary.

VR Robot Fights: A New Sport Emerges

  • Format: VR-mediated robot fights (REK/Rec) in San Francisco—pilots wear VR headsets and control humanoid robots in real time.
  • Event details: Two Unitree G1 robots; pilots are Justin Kan (Twitch co-founder) vs Hydra Mil (UFC fighter). When a pilot throws a punch, the robot mirrors it.
  • Sponsorship: Nils is the champion sponsor; event at Temple, San Francisco; livestream available.
  • Scale-up vision: The team aims to take the format international and develop a new sport where both piloting skills and custom robot design can compete.
  • Info: rek.tv (and sponsor.rek.tv for sticker sponsorships). Round structure not specified.

Storms and Context

  • Nils is currently in San Francisco; his team in Hong Kong is dealing with an extreme monsoon (one of the top 20 storms in recorded history). Robert noted the satellite imagery of the storm.

Open Questions and Next Steps

  • Decentralized VPS business model: How to sustainably align incentives while keeping data venue-owned and private.
  • Standards for robot coordination: Protocols and ontologies for cross-brand, cross-app communication.
  • Privacy governance: Venue policies, robot behavior constraints (e.g., path-only navigation), auditability, and compliance frameworks.
  • Edge compute deployment: Ensuring ubiquitous, low-latency access (<4ms) in homes and venues; resilient handoff across networks and locations.
  • Scaling retail pilots: Operational integration, measuring productivity gains, turnover cost reductions, and sales lift from AI-driven shelf optimization.
  • Glasses ergonomics and UX: Achieving sub-40g devices, multi-agent experiences, and seamless off-face compute orchestration for mainstream adoption.