Perception and Action in Dynamic Environments

Towards Physically Predictive Models of Reality from Ego-Centric Data

Modelling Reality for Perception & Action in Dynamic Environments

A world-wide shared, dynamically updating, physically predictive model of reality, built and kept up to date by our mobile observing devices, is a key to enabling super-human, ultra-low-power robotic and AI/AR perception in our world.

  • Why? Why shared? Why physically predictive? Predict what if… Building a model of reality that can predict sensor observations given actions provides a foundation for a [task-free] representation on which AI and robotics applications can be built with higher performance, lower power, and trust guarantees for users.

  • How is a semantics-free model possible? This is currently something of a chicken-and-egg problem: how do we represent what is valuable in a dynamically updating model?

  • Completeness, Consistency, Privacy, …? What granularity of representation is required? This is easier to state if you have a specific task to solve.
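The "predict sensor observations given actions" idea above can be sketched as a toy interface. Everything here (the `WorldModel` class, its methods, the toy state) is an illustrative assumption for this sketch, not a real system or API.

```python
# Toy sketch of a task-free predictive model: fold observations in,
# then answer "what would I sense if I did X?" from the model alone.
# All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    state: dict = field(default_factory=dict)

    def update(self, observation: dict) -> None:
        """Fold a new sensor observation into the shared model."""
        self.state.update(observation)

    def predict_observation(self, action: str) -> dict:
        """Predict the sensor observation that a candidate action would produce."""
        if action == "look_left":
            return {"camera": self.state.get("left_of_view", "unknown")}
        return {"camera": "unknown"}


model = WorldModel()
model.update({"left_of_view": "doorway"})
print(model.predict_observation("look_left"))  # {'camera': 'doorway'}
```

The point of the sketch is the task-free contract: nothing in `WorldModel` mentions a downstream application, yet any application can query it.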

Predictive Models of Reality?

Key Research Areas

Devices

Location

Providing always-available, anywhere, high-accuracy and high-precision location.

Physical limits on mobile sensing, compute, and communication.

Index

Represent and synchronise the predictive model to reality.

Devices that build a map of reality

  • Satellites

  • Aerial & Cars

  • Backpacks


Overview

  • Project Aria

    Research devices to enable dynamic ego-centric data capture in diverse real-world settings.

  • Vision-Less Tracking

    IMU-only navigation with deep learning enabling robustness in highly dynamic environments.

  • Towards Predictive Models

Neural 3D Video Synthesis and STaR: Self-supervised Tracking and Reconstruction.

All day wearable computer

All day usage

  • <70g

  • 2.5Wh battery

  • 30 hrs standby

  • 1.5 hrs continuous recording
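Taking the spec sheet above at face value, and assuming each runtime figure corresponds to draining the full 2.5 Wh battery, the implied average power draw is simple arithmetic:

```python
# Back-of-envelope average power draw implied by the device spec
# (2.5 Wh battery, 30 hrs standby, 1.5 hrs continuous recording).
# Assumes each figure means draining the full battery at constant power.
battery_wh = 2.5
standby_hours = 30
recording_hours = 1.5

standby_watts = battery_wh / standby_hours      # ≈ 0.083 W (~83 mW)
recording_watts = battery_wh / recording_hours  # ≈ 1.67 W

print(f"standby:   {standby_watts * 1000:.0f} mW")
print(f"recording: {recording_watts:.2f} W")
```

The ~20× gap between standby and recording draw is what makes "always-on machine perception" at this form factor such a hard power budget.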

Mobile 

  • Multi Core Snapdragon Processor

  • 128GB storage

  • 4GB RAM

Connectivity

  • Wi-Fi 802.11 a/b/g/n/ac

  • Bluetooth 5.0

LiveMaps


“[…] if you were supplied with a computer… that worked all day long and was instantly responsive to every action you had. How much value could you derive from that?”

DOUG ENGELBART - 1968

Wearable Rendering Surface

Dream it

Build it

Grow it

“[…] if you were supplied with a computer… that worked all day long and was instantly responsive to every action that you had…”

…now AI is here

AI is here - but there is a clear gap in its capabilities today.

A physically predictive, live, and continuously updated representation of the physical environment is built from the observations of the devices that use it.

ALL-DAY CONTEXT 

ALL-DAY WEARABLE

ALWAYS-ON MACHINE PERCEPTION

REASONING (PREDICTION)

CONTEXTUAL AI = CONTEXT + AI

PHYSICAL + DIGITAL REALITY

Connecting Spatial Computing to the Physical World

Goals

  1. Demonstrate the value of a fully indexed environment by showing how LiveMaps can be used to interface with the physical world.

  2. Demonstrate LiveMaps' technical state and the path that will help us practically achieve Goal #1 in a consumer product.

Representation in Distributed Cognitive Task

This project is focused on representational issues in distributed cognitive tasks that require the processing of information distributed across the internal mind and the external environment. It focuses on three problems.

Focusing on Three Problems

  1. The Distributed Representation of Information.

  2. The Interaction between Internal and External Representations.

  3. The Nature of External Representations.

Beyond these three problems, the project also looks at:

  • Blurring the Physical and the Digital.

  • The Future of Computation in Extending and Augmenting Human Capabilities through the Sensing and Actuation of Objects and Environments.

Representation in Distributed Cognitive Tasks.

Everyday Cognition

But I’m not only interested in paper as a design surface. Before screens, paper was just one of the ways we solved design problems. So, what can we do with the physical world?

Can We Embed Information in Objects?

Representation in Distributed Cognitive Tasks.

Everyday Cognition

I’m inspired by a paper that Don Norman put out in the early 90s about the Tower of Hanoi. In it, he talks about how you can embed information within objects. The authors invented three games that embedded more and more of the rules, or the cognition, of the game into the physical world.

The Tower of Hanoi game. A player must get all the disks onto the rightmost empty pole by moving disks only one at a time between the three poles.

Tower of Hanoi

Focusing on Three Problems

  1. In the Tower of Hanoi, you have to move the doughnuts over one at a time, and you can never put a bigger doughnut on top of a smaller doughnut.

  2. Regarding coffee cups, there’s a lot that tells you not to pour a small coffee into a full cup, because you’ll spill coffee everywhere. So, to me, this frames the challenge in a precise way: what is the value of putting information into the physical world?

Representation in Distributed Cognitive Tasks

Animation of an iterative algorithm solving the 6-disk problem.
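The kind of iterative solver the animation depicts can be sketched as follows. This is a standard iterative Tower of Hanoi algorithm, not anything specific to the slide: on odd-numbered steps the smallest disk advances one peg along a fixed cycle, and on every even step there is exactly one other legal move.

```python
# Iterative Tower of Hanoi solver. Disks are integers (bigger = larger),
# pegs are stacks; the goal is to move everything from A to the
# rightmost peg C, one disk at a time, never placing big on small.
def hanoi_iterative(n):
    """Solve the n-disk puzzle iteratively; return (moves, final pegs)."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    # The smallest disk's cycle depends on the parity of n,
    # so that all disks end up on the rightmost peg C.
    cycle = ["A", "C", "B"] if n % 2 else ["A", "B", "C"]
    pos, moves = 0, []
    for step in range(1, 2 ** n):  # a solution takes exactly 2^n - 1 moves
        if step % 2:  # odd step: move the smallest disk along its cycle
            src, pos = cycle[pos], (pos + 1) % 3
            dst = cycle[pos]
        else:  # even step: the only legal move between the other two pegs
            a, b = (p for p in "ABC" if p != cycle[pos])
            if pegs[a] and (not pegs[b] or pegs[a][-1] < pegs[b][-1]):
                src, dst = a, b
            else:
                src, dst = b, a
        pegs[dst].append(pegs[src].pop())
        moves.append((src, dst))
    return moves, pegs


moves, pegs = hanoi_iterative(6)
print(len(moves), pegs["C"])  # 63 moves; all six disks end on peg C
```

Unlike the usual recursive formulation, this version makes the move sequence explicit, which is exactly what an animation needs.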

The Brief: Unlocks a new category of ‘contextualised query’

  • Make any object ‘smart’ e.g. ‘What’s in my fridge?’

  • Change the meaning of a query according to spatial context e.g. ‘Turn On’

  • Mark a specific query using spatial context e.g. ‘Turn on the light above the cot’

  • Unlock new object interfaces e.g. when I leave the room, turn off the lights

The ultimate ‘If this, then that’

Demonstrate possible ways to set up this rule:

When you sit on the sofa with a book, the lamp turns on.
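One of the "breadth of ways" explored below is standard text-based coding; as a point of comparison, the sofa/book/lamp routine might look like this when expressed against a spatially indexed environment. Every name here (`Context`, `Rule`, the event fields) is a made-up assumption for the sketch, not an actual LiveMaps API.

```python
# Hypothetical sketch: a contextual trigger-action rule evaluated
# against a spatial context snapshot. Names are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Context:
    user_pose: str                 # e.g. "sitting"
    user_location: str             # e.g. "sofa"
    held_object: Optional[str]     # e.g. "book", or None


@dataclass
class Rule:
    condition: Callable[[Context], bool]
    action: Callable[[], str]


# "When you sit on the sofa with a book, the lamp turns on."
lamp_rule = Rule(
    condition=lambda ctx: ctx.user_pose == "sitting"
    and ctx.user_location == "sofa"
    and ctx.held_object == "book",
    action=lambda: "lamp_on",
)


def evaluate(rules: List[Rule], ctx: Context) -> List[str]:
    """Fire every rule whose spatial-context condition holds."""
    return [rule.action() for rule in rules if rule.condition(ctx)]


print(evaluate([lamp_rule], Context("sitting", "sofa", "book")))  # ['lamp_on']
```

Writing the rule this way makes the dimensions below concrete: text-based coding offers fine-grained control but demands technical skill and no assisted creation.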

Dimensions

  • Technical skill requirements vs. no technical skill requirements

  • Close physical proximity vs. far physical proximity

  • Immersive vs. non-immersive (refers to embodiment)

  • Fine-grained control vs. less fine-grained control

  • High involvement vs. low friction (level of involvement)

  • Assisted creation vs. non-assisted creation

The following 3 pages simply depict the scenario:
When you sit on the sofa with a book, the lamp turns on.

The following pages explore a breadth of ways one could create this routine.

Standard Text Based Coding

Time Machine recognises a user pattern and suggests turning it into a routine via a smartwatch notification.


Coding through a 3D Engine

Coding using an AR Dolls’ House

Coding by performing a routine and recording it

Time Machine recognises a user pattern and suggests turning it into a routine via spatial sound (voice).

Coding by Drawing Connections

Coding using physical blocks with AR Labelling

