Dr. Ivan Poupyrev: The Era of Physical AI

Today’s AI is trained on the internet, and only understands the digital. When AI understands the physical, everything changes.

Dr. Ivan Poupyrev is an award-winning designer, scientist, and entrepreneur. Currently, he is the CEO and founder of Archetype AI, a deep tech Silicon Valley startup that seeks to build a foundational AI model that can perceive, understand, and reason about the physical world, an emerging direction known as “Physical AI.”


These are live-blogged notes from a session at NEXT24 in Hamburg. As such, they are prone to error, inaccuracy and lamentable crimes against grammar and syntax. Posts will be improved in the coming days and this note updated.


AI is the most exciting thing to have happened in technology in years. People are worried about disruption; nobody wants to be the dinosaurs of the 21st century. People talk about AI as if it were a super-smart alien, coming from space to destroy us.

Objectively, though, what we have now is just a chatbot. It’s a nerdy friend we can call on to help with a crossword or homework. It’s not AGI. And it only really understands what’s online. That’s what it’s trained on, so that’s what it knows.

But our big problems of today are physical ones. The world is more complex and getting even more so every decade. We don’t know exactly what is happening in the real world — or where. And that’s the difficulty that AI could help us solve.

So, to build an AI which understands the physical world, we need to use sensor data. Lots and lots of sensor data. But that creates some serious difficulties.

1. The physical world generates huge amounts of data

A car fitted with sensors generates around 4TB of data per day, and it’s highly variable data depending on what’s around the car. Right now, all those car sensors produce dark data: we can’t access it, and it’s only useful in the moment. What could we learn from that data?
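For a sense of scale, here’s a quick back-of-the-envelope calculation (assuming the 4TB is spread evenly across a 24-hour day, which is my assumption, not a figure from the talk):

```python
# Rough arithmetic on the 4TB-per-day figure quoted in the talk.
TB = 10**12  # decimal terabyte, in bytes

bytes_per_day = 4 * TB
seconds_per_day = 24 * 60 * 60

rate_mb_per_s = bytes_per_day / seconds_per_day / 10**6
print(f"Sustained rate: {rate_mb_per_s:.0f} MB/s")  # roughly 46 MB/s, around the clock
```

Sustaining that kind of upload continuously is impractical for most vehicles, which is part of why the data stays dark.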

2. The data is multimodal

We need to fuse that disparate sensor data: it’s inherently multimodal. The human brain is very good at that, but how do we do it with AI?
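As a toy illustration of what fusion means at the data level, here’s a minimal sketch with made-up sensor names and sampling rates (nothing from the talk): two streams recorded at different rates are resampled onto a shared timeline so a single model can consume them together.

```python
import numpy as np

# Two hypothetical streams from the same car: radar frames at 20 Hz, accelerometer at 100 Hz.
radar_t = np.arange(0, 10, 1 / 20)            # timestamps in seconds
radar = np.random.rand(len(radar_t), 8)       # 8 range bins per frame (invented)

accel_t = np.arange(0, 10, 1 / 100)
accel = np.random.rand(len(accel_t), 3)       # x, y, z acceleration (invented)

# Resample both onto a common 50 Hz timeline with linear interpolation,
# then concatenate per timestep into one multimodal feature vector.
common_t = np.arange(0, 10, 1 / 50)
radar_rs = np.stack([np.interp(common_t, radar_t, radar[:, i]) for i in range(radar.shape[1])], axis=1)
accel_rs = np.stack([np.interp(common_t, accel_t, accel[:, i]) for i in range(accel.shape[1])], axis=1)

fused = np.concatenate([radar_rs, accel_rs], axis=1)
print(fused.shape)  # (500, 11): 500 timesteps, 11 fused features per step
```

A model like the one described later in the talk would presumably learn the alignment and the joint representation itself; the point here is just that raw streams arrive in different shapes and at different rates, and have to end up in one place.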

3. The data is difficult for humans to interpret

For example, at Google, his team created Soli, a miniature radar that emits an electromagnetic field and captures the changes as objects move through it. They expected data that would be visually comprehensible. That wasn’t what they got.

4. Information changes with location

The same sensor reading can mean different things in different places, so you have to understand the context in which the data was captured.

Today’s physical AI models are limited

All of that explains why people have been building siloed ecosystems of very specific AI models to deal with elements of physical data. But it doesn’t scale, and it’s very expensive — and there’s none of the multimodality that you need to make it really useful.

For example, the “wave in front of the phone” interaction used on Pixel phones took huge amounts of training — and then it had to be done all over again for the Nest. These bespoke, single-task models are expensive, and don’t transfer well.

That’s why Poupyrev and four colleagues left Google to found Archetype AI. They wanted to build a single model for the physical world: an AI that can notice things we would otherwise miss. They’re building a horizontal model that can be applied to a wide variety of use cases, and it’s called Newton. A ChatGPT for the physical world, if you like.

Building an AI model on sensor data

Newton is meant to be a single repository for all sensor data, which can then generate any representation you need from it. Why? To make spaces smarter than before. People want smart spaces that respond to their intent. But can you do that without cameras?

He showed us an example of the model describing somebody walking into view, staying still for a while, and then walking away, and then another of the model describing someone brewing tea.

How could that be useful? Imagine the AI monitoring construction sites and issuing instant warnings when it spots danger. Or you could use it to track packages and spot when someone mishandles them. You don’t need specific programming for this; you just ask the model. Feed it data from a drone, and it can tell you how many people are on a bridge, for example.
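The pattern being described is essentially “sensor stream in, plain-language question in, answer out”. Here’s a purely hypothetical sketch of what such a request might look like; this is not Archetype’s actual API, just an illustration of the shape of the interaction, with invented names throughout.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SensorFrame:
    sensor_id: str      # e.g. a drone camera or a radar unit (hypothetical IDs)
    timestamp: float    # seconds since the start of capture
    payload: list       # raw readings for that frame

def build_query(frames: list[SensorFrame], question: str) -> str:
    """Package sensor data plus a plain-language question as one JSON request."""
    return json.dumps({
        "frames": [asdict(f) for f in frames],
        "question": question,
    })

# No task-specific code for "count people on a bridge": the task lives in the question.
frames = [SensorFrame("drone-01", t / 10, [0.0] * 4) for t in range(5)]
print(build_query(frames, "How many people are on the bridge right now?"))
```

The task-specific knowledge lives entirely in the question rather than in bespoke code, which is the contrast with the single-task models described earlier.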

Visual output of Physical AI

However, it’s capable of more than text: it’s a generative model. So it could generate overlays on a car’s screens, showing crowded areas on your route so you can avoid them, for example. Amazon is interested in that, for obvious reasons…

There’s a big difference between generative AI and what we’re trying to build with Physical AI. Today’s AI is a black box disconnected from the real world. Physical AI is a lens through which we can see the real world.

“Everybody’s talking about agents these days. I don’t like the idea. I see AI as a lens.”

It’s not an asteroid that’s going to hit the earth, nor an alien invading. It’s a powerful lens, like the one the original Newton used, that changes how we see the world.