#3: Extropic - Why Thermodynamic Computing is the Future of AI (PUBLIC DEBUT)
You can't escape thermal fluctuations. They just inevitably become
significant. So in some sense, like, Extropic is a little bit inevitable.
Why don't we just create physics-based computing systems that harness
the noise from environments? To us, from first principles of
mathematics, information theory, probability theory, physics, thermodynamics, this
is the future. Hopefully this podcast is like the beginning of a new revolution.
Alrighty, everybody. Welcome to an emergency episode of the First Principles
podcast. We're coming to you on a Sunday
night because we need to understand what the heck Extropic
is building. They've just launched their light paper. Not a white paper, but
a light paper. It is a great introduction to what they're doing and
I've tried my best to dive into it, but I'm actually sort of at this perfect, you
know, in-between point. Sometimes I know, like, basically everything about what a
company's building before they come on the podcast. In this case, I have lots of questions still.
I actually don't totally understand what the hell you all are building. So
I'm excited to learn. I'm excited for everybody to watch me learn. And I'm just going to throw as many dumb
questions as I can out there. So welcome to the show, guys.
Thanks so much, Christian. Excited to be here. Love your shows. Couldn't be
more excited to share more with the world today about what we've been building
sort of in secret. This is just the beginning of us getting people excited about this
thermodynamic paradigm of computing. And hopefully this podcast is like the
beginning of a new revolution. I'm sure some of your listeners probably
have seen one of my identities floating about
online during the daytime. I am Guillaume Verdon. I'm
formerly a research scientist in quantum computing, co-founded the TensorFlow Quantum
Project with Trevor here. Back when we were in school,
used to be a theoretical physicist at Perimeter, ended up being a pioneer of
quantum machine learning, which is a field where you use
quantum computers to do a form of physics-based AI
to understand quantum mechanical matter around us, which
is, you know, that's my previous life. Now, essentially,
I'm founder and CEO of Extropic, and started
this new physics-based computing paradigm and also happened to
run a little thing called e/acc to some extent, as
much as one can run it or be very involved since the
beginning. And that's a techno-optimist movement. And
that's sort of the dual identities that many people are
familiar with. But I'll let Trevor
give more of a bio here. I think people are pretty familiar with my
Yeah, I'm definitely not as online as you. Basically,
I'm an engineer who got swept into Guillaume's field. No,
so I met Guillaume back when I was doing my mechanical engineering degree at
Waterloo, which is clearly the best engineering school in North America.
We love our Waterloo interns. I was doing a mechanical engineering
degree. I mostly did manufacturing kind of stuff before
I met him. I worked at a little company called Formlabs, did some stuff
with linear motors. Then I met Guillaume, and he was like, Trevor, do you want
to work on quantum machine learning? And I was like, I have no idea what that is, but
it sounds cool. So that proposal
turned into the Google product that Guillaume was talking about, and
then I got sucked into the quantum hardware physics
and engineering lab down in Santa Barbara. And I did a couple
of years there working on device engineering, modeling,
studying the effect of noise on quantum computers, calibration, control,
pulse sequences, that kind of thing. And after that,
went on to MIT for a bit and got a call from an old friend and
had to come help him out at Extropic. So yeah, I'm happy to be here. It's been
Absolutely. You guys want to give us the 101, just the highest level. We're
going to dive super deep and feel free in this explanation to
use a bunch of words that people might not know, because then later we'll dive in and try
I mean, essentially, Trevor and I have had
this career trying to build ways to program quantum mechanical
computers, where you try to embed computational tasks into
quantum mechanical physics, right? Quantum is, we're going
to dive into the contrast between quantum and thermo in a sec, but quantum
is really, you have things in superposition. That's the physics of the very, very
small and the very, very, very, very cold, ideally as cold as possible,
ideally zero temperature. And there you
get to program the physics of matter,
usually matter or light, and you learn
to embed sort of programs that are parameterized, just
like neural networks. Neural networks have parameters that you train with all
sorts of algorithms that usually use gradients. And that's
kind of where we came from. We brought differentiable programming thinking
to quantum computing. We were very early on on that. There was no software
doing it at the time. And that was our project. And
then in quantum computing hardware,
there you have, the reality is that
you can't cool down a quantum computer to zero temperature, right? And so there's
a mismatch between the program you want to run and the actual physics
of the hardware. The program you want to run runs at zero
temperature, theoretically, and the hardware has finite
temperature. But what does having finite temperature mean? It just means
that things are jiggling. Things are unpredictable. There's entropy. There's uncertainty that
gets injected in your system because your system interacts with the environment,
and we call that noise, right? And so fighting
noise has been the quest to scale quantum computing
so far, and it's been the bane of the
existence of many scientists. So Trevor's background was
sort of at the very lowest level, how you make the
quantum bits dance. Can you filter out noise? Can you deal
with noise? There at the lowest level, I
was more involved at the algorithms and architectures level where
In quantum computing, you try to do a process called quantum error
correction, where you detect errors, detect these
sort of injections of errors, and undo them, right?
And you've got to keep track of how they spread in your computer. The problem
is that often, by trying to fix the problem, you make it worse, if
the thing trying to fix the problem adds more noise than was there before. And
so, your quantum bits have to be of sufficient
quality, they have to be low enough noise so that it's worth doing
this error correcting process. And this error correcting
process you can view as a form of refrigeration, right? Really,
you're pumping entropy out, you're using energy to pump entropy out
of the system. And so we saw sort of the
road ahead for quantum computing was very long, you
know, reaching the level where you have a very large scale computer where
you're below that threshold of noise where it's worth scaling
up. There's a long road ahead for that. And
we sort of lost patience there. And we're like, well, if you can't beat them, join them,
right? If you can't beat the noise, you should use
it, right? And so we were thinking, Well,
what if we could use the noise? These generative AI algorithms, right?
The parent concept is probabilistic machine learning algorithms.
All these algorithms want to be probabilistic, right?
And so they want this sort of entropy and uncertainty present. And
it turns out that even when we run things on digital computers that
are nearly perfect, right? They're deterministic. We
end up sprinkling in noise at sort of a very abstract level in
our software later on, right? Not at the sort of analog
hardware level. And so it seemed like we do all this effort, just
like in quantum computing, we have all this effort to keep things pristine, right?
And in digital computing, you use a lot of power and
energy so that your system is hard to disrupt. The
noise of the environment is trivial compared to the amplitudes of the signals. And
so there, relative
to the amplitude of the signals, things are not so noisy. But then at
the algorithmic level, you add the noise again. So we were like, why don't we just simplify
that and just create physics-based computing
systems that harness the noise from environments
sort of above the sort of temperatures and noise levels of quantum
computers, but noisier and
lower power than deterministic computers. So it's kind of
this in between, right? So we're trying to build a new paradigm of computing from
the middle out in terms of scales. Had to
That's kind of a top-down explanation. There's also a bottom-up
version that's pretty compelling. Yeah, go ahead. If you look at what it takes
to keep making computational devices smaller, what
you find, and it doesn't really matter what the device is, what you find is when
you make it sufficiently small, you can't escape thermal
fluctuations. They just inevitably become significant,
right? And so if we want to keep scaling compute
smaller and smaller, it's actually inevitable that you have to
go into this thermal or probabilistic regime, right?
And this is becoming, you know, if you look at the data for
like digital computer scaling, you can see that the
rate of exponential growth in efficiency of
computing technology is starting to slow down. And
that's because you're starting to hit some of these effects. There's a ton of reasons
why it's hard to make transistors small, but a lot of them come down to
the fact that these thermal fluctuations are starting to get really big. So
I expect within the next several
generations of transistor technology, you're
going to have to start looking at some of the things we are. So in some
sense, Extropic is a little bit inevitable, and
we're just trying to front run the danger
Yeah, that's really interesting, because you've hit on two different types
of computing. The top-down answer kind of came at it from the quantum angle, and
then this bottom-up one that you just answered, those are just normal
digital chips or whatever that we're talking about, just normal digital transistors. I
would love to take this conversation sort of piece by piece. Maybe let's start with
the first paradigm, which is just normal computing. Talk about what
are those chips, like how are they, you know, they're getting down to the nanometer,
like single low digit, like two or three nanometer size
now. So let's talk about that. And then let's talk about quantum and then we can kind of use
that to bridge over into thermal. But on the
quantum, so on the classical note then, do you mind just telling us how these
algorithms and, you know, all this, you know, neural network stuff
is run today? Like, what are those chips? Like, what do they look like? How do they work? And
So to start at the very, very low level, right, what
is a transistor? Yeah, exactly. A transistor is
actually many things depending on what kind of voltages you
put into it. But in the regime that digital computers operate today, transistors
are switches, right? And you network these switches together
to do digital logic. And so the mathematical abstract
thing you're trying to do is Boolean logic, and the way
you embed that in physics is by driving transistors
really hard. And so that's how computers today work, right?
So you're taking these kind of inherently fuzzy
devices, right? They're made out of real matter, so they're fuzzy.
And you're applying very large signals to them so that they behave like
this mathematically abstract object of
Boolean logic that you want. Right? So that's kind of
how digital computers work. If you want to run, let's
say, a sampling algorithm on a digital computer,
which is what a lot of probabilistic algorithms come
down to, that's kind of like one of the main subroutines, because
the dynamics of your device are completely deterministic, right?
They're operating in this kind of high signal regime where the natural fluctuations
of nature aren't important. You have to generate pseudo-randomness,
which is instead of harvesting the noise of nature,
you run a circuit that has really complex and uncomputable dynamics,
right? And so you get kind of streams of bits that look random. And
that process takes a lot of entropy, right? Because a random stream
of bits is kind of like heat in the sense of
the connection between thermodynamics and information, and you're using electricity
to produce that heat, right? So it's like you're running an electric heater on your
chip, literally, is the analogy to thermodynamics, right? And
then, okay, so now you have a random bit stream that's not computationally useful
unless you happen to want to do coin flips, right? So then what you have
to do is you have to take that random bit stream and essentially filter it to
get samples from the distribution that you're actually working with, right?
And that step of filtration also takes a ton of energy because
now it's like you're taking this bowl of heat you have and you're putting it inside of
a freezer to cool it back down a little bit, right? So the
process of sampling on a digital computer is thermodynamically
pretty similar to running an electric heater inside of a freezer to
achieve some kind of intermediate temperature. So it's a little bit ridiculous, right?
When you look at it from that perspective, it's like, this doesn't make sense, but it's
clear how we got here, right? Because digital computers are really nice and they're
very easy to scale. So it's convenient to do things this way,
but from first principles, it's not even close to the best way.
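For readers who want the "electric heater inside a freezer" picture in code, here is a minimal, illustrative Python sketch of how sampling is typically done on a deterministic machine: generate pseudo-random numbers, then throw most of them away (rejection sampling) until what remains follows a target distribution. The target density and constants are invented for illustration; this is not Extropic's software.

```python
import numpy as np

rng = np.random.default_rng(0)  # pseudo-random generator: the "electric heater"

def target_density(x):
    # Made-up, unnormalized two-mode target distribution (illustrative only).
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

def rejection_sample(n_samples, proposal_width=6.0, density_ceiling=1.5):
    # The "freezer": discard most of the raw randomness so that what
    # remains follows the target distribution.
    samples = []
    while len(samples) < n_samples:
        x = rng.uniform(-proposal_width, proposal_width)  # raw random draw
        u = rng.uniform(0.0, density_ceiling)             # acceptance test
        if u < target_density(x):
            samples.append(x)
    return np.array(samples)

print(rejection_sample(5))
```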
Yeah. This approach to sampling is like so inefficient on
digital computers that people, unless you're like on Wall Street
where things are super mission critical and you're willing to throw a ton of compute to
get the best quality sort of uncertainty for
your decision-making, unless you're on Wall Street, you end up trying
to avoid sampling entirely, right? Because it's
too costly on our deterministic devices. Again, as Trevor
mentioned, it's really unnatural for our deterministic devices to be probabilistic. And
so another way, instead of sampling, to represent probability distributions is
usually through deep learning. And what deep learning does is it usually
starts with very sort of trivial randomness, like
a Gaussian blob, a single Gaussian blob, and then it deforms that
blob to shape it into the shape of the data. So it's
a high-dimensional blob, and high-dimensional data could be like images,
text, whatnot. But it has to use many, many transformations
to take that very simple randomness and transform it into the
shape of the data. And very often, that sort
of fails to capture the tail events, the
tail distributions, the low-data regime, right? When you're focused
on like covering everything with one blob, essentially, you're
just going to cover sort of the center of mass or like, of
probability mass, like the typical data, right? You'd be focused on
that. And you're going to need like, more and more and more dimensions and
more and more parameters in a deeper and deeper
transformation that's more and more complex. So you need more parameters, more
data, more compute in order to reach in that sort
of low data regime in those tail events that are very rare,
right? And so we've been seeing that with sort of self-driving cars.
In self-driving cars, we've just been throwing metric
tons of data at the self-driving problem to
reach a level that a human reaches with like 10 hours
of driving classes, right? And there's clearly way more than 10 hours
of data in all the data sets of all the players. And
so that's sort of fundamentally the reason that
current day deep learning is not
quite the end game. We think that this sort of probabilistic approach
where you can use very little data and you
can fill in the blanks with noise, with entropy and uncertainty, right?
If you don't know something, you don't have data, you should fill it out with uncertainty.
But that process of sort of painting everything with
a noisy brush is very costly, because you got to sample,
you got to like, you got to explore those parts of landscape, you got
to kind of hallucinate everything that's not data, or,
you know, within your model, within the scope of your model, and sort of penalize the things
that are too far from data. And that sort of process of hallucinating
all these possibilities and making those corrections, for
the technical folks, it's called contrastive learning. That
process usually requires sampling, and that's very costly, so
people avoid it. So they stick to these sort of taking these Gaussians and
deforming them. That's how old-school neural
nets like variational autoencoders work. It's somewhat how
diffusion models work. Diffusion models kind of mix in the noise as
you go to some extent, simple noise. But
that's kind of the common pattern essentially. So both
from a sort of hardware standpoint, it's inevitable
that we're gonna have to go stochastic because matter is
jiggly and so your transistors are technically jiggly,
and so are the electrons hopping across them. And so it's going to
get stochastic. And the algorithms want to go probabilistic to
be more data efficient. And so that's why we're building the whole
stack. And we think it's going to be disruptive for everyone. And that's why
we're really excited to sort of put our thesis out there of
the future of AI, which is very contrary and very different, but if
it succeeds, it changes everything, right? And so, at
least to us, from first principles of mathematics, information theory,
probability theory, physics, thermodynamics, this is the future. And
Yeah, basically. I love
it. So to take just a tiny step back, can you talk about what
is it that makes a GPU so good at
doing that sort of estimation task, basically, of
making it so that you have this really crazy distribution and
GPUs do the deep learning approach, right? Because they suck at
the sampling approach, right? So often people use
CPUs for Monte Carlo sampling because
it's a very serial task. You gotta like have little walkers that travel. You're
simulating a sort of particle in this landscape, whereas we use
literal particles to do that job, right? So
a GPU really got lucky, right? A GPU was not
imagined from first principles to be a processor for
AI. It was a graphics processor that did
really well with matrix multiplications. And
it turns out that, you know, these transformations that I was talking about
to morph a simple distribution into
a complicated one, a lot of those transformations, the
big computational element, are matmuls, matrix
multiplications, right? And so GPUs are accelerators for
that. And so most attempts that you've read
in the news or over the past several years that
have been trying to accelerate software
for AI, they've been focusing on accelerating matrix
multiplication, which first of all, you're competing with NVIDIA. Good
luck with that. Jensen will eat your lunch and thank you for it. But
Trevor, you have some first principles reasons why you think that. And
from a back-of-the-envelope argument, you know, any sort of matrix multiplication
accelerator has a fundamental bottleneck, and
it's not worth necessarily pushing in that area. It's much more
interesting to try to disrupt how
you do the entire algorithm rather than just a subroutine, right, Trev?
Yeah, so the basic reasoning here is if
you go into PyTorch or something and profile a
neural network like a transformer, right, what you'll
find is that you spend about 25% of your time
moving things in and out of memory, right? So what
that means is if you accelerate the other 75% down
to zero time using your fancy accelerator, maybe some
kind of optical MatMul accelerator, right, that literally does the math at
the speed of light, you still only have a 4x speedup because
you're still paying the 25% of time to move things
in and out of memory, right? So accelerating part
of an algorithm only ever gets you kind of a modest speedup.
And so you do a lot better if you look at tasks that are much more
compute bound, like sampling. So
that was kind of another reason we thought this
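A quick back-of-the-envelope check of the 4x figure Trevor quotes, using Amdahl's law and assuming his example numbers (25% of wall-clock time in memory movement); the helper function is just illustrative.

```python
def max_speedup(accelerated_fraction, acceleration_factor=float("inf")):
    # Amdahl's law: the non-accelerated fraction bounds the overall speedup.
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / acceleration_factor)

# If 25% of the time is memory traffic and the remaining 75% (the matmuls)
# is made infinitely fast, the best case is a 4x end-to-end speedup.
print(max_speedup(0.75))  # -> 4.0
```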
Is there a reason that you can't do, so I'm, this is maybe skipping ahead,
we haven't really talked about this yet, but you hinted at it when you said that, hey, this
Gaussian, whatever, this like normal distribution
thing isn't gonna be the answer for the future. Like
you wanna do different types of probability distributions with your
chips, right? And can you talk a little bit about why that is? Like what is
so wrong about this normal distribution? And then why
can't we just do those other distributions with normal, like analog or
Yeah, that's a great question, actually. We
use Gaussian or normal distributions, right? It's basically what
is known as a bell curve. We use those all the
time because we can actually keep track, like
fundamentally, what is a Gaussian? It's like a blob, there's
where is it in space, and then how is it squished along which
axis, right? And by how much, right? So
the squishing is a matrix called covariance matrix, and the position is
called a mean, right? And if you have that vector and
a matrix, you fully specify the distribution. So essentially,
it's a way to cheat and have deterministic computers represent distributions,
because they just need to store a matrix and a vector. And then they have
a proxy for the distribution. And you can sort of analytically, for
many simple transformations, keep track of how the
Gaussian gets morphed, right? And these tricks
are actually why diffusion models work so well, right? Diffusion
models, they approximate every transformation as like a
slight transformation of a Gaussian. And
so essentially, it's kind of an artifact of them being
some of the, I mean, obviously the simplest distributions you can come up with. And
essentially being easily representable by a classical computer. If
your computer can natively represent much more complicated distributions, we
wouldn't have that sort of bias, right? And the
problem is, you know, there's distributions that have much longer tails, right?
They're not just so concentrated around one
mode. They have all sorts of, you know,
blobs and long tails where, you
know, a very, you know, very low likelihood event
still has, like, a non-trivial probability, whereas Gaussians, as
you get far away, you know, they get, like, more than exponentially low
probability as you get away from the mode. And so, you
know, many machine learning algorithms are
really good at modeling the typical case, right?
And we feel this with LLMs. Right. They're kind of like basic. Right.
Like they're really good at like typical things, but like, it's like, I
need this sort of like edge case. I
need this sort of edgy thing. Like they can't, they can't go there with
you. Right. But human brains can. Why is that? It's so weird. Right. Like, just
like, if you're driving and you
see something that's never been in a dataset on the road, you don't
like glitch out. You like, you generalize. Right. And so.
Fundamentally, it's like the constraints of
the hardware, deterministic hardware has constrained our thinking in
terms of where the algorithms are going and
where they should stay. And that
has sort of held back AI. So something, you
know, our ultimate goal here by proposing new hardware is
to also disrupt how software and AI
works and which algorithms tend to dominate and do well when
But what is it about those algorithms that make it impossible or
impractical to model them using a classical computer? It seems like,
I don't know, when I was a stats minor, I could do a little binomial
plot, you know what I mean? That's a non-Gaussian distribution. What
I mean, if you try to sample directly from a hundred million dimensional distribution,
right, you know, directly without using
Well, the fundamental reason, right, is like, if I have,
let's not go to a hundred million dimensions. Let's start with one and two.
If I have a one dimensional distribution, right, that's just a
function in 1D. So I can slice that function up
into n chunks and store those n chunks in memory, and now I have a
representation of the distribution. Now I go to two
dimensions. Instead of having n chunks, I have a
grid of n squared chunks. So now I
have to store n squared things in memory to represent the distribution
in generality. What if I go to d dimensions? If
I have n slices along each dimension, I have a d-dimensional hypercube
of things to store in memory, which grows really fast, right? So the
general point here is that the complexity of representing a
totally general probability distribution tends to grow exponentially
in the dimensionality of the system. Right. And so, um,
and obviously there's like a lot of caveats to that argument because the
representation, uh, like the complexity of the distribution doesn't
have to be exponential, but it can be. And that's kind of the key thing that makes
this difficult on a classical computer. Um, you
can't store them in memory. So you have to sample and sampling has all of
these problems I discussed earlier. So dude, you're just
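A minimal sketch of the grid-storage argument Trevor walks through above, assuming 100 bins per dimension and 4 bytes per stored probability (both numbers arbitrary, for illustration only):

```python
BYTES_PER_CELL = 4  # e.g. one float32 probability per grid cell (illustrative)

def grid_cells(bins_per_dim, dims):
    # Tabulating a fully general distribution on a grid needs
    # bins_per_dim ** dims cells -- exponential in the dimension.
    return bins_per_dim ** dims

for d in (1, 2, 3, 10, 50):
    cells = grid_cells(100, d)  # 100 slices along each axis
    print(f"d={d:>3}  cells={cells:.3e}  memory={cells * BYTES_PER_CELL:.3e} bytes")
```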
And so is this something that, was this thought, like this kind
of train of thinking, was this what led people to want quantum computers in
the first place? It's like we can represent these super high dimensional aspects
of reality by just remaining high dimensional, by
Yes. Yeah. So that was a big, so back
in our days in quantum computing, I would just keep hammering home: don't
use a quantum computer for probabilistic machine learning or classical machine
learning, as we call it, because quantum computers are really good
at quantum interference, not necessarily probabilistic inference. Yes,
you can. It's kind of like using a
rocket that's on, you know, rockets are finicky and
less reliable to ship something across town. It doesn't make sense,
right, intraday, right? It doesn't make sense. It is gonna blow up. You
know, there's a chance it's gonna blow up. Like, why would you do that, right? Sure,
like, in principle, it could go much faster, but, like, there's
a chance it could blow up. So what we've seen is sort of, yes,
on paper, a quantum computer can do slightly better
for probabilistic inference. I've written a bunch of papers on this, because I
wanted to, like, rule it out properly, right? So I've spent the
last eight years, I guess I put out my last paper in this space, a
week ago for fun, because it was on my shelf for two years, but I
thoroughly studied, can you do classical machine learning on a quantum computer? It
seems to me like the main advantage is instead of
having sort of jiggly particles that hop above sort
of barriers in landscape, you can tunnel through. So there's an
advantage if your landscape has very thin barriers, because you
have a form of quantum tunneling, right? So sometimes,
like in very special cases, you can find an
optimum a bit faster, but when you do the whole systems
thinking, the full stack thinking of like, okay, I
have a quantum computer, I'm gonna have an error correcting system that's like 99% of
the computer, 99% plus of the computer is the error correcting system, and
I'm gonna have the cooling, and I'm gonna have the control systems, why the
heck am I going through all that trouble for this tiny speedup, right? So basically, it's
not worth using a quantum computer for these sort of low order polynomial speedup,
these sort of, hey, you know, like, I
get a square root speedup, and it's still slow,
it's still relatively slow, and I have to prop up this huge
computer to do it, when you could do it
just, you know, much cheaper on even a classical digital
computer. And so in our case, instead of trying to seek
sort of asymptotic, what is called asymptotic speed ups, like in quantum
computing, like there's, there's literally different complexity classes,
if you have a quantum mechanical computer versus a classical computer, we're
just, we're not trying to violate any sort of laws of complexity theory, we're
just doing, you know, classical algorithms, algorithms
that you can simulate theoretically on a classical computer, we're
just doing them way faster by a large, like
sort of constant factor speed up, right? And
that constant factor speed up is several orders of magnitude,
sometimes more than can fit on one hand, depending on the algorithm.
But before we pin down exactly
what those speed ups are in the public, we want to put
out some careful scientific works. So stay
tuned for that. But it's very substantial. It's enough that it's
worth going through this exercise of rebuilding
the whole stack from first principles, right? That seems like a huge change,
right? We're taking a fork in the tech tree. We're forking off
the root node. That seems like a huge effort. Is it worth the payout? We think so,
at least from first principles. And so that's why we're really
excited, you know, and that's why we're kind of, you know, we've been
very secretive. Unfortunately, as
we know, I got doxxed in December. The plan was always to reveal more in
March. And so here we are. So it's right on schedule.
But our goal here is for people to be open minded about
the future of AI. I know right now it just feels like the
current labs doing LLMs, that's the endgame, that's the future. They've
captured the market. It's over. You either work for one of these companies or
you missed out, right? I don't think so. That's the beauty of
disruption. That's the beauty of this sort of techno-capital acceleration. A
couple of crazy kids, you know, with one
or two GPUs can have an idea that can, you
Yeah, that drive, like the reason behind
that makes a lot of sense. I think that the promise of
quantum computers, at least the way that I understood it, was that eventually they're
going to be so, they're going to get, you know, n squared number
of operations in the same amount of bits or whatever. So
we're going from bits to qubits. And when we have qubits, like
pretty soon we're going to have quantum supremacy because you can see like even
the biggest classical computer will be so much smaller than this puny, you
know, or even this very small quantum computer. But
there are problems. There are things that it's not simply captured in
the number of bits or qubits. There are other considerations that
you have to have when building a quantum computer. And I imagine that you two probably have
very strong opinions about that. So I would love to ask you about that. Maybe Trevor,
So for starters, it's funny you mentioned quantum
supremacy. The way we achieved quantum supremacy back in the
We were there. We were there a thousand years ago when it happened. Only
Yeah. Back in the day. The problem
that I have with quantum computing, and the main reason I stopped working
on it, is because most of the phenomenon that
are important to humans do not have long-range quantum
coherent effects. So all atoms are governed by
quantum mechanics, but things that are macroscopically observable, that
involve a large number of atoms, don't
need to be simulated on a quantum computer, right? Like our classical models
of them work really well. And so that's one of the fundamental reasons
it's been difficult to find a practical advantage in
quantum computing, right? Like we have these kind of, you know, there's like
Shor's factoring algorithm, which is like the most common
thing people tout that it's going to like break RSA and whatever, and it
might, but we can just use a different crypto protocol
that isn't broken by a discrete log and
such. So it's very unclear,
even if we had a big quantum computer, what you would do with it.
And that, to me, kind of made it hard to
dedicate my life to it, right? Because the physics
and engineering challenges involved in building quantum computers are
extremely formidable. And after you do that for
You could see that Trevor worked close to the metal, where,
you know, the challenges are extremely hard. And, you
know, I was a theorist and an algorithmist, you know,
a bit isolated away from the difficulties. I was aware of it
because I would talk to my neighbors and so on. But, you know, the ideal
thing with quantum computers is that they can represent
and sample from states of very high quantum complexity, right?
So, if you have a state of very high complexity, but
it's still, you could still sample from it with Monte Carlo, you
could just run, again, a Monte Carlo algorithm, maybe it's a million times slower than
doing it by nature, but it's still, you're still going to
get there. You know, you just throw a lot of compute at it, you're going to get there, you
know, it scales linearly. The thing with quantum complexity is
that it scales up in some cases super exponentially, like
in terms of how much classical compute you need to use in
order to replicate that distribution. What was achieved in the
Google quantum supremacy experiment and then later surpassed by
Chinese simulations and then reiterated by Google
quantum, so it's been kind of a little race there, but
essentially it was just sampling from any sort of quantum
program that you can't sample from with a classical
computer, even if you were to throw most compute on earth towards
that end. And that was achieved, I would say,
so I don't think there's anything stopping us from achieving
that. I know it's still controversial, but essentially
the promise there was that, okay, if we can show we can sample
from these complicated distributions, right? The narrative, at
least for the quantum AI side, was that, okay,
well, if we have these classes of distributions, maybe
we could search over that space and represent very highly complex states
in nature with these complicated distributions on our computer, right?
And then map one to the other, and that unlocks the ability for us
to sort of understand matter at a quantum mechanical level.
There were a lot of challenges to train such
distributions because when they get really complicated, they get really
hard to train. It kind of is a sort of conservation of difficulty. So
until the hardware gets much better, it's very hard for
you to use a quantum computer, even if you're trying
to just generally model nature in sort of native fashion,
right? You're trying to model quantum mechanics of nature with a quantum mechanical computer,
running a quantum mechanical AI representation. It's still difficult because
if the computer's not reliable enough, you can't make it big enough, you can't run
the easily trainable representations, and you're kind of screwed. And
so from the algorithmic standpoint, it was also sort of doomed
in the near term. I'm more of an optimist than Trevor. I think,
you know, humans are really smart. I think on a 20 year time
scale, people are going to figure it out. But again, for us, it's like, okay, we're
trying to do all these applications where it's not clear that
you need quantum complexity, right? Really what you
just want is a computer that allows you to do probabilistic
machine learning and optimization very
cheaply, very energy efficiently, and very fast. And
for us, it's like from first principles of thermodynamics of computing, it's
not going to be a digital computer, it's deterministic, it's not going to be a quantum computer, it's
going to be a thermodynamic computer that achieves that. And so that's what we
got to build. And so that's why we left all the secret labs. You
know, I was at Google X working for Sergey and Trevor
was at a different black ops lab in Santa Barbara. And
then we joined forces. We both kind of left that. And
now we're here. And now we have this thesis that
we've kind of kept close to the chest. But, you know, now
we're telling it to the world and we're asking if people want to join us. And
Yeah. And one more point on quantum computing that's interesting
in contrast to what we're doing at Extropic. To build
a quantum computer, you have to build some really weird system at large
scale. So that might be superconducting circuits where you're
making Josephson junctions, which are not new, but at
least a relatively new object, you might be doing neutral
atoms where you have to build these big arrays of optical tweezers and tables
and tables of lasers. Trapped ions is very much the same thing.
My point here is that the kind of manufacturing and supply chain
for all of these things is extremely immature. And so there's
going to be decades of challenge just there, achieving scale, right? Versus
if you build a thermodynamic computer, what you need to do that
is a noisy circuit. And I can think of lots of ways
to make noisy circuits that lean heavily on
the way we know how to make circuits today, right? Like the whole semiconductor industry.
So that's ultimately what we're chasing here is something that
we can do, you know, in this decade, not several decades from
Yeah, so that's a perfect tie-in. Let's just hop right in and
start talking about what you guys basically announced in this light paper. I
mean, you mentioned Josephson junctions. Those appear prominently
in the light paper. We
talked earlier about having to keep quantum computers extremely cold, and
I believe that that also is true of this first wave that you've announced
here too. I don't know, at first blush, I would imagine some
people would think, well, it sounds kind of familiar. What you just said is like, you know, hard to do.
So what's the value there?
We're starting within our neighborhood and we're taking a path
to sort of room temperature and large scale manufacturability, right? We're going
from the bottom up. We're going from the very cold, using similar building blocks
to what we're used to engineering in quantum computing, and operating
them in a thermodynamic regime where there's no more quantum coherence. There's
no superpositions of states anymore. It's just fuzz,
probabilistic fuzz over states. And that's where we're operating the
devices, right? And for us, it was just our native
language. It was the first sort of concept of a programmable and
parametric thermodynamic computer we thought of
building. And that was basically our first prototype. And
for us, there's a lot of learnings there of like, how do you even program this thing?
How do you map all sorts of applications to it? What is programming gonna
look like? How fast can it get, right? And showing
the world, hey, this is how efficient you can have neural
computing, computing for AI be and how fast it can be,
fast and efficient, speed and efficiency. It's
very similar to the Tesla Roadster, super expensive, very
exotic, had to import a bunch of parts from all
sorts of suppliers, wasn't vertically integrated yet. And
then that's a stepping stone towards large-scale mass production, in
our case, eventually a room-temperature chip that we're going to build.
And we have a roadmap to that. And so for now, we're
just showcasing the world what's possible. Hey, you
have this new paradigm that's coming, we have a first instance of it, but
we have a roadmap to get to sort of having a thermodynamic
Yeah, like in CMOS. So like, you
know, the same way you make your digital computer that you're likely watching this
on, we know how to make thermodynamic computers using
the same manufacturing technology that operates at room temperature, which
So how do they work? What's the, you know, what's the,
like, can you explain the 101 of what is
Yeah, I mean, let's talk about, let's focus on the superconducting chip,
that's the one we're disclosing, the CMOS stuff, you know, we're keeping
on a high level for now. It's the same concept, a lot of the software maps over
in thinking, but it's, you know, just like in quantum computers, you can have
different substrates, right? There's optical, ion trap, you
know, photonic, superconducting quantum
computers. There are many ways to do a thermodynamic computer. This is
a first way. There's gonna be a better way later, but for now,
we're talking about this one. So this one, Trevor's
going to give you a much better, more technical explanation. But essentially, we're
just using jiggles of electrons that happen in superconductors. In
superconductors, electrons like to bundle up. They like to pair up.
They're called Cooper pairs. And when they pair up, they can pass through each
other. That's why there's sort of no friction, you know, there's no
traffic congestion for your electrons in superconductors. That's why they're
superconducting, right? They have way less resistance. For us,
the superconducting aspect is more to have a
sort of non-linearity in the landscape. So that means it's not
a simple Gaussian, right? So if you have a simple LC circuit like
you do in high school, you know, and you add some
noise to it, you're going to get a Gaussian out of it. But we didn't want
that. We wanted a programmable, super general, fully general landscape.
Essentially, what we do is something called energy-based
models. I'm more on the software side. Trevor's going to give you more
of the hardware side picture. But energy-based models
are models where you try to model data
distributions as equilibrium states called Boltzmann
distributions of certain parameterized landscapes. So
essentially you shape some hills, right? We
have little knobs that we could tweak and it changes
the shape of some hills and we pour some
sort of, you know, just a bucket of bouncy balls
and keep shaking it as we go, right? And that's it,
right? And then the algorithm is just changing this landscape over
time and the bouncy balls kind of flow. But, you know, on average, the
probability mass of where your bouncy balls are kind of changes and
we guide those bouncy balls. For us, the bouncy balls are literally
electrons, but theoretically, you
can make it out of all sorts of other stuff. But in
our case, that's it. Essentially, we have a programmable probabilistic
computer that has parameters that you can train in
order to morph this sort of equilibrium distribution of
the bouncy balls by morphing the sort
of landscape in which they're dancing. And
we have algorithms that are physics-based to tune that sort of landscape
that correspond to machine learning, you know, like cross-entropy
minimization, which is what transformers do and diffusion models do,
amongst others. And so essentially there's a
connection between, you know, machine learning really is operationalizing
information theory, information theory and entropy, right? The
theory of entropy from Claude Shannon appears in thermodynamics as
well, right? So we're instantiating information theory as
thermodynamic processes. And so that's the bridge between machine learning and thermodynamics.
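To make the "programmable landscape plus jiggling bouncy balls" picture concrete in software terms, here is a toy energy-based model: parameters shape an energy function, and a Metropolis sampler plays the role of thermal noise, so samples settle into the Boltzmann distribution exp(-E/T). This is a generic simulation sketch, not Extropic's hardware or their training algorithm; the double-well energy, parameters, and step sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x, theta):
    # Toy parameterized landscape: theta shapes the hills and valleys.
    # The model distribution is p(x) ~ exp(-energy(x, theta)).
    a, b = theta
    return a * (x ** 2 - 1.0) ** 2 + b * x  # double well, tilted by b

def metropolis_samples(theta, n_steps=10_000, step=0.5, temperature=1.0):
    # Software stand-in for thermal jiggling: random proposals accepted
    # with probability exp(-dE / T), so samples settle into exp(-E/T).
    x, out = 0.0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)
        dE = energy(x_new, theta) - energy(x, theta)
        if dE <= 0 or rng.random() < np.exp(-dE / temperature):
            x = x_new
        out.append(x)
    return np.array(out)

samples = metropolis_samples(theta=(1.0, 0.3))
print(samples.mean(), samples.std())
```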
Trevor, do you want to go, Trev? Yeah. Sure. I
mean, actually, you kind of have absorbed my talking points at
this point, so that was pretty close to what I
would say. I'll add a layer of
generality. In a sense, any circuit you build experiences
thermal noise, so that the charges that are moving around your circuit
are getting battered around by vibrating atoms. So
every circuit you build works that way. The
trick in designing something that's not
just kind of noisy, but very noisy, is
that you have to make sure that the noise is
significant compared to the other energy scales in
your device. Right, so that's kind of where the device physics
and hardware engineering, more hardcore stuff
kind of comes in. But once you figure out how to get into
that regime, basically all you need is some kind of circuit component
that's tunable, that lets you kind of change where
the electrons prefer to sit. And that
gives you a programmable sampling machine,
basically, right? So the principles at play here are
pretty generic and you can imagine a ton of different ways
to build it. And so we're just kind of thinking like, well,
what's the most scalable thing we have? Semiconductors.
So basically, I think the thing that is still confusing to me is like,
what's the input and what's the output? So the input, as I
understand it, is you're giving like weights or whatever to each
of these things, each of these, what would you
I mean, it's like a neural network, right? You have inputs, like data, and
then you might have outputs, and then you have parameters. And those are
So you have the parameters which you input, which are the ways
To be more concrete, I think that'll be helpful. You could think of
like data and parameters as voltages, right?
So I apply some voltage to the circuit, which changes
how it behaves in some way, right? And that changes the distribution that
the charges will follow, right? And when you want
to take something out of the circuit, what you're doing is
observing it. So the circuit will
have a bunch of degrees of freedom that are kind of moving randomly under
the influence of thermal noise. And basically what I can do is I
can hook an amplifier up to the circuit and measure one of
those signals. And so doing that lets me observe
the random dynamics of the circuit. And if I do that over
and over again, as long as I leave a long enough time in between observations,
In the bouncy ball analogy, right, you have your landscape, you poured a
bunch of bouncy balls, still shaking a bit in this landscape, right?
Eventually it would equilibrate to some sort of distribution. Sampling
is like applying a sort of porous grating on
top and letting a bouncy ball sort of hop out. And
from that, you can infer where that bouncy ball comes from. That gives
you one sample, one bouncy ball from the probability mass
of where they all are, right? So that gives you one snapshot. If
you take many snapshots, there's all sorts of algorithms where you
can use those to build what are called estimators of where the
distribution is, as a sort of
signal, either for learning or for inferring what values
you're predicting, right? That position could be like the
value of a pixel in an image, right? But
you have probability distributions over everything, right?
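A hedged sketch of the "many snapshots, then estimators" step: repeatedly read out a value, then build simple estimators (a mean, a histogram) from the snapshots. On real hardware the samples would come from the measurement chain; here a numpy random generator stands in for the device, and the bimodal distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def read_out_once():
    # Stand-in for one amplified observation of the circuit's state.
    # A fake bimodal value plus a little readout noise (illustrative only).
    return rng.choice([-1.0, 1.0], p=[0.3, 0.7]) + 0.1 * rng.normal()

def estimate(n_snapshots=1_000):
    snapshots = np.array([read_out_once() for _ in range(n_snapshots)])
    # Simple estimators built from the snapshots: a mean (e.g. an expected
    # pixel value) and an empirical histogram of the distribution itself.
    hist, edges = np.histogram(snapshots, bins=20, density=True)
    return snapshots.mean(), hist, edges

mean, hist, edges = estimate()
print(mean)
```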
Yeah. Interesting. So is that, so the
translation from thermal land back to normal digital
land, I assume still has to happen. Like you still, in order to show something
on my screen, which is a digital screen, like I need to get those
values out. But that's what you're talking about right there. You're saying that whatever we're
basically, you're able to sample and pull out these electrons
or whatever they are, see where they fell, and then that
gives you the value that you need for like a color or for a letter
You're going from this cloud of bouncy balls to, okay, this one is
definite, now I have it, it came from here, right? So, that's a deterministic sample
out of a probabilistic sort of distribution, right? And
so, and there's this old, Yeah,
there's this old thought experiment called Maxwell's
Demon that if you observe a
probabilistic system, it's going to cost you energy to get that information. So for
us, our goal, instead of having to sample
a lot from the device and always have to relay things with
classical computers, we're trying to do as much as
possible natively in probabilistic physics because that's much lower power.
It's going to cost less energy because observing things costs energy.
And that's sort of like what, so one of the things
I understand is like wrong or whatever with quantum computers is that step basically.
Like how do you get the thing out of the quantum, like qubit representation and
put it into a normal, like classical bit. And so are
there similar problems that you run into in this thermal world versus in
the quantum world with that, like basically with the, you
know, that pulling out of the other regime into the normal regime? Like
I imagine in this world to be more specific that you're,
the thing reading the voltage or whatever could be itself noisy. And so you
don't know whether you're actually getting the value that you intend to
get in the translation step out of the thermal system.
So that's where the real work comes
in, right? Is how do I design these various circuits to
In quantum computing, it's called the readout problem, right? It's like I'm
at the quantum regime where we're down to literally few
quantums of energy, right? That's where the word quantum comes
from. We're at a few more energy
packets than a single quantum, a bit higher energy than that. But
still, for us, it becomes a problem of amplifying that signal, right?
So ideally, you don't want to have to
have your observations get off of your thermodynamic computer or your quantum computer
into a classical computer and then back. That's very slow. In fact, that's been
a problem with most quantum computers today. If
you try to use them for these sort of quantum deep learning
algorithms, that iteration loop to optimize your parameters is way
too slow. Getting those samples out and then getting
that feedback loop update is way too slow. And so our
insight is that eventually we want to do that as physics, as
a physical process in the device. And so,
And basically, whenever you want a signal to
travel far, you need to amplify it a lot, right?
Because, like, when a signal has to physically travel
further through some weird environment, right? More noise hits it.
And so what's kind of interesting about our approach is you could imagine putting
a lot of this stuff in the same package, right? It's
CMOS all the way down. So potentially we'll
Huh. Wow. Okay. This is sort of breaking
my brain, but in like a good way. Like I, it's like, it's coming together.
Like, I think, I think I'm picking up what you guys are putting down. Um, I,
so I have a question about like, basically if there's an analog
to like coherence here, like quantum coherence is obviously a
big problem where, you know, the thing just collapses and
it like loses its quantum properties, basically. Do you, does
that happen to you guys when things are, um, like too
big basically because there was a part of the light paper where you said we
got to keep it small, we got to keep it low power, because then these crazy
So we explicitly don't have quantum coherence, to
be clear. Quantum effects are actually important in
transistors. That's one of the things that limits how small you can make
them is quantum tunneling. But there's a difference between observing
a quantum effect and having a coherent quantum state. Quantum
tunneling in CMOS is not coherent because it's at room temperature. So
that's one tangent. I think the closest analogy
we have to that is if you have a device that's too
big in the right sense of big, you end up
with technically still probabilistic, but
I would say metastable systems in a sense that
if I have two wells with a giant energy barrier between them,
It's very unlikely that thermal noise is going to ever kick you over that barrier, right?
So that thing is going to look more like a digital bit. And so you have to
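A rough numerical illustration of the two-well metastability point, assuming a simple Boltzmann escape factor exp(-ΔE/kT); barrier heights are expressed in units of kT and the numbers are arbitrary.

```python
import numpy as np

def boltzmann_escape_factor(barrier_energy, thermal_energy):
    # Relative likelihood of a thermal kick over a barrier of height
    # barrier_energy when the typical fluctuation scale is thermal_energy (kT).
    return np.exp(-barrier_energy / thermal_energy)

for ratio in (1, 5, 20, 100):
    # Barrier expressed in units of kT: a barrier around 100 kT essentially
    # never switches on its own -- it behaves like a stable digital bit.
    print(ratio, boltzmann_escape_factor(ratio, 1.0))
```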
So for us, it's like the opposite, right? So quantum coherence is
like, it's like the time until your quantum
computer starts to thermalize, until the noise starts seeping in and
affecting things. We want the noise to start seeping in as fast
as possible and for things to equilibrate as fast as possible. So we
have something called the thermalization time. And we want that to be fast.
So for us, it's actually the opposite. We want more noise and
it helps us go faster in many ways. And so
that's the lesson we learned. Instead of trying to extend coherence
times, it's like, hey, nature wants to thermalize. Let
it rip, right? And so let's use that. Let's
use that natural tendency as a building block for our algorithms.
And so it's kind of dual. It's like the opposite of coherence time.
It's like decoherence time. How fast can you decohere? Yeah,
exactly. So it's like switching sides, you know, half times.
Yeah. Very cool. So what I imagine that when
you both started out on this, and maybe this is a good time to talk sort of about your backgrounds, but
or like how you guys got to, we talked about a little bit at the beginning, but I'd
be curious, like, I imagine at the very beginning of this project, it
was like a glimmer of an idea. And you were like, okay, probably
it's not gonna work, but like, maybe it will. And if it does, how sweet would that
be? But now, you guys are so much further
down that path of like actually building some chips. Like I saw your
little chip at that party, Gil. And like, it's
so, you're so much further down the idea maze and
like the, you know, you must have more confidence now. So I'm curious to
hear, like how much more confident are you that this is actually going to
We have a lot of confidence that the
local neighborhood of ideas we're exploring here with sort of the intersection of
probabilistic machine learning and stochastic electronics is
the future of computing for AI. We
have a couple hypotheses of what that looks like, and we
got a couple bets in that sort of local neighborhood. We're not even married
to one substrate, as you can see. So even in terms of algorithms, we have
a couple bets there. But we're pretty sure that
something in this neighborhood is the future from first principles. And
we've built that conviction from doing these investigations and
having a larger team to sort of parallelize our learning over the
past year and a half or whatnot since the founding. For
me, this idea was a super slow simmer over
the past eight years. Well, eight years if you include time at
Extropic. And it was an idea that seemed so crazy that
I thought I had lost my mind or something. And I wanted to sanity check
by working in quantum computing, like, hey, we're going to learn a lot
about how to do physics-based computing and imbue AI
into physical systems with quantum
AI, and those learnings we'll bring to sort of this
alternative form of computation. And so, you
know, that's been a sort of slow exploration in the idea maze, like
Backburner idea, but then I think the point of
going all in and burning the boats, right? Like, turning
down every big tech job, selling everything, moving back home with the parents. Obviously,
I love my parents. But, you know, that
was a big move, right? Swallowed a lot of my pride, but then I had a lot
of skin in the game, right? It's like, I either make this work... You
only have one life, right? If you have an idea that you think is your
greatest idea of your life, that you think is gonna have the most impact to helping civilization,
you gotta go all in. And so, at that point, going
all in, was what we needed. And then,
you know, getting Trevor on board was a matter of time. Just
had to convince him to drop out from MIT. That
took a couple months. But, you know, and at that point, once he came
in, you know, things really accelerated. Because, you know, we've worked together before.
We shipped TensorFlow Quantum. Again, that was a similar
scenario. It's like, there are no adults in
the room. There's no guidance. The field didn't exist. Big
tech people were asking us where it's going and to
build it. And the best way to predict the future is to invent it
and build it. And that's what we did back then. And that's what we're
doing now again. So Trevor, do
you want to add to that? I'll look for my chip. I think it's somewhere
No, I think you pretty much covered that. I mean, personally, you
know, I feel that computing has to go this way. I've,
you know, I've been thinking about noise and computing and
how they might help each other, how they harm each other for basically like my
entire, you know, academic and adult life. So it's a topic
that's very near and dear to me. And when you kind of combine the
theoretical angles with the fact that we want
devices to be small, when things are small, thermal fluctuations
are important, and therefore devices become noisy, it
all seems kind of inevitable to me. And I
really think what we're doing is inevitable. So in
that sense, I have a lot of confidence. that
Totally. So I think that you mentioned
the category or whatever, like this category of ideas seems right. Ooh, there
we go. There's some hardware. A little metal
There you go. Old chip. Yeah. There you go. It's almost as
big as your people. That's wild. So this one's made out of
aluminum. That's like the easiest process to start with.
There's other fancy superconducting metals we can experiment with that can operate at
a higher temperature. But for us, we
came from quantum computing. This was kind of our lingua franca. And
we know how to build modular physics-based devices out of the substrate. So
it was somewhere to start. But, you know, obviously we have a long
road ahead because, you know, if you're an alien and you don't
have earthly supply chains that are already established, you
build your thermodynamic computer out of this, right? But of course, for
us to grow on a fast timescale, we've got to meet in the middle with existing supply chains and find sort of a middle ground. That's why we're going to silicon. I like to joke that this is the kind of computer aliens would build from first principles. But
of course, if you have the deep pockets to
scale up superconducting technology, you might be interested in this and you can give
us a call and we can work with you. Including the aliens, if they're listening to this. Give
us a call. But hey man, I don't judge.
So yeah, essentially, yeah.
So you mentioned that the broad category, you know, this idea is a part of seems like the right one, for physical reasons, for, like, the frustrations you had in quantum. Are you guys one of one? Are you the only people thinking about this? Or, I know that neuromorphic computing is broadly a category, but I don't know if it's somewhat applicable to what you guys are doing. Do you consider yourself part of that broader subfield, or more...
I think people have been obsessed with sort of biomimicry, right? They're like, well, if we obsess over every detail of how neurons actually work and we mimic that, something's going to work, right? We're going to figure out how to program it later. Whereas for us, it's like, no, no, no, that's not how it works. You've got to start with the algorithm and then, you know, both top-down and bottom-up engineer this bridge between what you want to do and the physics of the device. And that's what we're doing. We've established this sort of full-stack bridge. And that's so interdisciplinary. It's so difficult because you need to have, like, ML people talking to physicists, talking to compiler people, talking to hardware people. It's a very difficult effort, but we did it before in...
Give me a second here, a comment there. Computing kind of started as this abstract thing where a computer meant a Boolean logic machine, right? But in the 21st century we're actually starting to see things go a different way, in the sense that computing is becoming more widely understood as just embedding math into a physical process, right? This started to become really obvious in quantum computing, because the way that people have been successful in quantum computing to date is you start with the physics of your device and you see what kinds of computations it does relatively naturally, right? Those are going to be the things that are the highest performing. And back at Google, the things that we've gotten working on quantum computers to date are all things that very naturally map to the physics of the qubits, right? Pedram Roushan, Vadim Smelyanskiy, that's the kind of game they play there, and it's been very successful. So I'm kind of taking that approach to computing and bringing it to room-temperature devices that scale, right? And ultimately what that's going to do is hack every last drop of performance out...
All right, yeah, what else do you guys want to talk about?
Yeah, no, please. I had an analogy. So an analogy we like
to use about sort of biomimicry versus what we're doing, right?
If you set out to build a flying machine, right, you're
like, oh, well, you know, the proof of existence is out
there, we have birds, right? Birds flap
their wings. They use some form of physical principle we don't quite understand. Let's
make a plane that flaps its wings, right? And
that's going to be the device I make to achieve flight. On
the other hand, you can sort of go up the supply chain
of nature itself, right, of biology. Ultimately, biology
just finds a way to hack some sort of principle in
physics to its own advantage. And so, in this case, it was like
the physics of lift, right, or flight. And
so, when you build an airplane, an artificial system, you
just try to build the best system that
leverages that physical principle that biology found a way to leverage, without obsessing over biomimicry. So, neuromorphic devices are obsessed with
biomimicry. We just understand how
natural systems leveraged out-of-equilibrium thermodynamics
in order to do probabilistic machine learning natively
as a physical process, and we're building devices that are better
than nature. Like, our neurons, on the superconducting chip, are far more efficient than your brain's. Right? Which is astounding, because your brain is like tens of millions of times more energy efficient than GPU clouds today, and much denser. And we're going to be denser and more energy efficient than the brain, which may scare some people. But, you know, to us, in order to be able to understand and predict our world at all scales, there's just so much intelligence needed to scale civilization that we need to accelerate as fast as possible. We're trying to reach the end of computing, right? We're trying to reach the ultimate substrate for intelligence in terms of energy efficiency and density, because that's where everyone is going. And so we're like, let's just go there right away from first principles and see how far we can get. And I think we can get pretty far. So that's what we're going after. We're taking inspiration from nature, but really we're just trying to hack physics directly.
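(To make the "harnessing noise" point concrete for readers, here is a minimal numerical sketch, our own illustrative toy rather than anything from Extropic's stack: overdamped Langevin dynamics, where injected thermal-style noise plus an energy gradient makes the state wander so that, over time, it samples the Boltzmann distribution p(x) proportional to exp(-E(x)/T). The double-well energy and the temperature are made-up parameters.)

    import numpy as np

    def energy(x):
        # Illustrative double-well landscape; any differentiable E(x) would do.
        return (x**2 - 1.0)**2

    def grad_energy(x):
        return 4.0 * x * (x**2 - 1.0)

    def langevin_samples(steps=50_000, dt=1e-3, temperature=0.5, seed=0):
        # Euler-Maruyama integration of dx = -grad E(x) dt + sqrt(2 T dt) * noise.
        # The stationary distribution is the Boltzmann distribution ~ exp(-E(x)/T),
        # so the noise is not a nuisance here: it is what does the sampling.
        rng = np.random.default_rng(seed)
        x, out = 0.0, []
        for _ in range(steps):
            x = x - grad_energy(x) * dt + np.sqrt(2.0 * temperature * dt) * rng.standard_normal()
            out.append(x)
        return np.array(out)

    samples = langevin_samples()
    print(samples.mean(), samples.std())  # a histogram of samples puts most mass near the wells at x = -1 and x = +1

Crank the temperature up and the sampled distribution flattens; this is the software shadow of the physical picture being described, where the fluctuations themselves do the probabilistic work.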
What's interesting about this neuromorphic space, or
neuromorphic or physics-inspired computing, whatever you want to call it,
is that every different kind of device has
its own kind of natural set of algorithms that
it can accelerate, right? Because building a physics-based accelerator means
you're embedding some kind of math that you want the answer to
in the dynamics of an analog system. Right? And
so when you build dramatically different devices, they end up good at different things: maybe a quantum computer is really good at solving the Schrödinger equation, a memristor-array type of thing is really good at simulating memristor arrays, and our computer is really good at sampling from programmable distributions. So in a sense, the point I'm trying to get at here is there's room for a lot of different plays in this space, because every accelerator ends up being good at something different. So in that sense, I don't think there's any real competition out there.
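(For readers wondering what "sampling from programmable distributions" can look like, here is a small software analogy, a sketch under our own assumptions rather than Extropic's actual programming model: the couplings J and biases h of an energy-based model over binary spins are the "program", and a Gibbs sampler draws samples from the corresponding Boltzmann distribution. On the hardware the physics would do this natively; here we just emulate it.)

    import numpy as np

    def gibbs_sample_ising(J, h, sweeps=5_000, seed=0):
        # Sample spins s in {-1, +1}^n from p(s) ~ exp(0.5 * s.J.s + h.s)
        # by repeatedly resampling each spin from its exact conditional.
        # J (symmetric, zero diagonal) and h are the "program": they pick the distribution.
        rng = np.random.default_rng(seed)
        n = len(h)
        s = rng.choice([-1.0, 1.0], size=n)
        samples = []
        for _ in range(sweeps):
            for i in range(n):
                field = J[i] @ s + h[i]                    # local field felt by spin i
                p_up = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(s_i = +1 | all other spins)
                s[i] = 1.0 if rng.random() < p_up else -1.0
            samples.append(s.copy())
        return np.array(samples)

    # Illustrative 3-spin "program": ferromagnetic couplings plus a bias on spin 0.
    J = np.array([[0.0, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.0]])
    h = np.array([0.3, 0.0, 0.0])
    print(gibbs_sample_ising(J, h).mean(axis=0))  # per-spin average magnetization

Changing J and h changes the distribution you draw from; that reprogrammability, rather than deterministic arithmetic, is the computational primitive being described here.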
What do you think the main applications are going to be that fit today's world? So, you know, I imagine that you guys will invent some new stuff, some new software algorithms, your own software, but is there going to be a one-to-one analog for people that are doing normal models today, where they'll just be able to plug your thing in instead?
Yeah, we definitely want to support current-day, you know, deep learning and machine learning practitioners. Of course, for us, applications like large language models are applications we achieve at scale, right? When our devices scale, because there's "large" in the name of large language models, right? And there's a lot of machine learning models that are in some ways more valuable to businesses, models that live in the low-data regime, where you need to have probabilistic uncertainty about your predictions, right? Let's say you're doing a trade, you're pricing options, or, you know, you're trying to optimize a manufacturing process: every data point can cost millions, if not billions, of dollars. You don't have that many data points, and it doesn't matter how much compute you want to throw at it. You want the best possible answer, and you want to quantify how uncertain you are about your predictions. That's the sort of algorithm we're trying to enable in the short term.
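(As a concrete stand-in for the low-data, uncertainty-aware workloads being described, here is a tiny conjugate Bayesian linear regression, a generic textbook example of ours and not a claim about Extropic's software: with only a handful of expensive data points, the posterior gives you a prediction and an error bar on it. The data and the prior/noise variances are made up.)

    import numpy as np

    def bayes_linreg(X, y, prior_var=10.0, noise_var=0.5):
        # Posterior over weights for a Gaussian prior N(0, prior_var * I) and
        # Gaussian observation noise with variance noise_var (conjugate, closed form).
        precision = X.T @ X / noise_var + np.eye(X.shape[1]) / prior_var
        cov = np.linalg.inv(precision)
        mean = cov @ X.T @ y / noise_var
        return mean, cov

    def predict(x_new, mean, cov, noise_var=0.5):
        # Predictive mean and standard deviation at a new input.
        mu = x_new @ mean
        var = x_new @ cov @ x_new + noise_var
        return mu, np.sqrt(var)

    # Five expensive data points (illustrative); feature vector [1, x] gives an intercept.
    X = np.array([[1.0, 0.1], [1.0, 0.4], [1.0, 0.5], [1.0, 0.8], [1.0, 0.9]])
    y = np.array([1.1, 1.9, 2.2, 3.1, 3.3])
    mean, cov = bayes_linreg(X, y)
    mu, sigma = predict(np.array([1.0, 0.6]), mean, cov)
    print(f"prediction ~ {mu:.2f} +/- {sigma:.2f}")  # the +/- is the whole point here

The sigma matters as much as the mu: with five data points you cannot pretend the point estimate is the truth, and that uncertainty quantification is the kind of workload being pointed at.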
And so, did I cut out there? No? Okay. So that's the sort of algorithm we want to enable. And that's a different regime than the big data, big classical compute regime. We're in the extreme compute regime, but we harness a lot of that compute from nature, right? We get the probabilistic compute practically for free, just from heat. So we're going to tackle sort of low-data-regime probabilistic algorithms, right? And we think that these are actually in some ways more useful than large language models for businesses. Or at least a nice sort of dual to them, another type of machine learning that is synergistic. And so for us, sort of the LLM market is interesting. That's where we want to get to. But in the early days, it's going to be these sort of other algorithms that maybe are more popular in...
Let's just say that, like how there's lots of room in
device space for new things, I think there's lots of room in
AI space for new things, right? It's easy to get caught up in
what's most important today, but you have to take a second and
have some perspective. We're really early in
AI. These are like the birthing years, right? And so, I mean, there's never going to be an ultimate technology. There's always going to be a new and better thing. But, you know, looking
10 years down the line, 20 years down the line, that stuff
might have no resemblance to what we're doing today, right?
This is the next S-curve. We think that current AI,
it's scaling, it's very impressive. It's going to hit some
sort of saturation, might be the data bottleneck. I think definitely the
energy bottleneck, right? You've got
to move mountains. 7 trillion. 7 trillion, right? Just throw 7 trillion at it, it's going to fix everything. You know, we think there's a
better way. We're already working on the next S-curve, right, after
this one. So it takes time to ramp up to the level where
we're at the state of the art, but we know that, again,
by 2030 or so, we're probably gonna hit a wall in
terms of scaling down our current deterministic transistor technology
because they're gonna hit this sort of thermodynamic regime. Their wobbliness
is gonna be a problem. We're building beyond Moore's wall. That's gonna enable us to extend Moore's law in a sense, just for AI and probabilistic computing, not
for general computing, but that's still great because that's where
we want the extreme scale compute anyways. And
so to us, this is the most important thing we could be working on in our lives. And
every day we just wake up with insane levels of internal fire.
We're on a mission to save the world here, and
we kind of hyperstition this, and we're kind of in a position where it
is the case, and it's kind of surreal, of
course, and there's many other things going on in
our lives as well. Most people's prior is that if we had a successful movement, we couldn't also be successful in technology. But for me, you know, the cultural movement is great, and I want more people to join in on the optimism and to have life paths similar to ours, if possible, because then everybody would accelerate. But ultimately, for me, this mission is the one I'm most passionate about. And, you know, I'm all in on it, right? This is the meaning and purpose of
our lives. I truly believe that. And that
just gives a deep sense of satisfaction working on this stuff
every day and gives you near infinite
energy somehow. It just comes out of nowhere, right? Like if you have this infinite
goal and you're making progress towards it, it
feels really good. And so any other bump in the road, any challenge, it's
just a temporary setback on this road to
an amazing goal. And so I couldn't be more excited to finally talk more about it today. It feels very cathartic right now to talk about it on a sort of public podcast. These have been like our internal secrets, our thesis, for a while, but the reason we're sharing more with the world is so that people know, so people know it's coming. And if you want to work on this sort of stuff, if you're a talented builder, if you're ready to run through walls to do this, you should give us a call.
So we've got four new job postings, but really it's whoever wants to join us on this journey and believes in the mission, now that we've kind of laid it out, who should talk to us. And so, you know, our goal is for everyone to accelerate, and, in the ethos of some of our techno-optimistic thinking, we're putting our ideas out there. And hopefully the universe will reward us back for creating all this value, but we're going to keep going no matter what, because it's the most important thing we could be working on.
And by talk to us, he means apply to our job postings, because if you DM us on Twitter, it's very likely not going to get seen. Yeah, I get a lot of DMs. Yeah, yeah, yeah, yeah. So ideally the job postings. Yeah, so that's our goal today. I love it. Thanks so much, Christian.
Oh yeah, absolutely. I mean, you guys are the perfect people to have on for it. I feel like you name-dropped first principles many times throughout the podcast, which I am very, very happy to hear. Awesome. And I hope that people, well, people should know, if they didn't know this already, and it'll be linked in the show notes, but there is a paper we were talking about through this, so they've actually published some...
It's just a few ideas to get you thinking. It's...
Not yet. That's right. Well, they did have that conference in San Francisco yesterday that was like how to build a nuclear bomb. So who knows, maybe they're- Right. No, thank you guys so much for joining. This was awesome. And who knows, maybe when you get your next chips or when you publish a full white paper, we'll have another one of these and go...
That sounds good. Sounds good. Thanks so much. Awesome. Thanks, Christian. Thanks, guys. All right. Cheers.