Two handwavey ideas upon reading this:
- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that enable us to model that phenomenon. "A change in perspective is worth 80 IQ points", said Alan Kay.
- There is an analogue to how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology (abacus, mechanisms, computer, neural network)") that could be applied to other complicated areas of reality.
Maybe we can come up with smaller models that perform almost as well as the bigger ones. Could that just be PCA of some kind?
GPT nano vs. GPT 5, for example.
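To make the PCA idea above concrete, here is a minimal sketch on synthetic data, not on any real model. It assumes a 50-dimensional dataset that is secretly driven by 2 latent directions plus a little noise (the sizes, the noise level, and the helper name `explained_variance_ratio` are all made up for illustration) and checks how much of the variance the top two principal components capture.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # centre each column
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    var = s ** 2                               # proportional to component variance
    return var / var.sum()

# Hypothetical data: 50 observed dimensions driven by only 2 latent factors.
rng = np.random.default_rng(0)
n_samples, n_dims, n_latent = 500, 50, 2
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_dims))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_dims))

ratios = explained_variance_ratio(X)
top2 = ratios[:2].sum()
print(f"top-2 components explain {top2:.1%} of the variance")
```

Whether the weights or activations of a large network have this kind of low-rank structure is an empirical question; the sketch only shows what "a small number of vectors dominate" would look like if they did.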
Reminds me of the blog post about Waymo's "World Model". Training on real-world data results in a sufficiently rich model to start simulating novel scenarios that aren't in the training data (like the elephant wandering into the street), which in turn can feed back into training. One could imagine scientific inquiry working the same way.
It strikes me that many of these complex systems have indeterminate boundaries, and a fair amount of distortion might be baked into the choice of training data. Poverty (to take an example from this post) probably has causes at economic, psychological, ecological, physiological, historical, and political levels of description (commenters please note I didn't think too hard about this list). What data we feed into our models, and how those data are understood as operationalizations of the qualitative phenomena we care about, might matter.
> like the elephant wandering into the street
Or a dinosaur that looks like it might:
https://x.com/phatman_19/status/2030728278437491102
This "world model" concept has been a big deal in AI research, in LLMs.
He talks about the Santa Fe Institute and how they failed to carry their findings into the real world.
They did not.
They showed that for certain problems one could not do more than figure out some invariants and scaling laws. Showing what is impossible is not failure.
For the rest: modern gene networks and lots of biological modelling are based on their work, as are quite a few other things. That’s also not failure.
I agree that modern AI is alchemy.
True -- I didn't mean to communicate that Santa Fe was a failure writ large. Their contribution was very important!
Though I think it's fair to say that the torch was picked up and carried by others with a different set of strategies.
I disagree with the article. I think it is always possible to come up with reasonably small theories that capture most of the given phenomena. So in a sense, you don't need complex theories in the form of large NNs (models? functions? programs?), other than for more precise prediction.
For example: global warming. It's nice to have AOGCMs that have everything, including the carbon sink, in them. But if you want to understand, a two-layer model of the atmosphere with CO2 and water vapor feedback will do a decent job and gives similar first-order predictions.
I also don't think poverty is a complex problem, but that's a minor point.
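For readers who have not seen a layer model of the atmosphere, here is a rough sketch of the simplest textbook version. It is an assumption on my part that this is the kind of model meant above: it treats each layer as fully opaque to infrared and transparent to sunlight, and it has no CO2 or water vapor feedback at all. Under those assumptions the surface temperature for N layers is the effective temperature times (N + 1)^(1/4). The function names and default values (solar constant 1361 W/m^2, albedo 0.3) are illustrative choices.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def effective_temperature(solar_constant=1361.0, albedo=0.3):
    """Temperature at which the planet radiates away the sunlight it absorbs."""
    absorbed = solar_constant * (1.0 - albedo) / 4.0  # averaged over the sphere
    return (absorbed / SIGMA) ** 0.25

def surface_temperature(n_layers, solar_constant=1361.0, albedo=0.3):
    """Surface temperature under n_layers infrared-opaque atmospheric layers.

    Assumes each layer absorbs all infrared and no sunlight, so that
    T_surface^4 = (n_layers + 1) * T_effective^4.
    """
    t_e = effective_temperature(solar_constant, albedo)
    return t_e * (n_layers + 1) ** 0.25

for n in range(3):
    print(n, "layers:", round(surface_temperature(n), 1), "K")
```

Two fully opaque layers overshoot the observed mean surface temperature of roughly 288 K, which is why simple models in practice use partially absorbing layers and add feedback terms. The point of the sketch is only that a handful of lines already produce a greenhouse effect of the right order of magnitude.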
> I also don't think poverty is a complex problem, but that's a minor point.
I'm not sure it's a minor point. I don't think poverty is a "complex" problem either, as that term is used in the article, but that doesn't mean I think it fits into one of the other two categories in the article. I think it is in a fourth category that the article doesn't even consider.
For lack of a better term, I'll call that category "political". The key thing with this category of problems is that they are about fundamental conflicts of interest and values, and that's a different kind of problem from the kind the article talks about. We don't have poverty in the world because we lack accurate enough knowledge of how to create the wealth that brings people out of poverty. We have poverty in the world because there are people in positions of power all over the world who literally don't care about ending poverty, and who subvert attempts to do so--who make a living by stealing wealth instead of creating it, and don't care that that means making lots of other people poor.
It's an optimistic point of view. Still, when people use large neural nets to model physics, they also have a lot of parameters but they replicate very simple laws. So there's something deeper about this. Something like a simulation of theory.
Summary: good scientific theories have “reach,” which is not defined in any precise way. Reach has complexity and this can be handled with large parameter neural networks. Assumptions: mechanistic and deterministic worldview; epistemological perfection is the goal (perfect knowledge of facts).
This might be an unkind reading, but to me this just sounds like an attempt to reinvent the very same kind of mysticism that it mentions in the first paragraph.
“No need to study the world around you and wonder about its rules, peasant - it’s far beyond your understanding! Only ~the gods~ computers can ever know the truth!”
I shudder to think about a future where people give up on working to understand complex systems because it’s hard and a machine can do it better, so why bother.
Mark Cuban had a good line. I don't know whether he came up with it or someone else did, but he reportedly said:
"There are 2 types of people using AI: those who use it so they can know everything, and those who use it so they don't have to know anything."
Not the intention at all. The part about mechanistic interpretability was meant to gesture at how building such systems can provide a new toolkit for building further intuition and understanding.
Might we ever be able to distinguish between what is complex and what is complicated? Probably not, but I guess the author argues that this gives us a way forward because we can try to distill large models.
> You could capture the behavior of every falling object on Earth in three variables and describe the relationship between matter and energy in five characters.
What we can do is approximate. Newton had a good approximation of gravitation some time ago (force equals a constant times the product of two masses divided by distance squared; super readable indeed). But nowadays there's a better one that doesn't look like Newton's theory: Einstein's field equations, which look compact but nothing like Newton's. So what if, in 1000 years, we have a still better approximation to gravity in the universe, but it's encoded in millions of variables (perhaps in the form of a neural network of some futuristic AI model)?
My point is: whatever we know about the universe now doesn't necessarily mean that it has "captured" the underlying essence of the universe. We approximate. Approximations are useful and handy and will move humanity forward, but let's not forget that "approximations != truth".
If we ever discover the underlying "truth" of the universe, we would look back and confidently say "Newton was wrong". But I don't think we will ever discover such a thing; therefore, sure, approximations are our "truth", but sometimes people forget that.
Einstein’s equations look like Newton’s in the limit. It would be a little weird if we ended up having to add millions of additional parameters over the next thousand years. At the current rate we seem to get multiple years per parameter, rather than hundreds of parameters per year, right?
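For anyone who wants the "in the limit" statement spelled out, this is the standard weak-field, slow-motion reduction, sketched here with the usual textbook symbols (signature $(-,+,+,+)$, Newtonian potential $\Phi$, mass density $\rho$); none of the notation comes from the article.

```latex
% Einstein's field equations
G_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}

% Weak, static field and slow-moving matter: only g_{00} deviates noticeably
g_{00} \approx -\left(1 + \frac{2\Phi}{c^{2}}\right), \qquad |\Phi| \ll c^{2}

% The field equations then reduce to Poisson's equation,
\nabla^{2}\Phi = 4\pi G \rho

% and the geodesic equation reduces to Newton's equation of motion
\frac{d^{2}\mathbf{x}}{dt^{2}} = -\nabla\Phi

% For a point mass M, \Phi = -GM/r, which gives the inverse-square law
F = \frac{G M m}{r^{2}}
```

So the two theories differ in form, but the older one is recovered as a limiting case rather than discarded.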
This kind of view tends to logically conclude in the idea of a noumenal, unknowable reality. I think it's more reasonable to say that truth itself is a gold star we award to descriptions that suit our purposes. After all, descriptions are necessarily approximations (or reductions or "compressions"), since the only model of a thing with 100% fidelity is... the thing itself.
Agreed!
Connectionist models have lots of theory by theoreticians explicitly pissed off about Chomsky's assertion that there is an inbuilt ability for language. Jay McClelland's office had a little corkboard thingy with Chomsky mockery on the side, for example. Putting forth even the implicature that the present direct descendants are intellectual descendants of Chomsky is like saying Protestants are intellectual descendants of Pope Leo X.
Perhaps a failure of communication -- I was indeed attempting to say that Chomsky was wrong and his ideas were interesting, but more or less a dead end.
> Jay McClelland's office had a little corkboard thingy with Chomsky mockery on the side, for example.
I've never understood why the idea of linguistic nativism is so upsetting to people.
Very skeptical Adam Curtis hat on while reading this, but it is quite well written. Thanks & kudos!
I think this also creates a vulnerability where the more time and effort is spent crafting the “correct” solution, the easier it becomes to dismiss topics out of hand. Even if our modeling tools have changed, emotions and the human mind have not.
Let's gather authors of 15 different world languages together in a room and see if they can collaboratively write a short story. Surely their inability to do so will prove their inadequacy in their native language. /s
Simplicity brings us closer to truth — Occam's razor has underpinned the development of our species for centuries. It's enterprise, empire, and capital that feed off of complexity.
We're entering a period of human history where engineers and businesspeople drive academic discourse, rather than scientists or philosophers. The result is intellectual chicken scratch like this article.