— Evolution, AI, and the Five Breakthroughs That Made Our Brains

Select Quotes

Introduction

What is most striking when we examine the brains of other animals is how remarkably similar their brains are to our own. The difference between our brain and a chimpanzee’s brain, besides size, is barely anything. – Page 18

survival instincts, emotions, and cognition do not delineate cleanly—they emerge from diverse networks of systems spanning all three of these supposed layers. – Page 21

I wrote this book because I wanted to read this book. – Page 25

1: The World Before Brains

The fungal strategy was, by some measures, more successful than the animals’—by biomass, there is about six times more fungus on Earth than animals. But as we will continually see, it is usually the worse strategy, the harder strategy, from which innovation emerges. – Page 45

Active killing was, of course, not new; the first eukaryotes had long ago invented a strategy—phagotrophy—for killing life. But this worked only on level-one (single-cellular) life; level-two multicellular blobs were far too big to engulf into a single cell. And so early animals evolved internal digestion as a strategy for eating level-two life. – Page 46

These features of neurons—all-or-nothing spikes, rate coding, adaptation, and chemical synapses with excitatory and inhibitory neurotransmitters—are universal across all animals, – Page 58

All-or-nothing electrical spikes triggered rapid and orchestrated reflexive movements so animals could catch prey in response to even the subtlest of touches or smells. Rate coding enabled animals to modify their responses based on the strengths of a touch or smell. Adaptation enabled animals to adjust the sensory threshold for when spikes are generated, allowing them to be highly sensitive to even the subtlest of touches or smells while also preventing overstimulation at higher strengths of stimuli. – Page 58

Breakthrough #1: Steering And The First Bilaterians

2: The Birth Of Good And Bad

Radially symmetrical body plans work fine with the coral strategy of waiting for food. But they work horribly for the hunting strategy of navigating toward food. – Page 64

Bilaterally symmetrical bodies make movement much simpler. Instead of needing a motor system to move in any direction, they simply need one motor system to move forward and one to turn. Bilaterally symmetrical bodies don’t need to choose the exact direction; they simply need to choose whether to adjust to the right or the left. – Page 65

Steering in an organism that contains millions of cells required a whole new setup, one in which a stimulus activates circuits of neurons and the neurons activate muscle cells, causing specific turning movements. And so the breakthrough that came with the first brain was not steering per se, but steering on the scale of multicellular organisms. – Page 70

But it may not be a coincidence that the first successful domestic robot contained an intelligence not so unlike the intelligence of the first brains. Both used tricks that enabled them to navigate a complex world without actually understanding or modeling that world. – Page 75

This requirement of integrating input across sensory modalities was likely one reason why steering required a brain and could not have been implemented in a distributed web of reflexes like those in a coral polyp. All these sensory inputs voting for steering in different directions had to be integrated together in a single place to make a single decision; you can go in only one direction at a time. The first brain was this mega-integration center—one big neural circuit in which steering directions were selected. – Page 80

3: The Origin Of Emotion

These persistent affective states are a trick to overcome this challenge: If I detect a passing sniff of food that quickly fades, it is likely that there is food nearby even if I no longer smell it. Therefore, it is more effective to persistently search my surroundings after encountering food, as opposed to only responding to food smells in the moment that they are detected. Similarly, a worm passing through an area full of predators won’t experience a constant smell of predators but rather catch a transient hint of one nearby; if a worm wants to escape, it is a good idea to persistently swim away even after the smell has faded. – Page 90

The simple brain of the nematode offers a window into the first, or at least very early, functions of dopamine and serotonin. In the nematode, dopamine is released when food is detected around the worm, whereas serotonin is released when food is detected inside the worm. If dopamine is the something-good-is-nearby chemical, then serotonin is the something-good-is-actually-happening chemical. Dopamine drives the hunt for food; serotonin drives the enjoyment of it once it is being eaten. – Page 92

While the exact functions of dopamine and serotonin have been elaborated throughout different evolutionary lineages, this basic dichotomy between dopamine and serotonin has been remarkably conserved since the first bilaterians. In species as divergent as nematodes, slugs, fish, rats, and humans, dopamine is released by nearby rewards and triggers the affective state of arousal and pursuit (exploitation); and serotonin is released by the consumption of rewards and triggers a state of low arousal, inhibiting the pursuit of rewards (satiation). – Page 93

Dopamine is not a signal for pleasure itself; it is a signal for the anticipation of future pleasure. Heath’s patients weren’t experiencing pleasure; to the contrary, they often became extremely frustrated at their inability to satisfy the incredible cravings the button produced. – Page 96

The adrenaline-induced escape response is one of the most expensive behavioral choices an animal can make—the escape response requires a large expenditure of energy on muscles for rapid swimming. So evolution came up with a trick to save energy and thereby allow the escape response to last longer. Adrenaline not only triggers the behavioral repertoire of escape; it also turns off a swath of energy-consuming activities to divert energetic resources to muscles. Sugar is expelled from cells across the body, cell growth processes are halted, digestion is paused, reproductive processes are turned off, and the immune system is tamed. – Page 98

It is no surprise, then, that nematodes, other invertebrates, and humans all have similar responses to opioids—prolonged bouts of feeding, inhibited pain responses, and inhibited reproductive behavior. – Page 99

a nematode starved for only twelve hours will eat thirty times more food than their normally hungry peers. In other words, stress makes nematodes binge food. After binging, these previously starved nematodes “pass out,” spending ten times longer in an immobile state than unstarved worms. Nematodes do this because stress is a signal that circumstances are dire and food may be, or may soon become, scarce. Thus, nematodes stock up on as much food as they can in preparation for the next experience of starvation. – Page 100

spending energy escaping is worth the cost only if the stimulus is in fact escapable. Otherwise, the worm is more likely to survive if it conserves energy by waiting. – Page 102

Chronic stress isn’t all that different from acute stress; stress hormones and opioids remain elevated, chronically inhibiting digestion, immune response, appetite, and reproduction. But chronic stress differs from acute stress in at least one important way: it turns off arousal and motivation. – Page 102

4: Associating, Predicting, And The Dawn Of Learning

It seems that at the same time valence—the categorizing of things in the world into good and bad—emerged, so too did the ability to use experience to change what is considered good and bad in the first place. – Page 110

An early bilaterian that could remember to avoid a chemical that had previously been found near predators would survive far better than a bilaterian that could not. – Page 110

broken associations are rapidly suppressed but not, in fact, unlearned; given enough time, they reemerge. – Page 112

old extinguished associations are reacquired faster than entirely new associations. – Page 112

This is the benefit of spontaneous recovery—it enables a primitive form of long-term memory to persist through the tumult of short-term changes in the contingencies of the world. – Page 113

The first trick used what are called eligibility traces. A slug will associate a tap with a subsequent shock only if the tap occurs one second before the shock. – Page 115

The second trick was overshadowing. When animals have multiple predictive cues to use, their brains tend to pick the cues that are the strongest—strong cues overshadow weak cues. – Page 116

The third trick was latent inhibition—stimuli that animals regularly experienced in the past are inhibited from making future associations. In other words, frequent stimuli are flagged as irrelevant background noise. Latent inhibition is a clever way to ask, “What was different this time?” – Page 116

Circuits whereby different valence neurons can be integrated into a singular steering decision (hence a big cluster of neurons we identify as a brain) – Page 123

Breakthrough #2: Reinforcing And The First Vertebrates

5: The Cambrian Explosion

It was originally believed that the only way to explain such intelligent behavior in animals was through some notion of insight or imitation or planning, but Thorndike showed how simple trial and error was all an animal really needed. Thorndike summarized his result in his now famous law of effect: Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation. – Page 134

The second breakthrough was reinforcement learning: the ability to learn arbitrary sequences of actions through trial and error. – Page 137

6: The Evolution Of Temporal Difference Learning

Minsky realized that reinforcement learning would not work without a reasonable strategy for assigning credit across time; this is called the temporal credit assignment problem. – Page 139

At first, dopamine neurons did respond like a valence signal, getting uniquely excited whenever a hungry monkey got sugar water. But after a few trials, dopamine neurons stopped responding to the reward itself and instead responded only to the predictive cue. – Page 149

Figure 6.3: Responses of dopamine neurons to predictive cues, rewards, and omissions – Page 150

Dopamine neurons in Schultz’s monkeys got excited by predictive cues because these cues led to an increase in predicted future rewards (a positive temporal difference); dopamine neurons were unaffected by the delivery of an expected reward because there was no change in predicted future reward (no temporal difference); and dopamine-neuron activity decreased when expected rewards were omitted because there was a decrease in predicted future rewards (a negative temporal difference). – Page 151
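
Note: the three cases in this quote can be written out as a small temporal-difference calculation. This is only a sketch. The function name `td_error`, the reward size of 1.0, the absence of discounting, and the specific value estimates are all assumptions chosen for illustration, not figures from the book.

```python
# Minimal temporal-difference sketch (illustrative values, not from the book).
GAMMA = 1.0  # assumption: no discounting, to keep the arithmetic simple


def td_error(reward, value_next, value_now, gamma=GAMMA):
    """Actual reward plus the change in predicted future reward."""
    return reward + gamma * value_next - value_now


# Assumed value estimates after training, when the cue fully predicts
# one unit of sugar water.
V_BASELINE = 0.0  # predicted future reward before the cue appears
V_CUE = 1.0       # predicted future reward once the cue has appeared
V_END = 0.0       # nothing further is predicted after the trial ends

# Cue appears: prediction jumps from 0 to 1 (positive temporal difference).
cue_onset = td_error(reward=0.0, value_next=V_CUE, value_now=V_BASELINE)

# Expected reward arrives: the reward matches the prediction (no difference).
expected_reward = td_error(reward=1.0, value_next=V_END, value_now=V_CUE)

# Expected reward is omitted: prediction collapses (negative difference).
omitted_reward = td_error(reward=0.0, value_next=V_END, value_now=V_CUE)

print(cue_onset, expected_reward, omitted_reward)
```

Under these assumptions the sign of the result lines up with the three dopamine responses described above: excitation at the cue, no change at the expected reward, and a dip at the omission.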

Dopamine is not a signal for reward but for reinforcement. As Sutton found, reinforcement and reward must be decoupled for reinforcement learning to work. To solve the temporal credit assignment problem, brains must reinforce behaviors based on changes in predicted future rewards, not actual rewards. This is why animals get addicted to dopamine-releasing behaviors despite it not being pleasurable, and this is why dopamine responses quickly shift their activations to the moments when animals predict upcoming reward and away from rewards themselves. – Page 151

How can the absence of something be reinforcing? The answer is that the omission of an expected punishment is itself reinforcing; it is relieving. And the omission of an expected reward is itself punishing; it is disappointing. – Page 153

In our metaphor, the basal ganglian student initially learns solely from the hypothalamic judge, but over time learns to judge itself, knowing when it makes a mistake before the hypothalamus gives any feedback. This is why dopamine neurons initially respond when rewards are delivered, but over time shift their activation toward predictive cues. This is also why receiving a reward that you knew you were going to receive doesn’t trigger dopamine release; predictions from the basal ganglia cancel out the excitement from the hypothalamus. – Page 160

7: The Problems Of Pattern Recognition

This was the first problem of pattern recognition, that of discrimination: how to recognize overlapping patterns as distinct. – Page 166

This is the second challenge of pattern recognition: how to generalize a previous pattern to recognize novel patterns that are similar but not the same. – Page 166

Figure 7.6: Expansion and sparsity (also called expansion recoding) can solve the discrimination problem – Page 171

This trick is called auto-association; neurons in the cortex automatically learn associations with themselves. This offers a solution to the generalization problem—the cortex can recognize a pattern that is similar but not the same. – Page 172

Auto-association reveals an important way in which vertebrate memory differs from computer memory. Auto-association suggests that vertebrate brains use content-addressable memory—memories are recalled by providing subsets of the original experience, which reactivate the original pattern. If I tell you the beginning of a story you’ve heard before, you can recall the rest; if I show you half a picture of your car, you can draw the rest. However, computers use register-addressable memory—memories that can be recalled only if you have the unique memory address for them. If you lose the address, you lose the memory. – Page 172
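
Note: content-addressable recall can be sketched with a small Hopfield-style auto-associative network, a standard textbook model that may differ from whatever the book has in mind for the cortex. The names `train`, `recall`, `STORY_A`, and `STORY_B` are hypothetical, and the two eight-unit patterns are made up for illustration.

```python
# Hopfield-style auto-association sketch (a stand-in model, not the book's).
# Units take the values +1 or -1; a 0 in a cue means "this part is missing".


def train(patterns):
    """Hebbian learning: units that are active together strengthen their link."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, cue, steps=5):
    """Repeatedly let every unit take the sign of its weighted input."""
    state = list(cue)
    for _ in range(steps):
        updated = []
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            if total > 0:
                updated.append(1)
            elif total < 0:
                updated.append(-1)
            else:
                updated.append(state[i])  # no evidence either way: keep as is
        state = updated
    return state


STORY_A = [1, 1, 1, 1, -1, -1, -1, -1]
STORY_B = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([STORY_A, STORY_B])

# Provide only the first half of STORY_A; the second half is unknown.
first_half_of_a = [1, 1, 1, 1, 0, 0, 0, 0]
completed = recall(weights, first_half_of_a)
print(completed == STORY_A)
```

No address is ever supplied here. The partial pattern itself is the lookup key, which is the contrast with register-addressable memory that the quote draws.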

The auto-associative networks we described cannot identify an object you have never seen before from completely different angles. An auto-associative network would treat these as different objects because the input neurons are completely different. – Page 179

While V1 decomposes pictures into simple features, as visual information flows up the hierarchy, it is pieced back together into whole objects. – Page 181

Hubel and Wiesel had discovered two things. First, visual processing in mammals was hierarchical, with lower levels having smaller receptive fields and recognizing simpler features, and higher levels having larger receptive fields and recognizing more complex objects. Second, at a given level of the hierarchy, neurons were all sensitive to similar features, just in different places. For example, one area of V1 would look for lines at one location, and another area would look for lines for another location, but they were all looking for lines. – Page 181

His architecture first decomposed input pictures into multiple feature maps, like V1 seemed to do. Each feature map was a grid that signaled the location of a feature—such as vertical or horizontal lines—within the input picture. This process is called a convolution, hence the name applied to the type of network that Fukushima had invented: convolutional neural networks. – Page 182
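
Note: a feature map of this kind can be sketched by sliding a small filter over a picture. The 5x5 image, the two 3x3 kernels, and the name `convolve2d` are assumptions made up for illustration. As is common in neural-network code, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
# Feature-map sketch: slide a small kernel over an image and record,
# at each location, how strongly the local patch matches the kernel.


def convolve2d(image, kernel):
    """'Valid' 2D convolution without kernel flipping (cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# A 5x5 picture containing a single vertical line in the middle column.
image = [[0, 0, 1, 0, 0] for _ in range(5)]

# Hypothetical hand-written detectors; a trained network would learn these.
VERTICAL_KERNEL = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
HORIZONTAL_KERNEL = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]

vertical_map = convolve2d(image, VERTICAL_KERNEL)
horizontal_map = convolve2d(image, HORIZONTAL_KERNEL)

print(vertical_map)    # peaks where the vertical line sits
print(horizontal_map)  # stays flat: no horizontal line in the picture
```

Each output grid plays the role of one feature map: the same detector is applied at every location, and the grid records where that feature was found.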

8: Why Life Got Curious

For trial-and-error learning to work, agents need to, well, have lots of trials from which to learn. This means that reinforcement learning can’t work by just exploiting behaviors they predict lead to rewards; it must also explore new behaviors. In other words, reinforcement learning requires two opponent processes—one for behaviors that were previously reinforced (exploitation) and the other for behaviors that are new (exploration). These choices are, by definition, opposing each other. Exploitation will always drive behavior toward known rewards, and exploration will always drive toward what is unknown. – Page 188

There is an alternative approach to tackling the exploitation-exploration dilemma, one that is both beautifully simple and refreshingly familiar. The approach is to make AI systems explicitly curious, to reward them for exploring new places and doing new things, to make surprise itself reinforcing. The greater the novelty, the larger the compulsion to explore it. – Page 188
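
Note: one simple way to make novelty itself rewarding is a count-based bonus that shrinks each time a place is revisited. This is a sketch of that general idea, not necessarily the method used by the AI systems the book describes. The names `novelty_bonus` and `choose_room`, the inverse-square-root formula, and the three rooms are assumptions.

```python
from collections import Counter


def novelty_bonus(visit_count, scale=1.0):
    """Intrinsic reward: largest for never-visited places, shrinking with visits."""
    return scale / (1 + visit_count) ** 0.5


def choose_room(rooms, external_reward, visits):
    """Pick the room with the best external reward plus novelty bonus."""
    def score(room):
        return external_reward.get(room, 0.0) + novelty_bonus(visits[room])
    return max(rooms, key=score)  # ties go to the earliest room in the list


rooms = ["A", "B", "C"]
external_reward = {}  # assumption: no explicit rewards anywhere yet
visits = Counter()    # missing rooms count as zero visits

order = []
for _ in range(6):
    room = choose_room(rooms, external_reward, visits)
    visits[room] += 1
    order.append(room)

print(order)
```

Even with no external reward at all, the agent keeps moving toward whichever room it has seen least, which is the behavior the curiosity approach is meant to produce.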

Even though there are no explicit rewards until you get past all the rooms in level one, these AI systems didn’t need any external rewards to explore. They were motivated on their own. Simply finding their way to a new room was valuable in and of itself. – Page 189

In vertebrates, surprise itself triggers the release of dopamine, even if there is no “real” reward. – Page 189

Gambling and social feeds work by hacking into our five-hundred-million-year-old preference for surprise, producing a maladaptive edge case that evolution has not had time to account for. – Page 190

Breakthrough #3: Simulating And The First Mammals

10: The Neural Dark Ages

The neocortex gave this small mouse a superpower—the ability to simulate actions before they occurred. – Page 211

If the reinforcement-learning early vertebrates got the power of learning by doing, then early mammals got the even more impressive power of learning before doing—of learning by imagining. – Page 212

On land, even at night, you can see up to one hundred times farther than you can underwater. Thus, fish opted not to simulate and plan their movements but instead to respond quickly whenever something came at them (hence their large midbrain and hindbrain, and comparatively smaller cortex). – Page 212

The electrical signaling of neurons is highly sensitive to temperature—at lower temperatures, neurons fire much more slowly than at warmer temperatures. This meant that a side effect of warm-bloodedness was that mammal brains could operate much faster than fish or reptile brains. – Page 212

11: Generative Models And The Neocortical Mystery

According to Mountcastle, the neocortex does not do different things; each neocortical column does exactly the same thing. The only difference between regions of neocortex is the input they receive and where they send their output; the actual computations of the neocortex itself are identical. – Page 220

He suggested that a person doesn’t perceive what is experienced; instead, he or she perceives what the brain thinks is there—a process Helmholtz called inference. Put another way: you don’t perceive what you actually see, you perceive a simulated reality that you have inferred from what you see. – Page 227

might optimize for the accuracy with which the inner simulated reality predicts the current external sensory input. – Page 229

Nowhere was this network told the right answer; it was never told what properties make up a 2 or even which pictures were 2s or 7s or any other number. The only data the network had to learn from was pictures of numbers. The question was, of course, would this work? Would this toggling back and forth between recognition and generation enable the network to both recognize handwritten numbers and generate its own unique pictures of handwritten numbers without ever having been told the right answer? – Page 230

Some neuroscientists refer to perception, even when it is functioning properly, as a “constrained hallucination.” – Page 236

The neocortex (and presumably the bird equivalent) is always in an unstable balance between recognition and generation, and during our waking life, humans spend an unbalanced amount of time recognizing and comparatively less time generating. Perhaps dreams are a counterbalance to this, a way to stabilize the generative model through a process of forced generation. – Page 237

Imagination could have been performed by a system separate from recognition. But in the neocortex, this is not the case—they are performed in the exact same area. This is exactly what we would expect from a generative model: perception and imagination are not separate systems but two sides of the same coin. – Page 238

If reflex circuits are reflex-prediction machines, and the critic in the basal ganglia is a reward-prediction machine, then the neocortex is a world-prediction machine—designed to reconstruct the entire three-dimensional world around an animal to predict exactly what will happen next as animals and things in their surrounding world move. – Page 239

the generative model in the neocortex tries to infer the causes of its sensory input. Causes are just the inner simulated 3D world that the neocortex believes best matches the sensory input it is being given. This is also why generative models are said to try to explain their input—your neocortex attempts to render a state of the world that could produce the picture that you are seeing – Page 241

But others, like Yann LeCun, head of AI at Meta, believe they are something else, something more primitive, something that evolved much earlier. In LeCun’s words: We humans give way too much importance to language and symbols as the substrate of intelligence. Primates, dogs, cats, crows, parrots, octopi, and many other animals don’t have human-like languages, yet exhibit intelligent behavior beyond that of our best AI systems. What they do have is an ability to learn powerful “world models” that allow them to predict the consequences of their actions and to search for and plan actions to achieve a goal. The ability to learn such world models is what’s missing from AI systems today. – Page 242

The reason the neocortex is so powerful is not only that it can match its inner simulation to sensory evidence (Helmholtz’s perception by inference) but, more important, that its simulation can be independently explored. If you have a rich enough inner model of the external world, you can explore that world in your mind and predict the consequences of actions you have never taken. Yes, your neocortex enables you to open your eyes and recognize the chair in front of you, but it also enables you to close your eyes and still see that chair in your mind’s eye. You can rotate and modify the chair in your mind, change its colors, change its materials. It is when the simulation in your neocortex becomes decoupled from the real external world around you—when it imagines things that are not there—that its power becomes most evident. – Page 242

12: Mice In The Imaginarium

How groundbreaking this was cannot be overstated—neuroscientists were peering directly into the brain of a rat, and directly observing the rat considering alternative futures. – Page 247

Buddhists and psychologists alike realize that ruminating about what could have been is a source of great misery for humanity. We cannot change the past, so why torture ourselves with it? The evolutionary roots of this go back to early mammals. In the ancient world, and in much of the world that followed, such ruminating was useful because often the same situation would recur and a better choice could be made. – Page 249

Causation itself may live more in psychology than in physics. – Page 253

we don’t truly remember episodic events. The process of episodic remembering is one of simulating an approximate re-creation of the past. – Page 255

13: Model-Based Reinforcement Learning

It used search not to logically consider all future possibilities (something that is impossible in most situations) but to simply verify and expand on the hunches that an actor-critic system was already producing. – Page 263

The primary input to the agranular prefrontal cortex, however, comes from the hippocampus, hypothalamus, and amygdala. This suggests that the aPFC treats sequences of places, valence activations, and internal affective states the way the sensory neocortex treats sequences of sensory information. – Page 270

Tries to predict what the animal will do next – Page 271

Dickinson had discovered habits. By engaging in the behavior five hundred times, rats had developed an automated motor response that was triggered by a sensory cue and completely detached from the higher-level goal of the behavior. – Page 278

Humans and, indeed, all mammals (and some other animals that independently evolved simulation) sometimes pause to simulate their options (model-based, goal-driven, system 2) and sometimes act automatically (model-free, habitual, system – Page 279

Karl Friston also offers an explanation for the perplexing fact that some parts of the frontal cortex are missing the fourth layer of the neocortical column. What does layer four do? In the sensory cortex, layer four is where raw sensory input flows into a neocortical column. Layer four is speculated to have the role of pushing the rest of the neocortical column to render a simulation that best matches its incoming sensory data (perception by inference). – Page 280

Thus, the aPFC spends very little, if any, time trying to match its inferred intent to the behavior it sees, and so it has no need for a large, or even any, layer four. – Page 281

Controlling ongoing behavior often also requires working memory—the maintenance of representations in the absence of any sensory cues. Many imagined paths and tasks involve waiting. – Page 283

14: The Secret To Dishwashing Robots

This suggests that the motor cortex was originally not the locus of motor commands but of motor planning. When an animal must perform careful movements—placing a paw on a small platform or stepping over an out-of-sight obstacle—it must mentally plan and simulate its body movements ahead of time. This explains why the motor cortex is necessary for learning new complex movements but not for executing well-learned ones. When an animal is learning a new movement, the motor cortex simulations vicariously train the basal ganglia. Once a movement is well learned, the motor cortex is no longer needed. – Page 291

Figure 14.4: The motor hierarchy in early placental mammals – Page 294

imaginarium – Page 302

Breakthrough #4: Mentalizing And The First Primates

15: The Arms Race For Political Savvy

This process of ever-escalating deceptions and counter-deceptions reveals that both Rock and Belle were able to understand the other’s intent (“Belle is trying to lead me away from the food,” “Rock is trying to trick me by looking away”), as well as understand that it is possible to manipulate the other’s beliefs (“I can make Belle think I am not looking by pretending to be disinterested,” “I can make Rock think the food is in the wrong location by leading him in that direction”). – Page 315

They plucked fruit from trees right after it ripened but before it fell to the forest floor. This allowed primates to have easy access to food without much competition from other species. This unique ecological niche may have offered early primates two gifts that opened the door to their uniquely large brains and complex social groups. First, easy access to fruit gave early primates an abundance of calories, providing the evolutionary option to spend energy on bigger brains. And second, and perhaps more important, it gave early primates an abundance of time. – Page 322

16: How To Model Other Minds

This suggests that the granular prefrontal cortex plays a key role in your ability to project yourself—your intentions, feelings, thoughts, personality, and knowledge—into your rendered simulations, – Page 330

The idea that the new primate areas take part in modeling your own mind makes sense when you follow their input/output connectivity. The older mammalian aPFC gets input directly from the amygdala and hippocampus, while the new primate gPFC receives almost no amygdala or hippocampal input or any direct sensory input at all. Instead, the primate gPFC gets most of its input directly from the older aPFC. One interpretation of this is that these new primate areas are constructing a generative model of the older mammalian aPFC and sensory cortex itself. – Page 331

Reflexes would say, Because I have an evolutionarily hard-coded rule to turn toward the smell coming from the left. Vertebrate structures would say, Because going left maximizes predicted future reward. Mammalian structures would say, Because left leads to food. But primate structures would say, Because I’m hungry, eating feels good when I am hungry, and to the best of my knowledge, going left leads to food. In other words, the gPFC constructs explanations of the simulation itself, of what the animal wants and knows and thinks. – Page 332

These newly primate neocortical regions seem to be the locus of both one’s model of one’s own mind and the ability to model other minds. – Page 338

17: Monkey Hammers And Self-Driving Cars

If mirror neurons were simply automatic mirrors, then they wouldn’t activate in the above cases where monkeys were not directly observing behaviors. – Page 347

One reason it is useful to simulate other people’s movements is that doing this helps us understand their intentions. – Page 348

The subregions of premotor cortex required for controlling a given set of motor skills are the same subregions required for understanding the intentions of others performing those same motor skills. – Page 348

Acquiring novel skills through observation required theory of mind, while selecting known skills through observation did not. There are three reasons why this was the case. The first reason why theory of mind was necessary for acquiring novel skills by observation is that it may have enabled our ancestors to actively teach. – Page 354

Teaching is possible only with theory of mind. Teaching requires understanding what another mind does not know and what demonstrations would help manipulate another mind’s knowledge in the correct way. – Page 354

The second reason why theory of mind was necessary for learning novel motor skills through observation is that it enabled learners to stay focused on learning over long periods. A rat can see another rat push a lever and a few moments later push the lever itself. But a chimpanzee child will watch its mother use anvils to break open nuts and practice this technique for years without any success before it begins to master the skill. Chimp children continually attempt to learn without any near-term reward. – Page 355

Theory of mind enables a chimp child to realize that the reason it is not getting food with its stick while its mother is getting food is that its mother has a skill it does not yet have. – Page 355

The third and final reason why theory of mind was necessary for learning novel motor skills through observation was that it enabled novices to differentiate between the intentional and unintentional movements of experts. Observational learning is more effective if one is aware of what another is trying to accomplish with each movement. – Page 355

Theory of mind evolved in early primates for politicking. But this ability was repurposed for imitation learning. The ability to infer the intent of others enabled early primates to filter out extraneous behaviors and focus only on the relevant ones (what did the person mean to do?); it helped youngsters stay focused on learning over long stretches of time; and it may have enabled early primates to actively teach each other by inferring what a novice does and does not understand. – Page 360

18: Why Rats Can’t Go Grocery Shopping

Setting up camp en route to a nearby popular fruit patch the night before requires anticipating the fact that you will be hungry tomorrow if you don’t take preemptive steps tonight to get to the food early. – Page 362

He surprisingly found that being a frugivore seemed to explain the variation in relative brain size perhaps even better than the size of a primate’s social group. – Page 363

The mechanics of making a choice based on an anticipated need, one you are not currently experiencing, presents a predicament to the older mammalian brain structures. We have speculated that the mechanism by which the neocortex controls behavior is by simulating decisions vicariously, the outcomes of which are then evaluated by the older vertebrate structures (basal ganglia, amygdala, and hypothalamus). This mechanism allows an animal to choose only simulated paths and behaviors that excite positive valence neurons right now, like imagining food when hungry or water when thirsty. – Page 365

Suddendorf may have been prescient in proposing that the general ability to model a dissociated mental state from your own can be repurposed for both theory of mind and anticipating future needs. – Page 368

Breakthrough #5: Speaking And The First Humans

19: The Search For Human Uniqueness

If it were the case that humans wielded numerous intellectual capabilities that were entirely unique in kind, we would expect human brains to contain some unique neurological structures, some new wiring, some new systems. But the evidence is the opposite—there is no neurological structure found in the human brain that is not also found in the brain of our fellow apes, and evidence suggests that the human brain is literally just a scaled-up primate brain: – Page 374

Human language differs from other forms of animal communication in two ways. First, no other known form of naturally occurring animal communication assigns declarative labels (otherwise known as symbols). A human teacher will point to an object or a behavior and assign it an arbitrary label: elephant, tree, running. In contrast, other animals’ communications are genetically hardwired and not assigned. Vervet monkey and chimpanzee gestures are almost identical across different groups that have no contact with each other. Monkeys and apes deprived of social contact still use the same gestures. In fact, these gestures are even shared across species of primates; – Page 376

a reward: “When I hear sit, if I sit, I will get a treat” or “When I hear stay, if I stop moving, I will get a treat.” This is basic temporal difference learning—all vertebrates can do this. Declarative labeling, on the other hand, is a special feature of human language. A declarative label is one that assigns an object or behavior an arbitrary symbol—“That is a cow,” “That is running,”—without any imperative at all. – Page 377

The second way in which human language differs from other animal communication is that it contains grammar. – Page 377

On balance, most scientists seem to conclude that some nonhuman apes are indeed capable of learning at least a rudimentary form of language but that nonhuman apes are much worse than humans at it and don’t learn it without painstaking deliberate training. These apes never surpass the abilities of a young human child. – Page 381

declarative labels and grammar, enables groups of brains to transfer their inner simulations to each other with an unprecedented degree of detail and flexibility. – Page 381

All these practical benefits emerge from the fact that language expands the scope of sources a brain can extract learnings from. The breakthrough of reinforcing enabled early vertebrates to learn from their own actual actions (trial and error). The breakthrough of simulating enabled early mammals to learn from their own imagined actions (vicarious trial and error). The breakthrough of mentalizing enabled early primates to learn from other people’s actual actions (imitation learning). But the breakthrough of speaking uniquely enabled early humans to learn from other people’s imagined actions. – Page 383

By sharing what we see in our imaginations, it is also possible for common myths to form and for entirely made-up imaginary entities and stories to persist merely because they hop between our brains. We tend to think about myths as the province of fantasy novels and children’s books, but they are the foundation of modern human civilizations. Money, gods, corporations, and states are imaginary concepts that exist only in the collective imaginations of human brains. – Page 384

And so, with the ability to construct common myths, we can coordinate the behavior of an incredibly large number of strangers. This was a massive improvement over the system of social cohesion provided by primate mentalizing. Coordinating behavior using mentalizing alone works only by each member of a group directly knowing each other. – Page 384

The true power of DNA is not the products it constructs (hearts, livers, brains) but the process it enables (evolution). In this same way, the power of language is not its products (better teaching, coordinating, and common myths) but the process of ideas being transferred, accumulated, and modified across generations. – Page 386

If the baseline of ideas always fades after a generation or two, then a species will be forever stuck in a nonaccumulating state, always reinventing the same ideas over and over again. – Page 387

Eventually, the corpus of ideas accumulated reached a tipping point of complexity when the total sum of accumulated ideas no longer fit into the brain of a single human. This created a problem in sufficiently copying ideas across generations. In response, four things happened that further expanded the extent of knowledge that could be transferred across generations. First, humans evolved bigger brains, which increased the amount of knowledge that can be passed down through individual brains. Second, humans became more specialized within their groups, with ideas distributed across different members—some were the spear makers, others clothing makers, others hunters, others foragers. Third, population sizes expanded, which offered more brains to store ideas across generations. And fourth, most recent and most important, we invented writing. Writing allows humans to have a collective memory of ideas that can be downloaded at will and that can contain effectively an infinite corpus of knowledge. If groups don’t have writing, such distributed knowledge is sensitive to group size; if groups shrink, and there are no longer enough brains to fit all the information into, knowledge is lost. There is evidence that this occurred – Page 390

The real reason why humans are unique is that we accumulate our shared simulations (ideas, knowledge, concepts, thoughts) across generations. We are the hive-brain apes. – Page 391

The emergence of language marked an inflection point in humanity’s history, the temporal boundary when this new and unique kind of evolution began: the evolution of ideas. In this way, the emergence of language was as monumental an event as the emergence of the first self-replicating DNA molecules. – Page 391

20: Language In The Brain

Wernicke’s aphasia, – Page 394

The point is that language emerges not from the brain as a whole but from specific subsystems. This suggests that language is not an inevitable consequence of having more neocortex. It is not something humans got “for free” by virtue of scaling up a chimpanzee brain. Language is a specific and independent skill that evolution wove into our brains. – Page 396

Humans have, in fact, inherited the exact same communication system of apes, but it isn’t our language—it is our emotional expressions. – Page 398

Humans, however, have two communication systems—we have this same ancient emotional expression system and we have a newly evolved language system in the neocortex. – Page 400

A skill as sophisticated as flying is too information-dense to hard-code directly into a genome. It is more efficient to encode a generic learning system (such as a cortex) and a specific hardwired learning curriculum (instinct to want to jump, instinct to flap wings, and instinct to attempt to glide). It is the pairing of a learning system and a curriculum that enables every single baby bird to learn how to fly. – Page 402

To teach a new skill, it is often easier to change the curriculum instead of changing the learning system. – Page 403

It seems that joint attention and proto-conversations evolved for a single reason. What is one of the first things that parents do once they have achieved a state of joint attention with their child? They assign labels to things. – Page 405

Even before human children can construct grammatical sentences, they will ask others questions: “Want this?” “Hungry?” All languages use the same rising intonation when asking yes/no questions. When you hear someone speak in a language you do not understand, you can still identify when you are being asked a question. This instinct to understand how to designate a question may also be a key part of our language curriculum. – Page 406

What makes these skills possible is not a single region that executes them but a curriculum that forces a complex network of regions to work together to learn them. So this is why your brain and a chimp brain are practically identical and yet only humans have language. What is unique in the human brain is not in the neocortex; what is unique is hidden and subtle, tucked deep in older structures like the amygdala and brain stem. It is an adjustment to hardwired instincts that makes us take turns, makes children and parents stare back and forth, and that makes us ask questions. – Page 407

21: The Perfect Storm

Both Homo erectus and modern humans have a peculiar method of cooling down—while other mammals pant to lower their body temperature, modern humans sweat. These traits would have kept our ancestors’ bodies cool while they were trekking long distances in the hot savannah. – Page 416

These changes are perplexing; with a bigger body and brain, Homo erectus would have needed more energy and thus stronger jaws and longer digestive tracts for consuming more food. In the 1990s, the primatologist Richard Wrangham proposed a theory to explain this: H. erectus must have invented cooking. When meat or vegetables are cooked, harder-to-digest cellular structures are broken down into more energy-rich chemicals. – Page 416

Big brains are hard to fit through birth canals. Human bipedalism would have further exacerbated this problem, as standing upright requires narrower hips. This is what the anthropologist Sherwood Washburn calls the “obstetric dilemma.” The human solution to this is premature birthing. A newborn cow can walk within hours of being born, and a newborn macaque monkey can walk within two months, but newborn humans often can’t walk independently for up to a year after they are born. Humans are born not when they are ready to be born, but when their brains hit the maximum size that can fit through the birth canal. – Page 417

Premature birthing and an extended period of childhood brain growth put pressure on H. erectus to change its parenting style. – Page 418

Many paleoanthropologists believed this shifted Homo erectus group dynamics away from the promiscuous mating of chimpanzees to the (mostly) monogamous pair-bonding we see in today’s human societies. – Page 418

One theory is that menopause evolved to push grandmothers to shift their focus from rearing their own children to supporting their children’s children. – Page 418

The first words may have emerged from proto-conversations between parents and their children, perhaps for the simple purpose of ensuring the successful transmission of advanced tool manufacture. – Page 428

Selective use of language to help rear children into independent successful tool-using adults is no more mysterious than any other form of parental investment. Second, the learning program for language is most prominent in the hardwired interplay of joint attention and proto-conversations between parents and children, suggestive of its origin in these types of relationships. – Page 428

If groups imposed costs on cheaters by punishing them, either by withholding altruism or by directly harming them, then gossip would enable a stable system of reciprocal altruism among a large group of individuals. – Page 429

The use of language for gossip plus the punishment of moral violators makes it possible to evolve high levels of altruism. Early humans born with extra altruistic instincts would have more successfully propagated in an environment that easily identified and punished cheaters and rewarded altruists. – Page 429

Herein lies both the tragedy and beauty of humanity. We are indeed some of the most altruistic animals, but we may have paid the price for this altruism with our darker side: our instinct to punish those who we deem to be moral violators; our reflexive delineation of people into good and evil; our desperation to conform to our in-group and the ease with which we demonize those in the out-group. – Page 430

For every incremental increase in gossip and punishment of violators, the more altruistic it was optimal to be. For every incremental increase in altruism, the more optimal it was to freely share information with others using language, which would select for more advanced language skills. For every incremental increase in language skills, the more effective gossip became, thereby reinforcing the cycle. – Page 430

Our language, altruism, cruelty, cooking, monogamy, premature birthing, and irresistible proclivity for gossip are all interwoven into the larger whole that makes up what it means to be human. – Page 432

Noam Chomsky, who argues that language initially evolved only as a trick for inner thinking. – Page 432

22: ChatGPT And The Window Into The Mind

This is a downside of reasoning by simulating—we fill in characters and scenes, often missing the true causal and statistical relationships between things. – Page 444

Children do not simply listen to endless sequences of words until they can predict what comes next. They are shown an object, engage in a hardwired nonverbal mechanism of shared attention, and then the object is given a name. The foundation of language learning is not sequence learning but the tethering of symbols to components of a child’s already present inner simulation. A human brain, but not GPT-3, can check the answers to mathematical operations using mental simulation. – Page 444

Much of what makes human language powerful is not the syntax of it, but its ability to give us the necessary information to render a simulation about it and, crucially, to use these sequences of words to render the same inner simulation as other humans around us. – Page 446

By training GPT-4 to not just predict the answer, but to predict the next step in reasoning about the answer, the model begins to exhibit emergent properties of thinking, without, in fact, thinking—at least not in the way that a human thinks by rendering a simulation of the world. – Page 451

Even if LLMs correctly answer commonsense and theory-of-mind questions, it does not necessarily mean it reasons about these questions in the same way. – Page 452

Conclusion: The Sixth Breakthrough

Breakthrough #1 was steering: the breakthrough of navigating by categorizing stimuli into good and bad, and turning toward good things and away from bad things. – Page 456

Breakthrough #2 was reinforcing: the breakthrough of learning to repeat behaviors that historically have led to positive valence and inhibit behaviors that have led to negative valence. – Page 456

Breakthrough #3 was simulating: the breakthrough of mentally simulating stimuli and actions. – Page 457

Breakthrough #4 was mentalizing: the breakthrough of modeling one’s own mind. – Page 457

Breakthrough #5 was speaking: the breakthrough of naming and grammar, of tethering our inner simulations together to enable the accumulation of thoughts across generations. – Page 458

Thus far, humanity’s story has been a saga of two acts. Act 1 is the evolutionary story: how biologically modern humans emerged from the raw lifeless stuff of our universe. Act 2 is the cultural story: how societally modern humans emerged from largely biologically identical but culturally primitive ancestors from around one hundred thousand years ago. – Page 459

It can be hard to conceptualize just how young our fourteen-billion-year-old universe actually is. – Page 460

Portia spiders. – Page 463

The more we understand about our own minds, the better equipped we are to create artificial minds in our image. The more we understand about the process by which our minds came to be, the better equipped we are to choose which features of intelligence we want to discard, which we want to preserve, and which we want to improve upon. – Page 463

Acknowledgments

Rebecca Gelernter and Mesa Schumacher – Page 465

Behave by Robert Sapolsky. – Page 467

About The Publisher

Although systems don’t necessarily get more complex, the possibility of complexity increases over time. – Page 552

In fact, recent studies show how elegantly evolution modified the function of dopamine while still retaining its earlier role of generating a state of wanting. The amount of dopamine in the input nuclei of the basal ganglia (called the “striatum”) seems to measure the discounted predicted future reward, triggering the state of wanting based on how good things are likely to be and driving animals to focus on and pursue nearby rewards. As an animal approaches a reward, dopamine ramps up, peaking at the moment when an animal expects the reward to be delivered. During this ramping-up process, if predicted rewards change (some omission or new cue changes the probability of getting a reward), then dopamine levels rapidly increase or decrease to account for the new level of predicted future reward. These rapid fluctuations in dopamine levels are produced through the bursting and pausing of dopamine neurons that Schultz found; these rapid fluctuations in dopamine levels are the temporal difference learning signal. The quantity of dopamine floating around in the striatum modifies the excitability of neurons, which shifts behavior toward exploitation and wanting. In contrast, the rapid changes in dopamine levels trigger modifications in the strength of various connections, thereby reinforcing and punishing behaviors. – Page 564
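
The temporal difference signal described in the passage above can be illustrated with a small sketch. This is not the book's model or anything taken from the studies it cites; it is a standard textbook TD(0) value update, and the five-state linear track, the discount factor `gamma`, the learning rate `alpha`, and the function name `run_episode` are all assumptions made for illustration. The learned value of each state plays the role of the "discounted predicted future reward" and should ramp up as the agent nears the reward, while the TD error plays the role of the rapid fluctuation: roughly zero when the expected reward arrives, and negative when it is omitted.

```python
# Hypothetical illustration: TD(0) on a linear track with a reward at the end.
# All names and parameter values are assumptions, not taken from the book.

def run_episode(values, gamma=0.9, alpha=0.1, reward=1.0, learn=True):
    """Walk the track from the first state to the last.

    Returns the TD error at each step. If learn is True, each state's
    value is nudged toward its TD target in place.
    """
    n = len(values)
    errors = []
    for s in range(n):
        is_last = s == n - 1
        r = reward if is_last else 0.0                # reward only at the end
        next_value = 0.0 if is_last else values[s + 1]
        delta = r + gamma * next_value - values[s]    # TD error
        if learn:
            values[s] += alpha * delta                # move value toward target
        errors.append(delta)
    return errors


# Learn the discounted predicted future reward for each position.
values = [0.0] * 5
for _ in range(2000):
    run_episode(values)

# Reward delivered as predicted: the TD errors should be close to zero.
expected_errors = run_episode(values, learn=False)

# Reward unexpectedly omitted: the final TD error should be negative.
omitted_errors = run_episode(values, reward=0.0, learn=False)

print("values:", [round(v, 3) for v in values])
print("errors when reward arrives:", [round(e, 3) for e in expected_errors])
print("errors when reward is omitted:", [round(e, 3) for e in omitted_errors])
```

In this sketch the slowly varying quantity (`values`) and the fast-changing quantity (`delta`) are kept separate on purpose, mirroring the passage's distinction between the level of dopamine, tied to wanting, and rapid changes in that level, tied to reinforcement. Whether real dopamine dynamics map onto these two quantities this cleanly is beyond what the sketch can show.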