Encoding and learning of internal models by the cerebellum

One of the main theories in motor control is that in order to move accurately, the brain forms internal models that learn to predict the sensory consequences of motor commands. Evidence for this idea comes from human behavioral experiments and animal lesion studies, suggesting that a critical structure for forming internal models is the cerebellum. However, it has been difficult to decipher how, at the neuronal level, the cerebellum represents internal models: while for some tasks like smooth pursuit eye movements the activity of cerebellar Purkinje cells (P-cells) appears to predict velocity of the eye, for most other movements such as saccadic eye movements, wrist movements, or arm movements, it has been difficult to relate activity of P-cells to the actual motion. That is, if the P-cells are encoding an internal model, that encoding is not obvious.

Because learning likely involves a change in the neural encoding, the poor understanding of the neural encoding of internal models has made it difficult to decipher how experience of an error changes the encoding, and how that change in the encoding improves control of movements. The problem, therefore, is fundamentally about understanding neural encoding in the cerebellum.

Recently, our team described a new idea. We imagined that the P-cells of the cerebellum are organized in micro-clusters, where each micro-cluster is composed of P-cells that project onto a single output neuron (in the deep cerebellar nucleus). Importantly, the P-cells that form a micro-cluster share a specific property: a common preference for error. The preference for error in an individual P-cell is expressed through its complex spike activity. Therefore, one way to test our idea was through organizing the simple spikes of the P-cells based on functional properties of each P-cell’s complex spikes.

A large body of P-cell data from the cerebellum had been collected by our colleagues Dr. Robi Soetedjo and Dr. Yoshiko Kojima at Univ. of Washington. We analyzed that data using the micro-cluster hypothesis and found something interesting: the population response of the simple spikes of the P-cells precisely predicted motion of the eyes during a saccade via a gain-field. That is, if the P-cells were organized into groups wherein we measured the simple spikes generated by only the cells that shared the same preference for error, then the population response exquisitely predicted motion of the eyes as a multiplicative interaction between real-time speed and direction of motion. Because a gain-field encoding had earlier been found for representation of eye movements in the posterior parietal cortex, and for representation of arm movements in the motor cortex, our observations in the cerebellum raised the possibility that there was a common principle of encoding in disparate regions of the motor system.
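To make the phrase "multiplicative interaction" concrete, here is a toy calculation. It is not the analysis that was applied to the recorded data: the cosine direction tuning, the bell-shaped speed profile, and every parameter value are assumptions chosen for illustration. It only shows what a gain-field means, namely that speed scales the direction-tuned response rather than adding to it.

```python
import math


def speed_profile(t, duration=0.05, peak_speed=400.0):
    """Hypothetical bell-shaped saccade speed (deg/s) at time t (s)."""
    if t < 0.0 or t > duration:
        return 0.0
    return peak_speed * math.sin(math.pi * t / duration) ** 2


def gain_field_response(speed, direction, preferred_direction, gain=0.1):
    """Toy population response: speed multiplies a cosine direction tuning.

    Angles are in radians; `gain` is an arbitrary scale factor (assumption).
    """
    tuning = math.cos(direction - preferred_direction)
    return gain * speed * tuning


# Response over the course of one saccade made in the preferred direction.
times = [0.005 * k for k in range(11)]
response = [gain_field_response(speed_profile(t), 0.0, 0.0) for t in times]
```

In this sketch, doubling the speed doubles the response for any fixed direction, and a saccade orthogonal to the preferred direction produces no response regardless of speed.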

So the idea is that to decode cerebellar activity, the unit of computation is not a single P-cell, or a population of P-cells where the members are selected randomly, but a population of P-cells that share the same preference for error. Our ongoing research, including collaboration with Univ. of Washington, and our own recordings from the cerebellum, uses this computational model to tackle the problem of how the cerebellum might learn to build internal models, and contribute to control of movements.
​​
Vigor of movements and its relationship to decision-making

There are a couple of good puzzles regarding the question of how the brain controls behavior. The first puzzle is with regard to how the brain makes a decision. The second puzzle is with regard to how it performs actions. For example, at the breakfast table, you consider the various options and choose to reach for the bagel. The way you deliberated various options, and arrived at your choice, is the decision-making part. The way you reached is the motor-control part.

The first puzzle is studied in the field of decision-making using a framework in which a utility is assigned to each potential action. This utility depends on the reward at stake, and the effort that may be required to perform that action. The action that is chosen is often the one that has the highest utility. The second puzzle is studied in the field of movement-control using a framework in which a cost is assigned to each potential sequence of motor commands. The motor commands that are chosen, i.e., the speed of the movement and its trajectory, are the ones that minimize this cost. In a sense, the field of decision-making has been concerned with the question of what to do, whereas the field of motor-control has been concerned with the question of how to perform the selected movement.

We think, however, that the two puzzles are related. Both the decision of which movement to perform and the ensuing movement are influenced by the purpose of the movement: a movement that is associated with a high utility is not only preferred, but is also performed with greater vigor (faster). For example, animals not only prefer a stimulus that promises greater reward, but also move faster toward that stimulus. You walk faster toward someone you love than toward someone you don't like as much. These observations suggest that while the utility of an action depends on the reward at stake and the required effort (dictating which stimulus to move toward), the same variables also influence how the brain performs the action (dictating vigor of the ensuing movement). We have been working on formulating a unified framework in which we can understand both the decision that the brain makes as to which action to perform, and the details of the movement that follows that decision.

In collaboration with Prof. Alaa Ahmed at University of Colorado, we recently proposed a mathematical framework that attempts to unify motor-control with decision-making. The basic idea is that the way we make decisions and the way we move are related, because both aim to maximize a common utility, and because in our brain many of the neural circuits that care about reward and effort in decision-making are shared with the circuits that control movements. And so perhaps the trait-like features of our individuality are reflected not only in the way we make decisions, but also in the way we move. As a result, when disease affects one aspect of our behavior (decision-making), it can also affect the other (motor-control).
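The idea of a common utility can be illustrated with a minimal sketch. The functional form used here is an assumption made for illustration and should not be read as the published framework: effort is assumed to grow as the movement gets faster, and utility is taken to be the net gain (reward minus effort) per unit of time. The function names and the reward and effort values are hypothetical.

```python
def utility(reward, effort_scale, duration):
    """Hypothetical utility of a movement: net gain per unit of time.

    Assumption: effort grows as the movement gets faster
    (effort_scale / duration), and the net gain is divided by the
    time the movement takes.
    """
    effort = effort_scale / duration
    return (reward - effort) / duration


def best_duration(reward, effort_scale):
    """Grid search for the movement duration (s) that maximizes utility."""
    candidates = [0.01 * k for k in range(1, 501)]  # 0.01 s to 5.00 s
    return max(candidates, key=lambda d: utility(reward, effort_scale, d))


def decide_and_move(options):
    """Pick the option with the highest achievable utility.

    `options` maps a name to (reward, effort_scale).
    Returns the chosen name and the duration of the movement toward it.
    """
    plans = {name: best_duration(r, c) for name, (r, c) in options.items()}
    choice = max(plans, key=lambda name: utility(*options[name], plans[name]))
    return choice, plans[choice]


# Made-up values: the bagel is worth twice as much as the toast.
options = {"bagel": (8.0, 1.0), "toast": (4.0, 1.0)}
choice, duration = decide_and_move(options)
```

In this toy version a single quantity does both jobs: it selects which option to reach for, and it sets how fast to reach. Raising the reward of an option both makes it more likely to be chosen and shortens the duration that maximizes its utility.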

For example, some of the neural circuits that are involved in representing reward and effort are located in the basal ganglia, and these circuits are damaged in Parkinson's disease (PD). Perhaps this produces consistent deficits in both decision-making and motor control. We recently showed that through non-invasive brain stimulation, some of the symptoms associated with perception of effort could be alleviated in some PD patients. Our current research is investigating the relationship between decision-making and movement vigor in healthy people, as well as in people who suffer from Parkinson's disease.

Meta-learning in humans, monkeys, and robots

The brain does not simply learn from error, but appears to control error-sensitivity: in some cases a given prediction error results in robust learning, whereas in other cases the same error produces little or no learning. How does the brain control how much it is willing to learn from error?

Understanding control of error-sensitivity is important both from a biological perspective, and from a machine learning perspective.  From a biological perspective, control of error-sensitivity may provide insights into two critical puzzles: savings and meta-learning.  Savings refers to the observation that training in task (A), followed by washout, produces accelerated re-learning of (A).  Meta-learning refers to the observation that training in task (A), followed by washout, produces accelerated learning of task (-A).  From a machine learning perspective, control of error-sensitivity dominates rates of convergence of internal models, learning of trajectories, and reinforcement-dependent procedures.  It also dictates whether learning of one task can benefit the machine’s ability to transfer learning to a related task.

Our team includes Prof. Jose Carmena at Berkeley and Prof. Stefan Schaal at USC. Together, we have taken on the problem of learning from a new direction: the ability to control error-sensitivity in three domains, human psychophysics, monkey neurophysiology, and robotics.

Our basic idea is that the brain controls error-sensitivity in a principled way, relying on the history of past errors to build a memory of errors.  This memory of errors is effectively a value function that labels each error that is experienced during training in terms of its usefulness in improving performance. If indeed the brain maintains a memory of errors, then this previously unknown form of memory may not only account for the phenomenon of error-sensitivity, but also savings and meta-learning. That is, we propose that when humans exhibit the ability to do a task better than before, it is partly because they recognize the errors that they have experienced before.
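A minimal sketch of history-dependent error-sensitivity is given below. It is a deliberate simplification and an assumption on our part: the memory of errors is collapsed into a single number, eta, that rises when consecutive errors have the same sign (the previous error was a useful teacher) and falls when the sign flips. The update rule, the parameter values, and the trial counts are all hypothetical.

```python
def simulate(perturbations, eta0=0.1, beta=0.02, eta_max=0.5, retention=1.0):
    """Trial-by-trial adaptation with history-dependent error-sensitivity.

    Returns the error experienced on each trial. All parameters are
    illustrative assumptions.
    """
    x = 0.0            # learner's current estimate of the perturbation
    eta = eta0         # error-sensitivity (fraction of the error learned)
    prev_error = None
    errors = []
    for p in perturbations:
        error = p - x
        if prev_error is not None and error != 0.0 and prev_error != 0.0:
            if (error > 0) == (prev_error > 0):
                # Same sign as the last error: raise sensitivity.
                eta = min(eta_max, eta + beta)
            else:
                # Sign flipped: lower sensitivity.
                eta = max(0.0, eta - beta)
        x = retention * x + eta * error
        errors.append(error)
        prev_error = error
    return errors


n_train, n_washout, n_test = 30, 60, 30
history = [1.0] * n_train + [0.0] * n_washout

naive = simulate([1.0] * n_train)                       # first exposure to A
relearn_a = simulate(history + [1.0] * n_test)[-n_test:]     # A again
learn_minus_a = simulate(history + [-1.0] * n_test)[-n_test:]  # task -A
```

Under these assumptions, the error a few trials into the second exposure to (A) is smaller than at the same point of the first exposure, which resembles savings, and the same holds for (-A), which resembles meta-learning. A model that stores sensitivity separately for each error, as the text proposes, would need a richer memory than the single number used here.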