Elsevier

Behavioural Brain Research

Volume 261, 15 March 2014, Pages 356-368

Review
Learning from feedback: The neural mechanisms of feedback processing facilitating better performance

https://doi.org/10.1016/j.bbr.2013.12.043

Highlights

  • Task features influence how the EEG correlates of feedback processing relate to learning.

  • The association between FRN and performance is mediated by learning characteristics.

  • Higher mid-frontal theta after feedback is associated with better learning.

  • Two types of beta oscillations were found in feedback learning studies.

  • Mid-frontal areas connect to posterior and anterior regions to process feedback.

Abstract

Different levels of feedback, from sensory signals to verbal advice, are needed not only for learning new skills, but also for monitoring performance. A great deal of research has focused on the electrophysiological correlates of feedback processing and how they relate to good learning. In this paper, studies on the EEG correlates of learning from feedback are reviewed. The main objective is to discuss these findings whilst also considering some key theoretical aspects of learning. The learning process, its operational definition, and the characteristics of the feedback are discussed and used as a reference for integrating the findings in the literature. The EEG correlates of feedback processing are then reviewed across several analytical approaches, including ERPs, oscillations and inter-site synchronization. Finally, how these EEG responses to feedback relate to learning is examined, highlighting gaps in the literature and suggesting future directions for understanding the neural underpinnings of learning from feedback.

Introduction

Human beings have a remarkable learning capacity which allows them to acquire a wide range of skills, from motor actions to complex abstract reasoning. Independently of what is being learned, feedback is necessary not only for correcting mistakes, but also for monitoring and improving performance up to a highly proficient level. Feedback is here conceptualized as an outcome of an action that is captured by the senses. An action usually has more than one outcome which may be used as feedback, from direct visual information, such as looking where the ball lands after hitting it when playing tennis, to more elaborate or explicit outcomes, such as the coach's verbal feedback on your movement execution to hit the ball. Because feedback is crucial for correcting, monitoring and improving performance [1], it has been widely investigated in learning-related fields [2], [3], [4], [5].

Recent efforts to understand the role of feedback in learning have focussed on the neural correlates of feedback processing using neuroimaging [6], [7], [9]. Most of these studies focused on the event-related potentials (ERPs) associated with feedback processing, especially on a component called the feedback-related negativity (FRN), also referred to as the feedback ERN [10], [11], [12]. The latter term is due to the resemblance between the FRN and the error-related negativity (ERN), which is a negative deflection of the ERPs that peaks 80 ms after an erroneous response. A similar negative deflection in the ERPs, about 145–300 ms following unexpected feedback instead of an erroneous response, was subsequently observed and named the FRN [13]. The FRN has been investigated in a number of studies [6], [8], [9] as an index of performance monitoring system activity in response to feedback. Both the ERN and FRN have a mid-frontal topography, and both are likely to be generated in the anterior cingulate cortex (ACC), especially its dorsal portion (dACC) [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. Because an ERP waveform is typically an average of the brain electrical signals over a number of trials, it cannot capture brain activations that are not phase-locked to the event, as they are averaged out in the process [23]. These out-of-phase activations are often considered induced rather than evoked by the stimulus, and they may reveal relevant aspects of feedback processing. Some recent studies have shown that the oscillatory components of feedback processing can shed new light on the various features of feedback-guided learning [7]. It was found that theta power (4–8 Hz) increases over the mid-frontal areas 200–500 ms after an error or negative feedback [7], [24], [25], [26], [27], [28], [29]. Positive feedback, on the other hand, is associated with increases in beta frequency range oscillations (approximately 20–30 Hz) [24], [28], [30].
Recent studies investigating how distant brain areas communicate following correct and incorrect feedback demonstrated that the mid-frontal areas communicate with other executive and sensorimotor areas in the theta frequency range for feedback processing [7], [26], [29], [30].
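The point about averaging can be illustrated with a minimal simulation (a hypothetical sketch, not taken from any of the cited studies): a 10 Hz oscillation that is phase-locked to the event survives across-trial averaging, whereas the same oscillation with a random phase on every trial largely cancels out.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 0.5, 500)           # one 500 ms epoch
f = 10.0                               # 10 Hz oscillation

# Evoked activity: identical phase on every trial (phase-locked to the event).
evoked = np.array([np.sin(2 * np.pi * f * t) for _ in range(100)])

# Induced activity: same amplitude, but a random phase on each trial.
induced = np.array([np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                    for _ in range(100)])

def erp_amplitude(trials):
    """Peak amplitude of the across-trial average (the 'ERP')."""
    return np.abs(trials.mean(axis=0)).max()

# Phase-locked activity survives averaging (peak near 1), whereas the
# non-phase-locked activity is almost entirely averaged out.
evoked_amp = erp_amplitude(evoked)
induced_amp = erp_amplitude(induced)
```

Time-frequency analyses sidestep this cancellation by estimating power on each trial before averaging.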

These electrophysiological features of feedback processing have been linked to learning in many ways. First, there are studies showing that a stronger FRN is associated with a greater likelihood of corrections or adjustments in performance [24], [31]. Second, studies have demonstrated that learning brings about a lower FRN in response to negative feedback, as the prediction error is reduced when people already know the action and outcome associations. A reduced FRN is usually accompanied by an increased ERN, since error detection can rely on internal sources after learning takes place [32], [33]. Third, it has been suggested that mid-frontal theta oscillations represent a variety of executive cognitive processes, which are essential for learning [34]. These theta oscillations are predictive of corrective actions during the course of learning [30], [35] and are higher in people who are better learners [29]. Oscillations in the beta range have also been related to learning in some studies [29], [30], [36]. Fourth, studies have suggested that communication between distant brain areas in response to feedback is related to how well a skill can be learned [29], [30]. This body of work supports the notion that the way we process feedback has important implications for the quality of learning.

Notwithstanding the large number of published papers investigating the EEG correlates of feedback processing, in both the ERP and oscillatory domains, there is still a lack of integration among the various findings. The nature of the learning associated with the adopted paradigms is rarely discussed. For example, an FRN generated by incorrect feedback during a task that requires the participant to discover a set of rules is likely to play a different role in learning than an FRN generated during a task that requires the participant to produce a time interval. The main characteristics of the task, the learning and the feedback have to be considered when interpreting neuroimaging findings, but the feedback learning literature typically overlooks these characteristics. In this paper, some of these aspects will be discussed in order to integrate the findings of the literature on the electrophysiological correlates of feedback processing.

A search in PubMed with the terms ‘EEG AND feedback AND learning’ or ‘feedback negativity’ returned 699 papers (December 2013), which indicates the abundance of studies published in the past few years. There is therefore no intention of covering all the published research on the topic; the scope of this review is limited to published papers in which the EEG responses to feedback are explicitly linked to learning. The main objective of this paper is to present a critical overview of the state of the art of EEG feedback processing associated with learning.

The review is structured in five main sections. First, the main theoretical aspects of learning from feedback are introduced. Second, some key studies on how the FRN is linked to learning are reviewed. Third, the oscillatory correlates of feedback processing are explained. Fourth, the main inter-site synchronization or connectivity findings are elucidated. Fifth, the neural basis of the EEG correlates of feedback processing is discussed. The content of this review is summarized in Table 1, which presents the theoretical aspects to consider, and Table 2, which integrates the main findings discussed in the paper.

Table 1 describes the main aspects of learning from feedback discussed in this section. Although real-life learning situations typically involve a mix of the aspects described in Table 1, distinguishing among the aspects involved in learning and feedback is helpful for understanding their neural underpinnings.

The first aspect is the learning process, which refers to the cognitive mechanisms required to learn the task used in the experiment. The two main types of learning processes in relation to feedback discussed in this review are error-based learning and reinforcement learning. Error-based learning refers to the process of learning through finely graded error information, whereas in reinforcement learning the agent uses rewards/punishments in the form of categorical information, such as the relative success or failure of an attempt, to learn [37]. For example, imagine learning to keep the ball in play when performing a top-spin in tennis. In a reinforcement learning framework, the feedback will be categorical and you will only be told whether the ball landed in or out. In an error-based framework, the feedback will be finely graded and you will see exactly where the ball landed.

According to Wolpert et al. [37], the nature of the information used to learn, regardless of sensory modality, is what characterizes the learning process. In error-based learning, the movement's outcome is compared with the predicted or desired outcome, and by estimating the error gradient, the system knows not only that it missed the target but also how it was missed. Therefore, this type of learning requires finely graded and signed feedback that informs not only that an error was made, but also the particular way the error occurred. Through practice, people can develop reasonably good mappings between motor and sensory variables, which allow them to identify the most likely source of an error and to progressively reduce it.
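The logic of error-based learning can be sketched with a toy example (hypothetical values and update rule, not drawn from [37]): the learner receives the signed, finely graded error on each attempt and corrects a single aim parameter in proportion to it, a gradient-style update.

```python
def error_based_learning(target=10.0, aim=0.0, rate=0.5, trials=20):
    """Each trial the learner sees the signed, finely graded error
    (e.g. where the ball landed relative to the target) and corrects
    the aim in proportion to it."""
    errors = []
    for _ in range(trials):
        outcome = aim                  # simplified: the outcome equals the aim
        error = outcome - target       # signed, finely graded feedback
        aim -= rate * error            # correct in the direction of the error
        errors.append(abs(error))
    return aim, errors

aim, errors = error_based_learning()
# Each update removes a fixed fraction of the remaining error, so the
# error shrinks steadily and the aim converges on the target.
```

With purely categorical (hit/miss) feedback, by contrast, the sign and size of the correction would have to be discovered by trial and error.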

In a reinforcement learning framework [38], an agent learns by trial and error which actions to take in order to maximize a numerical reward signal. The error information is categorical: it informs that a mistake has been made without providing more finely graded error information. This categorical output or feedback is often represented as a reward (correct response) or a punishment or absence of reward (in case of errors). One crucial problem to solve in reinforcement learning is credit assignment: how to assign credit or blame to the actions that led to reward or punishment. Learning these associations between actions and rewards improves the agent's capability of achieving a goal. There are four main sub-elements of a reinforcement learning system: (1) a policy, which is the way of behaving at a given time, corresponding to a set of stimulus-response associations; (2) a reward function, representing the intrinsic desirability of a certain state, i.e., which events are good and bad, and which is the basis for changing the policy; (3) a value function, which specifies the desirability of a certain state in the long run, or how much reward the agent expects to accumulate over time; while the reward function determines what is immediately good or bad, the value function sets the long-term desirability of states; (4) a model of the environment, which is a set of state and action predictions that mimics the behaviour of the environment. Models can be used for planning, since they facilitate the anticipation of future situations before they are experienced. This last feature allows the system to make new predictions and to plan ahead in order to avoid mistakes rather than only learning from them; it is an extension of the basic reinforcement learning ideas, since planning is the opposite of a purely trial-and-error approach.
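These sub-elements can be illustrated with a minimal simulated two-option task (a hypothetical sketch in the spirit of [38], not an implementation from any cited study): the value function is updated from the prediction error after each categorical reward, and an epsilon-greedy policy acts on the learned values.

```python
import numpy as np

def bandit_learning(p_reward=(0.2, 0.8), alpha=0.1, epsilon=0.1,
                    trials=1000, seed=0):
    """Minimal reinforcement learner for a two-option task.
    Q is the value function (expected reward per option); the policy is
    epsilon-greedy on Q; the reward function is the environment's
    categorical win/no-win outcome."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(2)                               # value function
    for _ in range(trials):
        # Policy: mostly exploit the higher-valued option, sometimes explore.
        a = rng.integers(2) if rng.random() < epsilon else int(np.argmax(Q))
        r = float(rng.random() < p_reward[a])     # categorical reward (0 or 1)
        Q[a] += alpha * (r - Q[a])                # learn from the prediction error
    return Q

Q = bandit_learning()
# Q approaches the true reward probabilities, so the learned policy
# comes to prefer the more rewarding option.
```

A model of the environment (sub-element 4) would additionally let the agent simulate outcomes before acting; it is omitted here for brevity.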

The differences between error-based learning and reinforcement learning are connected to the learning processes involved. Error-based learning can be associated with implicit or procedural learning processes, whereas reinforcement learning may be more relevant when learning involves hypothesis-testing processes. In general, implicit or procedural learning refers to the processes mediating the acquisition of skills whose performance does not depend on verbal processes while explicit or rule-based learning refers to the acquisition of knowledge that can be verbalized [39].

These concepts of error-based versus reinforcement learning are also connected to the second aspect of learning from feedback highlighted in Table 1, the feedback characteristics. In error-based learning, the feedback is typically represented as finely graded error information. In reinforcement learning, however, the feedback can have various formats, including error information in its categorical format (e.g., ‘incorrect’ or ‘correct’). The three main types of feedback described here are motivational, performance-related, or a mix of the two. First, we can classify the feedback by its motivational value: reward, punishment, and neutral. Second, we can classify the feedback in relation to its content for performance monitoring: categorical, graded, and finely graded. Categorical feedback only informs whether a response was correct or not, whereas graded feedback assigns different degrees of correctness or incorrectness. Finely graded feedback also informs different degrees of error, but using error units instead of graded categories. For example, in a time estimation task, graded feedback would be ‘slow’, ‘too slow’, ‘fast’, ‘too fast’, and ‘correct’, whereas finely graded feedback would inform the precise difference between the participant's estimation and the target (i.e., in milliseconds). A third type of feedback is a mix of performance feedback and motivationally valued information. In a mixed feedback condition, a reward can also indicate a correct response. This last type is potentially problematic because it conflates the reward with the performance aspects of the feedback. For example, Mars et al. (2004) conducted three experiments to investigate the FRN in response to categorical (correct/incorrect) feedback and to two different types of graded feedback in a time estimation task, one showing how incorrect the response was (too slow, too fast, or correct) and the other presenting graded information as plus or minus 2, 6, or 10 cents. The feedback indicated both how close or far the estimation was and how much reward would be given. Therefore, both performance and reward were represented in the same feedback, and it is consequently unknown whether the FRN encoded the degree of error, the reward value, or some combination of the two.

The third aspect of learning from feedback to consider is the operational definition of learning, which refers to how learning was evaluated in the published studies. Because learning is such a broad concept, it can be measured in a number of different ways. Reviewing the literature on the electrophysiological correlates of feedback processing for learning identifies three main operational definitions of learning. The first one, performance adjustment, considers learning as a successful performance adjustment on the trial following an error [26], [30], [40]. This is particularly relevant in feedback processing studies, where a good adjustment in performance after a mistake may indicate that the information conveyed by the feedback was incorporated to modify behaviour. The second type defines learning as the consolidation of a trained skill [29], [36]. In this case, learning is characterized as the stabilization of performance after reaching a plateau in the learning curve: the better the learning, the lower the variability in the responses and the lower the error. The third operational definition evaluates learning as the discovery of the underlying rules mediating rewards [41], [42].

An important issue is whether the brain uses feedback information in the same way for all learning processes and in response to all types of feedback, or whether there are components that are more specific to the task being performed. Some of these questions are: Is the FRN predictive of learning in both implicit and explicit tasks? Are graded and categorical performance feedback processed similarly? What is the role of feedback processing in learning adjustment and consolidation in implicit tasks? In the next sections, the main electrophysiological correlates of feedback processing will be described with a focus on how they relate to learning, considering the various aspects described in Table 1. First, the FRN will be described and connected to learning. Second, the oscillatory correlates of feedback processing will be discussed. Third, the connectivity findings will be elucidated. The electrophysiological correlates of feedback processing related to learning reviewed in this paper are summarized in Table 2.

The FRN is probably the most investigated ERP component in feedback-guided learning studies [9]. As mentioned in the introduction, the FRN, or feedback ERN, is a negative deflection in the ERPs occurring around 145–300 ms in response to feedback (Fig. 1a), especially when it signals that an error has been made. An FRN has also been observed after positive feedback when negative feedback was expected [43], which suggests that it represents the expectedness of the feedback rather than its valence. The FRN has a mid-frontal topography (Fig. 1b), which is why this ERP component is measured from electrodes around FCz or Fz. In Fig. 1a, the average waveforms over the mid-frontal electrodes show a clear FRN following incorrect feedback (red lines) for both high learners (solid lines) and low learners (dashed lines). In Fig. 1b, the topography of the activation averaged over the identified FRN time window (transparent grey box) is presented, showing a mid-frontal topographical distribution.

The FRN is usually analyzed through difference waves between correct and incorrect feedback [9]. The idea is to ‘neutralize’ the shared cognitive processes involved in processing the feedback. However, this can be risky, as the difference waveform may produce components that are not present in the original waveforms [44]. In the case of feedback processing, difference waveforms can be problematic considering that ‘positive’ or ‘correct’ feedback is not necessarily neutral. An example of this problem is a study investigating differences in the ERPs and EEG spectra in response to wins and losses with 25%, 50% and 75% probability [24]. The difference wave was modulated by the probability of wins/losses, but when the average FRN was analysed separately per condition, it was clear that the responses to wins were driving the observed effects, and there was no effect of probability on the ERPs to losses. Therefore, it is recommended to compare the averaged or peak ERPs in the determined time window for each condition separately.
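The recommendation can be made concrete with a toy computation on simulated averaged waveforms (hypothetical values, for illustration only): extracting the mean amplitude in the FRN window per condition, alongside the difference, makes it possible to see which condition drives an effect.

```python
import numpy as np

def mean_amplitude(erp, times, window=(0.2, 0.3)):
    """Average amplitude of an averaged ERP waveform inside a time
    window (s), computed per condition rather than only on the
    incorrect-minus-correct difference wave."""
    mask = (times >= window[0]) & (times < window[1])
    return erp[mask].mean()

times = np.linspace(0, 0.6, 600)
# Toy averaged waveforms (arbitrary microvolt values) peaking at 250 ms.
erp_correct = 2.0 * np.exp(-((times - 0.25) ** 2) / 0.002)
erp_incorrect = -3.0 * np.exp(-((times - 0.25) ** 2) / 0.002)

frn_correct = mean_amplitude(erp_correct, times)
frn_incorrect = mean_amplitude(erp_incorrect, times)
difference = frn_incorrect - frn_correct
# Reporting frn_correct and frn_incorrect separately shows which
# condition drives the difference; the difference wave alone cannot.
```

Here the difference is carried by both conditions, something invisible if only `difference` is reported.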

It is hypothesized that the FRN is triggered by the detection of a reward prediction error (RPE) [45], which is the difference between the expected and the obtained reward [38]. Because of its fundamental role in coding prediction errors [46], [47], the dopaminergic system has been widely implicated in human neuroimaging papers investigating the performance monitoring system. The influential work of Holroyd and Coles (2002) suggested that the mesencephalic dopamine (DA) system carries predictive error signals to be used by other parts of the brain for reinforcement learning, mainly by the ACC. The connection between the FRN, the mesencephalic dopamine, and reward prediction errors is the main motivation behind a number of studies showing how the FRN varies according to reward probability and magnitude [6], [9]. However, the hypothetical connection between the FRN and the mesencephalic dopamine system has been criticized for being mostly theoretical, since it cannot be directly tested [7].
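In its simplest form, the quantity at stake is just the signed difference between obtained and expected reward (a schematic illustration with hypothetical probabilities, not a model taken from the cited papers):

```python
def reward_prediction_error(obtained, expected):
    """RPE: the difference between obtained and expected reward [38].
    Negative values (worse than expected) are the events hypothesized
    to elicit the FRN; the expectation could come, e.g., from a
    running estimate of reward probability."""
    return obtained - expected

# An unexpected loss: reward was expected with p = 0.75 but not delivered.
rpe_loss = reward_prediction_error(obtained=0.0, expected=0.75)   # -0.75
# An unexpected win: reward delivered although expected with only p = 0.25.
rpe_win = reward_prediction_error(obtained=1.0, expected=0.25)    # 0.75
```

On this account, the FRN should scale with how unexpected the outcome is, not merely with whether it is good or bad, which is what the probability and magnitude manipulations cited above test.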

In an insightful review [48], studies on the neurotransmitters involved in feedback processing were analysed, including experiments with Parkinson's disease (PD), which revealed that the FRN–DA link is more complex than previously assumed and likely involves interactions with other neurotransmitters such as norepinephrine, serotonin, GABA, and adenosine. The authors also observed that PD patients off medication learn better from negative feedback, whereas patients on L-DOPA learn better from positive feedback or rewards. Although this could support Holroyd and Coles's (2002) hypothesis, two major problems challenge the theory. First, if the FRN is brought about by phasic decreases in dopamine signals projected to the ACC resulting from RPEs, it should not be possible to observe an FRN in response to unexpected positive feedback, as has been found by others [43], [49]. Second, the dynamics of the meso-prefrontal DA system, unlike those of the striatum, are too slow to trigger the very fast response observed in the ERN or even the FRN [48]. It is possible that the ERN and FRN instead represent a process of top-down modulation of the mesencephalic DA neurons. This idea finds further support in studies showing that neurons in the cingulate cortex can modulate activity in the striatum and midbrain [50], [51], as already suggested in a previous study [40]. Therefore, the ERN/FRN could be generated in the frontal cortex, including the ACC, which sends an associated error signal to the midbrain DA neurons. This hypothesis, put forward in the cited review [48], could explain the results of a number of studies which found that the FRN is sensitive to RPEs, including positive prediction errors [16], [42], [43], [52], [53], [54], [55], [56], [57], [58].

Independently of the relation between dopamine and the FRN, it is accepted that the FRN represents activity of the performance monitoring system, which is crucial for improving performance. Therefore, some studies have analysed the relation between this ERP component and learning performance. While some studies focused on how the FRN changes over the course of learning, others tested whether it predicts an adequate performance adjustment on the next trial. The studies dedicated to tracking changes in the FRN over the course of learning assume that learning promotes the development of strong associations between actions and outcomes. If this is true, the FRN is expected to be reduced in the later stages of learning, because the predictions are better matched with the actual results, keeping the prediction error low.

It was found that the FRN was reduced as the participants got insight into the rule of a guessing task [32]. In this study, the participants had to figure out a hidden rule for obtaining the reward. Each trial could or could not be rewarded based on the participants’ choice. They found that in the second stage of the task, only the participants who learned the rule reduced their FRN amplitude, while the non-learners continued to show an increased FRN following error feedback (or the absence of reward).

One of the main characteristics of the FRN is to respond to unexpected feedback. When people learn, however, they usually depend less on external sources to evaluate whether or not they have made a mistake. Using the Eriksen flanker task, two studies examined the functional relationship between the FRN and the ERN [33], [59]. During the task, the participants were told to focus on the middle letter of the following letter strings: ‘HHHHH’, ‘SSSSS’, ‘SSHSS’ or ‘HHSHH’ and respond whether that letter was an H or an S. The feedback was ‘correct and in time’, ‘correct but out of time’, and ‘false’. In this task, performance improves with practice, and errors become less frequent. The results demonstrated that feedback following small errors (not easily detected) was associated with a larger FRN and an absent ERN, whereas large errors (easy to detect) were followed by an ERN and a smaller FRN. The authors suggested that performance feedback only triggers an FRN if it provides non-redundant information. Other studies using different paradigms found the same inverse relationship between the ERN and the FRN [60], [61].

The previously cited studies analyzed the impact of learning on the FRN and its functional relation with the ERN, but understanding how the FRN is associated with learning a skill or acquiring knowledge is still challenging. van der Helden et al. [31] investigated how the FRN predicts good reinforcement learning, defining learning as whether, after committing a mistake, the participant avoided repeating the same error. Using a paradigm in which the participants had to learn a sequence of button presses, they found that the FRN predicted whether the participants would avoid or repeat the same mistake on the next attempt. They demonstrated that a larger FRN in response to error feedback was associated with better learning, meaning that the subjects were less likely to repeat the same mistake on the trials following a stronger FRN. Other studies have found that both the FRN and the ERN were predictive of behavioural adjustment on the next trial [10], [40], [52], [62], [63]. Although those studies indicated that the FRN is related to corrective actions, others failed to find evidence supporting this prediction [29], [30], [36], [42].

Studies using a time estimation task, which requires implicit learning, failed to find an association between the FRN and learning [29], [30], [36]. In contrast, when the association between the FRN and learning was tested using a reinforcement learning task in which the participants had to choose between two options with distinct reward probabilities, the FRN amplitude was predictive of a positive adjustment on the next trial [24]. Other studies using tasks involving probabilistic associations found that the magnitude of the FRN was related to adjusting performance on the subsequent trial [52], [62], [63]. However, an experiment using a probabilistic reinforcement learning task [42] found that although the FRN was higher for larger prediction errors, it was not related to changing the response on the next trial. The difference was that in the latter study the participants had been informed in advance of the explicit rules for identifying the more rewarding stimulus. The authors suggested that the FRN might represent a process that is more relevant for behavioural adjustment when the feedback contingencies are not explicitly known in advance.

A careful analysis of the paradigms, and of how the feedback is effectively used to perform, is essential for understanding how the FRN is associated with learning. The FRN might not represent a process as crucial for performing a time estimation task as it is for reinforcement learning. The studies described earlier found a relation between the FRN and learning, evaluated as an adjustment on the next trial, using tasks that require explicit associations, whereas the time estimation studies did not find any association between the FRN and learning.

In addition, the relevance of the feedback for learning also has to be taken into account when comparing the brain responses to it. One study investigated how the FRN relates to learning using a paired-associate learning paradigm [64]. In this task, the participant had to associate an abstract figure with one of four computer-generated nonwords representing the name of that figure. After each trial, the participants received feedback indicating whether the chosen label was or was not the correct name of the presented figure. By trial and error, the participants had to learn 30 object names (nonwords), which were tested the next day in order to assess the long-lasting learning effect. The FRN and a fronto-central component in a subsequent time window (∼400 ms) were measured after negative (incorrect association) and positive (correct association) feedback, respectively. The authors observed that only the amplitude of the ERP in response to positive feedback (the fronto-central component) was predictive of learning, considered as retention capability. Following negative feedback, there was no association between the FRN and learning. A naïve interpretation would be that we learn more from positive than from negative feedback. This result, however, could be explained by the fact that in this task positive feedback provides more useful information for learning than negative feedback: once the association is correct (positive feedback) it can be immediately memorized, whereas following negative feedback the participant still has three other possible names to test. Therefore, the amount of useful information in the feedback seems to be a crucial aspect in understanding how the FRN, or other ERPs in response to feedback, relate to learning. It might be that the FRN represents a process more relevant to learning that depends on hypothesis-testing procedures or on knowledge that can be tested explicitly using verbal processes and associations. However, because prediction errors might have a more complex or harder-to-quantify relation with implicit learning, the relation between performance and the FRN might have been missed using the current methods.

The vast majority of papers on feedback processing have focused on the evoked components of performance monitoring [7]. As mentioned previously, information that is not time- and phase-locked is often missed when looking exclusively at the ERP domain, because signals from different trials can cancel each other out during the averaging process [23]. In this section, the oscillatory correlates of feedback processing are discussed. Following a general overview of the main frequency bands involved in feedback processing, the specific findings in the theta and beta frequency bands are elucidated and related to learning in separate sub-sections.

Fig. 2 shows an example of two time-frequency representations (TFRs; Fig. 2a) and their respective topographical distributions (Fig. 2b) of theta power from 200 to 500 ms following correct and incorrect feedback presentation. The TFR shows an increase in relative theta power following incorrect feedback presentation, as indicated by the black frame around the effect in the figure (Fig. 2a). This increase in theta power is strongest over the mid-frontal areas following incorrect feedback, as evidenced in Fig. 2b. This is a typical example of the theta power effect found in a number of studies [7], [24], [29], [34], [65], [66], and its meaning will be discussed in the following sub-section.
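For readers unfamiliar with how such TFR values are obtained, one common approach is band-pass filtering followed by a Hilbert envelope (a minimal sketch on a simulated signal; wavelet convolution is an equally common alternative, and the cited studies may have used different methods):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(signal, fs, band=(4.0, 8.0)):
    """Instantaneous power in a frequency band (here theta, 4-8 Hz),
    estimated by band-pass filtering and squaring the Hilbert envelope."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    envelope = np.abs(hilbert(filtfilt(b, a, signal)))
    return envelope ** 2

fs = 250.0
t = np.arange(0, 1.0, 1 / fs)
# Toy single-trial signal: a 6 Hz (theta) burst in the second half of the
# epoch, standing in for a post-feedback theta increase.
sig = np.sin(2 * np.pi * 6 * t) * (t > 0.5)

power = band_power(sig, fs)
# Theta power is higher in the second half of the epoch than the first.
```

Averaging such single-trial power estimates across trials, rather than averaging the raw signals, is what preserves the non-phase-locked (induced) activity in a TFR.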

Besides theta, many studies have found a difference between negative and positive feedback in the beta frequency range (15–30 Hz) [28], [29], [30], [67], [68]. Some of these studies found an increase in beta power following a reward [24], [28], [67] or correct performance feedback [30], whereas others found a desynchronization in this frequency range following incorrect [29] and small-error feedback [36]. Therefore, there seem to be two different beta oscillations: one in response to reward and one in response to performance feedback. The meaning of these two types of beta oscillatory responses is discussed in the sub-section following the theta oscillatory correlates of feedback processing. The main results discussed in this review are summarized in Table 2.

Oscillations in the theta frequency range (4–8 Hz) over the midline frontal electrodes (usually FCz and Fz, see Fig. 2b) have been associated with feedback processing in many studies [25], [26], [28], [29], [30], [66], [69]. These theta oscillations are likely generated in the ACC [25].

Using a set of different tasks (oddball, probabilistic learning, and a Simon task), one study [34] tested the hypothesis that theta oscillatory processes are reflected in a multitude of mid-frontal ERPs, namely the ERN, the FRN, the N2 (control and mismatch) and the correct-related negativity (CRN). Although these ERP components may all index the functioning of the performance monitoring system, they are differentially sensitive to response/feedback features. For example, the control N2 is sensitive to variations in stimulus-response demands, while the mismatch N2 responds to stimuli that represent an unexpected perceptual deviation. The results showed that while these ERP components were sensitive to specific stimulus manipulations in each task, theta oscillations were sensitive to all manipulations, suggesting that they represent a non-specific mechanism for organizing neural processes around decision points or for coordinating internal and external performance-relevant information. In that study [34], theta power was sensitive to novelty, conflict, punishment and error commission, all of which are considered performance-relevant information.

The wide range of functions represented by theta is also evident in another study [65] investigating these oscillations in a reinforcement learning task in which participants had to explore the response options to find an optimal response. The findings showed that: (1) relative uncertainty and theta power were only positively correlated when participants were choosing an option with larger associated uncertainty; (2) these correlations were found over the areas associated with exploration; (3) these exploration related effects were larger in participants who effectively used uncertainty to guide exploration.

The hypothesis that theta oscillations represent uncertainty-guided exploration is supported by another study [70], which found that theta increased in response to a switch cue in a Wisconsin Card Sorting Task (WCST) and to the first positive feedback of a new series, indicating the choice was correct. They also found that this theta effect was reduced as the participants learned the association and no longer had to change or explore a new strategy. Therefore, it seems that theta reflects not only processes related to the detection of an error or punishment, but also the translation of the uncertainty conveyed by the feedback into action, which is among the functions of the ACC [71].

There is evidence that theta oscillations are linked to better learning. Using a time production task, van de Vijver et al. (2011) found that increases in theta oscillations in response to error feedback were predictive of an adequate performance adjustment on the next trial. Cavanagh et al. [26] observed that theta was modulated by the prediction error in a probabilistic reinforcement learning task. Specifically, they observed that mid-frontal theta was predictive of a slower next-trial reaction time following an error, while lateral frontal theta was associated with a faster next-trial reaction time following a correct choice. Studies comparing high- and low-learners showed that the former group has increased mid-frontal theta responses to incorrect performance feedback [29] and to losses [69].

Theta oscillations also seem to be involved in overriding more established Pavlovian biases. In a recent experiment [72], the participants had to learn the association between 4 different cues and a reward or punishment. The response format was either ‘GO’ (respond) or ‘NOGO’ (withhold the response), resulting in four conditions: (1) Go to Win: responding resulted in a reward; (2) Go to Avoid: responding avoided a loss; (3) NoGo to Win: withholding the response resulted in a reward; (4) NoGo to Avoid: withholding the response avoided a loss. According to the Pavlovian bias, animals and human beings have a tendency to act in order to win and to inhibit in order to avoid a loss. Therefore, the ‘Go to Win’ and ‘NoGo to Avoid’ associations are easier to learn. In order to maximize their reward in the task, the participants had to override these biases to learn the ‘counterintuitive’ associations (‘Go to Avoid’ and ‘NoGo to Win’). The results indicated that theta oscillations over the frontal cortex were correlated with the participants’ ability to override the Pavlovian learning biases during the course of the task. The authors suggested that frontal theta oscillations reflect effortful processes to suppress the bias by means of cognitive control mechanisms.
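The competition in such designs can be captured by a reinforcement-learning model in which instrumental action values compete with a Pavlovian term tied to the cue's value. The sketch below is illustrative only; the bias weights, learning rate and function names are assumptions, not the fitted model of the cited study.

```python
import math

def p_go(q_go, q_nogo, cue_value, go_bias=0.2, pav_weight=0.5):
    """Probability of responding 'Go': instrumental values plus a static Go bias
    and a Pavlovian term that pushes toward action for appetitive cues
    (cue_value > 0) and toward inhibition for aversive cues (cue_value < 0)."""
    w_go = q_go + go_bias + pav_weight * cue_value
    w_nogo = q_nogo
    return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))

def update_q(q, reward, alpha=0.1):
    """Delta-rule update of the chosen action's instrumental value."""
    return q + alpha * (reward - q)
```

With equal instrumental values, an appetitive cue yields a higher `p_go` than an aversive one; this is exactly the bias that participants in the ‘Go to Avoid’ and ‘NoGo to Win’ conditions must override through learning.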

These studies found evidence linking frontal theta oscillations to adequate next-trial adjustments, but is theta also related to the consolidation of what was learned? Using a time estimation task, a study [29] found that high-learners presented an increase in mid-frontal theta oscillations that was significantly smaller in the low-learners group. Nevertheless, this increase did not correlate with task performance during the trials with no feedback, suggesting that whereas mid-frontal theta power may be important for guiding adequate performance adjustments, it does not scale with how well the participants stabilize performance in an implicit or procedural learning task. The fact that the participants who learned better presented higher mid-frontal theta responses may indicate that they recruited higher cognitive control in response to incorrect feedback, but that does not necessarily correlate with how much they incorporated the implicit skill.

Taken together, these studies may indicate that theta plays a role in learning from feedback when the feedback carries relevant information for action. Feedback indicating a need to change the behavioural policy seems to be associated with theta oscillations in many different paradigms. In this section, it was demonstrated that theta is relevant for learning under all the operational definitions presented in Table 1. Higher mid-frontal theta responses were related to learning in papers that evaluated it as an adequate performance adjustment on the trial following error feedback [26], [30], as the consolidation of a learned skill [29], or as the discovery of underlying reward/punishment rules [69] when they conflict with Pavlovian learning biases [72].

Previous studies have suggested that positive feedback is associated with larger power in the beta or low-gamma frequency range (20–40 Hz) over the left central and mid-frontal electrodes [24], [28], [30], [67], [69], [73]. Based on previous animal studies demonstrating increased beta, gamma, and high-gamma oscillations in the striatum in response to actions associated with rewards [74], [75], it has been suggested that beta and gamma oscillations following positive feedback reflect the activation of reward-related areas in the brain. A recent study found that beta power only increases in response to unexpected rather than expected rewards [67]. This result may indicate that beta oscillations play a role in learning from positive feedback rather than only reacting to positive outcomes.

Another study [30] suggested that this increase in beta oscillations reflects a signal for keeping the status quo of the motor system following a correct trial [76]. There are, however, relevant differences between this study [30] and the previous ones [24], [28], [67], [73]. First, the paradigm adopted in the van de Vijver et al. (2011) study is a time estimation task whose feedback represents the accuracy of the response (correct or incorrect) rather than a reward. Second, the topography of the beta oscillations differs: the reward studies found a frontal or mid-frontal topography, whereas the latter study found a left-lateralized sensorimotor topography. Third, the time estimation study looked at a slightly lower frequency range (17–24 Hz) than the reward studies (20–40 Hz). A recent study [29] using a time estimation task found a beta desynchronization over the left sensorimotor areas following error feedback rather than a beta synchronization following correct feedback. It seems, therefore, that there are two distinct forms of beta oscillations in response to feedback. The first is a beta/gamma synchronization related to reward processing, observed in many studies [28], [67], [68], [73], with a frontal or mid-frontal topography in the beta and low-gamma frequency ranges. The second is a desynchronization in the beta frequency range with a left central topography in response to incorrect performance feedback in motor or implicit tasks.

It is hypothesized that this second type of beta response, the desynchronization, reflects processes related to mentally retrieving and correcting the previous trial. This hypothesis is supported by another study, which found larger beta desynchronization following small rather than large error feedback in a time estimation task [36]. In addition, they found that beta desynchronization was highly correlated with how well the participants learned the task. The reason for higher beta desynchronization in response to small rather than large errors might be related to how much effort a small and a large error require for correction. Reaching perfection arguably requires more than initial learning: a small error, which requires precise correction, has the potential to trigger more sophisticated error correction mechanisms than a large error, for which the scope for correction is broader.

Therefore, the activation of motor areas reflected in beta desynchronization seems to play an important role in learning from error feedback. However, it is difficult to determine whether larger desynchronization over the left motor areas following feedback facilitates learning or is a consequence of it. Nevertheless, one study [77] using transcranial direct-current stimulation (tDCS) demonstrated that stimulating the primary motor area can boost skill consolidation, which means better learning. Although beta desynchronization seems to play an important role in learning from errors, more studies are needed to confirm the link, and the causal relation, between this oscillation and learning. In particular, studies looking at how feedback is processed in motor learning tasks are needed, especially graded feedback, which has been largely overlooked.

Studies on how distant brain areas communicate to process feedback information are still scarce. Cohen et al. (2011) proposed that inter-site phase synchronization in the theta frequency range represents the communication between distant brain areas for processing negative or error feedback, whereas for learning from positive feedback or rewards, brain areas would communicate in the beta frequency range. Indeed, the few studies investigating inter-site synchronization in response to feedback found that the main frequency in which mid-frontal areas synchronize is theta [26], [29], [30], [69], [78]. However, De Pascalis et al. (2012) showed that both beta (13–25 Hz) and gamma (30–40 Hz) inter-site synchronization between frontal and posterior areas is higher for losses compared to gains, which speaks against the hypothesis that beta synchronization is specific to learning from rewards. More studies on beta and gamma inter-site synchronization are needed to settle this issue.

The studies on inter-site synchronization in response to feedback have focused on how the mid-frontal areas synchronize with others, especially dorsolateral prefrontal and central/parietal sites. It was found that the mid-frontal (FCz) area synchronizes with the right prefrontal (F6) area in the theta frequency range following incorrect feedback [26], [30]. Moreover, synchronization between right prefrontal (F6) and left sensorimotor (CP3) electrodes was higher following incorrect than correct feedback [30]. Another study found a similar connectivity pattern (FCz–F5/6) following errors in a Flanker task [78]. Importantly, in the latter study, the degree of theta synchronization between those areas predicted the next-trial slowing following an error. This communication between mid-frontal and dorsolateral prefrontal areas was interpreted as a mechanism by which the need for performance adjustment is communicated from the performance monitoring areas (mid-frontal), possibly originating in the ACC, to the prefrontal regions responsible for planning and implementing a corrective action. However, this interpretation is not supported by the results of another study [29] investigating functional connectivity patterns in response to feedback, which found that the synchronization between FCz and F5/F6 was similar following correct and incorrect feedback. The topographical organization is featured in the heads-in-head plot in Fig. 3a. This figure shows the inter-site synchronization across the whole array of electrodes; each electrode is represented as a mini topographical plot, in which the synchronization values received from (blue) and sent to (red) other channels are represented. In Fig. 3a, it is evident that 300–600 ms after incorrect feedback there is synchronization in the theta band, with signals being sent from mid-frontal/central electrodes to prefrontal ones (peaking at F5).
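Across-trial inter-site synchronization of this kind is typically quantified with a phase-locking value (PLV): the consistency of the phase lag between two electrodes across trials. The following sketch, with illustrative ‘FCz’/‘F6’ signals and an assumed 6 Hz centre frequency, is a minimal implementation, not any specific study's pipeline.

```python
import numpy as np

def theta_phase(trials, fs, freq=6.0, n_cycles=5.0):
    """Per-trial instantaneous phase at one frequency via complex Morlet
    convolution. trials: array of shape (n_trials, n_samples)."""
    sigma = n_cycles / (2.0 * np.pi * freq)
    t = np.arange(-3 * sigma, 3 * sigma, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma**2))
    analytic = np.array([np.convolve(tr, wavelet, mode="same") for tr in trials])
    return np.angle(analytic)

def plv(phase_a, phase_b):
    """Phase-locking value across trials: 1 = constant phase lag, ~0 = random lags."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b)), axis=0))

# Toy data: two channels share a 6 Hz rhythm whose absolute phase varies
# across trials, but whose lag between channels is fixed
fs, n_trials = 250, 40
t = np.arange(0, 1, 1.0 / fs)
rng = np.random.default_rng(1)
offsets = rng.uniform(0, 2 * np.pi, n_trials)
fcz = np.array([np.sin(2 * np.pi * 6 * t + o) for o in offsets])
f6 = np.array([np.sin(2 * np.pi * 6 * t + o - 0.8) for o in offsets])
sync = plv(theta_phase(fcz, fs), theta_phase(f6, fs))  # near 1 at every time point
```

Because the lag is constant across trials, the PLV is high even though the absolute phases vary, which is exactly the property these studies exploit.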

The phase slope index (PSI) analysis of Luft et al. (2013) revealed three different stages of feedback processing: (1) immediately following the response; (2) from 1 to 300 ms after the feedback; and (3) from 300 to 600 ms after the feedback (Fig. 3b). Interestingly, all three stages involved the mid-frontal areas in the theta frequency range. In the first stage, immediately after the response, the flux of information went from mid-frontal (peaking at FCz) to sensorimotor electrodes. This flux might represent an internal ‘check’ of the motor system status, which could allow an ERN, for example, to emerge after an erroneous response. In the second stage, after feedback was presented, this flux reversed, going from sensorimotor, especially left central, to mid-frontal electrodes. This synchronization seems to represent the reverse process, in which the ‘match’ or ‘non-match’ between the response and the feedback is communicated back to the mid-frontal performance monitoring areas. In the third stage, from 300 to 600 ms after feedback, the flux changed again, going from mid-frontal to prefrontal, especially dorsolateral, areas. This is the same electrode pair found in other performance monitoring connectivity studies [26], [30], [78]. This stage may reflect an update of the feedback history, as the mid-frontal areas send signals to the dorsolateral prefrontal cortex, an area involved in building up expectations [79]. Importantly, there was no difference in the connectivity patterns between correct and incorrect feedback, indicating that this process represents a general path for processing feedback rather than a specific mechanism for correcting errors. Moreover, the inter-site synchronization was stronger in the high-learners and was correlated with performance during trials without feedback, which suggests that these processes are related to skill consolidation.
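Unlike the PLV, which is symmetric, the phase slope index estimates the direction of the flux from the slope of the cross-spectral phase across neighbouring frequencies. The sketch below is a simplified, self-contained version assuming FFT-based cross-spectra over epochs; it is not the published implementation, and the channel data are synthetic.

```python
import numpy as np

def psi(x_epochs, y_epochs, f_lo, f_hi, fs):
    """Phase slope index between two channels (arrays of epochs x samples).
    With this sign convention, positive values indicate that x leads
    (sends to) y within the band [f_lo, f_hi]."""
    X = np.fft.rfft(x_epochs, axis=1)
    Y = np.fft.rfft(y_epochs, axis=1)
    sxy = np.mean(X * np.conj(Y), axis=0)            # cross-spectrum
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    coh = sxy / np.sqrt(sxx * syy)                   # complex coherency
    freqs = np.fft.rfftfreq(x_epochs.shape[1], 1.0 / fs)
    band = np.where((freqs >= f_lo) & (freqs <= f_hi))[0]
    return np.imag(np.sum(np.conj(coh[band[:-1]]) * coh[band[1:]]))

# Toy check: y is a delayed copy of x, so x should be the 'sender'
fs, n = 250, 250
rng = np.random.default_rng(2)
x = rng.standard_normal((20, n))                     # 20 epochs, 1 s each
y = np.roll(x, 5, axis=1)                            # y lags x by 20 ms
forward = psi(x, y, 4.0, 8.0, fs)                    # positive: x -> y
```

The sign flips when the channels are swapped, which is what allows a heads-in-head plot to distinguish ‘sending’ from ‘receiving’ electrodes.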

It remains uncertain, however, whether higher connectivity facilitates learning or is more pronounced after learning takes place. A study [80] relating fronto-striatal structural connectivity to error-related theta oscillations found that stronger connections within these regions predicted a larger increase in theta power following errors in a Simon task. This result supports the idea that stronger structural connections mediate individual differences in learning capacity. Nevertheless, this topic needs more research, as there is evidence that even short training sessions can alter structural connectivity [81].

The fact that all studies on inter-site synchronization in response to feedback found connections with mid-frontal areas may indicate that the performance monitoring areas, especially the ACC, are involved in integrating performance-related information. The ACC has extensive connections with the motor cortex [82], [83] and with the dorsolateral prefrontal cortex (DLPFC) [82], which could mediate the inter-site synchronization found in previous studies, placing the ACC in a central position for integrating performance-related information in order to boost learning.

A recent study [84] observed that the mid-frontal areas served as a hub for theta band network interactions after errors. Using a perceptual discrimination task, the authors demonstrated that mid-frontal areas send signals to parieto-occipital sites after errors (response-related connectivity), whereas these posterior regions communicate with mid-frontal areas through alpha oscillations (stimulus anticipation-related connectivity). Importantly, the higher the connectivity, the more likely the participants were to correctly adjust their performance on the next trial. This study indicates that the frequency band of inter-site synchronization differs between brain areas: alpha oscillations carried posterior-to-anterior communication, whereas theta carried mid-frontal-to-posterior communication.

The idea that brain areas communicate in specific frequencies is supported by a recent study [36] whose main findings indicate that the phase synchronization between sensorimotor and mid-frontal areas in response to graded feedback happens in the beta frequency band (17–24 Hz). By presenting graded feedback during a time estimation task, we found that the high-learners group presented increased phase synchronization in the beta frequency range (17–24 Hz) between motor-related electrodes contralateral to the response hand (C3–C4) and mid-frontal areas (FCz, Fz). The more these regions synchronized in the beta frequency range, the better the participants were able to perform the task without feedback at the end of the session.

These results support the idea that the frequencies in which brain areas communicate may vary according to the nature of the cognitive operation involved. To understand how feedback is processed, it may be relevant to relate the specific nature of the task being learned, the format of the feedback and its usefulness for performing the task to the frequencies in which the main brain areas involved synchronize. Nevertheless, more studies are needed to understand how distinct brain areas interact to learn from feedback. Are the network interactions similar for implicit and explicit learning? Which frequencies mediate the communication to and from different brain areas? Are they a consequence or a cause of better learning? These and other questions have the potential to help decipher the neural substrates mediating highly proficient performance. Moreover, finding out how feedback processing relates to learning has the potential not only to improve our understanding of learning disabilities, but also to lead to the development of better teaching methods.

Section snippets

The neural basis of the EEG correlates of feedback processing for learning

Source localization studies have implicated the ACC in the generation of both the FRN and the theta oscillations. Although the generation of the FRN has been theoretically connected to the mesencephalic dopaminergic system projecting to the ACC, this view has been challenged. The ACC has, amongst its variety of functions, a role in integrating feedback related information and translating it into action [71]. This brain structure has subgroups of neurons which differentially respond to positive

Future directions

In this paper, we discussed the state-of-the-art research on the neural correlates of feedback processing for learning. The scope of this review was limited to the studies using EEG for investigating feedback processing related to learning. The aim was to discuss the relation between the various findings (ERPs, oscillations and inter-site synchronization) with learning, considering some key theoretical aspects of learning from feedback

New studies on feedback processing using EEG are needed to

Acknowledgements

I am thankful to Dr. Stuart W. Derbyshire for his critical comments and advice on the paper, to Professor Joydeep Bhattacharya for his research support, to Aimee Goldstone and Nuno Reis Gonçalves for proof reading the paper. I am also grateful to the reviewers for their constructive comments on the paper.

References (89)

  • J.F. Cavanagh et al., Frontal theta links prediction errors to behavioral adaptation in reinforcement learning, Neuroimage (2010)
  • D. Papo et al., Modulation of late alpha band oscillations by feedback in a hypothesis testing paradigm, Int J Psychophysiol (2007)
  • J. Marco-Pallares et al., Human oscillatory activity associated to reward processing in a gambling task, Neuropsychologia (2008)
  • I. van de Vijver et al., Aging affects medial but not anterior frontal learning-related theta oscillations, Neurobiol Aging (2014)
  • U. Sailer et al., Effects of learning on feedback-related brain potentials in a decision-making task, Brain Res (2010)
  • W. Schultz, Getting formal with dopamine and reward, Neuron (2002)
  • G. Jocham et al., Neuropharmacology of performance monitoring, Neurosci Biobehav Rev (2009)
  • D. Joel et al., The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum, Neuroscience (2000)
  • C. Bellebaum et al., It is less than you expected: the feedback-related negativity reflects violations of reward magnitude expectations, Neuropsychologia (2010)
  • A. Gentsch et al., Dissociable medial frontal negativities from a common monitoring system for self- and externally caused failure of goal achievement, Neuroimage (2009)
  • S.V. Muller et al., Brain potentials related to self-generated and external information used for performance monitoring, Clin Neurophysiol (2005)
  • M.G. Philiastides et al., Temporal dynamics of prediction error processing during reward-based decision making, Neuroimage (2010)
  • M.J. Frank et al., Error-related negativity predicts reinforcement learning and conflict biases, Neuron (2005)
  • A. HajiHosseini et al., The role of beta-gamma oscillations in unexpected rewards processing, Neuroimage (2012)
  • J. Marco-Pallares et al., Genetic variability in the dopamine system (dopamine receptor D4, catechol-O-methyltransferase) modulates neurophysiological responses to gains and losses, Biol Psychiatry (2009)
  • V. De Pascalis et al., EEG oscillatory activity associated to monetary gain and loss signals in a learning task: effects of attentional impulsivity and learning ability, Int J Psychophysiol (2012)
  • M. Hallschmid et al., EEG synchronization upon reward in man, Clin Neurophysiol (2002)
  • A.K. Engel et al., Beta-band oscillations—signalling the status quo?, Curr Opin Neurobiol (2010)
  • M.X. Cohen, Error-related medial frontal theta activity predicts cingulate-related structural connectivity, Neuroimage (2011)
  • Y. Sagi et al., Learning in the fast lane: new insights into neuroplasticity, Neuron (2012)
  • M.F.S. Rushworth et al., Contrasting roles for cingulate and orbitofrontal cortex in decisions and social behaviour, Trends Cogn Sci (2007)
  • L.W. Leung et al., Electrical activity of the cingulate cortex. I. Generating mechanisms and relations to behavior, Brain Res (1987)
  • D.G. Burkhard et al., Effect of film feedback on learning the motor skills of karate, Percept Mot Skills (1967)
  • G. Wulf et al., Frequent feedback enhances complex motor skill learning, J Mot Behav (1998)
  • M.X. Cohen, Neurocomputational mechanisms of reinforcement-guided learning in humans: a review, Cogn Affect Behav Neurosci (2008)
  • W.J. Gehring et al., A neural system for error detection and compensation, Psychol Sci (1993)
  • W.J. Gehring et al., The error-related negativity: an event-related brain potential accompanying errors, Psychophysiology (1990)
  • W.H.R. Miltner et al., Event-related brain potentials following incorrect feedback in a time-estimation task: evidence for a generic neural system for error detection, J Cogn Neurosci (1997)
  • M. Balconi et al., Error monitoring functions in response to an external feedback when an explicit judgement is required: ERP modulation and cortical source localisation, Int J Psychophysiol (2011)
  • P. Luu et al., Electrophysiological responses to errors and feedback in the process of action regulation, Psychol Sci (2003)
  • G.F. Potts et al., Neural response to action and reward prediction errors: comparing the error-related negativity to behavioral errors and the feedback-related negativity to reward prediction violations, Psychophysiology (2010)
  • D.L. Santesso et al., Neural responses to negative feedback are related to negative emotionality in healthy adults, Soc Cogn Affect Neurosci (2011)
  • C.D. Ladouceur et al., Development of action monitoring through adolescence into adulthood: ERP and source localization, Dev Sci (2007)
  • V. Van Veen et al., The timing of action-monitoring processes in the anterior cingulate cortex, J Cogn Neurosci (2002)