The core process in addictions and other
impulses: Hyperbolic discounting versus
conditioning and cognitive framing

George Ainslie
University of Cape Town, South Africa
Veterans Affairs Medical Center, Coatesville

Presented at What Is Addiction
The Third Mind and World Conference
University of Alabama, Birmingham
May 5, 2007

Published in 2010 in
D. Ross, H. Kincaid, D. Spurrett, and P. Collins, eds.,
What Is Addiction? MIT, pp. 211-245.

This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs of the US Government.

Abstract

One of the great puzzles of behavioral science has been people’s frequent temporary preferences for alternatives that usually seem inferior, a pattern that can be called impulsiveness. Three explicit candidates for the basic mechanism have each gathered substantial backing in the literature of motivation: visceral learning—in effect the classical conditioning of appetite; the hyperbolic discounting of expected events; and shifts in the cognitive framing of the incentives. The visceral learning theory has been distinguished by its apparent ability to explain how stimuli can occasion sudden surges of appetite without predicting the greater probability or proximity of their objects. However, on closer examination none of the three kinds of mechanism accounts for such surges if applied linearly. Recursive self-prediction can amplify appetite sufficiently to do so. It probably requires discount curves to be hyperbolic, and appetite to be a reward-seeking process.

----------

It has been clear since Plato that we have a strong tendency to act against what we ourselves see as our rational interests. Rationality has been defined with increasing precision since economist Paul Samuelson’s “Note on the Measurement of Utility” (1937), in what has come to be called expected utility theory or, more broadly, rational choice theory (RCT; Boudon 1996). Antirational tendencies have been catalogued in departures from the norms of RCT, many of which have been identified both in the laboratory (Herrnstein 1990) and in real-life choices (Jolls, Sunstein, and Thaler 1998). Graphically, RCT predicts that a person’s valuation of a prospective event can be plotted over delay as an exponential curve, losing a constant proportion of its remaining height for every unit of delay (figure 1A):

$$\text{Value} = \text{Value}_0 \times \delta^{\text{Delay}} \tag{1}$$

where Value0 = value if immediate and δ = (1 – discount rate). Any other function, if it does not generate a straight line, will generate curves from a given amount that sometimes cross the curves from some other amounts at other moments, simply because of the passage of time; that is, it will describe inconsistent preference. Inconsistent preference leaves a person susceptible to being a money pump, that is, opens her to exploitation by a competitor who repeatedly buys from her when her valuations fall below their exponentially discounted value and sells back to her when they rise (Arrow 1959; Conlisk 1996). Inconsistent preference also implies that a person can expect to make future choices that she does not currently want. Competition for survival is usually thought to have selected for people who value future events consistently, in markets and, perhaps, in the evolution of species (but see Cubitt and Sugden 2001).
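The consistency property can be checked numerically. The following is a minimal sketch, assuming the standard exponential form value0 * delta**delay implied by the definitions above; the amounts, due times, and discount factor are hypothetical and chosen only for illustration. It shows that the ratio of the two discounted values is the same at every vantage point, so the preference never reverses.

```python
def exponential_value(value0: float, delta: float, delay: float) -> float:
    """Exponentially discounted value: value0 * delta ** delay."""
    return value0 * delta ** delay


# Hypothetical rewards (illustrative numbers, not from the text).
SS_AMOUNT, SS_TIME = 50.0, 10    # smaller, sooner reward due at time 10
LL_AMOUNT, LL_TIME = 100.0, 13   # larger, later reward due at time 13
DELTA = 0.8                      # delta = 1 - discount rate per time unit


def value_ratio(now: int) -> float:
    """Ratio of LL value to SS value as seen from time `now`."""
    ss = exponential_value(SS_AMOUNT, DELTA, SS_TIME - now)
    ll = exponential_value(LL_AMOUNT, DELTA, LL_TIME - now)
    return ll / ss


# Evaluate the choice at every moment up to the arrival of the SS reward.
ratios = [value_ratio(now) for now in range(SS_TIME + 1)]
preferences = ["LL" if r > 1 else "SS" for r in ratios]
print(preferences)
```

Because both curves shrink by the same proportion per unit of delay, the ratio depends only on the fixed gap between the two rewards, which is why exponential curves from different amounts cannot cross.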

This comprehensive model of choice leaves unanswered the question of why we often make choices that defeat our own plans—that is, why impulses or temporary preferences arise for alternatives that usually seem inferior. The diagnosis of impulse control disorder now covers a wide range of behaviors that lie at the extremes of ordinary bad habits, including pathological gambling, compulsive shopping, intermittent explosive disorder, binge eating, and “problematic Internet use” (Hollander and Stein 2006), which have joined the classical substance abuse disorders in the realm of addictions. Much can be ascribed to naivete; incipient addicts often fail to anticipate the problems that will ensue. It has also been proposed that addiction, conceived broadly, does not violate RCT, but is a rational choice by people with low valuations of the future (Becker and Murphy 1988). However, both the naivete theory and the steep but consistent devaluation theory fail to account for the frequent experience of addicts themselves, who report strenuous efforts at recovery that are undermined by temporary preferences to resume their habits, often with full knowledge of the consequences. The problem confronting RCT has a tougher core: Why do experienced addicts and even highly sophisticated nonaddicts keep making choices that they know they will regret? In other words, what causes the basic tendency to form temporary preferences, in the absence of new information, for alternatives that usually seem poorer?
This question is usually answered by a resort to common experience: a temporary craving (but why temporary?), a passionate nature (but how does passion interact with other motives?), or a weak will (but why should will be required in the first place?). The commonsense approach is developed most fully in the works of Baumeister and his collaborators (1994): An impulse is defined only as “a specific motivation or desire to perform a particular action” (132) but implies a “lower process” overriding a “higher process” (8) and represents a failure of “strength” (17–20), which these authors equate with willpower. By analogy to a muscle, willpower is said to become exhausted with effort in the short run, but strengthened by repeated efforts. These are familiar experiences, but an appeal to them does not advance our understanding of impulsive choice. What directs the will, against what, and what factors make the will strong against one kind of impulse at the same time that it is weak against another? An explicit theory of impulsiveness—and control—must specify the properties of an evaluation process that enables or prevents temporary preferences in particular circumstances. To assure coherence, such a theory should make it possible to depict its predictions as a plot of the values of alternatives over time.

The temporary preference phenomenon implies an upward shift in the value of a poorer reward that makes a plot of its value over time look more bowed than an exponential curve—hyperconcavity. Hyperconcavity is an anomaly in any model that depicts consistent valuation of the future. Several mechanisms have been proposed for why temporary preferences occur (reviewed in Ainslie 1992, 24–55), but only three kinds have received serious discussion in the current experimental literature on impulsiveness. Under the visceral reward hypothesis, stimuli in your environment that you associate with a rewarding activity come to induce appetite for this activity, so that you suddenly experience an amplified desire for it when those stimuli appear. The hyperbolic discounting hypothesis (my proposal) holds that the basic discount curve is more deeply bowed than an exponential curve, leading smaller, sooner (SS) rewards to loom temporarily over larger, later (LL) ones, in the same way that a building can temporarily hide a taller one behind it as you approach. Under this hypothesis the value of imminent rewards is always amplified, and the flattening of curves into the exponential shape reflecting consistent choice is what needs explanation. Finally, cognitive framing hypotheses hold that your categorization of the available options may shift with changing distance in ways that disrupt continuous exponential discounting, inducing miscalculations that yield de facto hyperconcavity and thus cause temporary preferences for SS rewards. In this chapter I will evaluate these three kinds of theory. In particular, I will compare how the first two can be extended beyond their heretofore inadequate coverage of “conditioned appetite,” and will find in this comparison an additional basis for choosing between them.

Visceral Reward and Conditioning

Visceral reward theory is recognizable as the heir to the earliest psychological theory of impulsiveness, classical conditioning. In the early days of experimental psychology classical conditioning seemed to be the obvious explanation for impulsive choices. When experimenters created animal analogues of human choice, Thorndike represented the intuitively familiar process of goal-directed learning in a theory of response selection by “satisfiers” (Thorndike 1898; see Chance 1999). Subsequently, Pavlov reported what seemed to be a different kind of response selection: When an event could “reflexively” elicit a specific response, this response came to occur after an arbitrarily selected stimulus that had been paired with the event (Pavlov 1927). However, he did not say that this was a special kind of learning; he described responses that were “conditional” on the occurrence of a stimulus, but his term was mistranslated as “conditioned” (Dinsmoor 2004). Seemingly a minor change, the implication of substituting the verb for the adjective was that a process beyond ordinary learning about the predictiveness of environmental events was taking place. Subsequent theorists took “conditioned” responses to be selected by an unmotivated response transfer process.
The relationship of Thorndike’s selection by goals and the goal-independent selection imputed to Pavlov was ambiguous until Skinner (1935) proposed a methodological dichotomy: Operant responses were those selected by reward, and were thus goal-directed. In addition, there existed some events (unconditioned stimuli, UCSs) that elicited reflexive responses (unconditioned responses, UCRs). UCSs could select for the transfer of UCRs to arbitrarily designated stimuli (conditioned stimuli, CSs) that predicted their occurrence, regardless of whether the CSs predicted that the transferred responses (now called conditioned responses, CRs) would be rewarded. Authors soon noted that all effective UCSs had a motivational valence (Hull 1943; Miller 1969), but since this valence could be in the direction of either reward or its opposite, aversiveness, it still did not seem possible that conditioned responses were selected by the same mechanism as goal-directed ones. Unmotivated conditioning was apparently necessary to explain a subject’s participation in unwanted experiences such as fear, rage, and craving for addictive substances.
            Since people often say that they do not want to do the impulsive things that they are in fact doing, and since the event that selects for these impulsive behaviors usually follows the behaviors closely, many authors from Watson (1924) on have ascribed disowned behaviors to classical conditioning. However, analytic experiments on conditioning itself have found it not to be a selective mechanism for behaviors. The pairing of novel stimuli with UCSs produces only CSs, not CRs (Mackintosh 1983; Rescorla 1988). That is, only information is learned by association; the occurrence and timing of “CRs” depends on what incentives are created by this information. Thus the sight of drugs on television, for instance, could not induce an involuntary drug taking response. Even the nature of the CR was recognized very early not to depend on what response is elicited by the UCS; only a few CRs happen to be the same, in detail or even in approximate kind, as the UCR (Upton 1929; Zener 1937). This difficulty required theorists to modify the proposal of conditioning as a mechanism for impulsive choice.
            In 1947, O. H. Mowrer proposed a two-factor theory of motivation. In Mowrer’s theory, an organism comes to associate a stimulus with a rewarding or punishing event through simple pairing, and thus experiences the relevant motivation whenever that stimulus appears (first factor). This motivation is then the basis for goal-directed behavior (second factor). The sight of drugs on television creates an appetite for the drug, which then motivates drug taking. However, the television show has not given the addict more information about either how available the drug is or how rewarding its consumption would be. To account for its motivational effect, conditioning theory has to postulate an extra-informational factor. This is the role of conditioned appetite or its close relative, conditioned emotion. This factor is what is said to explain impulsive preference. Both emotions and appetites are clearly responses, that is, processes that are separate from the perceptions that usually give rise to them. In fact, some emotions, mainly fear, were included in the work that found conditioning to transfer only information, not responses (Estes and Skinner 1941). Furthermore, in avoidance experiments signs of emotion appear after a longer latency than the behavior that the emotion supposedly motivates (Solomon and Wynne 1954). Thus, two-factor theory has also been held to be inadequate (Mackintosh 1983, 99–170).

Elicitation of strong appetites and emotions by reminders such as television shows is a familiar experience, but it is not adequately explained by the association process that has been studied in the laboratory under the name “classical conditioning.” What emerges from such studies is how efficient CRs are in the laboratory. In parametric experiments CRs anticipate the occurrence of UCSs with great accuracy. If a CS occurs or begins well before a UCS is due, subjects learn to estimate the delay and emit the CR just before the UCS (Kehoe, Graham-Clark, and Schreurs 1989; Savastano et al. 1998). With appetites specifically, cue-induced craving for cigarettes and skin conductance and salivation related to craving are strongly dependent on whether puffing is available within the next minute (Carter and Tiffany 2001; Field and Duka 2001). Thus in the laboratory involuntary responses closely track the prediction of reward. Furthermore, if the general process of learning behavioral contingencies in daily life counts as conditioning—the usual assumption—then nonanticipatory appetites are an anomalous variant here as well. Where the consumption of an addictive substance never happens in a given circumstance, humans do not crave it: Opiate addicts and alcoholics in programs that allow consumption only on certain days report absence of craving on other days (Meyer 1988), and observant orthodox Jews, who never smoke on the Sabbath, are reported not to crave cigarettes then (Dar et al. 2005; Schachter, Silverstein, and Perlick 1977). Classical conditioning, which is just associative learning, does not explain appetites that are disproportionate to the prospect of consuming and that change without changes in this prospect. 
It is true that some cue-induced appetites/emotions have been reported to grow without further contact with UCSs (Eysenck 1967); but these reports have not stood up under careful controls (Malloy and Levis 1990) and in any case go beyond passive association, thus requiring explanation themselves.

Cues that give new information about expectable rewards have the effects depicted in straightforward reward theory. The motivational effect of uninformative cues—mere reminders—remains to be explained. Why should a television show, or sometimes just a reverie about a drug, temporarily change someone’s preference? I will argue that none of the three theories described here accounts for this phenomenon as they have so far been developed; and I will propose a rationale for the needed mechanism. This mechanism works most parsimoniously with the second basic hypothesis, which I will now summarize.

Hyperbolic Discounting

The discovery that spontaneous discount curves are hyperbolic has offered an alternative explanation for impulses, although it, too, is incomplete. Hyperbolic or other hyperconcave discounting has been proposed as the basis of a revision of RCT that might serve as the model of motivation in all the behavioral sciences (Gintis 2007). The hypothesis that humans and nonhumans alike have a basic tendency to discount delayed events in a hyperconcave curve has two roots. The economist Robert Strotz (1956) pointed out that people might recognize nonexponential discount curves in themselves and thus expect themselves to reverse their own current plans in predictable ways as time goes by; and behavioral psychologists Shin-Ho Chung and Richard Herrnstein (1967) reported that pigeons working for food on two nonexclusive, unpredictable (concurrent VI VI) schedules distributed their pecks in proportion to the inverses of the mean delays to food delivery, demonstrating that Herrnstein’s matching law applies to delay. Application of the matching law to predictable rewards at specific delays (discrete trials) yields a hyperbolic function of value as a function of delay (Ainslie 1975), which was given its most-cited form by Mazur (1987):

$$\text{Value} = \frac{\text{Value}_0}{1 + (k \times \text{Delay})} \tag{2}$$

where Value0 = value if immediate and k is degree of impatience. This function predicts that for some cases where smaller rewards precede larger alternatives, individuals will prefer the larger reward when both are distant, but will change to preferring the smaller reward as time elapses, an example in very basic terms of the predictable preference reversal that Strotz discussed (figure 1B).
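The preference reversal can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming Mazur's form value0 / (1 + k * delay); the amounts, due times, and the value of k are hypothetical and chosen only to make the crossing easy to see.

```python
def hyperbolic_value(value0: float, k: float, delay: float) -> float:
    """Hyperbolically discounted value: value0 / (1 + k * delay)."""
    return value0 / (1.0 + k * delay)


# Hypothetical rewards (illustrative numbers, not from the text).
SS_AMOUNT, SS_TIME = 50.0, 10    # smaller, sooner reward due at time 10
LL_AMOUNT, LL_TIME = 100.0, 13   # larger, later reward due at time 13
K = 1.0                          # degree of impatience


def preferred(now: int) -> str:
    """Return which reward has the higher discounted value at time `now`."""
    ss = hyperbolic_value(SS_AMOUNT, K, SS_TIME - now)
    ll = hyperbolic_value(LL_AMOUNT, K, LL_TIME - now)
    return "LL" if ll > ss else "SS"


# Far from both rewards the larger one is preferred; close to the
# smaller one the preference flips, with nothing changing but time.
for now in (0, 7, 9, 10):
    print(now, preferred(now))
```

With these numbers the two curves cross when the smaller reward is two time units away, so the choice made at time 7 is undone by time 9 without any new information.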

            The first discrete trial experiment to look for hyperbolic discounting did not test the exact shape of the curve, only its property of predicting motivation to forestall a future choice (Ainslie 1974). Preference reversal as a function of delay was observed directly soon afterward (Ainslie and Herrnstein 1981; Green et al. 1981). In the same year economist Richard Thaler (1981) reported that discount rates inferred from human subjects’ self-reports declined as hypothetical delays became longer, implying hyperconcavity. The first preference-reversal experiment with human subjects offered college students temporary relief from irritating background noise (Solnick et al. 1980). The subjects preferred shorter, immediate respites to longer, delayed ones, but reversed their preference when the wait before each respite was lengthened by only 15 seconds. Soon preference reversal was found to occur even for cash prizes, both real and hypothetical (Ainslie and Haendel 1983). Precise curve-fitting came soon afterward (Mazur 1987; Rodriguez and Logue 1988) and has become increasingly fine-tuned (Grace 1996; Green, Fry, and Myerson 1994; Green and Myerson 2004; Kirby 1997; Mazur 2001).
            However, scrutiny of the hyperbolic model has brought up several apparent inadequacies: With some methods hyperbolic discounting cannot be elicited (Harrison, Lau, and Rutström 2005) or is inconsistent with the observed choice pattern (Read 2001; Read and Roelofsma 2003; Rubinstein 2003). Its greatest limitation for explaining impulsive choice in humans has been that in its basic form it does not predict the sudden craving elicited by stimuli associated with reward consumption, when these stimuli do not predict increased availability or proximity of this consumption (Loewenstein 1996; Laibson 2001). In craving induced by uninformative cues, some amplifying factor beyond immediacy is clearly operating. Conditioned appetite has been the obvious alternative.
The conditioned appetite model meshes with the experience of sudden temptation that often precedes impulses. Many authors have continued to rely on conditioned appetite/emotion as an explanation for temporarily amplified valuations (Drummond et al. 1995; O’Brien 1997). Furthermore, brain correlates of rewarding events have recently become observable through functional magnetic resonance imaging (fMRI), and have been interpreted as revealing two separate motivational systems, one of which is based, in effect, on conditioned emotional reward and the other on consistent exponential discounting (McClure et al. 2004). There are also other phenomena for which differential reward alone has seemed not to be an adequate explanation: Both humans and nonhumans sometimes seek outcomes that they do not like, even for as long as the moderately short times characteristic of impulses (Berridge and Robinson 1998); that is, they seem to want something at the same time as they dislike it. Also, the attention and engagement necessary for aversive experiences such as fear and pain have never been incorporated into a motivational model without being depicted as a reflex or a conditioned reflex. Thus there have been several reasons to stick to the classical conditioning hypothesis, of which visceral reward theory is the latest formulation.

Does Classical Conditioning Amplify Motivation?

Although conditioning is generally a process of information transfer, a larger role in some cases has been proposed. Ordinarily, conditioning entails the straightforward prediction of events, and thus should not create motives disproportionate to this prediction. However, Loewenstein has proposed an expanded role for conditioning in the case of a special, visceral class of motives, so that it alters “the relative desirability of different goods and actions” (Loewenstein 1999, 235) beyond what the information it conveys would do. Appetite need not then be commensurate with information.

Visceral factors include drive states such as hunger, thirst, and sexual desire, moods and emotions, physical pain, and, most importantly for addiction, craving for a drug. . . . At intermediate levels, most visceral factors, including drug craving, produce similar patterns of impulsivity, remorse, and self-binding. At high levels, drug craving and other visceral factors overwhelm decision making altogether, superseding volitional control of behavior. (Ibid.)

In situations where reward is imminently available, visceral reward theory predicts a spike of the motivation to consume it just as hyperbolic discounting theory does. The difference is hypothesized to be in the specific shape of the spike as consumption becomes possible—a sudden step upward from an exponential curve of what would be the value without appetite:

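$$\text{Value} = \text{Value}_0 \times \beta \times \delta^{\text{Delay}} \tag{3}$$

where Value0 = value if immediate and β takes one of only two values, β = 1 when the reward is immediate and a constant 0 < β < 1 when it is delayed; δ = 1 – discount rate (McClure et al. 2004; figure 1C). This step function contrasts with a monotonic hyperbolic curve drawn from the height of the spike (formula 2).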
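A short numerical sketch may make the step concrete. It assumes the β-δ form value0 * beta * delta**delay, with β set to 1 at zero delay and to a constant below 1 otherwise; the amounts, due times, and parameter values are hypothetical.

```python
def beta_delta_value(value0: float, beta: float, delta: float,
                     delay: float) -> float:
    """Step-function value: beta applies only to delayed rewards."""
    effective_beta = 1.0 if delay == 0 else beta
    return value0 * effective_beta * delta ** delay


# Hypothetical rewards and parameters (illustrative numbers only).
SS_AMOUNT, SS_TIME = 50.0, 10    # smaller, sooner reward due at time 10
LL_AMOUNT, LL_TIME = 100.0, 13   # larger, later reward due at time 13
BETA, DELTA = 0.5, 0.9


def preferred(now: int) -> str:
    """Return which reward has the higher value at time `now`."""
    ss = beta_delta_value(SS_AMOUNT, BETA, DELTA, SS_TIME - now)
    ll = beta_delta_value(LL_AMOUNT, BETA, DELTA, LL_TIME - now)
    return "LL" if ll > ss else "SS"


# While both rewards are delayed, beta scales both equally and the
# choice is governed by delta alone; the flip occurs only at delay 0.
for now in (0, 9, 10):
    print(now, preferred(now))
```

Unlike the hyperbolic curve, this function keeps the preference constant at every nonzero delay and reverses it only at the moment the smaller reward becomes immediately available.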

When the prospect of a reward is imminent, the value of visceral reward theory is mainly formal. The economist David Laibson originally adopted this theory’s dual “hyperboloid” curve from an article on intergenerational transfers of wealth (Phelps and Pollak 1968) because “the discount structure [of the curve] mimics the qualitative property of the hyperbolic discount function, while maintaining most of the analytical tractability of the exponential discount function” (Laibson 1997, 450). However, as noted above, spikes of appetite for many kinds of reward occur without any cue that predicts greater availability or proximity of the reward. These seem to be visceral rewards; indeed, the occurrence of these spikes has been used as a defining property of viscerality, as in the passage above. The addict suddenly gets intense cravings while watching a show about drugs. Since such sudden cravings are often implicated in relapses (Tiffany 1995), the question naturally arises of whether a sudden evocation of viscerality might have the same effect as immediacy in sending β to 1.0. This hypothesis converts Laibson’s original proposal from a straightforward discounting theory to a two-factor, conditioning-and-exponential-discounting theory (figure 1D).

Neurophysiological Evidence

Recent neurophysiological findings have been cited as evidence for qualitatively different processing of visceral rewards. McClure et al. (2004) reported that relative activity in cortical planning areas (lateral prefrontal and lateral orbitofrontal cortex; “δ areas”) and limbic craving areas (ventral striatum, medial prefrontal cortex, posterior cingulate cortex; “β areas”) predicted whether student subjects would choose to wait for LL rewards. The authors suggested that these two kinds of center are the seats of (presumably conditioned) visceral reward and rational reward, the bases of the spike and the exponential curve, respectively, in the step function in figures 1C–E. They found activity in the limbic centers only when the subjects could get the rewards, Internet gift certificates, on the same day, and none at all for two- or four-week delays. They interpreted this finding as support for the β-δ step function of visceral reward theory—rational, exponential discounting as a function of delay interrupted by a sudden spike of appetite from activity in a time-insensitive visceral center (figure 1C):

Our results help to explain why many factors other than temporal proximity, such as the sight or smell or touch of a desired object, are associated with impulsive behavior. If impatient behavior is driven by limbic activation, it follows that any factor that produces such activation may have effects similar to that of immediacy. (McClure et al. 2004, 506)

In a recent review of the fMRI literature, Montague and colleagues used the McClure group’s findings as a basis for rejecting hyperbolic discounting as a factor in choice:

[Hyperbolic discounting] begs several important questions. Why should exponential discounting, as it is expressed in reinforcement-learning models, account adequately for the variety of valuation and learning behaviors [in experiments the authors had reviewed]? . . . A second, more fundamental question is, How does one justify hyperbolic discounting—where it is observed—in terms of the rational agent model favored by standard economic theory? One answer to these questions is to assume that hyperbolic discounting reflects the operation of more than a single valuation mechanism. The simplest version of this view suggests that there are two canonical mechanisms: one that has a very steep discount function (in the limit, valuing only immediate rewards), and one that treats rewards more judiciously over time with a shallower discount function. (Montague, King-Casas, and Cohen 2006, 434)

However, it is premature to say that these fMRI reports demonstrate a discontinuity between visceral-or-immediate and nonvisceral-and-delayed valuations. Gift certificates for goods that would have to be chosen after the session and then mailed to the subjects are odd examples of visceral rewards. And β centers are not unresponsive at nonzero delays. When Monterosso, Ainslie, and London (2006) studied the response of the same β centers to the prospect of puffs on cigarettes during a period of deprivation they found that these centers show activity to prospects at least a week away. Furthermore, even within β centers delay sensitivity is not uniform, but can be pinpointed in the striatum by a graded map with ventroanterior regions tracking more immediate rewards and dorsoposterior regions tracking more delayed ones (Tanaka et al. 2004).
McClure et al. (2007) have recently revised their model to allow for (exponential) discounting by β centers, but at a greater rate than discounting by δ centers. Valuation by the two centers would be added together to form the individual’s operative motivation:

$$\text{Value} = \text{Value}_0 \times \left[\omega\,\beta^{\text{Delay}} + (1 - \omega)\,\delta^{\text{Delay}}\right] \tag{4}$$

where Value0 = value if immediate, β and δ are one minus their respective discount rates, and ω is a weighting factor. Graphically, this function would smooth out the abrupt step upward in figure 1D, more or less, depending on the β discount rate. With three parameters, formula 4 would undoubtedly improve the curve’s fit with hyperconcave discounting data, but the supposedly short range of β discounting would still leave it unable to account for the hyperbolic shapes reported when the shortest delays are months (Green, Myerson, and Macaux 2005) or years (Harvey 1994). More seriously, Glimcher, Kable, and Louie (2007) have reported that, although several brain regions respond proportionately to a subject’s valuation of rewards (striatum, posterior cingulate gyrus, medial prefrontal cortex), they all track declines in valuation at the same rate.
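The limited range of the fast component can be illustrated numerically. This is a rough sketch, assuming the weighted sum of two exponentials value0 * (w * beta**delay + (1 - w) * delta**delay); the parameter values are hypothetical. It computes the proportion of value retained from one delay step to the next, which is constant for a single exponential.

```python
def two_system_value(value0: float, weight: float, beta: float,
                     delta: float, delay: float) -> float:
    """Weighted sum of a steep (beta) and a shallow (delta) exponential."""
    return value0 * (weight * beta ** delay + (1.0 - weight) * delta ** delay)


# Hypothetical parameters (illustrative numbers only).
VALUE0, WEIGHT, BETA, DELTA = 100.0, 0.5, 0.5, 0.95


def retention(delay: int) -> float:
    """Fraction of value retained between `delay` and `delay + 1`."""
    later = two_system_value(VALUE0, WEIGHT, BETA, DELTA, delay + 1)
    sooner = two_system_value(VALUE0, WEIGHT, BETA, DELTA, delay)
    return later / sooner


# Retention rises over short delays (a hyperconcave region) and then
# settles at delta, after which the curve is effectively exponential.
retentions = [retention(d) for d in range(20)]
print([round(r, 4) for r in retentions])
```

Under these assumptions the curve is more bowed than an exponential only while the β term still contributes; beyond that range it declines at the constant rate set by δ, which is the sense in which such a function would struggle with hyperbolic shapes at delays of months or years.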

Valuation clearly takes place in a number of brain locations, which may process rewards with different zones of delay duration (Tanaka et al. 2004), degrees of predictability (Daw, Niv, and Dayan 2005), or incentive salience (Berridge 2007; see below). However, there has been no evidence against their interacting to create a single internal marketplace, with the common currency—reward—that many authors have argued to be theoretically necessary (Ainslie 1992, 28–32; Cabanac 1992; McFarland and Sibley 1975; Montague and Berns 2002; Shizgal and Conover 1996).
As for whether nonimmediate rewards are discounted hyperbolically or exponentially, no fMRI study has produced data precise enough to discriminate between these shapes, although Glimcher, Kable, and Louie (2007) found fMRI activity to be “linearly correlated” with hyperbolic discounting curves computed from subjects’ choices. In addition, a study of single cells in the pigeon brain area analogous to the mammalian prefrontal cortex (Nidopallium caudolaterale) has reported activity that is better fitted by a hyperbolic temporal discount function than by an exponential one (Kalenscher et al. 2005). A neurophysiological answer that is more than suggestive will have to await better data.

For now, the evaluation of conditioned appetite as a theory of sudden craving depends on the interpretation of behavioral data, as I discussed in the previous section. Seen simply at the level of evident motivation, a basic problem for visceral reward theory appears in plots of motivation against expected delay: Visceral reward theory does not make it clear why the associative process should not lead visceral rewards that are amplified by appetite to be anticipated and discounted like any other reward, so that they become consistently preferred where the amplification is great enough—a progression from the pattern in figure 1D to that in figure 1E. After all, rewards are usually remembered as consumed in the presence of appetite; adjusting their value for satiety is a distinct process (Balleine and Dickinson 1998).
However, a straightforward application of hyperbolic discounting theory also fails to predict spikes of appetite in response to cues that do not convey new information; if a reward is no closer and no more probable, its value should not increase. And the framing effects proposed by cognitive theories have not been based on visceral or appetitive situations at all. Something more is needed.

A Solution: Recursive Self-Prediction

An extension of either conditioning or hyperbolic discounting theory can handle the occurrence of explosive appetites/emotions. I will argue that this extension works well only for the hyperbolic theory; but the hyperbolic model also requires a radical revision of our assumptions about the selection of subjectively involuntary processes. First conditioning, then.

Recursive Self-Prediction of CRs

The rewards in laboratory experiments are outside of the subject’s control. Signs of appetite are studied as a function of when the experimenter signals their availability. In daily life, by contrast, goods that might be consumed impulsively are available much of the time, and their consumption is limited by a person’s decisions. The information predicting reward in life situations will be very different from predictive information in the laboratory.
Modern conditioning theory no longer holds that CSs have to be concrete stimuli. They can be just temporal patterns, interpreted stochastically by the subject (Gallistel 2002), a finding that can be summarized by saying that the expectation of a UCS, in whatever form, functions as a CS. In humans, and possibly other organisms to a limited extent, expectation includes estimation of the individual’s own future behavior. If a person always carried out her intentions, such estimation would serve no purpose; she could predict her behavior directly by examining these intentions. But behavior in even the near future is increasingly recognized as beyond the scope of such examination (Wegner 2002, esp. 63–144), and may depend on the dynamics of a population of competing processes (Ainslie 2001, 39–44). This means that expectation is apt to be recursive, with an expectation of a UCS, for instance taking cocaine, functioning as a CS and inducing the CR of appetite. But where the availability of cocaine is not a limiting factor, an increase in appetite will itself increase the likelihood of taking the cocaine. If this likelihood increases, the CS of expecting cocaine should increase, and in turn the CR of appetite again.
For cases where a person’s consumption is limited mainly by her own choice, appetite can be a positive feedback system of the kind first described by Darwin, James, and Lange:

The free expression by outward signs of an emotion intensifies it. On the other hand, the repression, as far as this is possible, of all outward signs softens our emotions. He who gives way to violent gestures will increase his rage; he who does not control the signs of fear will experience fear in greater degree. (Darwin 1872/1979, 366)

Their proposal has sometimes been interpreted as meaning that an individual feels fear because she notices somatic signs of fear, but this interpretation has not held up to empirical scrutiny (Rolls 2005, 26–28). However, it is still possible that such a feedback process modulates a given response once the individual has focused on it. Whenever an arbitrary stimulus has been associated with consumption in the past, the appearance of that stimulus might accurately predict an increased current likelihood of consumption, and accordingly function as a conventional CS. A sudden spike of appetite could thus come from the existence of positive feedback conditions. These conditions may obtain whenever the person’s consumption is determined mainly by her choice about a readily available consumption good, but are apt to have the strongest effect when there is weak-to-moderate resolve not to consume: Where a person is not trying to restrain consumption she will keep appetite relatively satisfied; where she is confident of not consuming regardless of appetite (as in cases of opiates in scheduled addicts and smoking in orthodox Jews) she will not expect appetite to lead to consumption. In both of these cases a stimulus associated with consumption should be only a trivial CS and thus not lead to an exceptional CR. In a recovering addict or restrained eater, by contrast, cues predicting that she might lapse could elicit significant CRs.
However, since conditioning is simply the acquisition of information, the power of this recursive model is limited. Even if we concede the existence of CRs in the case of appetites, they are supposed just to be UCRs that have been passively transferred to CSs because CSs predict their UCSs. If CRs are only such anticipatory responses, their amplitude should be limited to no more than that of their UCRs. If a person’s expectation of consumption increases by x%, her appetite (CR) should increase by no more than x% of the UCR—at most x% of what the CR would be when certainty is 100% and delay is zero. Since we are discussing delays that are significantly greater than zero—the cases where hyperbolic curves per se do not explain the upward spiking of motivation—the increase in appetite should be markedly less than x%. Conversely, if a person’s appetite increases by x%, the increase in estimated probability of consumption that this causes should also be fractional, reflecting the proportion of times when that much increase in appetite has been followed by actual consumption.
Take the case of a recovering addict who has moderate resolve not to relapse. An initial confrontation with a drug stimulus should increase her likelihood of relapsing by only a marginal amount. This increase should in turn have only a small effect on her conditioned craving (CR), which would be expected to increase her expectation of relapse by an even smaller amount again. The positive feedback effect should be damped down unless the percentage of each increase is perfectly preserved, and even in that case the CR will be capped at the level of the UCR and discounted for whatever delay she expects. The qualitative elements for explosive craving are there, but quantitatively the argument struggles uphill.
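The damping argument can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the function name, the two gain parameters, and the numbers are hypothetical and are not estimates from any study. Each pass converts a rise in expected consumption into a rise in conditioned appetite, and that rise back into a further rise in expectation, with the CR capped at the (delay-discounted) UCR.

```python
def conditioned_feedback(initial_boost, cr_gain, relapse_gain, cap=1.0, rounds=50):
    """Iterate expectation -> appetite -> expectation; return final appetite.

    All parameters are illustrative assumptions:
    initial_boost: first rise in expected probability of consumption after a cue
    cr_gain:       rise in appetite (fraction of the UCR) per unit rise in expectation
    relapse_gain:  rise in expectation per unit rise in appetite
    cap:           ceiling on the CR (the UCR, discounted for expected delay)
    """
    appetite = 0.0
    step = initial_boost
    for _ in range(rounds):
        new_appetite = min(cap, appetite + cr_gain * step)
        rise = new_appetite - appetite      # actual rise after applying the cap
        appetite = new_appetite
        step = relapse_gain * rise          # feeds the next round's expectation
    return appetite


# Fractional gains: each round is a quarter the size of the one before.
damped = conditioned_feedback(0.10, cr_gain=0.5, relapse_gain=0.5)
# Gains perfectly preserved: appetite climbs, but only up to the cap.
undamped = conditioned_feedback(0.10, cr_gain=1.0, relapse_gain=1.0)
print(round(damped, 4), undamped)
```

With fractional gains the increments form a shrinking geometric series, so appetite settles well below the initial boost in expectation. Only when both gains equal one does it keep climbing, and even then it stops at the cap.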

Recursive Self-Prediction of Motivated Processes

If appetites are selected not by the simple transfer of a UCS but by reward for their activity, as I have proposed elsewhere (Ainslie 2001, 48–70, 161–174), this limitation disappears. A recursive reward-seeking model predicts the same observations as a conditioning model, but without the damping effect: A stimulus associated with consumption will be a cue for generating appetite if the potential for consumption to be rewarding (nonsatiety) exists. When consumption is limited by self-control—probably the case only in humans—appetite itself has the potential to obtain fast-paying reward by motivating the abandonment of (slow-paying) self-control. The most rewarding amount of appetite, then, may be not that which optimizes the expected experience of consumption, but rather that which makes consumption most probable. The most productive timing of such appetite will take the form of concentrated attempts on discrete occasions; if appetite does not succeed in inducing consumption on a particular occasion it is unlikely to increase its chances by prolonged activity, and may indeed fatigue with repetition like many other physiological processes. Occasions for appetites could be arbitrary, especially at higher levels of deprivation, but the occasions that are the most apt to promise successful attempts must be limited in frequency—an external reminder, or a circumstance where they have succeeded in the past. In this view, the force of symbols and other reminders in relapse comes from their providing reward-based appetites with focal occasions to try to overturn self-control.
           In a marketplace model reward-dependent processes compete for acceptance on the basis of the current, hyperbolically discounted value of the prospective reward for these processes. The fact that preferability among a set of processes can shift as a function of time alone puts these processes in a limited warfare relationship with each other (Ainslie 1992, 154–179; 2001, 90–100). That is, they will operate as independent agents that have some but not all interests in common, on the basis of what are common contingencies of reward but differently discounted valuations of them. An appetite in this model arises when an individual perceives the opportunity for consumption that can be made either more rewarding or more likely by this appetite. Appetites still have some physical constraints such as nonsatiety, but the final selective factor for their arousal is the contingent prospect of reward.
The contingencies that determine whether appetite as a quasi-independent agent asserts itself in a given situation will be roughly the same as those determining whether a pet begs its owner for food or the chance to excrete. Begging is a low-cost behavior and is apt to occur whenever a nonsatiated pet encounters satisfaction-related cues; again, continuous begging will not be worthwhile. But if the pet is never fed or let out in a particular circumstance the begging gradually extinguishes, unless the biological need is extreme. Likewise, if gratification is so available that significant deprivation does not occur, begging adds no value. By analogy, the restrained eater or recovering addict has an opportunity to experience intense reward by indulging in immediate consumption. Insofar as appetite makes consumption look even a little more likely it will pay for itself, and any signs of weakening will serve as cues that still more appetite may succeed in motivating consumption (figure 1F). [Figure 1F is just an approximation of how recursive self-prediction could affect motivation. The result could probably look like a β-δ curve as in figure 1C above, but drawn at any delay. It would thus correct the β-δ curve’s inability to describe appetite/emotion when induced at a distance from a reward, although without its math. -added May 14, 2019]
The low cost of appetite may explain why it must go unrewarded consistently over many trials in a given circumstance to extinguish—for instance, why the orthodox Jews who do not get cravings on the Sabbath do get cravings when they know they must not smoke at work (Dar et al. 2005).
In short, appetite as a reward-seeking process can be subject to the same positive feedback mechanism as conditioned appetite. The important difference from the feedback conditioning model is that the degree of appetite will not be limited to mere anticipation, but can be whatever increases the prospect of reward. There will still be constraints: appetite depends on nonsatiety; in modalities where unsatisfied appetite brings hunger pangs or withdrawal symptoms, these will be deterrents; and appetite without occasions of limited frequency will extinguish (see Ainslie 2001, 166–171). But the explosive appetite that so often ends people’s efforts at controlled consumption can be understood as a motivated process that is rewarded in the short run when it detects these efforts.
            It remains a question why a person who has generated sudden appetite in a certain circumstance does not come to anticipate the higher value of the SS reward with that appetite, and thus come to prefer it consistently, as illustrated in the case of exponential curves by the progression from figure 1D–E. Failure to develop consistent preference is easily accounted for by discount curves with a hyperbolic shape, as in figure 1B. Failure to anticipate the sudden occurrence of the responsible appetite at short delays may happen because a person avoids rehearsing the situation in advance, for fear of triggering the appetite by doing so.
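The contrast between the two curve shapes can be checked numerically. The sketch below assumes the common hyperbolic form V = A / (1 + kD) and the exponential form V = A·e^(−rD); the amounts, times, and parameter values are hypothetical, chosen only to show the pattern. It asks, at different moments before a smaller-sooner (SS) reward, whether that reward is valued above a larger-later (LL) one.

```python
import math


def hyperbolic_value(amount, delay, k=1.0):
    """Hyperbolically discounted value (assumed form: A / (1 + k*D))."""
    return amount / (1.0 + k * delay)


def exponential_value(amount, delay, rate=0.1):
    """Exponentially discounted value (assumed form: A * exp(-r*D))."""
    return amount * math.exp(-rate * delay)


def prefers_ss(value_fn, now, ss=(50.0, 10.0), ll=(100.0, 13.0)):
    """True if the SS reward is valued above the LL reward at time `now`.

    `ss` and `ll` are (amount, time of delivery) pairs; both are illustrative.
    """
    (ss_amount, ss_time), (ll_amount, ll_time) = ss, ll
    return value_fn(ss_amount, ss_time - now) > value_fn(ll_amount, ll_time - now)


for now in (0.0, 5.0, 9.5):
    print(now,
          "hyperbolic prefers SS:", prefers_ss(hyperbolic_value, now),
          "| exponential prefers SS:", prefers_ss(exponential_value, now))
```

Under the hyperbolic function the LL reward is preferred from a distance and the SS reward is preferred shortly before it becomes available, so the preference is temporary. Under the exponential function the ratio of the two values does not change with time, so whichever reward is preferred at one moment is preferred at every moment.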
           [Thus appetite as an operant can be subject to the same positive feedback mechanism as conditioned appetite. The important difference from the fed-back conditioning model is that the degree of appetite will not be limited to mere anticipation, but can be whatever increases the prospect of reward. There will still be constraints—in modalities where unsatisfied appetite brings hunger pangs or withdrawal symptoms these will be deterrents; and appetite without occasions of limited frequency will extinguish (see Ainslie, 2001, pp. 166-171)—but the explosive appetite that so often ends people’s efforts at controlled consumption can be understood as a motivated process that has sought to do exactly that. -added May 14, 2019]
           The model of appetite as an operant cued by recursive self-prediction appears to be the only one proposed so far that can account explicitly for explosive appetite—that is, for why a cue that is associated with consumption but does not predict increased availability of a consumption good should lead to a great increase in appetite. In this model the cue is needed only to give occasion, that is, to select one moment over another for a focused attempt at reversing the dominant preference. The model depends on the hyperconcave shape of the discount curve, since an individual with consistent preferences over time would have no short-range motive to undermine her own resolutions, or indeed any long-range motive to make resolutions in the first place. It might still work with the “hyperboloid” β-δ step function hypothesized by visceral reward theory; but this shape describes the explosive appetite that, according to the damping argument presented above, the theory’s conditioning mechanism would be inadequate to produce. That is, the only viable mechanism for the β-δ hyperboloid discount curve is for discount curves to have an elementary hyperbolic or other hyperconcave shape to begin with.

Are Appetites Selected by Reward?

The above argument requires that an appetite respond to differential reward for that appetite, which is not a trivial change in the conventional view of appetite. But I have argued elsewhere that hyperbolic discount curves permit involuntary and even aversive processes to be incorporated into a unified motivational marketplace (Ainslie 2001, 48–70). All that is needed is to strip the selective factor, “reward,” of its connotations of pleasure and reduce it to its defining function: that which selects for a process that it follows. Briefly, the argument is that temporary preferences are apt to be cyclic, impulses that satiate and recover over a continuum of durations that define the periods of the cycles (figure 2). A binge may last for days before sickness sets in, but the urge to emit a tic lasts for only seconds until it is spent, and the urge to respond to a painful stimulus with pain emotion (Melzack and Casey 1970) lasts, hypothetically, for just a moment before its negative consequences are felt. The value of this model is that both positive and negative appetites can be seen as luring individuals into participating in them, rather than springing up automatically like reflexes, outside of the marketplace of choice. As long as it is discounted hyperbolically, reward can then be the selective factor not only for long-enduring pleasures, but for temporary, regretted pleasures and for urges that do not feel pleasurable at all.

This argument is bolstered by recent reports of an extensive borderline area, which can be seen as lying halfway between the phenomena of addiction and frank aversion. Berridge and his colleagues have analyzed activities that subjects have strong tendencies to repeat despite a lack of pleasure that is evident either from self-reports, or, in the case of nonhumans, facial expressions (Berridge and Robinson 1998; Berridge 2003). These authors were suspicious of supposed “pleasure centers” that induced avid self-stimulation despite reported sensations that patients said they did not enjoy and that rats would not cross their cages to initiate after periods of interruption. Using increasingly precise physiological mapping techniques they have separated brain centers that subtend pleasurable activities and centers for activities that are “wanted but not liked,” which subjects perform repeatedly despite evident distaste for them (Peciña, Smith, and Berridge 2006). The authors interpret the latter findings as evidence of a dissociable component of decision making, response selection by “salience.” This interpretation remains controversial (O’Doherty 2004), partly because of the difficulty of separating attention from preference (Maunsell 2004; Schultz 2006); the motivational properties of salience have been especially difficult to define. Despite the “wanting” label, the Berridge group calls this kind of selection “nonhedonic” (Berridge 2003); a wanted, disliked goal is nevertheless “a motivational magnet” (Berridge 2007) or “a false pleasure” (Peciña, Smith, and Berridge 2006). They make it clear that wanting is still motivation, and must interact with hedonic reward to determine which motor behavior a subject will perform—a conclusion that temporal difference theory, an interpretation that partially incorporates incentive salience theory, also supports (McClure, Daw, and Montague 2003).
A possible form of this interaction is intertemporal conflict. That is, wanted-but-not-liked activity may have the same motivational valence as pleasurable activity, but only temporarily. The normal course of learning is for wanted behaviors to be evaluated for liking and replaced if they fail that test (Berridge 2007), but the effect of delay per se on choice has not been studied in these experiments. There do exist ordinary human behaviors that are wanted only briefly and that interfere with other rewards, so that from a distance people try to prevent them or buy cures for them. Activities of daily life that can be described as wanted but not liked include tics, speech mannerisms, psychogenic itches, and itch-like activities such as nail biting and hair pulling. The interaction of these disliked behaviors with liked alternatives obeys the same cyclic motivational math as binges with hangovers, but it has a much shorter periodicity (figure 2). Both addictive and itch-like activities are avoided at a distance but sought when up close, but you have to be much closer to a disliked activity before you want it.

            From the case of wanted-but-not-liked it is not a great leap to given-in-to-but-not-wanted; the same cyclic mechanism may explain seemingly unmotivated responses generally. If the “wanted” phase of an option is too short to motivate even brief approach behaviors, it might still attract attention and negative appetites/emotions. Maunsell has suggested that the reward/salience controversy has arisen from too narrow a definition of reward: “If reward is defined to include all motivating factors, then there may be no differences between attention and expectation of reward” (Maunsell 2004, 264). He thus implies that the concept should extend beyond the wanted to the attended-to. Berridge’s definition of incentive salience still excludes magnets for attention per se; however, the urge to pay attention to an aversive process might be how the very truncated “wanting” for this process is experienced—not in a consciously discriminable approach phase, but merged with its alternating avoidance phase as in flicker fusion. The reward that selects for this attention thus cannot last long, but also cannot be slight if the urge is strong; hence it might be represented as the recurring tall, thin spikes of figure 2.
            Certainly the evaluation of pain has been reported to involve the same brain centers as that of reward (ventral striatum, sublenticular extended amygdala, ventral tegmentum, and orbital gyrus [Becerra et al. 2001]), and in all but the striatum the response to pain is in the same direction as that to reward, which “suggests that they may constitute a general circuitry processing both rewarding and aversive information” (ibid., 942). Even in the striatum other authors have reported increased activity after both reward and punishment, and decreased activity after the unexpected nondelivery of either (O’Doherty 2004). The processes that have been said to be based on “salience” (e.g., Zink et al. 2003) thus include delivery of both reward and punishment but not their omission, and “unwanted” motor behaviors. As O’Doherty points out, if the striatum were merely tracking salience, its activity should increase for the unexpected omission of reward or punishment as well as their delivery. The common attribute of these processes is better characterized as brief attractiveness.
The possibility that reward and the selective principle in UCSs are identical has been proposed before (reviewed in Pear and Eldridge 1984; Donahoe, Burgos, and Palmer 1993) but has not been pursued extensively, perhaps because a separate selective principle has seemed necessary to explain why aversive experiences can compete for attention, and perhaps because no other hypotheses have hinged on this identity. But hyperbolic discounting implies that a separate selective principle is not necessary to account for attention to aversive experiences, as I have just described. Of course, it may turn out that this attention can be better explained by other hypotheses that integrate reward and nonreward in close temporal proximity, or even simultaneity, as the processes underlying motivational salience are analyzed (e.g., Berridge and Robinson 1998, 348–349). The neurophysiological dissociability of wanting from liking is certainly relevant; but so far there has been no other hypothesis at the behavioral level about how wanting (or dreading) interacts with liking in the marketplace of choice. As to whether any behavioral hypotheses depend on the question of one versus two basic selective factors, the very reason for including the foregoing section in this chapter is to support the possibility that explosive appetite is based on recursive self-prediction. I have argued above that the sudden occurrence of appetite when its object is no nearer requires appetite to be reward dependent, that is, selected by the same factor as motor behavior.

Can Cognitive Framing Models Replace Hyperbolic Discounting?

In addition to this activity in conditioning research, new proposals have been made to explain preference reversal over time as a purely cognitive phenomenon, the result of how subjects frame their choices. Research since the 1960s has revealed many instances where people reason erroneously or frame choices in ways that deviate from RCT (Wason 1966; Kahneman and Tversky 1984). These deviations have often been shown to be reduced or eliminated when problems are posed to subjects in terms of familiar experience (Gigerenzer 2000), and most of them do not involve preference reversal. However, several authors have suggested that the apparently hyperbolic shape of discount curves is an artifact of cognitive framing processes.

People’s Theories of Value

Human subjects’ choice sometimes follows the same pattern as that of nonhuman subjects. When concrete rewards are available at short delays, these rewards seem to shape human subjects’ choices in the same way that they do nonhumans’ choices, for example, fruit juice (Logue et al. 1986) or escape from noxious noise (Solnick et al. 1980); and when they do, these choices imply hyperconcave curves. Surprisingly, their choices of token rewards such as money, both hypothetical and real, often follow specifically hyperbolic curves as well; these are shallower but no less hyperbolic in normal adults than in various kinds of addict (Bickel and Marsch 2001). However, humans conceive of reward in complex ways that complicate our choice process beyond what is evident in nonhumans. We often make choices not by weighing the possible experiences but by interpreting them according to what could loosely be called theories of value, sometimes using identifiable heuristics. The difference can be seen with a melioration procedure (Herrnstein and Prelec 1992), a series of discrete nondelayed binary choices where choice of a larger reward in one trial leads to a reduction of both options in the next, and a choice of a smaller reward leads to an increase of both options in the next. Nonhuman subjects regularly choose larger options. Humans soon learn to choose smaller options when the options are stated in amounts of money; but they continue to choose larger options when the differences are in (unstated) delays before the next trial. When the subjects go by the feel of the alternatives they choose early payoffs, but when they go by the numbers they look for maximizing solutions. In the latter case they have gone beyond their impressions and adopted a cognitive method of evaluating their payoffs, a theory of value.
People’s theories of value do not replace their spontaneous motives, but rather anticipate and provide incentives to manipulate these motives, as Mischel’s painstaking work with 4- to 6-year-old children showed (e.g., Mischel and Mischel 1983). It was not enough for the children to conclude that a marshmallow later was worth more than a cracker now. They had to develop commitment tactics such as distracting themselves or “thinking cool thoughts,” tactics that are still discernible in the coping mechanisms of adults (Vaillant 1971). Thus it is only half true that

the connection between findings on pigeons or even monkeys and the behavior of humans seems rather tenuous. We commonly believe that an animal does not understand the choice it is facing in the same way that a human being does. (Rubinstein 2003, 1209)

            Paul MacLean (1990) likened the evolutionarily older parts of our brains to those of a horse, which is ridden rather than supplanted by our neocortex. To study motivation in nonhuman animals is to study the horse, an endeavor that reveals as much about human decision making as study of the theory-spinning rider does. This is especially true since horse and rider must share a common hedonic outcome, of which most of the functionality seems to be located in the horse.
            Theories of value do not necessarily make reward-seeking more efficient. When children begin to develop heuristics for getting rewards their performance initially falls off in comparison with younger children who are still letting their choices be shaped as nonhumans’ are (Sonuga-Barke, Lea, and Webley 1989). To prevail against spontaneous preference, their theories of value often dictate the development of committing devices to restrain gut reactions; but committing devices reduce their responsiveness to ongoing experience. Even in adults, “flying on instruments” may lead to rigidity and poor ability to exploit the environment. Most schools of psychotherapy do not target our impulsiveness as much as our overgrown controls, the products of “cognitive maps” (Gestalt), “conditions of worth” (Rogerian), “musterbating” (cognitive), and of course punitive superegos (psychoanalytic; Corsini 1984).
            A person’s theory of value must obviously deliver the means to get actual rewards—the process that selects behavior—or she will eventually abandon it for better-paying theories. However, the separate contributions of theories of value and direct shaping are hard for an observer to measure, because imagination can reward in its own right. This capability permits theories of value to be partially self-confirming when they propose tasks that function as cognitive games. If you have a theory that money, for instance, is a good source of reward, then getting money becomes an occasion for a positive emotion like satisfaction, relief, or gloating, benefits of winning per se that are above and beyond the prospect of enjoying what you will buy (see Lea and Webley 2006). A quest for distant goals requires the establishment of milestones to mark progress; if these milestones are achieved with adequate rarity and unpredictability they can become durable occasions for emotion, with only enough constraint by their ostensible purposes to keep winning from becoming too easy. Once you “believe in” the value of finding bargains, or having acquaintances remember your name, or winning arguments, sufficiently difficult feats in these areas will occasion rewarding feelings regardless of their practical effects.
            Thus when experimenters study people’s evaluation of their options, we usually see an admixture of reward that does not depend on the objective outcomes. Although we may detect a quantum of prospective value that is actually constrained by the physical nature of the goal—sweetness, say, or drug effect—this is mixed with an often larger internal component determined by a subject’s theories of value through both the commitments and game-like incentives that these theories set up. When we ask about preferences for money, even money that will actually be delivered as chosen, we cannot expect to sample the feel of the consumption of what the money will buy, only the conventions that the person has developed for what represents doing well in money-getting games. Fortunately, the multiplicity and interrelatedness of games that share money as a common counter has given some consistency to these conventions. Even more fortunately, subjects mostly do not subject their imagination of future money to their norm of “rational” economic behavior, so we can see their tendency to form temporary preferences even for money. Discount curves for expected money are robustly hyperbolic even though they are describing a token in a cognitive game; but their Ks (formula 2) vary among human subjects by hundredfolds, whereas Ks among nonhumans of a given species usually vary by single digits (Monterosso and Ainslie 1999). Such wide variation is unlikely to reflect differences in innate endowments, but rather reflects the differences in the cognitive games and commitments to which various subjects’ theories of value have led them.

Cognitive Framing Theories

Cognitively based objections to hyperbolic (or other hyperconcave) discounting as an underlying principle of motivation have been based on several kinds of finding. Most basic has been the failure to find hyperbolic discounting in amount versus delay choice experiments. More subtly, hyperconcavity has been said to be an artifact of various ways of framing choices: Human subjects have been reported to discount future payoffs more steeply when delays are broken into shorter periods, and when smaller amounts are at stake. Subjects’ heuristics, such as grouping outcomes into the categories of similar versus dissimilar, or treating outcomes abstractly versus concretely, have also been identified as possible causes of inconsistent choice.
It is well known that some prospects come to be valued as occasions for anticipatory imagination beyond what they are in themselves, that is, for what they would be if they occurred immediately or otherwise without anticipation. Thus subjects say that they would defer a kiss from a movie star in order to savor it (Loewenstein 1987), or arrange sequences of experiences so that the prospect is always for improvement (Frank 1992; Read and Powell 2002). There are even cases where people arrange to savor an event to the exclusion of ever experiencing it, for instance by repeatedly deferring a dream vacation. Conversely, people often prefer to hasten the occurrence of painful events so as to avoid the development of dread (Berns et al. 2006). Such cases are seen only in humans and illustrate the human ability to use imagination to occasion reward. Nonhumans have never been observed to prefer delay of a reward, and they will choose to hasten punishment to reduce it only when this earlier punishment will still be distant (Deluty et al. 1983). Savoring and dread have been held up as counterexamples to hyperbolic discounting, but they count as such only if we disregard subjects’ generation of emotional reward, as just described, using external events as occasions.

The skill of harnessing the expectation of future events to manipulate current reward depends on a person’s theories of value, and the success of this skill is apt to be a major factor in selecting these theories in turn. The most systematic of these theories is the dominant principle of modern finance, RCT. The ability to make choices as if the effect of future reward decayed in a relatively shallow exponential curve is obviously adaptive in competitive markets. Commitment to evaluate prospects exponentially, at least in one’s more important mental accounts (Thaler 1985), may be motivated by the prospect of greater success in one’s own long-range projects, but is perhaps motivated more strongly by protection against being money-pumped by competitors (Ainslie 1991). This motivation will be insufficient if one attempts to adhere to too shallow a normative curve, and it will not be present in choices that are unimportant or unlikely to be repeated, probably including many choices made when serving as a subject in an experiment. However, to the extent that subjects regard their commitment to rational investment as applicable, they are apt not to display hyperbolic preference patterns. For instance, in an amount versus delay experiment Harrison, Lau, and Rutström (2005) reported that adults exhibited consistent exponential discount rates over delays from one to 24 months. The authors theorized that this result came from not using delays of less than one month for their SS rewards; but other experiments that used delays of a month or more for SS rewards have found preference reversals in adult subjects (Green, Myerson, and Macaux 2005; Ainslie and Haendel 1983).
The consistent preferences seen in the Harrison experiment are more likely to be a result of an experimental design that encouraged prudent choice: Subjects chose between large amounts of real money ($450 to $1840, which they had a 10% chance of winning); choices were presented in ascending numeric order to avoid the “computational complexity” of the usual shuffled order; and, most important, the equivalent annual interest rate was listed next to each choice. There has never been doubt that people can often achieve the consistent financial planning displayed in this example. But according to the hyperbolic discounting model people must construct the necessary valuations strategically on the basis of an underlying hyperbolic function, a function that is discernible when conditions encourage subjects to choose according to their spontaneous impressions of value. The conditions of this experiment did not do so.
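To see how strongly such a prompt frames the choice, it helps to compute the kind of figure a subject would have read beside each option. The sketch below is an assumption about the arithmetic, not a reconstruction of the experimenters' procedure: it uses simple annual compounding, and the function name and amounts are hypothetical.

```python
def annual_rate(sooner_amount, later_amount, months_between):
    """Equivalent annual interest rate implied by waiting for the later amount.

    Assumes annual compounding; the original study may have used a different
    compounding convention.
    """
    growth = later_amount / sooner_amount
    return growth ** (12.0 / months_between) - 1.0


# Hypothetical pair: 1000 at the sooner date versus 1050 six months later.
print(f"{annual_rate(1000.0, 1050.0, 6):.2%}")
```

A subject shown this percentage is invited to compare it with familiar market rates, which is a different task from consulting a spontaneous impression of what the delayed amount is worth.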

           Bringing preferences under rational marketplace discipline has clearly been the most important example of how people’s theories of value modify choices by framing them. However, others have become apparent in the numerous experiments conducted on human discounting during the last two decades. For example, Read (2001) and Read and Roelofsma (2003) report several experiments in which human subjects were asked their preferences between SS and LL rewards, as with Harrison et al. “in the context of investment decisions” (145), but without the prompts about interest rates. The authors computed implied discount rates from preferences between SS and LL rewards. Unlike previous researchers they compared discount rates for a particular period with the aggregate of discount rates when the period was broken into shorter, equal segments and preferences were elicited over each segment. They found neither rational choice nor hyperbolic discounting, but “subadditive discounting,” a tendency for people to report discount rates that are higher the shorter the period they cover, regardless of whether the period will begin immediately or up to two years later. Since almost all amount versus delay experiments thus far have used lengthening periods between choices as delays get longer, these authors suggest that subadditivity might be responsible for the resulting curves’ hyperbolic shape. The phenomenon stood up in the authors’ own replication and may represent a significant heuristic that people use in estimating discount rates. However, these experiments do not suggest an adequate substitute for hyperbolic discounting as a basic property of response selection. Crucially, as the authors acknowledge, subadditivity does not predict temporary preference for SS over LL rewards, which is the key implication of the hyperbolic shape. 
Furthermore, although there has been almost no discounting research using equal delay increments, one nonhuman experiment that used them found that decreased discounting, as evidenced by preference reversals, still occurred as delays became longer, a finding predicted by hyperbolic but not subadditive discounting (Ainslie and Herrnstein 1981). Moreover, the hyperbolic shape of curves from single, discrete choices is only one manifestation of Herrnstein’s matching law, which describes the same inverse proportionality of value to delay in competing, simultaneous schedules of unpredictably timed rewards (concurrent VI-VI; Chung and Herrnstein 1967), where subadditivity would not be a factor.
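What subadditivity means for implied discount rates can be sketched as follows. The indifference amounts are hypothetical and chosen only for illustration; they are not data from Read (2001) or Read and Roelofsma (2003), and the function name is an assumption of this sketch. A rate is inferred from one indifference point over a whole 24-month period and from an indifference point over each of three 8-month segments.

```python
import math


def implied_monthly_rate(ss_amount, ll_amount, months):
    """Continuously compounded monthly rate at which ss_amount at the start
    of an interval is valued equally with ll_amount at its end."""
    return math.log(ll_amount / ss_amount) / months


# Hypothetical indifference points (not experimental data):
# 100 at the start of the whole 24-month period is judged equal to 120 at its end,
# while 100 at the start of each 8-month segment is judged equal to 110 at its end.
whole_rate = implied_monthly_rate(100.0, 120.0, 24)
segment_rate = implied_monthly_rate(100.0, 110.0, 8)

# Total discount factor over 24 months, elicited as a whole or by compounding segments.
factor_whole = math.exp(-whole_rate * 24)
factor_segments = math.exp(-segment_rate * 8) ** 3

print(f"implied rate, whole period:   {whole_rate:.4f} per month")
print(f"implied rate, 8-month pieces: {segment_rate:.4f} per month")
print(f"discount factor, whole:       {factor_whole:.3f}")
print(f"discount factor, compounded:  {factor_segments:.3f}")
```

In this illustration the shorter segments imply a higher rate, so the compounded segments discount more in total than the undivided period. Because the implied rate here depends only on the length of the interval and not on when it starts, the pattern by itself gives no reason for preference between a fixed SS and LL pair to change as both draw nearer.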
            Read and Roelofsma’s interpretations are complicated by the fact that they did find increased rates of discounting as delays decreased—hyperconcave discounting—but in an irregular shape that deviated from hyperbolic. They attributed this finding to people’s tendency to discount larger rewards less steeply than smaller ones (the magnitude effect). This mechanism had already been proposed by Green and Myerson (1993), who pointed out that steeper discounting might cause an exponential curve from an SS reward to start above that from an LL reward but fall below it as delay increased. However, further analysis by the same authors showed that true hyperbolic curves fit the choice data better than do exponential curves with different slopes (Green and Myerson 1996). In any case, the magnitude effect has been reported only for amounts differing by factors on the order of ten or more, not for amounts within the twofold range such as were used by Read and Roelofsma. For instance, Kirby (2006) found that the discounted value of a series of three rewards is approximately the same as the sum of the discounted values of the individual rewards. The magnitude effect is not seen at all in nonhumans or in humans when both amounts offered are large (Green and Myerson 2004), suggesting that it reflects the increasing salience of the rational choice norm as amounts go from trivial to important, rather than a basic way of evaluating future options. Subadditive discounting and the magnitude effect thus probably both arise from people’s heuristics for estimating the value of reward, and can be expected to influence their choices to some extent, but not sufficiently to account for temporary preferences.
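The crossing of two exponential curves with different slopes, as in the magnitude-effect account, can be sketched in a few lines. The amounts, rates, and six-month gap are hypothetical, and the helper names are assumptions of this sketch rather than anything specified by Green and Myerson. The smaller reward is discounted at the steeper rate, and the crossover delay is found by equating the two curves.

```python
import math


def exponential_value(amount, delay, rate):
    """Exponential curve: a constant proportion of value is lost per unit of delay."""
    return amount * math.exp(-rate * delay)


def crossover_delay(ss_amount, ss_rate, ll_amount, ll_rate, gap):
    """Delay to the SS reward at which the two exponential curves are equal.

    Solves ss_amount * exp(-ss_rate * d) == ll_amount * exp(-ll_rate * (d + gap))
    for d. The rates must differ; with equal rates the curves never cross.
    """
    return (math.log(ss_amount / ll_amount) + ll_rate * gap) / (ss_rate - ll_rate)


# Hypothetical parameters: the smaller reward is discounted more steeply.
SS_AMOUNT, SS_RATE = 50.0, 0.30    # rate per month
LL_AMOUNT, LL_RATE = 100.0, 0.15   # rate per month
GAP = 6.0                          # LL arrives 6 months after SS

d_star = crossover_delay(SS_AMOUNT, SS_RATE, LL_AMOUNT, LL_RATE, GAP)
print(f"curves cross when SS is about {d_star:.2f} months away")

for d in (0.0, 5.0):
    v_ss = exponential_value(SS_AMOUNT, d, SS_RATE)
    v_ll = exponential_value(LL_AMOUNT, d + GAP, LL_RATE)
    print(f"delay to SS = {d}: SS value {v_ss:.2f}, LL value {v_ll:.2f}")
```

Under these assumptions the SS curve lies above the LL curve when the SS reward is immediate and below it at longer delays. The reversal depends entirely on the two rates differing, which is why the account has little to offer when the amounts are close enough that no magnitude effect is expected.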
            Ariel Rubinstein (2003) has pointed out that human subjects use a similarity heuristic in evaluating monetary choices. That is, they treat values of either amount or delay that are close to each other as equal, and decide according to any conspicuously dissimilar values involved. For instance, when amounts of reward are close enough to each other to be judged as similar, the change from “different” delays (e.g., none versus a year from now) to “similar” delays (e.g., 10 years versus 11 years—p. 1210) could be responsible for a switch in preference between a pair of amounts offered at those delays. However, the greater perceived similarity of 10 to 11 years than of 0 to 1 years should itself be seen as a manifestation of hyperbolic discounting, or at least of the widely applicable psychophysical principle that changes in quantity are perceived in proportion to the baseline quantity. This principle, the Weber-Fechner law, is sometimes hypothesized to be the basis of hyperbolic discounting (Gibbon 1977). That is, the greater similarity of two delays that are separated by a given amount when the delays are long is based on the same principle as the lesser impact of the delay between SS and LL rewards when both are distant. Whether or not to call the corresponding hyperbolic curves measures of similarity will be a matter of taste.
            Rubinstein goes beyond the similarity-with-distance effect to report three amount versus delay experiments in which the results actually contradict hyperbolic discounting. The key to those findings seems to have been that, to get the necessary rejection of the predicted offer in each of the experiments, the gain for waiting for the LL reward had to be minuscule: $467.39 versus $467.00, $1000 versus $997, and $960 versus $958, respectively; even so, no choice was endorsed by as many as 70% of the subjects. It should not be controversial that people often estimate values by rules of thumb; an astute familiarity with these rules adds to many merchants’ profit margins. Operating with small relative differences that are close to indifference points may indeed let a similarity heuristic determine choice, but this shows little about the more robust motives that challenge self-control.
            Another familiar experience—that one sees trees when close but forests when distant—has been proposed as a factor in temporary preference, under the name of temporal construal (Trope and Liberman 2000, 2003). The authors take notice of extensive overlap with hyperbolic discounting, in that the “low-level construals” with which people perceive imminent events often get their value from short-term reward or cost, whereas “high-level construals” are often designs for long-range reward. Their first two experiments (as described in both articles) do not discriminate between hyperbolic and temporal construal theories. However, experiments 3 and 4 “were designed to rule out hyperbolic discounting” (2000, 882). In experiment 3, students imagined that they had bought one of two clock radios, either a good radio with a poor clock or a poor radio with a good clock. They strongly preferred the good radio, but the strength of this preference was greater when they imagined the purchase “a year from now” than when it was to be “tomorrow” (N = 190, p < .05). In experiment 4, students rated how much they were interested in taking part in two prospective experiments: four sessions of an interesting task separated by boring “filler,” or four sessions of a boring task separated by interesting filler. Subjects strongly preferred the interesting task, but again the strength was greater when the experiment was to be run in 4–6 weeks than when it was to be run “tomorrow” (N = 64, p < .05). The authors ascribed the results of both experiments to the higher-level construal of radio over clock and main task over filler. These experiments do seem to rule out hyperbolic discounting within their own designs, in that the alternative payoffs (and costs) were not to occur at different delays. However, they did not produce preference reversal, and they do not contradict a role for hyperbolic discounting in situations where preference reversal occurs.
            There have been many failures to find hyperbolic discounting, the vast economic literature on efficient investment choices being the largest example. Disseminating the theory that exponential discounting reveals the true value of reward has been one of the great historical projects of civilization. People’s limited ability to conform their choices to their belief in this tenet has been one of the great historical puzzles of civilization. In the framing experiments just described, subjects revealed additional theories of value, and these have been proposed as solutions to the puzzle; but these theories seem to have a relatively small motivational impact. The magnitude effect has been the most demonstrable, but it probably represents subjects’ greater belief in exponential valuation when amounts are of investable size. Subadditive discounting, the similarity heuristic, and temporal construal have required finely tuned designs to separate them from more robust discounting patterns, and have so far not been shown to induce temporary preference.

Summary

Temporary preference for smaller, sooner (SS) over larger, later (LL) rewards is not usually the result of simply coming too close to the SS rewards. Most of us spend a great deal of time “too close” to such rewards, in the sense that unhealthy foods, tobacco, and alcohol are easily available, as is often the case as well with recreational drugs, sexual adventure, dangerous driving, unwise purchases, and certainly procrastination. Proximity does increase their attraction relative to LL rewards, but in all of these cases except for procrastination sudden craving can arise after mere reminders, or with no obvious precipitant at all. In straightforward application, none of the three kinds of explanation currently proposed in the literature accounts for sudden craving in the absence of new information about availability.
The explanation that has had the most commonsense appeal is that conditioned stimuli (CSs) that predict rewards—at least a special class of “visceral” rewards—can elicit overwhelming appetite. However, in systematic experiments CSs only convey information, and only elicit responses (CRs) insofar as they have accurately predicted the unconditioned stimuli (UCSs). CRs cannot exceed UCRs, and they are attenuated from the UCRs as expected delay increases. Mere reminders should not be able to elicit the intense conditioned appetites that are the hypothesized mechanism of preference reversal. Nor does hyperbolic discounting in itself predict preference reversal in response to mere reminders. The third class of theories, that changes in cognitive framing can overshadow the discounting factor, has been shown to predict substantial effects only in the familiar case where a norm of financial rationality modifies inconsistent choice toward exponential discounting. Framing may contribute to rationality, but not in any major way to temporary preferences for poorer rewards.
I have argued here that either conditioning or hyperbolic discounting might be adequate to account for sudden appetite if consumption is predicted recursively. That is, in cases where consumption is limited not by availability but by the person’s own choice, spontaneous increases in appetite can increase the likelihood of consuming, which can induce further increases in appetite, a potentially explosive positive feedback system. However, the passive information transfer involved in conditioning should damp down any positive feedback system that depends on it. Only if appetite can be a reward-seeking process, an implication of hyperbolic discounting, can significant amplification be expected. Conversely, the need to account for this amplification joins other arguments that appetite is a reward-dependent process. The hyperbolic form of the basic discount curve is a necessary and sufficient mechanism for temporary preference phenomena in general, including sudden craving in response to uninformative stimuli.

Acknowledgments

I am grateful to Kent Berridge and John Monterosso for comments on drafts of the manuscript.

Notes

1. For simplicity I am focusing on appetite, but some studies of emotion provide examples that are lacking for appetite. Emotion might be regarded as appetite without an object of consumption, and even that distinction is unclear in borderline cases such as rage and lust.
2. Rachlin and Green (1972) had observed both preference reversal and behavioral commitment, but a requirement of 26 responses to make each choice prevented interpretation of these effects as functions of pure delay.
3. Modified from their formula 5 to express momentary value.
4. A limited warfare relationship implies that consistent choice has to be achieved strategically by some kind of precommitment. I have proposed recursive self-prediction, such that a person sees her current choice as a test case of whether she will choose a whole bundle of similar LL rewards in the future, as the mechanism of willpower and of related phenomena such as sudden failures of will and the experience of freedom of will (Ainslie 2001, 90–104; 2005). The recursive mechanism for sudden appetite in the absence of new information, proposed here, follows the same dynamic as sudden failure of will, and could sometimes constitute the initial phase of it.
5. There may be no intrinsic line between evaluating a behavior and initiating it. Edward Tolman’s (1939) original concept of vicarious trial and error, in which a subject estimates the reward for alternative choices by serially initiating them without committing to them, has recently been validated with single hippocampal neurons (Johnson and Redish 2007). To the extent that appetites are reward-dependent, the same process may govern them.
6. This definition modifies Rolls’s behavioral definition—“A reward is anything for which [a subject] will work” (1999, 60–61)—to recognize the potentially split-second duration of some preferences, too short to motivate work. Berridge (2003) makes a similar extension of the concept with “nonhedonic” reward.
7. McClure et al. (2007) have observed a related variability in what rewards people’s brains respond to as visceral. They report that a prospect of getting immediate coupons that will take days to exchange generates activity in subjects’ β centers, whereas a juice reward has to be delivered within one minute to do this. But in another laboratory, a promised cigarette in a week produces β activity (Monterosso, Ainslie, and London 2006).

References

Ainslie, G. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior 21: 485–489.

Ainslie, G. (1975). Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin 82: 463–496.

Ainslie, G. (1991). Derivation of “rational” economic behavior from hyperbolic discount curves. American Economic Review 81: 334–340.

Ainslie, G. (1992). Picoeconomics: The Strategic Interaction of Successive Motivational States within the Person. Cambridge: Cambridge University Press.

Ainslie, G. (2001). Breakdown of Will. New York: Cambridge University Press.

Ainslie, G. (2005). Précis of Breakdown of Will. Behavioral and Brain Sciences 28 (5): 635–673.

Ainslie, G., and Haendel, V. (1983). The motives of the will. In E. Gottheil, K. Druley, T. Skodola, and H. Waxman, eds., Etiology Aspects of Alcohol and Drug Abuse, pp. 119–140. Springfield, IL: Charles C. Thomas.

Ainslie, G., and Herrnstein, R. (1981). Preference reversal and delayed reinforcement. Animal Learning and Behavior 9: 476–482.

Arrow, K. J. (1959). Rational choice functions and orderings. Economica 26: 121–127.

Balleine, B., and Dickinson, A. (1998). Goal-directed action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37: 407–419.

Baumeister, R. F., Heatherton, T. F., and Tice, D. M. (1994). Losing Control: How and Why People Fail at Self-Regulation. New York: Academic.

Becerra, L., Breiter, H. C., Wise, R., Gonzalez, R. G., and Borsook, D. (2001). Reward circuitry activation by noxious thermal stimuli. Neuron 32: 927–946.

Becker, G., and Murphy, K. (1988). A theory of rational addiction. Journal of Political Economy 96: 675–700.

Berns, G. S., Chappelow, J., Cekic, M., Zink, C. F., Pagnoni, G., and Martin-Skurski, M. E. (2006). Neurobiological substrates of dread. Science 312: 754–758.

Berridge, K. C. (2003). Pleasures of the brain. Brain and Cognition 52: 106–128.

Berridge, K. C. (2007). The debate over dopamine’s role in reward: The case for incentive salience. Psychopharmacology 191: 391–431.

Berridge, K. C., and Robinson, T. (1998). What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews 28: 309–369.

Bickel, W. K., and Marsch, L. A. (2001). Toward a behavioral economic understanding of drug dependence: Delay discounting processes. Addiction 96: 73–86.

Boudon, R. (1996). The “rational choice model”: A particular case of the “cognitive model.” Rationality and Society 8: 123–150.

Cabanac, M. (1992). Pleasure: The common currency. Journal of Theoretical Biology 155: 173–200.

Carter, B. L., and Tiffany, S. T. (2001). The cue-availability paradigm: The effects of cigarette availability on cue reactivity in smokers. Experimental and Clinical Psychopharmacology 9: 183–190.

Chance, P. (1999). Thorndike’s puzzle boxes and the origins of the experimental analysis of behavior. Journal of the Experimental Analysis of Behavior 72: 433–440.

Chung, S., and Herrnstein, R. J. (1967). Choice and delay of reinforcement. Journal of the Experimental Analysis of Behavior 10: 67–74.

Conlisk, J. (1996). Why bounded rationality? Journal of Economic Literature 34: 669–700.

Corsini, R. J. (1984). Current Psychotherapies, 3rd ed. Rockland, MA: Peacock.

Cubitt, R. P., and Sugden, R. (2001). On money pumps. Games and Economic Behavior 37: 121–160.

Dar, R., Stronguin, F., Marouani, R., Krupsky, M., and Frenk, H. (2005). Craving to smoke in orthodox Jewish smokers who abstain on the Sabbath: A comparison to a baseline and a forced abstinence workday. Psychopharmacology 183: 294–299.

Darwin, C. (1872/1979). The Expression of the Emotions in Man and Animals. London: Julian Friedmann.

Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8: 1704–1711.

Deluty, M. Z., Whitehouse, W. G., Mellitz, M., and Hineline, P. N. (1983). Self-control and commitment involving aversive events. Behaviour Analysis Letters 3: 213–219.

Dinsmoor, J. A. (2004). The etymology of basic concepts in the experimental analysis of behavior. Journal of the Experimental Analysis of Behavior 82: 311–316.

Donahoe, J. W., Burgos, J. E., and Palmer, D. C. (1993). A selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior 60: 17–40.

Drummond, D. C., Tiffany, S. T., Glautier, S., and Remington, B. (1995). Addictive Behavior: Cue Exposure Theory and Practice. Chichester: Wiley.

Estes, W. K., and Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology 29: 390–400.

Eysenck, H. J. (1967). Single trial conditioning, neurosis and the Napalkov phenomenon. Behavior Research and Therapy 5: 63–65.

Field, M., and Duka, T. (2001). Smoking expectancy mediates the conditioned responses to arbitrary smoking cues. Behavioural Pharmacology 12: 183–194.

Frank, R. H. (1992). Frames of reference and the intertemporal wage sequence. In G. Loewenstein and J. Elster, eds., Choice Over Time, pp. 371–382. New York: Sage.

Gallistel, C. R. (2002). Frequency, contingency and the information processing theory of conditioning. In P. Sedlmeier and T. Betsch, eds., Frequency Processing and Cognition. Oxford: Oxford University Press.

Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review 84: 279–325.

Gigerenzer, G. (2000). Adaptive Thinking: Rationality in the Real World. Oxford: Oxford University Press.

Gintis, H. (2007). A framework for the unification of the behavioral sciences. Behavioral and Brain Sciences 29: 1–61.

Glimcher, P. W., Kable, J., and Louie, K. (2007). Neuroeconomic studies of impulsivity: Now or just as soon as possible? American Economic Review 97: 1–6.

Grace, R. (1996). Choice between fixed and variable delays to reinforcement in the adjusting-delay procedure and concurrent chains. Journal of Experimental Psychology: Animal Processes 22: 362–383.

Green, L., Fisher, E. B., Jr., Perlow, S., and Sherman, L. (1981). Preference reversal and self-control: Choice as a function of reward amount and delay. Behaviour Analysis Letters 1: 43–51.

Green, L., Fry, A., and Myerson, J. (1994). Discounting of delayed rewards: A life-span comparison. Psychological Science 5: 33–36.

Green, L., and Myerson, J. (1993). Alternative frameworks for the analysis of self-control. Behavior and Philosophy 21: 37–47.

Green, L., and Myerson, J. (1996). Exponential versus hyperbolic discounting of delayed outcomes: Risk and waiting time. American Zoologist 36: 496–505.

Green, L., and Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin 130: 769–792.

Green, L., Myerson, J., and Macaux, E. W. (2005). Temporal discounting when the choice is between two delayed rewards. Journal of Experimental Psychology: Learning, Memory, and Cognition 31: 1121–1133.

Harrison, G. W., Lau, M. I., and Rutström, E. E. (2005). Dynamic consistency in Denmark: A longitudinal field experiment. Working Paper 5-02, Department of Economics, College of Business Administration, University of Central Florida, January, 2005.

Harvey, C. M. (1994). The reasonableness of non-constant discounting. Journal of Public Economics 53: 31–51.

Herrnstein, R. J. (1990). Rational choice theory: Necessary but not sufficient. American Psychologist 45: 356–367.

Herrnstein, R. J., and Prelec, D. (1992). Melioration. In G. Loewenstein and J. Elster, eds., Choice Over Time, pp. 235–264. New York: Sage.

Hollander, E., and Stein, D. J. (2006). Clinical Manual of Impulse-Control Disorders. Washington, DC: American Psychiatric Publishing.

Hull, C. L. (1943). Principles of Behavior. New York: Appleton-Century-Crofts.

Johnson, A., and Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. Journal of Neuroscience 12: 483–488.

Jolls, C., Sunstein, C. R., and Thaler, R. (1998). A behavioral approach to law and economics. Stanford Law Review 50: 1471–1550.

Kahneman, D., and Tversky, A. (1984). Choices, values, and frames. American Psychologist 39: 341–350.

Kalenscher, T., Windmann, S., Diekamp, B., Rose, J., Gunturkun, O., and Colombo, M. (2005). Single units in the pigeon brain integrate reward amount and time-to-reward in an impulsive choice task. Current Biology 15: 594–602.

Kehoe, E. J., Graham-Clark, P., and Schreurs, B. G. (1989). Temporal patterns of the rabbit’s nictitating membrane response to compound and component stimuli under mixed CS-US intervals. Behavioral Neuroscience 103: 283–295.

Kirby, K. N. (1997). Bidding on the future: Evidence against normative discounting of delayed rewards. Journal of Experimental Psychology: General 126: 54–70.

Kirby, K. N. (2006). The present values of delayed rewards are approximately additive. Behavioural Processes 72: 273–282.

Laibson, D. (1997). Golden eggs and hyperbolic discounting. Quarterly Journal of Economics 62: 443–479.

Laibson, D. (2001). A cue-theory of consumption. Quarterly Journal of Economics 66: 81–120.

Lea, S. E. G., and Webley, P. (2006). Money as tool, money as drug: The biological psychology of a strong incentive. Behavioral and Brain Sciences 29: 161–209.

Loewenstein, G. (1987). Anticipation and the valuation of delayed consumption. Economic Journal 97: 666–685.

Loewenstein, G. (1996). Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes 35: 272–292.

Loewenstein, G. (1999). A visceral account of addiction. In J. Elster and O.-J. Skog, eds., Getting Hooked: Rationality and Addiction. Cambridge: Cambridge University Press.

Logue, A. W., Pena-Correal, T. E., Rodriguez, M. L., and Kabela, E. (1986). Self-control in adult humans: Variations in positive reinforcer amount and delay. Journal of the Experimental Analysis of Behavior 46: 113–127.

MacKintosh, N. J. (1983). Conditioning and Associative Learning. New York: Clarendon.

MacLean, P. D. (1990). The Triune Brain in Evolution: Role in Paleocerebral Functions. New York: Plenum.

Malloy, P. F., and Levis, D. J. (1990). A human laboratory test of Eysenck’s theory of incubation: A search for the resolution of the neurotic paradox. Journal of Psychopathology and Behavioral Assessment 12: 309–327.

Maunsell, J. H. R. (2004). Neuronal representations of cognitive state: Reward or attention? Trends in Cognitive Sciences 8: 261–265.

Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, and H. Rachlin, eds., Quantitative Analyses of Behavior V: The Effect of Delay and of Intervening Events on Reinforcement Value. Hillsdale, NJ: Lawrence Erlbaum.

Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review 108: 96–112.

McClure, S. M., Daw, N. D., and Montague, P. R. (2003). A computational substrate for incentive salience. Trends in Neurosciences 26: 423–428.

McClure, S. M., Laibson, D. I., Loewenstein, G., and Cohen, J. D. (2004). The grasshopper and the ant: Separate neural systems value immediate and delayed monetary rewards. Science 306: 503–507.

McClure, S. M., Ericson, K. M., Laibson, D. I., Loewenstein, G., and Cohen, J. D. (2007). Time discounting for primary rewards. Journal of Neuroscience 27: 5796–5804.

McFarland, D. J., and Sibley, R. M. (1975). The behavioural final common path. Philosophical Transactions of the Royal Society of London B 270: 265–293.

Melzack, R., and Casey, K. L. (1970). The affective dimension of pain. In M. B. Arnold, ed., Feelings and Emotions, pp. 55–68. New York: Academic.

Meyer, R. E. (1988). Conditioning phenomena and the problem of relapse in opioid addicts and alcoholics. In B. Ray, ed., Learning Factors in Substance Abuse, pp. 161–179. NIDA Research Monograph series 84. Washington, DC: NIDA.

Miller, N. (1969). Learning of visceral and glandular responses. Science 163: 434–445.

Mischel, H. N., and Mischel, W. (1983). The development of children’s knowledge of self-control strategies. Child Development 54: 603–619.

Montague, P. R., and Berns, G. S. (2002). Neural economics and the biological substrates of valuation. Neuron 36: 265–284.

Montague, P. R., King-Casas, B., and Cohen, J. D. (2006). Imaging valuation models in human choice. Annual Review of Neuroscience 29: 417–448.

Monterosso, J., and Ainslie, G. (1999). Beyond discounting: Possible experimental models of impulse control. Psychopharmacology 146: 339–347.

Monterosso, J., Ainslie, G., and London, E. D. (2006). Delay discounting based on activation in the ventral striatum. Poster session presented at the 68th annual meeting of the College on Problems of Drug Dependence, Scottsdale, Arizona, June 2006.

Mowrer, O. H. (1947). On the dual nature of learning: A re-interpretation of conditioning and problem solving. Harvard Educational Review 17: 102–148.

O’Brien, C. (1997). A range of research-based pharmacotherapies for addiction. Science 278: 66–70.

O’Doherty, J. P. (2004). Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology 14: 769–776.

Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex. Trans. G. V. Anrep. Oxford: Oxford University Press.

Pear, J. J., and Eldridge, G. D. (1984). The operant-respondent distinction: Future directions. Journal of the Experimental Analysis of Behavior 42: 453–467.

Peciña, S., Smith, K. S., and Berridge, K. C. (2006). Hedonic hot spots in the brain. Neuroscientist 12: 500–511.

Perkins, C. C., Jr. (1968). An analysis of the concept of reinforcement. Psychological Review 75: 155–172.

Phelps, E. S., and Pollack, R. A. (1968). On second-best national saving and game-equilibrium growth. Review of Economic Studies 35: 185–199.

Rachlin, H., and Green, L. (1972). Commitment, choice, and self-control. Journal of the Experimental Analysis of Behavior 17: 15–22.

Read, D. (2001). Is time-discounting hyperbolic or subadditive? Journal of Risk and Uncertainty 23: 5–32.

Read, D., and Powell, M. (2002). Reasons for sequence preferences. Journal of Behavioral Decision Making 15: 433–460.

Read, D., and Roelofsma, P. H. M. P. (2003). Subadditive versus hyperbolic discounting: A comparison of choice and matching. Organizational Behavior and Human Decision Processes 91: 140–153.

Rescorla, R. A. (1988). Pavlovian conditioning: It’s not what you think it is. American Psychologist 43: 151–160.

Rodriguez, M. L., and Logue, A. W. (1988). Adjusting delay to reinforcement: Comparing choice in pigeons and humans. Journal of Experimental Psychology: Animal Behavior Processes 14: 105–117.

Rolls, E. T. (1999). The Brain and Emotion. Oxford: Oxford University Press.

Rolls, E. T. (2005). Emotion Explained. Oxford: Oxford University Press.

Rubinstein, A. (2003). “Economics and psychology”? The case of hyperbolic discounting. International Economic Review 44: 1207–1216.

Samuelson, P. A. (1937). A note on measurement of utility. Review of Economic Studies 4: 155–161.

Savastano, H. I., Hua, U., Barnet, R. C., and Miller, R. R. (1998). Temporal coding in Pavlovian conditioning: Hall-Pearce negative transfer. Quarterly Journal of Experimental Psychology 51: 139–153.

Schachter, S., Silverstein, B., and Perlick, D. (1977). Psychological and pharmacological explanations of smoking under stress. Journal of Experimental Psychology: General 106: 31–40.

Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology 57: 87–115.

Shizgal, P., and Conover, K. (1996). On the neural computation of utility. Current Directions in Psychological Science 5: 37–43.

Skinner, B. F. (1935). Two types of conditioned reflex and a pseudo type. Journal of General Psychology 12: 66–77.

Solnick, J., Kannenberg, C., Eckerman, D., and Waller, M. (1980). An experimental analysis of impulsivity and impulse control in humans. Learning and Motivation 2: 61–77.

Solomon, R., and Wynne, L. (1954). Traumatic avoidance learning: The principles of anxiety conservation and partial irreversibility. Psychological Review 61: 353–385.

Sonuga-Barke, E. J. S., Lea, S. E. G., and Webley, P. (1989). Children’s choice: Sensitivity to changes in reinforcer density. Journal of the Experimental Analysis of Behavior 51: 185–197.

Strotz, R. H. (1956). Myopia and inconsistency in dynamic utility maximization. Review of Economic Studies 23: 166–180.

Tanaka, S. C., Doya, K., Okada, G., Ueda, K., Okamoto, Y., and Yamawaki, S. (2004). Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience 7: 887–893.

Thaler, R. (1981). Some empirical evidence on dynamic inconsistency. Economics Letters 8: 201–207.

Thaler, R. (1985). Mental accounting and consumer choice. Marketing Science 4: 199–214.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs 2: 1–109.

Tiffany, S. T. (1995). Potential functions of classical conditioning in drug addiction. In D. C. Drummond, S. T. Tiffany, S. Glautier, and B. Remington, eds., Addictive Behavior: Cue Exposure Theory and Practice. Chichester: Wiley.

Tolman, E. C. (1939). Prediction of vicarious trial and error by means of the schematic sowbug. Psychological Review 46: 318–336.

Trope, Y., and Liberman, N. (2000). Temporal construal and time-dependent changes in preference. Journal of Personality and Social Psychology 79: 876–889.

Trope, Y., and Liberman, N. (2003). Temporal construal. Psychological Review 110: 403–421.

Upton, M. (1929). The auditory sensitivity of guinea pigs. American Journal of Psychology 41: 412–421.

Vaillant, G. (1971). Theoretical hierarchy of adaptive ego mechanisms. Archives of General Psychiatry 24: 107–118.

Wason, P. C. (1966). Reasoning. In B. M. Foss, ed., New Horizons in Psychology, pp. 135–151. Harmondsworth: Penguin.

Watson, J. B. (1924). Behaviorism. New York: The People’s Institute.

Wegner, D. M. (2002). The Illusion of Conscious Will. Cambridge, MA: MIT Press.

Zener, K. (1937). The significance of behavior accompanying conditioned salivary secretion for theories of the conditioned response. American Journal of Psychology 50: 384–403.

Zink, C. F., Pagnoni, G., Martin, M. E., Dhamala, M., and Berns, G. S. (2003). Human striatal response to salient nonrewarding stimuli. Journal of Neuroscience 23: 8092–8097.

2. Rachlin and Green (1972) had observed both preference reversal and behavioral commitment, but a requirement of 26 responses to make each choice prevented interpretation of these effects as functions of pure delay.

3. Modified from their formula 5 to express momentary value.

4. A limited warfare relationship implies that consistent choice has to be achieved strategically by some kind of precommitment. I have proposed recursive self-prediction, such that a person sees her current choice as a test case of whether she will choose a whole bundle of similar LL rewards in the future, as the mechanism of willpower and of related phenomena such as sudden failures of will and the experience of freedom of will (Ainslie 2001, 90–104; 2005). The recursive mechanism for sudden appetite in the absence of new information, proposed here, follows the same dynamic as sudden failure of will, and could sometimes constitute the initial phase of it.

5. There may be no intrinsic line between evaluating a behavior and initiating it. Edward Tolman’s (1939) original concept of vicarious trial and error, in which a subject estimates the reward for alternative choices by serially initiating them without committing to them, has recently been validated with single hippocampal neurons (Johnson and Redish 2007). To the extent that appetites are reward-dependent, the same process may govern them.

6. This definition modifies Rolls’s behavioral definition, “A reward is anything for which [a subject] will work” (1999, 60–61), to recognize the potentially split-second duration of some preferences, too short to motivate work. Berridge (2003) makes a similar extension of the concept with “nonhedonic” reward.

7. McClure et al. (2007) have observed a related variability in what rewards people’s brains respond to as visceral. They report that a prospect of getting immediate coupons that will take days to exchange generates activity in subjects’ β centers, whereas a juice reward has to be delivered within one minute to do this. But in another laboratory, a promised cigarette in a week produces β activity (Monterosso, Ainslie, and London 2006).
