Derivation of “rational” economic behavior from hyperbolic discount curves

George Ainslie
Veterans Affairs Medical Center, Coatesville PA, USA
University of Cape Town, South Africa
George.Ainslie@va.gov

This material is the result of work supported with resources and the use of facilities at the Department of Veterans Affairs Medical Center, Coatesville, PA, USA, and is thus not subject to copyright. The opinions expressed are not those of the Department of Veterans Affairs or of the US Government.

Published in: American Economic Review, 81, 334-340, 1991.

(References updated for this online edition.)

Recent research has discovered frequent anomalies in the utilitarian reasoning of the normal human adult (Amos Tversky and Daniel Kahneman, 1981; Richard Thaler 1987). One of these seems especially inimical to a rational market economy: the finding that people's preference for a smaller good vs. a greater but more delayed good often changes as a function of the time the choice is made, even though the difference in delay stays constant. For instance, a majority of adults report that they would rather have $50 immediately than $100 in 2 years, but almost no one prefers $50 in 4 years over $100 in 6 years, even though this is the same choice seen at 4 years' greater distance (see myself and V. Haendel, 1983, for a systematic study).

Such change of preference as a function only of elapsing time is not an isolated finding, but has been observed in undergraduates choosing between longer or shorter periods of access to a video game or relief from noxious noise; in women deciding whether or not to have anaesthesia for childbirth; in substance abuse patients choosing between different amounts of real money; and even in animals choosing between two amounts of food at different delays (see my 1992 book, ch. 3).

According to conventional utility theory, the value of delayed goods is discounted in an exponential curve; the curves from two alternative amounts of the same good available at different times should never cross in the absence of new information. However, when behavioral psychologists have conducted parametric studies of choice, they have found a radically different discount function that has become known as Herrnstein's matching law. Various experimental designs have given the curve a number of specific forms, but all are hyperbolic, that is, more bowed than an exponential curve, so that preference for goods of different sizes at different delays will indeed change as a function of time. The best version of this discount curve for a single good is probably J.E. Mazur's (1987):

(1) V = A / (ζ + Γ( T–t ) )

where V is the good's value (its ability to compete with alternatives), A is its amount, T is the time at which each good is available, and t the time of the behavior that obtains it (so that T–t is its delay from the moment of choice), ζ is an empirical constant that determines value at zero delay, and Γ is an empirical constant that modifies the steepness of the delay gradient. In the limited data available, neither constant seems to range far from 1.0.

If formula (1) is used to compare two goods separated by three units of time, the later good objectively twice as large as the earlier, the indifference point (where Vlater = Vearlier) occurs when the earlier good is available two units of time from the moment of choice. At a delay of five units, the larger good is preferred by 2/(1 + 5 + 3) = 2/9 to 1/(1+ 5) = 1/6, while at one unit the larger good loses by 2/5 to 1/2. The smaller good is temporarily preferred to the larger when the delay is short.
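The arithmetic in this comparison can be checked with a short sketch. It assumes ζ = Γ = 1, as the worked fractions imply, and the names `value`, `compare`, and `GAP` are illustrative only, not taken from the original paper.

```python
from fractions import Fraction

def value(amount, delay, zeta=1, gamma=1):
    """Formula (1): V = A / (zeta + gamma * delay), where delay = T - t."""
    return Fraction(amount) / (zeta + gamma * Fraction(delay))

GAP = 3  # the larger good arrives three time units after the smaller one

def compare(delay_to_smaller):
    """Return (value of smaller good, value of larger good) at a given delay."""
    smaller = value(1, delay_to_smaller)        # amount 1 at the earlier time
    larger = value(2, delay_to_smaller + GAP)   # amount 2, three units later
    return smaller, larger

for d in (5, 2, 1):
    smaller, larger = compare(d)
    if larger > smaller:
        verdict = "larger preferred"
    elif larger < smaller:
        verdict = "smaller preferred"
    else:
        verdict = "indifferent"
    print(f"delay {d}: smaller = {smaller}, larger = {larger} -> {verdict}")
```

Exact fractions are used so that the indifference point at a delay of two units comes out as a true equality rather than a floating-point near-miss.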

Insofar as choice is governed by the matching law, a tendency to form temporary preferences will present a major obstacle to rational planning: Any plan requiring a prolonged course of action will fail unless the person can arrange consistent motivation for or binding commitment to it. Looking at the long view, he may want to be generally thin, brave, and prudent, but to accomplish this, he will have to overcome strong desires for food, escape, and financial abandon in the immediate future. Ulysses and the Sirens will not be a remote fantasy but a central problem of life.

On the other hand, financial markets display a different pattern of human choice making. Participants behave as though they discount future goods at single digit, exponential rates, and they do so without having bound themselves to any obvious mast. But then, which is the "true" discount function—the matching law or the exponential rate available from the bank?

The evidence suggests some kind of mixture. Depending on the circumstances, ordinary people choose annual discount rates in the thousands (see my paper with Haendel) or hundreds (George Loewenstein and Thaler, 1989) of percent as well as the bank rate, and can be seen adopting committing devices that resemble the psychoanalysts' "defense" or "coping" mechanisms (see my 1992 book, ch. 5), as well as engaging in straightforward rational planning. Furthermore, a careful look at the implications of temporary preference formation suggests mechanisms by which a person who evaluates goods strictly according to the matching law can be expected to arrive at the banker's shallow exponential discount curves for at least some of his transactions, but imperfectly, as a result of varying skill and effort. These will be my topic.

I. Stability via Detection of Intertemporal Prisoner’s Dilemmas

This mechanism is the personal rule, which can be derived from the matching law as follows: If an individual must make a series of choices between goods of amount Ai and later, larger goods of amount A'i (i.e., all A'i > Ai and all T'i > Ti), each choice will be described simply by the matching law evaluation of the two alternatives involved in that particular choice, unless the choices are linked. If, however, the whole series of choices must be made all at once in the same direction, then the choice will be governed by the summed values of the goods on each side. Looking at ratios of summed values, each derived from formula (1), the crucial time at which preference between the two whole series of goods changes will be represented by the t when the value V' of the series of larger goods equals the value V of the series of smaller ones, called tindif:

Σi A'i / (ζ + Γ(T'i – tindif)) = Σi Ai / (ζ + Γ(Ti – tindif))

If the choice is made before tindif it will favor the series of larger, later goods, and if it is made after tindif it will favor the series of smaller, earlier ones.

This would be a trivial application of the matching law to the case of serial choices except for an important phenomenon: tindif between the series of larger (primed) goods and the series of smaller (nonprimed) ones will move closer to the moment when the first smaller good is available as the series is made longer (see my 1975 article). The period of temporary preferences for the smaller good will be reduced or eliminated. Figure 1 shows this for the simplest case, two pairs of goods with both Ai = A and both A'i= A' = 2A. The curves show their values at all times t before they are due. The period of temporary dominance by the smaller, earlier good is shorter when the curves from another pair are added (just before the left-hand pair) than when only one pair remains available (just before the right-hand pair). With appropriate mathematical transformations, this finding holds equally for streams of continuous reward as for discrete, momentary goods.

The practical effect of choosing a whole series of goods at once is thus to increase the individual's tendency to choose the larger goods. That is, a given small, early good can be available more imminently without his forming a temporary preference for it. This predicted phenomenon is obviously relevant to the problem of intertemporal consistency.
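A minimal numerical sketch of this effect follows. It assumes ζ = Γ = 1, larger goods twice the size of the smaller ones, a gap of three time units within each pair (as in the single-pair example under formula (1)), and a hypothetical spacing of ten units between successive pairs; the spacing and all function names are assumptions made for illustration, not values from the paper.

```python
def value(amount, delay, zeta=1.0, gamma=1.0):
    """Formula (1): V = A / (zeta + gamma * delay)."""
    return amount / (zeta + gamma * delay)

def series_advantage(delay_to_first, n_pairs, gap=3.0, spacing=10.0,
                     small=1.0, large=2.0):
    """Summed value of the larger goods minus summed value of the smaller ones.

    delay_to_first is the time remaining until the first smaller good.
    spacing (hypothetical) is the interval between successive pairs.
    """
    total = 0.0
    for k in range(n_pairs):
        d = delay_to_first + k * spacing
        total += value(large, d + gap) - value(small, d)
    return total

def indifference_delay(n_pairs, lo=0.0, hi=10.0, tol=1e-9):
    """Bisect for the delay at which the two whole series are equally valued."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if series_advantage(mid, n_pairs) < 0:
            lo = mid  # smaller series still preferred: root is at a longer delay
        else:
            hi = mid
    return (lo + hi) / 2

for n in (1, 2, 5):
    print(n, "pair(s): indifference at delay", round(indifference_delay(n), 3))
```

Under these assumptions a single pair is indifferent at a delay of two units, and each added pair shortens that delay, which is the shrinking window of temporary preference described above.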

But how can a person arrange to choose whole series of goods at once? In fact, he does not have to physically commit himself. The values of the alternative series of goods cannot depend on whether he will actually get them, an event that has not yet occurred, but only on his expectation of getting them. Assuming he is familiar with the outcomes of his possible choices, the main element of uncertainty will be what he himself will actually choose. In situations where temporary preferences are likely, he is apt to be genuinely ignorant of what his own future choices will be. His best information is his knowledge of his past behavior under similar circumstances, with the most recent examples probably being the most informative. Furthermore, if he has chosen the smaller good often enough that he knows self-control will be an issue, but not so often as to give up hope that he may choose the larger goods, his current choice is likely to be what swings his expectation of future goods one way or the other. If he makes an impulsive choice, he will have little reason to believe he will not go on doing so, and if he controls his impulse, he has evidence that he may go on doing that. The same logic is the basis for what is called a "self-enforcing contract" between individuals (B. Klein and K.B. Leffler, 1981).

According to this logic, amplification of impulse control can be expected to occur to some extent whenever a person perceives a series of confrontations with temptations as similar to each other. He will not necessarily notice the process itself, or develop any way of describing it. He may develop an extensive practical understanding of it by trial and error, but have only tangential theories about how it works. However, insofar as he has become aware of this phenomenon, he will be able to induce it where it has not occurred spontaneously, by arbitrarily defining a category of gratification-delaying behaviors that will thereafter prevail or not as a set (see my 1975 article).

Such personal rules can be seen as a solution to a bargaining problem. The temporary preference phenomenon creates a relationship among an individual's successive motivational states that can be described as limited warfare (T.C. Schelling, 1960). Successive motivational states have some interests in common, and others that are peculiar to them. The interests in common are identical with the person's long-range interest. The peculiar ones are short-range interests in whatever goods happen to be imminently available. At any given time, the alcoholic wants to drink less in the aggregate (he does not want to be an alcoholic), but he may want to drink a great deal currently. His long-range interest, common to all his successive motivational states, is to be generally sober; this interest is challenged, and often overwhelmed, by a succession of short-range interests in getting drunk just once.

To explore an everyday example: Say that a person at midnight faces the choice of staying up for two more hours and having fun before he finally gives in to fatigue, but feeling tired at work the next day, vs. giving up his present fun and expecting to feel rested at work. He values the imminent fun at 60 units per hour, and expects to lose 60 units per hour of comfort from when he gets up at 7 A.M. until he leaves work at 5 P.M. At midnight, the value of staying up will be 64, and the differential value of feeling rested at work will be 49.

Given only this one choice, he will stay up and suffer the next day. However, if he faces this choice nightly, he may perceive his current choice as a precedent for future nights as well. Assuming he believes that he will go to bed on time on subsequent nights if he does tonight, and not otherwise, the values of his alternatives are 78 for staying up on the next 10 nights vs. (by similar calculation) 105 for going to bed early on the next 10 nights. He will go to bed, if he expects that he will thereby be motivated to follow suit on the subsequent nights.

Considering separately the present values of the alternatives in his first choice (64 vs. 49), and the present values of two subsequent series of 9 choices all in one direction (14 for always staying up vs. 56 for always going to bed), his incentives create a prisoner's dilemma between his own present and future motivational states (Table 1), one that he will face on a nightly basis. From his present point of view, going to bed on time both today and in the future is worth 105, and staying up both today and in the future is worth 78. However, if he can stay up today and still expect to go to bed on time in the future, that is worth 120. Conversely, if he goes to bed today but fails to also go to bed in the future, that is worth only 63. The latter two outcomes respectively represent his successfully finding a loophole that excepts the present case from the string of precedents, and falsely hoping that his current cooperation would give himself sufficient incentive to follow suit in subsequent choices.
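The figures in this example can be reproduced with a short sketch. It rests on assumptions not stated in the text: ζ = Γ = 1 with time measured in hours, each hour of fun or comfort valued at its midpoint, and nights spaced 24 hours apart. These choices are an inference from the reported numbers, not the author's stated method, and the function names are illustrative.

```python
def value(amount, delay, zeta=1.0, gamma=1.0):
    """Formula (1): V = A / (zeta + gamma * delay), delay in hours."""
    return amount / (zeta + gamma * delay)

RATE = 60.0                 # units of value per hour, from the text
FUN_HOURS = range(0, 2)     # midnight to 2 A.M.
WORK_HOURS = range(7, 17)   # 7 A.M. to 5 P.M.

def stream_value(hours, day_offset=0):
    """Value, seen from midnight tonight, of an hourly stream on a given day.

    Assumption: each hour is discounted from its midpoint (h + 0.5).
    """
    return sum(value(RATE, 24 * day_offset + h + 0.5) for h in hours)

stay_tonight = stream_value(FUN_HOURS)
rest_tonight = stream_value(WORK_HOURS)
stay_later = sum(stream_value(FUN_HOURS, d) for d in range(1, 10))
rest_later = sum(stream_value(WORK_HOURS, d) for d in range(1, 10))

# Payoffs as seen tonight, built from rounded components as in the text.
now = {"stay": round(stay_tonight), "bed": round(rest_tonight)}
later = {"stay": round(stay_later), "bed": round(rest_later)}
payoff = {(a, b): now[a] + later[b] for a in now for b in later}

for (tonight, future), v in sorted(payoff.items()):
    print(f"tonight: {tonight:4s}  future nights: {future:4s}  value: {v}")
```

Under these assumptions the four cells come out as 105, 78, 120, and 63, matching the values discussed above. The ordering (defect now while cooperating later ranks highest, cooperate now while defecting later ranks lowest) is what makes the structure a prisoner's dilemma.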

The person's best move at present will thus depend on how he forecasts his future perceptions. Insofar as he sees his current choice as a precedent and not an isolated instance, he will face the incentives of a repeated prisoner's dilemma.

II. Factors Stabilizing a Person's Valuation of Money

The incentives to cooperate in an internal prisoner's dilemma will take us part of the way from the sharp fluctuations of matching law discounting to the prudence of rational investment. It is likely that valuations of financial transactions start out the same way as the valuations of the visceral goods like alcohol and sleep that have just been mentioned. Assume for instance that a person likes to skimobile in the winter and sail in the summer, that he is just willing to pay what each vehicle costs ($1000 for a used model on which there is little annual depreciation) at the beginning of the season, and that he can sell each back to a store at the end of the season for 25 cents on the dollar. Then every 6 months, he will face the choice of getting $250 immediately vs. saving $1000 in 9 months, alternatives that formula (1) evaluates at $250 and $100, respectively, if ζ = 1 month. If he actually sees this as an isolated instance, the matching law predicts that he will sell. However, if he expects to face a similar choice twice a year for roughly the next 20 years, the incentives he faces are the sum of 40 $250s, one immediate, the others discounted, vs. 40 $1000s, each 9 months after the corresponding $250. These contingencies, too, represent a prisoner's dilemma (Table 2). He will hold his equipment for next season, if he believes he must do so this time in order to continue doing so (i.e., if the upper right cell is not a credible outcome), and that he will continue to do so if he does so this time (i.e., if he does not see a great risk that the lower left outcome will occur).

Seeing a transaction as a member of a larger category somewhat dampens the fluctuations in spontaneous value predicted by the matching law. However, even the person who does this in the above example will be indifferent between selling and holding at a selling price of $354, for a good that will be worth $1000 to him in just 9 months. While people may sometimes buy and sell according to discount rates of such magnitude (the effective rates in Hausman's study of air conditioner purchases reached 89 percent; see Loewenstein and Thaler), such rates are still much higher than would be called either normal or prudent. We must appeal to three additional factors to produce the stability with which economists are familiar.
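The indifference price for the linked series can be approximated with a short sketch. It assumes ζ = Γ = 1 with time in months, exactly 40 choices spaced 6 months apart, and each $1000 valued 9 months after the corresponding sale; the exact count and spacing behind the $354 figure are not given in the text, so these are assumptions, and the names are illustrative.

```python
def value(amount, delay, zeta=1.0, gamma=1.0):
    """Formula (1): V = A / (zeta + gamma * delay), delay in months."""
    return amount / (zeta + gamma * delay)

N_CHOICES = 40        # twice a year for roughly 20 years
INTERVAL = 6          # months between successive choices
HOLD_DELAY = 9        # months from each choice until the good is worth $1000
HOLD_AMOUNT = 1000.0

def summed_value(amount, first_delay):
    """Summed value of receiving `amount` at first_delay and every 6 months after."""
    return sum(value(amount, first_delay + INTERVAL * k) for k in range(N_CHOICES))

# Isolated choice: $250 now vs. $1000 in 9 months.
single_sell = value(250.0, 0)
single_hold = value(HOLD_AMOUNT, HOLD_DELAY)

# Linked series: the selling price at which the two summed values are equal.
indifference_price = HOLD_AMOUNT * summed_value(1.0, HOLD_DELAY) / summed_value(1.0, 0)

print(f"isolated: sell = {single_sell:.0f}, hold = {single_hold:.0f}")
print(f"series indifference price = {indifference_price:.2f}")
```

With these assumptions the isolated comparison gives 250 vs. 100, as in the text, and the series indifference price lands within a dollar or two of the $354 reported; the small gap presumably reflects a slightly different count or timing in the original calculation.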

First, cash pricing makes a wide variety of transactions conspicuously comparable, and hence invites an encompassing personal rule about the value of money generally. Just as the person in the foregoing example achieved more constancy than he otherwise would by seeing his choices about the skimobiles as related to his choices about the sailboats, so he will become more constant still if he sees each of his financial transactions as a precedent for all others. That is, if he sees what he spends for food, clothes, movie tickets, toys, postage stamps, etc., as examples of wasting or not wasting money, he will add thousands of examples to his interdependent set of choices, each flattening his effective discount curve by a greater or lesser amount.

The ease of quantitatively evaluating and comparing all financial transactions lets the value of purchasable goods fluctuate much less over time than, say, the value of an angry outburst, or of a night's sleep. Accordingly, it is common to see someone swayed considerably more by today's emotional comfort than by next year's, but also common to see him behave as if today's wealth were worth only a tiny fraction more than next year's. However, an encompassing rule creates as a side effect the familiar sight of people pinching pennies lest they set a wasteful precedent. The stability that it brings about is apt to be insensitive to cost effectiveness.

Second, financial transactions tend to become rivalrous activities. This adds an additional stake to the intrinsic consumption value of the goods involved in these transactions. Unless the night owl robs his sleep so much that he snoozes at work, his discomfort is unlikely to become the basis for invidious distinctions between himself and his fellow citizens. However, in buying and selling, he is not choosing simply in parallel with his neighbors, but in competition with them. If some of them are prudent enough to buy his skimobile every spring for $500 and sell it back to him every fall for $1000, they will soon wind up richer than he, and rewards in power over him, not to mention the sweetness of victory, will be added to the goods that originally seemed to be at stake. Of course, rivalry may also make people rash; but if the relevant choices are perceived in series of precedents as above, they will further enlarge the motivation for deferral of gratification. It may be the need to defend at least rough comparability with one's peers that makes the credit card rate of 20 percent the indifference point for consumer interest, instead of 200. Conversely, it may be the relative invisibility of Japanese growth to American consumers that lets the latter continue to indulge in a 20 percent rate rather than setting a still lower one, despite the consequent transfer of wealth overseas.

Of course, a society often makes nonfinancial activities a basis of competition as well. Where people gain an advantage by staying hungry to attain stylish slimness, or by cultivating sexual indifference to increase their bargaining power with partners who are more readily aroused, the personal rules governing these activities gain power from these additional stakes just as the rules governing the value of money do, and can sometimes motivate heroic acts of abstention. However, just as cash pricing labels the largest number of a person's choices as comparable, so it engages the largest number of people in social competition.

Finally, a person can set up his personal rules so that investment decisions are not weighed against his strongest temptations. As H.M. Shefrin and Thaler (1988) have recently pointed out, people assign their wealth to different "mental accounts" such as current income, current assets, and future income. These accounts seem to represent personal rules for how readily the money they govern may be used to satisfy immediate wants (see my 1992 book, ch. 7). Skillful designation of income as "investment" vs. "spending" money (or even less protected "pin" money) may permit a person to accept bank discount rates on some of his money, while still appeasing temptations that might overwhelm a monolithic rule. Thus an individual may evaluate goods in compartments, requiring investment rationality in one, abandoning himself to the matching law in another, and probably following intermediate rules in still others.

However, temptation can defeat this strategy not only by sheer intensity or immediacy (as some drugs notoriously do), but by motivating the search for loopholes in the rules that assign money to one account or another. In Shaw’s Pygmalion, the ne'er-do-well Alfred P. Doolittle asks Professor Higgins for five pounds to spend on "one good spree for myself and the missus." The professor is charmed by Doolittle's hedonism and offers him ten, but Doolittle refuses. "Ten pounds is a lot of money: it makes a man feel prudent like; and then goodbye to happiness." Most rationalizations are less costly, but may still erode a person's will to maintain an account governed by conventional prudence, leading perhaps to the compulsive spending syndromes that are so often observed.

Taken together, the mechanisms just discussed may account for why priceable goods are not usually evaluated simply according to the matching law, even though that law may underlie all valuations. I have argued that goods are assigned value according to their strategic importance in a repetitive, intertemporal prisoner's dilemma, cooperation in which may maximize objective income despite the vicissitudes of spontaneous preference. Such a process creates a special realm where discounting is shallow and exponential, a realm that exists as a special case of the matching law in much the same way that Newtonian physics forms a special case of relativistic physics. The consequence for utility theory is that most valuations are made not only or even mainly according to a present hunger, but according to the precedents they will set. If such a valuation process can sometimes make five pounds sterling worth more than ten, it may have the power to account for a number of other anomalies in utility theory as well; but that is another story.

Acknowledgements

Suggestions by Elmer Schaefer, Margo Schaefer, George Loewenstein, and Drazen Prelec are gratefully acknowledged.

References

Ainslie, G. (1975). Specious reward: a behavioral theory of impulsiveness and impulse control. Psychological Bulletin, 82, 463-496.

Ainslie, G. (1992). Picoeconomics: The strategic interaction of successive motivational states within the person. New York, NY, USA: Cambridge University Press.

Ainslie, G., & Haendel, V. (1983). The motives of the will. In E. Gottheil, K. Druley, T. Skoloda & H. Waxman (Eds.), Etiologic Aspects of Alcohol and Drug Abuse. Springfield, Illinois: Charles C Thomas.

Klein, B. and Leffler, K.B. (1981). The role of market forces in assuring contractual performance. Journal of Political Economy, 89, 615-41.

Loewenstein, G. and Thaler, R.H. (1989). Anomalies: Intertemporal choice. Journal of Economic Perspectives, 3, 181-93.

Mazur, J.E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons et al., eds., Quantitative Analyses of Behavior V: The Effect of Delay and of Intervening Events on Reinforcement Value. Hillsdale: Erlbaum.

Schelling, T.C. (1960). The Strategy of Conflict, Cambridge: Harvard University Press.

Shefrin, H.M. and Thaler, R.H. (1988). The behavioral life-cycle hypothesis. Economic Inquiry, 26, 609-43.

Thaler, R. (1987). Anomalies: Saving, fungibility, and mental accounts. Journal of Economic Perspectives, 1, 197-201.

Tversky, A. and Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453-58.
