- Original Paper
The effect of 5-HT1A receptor antagonist on reward-based decision-making
The Journal of Physiological Sciences volume 69, pages 1057–1069 (2019)
When choosing the best action from several alternatives, we compare their values, each of which depends on the balance between benefit and cost. Previous studies have shown that animals and humans with low brain serotonin (5-HT) levels tend to choose smaller immediate rewards. We used a decision-making schedule task to investigate whether the 5-HT1A receptor is responsible for reward-related decisions. In this task, the monkeys chose between two alternatives, each comprising 1–4 drops of liquid reward (benefit) and 1–4 repeats of a color discrimination trial (workload cost), and then executed the chosen schedule. Administration of the 5-HT1A antagonist WAY100635 did not change the choice tendency; however, the sensitivity to the amount of reward in the schedule part was diminished. The 5-HT1A receptor could have a role in maintaining reward value to keep track of the promised reward, rather than in modulating workload discounting of reward value.
Whenever we choose an option from two or more alternatives to obtain an associated reward, we compare the expected values. The value depends on the balance between expected benefit (reward amount, quality, etc.) and potential cost (money required, time spent, workload, etc.).
The relation between the level of brain 5-HT and temporal discounting has been extensively examined [1,2,3,4,5]. Another line of study focused on the relation between brain 5-HT and effort discounting. However, delay and effort are inseparably entangled with each other in decision-making in real contexts such as foraging and motor control-associated decisions. Therefore, as an integrated cost variable, we introduced a “workload” into a decision-making schedule task we previously developed [9, 10]. The workload is equivalent to the length of a schedule that comprises an iteration of simple color discriminations. The monkey can choose one of two different schedules to earn the promised reward. Using an economic model, we showed that the sunk cost, explained by the workload in completed color discriminations, positively weighted the future reward. Conversely, the remaining workload discounted the value of the reward.
A question now arises: Is value discounting by workload also modulated by brain 5-HT, as is broadly acknowledged for delay discounting [1,2,3,4,5]? If this is the case, which subtype of 5-HT receptor is involved in the value discounting?
Previous studies revealed that a low level of 5-HT in the brain is implicated in impulsive choice. Animals with lesions in their ascending serotonergic pathway, or humans whose brain 5-HT levels are lowered, often choose a small immediate reward [1,2,3,4], whereas an increase in 5-HT levels leads subjects to choose a delayed larger reward [1, 5]. Miyazaki et al. [11,12,13] investigated the relation between the activity of putative 5-HT neurons and impulsiveness manifested as impatience for future reward. The 5-HT neurons in the dorsal raphe showed an increased tonic firing rate while the animal waited for a delayed reward. This activity seemed to be necessary for waiting for the delayed reward. All these studies examined the relation between brain 5-HT and patience during delay.
On the other hand, the relation between 5-HT receptor subtypes and impulsiveness has been extensively investigated in rodents performing a delay discounting task. However, the studies using the 5-HT1A agonist 8-OH-DPAT [14,15,16,17,18], the 5-HT1A antagonist WAY100635, the 5-HT2 antagonists ritanserin, ketanserin, and SB242084, or the 5-HT3 antagonists MDL72222, granisetron, and ondansetron provided conflicting results. We could not reach an integrated understanding of the involvement of 5-HT receptor subtypes in behavior influenced by the balance between reward and cost.
Hence, we initially aimed to examine how blockade of 5-HT2A modulated the workload choice and the behavior during schedules. Of the 14 subtypes, this one is the most widely distributed over the entire neocortex and is present in large quantities. However, our preliminary experiment using the 5-HT2A antagonist MDL-100907 at a dose inducing 50% occupancy did not show a consistent change in the monkeys’ task performance, suggesting that this receptor subtype is not implicated in the workload-related task (data not shown). We therefore focused on the 5-HT1A subtype as the target of manipulation. This subtype is densely expressed in structures that connect to, or form part of, the limbic system. The quantity of 5-HT1A is the second largest in the forebrain. To investigate the role of 5-HT1A in workload discounting, we injected the widely used 5-HT1A antagonist WAY100635 into the monkeys shortly before starting the decision-making schedule task, and assessed their behavior.
Materials and methods
Three male rhesus monkeys (Macaca mulatta; monkey Y, ~ 11.3 kg; monkey H, ~ 7.3 kg; monkey P, ~ 6.4 kg) were used. All monkeys learned the decision-making schedule task within 12 months. All experimental procedures were approved by Animal Care and Use Committee of the University of Tsukuba, and were in accordance with the Guide for the Care and Use of Laboratory Animals in the University of Tsukuba.
Each monkey sat in a standard primate chair and faced a 22-in. cathode-ray tube (CRT) monitor (CV921X; TOTOKU, Japan) placed 95 cm from its eyes (Fig. 1a). Three touch-sensitive bars, named the center bar, right bar, and left bar, were attached to the front panel of the primate chair at the level of the monkey’s hand. The latter two were the choice bars, since the monkey made a decision by touching one of them. Water reward was delivered through a stainless steel tube attached to the monkey’s lip. The experiment was conducted in a soundproof chamber in which sound was further masked by white noise. Experimental control and data acquisition were performed using the real-time experiment system “REX” adapted for the QNX operating system. Visual stimuli were presented by “Presentation” (Neurobehavioral Systems, Inc., Albany, CA) running on a Windows PC that communicated with the REX PC.
Decision-making schedule task
The decision-making schedule task consisted of a decision-making part and a reward schedule part.
In the decision-making part, the monkey chose a schedule from two alternatives (Fig. 1b). The monkey sat in the primate chair, which was equipped with three bars, in front of the CRT monitor. When the monkey touched the center bar, the decision-making part began. Five hundred milliseconds after the onset of the fixation spot (a small white square, 0.17° × 0.17°), which was presented at the center of the monitor, the choice targets appeared on either side of the fixation spot for 3 s. The choice targets indicated the alternative schedules that the monkey could choose. The brightness and the length of the choice target were proportional to the reward amount (1 drop: 25% brightness; 2 drops: 50% brightness; 3 drops: 75% brightness; 4 drops: 100% brightness = white) and the schedule length (1 schedule: 25% length (1.50° × 0.60°); 2 schedules: 50% length (3.00° × 0.60°); 3 schedules: 75% length (4.50° × 0.60°); 4 schedules: 100% length (6.00° × 0.60°)), respectively (Fig. 1c). Two different schedules were randomly picked from the set of 16 schedules (4 reward amounts × 4 schedule lengths); therefore, there were 16C2 = 120 combinations of alternatives. To make a decision, the monkey had to touch either the right or the left bar, on the same side as the chosen target, between 150 and 3000 ms after the choice targets were presented. If the monkey kept touching the choice bar for 500 ms, the unchosen target and the fixation spot were immediately extinguished. The chosen target was extinguished after an additional 500 ms. If the monkey chose neither of the alternatives within 3000 ms after the choice target presentation, the decision-making part was counted as a late error. Touching the choice bar too early (within 150 ms after the onset of the choice targets) was scored as an early error. Touching the center bar or the unchosen bar within 500 ms after making a choice was scored as a bar error. When the monkey failed in the decision-making part, the fixation spot and the choice targets were extinguished and the trial was terminated.
An inter-trial interval (ITI) of 1000 ms followed a choice error, after which the same decision-making part began again.
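The count of 120 alternatives can be checked with a minimal sketch. It assumes that each schedule is fully described by a (reward drops, schedule length) pair; the variable names are illustrative and not taken from the original task code.

```python
from itertools import combinations, product

# Each schedule pairs a reward amount (1-4 drops) with a length (1-4 trials).
# The tuple layout (reward_drops, n_trials) is an assumption for illustration.
schedules = list(product(range(1, 5), range(1, 5)))

# Unordered pairs of two different schedules offered as alternatives.
pairs = list(combinations(schedules, 2))

print(len(schedules), len(pairs))
```

Left/right placement is ignored here; counting ordered placements would double the number of pairs.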
Reward schedule part
After a 1000 ms interval following successful completion of the decision-making part, the reward schedule part began (Fig. 1d, e). In this part, the monkey was required to perform the chosen schedule to earn the chosen amount of liquid reward (0.15 ml of water per drop).
A schedule consisted of 1, 2, 3, or 4 repeats of a color discrimination trial. When the monkey touched the center bar the color discrimination trial began (Fig. 1d). At the beginning of the trial, a white rectangle visual cue was presented at the top of the monitor. After 800 ms from the onset of the visual cue, the fixation spot (a small white square, 0.17° × 0.17°) was presented at the center of the monitor. Four hundred milliseconds after the fixation spot appeared, the spot was replaced with a red square (WAIT signal, 0.40° × 0.40°). While the red square was presented, the monkey had to keep touching the center bar. After 800 ms of WAIT signal presentation, the color of the square changed to green (GO signal, 0.40° × 0.40°). If the monkey released the center bar within 150–1000 ms after the green square appeared, the color of the square changed to blue (OK signal, 0.40° × 0.40°). The OK signal indicated that the trial had been completed successfully. The visual cue and the square were extinguished after 250–350 ms from the onset of the OK signal and the liquid reward was given. The schedule state is indicated as trial/schedule length (e.g. 2nd trial in 3 trial schedule is 2/3).
An early error occurred when the monkey released the center bar too early (during the presentation of the cue, the fixation spot, or the WAIT signal, or earlier than 150 ms after the appearance of the GO signal). If the monkey did not release the center bar within 1000 ms from the onset of the GO signal, the trial was scored as a late error. Touching the right or left bar (choice bars) was scored as a bar error. When the monkey made any of these errors, the visual cue and the square were extinguished and the trial was terminated immediately. An ITI of 1000 ms followed every rewarded trial and every error.
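The scoring rules for a color discrimination trial can be summarized in a short sketch. The function name, the convention of measuring release time from GO onset (negative values meaning release before GO), and the inclusive 150 ms and 1000 ms boundaries are assumptions made for illustration, not details given in the text.

```python
def classify_release(release_ms, touched_choice_bar=False):
    """Score one color discrimination trial.

    release_ms: time of center-bar release relative to GO onset, in ms
                (negative = released before GO; None = never released).
    touched_choice_bar: True if the right or left bar was touched.
    """
    if touched_choice_bar:
        return "bar error"
    if release_ms is None or release_ms > 1000:
        return "late error"
    if release_ms < 150:
        return "early error"
    return "correct"  # released 150-1000 ms after GO onset


print(classify_release(420))
print(classify_release(-300))
```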
During the task, the white rectangle visual cue was presented at the top of the monitor (Fig. 1e). The brightness and the length of the visual cue indicated the reward amount and the schedule progress, respectively. The brightness of the visual cue was proportional to the reward amount (i.e. 1 drop: 25% brightness; 2 drops: 50% brightness; 3 drops: 75% brightness; 4 drops: 100% brightness = white). The visual cue was lengthened in proportion to the schedule progress (i.e. 1/4: 25% of full length (6.06° × 0.60°); 1/3: 33% of full length (8.08° × 0.60°); 1/2 and 2/4: 50% of full length (12.12° × 0.60°); 2/3: 66% of full length (16.16° × 0.60°); 3/4: 75% of full length (18.18° × 0.60°); 1/1, 2/2, 3/3 and 4/4: 100% of full length (24.24° × 0.60°)). The trials with the longest cues were rewarded, while those with shorter cues were unrewarded.
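The cue geometry listed above follows from a single proportionality rule, sketched below. The function names are hypothetical; the full length of 24.24° is taken from the text.

```python
FULL_LENGTH_DEG = 24.24  # cue length for completed schedules (1/1, 2/2, 3/3, 4/4)


def cue_length_deg(trial, schedule_length):
    """Cue length in degrees for schedule state trial/schedule_length."""
    return FULL_LENGTH_DEG * trial / schedule_length


def is_rewarded(trial, schedule_length):
    """Only the last trial of a schedule (the longest cue) is rewarded."""
    return trial == schedule_length


for state in [(1, 4), (1, 3), (2, 4), (2, 3), (3, 4), (4, 4)]:
    print(state, round(cue_length_deg(*state), 2), is_rewarded(*state))
```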
The three monkeys were initially trained to perform the simple color discrimination trial. When the correct performance rate exceeded 80%, a multi-trial reward schedule task (schedules of 1, 2, 3, and 4 trials) was introduced as training for the reward schedule part. After the monkeys had learned the multi-trial reward schedule task, we introduced the decision-making part. After the correct rate in the decision-making task exceeded 80%, the pharmacological experiments were started. During the experiment, the monkeys were deprived of water, which was dispensed as reward while they performed the task.
5-HT1A receptor antagonist, WAY100635 (0.15 mg/kg; SIGMA), was dissolved in saline. A PET study showed that about 50% of 5-HT1A in monkey brain was occupied at the dose of 0.3 mg/kg (information was provided through a personal communication with Dr. Minamimoto who is a research fellow in National Institute of Radiological Sciences, Chiba, Japan). However, in our preliminary experiment, the monkeys were reluctant to perform the task at this dose. Therefore, we used the dose of 0.15 mg/kg. The receptor occupancy can be calculated by the equation below.
$$\frac{[DR]}{[R]} = \frac{[D]}{[D] + K_d} \qquad (1)$$
where [DR] is the amount of receptor bound with the drug, [R] is the total amount of receptor, [D] is the concentration of the drug, and Kd is the equilibrium dissociation constant. Because 50% of 5-HT1A was occupied at the dose of 0.3 mg/kg, Kd equals 0.3 when [D] is expressed as the dose in mg/kg; therefore, the receptor occupancy is approximately 33% (0.15/(0.15 + 0.3)) at the dose of 0.15 mg/kg.
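The occupancy estimate can be reproduced with a minimal sketch. It assumes that the drug concentration scales with the injected dose, so that Kd can be expressed in mg/kg as the dose giving 50% occupancy (0.3 mg/kg); the function name is illustrative.

```python
def occupancy(dose_mg_per_kg, kd_mg_per_kg):
    """Fractional receptor occupancy, [DR]/[R] = [D] / ([D] + Kd)."""
    return dose_mg_per_kg / (dose_mg_per_kg + kd_mg_per_kg)


# Assumption: Kd in dose units is the dose producing 50% occupancy.
KD = 0.3

print(occupancy(0.3, KD))              # occupancy at the PET reference dose
print(round(occupancy(0.15, KD), 3))   # occupancy at the dose used here
```

Under this assumption, halving the dose lowers the occupancy from one half to about one third rather than to one quarter, because the binding curve is hyperbolic in dose.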
The behavioral experiments were conducted on every weekday for 2 weeks. The monkey was deprived of water for 1 day before each experiment. We administered 0.15 mg/kg of the drug solution to the monkeys systemically (intramuscular injection) twice a week (Tuesday and Thursday in the first week, Wednesday and Friday in the second week), 15 min before the start of the decision-making schedule task. For the control condition, the vehicle (saline) was administered in the same way 1 day before each drug experiment (Monday and Wednesday in the first week, Tuesday and Thursday in the second week). We performed the drug injections on different weekdays (Tuesday–Friday) to counterbalance day-of-week effects. Each session ended when the monkey stopped working on the task.
Before conducting the present experiment, we made a preliminary estimate of the concentration of WAY100635 in the monkey brain. A PET study using rhesus monkeys showed that a two-tissue compartment model explained the kinetics of WAY100635 in the brain well. Using the plasma curve fitted from a plot and the parameter values provided by previous studies, the model predicted that the concentration in the brain would peak 13 min after injection and decrease to half of the peak level within 90 min. WAY100635 would be almost completely washed out 400–500 min after the injection. Presumably, the concentration of WAY100635 in the brain was largely maintained throughout the task, and the pharmacological effect did not carry over to the next day, which was allocated to the control condition.
The number of trials performed varied from day to day. Toward the end of each daily experiment, behavioral performance became increasingly unsteady, possibly because the monkeys became satiated and fatigued. To minimize these non-pharmacological effects, we removed the data from the end of each experiment. After all the experiments were finished, we identified the day on which the monkey earned the fewest drops of reward. For the other days, we used all the behavioral responses before the trial in which the accumulated number of earned drops reached that smallest total.
The running time for the sessions (four sessions in the control condition and four sessions in the antagonist condition) ranged 85–137 min in monkey Y, 117–148 min in monkey H, and 67–168 min in monkey P. The number of accumulated rewards in the day in which the monkeys earned the least amount of total reward was 747 drops in monkey Y, 908 drops in monkey H, and 416 drops in monkey P. In every session, we removed the data after the accumulated number of dispensed rewards reached those least amounts. Accordingly, the length of analyzed data varied between 61 and 92 min in monkey Y, 77 and 125 min in monkey H, and 33 and 99 min in monkey P.
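The truncation rule can be sketched as follows. The data layout (one list of earned drops per trial for each session) and the function name are hypothetical. Whether the trial on which the cutoff is reached is itself retained is not fully specified in the text; this sketch keeps it, which is an assumption.

```python
def truncate_sessions(sessions):
    """Cut every session at the smallest total reward earned in any session.

    sessions: dict mapping a session label to the list of reward drops
              earned on each trial, in order (0 for unrewarded trials).
    Returns the cutoff (in drops) and the truncated sessions.
    """
    cutoff = min(sum(drops) for drops in sessions.values())
    truncated = {}
    for label, drops in sessions.items():
        kept, total = [], 0
        for earned in drops:
            if total >= cutoff:
                break  # cutoff already reached: discard the remaining trials
            kept.append(earned)
            total += earned
        truncated[label] = kept
    return cutoff, truncated


# Toy data, not taken from the experiment.
example = {"control_day1": [0, 2, 0, 1], "drug_day1": [1, 1, 1, 4, 2]}
print(truncate_sessions(example))
```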
Error rate and reaction time
We counted the number of all the performed choices in the decision-making part. The number of the correct and the failed choices were compared between two conditions by a Chi-squared test. We also analyzed the reaction time from all the correct choices in the decision-making part. The reaction time was defined as the time from the choice target onset to the time when the monkey touched either side bar. Since the bar release movement was different between when touching the left and the right bars, we sorted reaction times during the decision-making part as the left or the right bar group based on the chosen side. The reaction time was collected in either group and compared between two conditions by a t test.
As for the behavior in the reward schedule part, we counted the number of error and correct color discriminations, and compared them by the Chi-squared test between two conditions. The reaction time for the color discrimination was also analyzed from all the correct trials in the reward schedule part. We defined the reaction time in the reward schedule part as the time from the green square onset to the time when the monkey released the center bar. They were compared by the t test between two conditions.
Discount factor estimated by choice probability
Using the behavioral data during the decision-making part in the control condition, we counted the number of successfully performed decision-making and the number of chosen options in a given pair of two alternatives. Then we calculated the ratio of choice options across all 120 pairs.
The ‘R’ statistical computing environment (R development core team 2008) was used for statistical analyses. To analyze the choice behavior, the choice ratio during the decision-making part was fitted with a value discounting model. In our decision-making schedule task, the reward value is discounted by workload, which consists of physical effort and delay. Our previous report showed that the subjective reward values of the choice targets are discounted exponentially in the decision-making schedule task [9, 10]:
$$V = R\,e^{-kD} \qquad (2)$$
where V is the current reward value, R is the reward amount (1, 2, 3, or 4), k is the discounting factor, and D is the number of color discrimination trials required to obtain the reward (1, 2, 3, or 4). Then, the difference in value between the two targets, g, was calculated by the following equation:
$$g = V_{left} - V_{right} \qquad (3)$$
where Vleft is the value of the left choice target and Vright is the value of the right choice target. The difference in value was transformed into the choice ratio (ranging from 0 to 1) through a sigmoidal function:
$$p = \frac{1}{1 + e^{-a g}} \qquad (4)$$
where p is the choice ratio calculated across all 120 pairs, and a is an intrinsic parameter that defines the sensitivity to the value difference g. Using the function “optim” provided by ‘R’, we fitted the choice data to investigate whether the choice ratio p could be explained by the value difference g and the discounting factor k.
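The fitting procedure can be illustrated with a minimal sketch. It assumes the common exponential form V = R·exp(−kD) for the discounted value and a logistic function of the value difference, p = 1/(1 + exp(−a·g)), with p read as the probability of choosing the left target. It also swaps in a coarse grid search with a least-squares loss in place of R's `optim`, purely to stay dependency-free. All function names and the synthetic data are hypothetical.

```python
import math
from itertools import combinations, product


def value(reward, n_trials, k):
    """Exponentially discounted value of a schedule (assumed form)."""
    return reward * math.exp(-k * n_trials)


def p_left(left, right, k, a):
    """Probability of choosing the left target from the value difference g."""
    g = value(left[0], left[1], k) - value(right[0], right[1], k)
    return 1.0 / (1.0 + math.exp(-a * g))


def fit_grid(observed, k_grid, a_grid):
    """Least-squares grid search over k and a (a stand-in for optim)."""
    best = None
    for k in k_grid:
        for a in a_grid:
            loss = sum((p_left(l, r, k, a) - p) ** 2
                       for (l, r), p in observed.items())
            if best is None or loss < best[0]:
                best = (loss, k, a)
    return best


# Synthetic choice ratios generated from known parameters (not real data).
schedules = list(product(range(1, 5), range(1, 5)))  # (reward, n_trials)
pairs = list(combinations(schedules, 2))
synthetic = {(l, r): p_left(l, r, 0.5, 2.0) for l, r in pairs}

k_grid = [i / 10 for i in range(1, 11)]
a_grid = [i / 2 for i in range(1, 9)]
loss, k_hat, a_hat = fit_grid(synthetic, k_grid, a_grid)
print(k_hat, a_hat)
```

With noise-free synthetic data the search recovers the generating parameters; with real choice ratios a continuous optimizer would normally be preferred over a grid.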
Comparison of choice behavior
To analyze whether 5-HT1A receptor antagonist had an effect on monkey’s choice based on the reward value, we performed a logistic regression analysis using the generalized linear model (GLM) with a binomial link function as follows:
$$\operatorname{logit} P(y = 1) = \rho_0\, x_{value} + \rho_1\, x_{condition} \qquad (5)$$
The dependent variable y is binary (0/1): the choice of the left or the right target. The predictor variable xvalue is the difference in value between the left target and the right target, and xcondition indicates the vehicle or the drug condition. The target values were calculated by the exponential discounting function (Eq. 2), using the discounting factor k estimated from the control condition. ρ0 and ρ1 are the coefficients estimated by the GLM.
Fitting by an economic model
We previously reported that an economic model could predict the behavior during reward schedule part . An extended context-sensitive (ECS) model is based on the temporal difference learning model. Incorporated with the sunk cost and the diminishing marginal utility, ECS model allows us to describe the value of forthcoming reward in τth trial of s length schedule as follows:
where V(τ, s) shows the value expressed in τ/s schedule state, r denotes the number of reward drop, m (0 < m < 1) determines the degree of diminishing marginal utility, γ (0 ≤ γ < 1) is temporal discounting rate, and σ (0 ≤ σ < 1) is the fraction of the value carried forward to the subsequent trial. The value V(τ, s) could be transformed to error rate E(τ, s) (0 ≤ E ≤ 1) through Sigmoidal function below.
where the parameter C (0 ≤ C < 1) denotes the lower asymptote, β controls the steepness of the sigmoidal curve, and δ determines the degree of horizontal shift. We searched, in steps of 1/50, for the parameter values that minimized the sum of squared differences between the predicted and the actual error rates.
Results
We administered the 5-HT1A receptor antagonist WAY100635 to three monkeys and analyzed its effect on their choice tendency, error rate, and reaction time in the decision-making part, and on the error rate and reaction time in the reward schedule part.
5-HT1A receptor antagonist did not show consistent effect on the choice
Firstly, we analyzed the error rate in the decision-making part. The choice error rate did not show consistent changes among three monkeys (Fig. 2). In two monkeys, the choice error rate in antagonist condition was significantly larger than in control condition (monkey Y: p = 0.026, monkey H: p = 0.97, monkey P: p = 0.0007, Chi-squared test, with FDR correction). The ratio of error types also did not show consistent changes among the monkeys (Fig. 2, Table 1, p values were corrected with FDR). Only one monkey showed significant differences in early and bar errors. Next, we investigated whether the reaction time in the decision-making part was affected by the drug administration. All the monkeys showed significantly longer reaction times in the antagonist condition than in the control condition (when choosing the left side target; monkey Y, 880 ms in drug vs 794 ms in control: p ~ 0, monkey H, 829 ms vs 701 ms: p ~ 0, monkey P, 675 ms vs 608 ms: p = 0.00058. when choosing the right side target; monkey Y, 917 ms vs 838 ms: p ~ 0, monkey H, 1040 ms vs 881 ms: p ~ 0, monkey P, 738 ms vs 631 ms: p ~ 0, t test, with FDR correction) (Table 2).
We then collected the successful trials of the decision-making part, and analyzed choice probabilities in all the combinations of the choice targets during the antagonist and the control conditions (antagonist; monkey Y: 1122 trials, monkey H: 1434 trials, monkey P: 656 trials, control; monkey Y: 1096 trials, monkey H: 1402 trials, monkey P: 603 trials). We fit the choice probability in all the combinations of options. The mathematical model for fitting depended on the exponential discounting model of reward value (Eq. 2). Figure 3 shows results by the model fitting. The parameter explaining workload discounting, k, did not show consistent changes by the antagonist among three monkeys (monkey Y, control: k = 0.493, antagonist: k = 0.482; monkey H, control: k = 0.464, antagonist: k = 0.519; monkey P, control: k = 0.575, antagonist: k = 0.527) (left side columns in Table 3).
We calculated the difference in the target values across all combinations of alternatives based on Eq. (3) in the control condition, then performed a logistic regression analysis by using the GLM (Eq. 5) to check the difference of the choice tendency between the antagonist and the control condition. Consistent with the findings in our previous report , the monkeys here tended to choose the option associated with higher value schedule in either condition. However, significant change in the probabilities of choice between conditions was found only in one monkey (right side columns in Table 3).
5-HT1A receptor antagonist increased the error rate and the reaction time in the reward schedule part
After the decision-making part, the monkey had to perform the chosen reward schedule, which consisted of repeated simple color discriminations, to obtain the promised reward.
The error rates in the drug and the control conditions were sorted by the schedule states and the reward amount (Fig. 4a–c). In either condition, the error rates seemed to be higher in longer schedule. Furthermore, the error rates in the schedule states with larger reward seemed to be smaller. Those results were consistent with our previous reports [9, 19].
The overall error rates in the antagonist condition (monkey Y: 153/2238, 6.8%; monkey H: 587/3199, 18.3%; monkey P: 259/1410, 18.4%) were significantly larger than those in the control condition (monkey Y: 98/2182, 4.5%; monkey H: 220/2777, 7.9%; monkey P: 97/1202, 8.1%) (monkey Y: p = 0.00096, monkey H: p ~ 0, monkey P: p ~ 0, Chi-squared test with FDR correction) (Fig. 4d). We further investigated the change in the ratio of each error type. The late error was significantly increased by the antagonist in all three monkeys (Table 4, p values were corrected with FDR).
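As a rough check of this comparison, the sketch below computes a Chi-squared test on the 2 × 2 table of error and correct trials for monkey Y, using only the counts quoted above. The Yates continuity correction is an assumption here: the text does not state whether it was applied, although R's `chisq.test` applies it by default to 2 × 2 tables. The function name is illustrative.

```python
import math


def chi2_2x2(err_a, n_a, err_b, n_b, yates=True):
    """Chi-squared test (df = 1) comparing two error proportions."""
    a, b = err_a, n_a - err_a   # condition A: errors, correct trials
    c, d = err_b, n_b - err_b   # condition B: errors, correct trials
    n = n_a + n_b
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction (assumed)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Monkey Y: 153/2238 errors (antagonist) vs 98/2182 errors (control).
chi2, p = chi2_2x2(153, 2238, 98, 2182)
print(round(100 * 153 / 2238, 1), round(100 * 98 / 2182, 1))
print(chi2, p)
```

The resulting p value is uncorrected for multiple comparisons; the FDR procedure used in the paper is not specified, so it is not reproduced here.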
The averaged reaction times in the antagonist condition were significantly longer than those in the control condition (t test, Monkey Y: 489 ms in drug condition vs 416 ms in control condition, p ~ 0; Monkey H: 548 ms vs 473 ms, p ~ 0; Monkey P: 634 ms vs 460 ms, p ~ 0, with FDR correction) (Table 5).
5-HT1A receptor antagonist reduced the sensitivity to the amount of reward
We fitted the error rates of the schedule part by the ECS model. The best fit values of the parameters derived from ECS model were listed in Table 6. Substituting them into the ECS model, the error rates could be predicted. The predicted error rates were overlaid on the actual error rates in Fig. 4a–c. In most of the schedule states in the drug condition (closed circles), the slope of the model predicted line across different amount of reward seems to be smaller than that in the control condition (open circles). This implied that the error rates in the drug condition might be less sensitive to the amount of reward than in the control condition. Therefore, we checked how much the predicted value V(τ, s) in given schedule state differed between the trial with 1 drop reward and the trial with 4 drops reward. These value differences were compared between the drug condition and the control condition. In all 3 monkeys, the value differences were smaller in the drug condition than in the control condition (t test, Monkey Y: 0 in drug condition vs 0.91 in control condition, p = 1.655 × 10−6; Monkey H: 0.19 vs 2.28, p = 1.285 × 10−6; Monkey P: 0.17 vs 0.24, p = 4.574 × 10−5).
We also quantified how much the model predicted error rates E(τ, s) differed between the trial with 1 drop reward and the trial with 4 drops reward. Those differences in the drug condition were smaller and significantly less varied than in the control condition (F test, Monkey Y: 0.19 in drug condition vs 0 in control condition, p ~ 0; Monkey H: 0.28 vs 0.21, p = 0.0013; Monkey P: 0.24 vs 0.17, p ~ 0).
We then sorted the error rates predicted by the ECS model by the amount of reward (lines and circles in the three panels in Fig. 4e). The predicted error rates in the control condition became smaller as the amount of reward increased. In contrast, the predicted error rates in the drug condition changed little across reward amounts. These results suggest that the 5-HT1A antagonist WAY100635 reduces the sensitivity to the amount of reward in the schedule part.
These predictions agreed well with the actual data in the schedule part (bars in Fig. 4e). The standard deviations of the error rates were calculated across the schedule states for each amount of reward. The standard deviation varied more widely in the control condition (monkey Y: 0.02–0.12, monkey H: 0.04–0.28, monkey P: 0.03–0.20) than in the antagonist condition (monkey Y: 0.05–0.11, monkey H: 0.09–0.29, monkey P: 0.08–0.16). In the control condition, Bartlett’s test showed that the variance of the error rate differed significantly across the amounts of reward (monkey Y: χ2 = 24.1, df = 3, p ~ 0; monkey H: χ2 = 40.0, df = 3, p ~ 0; monkey P: χ2 = 23.2, df = 3, p ~ 0), whereas in the antagonist condition we could not find significant differences in two monkeys (monkey H: χ2 = 4.2, df = 3, p = 0.24; monkey P: χ2 = 3.8, df = 3, p = 0.29). Although one monkey showed a significant difference in the antagonist condition (monkey Y: χ2 = 13.7, df = 3, p = 0.003), both the value of χ2 and the range of the standard deviation of the error rates were smaller than in the control condition. These results suggest that the motivation level changed more in response to the amount of reward in the control condition. Conversely, the monkeys were less sensitive to the amount of reward under the administration of WAY100635.
Discussion
We investigated whether the administration of the 5-HT1A receptor antagonist WAY100635 affects decision-making based on the balance of reward magnitude and workload, or the subsequent reward-guided behavior. The 5-HT1A antagonist did not produce consistent changes in the choice or the error rate during the decision-making part. In the reward schedule part, however, the error rate and the reaction time in the antagonist condition were significantly larger than those in the control condition in all three monkeys. Furthermore, the 5-HT1A antagonist reduced the sensitivity to the amount of reward in the schedule part. The 5-HT1A receptors are abundant in the anterior temporal lobe, which sends visual information to value processing areas such as the orbitofrontal cortex. The anterior or medial prefrontal cortex, which is a part of the “basal ganglia–thalamocortical loop circuit”, receives information via the anterior temporal lobe. This basal ganglia–thalamocortical loop circuit is important for initiating movement in response to motivationally or emotionally salient stimuli. These results, together with the previous anatomical findings, suggest that 5-HT1A might be involved in maintaining the reward value during reward-seeking behavior rather than in directly controlling the discount factor of the reward value during choice.
The effect of 5-HT1A antagonist on choice behavior
Though there is a report that the reduction in serotonergic function does not always lead to the change in choice behavior , a number of studies suggest that the low 5-HT level leads to the increase in impulsive choice when performing temporal discounting task [1, 3, 29] or the increase in risky choice during gambling task .
We could not find a consistent change in the behavior during the decision-making part of our task. Although the reaction time increased in the antagonist condition, our analyses using a behavioral model did not show a consistent change in the discount factor of the reward value. One possible reason for the discrepancy between our findings and the previous reports, in addition to the difference in tasks, is that the previous studies manipulated the overall 5-HT level in the brain whereas we manipulated only 5-HT1A receptor binding. Our results tentatively suggest that the discount factor may be regulated by another type of 5-HT receptor, at least in decision-making based on reward magnitude and workload. A second possibility is that the decisions in the decision-making part had become habitual. To learn the task, our monkeys had been trained for 12 months or longer. Such extensive training might have turned the choice into a passive or stereotyped behavior that is resistant to a pharmacological challenge.
The effect of 5-HT1A antagonist on the reward schedule part
On the other hand, the reaction time and the error rate in the reward schedule part significantly increased. Of the error types, the late error increased among all three monkeys.
There are several possible explanations for these results. First, both the prolonged reaction time and the increased late errors in the drug condition might have been induced by side effects, such as a disturbance in the motor response or in the perception of the color change. Second, the 5-HT1A receptor at the presynaptic site works as an autoreceptor. The dorsal raphe is rich in presynaptic 5-HT1A, and antagonism of presynaptic 5-HT1A enhances 5-HT release. Serotonergic terminals connect to dopaminergic cells that express 5-HT3 receptors, and 5-HT binding to 5-HT3 attenuates dopamine release into the basal ganglia. This might perturb the activity of the dopamine neurons that are related to motor action. Third, the enhanced 5-HT release also stimulates 5-HT2A. This subtype is involved in visual processing, suggesting that the performance of color discrimination might be affected.
However, we found that the error rates could be explained by the ECS model, as in the previous study, and the model fitting results raised the possibility that WAY100635 reduced the sensitivity to the amount of reward. The error rates predicted by the ECS model in the drug condition were not necessarily higher than those in the control condition; the drug-induced change in error rates depended to some degree on the schedule state. This suggests that the decline in performance during the reward schedule arises from a reduction in motivation, in the form of reduced reward sensitivity (insensitivity to large rewards), rather than from a side effect on the motor or sensory system.
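The logic of this argument can be illustrated with a toy calculation. The sketch below is not the ECS model; it assumes a hypothetical logistic error rate in which motivation scales with the reward amount relative to the mean reward, and it contrasts a reduced reward sensitivity with a state-independent impairment. All parameter names and values are illustrative assumptions.

```python
import math


def error_rate(reward_drops, sensitivity, impairment=0.0,
               baseline=-1.5, mean_reward=2.5):
    """Toy logistic error rate (hypothetical, not the ECS model).

    Motivation is sensitivity * (reward - mean reward); higher motivation lowers
    the error rate, and `impairment` raises it equally in every state.
    """
    drive = sensitivity * (reward_drops - mean_reward)
    return 1.0 / (1.0 + math.exp(-(baseline + impairment - drive)))


rewards = [1, 2, 3, 4]  # drops of reward, as in the task
control = [error_rate(r, sensitivity=1.0) for r in rewards]
low_sensitivity = [error_rate(r, sensitivity=0.3) for r in rewards]
impaired = [error_rate(r, sensitivity=1.0, impairment=0.8) for r in rewards]

# Change from control in each reward condition.
sensitivity_change = [d - c for d, c in zip(low_sensitivity, control)]
impairment_change = [d - c for d, c in zip(impaired, control)]

for r, ds, di in zip(rewards, sensitivity_change, impairment_change):
    print(f"{r} drop(s): reduced sensitivity {ds:+.3f}, uniform impairment {di:+.3f}")
```

In this toy setting, a uniform impairment raises the error rate in every reward condition, whereas a reduced sensitivity flattens the dependence on reward, so that the change from control differs in sign across conditions. A state-dependent change of this kind is the pattern that favors the reward sensitivity account over a pure motor or sensory side effect.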
The hippocampus and entorhinal cortex, which are rich in 5-HT1A receptors, are involved in emotional processing. Furthermore, these areas send information to the medial or anterior prefrontal cortex and the orbitofrontal cortex. These regions are parts of the basal ganglia–thalamocortical loop circuit that is involved in reward-based decision-making and goal-directed behavior in response to motivationally or emotionally salient stimuli. We recently found that the orbitofrontal cortex is involved in value processing. Perturbation of signal transmission toward these areas, caused by antagonism of 5-HT1A in the anterior temporal lobe, might underlie the reduced sensitivity to the amount of reward in reward schedules.
Another line of studies using optogenetics recently showed a context-dependent role of 5-HT neurons. The activity of 5-HT neurons was needed for waiting specifically when the reward was highly probable, and it might play a role in switching behavior from suppression to facilitation under urgent risk. These studies suggest that 5-HT neurons might be involved in controlling behavior according to the certainty of, or the proximity to, motivationally salient stimuli. However, the postsynaptic receptor subtypes involved were outside the scope of these studies. In the schedule part of our task, it has been suggested that motivation grows as the schedule progresses. Based on those recent studies, the overall level of serotonin release might underlie the growing motivation related to the oncoming reward. The amount of reward, in turn, is crucial information for optimizing behavior to maximize future reward. This information might be modulated through the postsynaptic 5-HT1A subtype.
The histochemical findings on the synaptic localization of 5-HT1A complicate the understanding of 5-HT1A function. 5-HT1A is present at both presynaptic and postsynaptic sites. Presynaptic 5-HT1A works as an autoreceptor and is involved in regulating 5-HT release into the synaptic cleft. Postsynaptic 5-HT1A contributes to signal transmission to the postsynaptic cell. Because concurrent blockade of pre- and postsynaptic 5-HT1A produces competing effects, the dynamics of the WAY100635 effect depend on the balance between its preference for presynaptic and postsynaptic 5-HT1A. Furthermore, the results of the present study may have been affected by perturbation of systemic functions mediated by peripheral 5-HT1A.
There is also another possibility. Another 5-HT1A-modulating compound, S-15535, preferentially activates the presynaptic 5-HT1A autoreceptor and works as an antagonist at postsynaptic 5-HT1A. Rodents injected with S-15535 showed vulnerability to social or physical stress, and their offensive aggressive behavior decreased markedly. Similarly, the behavioral change during the schedule part under WAY100635 administration seemed to be related to a decrease in the incentive to act in response to emotionally salient stimuli.
On the other hand, WAY100635 has been shown to be a potent dopamine D4 receptor agonist and an α1-adrenoceptor antagonist. Relative to its affinity for 5-HT1A, its affinities for D4 and α1 receptors are 13% and 11%, respectively. The possibility of behavioral effects via D4 agonism and α1 antagonism was not ruled out in the present study. The selective dopamine D4 agonist A-412997 was found to improve short-term memory and attention; however, its effect on color discrimination remains unknown. WAY100635 has been shown to exert its hypotensive effect via binding to the α1 receptor. It has not yet been shown whether and how this systemic effect interferes with color discrimination. Combined use of a pure D4 antagonist and a pure α1 agonist, use of S-15535, or sequential administration of a 5-HT1A antagonist and agonist might make it possible to rule out those side effects of WAY100635.
We found that the 5-HT1A antagonist WAY100635 changed behavioral performance during reward schedules but not during decision-making. We did not find consistent changes in the decision-making part. However, the sensitivity of the error rate to the amount of reward in the reward schedule part was diminished under WAY100635 administration. The 5-HT1A receptor is densely distributed in the hippocampus, entorhinal cortex, temporal pole, and medial prefrontal cortex. These brain structures are thought to be involved in motivation and emotion, and one of the basal ganglia–thalamocortical loop circuits that receives information from these areas is related to reward-seeking behavior. We speculate that the 5-HT1A receptor in these areas could have a role in modulating effortful behavior depending on the amount of reward, rather than in controlling the discounting of reward value.
Bizot J, Le Bihan C, Puech AJ, Hamon M, Thiébot M (1999) Serotonin and tolerance to delay of reward in rats. Psychopharmacology 146:400–412
Wogar MA, Bradshaw CM, Szabadi E (1993) Effect of lesions of the ascending 5-hydroxytryptaminergic pathways on choice between delayed reinforcers. Psychopharmacology 111:239–243
Schweighofer N, Tanaka SC, Doya K (2007) Serotonin and the evaluation of future rewards: theory, experiments, and possible neural mechanisms. Ann N Y Acad Sci 1104:289–300
Schweighofer N, Bertin M, Shishida K, Okamoto Y, Tanaka SC, Yamawaki S, Doya K (2008) Low-serotonin levels increase delayed reward discounting in humans. J Neurosci 28:4528–4532
Yates JR, Perry JL, Meyer AC, Gipson CD, Charnigo R, Bardo MT (2014) Role of medial prefrontal and orbitofrontal monoamine transporters and receptors in performance in an adjusting delay discounting procedure. Brain Res 1574:26–36
Meyniel F, Goodwin GM, Deakin JW, Klinge C, MacFadyen C, Milligan H, Mullings E, Pessiglione M, Gaillard R (2016) A specific role for serotonin in overcoming effort cost. Elife 5:e17282
Stevens JR, Rosati AG, Ross KR, Hauser MD (2005) Will travel for food: spatial discounting in two new world monkeys. Curr Biol 15:1855–1860
Shadmehr R, Huang HJ, Ahmed AA (2016) A representation of effort in decision-making and motor control. Curr Biol 26:1929–1934
Setogawa T, Mizuhiki T, Matsumoto N, Akizawa F, Shidara M (2014) Self-choice enhances value in reward-seeking in primates. Neurosci Res 80:45–54
Setogawa T, Mizuhiki T, Matsumoto N, Akizawa F, Kuboki R, Richmond BJ, Shidara M (2019) Neurons in the monkey orbitofrontal cortex mediate reward value computation and decision-making. Commun Biol 2:126
Miyazaki K, Miyazaki KW, Doya K (2011) Activation of dorsal raphe serotonin neurons underlies waiting for delayed rewards. J Neurosci 31:469–479
Miyazaki KW, Miyazaki K, Doya K (2012) Activation of dorsal raphe serotonin neurons is necessary for waiting for delayed rewards. J Neurosci 32:10451–10457
Miyazaki KW, Miyazaki K, Tanaka KF, Yamanaka A, Takahashi A, Tabuchi S, Doya K (2014) Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Curr Biol 24:2033–2040
Evenden JL, Ryan CN (1999) The pharmacology of impulsive behaviour in rats VI: the effects of ethanol and selective serotonergic drugs on response choice with varying delays of reinforcement. Psychopharmacology 146:413–421
Winstanley CA, Theobald DE, Dalley JW, Robbins TW (2005) Interactions between serotonin and dopamine in the control of impulsive choice in rats: therapeutic implications for impulse control disorders. Neuropsychopharmacology 30:669–682
Adriani W, Zoratto F, Romano E, Laviola G (2010) Cognitive impulsivity in animal models: role of response time and reinforcing rate in delay intolerance with two-choice operant tasks. Neuropharmacology 58:694–701
Mori M, Tsutsui-Kimura I, Mimura M, Tanaka KF (2018) 5-HT3 antagonists decrease discounting rate without affecting sensitivity to reward magnitude in the delay discounting task in mice. Psychopharmacology 235:2619–2629
Miyazaki KW, Miyazaki K, Doya K (2012) Activation of dorsal raphe serotonin neurons is necessary for waiting for delayed rewards. J Neurosci 32:10451–10457
Paterson NE, Wetzler C, Hackett A, Hanania T (2012) Impulsive action and impulsive choice are mediated by distinct neuropharmacological substrates in rat. Int J Neuropsychopharmacol 15:1473–1487
Beliveau V, Ganz M, Feng L, Ozenne B, Højgaard L, Fisher PM, Svarer C, Greve DN, Knudsen GM (2017) A high-resolution in vivo atlas of the human brain’s serotonin system. J Neurosci 37:120–128
Hays AV Jr, Richmond BJ, Optican LM (1982) Unix-based multiple-process system, for real-time data acquisition and control. WESCON Conf Proc 2:1–10
Lammertsma AA, Bench CJ, Hume SP, Osman S, Gunn K, Brooks DJ, Frackowiak RS (1996) Comparison of methods for analysis of clinical [11C] raclopride studies. J Cereb Blood Flow Metab 16:42–52
Carson RE, Lang L, Watabe H, Der MG, Adams HR, Jagoda E, Herscovitch P, Eckelman WC (2000) PET evaluation of [(18)F]FCWAY, an analog of the 5-HT(1A) receptor antagonist, WAY-100635. Nucl Med Biol 27:493–497
Tsukada H, Kakiuchi T, Nishiyama S, Ohba H, Harada N (2001) Effects of aging on 5-HT1A receptors and their functional response to 5-HT1A agonist in the living brain: PET study with [carbonyl-11C] WAY-100635 in conscious monkeys. Synapse 15:242–251
Shidara M, Richmond BJ (2002) Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296:1709–1711
Alexander GE, DeLong MR, Strick PL (1986) Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci 9:357–381
Heimer L, Switzer RD, Van Hoesen GW (1982) Ventral striatum and ventral pallidum. Components of the motor system? Trends Neurosci 5:83–87
Winstanley CA, Dalley JW, Theobald DEH, Robbins TW (2003) Global 5-HT depletion attenuates the ability of amphetamine to decrease impulsive choice on a delay-discounting task in rats. Psychopharmacology 170:320–331
Crockett MJ, Clark L, Lieberman MD, Tabibnia G, Robbins TW (2010) Impulsive choice and altruistic punishment are correlated and increase in tandem with serotonin depletion. Emotion 10:855–862
Long AB, Kuhn CM, Platt ML (2009) Serotonin shapes risky decision making in monkeys. Soc Cogn Affect Neurosci 4:346–356
Forster EA, Cliffe IA, Bill DJ, Dover GM, Jones D, Reilly Y, Fletcher A (1995) A pharmacological profile of the selective silent 5-HT1A receptor antagonist, WAY-100635. Eur J Pharmacol 281:81–88
Wang HL, Zhang S, Qi J, Wang H, Cachope R, Mejias-Aponte CA, Gomez JA, Mateo-Semidey GE, Beaudoin GMJ, Paladini CA, Cheer JF, Morales M (2019) Dorsal raphe dual serotonin-glutamate neurons drive reward by establishing excitatory synapses on VTA mesoaccumbens dopamine neurons. Cell Rep 26:1128–1142.e7
Kometer M, Schmidt A, Jäncke L, Vollenweider FX (2013) Activation of serotonin 2A receptors underlies the psilocybin-induced effects on α oscillations, N170 visual-evoked potentials, and visual hallucinations. J Neurosci 33:10544–10551
Miyazaki K, Miyazaki KW, Yamanaka A, Tokuda T, Tanaka KF, Doya K (2018) Reward probability and timing uncertainty alter the effect of dorsal raphe serotonin neurons on patience. Nat Commun 9:2048
Seo C, Guru A, Jin M, Ito B, Sleezer BJ, Ho YY, Wang E, Boada C, Krupa NA, Kullakanda DS, Shen CX, Warden MR (2019) Intense threat switches dorsal raphe serotonin neurons to a paradoxical operational mode. Science 363:538–542
Billard T, Le Bars D, Zimmer L (2014) PET radiotracers for molecular imaging of serotonin 5-HT1A receptors. Curr Med Chem 21:70–81 (review)
de Boer SF, Koolhaas JM (2005) 5-HT1A and 5-HT1B receptor agonists and aggression: a pharmacological challenge of the serotonin deficiency hypothesis. Eur J Pharmacol 526:125–139 (review)
Chemel BR, Roth BL, Armbruster B, Watts VJ, Nichols DE (2006) WAY-100635 is a potent dopamine D4 receptor agonist. Psychopharmacology 188:244–251
Villalobos-Molina R, López-Guerrero JJ, Gallardo-Ortíz IA, Ibarra M (2002) Evidence that the hypotensive effect of WAY 100635, a 5-HT1A receptor antagonist, is related to vascular alpha 1-adrenoceptor blockade in the adult rat. Auton Autacoid Pharmacol 22:171–176
Browman KE, Curzon P, Pan JB, Molesky AL, Komater VA, Decker MW, Brioni JD, Moreland RB, Fox GB (2005) A-412997, a selective dopamine D4 agonist, improves cognitive performance in rats. Pharmacol Biochem Behav 82:148–155
This work was supported by Grant-in-Aid for JSPS Fellows (Grant Number JP15J00709) (FA); Grant-in-Aid for Scientific Research on Priority Areas-System study on higher order brain functions from MEXT of Japan (Grant Number JP17022052) (MS) and JSPS KAKENHI Grant Numbers JP22300138, JP25282246, JP16H03301 (MS); JSPS KAKENHI Grant Number JP26119504 (TM).
Conflict of interest
The authors declare no competing financial interests.
All experimental procedures were approved by the Animal Care and Use Committee of the University of Tsukuba and were in accordance with the Guide for the Care and Use of Laboratory Animals in the University of Tsukuba.
Cite this article
Akizawa, F., Mizuhiki, T., Setogawa, T. et al. The effect of 5-HT1A receptor antagonist on reward-based decision-making. J Physiol Sci 69, 1057–1069 (2019). https://doi.org/10.1007/s12576-019-00725-1
- Value discounting
- Rhesus monkey