Ventral Pallidum for Sustain and Timeout

Previous essays have used Pv (ventral pallidum) as part of the seek and avoidance circuit without exploring it in detail. For this essay, I’m revisiting Pv for two purposes: first, to check that the simulation’s seek and avoid model is compatible with scientific results about Pv, and second, to understand more about how those internal circuits work.

Timeouts are critical for the food-odor seek circuit to prevent the animal from getting stuck in a trap where it either can’t reach the food, or the food odor has no food. A timeout could simply disable seek and return to the default roaming random walk, or it could actively avoid the current area. When the seek times out, an active avoidance phase is more effective than returning to roaming, because the avoidance moves away from the current false cues and into a distant area more likely to have a new food source.

Diagram illustrating the seek and avoid circuit related to food detection, showing phases of roaming, detecting odor, seeking food, timing out, and avoiding false cues.
State machine for seeking food. When the animal detects an odor, it follows the odor gradient until the animal either finds food or an internal timeout shifts the seek to avoid.
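The state machine above can be sketched in a few lines of Python. This is only an illustration, not the simulation's code: the names `Mode`, `step`, `run`, `seek_limit`, and `avoid_limit` are hypothetical, the timeout is a plain tick counter, and returning to roaming after finding food is an assumption.

```python
from enum import Enum


class Mode(Enum):
    ROAM = "roam"
    SEEK = "seek"
    AVOID = "avoid"


def step(mode, ticks, odor, food, seek_limit=50, avoid_limit=20):
    """Advance the seek state machine by one tick.

    Returns the next (mode, ticks_in_mode) pair.
    """
    if mode is Mode.ROAM:
        # Odor detection starts a seek; otherwise keep roaming.
        return (Mode.SEEK, 0) if odor else (Mode.ROAM, ticks + 1)
    if mode is Mode.SEEK:
        if food:
            return (Mode.ROAM, 0)   # assumption: success returns to roaming
        if ticks + 1 >= seek_limit:
            return (Mode.AVOID, 0)  # internal timeout shifts seek to avoid
        return (Mode.SEEK, ticks + 1)
    # AVOID: ignore the odor and move away for a fixed time, then roam.
    if ticks + 1 >= avoid_limit:
        return (Mode.ROAM, 0)
    return (Mode.AVOID, ticks + 1)


def run(odor_trace, food_trace, **limits):
    """Run the state machine over paired odor and food observations."""
    mode, ticks = Mode.ROAM, 0
    history = []
    for odor, food in zip(odor_trace, food_trace):
        mode, ticks = step(mode, ticks, odor, food, **limits)
        history.append(mode)
    return history
```

With a constant odor and no food, `run([True] * 7, [False] * 7, seek_limit=3, avoid_limit=2)` gives three ticks of seek, two of avoid, one of roam, and then a new seek, which is the trap-escape cycle the timeout is meant to provide.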

The simulation uses the basal ganglia as a timeout system, specifically Sv (ventral striatum) with Pv that’s interconnected with food-seek motivation based in H.l (lateral hypothalamus). The model uses Ado (adenosine) as a timeout neurotransmitter and S.d2 (striatum projection neuron with D2.i receptor) to signal the timeout. Essay 31 covered the adenosine-S.d2 system in more detail. Essentially, neural activity produces Ado from neurons and neighboring astrocytes. The Ado then activates A2a.s (adenosine G-s coupled receptors) on S.d2, which potentiates S.d2 and increases internal activity in a PKA (protein kinase A) activation chain. As Ado builds up over time, S.d2 activity increases until it triggers a switch from seek to avoid in Pv.
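One way to picture the Ado timeout is as a leaky accumulator: activity adds Ado each tick, clearance removes a fraction, and the timeout fires when the level crosses a threshold. The sketch below collapses the A2a.s and PKA chain into a single variable, and every name and constant in it (`ado_gain`, `ado_decay`, `threshold`) is a hypothetical choice for illustration, not a value from the simulation or from the literature.

```python
def ticks_until_timeout(activity=1.0, ado_gain=0.05, ado_decay=0.01,
                        threshold=2.0, max_ticks=10_000):
    """Return the tick at which accumulated Ado crosses the threshold.

    Returns None if the threshold is never reached within max_ticks.
    """
    ado = 0.0
    for tick in range(1, max_ticks + 1):
        # Activity produces Ado; a fraction is cleared every tick.
        ado += ado_gain * activity - ado_decay * ado
        if ado >= threshold:
            return tick  # S.d2 is potentiated enough to signal the timeout
    return None
```

In this toy version, stronger seek activity times out sooner, and activity weak enough that Ado saturates below the threshold never times out at all. Whether the real circuit behaves that way is an open question; the sketch only shows what the accumulator framing implies.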

A flowchart illustrating the seek and timeout process in a neural simulation, showing the interactions between 'Ob', 'H.I seek', and 'R1.a' with a 'S.ot/Pv timeout' indicator.
The current simulation model uses Sv/Pv to time out seek motivation. H.l (lateral hypothalamus), Ob (olfactory bulb), Pv (ventral pallidum), R1.a (anterior hindbrain motor area), S.ot (olfactory tubercle portion of Sv).

The above diagram shows how the current simulation model uses Sv/Pv as a timeout. H.l (lateral hypothalamus) is responsible for seek motivation based on odor input from Ob (olfactory bulb) and it drives roaming search to R1.a (anterior hindbrain motor region). The basal ganglia, represented by S.ot (olfactory tubercle, an olfactory region of Sv) and Pv serve as the timeout function. This essay aims to expand that simple model into a more accurate representation of the Sv/Pv timeout.

Seek and avoid

In neuroscience, seek and avoid are measured with RTPP (real-time place preference) and RTPA (real-time place avoidance) experiments, although these measurements are often interpreted as “valence” instead of actions. Circuits that produce RTPP could contribute to the seek action, and circuits that produce RTPA could produce avoidance. For example, Hb.lm (lateral habenula, medial part) produces RTPA when stimulated and RTPP when inhibited [Stamatakis et al 2016], and Sv, Pv, and H.l produce either RTPP or RTPA, depending on which neurons are stimulated. In Sv, S.d1 (striatum projection neuron with D1.s dopamine receptor) produces RTPP [Soares-Cunha et al 2020], [Tan et al 2024], while S.d2 produces RTPA [Bonnavion et al 2024], but only when stimulated for longer times [Soares-Cunha et al 2020]. Different regions of Sv have flipped seek and avoidance, between S.msh.d (medial shell of Sv, dorsal) and S.msh.v (medial shell of Sv, ventral) [Yao Y et al 2021]. In Pv, glutamate neurons produce RTPA and GABA neurons produce RTPP [Stephenson-Jones et al 2020], which matches H.l, where glutamate produces RTPA [Stamatakis et al 2016] and GABA produces RTPP [Jennings et al 2015], [Siemian et al 2021].

Diagram illustrating the neural circuits involved in the seek and avoid behavior in the brain, showing connections between various components like S.ot, Pv, H.l, Ob, and R1.a.
Simplified seek and avoid timeout circuit. The seek circuit uses H.l as the subthalamic motor region to the R1.a anterior hindbrain motor region. The avoid circuit uses Hb.lm to V.rn raphe also to R1.a. The Sv and Pv basal ganglia switch between the circuits. H.l (lateral hypothalamus), Hb.lm (lateral habenula, medial part), Ob (olfactory bulb), Pv (ventral pallidum), R1.a (anterior hindbrain motor region), S.ot (olfactory tubercle), V.rn (raphe nuclei).

The above diagram shows a simplified timeout and avoid circuit. The blue arrows show the proposed timeout avoid path. The greyed arrows show related connectivity, which is either contextual or for other actions. For example, the H.l glutamate to Hb.lm avoidance is necessary for predator and toxin avoidance such as a looming response from OT (optic tectum) [Lecca et al 2017] or pain responses from R.pb.l (lateral parabrachium) [Phua et al 2021]. Although H.l produces RTPA and also uses Hb.lm as an avoidance action path, it seems less likely to be a seek-timeout path. Because the Sv, Pv, and H.l circuit is also an eating circuit, some of the locomotion control is stopping to eat. Some of the Sv and Pv projections to H.l are eating circuits [Root et al 2015], and eating also inhibits Hb.lm avoidance [Hu H et al 2020] because the animal shouldn’t move away from its food.

Hb.lm is a key action node for avoidance, using V.rn (raphe nuclei) to drive avoidance. In zebrafish, this path is exclusively V.mr (median raphe) because the zebrafish Hb.lm only connects to V.mr [Agetsuma et al 2010]. In mammals, the target of Hb.lm is less clear cut because both V.mr and V.dr (dorsal raphe) receive Hb.lm output [Baker et al 2022] and could participate in avoidance.

Pv as a heterogeneous area

In this model, Pv is a key decision node. It receives seek-driving input from H.l and A.bl (basolateral amygdala) [Giardino et al 2018], [Heinsbroek et al 2020] and decision and timeout information from S.ot. Pv is defined by the projection of Sv, specifically using tac1 (tachykinin 1 for substance-P neurotransmitter), which S.d1 neurons exhibit. However, the neuron types and origins are heterogeneous [Ottenheimer et al 2024], and derive from neighboring regions. In part, Pv derives from Po.l (lateral preoptic area) and H.l neuron types, in part it derives from P.bst (bed nucleus of the stria terminalis, extended amygdala), in part it derives from Pd (globus pallidus external) [Ottenheimer et al 2024], and it has some functionality more similar to P.bf (basal forebrain), including ACh (acetylcholine) attention projections.

A diagram illustrating the connections and circuits involved in attention, avoidance, and decision-making within the brain, specifically highlighting the ventral pallidum (Pv), lateral hypothalamus (H.l), and other neural components.
Multiple circuits in Pv, including attention, avoidance, wake, seek, eat, selection, and feedback to Sv. A.bl (basolateral amygdala), H.l (lateral hypothalamus), H.stn (subthalamic nucleus), Hb.lm (lateral habenula, medial), P.epn (entopeduncular nucleus), Pv (ventral pallidum), Pv.a (anterior Pv), Pv.p (posterior Pv), Pv.dl (dorsolateral Pv), Pv.vm (ventromedial Pv), S.d1 (striatum projection neuron with D1.s receptor), S.d2 (striatum projection neuron with D2.i receptor), S.pv (striatum parvalbumin inhibitory neuron), Snr (substantia nigra pars reticulata).

The above diagram shows some of the difficulty of categorizing Pv functions by its output projections. Pv ACh (acetylcholine) projections, particularly to A.bl, sustain attention, such as enabling odor seek [Kim R et al 2024], which is a P.bf function. Separate Pv glutamate and GABA projections to Hb.lm produce RTPA and RTPP, respectively [Stephenson-Jones et al 2020], which matches the H.l and Po.l function. Projections to H.l are more complex, producing wake [Luo YJ et al 2023] and eating [Palmer et al 2024]. Pv has choice-related output to Vta (ventral tegmentum) [Faget et al 2018], [Palmer et al 2024], which drives seek but is not motivational. Pv also has connections to the basal ganglia, similar to the S.d (dorsal striatum) and Pd (dorsal pallidum aka globus pallidus external) connections to H.stn (subthalamic nucleus), Snr (substantia nigra pars reticulata) and P.epn (entopeduncular nucleus aka globus pallidus internal) [Root et al 2015]. However, those Pd-like circuits are restricted to a particular part of Pv, namely Pv.dl (dorsolateral Pv). Finally, like Pd, Pv has “arkypallidal” feedback connections to Sv [Vachez et al 2021].

Decision: selection and commitment

Decision can be decomposed into a selection function and a commitment function. Selection chooses between competing options, such as left or right. Commitment ensures that the selection follows through and is not immediately distracted. Commitment is more important because without commitment, a selection isn’t a decision, while a random selection or a first-arriving selection is a workable decision. In a WTA (winner-take-all) process, the key part is the “take-all” part. Random take-all would also work. The commitment function needs a lockout function (“take-all”) but also a timeout function, each of which may be a separate circuit.
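The split between selection, lockout, and timeout can be made concrete with a small sketch. It deliberately uses the simplest selection named above, first-arriving, to show that the commitment part does most of the work. The class name `Commitment` and its `timeout` parameter are hypothetical and are not part of the simulation.

```python
class Commitment:
    """First-arriving selection with a lockout and a timeout."""

    def __init__(self, timeout):
        self.timeout = timeout  # ticks a choice is held before release
        self.choice = None
        self.age = 0

    def update(self, candidates):
        """Take this tick's candidate options and return the held choice."""
        if self.choice is not None:
            # Lockout: while committed, new candidates are ignored.
            self.age += 1
            if self.age >= self.timeout:
                # Timeout: release the commitment so a new selection can win.
                self.choice, self.age = None, 0
            return self.choice
        if candidates:
            # Selection: first-arriving option wins (random would also work).
            self.choice = candidates[0]
            self.age = 0
        return self.choice
```

With `timeout=3`, a unit that has committed to "left" keeps returning "left" for two more ticks even when only "right" is offered, returns `None` on the release tick, and only then accepts "right".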

A flow diagram illustrating the relationship between Sv (ventral striatum) and Pv (ventral pallidum) in a neural circuit, highlighting components like Vta select, Sv lockout, and Hb.lm timeout.
Possible circuit decomposition of decision between selection, lockout, and timeout. Hb.lm (lateral habenula, medial), Pv (ventral pallidum), Sv (ventral striatum), Vta (ventral tegmentum).

The above diagram shows a possible functional decomposition for Pv and decision-making. The Pv to Vta projection is important for the selection process [Palmer et al 2024]. More speculatively, the Pv feedback connection to Sv could provide a lockout function by inhibiting new selections through Sv. A similar circuit may exist in H.stn, which also projects directly to S.d [Williams 2024]. The Pv to Hb.lm projection is more clearly established as an avoidance pathway [Faget et al 2018].

One neuron, two functions

Although selection isn’t the focus of the essay, some learning theory results and some neuroscience measurements show that single S.d2 neurons are possibly serving opposite roles: selecting an action, but then opposing that same action [Hodge and Yttri 2025], [Soares-Cunha et al 2020], or terminating the current activity [Tecuapetla et al 2016]. In the classical model of basal ganglia selection, S.d1 and S.d2 are oppositional: S.d1 promotes an action and S.d2 either opposes the action or promotes an opposite action [Bariselli et al 2019]. In the learning model where DA (dopamine) serves as a teaching signal, DA enhances selected actions when successful and suppresses unsuccessful actions. However, some scientists argue that this learning model doesn’t work for S.d2 if S.d1 and S.d2 perform selection with no other function [Lindsey et al 2025]. Some proposals to rescue the learning models include sustaining S.d2 activity after selection [Lindsey et al 2025].

Some prominent results show both S.d1 and S.d2 selecting the winning option [Cui G et al 2013], not opposing each other. However, studies consistently show that stimulating S.d1 produces contralateral turns but stimulating S.d2 produces ipsilateral turns [Conde-Berriozabal et al 2025], which is clearly oppositional. Possibly resolving this conflict, stimulating S.d2 shows a short 1s period of inhibiting Pv and exciting Vta, while longer 2s stimulation excites Pv and inhibits Vta [Soares-Cunha et al 2020]. Another study shows that a short 350ms S.d2 stimulus does not produce RTPA, but a 2s long S.d2 stimulus does produce RTPA [Hodge and Yttri 2025].

S.d2 neurons produce both GABA and the opioid enkephalin as neurotransmitters [Dai KZ et al 2022]. GABA is a fast neurotransmitter on the order of 3-5ms and only requires electrical APs (action potentials). Enkephalin is a much slower neuropeptide and is released when internal Ca2+ (calcium) and PKA (protein kinase A) levels have risen [Konradi et al 2003], [Hook et al 2008]. PKA levels rise in response to G-s protein coupled receptors like A2a.s (adenosine G-s coupled receptor). Enkephalin requires both action potentials and PKA, likely triggered by A2a.s. This A2a.s PKA signaling needs to overcome D2.i, which inhibits the PKA pathway. Technically, D2.i inhibits AC (adenylyl cyclase), which prevents cAMP accumulation, which prevents PKA activation. One result of this longer chain is that enkephalin signaling is much slower than GABA and is modulated by other neurotransmitters like DA and Ado.

This dual transmitter system means that a short stimulus might release GABA, while a longer stimulus would release enkephalin. In addition, S.d2 axons contain DOR.i (δ-opioid inhibitory receptor), which can self-inhibit its own GABA release [Steiner and Gerfen 1998]. The longer enkephalin path may disable the faster GABA path. Prolonged S.d2 stimulation produces RTPA and requires active DOR.i in Pv [Soares-Cunha et al 2020]. A similar oppositional fast vs slow transmitter system exists in the H.l to Vta connection, where GABA provides fast inhibition but a slower neurotensin neurotransmitter excites [Patterson et al 2015].

Diagram illustrating the functional decomposition of the ventral pallidum (Pv) and its role in timeout and decision-making circuits, involving interactions with the lateral habenula (Hb.lm) and ventral tegmental area (Vta).
Hypothetical fast and slow multiplexing circuit. The fast path uses GABA through Pv.g to activate DA in Vta for a selection. The slow path uses enkephalin to disinhibit an avoidance action path using Pv glutamate and Hb.lm. DA (dopamine), glu (glutamate), Hb.lm (lateral habenula, medial), Pv (ventral pallidum), Pv.g (Pv GABA neuron), S.d2 (striatum projection neuron with D2.i receptor), Vta (ventral tegmentum).

The above diagram shows a hypothetical fast and slow multiplexing circuit with GABA driving the fast selection path and enkephalin driving the slow avoidance path. The fast S.d2 GABA path disinhibits Vta by inhibiting a tonically active Pv GABA interneuron. The slow S.d2 enkephalin path inhibits a distinct tonically active Pv GABA interneuron, which disinhibits the Pv glutamate to Hb.lm avoidance path, and re-inhibits Vta DA. Re-inhibition of Vta DA serves as a lockout of subsequent decisions. Disinhibition of the Hb.lm avoidance enables timeout avoidance. With this temporal multiplexing system, a single S.d2 neuron can serve all three decision functions: selection, lockout, and timeout.
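A toy version of this temporal multiplexing, assuming that PKA integrates S.d2 activity and that enkephalin release both starts the avoid phase and shuts off GABA through DOR.i, might look like the following. The function name `sd2_phases`, the constants, and the abstract tick unit are all hypothetical; the sketch makes no attempt to match the 350ms, 1s, or 2s timings reported in the studies.

```python
def sd2_phases(stim, pka_gain=0.1, pka_decay=0.02, enk_threshold=0.5):
    """Label each tick of an S.d2 stimulus as 'select', 'avoid', or 'rest'.

    stim is a sequence of booleans, one per tick, marking S.d2 firing.
    """
    pka = 0.0
    phases = []
    for active in stim:
        # PKA builds slowly while S.d2 fires and decays otherwise.
        pka += (pka_gain if active else 0.0) - pka_decay * pka
        # Enkephalin needs both firing and enough accumulated PKA.
        enk = active and pka >= enk_threshold
        # Assumed DOR.i self-inhibition: enkephalin suppresses GABA release.
        gaba = active and not enk
        if enk:
            phases.append("avoid")   # slow path: disinhibit Pv glutamate to Hb.lm
        elif gaba:
            phases.append("select")  # fast path: disinhibit Vta DA
        else:
            phases.append("rest")
    return phases
```

With these constants, a three-tick stimulus stays entirely in the select phase, while a ten-tick stimulus starts in select and switches to avoid partway through, which is the qualitative short-versus-long pattern the circuit is meant to capture.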

Pv glutamate inputs vs tonic activity

The most prominent Pv inputs from Sv are inhibitory, which raises the question: what are they inhibiting? Either they are inhibiting an excitatory input or they’re inhibiting tonically active neurons. So, the glutamate inputs have an outsized importance because without glutamate or tonic activity, the inhibition has nothing to work against.

In studying the Pv projection to Hb.lm, [Stephenson-Jones et al 2020] inhibited glutamate and GABA neurons to explore the tonic behavior. Inhibiting glutamate did not produce an effect, either RTPP or RTPA, and inhibiting GABA also did not produce an effect. This result suggests that the Pv output neurons are not tonically active, either from their own activity or other internal Pv activity. Without tonic activity, glutamate inputs are necessary to drive output.

The major glutamate inputs are from A.bl, H.l, and H.stn, but the H.stn input is specific to the Pd-like area in Pv.dl [Root et al 2015], so for the purpose of this essay I’m assuming H.stn is restricted to a specific Pv subarea with dorsal basal ganglia function and does not apply to the rest of Pv.

A diagram illustrating the neural connections involving the lateral hypothalamus (H.l), striatal projection neurons (S.d1 and S.d2), and the ventral pallidum (Pv), highlighting their roles in the seek and avoid circuits.
H.l glutamate as powering the Pv. Without H.l input the system is unpowered and has no output. Enk (enkephalin), Glu (glutamate), H.l.ox (lateral hypothalamus orexin), Hb.lm (lateral habenula, medial), Pv.g (ventral pallidum GABA output), Pv.glu (Pv glutamate), S.d1 (striatum projection neuron with D1.s dopamine receptor), S.d2 (striatum projection neuron with D2.i dopamine receptor).

The above diagram shows a hypothetical circuit using H.l.ox (orexin neurons of H.l) as a food search signal that drives both roaming random walk and directed, targeted seek. When the animal is not seeking food because it’s sated or eating, H.l.ox is silent, which unpowers the circuit. My choice of H.l.ox as a glutamate source is hypothetical. H.l has at least 17 glutamate populations [Wang Y et al 2021], including one that implements SLR (subthalamic locomotor region) [Ji C et al 2024], some that project to Hb.lm directly for aversion [Lecca et al 2017], as well as eating-related neurons, and H.l.ox.

I’ve used the enkephalin output from S.d2 because the Hb.lm is the avoidance circuit. The enkephalin receptor DOR.i (δ-opioid receptor) is coupled to an inhibitory G-protein and acts primarily presynaptically but does act postsynaptically in Pv [Neuhofer and Kalivas 2023], [Rysztak and Jutkiewicz 2022]. In Pv, stimulating DOR.i inhibits 24% of Pv neurons and excites 13% [Root et al 2015]. In an alternative circuit, the S.d2 enkephalin-triggered DOR.i receptor is presynaptic on the glutamate input to Pv.g. Without that glutamate input, the Pv.g neuron falls silent, which disinhibits the Pv.glu path.
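The presynaptic alternative can be written as a tiny rate model, which also shows what "unpowered" means in practice. This is a sketch under strong assumptions: a single shared H.l glutamate drive, a rectified linear response, Pv.g directly inhibiting Pv.glu, and enkephalin expressed as a 0 to 1 suppression factor on the glutamate input to Pv.g. The function `pv_outputs` and its parameters are hypothetical names.

```python
def pv_outputs(hl_glu, enk):
    """Return (pv_g, pv_glu) firing levels for a given drive and enkephalin.

    hl_glu: glutamate drive from H.l (0 means the circuit is unpowered).
    enk: 0..1 presynaptic DOR.i suppression of the glutamate input to Pv.g.
    """
    # Pv.g (seek, RTPP side) only fires on the glutamate that DOR.i lets through.
    pv_g = max(0.0, hl_glu * (1.0 - enk))
    # Pv.glu (avoid, to Hb.lm) gets the same drive but is inhibited by Pv.g.
    pv_glu = max(0.0, hl_glu - pv_g)
    return pv_g, pv_glu
```

With drive and no enkephalin, `pv_outputs(1.0, 0.0)` gives `(1.0, 0.0)`, the seek state. With full enkephalin, `pv_outputs(1.0, 1.0)` gives `(0.0, 1.0)`, the avoid state. With no drive, both outputs are zero whatever the enkephalin level, so the inhibition has nothing to work against.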

A diagram showing neural pathways related to ventral pallidum circuits. The diagram is divided into four sections with representations of different neuron types and their interactions, including S.d1 neurons interacting with GABA and enkephalin, as well as their connections to the lateral habenula.
Several possible hypothetical slow RTPP and RTPA circuits, focusing on S.d1 opposition to RTPA. S.d1 could directly oppose Pv.glu avoidance with GABA, it could enhance inhibitory interneurons with substance P, or it could inhibit Pv.glu RTPA with dynorphin. Dyn (dynorphin opioid), enk (enkephalin opioid), glu (glutamate), H.l.ox (lateral hypothalamus orexin), Hb.lm (lateral habenula, medial), Pv (ventral pallidum), Pv.g (Pv GABA), Pv.glu (Pv glutamate), S.d1 (striatum projection neuron with D1.s receptor), S.d2 (striatum projection neuron with D2.i receptor), SP (substance-P neurotransmitter), tac1 (tachykinin 1 gene for SP).

Unfortunately, the exact details of the circuits aren’t known yet. It seems reasonable to assume that the S.d1 RTPP path opposes the S.d2 RTPA path using peptides or opioids instead of GABA, but S.d1 produces two additional outputs: the opioid dynorphin with its inhibitory KOR.i (κ-opioid receptor) and the peptide SP (substance P) with its excitatory NK1.q (neurokinin 1 with PLC/PKC path). Like enkephalin’s DOR.i receptor, dynorphin’s KOR.i is primarily presynaptic. The above diagram shows three hypothetical circuits, but other more complicated possible circuits exist, including using more tonically active inhibitory GABA interneurons. In particular, S.d1 and S.d2 have auto-receptors for dynorphin and enkephalin respectively, which inhibit their own release of the opioids. Dynorphin is known to self-inhibit S.d1 neurons in Pv [Steiner and Gerfen 1998], which may be its main function. Although I’ve focused on S.d1 and S.d2 neurotransmitters for the slow circuit, another possibility is that a distinct internal Pv mechanism drives the slow avoidance circuit, independent of S.d2 enkephalin and S.d1 dynorphin or SP.

A.bl glutamate

I used H.l.ox as the source of glutamate above, but A.bl is also an important source of glutamate, and inhibiting A.bl can turn odor seek into avoid [Kim R et al 2024], which is exactly the situation here. A.bl is a cortical area, which means it’s more complicated, but has the advantage of supporting sustained, working-memory output. A.bl receives olfactory input from Ob and O.pir (piriform cortex) and outputs glutamate to Pv and to Sv. A.bl has both seek and avoid outputs with distinct projections [Sniffen et al 2024]. A.bl is necessary when seek conflicts with threat, but disabling A.bl does not prevent seek [Hernández-Jaramillo et al 2024]. In addition, A.bl receives ACh input from Pv [Root et al 2015]. For this circuit, I’m using the A.bl seek output to serve the same function as H.l did in the previous description. Without A.bl seek input, the seek collapses and turns to avoidance [Kim R et al 2024].

Diagram illustrating the role of the A.bl region in glutamate signaling and its connections to various structures including the olfactory bulb (Ob), ventral pallidum (Pv), and habenula (Hb.lm).
Using A.bl as the primary glutamate source to power the Pv seek and avoidance circuit. A.bl itself is powered by ACh from Pv. A.bl (basolateral amygdala), ACh (acetylcholine), H.l (lateral hypothalamus), Hb.lm (lateral habenula, medial), Ob (olfactory bulb), P.bst (bed nucleus of the stria terminalis, extended amygdala), Pv (ventral pallidum), Pv.g (Pv GABA), Pv.glu (Pv glutamate), Sa (central amygdala), S.d1 (striatum projection neuron with D1.s receptor), S.d2 (striatum projection neuron with D2.i receptor), Sv (ventral striatum).

The ACh input from Pv to A.bl is important for sustaining attention. ACh acts on m1.q (ACh metabotropic G-q coupled receptor) in the A.bl PY (pyramidal) neurons [Unal et al 2015]. Activating m1.q puts the PY neurons into sustained excitation with an ADP (after-depolarization potential) after receiving both ACh and an AP (action potential) [Unal et al 2015]. ADP puts the PY neuron into an Up state for 7-10 seconds, meaning it’s more easily activated by inputs than in its base state. Essentially, ACh converts A.bl firing into working memory or sustained attention.

The Pv ACh neuron inputs include H.l, Sv, Sa (central amygdala), and P.bst (bed nucleus of the stria terminalis, extended amygdala) [Schlingloff et al 2025]. This ACh modulation gives another opportunity to control seek to an odor target. An initial odor detection on the order of 500ms might only trigger sustained seek if ACh is activated by a food-seek drive from H.l and not suppressed by Sv, Sa, or P.bst. Working memory or sustained attention for the odor would require food motivation and an absence of habituation.
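The ACh gating of A.bl attention can be sketched as a coincidence detector with a hold time. The class below is a hypothetical abstraction, not a biophysical model: `PyNeuron` and `up_duration` are illustrative names, and the 8 second default is simply a value picked from inside the 7-10 second range mentioned above.

```python
class PyNeuron:
    """A.bl pyramidal neuron that enters an Up state on coincident ACh and AP."""

    def __init__(self, up_duration=8.0):
        self.up_duration = up_duration      # seconds the Up state is held
        self.up_until = float("-inf")       # time at which the Up state ends

    def update(self, t, ach, spike):
        """Process inputs at time t (seconds) and report whether the cell is Up."""
        if ach and spike:
            # Both ACh (through m1.q) and an action potential are required.
            self.up_until = t + self.up_duration
        return t < self.up_until
```

A brief odor spike that arrives together with ACh leaves the neuron Up for several seconds after the odor is gone, while the same spike without ACh leaves no trace. That difference is the sense in which ACh turns a transient detection into sustained attention.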

Simulation

The main seek path is almost entirely disconnected from the Pv timeout circuitry discussed in the essay. The main seek path is a short, fast path from Ob to V.pt (posterior tuberculum) to MLR (midbrain locomotor region) to R5.rs (mid-hindbrain reticulospinal turning area), represented by Ob to MidSeek to HindMove.

A flowchart illustrating the pathway from the olfactory bulb (Ob) to a central decision-making node ('MidSeek') that connects to the midbrain locomotor region (V.pt-MLR) and subsequently to the hindbrain region (R5.rs) for movement control.
Simulation model for the direct seek path. Ob (olfactory bulb), MLR (midbrain locomotor region), R5.rs (mid-hindbrain reticulospinal motor), V.pt (posterior tuberculum).

An earlier simulation model used S.d as a timeout for an OT orientation circuit, but the S.d lacks the direct avoidance action that Sv has with Hb.lm. However, the Pv and Hb.lm circuit is almost entirely disconnected from the V.pt-MLR, which means that the Pv modulation is quite convoluted.

A diagram illustrating a neural circuit model showing connections between various brain regions, including the olfactory bulb (Ob), midbrain (MidSeek), ventral pallidum (Pv), and areas involved in seeking and avoiding behaviors.
Convoluted avoidance path from PvSeek through HbAvoid to suppress the MidSeek action. A.bl (basolateral amygdala), Ob (olfactory bulb), Pv (ventral pallidum), R1.a (anterior hindbrain motor region), R5.rs (mid-hindbrain motor region), S.ot (olfactory tubercle).

In the avoid circuit, HbAvoid is the key avoidance node, which PvSeek uses for avoidance. An avoidance action needs to stop ongoing action, and to enable a reversal of the current seek direction. In the simulation, MidSeek can reverse its direction if it receives an avoid signal. However, I don’t know if any midbrain circuit can reverse direction with an external modulating signal. The most plausible path is from V.mr as the main target of Hb.lm.
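The reversal assumption amounts to a sign flip on the turn command. The following is a minimal sketch, assuming a two-sensor odor comparison and a convention where positive means turn left; the function name `mid_seek` and its signature are hypothetical and are not the simulation's MidSeek implementation.

```python
def mid_seek(ob_left, ob_right, avoid=False):
    """Return a turn command: positive turns left, negative turns right."""
    # Seek turns toward the side with the stronger odor.
    turn = ob_left - ob_right
    # An avoid signal (from HbAvoid in the essay's terms) reverses the turn.
    return -turn if avoid else turn
```

The same odor comparison serves both behaviors: with a stronger odor on the left, seek turns left and avoid turns right by the same amount.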

If this seek to avoid reversal circuit does exist, it might exist in OT, which does handle both seek and avoid, is used for general left vs right decisions, and receives V.rn input. But for the sake of this essay, I’m avoiding the complexity of revisiting OT and instead assuming that MidSeek can reverse direction on its own.

An alternative is more of a switchboard configuration, where avoidance disables the seek path and enables an odor avoidance path. In animals like the lamprey and fish, Ob directly drives Hb.m for odor chemotaxis, although that path does not exist for mammals, because hippocampus output drives their Hb.m. Using that switchboard model, Pv would use V.rn as the controller to switch between the V.pt seek circuit and the Hb.m odor avoidance chemotaxis. V.rn is essentially part of the Hb.m and R1.a motor circuit, and can project to essentially the entire brain through serotonin and non-serotonin projections.

Hysteresis

The simulation raised the problem of hysteresis again, this time partially because of its simplified PKA and enkephalin model. In this case, the simulation uses a single threshold for deciding to avoid, using PKA and enkephalin rising above a threshold. Unfortunately, when avoidance occurs, the simulation immediately decays the PKA, which drops it below the threshold, curtailing the avoidance and allowing the animal to reenter the failed odor plume. Because the simulation is a program, this problem could be easily fixed by adding a second threshold to disable avoidance, but how could Pv accomplish this hysteresis?
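For reference, the programmatic fix is the standard two-threshold (Schmitt trigger) pattern. The sketch below is a generic illustration rather than the simulation's code, and the names `avoid_trace`, `on_threshold`, and `off_threshold` and their default values are hypothetical.

```python
def avoid_trace(pka_levels, on_threshold=1.0, off_threshold=0.4):
    """Return the avoid state for each PKA sample, using two thresholds."""
    avoiding = False
    trace = []
    for pka in pka_levels:
        if not avoiding and pka >= on_threshold:
            avoiding = True    # rising past the high threshold starts avoidance
        elif avoiding and pka <= off_threshold:
            avoiding = False   # only falling to the low threshold ends it
        trace.append(avoiding)
    return trace
```

For a PKA trace that peaks and then decays, `[0.5, 1.0, 0.8, 0.6, 0.4, 0.5]`, the two-threshold version holds avoidance for three samples before releasing. Setting `off_threshold` equal to `on_threshold` reproduces the problem described above: avoidance drops out on the first sample after the peak.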

One solution could have Pv blocking any new decision to seek an odor. The S.d2 fast selection phase could be inhibited by low levels of enkephalin. When a new odor triggers S.d2, it would release some level of enkephalin because of the remaining PKA, which might be enough to block a new decision. An alternative solution could use the ACh to A.bl attention circuit. If the lower enkephalin level was still high enough to block ACh attention, it would block a new seek action. This A.bl solution would work especially well if A.bl habituates to an odor if it has no ACh.

References

Agetsuma M, Aizawa H, Aoki T, Nakayama R, Takahoko M, Goto M, Sassa T, Amo R, Shiraki T, Kawakami K, et al. The habenula is crucial for experience-dependent modification of fear responses in zebrafish. Nat Neurosci. 2010;13:1354-1356.

Baker PM, Mathis V, Lecourtier L, Simmons SC, Nugent FS, Hill S, Mizumori SJY. Lateral Habenula Beyond Avoidance: Roles in Stress, Memory, and Decision-Making With Implications for Psychiatric Disorders. Front Syst Neurosci. 2022 Mar 3;16:826475. 

Bariselli S, Fobbs WC, Creed MC, Kravitz AV. A competitive model for striatal action selection. Brain Res. 2019 Jun 15;1713:70-79. 

Bonnavion, P., Varin, C., Fakhfouri, G., Martinez Olondo, P., De Groote, A., Cornil, A., Lorenzo Lopez, R., Pozuelo Fernandez, E., Isingrini, E., Rainer, Q. and Xu, K., 2024. Striatal projection neurons coexpressing dopamine D1 and D2 receptors modulate the motor function of D1-and D2-SPNs. Nature neuroscience, 27(9), pp.1783-1793.

Conde-Berriozabal, S., Sitja-Roqueta, L., García-García, E., García-Gilabert, L., Sancho-Balsells, A., Fernandez-García, S., Rodriguez-Urgellés, E., Giralt, A., Castañé, A., Rodríguez, M.J. and Alberch, J., 2025. Differential impact of optogenetic stimulation of direct and indirect pathways from dorsolateral and dorsomedial striatum on motor symptoms in Huntington’s disease mice. Experimental Neurology, 383, p.114991.

Cui G, Jun SB, Jin X, Pham MD, Vogel SS, Lovinger DM, Costa RM. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature. 2013 Feb 14;494(7436):238-42. 

Dai, K.Z., Choi, I.B., Levitt, R., Blegen, M.B., Kaplan, A.R., Matsui, A., Shin, J.H., Bocarsly, M.E., Simpson, E.H., Kellendonk, C. and Alvarez, V.A., 2022. Dopamine D2 receptors bidirectionally regulate striatal enkephalin expression: Implications for cocaine reward. Cell reports, 40(13).

Faget, L., Oriol, L., Lee, W.C., Zell, V., Sargent, C., Flores, A., Hollon, N.G., Ramanathan, D. and Hnasko, T.S., 2024. Ventral pallidum GABA and glutamate neurons drive approach and avoidance through distinct modulation of VTA cell types. Nature Communications, 15(1), p.4233.

Giardino WJ, Eban-Rothschild A, Christoffel DJ, Li SB, Malenka RC, de Lecea L. Parallel circuits from the bed nuclei of stria terminalis to the lateral hypothalamus drive opposing emotional states. Nat Neurosci. 2018 Aug;21(8):1084-1095. 

Heinsbroek JA, Bobadilla AC, Dereschewitz E, Assali A, Chalhoub RM, Cowan CW, Kalivas PW. Opposing Regulation of Cocaine Seeking by Glutamate and GABA Neurons in the Ventral Pallidum. Cell Rep. 2020 Feb 11;30(6):2018-2027.e3.

Hernández-Jaramillo, A., Illescas-Huerta, E. and Sotres-Bayon, F., 2024. Ventral pallidum and amygdala cooperate to restrain reward approach under threat. Journal of Neuroscience, 44(23).

Hodge, A. and Yttri, E., 2025. Striatal modulation supports context-specific reinforcement and not action selection. Cell Reports, 44(8).

Hook, V., Toneff, T., Baylon, S. and Sei, C., 2008. Differential activation of enkephalin, galanin, somatostatin, NPY, and VIP neuropeptide production by stimulators of protein kinases A and C in neuroendocrine chromaffin cells. Neuropeptides, 42(5-6), pp.503-511.

Hu, H., Cui, Y. and Yang, Y., 2020. Circuits and functions of the lateral habenula in health and in disease. Nature Reviews Neuroscience, 21(5), pp.277-295.

Jennings JH, Ung RL, Resendez SL, Stamatakis AM, Taylor JG, Huang J, Veleta K, Kantak PA, Aita M, Shilling-Scrivo K, Ramakrishnan C, Deisseroth K, Otte S, Stuber GD. Visualizing hypothalamic network dynamics for appetitive and consummatory behaviors. Cell. 2015 Jan 29;160(3):516-27.

Ji, C., Zhang, Y., Lin, Z., Zhao, Z., Jiao, Z., Zheng, Z., Shi, X., Wang, X., Li, Z., Yu, S. and Qu, Y., 2024. Activation of hypothalamic-pontine-spinal pathway promotes locomotor initiation and functional recovery after spinal cord injury in mice. bioRxiv, pp.2024-11.

Kim, R., Ananth, M.R., Desai, N.S., Role, L.W. and Talmage, D.A., 2024. Distinct subpopulations of ventral pallidal cholinergic projection neurons encode valence of olfactory stimuli. Cell reports, 43(4).

Konradi, C., Macías, W., Dudman, J.T. and Carlson, R.R., 2003. Striatal proenkephalin gene induction: coordinated regulation by cyclic AMP and calcium pathways. Molecular brain research, 115(2), pp.157-161.

Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife. 2017 Sep 5;6:e30697.

Lindsey JW, Markowitz J, Gillis WF, Datta SR, Litwin-Kumar A. Dynamics of striatal action selection and reinforcement learning. Elife. 2025 May 8;13:RP101747.

Luo, Y.J., Ge, J., Chen, Z.K., Liu, Z.L., Lazarus, M., Qu, W.M., Huang, Z.L. and Li, Y.D., 2023. Ventral pallidal glutamatergic neurons regulate wakefulness and emotion through separated projections. Iscience, 26(8).

Neuhofer, D. and Kalivas, P., 2023. Differential modulation of GABAergic and glutamatergic neurons in the ventral pallidum by GABA and neuropeptides. Eneuro, 10(7).

Ottenheimer, D.J., Simon, R.C., Burke, C.T., Bowen, A.J., Ferguson, S.M. and Stuber, G.D., 2024. Single-cell sequencing of rodent ventral pallidum reveals diverse neuronal subtypes with non-canonical interregional continuity. BioRxiv, pp.2024-03.

Palmer D, Cayton CA, Scott A, Lin I, Newell B, Paulson A, Weberg M, Richard JM. Ventral pallidum neurons projecting to the ventral tegmental area reinforce but do not invigorate reward-seeking behavior. Cell Rep. 2024 Jan 23;43(1):113669. 

Patterson CM, Wong JM, Leinninger GM, Allison MB, Mabrouk OS, Kasper CL, Gonzalez IE, Mackenzie A, Jones JC, Kennedy RT, Myers MG Jr. Ventral tegmental area neurotensin signaling links the lateral hypothalamus to locomotor activity and striatal dopamine efflux in male mice. Endocrinology. 2015 May;156(5):1692-700. 

Phua SC, Tan YL, Kok AMY, Senol E, Chiam CJH, Lee CY, Peng Y, Lim ATJ, Mohammad H, Lim JX, Fu Y. A distinct parabrachial-to-lateral hypothalamus circuit for motivational suppression of feeding by nociception. Sci Adv. 2021 May 7;7(19):eabe4323. 

Root DH, Melendez RI, Zaborszky L, Napier TC. The ventral pallidum: Subregion-specific functional anatomy and roles in motivated behaviors. Prog Neurobiol. 2015 Jul;130:29-70. 

Rysztak, L.G. and Jutkiewicz, E.M., 2022. The role of enkephalinergic systems in substance use disorders. Frontiers in Systems Neuroscience, 16, p.932546.

Schlingloff, D., Szabó, Í., Gulyás, É., Király, B., Kispál, R., Stephenson-Jones, M. and Hangya, B., 2025. Most ventral pallidal cholinergic neurons are cortically projecting bursting basal forebrain cholinergic neurons. bioRxiv, pp.2025-02.

Siemian JN, Arenivar MA, Sarsfield S, Aponte Y. Hypothalamic control of interoceptive hunger. Curr Biol. 2021 Sep 13;31(17):3797-3809.e5.

Soares-Cunha C, de Vasconcelos NAP, Coimbra B, Domingues AV, Silva JM, Loureiro-Campos E, Gaspar R, Sotiropoulos I, Sousa N, Rodrigues AJ. Nucleus accumbens medium spiny neurons subtypes signal both reward and aversion. Mol Psychiatry. 2020 Dec;25(12):3241-3255. 

Stamatakis AM, Van Swieten M, Basiri ML, Blair GA, Kantak P, Stuber GD. Lateral Hypothalamic Area Glutamatergic Neurons and Their Projections to the Lateral Habenula Regulate Feeding and Reward. J Neurosci. 2016 Jan 13;36(2):302-11. 

Steiner, H. and Gerfen, C.R., 1998. Role of dynorphin and enkephalin in the regulation of striatal output pathways and behavior. Experimental brain research, 123(1), pp.60-76.

Stephenson-Jones M, Bravo-Rivera C, Ahrens S, Furlan A, Xiao X, Fernandes-Henriques C, Li B. Opposing Contributions of GABAergic and Glutamatergic Ventral Pallidal Neurons to Motivational Behaviors. Neuron. 2020 Mar 4;105(5):921-933.e5. 

Tan, B., Browne, C.J., Nöbauer, T., Vaziri, A., Friedman, J.M. and Nestler, E.J., 2024. Drugs of abuse hijack a mesolimbic pathway that processes homeostatic need. Science, 384(6693), p.eadk6742.

Tecuapetla F, Jin X, Lima SQ, Costa RM. Complementary Contributions of Striatal Projection Pathways to Action Initiation and Execution. Cell. 2016 Jul 28;166(3):703-715. 

Unal, C.T., Pare, D. and Zaborszky, L., 2015. Impact of basal forebrain cholinergic inputs on basolateral amygdala neurons. Journal of Neuroscience, 35(2), pp.853-863.

Vachez YM, Tooley JR, Abiraman K, Matikainen-Ankney B, Casey E, Earnest T, Ramos LM, Silberberg H, Godynyuk E, Uddin O, Marconi L, Le Pichon CE, Creed MC. Ventral arkypallidal neurons inhibit accumbal firing to promote reward consumption. Nat Neurosci. 2021 Mar;24(3):379-390.

Wang Q, Sun RY, Hu JX, Sun YH, Li CY, Huang H, Wang H, Li XM. Hypothalamic-hindbrain circuit for consumption-induced fear regulation. Nat Commun. 2024 Sep 4;15(1):7728.

Williams, M., 2024. Study of the network involving the subthalamic nucleus in various measures of motivation in rats (Doctoral dissertation, Aix-marseille université).

Yao Y, Gao G, Liu K, Shi X, Cheng M, Xiong Y, Song S. Projections from D2 Neurons in Different Subregions of Nucleus Accumbens Shell to Ventral Pallidum Play Distinct Roles in Reward and Aversion. Neurosci Bull. 2021 May;37(5):623-640. 

Essay 31: Striatum as Timeout

Let’s return to the task of essay 16 on give-up time in foraging, which covered food search with a timeout. At first the animal uses a general roaming search and if it smells a food odor, it switches to a targeted seek following the odor with chemotaxis. If the animal finds food in the odor plume, it eats the food, but if it doesn’t find food, it will eventually give up and avoid the local area before returning to the roaming search.

Search state machine. Roam is the starting state, switching to seek when it detects odor, and switching to avoid after a timeout.

For another attempt at the problem, let’s take the striatum (basal ganglia) as implementing the timeout portion of this task using the neurotransmitter adenosine as a timeout signal and incorporating the multiple action path discussion from essay 30 on RTPA. Adenosine is a byproduct of ATP breakdown and is a measure of cellular activity. With sufficiently high adenosine, the striatum switches from the active seek path to an avoidance path. These circuits are where caffeine works to suppress the adenosine timeout, allowing for longer concentration.

Mollusk navigation

As mentioned in essay 30, the mollusk sea slug has a food search circuit with a similar logic to what we need here. The animal seeks food odors when it’s hungry, but it avoids food odors when it’s not hungry [Gillette and Brown 2015].

Mollusk food search circuit, modulated by hunger.
Mollusk food search circuit, illustrating a hunger-modulated switchboard. When the animal is not hungry, the switchboard reverses the odor to motor links turning it away from food.

This essay uses the same idea but replaces the hunger modulation with a timeout. When the timeout occurs, the circuit switches from a food seek action path to a food avoid action path.
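
The switchboard idea can be sketched in a few lines. This is only an illustration, not the essay's simulation code: the function name `steer`, the left/right odor inputs, and the sign convention (positive turns left) are all assumptions made for the example.

```python
def steer(odor_left: float, odor_right: float, timed_out: bool) -> float:
    """Return a turn command: positive turns left, negative turns right.

    Hypothetical switchboard: the same odor inputs drive the motor output,
    but the timeout flag reverses the sign of the odor-to-motor link.
    """
    # Seek: turn toward the side with the stronger odor.
    toward_odor = odor_left - odor_right
    # Avoid: the switchboard reverses the link, turning away from the odor.
    return -toward_odor if timed_out else toward_odor
```

With a stronger odor on the left, `steer(1.0, 0.2, timed_out=False)` turns left toward the odor, and the same inputs with `timed_out=True` turn right, away from it.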

Odor action paths

Two odor-following action paths exist in the lamprey, one using Hb.m (medial habenula) and one using V.pt (posterior tuberculum). The Hb.m path is a chemotaxis path following a temporal gradient, while the V.pt path projects to MLR (midbrain locomotor region). The lamprey Ob.m (medial olfactory bulb) projects to both Hb.m and V.pt, which each project to a different locomotor path [Derjean et al 2010]: Hb.m to R.ip (interpeduncular nucleus) and V.pt to MLR. The zebrafish also has Ob projections to Hb and V.pt [Imamura et al 2020], [Kermen et al 2013].

Dual odor-seeking action paths in the lamprey and zebrafish. Hb (habenula), Ob.m (medial olfactory bulb), V.pt (posterior tuberculum).

Further complicating the paths, Hb.m itself contains both an odor-seeking path and an odor-avoiding path [Beretta et al 2012], [Chen et al 2019]. Similarly, Hb.m has dual action paths for social winning and losing [Okamoto et al 2021]. So, this essay could use the dual paths in Hb.m instead of contrasting Hb.m with V.pt, but the larger contrast should make the simulation easier to follow.

This essay’s simulation makes some important simplifications. The Hb to R.ip path is a temporal gradient path used for chemotaxis, phototaxis and thermotaxis. In a real-world marine environment, odor diffusion and water turbulence are much more complicated, producing more clumps and making a simple gradient ascent more difficult [Hengenius et al 2021]. Because this essay is only focused on the switchboard effect, this simplification should be fine.

Striatum action paths with adenosine timeout

The timeout circuit uses the striatum, which has two paths: one selecting the main action, and the second either stopping the action, or selecting an opposing action [Zhai et al 2023]. The two paths are distinguished by their responsiveness to dopamine with S.d1 (striatal projection with D1 G-s stimulating) or S.d2 (striatal projection with D2 G-i inhibiting) marking the active and alternate paths respectively. This model is a simplification of the mammalian striatum where the two paths interact in a more complicated fashion [Cui et al 2013].

Essay odor seek with timeout circuit. The seek path flows from Ob, through S.d1 to Pv to V.pt. The avoid path flows from Ob, through S.d2 to Pv to Hb. Ad (adenosine), Hb (habenula), Ob (olfactory bulb), Pv (ventral pallidum), S.d1 (striatum D1 projection neuron), S.d2 (striatum D2 projection neuron), V.pt (posterior tuberculum)

As mentioned, the two action paths are the seek path from Ob to V.pt and the avoid path from Ob to Hb. For the timeout and switchboard, the Ob has a secondary projection to the striatum. Although this circuit is meant as a proto-vertebrate simplification, Ob does project to S.ot (olfactory tubercle) and to the equivalent in zebrafish [Kermen et al 2013].

The timeout is managed by adenosine, which is a neurotransmitter derived from ATP and a measure of neural activity. The striatum has three sub-circuits for this kind of functionality, which I’ll cover in order of complexity.

S.d1 and adenosine inhibition

The first circuit only uses the direct S.d1 path and adenosine as a timeout mechanism. When the animal follows an odor, the Ob to S.d1 signal enables the seek action. As a timeout, ATP from neural activity degrades to adenosine and the buildup of adenosine is a decent measure of activity over time. The longer the animal seeks, the more adenosine builds up. If the Ob projection axon contains an A1i (adenosine G-i inhibitory) receptor, the adenosine will inhibit the release of glutamate from Ob, which will eventually self-disable the seek action.

S.d1 action path inhibited by adenosine buildup as a timeout. A1i (adenosine G-i inhibitory receptor), Ad (adenosine), mGlu5q (metabotropic glutamate G-q receptor), Ob (olfactory bulb), S.d1 (D1-type striatal projection neuron)

In practice, the striatum uses astrocytes to manage the glutamate release. An astrocyte that envelops the synapse measures glutamate release with an mGlu5q (metabotropic glutamate with G-q/11 binding) receptor and accumulates internal calcium [Cavaccini et al 2020]. The astrocyte’s calcium triggers an adenosine release as a gliotransmitter, making the adenosine level a timeout measure of glutamate activity. The presynaptic A1i receptor then inhibits the Ob signal. The timeframe is on the order of 5 to 20 minutes with a recovery of about 60 minutes, although the precise timing is probably variable. Interestingly, the timeout is a log function instead of a linear measure of activity [Ma et al 2022].

This circuit doesn’t depend on the postsynaptic S.d1 firing [Cavaccini et al 2020], which contrasts with the next LTD (long term depression) circuit which only inhibits the axon if the S.d1 projection neuron fires.
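
As a rough sketch of this first circuit, the loop below treats adenosine as a leaky accumulator of glutamate release that feeds back to inhibit that release. The function name, the divisive form of the inhibition, the parameter values, and the arbitrary time steps are all assumptions for illustration. They are not taken from the cited papers and do not reproduce the log-like timing.

```python
def simulate_a1_timeout(steps: int, ob_signal: float = 1.0,
                        build: float = 0.02, decay: float = 0.005,
                        seek_threshold: float = 0.5) -> list[bool]:
    """Return, for each step, whether the seek action is still enabled.

    Hypothetical model: adenosine accumulates from glutamate release and
    inhibits that release presynaptically (A1i), independent of S.d1 firing.
    """
    adenosine = 0.0
    seeking = []
    for _ in range(steps):
        # Presynaptic A1i inhibition scales down the Ob glutamate release.
        release = ob_signal / (1.0 + adenosine)
        # The astrocyte turns glutamate release into adenosine, which decays slowly.
        adenosine += build * release - decay * adenosine
        # Seek stays enabled only while enough glutamate reaches S.d1.
        seeking.append(release > seek_threshold)
    return seeking
```

With these assumed parameters the seek is enabled at first and disables itself after sustained odor input, without any explicit clock.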

S.d1 presynaptic LTD using eCB

S.d1 self-activating LTD uses retrotransmission to inhibit its own input using eCB (endocannabinoids) as a neurotransmitter. Like the astrocyte in the previous circuit, S.d1 uses an mGlu5q receptor to trigger eCB release, but also requires that S.d1 fire, as triggered by an NMDA glutamate receptor. The axon receives the eCB retrotransmission with a CB1i (cannabinoid G-i inhibitory) receptor, which triggers presynaptic LTD [Shen et al 2008], [Wu et al 2015]. Like the previous circuit, the timeframe seems to be on the order of 10 minutes, lasting for 30 to 60 minutes.

S.d1 LTD circuit. A coincidence of glutamate detection with mGlu5q and S.d1 activation with NMDA triggers eCB release, which activates CB1i leading to presynaptic LTD. CB1i (cannabinoid G-i inhibitory receptor), mGlu5q (glutamate G-q receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron).

This circuit inhibits itself over time without using adenosine or astrocytes. In the full striatum circuit, high dopamine levels suppress this LTD suppression, meaning that dopamine inhibits the timeout [Shen et al 2008].
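
The coincidence requirement can be written as a small update rule. This is a sketch under stated assumptions: the function `update_weight`, the multiplicative depression, and the `ltd_rate` value are hypothetical, and dopamine is reduced to a simple on/off flag.

```python
def update_weight(weight: float, glutamate: float, sd1_fired: bool,
                  dopamine_high: bool, ltd_rate: float = 0.05) -> float:
    """Return the Ob-to-S.d1 synaptic weight after one step.

    Hypothetical rule: eCB is released only when glutamate is detected
    (mGlu5q) and S.d1 fires (NMDA) at the same time.
    """
    ecb_released = glutamate > 0.0 and sd1_fired
    # High dopamine suppresses the LTD, so the timeout does not advance.
    if ecb_released and not dopamine_high:
        weight *= 1.0 - ltd_rate  # presynaptic LTD via CB1i
    return weight
```

Unlike the adenosine circuit, glutamate alone leaves the weight unchanged here: `update_weight(1.0, 1.0, sd1_fired=False, dopamine_high=False)` returns the weight it was given.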

The next circuit adds the S.d2 path, which uses adenosine and self-activity to trigger postsynaptic LTP.

S.d2 postsynaptic LTP via A2a.s

Consider a third circuit that has the benefits of both previous circuits because it uses adenosine as a timer managed by astrocytes and is also specific to postsynaptic activity. In addition, it allows for a second action path, changing the circuit from a Go/NoGo system to a Go/Avoid action pair. This circuit uses LTP (long term potentiation) on the S.d2 striatum neurons.

Timeout circuit using postsynaptic LTP at the S.d2 neuron and adenosine as a timeout signal. As adenosine accumulates, it stimulates S.d2, which both disables S.d1 and drives the avoid path. A2a.s (adenosine G-s stimulatory receptor), Ad (adenosine), mGlu5q (glutamate G-q metabotropic receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron), S.d2 (striatum D2-type projection neuron)

When the odor first arrives, Ob activates the S.d1 path, seeking toward the odor. S.d1 is activated instead of S.d2 because of dopamine. In this simple model, the Ob itself could provide the initial dopamine like C. elegans odor-detecting neurons or the tunicate’s coronal cells or the dual glutamate and dopamine neurons in Vta (ventral tegmental area).

As time goes on, adenosine from the astrocyte builds up, which activates the S.d2 A2a.s (adenosine G-s stimulatory receptor) until it overcomes dopamine suppression and increases the S.d2 activity with LTP [Shen et al 2008]. Once S.d2 activates, it suppresses S.d1 [Chen et al 2023] and drives the avoid path.
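
A minimal sketch of this competition, assuming a linear adenosine buildup and a simple subtraction between the A2a.s drive and the dopamine suppression (both are simplifications chosen for the example, as are the function name and parameters):

```python
def steps_until_avoid(dopamine: float, build: float = 0.01,
                      max_steps: int = 10_000) -> int:
    """Return the step at which S.d2 takes over and seek switches to avoid.

    Hypothetical model: adenosine stimulates S.d2 through A2a.s while
    dopamine suppresses it through D2, and the net drive decides the path.
    """
    adenosine = 0.0
    for step in range(max_steps):
        sd2_drive = adenosine - dopamine
        if sd2_drive > 0.0:
            return step  # S.d2 wins: suppress S.d1 and drive the avoid path
        adenosine += build  # continued seeking accumulates adenosine
    return max_steps
```

In this sketch, more dopamine delays the switch, so `steps_until_avoid(1.0)` is larger than `steps_until_avoid(0.5)`.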

The combination of these circuits looks like it’s precisely what the essay needs.

Simulation

In the simulation, when the animal is hunting food and finds a food odor plume, it directly seeks toward the center and eats if it finds food. In the screenshot below, the animal is eating.

Simulation showing the animal eating food after seeking the odor plume.

Satiation disables the food seek. This might sound obvious, but hunger gating of food seeking requires specific satiety circuits to any seek path that’s food specific, which means the involvement of H.l (lateral hypothalamus) and related areas like H.arc (arcuate hypothalamus) and H.pv (periventricular hypothalamus). And, of course, the simulation requires simulation code to only enable food odor seek when the animal is searching for food.

The next screenshot shows the central problem of the essay, when the animal seeks a food odor but there’s no food at the center.

Screenshot showing the animal stuck in the middle of the food odor plume before the timeout.

Without a timeout, the animal circles the center of the food odor plume endlessly. After a timeout, the animal actively leaves the plume and avoids that specific odor until the timeout decays.

Screenshot showing the animal escaping from the odor plume after the timeout.

This system is somewhat complex because of the need for hysteresis. A too-simple solution with a single threshold can oscillate, because as soon as the animal starts leaving, the timeout decays, which then re-enables the food seek, which then quickly times out, repeating. Instead, the system needs to make re-enabling of the food seek more difficult after a timeout.

But that adds a secondary issue: if starting a seek requires the timeout to fall below a lower threshold, then an ongoing seek needs a raised threshold so it can continue while the timeout accumulates. In other words, sustaining a seek needs a lower bar than starting one. This hysteresis and seek sustain presumably needs to be handled by the actual striatum circuit.
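
The two-threshold idea can be sketched as a small state machine. This is not the simulation's actual code: the class name, the linear build and decay, and the threshold values are assumptions chosen to show the hysteresis.

```python
class SeekTimeout:
    """Two-threshold (hysteresis) timeout for a seek/avoid switch."""

    def __init__(self, stop_at=1.0, resume_at=0.2, build=0.1, decay=0.05):
        self.stop_at = stop_at      # high threshold: an ongoing seek gives up here
        self.resume_at = resume_at  # low threshold: seek is re-enabled only below this
        self.build = build
        self.decay = decay
        self.level = 0.0            # accumulated timeout signal
        self.avoiding = False

    def step(self, odor: bool) -> str:
        if odor and not self.avoiding:
            self.level += self.build  # seeking accumulates the timeout
        else:
            self.level = max(0.0, self.level - self.decay)  # otherwise it decays
        if not self.avoiding and self.level >= self.stop_at:
            self.avoiding = True      # timeout: switch from seek to avoid
        elif self.avoiding and self.level <= self.resume_at:
            self.avoiding = False     # fully recovered: seek may start again
        if self.avoiding:
            return "avoid"
        return "seek" if odor else "roam"
```

Because `resume_at` sits well below `stop_at`, a small decay at the start of avoidance does not re-enable the seek, which is the oscillation the single-threshold version suffers from.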

Discussion

I think this essay shows that using the striatum for an action timeout for food seek is a plausible application. The circuit is relatively simple and is effective, improving search by avoiding failed areas.

However, the simulation does raise some issues, particularly the hysteresis problem. If the striatum does provide a timeout along these lines, it must somehow solve the hysteresis problem. While the animal is seeking, the ongoing LTP/LTD inhibition should use a high threshold to stop seeking, but once avoidance starts, there needs to be a high threshold to return to seeking to avoid oscillations between the two action paths.

Because LTD/LTP is a relatively long chemical process (minutes) internal to the neurons, as opposed to an instant switch in the simulation, the delay itself might be sufficient to solve the oscillation problem. It’s also possible that some of the more complicated parts of the circuit, such as P.ge (globus pallidus) and its feedback to the striatum or H.stn (subthalamic nucleus) might affect the sustain of seek or breaking it and so control the hysteresis problem.

The simulation also reinforced the absolute requirement that action paths need to be modulated by internal state like hunger. For the seek paths, both Hb.m and V.pt are heavily modulated by H.l and other hypothalamic hunger and satiety signals.

As expected, the simulation also illustrated the need for context information separate from the target odor. While the food odor is timed out, the animal can’t search the other odor plume because this essay’s animal can’t distinguish between the odor plumes, and therefore avoids both odors. With a long timeout and many odor plumes, this delays the food search. A future enhancement is to add context to the timeout. If the animal can time out a specific odor plume, it can search alternatives even if the food odor itself is identical.
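
One way to picture a context-specific timeout is a table keyed by context instead of a single global timer. The sketch below is purely hypothetical: the essay does not specify how context would be represented, so the string keys, the class name, and the fixed duration are assumptions.

```python
class ContextTimeout:
    """Hypothetical per-context timeout: each plume context expires separately."""

    def __init__(self, duration: int = 100):
        self.duration = duration
        self.blocked_until: dict[str, int] = {}  # context -> step when seek resumes

    def time_out(self, context: str, now: int) -> None:
        # Record that seeking in this context failed at step `now`.
        self.blocked_until[context] = now + self.duration

    def seek_allowed(self, context: str, now: int) -> bool:
        # Contexts that never timed out are always allowed.
        return now >= self.blocked_until.get(context, 0)
```

After `time_out("plume_a", now=10)`, seeking is blocked for `"plume_a"` but still allowed for `"plume_b"`, even if both plumes carry the same food odor.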

References

Beretta CA, Dross N, Guiterrez-Triana JA, Ryu S, Carl M. Habenula circuit development: past, present, and future. Front Neurosci. 2012 Apr 23;6:51. 

Cavaccini A, Durkee C, Kofuji P, Tonini R, Araque A. Astrocyte Signaling Gates Long-Term Depression at Corticostriatal Synapses of the Direct Pathway. J Neurosci. 2020 Jul 22;40(30):5757-5768. 

Chen JF, Choi DS, Cunha RA. Striatopallidal adenosine A2A receptor modulation of goal-directed behavior: Homeostatic control with cognitive flexibility. Neuropharmacology. 2023 Mar 15;226:109421. 

Chen WY, Peng XL, Deng QS, Chen MJ, Du JL, Zhang BB. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019 Apr 15;404:259-267. 

Cui G, Jun SB, Jin X, Pham MD, Vogel SS, Lovinger DM, Costa RM. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature. 2013 Feb 14;494(7436):238-42.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21;8(12):e1000567. 

Gillette R, Brown JW. The Sea Slug, Pleurobranchaea californica: A Signpost Species in the Evolution of Complex Nervous Systems and Behavior. Integr Comp Biol. 2015 Dec;55(6):1058-69. 

Hengenius JB, Connor EG, Crimaldi JP, Urban NN, Ermentrout GB. Olfactory navigation in the real world: Simple local search strategies for turbulent environments. J Theor Biol. 2021 May 7;516:110607.

Imamura F, Ito A, LaFever BJ. Subpopulations of Projection Neurons in the Olfactory Bulb. Front Neural Circuits. 2020 Aug 28;14:561822. 

Kermen F, Franco LM, Wyatt C, Yaksi E. Neural circuits mediating olfactory-driven behavior in fish. Front Neural Circuits. 2013 Apr 11;7:62.

Ma L, Day-Cooney J, Benavides OJ, Muniak MA, Qin M, Ding JB, Mao T, Zhong H. Locomotion activates PKA through dopamine and adenosine in striatal neurons. Nature. 2022 Nov;611(7937):762-768.

Okamoto H, Cherng BW, Nakajo H, Chou MY, Kinoshita M. Habenula as the experience-dependent controlling switchboard of behavior and attention in social conflict and learning. Curr Opin Neurobiol. 2021 Jun;68:36-43. 

Shen W, Flajolet M, Greengard P, Surmeier DJ. Dichotomous dopaminergic control of striatal synaptic plasticity. Science. 2008 Aug 8;321(5890):848-51. 

Wu YW, Kim JI, Tawfik VL, Lalchandani RR, Scherrer G, Ding JB. Input- and cell-type-specific endocannabinoid-dependent LTD in the striatum. Cell Rep. 2015 Jan 6;10(1):75-87. 

Zhai S, Cui Q, Simmons DV, Surmeier DJ. Distributed dopaminergic signaling in the basal ganglia and its relationship to motor disability in Parkinson’s disease. Curr Opin Neurobiol. 2023 Dec;83:102798.

Essay 22 issues: subthalamic nucleus simulation

The essay 22 simulation explored a striatum model where the two decision paths competed: odor seeking vs random exploration, using dopamine to bias between exploration and seeking. This model resembled striatum theories like [Bariselli et al. 2019] that consider the striatum’s direct and indirect paths as competing between approach and avoidant actions.

Issues in essay 22 include both neuroscience divergence and simulation problems. Although the simulation is a loose functional model, that laxity isn’t infinite and it may have gone too far from the neuroscience.

Adenosine and perseveration

Seeking and foraging have a perseveration problem: the animal must eventually give up on a failed cue, or it will remain stuck forever. The give-up circuit in essay 22 uses the lateral habenula (Hb.l) to integrate search time until it reaches a threshold to give up. An alternative circuit in the striatum itself involves the indirect path (S.d2), the D2 dopamine receptor and adenosine, with a behaviorally relevant time scale.

When fast neurotransmitters are on the order of 10 milliseconds, creating a timeout on the order of a few minutes is a challenge. Two possible solutions in that timescale are long term potentiation (LTP) where “long” means about 20 minutes, and astrocyte calcium accumulation, which is also about 10 to 20 minutes.

Adenosine receptors (A2r) in the striatum indirect path (S.d2) measure broad neural activity from ATP byproducts that accumulate in the intercellular space. Over 10 minutes those A2r can enhance the indirect path, either through internal calcium ion (Ca) accumulation in the astrocytes or via LTP. Enhancing the indirect path (exploration) eventually causes a switch from the direct path (seeking) to exploration, essentially giving up on the seeking.

Ventral striatum

Although the essay models the dorsal striatum (S.d), the ventral striatum (S.v aka nucleus accumbens) is more associated with exploration and food seeking. In particular, the olfactory path for food seeking goes through S.v, while midbrain motor actions use S.d. In salamanders, the striatum only processes midbrain (“collo-”) thalamic inputs, while olfactory and direct senses (“lemno-”) go to the cortex [Butler 2008]. Assuming the salamander path is more primitive, the essay’s use of S.d in the model is a likely mistake.

But S.v raises a new issue because S.v doesn’t use the subthalamus (H.stn) [Humphries and Prescott 2010], although that model only applies to the S.v shell (S.sh), not the S.v core (S.core).

Ventral striatum pathway. MLR (midbrain locomotor region), P.v (ventral pallidum), S.sh (ventral striatum shell), Vta (ventral tegmental area).

In the above diagram of a striatum shell circuit, an odor-seek path is possible through the ventral tegmental area (Vta) but there is no space for an alternate explore path.

Low dopamine and perseveration

[Rutledge et al. 2009] investigates dopamine in the context of Parkinson’s disease (PD), which exhibits perseveration as a symptom. In contrast to the essay, PD is a low dopamine condition, and adding dopamine resolves the perseveration. But that resolution is the opposite of essay 22’s dopamine model, where low dopamine resolved perseveration.

Now, it’s possible that give-up perseveration and Parkinson’s perseveration are two different symptoms, or it’s possible that the complete absence of dopamine differs from low tonic dopamine, but in either case, the essay 22 model is too simple to explain the striatum’s dopamine use.

Dopamine burst vs tonic

Dopamine in the striatum has two modes: burst (phasic) and tonic. Essay 22 uses tonic dopamine, not phasic. The striatum uses phasic dopamine to switch attention to orient to a new salient stimulus. The phasic dopamine circuit is more complicated than the tonic system because it requires coordination with acetylcholine (ACh) from the midbrain laterodorsal tegmentum (V.ldt) and pedunculopontine (V.ppt) nuclei.

A question for the essays is whether that phasic burst is primitive to the striatum, or a later addition, possibly adding an interrupt for orientation to an earlier non-interruptible striatum.

Explore semantics

The word “explore” is used differently by behavioral ecology and in reinforcement learning, despite both using foraging-like tasks. These essays have been using explore in the behavioral ecology meaning, which may cause confusion with the reinforcement learning sense. The difference centers on a fixed strategy (policy) compared with changing strategies.

In behavioral ecology, foraging is literal foraging, animals browsing or hunting in a place and moving on (giving up) if the place doesn’t have food [Owen-Smith et al. 2010]. “Exploring” is moving on from an unproductive place, but the policy (strategy) remains constant because moving on is part of the strategy. The policy for when to stay and when to go [Headon et al. 1982] often follows the marginal value theorem [Charnov 1976], which specifies when the animal should move on.
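
The marginal value theorem can be illustrated numerically: the best time to leave a patch maximizes the long-run intake rate, which is the gain in the patch divided by travel time plus time in the patch. The sketch below assumes a diminishing-returns gain curve, `max_gain * (1 - exp(-rate * t))`, which is a common textbook choice and not something taken from the essay. The function name and parameter values are likewise only illustrative.

```python
import math

def best_leave_time(travel_time: float, max_gain: float = 1.0,
                    rate: float = 0.5, dt: float = 0.01,
                    horizon: float = 50.0) -> float:
    """Grid-search the patch residence time that maximizes average intake rate."""
    best_t, best_rate = dt, 0.0
    for i in range(1, int(horizon / dt) + 1):
        t = i * dt
        # Assumed gain curve: food intake saturates as the patch depletes.
        gain = max_gain * (1.0 - math.exp(-rate * t))
        # Long-run rate counts the travel time between patches.
        average_rate = gain / (travel_time + t)
        if average_rate > best_rate:
            best_t, best_rate = t, average_rate
    return best_t
```

The policy itself stays fixed. Only the give-up time changes with the environment: when patches are farther apart (a larger `travel_time`), the computed leave time is later.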

In contrast, reinforcement learning (RL) uses “explore” to mean changing the policy (strategy). For example, in a two-armed bandit situation (two slot machines), the RL policy is either using machine A or using machine B, or a fixed probabilistic ratio, not a timeout and give-up policy. In that context, exploring means changing the policy, not merely switching machines.

[Kacelnik et al. 2011] points out that the two-choice economic model doesn’t match vertebrate animal behavior, because vertebrates use an accept-reject decision [Cisek and Hayden 2022]. So, while the two-armed bandit may be useful in economics, it’s not a natural decision model for vertebrates.

Avoidance (nicotinic receptors in M.ip)

The simulation uncovered a foraging problem, where the animal remained around an odor patch it had given up on, because the give-up strategy reverts to random search. Instead, the animal should leave the current place and only resume search when it’s far away.

Path of simulated animal after giving up on a food odor.

In the diagram above, the animal remains near the abandoned food odor. The tight circles are the earlier seek before giving up, and the random path afterwards is the continued search. A better strategy would leave the green odor plume and explore other areas of the space.

As a possible circuit, the habenula (Hb.m) projection to the interpeduncular nucleus (M.ip) uses both glutamate and ACh as neurotransmitters, where ACh amplifies neural output. For low signals without ACh, the animal approaches the object, but high signals with ACh switch approach to avoidance. This avoidance switching is managed by the nicotinic receptor (nAChR), which is studied for nicotine addiction [Lee et al. 2019].

An interesting future essay might explore using nicotinic aversion to improve foraging by leaving an abandoned odor plume.

References

Bariselli S, Fobbs WC, Creed MC, Kravitz AV. A competitive model for striatal action selection. Brain Res. 2019 Jun 15;1713:70-79.

Butler, Ann. (2008). Evolution of the thalamus: A morphological and functional review. Thalamus & Related Systems. 4. 35 – 58.

Charnov, Eric L. Optimal foraging, the marginal value theorem. Theoretical population biology 9.2 (1976): 129-136.

Cisek P, Hayden BY. Neuroscience needs evolution. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14;377(1844):20200518.

Headon T, Jones M, Simonon P, Strummer J (1982) Should I Stay or Should I Go. On Combat Rock. CBS Epic.

Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2010 Apr;90(4):385-417.

Kacelnik A, Vasconcelos M, Monteiro T, Aw J. 2011. Darwin’s ‘tug-of-war’ vs. starlings’ ‘horse-racing’: how adaptations for sequential encounters drive simultaneous choice. Behav. Ecol. Sociobiol. 65, 547-558.

Lee HW, Yang SH, Kim JY, Kim H. The Role of the Medial Habenula Cholinergic System in Addiction and Emotion-Associated Behaviors. Front Psychiatry. 2019 Feb 28

Owen-Smith N, Fryxell JM, Merrill EH. Foraging theory upscaled: the behavioural ecology of herbivore movement. Philos Trans R Soc Lond B Biol Sci. 2010 Jul 27;365(1550):2267-78. 

Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. J Neurosci. 2009 Dec 2