Impairments in reinforcement learning do not explain enhanced habit formation in cocaine use disorder

Publication: Psychopharmacology

Lim TV, Cardinal RN, Savulich GJ, Moustafa AA, Robbins TW, Ersche KD

1 August 2019


Drug addiction has been suggested to develop through drug-induced changes in learning and memory processes. Whilst the initiation of drug use is typically goal-directed and hedonically motivated, over time, drug-taking may develop into a stimulus-driven habit, characterised by persistent use of the drug irrespective of the consequences.

Converging lines of evidence suggest that stimulant drugs facilitate the transition from goal-directed to habitual drug-taking, but their contribution to goal-directed learning is less clear.

Computational modelling may provide an elegant means for elucidating changes during instrumental learning that may explain enhanced habit formation.

The research team used formal reinforcement learning algorithms to deconstruct the process of appetitive instrumental learning and to explore potential associations between goal-directed and habitual actions in patients with cocaine use disorder (CUD).

They re-analysed appetitive instrumental learning data in 55 healthy control volunteers and 70 CUD patients by applying a reinforcement learning model within a hierarchical Bayesian framework. They used a regression model to determine the influence of learning parameters and variations in brain structure on subsequent habit formation.

The research showed that poor instrumental learning performance in CUD patients was largely determined by difficulties with learning from feedback, as reflected by a significantly reduced learning rate.

Subsequent formation of habitual response patterns was partly explained by group status and individual variation in reinforcement sensitivity. White matter integrity within goal-directed networks was associated with performance parameters in controls but not in CUD patients.
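
To make the terms "learning rate" and "reinforcement sensitivity" concrete, the following is a minimal sketch of a generic delta-rule learner with softmax choice. It is not the model fitted in the study: the function names are hypothetical, and treating reinforcement sensitivity as a softmax scaling factor is an assumption made for illustration only.

```python
import math


def update_value(value, reward, learning_rate):
    """Delta-rule update: move the value estimate toward the observed reward."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def choice_probabilities(values, reinforcement_sensitivity):
    """Softmax over option values.

    Assumption: reinforcement sensitivity scales values before the softmax,
    so higher sensitivity makes choices follow learned values more closely.
    """
    scaled = [reinforcement_sensitivity * v for v in values]
    max_scaled = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def learn(rewards, learning_rate, initial_value=0.0):
    """Return the value estimate after each trial in a sequence of rewards."""
    value = initial_value
    trajectory = []
    for reward in rewards:
        value = update_value(value, reward, learning_rate)
        trajectory.append(value)
    return trajectory


# Illustrative comparison: the same feedback, learned at two different rates.
fast_learner = learn([1.0, 1.0, 1.0, 1.0], learning_rate=0.5)
slow_learner = learn([1.0, 1.0, 1.0, 1.0], learning_rate=0.1)
```

In this sketch, a lower learning rate leaves the value estimate further from the true reward after the same number of trials, which is one way a reduced learning rate could show up as poorer instrumental learning performance.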

The data indicate that impairments in reinforcement learning are insufficient to account for enhanced habitual responding in CUD.

Computational modelling reveals contrasting effects on reinforcement learning and cognitive flexibility in stimulant dependence and obsessive–compulsive disorder: remediating effects of dopaminergic D2/3 receptor agents

Publication: Psychopharmacology

Kanen JW, Ersche KD, Fineberg NA, Robbins TW, Cardinal RN

20 July 2019


Disorders of compulsivity such as stimulant use disorder (SUD) and obsessive-compulsive disorder (OCD) are characterised by deficits in behavioural flexibility, some of which have been captured using probabilistic reversal learning (PRL) paradigms.

This study used computational modelling to characterise the reinforcement learning processes underlying patterns of PRL behaviour observed in SUD and OCD and to show how the dopamine D2/3 receptor agonist pramipexole and the D2/3 antagonist amisulpride affected these responses.

The researchers applied a hierarchical Bayesian method to PRL data across three groups: individuals with SUD, individuals with OCD, and healthy controls. Participants completed three sessions in which they received placebo, pramipexole, and amisulpride, in a double-blind, placebo-controlled, randomised design.

The researchers compared seven models using a bridge sampling estimate of the marginal likelihood.
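
As a rough illustration of how marginal likelihood estimates can be used to compare models, the sketch below converts log marginal likelihoods into posterior model probabilities, assuming equal prior probability for every model. The numbers are hypothetical and are not taken from the study, and the bridge sampling step that would produce such estimates is not shown.

```python
import math


def posterior_model_probabilities(log_marginal_likelihoods):
    """Posterior probability of each model, assuming equal prior probabilities.

    Works in log space and subtracts the maximum before exponentiating,
    so very negative log marginal likelihoods do not underflow to zero.
    """
    max_log = max(log_marginal_likelihoods)
    weights = [math.exp(value - max_log) for value in log_marginal_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for three candidate models.
hypothetical_log_ml = [-1050.0, -1042.0, -1040.0]
probabilities = posterior_model_probabilities(hypothetical_log_ml)

# Index of the model with the highest posterior probability.
best_model = max(range(len(probabilities)), key=probabilities.__getitem__)
```

The model with the largest marginal likelihood receives the highest posterior probability; the difference between two log marginal likelihoods is the log Bayes factor between those models.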

The results showed that stimulus-bound perseveration, a measure of the degree to which participants responded to the same stimulus as before irrespective of outcome, was significantly increased in SUD, but decreased in OCD, compared to controls (on placebo).

Individuals with SUD also exhibited reduced reward-driven learning, whilst both the SUD and OCD groups showed increased learning from punishment (nonreward).
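
The three quantities described above (reward-driven learning, punishment-driven learning, and stimulus-bound perseveration) can be sketched as separate parameters of a simple choice model. This is an illustrative sketch only, not the winning model from the study; the function and parameter names are hypothetical.

```python
import math


def update_value_split(value, rewarded, reward_rate, punish_rate):
    """Delta-rule update with separate learning rates for reward and nonreward."""
    outcome = 1.0 if rewarded else 0.0
    rate = reward_rate if rewarded else punish_rate
    return value + rate * (outcome - value)


def sticky_choice_probabilities(values, previous_choice, sensitivity, stickiness):
    """Softmax choice with a stickiness bonus for the previously chosen stimulus.

    Positive stickiness favours repeating the last choice irrespective of
    outcome; negative stickiness favours switching. Pass previous_choice=None
    on the first trial, when there is no earlier choice to repeat.
    """
    scores = []
    for index, value in enumerate(values):
        bonus = stickiness if index == previous_choice else 0.0
        scores.append(sensitivity * value + bonus)
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two stimuli with equal learned value; stimulus 0 was chosen on the last trial.
repeat_bias = sticky_choice_probabilities(
    [0.5, 0.5], previous_choice=0, sensitivity=2.0, stickiness=1.0
)
```

With equal values, any preference for stimulus 0 in `repeat_bias` comes from the stickiness term alone, which is why perseveration can be estimated separately from the two learning rates.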

Pramipexole and amisulpride had similar effects on the control and OCD groups; both increased punishment-driven learning. These D2/3-modulating drugs affected the SUD group differently, remediating reward-driven learning and reducing aspects of perseverative behaviour, amongst other effects.

The research showed how perseverative tendencies and reward- and punishment-driven learning differentially contribute to PRL in SUD and OCD.

D2/3 agents modulated these processes and remediated deficits in SUD in particular, which may inform therapeutic effects.

Computational psychopharmacology: a translational and pragmatic approach

Publication: Psychopharmacology

Robbins TW, Cardinal RN

4 April 2019


Psychopharmacology needs novel quantitative measures and theoretical approaches based on computational modelling that can be used to help translate behavioural findings from experimental animals to humans, including patients with neuropsychiatric disorders.

Here, the researchers present a brief review that exemplifies this approach, applying it to recently published studies of the effects of manipulating central dopaminergic and serotoninergic systems in rodents and marmoset monkeys, and drawing possible comparisons with healthy human volunteers receiving systemic agents and with patients with depression and schizophrenia.

Behavioural effects of central depletions of dopamine or serotonin in monkeys in probabilistic learning paradigms are characterised further by computational modelling methods and related to rodent and human data.

Several examples are provided of the power of computational modelling to derive new measures and reappraise conventional explanations of regional neurotransmitter depletion and other drug effects, whilst enhancing construct validation in patient groups. Specifically, effects are shown on such parameters as ‘stimulus stickiness’ and ‘side stickiness’, which occur over and above effects on standard parameters of reinforcement learning, reminiscent of some early innovations in data analysis in psychopharmacology.

Computational modelling provides a useful methodology for further detailed analysis of behavioural mechanisms that are affected by pharmacological manipulations across species and will aid the translation of experimental findings to understand the therapeutic effects of medications in neuropsychiatric disorders, as well as facilitating future drug discovery.

Dopamine D2-like receptor stimulation selectively blocks learning from losses in visual and spatial reversal learning in the rat: behavioural and computational evidence

Publication: Psychopharmacology

Alsiö J, Phillips BU, Sala Bayo J, Nilsson SRO, Calafat-Pla TC, Rizwand A, Plumbridge J, López-Cruz L, Dalley JW, Cardinal RN, Mar AC, Robbins TW

19 June 2019


Dopamine D2-like receptors (D2R) are important drug targets in schizophrenia and Parkinson’s disease, but D2R ligands also cause cognitive inflexibility such as poor reversal learning. The specific role of D2R in reversal learning remains unclear.

Here, researchers tested two hypotheses: that D2R agonism impairs reversal learning by blocking negative feedback, and that antagonism of D1-like receptors (D1R) impairs learning from positive feedback.

Male Lister Hooded rats were trained on a novel visual reversal learning task. Performance on “probe trials”, during which the correct or incorrect stimulus was presented with a third, probabilistically rewarded (50% of trials) and therefore intermediate stimulus, revealed individual learning curves for the processes of positive and negative feedback.

The effects of D2R and D1R agonists and antagonists were evaluated. A separate cohort was tested on a spatial probabilistic reversal learning (PRL) task after D2R agonism.

Computational reinforcement learning modelling was applied to choice data from the PRL task to evaluate the contribution of latent factors.

The team found that D2R agonism with quinpirole dose-dependently impaired both visual reversal and PRL. Analysis of the probe trials on the visual task revealed a complete blockade of learning from negative feedback at the 0.25 mg/kg dose, while learning from positive feedback was intact. Estimated parameters from the model that best described the PRL choice data revealed a steep and selective decrease in learning rate from losses. D1R antagonism had a transient effect on the positive probe trials. They concluded that D2R stimulation impairs reversal learning by blocking the impact of negative feedback.

Traffic exposures, air pollution and outcomes in pulmonary arterial hypertension: A United Kingdom cohort study analysis

Publication: European Respiratory Journal

Sofianopoulou E, Kaptoge S, Graf S, Hadinnapola C, Treacy CM, Church C, et al.

30 May 201

Body mass index and all cause mortality in HUNT and UK Biobank studies: linear and non-linear mendelian randomisation analyses

Publication: BMJ

Sun YQ, Burgess S, Staley JR, Wood AM, Bell S, Kaptoge SK, et al.

26 March 2019

Mendelian Randomization Study of ACLY and Cardiovascular Disease

Publication: The New England Journal of Medicine

Ference BA, Ray KK, Catapano AL, Ference TB, Burgess S, Neff DR, et al.

14 March 2019

Association of genetically predicted testosterone with thromboembolism, heart failure, and myocardial infarction: mendelian randomisation study in UK Biobank

Publication: BMJ

Luo S, Au Yeung SL, Zhao JV, Burgess S, Schooling CM

6 March 2019

Assessing the causal association of glycine with risk of cardio-metabolic diseases

Publication: Nature Communications

Wittemans LBL, Lotta LA, Oliver-Williams C, Stewart ID, Surendran P, Karthikeyan S, et al.

5 March 2019

New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries

Publication: Nature Genetics

Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, et al.

25 February 2019
