THE NEED FOR ROBUST APPRAISAL OF RESEARCH IN ARTS AND HEALTH: STATISTICALLY SOPHISTICATED BUT SUBSTANTIVELY SUPERFICIAL (GUEST BLOG)


25 Jul


A GUEST BLOG BY DR STEPHEN CLIFT

In Stephen's penultimate guest blog, for now, he provocatively argues that, sometimes at least, 'research in arts and health can produce findings that are banal, trivial or spurious'. His final guest blog in this series will be published tomorrow.

Stephen Clift (BA, PhD, PFRSPH) is Professor Emeritus, Canterbury Christ Church University, and former Director of the Sidney De Haan Research Centre for Arts and Health. He is a Professorial Fellow of the Royal Society for Public Health (RSPH) and is also Visiting Professor in the International Centre for Community Music, York St John University. Stephen has worked in the field of health promotion and public health for over thirty years, and has made contributions to research, practice and training on HIV/AIDS prevention, sex education, international travel and health and the health promoting school in Europe. His interests relate to arts and health and particularly the potential value of group singing for health and wellbeing. Stephen is one of the founding editors of the journal Arts & Health: An international journal for research, policy and practice. He was the founding Chair of the RSPH Special Interest Group for Arts, Health and Wellbeing, and a founding trustee of Arts Enterprise with a Social Purpose (AESOP). He is also co-editor with Professor Paul Camic of the Oxford Public Health Textbook on Creative Arts, Health and Wellbeing published in November 2015. Currently, he is working on developing a series of provocations in arts and health research, a special collection of critical papers on arts and health with Frontiers in Psychology, and a special issue of the International Journal for Community Music on the impacts of the COVID-19 pandemic.

THE NEED FOR ROBUST APPRAISAL OF RESEARCH IN ARTS AND HEALTH: STATISTICALLY SOPHISTICATED BUT SUBSTANTIVELY SUPERFICIAL

Introduction

The paper by Burns and Van Der Meer, discussed in the previous post, may well be a ‘strawman’, as its limitations and weaknesses are easy to identify – but the serious question it raises is whether the criticisms made of this paper can be made more widely of the growing literature on arts and health.

It may well be heretical to turn attention now to the work of Daisy Fancourt and her colleagues in this regard, but I have similar concerns in relation to a series of papers which address the role of ‘artistic creative activities’ in ‘regulating emotions’. The charge that research in arts and health can produce findings that are banal, trivial or spurious deserves to be tested through critical scrutiny of ‘cutting-edge scientific research’ in the field. The work of Fancourt also raises further questions regarding the ways in which ‘artistic creative activities’ and ‘emotion regulation’ are conceptualised and ‘measured’.

The work of Fancourt and colleagues on artistic creative activities and emotion regulation

Fancourt, Garnett, Spiro, West and Müllensiefen (2019) published a paper in the prestigious journal PLoS ONE, describing the development and validation of what they called the ‘Emotion Regulation Strategies for Artistic Creative Activities Scale (ERS-ACA)’. Subsequent papers use this scale to compare people with and without symptoms of ‘depression’ (Fancourt and Ali, 2019), to compare participants in live and virtual singing experiences (Fancourt and Steptoe, 2019), and to look at the role of demographic and other factors (Fancourt, Garnett and Müllensiefen, 2020). These papers are not discussed here but deserve the same critical scrutiny as the core paper.

Data for the Fancourt et al. paper were collected as part of The Great British Creativity Test, ‘a large Citizen Science study’ undertaken in collaboration with the BBC. Just under 48,000 people took part in an online survey in which they identified ‘artistic creative activities’ they were involved in and then responded to a range of statements designed to assess the extent to which engagement in a chosen activity served to ‘regulate’ their emotions. I will outline the Fancourt et al. (2019) paper briefly and then make some critical comments.

Summary of Fancourt, Garnett, Spiro, West and Müllensiefen (2019)

The paper begins with a wide-ranging and sophisticated review of previous literature on ‘emotional regulation strategies’ and their relationship to ‘creative activities’ before arriving at the paper’s principal aim: ‘…the development and validation of a new model and scale to measure the Emotion Regulation Strategies for Artistic Creative Activities (ERS-ACA).’ (p.7)

The items used for scale development were ‘created’ by three of the authors:

‘Three researchers (DF, CG and NS) independently proposed 4-5 inventory items for each of the ERSs identified in previous studies (including acceptance, concentration, discharge, distraction, perceived sense of self, problem solving, reappraisal, reflection, rumination and suppression). Each item was worded so that it could be endorsed to varying degrees on a rating scale.’ (p.7)

This process produced 170 ‘questions’ which were progressively reduced, firstly through discussion (to 45 items), and then empirically through survey one (to 31 items) and survey two (to just 18 items). While the authors indicate that both positive and negative statements were generated, only two negative statements are included in the 31 items that made it through to the second survey, and both are dropped following analysis. Thus all 18 items in the final questionnaire are positively worded. The final form of the scale was then assessed in terms of validity and reliability through survey three.

From an ‘emotions’ point of view, the final set of items refer to ‘anxiety’ or ‘worry’ (four items) or ‘unwanted feelings’ (two items), ‘negative things’ (one item), ‘things that are bothering me’ (one item), ‘things that are on my mind’ (one item) or ‘what is going on in my life’ (one item). While earlier lists of items included reference to ‘stress’, ‘anger’ and ‘sadness or misery’, these were pruned away as the development of the scale progressed. The final scale therefore is essentially concerned with the extent to which participants experienced artistic creative activities as reducing anxiety or worry or other unspecified negative feelings.

In studies one and two, respondents were asked to rate their disagreement-agreement with the emotion regulation statements ‘when engaging in [name the artistic creative activity].’ The quotation here comes from Supplementary file S1 associated with the paper, but it is not clear how the ‘artistic creative activity’ to be rated was specified, and as the original questionnaire is no longer available to view online, this cannot be checked. Uncertainty is further compounded by the fact that in a further paper, on ‘depression’ and emotional regulation strategies, Fancourt and Ali (2019) state that they ‘asked participants to focus on the creative activity they felt was most effective at regulating their emotions.’ (p.2)

Factor analysis of the ratings of emotion items (for all participants, and for ratings across all 17 creative activities) revealed a ‘hierarchical’ factor structure with one major factor and three sub-factors.

From study two, Table 2 (p.11) reports on the 17 different ‘favourite creative’ activities as a percentage of participants endorsing them (N = 47,924). Singing, for example, was the most frequently endorsed favourite activity (12.4%), followed by painting, drawing, printmaking or sculpture (12.2%) and gardening (12.0%), down to the least frequently endorsed activities – making films or videos (0.7%, N = 335), and learning or practicing magic tricks or circus skills (0.2%, N = 96). Table 3 reports the final factor analysis and shows that the ‘general factor’ is strongly and consistently defined by all 18 items (coefficients range from 0.48 to 0.71). The three sub-factors are defined by distinct sets of seven, six and five items respectively.

Study three reports evidence on the validity and reliability of the total and sub-scales.

The main problems with this study

The introductory review of theory and research provides no discussion of the character and varieties of human emotion. It is surprising to find no reference to Darwin, nor to post-Darwinian research, pioneered by Ekman. There continues to be debate over the notion of ‘basic emotions’ – but Ekman in his Atlas of Emotions (atlasofemotions.org) focuses on ‘five universal emotions’ - fear, anger, disgust, sadness and enjoyment (happiness) – four of which are negative and only one of which is positive. The emphasis on ‘worry’ and ‘anxiety’ in the Fancourt et al. paper is thus limited in terms of emotional range, but this focus points to personal issues that generate a combination of fear, anger and sadness. Also, when a person is worried or anxious, they are worried or anxious about someone or something – but this study gives no attention to the context, content and extent of people’s worries.
There are also ‘emotions’ or ‘feelings’ other than anxiety or worry that might have been considered, emotions which can involve considerable energy and distress: jealousy and envy, or shame and guilt, or love and hate (all of which are object-directed) – but none of these feature in the discussion.
The category of ‘artistic creative activities’ employed in the study is very broad and includes not only obvious examples of arts activities, such as singing, painting and dancing – but extends to include reading novels, cooking and baking, and gardening, and even ‘learning and performing magic tricks’. The breadth of coverage is surely questionable and is particularly problematic as the analysis of ‘emotion regulation strategies’ is conducted on the complete set of ratings gathered across all of the specific activities identified.
The items for the scale were not derived empirically from interviews or open-ended surveys of people engaged in creative activities in which they were asked about their ‘strategic use’ of such activities in managing their ‘emotions’. Rather, they were ‘invented’ by three of the authors.
The authors say that in the process of creating items for the scale, both positive and negative statements were formulated. However, the details provided of items in the paper show that very few negative items were used in the first survey and that these did not survive pruning following analysis. The final scale therefore consists of 18 positive statements which respondents are required to rate on a five-point scale from ‘strongly disagree’ to ‘strongly agree’. As such, the scale can be regarded as a set of ‘leading’ statements, which are likely to generate answers based on a general positive bias towards the activity the respondent has in mind, moderated only by their inclination to give ‘agree’ or ‘strongly agree’ responses.
A further issue to consider is the extent to which participants in the study were seeking to ‘regulate their emotions’ in the way implied by the title of the questionnaire and the content of the questionnaire items. When people engage in gardening, or when they attend a group to sing, or when they cook – are they actively following ‘a strategy’ to address worries they might have? Surely, they are first and foremost pursuing these activities because they value and enjoy them.
Furthermore, it is not at all clear to what extent people participating in the survey were beset with ‘emotional’ issues they might seek to address. Consider, for example, the item – ‘When engaging in [name the creative artistic activity] I can shake off any anxieties in my life.’ It might be argued that everyone has anxieties in their lives at some point – and so the questionnaire is requiring respondents to answer ‘in general’ from their experience of the activity (whatever it is they have in mind) over an undefined period, on the impact it has on their unspecified anxieties. It is relevant to ask how widespread anxiety is in the general population. What proportion of participants in this survey may have had recent experience of anxieties against which to judge the effects of their activity? In the UK, the Office for National Statistics regularly gathers information on anxiety experienced ‘yesterday’ as part of the National Indicators of Wellbeing surveys. Data gathered over the last eight years show that approximately 40% of the population reported little or no anxiety the day before, and a further 23% reported very low levels of anxiety. On the other hand, approximately 20% of respondents reported relatively high levels of anxiety.
Apart from negative feelings and emotions, however, perhaps the most surprising feature of the final scale is that none of the statements relate to positive feelings of happiness, enjoyment, pleasure or feelings of calm and relaxation – all of which are likely to be generated when people engage in artistic creative activities – and all of which are central candidates as mechanisms through which creative engagement may serve to counter negative feelings (see Clift et al. (2009) for a discussion).
The surveys were conducted online, and consequently participants were a convenience sample with obvious sources of bias and non-representativeness: primarily white, well-educated, partnered, financially secure and of course motivated to engage with an online survey. Indeed, from the demographic details reported, the study counts as WEIRD research (as respondents are primarily Western, Educated, and from an Industrialized, Rich, and Democratic country) (Hendriks et al., 2019). None of the evidence on ‘validity’ and ‘reliability’ presented in the study addresses, or indeed removes or compensates for, the biases involved.

Looking now at the outcomes of the study, there are four features of the findings which, in my opinion, are banal (and even trivial and spurious):

All the items load on a single general factor. The reasons for this are simple – all items can be read as expressing a ‘positive evaluation’ of the activity participants had in mind, and many of the items are closely synonymous. This is particularly clear for items 1 ‘…I can block out any unwanted thoughts and feelings’ and 7 ‘…it redirects my attention, so I forget unwanted thoughts and feelings’. It is hardly surprising that these two items are strongly correlated. Of course, not all items are as closely synonymous as these – and as a result, after the general factor is accounted for, three ‘residual’ factors do emerge, which appear to be reasonably interpreted, and reflect themes that have often emerged in the arts and health literature. But these ‘sub-factors’ are described at one point as ‘correlated’ and at another as ‘orthogonal’ – and no information is given on the variance accounted for by each factor once the general factor is accounted for.
Supplementary table S3 reports the ‘norms’ for the full scale and gives means for the 10th to 90th centiles in the distribution. The means given are clearly an average of ratings for the 18 items making up the scale, with a scale score of ‘1’ meaning that all items were strongly disagreed with, and ‘5’ indicating that all items were strongly agreed with. A score of ‘3’ would mean that respondents gave ratings clustered around a ‘neither agree nor disagree’ response. What is clear from table S3 is that the lowest 10% of scores are below a rating of ‘3’, but for all other centiles the mean scores lie between the neutral point and the positive end of the rating scale. Above the 50th percentile, the mean rating is ‘4’ (rounded up to the nearest whole value) and changes are very slight from centile band to centile band. Not surprisingly, the movement across the range shows increasing levels of agreement – but essentially the bulk of the variability is on the positive side of the scale.
Supplementary table S2 reports the means on the total scale and subscales for each ‘favourite activity.’ Irrespective of the activity, the mean scores on the first factor are between 3 and 4, which indicates that respondents are ‘agreeing’ that the activity helps them in the terms expressed by the scale items – i.e. that it helps to ‘reduce negative feelings’ including worry and anxiety. No information is given on the proportions of participants who disagreed with the items in the scale, i.e. who said that the activity they had in mind did not help them reduce anxiety or worry.
Comparisons between different activities are listed in table S2 and show little or no differentiation. There appears, for example, to be no difference on the general factor between ‘singing’ and ‘gardening’ on the effects reported. More generally, there is a sense that irrespective of the activity, people gave positive feedback on the ‘beneficial’ effects they experienced from participation. No analysis is reported on whether any of the differences between activities are ‘significant’ – and I suspect that even if statistically ‘significant’ differences emerged, they would be trivial, and a function of the large samples involved. In addition, there appears to be no differentiation with respect to the specific ‘strategies.’ One might think, for example, that some activities would be more effective than others in relation to ‘avoidance’, but in general, respondents appear to report that they engage in similar levels of avoidance, approach and self-development simultaneously, for all activities.
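
The point about large samples producing ‘significant’ but trivial differences can be made concrete with a minimal sketch. The group sizes below are approximated from the percentages reported for singing (12.4%) and gardening (12.0%) of 47,924 participants. The two means and the common standard deviation are hypothetical values chosen purely for illustration; they are not figures taken from table S2, and the simple large-sample z test is an assumption of this sketch rather than an analysis the authors report.

```python
import math


def two_sample_z(mean_a, mean_b, sd, n_a, n_b):
    """Large-sample z test for a difference in means, assuming a common SD.

    Returns the z statistic, the two-sided p-value and Cohen's d.
    """
    standard_error = sd * math.sqrt(1 / n_a + 1 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    # Standardised effect size: difference in means in SD units.
    d = (mean_a - mean_b) / sd
    return z, p, d


N_TOTAL = 47924  # total sample reported for study two
n_singing = round(0.124 * N_TOTAL)    # approximate, from the reported 12.4%
n_gardening = round(0.120 * N_TOTAL)  # approximate, from the reported 12.0%

# HYPOTHETICAL values: a 0.05-point gap on a 1-5 scale, with SD 0.7.
z, p, d = two_sample_z(3.65, 3.60, 0.7, n_singing, n_gardening)

print(f"n = {n_singing} vs {n_gardening}")
print(f"z = {z:.2f}, two-sided p = {p:.5f}, Cohen's d = {d:.3f}")
```

Under these assumed values, a gap of one twentieth of a scale point passes conventional significance thresholds, while the standardised effect size sits well below the 0.2 usually described as ‘small’. With groups of several thousand, statistical significance says very little about whether a difference matters.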

Summing up

In terms of conceptual, theoretical, methodological and analytical sophistication, the work of Fancourt et al. (2019) appears to stand head and shoulders above the study by Burns and Van Der Meer (2019) discussed in the previous blog. Nevertheless, there is not much to choose between them in terms of their substance. They essentially say the same thing. If people voluntarily engage in activities they value and enjoy, they will say, if asked, that they benefit from them.

This paper is a masquerade: statistically sophisticated, but substantively superficial.

Challenges

To the authors of the study reviewed: Do you accept the critique made of your work? If not, what defence would you give in response to the criticisms made?
To the Ethics Committee that approved this study: In the course of considering the proposal for this study, were any of the criticisms made above anticipated by reviewers? Given the criticisms made above, would you still give the study ethical approval?
To the editors of the journal PLoS ONE: Were any of the criticisms made above raised by reviewers and put to the authors? Given the criticisms made above, what view do you have now on the decision to publish this research?
To researchers in arts and health: Do you share the concerns raised over the way in which ‘artistic creative activities’ and ‘emotional regulation’ are conceptualised? Do you agree with the judgements made that the findings from this study are of little scientific credibility and practical value? Can you identify other studies which make little in the way of a contribution to knowledge and practice in the field of arts and health?

Sources

Clift, S., Hancox, G., Morrison, I., Bärbel, H., Kreutz, G. and Stewart, D. (2009) What do singers say about the effects of choral singing on physical health? - Findings from a survey of choristers in Australia, England and Germany, Proceedings of ESCOM 2009: 7th Triennial Conference of European Society for the Cognitive Sciences of Music, https://jyx.jyu.fi/handle/123456789/20854

Fancourt, D. and Ali, H. (2019) Differential use of emotion regulation strategies when engaging in artistic creative activities amongst those with and without depression, Scientific Reports, 9, 1-9. https://pubmed.ncbi.nlm.nih.gov/31289298/

Fancourt, D., Garnett, C., Spiro, N., West, R. and Müllensiefen, D. (2019) How do artistic creative activities regulate our emotions? Validation of the Emotion Regulation Strategies for Artistic Creative Activities Scale (ERS-ACA), PLoS ONE, 14, 2, 1-22. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0211362

Fancourt, D. and Steptoe, A. (2019) Present in body or just in mind: Differences in social presence and emotion regulation in live vs. virtual singing experiences, Frontiers in Psychology, 10, 778. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00778/full

Hendriks, T., Warren, M.A., Schotanus-Dijkstra, M., Hassankhan, A., Graafsma, T. et al. (2019) How WEIRD are positive psychology interventions? A bibliometric analysis of randomized controlled trials on the science of well-being, The Journal of Positive Psychology, 14, 489-501. https://www.tandfonline.com/doi/full/10.1080/17439760.2018.1484941?needAccess=true
