The quality framework

A Quality Framework is used to compare the strengths and weaknesses of different questionnaires and measurement scales. Scales should be validated in different populations and by different investigators. The name given to a scale can also be unintentionally misleading and lead to the scale being misused. RESULT puts forward the best scales based on these criteria…

User-friendly
Good psychometrics
Meaningful to stakeholders
Good in routine practice

Some pitfalls to look out for…

Using scales for the wrong reason

Many scales in common usage were never intended as routine outcome measures. The Severity of Alcohol Dependence Questionnaire, SADQ, is an example. It was one of the first dependence measures and was originally designed to assess a problem drinker’s suitability for controlled drinking. Seen in this light, it is clear why the timeframe is set to be the ‘most recent period of heavy drinking’ and why the items emphasise the physical elements of dependence: withdrawal symptoms, which could easily be identified as triggers for drinking. It is an excellent scale for its intended purpose but not so good as a general measure of dependence.

The Severity of Opiate Dependence Questionnaire, SODQ, was based on the same structure as the SADQ. At the time, controlled drinking was controversial and there was no thought of controlled heroin use. The SODQ developers seemed to acknowledge this as an afterthought by adding the five questions of the Severity of Dependence Scale, SDS, to the SODQ as a separate ‘appendix’ in order to include ‘the psychological elements of dependence’. So, by these measures, SADQ/SODQ + SDS = Dependence.

Using screening tools as outcome measures

A different example is the use of screening tools as outcome measures. The Alcohol Use Disorders Identification Test, AUDIT, and the Drug Use Disorders Identification Test, DUDIT, are both excellent screening tools (as they were designed to be). The difficulty is that screening requires sensitivity to be concentrated at the lower, general-population levels of consumption and problems, so these tools may be insensitive to changes at the heavy consumption and severe problem levels expected in problem drinking or drug-taking populations. The lesson is: be sure to understand what a scale was designed to do and how it has developed or been superseded by new, improved scales.

Using scales that are not what they seem

Not all scales are entirely what they claim to be. The Treatment Outcomes Profile, TOP, is an example. On the face of it, it is a composite outcome tool for addiction. The TOP was developed at the behest of the National Treatment Agency in the UK, which was set up to circumvent the Department of Health and has since been disbanded. It was imposed on agencies and disliked by most practitioners for this and a variety of other reasons. In a competitive commissioning world there are concerns that the TOP is easy to ‘game’ and susceptible to perfunctory completion. The secret is that it had been structured to mirror the government’s agenda and provide soundbites for politicians, and it served this purpose. Other composite measures, such as the widely used Addiction Severity Index, ASI, were available all along.

A slightly different example also stems from the competitive nature of addiction commissioning, which encourages exaggerated claims, usually by services and usually about good outcomes. An example relating to scales is the claim from King’s College London that the Substance Use Recovery Evaluator, SURE, had ‘unprecedented input from people in recovery’, making it sound unique and rather special. This is not actually true: the Addiction Recovery Questionnaire, ARQ, used a similar methodology, has more psychometric data, and was published a year earlier. The lesson is: beware measures that are politically driven or self-promoting.

The Quality Framework gives a score for each of the scale requirements…

Psychometrics

“Does the scale measure what it claims to?” max score 8

Content Validity is the extent to which the domain in question is comprehensively sampled. It requires clear item selection involving more than one expert and members of the target population in developing the scale items. Scores for content validity:

Unclear description = 0
Clear description by developers = 1
Clear description involving experts = 2

Compare to benchmarks

Criterion Validity is the extent to which a scale relates to a gold standard. The gold standard must be a genuine and comparable measure, and the correlation with it should be 0.7 or higher.

Convergent Validity is the extent to which a scale relates to another scale measuring something with which there is a theoretical likelihood of an association. Scores should correlate at 0.3 to 0.6.

No or inadequate comparisons = 0
Single criterion or convergent comparison = 1
Multiple comparisons = 2

Internal consistency is the extent to which items in a (sub)scale are intercorrelated, thus measuring the same construct

No or inadequate analysis = 0
Adequate factor analysis or Cronbach's alpha of at least 0.7 = 1
Factor structure confirmed in different populations = 2
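Cronbach's alpha compares the sum of the individual item variances with the variance of the total score: alpha = k/(k-1) × (1 - Σ item variances / total-score variance), where k is the number of items. The minimal sketch below assumes complete responses (no missing items) and uses sample variances throughout; `cronbach_alpha` and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` holds one row per respondent and one column per item;
    every row must have the same number of items (no missing data).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up responses: five respondents answering a four-item scale scored 0-3
example = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [0, 1, 0, 0],
    [1, 1, 2, 1],
    [3, 2, 3, 3],
]
print(f"alpha = {cronbach_alpha(example):.2f}")
```

A value of 0.7 or above meets the threshold for 1 point; confirming the factor structure in different populations is a separate analysis.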

Discriminant validity is the ability of a questionnaire to detect clinically important changes over time or to determine if a condition exists or not

No or inadequate analysis = 0
Area under the ROC curve of at least 0.7, clinically significant change shown, or another adequate statistic = 1
Multiple discriminations = 2
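The area under the ROC curve equals the probability that a randomly chosen person with the condition scores higher than a randomly chosen person without it, with ties counted as half (the Mann-Whitney formulation). The minimal sketch below assumes higher scores indicate the condition; `roc_auc` and the example scores are hypothetical, and the pairwise loop favours clarity over speed.

```python
def roc_auc(case_scores, non_case_scores):
    """AUC as the share of (case, non-case) pairs ranked correctly.

    Assumes higher scores indicate the condition; tied pairs count as half.
    """
    if not case_scores or not non_case_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case in case_scores:
        for non_case in non_case_scores:
            if case > non_case:
                wins += 1.0
            elif case == non_case:
                wins += 0.5
    return wins / (len(case_scores) * len(non_case_scores))


# Made-up scores: people meeting diagnostic criteria vs. those who do not
auc = roc_auc([22, 30, 18, 27], [8, 15, 19, 5, 12])
print(auc)  # 19 of 20 pairs ranked correctly -> 0.95
```

An AUC of 0.5 means the scale discriminates no better than chance; the framework's threshold for 1 point is 0.7.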

Usability

“Can the scale easily be applied to routine practice?” max score 6

Number of Items and Completion Time are related indicators of user acceptability. The total assessment package should be completed in a reasonable period of time, say 10-20 minutes. It works well for a scale to fit on a single A4 sheet. An assessment package is likely to be made up of 3-4 scales.

21+ items = 0
13-20 items = 1
1-12 items = 2

Clinically Significant Change is the gold standard of psychological treatment outcome. The calculation requires a value for reliable change (the measurement error) and values for a well-functioning population completing the scale.

Neither value published = 0
One value published = 1
Both values published = 2
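The text does not name a calculation method; one widely used approach, assumed here, is that of Jacobson and Truax (1991). A reliable change index (RCI) beyond ±1.96 counts as reliable, and clinically significant change additionally requires the score to cross a cutoff, c, lying between the clinical and well-functioning population means. In this minimal sketch the `Norms` fields, the function names and the example values are hypothetical; the reliability coefficient stands in for the published measurement-error value, and lower scores are assumed to mean fewer problems.

```python
import math
from dataclasses import dataclass


@dataclass
class Norms:
    """Hypothetical published values for one scale."""
    clinical_mean: float
    clinical_sd: float
    functional_mean: float   # well-functioning population
    functional_sd: float
    reliability: float       # e.g. test-retest r or Cronbach's alpha


def standard_error_of_difference(norms):
    # Measurement error of a single score, then of a difference between two scores
    se = norms.clinical_sd * math.sqrt(1 - norms.reliability)
    return math.sqrt(2 * se ** 2)


def cutoff_c(norms):
    # Point between the two population means, weighted by their standard deviations
    return ((norms.functional_sd * norms.clinical_mean
             + norms.clinical_sd * norms.functional_mean)
            / (norms.clinical_sd + norms.functional_sd))


def classify_change(pre, post, norms):
    """Classify a pre/post change, assuming lower scores mean fewer problems."""
    rci = (post - pre) / standard_error_of_difference(norms)
    if rci > 1.96:
        return "reliable deterioration"
    if rci >= -1.96:
        return "no reliable change"
    # Reliable improvement: is it also clinically significant?
    if pre > cutoff_c(norms) >= post:
        return "clinically significant change"
    return "reliable improvement"


# Usage with made-up norms
norms = Norms(clinical_mean=30, clinical_sd=8, functional_mean=10,
              functional_sd=6, reliability=0.9)
print(classify_change(32, 15, norms))  # clinically significant change
```

Without published values for a well-functioning population the cutoff cannot be computed, so only reliable change can be reported; this is why the framework awards more points when both values are published.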

It is important to have the value for reliable change, not just as a way of demonstrating that the scale measures change, but to show how much a score needs to change before the change can be considered reliable.
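Assuming the Jacobson and Truax approach (the text does not name a method), the smallest change that counts as reliable at the 95% level follows from the scale's standard deviation in the clinical sample, $s_1$, and its reliability, $r_{xx}$:

$$
\begin{aligned}
SE &= s_1 \sqrt{1 - r_{xx}} \\
S_{\text{diff}} &= \sqrt{2\,SE^2} \\
\Delta_{\min} &= 1.96\, S_{\text{diff}}
\end{aligned}
$$

For a hypothetical scale with $s_1 = 8$ and $r_{xx} = 0.9$: $SE = 8\sqrt{0.1} \approx 2.53$, $S_{\text{diff}} \approx 3.58$ and $\Delta_{\min} \approx 7.0$, so a score would need to change by about 7 points before the change could be called reliable.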

Ease of Use means scales are better if they can be i) administered and ii) scored with minimal training and without complex scoring procedures, and if they are iii) easy to understand. Even so, users should always have some tutoring in the correct interpretation of measures.

Training and knowledge to interpret are both needed = 0
Minimal training and meaning is plain = 1
Minimal training and user guide available = 2

NOTE: it is often desirable to use a scale that can be applied to most situations; this may be a factor in determining suitability.

Adoptability

“Is it practical for an agency to use the scale?” max score 8

Independent Evaluation is important, but many scales rely on only one publication by the scale’s original creators. Independent evaluation strengthens validation.

Authors’ publications only = 0
One independent publication = 1
Several independent publications = 2

Cross Cultural Evaluation strengthens validation and may be crucial to generalising a scale’s use depending on the target population

None found = 0
One or limited culturally diverse groups = 1
Several culturally diverse groups = 2

Language Check is evidence of testing for plain English (or other language) or service user feedback on the wording of the scale items and instructions

Not found = 0
Evidence of some checks = 1
Formal and User checks = 2

It is better if scales are in the Public Domain or have a Creative Commons licence, so that the scientific community and clinicians can use them freely

Public Domain or Creative Commons licence = 1

It is better if there are no copyright constraints or fees for using scales

No copyright constraints or fees = 1
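Totalling the framework is simple addition: Psychometrics (max 8), Usability (max 6) and Adoptability (max 8) give a maximum of 22. In the minimal sketch below the dictionary keys and function names are hypothetical labels for the criteria listed above; the item-count banding follows the Usability scores, and each rating is checked against its criterion's maximum.

```python
# Maximum score per criterion, taken from the Quality Framework lists above.
# The dictionary keys are hypothetical labels for those criteria.
MAX_SCORES = {
    "psychometrics": {"content_validity": 2, "benchmarks": 2,
                      "internal_consistency": 2, "discriminant_validity": 2},
    "usability": {"number_of_items": 2, "clinically_significant_change": 2,
                  "ease_of_use": 2},
    "adoptability": {"independent_evaluation": 2, "cross_cultural": 2,
                     "language_check": 2, "open_licence": 1, "no_fees": 1},
}


def item_count_score(n_items):
    """Usability banding: 1-12 items = 2, 13-20 = 1, 21+ = 0."""
    if n_items < 1:
        raise ValueError("a scale needs at least one item")
    if n_items <= 12:
        return 2
    if n_items <= 20:
        return 1
    return 0


def quality_framework_score(ratings):
    """Sum per-domain and overall scores, checking each rating is in range."""
    totals = {}
    for domain, criteria in MAX_SCORES.items():
        subtotal = 0
        for criterion, maximum in criteria.items():
            value = ratings[domain][criterion]
            if not 0 <= value <= maximum:
                raise ValueError(f"{domain}.{criterion} must be 0-{maximum}, got {value}")
            subtotal += value
        totals[domain] = subtotal
    totals["total"] = sum(totals.values())  # maximum 8 + 6 + 8 = 22
    return totals


# Usage with made-up ratings for a 10-item scale
ratings = {
    "psychometrics": {"content_validity": 2, "benchmarks": 1,
                      "internal_consistency": 2, "discriminant_validity": 1},
    "usability": {"number_of_items": item_count_score(10),
                  "clinically_significant_change": 2, "ease_of_use": 1},
    "adoptability": {"independent_evaluation": 1, "cross_cultural": 0,
                     "language_check": 1, "open_licence": 1, "no_fees": 1},
}
print(quality_framework_score(ratings))
# {'psychometrics': 6, 'usability': 5, 'adoptability': 4, 'total': 15}
```

Keeping the domain subtotals alongside the total makes it easier to see where a scale is weak, for example good psychometrics but poor adoptability.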

The outcome measures selected for RESULT all have high Quality Framework scores and notably have independent evaluation and published values for calculating clinically significant change.

The selected measures for associated mental health problems also have good Quality Framework scores but not the values for calculating clinically significant change.

Quality framework - scientific articles

Hermann RC and Palmer RH (2002) Common Ground: A Framework for Selecting Core Quality Measures for Mental Health and Substance Abuse Care. Psychiatric Services 53: 281-287

Scientific Advisory Committee of the Medical Outcomes Trust (2002) Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research 11: 193-205

Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, Bouter LM, de Vet HCW (2007) Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology 60: 34-42