TECHNICAL INFORMATION ::: the Quality Framework
objective criteria to assess and compare the quality of scales evaluated for usefulness in routine practice



This is about the practicalities and the generalisability of the measure (6 criteria, max score = 9)

Number of Items and Completion Time are related indicators of user acceptability. The total assessment package should be completed in a reasonable period of time, say 10-20 minutes. An assessment package is likely to be made up of 3-4 scales. If no completion time is stated, a scale with fewer than 20 items is presumed to take less than 4 minutes.

  • >5min: long = 0

  • <4min: brief = 1
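As an illustration only, the completion-time rule can be sketched in Python. The function name and the use of None are assumptions made for this sketch. The rubric gives no score for times between 4 and 5 minutes, or for scales of 20 or more items with no stated time, so the sketch returns None in those cases rather than guessing.

```python
from typing import Optional


def completion_time_score(minutes: Optional[float] = None,
                          n_items: Optional[int] = None) -> Optional[int]:
    """Return 1 (brief), 0 (long), or None where the rubric gives no score."""
    if minutes is None:
        # Fallback from the rubric: fewer than 20 items is presumed < 4 min.
        if n_items is not None and n_items < 20:
            return 1
        return None  # no stated time and no presumption applies
    if minutes < 4:
        return 1  # brief
    if minutes > 5:
        return 0  # long
    return None  # 4 to 5 minutes: not covered by the rubric


print(completion_time_score(minutes=3))    # brief
print(completion_time_score(n_items=12))   # presumed brief
print(completion_time_score(minutes=4.5))  # undecided
```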

Independent Evaluation - many scales rely on only one publication by the original creators of the scale. Independent evaluation strengthens validation.

  • Authors only: poor = 0

  • One independent publication: fair = 1

  • Several independent publications: good = 2

Cross Cultural Evaluation strengthens validation and may be crucial to generalising a scale’s use depending on the target population.

  • None found: n/k = 0

  • One or limited culturally diverse groups: fair = 1

  • Several culturally diverse groups: good = 2

Language Check is evidence of testing for plain English (or the equivalent in another language) or of service user feedback on the wording of the scale items and instructions.

  • Not found: n/k = 0

  • Evidence of some check: fair = 1

  • Formal and User checks: good = 2

Copyright and Permissions - it is better if scales are in the public domain or have a creative commons licence so that the scientific community and clinicians can use them freely. 

  • Yes - there are copyright constraints on using the scale: copyright = 0

  • No - free to use provided no changes are made: public domain or creative commons = 1

Cost - it is better if scales are free of any charges for their use.

  • No - a charge is made for use of the scale: no = 0

  • Yes - the scale is free to use: yes = 1



This is about how easily and effectively staff can use the data collected in routine practice (5 criteria, max score = 7)

Universal means that the scale can be applied to any, or at the least the main types of, substance misuse and the scale is socioeconomically neutral. Usually scales meeting this criterion are the most desirable. Substance specific scales may be useful for particular assessments. 

  • Only applies to a specific substance: no = 0

  • Applies to all or multiple substances: yes = 1

Clinically Significant Change is the gold standard of psychological treatment outcome. The calculation requires a value for reliable change (the measurement error) and a value for a well-functioning population completing the scale.

  • Neither value: no = 0

  • Reliable change only: part = 1

  • Both values: yes = 2
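The framework does not name a calculation method. One widely used approach is the Jacobson and Truax method, sketched below as an assumption rather than as the framework's own procedure. It shows why both values are needed: the reliable change index uses the measurement error, and the cutoff uses the well-functioning population. All function and parameter names are hypothetical, and the sketch assumes that lower scores mean better functioning unless stated otherwise.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Change score divided by the standard error of the difference."""
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem            # standard error of a difference
    return (post - pre) / s_diff


def csc_cutoff(mean_clin, sd_clin, mean_func, sd_func):
    """Cutoff between clinical and well-functioning populations,
    weighted by the two standard deviations."""
    return (sd_clin * mean_func + sd_func * mean_clin) / (sd_clin + sd_func)


def clinically_significant(pre, post, sd, reliability, cutoff,
                           lower_is_better=True):
    """True if the change is reliable (|RCI| > 1.96), in the direction of
    improvement, and the post score is on the functional side of the cutoff."""
    rci = reliable_change_index(pre, post, sd, reliability)
    reliable = abs(rci) > 1.96
    if lower_is_better:
        return reliable and post < pre and post < cutoff
    return reliable and post > pre and post > cutoff


# Illustrative numbers only, not taken from any real scale.
cutoff = csc_cutoff(mean_clin=30, sd_clin=10, mean_func=10, sd_func=10)
print(cutoff)
print(clinically_significant(pre=30, post=10, sd=10, reliability=0.75,
                             cutoff=cutoff))
```

A scale that reports only a reliability coefficient and standard deviation supports the first function alone, which corresponds to the "part" rating above.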

Measures Domain Limits asks whether floor and ceiling effects limit the usable range of a scale. Note that content validity, by contrast, is about whether the domain itself is fully represented.

  • >15% of respondents score the maximum or minimum score: no = 0

  • <15% of respondents score the maximum or minimum score: yes = 1
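A minimal Python sketch of the 15% check follows. It makes two assumptions that the rubric does not settle: the floor and the ceiling are tested separately, and a proportion of exactly 15% is treated as acceptable. The function name is hypothetical.

```python
def domain_limits_score(scores, scale_min, scale_max, threshold=0.15):
    """Return 0 if a floor or ceiling effect is present, otherwise 1."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor = sum(1 for s in scores if s == scale_min) / n    # share at minimum
    ceiling = sum(1 for s in scores if s == scale_max) / n  # share at maximum
    # Assumption: either effect on its own is enough to score 0.
    return 0 if (floor > threshold or ceiling > threshold) else 1


# Two of ten respondents at the minimum score: 20% exceeds 15%.
print(domain_limits_score([0, 0, 5, 5, 5, 5, 5, 5, 5, 5], 0, 10))
```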

Ease of use - staff should always have some tutoring as to the correct interpretation of measures. Scales for routine use are better if they can be i) administered and ii) scored with minimal training and without the need for complex scoring.

  • Training and complex scoring both needed: low = 0

  • Either training or complex scoring needed: medium = 1

  • No training and no complex scoring: high = 2

Interpretability is the ability to assign qualitative meaning to the quantitative scores. 

  • Difficult to interpret: hard = 0

  • Scores define meaningful categories: easy = 1



This is about the all-important validity of the data collected (7 criteria, max score = 14)

Content Validity is the extent to which the domain in question is comprehensively sampled. It requires a clear description of how the scale items were selected, with more than one expert involved in developing them, and of the target population (Content Validity). The scale should comprehensively represent the construct in question (Face Validity).

Scores for content validity

  • Unclear description: n/k = 0

  • Clear description by developers: fair = 1

  • Clear description involving experts: good = 2

Scores for face validity

  • Mixed items not exclusive to the construct: n/k = 0

  • Partial representation of the construct: fair = 1

  • Comprehensive representation of the construct: good = 2

Construct Validity is the extent to which a scale or subscale measures a single construct which has been derived from theory. Evidence is inferred from different sources: whether the scale, or subscale, measures a single construct and all the items contribute to the score (Internal Consistency); whether scores are correlated with another scale predicted to be related (Convergent Validity); and whether unrelated concepts are actually unrelated (Discriminant Validity).

Scores for internal consistency

  • No analysis found: n/k = 0

  • Inadequate construct validity found: poor = -1

  • Adequate factor analysis or Cronbach's alpha 0.70-0.95: fair = 1

  • Factor structure confirmed in different populations: good = 2
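For readers who want to check a reported alpha against the 0.70-0.95 band, the standard formula for Cronbach's alpha can be sketched as follows. The data layout (one list of respondent scores per item) and the function names are assumptions made for this illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list per scale item, each holding one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(resp) for resp in zip(*items)]    # total score per respondent
    item_var = sum(variance(col) for col in items)  # sum of item variances
    return (k / (k - 1)) * (1 - item_var / variance(totals))


def alpha_in_band(alpha, low=0.70, high=0.95):
    """True if alpha falls inside the band given in the rubric."""
    return low <= alpha <= high


# Made-up responses from five people to a three-item scale.
items = [[1, 2, 3, 4, 5],
         [2, 3, 4, 5, 6],
         [1, 3, 3, 5, 5]]
alpha = cronbach_alpha(items)
print(round(alpha, 3), alpha_in_band(alpha))
```

In this made-up example the items are nearly redundant, so alpha comes out above 0.95 and falls outside the band.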

Scores for convergent validity

  • No analysis found: n/k = 0

  • Inadequate convergent validity: poor = -1

  • Single correlation, 0.30 <= r < 0.70: fair = 1

  • Multiple correlations, 0.30 <= r < 0.70: good = 2
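One possible reading of the convergent validity rule is sketched below. It is an interpretation, not part of the framework: "inadequate" is taken to mean that correlations were reported but none fell in the band from 0.30 up to but not including 0.70, and the absolute value of r is used. The function name is hypothetical.

```python
def convergent_validity_score(correlations):
    """Map a list of reported correlations to a rubric score of 0, -1, 1 or 2."""
    if not correlations:
        return 0  # no analysis found
    # Assumption: the sign of r is ignored when testing the band.
    in_band = [r for r in correlations if 0.30 <= abs(r) < 0.70]
    if not in_band:
        return -1  # analysis reported, but nothing in the band
    return 1 if len(in_band) == 1 else 2


print(convergent_validity_score([]))            # no analysis
print(convergent_validity_score([0.45]))        # single correlation in band
print(convergent_validity_score([0.45, 0.60]))  # multiple correlations in band
```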

Scores for discriminant validity

  • No analysis: n/k = 0

  • Inadequate discriminant validity: poor = -1

  • Area under ROC curve >0.7 or adequate statistic: fair = 1

  • Multiple discriminations: good = 2

Criterion Validity is the extent to which a scale relates to a gold standard (Concurrent Validity). The gold standard must be a genuine and comparable measure which may be a challenge to find. Predictive Validity is the extent to which scores predict future events that are related to the construct.  

Scores for concurrent validity

  • No analysis found: n/k = 0

  • Inadequate concurrent validity: poor = -1

  • Single correlation r >=0.70: fair = 1

  • Consistent or multiple correlations: good = 2

Scores for predictive validity

  • No analysis: n/k = 0

  • Inadequate predictive validity: poor = -1

  • Single predictor r >=0.40: fair = 1

  • Multiple correlations: good = 2
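To show how the three groups of criteria add up to their stated maximum scores (9, 7 and 14), here is a minimal Python sketch. The group labels "practicality", "utility" and "validity" and all criterion keys are hypothetical names chosen for this example. The score ranges are taken from the lists above, and treating a missing criterion as 0 is an assumption.

```python
# (lowest, highest) possible score per criterion, from the lists above.
CRITERIA = {
    "practicality": {  # hypothetical label for the first group
        "completion_time": (0, 1),
        "independent_evaluation": (0, 2),
        "cross_cultural_evaluation": (0, 2),
        "language_check": (0, 2),
        "copyright": (0, 1),
        "cost": (0, 1),
    },
    "utility": {  # hypothetical label for the second group
        "universal": (0, 1),
        "clinically_significant_change": (0, 2),
        "domain_limits": (0, 1),
        "ease_of_use": (0, 2),
        "interpretability": (0, 1),
    },
    "validity": {  # hypothetical label for the third group
        "content_validity": (0, 2),
        "face_validity": (0, 2),
        "internal_consistency": (-1, 2),
        "convergent_validity": (-1, 2),
        "discriminant_validity": (-1, 2),
        "concurrent_validity": (-1, 2),
        "predictive_validity": (-1, 2),
    },
}


def section_max(section):
    """Highest total available in one group of criteria."""
    return sum(high for _, high in CRITERIA[section].values())


def section_total(section, scores):
    """Sum the scores for one group, rejecting out-of-range values."""
    total = 0
    for name, (low, high) in CRITERIA[section].items():
        value = scores.get(name, 0)  # assumption: missing criterion counts as 0
        if not low <= value <= high:
            raise ValueError(f"{name}: {value} is outside {low}..{high}")
        total += value
    return total


for section in CRITERIA:
    print(section, section_max(section))
```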


Scientific articles on the Quality Framework

Hermann RC, Palmer RH. Common ground: a framework for selecting core quality measures for mental health and substance abuse care. Psychiatric services (Washington, D.C.). 2002;53(3):281-7   DOI: 10.1176/

Terwee CB, Bot SDM, de Boer MR, van der Windt D, Knol DL, Dekker J, Bouter LM, de Vet HCW (2007) Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology 60:34-42    DOI:  10.1016/j.jclinepi.2006.03.012

Haynes SN, Richard DCS, Kubany ES (1995) Content Validity in Psychological Assessment: A Functional Approach to Concepts and Methods. Psychological Assessment 7(3): 238-247  PMID not found

Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, Stein RE (2002) Assessing health status and quality-of-life instruments: Attributes and review criteria. Quality of Life Research 11: 193–205    PMID: 12074258

Ware JE. Standards for validating health measures: Definition and content. Journal of Chronic Diseases. 1987;40(6):473-80   PMID: 3298292