TECHNICAL INFORMATION ::: statistical tests
symbols, abbreviations and a brief explanation of statistical
terms and tests used to evaluate scales


Internal consistency & reliability

❝ does the scale (or subscale) measure a single construct ❞

A scale should measure whatever it claims to measure and nothing else. The most commonly used statistic to test this is Cronbach's alpha, which is based on how strongly the items in a scale correlate with each other; values normally fall between 0 and 1. For scales of 10 or more items alpha should be >0.7. Reliability is commonly tested by correlating a set of measures completed at time-1 with a set completed by the same respondents at time-2, which should be 7-14 days later. This is known as test-retest reliability, and the correlation should be >0.8. This video is from Brunel University, London ▶︎
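As a rough illustration of how Cronbach's alpha is obtained, the sketch below uses the usual variance form of the statistic: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical, made up for this example only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    responses: one row per respondent, each row a list of item scores.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # regroup scores item by item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 5 respondents answering a 4-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

With real data the scale would need many more respondents than this; the example only shows the arithmetic.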


Construct validity

❝ what is the structure of the scale ❞

Scale development typically starts with a pool of items that are thought to reflect whatever is being measured. Either a single construct (for example, dependence) or a number of subscales (for example, in quality of life) may be hypothesised. A principal components analysis or similar factor analysis is the most common way of demonstrating the scale structure. The analysis determines which items correlate with each other and whether a number of items can be combined to make a scale or subscale. This video is from Oxford Academic Press ▶︎
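The idea behind a principal components analysis can be sketched in a few lines. The example below builds the inter-item correlation matrix and extracts only the first principal component using power iteration, which is a simplification: statistical packages extract all components and usually rotate them. The function names and the data are hypothetical. Items with high loadings on the same component are candidates for combining into one scale or subscale.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(responses):
    """Correlate every item with every other item."""
    items = list(zip(*responses))
    return [[pearson(a, b) for b in items] for a in items]


def first_component(corr, iterations=200):
    """Largest eigenvalue and item loadings, found by power iteration."""
    k = len(corr)
    v = [1.0 / sqrt(k)] * k                    # start from equal weights
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    loadings = [x * sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Hypothetical data: 6 respondents answering a 4-item pool.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 1],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
]

corr = correlation_matrix(responses)
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(corr)             # share of total variance
print(f"first component explains {explained:.0%} of the variance")
print("loadings:", [round(l, 2) for l in loadings])
```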


Change potential

❝ does the scale score change as the condition changes ❞

Any scale that is to be used as an outcome measure must be able to detect the changes that may occur. Mean scores at two different points, before and after treatment for example, are commonly used to demonstrate the change potential of a scale. Differences between means are also used for other validation purposes. Mean scores can be misleading: outlying values and distributions that are not 'normal' can produce false results. This video is from Practical Applications of Statistics in the Social Sciences ▶︎
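The effect of an outlying value on a mean can be seen in a small worked example. The scores below are invented: five of the six respondents improve (lower scores) after treatment, but one extreme follow-up score is enough to make the mean change point the other way, while the median change does not. The function name `change_summary` is hypothetical.

```python
from statistics import mean, median


def change_summary(before, after):
    """Mean and median of each respondent's change (after minus before)."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes), median(changes)


# Hypothetical scores for 6 respondents; lower scores mean improvement.
before = [22, 25, 19, 24, 21, 23]
after = [15, 18, 14, 17, 16, 60]   # the last follow-up score is an outlier

mean_change, median_change = change_summary(before, after)
print("mean change:  ", mean_change)     # pulled upwards by the outlier
print("median change:", median_change)   # reflects the typical respondent
```

Here the mean change is +1 while the median change is -6, which is why it is worth inspecting the distribution before relying on a difference between means.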


Discriminant validity

❝ can the scale identify a disorder ❞

Scales that are designed as screening tools, or intended to identify people who have a particular condition (depression or PTSD, for example), need to pick up 'cases' and miss as few as possible; this is the sensitivity of the scale. They must also avoid incorrectly identifying people who do not have the condition; this is the specificity. This video is from Oxford Academic Press ▶︎
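Sensitivity and specificity are simple proportions once a cut-off score has been chosen. The sketch below assumes that a score at or above the cut-off counts as a positive screen and that each respondent's true status is known from an independent diagnosis. The function name, the cut-off of 10 and the data are all hypothetical.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a given cut-off score.

    A score >= cutoff is treated as a positive screen (an assumption
    made for this example).
    """
    tp = fn = tn = fp = 0
    for score, is_case in zip(scores, has_condition):
        screened_positive = score >= cutoff
        if is_case and screened_positive:
            tp += 1          # case correctly picked up
        elif is_case:
            fn += 1          # case missed
        elif screened_positive:
            fp += 1          # non-case incorrectly identified
        else:
            tn += 1          # non-case correctly ruled out
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical data: 4 people with the condition, 6 without.
scores = [12, 15, 9, 18, 7, 4, 11, 5, 3, 6]
has_condition = [True, True, True, True,
                 False, False, False, False, False, False]

sens, spec = sensitivity_specificity(scores, has_condition, cutoff=10)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-off would generally trade sensitivity for specificity, and lowering it would do the reverse.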


Concurrent and convergent validity

❝ does the scale match up to a gold standard or similar ❞

These are two very common tests of validity, both of which measure the strength of the relationship between two scales. Correlations have values from -1 to +1. Concurrent validity tests whether the scale is close to a gold standard, so correlations should be high, say >0.7; convergent validity tests whether the scale is in line with another related measure, so correlations should be mid-range, say 0.4-0.6. This video is from Brunel University, London ▶︎
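A correlation between two scales can be computed directly from paired scores. The sketch below implements the Pearson correlation coefficient, which is an assumption here since the text does not say which correlation is used, and applies it to invented scores on a new scale and a gold standard completed by the same respondents.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores from 6 respondents.
new_scale = [10, 14, 8, 20, 16, 12]
gold_standard = [11, 15, 9, 19, 17, 12]

r = pearson_r(new_scale, gold_standard)
print(f"r = {r:.2f}")
print("meets the >0.7 guide for concurrent validity:", r > 0.7)
```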