Proficiency Testing in Labs: Best Practices
Internal quality control

Internal quality control (IQC) is one of a number of concerted measures that analytical chemists can take to ensure that the data produced in the laboratory are under statistical control, i.e. of known quality and uncertainty.

Analytical methods should be validated as fit for purpose before use by a laboratory. Laboratories should ensure that, as a minimum, the methods they use are fully documented, laboratory staff trained in their use and that they have implemented a satisfactory IQC system.
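Keeping results "under statistical control" is commonly implemented with a Shewhart-type control chart on a control material. The paper does not prescribe any particular IQC technique, so the following is only an illustrative sketch; the control-material mean, standard deviation and the results are invented for the example.

```python
# Minimal sketch of a Shewhart-type IQC check (illustrative only; the
# control-material mean and standard deviation are assumed known from
# prior characterisation -- they are not values from this paper).

def iqc_status(result: float, mean: float, sd: float) -> str:
    """Classify one control-material result against Shewhart limits."""
    deviation = abs(result - mean) / sd
    if deviation > 3.0:   # outside the action limits: reject the run
        return "action"
    if deviation > 2.0:   # outside the warning limits: investigate
        return "warning"
    return "in control"

# Example run: control material characterised as 50.0 +/- 1.5 (arbitrary units)
for x in [50.8, 53.2, 55.1]:
    print(x, iqc_status(x, mean=50.0, sd=1.5))
```

A run whose control result falls outside the action limits would be rejected and repeated; a result between the warning and action limits prompts investigation.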
Proficiency testing is the use of results generated in interlaboratory test comparisons for the purpose of a continuing assessment of the technical competence of the participating laboratories [1]. With the advent of "mutual recognition" on both a European and worldwide basis, it is now essential that laboratories participate in proficiency testing schemes that will provide an interpretation and assessment of results which is transparent to the participating laboratory and its "customer".

Participation in proficiency testing schemes provides laboratories with an objective means of assessing and documenting the reliability of the data they are producing. Although there are several types of proficiency testing schemes, they all share a common feature: test results obtained by one laboratory are compared to an external standard, frequently the results obtained by one or more other laboratories in the scheme. Laboratories wishing to demonstrate their proficiency should seek and participate in proficiency testing schemes relevant to their area of work. However, proficiency testing is only a snapshot of performance at infrequent intervals – it will not be an effective check on general performance or an inducement to achieve fitness for purpose, unless it is used in the context of a comprehensive quality system in the laboratory.

The principles of proficiency testing are now well established and understood. Nevertheless, there are some aspects of practice that need further amplification and comment. This paper aims to highlight some of these.

Elements of proficiency testing

In analytical chemistry proficiency testing almost invariably takes the form of a simultaneous distribution of effectively identical samples of a characterised material to the participants for unsupervised blind analysis by a deadline. The primary purpose of proficiency testing is to allow participating laboratories to become aware of unsuspected errors in their work and to take remedial action. This it achieves by allowing a participant to make three comparisons of its performance: with an externally determined standard of accuracy; with that of peer laboratories; with its own past performance.

In addition to these general aims, a proficiency testing scheme should specifically address fitness for purpose, the degree to which the quality of the data produced by a participant laboratory can fulfil its intended purpose. This is a critical issue in the design of proficiency testing schemes that will be discussed below.

Despite the primary self-help objectives described above, an acceptable performance in a proficiency testing scheme (where available) is increasingly expected as a condition for accreditation. Indeed, in the latest revision of ISO Guide 25 it is a requirement that laboratories participate in appropriate proficiency testing schemes whenever these are available [2]. Fortunately both the accreditation requirements and the "self-help intentions" can be fulfilled by the same means at one and the same time.

History of proficiency testing: International Harmonised Protocol

Proficiency testing emerged from the early generalised interlaboratory testing that was used in different degrees to demonstrate proficiency (or rather lack of it), to characterise analytical methods and to certify reference materials. These functions have now been separated to a large degree, although it is still recognised that proficiency testing, in addition to its primary function, can sometimes be used to provide information on the relative performance of different analytical methods for the same analyte, or to provide materials sufficiently well characterised for IQC purposes [3].

The systematic deployment of proficiency testing was pioneered in the United States in the 1940s and in the 1960s in the United Kingdom by the clinical biochemists, who clearly need reliable results within institutional units and comparability between institutions. However, the use of proficiency testing is now represented in most sectors of analysis where public safety is involved (e.g. in the clinical chemistry, food analysis, industrial hygiene and environmental analysis sectors) and increasingly used in the industrial sector. Each of these sectors has developed its own approach to the organisation and interpretation of proficiency testing schemes, with any commonality of approach being adventitious rather than by collaboration.

To reduce differences in approach to the design and interpretation of proficiency testing schemes the three international organisations ISO, IUPAC and AOAC INTERNATIONAL have collaborated to bring together the essential features of proficiency testing in the form of The International Harmonised Protocol for the Proficiency Testing of (Chemical) Analytical Laboratories [4, 5]. This protocol has now gained international acceptance, most notably in the food sector, where it is now accepted that proficiency testing schemes must conform to the International Harmonised Protocol; this has been endorsed as official policy by the Codex Alimentarius Commission, AOAC INTERNATIONAL and the European Union.
Studies on the effectiveness of proficiency testing have not been carried out in a systematic manner in most sectors of analytical chemistry, although recently a major study of proficiency testing under the auspices of the Valid Analytical Measurement (VAM) programme has been undertaken by the Laboratory of the Government Chemist in the United Kingdom. However, the results have yet to be published (personal communication).

This paper comments on some critical aspects of proficiency testing, identified as a result of experience in applying the International Harmonised Protocol to operational proficiency testing schemes.

Economics of proficiency testing schemes: requirement for laboratories to undertake a range of determinations offered within a proficiency testing scheme

Proficiency testing is in principle adaptable to most kinds of analysis and laboratories and to groups of laboratories of all sizes. However, it is most effectively and economically applied to large groups of laboratories conducting large numbers of routine analyses. Setting up and running a scheme has a number of overhead costs which are best distributed over a large number of participant laboratories. Moreover, if only a small range of activities is to be subject to test, then proficiency testing can address all of them. If in a laboratory there is an extremely wide range of analyses that it may be called upon to carry out (e.g. a food control laboratory), it will not be possible to provide a proficiency test for each of them individually. In such a case it is necessary to apply proficiency testing to a proportion of the analyses that can be regarded as representative. It has been suggested that for laboratories undertaking many different analyses a "generic" approach should be taken wherever possible. Thus general food analysis laboratories should participate in, and achieve a satisfactory performance from, series dealing with the testing of GC, HPLC, trace element and proximate analysis procedures, rather than for every analyte that they may determine (always assuming that an appropriate proficiency testing scheme is available). However, the basic participation should be supplemented by participation in specific areas where regulations are in force and where the analytical techniques applied are judged to be sufficiently specialised to require an independent demonstration of competence. In the food sector examples of such analytes are aflatoxins (and other mycotoxins), pesticides, and overall and specific migration from packaging to food products.

However, it is necessary to treat with caution the inference that a laboratory that is successful in a particular proficiency scheme for a particular determination will be proficient for all similar determinations. In a number of instances it has been shown that a laboratory proficient in one type of analysis may not be proficient in a closely related one. Two examples where the ability of laboratories to determine similar analytes is very variable are described here.

Example 1: Total poly- and (cis) mono-unsaturated and saturated fatty acids in oils and fats

Results from proficiency testing exercises that include such tests indicate that the determinations are of variable quality. In particular, the determination of poly-unsaturated and saturated fatty acids is generally satisfactory, but the determination of mono-unsaturated fatty acids is unduly variable, with a bi-modal distribution of results sometimes being obtained. Bi-modality might be expected on the grounds that some participant laboratories were able to separate cis- from trans-mono-unsaturated fatty acids. However, examination of the methods of analysis used by participants did not substantiate this – some laboratories reported results as if they were separating cis- and trans-fatty acids even though the analytical systems employed were incapable of such a separation. This is clearly demonstrated in reports from the UK Ministry of Agriculture, Fisheries and Food's Food Analysis Performance Assessment Scheme [6].

Example 2: Trace nutritional elements (zinc, iron, calcium etc.)

Laboratories have been asked to analyse proficiency test material which contains a number of trace elements of nutritional significance, e.g. zinc, calcium and iron. It has been observed that the number of laboratories which achieve "satisfactory" results differs markedly from analyte to analyte in the same test material, suggesting that the assumption that a satisfactory determination of one such analyte indicates that a satisfactory determination would be observed for all similar analytes is not valid. This conclusion holds even if the elements are determined in a "difficult" matrix, such as a foodstuff, where many of the problems may be assigned to a matrix effect rather than to the end-point determination.

Other limitations are apparent in proficiency testing. For example, unless the laboratory uses typical analytical conditions to deal with the proficiency testing materials (and this is essentially out of the control of the organiser in most schemes) the result will not enable participants to take remedial action in case of inaccuracy. This gives rise to a potential conflict between the remedial and the accreditation roles of proficiency testing.
It is unfortunate that successful participation in proficiency testing schemes has become a "qualification" (or at least poor performance a "disqualification") factor in accreditation. Nevertheless, it is recognised by most proficiency testing scheme organisers that their primary objective is to provide help and advice – not to "qualify" or "accredit" participants.

Finally, it must be remembered that extrapolation from success in proficiency tests to proficiency in everyday analytical work is an assumption – in most circumstances it would be prohibitively expensive and practically difficult for a proficiency testing organiser to test the proposition experimentally by using undisclosed testing. However, most customers would anticipate that performance in a proficiency testing exercise would be the "best" that is achievable by a laboratory, and that repeated poor performance in a proficiency testing scheme is not acceptable.

Scoring

Converting the participant's analytical results into scores is nearly always an essential aid to the interpretation of the result. Those scores must be transparent to both the laboratory and its "customer"; that customer may be either a customer in the conventional sense or an accreditation agency.

Raw analytical results are expressed in a number of different units, cover a large range of concentrations and stem from analyses that may need to be very accurate or may require only "order-of-magnitude" accuracy. An effective scoring system can reduce this diversity to a single scale on which all results are largely comparable and which any analytical chemist or his client can interpret immediately. Such a scoring system (the z-score) has been recommended in the International Harmonised Protocol. A number of other scoring systems have evolved in the various proficiency testing schemes which are presently operating; many of these systems incorporate arbitrary scaling, the main function of which is to avoid negative scores and fractions. However, all of these scores can be derived from two basic types of score, the z-score and the q-score [4, 5].

The first action in converting a result into a score is to consider the error, the difference x − X̂ between the result x and the assigned value X̂ (X̂ being the best available estimate of the true value). This error can then be scaled by two different procedures:

q-scores

The q-score results from scaling the error to the assigned value, i.e. q = (x − X̂)/X̂. Values of q will be nearly zero-centred (in the absence of overall bias among the participants). However, the dispersion of q will vary among analytes, often by quite large amounts, and so needs further interpretation. Thus a "stranger" to the scheme would not be able to judge whether a score represented fitness for purpose – the scheme is not transparent.

z-scores

The z-score results from scaling the error to a target value for standard deviation, sp, i.e. z = (x − X̂)/sp. If the participating laboratories as a whole are producing data that are fit for purpose and are close to normally distributed (as is often the case) the z-score can be interpreted roughly as a standard normal deviate, i.e. it is zero-centred with a standard deviation of unity. Only relatively few scores (≈0.3%) would fall outside bounds of ±3 in "well-behaved" systems. Such bounds (normally ±3 or ±2) are used as decision limits for the instigation of remedial action by individual laboratories. The ±3 boundary has already been prescribed in the UK Aflatoxins in Nuts, Nut Products, Dried Figs and Dried Fig Products Regulations [7]. If participants as a whole were performing worse than the fitness for purpose specification, then a much larger proportion of the results would give z-scores outside the action limits. Because the error is scaled to the parameter sp, the z-score is immediately interpretable by both the participating laboratory and its customers.

Combining scores

Many scheme organisers and participants like to summarise scores from different rounds, or from various analytes within a single round of a test, as some kind of average; various possibilities are suggested in the International Harmonised Protocol. Such combinations could be used within a laboratory or by a scheme organiser for review purposes. Although it is a valid procedure to combine scores for the same analyte within or between rounds, it has to be remembered that combination scores can mask a proportion of moderate deviations from acceptability. Combining scores from different analytes is more difficult to justify. Such a combination could, for instance, hide the fact that the results for a particular analyte were always unsatisfactory. Use of such scores outside the analytical community might therefore give rise to misleading interpretations. Thus, it must be emphasised that the individual score is the most informative; it is the score that should be used for any internal or external "assessment" purposes, and combination scores may, in some situations, disguise unsatisfactory individual scores.
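The two basic scores, and one possible combination statistic, can be sketched in a few lines. This is an illustration only: the numerical values are invented, and the sum of squared z-scores is just one of the combination statistics discussed in the Harmonised Protocol.

```python
# Illustrative q- and z-score computation for one analyte.
# x = participant's result, assigned = assigned value X, sp = target
# standard deviation chosen to reflect fitness for purpose.
# All numbers are invented for the example.

def q_score(x: float, assigned: float) -> float:
    """q = (x - X) / X : error scaled to the assigned value."""
    return (x - assigned) / assigned

def z_score(x: float, assigned: float, sp: float) -> float:
    """z = (x - X) / sp : error scaled to the target standard deviation."""
    return (x - assigned) / sp

assigned, sp = 10.0, 0.5          # assigned value and target SD
results = [9.8, 10.6, 8.2]        # three hypothetical rounds

z = [z_score(x, assigned, sp) for x in results]
print([round(v, 2) for v in z])   # the -3.6 in the last round calls for remedial action

# One combination statistic (sum of squared z-scores) -- usable for
# review, but note how it can disguise individual unsatisfactory scores.
ssz = sum(v * v for v in z)
print(round(ssz, 2))
```

Because z is scaled to the fitness-for-purpose parameter sp rather than to the assigned value, the same decision limits (±2 warning, ±3 action) apply to every analyte and concentration.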
The use of an external standard of performance is therefore essential.

As the concentration of the analyte is unknown to the participants at the time of analysis, it may be necessary to express the criterion as a function of concentration rather than as a single value applicable over all concentrations. It is also important that the value of sp used for an analysis should remain constant over extended periods of time, so that the z-scores of both individual participants and groups of participants remain comparable over time.

As stressed above, the foregoing excludes the possibility of using the actual robust standard deviation of a round of the test as the denominator in the calculation of z-scores. It also excludes the use of criteria that merely describe the current state of the art. Such practice would undoubtedly serve to identify outlying results but would not address fitness for purpose. It could easily seem to justify results that were in fact not fit for purpose. Moreover, it would not allow comparability of scores over a period of time.

The question of how to quantify fitness for purpose remains incompletely answered. A general approach has been suggested based on the minimisation of cost functions [9], but has yet to be applied to practical situations. Specific approaches based on professional judgements are used in various sectors. In the food industry the Horwitz function [10] is often taken as a fitness for purpose (acceptability) criterion, whereas in others, e.g. in clinical biochemistry, criteria based on probabilities of false positives and negatives have evolved [11]. In some areas fitness for purpose may be determined by statutory requirements, particularly where method performance characteristics are prescribed, as by the European Union [12] and the Codex Alimentarius Commission for veterinary drug residue methods.

Homogeneity of the distributed material

As most chemical analysis is destructive, it is essentially impracticable to circulate a single specimen among the participants as a proficiency testing material. The alternative is to distribute simultaneously to all participants samples of a characterised bulk material. For this to be a successful strategy the bulk material must be essentially homogeneous before the subdivision into samples takes place. This is simple in the instance where the material is a true solution. In many instances, however, the distributed material is a complex multi-phase substance that cannot be truly homogeneous down to molecular levels. In such a case it is essential that the samples are at least so similar that no perceptible differences between the participants' results can be attributed to the proficiency testing material. This condition is called "sufficient homogeneity". If it is not demonstrated, the validity of the proficiency test is questionable.

The International Harmonised Protocol recommends a method for establishing sufficient homogeneity. (More strictly speaking, the test merely fails to detect significant inhomogeneity.) After the bulk material has been homogenised it is divided into the test materials for distribution. Ten or more of the test materials are selected at random and analysed in duplicate under randomised repeatability conditions by a method of good precision and appropriate trueness. The results are treated by analysis of variance, and the material is deemed to be sufficiently homogeneous if no significant variation between the samples is found, or if the between-sample standard deviation is less than 0.3 sp.

There is a potential problem with the test for homogeneity – it may be expensive to execute because it requires at least 20 replicate analyses. In the instance of a very difficult analysis dependent on costly instrumentation and extraction procedures, e.g. the determination of dioxins, the cost of the homogeneity test may be a major proportion of the total cost of the proficiency test. Moreover, if the material is found to be unsatisfactory, the whole procedure of preparation and testing has to be repeated. Some organisers are so confident of their materials that they do not conduct a homogeneity test. However, experience in some sectors has shown that materials found to be satisfactory in some batches are decidedly heterogeneous in other batches after the same preparative procedures. Another complication of such testing is that a single material may prove to be acceptable for one analyte and heterogeneous for another. A possible strategy, to be used with care, is to store the randomly selected samples before distribution, but to analyse them only if the homogeneity of the material is called into question after the results have been examined. However, if heterogeneity were then detected, no remedial action could make the round of the proficiency test usable, so the whole round would have to be repeated to provide the proficiency information for the participants. In general, it seems that homogeneity tests are a necessary expense, unless the distributed material is a true solution that has been adequately mixed before subdivision.

Proficiency testing and other quality assurance measures

While proficiency testing provides information for a participant about the presence of unsuspected errors, it is completely ineffectual unless the proficiency testing is an integral part of the formal quality system of the laboratory. For example, proficiency testing is not a substitute for IQC, which should be conducted in every run of analysis to detect, in the short term, failures of the analytical system to produce data that are fit for purpose.
It seems likely that the main way in which proficiency testing benefits the participant laboratory is in compelling it to install an effective IQC system. This actually enhances the scope of the proficiency testing scheme: the IQC system installed should cover all analyses conducted in the laboratory, and not just those covered by the proficiency testing scheme. In one scheme it was shown that laboratories with better-designed IQC systems showed considerably better performance in proficiency testing [13]. A crucial feature of a successful IQC scheme was found to be a control material that was traceable outside the actual laboratory.

An important role of proficiency testing is the triggering of remedial action within a laboratory when unsatisfactory results are obtained. Where possible the specific reason for the bad result should be determined by reference to documentation. If consistently bad results are obtained, then the method used (or the execution of the method protocol) must be flawed, or perhaps applied outside the scope of its validation. Such interaction encourages the use of properly validated methods and the maintenance of full records of analysis.

Does proficiency testing work?

The dispersion of the results returned by participants in a scheme would be expected at first to move round by round towards the value of sp and then stabilise close to that value. Ideally then, in a mature scheme the proportion of participants falling outside defined z-score limits should be roughly predictable from the normal distribution. Usually in practice the dispersion in a new scheme will be considerably greater than sp in the first round, but improve rapidly and consistently over the subsequent few rounds. Then the rate of improvement decreases towards zero.

If the incentives to perform well are not stringent, the performance of the group of laboratories may stabilise at a level that does not meet the fitness for purpose requirement. Examples of this may be found in some schemes where the proportion of participants obtaining satisfactory results in, say, the determination of pesticides has increased over time but has now stabilised at ≈70% rather than the 95% which is the ultimate objective. However, where there are external constraints and considerations (e.g. accreditation), the proportion of outliers rapidly declines round by round (discounting the effects of late newcomers to the scheme) as the results from the scheme markedly penalise such participants. In addition, in view of the importance of proficiency testing schemes to the accreditation process, the need for proficiency testing schemes themselves to become either accredited or certified needs to be addressed in the future.
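The expectation that, in a mature scheme, the proportion of participants outside given z-score limits is roughly predictable from the normal distribution can be made concrete: for z distributed as a standard normal deviate, P(|z| > k) = erfc(k/√2). A quick sketch:

```python
import math

def fraction_outside(k: float) -> float:
    """P(|z| > k) for a standard normal deviate z."""
    return math.erfc(k / math.sqrt(2))

# Expected proportions of z-scores beyond the usual decision limits
print(f"|z| > 2: {fraction_outside(2):.4f}")   # about 4.6% of scores
print(f"|z| > 3: {fraction_outside(3):.4f}")   # about 0.27% of scores
```

A scheme in which appreciably more than a few per cent of z-scores fall beyond ±2, or more than a fraction of a per cent beyond ±3, is therefore dispersing more widely than the fitness-for-purpose target sp implies.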
References

1. ISO Guide 43 (1993), 2nd edition, Geneva
2. ISO Guide 25 (1993), 2nd edition, Geneva
3. Thompson M, Wood R (1995) Pure Appl Chem 67:649–666
4. Thompson M, Wood R (1993) Pure Appl Chem 65:2123–2144
5. Thompson M, Wood R (1993) J AOAC International 76:926–940
6. Report 0805 of the MAFF Food Analysis Performance Assessment Scheme, FAPAS Secretariat, CSL Food Laboratory, Norwich, UK
7. Statutory Instrument 1992 No. 3326, HMSO, London
8. Statistics Sub-Committee of the AMC (1995) Analyst 120:2303–2308
9. Thompson M, Fearn T, Analyst (in press)
10. Horwitz W (1982) Anal Chem 54:67A–76A
11. IFCC approved recommendations on quality control in clinical chemistry, part 4: "Internal quality control" (1980) J Clin Chem Clin Biochem 18:534–541
12. Official Journal of the European Union, No. L118 of 14.5.93, p 64
13. Thompson M, Lowthian PJ (1993) Analyst 118:1495–1500