Application of Accuracy and Precision Evaluations Based on the Current United States and Indonesian Pharmacopoeias: A Critical Review

New methods for evaluating accuracy and precision have been introduced in the latest edition of the United States Pharmacopoeia (USP), whereas the other validation parameters (selectivity, linearity, range, and robustness) have remained relatively unchanged. To obtain reliable data from any chemical or pharmaceutical analysis, the analytical procedure must be validated or verified in accordance with the latest edition of the pharmacopoeia. The author has previously published several review articles on general validation methods. This review focuses on the implementation and discussion of accuracy and precision evaluation based on the current USP and Indonesian Pharmacopoeia, and it presents worked examples of calculations for several methods of determining accuracy and precision.


Introduction
Based on general chapter <1225> of the United States Pharmacopoeia-National Formulary (USP44-NF39) [1], the accuracy of an analytical procedure is defined as the closeness of test results obtained by that procedure to the true value, and it should be established across the range of the procedure. Precision is the degree of agreement among individual test results when the procedure is applied repeatedly to multiple samplings of a homogeneous sample. These definitions are nearly identical to those in the British Pharmacopoeia (BP) 2022, Supplementary Chapter III F [2], the International Council for Harmonisation (ICH) Q2(R1) guideline [3], and the Indonesian Pharmacopoeia (FI) VI <1381> [4]. ICH Q2(R1) notes that accuracy is sometimes termed trueness, whereas according to International Organization for Standardization (ISO) 5725-2:2019 [5], accuracy is the combination of trueness and precision.
General chapters USP44-NF39 <1225> and FI VI <1381> [1,4] describe general validation methods for analytical procedures and their acceptance criteria; method performance is characterized by accuracy, precision, specificity, detection limit, quantification limit, linearity, and range. General chapter USP44-NF39 <1210> [6] describes the statistical approaches used in the procedure validation described in chapter <1225> [1], focusing on the performance characteristics of accuracy, precision, and detection limit. Accuracy can only be evaluated if a true or accepted reference value is available. It expresses the closeness of agreement between τ (the true or nominal value) and Y (the measured value), and this closeness is expressed as the average of (Y − τ).
General chapter USP44-NF39 <1010> [7] provides basic statistical approaches for decision-making and methods for comparing two analytical procedures. Comparing two procedures (e.g., a new and a validated method) is necessary to determine whether the differences in accuracy and precision are smaller than the amounts specified in the analytical target profile (ATP). General Notices and Requirements Section 6.30 of the USP [8] describes the need for a proposed/alternative method to produce results comparable to those of the compendial method. The comparison described in general chapter USP44-NF39 <1010> can also be applied to method validation when analytical procedures are transferred between laboratories [7]. The accuracy and precision evaluated and specified in the method validation must meet the acceptance criteria described in the ATP of the proposed/new method [9]. Review articles on the validation of analytical methods and their applications in pharmaceutical analysis (including herbal drugs) have been published by the author in 2005 [10], 2012 [11], 2018 [12,13], and 2022 [14,15]. In the last 4 years, new methods for evaluating accuracy and precision have been introduced in the latest edition of the USP, whereas the other validation parameters (selectivity, linearity, range, and robustness) have remained relatively unchanged. To obtain reliable data from any chemical analysis carried out by QC or research laboratories, the analytical method must be validated or verified according to the latest edition of the pharmacopoeia or official guidelines; once a new edition of the pharmacopoeia is released, the previous edition no longer applies. This review focuses on implementing accuracy and precision evaluation in accordance with the current USP44-NF39 [1,6,7,9] and FI VI [4].
It also aims to provide a comprehensive understanding of the new methods for determining accuracy and precision in these current pharmacopoeias. Related official guidelines, publications, and relevant material from previous editions of the USP-NF are also described because of their important role in the current evaluation methods, and worked examples of the calculation and determination of accuracy and precision are included. With this understanding of accuracy and precision evaluation methods, pharmacists in QC laboratories and researchers can select the most suitable method for a particular application.

Assessment of Accuracy and Precision
Separated evaluation of accuracy and precision. According to chapters USP44-NF39 <1225> [1] and FI VI <1381> [4], accuracy is determined differently for three categories: (1) Drug substance: accuracy is determined by applying the analytical procedure to an analyte of known purity (e.g., a certified reference standard) or by comparing the results with those of a well-characterized procedure whose accuracy has been stated or defined. (2) Drug product: accuracy is determined by applying the analytical procedure to a synthetic mixture of the drug product components to which a known amount of analyte has been added within the range of the procedure. If samples of all drug product components (i.e., the active pharmaceutical ingredient (API) and excipients) cannot be obtained, accuracy can be determined by adding a known amount of analyte to the drug product (spiking) or by comparing the results with those of a well-characterized procedure whose accuracy has been stated or defined. (3) Impurities: accuracy should be assessed on a sample (of drug substance or drug product) spiked with a known amount of impurity.
Accuracy should be assessed using a minimum of nine determinations over a minimum of three concentration levels covering the specified range (i.e., three concentrations with three replicates each). Accuracy can be evaluated as follows: (1) determine the percent recovery (%R) across the range of the assay; (2) evaluate the linearity of the relationship between the estimated and actual concentrations. The statistically preferred criterion is that the confidence interval (CI) for the slope be contained in an interval around 1.0 or, alternatively, that the slope be close to 1.0. %R can be calculated using Equation 1 or, for the standard addition method, Equation 2:
$$\%R = \frac{Y_f}{Y_c} \times 100 \quad (1)$$
$$\%R = \frac{Y_f - Y_u}{C_a} \times 100 \quad (2)$$
where Yc is the actual or true concentration, Yf is the measured (estimated) concentration, Yu is the original concentration before standard addition, and Ca is the added concentration of the analyte. Acceptance criteria for a given drug substance or product are described in its USP-NF and FI monographs; if no monograph is available yet, Tables 1 and 2 can be consulted. The acceptance criteria depend on the analyte concentration in the sample or on the instrument used, and they should be stated in the ATP of the proposed method; researchers must decide whether the criteria specified in the ATP are based on the API concentration in the sample and/or on the instrument used. The Food and Drug Administration (FDA) Office of Regulatory Affairs (ORA) Laboratory Manual II [16] gives general accuracy acceptance criteria for human drug analytical methods of 80%-120% (assay) and 70%-130% (content uniformity) of the expected content.
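Equations 1 and 2 can be sketched in Python as follows. The function names and the data are hypothetical, and the 80%-120% limits are the FDA ORA general assay criterion quoted above; in practice a monograph or ATP limit would replace them. The sketch checks every individual %R as well as the mean, because a passing mean can hide individual results outside the limits.

```python
def percent_recovery(yf: float, yc: float) -> float:
    """Equation 1: %R = Yf / Yc x 100 (Yf measured, Yc true concentration)."""
    return yf / yc * 100.0


def percent_recovery_spiked(yf: float, yu: float, ca: float) -> float:
    """Equation 2 (standard addition): %R = (Yf - Yu) / Ca x 100."""
    return (yf - yu) / ca * 100.0


def all_within(values, lower, upper):
    """True if every value lies in the closed interval [lower, upper]."""
    return all(lower <= v <= upper for v in values)


# Hypothetical data: three levels x three replicates, as (true Yc, measured Yf).
data = [
    (80.0, 79.1), (80.0, 80.6), (80.0, 79.8),
    (100.0, 99.2), (100.0, 101.0), (100.0, 100.3),
    (120.0, 118.9), (120.0, 121.4), (120.0, 120.2),
]
recoveries = [percent_recovery(yf, yc) for yc, yf in data]
mean_r = sum(recoveries) / len(recoveries)

# Example limits: FDA ORA Laboratory Manual II general assay criterion (80%-120%).
print(f"mean %R = {mean_r:.2f}")
print("all individual %R within 80-120%:", all_within(recoveries, 80.0, 120.0))
```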
USP <1225> [1] and FI <1381> [4] do not describe how to evaluate the linearity of the recovery curve between Yf and Yc (Equation 3):
$$Y_f = a + b\,Y_c \quad (3)$$
Funk et al. [28] described the equations used to determine the CIs of the slope (b) and intercept (a) of the recovery curve (Equations 4 and 5):
$$CI_b = b \pm t\,\frac{s_{y/x}}{\sqrt{\sum_i (Y_{c,i} - \bar{Y}_c)^2}} \quad (4)$$
$$CI_a = a \pm t\,s_{y/x}\sqrt{\frac{\sum_i Y_{c,i}^2}{n \sum_i (Y_{c,i} - \bar{Y}_c)^2}} \quad (5)$$
where i (1, 2, …, n) indicates the different concentration levels, s_y/x is the residual standard deviation of the regression, and t is Student's t factor with n − 2 degrees of freedom (p = 0.05). The CI must include 1 (slope) and 0 (intercept); if it does not include the respective value, a proportional (slope) and/or constant (intercept) systematic error of the proposed method may be assumed. Recently, ICH Q14 stated that the slope of the recovery curve between Yf and Yc should be within 0.8 to 1.25 (for p = 0.05) [29]. (Table notes: DS, drug substance; DP, drug product; Imp, impurities; Na, not available; *the acceptance criteria can refer to the detector of the chromatographic system.)
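A minimal sketch of Equations 3-5 using ordinary least squares is shown below. The slope and intercept standard errors are the usual textbook forms and are assumed to match those of Funk et al. [28]; the function name `recovery_regression` and the data are hypothetical. To stay within the standard library, the Student's t value is passed in (it could come from `scipy.stats.t.ppf(0.975, n - 2)` where SciPy is available).

```python
import math


def recovery_regression(yc, yf, t_crit):
    """Fit Yf = a + b*Yc and return (a, b, CI_a, CI_b).

    t_crit is Student's t for n - 2 degrees of freedom at the chosen level.
    """
    n = len(yc)
    if n < 3 or n != len(yf):
        raise ValueError("need at least three paired points")
    x_mean = sum(yc) / n
    y_mean = sum(yf) / n
    sxx = sum((x - x_mean) ** 2 for x in yc)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(yc, yf))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual standard deviation s_y/x with n - 2 degrees of freedom.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(yc, yf))
    s_yx = math.sqrt(ss_res / (n - 2))
    s_b = s_yx / math.sqrt(sxx)
    s_a = s_yx * math.sqrt(sum(x * x for x in yc) / (n * sxx))
    ci_b = (b - t_crit * s_b, b + t_crit * s_b)
    ci_a = (a - t_crit * s_a, a + t_crit * s_a)
    return a, b, ci_a, ci_b


# Hypothetical recovery data at five levels; t(0.975, 3) = 3.182.
yc = [50.0, 75.0, 100.0, 125.0, 150.0]
yf = [49.2, 74.1, 99.5, 123.6, 148.9]
a, b, ci_a, ci_b = recovery_regression(yc, yf, t_crit=3.182)
slope_ok = ci_b[0] <= 1.0 <= ci_b[1]      # False suggests proportional error
intercept_ok = ci_a[0] <= 0.0 <= ci_a[1]  # False suggests constant error
print(f"b = {b:.4f}, CI_b = ({ci_b[0]:.4f}, {ci_b[1]:.4f}), includes 1: {slope_ok}")
print(f"a = {a:.4f}, CI_a = ({ci_a[0]:.4f}, {ci_a[1]:.4f}), includes 0: {intercept_ok}")
```

Checking the CIs rather than the slope value alone is what allows constant and proportional errors to be distinguished.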
In general, the precision of an analytical procedure is expressed as the standard deviation (SD), relative standard deviation (RSD), or coefficient of variation (CV) of a series of measurements [1,4]. Precision is determined at three levels: repeatability, intermediate precision (ruggedness), and reproducibility [30]. Repeatability refers to the use of an analytical procedure within a laboratory over a short period of time by the same analyst with the same equipment. Intermediate precision (also known as ruggedness) expresses within-laboratory variation, such as on different days or with different analysts or equipment within the same laboratory. Reproducibility refers to the use of an analytical procedure in different laboratories, as in a collaborative study [1,4].
Precision is determined by testing sufficiently homogeneous sample aliquots and is expressed as the SD or RSD. Samples must be analyzed through the complete analytical procedure, from sample preparation to the final test result. Repeatability must be assessed using a minimum of nine determinations covering the specified range of the procedure (i.e., three concentration levels with three replicates each) or a minimum of six determinations at 100% of the test concentration [1,4]. For an analysis performed in a single run, the SD can be calculated using Equation 6:
$$SD = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \quad (6)$$
where Yi is the individual value, Ȳ is the sample mean, and n is the number of determinations. For an analysis performed in several runs, the intermediate precision SR is calculated using Equation 7, as described in the author's previous article [11]; based on Ref. [30], SR can also indicate reproducibility:
$$S_R = \sqrt{S_r^2 + S_B^2} \quad (7)$$
where Sr is the repeatability (within-run) SD and SB² is the between-condition variance. SB, Sr, and the SD of the mean (SDm) can be determined by ANOVA (Equations 8 and 9), as described in the previous edition, USP41-NF36 (2018) [31]; these equations no longer appear in the current edition of the USP [7]. A detailed discussion can be found in the author's previous publication [11].
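A minimal sketch of how Sr, SB, SR, and SDm can be obtained from a balanced one-way ANOVA (m runs × n replicates) follows. The estimators are the standard one-way ANOVA ones and are assumed, not confirmed, to correspond to Equations 7-9 of USP41-NF36 [31]; SDm is computed here as the square root of SB²/m + Sr²/(mn), which is also an assumption. The function name `precision_anova` and the data are hypothetical.

```python
import math


def precision_anova(runs):
    """Variance components for a balanced design (m runs x n replicates).

    Returns (S_r, S_B, S_R, SD_m) where
      S_r^2  = MS_within                           (repeatability)
      S_B^2  = max(0, (MS_between - MS_within)/n)  (between-run)
      S_R^2  = S_r^2 + S_B^2                       (intermediate precision)
      SD_m^2 = S_B^2/m + S_r^2/(m*n)               (variance of the grand mean)
    """
    m = len(runs)
    n = len(runs[0])
    if m < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need a balanced design with m >= 2 and n >= 2")
    run_means = [sum(r) / n for r in runs]
    grand = sum(run_means) / m
    ss_within = sum((y - rm) ** 2 for r, rm in zip(runs, run_means) for y in r)
    ss_between = n * sum((rm - grand) ** 2 for rm in run_means)
    ms_within = ss_within / (m * (n - 1))
    ms_between = ss_between / (m - 1)
    s_r2 = ms_within
    s_b2 = max(0.0, (ms_between - ms_within) / n)  # truncate negative estimates at 0
    s_R2 = s_r2 + s_b2
    sd_m2 = s_b2 / m + s_r2 / (m * n)
    return math.sqrt(s_r2), math.sqrt(s_b2), math.sqrt(s_R2), math.sqrt(sd_m2)


# Hypothetical results (% of label claim): three runs x three replicates.
runs = [[99.1, 100.2, 99.6], [101.0, 100.4, 101.3], [98.8, 99.5, 99.9]]
s_r, s_b, s_R, sd_m = precision_anova(runs)
print(f"Sr = {s_r:.3f}, SB = {s_b:.3f}, SR = {s_R:.3f}, SDm = {sd_m:.3f}")
```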
The previous edition of the USP [31] described the variance of the mean (Vm) for tests involving different combinations of runs and replicates per run (Equation 10):
$$V_m = \frac{S_B^2}{m} + \frac{S_r^2}{m\,n} \quad (10)$$
where m is the number of runs, and n is the number of replicates in each run. Equation 10 is no longer described in the current edition of the USP-NF [7]. For in-house validation of a new analytical procedure, the author recommends determining SR instead of Sr for precision evaluation. Precision acceptance criteria for specific drugs and preparations are given in the pharmacopoeial monographs or in Tables 1 and 2. FDA ORA Laboratory Manual II [16] gives general acceptance criteria for the precision of human drug analytical methods of <3% (drug products) and <2% (API).
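Equation 10 can be used to compare designs with the same total number of determinations. The sketch below assumes the form Vm = SB²/m + Sr²/(mn) given above; the function name `variance_of_mean` and the variance components are hypothetical.

```python
def variance_of_mean(s_b2: float, s_r2: float, m: int, n: int) -> float:
    """Variance of the reported mean for m runs with n replicates per run."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    return s_b2 / m + s_r2 / (m * n)


# Hypothetical components: between-run variance 0.5, within-run variance 1.0.
between_var, within_var = 0.5, 1.0
# Every design below uses six determinations in total.
for m, n in [(1, 6), (2, 3), (3, 2), (6, 1)]:
    vm = variance_of_mean(between_var, within_var, m, n)
    print(f"m = {m}, n = {n}: Vm = {vm:.4f}")
```

With these assumed components, spreading the same six determinations from one run to six runs lowers Vm from about 0.667 to 0.250, because only additional runs reduce the between-run term.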
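General chapter USP44-NF39 <1210> [6]. According to this chapter, accuracy and precision can be evaluated using the CI of the bias (CIB) and the CI of the SD (CISD):
$$CIB = B \pm t_{1-\alpha,\,n-1}\,\frac{SD}{\sqrt{n}} \quad (11)$$
$$CISD = SD\,\sqrt{\frac{n-1}{\chi^2_{\alpha:\,n-1}}} \quad (12)$$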
where τ is the true or nominal value, the bias B is (Ȳ − τ), n is the number of reportable values, t1−α,n−1 is the percentile of the central t-distribution with area 1 − α to the left and n − 1 degrees of freedom, and χ²α:n−1 is the percentile of the central chi-squared distribution with area α to the left and n − 1 degrees of freedom.
The acceptance criteria for CIB and CISD can be taken from Tables 1 and 2. If λ is the maximum acceptable bias, then CIB must lie between −λ and +λ, and CISD must be less than the acceptance value. For example, if the nominal API content is 0.1% (Table 1), CIB should be within −10% to +8%, and CISD must be <3% (repeatability) or <6% (reproducibility). If NIR is used (Table 2), the limits of %λ range from −2% to +2% (DS), from −5% to +5% (DP), and from −30% to +50% (impurities). The researcher must decide whether the acceptance criteria are based on the nominal concentration or on the instrument used, and this choice must be stated in the ATP.
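Equations 11 and 12 and the acceptance rule above can be sketched as follows. With t1−α,n−1 used on both sides, Equation 11 is a two-sided 100(1 − 2α)% interval, and Equation 12 is an upper confidence bound. Critical values are passed in so that only the standard library is needed; the function names, the recovery data, and the ATP limits (λ = 2%, SD bound < 3%) are hypothetical. Because the data are %R values near 100, their SD is close to the RSD in percent.

```python
import math
import statistics


def bias_interval(values, tau, t_crit):
    """Equation 11: B +/- t(1-alpha, n-1) * SD / sqrt(n), with B = mean - tau."""
    n = len(values)
    bias = statistics.mean(values) - tau
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return bias - half_width, bias + half_width


def sd_upper_bound(values, chi2_alpha):
    """Equation 12: SD * sqrt((n - 1) / chi2(alpha, n-1))."""
    n = len(values)
    return statistics.stdev(values) * math.sqrt((n - 1) / chi2_alpha)


# Hypothetical %R values (n = 9); tau = 100, so the bias is in percentage points.
recoveries = [99.1, 100.8, 99.6, 100.2, 101.1, 99.4, 100.5, 99.9, 100.3]
# alpha = 0.05: t(0.95, 8) = 1.860 and chi2(0.05, 8) = 2.733 from standard tables.
lo, hi = bias_interval(recoveries, tau=100.0, t_crit=1.860)
sd_ub = sd_upper_bound(recoveries, chi2_alpha=2.733)

lam, sd_limit = 2.0, 3.0  # hypothetical ATP limits
accept = (-lam <= lo and hi <= lam) and sd_ub < sd_limit
print(f"CIB = ({lo:.2f}, {hi:.2f}) %, CISD upper = {sd_ub:.2f} %, accept = {accept}")
```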

Combined evaluation of accuracy and precision.
When accuracy and precision are evaluated separately, as discussed in section separated evaluation of accuracy and precision, some individual results (Yi or %R) may fall outside the required acceptance criteria (Tables 1 and 2) even when the mean and SD meet them. Several methods for evaluating accuracy and precision simultaneously have therefore been proposed to ensure that all Yi or %R values meet the acceptance criteria; these methods have been discussed in a previous review [12].
$$\text{Mean} \pm TI = \bar{Y} \pm K \cdot SD \quad (14)$$
where K is the tolerance factor. The Japanese Pharmacopoeia 17th edition [32] uses Ȳ ± CI to assess accuracy; the CI must be calculated using intermediate precision or reproducibility, and Ȳ ± CI should fall within the range of the acceptance criteria (Tables 1 and 2). The use of the CI (p = 0.05) for evaluating recovery is also recommended by the new version of ICH Q2(R2) [33].
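Equation 14 and the related CI and PI can be sketched as follows. Only the form Ȳ ± K·SD comes from the text; the PI half-width t·SD·√(1 + 1/n), the CI half-width t·SD/√n, and Howe's approximation K = z(1+p)/2·√((n − 1)(1 + 1/n)/χ²α:n−1) for the tolerance factor are assumptions, and tabulated K factors may differ slightly. Critical values are passed in; the function name `intervals` and the data are hypothetical.

```python
import math
from statistics import NormalDist


def intervals(mean, sd, n, t_ci, t_pi, chi2_alpha, coverage=0.95):
    """Return half-widths (CI, PI, TI) around the mean and the factor K.

    t_ci, t_pi: Student's t quantiles with n - 1 degrees of freedom.
    chi2_alpha: chi-squared quantile with area alpha to the left, n - 1 df.
    K uses Howe's approximation (an assumption, not a tabulated value).
    """
    ci = t_ci * sd / math.sqrt(n)
    pi = t_pi * sd * math.sqrt(1.0 + 1.0 / n)
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    k = z * math.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2_alpha)
    return ci, pi, k * sd, k


# Hypothetical recovery data: mean 99.8 %, SD 1.2 %, n = 9, specification 85-110 %.
# t(0.975, 8) = 2.306 and chi2(0.05, 8) = 2.733 from standard tables.
mean, sd, n = 99.8, 1.2, 9
ci, pi, ti, k = intervals(mean, sd, n, t_ci=2.306, t_pi=2.306, chi2_alpha=2.733)
for name, half in [("CI", ci), ("PI", pi), ("TI", ti)]:
    lo, hi = mean - half, mean + half
    print(f"mean ± {name}: {lo:.2f} to {hi:.2f}, inside spec: {85.0 <= lo and hi <= 110.0}")
print(f"K = {k:.3f}")
```

The TI is the widest of the three intervals, so a result set that passes with Ȳ ± TI also passes with Ȳ ± PI and Ȳ ± CI.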

The percentage of results (Ȳ ± CI/PI/TI) expected to fall inside the required specification range, P(%)inside, can be calculated in Excel [34] from the upper and lower specification limits of the ATP (SLupper and SLlower). These discussions show that every result of pharmaceutical analysis in a QC laboratory (Ȳ ± CI/PI/TI) should fall within the acceptance criteria (Tables 1 or 2), that is, P(%)inside should be close to 100%. The author recommends applying these combined methods rather than the separate methods discussed in section separated evaluation of accuracy and precision.
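P(%)inside can be sketched by assuming that results are normally distributed with mean Ȳ and standard deviation SD; this normal model is an assumption, and the form used in Ref. [34] may differ. An Excel equivalent under the same assumption would be `=100*(NORM.DIST(SL_upper, mean, SD, TRUE) - NORM.DIST(SL_lower, mean, SD, TRUE))`. The function name `percent_inside` and the data are hypothetical.

```python
from statistics import NormalDist


def percent_inside(mean: float, sd: float, sl_lower: float, sl_upper: float) -> float:
    """Percentage of a normal(mean, sd) population inside [sl_lower, sl_upper]."""
    dist = NormalDist(mu=mean, sigma=sd)
    return 100.0 * (dist.cdf(sl_upper) - dist.cdf(sl_lower))


# Hypothetical recovery results checked against an 85%-110% specification.
p_in = percent_inside(mean=99.8, sd=2.0, sl_lower=85.0, sl_upper=110.0)
print(f"P(%)inside = {p_in:.3f} %, expected OOS = {100.0 - p_in:.3f} %")
```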
Selection of SD and the methods of evaluations. As previously discussed, the SD can be calculated as Sr, SR, or SDm. For analyses performed in several series of measurements (runs), SR is larger than both Sr and SDm, as shown in previous work [35]; the SD of Equation 6 applies to an analysis performed in a single run. Accuracy and precision calculations using Equations 11-15 require an SD value, and the mean ± PI/TI (Equations 13 and 14) yields the broadest expected ranges when SR is used as the SD; SR is therefore the best choice. Our work [35] showed that the mean %R and precision calculated using the separate evaluations (Equations 1, 2, 4, 5, 11, and 12) and the combined evaluation (Equation 13) met the acceptance criteria of the ATP, but the calculation using Equation 14 showed that some data were out of specification (OOS). Therefore, the combined evaluation using the TI (Equation 14) is recommended for evaluating accuracy and precision because it yields a broader calculated expected range: if the specification range of the ATP is met using Equation 14, it will also be met using the other equations. A detailed discussion has been given in a previous publication by the author [12].

Assessment of the Accuracy and Precision Through Comparison
Principle of evaluation. Based on USP44-NF39 <1225> [1], the accuracy and precision of a proposed/new (N) method can be validated by comparison with an old (O) validated method. The detailed comparison methods are described in the current USP [7] and are discussed below.
To compare the accuracy of two procedures, the absolute value of the true difference in means (µD) is calculated as follows:
$$|\mu_D| = |\mu_N - \mu_O|$$
|µD| must be less than the required value (d). For precision, the ratio of the SD of the new procedure to that of the old procedure must be less than a required value (k), that is, SDN/SDO < k. If the old procedure has a µO of 100 units, upper and lower specification limits of 104 and 96, respectively, and a CVO of 0.…, the expected OOS of the new procedure can be calculated from the normal distribution, where Φ represents the cumulative distribution function of the standard normal distribution. Table 3 describes the expected OOS of the new procedure for various d and k values. If d = 1 and k = 2 are selected, the OOS of the new procedure will be 0.14%; if d = 2 and k = 1, the OOS will be 1.27%. The OOS of an analytical procedure can also be calculated using Equation 19 [36], where Φ represents the standard normal cumulative distribution function (Z value), and the process capability (CP) can be estimated using Equation 20 [12].
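The OOS reasoning above can be sketched with the standard normal CDF. The worst-case rule (new mean shifted by ±d, new SD equal to k·SDO), the centred-process form OOS% = 200·Φ(−3·Cp) with Cp = (USL − LSL)/(6·SD), the function names, and SDO = 0.5 are all assumptions for illustration; they are not presented as the source's equations, and because the output depends on the assumed SDO it is not meant to reproduce Table 3.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def oos_percent(mean, sd, lsl, usl):
    """Expected % of normally distributed results outside [lsl, usl]."""
    return 100.0 * (PHI((lsl - mean) / sd) + 1.0 - PHI((usl - mean) / sd))


def worst_case_oos(mu_o, sd_o, d, k, lsl, usl):
    """Hypothetical worst case: new mean shifted by +/- d and new SD = k * sd_o."""
    sd_n = k * sd_o
    return max(oos_percent(mu_o + d, sd_n, lsl, usl),
               oos_percent(mu_o - d, sd_n, lsl, usl))


def oos_from_cp(cp):
    """Assumed centred-process form: OOS% = 200 * PHI(-3 * Cp)."""
    return 200.0 * PHI(-3.0 * cp)


# mu_o = 100 and limits 96-104 follow the text; sd_o = 0.5 is an assumed value.
for d, k in [(1.0, 2.0), (2.0, 1.0)]:
    print(f"d = {d}, k = {k}: OOS = {worst_case_oos(100.0, 0.5, d, k, 96.0, 104.0):.3f} %")
print(f"Cp = 1.0: OOS = {oos_from_cp(1.0):.3f} %")
```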
OOS can also be estimated as 100% minus P(%)inside (section combined evaluation of accuracy and precision).
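Based on BP 2022 [2], the accuracy of two procedures can be compared using the cross-correlation coefficient r (Pearson product-moment correlation), where N denotes the proposed procedure and O the validated procedure. r can be calculated using Equation 21 [37]:
$$r = \frac{\sum_i (N_i - \bar{N})(O_i - \bar{O})}{\sqrt{\sum_i (N_i - \bar{N})^2 \sum_i (O_i - \bar{O})^2}} \quad (21)$$
Based on ICH M10 [38], accuracy can also be assessed by comparison using the concordance correlation coefficient (CCC), as described by Equation 22 [39,40]:
$$CCC = \frac{2\,s_{NO}}{s_N^2 + s_O^2 + (\bar{N} - \bar{O})^2} \quad (22)$$
where sNO is the covariance of the paired results, and sN² and sO² are their variances.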
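Equations 21 and 22 can be sketched in plain Python. Lin's CCC is computed with population (1/n) moments; the function names and the paired data are hypothetical. The example shows why the CCC is the stricter measure of agreement: a constant bias between the procedures leaves r at 1 but lowers the CCC.

```python
import math


def pearson_r(new, old):
    """Equation 21: Pearson product-moment correlation of paired results."""
    n = len(new)
    m_n, m_o = sum(new) / n, sum(old) / n
    s_no = sum((a - m_n) * (b - m_o) for a, b in zip(new, old))
    s_nn = sum((a - m_n) ** 2 for a in new)
    s_oo = sum((b - m_o) ** 2 for b in old)
    return s_no / math.sqrt(s_nn * s_oo)


def lin_ccc(new, old):
    """Equation 22: Lin's concordance correlation coefficient (1/n moments)."""
    n = len(new)
    m_n, m_o = sum(new) / n, sum(old) / n
    s_no = sum((a - m_n) * (b - m_o) for a, b in zip(new, old)) / n
    s_nn = sum((a - m_n) ** 2 for a in new) / n
    s_oo = sum((b - m_o) ** 2 for b in old) / n
    return 2.0 * s_no / (s_nn + s_oo + (m_n - m_o) ** 2)


# Hypothetical paired results: the new procedure reads a constant 0.5 unit high.
old = [1.0, 2.0, 3.0, 4.0, 5.0]
new = [x + 0.5 for x in old]
print(f"r = {pearson_r(new, old):.4f}, CCC = {lin_ccc(new, old):.4f}")
# prints: r = 1.0000, CCC = 0.9412
```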
Evaluating the accuracy and precision of the proposed procedure by comparing with a validated procedure using a significance test with a certain level of p is not recommended [41,42].
Comparison using homogeneous test materials [7]. The number of replicates of the new procedure (nN) and the old procedure (nO) can be calculated using Equation 19.
where Z1−α and Z1−β are the standard normal percentiles with areas 1 − α and 1 − β, respectively, to the left; α is the Type I error rate, and β is the Type II error rate. Table 4 presents the power (1 − β) for combinations of sample sizes with d = 1, k = 2, α = 0.05, and SDN = SDO = 0.4 (for µ of 100 units). A sample size of 15 is needed to obtain a power of about 0.80 for the new procedure. The accuracy of the two procedures is tested by calculating the CI of µD, which must lie entirely within −d to +d, and the upper bound of the CI of the SD ratio (CISD) must be < k [7].

Comparison of two procedures using nonhomogeneous test materials (variation across test samples)
For this paired design using nonhomogeneous samples (different lots and manufacturers), the number of replicates can be estimated as follows:
where χ²α:n−1 is the percentile of the chi-squared distribution with area α to the left and n − 1 degrees of freedom.
If SDO is not available, the CI of the SD ratio can be calculated instead.
Summary of the comparison method. Accuracy evaluation using the comparison method is mentioned in USP44-NF39 chapter <1225>, BP 2022, and Indonesian Pharmacopoeia chapter <1381> [1,2,4]. USP44-NF39 chapter <1010> [7] describes detailed methods for evaluating accuracy and precision by comparison (sections comparison using homogeneous test materials and comparison of two procedures using nonhomogeneous test materials). BP 2022 [2] uses cross-correlation as the method of assessment, whereas the Indonesian Pharmacopoeia [4] does not describe a method for evaluating accuracy by comparison. For comparing two bioanalytical methods, ICH M10 [38] uses the CCC (section principle of evaluation). Several applications of the CCC to bioanalytical methods have been reported; thus, the CCC is recommended instead of cross-correlation. Further work is needed to determine whether the CCC can replace the more complex methods described in USP44-NF39 <1010> [7]. The comparison of two procedures (N and O) should not be evaluated using a significance test at a given p-value, because a p-value, whether small or large, cannot be trusted for this purpose [41,42].
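Returning to the comparison on homogeneous test materials, the two decision rules of Ref. [7] described above (the CI of µD must lie within ±d, and the upper bound of the SD ratio must be below k) can be sketched in Python. This is a hedged illustration: the Welch-type standard error of the difference, the F-based upper bound for SDN/SDO, the function name `compare_procedures`, and the data are assumptions, and USP <1010> may specify the intervals differently. The critical values t_crit and f_crit must be taken from tables or software for the chosen α and degrees of freedom.

```python
import math
import statistics


def compare_procedures(new, old, d, k, t_crit, f_crit):
    """Hypothetical check for two sets of results on one homogeneous material.

    Mean difference: D +/- t_crit * sqrt(s_N^2/n_N + s_O^2/n_O) must lie in (-d, +d).
    SD ratio: upper bound (s_N / s_O) * sqrt(f_crit) must be below k, where
    f_crit = F(1 - alpha; n_O - 1, n_N - 1).
    """
    n_n, n_o = len(new), len(old)
    s_n, s_o = statistics.stdev(new), statistics.stdev(old)
    diff = statistics.mean(new) - statistics.mean(old)
    half = t_crit * math.sqrt(s_n ** 2 / n_n + s_o ** 2 / n_o)
    ci = (diff - half, diff + half)
    ratio_upper = (s_n / s_o) * math.sqrt(f_crit)
    mean_ok = -d < ci[0] and ci[1] < d
    sd_ok = ratio_upper < k
    return ci, ratio_upper, mean_ok, sd_ok


# Hypothetical results (units, mu about 100); t(0.95, 10) = 1.812, F(0.95; 5, 5) = 5.05.
new = [100.3, 99.8, 100.6, 100.1, 99.7, 100.4]
old = [100.0, 99.6, 100.2, 99.9, 100.3, 99.8]
ci, ratio_upper, mean_ok, sd_ok = compare_procedures(new, old, d=1.0, k=2.0,
                                                     t_crit=1.812, f_crit=5.05)
print(f"CI(muD) = ({ci[0]:.3f}, {ci[1]:.3f}), mean equivalent: {mean_ok}")
print(f"SD ratio upper bound = {ratio_upper:.3f}, below k: {sd_ok}")
```

With only six replicates per procedure, the F factor alone inflates the SD-ratio bound by about 2.2, which illustrates why Table 4 calls for larger sample sizes to reach adequate power.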

Internal Quality Control
Once the proposed method has been validated using the separate, combined, or comparison evaluations discussed in the preceding sections, it should be monitored continuously during its application [9,43,44]. A certain number of QC samples should be analyzed during routine analysis. QC samples are typical samples that are sufficiently stable and homogeneous over a given period to give the same results [43]. In each analytical run, QC samples should make up 5% of the samples analyzed [43,44], that is, one QC sample for every 20 samples.
The results of QC sample analysis can be divided into an acceptance zone (between the lower and upper limits), a guard band, and a rejection zone [9]; alternatively, limits are set at mean ± 2 SD and mean ± 3 SD (action limit) [43], or at mean ± 1 SD (zone C), mean ± 2 SD (zone B), and mean ± 3 SD (zone A) [44]. Based on general chapters USP <1210> and <1010> [6,7], the acceptance zone can be assumed to be mean ± CI/PI, and the guard band mean ± TI. To show that the method is still valid during use, the results of all QC samples should fall within the acceptance zone or within zones C and B. If results fall in the guard band, beyond the action limit, in the rejection zone, or in zone A, the analytical procedure should be investigated to correct the problem or revalidated as necessary, and it should not be used for routine applications until then. The author recommends that internal quality control methods be added to the new edition of the Indonesian Pharmacopoeia [4] and to Indonesia's CPOB [45].
Table 5 shows the analysis results for an API in a DP at 10 concentration levels obtained using a UV spectrophotometer; this work aimed to investigate constant and proportional systematic errors during extraction. As shown in Table 1, the acceptance criteria for accuracy and repeatability at a concentration of 100 ppm are 85%-110% and 4%, respectively. These data indicate that the researchers should optimize the extraction method. The proportional and constant systematic errors of the method cannot be detected using %R, the RSD of precision, or the slope alone. CIa, CIb, and other validation parameters can be calculated using our self-developed VMA solutions, which can be downloaded free of charge from the link given in [46].
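The QC zone scheme of Ref. [44] and the 5% QC frequency described above can be sketched as follows. The function names, the choice to round the number of QC samples up, and the example numbers are hypothetical; in practice the QC mean and SD would come from the validation data.

```python
import math


def qc_zone(result: float, mean: float, sd: float) -> str:
    """Classify a QC result by its distance from the mean in SD units."""
    z = abs(result - mean) / sd
    if z <= 1.0:
        return "C"
    if z <= 2.0:
        return "B"
    if z <= 3.0:
        return "A"
    return "beyond 3 SD"


def qc_samples_needed(n_samples: int, samples_per_qc: int = 20) -> int:
    """One QC sample per 20 samples (5%), rounded up (an assumption)."""
    return math.ceil(n_samples / samples_per_qc)


qc_mean, qc_sd = 100.0, 1.0  # hypothetical values from the validation study
for result in [100.4, 98.7, 102.6, 103.4]:
    zone = qc_zone(result, qc_mean, qc_sd)
    status = "acceptable" if zone in ("C", "B") else "investigate"
    print(f"{result}: zone {zone}, {status}")
print("QC samples for a run of 45 samples:", qc_samples_needed(45))
```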

Determination of Accuracy According to Ref. [6].
Drug substances. Table 6 shows the analysis data for a DS at three API concentration levels, each in triplicate, obtained using NIR. As shown in Table 2, the acceptance criterion for the bias is −2% to +1%, and that for the RSD of repeatability and intermediate precision is 1%. The factor for the TI can be obtained from Table XI of Appendix A Statistical Tables [47], whereas the remaining intervals about Ȳ can be estimated using the Interactive Statistic Page [48]. The calculations show that the mean ± PI/TI values fall within the specification range (−2% to +1%).
Drug products. Table 7 shows the results of analytical method validation for a DP in three runs at different concentration levels, with six replicates per run. The acceptance criteria at 100 ppm were as follows: accuracy, 85%-110%; bias, −15% to +10%; and RSD of repeatability, 4% (Table 1). This evaluation shows that the acceptance criterion for the slope recommended by ICH Q14 [29] (0.8-1.25) should be further optimized (p = 0.05 or 0.01; see section separated evaluation of accuracy and precision).
Using ANOVA (Equations 7-9), the variance estimates obtained were 1.6277, 0.42048, and 2.0481.
Evaluation of accuracy and precision using the comparison method [7]. The results of the new and validated procedures for homogeneous test materials (section comparison using homogeneous test materials) were analyzed using the data presented in Table 8.

Conclusions and Recommendations
(1) The examples above show that evaluation using only the accuracy and precision limits (Table 1) described in the current USP and FI cannot lead to valid conclusions: although the mean recovery and precision met the requirements, CIa and CIb did not (Sections 5.1 and 5.2.2). The author recommends revising the method of assessment in chapters <1225> of the USP-NF [1] and <1381> of FI VI [4], which currently reads: "Assessment of accuracy can be accomplished in a variety of ways, including evaluating the recovery of the analyte (percent recovery) across the range of the assay, or evaluating the linearity of the relationship between estimated and actual concentrations". The word "or" should be replaced with "and".
(2) As discussed in section selection of SD and the methods of evaluations, a combined evaluation using Equation 14 is recommended for evaluating the accuracy and precision of a proposed/new method.
(3) Recovery data must be reported as mean ± CI/PI/TI instead of mean ± SD/RSD (sections combined evaluation of accuracy and precision and selection of SD and the methods of evaluations).
(4) Further work is needed to determine whether the comparison method using the concordance correlation coefficient can replace the methods proposed in USP chapter <1010> [7]. When evaluating accuracy and precision by comparison, a significance test at a given p-value should not be applied (section summary of the comparison method).
(5) To obtain reliable data in a QC laboratory, the application of internal quality control (section internal quality control) is strongly recommended, and internal quality control must be included in the new edition of the Indonesian Pharmacopoeia.