Applying bootstrap quantile regression for the construction of a low birth weight model

Background: Most investigators use ordinary least squares (OLS) methods to model low birth weight. When the data are non-normal or contain outliers, OLS becomes ineffective. The quantile method of forecasting low birth weight has good potential for overcoming the problems associated with linear regression, but it has not been fully evaluated. Methods: The present study compares the OLS and quantile regression methods for modeling low birth weight when the data are right-skewed and outliers are present. Additionally, we evaluated the performance of the associated algorithm in recovering the true parameter using the bootstrap method. Results: Our study found that a mother’s education level, the number of maternal parities, and the last birth interval significantly impacted low birth weight at any selected low quantile. Based on the bootstrap simulation study, the proposed model was considered acceptable since both methods generated nearly identical estimates of the model parameters. An accuracy test indicated that the quantile method was an unbiased estimator. Conclusions: The present study found that low birth weight is significantly affected by the mother’s educational level, the number of maternal parities, and the last birth interval.


Introduction
Birth weight in humans is described as the weight of an infant obtained within the first 60 minutes after birth. Birth weight is determined by two major processes: length of gestation and intrauterine growth rate. 1 Low birth weight is defined by the World Health Organization as a birth weight < 2500 grams. Low birth weight can be caused by either a short gestation period, retarded intrauterine growth, or a combination of the two. 1,2 It is considered to delay child development and carries a greater risk of early childhood mortality. Moreover, infants with low birth weight also have a significantly greater risk of infection, decreased chances of survival, higher susceptibility to childhood illnesses, and difficulties associated with psychosocial development, behavior, and learning during childhood. 3 Over the last few decades, many studies have investigated the causes of low birth weight. Recently, low birth weight and its determinants have come under intense global scrutiny.
Conventional regression methods, such as the ordinary least squares (OLS) method, are typically used to model the factors affecting birth weight. OLS is based on central tendency, which may not appropriately represent reality when the dependent variable ranges widely between its lower and upper values; in such cases, the relationship may not be homogeneous across different percentiles of the dependent variable. Thus, using OLS to estimate the mean may not accurately reflect heterogeneity in the estimated relationship. Indeed, studies have shown that the resulting estimates of various effects on the conditional mean of birth weight do not necessarily indicate the size and nature of these effects on the lower tail of the birth weight distribution. 4,5 A more complete picture of the covariate effects can be seen by estimating a family of conditional quantile functions. Estimates of conditional quantiles can be used to overcome problems associated with the classical method (OLS), such as outliers or heteroscedasticity, as long as the error distribution of the data has a continuous, symmetric, and unimodal density. 6,7 The quantile regression method is used to estimate the relationship at any point of the conditional distribution of the dependent variable, which generates various estimated coefficients at certain quantiles of the dependent variable. 8-10 The objective of this study is to identify the determinants of low birth weight using quantile regression. We report a quantile regression model for

Methods
This study utilized primary data collected by questionnaire from March through July 2016. Our sample was limited to mothers who had just delivered a singleton live birth and were living in West Sumatera, Indonesia. In total, 92 respondents with complete information were included in the analysis. The response variable was the child's birth weight recorded in kilograms. Eleven independent variables were used in this study, including continuous and categorical types, i.e., mother's education, mother's job, residence, number of pregnancy problems, mother's age, number of parities, number of prenatal care visits, mother's weight gain during pregnancy, mother's hemoglobin (Hb) level, last birth interval, and sex of the baby. 1 Mother's education was divided into three levels: low, middle, and high, with the low level used as the reference category for interpreting coefficients. Mother's job was classified into three categories: government employee, housewife, and other, and residence was categorized as urban or rural. The number of pregnancy problems was categorized into three types: > 1 problem (reference category), 1 problem, and no problem. Mother's age, the number of parities, the number of prenatal care visits, mother's weight gain during pregnancy, mother's Hb level, and last birth interval were treated as continuous variables. This research was conducted in full accordance with the World Medical Association Declaration of Helsinki. Figure 1 (a) presents a histogram of the dependent variable for the 92 birth weights. The distribution of the data is skewed to the right. Figure 1 (b) shows a normal Q-Q plot of the data, indicating a violation of the normality assumption in the birth weight data. Summary statistics were calculated for all of the selected independent variables.
Table 1 presents the descriptive statistics for all of the continuous independent variables, and Table 2 shows the percentage of each category of qualitative variables.

In the present study, the quantile regression approach was used to model low birth weight based on the following ideas. Consider the linear model 14

$$y_i = x_i^{\top}\beta + e_i, \qquad i = 1, \dots, n,$$

where $y_i$ is the ith observation, $x_i$ is the vector of independent variables for the ith observation, and $e_i$ is an independent error variable with probability density $f_i$. For identifiability, we assume that, for a quantile level of interest $\tau \in (0,1)$, the conditional $\tau$th quantile of $e_i$ given $x_i$ is zero. The conditional quantile regression is as follows:

$$Q_Y(\tau \mid x) = x^{\top}\beta(\tau),$$

where $Q_Y(\tau \mid x)$ represents the $\tau$th conditional quantile of the response $Y$ given $x$, and the parameter $\beta(\tau)$ is an unknown functional vector. A point estimate $\hat{\beta}(\tau)$ of the parameter $\beta(\tau)$ is obtained by minimizing the objective function:

$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^{\top}\beta\right),$$

where $\rho_\tau(\cdot)$ denotes the following loss function:

$$\rho_\tau(u) = u\left(\tau - I(u < 0)\right),$$

and $I(\cdot)$ is the usual indicator function. This loss function is an asymmetric absolute loss function, i.e., a weighted sum of absolute deviations, where a weight of $(1 - \tau)$ is assigned to the negative deviations and a weight of $\tau$ is used for the positive deviations. 6

We evaluated goodness of fit for these quantile regressions using $R^2$ values. The $R^2$ index formulation for quantile regression differs from OLS regression since it is based on the minimization of an absolute weighted sum (not an unweighted sum of squares as in OLS). The $R^2$ formulation for quantile regression is represented by what is typically called a pseudo-$R^2$, which is formulated as follows:

$$\text{pseudo-}R^2(\tau) = 1 - \frac{\mathrm{RASW}(\tau)}{\mathrm{TASW}(\tau)},$$

where $\mathrm{RASW}(\tau)$ is the residual absolute sum of weighted differences between the observed dependent variable and the estimated quantile of the conditional distribution in the more complex model, and $\mathrm{TASW}(\tau)$ is the total absolute sum of weighted differences between the observed dependent variable and the estimated quantile of the conditional distribution in the simplest model. 4

We evaluated the performance of the quantile method and its associated algorithm in recovering the true parameter using a simulation study, which was performed by applying the bootstrap method. 8 The bootstrap resampling method is a fully nonparametric procedure that is suitable for use in a wide range of models and easy to implement. In this method, a new data set is generated by sampling with replacement from the original data set, and the hypothesis model is fitted to the new data set. 8,15,16 The standard errors of the parameters were estimated from the hypothesis model fitted to each new data set.
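The check loss and the pseudo-R² described above can be illustrated with a short sketch. It is not the authors' implementation: the function names and the birth weights in `weights_kg` are hypothetical, and the "simplest model" is assumed here to be an intercept-only model, whose best constant is found by searching over the observed values (the check loss is piecewise linear, so a minimizer always lies at a data point).

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def total_check_loss(y, fitted, tau):
    """Sum of check losses of the residuals y_i - fitted_i."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))


def best_constant(y, tau):
    """Constant that minimises the total check loss (an empirical tau-quantile).

    Assumption: the 'simplest model' is intercept-only, so it is enough to
    try every observed value as the candidate constant.
    """
    return min(sorted(y), key=lambda c: total_check_loss(y, [c] * len(y), tau))


def pseudo_r2(y, fitted, tau):
    """Pseudo-R^2 = 1 - RASW / TASW at quantile level tau.

    Undefined (division by zero) when all y values are identical.
    """
    rasw = total_check_loss(y, fitted, tau)           # fitted (complex) model
    constant = best_constant(y, tau)
    tasw = total_check_loss(y, [constant] * len(y), tau)  # intercept-only model
    return 1.0 - rasw / tasw


# Hypothetical birth weights in kilograms, for illustration only.
weights_kg = [2.1, 2.4, 2.9, 3.0, 3.2, 3.3, 4.6]
print(best_constant(weights_kg, 0.25))
```

With a fitted regression in hand, `pseudo_r2(y, fitted, tau)` would be evaluated separately at each quantile level of interest, since both sums depend on τ.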
A previous study 7 presented the following procedure for bootstrap sampling:
(1) Generate a random sample of size n, drawn with replacement from the original data, denoted by $X^* = (X_1^*, X_2^*, \dots, X_n^*)$;
(2) For this bootstrap sample, apply the quantile function estimator to each element of $X^*$ to obtain $U^* = (U_1^*, U_2^*, \dots, U_n^*)$, where $U_1^* = Q(X_1^*)$;
(3) Calculate the statistic of interest $S_i(U^*)$;
(4) Repeat steps 1-3 B times in order to obtain the empirical bootstrap distribution for $S(U)$, for $b = 1, \dots, B$;
(5) Calculate the bootstrap average of each parameter using $\bar{\beta}_j(\tau_q) = \frac{1}{B}\sum_{b=1}^{B} \hat{\beta}_j^{(b)}(\tau_q)$ and the bootstrap variance as $V_{q,j} = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\beta}_j^{(b)}(\tau_q) - \bar{\beta}_j(\tau_q)\right)^2$, where $\hat{\beta}_j^{(b)}(\tau_q)$ is the estimate from the bth bootstrap sample, $j = 1, \dots, p$, and $q = 1, \dots, k$;
(6) Construct a confidence interval for each conditional quantile parameter, for the generic jth parameter and the qth quantile, using the following formula: $\bar{\beta}_j(\tau_q) \pm z_{\alpha/2}\, SD(\beta_j(\tau_q))$, where $SD(\beta_j(\tau_q))$ is the standard deviation of $\beta_j(\tau_q)$, i.e., the square root of the bootstrap variance $V_{q,j}$.
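The six steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's code: the estimator is passed in as a function returning a list of coefficients, and the example uses an intercept-only quantile fit (`sample_quantile`, a hypothetical stand-in for a full quantile regression). The data, the seed, and the 1/(B-1) variance divisor are assumptions as well.

```python
import math
import random


def sample_quantile(values, tau):
    """Intercept-only quantile fit: an order statistic minimising the check loss.

    Stand-in for a full quantile regression; returns a one-element
    coefficient list so the bootstrap code below stays generic.
    """
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return [ordered[k]]


def bootstrap(data, estimator, B, z=1.96, seed=0):
    """Bootstrap mean, variance and normal-approximation CI per coefficient."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(B):                       # step 4: repeat B times
        resample = [data[rng.randrange(n)] for _ in range(n)]  # step 1
        draws.append(estimator(resample))    # steps 2-3: refit on the resample
    summary = []
    for j in range(len(draws[0])):
        column = [d[j] for d in draws]
        mean = sum(column) / B                                   # step 5: average
        var = sum((c - mean) ** 2 for c in column) / (B - 1)     # step 5: variance
        sd = math.sqrt(var)
        summary.append({"mean": mean, "var": var,
                        "ci": (mean - z * sd, mean + z * sd)})   # step 6
    return summary


# Hypothetical birth weights in kilograms, for illustration only.
weights_kg = [2.1, 2.4, 2.5, 2.9, 3.0, 3.1, 3.2, 3.3, 3.6, 4.6]
result = bootstrap(weights_kg, lambda d: sample_quantile(d, 0.25), B=50)
print(result[0])
```

For a regression model, `data` would hold one record per mother and `estimator` would return the full coefficient vector at a chosen τ; resampling whole records keeps each response paired with its covariates.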

Results
In this study, the model hypothesis was presented in the birth weight equation as follows:

$$\begin{aligned}\text{Birth weight}_i ={}& \beta_1\,\text{Age}_i + \beta_2\,\text{Education (Middle)}_i + \beta_3\,\text{Education (High)}_i + \beta_4\,\text{Parity}_i \\ &+ \beta_5\,\text{Last birth interval}_i + \beta_6\,\text{Weight gain}_i + \beta_7\,\text{Problems (One problem)}_i \\ &+ \beta_8\,\text{Problems (No problem)}_i + \beta_9\,\text{Hb}_i + \beta_{10}\,\text{Rural}_i + \beta_{11}\,\text{Female}_i + e_i\end{aligned}$$

Next, the model hypothesis was fitted to the birth weight data. After fitting, four indicator variables were found to have a statistically significant effect on the response. The variable "problems" was excluded from the model because it was not statistically significant in any of the constructed equations. Next, we measured the goodness of fit of the proposed models. Several studies have reported the use of the pseudo-$R^2$ to indicate goodness of fit for each selected quantile. 6,17 Table 4 shows the corresponding pseudo-$R^2$ values for each selected quantile using birth weight data as the response variable. The results shown in Table 4 indicate that the model at the 0.40th quantile is the best among all five nested models, as indicated by the highest pseudo-$R^2$ value.
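To make the coding of the categorical terms in the birth weight equation concrete, the sketch below builds one row of the design matrix from a single record. The field names, category labels, and example values are hypothetical; the reference categories (low education, more than one pregnancy problem) follow the Methods section, while urban residence and male sex are assumed to be the reference levels because the equation carries only Rural and Female terms.

```python
# Column order follows beta_1 ... beta_11 in the birth weight equation.
COLUMNS = [
    "age", "education_middle", "education_high", "parity",
    "last_birth_interval", "weight_gain", "problems_one",
    "problems_none", "hb", "rural", "female",
]


def dummy(value, level):
    """Return 1.0 when the record falls in the given category, else 0.0."""
    return 1.0 if value == level else 0.0


def design_row(record):
    """Encode one respondent as the 11 regressors of the hypothesised model."""
    return [
        float(record["age"]),
        dummy(record["education"], "middle"),   # reference: low education
        dummy(record["education"], "high"),
        float(record["parity"]),
        float(record["last_birth_interval"]),
        float(record["weight_gain"]),
        dummy(record["problems"], "one"),       # reference: more than one problem
        dummy(record["problems"], "none"),
        float(record["hb"]),
        dummy(record["residence"], "rural"),    # assumed reference: urban
        dummy(record["sex"], "female"),         # assumed reference: male
    ]


# Hypothetical respondent, for illustration only.
example = {
    "age": 29, "education": "high", "parity": 2, "last_birth_interval": 3.5,
    "weight_gain": 11.0, "problems": "none", "hb": 11.8,
    "residence": "rural", "sex": "male",
}
print(dict(zip(COLUMNS, design_row(example))))
```

A respondent in a reference category simply has zeros in the corresponding dummy columns, so each dummy coefficient is read as a shift relative to that reference group.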
Although the pseudo-$R^2$ values for all five lower quantiles were within an acceptable range (> 79%), this study also investigated the performance of quantile regression and its associated algorithm in recovering the true parameter. A simulation study was subsequently performed using a bootstrap approach. All 50 model fits were used to estimate the standard errors and to calculate the 95% confidence intervals of all parameters in this simulation study. The results of the bootstrap estimation method and the 95% bootstrap percentile intervals are shown in Table 5, which reveals that the quantile regression and bootstrap models yield almost identical parameter estimates. Additionally, all parameter estimates from the quantile regression method were within the 95% bootstrap percentile intervals, indicating that the parameters estimated for all selected quantiles in the proposed model were acceptable. Thus, we conclude that this study's quantile regression method fits the proposed model well. 18 We also examined the accuracy of the quantile estimation method to determine whether it is unbiased. Table 6 presents the bias estimation results between the quantile and bootstrap estimation methods for each low quantile. Bias was calculated as the difference between the quantile estimate and the bootstrap estimate. The quantile estimation method was considered unbiased if the standard deviation of the bias was less than the standard deviation of the bootstrap distribution.
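The two checks described above, whether an estimate falls inside the 95% bootstrap percentile interval and how far it lies from the bootstrap average, can be sketched as follows. The function names are hypothetical, the interval uses a nearest-rank convention (one of several in common use), and the standard-deviation criterion for bias is left out because the text does not spell out how the standard deviation of the bias is computed.

```python
import math


def percentile_interval(draws, level=0.95):
    """Nearest-rank bootstrap percentile interval for one parameter."""
    ordered = sorted(draws)
    B = len(ordered)
    alpha = 1.0 - level
    lower_idx = max(0, math.ceil(alpha / 2 * B) - 1)
    upper_idx = min(B - 1, math.ceil((1 - alpha / 2) * B) - 1)
    return ordered[lower_idx], ordered[upper_idx]


def bootstrap_bias(estimate, draws):
    """Bias as defined in the text: quantile estimate minus bootstrap average."""
    return estimate - sum(draws) / len(draws)


def within_interval(estimate, draws, level=0.95):
    """True when the original estimate lies inside the percentile interval."""
    lower, upper = percentile_interval(draws, level)
    return lower <= estimate <= upper


# Hypothetical bootstrap replicates of one coefficient (B = 50), illustration only.
replicates = [0.10 + 0.002 * b for b in range(50)]
print(percentile_interval(replicates), within_interval(0.15, replicates))
```

In practice `draws` would be the 50 refitted values of one coefficient at one quantile level, and the check would be repeated for every parameter and quantile reported in the tables.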

Discussion
The present study reports a statistical model of low birth weight constructed using a quantile regression approach. Although many studies have reported models to determine low birth weight, few have used the quantile approach, particularly when considering the mother's education level, the number of parities, last birth interval, mother's weight gain, Hb level, and the number of pregnancy problems.
The results reveal that the mother's education level, the number of parities, and the last birth interval significantly affected low birth weight. Furthermore, a validity test using the bootstrap resampling method indicated that the proposed model was acceptable, since the quantile regression and bootstrap methods yielded nearly identical parameter estimates. All parameter estimates of the quantile regression were within the 95% bootstrap percentile intervals. Next, the accuracy of the quantile regression method was tested, and the method was determined to be unbiased. This study revealed that the standard deviation of the bias (i.e., the difference between the quantile estimate and the bootstrap estimate) was less than the standard deviation of the bootstrap distribution, which means that the parameters estimated for all selected quantiles in this study are statistically acceptable.
More research using larger samples (> 250) is necessary to achieve a better model, since quantile regression itself requires a large sample. 6 In addition, a Bayesian approach to quantile regression could be implemented to reduce the need for a larger sample, since collecting more data requires more time and money. The Bayesian method is able to estimate model parameters even from small samples. 19

Conclusions
The value of the quantile regression approach lies in its ability to enhance the understanding of the low birth weight model when the data contain outliers. Quantile regression can overcome this problem since it assesses the association between the independent variables and the outcome at each conditional quantile; hence, it is applicable to data with low, moderate, and high outlier values. The present study demonstrated that low birth weight is significantly affected by the mother's education level, the number of parities, and the last birth interval. The proposed model was accepted based on validity tests using the bootstrap resampling method. All significant parameter estimates from the quantile regression were within the 95% bootstrap percentile intervals.