### Efficiency measurement

#### Stochastic frontier analysis

Traditional production functions assume that producers obtain the greatest output with the fewest inputs; however, not all producers successfully overcome optimization issues [72]. To address this challenge, Aigner and Schmidt (1977) and Meeusen and Broeck (1977) [73, 74] developed the SFA model, which has had a major impact on productive econometric modeling and assessing the technical efficiency of firms. Later, scholars such as Battese [75, 76] refined the SFA model, mainly in terms of the treatment of inefficiencies and the application of panel data. Moreover, owing to the extensive research period in this paper, it is difficult to guarantee complete stability of individual efficiency. Furthermore, considering that the inefficient part of the HSE may change over time, we conducted our study based on the time-varying random-effects frontier model proposed by Battese et al. (1992) [77], which is given below.

$${Y}_{it}=f\left({x}_{it};\beta\,\right)\mathrm{exp}\left({v}_{it}-{u}_{it}\right),\,where\,i=\mathrm{1,2},\dots\,,n$$

(1)

In Eq. (1), \({Y}_{it}\) is the output; \(f\left({x}_{it};\beta \right)\) is the production function; \({x}_{i}\) is the input factor vector, which is the vector of its coefficients to be estimated; \({v}_{it}\) is used to represent the effect of statistical error and various stochastic factors on the frontier output; and \({u}_{it}\ge 0\) is a time-varying technical inefficiency term that measures the relative production efficiency level and is independent of \({v}_{it}\).

Further, we quantitatively describe the effect of the timing factor on the inefficiency term \({u}_{it}\).

$$\begin{array}{c}{u}_{it}=\beta \left(t\right)\times {u}_{i}, i=\mathrm{1,2},3,\dots ,n\\ \beta \left(t\right)=exp\left\{-\eta \times \left(t-T\right)\right\}\end{array}$$

(2)

In Eq. (2), \(T\) is the total periods used in the study, and \(\eta\) is a time-varying coefficient to be estimated, reflecting the magnitude of the rate of change in technical efficiency. If \(\eta <0\), \(\beta \left(t\right)\) increases with \(t\), and technical efficiency decreases; if \(\eta >0\), \(\beta \left(t\right)\) decreases with \(t\) and technical efficiency improves; and if \(\eta =0\), \(\beta \left(t\right)\) does not change over time, and the technical efficiency does not change either.

Finally, owing to technical inefficiency and random noise, it is challenging for producers to achieve the frontier level of the production function. A producer’s technical efficiency (TE) is expressed by the ratio of the expectation of the producer’s output in the sample to the expectation of the stochastic frontier, as shown in Eq. (3).

$${TE}_{it}=\frac{E\left[f\left({x}_{it},\beta\,\right)exp\left({v}_{it}-{u}_{it}\right)\right]}{E\left(f\left({x}_{it},\beta \right)exp\left({v}_{it}-{u}_{it}\right)|{u}_{it}=0\right)}=exp\left\{-{\mu\,}_{it}\right\},\,where\,i=\mathrm{1,2},3,\dots\,,n$$

(3)

#### Production function setting for the SFA model

Following the study of Coelli et al. [78], we specify two forms of production function in the SFA model (i.e., Cobb-Douglass or Translog functional forms). However, researchers have different views on the usage of these two functions [79, 80].

First, the logarithmic expression of the Cobb–Douglas functional form is as follows:

$$Ln{Y}_{it}={\beta }_{0}+\sum_{j=1}^{N}{\beta\,}_{i}Ln{x}_{j,it}+{v}_{it}-{\mu }_{it}$$

(4)

In Eq. (4), \({Y}_{it}\) represents the output of healthcare services for \(i\) DMUs in period \(t\); \(j\) represents the number of independent variables; \({x}_{j,it}\) indicates the corresponding input level of the ith decision unit at time \(t\); \({\beta }_{0}\) is the constant term intercept; \({\beta }_{i}\) is the parameter to be estimated; and \({v}_{it}\) and \({u}_{it}\) are the same as in Eq. (1).

Second, the Translog function provides a second-order approximation that is a flexible functional form. Thus, the Translog function is also known as the Cobb–Douglas extended form, and the expression is as shown below.

$$Ln{Y}_{it}={\beta\,}_{0}+\sum_{j=1}^{N}{\beta }_{i}Ln{x}_{j,it}+\frac{1}{2}{\sum }_{j=1}^{N}{\sum }_{h=1}^{N}{\beta }_{jh}Ln{x}_{j,it}Ln{x}_{h,it}+{v}_{it}-{\mu }_{it}$$

(5)

Finally, we adjusted the form of Cobb–Douglas according to the current research purpose; the Cobb–Douglas production function of China’s healthcare services can be expressed as below.

$$Ln{HSO}_{it}={\beta }_{0}+{\beta }_{1}Ln{L}_{it}+{\beta }_{2}Ln{K}_{it}+{v}_{it}-{\mu }_{it}, where i=\mathrm{1,2.3},\dots n$$

(6)

In Eq. (6), \({HSO}_{it}\) expresses the healthcare service output of the ith decision unit at time \(t\); \({L}_{it}\) denotes the corresponding labor input of the ith decision unit at time \(t\). \({K}_{it}\) denotes the corresponding capital investment of the ith decision unit at time \(t\). \({\beta }_{0}\) is the constant term intercept; \({\beta }_{1} \mathrm{and }{\beta }_{2}\) are the parameters to be estimated; and \({v}_{it}\) and \({u}_{it}\) are the same as in Eq. (1).

In addition, if we extend the logarithmic of the Cobb–Douglas functional form, we obtain the logarithmic of the Translog functional form for China’s healthcare services:

$$\begin{array}{c}Ln{HSO}_{it}={\beta }_{0}+{\beta }_{1}Ln{L}_{it}+{\beta }_{2}Ln{K}_{it}+\frac{1}{2}{\beta }_{3}{\left(Ln{L}_{it}\right)}^{2}+\frac{1}{2}{\beta }_{4}{\left(Ln{K}_{it}\right)}^{2}\\ +{\beta }_{5}Ln{K}_{it}Ln{L}_{it}+{v}_{it}-{\mu }_{it}\end{array}$$

(7)

In Eq. (7), \({\beta }_{0}\) is the constant term intercept; \({\beta }_{1},{\beta }_{2},{\beta }_{3},{\beta }_{4}\), and \({\beta }_{5}\) are the parameters to be estimated; and the other variables have the same meaning as in Eq. (6).

Notably, the Cobb–Douglas function has the advantage of having a simple model form with few parameters for easy estimation. However, it has the disadvantage of a fixed and constant factor elasticity of substitution. Although the Translog function overcomes this shortcoming, the Translog function is not necessarily superior to the Cobb–Douglas function. Thus, we followed the study of Kumbhakar et al. [81] to select the production function suitable for this study. The specific test steps are as follows:

$$LR=-2\left[Ln\left({H}_{0}\right)-Ln\left({H}_{1}\right)\right]$$

(8)

In Eq. (8), the original assumption is \({H}_{0}: {\beta }_{3}={\beta }_{4}={\beta }_{5}=0\) (i.e., the production function is assumed to be of the Cobb–Douglas form). Alternative assumption is \({H}_{1}:{\beta }_{3},{\beta }_{4},{\beta }_{5}\ne 0\) (i.e., the production function is assumed to be of the Translog form). \(Ln\left({H}_{0}\right)\) represents the value of the log-likelihood function for the original hypothesis \({H}_{0}\). \(Ln\left({H}_{1}\right)\) represents the value of the log-likelihood function of the alternative hypothesis \({H}_{1}\). Furthermore, the statistic obeys a mixed \({\lambda }^{2}\) distribution, provided the original hypothesis holds.

#### Parametric tests of the production function for the SFA model

To find suitable production function forms for the SFA model, first, we examined which functional form is most suitable for this study using Eq. (8). Second, we examined the distribution of technical inefficiency in the production function by utilizing the null hypothesis \({H}_{0}:\mu =0\). This null hypothesis assumes that healthcare providers are all operating on the technical efficiency frontier and that there are no asymmetric and random technical efficiencies in the inefficiency effects (i.e., technical inefficiencies have a half-normal distribution). If \(\mu \ne 0\), then the technical inefficiency has a truncated normal distribution. Finally, we verified the presence of inefficiency by calculating the value of \(\gamma\). The specific formula is as follows:

$$\gamma =\frac{{\delta }_{\mu }^{2}}{{\delta }_{\mu }^{2}+{\delta }_{v}^{2}}$$

(9)

In Eq. (9), \(\gamma\) represents the magnitude of the variance of the inefficiency term; \({\delta }_{\mu }^{2}\) and \({\delta }_{v}^{2}\) represent the variances of the inefficiency and random shock terms, respectively. If \(\gamma\) is close to 0, the statistical noise is perfectly correlated with the production variance. If \(\gamma\) is close to 1, a significant portion of the error term comes from technical inefficiencies.

#### Malmquist index

The SFA model cannot capture the dynamic sources of variation in efficiency since it can only evaluate the static efficiency of each decision unit. Therefore, to examine the total factor productivity change in healthcare services in China, we used the Malmquist index (MI) model.

Total factor productivity (TFP) is a measure of the efficiency with which multiple inputs (such as labor and capital) are combined to produce output in a production process. It captures the portion of output growth that cannot be explained by changes in the quantities of inputs used [82]. Caves constructed the Malmquist productivity index in 1982 [71]. It is assumed that in each period \(t=\mathrm{1,2},3,\dots ,T\), the production unit transforms the input \({e}^{t}\) into the output \({p}^{t}\) according to the production technology \({S}^{t}\). Then, the Malmquist index can be expressed by the distance function [83], i.e., \({D}_{O}^{t}=({p}^{t},{e}^{t})\). Based on this, the Malmquist index in period \(t\) is defined below.

$${M}_{O}^{t}=\frac{{D}_{O}^{t}=({p}^{t+1},{e}^{t+1})}{{D}_{O}^{t}=({p}^{t},{e}^{t})}$$

(10)

Similarly, the Malmquist based on period \(\text{t+1}\) can be expressed as below.

$${M}_{O}^{t+1}=\frac{{D}_{O}^{t+1}=({p}^{t+1},{e}^{t+1})}{{D}_{O}^{t+1}=({p}^{t},{e}^{t})}$$

(11)

According to Grosskopf et al. [84] and Färe et al. [85], the Malmquist index used in this study is the geometric mean of the two indices mentioned above, which avoids arbitrariness in the choice of the base period. The formula is shown as follows:

$${Malmquist\,Index=M}_{O}^{t}\left({p}^{t+1},{e}^{t+1},{p}^{t},{e}^{t}\right)\,{ = \left[\left(\frac{{D}_{O}^{t}=\left({p}^{t+1},{e}^{t+1}\right)}{{D}_{O}^{t}=\left({p}^{t},{e}^{t}\right)}\right)\left(\frac{{D}_{O}^{t+1}=\left({p}^{t+1},{e}^{t+1}\right)}{{D}_{O}^{t+1}=\left({p}^{t},{e}^{t}\right)}\right)\right]}^\frac{1}{2}$$

(12)

In Eq. (12), if \(\text{MI > 1}\), it denotes a relative growth in TFP. If \(\text{MI < 1}\), it denotes a relative decrease in TFP. If \(\text{MI = 1}\), it indicates no change in TFP.

Accordingly, the Malmquist index can be divided into the two components below.

$$\begin{array}{cc}Malmquist Index&\,\begin{array}{cc}=&\,{M}_{o}^{t} \left({p}^{t+1},{e}^{t+1},{p}^{t},{e}^{t}\right)\end{array}\\ &\,\begin{array}{cc}=&\,\frac{{D}_{O}^{t+1}\left({p}^{t+1},{e}^{t+1}\right)}{{D}_{O}^{t+1}\left({p}^{t},{e}^{t}\right)}{\left[\left(\frac{{D}_{O}^{t}\left({p}^{t+1},{e}^{t+1}\right)}{{D}_{O}^{t+1}\left({p}^{t+1},{e}^{t+1}\right)}\right)\left(\frac{{D}_{O}^{t}\left({p}^{t},{e}^{t}\right)}{{D}_{O}^{t+1}\left({p}^{t},{e}^{t}\right)}\right)\right]}^\frac{1}{2}\end{array}\\\,&\,\begin{array}{cc}=&\,techch\times\,effch=techch\times\,\left(pech\times sech\right)\end{array}\end{array}$$

(13)

In Eq. (13), the MI decomposes the TFP change (tfpch) into two components: technical change (techch) and technical efficiency change (effch) [86]. effch denotes the level of improvement resulting from the technological innovation that occurred between period \({\text{t}}\) and period \(\text{t+1}\) [87]. techch is used to evaluate the impact of changes on production fronts as they are sent. Further, effch comprises two components: pure efficiency (pech) and scale efficiency (sech). Pech estimates the managerial efficiency of DMUs, while sech estimates the appropriateness of the scale of DMUs.

### Time-evolution analysis

#### Kernel density estimation

To reflect the information on the distribution pattern and extension of the HSE in China, we used kernel density for estimation in this paper. Kernel density estimation is a nonparametric approach for kernel estimation. In this method, discrete variables are connected by smooth curves, while the distribution of random variables is represented by sequential of density curves [88, 89]. The basic principle is to assume that the random variables \({X}_{1},{X}_{2},\dots ,{X}_{N}\) are independently and identically distributed. Their density functions are denoted by \(f\left(x\right)\), and the kernel density is estimated as shown below.

$$f\left(x\right)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{{X}_{i}-x}{h}\right)$$

(14)

$$K\left(x\right)=\frac{1}{\sqrt{2\pi }}exp\left(-\frac{{x}^{2}}{2}\right)$$

(15)

In Eq. (14), \(n\) is the number of Chinese provinces studied in this work; \({X}_{i}\) is the observed value, i.e., the HSE of each province, and \(x\) is the average of the observed values; \(K\left(\bullet \right)\) is a Gaussian kernel function, which can be represented by Eq. (15); and \(h\) is the bandwidth, which determines how accurately the kernel density curve is estimated. In this paper, by referring to Sliverman [90] approach, we set the bandwidth as below.

$$h=1.06\delta {N}^{-\frac{1}{5}}$$

(16)

In Eq. (16), \(\delta\) is estimated by \(min\left\{s,Q/1.34\right\}\), \(s\) is the sample standard deviation, and \(Q\) is the interquartile spacing.

#### Markov chain

While kernel density estimation can capture the distribution pattern of HSE over time as a whole, it does not deeply reflect the intrinsic dynamic trends and characteristics of the distribution. To address this challenge, a possible approach is to explore the internal dynamic trends of HSE in each region by estimating the transition probability matrix associated with Markov chains [91].

Markov chains are models of stochastic processes proposed by the Russian mathematician Markov. Markov chains treat random variables as discrete states and can be used to examine the mobility of the internal distribution between levels by calculating the probability of shifting the variables up or down a level after a time change through maximum likelihood estimation [92]. Thus, we adopted a Markov chain model to construct the state transfer probability matrix to meticulously grasp the relative state changes between high and low Chinese HSE at a specified time, as well as the possibility of changes. The specific steps are as follows:

A Markov chain behaves as a stochastic process \(\left\{X\left(t\right),t\in T\right\}\), where the set \(T\) of indices in this stochastic process corresponds to the individual periods. Then, for all periods \(t\) and all possible states \(i\), \(j\), they satisfy the condition of Eq. (17).

Equation (17) indicates that the state of the random variable \(X\) at period \(t-1\) determines the probability of it being in state \(j\) during period \(t\).

$$\begin{array}{c}P=\left\{X\left(t\right)=\left.j\right|X\left(t-1\right)=i,X\left(t-2\right)={i}_{t-2},\dots ,X\left(0\right)={i}_{0}\right\}\\ =X\left(t\right)=\left.j\right| X\left(t-1\right)=i\end{array}$$

(17)

Let \(X\left(t\right)=j\); in other words, \(j\) represents the state in period \(t\), while \({P}_{ij}\) denotes the probability of all the transitional probabilities. Here, we classify the HSE of provinces into four types based on the quartiles (0.25, 0.5, and 0.75 division points), with the types indicated as \(K=1, \mathrm{2,3},4\). The traditional Markov dynamic transformation probability matrix is illustrated in Table 1.

In Table 1, \({P}_{ij}\) means the state transfer probability that the HSE in period \(t\) is of state \(i\) and shifts to state \(j\) in period \(t+1\). It can be expressed as \({P}_{ij}\)=\({n}_{ij}/{n}_{i}\), where \({n}_{ij}\) indicates the summation of the number of districts in state \(i\) in period \(t\), and the count of districts transfers to state \(j\) in period \(t+1\); \({n}_{i}\) means the summation of the number of districts with state \(j\) in all years. Then, we can construct a \(4\times 4\) transfer probability matrix by categorizing the HSE into four kinds. Thus, we can analyze the dynamic transition trend and evolutionary pattern of the HSE in China and identify whether the dynamic transition trend is upward, downward, or unchanged.

Note that the traditional Markov chains still do not consider the effect of geographic and spatial factors on HSE transfer in China. Thus, to further analyze the spatial transfer trends of the HSE in 31 provinces in China, in addition to adding spatial lags to the traditional Markov transfer chain, we also constructed a spatial Markov transfer matrix using the distance between neighboring provinces as the weight of spatial lags to explore the relationship between the dynamic transition probabilities of HSE in the province and neighboring provinces. The specific step is to include the corresponding spatial lag term in the traditional Markov dynamic transformation probability matrix \((K\times K)\) and expand it to the spatial Markov dynamic transformation probability matrix \((K\times K\dots \times K)\) [93]. As shown in Table 2, we extended the traditional Markov dynamic transformation probability matrix \((4\times 4)\) of this study to a spatial Markov dynamic transformation probability matrix \((4\times 4\times 4\times 4)\), which presents the probabilities of provinces transitioning from state \(i\) to state \(j\) over the study period, considering the spatial lag terms \(\mathrm{I},\mathrm{II},\mathrm{III},\) and \(\mathrm{IV}\).

### Spatial-evolution analysis

#### Exploratory spatial data analysis (ESDA)

ESDA is a spatial econometric methodology for studying the associations of a phenomenon in geospatial space with the values of attributes in neighboring elements [94]. As shown in Table 3, ESDA primarily consists of the global spatial autocorrelation analysis and local spatial autocorrelation analysis, which are analyses of possible interdependencies and correlations between certain observations in a specific range [95]. Thus, this study applied the ESDA model to comprehensively reveal the overall spatial autocorrelation and internal agglomeration features of the HSE in China.

#### Global autocorrelation

The global Moran’s I index was used to check for correlations in the HSE in each province before doing the spatial correlation study. Moran’s I index is defined as follows:

$$\mathrm M\mathrm o\mathrm r\mathrm a\mathrm n’\mathrm s\,\mathrm I=\frac{\sum_{i=1}^n\sum_{j=1}^nW_{ij}\left(y_i-\overline y\right)\left(y_j-\overline y\right)}{\sum_{i=1}^n\left(y_i-\overline y\right)^2}\times\frac n{\sum_{i=1}^n\sum_{j=1}^nW_{ij}}$$

(18)

In Eq. (18), \(n\) is the subject of the study (i.e., the overall number of provinces); \({y}_{i}\) and \({y}_{j}\) represent the HSE of the \(i\) province and \(j\) province, respectively; \(\overline{y }\) is the mean value of \({y}_{i}\) and \({y}_{j}\), and \({W}_{ij}\) is the neighboring space weight matrix. Moran’s I index takes values in the range of \(\left[-\mathrm{1,1}\right]\). If the variable is positively correlated in space (Moran’s I > 0), provinces with higher (or lower) levels of HSE are more spatially clustered; if the variable is negatively correlated in space (Moran’s I < 0), there are spatial differences in the HSE between each province and its neighboring provinces. If the variable is spatially constant (Moran’s I = 0), then each province’s HSE randomly distributes in each province in space [96].

#### Local autocorrelation

Local spatial autocorrelation analysis was used to analyze the spatial correlation between each spatial object and its neighboring units in a specific region to reflect the local characteristic differences in the distribution of spatial objects. It can overcome the global spatial autocorrelation analysis, which only considers a single value to describe the degree of spatial correlation between the items under study and disregards any possible instability defects in the local area [97]. Therefore, the local Moran’s I reveals the spatial clustering state of efficiency between a province and its neighboring provinces. Among them, the H–H type represents the high-value region of the HSE surrounded by high-value adjacent units; the H–L type represents the high-value region surrounded by low-value adjacent units; the L-L type represents the low-value region surrounded by low-value adjacent units; and the L–H type represents the low-value region surrounded by high-value adjacent units.

### Indicator selection

#### Health service input indicators

Most investments in healthcare services are measured in terms of human, financial, and material resources [98]. However, Grossman (1972) brought up that the financial investment in health services frequently covers the investment in hiring different types of technicians and the investment in purchasing medical equipment, which can cause double counting issues with the current input indicators. As a result, the input indicators only calculate human and material capital input. In related studies, the number of health personnel, number of healthcare technicians, number of registered nurses, and number of licensed (assistant) doctors have usually been selected as inputs of human capital [99,100,101], while the number of beds, number of health facilities, and more than 10,000 yuan of equipment have been used as inputs for material resources [102, 103]. Therefore, this study selected the number of health technicians per 1,000 population and the number of beds in medical and health institutions per 1,000 population as human capital input and material capital input, respectively.

#### Health service output indicators

In terms of output indicators, owing to the complexity and particularity of the medical and health industry, its output is usually measured by disease cures and improvement of people’s health, but these are difficult to quantify. Associated studies have used the number of emergency and outpatient visits, bed utilization, admissions (discharges), and surgical inpatient visits as quantitative indicators of the supply of healthcare services to reflect the output of healthcare resources [104,105,106]. As a result, we chose the number of outpatient and emergency department visits, hospital discharges, and surgical procedures performed in the hospital as output indicators. Since the standard SFA model is limited to a single output [107, 108], it is necessary to aggregate the number of outpatients and emergency department visits, hospital discharges, and surgical procedures performed in the hospital into a single variable. Thus, referring to Xu et al. [109], principal component analysis was used to logarithmically transform and weight the three indicators to create the output index, as each of the output indicators has its own bias, thus maximizing the amount of information that could be found in the output indicators.

Furthermore, to control the effect of heteroskedasticity and ensure the smoothness of the data, all variables in this study were logarithmically transformed. The specific indicators and descriptive statistics of the input and output variables are presented in Table 4. The table shows that, from 2010 to 2020, the number of health technicians per 1,000 population and the number of beds in medical and health institutions per 1,000 population in China trended upward each year. The output index after transformation with principal component analysis and the number of outpatient and emergency department visits, hospital discharges, and surgical procedures performed in the hospital increased every year from 2010 to 2019, then dropped slowly from 2019 to 2020.

#### Data source

The study’s data cover 31 provinces in China from 2010 to 2020, excluding Taiwan, Hong Kong, and Macao. We divided the provinces into regions based on geographical differences: eastern, central, and western (shown in Fig. 1). The eastern region includes Beijing, Tianjin, Hebei, Liaoning, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, and Hainan. The central part includes Shanxi, Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, and Hunan. The western area includes Inner Mongolia, Chongqing, Guangxi, Sichuan, Guizhou, Yunnan, Tibet, Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang. Moreover, the data are from the “China Statistical Yearbook” and “China Health Statistical Yearbook”.

## Add Comment