Data Analysis Machine Learning and Applications Episode 2 Part 7 docx
Root Cause Analysis for Quality Management 409
[Figure 1: tree diagram with a root node; node labels P(Y 1), P(Y 2), ..., P(Y n−1), P(Y n) and, below them, P(Y 1 ∪ Y 2), P(Y 1 ∪ Y 3), ..., P(Y 1 ∪ Y n), ..., P(Y n−1 ∪ Y n)]
Fig. 1. Organization of the used multitree data structure
to find a node (sub-process) with a higher support in the branch below. This significantly
reduces the time needed to find the optimal solution, as a good portion of the tree
can be omitted from the traversal.
Algorithm 1 Branch & Bound algorithm for process optimization
procedure TRAVERSETREE(Ȳ)
    Y := {sub-nodes of Ȳ}
    for all y ∈ Y do
        if N(X|y) > nmax and Q(X|y) ≥ qmin then
            nmax = N(X|y)
        end if
        if N(X|y) > nmax and Q(X|y) < qmin then
            TraverseTree(y)
        end if
    end for
end procedure
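The traversal in Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` and `Best` classes, the field names `support` (for N(X|y)) and `quality` (for Q(X|y)), and the example values are hypothetical and not part of the original paper. A branch is skipped as soon as its support cannot exceed the best support found so far, which is the bounding step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: one sub-process y with N(X|y) and Q(X|y)."""
    name: str
    support: int      # N(X|y): number of observations in the sub-process
    quality: float    # Q(X|y): quality measure of the sub-process
    children: List["Node"] = field(default_factory=list)


@dataclass
class Best:
    """Best feasible sub-process found so far (n_max and the node itself)."""
    support: int = 0
    node: Optional[Node] = None


def traverse_tree(parent: Node, q_min: float, best: Best) -> Best:
    for y in parent.children:
        if y.support <= best.support:
            # Bound: this node cannot improve on n_max, so neither the node
            # nor its branch is examined further.
            continue
        if y.quality >= q_min:
            # Feasible node with higher support: new incumbent.
            best.support = y.support
            best.node = y
        else:
            # Quality too low, but support still above n_max: go deeper.
            traverse_tree(y, q_min, best)
    return best


# Illustrative data only (values are made up for this sketch).
root = Node("root", 200, 0.9, [
    Node("A", 100, 1.0, [Node("A1", 70, 1.4), Node("A2", 50, 2.0)]),
    Node("B", 60, 1.5),
])
result = traverse_tree(root, q_min=1.33, best=Best())
```

In this example, node B is never explored because its support does not exceed that of A1, which has already been accepted.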
In many real world applications, the influence domain is mixed, consisting of
discrete data and numerical variables. To enable a joint evaluation of both influence
types, the numerical data is transformed into nominal data by mapping the continuous data onto pre-set quantiles. In most of our applications, we chose the 10%, 20%, 80%
and 90% quantiles, as they performed best.
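The quantile mapping described above could look like the following sketch. The function name `quantile_bins` and the integer bin labels 0 to 4 are assumptions made for illustration; the paper does not specify how the quantiles are estimated, and Python's `statistics.quantiles` (default "exclusive" method) is used here only as one possible choice.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values):
    """Map continuous values onto five nominal bins delimited by the
    10%, 20%, 80% and 90% sample quantiles."""
    deciles = quantiles(values, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    edges = [deciles[0], deciles[1], deciles[7], deciles[8]]
    # bisect_right returns 0 for values below the 10% quantile, up to 4 for
    # values at or above the 90% quantile.
    labels = [bisect_right(edges, v) for v in values]
    return edges, labels


# Illustrative data: the integers 1..100 stand in for a numerical influence.
data = list(range(1, 101))
edges, labels = quantile_bins(data)
```

After this step each numerical influence variable takes one of five nominal values and can be evaluated jointly with the discrete influences.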
Verification
The optimum of the problem (3) can only be defined in statistical terms, as in practice
the sample sets are small and the quality measures are only point estimators. Therefore, confidence intervals have to be used in order to get a more valid statement of
the real value of the considered PCI. In the special case where the underlying data
follows a normal distribution, it is straightforward to construct a confidence interval. As the distribution of Cp/Ĉp
(Ĉp denotes the estimator of Cp) is known, a (1−α)%
confidence interval for Cp is given by
410 Christian Manuel Strobel and Tomas Hrycej
\[
C(X) = \left[\, \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,\alpha/2}}{n-1}},\;\; \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,1-\alpha/2}}{n-1}} \,\right] \qquad (6)
\]
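As a numerical illustration of interval (6), the following sketch evaluates the bounds for a given estimate and sample size. To stay within the Python standard library, the χ² quantiles are computed with the Wilson-Hilferty approximation rather than exactly; this substitution, the function names and the example values (an estimate of 1.33 from n = 50 observations, α = 0.05) are assumptions of this sketch, not part of the paper.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, dof):
    """Approximate p-quantile of the chi-square distribution with `dof`
    degrees of freedom (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def cp_confidence_interval(cp_hat, n, alpha=0.05):
    """Interval (6) for C_p under the normality assumption."""
    dof = n - 1
    lower = cp_hat * sqrt(chi2_quantile_wh(alpha / 2, dof) / dof)
    upper = cp_hat * sqrt(chi2_quantile_wh(1 - alpha / 2, dof) / dof)
    return lower, upper


# Illustrative values only.
lower, upper = cp_confidence_interval(1.33, n=50)
```

With these inputs the interval is roughly 1.07 to 1.59, which shows how wide the uncertainty around a point estimate can be for small samples.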
For the other parametric basic indices, there exists in general no analytical solution,
as they all have a non-central χ² distribution. Different numerical approximations
can be found in the literature for Cpm, Cpk and Cpmk (see Balamurali and Kalyanasundaram (2002) and Bissel (1989)).
If no assumption can be made about the distribution of the
data, computer-based statistical methods such as the Bootstrap method are used to calculate confidence intervals. In Balamurali and Kalyanasundaram (2002), the authors
present three different methods for calculating confidence intervals and a simulation
study. As a result, the method called BCa-Method outperformed the other two methods and is therefore used in our applications for assigning confidence intervals for
the non-parametric basic PCIs, as described in (3). For the Empirical Capability Index Eci, a simulation study showed that the Bootstrap-Standard-Method, as defined in
Balamurali and Kalyanasundaram (2002), performed best. A (1−α)% confidence
interval for the Eci can be obtained by
\[
C(X) = \left[\, \hat{E}_{ci} - \Phi^{-1}(1-\alpha)\,\sigma_B,\;\; \hat{E}_{ci} + \Phi^{-1}(1-\alpha)\,\sigma_B \,\right] \qquad (7)
\]
where Êci denotes an estimator for Eci, σB the Bootstrap standard deviation and Φ⁻¹
the inverse of the standard normal distribution function.
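The Bootstrap standard interval (7) can be sketched as follows. The definition of Eci is not part of this section, so the sketch accepts an arbitrary `statistic` callable and uses the sample mean as a stand-in; the function name, the number of resamples and the example data are likewise assumptions. The normal quantile is taken at 1−α, following (7) as printed.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_standard_interval(sample, statistic, alpha=0.05,
                                n_resamples=1000, seed=0):
    """Standard bootstrap interval: estimate -/+ Phi^{-1}(1 - alpha) * sigma_B."""
    rng = random.Random(seed)
    n = len(sample)
    estimate = statistic(sample)
    # Recompute the statistic on resamples drawn with replacement.
    replicates = [statistic(rng.choices(sample, k=n))
                  for _ in range(n_resamples)]
    sigma_b = stdev(replicates)            # Bootstrap standard deviation
    z = NormalDist().inv_cdf(1 - alpha)    # inverse standard normal
    return estimate - z * sigma_b, estimate + z * sigma_b


# Illustrative data; `mean` is only a placeholder for an Eci estimator.
data = [9.8, 10.1, 10.0, 9.7, 10.3, 10.2, 9.9, 10.4, 9.6, 10.0]
low, high = bootstrap_standard_interval(data, mean)
```

Any estimator that maps a sample to a single number can be passed in place of `mean`.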
As the results of the introduced algorithm are based on sample sets, it is important to verify the soundness of the solutions found. Therefore, the sample set
to be analyzed is randomly divided into two disjoint sets: a training set and a test set. In a first
step, a set of possibly optimal sub-processes is generated from the training set by applying the described algorithm
and the referenced Bootstrap methods to calculate confidence intervals. In a second
step, the root cause analysis algorithm is applied to the test set. The final output is a
verified sub-process.
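The random division into two disjoint sets can be sketched as below. The split ratio is not stated in the text, so the `test_fraction` parameter and its default of one half are assumptions, as is the function name.

```python
import random


def split_train_test(records, test_fraction=0.5, seed=0):
    """Randomly divide the records into two disjoint lists (training, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Illustrative use with ten record identifiers.
train, test = split_train_test(range(10))
```

Candidate sub-processes would then be generated on `train` and re-evaluated on `test`, as described above.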
3 Computational results
A proof of concept was performed using data from a foundry plant and from engine manufacturing in the premium automotive industry. The 32 analyzed sample sets comprised measurement results describing geometric characteristics, such as the position of
drill holes or the surface texture of the produced products, and the corresponding influence sets. The data sets consist of 4 to 14 different values, specifying for example a
particular machine number or a worker's name. An additional data set, recording the
results of a cylinder twist measurement with 76 influence variables, was used to
evaluate the algorithm for numerical parameter sets. Each of the analyzed data sets
has at least 500 and at most 1000 measurement results.
The evaluation was performed for the non-parametric Cp and the empirical capability index Eci using the described Branch and Bound principle. Additionally a