Data Analysis Machine Learning and Applications Episode 2 Part 7 docx

Root Cause Analysis for Quality Management 409

[Figure: multitree with a root node and the nodes P(Y1), P(Y2), ..., P(Yn−1), P(Yn), P(Y1 ∪ Y2), P(Y1 ∪ Y3), ..., P(Y1 ∪ Yn), P(Yn−1 ∪ Yn)]

Fig. 1. Organization of the used multitree data structure

to find a node (sub-process) with a higher support in the branch below. This significantly reduces the time needed to find the optimal solution, as a good portion of the tree can be skipped during traversal.

Algorithm 1 Branch & Bound algorithm for process optimization

procedure TRAVERSETREE(Ȳ)
    Y := {sub-nodes of Ȳ}
    for all y ∈ Y do
        if N(X|y) > nmax and Q(X|y) ≥ qmin then
            nmax := N(X|y)
        end if
        if N(X|y) > nmax and Q(X|y) < qmin then
            TRAVERSETREE(y)
        end if
    end for
end procedure
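The traversal in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Node` and `SearchState` classes, the toy tree, and the threshold `q_min = 1.33` are hypothetical, and the sketch assumes that a sub-node never has a higher support than its parent, which is what justifies skipping a branch whose support does not exceed `n_max`. Unlike the pseudocode, it also remembers the best node and counts visited nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    support: int      # N(X|y): number of observations in the sub-process
    quality: float    # Q(X|y): quality measure of the sub-process
    children: List["Node"] = field(default_factory=list)


@dataclass
class SearchState:
    n_max: int = 0                 # best support found so far
    best: Optional[Node] = None    # node that achieved n_max (added for illustration)
    visited: int = 0               # number of nodes inspected


def traverse_tree(parent: Node, q_min: float, state: SearchState) -> SearchState:
    for y in parent.children:
        state.visited += 1
        if y.support <= state.n_max:
            continue  # bound: neither y nor its sub-nodes can beat n_max
        if y.quality >= q_min:
            state.n_max, state.best = y.support, y  # new incumbent
        else:
            traverse_tree(y, q_min, state)  # branch: look for a better sub-node
    return state


# Hypothetical toy tree; supports and qualities are made up.
root = Node("root", 1000, 0.90, [
    Node("A", 600, 1.10, [Node("A+B", 300, 1.40), Node("A+C", 250, 1.50)]),
    Node("B", 500, 1.00, [Node("B+C", 350, 1.35)]),
    Node("C", 320, 1.60, [Node("C+D", 200, 1.70)]),
])
state = traverse_tree(root, q_min=1.33, state=SearchState())
print(state.best.name, state.n_max, state.visited)
```

In this toy tree the node C and its sub-node are never expanded, because their support cannot exceed the incumbent found under B.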

In many real-world applications, the influence domain is mixed, consisting of both discrete and numerical variables. To enable a joint evaluation of both influence types, the numerical data is transformed into nominal data by mapping the continuous values onto pre-set quantiles. In most of our applications, we chose the 10%, 20%, 80% and 90% quantiles, as they performed best.
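One way to carry out such a quantile mapping is sketched below using only the Python standard library. The function name `discretize`, the choice of the `"inclusive"` quantile method, and the integer bin labels 0 to 4 are assumptions made for illustration; the text only specifies the four quantile levels.

```python
import bisect
import statistics
from collections import Counter


def discretize(values):
    """Map numerical values to bins 0..4 bounded by the 10/20/80/90% quantiles."""
    # statistics.quantiles with n=10 returns the nine deciles (10%, ..., 90%).
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    cuts = [deciles[0], deciles[1], deciles[7], deciles[8]]
    # bisect_right gives the index of the bin each value falls into.
    labels = [bisect.bisect_right(cuts, v) for v in values]
    return cuts, labels


values = [float(i) for i in range(1, 101)]  # made-up measurements
cuts, labels = discretize(values)
bin_counts = Counter(labels)
print(cuts)
print(sorted(bin_counts.items()))
```

The resulting bin labels can then be treated like any other nominal influence variable.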

Verification

The optimum of problem (3) can only be defined in statistical terms, as in practice the sample sets are small and the quality measures are only point estimators. Therefore, confidence intervals have to be used in order to obtain a more reliable statement about the true value of the considered PCI. In the special case where the underlying data follows a normal distribution, it is straightforward to construct a confidence interval. As the distribution of $C_p/\hat{C}_p$ ($\hat{C}_p$ denotes the estimator of $C_p$) is known, a $100(1-\alpha)\%$ confidence interval for $C_p$ is given by

410 Christian Manuel Strobel and Tomas Hrycej

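$$
C(X) = \left[\, \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,\alpha/2}}{n-1}},\;\; \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,1-\alpha/2}}{n-1}} \,\right] \qquad (6)
$$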
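The interval for $C_p$ follows from a standard argument, sketched here under assumptions that this section does not spell out: the usual definitions $C_p = (USL - LSL)/(6\sigma)$ and $\hat{C}_p = (USL - LSL)/(6S)$, where $USL$ and $LSL$ are the specification limits and $S$ is the sample standard deviation of $n$ independent, normally distributed observations, and equal tail probabilities of $\alpha/2$.

```latex
\begin{gather*}
\frac{C_p}{\hat{C}_p} = \frac{S}{\sigma}
\quad\text{and}\quad
\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
(n-1)\left(\frac{C_p}{\hat{C}_p}\right)^{2} \sim \chi^2_{n-1}, \\[4pt]
1-\alpha
= P\!\left(\chi^2_{n-1;\,\alpha/2} \;\le\; (n-1)\,\frac{C_p^2}{\hat{C}_p^2} \;\le\; \chi^2_{n-1;\,1-\alpha/2}\right)
= P\!\left(\hat{C}_p\sqrt{\frac{\chi^2_{n-1;\,\alpha/2}}{n-1}} \;\le\; C_p \;\le\; \hat{C}_p\sqrt{\frac{\chi^2_{n-1;\,1-\alpha/2}}{n-1}}\right).
\end{gather*}
```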

For the other parametric basic indices, there exists in general no analytical solution, as they all have a non-central $\chi^2$ distribution. Various numerical approximations can be found in the literature for $C_{pm}$, $C_{pk}$ and $C_{pmk}$ (see Balamurali and Kalyanasundaram (2002) and Bissel (1989)).

If no assumption can be made about the distribution of the data, computer-based statistical methods such as the Bootstrap method are used to calculate confidence intervals. In Balamurali and Kalyanasundaram (2002), the authors present three different methods for calculating confidence intervals, together with a simulation study. As a result, the method called BCa-Method outperformed the other two methods and is therefore used in our applications for assigning confidence intervals to the non-parametric basic PCIs, as described in (3). For the Empirical Capability Index $E_{ci}$, a simulation study showed that the Bootstrap-Standard-Method, as defined in Balamurali and Kalyanasundaram (2002), performed best. A $100(1-\alpha)\%$ confidence interval for the $E_{ci}$ can be obtained by

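$$
C(X) = \left[\, \hat{E}_{ci} - \Phi^{-1}(1-\alpha)\,\sigma_B,\;\; \hat{E}_{ci} + \Phi^{-1}(1-\alpha)\,\sigma_B \,\right] \qquad (7)
$$

where $\hat{E}_{ci}$ denotes an estimator for $E_{ci}$, $\sigma_B$ the Bootstrap standard deviation and $\Phi^{-1}$ the inverse of the standard normal distribution function.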
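The Bootstrap standard interval of expression (7) can be sketched with the Python standard library as shown below. The sketch follows the expression as printed, with the quantile taken at 1−α. Since the definition of the Empirical Capability Index is not part of this section, the statistic `share_within_limits` is a hypothetical stand-in and not the authors' estimator; the function names, the number of resamples, the specification limits and the simulated measurements are likewise assumptions.

```python
import random
import statistics


def bootstrap_standard_ci(sample, statistic, alpha=0.05, n_boot=1000, seed=0):
    """Return (lower, estimate, upper) with half-width Phi^{-1}(1 - alpha) * sigma_B."""
    rng = random.Random(seed)
    estimate = statistic(sample)
    # Resample with replacement and recompute the statistic each time.
    replicates = [
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_boot)
    ]
    sigma_b = statistics.stdev(replicates)          # Bootstrap standard deviation
    z = statistics.NormalDist().inv_cdf(1 - alpha)  # inverse standard normal
    return estimate - z * sigma_b, estimate, estimate + z * sigma_b


def share_within_limits(xs, lsl=8.0, usl=12.0):
    """Hypothetical placeholder statistic: share of values inside [lsl, usl]."""
    return sum(lsl <= x <= usl for x in xs) / len(xs)


data_rng = random.Random(1)
measurements = [data_rng.gauss(10.0, 1.0) for _ in range(200)]  # simulated data
lower, estimate, upper = bootstrap_standard_ci(measurements, share_within_limits)
print(f"{lower:.3f} <= {estimate:.3f} <= {upper:.3f}")
```

Any other estimator can be passed in place of the placeholder, as long as it maps a list of measurements to a single number.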

As the results of the introduced algorithm are based on sample sets, it is important to verify the soundness of the solutions found. Therefore, the sample set to be analyzed is randomly divided into two disjoint sets: a training set and a test set. In a first step, a set of possible optimal sub-processes is generated from the training set by applying the described algorithm and the referenced Bootstrap methods to calculate confidence intervals. In a second step, the root cause analysis algorithm is applied to the test set. The final output is a verified sub-process.
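The two-step verification can be illustrated with the following sketch. The text does not state the split ratio or the exact criterion for accepting a sub-process, so both are assumptions here: the sample is split in half, and a candidate counts as verified if the same search also returns it on the test set. The names `split_disjoint`, `verified_candidates` and `machines_meeting_target`, as well as the toy records, are hypothetical; `find_candidates` stands in for the root cause analysis algorithm.

```python
import random


def split_disjoint(records, train_fraction=0.5, seed=0):
    """Randomly divide the records into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def verified_candidates(records, find_candidates, **split_kwargs):
    """Keep only candidates found on the training set AND on the test set."""
    train, test = split_disjoint(records, **split_kwargs)
    return set(find_candidates(train)) & set(find_candidates(test))


def machines_meeting_target(records, min_share=0.9):
    """Toy stand-in for the search: machines with enough in-tolerance parts."""
    totals, good = {}, {}
    for machine, ok in records:
        totals[machine] = totals.get(machine, 0) + 1
        good[machine] = good.get(machine, 0) + int(ok)
    return {m for m in totals if good[m] / totals[m] >= min_share}


# Made-up records: (machine, part within tolerance?)
records = [("M1", True)] * 100 + [("M2", i % 2 == 0) for i in range(100)]
verified = verified_candidates(records, machines_meeting_target)
print(verified)
```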

3 Computational results

A proof of concept was performed using data from a foundry plant and from engine manufacturing in the premium automotive industry. The 32 analyzed sample sets comprised measurement results describing geometric characteristics, such as the position of drill holes or the surface texture of the manufactured products, and the corresponding influence sets. The data sets consist of 4 to 14 different values, specifying for example a particular machine number or a worker's name. An additional data set, recording the results of a cylinder twist measurement with 76 influence variables, was used to evaluate the algorithm for numerical parameter sets. Each of the analyzed data sets has at least 500 and at most 1000 measurement results.

The evaluation was performed for the non-parametric $C_p$ and the empirical capability index $E_{ci}$ using the described Branch and Bound principle. Additionally a
