Data Analysis Machine Learning and Applications Episode 2 Part 7 docx

Root Cause Analysis for Quality Management

[Figure: tree with a root node; first-level nodes P(Y1), P(Y2), ..., P(Yn−1), P(Yn); second-level nodes P(Y1 ∪ Y2), P(Y1 ∪ Y3), ..., P(Y1 ∪ Yn), ..., P(Yn−1 ∪ Yn)]

Fig. 1. Organization of the used multitree data structure

to find a node (sub-process) with a higher support in the branch below. This reduces the time to find the optimal solution significantly, as a good portion of the tree can be omitted from the traversal.

Algorithm 1 Branch & Bound algorithm for process optimization

procedure TRAVERSETREE(Ȳ)
    Y := {sub-nodes of Ȳ}
    for all y ∈ Y do
        if N(X|y) > nmax and Q(X|y) ≥ qmin then
            nmax = N(X|y)
        end if
        if N(X|y) > nmax and Q(X|y) < qmin then
            TraverseTree(y)
        end if
    end for
end procedure
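The traversal in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Node` and `SearchState` classes, the field names and the toy tree are assumptions made for the example, and the sketch additionally remembers which node achieved nmax. It relies on the support never increasing from a node to its sub-nodes, which is what allows a branch to be skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node describing one sub-process y."""
    name: str
    support: int      # N(X|y): number of measurements in the sub-process
    quality: float    # Q(X|y): quality measure of the sub-process
    children: List["Node"] = field(default_factory=list)


@dataclass
class SearchState:
    q_min: float                  # required quality level
    n_max: int = 0                # largest qualifying support found so far
    best: Optional[Node] = None   # node that achieved n_max (added for the sketch)


def traverse_tree(parent: Node, state: SearchState) -> None:
    """Visit the sub-nodes of `parent`, skipping branches that cannot win."""
    for y in parent.children:
        if y.support <= state.n_max:
            continue                  # bound: nothing below y can exceed n_max
        if y.quality >= state.q_min:
            state.n_max = y.support   # new best sub-process, no need to descend
            state.best = y
        else:
            traverse_tree(y, state)   # branch: a sub-node may still qualify


# Toy tree with made-up supports and quality values.
root = Node("root", 1000, 0.8, [
    Node("machine=A", 600, 0.9, [
        Node("machine=A, shift=1", 350, 1.4),
        Node("machine=A, shift=2", 250, 1.1),
    ]),
    Node("machine=B", 400, 1.5),
    Node("machine=C", 300, 1.0, [
        Node("machine=C, shift=1", 200, 1.6),
    ]),
])

state = SearchState(q_min=1.33)
traverse_tree(root, state)
print(state.best.name, state.n_max)   # prints: machine=B 400
```

In this toy run the branch below "machine=C" is never visited, because its support of 300 is already below the best qualifying support of 400.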

In many real-world applications, the influence domain is mixed, consisting of discrete data and numerical variables. To enable a joint evaluation of both influence types, the numerical data is transformed into nominal data by mapping the continuous data onto pre-set quantiles. In most of our applications, we chose the 10%, 20%, 80% and 90% quantiles, as they performed best.
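The mapping of a numerical influence variable onto pre-set quantiles could look like the following sketch. The function names, the bin labels and the choice of the "inclusive" interpolation method are assumptions for illustration; the text does not specify how the quantiles are estimated.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cut_points(values, percents=(10, 20, 80, 90)):
    """Return the cut points at the given percentiles of `values`."""
    # quantiles(..., n=100) returns the 1st to 99th percentiles in order.
    pct = quantiles(values, n=100, method="inclusive")
    return [pct[p - 1] for p in percents]


def to_nominal(value, cuts):
    """Map a numeric value to a nominal label based on the cut points."""
    labels = ["below q10", "q10 to q20", "q20 to q80",
              "q80 to q90", "q90 and above"]
    # bisect_right counts how many cut points are <= value.
    return labels[bisect_right(cuts, value)]


# Made-up measurements 0, 1, ..., 100.
values = list(range(101))
cuts = quantile_cut_points(values)
print(cuts)
print([to_nominal(v, cuts) for v in (5, 15, 50, 85, 95)])
```

After this step every numerical variable takes one of five nominal values and can be handled in the tree like any discrete influence variable.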

Verification

The optimum of problem (3) can only be defined in statistical terms, as in practice the sample sets are small and the quality measures are only point estimators. Therefore, confidence intervals have to be used in order to get a more valid statement about the real value of the considered PCI. In the special case where the underlying data follows a normal distribution, it is straightforward to construct a confidence interval. As the distribution of Cp/Ĉp (Ĉp denotes the estimator of Cp) is known, a (1−α)% confidence interval for Cp is given by

Christian Manuel Strobel and Tomas Hrycej

\[
C(X) = \left[\, \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,\alpha/2}}{n-1}},\;\; \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,1-\alpha/2}}{n-1}} \,\right] \tag{6}
\]
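As a rough illustration of (6), the following sketch computes the interval for a sample of measurements. Several things here are assumptions: the estimator Ĉp = (USL − LSL)/(6s) with specification limits `lsl` and `usl` is the usual textbook form and may differ in detail from the definition used earlier in the paper, and the chi-square quantile is obtained with the Wilson-Hilferty approximation so that the example needs only the standard library. An exact quantile routine from a statistics package can be substituted.

```python
import random
from math import sqrt
from statistics import NormalDist, stdev


def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def cp_confidence_interval(sample, lsl, usl, alpha=0.05):
    """Interval (6) for Cp, assuming normally distributed measurements."""
    n = len(sample)
    cp_hat = (usl - lsl) / (6.0 * stdev(sample))   # assumed Cp estimator
    lower = cp_hat * sqrt(chi2_quantile(alpha / 2, n - 1) / (n - 1))
    upper = cp_hat * sqrt(chi2_quantile(1 - alpha / 2, n - 1) / (n - 1))
    return lower, cp_hat, upper


# Made-up sample: 500 measurements around a nominal value of 10.0.
rng = random.Random(0)
sample = [rng.gauss(10.0, 0.1) for _ in range(500)]
lower, cp_hat, upper = cp_confidence_interval(sample, lsl=9.6, usl=10.4)
print(f"Cp estimate {cp_hat:.3f}, interval [{lower:.3f}, {upper:.3f}]")
```

For sample sizes of 500 to 1000 the interval is only a few percent wide on either side of the estimate, and it widens quickly for small sub-processes deep in the tree.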

For the other parametric basic indices, there is in general no analytical solution, as they all have a non-central χ² distribution. Different numerical approximations can be found in the literature for Cpm, Cpk and Cpmk (see Balamurali and Kalyanasundaram (2002) and Bissel (1989)).

If no assumption can be made about the distribution of the data, computer-based statistical methods such as the Bootstrap method are used to calculate confidence intervals. In Balamurali and Kalyanasundaram (2002), the authors present three different methods for calculating confidence intervals and a simulation study. As a result, the method called BCa-Method outperformed the other two methods, and therefore it is used in our applications for assigning confidence intervals for the non-parametric basic PCIs, as described in (3). For the Empirical Capability Index Eci, a simulation study showed that the Bootstrap-Standard-Method, as defined in Balamurali and Kalyanasundaram (2002), performed best. A (1−α)% confidence interval for the Eci can be obtained by

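\[
C(X) = \left[\, \hat{E}_{ci} - \Phi^{-1}(1-\alpha)\,\sigma_B,\;\; \hat{E}_{ci} + \Phi^{-1}(1-\alpha)\,\sigma_B \,\right] \tag{7}
\]

where Êci denotes an estimator for Eci, σB the Bootstrap standard deviation and Φ⁻¹ the inverse of the standard normal distribution function.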
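The interval in (7) can be sketched as below. The resampling scheme (drawing n values with replacement, 1000 replicates) is the common Bootstrap set-up and is assumed here rather than taken from the text. The statistic `share_within_limits` is only a stand-in so that the example runs; it is not the paper's definition of Eci, and its limits are made up.

```python
import random
from statistics import NormalDist, stdev


def bootstrap_standard_interval(sample, estimator, alpha=0.05,
                                n_boot=1000, seed=0):
    """Interval (7): estimate -/+ Phi^{-1}(1 - alpha) * sigma_B."""
    rng = random.Random(seed)
    n = len(sample)
    # Re-estimate the index on n_boot resamples drawn with replacement.
    replicates = [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]
    sigma_b = stdev(replicates)            # Bootstrap standard deviation
    z = NormalDist().inv_cdf(1 - alpha)    # quantile as written in (7)
    estimate = estimator(sample)
    return estimate - z * sigma_b, estimate, estimate + z * sigma_b


def share_within_limits(sample, lsl=9.7, usl=10.3):
    """Stand-in statistic (share of values inside the limits), not Eci."""
    return sum(lsl <= x <= usl for x in sample) / len(sample)


# Made-up sample of 500 measurements.
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 0.15) for _ in range(500)]
low, est, high = bootstrap_standard_interval(data, share_within_limits)
print(f"estimate {est:.3f}, interval [{low:.3f}, {high:.3f}]")
```

Because the estimator is passed in as a function, the same routine can be reused for any index for which the Bootstrap standard deviation is a reasonable measure of spread.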

As the results of the introduced algorithm are based on sample sets, it is important to verify the soundness of the solutions found. Therefore, the sample set to be analyzed is randomly divided into two disjoint sets: a training set and a test set. In a first step, a set of possible optimal sub-processes is generated from the training set by applying the described algorithm and the referenced Bootstrap methods to calculate confidence intervals. In a second step, the root cause analysis algorithm is applied to the test set. The final output is a verified sub-process.
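One possible reading of this two-step verification is sketched below. It is an interpretation, not the authors' procedure: the 50/50 split, the rule that a sub-process counts as verified when the search returns it on both sets, and the placeholder `toy_search` (which stands in for the root cause analysis algorithm) are all assumptions.

```python
import random
from collections import defaultdict


def split_train_test(records, train_share=0.5, seed=0):
    """Randomly divide `records` into two disjoint lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def toy_search(records, max_mean_abs_dev=0.1):
    """Placeholder search: machines whose mean absolute deviation is small."""
    by_machine = defaultdict(list)
    for machine, deviation in records:
        by_machine[machine].append(abs(deviation))
    return {m for m, d in by_machine.items()
            if sum(d) / len(d) <= max_mean_abs_dev}


def verified_sub_processes(records, search, seed=0):
    """Run the search on both sets and keep what both runs agree on."""
    train, test = split_train_test(records, seed=seed)
    candidates = search(train)    # first step: training set
    confirmed = search(test)      # second step: test set
    return candidates & confirmed


# Made-up records: (machine, deviation from nominal).
rng = random.Random(1)
records = ([("A", rng.gauss(0.0, 0.05)) for _ in range(300)]
           + [("B", rng.gauss(0.0, 0.50)) for _ in range(300)])
print(verified_sub_processes(records, toy_search))
```

Fixing the seed keeps the split reproducible, which makes it easier to compare results across repeated runs.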

3 Computational results

A proof of concept was performed using data from a foundry plant and an engine manufacturing plant in the premium automotive industry. The 32 analyzed sample sets comprised measurement results describing geometric characteristics, like the position of drill holes or the surface texture of the produced products, and the corresponding influence sets. The data sets consist of 4 to 14 different values, specifying for example a particular machine number or a worker's name. An additional data set, recording the results of a cylinder twist measurement with 76 influence variables, was used to evaluate the algorithm for numerical parameter sets. Each of the analyzed data sets has at least 500 and at most 1000 measurement results.

The evaluation was performed for the non-parametric Cp and the empirical capability index Eci using the described Branch and Bound principle. Additionally a
