
KW = \frac{1}{NM^2} \sum_{i=1}^{N} L(x_i)\,\bigl(M - L(x_i)\bigr). \qquad (11)

Dietterich (2000) also proposed a measure to assess the level of agreement between classifiers. It is the kappa statistic:

\kappa = 1 - \frac{\frac{1}{M}\sum_{i=1}^{N} L(x_i)\,\bigl(M - L(x_i)\bigr)}{N(M-1)\,\bar{p}(1-\bar{p})}. \qquad (12)

Hansen and Salamon (1990) introduced the measure of difficulty \theta. It is simply the variance of the random variable Z = L(x)/M:

\theta = \mathrm{Var}(Z). \qquad (13)
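As a minimal R sketch, these three measures could be estimated from a hypothetical N × M 0/1 "oracle" matrix of correctness indicators, assuming L(x_i) counts the classifiers that classify x_i correctly (the formulas are symmetric in correct/incorrect counts); the object names and data below are illustrative only.

```r
# Minimal sketch (hypothetical data): KW, kappa and theta (Eqs. 11-13) estimated
# from an N x M 0/1 "oracle" matrix, assuming L(x_i) is the number of the M
# classifiers that classify observation x_i correctly.
set.seed(1)
N <- 200; M <- 5
oracle <- matrix(rbinom(N * M, 1, 0.8), nrow = N, ncol = M)

L     <- rowSums(oracle)    # L(x_i) for each observation
p_bar <- mean(oracle)       # average individual accuracy

KW    <- sum(L * (M - L)) / (N * M^2)                                     # Eq. (11)
kappa <- 1 - (sum(L * (M - L)) / M) / (N * (M - 1) * p_bar * (1 - p_bar)) # Eq. (12)
theta <- var(L / M)         # Eq. (13): sample variance of Z = L(x)/M
c(KW = KW, kappa = kappa, theta = theta)
```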

Two measures of diversity have been proposed by Partridge and Krzanowski (1997) for the evaluation of software diversity. The first one is the generalized diversity measure:

GD = 1 - \frac{p(2)}{p(1)}, \qquad (14)

where p(k) is the probability that k randomly chosen classifiers will fail on the observation x. The second measure is named coincident failure diversity:

CFD = \begin{cases} 0 & \text{where } p_0 = 1 \\[4pt] \dfrac{1}{1 - p_0} \displaystyle\sum_{m=1}^{M} \dfrac{M - m}{M - 1}\, p_m & \text{where } p_0 < 1, \end{cases} \qquad (15)

where p_m is the probability that exactly m out of M classifiers will fail on an observation x.
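In the same spirit, a minimal sketch estimating p(k) and p_m from a hypothetical N × M matrix of 0/1 failure indicators, and computing GD and CFD from them:

```r
# Minimal sketch (hypothetical data): GD (Eq. 14) and CFD (Eq. 15) estimated
# from an N x M matrix of 0/1 failure indicators (1 = classifier fails on x_i).
set.seed(1)
N <- 200; M <- 5
fails <- matrix(rbinom(N * M, 1, 0.2), nrow = N, ncol = M)

Y  <- rowSums(fails)                     # number of failing classifiers per observation
p1 <- mean(Y) / M                        # p(1): one randomly chosen classifier fails
p2 <- mean(Y * (Y - 1)) / (M * (M - 1))  # p(2): two randomly chosen classifiers fail
GD <- 1 - p2 / p1                        # Eq. (14)

pm  <- sapply(0:M, function(m) mean(Y == m))            # p_0, p_1, ..., p_M
CFD <- if (pm[1] == 1) 0 else
  sum((M - (1:M)) / (M - 1) * pm[-1]) / (1 - pm[1])     # Eq. (15)
c(GD = GD, CFD = CFD)
```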

4 Combination rules

Once we have produced a set of individual classifiers with the desired level of diversity,

we combine their predictions to amplify their correct decisions and cancel out the

wrong ones. The combination function F in (1) depends on the type of the classifier

outputs.

There are three different forms of classifier output. The classifier can produce a

single class label (abstract level), rank the class labels according to their posterior

probabilities (rank level), or produce a vector of posterior probabilities for classes

(measurement level).

Majority voting is the most popular combination rule for class labels¹:

\hat{C}^{*}(x) = \arg\max_{j} \left\{ \sum_{m=1}^{M} I\bigl(\hat{C}_m(x) = l_j\bigr) \right\}. \qquad (16)

¹ In the R statistical environment we obtain class labels using the command predict(..., type="class").
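A minimal sketch of the majority vote in (16), assuming the class labels of the M classifiers have been collected into a hypothetical matrix (e.g. via predict(..., type="class")):

```r
# Minimal sketch (hypothetical labels): majority voting (Eq. 16) over a matrix
# whose columns hold the class labels predicted by the M classifiers.
labels <- cbind(c("a", "b", "b", "a"),
                c("a", "b", "a", "a"),
                c("b", "b", "a", "a"))

majority_vote <- function(lab_row) {
  counts <- table(lab_row)            # votes received by each class label
  names(counts)[which.max(counts)]    # label with the most votes (ties: first)
}
apply(labels, 1, majority_vote)       # ensemble prediction for each observation
```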


It can be proved that majority voting is optimal if the number of classifiers is odd, they have the same accuracy, and the classifiers' outputs are independent. If we have evidence that certain models are more accurate than others, weighting the individual predictions may improve the overall performance of the ensemble.
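For illustration, a minimal sketch of such weighted voting, where each classifier's vote counts with a hypothetical weight (e.g. its validation-set accuracy):

```r
# Minimal sketch (hypothetical weights): weighted voting, where classifier m's
# vote counts with weight w[m] (e.g. its accuracy on a validation set).
labels <- cbind(c("a", "b", "b", "a"),
                c("a", "b", "a", "a"),
                c("b", "b", "a", "a"))
w <- c(0.9, 0.7, 0.6)

weighted_vote <- function(lab_row, w) {
  scores <- tapply(w, lab_row, sum)   # total weight received by each label
  names(scores)[which.max(scores)]
}
apply(labels, 1, weighted_vote, w = w)
```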

The Behavior Knowledge Space (BKS) method developed by Huang and Suen (1995) uses a look-up table that keeps track of how often each combination of class labels is produced by the classifiers during training. Then, during testing, the winning class is the most frequently observed class in the BKS table for the combination of class labels produced by the set of classifiers.
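A rough sketch of the BKS idea with hypothetical data: the key of the look-up table is the combination of predicted labels, and the stored value is the most frequent true class for that key in the training set:

```r
# Rough sketch (hypothetical data): a BKS look-up table. Training: for each
# observed combination of predicted labels, store the most frequent true class;
# testing: look the combination of labels up in the table.
train_labels <- cbind(c("a", "a", "b", "b", "a"),   # predictions of classifier 1
                      c("a", "b", "b", "b", "b"))   # predictions of classifier 2
train_true   <- c("a", "a", "b", "b", "b")

keys <- apply(train_labels, 1, paste, collapse = "|")
bks  <- tapply(train_true, keys, function(y) names(which.max(table(y))))

test_labels <- c("a", "b")                          # one test combination
bks[paste(test_labels, collapse = "|")]             # BKS prediction (if the key was seen)
```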

Wernecke (1992) proposed a method similar to BKS that uses a look-up table with 95% confidence intervals of the class frequencies. If the intervals overlap, the least wrong classifier gives the class label.

Naive Bayes combination introduced by Domingos and Pazzani (1997) also

needs training to estimate the prior and posterior probabilities:

s_j(x) = P(l_j) \prod_{m=1}^{M} P\bigl(\hat{C}_m(x) \mid l_j\bigr). \qquad (17)

Finally, the class with the highest value of s_j(x) is chosen as the ensemble prediction.
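A minimal sketch of this combination with hypothetical training data: the priors P(l_j) and the conditional probabilities P(Ĉ_m(x) | l_j) are tabulated on the training set, and the supports s_j(x) are obtained by multiplying across classifiers:

```r
# Minimal sketch (hypothetical data): Naive Bayes combination (Eq. 17).
# cond[[m]] tabulates P(C_m(x) = predicted label | true class = l_j).
train_true  <- factor(c("a", "a", "b", "b", "b"))
train_preds <- list(factor(c("a", "a", "b", "a", "b")),   # classifier 1
                    factor(c("a", "b", "b", "b", "b")))   # classifier 2

prior <- prop.table(table(train_true))                    # P(l_j)
cond  <- lapply(train_preds, function(p)
  prop.table(table(true = train_true, pred = p), margin = 1))

new_pred <- c("b", "b")                                   # labels predicted for a new x
s <- sapply(names(prior), function(lj)
  unname(prior[lj]) * prod(sapply(seq_along(cond),
                                  function(m) cond[[m]][lj, new_pred[m]])))
s                     # supports s_j(x) for each class
names(which.max(s))   # ensemble prediction: class with the highest support
```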

On the measurement level, each classifier produces a vector of posterior probabilities² \hat{C}_m(x) = [c_{m1}(x), c_{m2}(x), \ldots, c_{mJ}(x)]. Combining the predictions of all models, we obtain a matrix called the decision profile for an instance x:

DP(x) = \begin{bmatrix} c_{11}(x) & c_{12}(x) & \ldots & c_{1J}(x) \\ \ldots & \ldots & \ldots & \ldots \\ c_{M1}(x) & c_{M2}(x) & \ldots & c_{MJ}(x) \end{bmatrix}. \qquad (18)

Based on the decision profile we calculate the support s_j(x) for each class, and the final prediction of the ensemble is the class with the highest support:

\hat{C}^{*}(x) = \arg\max_{j} \bigl\{\, s_j(x) \,\bigr\}. \qquad (19)

The most commonly used is the average (mean) rule:

s_j(x) = \frac{1}{M} \sum_{m=1}^{M} c_{mj}(x). \qquad (20)
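A minimal sketch, with hypothetical posterior probabilities, of building the decision profile DP(x) from predict(..., type="prob") outputs, applying the average rule (20), and taking the class with the highest support as in (19):

```r
# Minimal sketch (hypothetical probabilities): decision profile DP(x) (Eq. 18)
# and the average rule (Eq. 20); rows = M classifiers, columns = J classes.
DP <- rbind(c(0.7, 0.2, 0.1),   # posterior probabilities from classifier 1
            c(0.4, 0.4, 0.2),   # classifier 2
            c(0.6, 0.1, 0.3))   # classifier 3
colnames(DP) <- c("a", "b", "c")

s <- colMeans(DP)       # supports s_j(x), average rule (Eq. 20)
names(which.max(s))     # ensemble prediction, Eq. (19)
```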

There are also other algebraic rules that calculate the median, maximum, minimum, and product of the posterior probabilities for the j-th class. For example, the product rule is:

s_j(x) = \frac{1}{M} \prod_{m=1}^{M} c_{mj}(x). \qquad (21)
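The other algebraic rules differ only in the aggregation applied down the columns of DP(x); a minimal sketch with the same hypothetical decision profile:

```r
# Minimal sketch: median, maximum, minimum and product rules applied down the
# columns of a hypothetical decision profile (rows = classifiers, columns = classes).
DP <- rbind(c(0.7, 0.2, 0.1),
            c(0.4, 0.4, 0.2),
            c(0.6, 0.1, 0.3))
colnames(DP) <- c("a", "b", "c")

s_median <- apply(DP, 2, median)
s_max    <- apply(DP, 2, max)
s_min    <- apply(DP, 2, min)
s_prod   <- apply(DP, 2, prod) / nrow(DP)   # product rule as in Eq. (21)
names(which.max(s_prod))                    # class with the highest support
```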

Kuncheva et al. (2001) proposed a combination method based on Decision Templates, which are averaged decision profiles for each class (DT_j). Given an instance x,

² We use the command predict(..., type="prob").
