
KW = \frac{1}{NM^2} \sum_{i=1}^{N} L(x_i)\,\bigl(M - L(x_i)\bigr). \qquad (11)

Dietterich (2000) also proposed a measure to assess the level of agreement between classifiers. It is the kappa statistic:

\kappa = 1 - \frac{\frac{1}{M}\sum_{i=1}^{N} L(x_i)\,\bigl(M - L(x_i)\bigr)}{N(M-1)\,\bar{p}(1-\bar{p})}. \qquad (12)

Hansen and Salamon (1990) introduced the measure of difficulty \theta. It is simply the variance of the random variable Z = L(x)/M:

\theta = \mathrm{Var}(Z). \qquad (13)
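As a minimal R sketch, these three measures could be estimated from a hypothetical N × M 0/1 "oracle" matrix of correctness indicators, assuming L(x_i) counts the classifiers that classify x_i correctly (the formulas are symmetric in correct/incorrect counts); the object names and data below are illustrative only.

```r
# Minimal sketch (hypothetical data): KW, kappa and theta (Eqs. 11-13) estimated
# from an N x M 0/1 "oracle" matrix, assuming L(x_i) is the number of the M
# classifiers that classify observation x_i correctly.
set.seed(1)
N <- 200; M <- 5
oracle <- matrix(rbinom(N * M, 1, 0.8), nrow = N, ncol = M)

L     <- rowSums(oracle)    # L(x_i) for each observation
p_bar <- mean(oracle)       # average individual accuracy

KW    <- sum(L * (M - L)) / (N * M^2)                                     # Eq. (11)
kappa <- 1 - (sum(L * (M - L)) / M) / (N * (M - 1) * p_bar * (1 - p_bar)) # Eq. (12)
theta <- var(L / M)         # Eq. (13): sample variance of Z = L(x)/M
c(KW = KW, kappa = kappa, theta = theta)
```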

Two measures of diversity have been proposed by Partridge and Krzanowski (1997) for the evaluation of software diversity. The first one is the generalized diversity measure:

GD = 1 - \frac{p(2)}{p(1)}, \qquad (14)

where p(k) is the probability that k randomly chosen classifiers will fail on the observation x. The second measure is named coincident failure diversity:

CFD = \begin{cases} 0 & \text{where } p_0 = 1 \\[4pt] \dfrac{1}{1 - p_0} \displaystyle\sum_{m=1}^{M} \dfrac{M - m}{M - 1}\, p_m & \text{where } p_0 < 1, \end{cases} \qquad (15)

where p_m is the probability that exactly m out of M classifiers will fail on an observation x.
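In the same spirit, a minimal sketch estimating p(k) and p_m from a hypothetical N × M matrix of 0/1 failure indicators, and computing GD and CFD from them:

```r
# Minimal sketch (hypothetical data): GD (Eq. 14) and CFD (Eq. 15) estimated
# from an N x M matrix of 0/1 failure indicators (1 = classifier fails on x_i).
set.seed(1)
N <- 200; M <- 5
fails <- matrix(rbinom(N * M, 1, 0.2), nrow = N, ncol = M)

Y  <- rowSums(fails)                     # number of failing classifiers per observation
p1 <- mean(Y) / M                        # p(1): one randomly chosen classifier fails
p2 <- mean(Y * (Y - 1)) / (M * (M - 1))  # p(2): two randomly chosen classifiers fail
GD <- 1 - p2 / p1                        # Eq. (14)

pm  <- sapply(0:M, function(m) mean(Y == m))            # p_0, p_1, ..., p_M
CFD <- if (pm[1] == 1) 0 else
  sum((M - (1:M)) / (M - 1) * pm[-1]) / (1 - pm[1])     # Eq. (15)
c(GD = GD, CFD = CFD)
```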

4 Combination rules

Once we have produced a set of individual classifiers with the desired level of diversity,

we combine their predictions to amplify their correct decisions and cancel out the

wrong ones. The combination function F in (1) depends on the type of the classifier

outputs.

There are three different forms of classifier output. The classifier can produce a

single class label (abstract level), rank the class labels according to their posterior

probabilities (rank level), or produce a vector of posterior probabilities for classes

(measurement level).

Majority voting is the most popular combination rule for class labels¹:

\hat{C}^{*}(x) = \arg\max_{j} \left\{ \sum_{m=1}^{M} I\bigl(\hat{C}_m(x) = l_j\bigr) \right\}. \qquad (16)

¹ In the R statistical environment we obtain class labels using the command predict(..., type="class").
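A minimal sketch of the majority vote in (16), assuming the class labels of the M classifiers have been collected into a hypothetical matrix (e.g. via predict(..., type="class")):

```r
# Minimal sketch (hypothetical labels): majority voting (Eq. 16) over a matrix
# whose columns hold the class labels predicted by the M classifiers.
labels <- cbind(c("a", "b", "b", "a"),
                c("a", "b", "a", "a"),
                c("b", "b", "a", "a"))

majority_vote <- function(lab_row) {
  counts <- table(lab_row)            # votes received by each class label
  names(counts)[which.max(counts)]    # label with the most votes (ties: first)
}
apply(labels, 1, majority_vote)       # ensemble prediction for each observation
```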


It can be proved that majority voting is optimal if the number of classifiers is odd, they have the same accuracy, and the classifiers' outputs are independent. If we have evidence that certain models are more accurate than others, weighting the individual predictions may improve the overall performance of the ensemble.
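For illustration, a minimal sketch of such weighted voting, where each classifier's vote counts with a hypothetical weight (e.g. its validation-set accuracy):

```r
# Minimal sketch (hypothetical weights): weighted voting, where classifier m's
# vote counts with weight w[m] (e.g. its accuracy on a validation set).
labels <- cbind(c("a", "b", "b", "a"),
                c("a", "b", "a", "a"),
                c("b", "b", "a", "a"))
w <- c(0.9, 0.7, 0.6)

weighted_vote <- function(lab_row, w) {
  scores <- tapply(w, lab_row, sum)   # total weight received by each label
  names(scores)[which.max(scores)]
}
apply(labels, 1, weighted_vote, w = w)
```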

The Behavior Knowledge Space (BKS) method developed by Huang and Suen (1995) uses a look-up table that keeps track of how often each combination of class labels is produced by the classifiers during training. Then, during testing, the winning class is the most frequently observed class in the BKS table for the combination of class labels produced by the set of classifiers.
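A rough sketch of the BKS idea with hypothetical data: the key of the look-up table is the combination of predicted labels, and the stored value is the most frequent true class for that key in the training set:

```r
# Rough sketch (hypothetical data): a BKS look-up table. Training: for each
# observed combination of predicted labels, store the most frequent true class;
# testing: look the combination of labels up in the table.
train_labels <- cbind(c("a", "a", "b", "b", "a"),   # predictions of classifier 1
                      c("a", "b", "b", "b", "b"))   # predictions of classifier 2
train_true   <- c("a", "a", "b", "b", "b")

keys <- apply(train_labels, 1, paste, collapse = "|")
bks  <- tapply(train_true, keys, function(y) names(which.max(table(y))))

test_labels <- c("a", "b")                          # one test combination
bks[paste(test_labels, collapse = "|")]             # BKS prediction (if the key was seen)
```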

Wernecke (1992) proposed a method similar to BKS that uses a look-up table with 95% confidence intervals of the class frequencies. If the intervals overlap, the least wrong classifier gives the class label.

Naive Bayes combination introduced by Domingos and Pazzani (1997) also

needs training to estimate the prior and posterior probabilities:

s_j(x) = P(l_j) \prod_{m=1}^{M} P\bigl(\hat{C}_m(x) \mid l_j\bigr). \qquad (17)

Finally, the class with the highest value of s_j(x) is chosen as the ensemble prediction.
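A minimal sketch of this combination with hypothetical training data: the priors P(l_j) and the conditional probabilities P(Ĉ_m(x) | l_j) are tabulated on the training set, and the supports s_j(x) are obtained by multiplying across classifiers:

```r
# Minimal sketch (hypothetical data): Naive Bayes combination (Eq. 17).
# cond[[m]] tabulates P(C_m(x) = predicted label | true class = l_j).
train_true  <- factor(c("a", "a", "b", "b", "b"))
train_preds <- list(factor(c("a", "a", "b", "a", "b")),   # classifier 1
                    factor(c("a", "b", "b", "b", "b")))   # classifier 2

prior <- prop.table(table(train_true))                    # P(l_j)
cond  <- lapply(train_preds, function(p)
  prop.table(table(true = train_true, pred = p), margin = 1))

new_pred <- c("b", "b")                                   # labels predicted for a new x
s <- sapply(names(prior), function(lj)
  unname(prior[lj]) * prod(sapply(seq_along(cond),
                                  function(m) cond[[m]][lj, new_pred[m]])))
s                     # supports s_j(x) for each class
names(which.max(s))   # ensemble prediction: class with the highest support
```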

On the measurement level, each classifier produces a vector of posterior probabilities² \hat{C}_m(x) = [c_{m1}(x), c_{m2}(x), \ldots, c_{mJ}(x)]. Combining the predictions of all models, we obtain a matrix called the decision profile for an instance x:

DP(x) = \begin{bmatrix} c_{11}(x) & c_{12}(x) & \ldots & c_{1J}(x) \\ \ldots & \ldots & \ldots & \ldots \\ c_{M1}(x) & c_{M2}(x) & \ldots & c_{MJ}(x) \end{bmatrix}. \qquad (18)

Based on the decision profile we calculate the support s_j(x) for each class, and the final prediction of the ensemble is the class with the highest support:

\hat{C}^{*}(x) = \arg\max_{j} \bigl\{\, s_j(x) \,\bigr\}. \qquad (19)

The most commonly used is the average (mean) rule:

s_j(x) = \frac{1}{M} \sum_{m=1}^{M} c_{mj}(x). \qquad (20)
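A minimal sketch, with hypothetical posterior probabilities, of building the decision profile DP(x) from predict(..., type="prob") outputs, applying the average rule (20), and taking the class with the highest support as in (19):

```r
# Minimal sketch (hypothetical probabilities): decision profile DP(x) (Eq. 18)
# and the average rule (Eq. 20); rows = M classifiers, columns = J classes.
DP <- rbind(c(0.7, 0.2, 0.1),   # posterior probabilities from classifier 1
            c(0.4, 0.4, 0.2),   # classifier 2
            c(0.6, 0.1, 0.3))   # classifier 3
colnames(DP) <- c("a", "b", "c")

s <- colMeans(DP)       # supports s_j(x), average rule (Eq. 20)
names(which.max(s))     # ensemble prediction, Eq. (19)
```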

There are also other algebraic rules that calculate the median, maximum, minimum, and product of the posterior probabilities for the j-th class. For example, the product rule is:

s_j(x) = \frac{1}{M} \prod_{m=1}^{M} c_{mj}(x). \qquad (21)
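The other algebraic rules differ only in the aggregation applied down the columns of DP(x); a minimal sketch with the same hypothetical decision profile:

```r
# Minimal sketch: median, maximum, minimum and product rules applied down the
# columns of a hypothetical decision profile (rows = classifiers, columns = classes).
DP <- rbind(c(0.7, 0.2, 0.1),
            c(0.4, 0.4, 0.2),
            c(0.6, 0.1, 0.3))
colnames(DP) <- c("a", "b", "c")

s_median <- apply(DP, 2, median)
s_max    <- apply(DP, 2, max)
s_min    <- apply(DP, 2, min)
s_prod   <- apply(DP, 2, prod) / nrow(DP)   # product rule as in Eq. (21)
names(which.max(s_prod))                    # class with the highest support
```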

Kuncheva et al. (2001) proposed a combination method based on Decision Templates, which are averaged decision profiles for each class (DT_j). Given an instance x,

² We use the command predict(..., type="prob").
