Rec#  Weather        Temperature  Time    Day      Jog (Target Class)
----  -------------  -----------  ------  -------  ------------------
 1    Fine           Mild         Sunset  Weekend  Yes
 2    Fine           Hot          Sunset  Weekday  Yes
 3    Shower         Mild         Midday  Weekday  No
 4    Thunderstorm   Cool         Dawn    Weekend  No
 5    Shower         Hot          Sunset  Weekday  Yes
 6    Fine           Hot          Midday  Weekday  No
 7    Fine           Cool         Dawn    Weekend  No
 8    Thunderstorm   Cool         Midday  Weekday  No
 9    Fine           Cool         Midday  Weekday  Yes
10    Fine           Mild         Midday  Weekday  Yes
11    Shower         Hot          Dawn    Weekend  No
12    Shower         Mild         Dawn    Weekday  No
13    Fine           Cool         Dawn    Weekday  No
14    Thunderstorm   Mild         Sunset  Weekend  No
15    Thunderstorm   Hot          Midday  Weekday  No

Figure 17.11 Training data set
thunderstorm, whereas the possible values for temperature are hot, mild, and cool.
Continuous values are real numbers (e.g., the height of a person in centimetres).

Figure 17.11 shows the training data set for the decision tree shown previously.
This training data set consists of only 15 records. For simplicity, only categorical
attributes are used in this example. Examining the first record and matching it with
the decision tree in Figure 17.10, the target is a Yes for fine weather and mild
temperature, disregarding the other two attributes. This is because all records in
this training data set follow this rule (see records 1 and 10). Other records, such as
records 9 and 13, use all four attributes.
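For readers who wish to reproduce the calculations that follow, the training data
set of Figure 17.11 can be transcribed into a small Python structure, as sketched
below; the attribute names come from the table, while the variable name is ours.
The later sketches in this section build on this list.

# Training data set of Figure 17.11 as a list of records.
# Each record maps attribute names to categorical values;
# "Jog" is the target class.
TRAINING_DATA = [
    {"Weather": "Fine",         "Temperature": "Mild", "Time": "Sunset", "Day": "Weekend", "Jog": "Yes"},
    {"Weather": "Fine",         "Temperature": "Hot",  "Time": "Sunset", "Day": "Weekday", "Jog": "Yes"},
    {"Weather": "Shower",       "Temperature": "Mild", "Time": "Midday", "Day": "Weekday", "Jog": "No"},
    {"Weather": "Thunderstorm", "Temperature": "Cool", "Time": "Dawn",   "Day": "Weekend", "Jog": "No"},
    {"Weather": "Shower",       "Temperature": "Hot",  "Time": "Sunset", "Day": "Weekday", "Jog": "Yes"},
    {"Weather": "Fine",         "Temperature": "Hot",  "Time": "Midday", "Day": "Weekday", "Jog": "No"},
    {"Weather": "Fine",         "Temperature": "Cool", "Time": "Dawn",   "Day": "Weekend", "Jog": "No"},
    {"Weather": "Thunderstorm", "Temperature": "Cool", "Time": "Midday", "Day": "Weekday", "Jog": "No"},
    {"Weather": "Fine",         "Temperature": "Cool", "Time": "Midday", "Day": "Weekday", "Jog": "Yes"},
    {"Weather": "Fine",         "Temperature": "Mild", "Time": "Midday", "Day": "Weekday", "Jog": "Yes"},
    {"Weather": "Shower",       "Temperature": "Hot",  "Time": "Dawn",   "Day": "Weekend", "Jog": "No"},
    {"Weather": "Shower",       "Temperature": "Mild", "Time": "Dawn",   "Day": "Weekday", "Jog": "No"},
    {"Weather": "Fine",         "Temperature": "Cool", "Time": "Dawn",   "Day": "Weekday", "Jog": "No"},
    {"Weather": "Thunderstorm", "Temperature": "Mild", "Time": "Sunset", "Day": "Weekend", "Jog": "No"},
    {"Weather": "Thunderstorm", "Temperature": "Hot",  "Time": "Midday", "Day": "Weekday", "Jog": "No"},
]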
17.3.2 Decision Tree Classification: Processes
Decision Tree Algorithm
There are many different algorithms for constructing a decision tree, such as ID3,
C4.5, SPRINT, etc. Constructing a decision tree is generally a recursive process. At
the start, all training records are at the root node. The algorithm then partitions the
training records recursively by choosing one attribute at a time, and repeats this
process for each partitioned data set. The recursion stops when a stopping condition
is reached, which is when all of the training records in a partition have the same
target class label.

Figure 17.12 shows an algorithm for constructing a decision tree. The decision
tree construction algorithm uses a divide-and-conquer method and constructs the
tree in a depth-first fashion. Branching can be binary (only 2 branches) or
multiway (≥2 branches).
Algorithm: Decision Tree Construction
Input:   training dataset D
Output:  decision tree T

Procedure DTConstruct(D):
1.  T = ∅
2.  Determine best splitting attribute
3.  T = create root node and label with splitting attribute
4.  T = add arc to root node for each split predicate with label
5.  For each arc do
6.      D = dataset created by applying splitting predicate to D
7.      If stopping point reached for this path Then
8.          T' = create leaf node and label with appropriate class
9.      Else
10.         T' = DTConstruct(D)
11.     T = add T' to arc

Figure 17.12 Decision tree algorithm
Note that in the algorithm shown in Figure 17.12, the key element is the selection
of the splitting attribute (line 2). The splitting attribute is the attribute chosen to
split the training data set into a number of partitions. The splitting attribute step is
also often known as feature selection, because the algorithm needs to select a
feature (or an attribute) of the training data set to create a node. As mentioned
earlier, choosing a different attribute as the splitting attribute will cause the
resulting decision tree to be different. The difference among the decision trees
produced by an algorithm lies in how it positions the features, or input attributes.
Hence, choosing a splitting attribute that will result in an optimum decision tree is
desirable. The way in which a splitting node is determined is described in greater
detail in the following.
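As an illustration only (this sketch is not from the original text), the recursion of
Figure 17.12 can be written in a few lines of Python. The record layout follows
the TRAINING_DATA list transcribed earlier, and the feature-selection step is
passed in as a function parameter; a gain-based implementation of it is sketched
at the end of this section.

from collections import Counter

def dt_construct(records, attributes, select_attribute, target="Jog"):
    # Count the target classes appearing in this partition.
    classes = Counter(r[target] for r in records)
    # Stopping condition (line 7 of Figure 17.12): all records in the
    # partition share one target class, or no attributes remain; return
    # a leaf labelled with the (majority) class.
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]
    # Line 2 of Figure 17.12: determine the best splitting attribute.
    split = select_attribute(records, attributes, target)
    # Lines 4-11: one arc per distinct value of the splitting attribute,
    # recursing on the corresponding partition of the data set.
    branches = {}
    for value in {r[split] for r in records}:
        partition = [r for r in records if r[split] == value]
        remaining = [a for a in attributes if a != split]
        branches[value] = dt_construct(partition, remaining,
                                       select_attribute, target)
    return (split, branches)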
Splitting Attributes or Feature Selection
When constructing a decision tree, it is necessary to have a means of determining
the importance of the attributes for the classification. Hence, a calculation is
needed to find the best splitting attribute at a node. All possible splitting attributes
are evaluated with a feature selection criterion to find the best attribute. The
feature selection criterion does not guarantee the best decision tree, however; the
quality of the tree also relies on the completeness of the training data set and on
whether or not the training data set provides enough information.

The main aim of feature selection, or choosing the right splitting attribute at
some point in a decision tree, is to create a tree that is as simple as possible and
gives the correct classification. Consequently, poor selection of an attribute can
result in a poor decision tree.
At each node, the available attributes are evaluated on the basis of how well they
separate the classes of the training records. For example, looking at the training
records in Figure 17.11, we note that if Time = Dawn, then the answer is always
No (see records 4, 7, and 11–13). This means that if Time is chosen as the first
splitting attribute, at the next stage we do not need to process these 5 records
(records 4, 7, and 11–13); we need to process only those records with Time =
Sunset or Midday (10 records altogether). This makes the gain for choosing
attribute Time as a splitting attribute quite high and, hence, desirable.
Let us look at another possible attribute, namely, Weather. Notice that when
Weather = Thunderstorm, the target class is always No (see records 4, 8, 14, and
15). If attribute Weather is chosen as the splitting attribute at the beginning, then
at the next stage these four records (records 4, 8, 14, and 15) will not be
processed; we need to process only the other 11 records. So the gain in choosing
attribute Weather as a splitting attribute is not bad, but it is not as good as that of
attribute Time, because Time prunes out a higher number of records.
Therefore, the main goal in choosing the best splitting attribute is to choose the
attribute that will prune out as many records as possible at an early stage, so that
fewer records need to be processed in the subsequent stages. We can also say that
the best splitting attribute is the one that will result in the smallest tree.
There are various kinds of feature selection criteria for determining the best
splitting attribute. The basic feature selection criterion is called the gain criterion,
which was designed for the original decision tree algorithms (i.e., ID3 and C4.5).
Heuristically, the best splitting attribute will produce the "purest" nodes. A
popular impurity criterion is information gain. Information gain increases with
the average purity of the subsets that an attribute produces. Therefore, the
strategy is to choose the attribute that results in the greatest information gain.
The gain criterion basically consists of four important calculations.
• Given a probability distribution, the information required to predict an event
is the distribution's entropy. The entropy for the given probabilities of the
target classes, p_1, p_2, \ldots, p_n, where \sum_{i=1}^{n} p_i = 1, can be
calculated as follows:

entropy(p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i \log(1/p_i)        (17.2)
Let us use the training data set in Figure 17.11. There are two target
classes: Yes and No. Of the 15 records in the training data set, 5 records have
target class Yes and the other 10 records have target class No. The probability
of falling into a Yes is 5/15, whereas the No probability is 10/15. The entropy
for the given probabilities of the two target classes (using base-10 logarithms)
is then calculated as follows:

entropy(Yes, No) = 5/15 \times \log(15/5) + 10/15 \times \log(15/10)
                 = 0.2764        (17.3)
At the next iteration, when the training data set is partitioned into smaller
subsets, we need to calculate the entropy based on the number of training
records in each partition, not the total number of records in the original
training data set.
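As a side note (this sketch is not from the original text), equation 17.2 translates
directly into code. The helper below uses base-10 logarithms, which is what
reproduces the value 0.2764 of equation 17.3.

import math

def entropy(counts):
    # Entropy of a class distribution given as a list of counts
    # (equation 17.2, with p_i = count_i / total and log base 10).
    # A count of 0 contributes nothing to the sum.
    total = sum(counts)
    return sum((c / total) * math.log10(total / c)
               for c in counts if c > 0)

print(entropy([5, 10]))   # 5 Yes and 10 No records: 0.2764 (eq. 17.3)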
• For each of the possible attributes to be chosen as a splitting attribute, we need
to calculate an entropy value for each of the possible values of that particular
attribute. Equation 17.2 can be used, but the record count is not the total
number of training records but rather the number of records possessing that
particular attribute value.

For example, for Weather = Fine, there are 4 records with target class Yes
and 3 records with target class No. Hence the entropy for Weather = Fine is:

entropy(Weather = Fine) = 4/7 \times \log(7/4) + 3/7 \times \log(7/3)
                        = 0.2966        (17.4)
For example, for Weather = Shower, there is only 1 record with target
class Yes and 3 records with target class No. Hence the entropy for
Weather = Shower is:

entropy(Weather = Shower) = 1/4 \times \log(4/1) + 3/4 \times \log(4/3)
                          = 0.2442        (17.5)
Note that the entropy calculations in the two examples above use different
total numbers of records. For Weather = Fine the number of records is 7,
whereas for Weather = Shower the number of records is only 4. This number
of records is important, because it affects the probability of having a target
class. For example, for target class Yes in Fine weather the probability is
4/7, whereas for the same target class Yes in Shower weather the probability
is only 1/4.

For each of the attribute values, we need to calculate the entropy. In other
words, for attribute Weather, because there are three attribute values (i.e.,
Fine, Shower, and Thunderstorm), each of these three values must have an
entropy value. For attribute Temperature, for instance, we need an entropy
value calculated for each of the values Hot, Mild, and Cool.
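For illustration (again not from the original text), the per-value entropies of
equations 17.4 and 17.5 can be computed with a small helper that builds on the
entropy function and TRAINING_DATA list sketched earlier; the function name
is ours.

from collections import Counter

def value_entropy(records, attribute, value, target="Jog"):
    # Entropy of the target class among the records where
    # attribute == value (cf. equations 17.4 and 17.5), together
    # with the number of such records.
    subset = [r for r in records if r[attribute] == value]
    counts = Counter(r[target] for r in subset)
    return entropy(list(counts.values())), len(subset)

# value_entropy(TRAINING_DATA, "Weather", "Fine")   -> (0.2966, 7)
# value_entropy(TRAINING_DATA, "Weather", "Shower") -> (0.2442, 4)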
• The entropy values of each attribute's values must be combined with a
weighted sum, so that each attribute ends up with one entropy value. Because
each attribute value has an individual entropy value (e.g., attribute Weather
has three entropy values, one for each kind of weather), and the entropy of
each attribute value is based on a different probability distribution, when we
combine all the entropy values of the same attribute, their individual weights
must be taken into account.

To calculate the weighted sum, each entropy value must be multiplied by
the proportion of records having that value out of the total number of training
records in the partition. For example, the weighted entropy value for Fine
weather is 7/15 × 0.2966.
There are 7 records out of the 15 records with Fine weather, and the entropy
for Fine weather is 0.2966, as calculated earlier (see equation 17.4).

Using the same method, the weighted sum for Shower weather is 4/15 ×
0.2442, as there are only 4 records out of the 15 records in the training data
set with Shower weather, and the entropy for Shower weather, as calculated
in equation 17.5, is 0.2442.
After each individual entropy value has been weighted, we can sum the
weighted values for each individual attribute. For example, the weighted sum
for attribute Weather is:

weighted sum entropy(Weather) = weighted entropy(Fine)
                                + weighted entropy(Shower)
                                + weighted entropy(Thunderstorm)
                              = 7/15 \times 0.2966 + 4/15 \times 0.2442 + 4/15 \times 0
                              = 0.2035        (17.6)
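Continuing the illustrative sketches above, the weighted sum of equation 17.6 can
be expressed as follows, reusing the hypothetical value_entropy helper.

def weighted_entropy(records, attribute, target="Jog"):
    # Weighted sum of the per-value entropies of an attribute
    # (equation 17.6); each entropy is weighted by the share of
    # records carrying that attribute value.
    total = len(records)
    result = 0.0
    for value in {r[attribute] for r in records}:
        ent, n = value_entropy(records, attribute, value, target)
        result += (n / total) * ent
    return result

# weighted_entropy(TRAINING_DATA, "Weather")
#   -> 7/15 * 0.2966 + 4/15 * 0.2442 + 4/15 * 0 = 0.2035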
• Finally, the gain for an attribute can be calculated by subtracting the weighted
sum of the attribute's entropy values from the overall entropy. For example,
the gain for attribute Weather is:

gain(Weather) = entropy(training dataset D) - entropy(attribute Weather)
              = 0.2764 - 0.2035
              = 0.0729        (17.7)

The first part of equation 17.7 was previously calculated in equation 17.3,
whereas the second part of the equation comes from equation 17.6.
After all attributes have their gain values, the attribute that has the highest gain
value is chosen as the splitting attribute.
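The gain computation and the choice of the highest-gain attribute complete the
illustrative sketches; the best_split_attribute function below is the
select_attribute parameter assumed earlier by the dt_construct sketch.

from collections import Counter

def gain(records, attribute, target="Jog"):
    # Information gain of an attribute (equation 17.7): overall entropy
    # of the partition minus the attribute's weighted entropy.
    counts = Counter(r[target] for r in records)
    return entropy(list(counts.values())) - weighted_entropy(records, attribute, target)

def best_split_attribute(records, attributes, target="Jog"):
    # Choose the attribute with the highest information gain.
    return max(attributes, key=lambda a: gain(records, a, target))

# gain(TRAINING_DATA, "Weather") -> 0.2764 - 0.2035 = 0.0729
# dt_construct(TRAINING_DATA, ["Weather", "Temperature", "Time", "Day"],
#              best_split_attribute)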
After an attribute has been chosen as the splitting attribute, the training data set is
partitioned into a number of partitions according to the number of distinct values
in the splitting attribute. Once the training data set has been partitioned, the same
process as above is repeated for each partition until all records in a partition fall
into the same target class, at which point the process for that partition terminates
(refer to Fig. 17.12 for the algorithm).
A Walk-Through Example
Using the sample training data set in Figure 17.11, the following gives a complete
walk-through of the process to create a decision tree.
Step 1: Calculate the entropy for the training data set in Figure 17.11. The result
was previously calculated as 0.2764 (see equation 17.3).
Step 2: Process attribute Weather