Data Analysis, Machine Learning and Applications, Episode 1, Part 2
Kamila Migdał Najman and Krzysztof Najman
itself (see footnote 6). Since the learning algorithm of the SOM network is not
deterministic, subsequent iterations may yield a network with very weak discriminating
properties. In that case the value of the Silhouette index across successive stages
of variable reduction may not be monotone, which would make the results
substantially more difficult to interpret. Finally, it is worth noting that
for large databases the repeated construction of SOM networks may be
time-consuming and may require substantial computing capacity.
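The monotonicity concern above can be made concrete with a small sketch. It computes the mean silhouette width in its standard form, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and checks whether a sequence of index values recorded over the reduction stages is monotone. The function names `silhouette_index` and `is_monotone`, the Euclidean distance, and the toy data are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width; assumes at least two distinct cluster labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: the silhouette width is conventionally 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scale = max(a, b)
        widths.append((b - a) / scale if scale > 0 else 0.0)
    return sum(widths) / len(widths)


def is_monotone(values):
    """True if the sequence is non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)


# Hypothetical toy data: two well-separated groups in the plane.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_index = silhouette_index(toy_points, toy_labels)
```

In a variable-reduction study one would record the index after each stage, possibly averaged over several SOM runs with different random initialisations, and pass that list to `is_monotone` to flag stages where the interpretation becomes unreliable.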
In the authors' opinion, the presented method has proved its utility in numerous
empirical studies and may be successfully applied in practice.
Footnote 6: The quality of the SOM network is assessed on the basis of the following
coefficients: topographic, distortion and quantisation.
Calibrating Margin-based Classifier Scores into Polychotomous Probabilities
Martin Gebel (1) and Claus Weihs (2)
(1) Graduiertenkolleg Statistische Modellbildung,
Lehrstuhl für Computergestützte Statistik,
Universität Dortmund, D-44221 Dortmund, Germany
(2) Lehrstuhl für Computergestützte Statistik,
Universität Dortmund, D-44221 Dortmund, Germany
Abstract. Margin-based classifiers such as the SVM and ANN have two drawbacks: they are
directly applicable only to two-class problems, and they output only scores that do not
reflect the assessment uncertainty. K-class assessment probabilities are usually generated by
a reduction to binary tasks, univariate calibration, and subsequent application of the pairwise
coupling algorithm. This paper presents an alternative to coupling that uses the Dirichlet
distribution.
1 Introduction
Although many classification problems involve more than two classes, margin-based
classifiers such as the Support Vector Machine (SVM) and Artificial Neural
Networks (ANN) are directly applicable only to binary classification tasks. Thus,
tasks with a number of classes K greater than 2 require a reduction to several binary
problems, followed by a combination of the resulting binary assessment values into
a single assessment value per class.
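As an illustration of such a reduction, the following sketch builds one binary task for every pair of classes (the all-pairs or one-against-one scheme). The text above does not say which reduction is used, so the all-pairs choice, the function name `one_vs_one_tasks` and the +1/-1 coding are assumptions made for this example; one-against-rest would be an equally valid reduction.

```python
from itertools import combinations


def one_vs_one_tasks(labels):
    """Split a K-class label vector into K*(K-1)/2 binary tasks.

    Returns a dict that maps each class pair (pos, neg) to a list of
    (observation index, binary label) tuples, where the binary label is
    +1 for class `pos` and -1 for class `neg`. Observations from any
    other class are left out of that task.
    """
    classes = sorted(set(labels))
    tasks = {}
    for pos, neg in combinations(classes, 2):
        tasks[(pos, neg)] = [
            (idx, 1 if lab == pos else -1)
            for idx, lab in enumerate(labels)
            if lab in (pos, neg)
        ]
    return tasks


# Hypothetical labels for four observations from K = 3 classes.
example_tasks = one_vs_one_tasks(["a", "b", "c", "a"])
```

Each binary task would then be handed to a margin-based classifier, giving one score per observation and class pair that still has to be calibrated and combined.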
Before this combination it is beneficial to generate comparable outcomes by calibrating them to probabilities which reflect the assessment uncertainty in the binary
decisions, see Section 2. Analyses of the calibration of dichotomous classifier scores
show that the calibrators using Mapping with Logistic Regression or the Assignment Value idea perform best and most robustly, see Gebel and Weihs (2007).
To date, pairwise coupling by Hastie and Tibshirani (1998) is the standard approach for the subsequent combination of binary assessment values, see Section 3.
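To make the idea of mapping scores to probabilities with a logistic regression concrete, here is a minimal sketch that fits P(y = 1 | s) = 1 / (1 + exp(-(a s + b))) to binary outcomes by plain gradient descent on the mean log-loss. The parametrisation, the optimiser, the function names and the toy scores are all assumptions for illustration; the calibrator studied by the authors may differ in its details.

```python
import math


def fit_logistic_map(scores, targets, lr=0.1, epochs=2000):
    """Fit slope a and intercept b of a logistic map from score to probability.

    `targets` holds 0/1 outcomes. Plain batch gradient descent is used for
    simplicity; it assumes moderately scaled scores, because very large
    values of |a*s + b| could overflow in math.exp.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - t) * s  # derivative of the log-loss w.r.t. a
            grad_b += p - t        # derivative of the log-loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw classifier score to a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical margin scores with noisy, non-separable 0/1 outcomes.
toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
toy_targets = [0, 0, 1, 0, 1, 1]
slope, intercept = fit_logistic_map(toy_scores, toy_targets)
```

The fitted map is monotone in the score, so the ranking of observations is unchanged while the outputs become comparable across the binary tasks.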
Section 4 presents a new multi-class calibration method for margin-based classifiers
which combines the binary outcomes into assessment probabilities for the K classes.
This method, which is based on the Dirichlet distribution, is compared with the
coupling algorithm in Section 5.
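For orientation, the coupling step that serves as the baseline can be sketched as follows. This is a minimal reading of the iterative scheme usually attributed to Hastie and Tibshirani (1998), not the authors' code: given pairwise estimates r[i][j] of P(class i | class i or j) and pair sample sizes n[i][j], each class probability p[i] is rescaled by the ratio of observed to model-implied pairwise values, and the vector is renormalised after every update. The function name, the default equal weights and the stopping rule are assumptions.

```python
def pairwise_coupling(r, n=None, sweeps=1000, tol=1e-10):
    """Combine pairwise probabilities into one probability per class.

    r[i][j] estimates P(class i | class i or j), with r[j][i] = 1 - r[i][j];
    diagonal entries are ignored. Assumes 0 < r[i][j] < 1 for i != j so that
    no class probability collapses to zero.
    """
    k = len(r)
    if n is None:
        n = [[1.0] * k for _ in range(k)]  # equal weight for every pair
    p = [1.0 / k] * k                      # start from the uniform vector
    for _ in range(sweeps):
        previous = p[:]
        for i in range(k):
            observed = sum(n[i][j] * r[i][j] for j in range(k) if j != i)
            implied = sum(
                n[i][j] * p[i] / (p[i] + p[j]) for j in range(k) if j != i
            )
            p[i] *= observed / implied
            total = sum(p)
            p = [value / total for value in p]
        if max(abs(x - y) for x, y in zip(p, previous)) < tol:
            break
    return p


# Hypothetical consistency check: build r from known class probabilities.
true_p = [0.5, 0.3, 0.2]
toy_r = [
    [0.0 if i == j else true_p[i] / (true_p[i] + true_p[j]) for j in range(3)]
    for i in range(3)
]
coupled = pairwise_coupling(toy_r)
```

When the pairwise estimates are mutually consistent, as in this toy check, the iteration should recover the class probabilities they were built from; with real, inconsistent estimates it returns a compromise vector that sums to one.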