Data Analysis Machine Learning and Applications Episode 1 Part 6 docx
152 Kurt Hornik and Walter Böhm
Table 2. Formation of a third class in the Euclidean consensus partitions for the Gordon-Vichi
macroeconomic ensemble as a function of the weight ratio w between 3- and 2-class partitions
in the ensemble.
w       Countries in the third class
1.5     India
2.0     India, Sudan
3.0     India, Sudan
4.5     India, Sudan, Bolivia, Indonesia
10.0    India, Sudan, Bolivia, Indonesia
12.5    India, Sudan, Bolivia, Indonesia, Egypt
∞       India, Sudan, Bolivia, Indonesia, Egypt
these, 85 female undergraduates at Rutgers University were asked to sort 15 English
terms into classes “on the basis of some aspect of meaning”. There are at least three
“axes” for classification: gender, generation, and direct versus indirect lineage. The
Euclidean consensus partitions with Q = 3 classes put grandparents and grandchildren in one class and all indirect kin into another. For Q = 4, {brother, sister}
are separated from {father, mother, daughter, son}. Table 3 shows the memberships
for a soft Euclidean consensus partition for Q = 5 based on 1000 replications of the
AO algorithm.
Table 3. Memberships for the 5-class soft Euclidean consensus partition for the Rosenberg-Kim kinship terms data.
Term            1      2      3      4      5
grandfather     0.000  0.024  0.012  0.965  0.000
grandmother     0.005  0.134  0.016  0.840  0.005
granddaughter   0.113  0.242  0.054  0.466  0.125
grandson        0.134  0.111  0.052  0.581  0.122
brother         0.612  0.282  0.024  0.082  0.000
sister          0.579  0.391  0.026  0.002  0.002
father          0.099  0.546  0.122  0.158  0.075
mother          0.089  0.654  0.136  0.054  0.066
daughter        0.000  1.000  0.000  0.000  0.000
son             0.031  0.842  0.007  0.113  0.007
nephew          0.012  0.047  0.424  0.071  0.447
niece           0.000  0.129  0.435  0.000  0.435
cousin          0.080  0.056  0.656  0.033  0.174
aunt            0.000  0.071  0.929  0.000  0.000
uncle           0.000  0.000  0.882  0.071  0.047
Figure 1 indicates the classes and margins for the 5-class solutions. We see that
the memberships of ‘niece’ are tied between columns 3 and 5, and that the margin
of ‘nephew’ is only very small (0.02), suggesting the 4-class solution as the optimal
Euclidean consensus representation of the ensemble.
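The classes and margins discussed above can be recomputed directly from the memberships in Table 3. The following is a minimal Python sketch, not the authors' code: it assumes the margin is the difference between the largest and second largest membership value (as stated in the caption of Fig. 1), and the helper name `hard_class_and_margin` is hypothetical.

```python
# Memberships from Table 3 (rows: kinship terms; columns: classes 1..5).
memberships = {
    "grandfather":   [0.000, 0.024, 0.012, 0.965, 0.000],
    "grandmother":   [0.005, 0.134, 0.016, 0.840, 0.005],
    "granddaughter": [0.113, 0.242, 0.054, 0.466, 0.125],
    "grandson":      [0.134, 0.111, 0.052, 0.581, 0.122],
    "brother":       [0.612, 0.282, 0.024, 0.082, 0.000],
    "sister":        [0.579, 0.391, 0.026, 0.002, 0.002],
    "father":        [0.099, 0.546, 0.122, 0.158, 0.075],
    "mother":        [0.089, 0.654, 0.136, 0.054, 0.066],
    "daughter":      [0.000, 1.000, 0.000, 0.000, 0.000],
    "son":           [0.031, 0.842, 0.007, 0.113, 0.007],
    "nephew":        [0.012, 0.047, 0.424, 0.071, 0.447],
    "niece":         [0.000, 0.129, 0.435, 0.000, 0.435],
    "cousin":        [0.080, 0.056, 0.656, 0.033, 0.174],
    "aunt":          [0.000, 0.071, 0.929, 0.000, 0.000],
    "uncle":         [0.000, 0.000, 0.882, 0.071, 0.047],
}


def hard_class_and_margin(row):
    """Return (1-based id of the class with the largest membership, margin).

    The margin is the largest membership minus the second largest one.
    On ties, the lower-numbered class is reported (the sort is stable).
    """
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    best, second = ranked[0], ranked[1]
    return best + 1, row[best] - row[second]


results = {term: hard_class_and_margin(row) for term, row in memberships.items()}

for term, (cls, margin) in results.items():
    print(f"{term:<14} class {cls}  margin {margin:.3f}")
```

A margin of zero, as for 'niece', means the maximal membership is attained in more than one class, so the hard class assignment for that term is not unique.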
Hard and Soft Euclidean Consensus Partitions 153
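[Figure 1: margin plot for the 15 kinship terms; horizontal axis: margin, from 0.0 to 1.0. Class ids shown in the plot: grandfather, grandmother, granddaughter, grandson: 4; brother, sister: 1; father, mother, daughter, son: 2; nephew: 5; niece: 3/5; cousin, aunt, uncle: 3.]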
Fig. 1. Classes (indicated by plot symbol and class id) and margins (differences between the
largest and second largest membership values) for the 5-class soft Euclidean consensus partition for the Rosenberg-Kim kinship terms data.
Quite interestingly, none of these consensus partitions split according to gender,
even though there are such partitions in the data. To take the natural heterogeneity in the data into account, one could try to partition them (perform clusterwise
aggregation, Gaul and Schader (1988)), resulting in meta-partitions (Gordon and
Vichi (1998)) of the underlying objects. Function cl_pclust in package clue provides an AO heuristic for soft prototype-based partitioning of classifications. In particular, it makes it possible to obtain soft or hard meta-partitions with soft or hard Euclidean
consensus partitions as prototypes.
References
BARTHÉLEMY, J. P. and MONJARDET, B. (1981): The median procedure in cluster analysis
and social choice theory. Mathematical Social Sciences, 1, 235–267.
BARTHÉLEMY, J. P. and MONJARDET, B. (1988): The median procedure in data analysis:
new results and open problems. In: H. H. Bock, editor, Classification and related methods
of data analysis. North-Holland, Amsterdam, 309–316.
BOORMAN, S. A. and ARABIE, P. (1972): Structural measures and the method of sorting.
In: R. N. Shepard, A. K. Romney and S. B. Nerlove, editors, Multidimensional Scaling:
Theory and Applications in the Behavioral Sciences, 1: Theory. Seminar Press, New
York, 225–249.
CHARON, I., DENOEUD, L., GUENOCHE, A. and HUDRY, O. (2006): Maximum transfer
distance between partitions. Journal of Classification, 23(1), 103–121.
DAY, W. H. E. (1981): The complexity of computing metric distances between partitions.
Mathematical Social Sciences, 1, 269–287.
DIMITRIADOU, E., WEINGESSEL, A. and HORNIK, K. (2002): A combination scheme for
fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence,
16(7), 901–912.
GAUL, W. and SCHADER, M. (1988): Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4, 273–282.