Data Mining and Big Data
Ying Tan · Yuhui Shi (Eds.)
LNCS 9714
First International Conference, DMBD 2016
Bali, Indonesia, June 25–30, 2016
Proceedings
Data Mining
and Big Data
Lecture Notes in Computer Science 9714
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zürich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors
Ying Tan
Peking University
Beijing
China
Yuhui Shi
Xi’an Jiaotong-Liverpool University
Suzhou
China
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-40972-6 ISBN 978-3-319-40973-3 (eBook)
DOI 10.1007/978-3-319-40973-3
Library of Congress Control Number: 2016942014
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Preface
This volume constitutes the proceedings of the International Conference on Data
Mining and Big Data (DMBD 2016), which was held in conjunction with the 7th
International Conference on Swarm Intelligence (ICSI 2016), during June 25–30, 2016,
at Padma Resort in Legian, Bali, Indonesia.
The theme of DMBD 2016 was “Serving Life with Data Science.” Data mining refers
to the activity of going through big data sets to look for relevant or pertinent information.
This type of activity is a good example of the adage “looking for a needle in a haystack.”
The idea is that businesses collect massive sets of data that may be homogeneous or
automatically collected. Decision-makers need access to smaller, more specific pieces of
data from these large sets. They use data mining to uncover the pieces of information
that will inform leadership and help chart the course for a business. Big data contains a
huge amount of data and information and is worth researching in depth. Big data, also
known as massive data or mass data, refers to data sets that are too large to be
interpreted by a human. However, existing methods for processing big data are not yet
effective enough. Currently, the suitable technologies include data mining, A/B testing,
crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural
language processing, signal processing, simulation, time series analysis, and
visualization. Real-time or near-real-time information delivery is one of the defining
characteristics of big data analytics, so it is important to find new methods to enhance the
effectiveness of big data processing. With the advent of big data analysis and intelligent
computing techniques, we face new challenges in making information transparent and
understandable efficiently. DMBD 2016 provided an excellent opportunity and an
academic forum for academics and practitioners to present and discuss the latest scientific
results, methods, and innovative ideas and advances in theories, technologies, and
applications in data mining, big data, and intelligent computing. The technical program
covered all aspects of data mining, big data, and swarm intelligence as well as intelligent
computing methods applied to all fields of computer science, signal/information processing, machine learning, data mining and knowledge discovery, robotics, big data,
scheduling, game theory, parallel realization, etc.
DMBD 2016 took place at Padma Resort in Legian, Bali, Indonesia. Bali is a
famous Indonesian island with the provincial capital at Denpasar. Lying between Java
to the west and Lombok to the east, this island is renowned for its volcanic lakes,
spectacular rice terraces, stunning tropical beaches, ancient temples, and palaces, as
well as dance and elaborate religious festivals. Bali is also the largest tourist destination
in the country and is renowned for its highly developed arts, including traditional and
modern dance, sculpture, painting, leather, metalworking, and music. Since the late
20th century, the province has had a big rise in tourism. Bali received the Best Island
Award from Travel and Leisure in 2010. The island of Bali won because of its
attractive surroundings (both mountain and coastal areas), diverse tourist attractions,
excellent international and local restaurants, and the friendliness of the local people.
According to BBC Travel in 2011, Bali is one of the world’s best islands.
DMBD 2016 received 115 submissions from about 278 authors in 36 countries and
regions (Algeria, Australia, Bangladesh, Brazil, Chile, China, Colombia, Egypt,
France, Germany, Greece, India, Indonesia, Iraq, Ireland, Japan, Kazakhstan, Republic
of Korea, Luxembourg, Malaysia, Norway, Poland, Portugal, Romania, Russian Federation, Singapore, Slovakia, South Africa, Spain, Sweden, Chinese Taiwan, Tunisia,
Turkey, UK, USA, Vietnam) across six continents (Asia, Europe, North America,
South America, Africa, and Oceania). Each submission was reviewed by at least two
reviewers, and by 2.8 reviewers on average. Based on rigorous reviews by the Program
Committee members and reviewers, 57 high-quality papers were selected for publication
in this proceedings volume, an acceptance rate of 49.57%. The papers are
organized in 10 cohesive sections covering all major topics of the research and
development of data mining and big data and one Workshop on Computational Aspects
of Pattern Recognition and Computer Vision.
As organizers of DMBD 2016, we would like to express sincere thanks to Peking
University and Xi’an Jiaotong-Liverpool University for their sponsorship, to Beijing
Xinghui Hi-Tech Co. for its co-sponsorship, and to the IEEE Computational
Intelligence Society, the World Federation on Soft Computing, the International Neural
Network Society, and the IEEE Beijing Section for their technical co-sponsorship. We would
also like to thank the members of the Advisory Committee for their guidance, the
members of the international Program Committee and additional reviewers for
reviewing the papers, and the members of the Publications Committee for checking the
accepted papers in a short period of time. We are especially grateful to the proceedings
publisher Springer for publishing the proceedings in the prestigious series of Lecture
Notes in Computer Science. Moreover, we wish to express our heartfelt appreciation to
the plenary speakers, session chairs, and student helpers. In addition, there are still
many more colleagues, associates, friends, and supporters who helped us in immeasurable ways; we express our sincere gratitude to them all. Last but not least, we
would like to thank all the speakers, authors, and participants for their great contributions that made DMBD 2016 successful and all the hard work worthwhile.
May 2016 Ying Tan
Yuhui Shi
Organization
General Chairs
Ying Tan Peking University, China
Russ Eberhart IUPUI, USA
General Program Committee Chair
Yuhui Shi Xi’an Jiaotong-Liverpool University, China
Technical Committee Co-chairs
Haibo He University of Rhode Island, Kingston, USA
Martin Middendorf University of Leipzig, Germany
Xiaodong Li RMIT University, Australia
Hideyuki Takagi Kyushu University, Japan
Ponnuthurai Nagaratnam Suganthan Nanyang Technological University, Singapore
Kay Chen Tan National University of Singapore, Singapore
Special Sessions Co-chairs
Shi Cheng Nottingham University Ningbo, China
Yuan Yuan Chinese Academy of Sciences, China
Publications Co-chairs
Radu-Emil Precup Politehnica University of Timisoara, Romania
Swagatham Das Indian Statistical Institute, India
Plenary Session Co-chairs
Nikola Kasabov Auckland University of Technology, New Zealand
Rachid Chelouah EISTI, France
Tutorial Chair
Milan Tuba University of Belgrade, Serbia
Publicity Co-chairs
Yew-Soon Ong Nanyang Technological University, Singapore
Pramod Kumar Singh Indian Institute of Information Technology and Management, India
Eugene Semenkin Siberian Aerospace University, Russia
Somnuk Phon-Amnuaisuk Institut Teknologi Brunei, Brunei
Finance and Registration Co-chairs
Andreas Janecek University of Vienna, Austria
Chao Deng Peking University, China
Suicheng Gu Google Corporation, USA
DMBD 2016 Program Committee
Mohd Helmy Abd Wahab Universiti Tun Hussein Onn, Malaysia
Miltiadis Alamaniotis Purdue University, USA
Rafael Alcala University of Granada, Spain
Tomasz Andrysiak UTP Bydgoszcz, Poland
Duong Tuan Anh HoChiMinh City University of Technology, Vietnam
Carmelo J.A. Bastos Filho University of Pernambuco, Brazil
Vladimir Bukhtoyarov Siberian State Aerospace University, Russia
David Camacho Universidad Autonoma de Madrid, Spain
Jinde Cao Southeast University, China
Carlos Costa University of Minho, Portugal
Jose Alfredo Ferreira Costa Universidade Federal do Rio Grande do Norte, Brazil
Bogusław Cyganek AGH University of Science and Technology, Poland
Kusum Deep Indian Institute of Technology Roorkee, India
Mingcong Deng Tokyo University of Agriculture and Technology, Japan
Pragya Dwivedi JNU New Delhi, India
Jianwu Fang Xi’an Institute of Optics and Precision Mechanics of CAS, China
Fangyu Gai National University of Defense Technology, China
Teresa Guarda Isla - Superior Institute of Languages and Administration of Leiria, Portugal
Cem Iyigun Middle East Technical University, Turkey
Dariusz Jankowski Wrocław University of Technology, Poland
Mingyan Jiang Shandong University, China
Imed Kacem LCOMS - Université de Lorraine, France
Kalinka Kaloyanova University of Sofia - FMI, Bulgaria
Jong Myon Kim School of Electrical Engineering, South Korea
Pawel Ksieniewicz Wroclaw University of Technology, Poland
Germano Lambert-Torres PS Solutions, Brazil
Bin Li University of Science and Technology of China, China
Andrei Lihu Politehnica University of Timisoara, Romania
Shu-Chiang Lin National Taiwan University of Science and Technology, Taiwan
Bin Liu Nanjing University of Post and Telecommunications, China
Wenlian Lu Fudan University, China
Wenjian Luo University of Science and Technology of China, China
Wojciech Macyna Wroclaw University of Technology, Poland
Michalis Mavrovouniotis De Montfort University, UK
Mohamed Arezki Mellal M’Hamed Bougara University, Algeria
Sanaz Mostaghim Institute IWS, Germany
Maria Muntean 1 Decembrie 1918 University of Alba Iulia, Romania
Sheak Rashed Haider Noori Daffodil International University, Bangladesh
Benoit Otjacques Luxembourg Institute of Science and Technology, Luxembourg
Piotr Porwik University of Silesia, Poland
Wei Qin Shanghai Jiao Tong University, China
Vignesh Raja CDAC, India
Mohamed Salah Gouider Institut Supérieur de Gestion de Tunis, Tunisia
Volkmar Schau Friedrich Schiller University of Jena, Germany
Ivan Silva University of São Paulo, Brazil
Pramod Kumar Singh ABV-IIITM Gwalior, India
Hung-Min Sun National Tsing Hua University, Taiwan
Ying Tan Peking University, China
Christos Tjortjis International Hellenic University, Greece
Paulo Trigo ISEL, Portugal
Milan Tuba University of Belgrade, Serbia
Agnieszka Turek Warsaw University of Technology, Poland
Gai-Ge Wang Jiangsu Normal University, China
Guoyin Wang Chongqing University of Posts and Telecommunications, China
Lei Wang Tongji University, China
Qi Wang Northwestern Polytechnical University, China
Xiaoying Wang Changshu Institute of Technology, China
Yong Wang Zhongnan University, China
Ka-Chun Wong City University of Hong Kong, SAR China
Michal Wozniak Wroclaw University of Technology, Poland
Bo Xing University of Johannesburg, South Africa
Bing Xue Victoria University of Wellington, New Zealand
Yingjie Yang De Montfort University, UK
Kiwon Yeom NASA Ames Research Center, USA
Jie Zhang Newcastle University, UK
Qieshi Zhang Waseda University, Japan
Yujun Zheng Zhejiang University of Technology, China
Cui Zhihua Complex System and Computational Intelligence Laboratory, China
Huiyu Zhou Queen’s University Belfast, UK
Additional Reviewers
Andrysiak, Tomasz
Burduk, Robert
Hu, Jianqiang
Jackowski, Konrad
Jiang, Zhiyu
Koziarski, Michał
Li, Rui
Loruenser, Thomas
Shi, Xinli
Wan, Ying
Wang, Yi
Wozniak, Michal
Yakhchi, Shahpar
Yan, Shankai
Zawoad, Shams
Zhao, Yang
Zhong, Jie
Contents
Challenges in Data Mining and Big Data
Evolutionary Computation and Big Data: Key Challenges and Future Directions
Shi Cheng, Bin Liu, Yuhui Shi, Yaochu Jin, and Bin Li
Prospects and Challenges in Online Data Mining: Experiences of Three-Year Labour Market Monitoring Project
Maxim Bakaev and Tatiana Avdeenko
Data Mining Algorithms
Enhance AdaBoost Algorithm by Integrating LDA Topic Model
Fangyu Gai, Zhiqiang Li, Xinwen Jiang, and Hongchen Guo
An Improved Algorithm for MicroRNA Profiling from Next Generation Sequencing Data
Salim A., Amjesh R., and Vinod Chandra S.S.
Utilising the Cross Industry Standard Process for Data Mining to Reduce Uncertainty in the Measurement and Verification of Energy Savings
Colm V. Gallagher, Ken Bruton, and Dominic T.J. O’Sullivan
Implementing Majority Voting Rule to Classify Corporate Value Based on Environmental Efforts
Ratna Hidayati, Katsutoshi Kanamori, Ling Feng, and Hayato Ohwada
Model Proposal of Knowledge Management for Technology Based Companies
Jorge Leonardo Puentes Morantes, Nancy Yurani Ortiz Guevara, and José Ignacio Rodriguez Molano
Frequent Itemset Mining
Oracle and Vertica for Frequent Itemset Mining
Hristo Kyurkchiev and Kalinka Kaloyanova
Reconstructing Positive Surveys from Negative Surveys with Background Knowledge
Dongdong Zhao, Wenjian Luo, and Lihua Yue
Spatial Data Mining
Application of the Spatial Data Mining Methodology and Gamification for the Optimisation of Solving the Transport Issues of the “Varsovian Mordor”
Robert Olszewski and Agnieszka Turek
A Geo-Social Data Model for Moving Objects
Hengcai Zhang, Feng Lu, and Jie Chen
Optimization on Arrangement of Precaution Areas Serving for Ships’ Routeing in the Taiwan Strait Based on Massive AIS Data
Jinhai Chen, Feng Lu, Mingxiao Li, Pengfei Huang, Xiliang Liu, and Qiang Mei
Prediction
Bulk Price Forecasting Using Spark over NSE Data Set
Vijay Krishna Menon, Nithin Chekravarthi Vasireddy, Sai Aswin Jami, Viswa Teja Naveen Pedamallu, Varsha Sureshkumar, and K.P. Soman
Prediction and Survival Analysis of Patients After Liver Transplantation Using RBF Networks
C.G. Raji and S.S. Vinod Chandra
Link Prediction by Utilizing Correlations Between Link Types and Path Types in Heterogeneous Information Networks
Hyun Ji Jeong, Kim Taeyeon, and Myoung Ho Kim
Advanced Predictive Methods of Artificial Intelligence in Intelligent Transport Systems
Viliam Lendel, Lucia Pancikova, and Lukas Falat
Range Prediction Models for E-Vehicles in Urban Freight Logistics Based on Machine Learning
Johannes Kretzschmar, Kai Gebhardt, Christoph Theiß, and Volkmar Schau
Feature Selection
Partitioning Based N-Gram Feature Selection for Malware Classification
Weiwei Hu and Ying Tan
A Supervised Biclustering Optimization Model for Feature Selection in Biomedical Dataset Classification
Saziye Deniz Oguz Arikan and Cem Iyigun
Term Space Partition Based Ensemble Feature Construction for Spam Detection
Guyue Mi, Yang Gao, and Ying Tan
Information Extraction
Term Extraction from German Computer Science Textbooks
Kevin Möhlmann and Jörn Syrbe
An FW-DTSS Based Approach for News Page Information Extraction
Leiming Ma and Zhengyou Xia
A Linear Regression Approach to Multi-criteria Recommender System
Tanisha Jhalani, Vibhor Kant, and Pragya Dwivedi
Classification
Classification of Power Quality Disturbances Using Forest Algorithm
Fábbio Borges, Ivan Silva, Ricardo Fernandes, and Lucas Moraes
A Sequential k-Nearest Neighbor Classification Approach for Data-Driven Fault Diagnosis Using Distance- and Density-Based Affinity Measures
Myeongsu Kang, Gopala Krishnan Ramaswami, Melinda Hodkiewicz, Edward Cripps, Jong-Myon Kim, and Michael Pecht
A Hybrid Model Combining SOMs with SVRs for Patent Quality Analysis and Classification
Pei-Chann Chang, Jheng-Long Wu, Cheng-Chin Tsao, and Chin-Yuan Fan
Mining Best Strategy for Multi-view Classification
Jing Peng and Alex J. Aved
Anomaly Pattern and Diagnosis
Detecting Variable Length Anomaly Patterns in Time Series Data
Ngo Duy Khanh Vy and Duong Tuan Anh
Bigger Data Is Better for Molecular Diagnosis Tests Based on Decision Trees
Alexandru G. Floares, George A. Calin, and Florin B. Manolache
Waiting Time Screening in Diagnostic Medical Imaging – A Case-Based View
Marisa Esteves, Henrique Vicente, Sabino Gomes, António Abelha, M. Filipe Santos, José Machado, João Neves, and José Neves
Data Visualization Analysis
Real-Time Data Analytics: An Algorithmic Perspective
Sarwar Jahan Morshed, Juwel Rana, and Marcelo Milrad
High-Dimensional Data Visualization Based on User Knowledge
Qiaolian Liu, Jianfei Zhao, Naiwang Guo, Ding Xiao, and Chuan Shi
A Data Mining and Visual Analytics Perspective on Sustainability-Oriented Infrastructure Planning
Dimitri N. Mavris, Michael Balchanos, WoongJe Sung, and Olivia J. Pinon
Visual Interactive Approach for Mining Twitter’s Networks
Youcef Abdelsadek, Kamel Chelghoum, Francine Herrmann, Imed Kacem, and Benoît Otjacques
Privacy Policy
Key Indicators for Data Sharing - In Relation with Digital Services
Sheak Rashed Haider Noori, Md. Kamrul Hossain, and Juwel Rana
Efficient Probabilistic Methods for Proof of Possession in Clouds
Lukasz Krzywiecki, Krzysztof Majcher, and Wojciech Macyna
Cloud-Based Storage Model with Strong User Privacy Assurance
Amir Rezapour, Wei Wu, and Hung-Min Sun
Social Media
The Role of Social Media in Innovation and Creativity: The Case of Chinese Social Media
Jiwat Ram, Siqi Liu, and Andy Koronois
Malay Word Stemmer to Stem Standard and Slang Word Patterns on Social Media
Mohamad Nizam Kassim, Mohd Aizaini Maarof, Anazida Zainal, and Amirudin Abdul Wahab
Two-Phase Computing Model for Chinese Microblog Sentimental Analysis
Jianyong Duan, Chao Wang, Mei Zhang, and Hui Liu
Local Community Detection Based on Bridges Ideas
Xia Zhang, Zhengyou Xia, and Jiandong Wang
Environment for Data Transfer Measurement
Sergey Khoruzhnikov, Vladimir Grudinin, Oleg Sadov, Andrey Shevel, Stefanos Georgiou, and Arsen Kairkanov