
Ying Tan · Yuhui Shi (Eds.)

Data Mining and Big Data

First International Conference, DMBD 2016

Bali, Indonesia, June 25–30, 2016

Proceedings

LNCS 9714

Lecture Notes in Computer Science 9714

Commenced Publication in 1973

Founding and Former Series Editors:

Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison

Lancaster University, Lancaster, UK

Takeo Kanade

Carnegie Mellon University, Pittsburgh, PA, USA

Josef Kittler

University of Surrey, Guildford, UK

Jon M. Kleinberg

Cornell University, Ithaca, NY, USA

Friedemann Mattern

ETH Zurich, Zürich, Switzerland

John C. Mitchell

Stanford University, Stanford, CA, USA

Moni Naor

Weizmann Institute of Science, Rehovot, Israel

C. Pandu Rangan

Indian Institute of Technology, Madras, India

Bernhard Steffen

TU Dortmund University, Dortmund, Germany

Demetri Terzopoulos

University of California, Los Angeles, CA, USA

Doug Tygar

University of California, Berkeley, CA, USA

Gerhard Weikum

Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7409

Editors

Ying Tan

Peking University

Beijing

China

Yuhui Shi

Xi’an Jiaotong-Liverpool University

Suzhou

China

ISSN 0302-9743 ISSN 1611-3349 (electronic)

Lecture Notes in Computer Science

ISBN 978-3-319-40972-6 ISBN 978-3-319-40973-3 (eBook)

DOI 10.1007/978-3-319-40973-3

Library of Congress Control Number: 2016942014

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG Switzerland

Preface

This volume constitutes the proceedings of the International Conference on Data Mining and Big Data (DMBD 2016), which was held in conjunction with the 7th International Conference on Swarm Intelligence (ICSI 2016), during June 25–30, 2016, at Padma Resort in Legian, Bali, Indonesia.

The theme of DMBD 2016 was “Serving Life with Data Science.” Data mining refers to the activity of going through big data sets to look for relevant or pertinent information. This type of activity is a good example of the idiom “looking for a needle in a haystack.” The idea is that businesses collect massive sets of data that may be homogeneous or automatically collected. Decision-makers need access to smaller, more specific pieces of data from these large sets. They use data mining to uncover the pieces of information that will inform leadership and help chart the course for a business. Big data contains a huge amount of data and information and is worth researching in depth. Big data, also known as massive data or mass data, refers to data sets that are too large to be interpreted by a human. However, existing methods for processing big data are not yet effective enough. Currently, suitable technologies include data mining, A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis, and visualization. Real-time or near-real-time information delivery is nevertheless one of the defining characteristics of big data analytics, so it is important to find new methods that make big data processing more effective. With the advent of big data analysis and intelligent computing techniques, we face new challenges in making information transparent and understandable efficiently. DMBD 2016 provided an excellent opportunity and a forum for academics and practitioners to present and discuss the latest scientific results, methods, and innovative ideas and advances in theories, technologies, and applications in data mining, big data, and intelligent computing. The technical program covered all aspects of data mining, big data, and swarm intelligence as well as intelligent computing methods applied to all fields of computer science, signal/information processing, machine learning, data mining and knowledge discovery, robotics, big data, scheduling, game theory, parallel realization, etc.

DMBD 2016 took place at Padma Resort in Legian, Bali, Indonesia. Bali is a famous Indonesian island with the provincial capital at Denpasar. Lying between Java to the west and Lombok to the east, this island is renowned for its volcanic lakes, spectacular rice terraces, stunning tropical beaches, ancient temples, and palaces, as well as dance and elaborate religious festivals. Bali is also the largest tourist destination in the country and is renowned for its highly developed arts, including traditional and modern dance, sculpture, painting, leather, metalworking, and music. Since the late 20th century, the province has seen a sharp rise in tourism. Bali received the Best Island Award from Travel and Leisure in 2010. The island of Bali won because of its attractive surroundings (both mountain and coastal areas), diverse tourist attractions, excellent international and local restaurants, and the friendliness of the local people. According to BBC Travel in 2011, Bali is one of the world’s best islands!

DMBD 2016 received 115 submissions from about 278 authors in 36 countries and regions (Algeria, Australia, Bangladesh, Brazil, Chile, China, Colombia, Egypt, France, Germany, Greece, India, Indonesia, Iraq, Ireland, Japan, Kazakhstan, Republic of Korea, Luxembourg, Malaysia, Norway, Poland, Portugal, Romania, Russian Federation, Singapore, Slovakia, South Africa, Spain, Sweden, Chinese Taiwan, Tunisia, Turkey, UK, USA, Vietnam) across six continents (Asia, Europe, North America, South America, Africa, and Oceania). Each submission was reviewed by at least two reviewers, and by 2.8 reviewers on average. Based on rigorous reviews by the Program Committee members and reviewers, 57 high-quality papers were selected for publication in this proceedings volume, an acceptance rate of 49.57%. The papers are organized in 10 cohesive sections covering all major topics of the research and development of data mining and big data, plus one Workshop on Computational Aspects of Pattern Recognition and Computer Vision.

As organizers of DMBD 2016, we would like to express sincere thanks to Peking University and Xi’an Jiaotong-Liverpool University for their sponsorship, to Beijing Xinghui Hi-Tech Co. for its co-sponsorship, and to the IEEE Computational Intelligence Society, the World Federation on Soft Computing, the International Neural Network Society, and the IEEE Beijing Section for their technical co-sponsorship. We would also like to thank the members of the Advisory Committee for their guidance, the members of the international Program Committee and additional reviewers for reviewing the papers, and the members of the Publications Committee for checking the accepted papers in a short period of time. We are especially grateful to the proceedings publisher Springer for publishing the proceedings in the prestigious series of Lecture Notes in Computer Science. Moreover, we wish to express our heartfelt appreciation to the plenary speakers, session chairs, and student helpers. In addition, there are still many more colleagues, associates, friends, and supporters who helped us in immeasurable ways; we express our sincere gratitude to them all. Last but not least, we would like to thank all the speakers, authors, and participants for their great contributions that made DMBD 2016 successful and all the hard work worthwhile.

May 2016 Ying Tan

Yuhui Shi


Organization

General Chairs

Ying Tan Peking University, China

Russ Eberhart IUPUI, USA

General Program Committee Chair

Yuhui Shi Xi’an Jiaotong-Liverpool University, China

Technical Committee Co-chairs

Haibo He University of Rhode Island Kingston, USA

Martin Middendorf University of Leipzig, Germany

Xiaodong Li RMIT University, Australia

Hideyuki Takagi Kyushu University, Japan

Ponnuthurai Nagaratnam Suganthan Nanyang Technological University, Singapore

Kay Chen Tan National University of Singapore, Singapore

Special Sessions Co-chairs

Shi Cheng Nottingham University Ningbo, China

Yuan Yuan Chinese Academy of Sciences, China

Publications Co-chairs

Radu-Emil Precup Politehnica University of Timisoara, Romania

Swagatham Das Indian Statistical Institute, India

Plenary Session Co-chairs

Nikola Kasabov Auckland University of Technology, New Zealand

Rachid Chelouah EISTI, France

Tutorial Chair

Milan Tuba University of Belgrade, Serbia

Publicity Co-chairs

Yew-Soon Ong Nanyang Technological University, Singapore

Pramod Kumar Singh Indian Institute of Information Technology and Management, India

Eugene Semenkin Siberian Aerospace University, Russia

Somnuk Phon-Amnuaisuk Institut Teknologi Brunei, Brunei

Finance and Registration Co-chairs

Andreas Janecek University of Vienna, Austria

Chao Deng Peking University, China

Suicheng Gu Google Corporation, USA

DMBD 2016 Program Committee

Mohd Helmy Abd Wahab Universiti Tun Hussein Onn, Malaysia

Miltiadis Alamaniotis Purdue University, USA

Rafael Alcala University of Granada, Spain

Tomasz Andrysiak UTP Bydgoszcz, Poland

Duong Tuan Anh HoChiMinh City University of Technology, Vietnam

Carmelo J.A. Bastos Filho University of Pernambuco, Brazil

Vladimir Bukhtoyarov Siberian State Aerospace University, Russia

David Camacho Universidad Autonoma de Madrid, Spain

Jinde Cao Southeast University, China

Carlos Costa University of Minho, Portugal

Jose Alfredo Ferreira Costa Universidade Federal do Rio Grande do Norte, Brazil

Bogusław Cyganek AGH University of Science and Technology, Poland

Kusum Deep Indian Institute of Technology Roorkee, India

Mingcong Deng Tokyo University of Agriculture and Technology, Japan

Pragya Dwivedi JNU New Delhi, India

Jianwu Fang Xi’an Institute of Optics and Precision Mechanics of CAS, China

Fangyu Gai National University of Defense Technology, China

Teresa Guarda Isla - Superior Institute of Languages and Administration of Leiria, Portugal

Cem Iyigun Middle East Technical University, Turkey

Dariusz Jankowski Wrocław University of Technology, Poland

Mingyan Jiang Shandong University, China

Imed Kacem LCOMS - Université de Lorraine, France

Kalinka Kaloyanova University of Sofia - FMI, Bulgaria

Jong Myon Kim School of Electrical Engineering, South Korea

Pawel Ksieniewicz Wroclaw University of Technology, Poland

Germano Lambert-Torres PS Solutions, Brazil

Bin Li University of Science and Technology of China, China


Andrei Lihu Politehnica University of Timisoara, Romania

Shu-Chiang Lin National Taiwan University of Science and Technology, Taiwan

Bin Liu Nanjing University of Post and Telecommunications, China

Wenlian Lu Fudan University, China

Wenjian Luo University of Science and Technology of China, China

Wojciech Macyna Wroclaw University of Technology, Poland

Michalis Mavrovouniotis De Montfort University, UK

Mohamed Arezki Mellal M’Hamed Bougara University, Algeria

Sanaz Mostaghim Institute IWS, Germany

Maria Muntean 1 Decembrie 1918 University of Alba Iulia, Romania

Sheak Rashed Haider Noori Daffodil International University, Bangladesh

Benoit Otjacques Luxembourg Institute of Science and Technology, Luxembourg

Piotr Porwik University of Silesia, Poland

Wei Qin Shanghai Jiao Tong University, China

Vignesh Raja CDAC, India

Mohamed Salah Gouider Institut Supérieur de Gestion de Tunis, Tunisia

Volkmar Schau Friedrich Schiller University of Jena, Germany

Ivan Silva University of São Paulo, Brazil

Pramod Kumar Singh ABV-IIITM Gwalior, India

Hung-Min Sun National Tsing Hua University, Taiwan

Ying Tan Peking University, China

Christos Tjortjis International Hellenic University, Greece

Paulo Trigo ISEL, Portugal

Milan Tuba University of Belgrade, Serbia

Agnieszka Turek Warsaw University of Technology, Poland

Gai-Ge Wang Jiangsu Normal University, China

Guoyin Wang Chongqing University of Posts and Telecommunications, China

Lei Wang Tongji University, China

Qi Wang Northwestern Polytechnical University, China

Xiaoying Wang Changshu Institute of Technology, China

Yong Wang Zhongnan University, China

Ka-Chun Wong City University of Hong Kong, SAR China

Michal Wozniak Wroclaw University of Technology, Poland

Bo Xing University of Johannesburg, South Africa

Bing Xue Victoria University of Wellington, New Zealand

Yingjie Yang De Montfort University, UK

Kiwon Yeom NASA Ames Research Center, USA

Jie Zhang Newcastle University, UK

Qieshi Zhang Waseda University, Japan

Yujun Zheng Zhejiang University of Technology, China


Cui Zhihua Complex System and Computational Intelligence Laboratory, China

Huiyu Zhou Queen’s University Belfast, UK

Additional Reviewers

Andrysiak, Tomasz

Burduk, Robert

Hu, Jianqiang

Jackowski, Konrad

Jiang, Zhiyu

Koziarski, Michał

Li, Rui

Loruenser, Thomas

Shi, Xinli

Wan, Ying

Wang, Yi

Wozniak, Michal

Yakhchi, Shahpar

Yan, Shankai

Zawoad, Shams

Zhao, Yang

Zhong, Jie


Contents

Challenges in Data Mining and Big Data

Evolutionary Computation and Big Data: Key Challenges and Future Directions

Shi Cheng, Bin Liu, Yuhui Shi, Yaochu Jin, and Bin Li

Prospects and Challenges in Online Data Mining: Experiences of Three-Year Labour Market Monitoring Project

Maxim Bakaev and Tatiana Avdeenko

Data Mining Algorithms

Enhance AdaBoost Algorithm by Integrating LDA Topic Model

Fangyu Gai, Zhiqiang Li, Xinwen Jiang, and Hongchen Guo

An Improved Algorithm for MicroRNA Profiling from Next Generation Sequencing Data

Salim A., Amjesh R., and Vinod Chandra S.S.

Utilising the Cross Industry Standard Process for Data Mining to Reduce Uncertainty in the Measurement and Verification of Energy Savings

Colm V. Gallagher, Ken Bruton, and Dominic T.J. O’Sullivan

Implementing Majority Voting Rule to Classify Corporate Value Based on Environmental Efforts

Ratna Hidayati, Katsutoshi Kanamori, Ling Feng, and Hayato Ohwada

Model Proposal of Knowledge Management for Technology Based Companies

Jorge Leonardo Puentes Morantes, Nancy Yurani Ortiz Guevara, and José Ignacio Rodriguez Molano

Frequent Itemset Mining

Oracle and Vertica for Frequent Itemset Mining

Hristo Kyurkchiev and Kalinka Kaloyanova

Reconstructing Positive Surveys from Negative Surveys with Background Knowledge

Dongdong Zhao, Wenjian Luo, and Lihua Yue

Spatial Data Mining

Application of the Spatial Data Mining Methodology and Gamification for the Optimisation of Solving the Transport Issues of the “Varsovian Mordor”

Robert Olszewski and Agnieszka Turek

A Geo-Social Data Model for Moving Objects

Hengcai Zhang, Feng Lu, and Jie Chen

Optimization on Arrangement of Precaution Areas Serving for Ships’ Routeing in the Taiwan Strait Based on Massive AIS Data

Jinhai Chen, Feng Lu, Mingxiao Li, Pengfei Huang, Xiliang Liu, and Qiang Mei

Prediction

Bulk Price Forecasting Using Spark over NSE Data Set

Vijay Krishna Menon, Nithin Chekravarthi Vasireddy, Sai Aswin Jami, Viswa Teja Naveen Pedamallu, Varsha Sureshkumar, and K.P. Soman

Prediction and Survival Analysis of Patients After Liver Transplantation Using RBF Networks

C.G. Raji and S.S. Vinod Chandra

Link Prediction by Utilizing Correlations Between Link Types and Path Types in Heterogeneous Information Networks

Hyun Ji Jeong, Kim Taeyeon, and Myoung Ho Kim

Advanced Predictive Methods of Artificial Intelligence in Intelligent Transport Systems

Viliam Lendel, Lucia Pancikova, and Lukas Falat

Range Prediction Models for E-Vehicles in Urban Freight Logistics Based on Machine Learning

Johannes Kretzschmar, Kai Gebhardt, Christoph Theiß, and Volkmar Schau

Feature Selection

Partitioning Based N-Gram Feature Selection for Malware Classification

Weiwei Hu and Ying Tan

A Supervised Biclustering Optimization Model for Feature Selection in Biomedical Dataset Classification

Saziye Deniz Oguz Arikan and Cem Iyigun


Term Space Partition Based Ensemble Feature Construction for Spam Detection

Guyue Mi, Yang Gao, and Ying Tan

Information Extraction

Term Extraction from German Computer Science Textbooks

Kevin Möhlmann and Jörn Syrbe

An FW-DTSS Based Approach for News Page Information Extraction

Leiming Ma and Zhengyou Xia

A Linear Regression Approach to Multi-criteria Recommender System

Tanisha Jhalani, Vibhor Kant, and Pragya Dwivedi

Classification

Classification of Power Quality Disturbances Using Forest Algorithm

Fábbio Borges, Ivan Silva, Ricardo Fernandes, and Lucas Moraes

A Sequential k-Nearest Neighbor Classification Approach for Data-Driven Fault Diagnosis Using Distance- and Density-Based Affinity Measures

Myeongsu Kang, Gopala Krishnan Ramaswami, Melinda Hodkiewicz, Edward Cripps, Jong-Myon Kim, and Michael Pecht

A Hybrid Model Combining SOMs with SVRs for Patent Quality Analysis and Classification

Pei-Chann Chang, Jheng-Long Wu, Cheng-Chin Tsao, and Chin-Yuan Fan

Mining Best Strategy for Multi-view Classification

Jing Peng and Alex J. Aved

Anomaly Pattern and Diagnosis

Detecting Variable Length Anomaly Patterns in Time Series Data

Ngo Duy Khanh Vy and Duong Tuan Anh

Bigger Data Is Better for Molecular Diagnosis Tests Based on Decision Trees

Alexandru G. Floares, George A. Calin, and Florin B. Manolache

Waiting Time Screening in Diagnostic Medical Imaging – A Case-Based View

Marisa Esteves, Henrique Vicente, Sabino Gomes, António Abelha, M. Filipe Santos, José Machado, João Neves, and José Neves


Data Visualization Analysis

Real-Time Data Analytics: An Algorithmic Perspective

Sarwar Jahan Morshed, Juwel Rana, and Marcelo Milrad

High-Dimensional Data Visualization Based on User Knowledge

Qiaolian Liu, Jianfei Zhao, Naiwang Guo, Ding Xiao, and Chuan Shi

A Data Mining and Visual Analytics Perspective on Sustainability-Oriented Infrastructure Planning

Dimitri N. Mavris, Michael Balchanos, WoongJe Sung, and Olivia J. Pinon

Visual Interactive Approach for Mining Twitter’s Networks

Youcef Abdelsadek, Kamel Chelghoum, Francine Herrmann, Imed Kacem, and Benoît Otjacques

Privacy Policy

Key Indicators for Data Sharing - In Relation with Digital Services

Sheak Rashed Haider Noori, Md. Kamrul Hossain, and Juwel Rana

Efficient Probabilistic Methods for Proof of Possession in Clouds

Lukasz Krzywiecki, Krzysztof Majcher, and Wojciech Macyna

Cloud-Based Storage Model with Strong User Privacy Assurance

Amir Rezapour, Wei Wu, and Hung-Min Sun

Social Media

The Role of Social Media in Innovation and Creativity: The Case of Chinese Social Media

Jiwat Ram, Siqi Liu, and Andy Koronois

Malay Word Stemmer to Stem Standard and Slang Word Patterns on Social Media

Mohamad Nizam Kassim, Mohd Aizaini Maarof, Anazida Zainal, and Amirudin Abdul Wahab

Two-Phase Computing Model for Chinese Microblog Sentimental Analysis

Jianyong Duan, Chao Wang, Mei Zhang, and Hui Liu

Local Community Detection Based on Bridges Ideas

Xia Zhang, Zhengyou Xia, and Jiandong Wang

Environment for Data Transfer Measurement

Sergey Khoruzhnikov, Vladimir Grudinin, Oleg Sadov, Andrey Shevel, Stefanos Georgiou, and Arsen Kairkanov
