Analyzing Time Interval Data
Philipp Meisen
Introducing an Information System
for Time Interval Data Analysis
Philipp Meisen
Aachen, Germany
ISBN 978-3-658-15727-2 ISBN 978-3-658-15728-9 (eBook)
DOI 10.1007/978-3-658-15728-9
Library of Congress Control Number: 2016952631
Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer Vieweg imprint is published by Springer Nature
The registered company is Springer Fachmedien Wiesbaden GmbH
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany
D82 (Diss. RWTH Aachen University, 2015)
Acknowledgments
For Edison and Isaac
First of all, I want to thank all the people who helped me make this
work possible. In particular, I want to mention Sabina Jeschke for her
supervision and advice; my managing director, friend, and brother Tobias
Meisen for sharing his knowledge and experience and pushing me whenever
needed; my co-worker and friend Christian Kohlschein for listening, having
endless discussions, and reviewing my work; Angelika Reimer for creating
the illustrations; and Diane Wittman for helping me format the book.
I also want to give special thanks and dedications to the people who have
followed me my whole life like my own shadow: my elder brother Holger,
who helped me whenever I was in doubt; my already mentioned twin brother
Tobias, for all the “Schokostreuselbrötchen” and discussions; my parents,
for making all this possible by having, loving, and supporting me; and my
dearest friends Tummel, Hoomer, Christian, Diane, and Marco, for every
talk, time-out, and drink we had. Thank you all for being there for me
whenever needed.
Last but not least, I want to express my deepest gratitude to my wife
Deborah for her support whenever it was needed. Without her this work
would never have been possible.
Philipp
Abstract
Time interval data is data that associates information with a specific time
range (i.e., a time window) defined by a start and an end time point. Thus,
time intervals are a generalization of time points, i.e., each time point is a
time interval whose start and end time points are the same. Nowadays, huge
sets of time interval data are collected in various situations, e.g., personnel
deployment, equipment usage, process control, or process management.
Common systems are not capable of analyzing these amounts of time
interval data. Questions like “How many resources were utilized on Mondays
in an annual average?” or “Which days overlap with the planning and which
run diametrically opposed to it?” cannot be answered with modern systems,
or require extensive data integration processes.
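As a minimal sketch of the definition above (not the TIDAMODEL itself), a time interval can be represented as a closed range in which a time point is the special case of equal start and end. The class name `TimeInterval`, the `resource` field, and the helper `mean_resources_on_weekday` are assumptions made for illustration. The Monday question is approximated here by counting the distinct resources whose intervals touch each Monday and averaging over all Mondays in a date range.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass(frozen=True)
class TimeInterval:
    """Closed interval [start, end]; a time point has start == end."""
    start: datetime
    end: datetime
    resource: str  # hypothetical descriptive value attached to the interval

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    def is_time_point(self) -> bool:
        return self.start == self.end

    def overlaps_day(self, day: date) -> bool:
        """True if the interval shares at least one instant with the given day."""
        day_start = datetime.combine(day, time.min)
        day_end = datetime.combine(day, time.max)
        return self.start <= day_end and self.end >= day_start


def mean_resources_on_weekday(intervals, first_day, last_day, weekday=0):
    """Average number of distinct resources per matching weekday (0 = Monday)."""
    counts = []
    day = first_day
    while day <= last_day:
        if day.weekday() == weekday:
            used = {iv.resource for iv in intervals if iv.overlaps_day(day)}
            counts.append(len(used))
        day += timedelta(days=1)
    return sum(counts) / len(counts) if counts else 0.0


# Illustrative sample data (invented for this sketch).
sample = [
    TimeInterval(datetime(2015, 1, 5, 8), datetime(2015, 1, 5, 16), "A"),
    TimeInterval(datetime(2015, 1, 5, 10), datetime(2015, 1, 6, 2), "B"),
    TimeInterval(datetime(2015, 1, 12, 9), datetime(2015, 1, 12, 9), "A"),
]
mean_mondays = mean_resources_on_weekday(sample, date(2015, 1, 1), date(2015, 1, 14))
```

This scan touches every interval for every day, which is exactly the kind of cost that becomes prohibitive on large data sets and motivates a dedicated model and index structure.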
In this thesis, a model to analyze time interval data (TIDAMODEL) is
introduced. Based on this model, a query language (TIDAQL) is defined,
which can be utilized to answer complex questions such as those presented
in the previous paragraph. Furthermore, a similarity measure based on
different types of distance measures (TIDADISTANCE) is presented. This
similarity measure enables users to search for similar situations within a
time interval database. The different solutions are combined to design and
realize the central result of the thesis, i.e., an information system to
analyze time interval data (TIDAIS). The introduced system utilizes different
bitmap-based indexes, which enable the system to handle huge amounts of data.
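To illustrate the general idea of a bitmap-based index, the following sketch keeps one bitmap per distinct value, with bit i standing for record i. It is an assumption-laden toy and not the index layout of TIDAIS: the class `BitmapIndex`, the helper `record_ids`, the integer time axis, and the sample records are all invented for illustration, and Python integers stand in for the compressed bitmaps a real system would use.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap (a Python int) per distinct value; bit i refers to record i."""

    def __init__(self):
        self._bitmaps = defaultdict(int)

    def add(self, record_id: int, value) -> None:
        # Set the bit of this record in the bitmap of the given value.
        self._bitmaps[value] |= 1 << record_id

    def get(self, value) -> int:
        # Unknown values yield the empty bitmap.
        return self._bitmaps.get(value, 0)


def record_ids(bitmap: int):
    """Decode a bitmap into the ascending list of record ids whose bit is set."""
    ids, pos = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(pos)
        bitmap >>= 1
        pos += 1
    return ids


# Hypothetical records: (start, end, color) on a discrete integer time axis.
records = [(0, 3, "red"), (2, 5, "green"), (4, 4, "red")]

color_index = BitmapIndex()  # one bitmap per color value
time_index = BitmapIndex()   # one bitmap per time point of the axis

for rid, (start, end, color) in enumerate(records):
    color_index.add(rid, color)
    for t in range(start, end + 1):  # the interval is closed on both ends
        time_index.add(rid, t)

# "Which red records are valid at time point 4?" becomes a bitwise AND.
red_at_4 = record_ids(color_index.get("red") & time_index.get(4))

# "How many records are valid at time point 2?" becomes a bit count.
count_at_2 = bin(time_index.get(2)).count("1")
```

The design point is that filtering and counting reduce to bitwise operations on whole bitmaps instead of per-record comparisons, which is why such indexes scale to large record counts.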
The results of the evaluation show that the presented implementation
fulfills the requirements formulated by different stakeholders. In addition, it
outperforms state-of-the-art solutions (e.g., solutions based on the Oracle
database management system, icCube, or TimeDB).
Zusammenfassung
Time interval data is data that is recorded within a time window, i.e.,
between a start and an end time point, and represents a generalization of
time point data. Nowadays, large amounts of time interval data are collected
ever more frequently in areas such as personnel scheduling, equipment
usage, process control, or planning.
Evaluating this data poses great challenges for common analysis systems.
Questions like “How many resources were needed in production on Mondays,
distributed over the day, in an annual average?” or “Which days are most
accurate with regard to the planning and which run diametrically opposed
to it?” can mostly not be modeled at all with modern systems, or can only
be answered by means of lengthy integration processes.
In this thesis, a model based on discrete time axes (TIDAMODEL) is
presented first. Based on this model, a query language (TIDAQL) is then
defined, which makes it possible to answer complex questions such as those
indicated above. Besides answering questions, the search for similar
situations is an important property of information systems. To enable this
similarity search, a similarity measure (TIDADISTANCE) is presented in the
thesis. These individual partial results are used to design and realize the
central result of the thesis, an information system for the analysis of time
interval data (TIDAIS). The presented system is based on bitmaps, which
enable the evaluation of large amounts of time interval data. The evaluation
results show that the presented system outperforms other solutions (e.g.,
solutions based on icCube, TimeDB, or modern database management systems
such as Oracle) with regard to analysis performance.
Table of Contents
Acknowledgments
Abstract
Zusammenfassung
Table of Contents
List of Abbreviations
List of Figures
List of Tables
List of Listings
List of Definitions
1 Introduction and Motivation
2 Time Interval Data Analysis
2.1 Time
2.1.1 Time Intervals
2.1.2 Time Interval Data Aggregation
2.1.3 Temporal Models
2.1.4 Temporal Operators
2.1.5 Temporal Concepts
2.1.6 Special Characteristics of Time
2.2 Features of Time Interval Data Analysis Information System
2.2.1 Analytical Capabilities
2.2.2 Time Interval Data Analysis Process
2.2.3 User Interface, Visualization, and User Interactions
2.3 Summary
3 State of the Art
3.1 Analytical Information Systems
3.2 Analyzing Time Interval Data: Different Approaches
3.2.1 On-Line Analytical Processing
3.2.2 Temporal Pattern Mining & Association Rule Mining
3.2.3 Visual Analytics
3.3 Performance Improvements
3.3.1 Indexing Time Interval Data
3.3.2 Aggregating Time Interval Data
3.3.3 Caching Time Interval Data
3.4 Analytical Query Languages for Temporal Data
3.5 Similarity of Time Interval Data
3.6 Summary
4 TIDAMODEL: Modeling Time Interval Data
4.1 Time Axis
4.2 Descriptors
4.3 Time Interval Database
4.4 Dimensional Modeling
4.5 Summary
5 TIDAQL: Querying for Time Interval Data
5.1 Data Control Language
5.2 Data Definition Language
5.3 Data Manipulation Language
5.3.1 Insert, Delete, & Update Statements
5.3.2 Get & Alive Statements
5.3.3 Select Statements
5.4 Summary
6 TIDADISTANCE: Similarity of Time Interval Data
6.1 Temporal Order Distance
6.2 Temporal Relational Distance
6.3 Temporal Measure Distance
6.4 Temporal Similarity Measure
7 TIDAIS: An Information System for Time Interval Data
7.1 System’s Architecture, Components, and Implementation
7.1.1 Data Repository
7.1.2 Cache & Storage
7.2 Configuration
7.2.1 Model Configuration
7.2.2 System Configuration
7.3 Data Structures & Algorithms
7.3.1 Model Handling
7.3.2 Indexes
7.3.3 Caching & Storage
7.3.4 Aggregation Techniques
7.3.5 Distance Calculation
7.4 User Interfaces
7.5 Summary
8 Results & Evaluation
8.1 Requirements & Features
8.2 Performance
8.2.1 High Performance Collections
8.2.2 Load Performance
8.2.3 Selection Performance
8.2.4 Distance Performance
8.2.5 Proprietary Solutions vs. TIDAIS
8.3 Summary
9 Summary and Outlook
Appendix
Pipelined Table Functions (PL/SQL Oracle)
A Complete Sample Model-Configuration-File
A Complete Sample Configuration-File
Detailed Overview of the Runtime Performance
3-NN of the Temporal Relational Similarity
Bibliography
List of Abbreviations
AD Active Directory
AIS Analytical Information System
AJAX Asynchronous JavaScript and XML
ANSI American National Standards Institute
ANTLR Another Tool for Language Recognition
API Application Programming Interface
ARTEMIS Assessing coRrespondence of Temporal Events Measure for Interval Sequences
BI Business Intelligence
CET Central European Time (time zone)
CPU Central Processing Unit
CSS Cascading Style Sheets
CSV Comma-Separated Values
DBMS Database Management System
DCL Data Control Language
DDL Data Definition Language
DML Data Manipulation Language
DSS Decision Support System
DST Daylight Saving Time
DTW Dynamic Time Warping
DW Data Warehouse
JDBC Java Database Connectivity
JMS Java Message Service
JSON JavaScript Object Notation
GB Gigabyte
GIS Geographic Information System
GPU Graphics Processing Unit
GTA General Temporal Aggregation
GUI Graphical User Interface
HCC Hybrid Columnar Compression
HOLAP Hybrid OLAP
HTML HyperText Markup Language
HTTP Hypertext Transfer Protocol
IBSM Interval Based Sequence Matching
ISO International Organization for Standardization
ITA Instant Temporal Aggregation
k-NN k-nearest neighbors
LDAP Lightweight Directory Access Protocol
LRU Least Recently Used (cache algorithms)
MB Megabyte
MDX Multidimensional Expressions
MOLAP Multidimensional OLAP
MRU Most Recently Used (cache algorithms)
MWTA Moving-Window Temporal Aggregation
NoSQL Not Only SQL
OLAM On-Line Analytical Mining
OLAP On-Line Analytical Processing
PDT Pacific Daylight Time (time zone)
PL/SQL Procedural Language/Structured Query Language
POJO Plain Old Java Object
ROLAP Relational OLAP
RR Random Replacement (cache algorithms)
RQ Research Question
SQL Structured Query Language
STA Span Temporal Aggregation
SVG Scalable Vector Graphics
TAT Two-step Aggregation Technique
TIDA Time Interval Data Analysis
UI User Interface
UTC Coordinated Universal Time (time zone)
XML Extensible Markup Language
XSD XML Schema Definition
XSLT Extensible Stylesheet Language Transformations
List of Figures
Figure 2.1 Apple falling from tree, example of a time interval and associated information observed, measured, or calculated during the process of an apple falling from a tree.
Figure 2.2 Machine performance, example of a time interval and associated information observed, measured, or calculated during the execution of a task by a machine.
Figure 2.3 Example of ITA and MWTA (temporal aggregation forms creating constant intervals).
Figure 2.4 Example of STA and TAT (temporal aggregation forms creating constant intervals).
Figure 2.5 Overview of the different aspects of a temporal model.
Figure 2.6 The fall property using a discrete (left) and continuous (right) temporal model. Within the discrete chart, the diamonds mark the value of the property and the triangles illustrate the indivisible delta between the previous and the current time point.
Figure 2.7 The item property using a discrete (left) and continuous (right) temporal model. Within the discrete chart, the diamonds mark the value of the item property and the triangles illustrate the indivisible delta between the previous and the current time point.
Figure 2.8 Example of a mapping between data of a circular temporal model to a linear temporal model.
Figure 2.9 Selection of a time window from an unbounded temporal model to be presented and analyzable in a bounded temporal model.
Figure 2.10 Overview of Allen’s (1983) temporal operators.
Figure 2.11 Illustration of the ambiguousness of Allen’s (1983) temporal operators.
Figure 2.12 Examples of commonly used temporal concepts.
Figure 2.13 Example of the impact of different time zones within the scope of temporal analytics.
Figure 2.14 Illustration exemplifying the error of calculating statistical values, e.g., the amount of intervals per hour.
Figure 2.15 Overview of selected features defined in the category descriptive analytics in the context of time interval data analysis (cf. Table 2.1).
Figure 2.16 The data science process following Schutt, O'Neil (2014).
Figure 2.17 The result of the workshops regarding the time interval data analysis process.
Figure 3.1 Examples of the different types of hierarchies (non-strict, non-covering, and non-onto).
Figure 3.2 Two examples of the summarizability problem.
Figure 3.3 Illustration of a scenario covered by I-OLAP as presented by Koncilia et al. (2014).
Figure 3.4 Examples of the visualization techniques Cluster Viewer (van Wijk, van Selow 1999) and GROOVE (Lammarsch et al. 2009).
Figure 3.5 Example of a bitmap-index containing three bitmaps, one for each possible value (i.e., red, green, and yellow) of the color-property.