
Advanced analytics and learning on temporal data
Vincent Lemaire · Simon Malinowski ·
Anthony Bagnall · Alexis Bondu ·
Thomas Guyet · Romain Tavenard (Eds.)
LNAI 11986
4th ECML PKDD Workshop, AALTD 2019
Würzburg, Germany, September 20, 2019
Revised Selected Papers
Advanced Analytics
and Learning
on Temporal Data
Lecture Notes in Artificial Intelligence 11986
Subseries of Lecture Notes in Computer Science
Series Editors
Randy Goebel
University of Alberta, Edmonton, Canada
Yuzuru Tanaka
Hokkaido University, Sapporo, Japan
Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany
Founding Editor
Jörg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Editors
Vincent Lemaire
Orange Labs
Lannion, France
Simon Malinowski
Inria
University of Rennes
Rennes, France
Anthony Bagnall
University of East Anglia
Norwich, UK
Alexis Bondu
Orange Labs
Châtillon, France
Thomas Guyet
Irisa
Agrocampus Ouest
Rennes, France
Romain Tavenard
University of Rennes 2
Rennes, France
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-030-39097-6 ISBN 978-3-030-39098-3 (eBook)
https://doi.org/10.1007/978-3-030-39098-3
LNCS Sublibrary: SL7 – Artificial Intelligence
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Workshop Description
The European Conference on Machine Learning and Principles and Practice of
Knowledge Discovery in Databases (ECML PKDD) is the premier European machine
learning and data mining conference and builds upon over 17 years of successful events
and conferences held across Europe. This year, ECML PKDD 2019 was held in
Würzburg, Germany, during September 16–20. It was complemented by a workshop
program, where each workshop was dedicated to specialized topics, to cross-cutting
issues, and to upcoming trends. This stand-alone LNAI volume includes the selected
papers of the Advanced Analytics and Learning on Temporal Data (AALTD 2019)
Workshop.
Motivation - Temporal data are frequently encountered in a wide range of domains
such as bio-informatics, medicine, finance, and engineering, among many others. They
are naturally present in applications such as motion and vision analysis, or in more
emerging ones such as energy-efficient buildings, smart cities, dynamic social media,
or sensor networks. Contrary to static data, temporal data are complex in nature: they
are generally noisy and of high dimensionality, they may be non-stationary (i.e.,
first-order statistics vary with time) and irregular (involving several time
granularities), and they may have several invariant domain-dependent factors such as
time delay, translation, scale, or tendency effects. These temporal peculiarities limit
the majority of standard statistical models and machine learning approaches, which
mainly assume i.i.d. data, homoscedasticity, normality of residuals, etc. Tackling such
challenging temporal data calls for new advanced approaches at the intersection of
statistics, time series analysis, signal processing, and machine learning. Defining new
approaches that transcend boundaries between several domains to extract valuable
information from temporal data is undeniably a hot topic for the near future, and has
been the subject of active research over the last decade.
Workshop Topics - The aim of this fourth edition of the workshop, AALTD 2019¹,
held in conjunction with ECML PKDD 2019, was to bring together researchers and
experts in machine learning, data mining, pattern analysis, and statistics to share their
challenging issues and advances in temporal data analysis. Analysis and learning from
temporal data covers a wide scope of tasks including learning metrics, learning
representations, unsupervised feature extraction, clustering, and classification.
The proposed workshop received papers that cover one or several of the following
topics:
– Temporal Data Clustering
– Classification of Univariate and Multivariate Time Series
– Early Classification of Temporal Data
– Deep Learning and Learning Representations for Temporal Data
– Modeling Temporal Dependencies
– Advanced Forecasting and Prediction Models
– Space-Temporal Statistical Analysis
– Functional Data Analysis Methods
– Temporal Data Streams
– Interpretable Time-Series Analysis Methods
– Dimensionality Reduction, Sparsity, Algorithmic Complexity, and Big Data Challenges
– Bio-Informatics, Medical, Energy Consumption, and Temporal Data
¹ https://project.inria.fr/aaltd19/.
Outcomes - AALTD 2019 was structured as a full-day workshop. We encouraged
submissions of regular papers of up to 16 pages describing unpublished work. All
submitted papers were peer reviewed (double-blind) by two or three reviewers from the
Program Committee, and selected on the basis of these reviews. AALTD 2019 received
31 submissions, among which 16 papers were accepted for inclusion in the proceedings. The papers with higher review ratings were selected for an oral presentation, and
the others were given the opportunity to present a poster through a spotlight session and
a discussion session. The workshop started with an invited talk, “Time Series Classification at Scale”²,
given by Francois Petitjean from Monash University, Australia.
We thank all organizers and reviewers for the time and effort invested. We would
also like to express our gratitude to the members of the Program Committee. We also
thank the ECML, the Organizing Committee (particularly Peggy and Kurt, the workshop and tutorial chairs), and the local staff who helped us. Sincere thanks are due to
Springer for their help in publishing the proceedings. Lastly, we thank all participants
and the invited speaker of the ECML PKDD 2019 workshops for their contributions that
made the workshop really interesting.
November 2019 Vincent Lemaire
Simon Malinowski
Anthony Bagnall
Alexis Bondu
Thomas Guyet
Romain Tavenard
2 https://www.francois-petitjean.com/Research/Petitjean-AALTD2019.pdf.
Organization
Program Committee Chairs
Anthony Bagnall University of East Anglia, UK
Alexis Bondu Orange Labs, France
Thomas Guyet IRISA, France
Vincent Lemaire Orange Labs, France
Simon Malinowski Université de Rennes, Inria, CNRS, IRISA, France
Romain Tavenard Université de Rennes 2, COSTEL, France
Program Committee
Amaia Abanda Basque Center for Applied Mathematics (BCAM), Spain
Mustafa Baydoğan Boğaziçi University, Turkey
Albert Bifet LTCI, Télécom ParisTech, France
Andreas Brandmaier Max Planck Institute for Human Development, Germany
Clément Christophe ERIC, Université Lyon 2, France
Hoang Anh Dau University of California, Riverside, USA
Germain Forestier Université de Haute-Alsace, France
Dominique Gay Université de La Réunion, France
David Guijo-Rubio Universidad de Córdoba, Spain
Paul Honeine Université de Rouen, France
Hassan Ismail Fawaz Université de Haute-Alsace, France
Isak Karlsson Stockholm University, Sweden
Nikos Katzouris NCSR Demokritos, Greece
James Large University of East Anglia, UK
Jason Lines University of East Anglia, UK
Usue Mori University of the Basque Country, Spain
Pierre Nodet Orange Labs, France
Charlotte Pelletier Université de Bretagne-Sud, IRISA, France
Francois Petitjean Monash University, Australia
Patrick Schäfer Humboldt Universität zu Berlin, Germany
Pavel Senin Los Alamos National Laboratory, USA
Chang Wei Monash University, Australia
Julien Velcin ERIC, Université Lyon 2, France
Contents
Oral Presentation
Robust Functional Regression for Outlier Detection
  Harjit Hullait, David S. Leslie, Nicos G. Pavlidis, and Steve King
Transform Learning Based Function Approximation for Regression and Forecasting
  Kriti Kumar, Angshul Majumdar, M. Girish Chandra, and A. Anil Kumar
Proactive Fiber Break Detection Based on Quaternion Time Series and Automatic Variable Selection from Relational Data
  Vincent Lemaire, Fabien Boitier, Jelena Pesic, Alexis Bondu, Stéphane Ragot, and Fabrice Clérot
A Fully Automated Periodicity Detection in Time Series
  Tom Puech, Matthieu Boussard, Anthony D’Amato, and Gaëtan Millerand
Conditional Forecasting of Water Level Time Series with RNNs
  Bart J. van der Lugt and Ad J. Feelders
Challenges and Limitations in Clustering Blood Donor Hemoglobin Trajectories
  Marieke Vinkenoog, Mart Janssen, and Matthijs van Leeuwen
Localized Random Shapelets
  Mael Guillemé, Simon Malinowski, Romain Tavenard, and Xavier Renard
Poster Presentation
Feature-Based Gait Pattern Classification for a Robotic Walking Frame
  Christopher M. A. Bonenberger, Benjamin Kathan, and Wolfgang Ertel
How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods
  Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, and Manel Boumghar
Seq2VAR: Multivariate Time Series Representation with Relational Neural Networks and Linear Autoregressive Model
  Edouard Pineau, Sébastien Razakarivony, and Thomas Bonald
Modelling Patient Sequences for Rare Disease Detection with Semi-supervised Generative Adversarial Nets
  Kezi Yu, Yunlong Wang, and Yong Cai
Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems
  Katarzyna Juraszek, Nidhi Saini, Marcela Charfuelan, Holmer Hemsen, and Volker Markl
Unsupervised Anomaly Detection in Multivariate Spatio-Temporal Datasets Using Deep Learning
  Yildiz Karadayi
Learning Stochastic Dynamical Systems via Bridge Sampling
  Harish S. Bhat and Shagun Rawat
Quantifying Quality of Actions Using Wearable Sensor
  Mohammad Al-Naser, Takehiro Niikura, Sheraz Ahmed, Hiroki Ohashi, Takuto Sato, Mitsuhiro Okada, Katsuyuki Nakamura, and Andreas Dengel
An Initial Study on Adapting DTW at Individual Query for Electrocardiogram Analysis
  Daniel Shen and Min Chi
Author Index
Oral Presentation
Robust Functional Regression for Outlier Detection
Harjit Hullait¹, David S. Leslie¹, Nicos G. Pavlidis¹, and Steve King²
¹ Lancaster University, Lancaster, UK
{h.hullait,d.leslie,n.pavlidis}@lancaster.ac.uk
² Rolls Royce PLC, Derby, UK
Abstract. In this paper we propose an outlier detection algorithm for
temperature sensor data from jet engine tests. Effective identification
of outliers would enable engine problems to be examined and resolved
efficiently. Outlier detection in this data is challenging because a human
controller determines the speed of the engine during each manoeuvre.
This introduces variability which can mask abnormal behaviour in the
engine response. We therefore suggest modelling the dependency between
speed and temperature in the process of identifying abnormalities. The
engine temperature has a delayed response with respect to the engine
speed, which we will model using robust functional regression. We then
apply functional depth with respect to the residuals to rank the samples
and identify the outliers. The effectiveness of the outlier detection algorithm is shown in a simulation study. The algorithm is also applied to real
engine data, and identifies samples that warrant further investigation.
Keywords: Robust functional data analysis · Robust model selection · Outlier detection
Before a jet engine is delivered it must complete a Pass-Off test. In a Pass-Off
test a controller performs manoeuvres, which can be defined as various engine
accelerations and decelerations starting and ending at a set idle speed. The purpose of this test is to ensure the engine complies with set standards. During the
test, data is captured by sensors measuring engine speed, pressure, temperature
and vibration in different parts of the engine. This high-frequency measurement
data offers the ability to automate the detection of engine problems. By building statistical models for the Pass-Off test data we can aid the engineers in
identifying engine issues efficiently.
One of the key manoeuvres in a Pass-Off test is the Vibration Survey (VS).
In this manoeuvre the engine is accelerated slowly to a certain speed then slowly
decelerated. We have 199 VS datasets, which include the turbine pressure ratio
(TPR) that measures the engine speed, and the turbine gas temperature (TGT)
which is a key temperature feature. In Fig. 1 we have plots of the TPR and TGT
for 30 VS manoeuvres. We have transformed the time index to the interval
[0, 1] and the range of sensor measurements to [0, 100].
© Springer Nature Switzerland AG 2020
V. Lemaire et al. (Eds.): AALTD 2019, LNAI 11986, pp. 3–13, 2020.
https://doi.org/10.1007/978-3-030-39098-3_1
Automated detection of abnormal engine behaviour has been studied before
[9,14]. Both approaches require a training set of “normal” samples to build a
normality model. They then apply novelty detection using an appropriate distance measure and threshold. We will instead use Functional Data Analysis
(FDA) methods to identify VS manoeuvres that display unusual temperature
behaviour in response to the variable (human-controlled) TPR time series. We
will robustly build a normality model without requiring a set of “normal” samples. FDA techniques have been used effectively to model sensor data [13], as
they combine information across samples and exploit the underlying behavioural
structure. However, to the best of our knowledge, this is the first time these techniques have been used for modelling jet engine data.
We will use robust Functional Linear Regression (FLR) to build a model of
“normal” behaviour. We shall then use the residuals from this model to identify
outlying behaviour. The residuals are time series, so summarising them with a scalar
metric such as the mean-square error would lose a lot of information. Instead we will
apply functional depth [6], which is capable of identifying various types of outlier
behaviour.
There are a number of functional outlier detection methods, including the
threshold approach [8], the Functional Boxplot [22] and the Outliergram [2],
which use functional depth [15] to rank the curves. Alternative approaches use
Directional Outlyingness measures, such as MS-plots [7] and Functional Outlier
Maps [19]. There are also approaches for multivariate functional data [10]. These
methods do not model the dependency between the functional response and
functional input, and may therefore miss important outliers. Robust FLR can
model this dependency structure, which can improve the detection of outliers.
This paper is organised as follows. In Sect. 1 we summarise the FDA methods,
which will be used in the outlier detection algorithm. In Sect. 2, we will develop
robust FDA techniques to obtain a robust regression model. In Sect. 3, we show
how the robust regression model can be used to identify outliers. In Sect. 4
we give simulation results comparing the robust model with a classical model.
Finally in Sect. 5 we apply the robust model on the engine data and highlight
the outliers identified.
Fig. 1. Plots of 30 TPR and TGT time series.
1 Classical Functional Data Analysis
In this section we give a brief summary of the FDA tools that we will later apply
in our model. In the following sections we will use the vector space $L^2(I)$, which
is the Hilbert space of square integrable functions on the compact interval $I$ with
the inner product $\langle f, g \rangle = \int_I f(t) g(t)\,dt$ for functions $f, g \in L^2(I)$.
We will define $X(t)$, $Y(t)$ to be univariate stochastic processes defined on
$I$, with mean functions $\mu_X(t)$ and $\mu_Y(t)$, and covariance functions
$C_X(s, t) = \mathrm{cov}\{X(s), X(t)\}$ and $C_Y(s, t) = \mathrm{cov}\{Y(s), Y(t)\}$ for all $s, t \in I$.
We shall define $x(t) = [x_1(t), \ldots, x_n(t)]$ and $y(t) = [y_1(t), \ldots, y_n(t)]$ to be $n$ samples
from $X(t)$ and $Y(t)$ respectively.
In practice we observe $x_i(t)$ and $y_i(t)$ at discrete time points. We shall
assume for simplicity of exposition that observations are made at equally spaced
time points $t_1, \ldots, t_N$. We will outline Functional Linear Regression and Functional Principal Component Analysis with respect to the underlying functions.
In Sect. 1.3 we need to use the discretely observed data to define a suitable model
selection criterion.
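As a rough illustration of working with discretely observed curves, the inner product on L2(I) can be approximated on an equally spaced grid by a quadrature rule. The sketch below uses the trapezoidal rule; the function name `l2_inner_product` and the test curves are illustrative assumptions, not part of the paper.

```python
import numpy as np

def l2_inner_product(f_vals, g_vals, t):
    """Approximate <f, g> = integral over I of f(t) g(t) dt by the trapezoidal rule.

    f_vals, g_vals: function values on the grid t (hypothetical inputs).
    """
    f_vals = np.asarray(f_vals, dtype=float)
    g_vals = np.asarray(g_vals, dtype=float)
    t = np.asarray(t, dtype=float)
    prod = f_vals * g_vals
    # Trapezoidal rule written out explicitly: average adjacent values times spacing.
    return float(np.sum(0.5 * (prod[1:] + prod[:-1]) * np.diff(t)))

# Toy curves on I = [0, 1]; sine and cosine are orthogonal over a full period.
t = np.linspace(0.0, 1.0, 1001)
f = np.sin(2 * np.pi * t)
g = np.cos(2 * np.pi * t)
print(l2_inner_product(f, f, t))  # close to 0.5
print(l2_inner_product(f, g, t))  # close to 0
```

A finer grid or a higher-order rule would reduce the quadrature error for rougher curves.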
1.1 Functional Linear Regression
In this section we will introduce the FLR model [16], which we will use to model
the relationship between TGT and TPR for the VS manoeuvre. In FLR we
model the relationship between predictor $x_i(t)$ and response $y_i(t)$ as:
$$y_i(t) = \alpha(t) + \int_I x_i(s)\,\beta(s, t)\,ds + \epsilon_i(t), \tag{1}$$
where $\alpha(t)$ is the intercept function, $\beta(s, t)$ is the regression function and $\epsilon_i(t)$
is the error process. For a fixed $t$, we can think of $\beta(s, t)$ as the relative weight
placed on $x_i(s)$ to predict $y_i(t)$. For simplicity we will assume the mean functions
$\mu_X(t) = 0$ and $\mu_Y(t) = 0$, which means $\alpha(t) = 0$. This is a reasonable
assumption as in practice we can calculate the mean functions $\mu_X(t)$ and $\mu_Y(t)$
efficiently for dense data and then pre-process the data by subtracting $\mu_X(t)$
and $\mu_Y(t)$ from the observed curves.
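The centring step described above can be sketched as follows, assuming all curves are observed on a common dense grid. The plain pointwise sample mean is used here purely for illustration; it is not a robust estimator, and the function name `centre_curves` and the simulated data are assumptions.

```python
import numpy as np

def centre_curves(curves):
    """Subtract the pointwise sample mean from each curve.

    curves: array of shape (n, N), one row per sample on a common grid.
    Returns the centred curves and the estimated mean function.
    """
    curves = np.asarray(curves, dtype=float)
    mean_fn = curves.mean(axis=0)      # estimate of the mean function on the grid
    return curves - mean_fn, mean_fn

# Hypothetical data: 20 noisy curves around a common shape with offset 10.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
x = 10 + np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=(20, 50))
x_c, mu_x = centre_curves(x)
print(np.abs(x_c.mean(axis=0)).max())  # numerically zero after centring
```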
FLR in the function-on-function case is a well studied model. There are
typically two approaches taken: basis methods [5,23] and grid based methods
[11,20]. The basis approach will be used as it is computationally efficient.
We will represent $x_i(t)$ and $y_i(t)$ in terms of $M$ pre-chosen basis functions
$\phi^X_j(t)$, $\phi^Y_j(t)$ respectively:
$$x_i(t) = \sum_{j=1}^{M} z_{ij}\,\phi^X_j(t) \quad \text{and} \quad y_i(t) = \sum_{j=1}^{M} w_{ij}\,\phi^Y_j(t).$$
For notational simplicity we have assumed that $x_i(t)$ and $y_i(t)$ can be represented
by the same number of functions $M$; however, this assumption can be easily
relaxed.
We define $\phi^X(t) = [\phi^X_1(t), \ldots, \phi^X_M(t)]$, $\phi^Y(s) = [\phi^Y_1(s), \ldots, \phi^Y_M(s)]$,
$z_i = [z_{i1}, \ldots, z_{iM}]$ and $w_i = [w_{i1}, \ldots, w_{iM}]$. We will then model the regression surface
using a double basis expansion [17]:
$$\beta(s, t) = \sum_{l=1}^{M} \sum_{m=1}^{M} b_{ml}\,\phi^X_m(s)\,\phi^Y_l(t) = \phi^X(s)^T B\,\phi^Y(t), \tag{2}$$
for an $M \times M$ regression matrix $B$. We can then write:
$$y_i(t) = z_i B\,\phi^Y(t) + \epsilon_i(t). \tag{3}$$
Letting $\epsilon_i(t) = q_i\,\phi^Y(t)$ [5] we can reduce Eq. (3) to:
$$w_i = z_i B + q_i. \tag{4}$$
This simplification enables us to estimate B using standard multivariate regression methods.
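To make the reduced model concrete, the sketch below stacks the coefficient vectors row-wise and estimates B by ordinary least squares with `numpy.linalg.lstsq`. This is a plain, non-robust estimator used only to illustrate Eq. (4); the simulated scores, the noise level and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 200, 3

# Hypothetical ground truth and simulated basis coefficients.
B_true = rng.normal(size=(M, M))
Z = rng.normal(size=(n, M))             # row i holds z_i
Q = 0.01 * rng.normal(size=(n, M))      # row i holds the error coefficients q_i
W = Z @ B_true + Q                      # Eq. (4) stacked over all samples

# Ordinary least squares: minimises the squared Frobenius norm of W - Z B.
B_hat, *_ = np.linalg.lstsq(Z, W, rcond=None)
print(np.max(np.abs(B_hat - B_true)))   # small when the noise is small
```

With contaminated samples this estimate can be pulled towards the outliers, which is the usual motivation for replacing least squares with a robust multivariate regression.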
1.2 Functional Principal Component Analysis
In this section we describe Functional Principal Component Analysis (FPCA),
which we will use to build data-driven basis functions $\phi^X(t)$ and $\phi^Y(t)$ for
$x_i(t)$ and $y_i(t)$, respectively. These basis functions give effective, low-dimensional
representations and will be used in the Functional Linear Regression model
described in Sect. 1.1.
Functional Principal Component Analysis (FPCA) is a method of finding
dominant modes of variance for functional data. These dominant modes of variance are called the Functional Principal Components (FPCs). FPCA is also used
as a dimensionality reduction tool, as a set of observed curves can be effectively
approximated by a linear combination of a small set of FPCs. These FPCs form
an orthonormal basis over $L^2(I)$ [21].
The FPCs, $\phi^X_k(t)$ for $k = 1, 2, \ldots$, are the eigenfunctions of the covariance
function $C_X(s, t)$ with eigenvalues $\lambda^X_k$. Note that the eigenfunctions are ordered
by the respective eigenvalues. The Karhunen-Loève theorem shows that $x_i(t)$
can be decomposed as $x_i(t) = \sum_{k=1}^{\infty} z_{ik}\,\phi^X_k(t)$ where the principal component
score $z_{ik} = \int_I x_i(t)\,\phi^X_k(t)\,dt$.
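One common way to approximate the FPCs and scores from densely observed curves is to eigendecompose the sample covariance matrix on the grid. The sketch below assumes an equally spaced grid and a simple Riemann-sum quadrature, keeps only the leading M components, and uses the classical (non-robust) sample covariance; the function name `fpca` and the toy data are illustrative assumptions.

```python
import numpy as np

def fpca(curves, t, M):
    """Discretised FPCA on an equally spaced grid (Riemann-sum quadrature).

    curves: (n, N) array, one curve per row; t: (N,) grid; M: components kept.
    Returns eigenvalues (M,), eigenfunctions (M, N), scores (n, M), mean (N,).
    """
    curves = np.asarray(curves, dtype=float)
    h = t[1] - t[0]                               # grid spacing (assumed constant)
    mean_fn = curves.mean(axis=0)
    centred = curves - mean_fn
    cov = centred.T @ centred / curves.shape[0]   # sample covariance on the grid
    # Eigenvectors of h * cov approximate eigenfunctions of the covariance operator.
    eigvals, eigvecs = np.linalg.eigh(h * cov)
    top = np.argsort(eigvals)[::-1][:M]           # largest eigenvalues first
    phi = eigvecs[:, top].T / np.sqrt(h)          # rescale so h * sum(phi_k**2) = 1
    scores = h * centred @ phi.T                  # approximates integral of x_i * phi_k
    return eigvals[top], phi, scores, mean_fn

# Toy data with exactly two modes of variation.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(scale=2.0, size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
curves = a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)

lam, phi, scores, mean_fn = fpca(curves, t, M=2)
x_hat = mean_fn + scores @ phi                    # reconstruction from 2 components
print(np.max(np.abs(curves - x_hat)))             # tiny: the toy data has two modes
```

Because the toy curves lie in a two-dimensional space, two components reproduce them up to rounding; real sensor data would leave a nonzero residual.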
We can define the $M$-truncation as:
$$\hat{x}_i(t) = \sum_{k=1}^{M} z_{ik}\,\phi^X_k(t), \tag{5}$$
which gives the minimal residual error:
$$\frac{1}{n} \sum_{i=1}^{n} \|x_i - \hat{x}_i\|^2 = \frac{1}{n} \sum_{i=1}^{n} \int_I [x_i(t) - \hat{x}_i(t)]^2\,dt, \tag{6}$$