Advanced analytics and learning on temporal data

Vincent Lemaire · Simon Malinowski ·

Anthony Bagnall · Alexis Bondu ·

Thomas Guyet · Romain Tavenard (Eds.)

LNAI 11986

4th ECML PKDD Workshop, AALTD 2019

Würzburg, Germany, September 20, 2019

Revised Selected Papers

Advanced Analytics

and Learning

on Temporal Data

Lecture Notes in Artificial Intelligence 11986

Subseries of Lecture Notes in Computer Science

Series Editors

Randy Goebel

University of Alberta, Edmonton, Canada

Yuzuru Tanaka

Hokkaido University, Sapporo, Japan

Wolfgang Wahlster

DFKI and Saarland University, Saarbrücken, Germany

Founding Editor

Jörg Siekmann

DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244

Editors

Vincent Lemaire

Orange Labs

Lannion, France

Simon Malinowski

Inria

University of Rennes

Rennes, France

Anthony Bagnall

University of East Anglia

Norwich, UK

Alexis Bondu

Orange Labs

Châtillon, France

Thomas Guyet

Irisa

Agrocampus Ouest

Rennes, France

Romain Tavenard

University of Rennes 2

Rennes, France

ISSN 0302-9743 ISSN 1611-3349 (electronic)

Lecture Notes in Artificial Intelligence

ISBN 978-3-030-39097-6 ISBN 978-3-030-39098-3 (eBook)

https://doi.org/10.1007/978-3-030-39098-3

LNCS Sublibrary: SL7 – Artificial Intelligence

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the

material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,

broadcasting, reproduction on microfilms or in any other physical way, and transmission or information

storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now

known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication

does not imply, even in the absence of a specific statement, that such names are exempt from the relevant

protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are

believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors

give a warranty, expressed or implied, with respect to the material contained herein or for any errors or

omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in

published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Workshop Description

The European Conference on Machine Learning and Principles and Practice of

Knowledge Discovery in Databases (ECML PKDD) is the premier European machine

learning and data mining conference and builds upon over 17 years of successful events

and conferences held across Europe. This year, ECML PKDD 2019 was held in

Würzburg, Germany, during September 16–20. It was complemented by a workshop

program, where each workshop was dedicated to specialized topics, to cross-cutting

issues, and to upcoming trends. This stand-alone LNAI volume includes the selected

papers of the Advanced Analytics and Learning on Temporal Data (AALTD 2019)

Workshop.

Motivation - Temporal data are frequently encountered in a wide range of domains such as bio-informatics, medicine, finance, and engineering, among many others. They are naturally present in applications such as motion and vision analysis, and in more emerging ones such as energy-efficient buildings, smart cities, dynamic social media, or sensor networks. Contrary to static data, temporal data are complex in nature: they are generally noisy and high-dimensional, they may be non-stationary (i.e., first-order statistics vary with time) and irregular (involving several time granularities), and they may have several invariant domain-dependent factors such as time delay, translation, scale, or tendency effects. These temporal peculiarities limit the majority of standard statistical models and machine learning approaches, which mainly assume i.i.d. data, homoscedasticity, normality of residuals, etc. Tackling such challenging temporal data calls for new advanced approaches at the intersection of statistics, time series analysis, signal processing, and machine learning. Defining new approaches that transcend boundaries between several domains to extract valuable information from temporal data is undeniably a hot topic for the near future, and has been the subject of active research over the last decade.

Workshop Topics - The aim of this fourth edition of the workshop, AALTD 2019 (https://project.inria.fr/aaltd19/), held in conjunction with ECML PKDD 2019, was to bring together researchers and experts in machine learning, data mining, pattern analysis, and statistics to share their challenging issues and advances in temporal data analysis. Analysis and learning from temporal data cover a wide scope of tasks including learning metrics, learning representations, unsupervised feature extraction, clustering, and classification.

The workshop received papers that cover one or several of the following topics:

– Temporal Data Clustering

– Classification of Univariate and Multivariate Time Series

– Early Classification of Temporal Data

– Deep Learning and Learning Representations for Temporal Data

– Modeling Temporal Dependencies

– Advanced Forecasting and Prediction Models

– Space-Temporal Statistical Analysis

– Functional Data Analysis Methods

– Temporal Data Streams

– Interpretable Time-Series Analysis Methods

– Dimensionality Reduction, Sparsity, Algorithmic Complexity, and Big Data Challenges

– Bio-Informatics, Medical, Energy Consumption, and Temporal Data

Outcomes - AALTD 2019 was structured as a full-day workshop. We encouraged submissions of regular papers of up to 16 pages presenting unpublished work. All submitted papers were peer reviewed (double-blind) by two or three reviewers from the Program Committee, and selected on the basis of these reviews. AALTD 2019 received 31 submissions, among which 16 papers were accepted for inclusion in the proceedings. The papers with higher review ratings were selected for an oral presentation, and the others were given the opportunity to present a poster through a spotlight session and a discussion session. The workshop started with an invited talk, “Time Series Classification at Scale” (https://www.francois-petitjean.com/Research/Petitjean-AALTD2019.pdf), given by Francois Petitjean from Monash University, Australia.

We thank all organizers and reviewers for the time and effort invested. We would also like to express our gratitude to the members of the Program Committee. We also thank the ECML, the Organizing Committee (particularly Peggy and Kurt, the workshop and tutorial chairs), and the local staff who helped us. Sincere thanks are due to Springer for their help in publishing the proceedings. Lastly, we thank all participants and the invited speaker of the ECML PKDD 2019 workshops for their contributions that made the workshop really interesting.

November 2019

Vincent Lemaire

Simon Malinowski

Anthony Bagnall

Alexis Bondu

Thomas Guyet

Romain Tavenard

Organization

Program Committee Chairs

Anthony Bagnall University of East Anglia, UK

Alexis Bondu Orange Labs, France

Thomas Guyet IRISA, France

Vincent Lemaire Orange Labs, France

Simon Malinowski Université de Rennes, Inria, CNRS, IRISA, France

Romain Tavenard Université de Rennes 2, COSTEL, France

Program Committee

Amaia Abanda Basque Center for Applied Mathematics (BCAM), Spain

Mustafa Baydoğan Boğaziçi University, Turkey

Albert Bifet LTCI, Télécom ParisTech, France

Andreas Brandmaier Max Planck Institute for Human Development, Germany

Clément Christophe ERIC, Université Lyon 2, France

Hoang Anh Dau University of California, Riverside, USA

Germain Forestier Université de Haute-Alsace, France

Dominique Gay Université de La Réunion, France

David Guijo-Rubio Universidad de Córdoba, Spain

Paul Honeine Université de Rouen, France

Hassan Ismail Fawaz Université de Haute-Alsace, France

Isak Karlsson Stockholm University, Sweden

Nikos Katzouris NCSR Demokritos, Greece

James Large University of East Anglia, UK

Jason Lines University of East Anglia, UK

Usue Mori University of the Basque Country, Spain

Pierre Nodet Orange Labs, France

Charlotte Pelletier Université de Bretagne-Sud, IRISA, France

Francois Petitjean Monash University, Australia

Patrick Schäfer Humboldt Universität zu Berlin, Germany

Pavel Senin Los Alamos National Laboratory, USA

Chang Wei Monash University, Australia

Julien Velcin ERIC, Université Lyon 2, France

Contents

Oral Presentation

Robust Functional Regression for Outlier Detection
Harjit Hullait, David S. Leslie, Nicos G. Pavlidis, and Steve King

Transform Learning Based Function Approximation for Regression and Forecasting
Kriti Kumar, Angshul Majumdar, M. Girish Chandra, and A. Anil Kumar

Proactive Fiber Break Detection Based on Quaternion Time Series and Automatic Variable Selection from Relational Data
Vincent Lemaire, Fabien Boitier, Jelena Pesic, Alexis Bondu, Stéphane Ragot, and Fabrice Clérot

A Fully Automated Periodicity Detection in Time Series
Tom Puech, Matthieu Boussard, Anthony D’Amato, and Gaëtan Millerand

Conditional Forecasting of Water Level Time Series with RNNs
Bart J. van der Lugt and Ad J. Feelders

Challenges and Limitations in Clustering Blood Donor Hemoglobin Trajectories
Marieke Vinkenoog, Mart Janssen, and Matthijs van Leeuwen

Localized Random Shapelets
Mael Guillemé, Simon Malinowski, Romain Tavenard, and Xavier Renard

Poster Presentation

Feature-Based Gait Pattern Classification for a Robotic Walking Frame
Christopher M. A. Bonenberger, Benjamin Kathan, and Wolfgang Ertel

How to Detect Novelty in Textual Data Streams? A Comparative Study of Existing Methods
Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, and Manel Boumghar

Seq2VAR: Multivariate Time Series Representation with Relational Neural Networks and Linear Autoregressive Model
Edouard Pineau, Sébastien Razakarivony, and Thomas Bonald

Modelling Patient Sequences for Rare Disease Detection with Semi-supervised Generative Adversarial Nets
Kezi Yu, Yunlong Wang, and Yong Cai

Extended Kalman Filter for Large Scale Vessels Trajectory Tracking in Distributed Stream Processing Systems
Katarzyna Juraszek, Nidhi Saini, Marcela Charfuelan, Holmer Hemsen, and Volker Markl

Unsupervised Anomaly Detection in Multivariate Spatio-Temporal Datasets Using Deep Learning
Yildiz Karadayi

Learning Stochastic Dynamical Systems via Bridge Sampling
Harish S. Bhat and Shagun Rawat

Quantifying Quality of Actions Using Wearable Sensor
Mohammad Al-Naser, Takehiro Niikura, Sheraz Ahmed, Hiroki Ohashi, Takuto Sato, Mitsuhiro Okada, Katsuyuki Nakamura, and Andreas Dengel

An Initial Study on Adapting DTW at Individual Query for Electrocardiogram Analysis
Daniel Shen and Min Chi

Author Index

Oral Presentation

Robust Functional Regression for Outlier Detection

Harjit Hullait (1), David S. Leslie (1), Nicos G. Pavlidis (1), and Steve King (2)

(1) Lancaster University, Lancaster, UK
{h.hullait,d.leslie,n.pavlidis}@lancaster.ac.uk

(2) Rolls Royce PLC, Derby, UK
[email protected]

Abstract. In this paper we propose an outlier detection algorithm for

temperature sensor data from jet engine tests. Effective identification

of outliers would enable engine problems to be examined and resolved

efficiently. Outlier detection in this data is challenging because a human

controller determines the speed of the engine during each manoeuvre.

This introduces variability which can mask abnormal behaviour in the

engine response. We therefore suggest modelling the dependency between

speed and temperature in the process of identifying abnormalities. The

engine temperature has a delayed response with respect to the engine

speed, which we will model using robust functional regression. We then

apply functional depth with respect to the residuals to rank the samples

and identify the outliers. The effectiveness of the outlier detection algorithm is shown in a simulation study. The algorithm is also applied to real

engine data, and identifies samples that warrant further investigation.

Keywords: Robust functional data analysis · Robust model

selection · Outlier detection

Before a jet engine is delivered it must complete a Pass-Off test. In a Pass-Off

test a controller performs manoeuvres, which can be defined as various engine

accelerations and decelerations starting and ending at a set idle speed. The purpose of this test is to ensure the engine complies with set standards. During the

test, data is captured by sensors measuring engine speed, pressure, temperature

and vibration in different parts of the engine. This high-frequency measurement

data offers the ability to automate the detection of engine problems. By building statistical models for the Pass-Off test data we can aid the engineers in

identifying engine issues efficiently.

© Springer Nature Switzerland AG 2020. V. Lemaire et al. (Eds.): AALTD 2019, LNAI 11986, pp. 3–13, 2020. https://doi.org/10.1007/978-3-030-39098-3_1

One of the key manoeuvres in a Pass-Off test is the Vibration Survey (VS). In this manoeuvre the engine is accelerated slowly to a certain speed then slowly decelerated. We have 199 VS datasets, which include the turbine pressure ratio (TPR) that measures the engine speed, and the turbine gas temperature (TGT) which is a key temperature feature. In Fig. 1 we have plots of the TPR and TGT for 30 of the VS manoeuvres. We have transformed the time index to the interval [0, 1] and the range of sensor measurements to [0, 100].
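The rescaling described above can be sketched in a few lines. This is only an illustration: the text does not say whether the [0, 100] range is applied per series or across all series, so the per-series min-max scaling below is an assumption, and the function names `rescale_series` and `unit_time_index` are hypothetical.

```python
import numpy as np

def rescale_series(values, lo=0.0, hi=100.0):
    """Linearly map a 1-D series onto [lo, hi] (min-max scaling, assumed per series)."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        # A constant series has no range to stretch; map it to the lower bound.
        return np.full_like(values, lo)
    return lo + (hi - lo) * (values - vmin) / (vmax - vmin)

def unit_time_index(n_points):
    """Equally spaced time index on [0, 1] for a series with n_points samples."""
    return np.linspace(0.0, 1.0, n_points)

scaled = rescale_series([2.0, 4.0, 6.0])
time_index = unit_time_index(5)
```

Putting every manoeuvre on the same time interval and measurement range makes the curves directly comparable regardless of how long each manoeuvre lasted.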

Automated detection of abnormal engine behaviour has been studied before

[9,14]. Both approaches require a training set of “normal” samples to build a

normality model. They then apply novelty detection using an appropriate distance measure and threshold. We will instead use Functional Data Analysis

(FDA) methods to identify VS manoeuvres that display unusual temperature

behaviour in response to the variable (human-controlled) TPR time series. We

will robustly build a normality model without requiring a set of “normal” samples. FDA techniques have been used effectively to model sensor data [13], as

they combine information across samples and exploit the underlying behavioural

structure. However this is to the best of our knowledge the first time these techniques are being used for modelling jet engine data.

We will use robust Functional Linear Regression (FLR) to build a model of

“normal” behaviour. We shall then use the residuals from this model to identify

outlying behaviour. The residuals are themselves time series, so collapsing each one to a scalar metric such as the mean-square error discards much of their information. Instead we will

apply functional depth [6], which is capable of identifying various types of outlier

behaviour.

There are a number of functional outlier detection methods, including the

threshold approach [8], the Functional Boxplot [22] and the Outliergram [2],

which use functional depth [15] to rank the curves. Alternative approaches use

Directional Outlyingness measures, such as MS-plots [7] and Functional Outlier

Maps [19]. There are also approaches for multivariate functional data [10]. These

methods do not model the dependency between the functional response and

functional input, and may therefore miss important outliers. Robust FLR can

model this dependency structure, which can improve the detection of outliers.

This paper is organised as follows. In Sect. 1 we summarise the FDA methods,

which will be used in the outlier detection algorithm. In Sect. 2, we will develop

robust FDA techniques to obtain a robust regression model. In Sect. 3, we show

how the robust regression model can be used to identify outliers. In Sect. 4

we give simulation results comparing the robust model with a classical model.

Fig. 1. Plots of 30 TPR and TGT time series.

Finally in Sect. 5 we apply the robust model on the engine data and highlight

the outliers identified.

1 Classical Functional Data Analysis

In this section we give a brief summary of the FDA tools that we will later apply

in our model. In the following sections we will use the vector space $L^2(I)$, which is the Hilbert space of square-integrable functions on the compact interval $I$ with the inner product $\langle f, g \rangle = \int_I f(t)\,g(t)\,dt$ for functions $f, g \in L^2(I)$.
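For curves observed on a grid, the inner product above can be approximated numerically. The sketch below uses the trapezoidal rule; the function name `l2_inner` and the sine and cosine test curves are illustrative assumptions, not part of the paper.

```python
import numpy as np

def l2_inner(f_vals, g_vals, t):
    """Approximate <f, g> = integral over I of f(t) g(t) dt with the trapezoidal rule."""
    h = np.asarray(f_vals, dtype=float) * np.asarray(g_vals, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(t)))

# Two curves on I = [0, 1] that are orthonormal in L2(I).
t = np.linspace(0.0, 1.0, 1001)
f = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)
g = np.sqrt(2.0) * np.cos(2.0 * np.pi * t)

norm_sq = l2_inner(f, f, t)  # close to 1: f has unit L2 norm
cross = l2_inner(f, g, t)    # close to 0: f and g are orthogonal
```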

We will define $X(t)$, $Y(t)$ to be univariate stochastic processes defined on $I$, with mean functions $\mu_X(t)$ and $\mu_Y(t)$, and covariance functions $C_X(s,t) = \mathrm{cov}\{X(s), X(t)\}$ and $C_Y(s,t) = \mathrm{cov}\{Y(s), Y(t)\}$ for all $s, t \in I$. We shall define $x(t) = [x_1(t), \dots, x_n(t)]$ and $y(t) = [y_1(t), \dots, y_n(t)]$ to be $n$ samples from $X(t)$ and $Y(t)$, respectively.

In practice we observe $x_i(t)$ and $y_i(t)$ at discrete time points. We shall assume for simplicity of exposition that observations are made at equally spaced time points $t_1, \dots, t_N$. We will outline Functional Linear Regression and Functional Principal Component Analysis with respect to the underlying functions. In Sect. 1.3 we need to use the discretely observed data to define a suitable model selection criterion.

1.1 Functional Linear Regression

In this section we will introduce the FLR model [16], which we will use to model

the relationship between TGT and TPR for the VS manoeuvre. In FLR we

model the relationship between predictor $x_i(t)$ and response $y_i(t)$ as:

$$y_i(t) = \alpha(t) + \int_I x_i(s)\,\beta(s,t)\,ds + \varepsilon_i(t), \qquad (1)$$

where $\alpha(t)$ is the intercept function, $\beta(s,t)$ is the regression function and $\varepsilon_i(t)$ is the error process. For a fixed $t$, we can think of $\beta(s,t)$ as the relative weight placed on $x_i(s)$ to predict $y_i(t)$. For simplicity we will assume the mean functions $\mu_X(t) = 0$ and $\mu_Y(t) = 0$, which means $\alpha(t) = 0$. This is a reasonable assumption as in practice we can calculate the mean functions $\mu_X(t)$ and $\mu_Y(t)$ efficiently for dense data and then pre-process the data by subtracting $\mu_X(t)$ and $\mu_Y(t)$ from the observed curves.

FLR in the function-on-function case is a well studied model. There are

typically two approaches taken: basis methods [5,23] and grid based methods

[11,20]. The basis approach will be used as it is computationally efficient.

We will represent $x_i(t)$ and $y_i(t)$ in terms of $M$ pre-chosen basis functions $\phi^X_j(t)$ and $\phi^Y_j(t)$, respectively:

$$x_i(t) = \sum_{j=1}^{M} z_{ij}\,\phi^X_j(t) \quad\text{and}\quad y_i(t) = \sum_{j=1}^{M} w_{ij}\,\phi^Y_j(t).$$

For notational simplicity we have assumed that xi(t) and yi(t) can be represented

by the same number of functions M, however this assumption can be easily

relaxed.

We define $\phi^X(t) = [\phi^X_1(t), \dots, \phi^X_M(t)]$, $\phi^Y(s) = [\phi^Y_1(s), \dots, \phi^Y_M(s)]$, $z_i = [z_{i1}, \dots, z_{iM}]$ and $w_i = [w_{i1}, \dots, w_{iM}]$. We will then model the regression surface using a double basis expansion [17]:

$$\beta(s,t) = \sum_{l=1}^{M}\sum_{m=1}^{M} b_{ml}\,\phi^X_m(s)\,\phi^Y_l(t) = \phi^X(s)^{T} B\,\phi^Y(t), \qquad (2)$$

for an M × M regression matrix B. We can then write:

$$y_i(t) = z_i B\,\phi^Y(t) + \varepsilon_i(t). \qquad (3)$$
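One way to see where Eq. (3) comes from is to substitute the basis expansions into Eq. (1) with $\alpha(t) = 0$. The derivation below is a sketch under two assumptions that the text does not state explicitly at this point: the basis functions $\phi^X_j$ are orthonormal in $L^2(I)$ (as the FPCs of Sect. 1.2 are), and $z_i$ is treated as a row vector while $\phi^X(s)$ and $\phi^Y(t)$ are treated as column vectors.

```latex
\begin{align*}
\int_I x_i(s)\,\beta(s,t)\,ds
  &= \int_I \bigl(z_i\,\phi^X(s)\bigr)\,\bigl(\phi^X(s)^{T} B\,\phi^Y(t)\bigr)\,ds \\
  &= z_i \left( \int_I \phi^X(s)\,\phi^X(s)^{T}\,ds \right) B\,\phi^Y(t) \\
  &= z_i\, I_M\, B\,\phi^Y(t)
   = z_i B\,\phi^Y(t),
\end{align*}
% The (j,k) entry of the bracketed matrix is <phi^X_j, phi^X_k>,
% which equals 1 if j = k and 0 otherwise under the orthonormality assumption.
```

Adding the error process $\varepsilon_i(t)$ then gives Eq. (3).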

Letting $\varepsilon_i(t) = q_i\,\phi^Y(t)$ [5] we can reduce Eq. (3) to:

$$w_i = z_i B + q_i. \qquad (4)$$

This simplification enables us to estimate B using standard multivariate regression methods.
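As a minimal illustration of Eq. (4), the sketch below stacks the score vectors $z_i$ and $w_i$ as rows of matrices `Z` and `W` and estimates $B$ by ordinary least squares. This is the classical, non-robust estimator, not the robust method the paper develops in Sect. 2; the function name `fit_regression_matrix` and the simulated data are assumptions made for the example.

```python
import numpy as np

def fit_regression_matrix(Z, W):
    """Ordinary least-squares estimate of B in W = Z B + Q.

    Z: (n, M) predictor scores, one row z_i per sample.
    W: (n, M) response scores, one row w_i per sample.
    """
    B_hat, *_ = np.linalg.lstsq(Z, W, rcond=None)
    return B_hat

# Simulated scores with a known regression matrix and small noise.
rng = np.random.default_rng(0)
n, M = 200, 3
B_true = rng.normal(size=(M, M))
Z = rng.normal(size=(n, M))
W = Z @ B_true + 0.01 * rng.normal(size=(n, M))

B_hat = fit_regression_matrix(Z, W)
```

Because least squares weights every sample equally, a few abnormal curves can pull `B_hat` away from the bulk of the data, which is the motivation for a robust alternative.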

1.2 Functional Principal Component Analysis

In this section we describe Functional Principal Component Analysis (FPCA),

which we will use to build data-driven basis functions φX(t) and φY (t) for

xi(t) and yi(t), respectively. These basis functions give effective, low-dimensional

representations and will be used in the Functional Linear Regression model

described in Sect. 1.1.

Functional Principal Component Analysis (FPCA) is a method of finding

dominant modes of variance for functional data. These dominant modes of variance are called the Functional Principal Components (FPCs). FPCA is also used

as a dimensionality reduction tool, as a set of observed curves can be effectively

approximated by a linear combination of a small set of FPCs. These FPCs form

an orthonormal basis over L2(I) [21].
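A common way to compute FPCs from curves observed on an equally spaced grid is to eigen-decompose the discretised sample covariance. The sketch below follows that route; it is an illustrative, non-robust implementation, and the function name `fpca`, the Riemann-sum normalisation, and the simulated curves are assumptions rather than the authors' procedure.

```python
import numpy as np

def fpca(X, t, n_components):
    """FPCA for curves on an equally spaced grid.

    X: (n, N) array, one curve per row; t: (N,) grid; returns the mean curve,
    the leading eigenvalues, the eigenfunctions (rows) and the scores.
    """
    dt = t[1] - t[0]
    mean = X.mean(axis=0)
    Xc = X - mean                              # centre the curves
    C = Xc.T @ Xc / X.shape[0]                 # (N, N) sample covariance on the grid
    evals, evecs = np.linalg.eigh(C * dt)      # discretised covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order]                         # eigenvalues, largest first
    phi = evecs[:, order].T / np.sqrt(dt)      # rows satisfy sum(phi_k**2) * dt = 1
    scores = Xc @ phi.T * dt                   # Riemann sum of x_i(t) * phi_k(t)
    return mean, lam, phi, scores

# Simulated curves spanned by two orthonormal functions on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(scale=2.0, size=(50, 1))
b = rng.normal(scale=1.0, size=(50, 1))
X = a * np.sqrt(2) * np.sin(2 * np.pi * t) + b * np.sqrt(2) * np.cos(2 * np.pi * t)

mean, lam, phi, scores = fpca(X, t, n_components=2)
X_hat = mean + scores @ phi                    # reconstruction from two FPCs
```

Since the simulated curves live in a two-dimensional space, two components reconstruct them essentially exactly; real sensor curves would need enough components to capture most of the variance.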

The FPCs, $\phi^X_k(t)$ for $k = 1, 2, \dots$, are the eigenfunctions of the covariance function $C_X(s,t)$ with eigenvalues $\lambda^X_k$. Note that the eigenfunctions are ordered by the respective eigenvalues. The Karhunen-Loève theorem shows that $x_i(t)$ can be decomposed as $x_i(t) = \sum_{k=1}^{\infty} z_{ik}\,\phi^X_k(t)$ where the principal component score $z_{ik} =$