Semantic Digital Libraries

Sebastian Ryszard Kruk • Bill McDaniel

Editors

Sebastian Ryszard Kruk

National University of Ireland

Digital Enterprise Research Institute

Lower Dangan

Galway

Ireland

Bill McDaniel

National University of Ireland

Digital Enterprise Research Institute

Lower Dangan

Galway

Ireland

ISBN 978-3-540-85433-3 e-ISBN 978-3-540-85434-0

Library of Congress Control Number: 2008934757

ACM Computing Classification (1998): H.3, H.5, I.2

© Springer-Verlag Berlin Heidelberg 2009

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is

concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,

reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication

or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,

1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,

even in the absence of a specific statement, that such names are exempt from the relevant protective laws

and regulations and therefore free for general use.

Cover design: KünkelLopka, Heidelberg

Printed on acid-free paper

springer.com

I dedicate this book to my gorgeous wife,

Ewelina,

who stayed with me

through ups and downs

ever since

semantic digital libraries

came into our lives and brought us

to the far edge of Europe

Sebastian Ryszard Kruk

This book is dedicated to

my lovely wife,

Linda,

who has seen me through

thick and thin, good and bad

and who will, apparently,

follow me anywhere in the world

Bill McDaniel

Preface

The Archchancellor turned the pages carefully.

They were well illustrated.

The Librarian knew his wizards.

The Science of Discworld, Terry Pratchett

Sebastian Kruk:

In 2002, the Polish government initiated the creation of a national digital

library that would deliver cultural heritage to each household through the

Internet. At that time, I was focusing my research on information retrieval,

supported by the upcoming technologies of the Semantic Web. It was Prof.

Henryk Krawczyk from Gdansk University of Technology who helped me to

align my research with the growing demand for digital library management

systems. As a result, my Master’s Thesis presented a prototype of a semantic

digital library called Elvis-DL.

Two years later, I was invited by Prof. Stefan Decker to continue my

research on semantic digital libraries in the newly set up Digital Enterprise

Research Institute in Galway, Ireland. Soon afterwards, Elvis-DL became

JeromeDL and opened a new chapter of my research on semantic digital

libraries. Two years ago, after the first Tutorial on Semantic Digital Libraries

that we gave at JCDL 2006, I met Bill and we started implementing the semantic digital libraries vision, particularly JeromeDL, in the context of eLearning.

The project gained more momentum when my team was recently rejoined by

Mariusz Cygan, who became the main architect and developer of JeromeDL.

Along the way, I collaborated with many people who influenced and helped

me to put the domain of semantic digital libraries into shape. To mention

only a few whose help I appreciate the most: Prof. Daniel Schwabe, Bernhard

Haslhofer, Predrag Knezevic, Sandy Payette, Dean Krafft, and Traugott

Koch.

Bill McDaniel:

For me, my adventures in electronic libraries began in the 1970s, working

in the UK with, and extending the capabilities of, IBM mainframe libraries

that are used for source code storage and DataPoint minicomputer full text

indexing methods. After returning to the US in the early 1980s, I built electronic

libraries and full text index access methods using VSAM under a variety of

mainframe operating systems. These libraries were used in commercial software application products where it was required to store documents that are

coded in IBM GML and in high speed printer data streams for intelligent

document assembly systems. My second generation design included links to

other documents and metadata tags that are attached to the documents for

search purposes.

Throughout the 1990s, I worked with unstructured data in the form of

documents coded for composition, which were ultimately encoded in Adobe Acrobat PDF. This work emphasized the value of tagged databases, interdocument

linking, and the possibilities of hyperlinking text elements. But the new computing systems and PCs had no good library support for managing such

document collections, for indexing the text, or for searching related and

interlinked documents intelligently.

In 1999, I became aware of the Semantic Web and did some experiments

of my own and turned to building initial semantically powered products. But

the web did not have the structure for the easy tagging of documents. HTML

was too limiting and XML was just getting started. Semantics required more

time for maturation.

By the early 2000s, when I joined Adobe, the chicken and egg problem of

metadata acquisition was obvious. I spent much of my time there working on

building an automatic metadata entity extraction system while researching

the possibilities of semantically powered systems.

Joining DERI in 2006, I met Sebastian and learned about JeromeDL, the

semantic digital library project here. In my capacity as eLearning project executive, I have been able to see the semantic digital libraries emerging and the

underlying challenges being met. At the same time, the potential of libraries

which could interlink documents, books, users, and other entities on a semantic

level, interoperating with a multitude of other systems, has become apparent.

This book is a fascinating collection of papers about these possibilities, the

work already completed, and the things to come.

For me, it is a chance to see the long term vision of documents and their

intricate potential become more fully realized. I hope every reader takes away

as much insight from these chapters as I have gained by helping to gather and

edit them.

Dear reader, the book you now hold in your hands is the result of the work of many

people who saw the potential of applying semantic web and social networking

technologies to the digital libraries domain. It is an outcome of ongoing research

in that newly established domain of discourse and of an intensive development

of open source projects like FEDORA, BRICKS, Greenstone, SIMILE, and

JeromeDL.

We hope you will enjoy reading as much as we enjoyed researching,

developing, and finally delivering this book.

Galway, Ireland, Sebastian Ryszard Kruk

July 2008 Bill McDaniel

Contents

Introduction

Bill McDaniel and Sebastian Ryszard Kruk

Part I Introduction to Digital Libraries and Semantic Web

Digital Libraries and Knowledge Organization

Dagobert Soergel

Semantic Web and Ontologies

Marcin Synak, Maciej Dabrowski, and Sebastian Ryszard Kruk

Social Semantic Information Spaces

John G. Breslin

Part II A Vision of Semantic Digital Libraries

Goals of Semantic Digital Libraries

Sebastian Ryszard Kruk and Bill McDaniel

Architecture of Semantic Digital Libraries

Sebastian Ryszard Kruk, Adam Westerski, and Ewelina Kruk

Long-time Preservation

Markus Reis

Part III Ontologies for Semantic Digital Libraries

Bibliographic Ontology

Maciej Dabrowski, Marcin Synak, and Sebastian Ryszard Kruk

Community-aware Ontologies

Slawomir Grzonkowski, Sebastian Ryszard Kruk, Adam Gzella, Jakub Demczuk, and Bill McDaniel

Part IV Prototypes of Semantic Digital Libraries

JeromeDL – The Social Semantic Digital Library

Sebastian Ryszard Kruk, Mariusz Cygan, Adam Gzella, Tomasz Woroniecki, and Maciej Dabrowski

The BRICKS Digital Library Infrastructure

Bernhard Haslhofer and Predrag Knežević

Semantics in Greenstone

Annika Hinze, George Buchanan, David Bainbridge, and Ian Witten

Part V Building the Future – Semantic Digital Libraries in Use

Hyperbooks

Gilles Falquet, Luka Nerima, and Jean-Claude Ziswiler

Semantic Digital Libraries for Archiving

Bill McDaniel

Evaluation of Semantic and Social Technologies for Digital Libraries

Sebastian Ryszard Kruk, Ewelina Kruk, and Katarzyna Stankiewicz

Conclusions: The Future of Semantic Digital Libraries

Sebastian Ryszard Kruk and Bill McDaniel

References

Index

List of Contributors

David Bainbridge

Department of Computer Science

University of Waikato

New Zealand

John G. Breslin

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

George Buchanan

Department of Computer Science

University of Swansea

United Kingdom

Mariusz Cygan

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Maciej Dabrowski

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Jakub Demczuk

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Gilles Falquet

Centre Universitaire d’Informatique

University of Geneva

1227 Carouge, Switzerland

Slawomir Grzonkowski

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Adam Gzella

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Bernhard Haslhofer

Department of Distributed

and Multimedia Systems

University of Vienna

Vienna, Austria

Annika Hinze

Department of Computer Science

University of Waikato

New Zealand

Predrag Knežević

Freelancer Consultant

Darmstadt

Germany

Ewelina Kruk

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Sebastian Ryszard Kruk

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Bill McDaniel

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Luka Nerima

Centre Universitaire d’Informatique

University of Geneva

1227 Carouge, Switzerland

Markus Reis

Austrian Research Centers

Research Studio Digital Memory

Engineering

Vienna, Austria

Dagobert Soergel

College of Information Studies

University of Maryland

College Park, MD, USA

Katarzyna Stankiewicz

Gdansk University of Technology

Narutowicza 11/12

Gdansk, Poland

Marcin Synak

Gdansk University of Technology

Narutowicza 11/12

Gdansk, Poland

Adam Westerski

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Ian Witten

Department of Computer Science

University of Waikato

New Zealand

Tomasz Woroniecki

Digital Enterprise Research Institute

National University of Ireland

Galway, Ireland

Jean-Claude Ziswiler

Centre Universitaire d’Informatique

University of Geneva

1227 Carouge, Switzerland

Introduction

Bill McDaniel and Sebastian Ryszard Kruk

1 Some History

As we look forward to the emergence of semantic digital libraries, it is good

to consider their origins and sources in traditional digital libraries. A short

examination of their definitions and applications will prove fruitful. It will

provide a base for our later examination of the implications of adding semantic

power to the digital library concept.

Digital libraries have been around for quite some time. Wikipedia¹ references Greenstein and Thorin to define a digital library: “A digital library is a

library in which collections are stored in digital formats (as opposed to print,

microform, or other media) and accessible by computers” [79].

Originally, they were designed to contain objects for operating systems to

use: linked object code libraries, source code libraries, and compiled object code

for reuse by multiple programs. They emerged from a consideration of the

needs of the OS to find and load components, together with a recognition that existing

file systems were too limited to support these activities in a timely fashion.

The essential structure that emerged remains even today: a directory of

library members which provides names and other metadata for the objects

contained followed, in the same dataset or file, by the actual binary for each

object referenced by a directory member.
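
As a rough illustration of this layout, the sketch below packs a directory of named members, each carrying a small metadata record, ahead of the members' raw bytes in a single buffer. The function names (`pack_library`, `read_directory`, `read_member`), the JSON directory encoding, and the 4-byte length prefix are all assumptions made for the example; they do not reproduce the format of any particular operating system library.

```python
import json
import struct


def pack_library(members):
    """Pack {name: (metadata_dict, payload_bytes)} into one bytes object.

    Layout (an assumption for this sketch): a 4-byte big-endian directory
    length, the JSON-encoded directory, then the payloads back to back.
    """
    directory = []
    payloads = []
    offset = 0
    for name, (metadata, payload) in members.items():
        directory.append(
            {"name": name, "meta": metadata, "offset": offset, "size": len(payload)}
        )
        payloads.append(payload)
        offset += len(payload)
    encoded = json.dumps(directory).encode("utf-8")
    return struct.pack(">I", len(encoded)) + encoded + b"".join(payloads)


def read_directory(blob):
    """Return (directory entries, offset at which the payload area starts)."""
    (dir_len,) = struct.unpack(">I", blob[:4])
    directory = json.loads(blob[4:4 + dir_len].decode("utf-8"))
    return directory, 4 + dir_len


def read_member(blob, name):
    """Look the name up in the directory, then slice out that member's bytes."""
    directory, base = read_directory(blob)
    for entry in directory:
        if entry["name"] == name:
            start = base + entry["offset"]
            return entry["meta"], blob[start:start + entry["size"]]
    raise KeyError(name)
```

Because the directory sits at the front, a reader can list members and their metadata without touching the object bytes, which is the property the paragraph above describes.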

The details are different for each type of library, of course, but the essential

nature of a digital repository for named objects remains. Typically, digital

libraries are artifacts constructed on top of more fundamental data storage

structures supported directly by the OS. Very few OSs support a library as a

low level data structure.² Instead, they are based on relative record accesses,

indexed sequential access methods, or relational databases.

1 http://en.wikipedia.org/wiki/Digital_library

2 IBM DOS and VSE did not support libraries as fundamental. Neither do Windows, OS X, or flavors of Unix. IBM's AS/400 uses a relational database as its core

file system which might be construed as a type of digital library, but typically

digital libraries are still defined on top of that database. The best example is

One of us created a digital library access method in the early 1980s based

around the IBM Virtual Storage Access Method and its relative record dataset

architecture. It was called VLAM, the Virtual Library Access Method.

This digital library (DL) was intended to provide several services which

traditional access methods did not: long names, BLOB storage, document

storage, rapid directory search, inter-object links, and a great deal of metadata. Later, an indexing method was invented to allow full text indexing of

the objects for sophisticated search and retrieval. This was a hashed bitmap

index known as the Holographic Index Access Method (HIAM).
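
The text does not say how HIAM worked internally, so the following is only a generic sketch of what a hashed bitmap index can look like, not a reconstruction of HIAM. It assumes each word is hashed to one of a fixed number of slots and that each slot holds a bitmap with one bit per document; the class name `HashedBitmapIndex`, the slot count, and the use of CRC32 as the hash are all choices made for this example.

```python
import re
import zlib


class HashedBitmapIndex:
    """Toy full-text index: one document bitmap per hash slot."""

    def __init__(self, slots=1024):
        self.slots = slots
        # Bit i of bitmaps[s] is set when document i contains a word hashing to slot s.
        self.bitmaps = [0] * slots
        self.docs = []

    def _slot(self, word):
        # CRC32 is used only because it gives the same value on every run.
        return zlib.crc32(word.encode("utf-8")) % self.slots

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, text):
        """Store a document and set its bit in the slot of every word it contains."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for word in set(self._words(text)):
            self.bitmaps[self._slot(word)] |= 1 << doc_id
        return doc_id

    def search(self, query):
        """Return ids of documents containing every word of the query."""
        words = self._words(query)
        if not words:
            return []
        # AND the slot bitmaps together to get candidate documents.
        candidates = (1 << len(self.docs)) - 1
        for word in words:
            candidates &= self.bitmaps[self._slot(word)]
        # Different words can share a slot, so confirm each candidate against its text.
        hits = []
        for doc_id, text in enumerate(self.docs):
            if (candidates >> doc_id) & 1 and set(words) <= set(self._words(text)):
                hits.append(doc_id)
        return hits
```

The bitmap step is cheap and narrows the search to a few candidates; the final check against the stored text is needed because hashing can map unrelated words to the same slot.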

2 Today's Libraries

Digital libraries (DLs) have grown since that time in both number and

use, supporting specialized applications and implementing very sophisticated data

structures. More and more of them are created on top of personal computer

operating systems such as Unix, Linux, or Windows. Many are implemented

on top of relational database systems such as Oracle, DB2, or SQL Server.

Most recently, the emergence of open source operating systems and database

engines with enterprise level features and reliability has created a significant

movement toward digital libraries implemented on top of Linux and MySQL.

The attraction of relational databases for library creators is their built-in

capability for easily creating, storing, and searching large quantities of metadata for library objects. Directories can be distributed across orthogonalized

tables and the associated metadata can be isolated so as to improve search

performance. However, the cost is one of licensing and installing an RDBMS

(Relational Database Management System) to achieve this.
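
A minimal sketch of this arrangement follows, with the object directory in one table and the metadata isolated in another so that searches touch only the metadata. It uses Python's built-in `sqlite3` module purely so the example runs without installing a server; the table names, column names, and helper functions are hypothetical and not taken from any system discussed here.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE objects (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE,
        content BLOB NOT NULL
    );
    CREATE TABLE metadata (
        object_id INTEGER NOT NULL REFERENCES objects(id),
        key       TEXT NOT NULL,
        value     TEXT NOT NULL
    );
    -- Metadata searches use this index and never read the BLOB column.
    CREATE INDEX metadata_key_value ON metadata (key, value);
    """
)


def add_object(name, content, **metadata):
    """Store one library object plus any number of key/value metadata pairs."""
    cur = conn.execute(
        "INSERT INTO objects (name, content) VALUES (?, ?)", (name, content)
    )
    object_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metadata (object_id, key, value) VALUES (?, ?, ?)",
        [(object_id, key, value) for key, value in metadata.items()],
    )
    return object_id


def find_by_metadata(key, value):
    """Return the names of all objects tagged with the given key/value pair."""
    rows = conn.execute(
        "SELECT o.name FROM objects o JOIN metadata m ON m.object_id = o.id "
        "WHERE m.key = ? AND m.value = ? ORDER BY o.name",
        (key, value),
    )
    return [name for (name,) in rows]
```

For example, after `add_object("report.pdf", b"%PDF", author="alice")`, calling `find_by_metadata("author", "alice")` returns the matching object names in alphabetical order.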

The availability of relatively high powered database engines such as

MySQL makes this possible, but still involves the installation of the MySQL

engine on the server. If the library system is localized on each user's machine,

this replication is an obstacle. Certainly, if the system is server based as more

and more are, then the use of an RDBMS for a digital library is easier.

Consequently, many digital libraries are still implemented using only OS-provided storage services. This, of course, implies the creation of new code

bases that implement storage algorithms, indexing, garbage collection, update

processing and the myriad other storage components necessary for such a

system's implementation.

Because of this, many digital libraries are implemented directly on the OS

file system, with separate files used for each object, a special set of index files,

and directory files which contain metadata for each object. This approach has

several drawbacks. A multi-file system is more vulnerable to corruption and

the IBM MVS (now z/OS) support of Partitioned Datasets (PDS) which is a

fundamental operating system access method. However, it is severely limited in

flexibility and not easy to use.
