Semantic Digital Libraries
Sebastian Ryszard Kruk • Bill McDaniel
Editors
Sebastian Ryszard Kruk
National University of Ireland
Digital Enterprise Research Institute
Lower Dangan
Galway
Ireland
Bill McDaniel
National University of Ireland
Digital Enterprise Research Institute
Lower Dangan
Galway
Ireland
ISBN 978-3-540-85433-3 e-ISBN 978-3-540-85434-0
Library of Congress Control Number: 2008934757
ACM Computing Classification (1998): H.3, H.5, I.2
© Springer-Verlag Berlin Heidelberg 2009
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Cover design: KünkelLopka, Heidelberg
Printed on acid-free paper
987654321
springer.com
I dedicate this book to my gorgeous wife,
Ewelina,
who stayed with me
through ups and downs
ever since
semantic digital libraries
came into our lives and brought us
to the far edge of Europe
Sebastian Ryszard Kruk
This book is dedicated to
my lovely wife,
Linda,
who has seen me through
thick and thin, good and bad
and who will, apparently,
follow me anywhere in the world
Bill McDaniel
Preface
The Archchancellor turned the pages carefully.
They were well illustrated.
The Librarian knew his wizards.
The Science of Discworld, Terry Pratchett
Sebastian Kruk:
In 2002, the Polish government initiated the creation of a national digital
library that would deliver cultural heritage to each household through the
Internet. At that time, I was focusing my research on information retrieval,
supported by the upcoming technologies of the Semantic Web. It was Prof.
Henryk Krawczyk from Gdansk University of Technology who helped me to
align my research with the growing demand for digital library management
systems. As a result, my Master’s Thesis presented a prototype of a semantic
digital library called Elvis-DL.
Two years later, I was invited by Prof. Stefan Decker to continue my
research on semantic digital libraries in the newly set up Digital Enterprise
Research Institute in Galway, Ireland. Soon afterwards, Elvis-DL became
JeromeDL, which opened a new chapter of my research on semantic digital
libraries. Two years ago, after the first Tutorial on Semantic Digital Libraries
that we gave at JCDL 2006, I met Bill and we started implementing the semantic digital libraries vision, particularly JeromeDL, in the context of eLearning.
The project gained momentum when Mariusz Cygan recently rejoined my team
and became the main architect and developer of JeromeDL.
Along the way, I collaborated with many people who influenced and helped
me to put the domain of semantic digital libraries into shape. To mention
only a few whose help I appreciate the most: Prof. Daniel Schwabe, Bernhard
Haslhofer, Predrag Knežević, Sandy Payette, Dean Krafft, and Traugott
Koch.
Bill McDaniel:
My adventures in electronic libraries began in the 1970s, when I worked
in the UK with, and extended the capabilities of, IBM mainframe libraries
used for source code storage and DataPoint minicomputer full-text
indexing methods. After returning to the US in the early 1980s, I built electronic
libraries and full-text index access methods using VSAM under a variety of
mainframe operating systems. These libraries were used in commercial software products that needed to store documents
coded in IBM GML and in high-speed printer data streams for intelligent
document assembly systems. My second-generation design included links to
other documents and metadata tags attached to the documents for
search purposes.
Throughout the 1990s, I worked with unstructured data in the form of
documents coded for composition, which were ultimately encoded in Adobe Acrobat PDF. This work emphasized the value of tagged databases, interdocument
linking, and the possibilities of hyperlinking text elements. But the new computing systems and PCs had no good library support for managing such
document collections, for indexing the text, or for searching related and
interlinked documents intelligently.
In 1999, I became aware of the Semantic Web, carried out some experiments
of my own, and turned to building initial semantically powered products. But
the web did not have the structure for the easy tagging of documents. HTML
was too limiting and XML was just getting started. Semantics required more
time for maturation.
By the early 2000s, when I joined Adobe, the chicken-and-egg problem of
metadata acquisition was obvious. I spent much of my time there working on
building an automatic metadata entity extraction system while researching
the possibilities of semantically powered systems.
Joining DERI in 2006, I met Sebastian and learned about JeromeDL, the
semantic digital library project here. In my capacity as eLearning project executive, I have been able to see the semantic digital libraries emerging and the
underlying challenges being met. At the same time, the potential of libraries
which could interlink documents, books, users, and other entities on a semantic
level, interoperating with a multitude of other systems, has become apparent.
This book is a fascinating collection of papers about these possibilities, the
work already completed, and the things to come.
For me, it is a chance to see the long term vision of documents and their
intricate potential become more fully realized. I hope every reader takes away
as much insight from these chapters as I have gained by helping to gather and
edit them.
Dear reader, the book you now hold in your hands is the result of the work of many
people who saw the potential of applying Semantic Web and social networking
technologies to the digital libraries domain. It is an outcome of ongoing research
in that newly established domain of discourse and of the intensive development
of open source projects such as FEDORA, BRICKS, Greenstone, SIMILE, and
JeromeDL.
We hope you will enjoy reading this book as much as we enjoyed researching,
developing, and finally delivering it.
Galway, Ireland, July 2008
Sebastian Ryszard Kruk
Bill McDaniel
Contents
Introduction
Bill McDaniel and Sebastian Ryszard Kruk
Part I Introduction to Digital Libraries and Semantic Web
Digital Libraries and Knowledge Organization
Dagobert Soergel
Semantic Web and Ontologies
Marcin Synak, Maciej Dabrowski, and Sebastian Ryszard Kruk
Social Semantic Information Spaces
John G. Breslin
Part II A Vision of Semantic Digital Libraries
Goals of Semantic Digital Libraries
Sebastian Ryszard Kruk and Bill McDaniel
Architecture of Semantic Digital Libraries
Sebastian Ryszard Kruk, Adam Westerski, and Ewelina Kruk
Long-time Preservation
Markus Reis
Part III Ontologies for Semantic Digital Libraries
Bibliographic Ontology
Maciej Dabrowski, Marcin Synak, and Sebastian Ryszard Kruk
Community-aware Ontologies
Slawomir Grzonkowski, Sebastian Ryszard Kruk, Adam Gzella,
Jakub Demczuk, and Bill McDaniel
Part IV Prototypes of Semantic Digital Libraries
JeromeDL – The Social Semantic Digital Library
Sebastian Ryszard Kruk, Mariusz Cygan, Adam Gzella,
Tomasz Woroniecki, and Maciej Dabrowski
The BRICKS Digital Library Infrastructure
Bernhard Haslhofer and Predrag Knežević
Semantics in Greenstone
Annika Hinze, George Buchanan, David Bainbridge, and Ian Witten
Part V Building the Future – Semantic Digital Libraries in Use
Hyperbooks
Gilles Falquet, Luka Nerima, and Jean-Claude Ziswiler
Semantic Digital Libraries for Archiving
Bill McDaniel
Evaluation of Semantic and Social Technologies for Digital
Libraries
Sebastian Ryszard Kruk, Ewelina Kruk, and Katarzyna Stankiewicz
Conclusions: The Future of Semantic Digital Libraries
Sebastian Ryszard Kruk and Bill McDaniel
References
Index
List of Contributors
David Bainbridge
Department of Computer Science
University of Waikato
New Zealand
John G. Breslin
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
George Buchanan
Department of Computer Science
University of Swansea
United Kingdom
Mariusz Cygan
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Maciej Dabrowski
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Jakub Demczuk
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Gilles Falquet
Centre Universitaire d’Informatique
University of Geneva
1227 Carouge, Switzerland
Slawomir Grzonkowski
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Adam Gzella
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Bernhard Haslhofer
Department of Distributed and Multimedia Systems
University of Vienna
Vienna, Austria
Annika Hinze
Department of Computer Science
University of Waikato
New Zealand
Predrag Knežević
Freelance Consultant
Darmstadt
Germany
Ewelina Kruk
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Sebastian Ryszard Kruk
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Bill McDaniel
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Luka Nerima
Centre Universitaire d’Informatique
University of Geneva
1227 Carouge, Switzerland
Markus Reis
Austrian Research Centers
Research Studio Digital Memory Engineering
Vienna, Austria
Dagobert Soergel
College of Information Studies
University of Maryland
College Park, MD, USA
Katarzyna Stankiewicz
Gdansk University of Technology
Narutowicza 11/12
Gdansk, Poland
Marcin Synak
Gdansk University of Technology
Narutowicza 11/12
Gdansk, Poland
Adam Westerski
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Ian Witten
Department of Computer Science
University of Waikato
New Zealand
Tomasz Woroniecki
Digital Enterprise Research Institute
National University of Ireland
Galway, Ireland
Jean-Claude Ziswiler
Centre Universitaire d’Informatique
University of Geneva
1227 Carouge, Switzerland
Introduction
Bill McDaniel and Sebastian Ryszard Kruk
1 Some History
As we look forward to the emergence of semantic digital libraries, it is good
to consider their origins and sources in traditional digital libraries. A short
examination of their definitions and applications will prove fruitful. It will
provide a base for our later examination of the implications of adding semantic
power to the digital library concept.
Digital libraries have been around for quite some time. Wikipedia¹ cites Greenstein and Thorin to define a digital library: “A digital library is a
library in which collections are stored in digital formats (as opposed to print,
microform, or other media) and accessible by computers” [79].
Originally, they were designed to contain objects for operating systems to
use: linked object code libraries, source code libraries, and compiled object code
for reuse by multiple programs. They emerged from the OS's need to find and
load components, combined with the recognition that existing
file systems were too limited to support these activities in a timely fashion.
The essential structure that emerged remains even today: a directory of
library members, which provides names and other metadata for the contained
objects, followed, in the same dataset or file, by the actual binary for each
object referenced by a directory member.
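To make that directory-then-members layout concrete, here is a minimal Python sketch. The byte layout (a 4-byte length prefix, a JSON directory, then the concatenated member binaries) and the function names `build_library`, `read_directory`, and `read_member` are hypothetical illustrations chosen for readability; they are not the on-disk format of IBM PDS, VLAM, or any other real library system.

```python
import json
import struct

# Hypothetical single-file layout (NOT the real PDS or VLAM format):
#   [4-byte big-endian directory length][directory as JSON][member binaries]
# Each directory entry holds a member's name, offset, length, and metadata.
HEADER = struct.Struct(">I")


def build_library(members: dict[str, tuple[bytes, dict]]) -> bytes:
    """Pack named members into one blob: directory first, binaries after."""
    directory = []
    body = bytearray()
    for name, (data, metadata) in members.items():
        # Offsets are relative to the start of the member-data area.
        directory.append({"name": name, "offset": len(body),
                          "length": len(data), "metadata": metadata})
        body += data
    dir_bytes = json.dumps(directory).encode("utf-8")
    return HEADER.pack(len(dir_bytes)) + dir_bytes + bytes(body)


def read_directory(blob: bytes) -> list[dict]:
    """Parse only the directory; member data is never touched."""
    (dir_len,) = HEADER.unpack_from(blob, 0)
    return json.loads(blob[HEADER.size:HEADER.size + dir_len])


def read_member(blob: bytes, name: str) -> bytes:
    """Find a member by name in the directory and slice out its binary."""
    (dir_len,) = HEADER.unpack_from(blob, 0)
    data_start = HEADER.size + dir_len
    for entry in read_directory(blob):
        if entry["name"] == name:
            start = data_start + entry["offset"]
            return blob[start:start + entry["length"]]
    raise KeyError(name)


# Usage: two members, one object-code-like and one text document.
lib = build_library({
    "HELLO": (b"\x90\x90", {"type": "object"}),
    "README": (b"Library notes", {"type": "text"}),
})
print([e["name"] for e in read_directory(lib)])  # ['HELLO', 'README']
print(read_member(lib, "README"))                # b'Library notes'
```

Because the directory sits at the front, a caller can list or search member names and metadata without reading any member data, which is the property the directory structure exists to provide.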
The details are different for each type of library, of course, but the essential
nature of a digital repository for named objects remains. Typically, digital
libraries are artifacts constructed on top of more fundamental data storage
structures supported directly by the OS. Very few OSs support a library as a
low-level data structure.² Instead, libraries are built on relative record access,
indexed sequential access methods, or relational databases.
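The last of these options, a library layered on a relational database, can be sketched with Python's built-in `sqlite3` module. The schema (a `member` table for content and a separate, indexed `metadata` table of key/value pairs) and the helpers `open_library`, `store`, and `find` are assumptions for illustration only, not the design of any particular digital library product.

```python
import sqlite3

# Hypothetical schema: member content in one table, key/value metadata in
# another, so that searches touch only the (indexed) metadata rows.
SCHEMA = """
CREATE TABLE member (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    data BLOB NOT NULL
);
CREATE TABLE metadata (
    member_id INTEGER NOT NULL REFERENCES member(id),
    key       TEXT NOT NULL,
    value     TEXT NOT NULL
);
CREATE INDEX metadata_kv ON metadata(key, value);
"""


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a library database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, name: str, data: bytes,
          metadata: dict[str, str]) -> None:
    """Insert one member and its metadata rows in a single transaction."""
    cur = conn.execute("INSERT INTO member (name, data) VALUES (?, ?)",
                       (name, data))
    conn.executemany(
        "INSERT INTO metadata (member_id, key, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, v) for k, v in metadata.items()],
    )
    conn.commit()


def find(conn: sqlite3.Connection, key: str, value: str) -> list[str]:
    """Return names of members whose metadata has key == value."""
    rows = conn.execute(
        "SELECT m.name FROM member m JOIN metadata md ON md.member_id = m.id "
        "WHERE md.key = ? AND md.value = ? ORDER BY m.name",
        (key, value),
    )
    return [name for (name,) in rows]


# Usage
conn = open_library()
store(conn, "report-1982", b"%PDF-1.4", {"author": "McDaniel", "format": "pdf"})
store(conn, "notes", b"plain text", {"author": "Kruk", "format": "txt"})
print(find(conn, "format", "pdf"))  # ['report-1982']
```

Keeping metadata in its own indexed table lets a search avoid scanning the stored binaries.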
One of us created a digital library access method in the early 1980s based
on the IBM Virtual Storage Access Method (VSAM) and its relative record dataset
architecture. It was called VLAM, the Virtual Library Access Method.
This digital library (DL) was intended to provide several services that
traditional access methods did not: long names, BLOB storage, document
storage, rapid directory search, inter-object links, and a great deal of metadata. Later, an indexing method was invented to allow full-text indexing of
the objects for sophisticated search and retrieval. This was a hashed bitmap
index known as the Holographic Index Access Method (HIAM).
¹ http://en.wikipedia.org/wiki/Digital_library
² IBM DOS and VSE did not support libraries as fundamental structures. Neither do Windows, OS X, or flavors of Unix. IBM's AS/400 uses a relational database as its core file system, which might be construed as a type of digital library, but typically digital libraries are still defined on top of that database. The best example is the IBM MVS (now z/OS) support of Partitioned Datasets (PDS), which is a fundamental operating system access method. However, it is severely limited in flexibility and not easy to use.
2 Today's Libraries
Digital libraries (DLs) have since grown in both number and uses, with specialized applications
implementing very sophisticated data
structures. More and more of them are created on top of personal computer
operating systems such as Unix, Linux, or Windows. Many are implemented
on top of relational database systems such as Oracle, DB2, or SQL Server.
Most recently, the emergence of open source operating systems and database
engines with enterprise-level features and reliability has created a significant
movement toward digital libraries implemented on top of Linux and MySQL.
The attraction of relational databases for library creators is their built-in
capability for easily creating, storing, and searching large quantities of metadata for library objects. Directories can be distributed across orthogonalized
tables, and the associated metadata can be isolated to improve search
performance. However, the cost is that of licensing and installing an RDBMS
(Relational Database Management System) to achieve this.
The availability of relatively high-powered database engines such as
MySQL makes this possible, but it still involves installing the MySQL
engine on the server. If the library system is localized on each user's machine,
this replication is an obstacle. Certainly, if the system is server-based, as more
and more are, then using an RDBMS for a digital library is easier.
Consequently, many digital libraries are still implemented using only OS-provided storage services. This, of course, implies the creation of new code
bases that implement storage algorithms, indexing, garbage collection, update
processing, and the myriad other storage components necessary for such a
system's implementation.
Because of this, many digital libraries are implemented directly on the OS
file system, with separate files used for each object, a special set of index files,
and directory files that contain metadata for each object. This approach has
several drawbacks. A multi-file system is more vulnerable to corruption and