Handbook of Research on Geoinformatics, Hassan A. Karimi
Geospatial Image Metadata Catalog Services
1. INTRODUCTION
As earth observation continues worldwide, large
volumes of remotely sensed data on the Earth’s
climate and environment have been collected and
archived. In order to maintain the data archives
efficiently and to facilitate discovery by users of
desired data in the holdings, each data provider
normally maintains a digital metadata catalog.
Some online catalogs provide services to users
for searching the catalog and discovering the data
they need through a well-established Application
Programming Interface (API). Such services are
called Catalog Services. The information in the
catalog is the searchable metadata that describe
individual data entries in the archives. Currently,
most Catalog Services are provided through Web-based interfaces.
This chapter analyses three open catalog
service systems. It reviews the metadata standards, catalog service conceptual schemas and
protocols, and the components of catalog service
specifications.
2. REVIEW OF GEOSPATIAL IMAGE CATALOG SERVICES
2.1 Pilot Catalog Service Systems
The Federal Geographic Data Committee (FGDC)
Clearinghouse is a virtual collection of digital
spatial data distributed over many servers in the
United States and abroad. The primary intention
of the Clearinghouse is to provide discovery
services for digital data, allowing users to evaluate its quality through metadata. Most metadata
provide information on how to acquire the data;
in many cases, links to the data or an order form
are available online.
The NASA Earth Observing System ClearingHOuse (ECHO) is a clearinghouse of spatial
and temporal metadata that enables the science
community to exchange data and information.
ECHO technology can provide metadata discovery
services and serve as an order broker for clients
and data partners. All the NASA Distributed Active Archive Centers (DAACs), as data providers,
generate and ingest metadata information into
ECHO.
The Open Geospatial Consortium (OGC) has promoted standardization and interoperability among the geospatial communities. For catalog services, OGC has defined the Catalog Service implementation standard (OpenGIS, 2004) and published two recommendation papers (OpenGIS, 2005a; OpenGIS, 2005b). The George Mason University (GMU) CSISS Catalog Service for the Web (CSW) system is an OGC-compliant catalog service. It demonstrates how the earth science community can publish geospatial resources and make them discoverable through searches of pre-registered spatial and temporal metadata. In particular, the GMU CSISS CSW catalog service is based on the OpenGIS implementation standard and the ebRIM application profile (OpenGIS, 2005a). It provides users with an open, standard means to access more than 15 terabytes of global Landsat datasets.
2.2 Conceptual System Architecture
Since these geospatial catalog services address
similar needs, it is not surprising that they have
almost the same conceptual system architecture,
as shown in Figure 1.
From the point of view of metadata circulation, a catalog service usually consists of three
components: metadata generation and ingestion,
a conceptual schema for catalog service, and a
query interface for catalog service.
Metadata generation and ingestion are always based on applicable metadata standards. Examples include the Dublin Core (DCMI, 2003), ISO 19115 Geographic information – Metadata from the International Organization for Standardization (ISO, 2003), the Content Standard for Digital Geospatial Metadata (CSDGM) from the Federal Geographic Data Committee (FGDC, 1998), and the ECS Earth Science Information Model from the National Aeronautics and Space Administration (NASA, 2006).
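Here is a small example of generating metadata against a base standard. This minimal Python sketch serializes a flat record using the fifteen elements of the Dublin Core Metadata Element Set (namespace `http://purl.org/dc/elements/1.1/`). The `metadata` wrapper element and the sample values are hypothetical. In practice a catalog usually wraps Dublin Core elements in a schema of its own, such as CSW's `csw:Record`.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen core Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize a flat {element: value} mapping as Dublin Core XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Landsat scene (sample)",
    "format": "GeoTIFF",
    "coverage": "-78.0 38.0 -76.0 40.0",  # hypothetical W S E N encoding
}))
```

Rejecting unknown element names keeps generated records inside the chosen standard. A tailored catalog would relax that check deliberately, not accidentally.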
Metadata structures, relationships and definitions, known as conceptual schemas, play a key
role in catalog services. They define what kind
of metadata information can be provided and
how the metadata are organized. The conceptual schemas are closely related to those of the
pre-ingested metadata information, but are not
necessarily identical. Catalog service conceptual
schemas are always oriented toward the field of
application and may be tailored to particular application profiles.
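A crosswalk is one common way to express how a tailored conceptual schema relates to its base standard without being identical to it. The sketch below maps a few FGDC CSDGM short names (`title`, `westbc`, `eastbc`, `southbc`, `northbc`, `begdate`, `enddate`) onto a hypothetical catalog schema. The target field names, and the choice of which source elements to keep, are assumptions made only for illustration.

```python
# Hypothetical crosswalk from FGDC CSDGM short names to a tailored catalog schema.
CROSSWALK = {
    "title": "Title",
    "westbc": "WestBoundLongitude",
    "eastbc": "EastBoundLongitude",
    "southbc": "SouthBoundLatitude",
    "northbc": "NorthBoundLatitude",
    "begdate": "BeginDate",
    "enddate": "EndDate",
}
# Bounding coordinates arrive as text and are stored as numbers.
NUMERIC_FIELDS = {"westbc", "eastbc", "southbc", "northbc"}


def to_catalog_schema(source: dict[str, str]) -> dict[str, object]:
    """Map a CSDGM-style record onto the (hypothetical) catalog schema.

    Elements without a crosswalk entry are dropped, reflecting that a
    catalog's conceptual schema need not be identical to its base standard.
    """
    record: dict[str, object] = {}
    for src_name, dst_name in CROSSWALK.items():
        if src_name in source:
            value = source[src_name]
            record[dst_name] = float(value) if src_name in NUMERIC_FIELDS else value
    return record
```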
The query interface for a catalog service
defines the necessary operations, the syntax of
each operation, and the binding protocol. To
facilitate access and promote interoperability
among catalog services, the interface definition
may be kept open.
2.3 Metadata Generation
In this section, the three open catalog services identified in Section 2.1 are analyzed with respect to two aspects of metadata generation.
2.3.1 Base Metadata Standard
The base metadata standard is the public geospatial metadata standard on which a catalog service is based and which it tailors to meet a given agency's requirements. Besides international and national geospatial metadata standards such as ISO 19115 and FGDC CSDGM, some agencies have de facto standards in their production environments. NASA ECS is one example.
The metadata used by the FGDC Clearinghouse follows FGDC CSDGM. Each affiliated catalog service site must organize its metadata according to the CSDGM standard before it joins the clearinghouse.
The ECHO Science Metadata Conceptual Model has been developed from the NASA Earth Observing System Data and Information System (EOSDIS) Core System (ECS) Science Data Model, with modifications to suit project needs.
GMU CSISS CSW builds its metadata conceptual model by combining the ebRIM information model and the ECS science data model.
2.3.2 Automatic Generation of Metadata
As the volume of spatial datasets keeps growing,
generation of metadata becomes increasingly
time-consuming. An automatic mechanism for
generating metadata will facilitate the generation
and frequent update of metadata.
Metadata information must be organized as TXT, SGML, or HTML files before a node joins the FGDC Clearinghouse. Some metadata generation tools are available in addition to commercial software packages, and these tools are advertised on the FGDC website. A software package, ISite, is provided to help users set up a clearinghouse node easily. With this software, a qualified clearinghouse node server can be set up in minutes.

[Figure 1. Conceptual Architecture of Catalog Service. Components shown: User, Client, Catalog Service, Query Interface, Conceptual Schema, Metadata Holdings, Data Holdings.]
All the ECHO metadata holdings are obtained
directly from the data providers. DAACs can
use some ECS tools to automatically generate
metadata information.
GMU CSISS is developing Java-based tools
to automatically extract metadata information
from each granule. The Hierarchical Data Format
(HDF), Hierarchical Data Format - Earth Observing System (HDF-EOS), GeoTIFF and NetCDF
data formats are currently supported.
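The chapter does not say how the GMU CSISS extraction tools work internally. As one hedged illustration: HDF-EOS granules produced by ECS commonly carry their core metadata as Object Description Language (ODL) text in a global attribute such as `CoreMetadata.0`. The sketch below assumes that text has already been read from the file, for example with an HDF library. It pulls each `OBJECT`'s `VALUE` into a dictionary. The sample string is invented and simplified, and the parser ignores multi-valued entries and other ODL subtleties.

```python
# Invented, simplified ODL text of the kind stored in an ECS core metadata attribute.
SAMPLE_CORE_METADATA = """
GROUP                  = INVENTORYMETADATA
  OBJECT                 = LOCALGRANULEID
    NUM_VAL              = 1
    VALUE                = "sample_granule.hdf"
  END_OBJECT             = LOCALGRANULEID
  OBJECT                 = RANGEBEGINNINGDATE
    NUM_VAL              = 1
    VALUE                = "2000-03-01"
  END_OBJECT             = RANGEBEGINNINGDATE
  OBJECT                 = RANGEENDINGDATE
    NUM_VAL              = 1
    VALUE                = "2000-03-01"
  END_OBJECT             = RANGEENDINGDATE
END_GROUP              = INVENTORYMETADATA
END
"""


def parse_odl_values(text: str) -> dict[str, str]:
    """Return {OBJECT name: VALUE} for every OBJECT that carries a VALUE."""
    values: dict[str, str] = {}
    open_objects: list[str] = []  # stack of currently open OBJECT names
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if "=" not in line:
            continue  # blank lines and the closing END statement
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key == "OBJECT":
            open_objects.append(value)
        elif key == "END_OBJECT" and open_objects:
            open_objects.pop()
        elif key == "VALUE" and open_objects:
            values[open_objects[-1]] = value
    return values


metadata = parse_odl_values(SAMPLE_CORE_METADATA)
print(metadata["RANGEBEGINNINGDATE"])
```

An extractor built along these lines could feed its dictionary straight into a crosswalk to the catalog schema. Granules would then be catalogued without anyone retyping their metadata.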
2.4 Metadata Ingestion
2.4.1 Metadata Distribution
This function deals with the physical distribution of metadata information within the catalog
service.
The FGDC Clearinghouse is a decentralized
system of servers that contain field-level metadata descriptions of available digital spatial data
located on the Internet. The metadata information is physically managed within each affiliated server node.
In ECHO, the metadata information is generated periodically by the distinct data centers, but it is managed centrally by the ECHO operations team. That is, metadata information in ECHO is distributed at design time but managed centrally at run time.
The GMU CSISS CSW maintains more than
15 Terabytes of global Landsat images. All the
metadata information for these images has been
registered into a centralized metadata database.
2.4.2 Ingestion Type
This section examines how each catalog service
ingests metadata. It focuses on two aspects: remote
vs. local and automatic vs. manual.
In the FGDC Clearinghouse, all the metadata information is manipulated only in the affiliated server node. Server nodes do not support remote ingestion, so ingestion has to be done manually.
Because ECHO manages its metadata centrally, it takes a database approach. Metadata ingestion in ECHO involves two steps. First, data centers upload their current metadata remotely to a dedicated File Transfer Protocol (FTP) server. Then the ECHO operations team ingests this metadata into the ECHO operational system.
GMU CSISS CSW provides published interfaces. As long as the metadata information is well organized, it can be ingested remotely into the GMU CSISS CSW metadata database. All the metadata information in that database is online and ready for client queries.
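A centralized, online metadata database of this kind can be sketched with Python's built-in `sqlite3` module. The table layout, field names, and sample granules are hypothetical. The search combines a bounding-box intersection test with a temporal-overlap test, which are typical spatial and temporal constraints for image catalogs. Longitude ranges that cross the antimeridian are not handled.

```python
import sqlite3

# Hypothetical granule table for a centralized metadata catalog.
# Dates are stored as ISO 8601 text so that string comparison orders them correctly.
SCHEMA = """
CREATE TABLE IF NOT EXISTS granule (
    id TEXT PRIMARY KEY,
    title TEXT,
    west REAL, south REAL, east REAL, north REAL,
    begin_date TEXT, end_date TEXT
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def ingest(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Insert or update a batch of records in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO granule VALUES "
            "(:id, :title, :west, :south, :east, :north, :begin_date, :end_date)",
            records,
        )


def search(conn: sqlite3.Connection, west: float, south: float, east: float,
           north: float, begin: str, end: str) -> list[str]:
    """Ids of granules whose footprint intersects the box and whose
    time range overlaps [begin, end]."""
    rows = conn.execute(
        """SELECT id FROM granule
           WHERE west <= ? AND east >= ? AND south <= ? AND north >= ?
             AND begin_date <= ? AND end_date >= ?
           ORDER BY id""",
        (east, west, north, south, end, begin),
    )
    return [row[0] for row in rows]
```

Because ingestion goes through a single function that commits each batch, newly ingested records become queryable as soon as the transaction completes. This mirrors the "online and ready for client queries" behavior described above.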
2.5 Conceptual Schema
We examine how the metadata conceptual schema
is defined in each catalog service.
In each FGDC Clearinghouse collection, all
the metadata information is organized according
to the FGDC CSDGM. The conceptual schema
of FGDC Clearinghouse collection is exactly the
same as that of the FGDC CSDGM.
In ECHO, all the metadata information collected in the NASA DAACs is based on the ECS
science data model, with some modifications
necessary to suit project needs.
GMU CSISS CSW defines its conceptual schema based on the ECS science data model combined with ISO 19115. Since GMU CSISS CSW supports both metadata queries and data retrieval (through the OGC services), an ebRIM-based profile has been selected to support defining the association between a data granule instance and applicable geospatial service instances.
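The granule-to-service association can be pictured with a few simplified classes named after ebRIM concepts (`ExtrinsicObject`, `Service`, `Association`). This is not the ebRIM information model itself. The attributes are pared down, and the `OperatesOn` association type, identifiers, and endpoint URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class ExtrinsicObject:
    """Stands in for a registered data granule."""
    id: str
    object_type: str


@dataclass(frozen=True)
class Service:
    """Stands in for a registered geospatial service instance."""
    id: str
    endpoint: str  # hypothetical service URL


@dataclass(frozen=True)
class Association:
    """Directed link: source_object --association_type--> target_object."""
    id: str
    association_type: str
    source_object: str
    target_object: str


# Plays the role of ebRIM's common RegistryObject base in this sketch.
RegistryObject = Union[ExtrinsicObject, Service]


@dataclass
class Registry:
    objects: dict[str, RegistryObject] = field(default_factory=dict)
    associations: list[Association] = field(default_factory=list)

    def add(self, obj: RegistryObject) -> None:
        self.objects[obj.id] = obj

    def associate(self, assoc_id: str, assoc_type: str, source: str, target: str) -> None:
        # Both ends must already be registered.
        for object_id in (source, target):
            if object_id not in self.objects:
                raise KeyError(f"unregistered object: {object_id}")
        self.associations.append(Association(assoc_id, assoc_type, source, target))

    def services_for(self, granule_id: str, assoc_type: str = "OperatesOn") -> list[RegistryObject]:
        """Return the services linked to a granule by assoc_type (service -> granule)."""
        return [
            self.objects[a.source_object]
            for a in self.associations
            if a.target_object == granule_id and a.association_type == assoc_type
        ]
```

Keeping associations as separate objects, rather than as fields on the granule, lets new services be linked to existing granules without changing the granule records.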
2.6 Transfer Protocol
A catalog service usually provides a standard, API-based interface to support client queries. This "design-by-contract" mechanism encourages third parties to develop new query interfaces in addition to the Web-based query interfaces provided by the catalog server itself.
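The design-by-contract idea can be shown with an abstract base class. Server implementations and third-party clients then agree only on the operations it declares. The operation names are loosely modeled on CSW operations (`GetCapabilities`, `GetRecords`, `GetRecordById`). The Python signatures, the in-memory implementation, and the keyword-only search are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CatalogService(ABC):
    """Hypothetical contract shared by catalog servers and third-party clients."""

    @abstractmethod
    def get_capabilities(self) -> dict: ...

    @abstractmethod
    def get_records(self, keyword: str, max_records: int = 10) -> list[dict]: ...

    @abstractmethod
    def get_record_by_id(self, record_id: str) -> Optional[dict]: ...


class InMemoryCatalog(CatalogService):
    """One possible server-side implementation of the contract."""

    def __init__(self, records: list[dict]) -> None:
        self._records = {r["id"]: r for r in records}

    def get_capabilities(self) -> dict:
        return {"operations": ["GetCapabilities", "GetRecords", "GetRecordById"]}

    def get_records(self, keyword: str, max_records: int = 10) -> list[dict]:
        hits = [r for r in self._records.values() if keyword.lower() in r["title"].lower()]
        return sorted(hits, key=lambda r: r["id"])[:max_records]

    def get_record_by_id(self, record_id: str) -> Optional[dict]:
        return self._records.get(record_id)


def third_party_search(service: CatalogService, keyword: str) -> list[str]:
    """A client written only against the contract, not against any server."""
    return [record["id"] for record in service.get_records(keyword)]
```

`third_party_search` works with any `CatalogService` subclass. That independence is what allows query interfaces to be developed outside the catalog server itself.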
The backbone of the FGDC Clearinghouse is
Z39.50 (ISO, 1998). This protocol was initially
developed by the library community to discover
bibliographic records using a standard set of attributes. To guide how to implement FGDC metadata
elements within a Z39.50 service, the FGDC has
developed an application profile for geospatial
metadata called "GEO," which provides sets of
attributes, operators, and rules of implementation
that suit geospatial needs. In fact, the node server
is a Z39.50 server, which enables FGDC query
utilities to search its metadata holdings on the fly
through Z39.50 protocol and GEO profile.
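The profile idea of attributes, operators, and rules can be sketched as a conjunctive query built from (use attribute, relation, term) triples. Real Z39.50 queries identify attributes by numeric values from registered attribute sets. The string names and the three relations below are hypothetical stand-ins chosen for readability. They include a bounding-box `overlaps` relation of the kind a geospatial profile needs.

```python
from typing import Any, Callable

BBox = tuple[float, float, float, float]  # (west, south, east, north)


def _overlaps(box: BBox, query: BBox) -> bool:
    west, south, east, north = box
    q_west, q_south, q_east, q_north = query
    return west <= q_east and east >= q_west and south <= q_north and north >= q_south


# Hypothetical relation names; Z39.50 itself encodes relations as numeric attributes.
RELATIONS: dict[str, Callable[[Any, Any], bool]] = {
    "equal": lambda value, term: value == term,
    "contains_word": lambda value, term: term.lower() in value.lower().split(),
    "overlaps": _overlaps,
}


def evaluate(record: dict[str, Any], query: list[tuple[str, str, Any]]) -> bool:
    """True if the record satisfies every (use_attribute, relation, term) triple."""
    for use_attribute, relation, term in query:
        if use_attribute not in record:
            return False  # the record does not support this access point
        if not RELATIONS[relation](record[use_attribute], term):
            return False
    return True
```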
ECHO exposes the Session Manager and a limited set of the ECHO services as Web Services defined via the Web Services Description Language
(WSDL). ECHO also provides two client packages,
Façade and EchoTalk, for client developers. The
syntax of the communication protocol between
client and ECHO is based on the Web Services
Interoperability (WS-I) Basic Profile. However,
the semantics of the communication protocol are
defined by ECHO itself. Specific query syntax,
in Extensible Markup Language (XML) format,
has been proposed and implemented.
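The sketch below separates shared syntax from service-defined semantics. It builds an XML query document with Python's `xml.etree.ElementTree`. The element names (`query`, `where`, `dataSetId`, `boundingBox`, `temporal`) are hypothetical and do not reproduce ECHO's actual query schema. Any standard XML parser can read the result, but only a server that agrees on these names can interpret it.

```python
import xml.etree.ElementTree as ET


def build_xml_query(dataset_id: str, bbox: tuple[float, float, float, float],
                    start: str, end: str) -> str:
    """Build a hypothetical XML granule query (element names are illustrative)."""
    query = ET.Element("query")
    where = ET.SubElement(query, "where")
    ET.SubElement(where, "dataSetId").text = dataset_id
    box = ET.SubElement(where, "boundingBox")
    for name, value in zip(("west", "south", "east", "north"), bbox):
        ET.SubElement(box, name).text = str(value)
    temporal = ET.SubElement(where, "temporal")
    ET.SubElement(temporal, "start").text = start
    ET.SubElement(temporal, "end").text = end
    return ET.tostring(query, encoding="unicode")


print(build_xml_query("HYPOTHETICAL_DATASET", (-78.0, 38.0, -76.0, 40.0),
                      "2000-03-01", "2000-03-31"))
```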
GMU CSISS CSW’s communication protocol
is based on the OGC Catalog Service Implementation Specification, which specifies the interfaces
and several applicable bindings for catalog services, including operations, a core information schema, and query language encodings. The transport-level communication protocol also follows this specification.
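With the HTTP GET (key-value pair) binding, a CSW query is simply a URL. The sketch below assumes CSW version 2.0.2 parameter names and an OGC Common Query Language (CQL) text constraint. The endpoint URL is a placeholder, and the parameter set should be checked against the specification version a given server implements.

```python
from urllib.parse import urlencode


def get_records_url(endpoint: str, cql_constraint: str, max_records: int = 10) -> str:
    """Encode a CSW GetRecords request for the HTTP GET (KVP) binding.

    Parameter names assume CSW 2.0.2; other versions may differ.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_constraint,
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


print(get_records_url("http://catalog.example.org/csw", "AnyText LIKE '%Landsat%'"))
```

`urlencode` percent-encodes the constraint, so quotes, spaces, and `%` wildcards in the CQL text reach the server intact.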
2.7 System Distribution
This section examines the physical distribution
of catalog service systems.
The FGDC Clearinghouse had 400 registered nodes worldwide as of March 22, 2006. FGDC maintains several Web-based search interfaces that carry out distributed searches across multiple clearinghouse nodes.
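A distributed search across clearinghouse nodes amounts to fanning one query out to every node and merging the answers, while tolerating nodes that are unavailable. The sketch below simulates nodes as plain Python callables run on a thread pool. It does not speak Z39.50, and the node names and result format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical node interface: takes a keyword, returns matching record ids.
NodeSearch = Callable[[str], list[str]]


def distributed_search(nodes: dict[str, NodeSearch],
                       keyword: str) -> tuple[dict[str, list[str]], list[str]]:
    """Query every node concurrently; return (hits per node, names of failed nodes)."""
    hits: dict[str, list[str]] = {}
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(search, keyword) for name, search in nodes.items()}
        for name, future in futures.items():
            try:
                hits[name] = future.result()  # re-raises any exception from the node
            except Exception:  # one unreachable node should not sink the whole search
                failed.append(name)
    return hits, failed
```

Reporting failed nodes separately, rather than silently returning empty results for them, lets a search interface tell users that part of the federation did not answer.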
ECHO acts as an intermediary between data
partners and client partners. Data partners provide
information about their data holdings, and client
partners develop software to access this information through ECHO Query and Order Web Service
interface. End users who want to search ECHO's
metadata must use one of the ECHO clients.
Although ECHO has close connections with the
DAACs and ECHO Clients, ECHO itself is not
a distributed system. It does not need to build a
distributed search across multiple agencies and
nodes at run time.
GMU CSISS CSW is a standalone service.
Like ECHO, it is not a distributed system.
2.8 Review Summaries
Table 1 summarizes the results of the analysis.
3. CONCLUSION AND DISCUSSION
We have reviewed three public catalog services
— FGDC Clearinghouse, NASA ECHO and GMU
CSISS CSW— considering the following aspects:
metadata generation, metadata ingestion, catalog
service conceptual schema, query protocols and
system distribution. This review shows how it
is becoming possible to query metadata holdings through public, standard Web-based query
interfaces.
The review results also show that catalog service providers must still define a catalog service schema that meets their particular needs. These application-oriented approaches can meet project requirements, but they will make it more difficult to create future cross-federation, multi-catalog services. We recommend a standard, common schema based on discipline-oriented metadata for future catalog services in the same or related fields.
References
DCMI. (2003). DCMI Metadata Terms. Retrieved March 8, 2007, from http://dublincore.org/documents/dcmi-terms/
ECHO. (2005). Earth Observing System Clearinghouse. Retrieved March 8, 2007, from http://www.echo.eos.nasa.gov/
FGDC. (1998). Content Standard for Digital Geospatial Metadata (CSDGM). Retrieved March 8, 2007, from http://fgdc.er.usgs.gov/metadata/contstan.html
FGDC. (2005). FGDC Geospatial Data Clearinghouse Activity. Retrieved March 8, 2007, from http://www.fgdc.gov/clearinghouse/clearinghouse.html
ISO. (1998). ISO 23950: Information and documentation – Information retrieval (Z39.50) – Application service definition and protocol specification.
ISO. (2003). ISO 19115: Geographic information – Metadata.
LAITS. (2005). LAITS OGC Catalog Service for Web – Discovery Interface. Retrieved March 8, 2007, from http://geobrain.laits.gmu.edu/csw/discovery/
NASA. (2006). EOSDIS Core System Data Model. Retrieved March 8, 2007, from http://spg.gsfc.nasa.gov/standards/heritage/eosdis-core-systemdata-model
OpenGIS. (2004). OpenGIS Catalogue Service Implementation Specification. Retrieved March 8, 2007, from http://www.opengeospatial.org/specs/?page=specs
OpenGIS. (2005a). OGC Recommendation Paper 04-17r1: OGC Catalogue Services – ebRIM (ISO/TS 15000-3) profile of CSW. Retrieved March 8, 2007, from http://www.opengeospatial.org/specs/?page=recommendation
Table 1. Review summaries

| Evaluation Points | FGDC Clearinghouse | NASA ECHO | GMU CSISS CSW |
|---|---|---|---|
| Metadata generation – Base standard | FGDC CSDGM | ECS Core | ECS Core/ISO 19115 |
| Metadata generation – Generation automation | Manually with tools | Manually with tools | Automatically |
| Metadata ingestion – Metadata distribution | Distributed | Centralized | Centralized |
| Metadata ingestion – Ingestion type | N/A | Remotely and automatically | Locally and automatically |
| Conceptual schema | FGDC CSDGM | Based on ECS Core | Based on ISO 19115 and ebRIM |
| Transfer protocol | Z39.50 and GEO profile | Proprietary and based on Web Service | OGC Catalog Service and HTTP binding |
| System distribution | Distributed | Centralized | Centralized |