

Object Services Architectures
Metadata Structures for Internet Services
Project Summary
September 15, 1998
Executive Summary
Today, the World
Wide Web is a global information repository of documents
primarily represented by syntactically structured HTML
tags and MIME extensions. These weak data models do not
provide the foundation for command and control situation
modeling or enterprise computing, or for a new generation
of tools to operate on a more semantically structured,
knowledge-based web. Richer base data model(s) are needed
that converge the benefits of emerging web structuring
mechanisms and distributed object services architectures.
The technical
objective of this project was to improve the foundation
for Web and object model integration. Several candidate
technologies from different communities exist -- for
instance, HTML and MIME from the Internet community;
Webserver from the DARPA JTF/ATD command and control
community; ORBs, IDL, Tagged Data from OMG; Java and
ActiveX from the component community; Harvest SOIF and
Netscape RDM from the search engine community; ODMG ODL
and Tsimmis OEM from the database community; Dublin Core,
Warwick Framework, and related work from the Internet
metadata and digital libraries communities; two different
specifications for document object models from Netscape
and Microsoft; and RDF and XML from W3C.
Our approach has
been to identify the main contender approaches, identify
any deficiencies in each approach, identify a convergence
approach centered around the use of XML as a basic
representation, fill in (some of) the gaps, and transfer
our results and lessons learned directly and
incrementally to DoD command and control projects like
DARPA Advanced Information Technology Services (AITS)
Architecture and industry standards organizations,
primarily W3C and OMG.
The results of our
work have been the identification of a key technical
framework for integrating Web and object technologies, a
number of Technical Reports and external publications
describing this approach, a prototype illustrating one of
the possible Web object construction mechanisms, and the
injection of ideas from this work into the activities of
OMG and W3C.
Problem Statement
The basic data structure in the Web
is HTML. It is generally recognized that HTML is too
simple to adequately support the requirements of the
increasingly-complex applications being developed with
the Web as a base, such as:
- applications that require the
Web client to function as the front-end to
enterprise applications or mediate between
multiple heterogeneous databases,
- applications that require more
flexibility in distributing processing load
between Web servers and clients, and
- applications that require the
Web client to present different views of the same
data to different users or in which intelligent
Web agents need to tailor information discovery
to the needs of individual users.
Proprietary HTML extensions have
been developed to address some of these problems, but
none deals with all of them, and together they create
barriers to interoperability. The same is true of the
proprietary data formats used by particular applications.
Their use requires specialized helper applications,
plug-ins, or Java applets, creating interoperability
problems and making it difficult to reuse that data in
different applications for new purposes. While use of
some specialized formats is necessary in particular
applications (e.g., multimedia), in many cases these
formats are used to address HTML deficiencies for
generalized document and data processing.
There is much ongoing work within
both the Web and database communities on data structure
enhancements to address these issues. Work on similar
issues is ongoing within the Object Management Group
(OMG) as well. This work has contributed valuable ideas,
and the various proposals share similar basic concepts,
generally a movement toward some form of simple object
model. However, these similarities are often obscured by
detailed representational differences, and the work is
fragmented and lacks a unifying framework. As a result,
individual proposals often lack key capabilities that are
in some cases contained in other proposals. Moreover, in
most cases these proposals are not well integrated with
key areas of industry consensus on emerging Web data
structuring technologies.
If the Internet is to develop to
support advanced application requirements, there is a
need for both richer individual data structuring
mechanisms, and a unifying overall framework which
supports heterogeneous representations and extensibility
and provides metalevel concepts for describing and
integrating them.
Objective
The technology
objective of the OSA/metadata project was to define a unifying framework for Web data
structuring representations that
- is extensible,
- supports integration of
multiple types,
- supports the requirements of
metadata, annotations, and database applications,
- is based on emerging industry
technologies such as XML, RDF, Dynamic HTML, and DARPA I*3 database
work including OEM,
- provides a formal model for
database-like operations on these structures, and
- provides a query language
based on the formal model.
Our technology
transfer objectives were
- to transfer
results of our work to the DARPA AITS command and
control projects, and
- to converge
these efforts by working with W3C and OMG.
Approach
Our approach was to build on and
unify key proposals for richer Internet data structuring
and metadata mechanisms, including XML, RDF, Dynamic
HTML, OEM and related work, based on an analysis of
underlying common principles. These mechanisms will be
extended with specific metalevel concepts that reify
(make first-class) components of the data structure,
allowing them to be self-describing and to better
integrate code and data. Some of these proposals already
include limited steps in this direction. By building on
emerging efforts, the work will be grounded in technology
that already has considerable support from major Internet
technology providers such as Microsoft, Netscape, and
Sun. Use of XML, which is a subset of SGML, also links
this technology to the use of SGML within the government
(e.g., CALS).
We also identified a potential
formal basis for applying database operations, such as
query and view operators, to the resulting structures,
based on object logics such as F-logic. These logics
provide limited second-order capabilities for dealing
with the metalevel concepts, while using first-order
semantics, which provides for computational efficiency
and tractability.
In addition, in an effort to push
the integration of the technologies we have identified, we made our results
available to DARPA AITS architecture projects, published
technical reports and papers,
and made our results available to W3C and OMG, both
via published results and presentations and via
participation in the technical activities of these
groups.
This work, which
integrates data, metadata, and object capabilities from
both database and key emerging Web technologies, will be
crucial in integrating object service, Web, and database
technologies in a deep and efficient manner to support
increasingly-demanding enterprise-scale applications.
Limitations of Related Work
The Internet and Web communities
have developed a number of "object models" or
data structuring principles to represent semistructured
data. The database community has developed proposals for
"lightweight object models," partly driven by
attempts to represent metadata for Web resources. All
this work has contributed valuable ideas and, taken as a
whole, exhibits important common underlying principles
based on the use of tagged data items or attribute/value
pairs. However, the individual proposals lack important
capabilities that are often contained in other proposals.
What is required is that this work be integrated and the
best ideas merged. The following paragraphs describe work
that is most directly relevant to this effort (other
work, such as Harvest's SOIF, is also relevant).
The World Wide Web Consortium (W3C)
Resource Description Framework (RDF) effort
<http://www.w3.org/Metadata/RDF/> extends the PICS
technology for labeling Internet content to support more
general metadata requirements. Related work includes
Netscape's Meta Content Framework (MCF)
<http://www.w3.org/TR/NOTE-MCF-XML/> and
Microsoft's XML-Data
<http://www.w3.org/TR/1998/NOTE-XML-data>. These
efforts define what are effectively metadata type
systems, based on collections of attribute/value pairs.
They provide a core of good ideas for supporting
metadata, such as explicit links from pages representing
resources to metadata describing them. However, there are
important differences among the various approaches, and
the approaches are not integrated with other parallel
work, such as the Document Object Model (see below).
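The idea of metadata as attribute/value pairs, reached through an explicit link from the resource it describes, can be sketched as follows. This is only an illustration of the general principle and not RDF, MCF, or XML-Data syntax; the class names, the describe function, and the example.org metadata URL are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """Metadata about one resource, held as attribute/value pairs."""
    about: str                      # URL of the resource being described
    properties: dict = field(default_factory=dict)


@dataclass
class Page:
    """A Web resource that carries explicit links to its metadata."""
    url: str
    metadata_links: list = field(default_factory=list)


def describe(page, store):
    """Follow the page's metadata links and merge the pairs found there."""
    merged = {}
    for link in page.metadata_links:
        description = store[link]
        if description.about == page.url:   # ignore metadata about other resources
            merged.update(description.properties)
    return merged


# Hypothetical metadata store; the key is an invented metadata URL.
store = {
    "http://example.org/meta/wom": Description(
        about="http://www.objs.com/OSA/wom.htm",
        properties={"title": "Towards a Web Object Model",
                    "type": "Technical Report"},
    )
}
page = Page("http://www.objs.com/OSA/wom.htm", ["http://example.org/meta/wom"])
summary = describe(page, store)
print(summary)
```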
The RDF and related work define
mappings to the Extensible Markup Language (XML) <http://www.w3.org/XML/>, a W3C
Recommendation (adopted specification). XML, which is a
subset of SGML, allows creation of customized markup
languages incorporating user-defined tags and a
standardized way of describing those languages (DTDs)
that can be understood by generalized clients. XML thus
provides direct support for using tagged data items
(attribute/value pairs) in Web resources, as opposed to
the current need to use ad hoc encodings of data items in
terms of HTML tags. XML DTDs are similar in some ways to
database schemas, and thus provide a natural target for
database information. The linking of resources with their
DTDs is similar to the association of a database record
with its schema type, and to the association of an object
with its type or class definition. The hypertext linking
capabilities of XML are greater than those of HTML,
including bidirectional and multiway links, and links to
spans of text. Work is also underway on tying XML to
Java. XML has considerable industry support (e.g., both
Netscape and Microsoft). However, XML provides only basic
tagged value support. Additional concepts must be added
to apply it to extended data and metadata structuring
requirements (as illustrated by RDF and related efforts).
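As a minimal sketch of what direct support for tagged data items means in practice, the following reads a small XML document and returns its items as tag/value pairs, using Python's standard xml.etree.ElementTree module. The element names and the tagged_items helper are invented for illustration; no DTD is involved, since this parser does not validate.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: user-defined tags name each data item directly,
# instead of encoding the items in presentation-oriented HTML tags.
DOC = """<document>
  <title>Project Summary</title>
  <date>1998-09-15</date>
  <format>text/html</format>
</document>"""


def tagged_items(xml_text):
    """Return the children of the root element as (tag, value) pairs."""
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


pairs = tagged_items(DOC)
print(pairs)
```

A generalized client needs no knowledge of this particular markup language to extract the pairs, which is the contrast with ad hoc HTML encodings drawn above.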
W3C's Document
Object Model (DOM) effort
<http://www.w3.org/DOM/>, based on Dynamic HTML
facilities defined by Microsoft and Netscape, extends
HTML with an object model allowing scripts or programs to
change styles and attributes of page elements (or
objects) or even to replace existing elements (or
objects) with new ones. This provides a basic way to
integrate a page's data with code in the page and
provides an explicit metalevel and API. Current W3C
specifications provide a DOM for XML as well as for HTML.
However, as currently defined, these capabilities are not
sufficiently tailorable or general. For example, current
specifications lack support for integrating code not
co-located on the page (e.g., code that already exists on
the client) or for defining application-specific objects
based on data on the page, and the work is currently not
integrated with metadata work such as RDF.
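The two basic operations described here, changing an attribute of a page element and replacing an element with a new one, can be sketched with Python's xml.dom.minidom, which implements a subset of the W3C DOM interfaces. The page content below is a made-up example, and the code runs outside any browser, so it illustrates the API only and not in-page scripting.

```python
from xml.dom.minidom import parseString

# Hypothetical page fragment used only for illustration.
PAGE = '<page><p id="status" class="normal">Draft</p></page>'

dom = parseString(PAGE)
para = dom.getElementsByTagName("p")[0]

# 1. Change an attribute of an existing element.
para.setAttribute("class", "highlight")
changed_class = para.getAttribute("class")

# 2. Replace the existing element with a newly created one.
new_para = dom.createElement("p")
new_para.setAttribute("id", "status")
new_para.appendChild(dom.createTextNode("Final"))
para.parentNode.replaceChild(new_para, para)

result = dom.documentElement.toxml()
print(result)
```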
Stanford's Tsimmis Object Exchange
Model (OEM) and related work by others (e.g., U. Penn.)
have also based metadata models on collections of
attribute/value pairs, together with extensions such as
reifying individual attributes by assigning identifiers
to them. This work provides a valuable core of ideas for
applying database concepts to this type of data. However,
the metadata capabilities of these structures are
somewhat limited. They do not explicitly consider
capturing type and schema information where it exists, or
linking that type information to the structures it
describes. The work is also not well integrated with
emerging Web technologies such as XML, DOM, and RDF that
are likely to change the basic nature of the Web's
representation. Finally, these database approaches have
so far assumed that the problem they address is to query
largely syntactically structured text bases of the kind
supported by HTML, which in part explains their limited
technical success. XML-based approaches provide a
higher-level, more semantic representational structure,
and can start from the assumption that information
authors themselves have support for providing more
semantic structure information.
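The notion of reifying individual attributes by assigning identifiers to them can be illustrated with a small structure in which every labeled item, at any nesting level, carries its own identifier. This is a sketch in the spirit of the attribute/value models described above, not the Tsimmis OEM definition itself; the Item class, its field names, and index_by_oid are assumptions made for the example.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Union

_next_oid = itertools.count(1)


@dataclass
class Item:
    """One labeled value; the value is atomic or a list of nested items."""
    label: str
    value: Union[str, int, List["Item"]]
    # Each item gets its own identifier, so it can be referenced directly.
    oid: int = field(default_factory=lambda: next(_next_oid))


def index_by_oid(item, index=None):
    """Build an identifier -> item table covering every nesting level."""
    index = {} if index is None else index
    index[item.oid] = item
    if isinstance(item.value, list):
        for child in item.value:
            index_by_oid(child, index)
    return index


name = Item("name", "Project Summary")
year = Item("year", 1998)
doc = Item("document", [name, year])
index = index_by_oid(doc)
print(sorted(index))
```

Because each attribute has an identifier, other structures (for example, annotations) can point at a single attribute rather than at the whole record.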
Finally, the OMG has identified a
number of requirements similar to those found in the
context of the Web. An example is a recent Tagged Data
RFP. These requirements involve the use of tagged data
items to support semantics-based information exchange
between applications, and also support for nesting and
the ability to locate objects via tags through layers of
nesting. Such high-level communication is considered
important in OMG's attempts to define Business Object
capabilities. OMG's Property Service provides similar
capabilities. These are of interest in showing the
recognized need for data organizations, similar to those
described above, within OMG's object-oriented distributed
architecture. However, these are not yet fully coordinated with emerging
Web or database representations.
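Locating items via tags through layers of nesting can be sketched as a recursive search over nested tagged data. The message layout and the find_tag function below are hypothetical and are not taken from the OMG Tagged Data RFP or the Property Service.

```python
def find_tag(data, tag):
    """Yield every value stored under `tag`, at any nesting depth."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == tag:
                yield value
            # Keep descending: the same tag may recur at deeper levels.
            yield from find_tag(value, tag)
    elif isinstance(data, list):
        for element in data:
            yield from find_tag(element, tag)


# Hypothetical business message exchanged between two applications.
message = {
    "order": {
        "id": "A-17",
        "customer": {"name": "Example Corp", "id": "C-3"},
        "lines": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 1}],
    }
}

skus = list(find_tag(message, "sku"))
ids = list(find_tag(message, "id"))
print(skus, ids)
```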
Results
The results of our
work have been the identification of a key technical
framework for integrating Web and object technologies, a
number of Technical Reports and external publications
describing this approach, a prototype illustrating one of
the possible Web object construction mechanisms, and the
injection of ideas from this work into the activities of
OMG and W3C. Specifically:
We completed a
Technical Report Towards
a Web Object Model <http://www.objs.com/OSA/wom.htm>. This report:
- described key examples of
existing work from the Web, database, and OMG
communities (including those mentioned above)
that contribute both ideas and technology toward
providing the components of a Web object model
- identified some key underlying
principles behind this work
- identified a framework which
allows this work to be unified and extended to
support the requirements of advanced Web
applications for object technology
In particular, the report described
how a number of (in some respects) separate
"threads" of development in the Web community
could be combined to form the basis of a Web object model
to address requirements for enhanced Web capabilities.
This combination was based on the observation that the
fundamental components of any object model are:
- data structures that can
represent object state
- ways to associate behavior (object
methods) with the object state
- ways for the object methods to
access and operate on that state
Extending this idea to the Web
environment, the idea is that Web pages can be considered
as state, and objects can be constructed by enhancing
those pages with additional metadata that allows the
pages to be considered as objects in some object model.
In particular, Web pages can be enhanced with metadata
consisting of programs that act as object methods with
respect to the "state" represented by the Web
page. The report also identified key Web technologies to
support the object model components we identified. We
also presented this material at the OMG-DARPA Workshop on Compositional Software
Architectures, and to the
OMG's Internet and GIS Special Interest Groups.
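The three components listed above (state, associated behavior, and access from the behavior to the state) can be sketched by treating a parsed page as state and attaching functions that act as its methods. This is an illustrative sketch only; the WebObject class, the method names, and the page markup are assumptions, and in the approach described the methods would be identified by metadata on the page rather than passed in by the caller.

```python
import xml.etree.ElementTree as ET


class WebObject:
    """A page (state) plus a table of functions acting as its methods."""

    def __init__(self, page_xml, methods):
        self.state = ET.fromstring(page_xml)   # the page is the object state
        self._methods = dict(methods)          # behavior associated with it

    def invoke(self, name, *args):
        # Each method receives the state, so it can access and modify it.
        return self._methods[name](self.state, *args)


def get_title(state):
    return state.findtext("title")


def set_title(state, new_title):
    state.find("title").text = new_title


page = WebObject(
    "<page><title>Draft</title><body>text</body></page>",
    {"getTitle": get_title, "setTitle": set_title},
)
before = page.invoke("getTitle")
page.invoke("setTitle", "Final")
after = page.invoke("getTitle")
print(before, after)
```

Keeping the method table separate from the page keeps data and processing loosely coupled: the same page can be given different behavior by different applications.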
This Technical Report has been
widely read on the Web. As a result, we were asked to
write the following invited papers:
- F. Manola, Towards A Richer Web Object Model, ACM SIGMOD Record 27(1), March
1998, 76-80,
<http://www.acm.org/sigmod/sigmod_record>
and
- F. Manola, Key Technologies for a Web Object Model (tentative title), to appear as the
lead article in a special issue of IEEE Internet
Computing on Web Object Models, Jan./Feb. 1999,
<http://www.objs.com/survey/wom-ieee.htm>.
We also completed a Technical
Report Some Web Object Model Construction
Technologies <http://www.objs.com/OSA/wom-II.htm>. This report provides further details about
a number of specific technologies that will be important
in building Web objects. In particular, it:
- adds more detail to the
overall approach to constructing objects in the
Web introduced in the earlier Technical Report,
and discusses general considerations for Web
object model design.
- describes a number of
technologies developed (or under development) in
the context of the Web that provide parts of the
mechanisms required to construct Web objects.
- discusses potential
applications for Web objects constructed
according to these techniques, and discusses how
to construct objects in several "real"
object models (e.g., OMG IDL, Java, and
JavaScript) using these mechanisms.
- presents general conclusions
to be derived from these technologies.
This latest report shows that the
approach described in our initial Technical Report is
definitely viable. Considerable work is in progress on
technologies that are relevant to Web object
construction, and there are numerous alternative
technologies becoming available to address the various
parts of the Web object construction problem we have
identified. However, further work is required to sort out
the various alternative approaches, and integrate the
most promising ones into one, or possibly more, workable
combinations.
In conjunction with the
OSA/Intermediary Architecture subproject, we also
developed a prototype of an extended XML parser which can
generate application-specific objects from XML documents,
in order to experiment with one form of Web object
construction mechanism. This prototype uses XML-defined
metadata added to XML documents to define associations
between object classes and the XML elements in the
document. A White Paper
<http://www.objs.com/OSA/XML-to-Java-Mapping.html>
describing this work was also completed.
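The general idea of generating application-specific objects from an XML document, driven by metadata in the document that associates elements with classes, can be sketched as follows. This is not the prototype itself, which is described in the White Paper cited above; the bindings and bind elements, the class registry, and the Person and Address classes are all invented for this example.

```python
import xml.etree.ElementTree as ET


class Person:
    def __init__(self, element):
        self.name = element.findtext("name")


class Address:
    def __init__(self, element):
        self.city = element.findtext("city")


# Classes the application is willing to instantiate, looked up by name.
CLASSES = {"Person": Person, "Address": Address}

# Hypothetical document: the <bindings> section is the added metadata.
DOC = """<doc>
  <bindings>
    <bind element="person" class="Person"/>
    <bind element="address" class="Address"/>
  </bindings>
  <person><name>Ada</name></person>
  <address><city>Dallas</city></address>
</doc>"""


def build_objects(xml_text, classes):
    """Instantiate the bound class for each top-level element that has one."""
    root = ET.fromstring(xml_text)
    binding = {b.get("element"): classes[b.get("class")]
               for b in root.iter("bind")}
    return [binding[el.tag](el) for el in root if el.tag in binding]


objects = build_objects(DOC, CLASSES)
print([type(obj).__name__ for obj in objects])
```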
We also helped form a Web/OMA
Integration Working Group of the OMG Internet SIG <http://www.objs.com/isig/home.htm>,
with the general goals of:
- identifying the relationships
(and overlaps) between specifications being
developed in the Web and OMG communities, and
reducing unnecessary incompatibilities
- examining applications that
use combinations of OMG and Web technologies,
determining technology shortfalls, and
recommending solution approaches
We also participated in the
activities of the OMG's Object and Reference Model
Subcommittee, which is working to identify and clarify
OMG's next-generation object model concepts, and in the
activities of several other OMG groups that are beginning
to look at Web technologies such as XML.
Our participation in W3C activities
has been only moderate (although OBJS is a W3C member),
but we have submitted input on coordinating the various
W3C metadata-related activities, and participated in
technical interchanges on W3C-related email lists.
Lessons Learned
There are numerous
"threads" of Web technology development,
including such things as scripting languages, stylesheets
and other presentation facilities, addressing mechanisms
(URLs, XLL), data representations (HTML, XML, MIME
types), and protocols. The more complex applications
currently being envisioned for the Web require that these
threads be combined in complex ways, often exposing both
similarities among technologies previously perceived as
separate, and unexpected technical gaps. The need to
consider new technology combinations mirrors the
need to consider new application combinations
which integrate aspects of document processing,
conventional Web processing, database capabilities, and
distributed object architectures. Both the requirements
of these new application combinations and the technology
combinations needed to address them need to be much
better understood than they are now. In particular,
applications of a merger between Web and object
technologies are still being clarified. While it is easy
to hypothesize about how such merged technologies might
be used, concrete matching of hard requirements to actual
capabilities is at a very early stage. Much of this work
is still driven by "technology push".
There is a need to better
understand how standards for defining representations,
such as XML, and standards for defining interfaces,
such as CORBA, can be used together in providing enhanced
interoperability. Distributed object architectures such
as CORBA have tended to emphasize interface standards,
while the Internet has tended to emphasize representation
standards. However, the two approaches are clearly
complementary, examples being the role of IIOP in
providing CORBA interoperability, and the role of the DOM
(essentially a set of interfaces) in providing a means to
add behavior to Web pages. Moreover, the two forms of
standards will increasingly be used together as, for
example, CORBA-based systems increasingly deal with data
in domain-specific standard representations.
The concept of "objects"
in the context of the Web should not necessarily be
identical to that of "objects" in a programming
language or conventional distributed object system. The
Web generally supports a philosophy of
"loose-coupling" (e.g., of data and
processing), which makes it highly flexible. This
essential flexibility should be preserved in the Web's
further technical development, given the diversity and
heterogeneity both of the applications the Web must
support, and the data and processing resources the Web
makes available for possible integration. This means,
among other things, that technology integration must be
modular, and it must be possible to easily alter
connections between data and processing resources to
adapt to new requirements. The general approach we have
identified attempts to take these requirements into
consideration.
The Web's standards process is in
many respects still maturing. The W3C has made tremendous
progress, and done some outstanding technical work, but
the incorporation of this work into widely-available
commercial products is somewhat spotty. This is partly
because demand for standards compliance still lags behind
the demand for new features. The increasing use of
the Web for larger-scale and enterprise-critical
applications will create much of the required pressure
for standards compliance.
Next Steps
Additional work is needed within
W3C on integrating XML, DOM, and the other technologies
we have identified, along the lines identified in our
framework, to support a full integration of Web and
object capabilities. Corresponding work is required
within OMG. At the same time, additional work is needed
to better understand the applications made possible by
such an integration of Web and object capabilities.
Another obvious next step is the
development of database-like capabilities based on Web
technologies such as XML, RDF, and DOM. We had originally
intended to work on such capabilities (in particular,
query facilities) in this project. However, we did not
pursue this activity due to a decision to concentrate on
Web/object integration, as providing the basic foundation
for this and other work. The database community has
defined extended query facilities (e.g., Lorel, UnQL) to
support their semistructured data representations. The
database community has also developed query facilities,
together with formal underpinnings, for SGML structures
(e.g., OQL-doc). Developments of this type of technology
have begun to address Web requirements, e.g., the recent XML-QL submission to W3C <http://www.w3.org/TR/NOTE-xml-ql>,
but further work is required in this area. Query-like
capabilities also play important roles in both formatting
specifications (a limited query notation for identifying
parts of SGML structures called SDQL exists within the
ISO DSSSL standard for formatting SGML documents, and a
similar notation exists within XSL) and more advanced Web
addressing mechanisms (e.g., the XML linking
capabilities). The possible integration of these query
capabilities is worth investigating.
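As a rough illustration of what a database-like selection over XML looks like, the sketch below uses the small path language built into Python's xml.etree.ElementTree. That path language is a stand-in chosen for convenience and is none of the query languages named above (Lorel, UnQL, OQL-doc, XML-QL, SDQL); the catalog markup and the titles_of function are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the markup is invented for illustration.
CATALOG = """<catalog>
  <report kind="technical"><title>Towards a Web Object Model</title></report>
  <report kind="technical"><title>Some Web Object Model Construction Technologies</title></report>
  <report kind="paper"><title>Towards A Richer Web Object Model</title></report>
</catalog>"""


def titles_of(kind):
    """Select the titles of all reports whose `kind` attribute matches."""
    root = ET.fromstring(CATALOG)
    # Path expression: any <report> with the given attribute, then its <title>.
    matches = root.findall(f".//report[@kind='{kind}']/title")
    return [title.text for title in matches]


technical = titles_of("technical")
print(technical)
```

The same expression both addresses parts of the structure and filters them, which hints at why query, formatting, and linking notations overlap as noted above.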
Impact
The Internet today supports a wide
variety of data structuring mechanisms, such as HTML,
MIME, and many
existing and proposed metadata formats (e.g., SOIF, PICS,
Warwick Framework). These representations were developed independently for various
specialized purposes. The limitations and lack of
integration of these mechanisms increasingly create
problems in developing advanced Web applications and in
providing advanced services for these applications. These
problems are particularly evident in applications which
require the Web to support rich structures of data,
metadata (data about data), and behavior, e.g., where
multiple users, not just authors, contribute to a
"knowledge base" of hyperlinked information
including new information, information which
comments on or amplifies existing information, and
processes (whether in the form of application programs,
workflows, agents, or other forms) which act on this
information. If we can succeed in extending the Web with
object capabilities, it should be possible to not only
deal with the problem of representing all this
information, but also to add OMG-like object services and
database-like functionality required for managing that
information.
This project has identified the
foundational basis for supporting more complex data
structures and services in the Internet without requiring
major departures from current emerging Web technology.
The work also provides guidance toward rationalizing
further developments within the Web and OMG communities
for better integrating Web and object technologies.
A program
immediately benefiting from this project would be the
DARPA AITS Architecture project, especially its Webserver
component, since the approach we have identified is
expected to provide the benefits of the current
idiosyncratic Webserver architecture but in a form
compatible with emerging industry standards. More
broadly, our approach provides a sound direction for
combining Web and object technologies into a richer
knowledge-based representation, which should benefit both
a knowledge-based Web and enterprise computing.