
Survivability in Object
Services Architectures
A "survivable"
application can continue to function despite the
loss or degradation of some of its components,
will maintain its functionality and performance
for as long as possible, and will degrade
gracefully when this is no longer possible.
Survivability relies on redundancy to allow
normal operations to continue as long as
possible, the ability to reconfigure to correct
problems, and policies defining acceptable (but
less desirable) functionality or performance
should it prove impossible to maintain the
desired behavior.
We
are developing an architecture and Survivability
Service to make OSA-based distributed systems far
more survivable in the face of component failure
and degradation than is currently possible.
The architecture unifies a number of existing
robustness mechanisms and adds several new ones
to provide a variety of tools that can be applied
in different situations. Because of the
complexity of system-wide survivability, it is
impossible to have a "master plan" for
assuring survivability. Instead, we use
market mechanisms to create global survivability
as an emergent behavior resulting from a large
number of small, local decisions.
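As a rough illustration of how small, local decisions can add up to a system-wide allocation, the sketch below runs a sealed-bid auction per host. The names `Bid` and `allocate` and the highest-bid-wins rule are assumptions made for this example; the text does not specify the actual market mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    service: str   # hypothetical service name
    host: str      # host whose capacity is being bid on
    price: float   # what the service will pay for one slot (assumed unit)


def allocate(bids, capacity):
    """Award each host's slots to its highest bidders.

    `capacity` maps host name -> number of slots. Each host looks only
    at the bids addressed to it, so no component needs a global plan.
    """
    awards = {}
    for host, slots in capacity.items():
        local = sorted(
            (b for b in bids if b.host == host),
            key=lambda b: b.price,
            reverse=True,
        )
        awards[host] = [b.service for b in local[:slots]]
    return awards


bids = [
    Bid("map-server", "hostA", 5.0),
    Bid("logger", "hostA", 1.0),
    Bid("tracker", "hostA", 3.0),
    Bid("logger", "hostB", 1.0),
]
result = allocate(bids, {"hostA": 2, "hostB": 1})
```

Here the low-priority logger loses its bid on hostA but still wins a slot on hostB, so the overall placement emerges from two independent local auctions.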
Our approach maintains the
simplicity of OSA application development that
has been largely responsible for the popularity
of OSAs by not requiring individual applications
or services to be responsible for the details of
ensuring their own survivability. This is
necessary for three reasons: survivability is
difficult to program, so its development costs
should be amortized across many applications; the
survivability needs of different applications or
services often conflict; and survivability
requires more accurate knowledge of the
eventual deployment environment(s) than is
reasonable to expect at development time.
To achieve this, we make survivability orthogonal
to conventional OSA application semantics; in
other words survivability is "added" to
an application rather than built into it from the
start. This is done by a
"Survivability Service" that handles
the survivability needs of applications
collectively, responding to changes in workload,
resource requirements, resource availability, and
threats based on a number of environment models
that can be specified independently. A
consequence of making survivability orthogonal to
application functionality is that changing the
models (not the applications or services) allows
applications to be deployed into dynamically
changing or unanticipated environments.
This approach also supports the use of commercial
and government off-the-shelf (COTS and GOTS)
components that were not constructed for survivability.
The key to constructing
survivable systems is to configure them in such a
way that they can be easily reconfigured when
needed to survive loss of system resources.
We have extended and clarified the standard OSA
object model to create a survivable object
abstraction that makes it possible to define
a set of "survivable configurations"
that are able to withstand component loss and are
also capable of being systematically evolved into
new configurations should component loss become
severe. The abstraction provides ways to
change both the physical configuration (different
service placement or resource allocation) and the
logical configuration (service alternatives or
changed levels of service quality) of an
application. Developers use the abstraction to
specify, implement, and connect services.
The OSA Survivability Service manages
configurations defined in this abstraction to
keep them running as well as possible given the
currently available resources. The object
abstraction:
- makes a clean
distinction between the abstraction of a
service instance and its
implementation(s) in order to support
replication, instance migration, change
of implementation class for a given
service instance, and multiple
simultaneous implementation classes for a
given instance;
- abstracts the bindings
of clients to services and
implementations to resources in order to
allow an OSA Survivability Service to
determine which service instance best
meets the needs of a client, and how and
where that service instance should be
instantiated;
- defines useful
patterns of object configurations that
have desirable survivability properties;
- uses the concept of
quality of service (QoS) to allow
alternatives to both service bindings and
implementation instantiations in the
event resource limitations prevent
optimal behavior; and
- defines legal
transformations between legitimate
configurations.
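The first two properties of the abstraction can be sketched as follows. This is a minimal illustration only: the class names `ServiceInstance` and `Implementation`, their methods, and the example server classes are hypothetical and are not taken from the actual object model.

```python
from dataclasses import dataclass, field


@dataclass
class Implementation:
    impl_class: str  # hypothetical implementation class name
    host: str        # resource this implementation is bound to


@dataclass
class ServiceInstance:
    """Abstract service instance; clients hold this, never an Implementation."""
    name: str
    impls: list = field(default_factory=list)

    def replicate(self, impl_class, host):
        # Add another simultaneous implementation, possibly of a different class.
        self.impls.append(Implementation(impl_class, host))

    def migrate(self, old_host, new_host):
        # Rebind implementations from one resource to another.
        for impl in self.impls:
            if impl.host == old_host:
                impl.host = new_host

    def drop_host(self, host):
        # Model the loss of a resource and everything running on it.
        self.impls = [i for i in self.impls if i.host != host]

    def is_available(self):
        return bool(self.impls)


maps = ServiceInstance("maps")
maps.replicate("FastMapServer", "hostA")
maps.replicate("CompactMapServer", "hostB")  # second implementation class
maps.drop_host("hostA")                      # component loss
maps.migrate("hostB", "hostC")               # physical reconfiguration
```

Because clients refer only to the `maps` instance, replication, loss of a host, and migration all happen without any change on the client side.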
We believe that a key to
adding any kind of "extra-functional
behavior" such as security, persistence,
survivability, etc., is to have an object
abstraction with the right kind of
"translucent joints" where systems can
either be mediated or taken apart and reassembled
dynamically in different ways. A joint is a
well defined place where a binding between system
components may be made. In general, more
information about the binding than is common in
programming languages is maintained; this could
be a statement of requirements of any object that
can satisfy the binding, the provenance required,
information flow restrictions, QoS, etc.
Translucence means that the joint is visible if
desired in order to use its special properties,
but otherwise is invisible except possibly for a
small performance penalty. In fact, it is
often possible to reduce or completely eliminate
the performance penalty at the cost of more
complexity in changing the binding. Prior
examples of the use of such joints to add
behavior are persistence and transaction control
in Open OODB and the security in the OMG Object
Security Service.
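One way to picture a translucent joint is as a forwarding proxy that carries extra binding information and can be mediated or rebound at run time. The sketch below is an assumption-laden illustration: `Joint`, `mediate`, `rebind`, and the `MapServer` example are hypothetical names, and only method calls are forwarded.

```python
import functools


class Joint:
    """A binding point that records requirements and can be rebound or mediated."""

    def __init__(self, target, requirements=None):
        self._target = target
        self.requirements = dict(requirements or {})  # e.g. QoS, provenance
        self._mediators = []

    def mediate(self, wrapper):
        # Add behavior at the joint; wrapper(method_name, proceed) -> result.
        self._mediators.append(wrapper)

    def rebind(self, new_target):
        # Take the system apart and reassemble it with a different component.
        self._target = new_target

    def __getattr__(self, name):
        # Reached only for names not defined on the joint itself, so an
        # ordinary client calls methods as if it held the target directly.
        method = getattr(self._target, name)

        def call(*args, **kwargs):
            invoke = functools.partial(method, *args, **kwargs)
            for wrapper in self._mediators:
                invoke = functools.partial(wrapper, name, invoke)
            return invoke()

        return call


class MapServer:
    def __init__(self, resolution):
        self.resolution = resolution

    def lookup(self, area):
        return f"{area}@{self.resolution}"


log = []


def audit(name, proceed):
    log.append(name)   # extra-functional behavior added at the joint
    return proceed()


joint = Joint(MapServer("high"), requirements={"min_resolution": "low"})
first = joint.lookup("harbor")
joint.mediate(audit)
joint.rebind(MapServer("low"))
second = joint.lookup("harbor")
```

The client code (`joint.lookup(...)`) is identical before and after mediation and rebinding; the only cost of the joint is one extra level of indirection per call.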
We are specifying the architecture
of an OSA Survivability Service to manage
applications defined using the survivable object
abstraction. The architecture supports a wide
variety of survivability actions (below), is
compatible with existing OSAs and projected
trends (including the various repositories and
the CORBA Security Service), and encompasses a
wide variety of existing research in fault
tolerant systems, failure detectors, system
models, etc. We currently have an overall
architecture for the Survivability Service that
covers the "big picture" of how the
components relate, including an internal
partitioning that allows major subsystems to be
replaced or refined, possibly by third parties.
Survivability actions supported by the OSA
Survivability Service are:
- Basic Process
Control gives the ability to
start, stop and restart processes, to
clean up after failed or aborted
processes, and to restore processes to
known states. Most of this is
provided by ORBs.
- Fault Tolerant
Services are services designed to
(usually) fail in known "good"
ways. Their failure modes become
part of the service specification.
This must be provided by the service
developers.
- Failure Detection
& Classification are
mechanisms to detect the symptoms of
failures and attacks, and classify the
events into likely failure
categories. This can be done
through probes, wrappers, or exception
reports from well-behaved
services. We will obtain
these mechanisms from elsewhere.
- High Service
Availability mechanisms use
replication or hierarchical masking
(i.e., error handling in the client) to
make individual service instances much
more highly available than they would
otherwise be. We concentrate on
replication-based policies since they do
not rely on the semantics of the services
and are therefore more widely applicable.
Many replication-based policies exist and
some are integrated with ORBs. These
mechanisms make it possible to physically
reconfigure an application by changing
the way individual services are
implemented; the logical organization
remains fixed in that clients still
interact with the same services after any
reconfiguration.
- Availability
Management determines the appropriate
fault tolerance mechanism to use for a
given service based on service failure
modes and perceived threats, and
determines the resource pool needed to
achieve desired availability. This is
where much of our design and development
work has been done.
- Service
Renegotiation makes it
possible to change the logical
organization of an application by binding
clients to alternate services if the
desired service becomes unavailable
or degrades in performance. The
rebinding can be to an equivalent, but
distinct service (e.g., a different
server having the same maps), or to a
similar, but acceptable service (e.g., a
different server with maps of the same
area but at lower resolution).
Alternatively, the same service
connection can be maintained but at a
lower quality of service (e.g., more
errors or slower). In addition to
allowing rebinding to service
alternatives when services fail, service
renegotiation can represent a fallback
position if the costs of assuring service
availability become unacceptably
high. Service renegotiation
requires specifications of client-service
connections well beyond those currently
used in OSAs, and will be a main focus of
our project in the next year.
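The map-server example under Service Renegotiation can be sketched as a simple selection rule: prefer an available service at the desired quality, fall back to an acceptable but lower-quality one, and otherwise report failure. The `Offer` type, the `renegotiate` function, and the use of map resolution as the only quality measure are assumptions for illustration, not the project's specification of client-service connections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    server: str
    resolution_m: float  # metres per pixel; lower is better (assumed metric)
    available: bool


def renegotiate(offers, desired_m, acceptable_m):
    """Return (offer, kind): desired quality first, then an acceptable fallback."""
    live = [o for o in offers if o.available]
    equivalent = [o for o in live if o.resolution_m <= desired_m]
    if equivalent:
        return min(equivalent, key=lambda o: o.resolution_m), "equivalent"
    similar = [o for o in live if o.resolution_m <= acceptable_m]
    if similar:
        return min(similar, key=lambda o: o.resolution_m), "degraded"
    return None, "unavailable"


# The original server (maps-1) has failed; an equivalent one is still up.
offers = [
    Offer("maps-1", 1.0, False),
    Offer("maps-2", 1.0, True),
    Offer("maps-3", 10.0, True),
]
choice, kind = renegotiate(offers, desired_m=1.0, acceptable_m=10.0)

# Now the equivalent server fails too; only the low-resolution one remains.
offers_later = [
    Offer("maps-1", 1.0, False),
    Offer("maps-2", 1.0, False),
    Offer("maps-3", 10.0, True),
]
fallback, fallback_kind = renegotiate(offers_later, desired_m=1.0, acceptable_m=10.0)
```

The first call rebinds the client to an equivalent server with the same maps; the second accepts maps of the same area at lower resolution rather than losing the service entirely.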
The OSA Survivability
Service configures and reconfigures applications
using currently available resources in an attempt
to avoid known threats. It uses a collection
of environment models describing
resources, threats, and the overall situation in
determining what to do. At present, these models
are defined only roughly.
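To show how an environment model might drive a configuration decision, the sketch below derives a replica count from a threat model. It assumes, purely for illustration, that hosts fail independently with a known probability, so that n replicas give availability 1 - p^n; the function name, the threat levels, and the probabilities are all hypothetical.

```python
def replicas_needed(host_failure_prob, target_availability):
    """Smallest n with 1 - p**n >= target, assuming independent host failures."""
    if not 0 < host_failure_prob < 1:
        raise ValueError("host_failure_prob must be strictly between 0 and 1")
    if not 0 < target_availability < 1:
        raise ValueError("target_availability must be strictly between 0 and 1")
    n = 1
    while 1 - host_failure_prob ** n < target_availability:
        n += 1
    return n


# Hypothetical threat model: per-host failure probability by threat level.
THREAT_MODEL = {"normal": 0.1, "elevated": 0.3}

# The application is unchanged; only the model determines the resource pool.
plan = {
    level: replicas_needed(p, target_availability=0.995)
    for level, p in THREAT_MODEL.items()
}
```

Under these assumed numbers the same service needs three replicas in the normal case and five when the threat level is elevated, which is the sense in which changing the models, rather than the applications, changes the deployed configuration.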
We are building a
Survivability Service prototype, including a
market mechanism for resource allocation, simple
models and model evolution to drive survivability
decisions under changing conditions,
specifications of how to rebind logically
equivalent or similar services, and some
visualization. This will allow
demonstration of a cohesive part of the
Survivability Service by the middle of
1998. A concept demonstration of part of
it already exists.
We are interested in
attending this workshop in order to trade ideas
about object abstractions and joints, and to
contribute to a discussion of how different
behaviors applied at the same joint should be
allowed to interact.