Requirements for Versioning Tool

Tool Requirements

Interoperability and compatibility

The DIP ontology versioning tool will be integrated in the suite of DIP tools, and as such should be able to communicate with these. More importantly, it can benefit from the features offered by other tools in the ontology management suite. With this requirement this tool contributes to the overall interoperability and compatibility requirement for the tool suite. Deliverable 4, Section 2 situates the versioning tool in the DIP framework.

Genericity

The versioning principles should not be tied to one particular ontology language, but should provide a solution for as many ontology languages and paradigms as possible. A fully model-independent versioning framework and tool might be utopian, but by working in this direction we avoid favouring any single language.

An example of this is WSMO studio [todo ref], which supports conversions to and from various ontology languages. The versioning tool can utilize this feature to support interoperability and compatibility between versions, even if they are potentially written in different languages.

The ontology language adopted here is WSMO Core. The storage will be supported by ORDI.

Functional Requirements

In this section we elaborate on the functional requirements from a research point of view. For a detailed implementation priority list with respect to these requirements we refer to Section 4.

Version Space Representation

The set of all versions stored in the repository can be represented by a cloud of nodes in a version space. The change specification (in fact a transformation) between two versions is represented by a directed edge connecting the respective version nodes. This results in a graph of versions. Some transformations are reversible, resulting in undirected edges (Figure 1).

Figure 1: An illustration adopted from [De Leenheer, 2004] of an arbitrary set of possible ontologies Ωkl. The dashed and solid arrows between the ontologies reflect transformations; each ontology is trivially accessible from itself so reflexive arrows are left implicit.

In general, we thus consider two types of first-class citizens in the repository: (i) versions (Requirement R4) and (ii) their inter-relationships, or transformations (Requirement R7). Furthermore, each version has (a) a unique identification (Requirement R5) and (b) an additional meta-information block (Requirement R6).

This version space representation gives the user the opportunity to revert changes, or to return to a previous version of the ontology, either for the limited duration of a session or permanently. It serves first as a conceptual overview of the requirements elaborated in the following subsections, and second as a possible visualisation of the version space.
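The version space described above can be sketched as a small graph structure. This is a minimal illustration only: the class name VersionSpace, its methods and the version labels are hypothetical and are not part of any DIP tool. A reversible transformation is modelled as a pair of opposite directed edges.

```python
from dataclasses import dataclass, field


@dataclass
class VersionSpace:
    """Versions are nodes; transformations are directed, labelled edges."""
    versions: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (source, target) -> label

    def add_version(self, version_id):
        self.versions.add(version_id)

    def add_transformation(self, source, target, label, reversible=False):
        if source not in self.versions or target not in self.versions:
            raise KeyError("both versions must be stored first")
        self.edges[(source, target)] = label
        if reversible:
            # A reversible transformation behaves like an undirected edge.
            self.edges[(target, source)] = "inverse of " + label

    def reachable_from(self, start):
        """Return every version reachable from start, including start itself."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for (src, dst) in self.edges:
                if src == node and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


space = VersionSpace()
for v in ("v1", "v2", "v3"):
    space.add_version(v)
space.add_transformation("v1", "v2", "add concept", reversible=True)
space.add_transformation("v2", "v3", "drop concept")
```

In this sketch a user at v2 can revert to v1 because the edge is reversible, whereas no stored transformation leads back from v3.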

Versioning Strategy

Trivially, a user must be able to store his version of an ontology for later retrieval. From this requirement it should be clear how we will physically store a version. Principles from classic versioning mechanisms such as CVS [Berliner, 1990] and Subversion [todo - Collins-Sussman] can be adopted here.

Granularity of versions

Decisions have to be made about what will be stored as a new version. Several alternative levels of granularity are possible. We adopt and present here some principles from data schema versioning [todo - Andany et al., 1991]:

  • store whole ontology definition where the revision took place (lowest granularity);
  • store only the parts, or more appropriately the views, that were revised in the ontology;
  • store only the revised knowledge elements (highest granularity);

or even:

  • store only the formal change specification. The desired version is derived from the initial version by applying the consecutive stored transformations.

The granularity will only be increased if performance or storage problems manifest themselves.
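The last alternative, storing only the change specification, can be illustrated with a minimal sketch. We assume here, purely for illustration, that an ontology is a set of concept names and that a stored transformation is a function from one such set to the next; the helper names are hypothetical.

```python
def add_concept(name):
    """Return a transformation that adds one concept."""
    return lambda ontology: ontology | {name}


def drop_concept(name):
    """Return a transformation that drops one concept."""
    return lambda ontology: ontology - {name}


def derive_version(initial, change_log, n):
    """Apply the first n stored transformations to the initial version."""
    ontology = frozenset(initial)
    for change in change_log[:n]:
        ontology = change(ontology)
    return ontology


initial = {"Person", "Address"}
change_log = [add_concept("Company"), drop_concept("Address")]

latest = derive_version(initial, change_log, len(change_log))
```

Only the initial version and the log are stored; any intermediate version is recomputed on demand, which trades storage for derivation time.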

Authorization for revision

Permissions and restrictions for editing ontology versions should be provided, and supported by the underlying storage layer. If ORDI is chosen as storage layer, this is done automatically.

Version Status Identification

Ontology versions are given different labels, describing the state they are in. We identify essentially three:

  1. working version: the version is not stable, finished or closed: it is fully revisable in all its aspects;
  2. stable version: the version is complete and useful; versioning is possible;
  3. final version: the ontology has been agreed on and cannot be versioned or deleted ever again.

Naturally, administrator privileges are not subject to these restrictions.

The transformation transaction control (Requirement R8) must be aware of this requirement.
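The three status labels can be read as a small permission table. The following sketch is one possible reading, not a specification: the action names "revise", "version" and "delete" are hypothetical, and allowing "delete" on a stable version is our own assumption, inferred from the fact that only final versions are explicitly protected against deletion.

```python
from enum import Enum


class Status(Enum):
    WORKING = "working"
    STABLE = "stable"
    FINAL = "final"


# Actions each status permits for ordinary users (an assumed reading of the
# three labels; "delete" for stable versions is our interpretation).
ALLOWED_ACTIONS = {
    Status.WORKING: {"revise", "version", "delete"},
    Status.STABLE: {"version", "delete"},
    Status.FINAL: set(),
}


def is_allowed(status, action, is_admin=False):
    """Administrators bypass the status restrictions."""
    return is_admin or action in ALLOWED_ACTIONS[status]
```

A transaction controller could call is_allowed before accepting any transformation on a given version.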

Version Identification

For each version stored in the system, there should be a persistent and unique identifier. This identifier is generated by an identification mechanism that conforms to Requirement R4. This means that the identification mechanism must be able to identify different versions of multiple ontologies, as well as versions of knowledge elements such as concepts.

Again we can refer to existing principles from software versioning [todo - Brown], and related work [todo - Klein et al.].

Meta-information about versions

Each version has important meta-information that is not necessarily meant for computer processing, but rather for guiding the human ontology engineer in his versioning process. Apart from other potential elements, the following elements are essential:

  • author(s);
  • date and time of creation;
  • informal comment on and arguments why the changes were made.

A more advanced approach is to model this information according to a (versioning) ontology; some basic concepts of such an ontology have been highlighted by [todo - Klein]. However, the relevance and added value of such an extra elaboration should be considered first.
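As a minimal sketch, the essential meta-information block could be held in an immutable record. The class name VersionMetadata, its field names and the example values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VersionMetadata:
    """The essential meta-information attached to one stored version."""
    authors: tuple      # one or more author names
    created: datetime   # date and time of creation
    comment: str        # informal comment and arguments for the changes


# Hypothetical example values.
meta = VersionMetadata(
    authors=("A. Engineer",),
    created=datetime(2004, 12, 1, 12, 0, tzinfo=timezone.utc),
    comment="Dropped an unused concept after community agreement.",
)
```

Making the record immutable reflects the idea that meta-information describes a stored version and should not change silently afterwards.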

Formal Transformation Specification

The evolution process between two ontology versions should be formally specified. [Klein 2004] provides different complementary alternatives (change logs, conceptual relations, and transformation sets) that can give a rich description of the change that the original ontology has undergone. We first discuss more basic needs.

A formal change specification describes unambiguously and correctly how exactly an evolution process of one ontology version into another one occurred. In the simplest case, an evolution process is a transformation described as a sequence of elementary transformations applied to a particular ontology.

Elementary transformations

Determining a set of possible elementary change operators goes hand in hand with determining which knowledge elements can evolve in an ontological definition, where the latter depends on the chosen ontology paradigm. We have stated in Requirement R2 that our versioning framework should be model-independent, so we assume there exists a finite set of atomic change operators, and that this set is available.

[todo - Banerjee et al., 1987] presents a taxonomy of change operators that can be applied to the ORION object-oriented data model. Other researchers have done likewise for other paradigms such as relational data schemas [e.g., todo - Roddick, 1993], conceptual data schemas [e.g., De Troyer, 1993; Halpin, 1989], etc.

As an illustration, consider the RDFS model, where the evolvable knowledge elements are classes, slots, constraints, etc. This would result in, respectively, add class, drop class; add slot, drop slot; add constraint, drop constraint, etc. In general, when defining a taxonomy we can structure mutators in at least three categories: (i) for the specialisation/generalisation hierarchy, (ii) for the concept definitions, and possibly (iii) for the instance data. All ontology models should recognize these categories.

Basically, this set of operators must not restrict the possible transformations. In other words, it should be sound and complete.
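A handful of atomic operators of the kind listed above can be sketched as follows. We assume, for illustration only, that an ontology is a dictionary mapping class names to sets of slot names; the function names mirror the operators named in the text, and the precondition checks are our own addition. Each operator returns a new ontology and leaves its input untouched.

```python
import copy


def add_class(ontology, name):
    """Atomic change: add an empty class. Fails if the class already exists."""
    if name in ontology:
        raise ValueError(f"class {name!r} already exists")
    updated = copy.deepcopy(ontology)
    updated[name] = set()
    return updated


def drop_class(ontology, name):
    """Atomic change: drop a class together with its slots."""
    if name not in ontology:
        raise ValueError(f"class {name!r} does not exist")
    updated = copy.deepcopy(ontology)
    del updated[name]
    return updated


def add_slot(ontology, cls, slot):
    """Atomic change: add a slot to an existing class."""
    if cls not in ontology:
        raise ValueError(f"class {cls!r} does not exist")
    if slot in ontology[cls]:
        raise ValueError(f"slot {slot!r} already exists in {cls!r}")
    updated = copy.deepcopy(ontology)
    updated[cls].add(slot)
    return updated


def drop_slot(ontology, cls, slot):
    """Atomic change: drop a slot from an existing class."""
    if slot not in ontology.get(cls, set()):
        raise ValueError(f"slot {slot!r} does not exist in {cls!r}")
    updated = copy.deepcopy(ontology)
    updated[cls].remove(slot)
    return updated


v0 = {}
v1 = add_slot(add_class(v0, "Person"), "Person", "name")
v2 = drop_slot(v1, "Person", "name")
```

Because every operator returns a fresh copy, v0, v1 and v2 coexist as separate versions.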

More complex transformations

Next to the hard requirement for a finite, sound and complete set of atomic transformations (Requirement R7.1), we also require support for more complex changes. Complex changes allow the user to express his/her intent in a more meaningful (and high-level) manner.

Complex changes are built from a sequence of atomic changes, although the same sequence can have a potentially different meaning; this is a very important distinction. Indicating a complex change defines exactly which atomic changes need to be made, but a sequence of atomic changes that matches those defined by a complex change does not necessarily imply that the complex change has been made. As an example, consider two classes A and B, having slots a and b respectively. The sequence of deleting a in A and subsequently adding a in B is different from moving slot a from class A to B [Lerner, 2000].
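The distinction can be made concrete with a small sketch. The record types DropSlot, AddSlot and MoveSlot are hypothetical names introduced here for illustration; the point is only that a change log which records the complex change keeps the engineer's intent, while a log of the equivalent atomic changes does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DropSlot:
    cls: str
    slot: str


@dataclass(frozen=True)
class AddSlot:
    cls: str
    slot: str


@dataclass(frozen=True)
class MoveSlot:
    slot: str
    source: str
    target: str

    def expand(self):
        """Return the atomic changes that realise this complex change."""
        return [DropSlot(self.source, self.slot), AddSlot(self.target, self.slot)]


# Log 1 records the intent: slot a was moved from class A to class B.
log_with_intent = [MoveSlot("a", "A", "B")]

# Log 2 records only two unrelated atomic changes.
log_atomic_only = [DropSlot("A", "a"), AddSlot("B", "a")]
```

Expanding the complex change yields exactly the atomic sequence of the second log, yet the two logs are not interchangeable: from the second log alone one cannot tell whether the slot in B is the same slot that used to live in A.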

The idea of complex change operators has rarely been studied in data schema evolution [Lerner, 2000], but has inspired related work in ontology evolution [todo - Stojanovic, Klein].

At a more advanced level, a complex evolution process can be defined as a set of transformations between subconstructs of the ontology. If the ontology is representable by a graph, a custom change could be the morphing of one subgraph into another. In general, a library of custom transformations can be kept. This idea is adopted from software evolution [Mens, 1999].

Version Validation Control

After the engineer has specified his transformation specification, he can consider it as a logical unit of work and store it in the version repository server. This transaction can have one of two outcomes. If it completes successfully, the transaction is said to have committed. On the other hand if it was not successful, the transaction is aborted. In the latter case the version repository is rolled back to its previous consistent state, and the user is notified. We refer to the ACID properties [todo - Haerder and Reuter, 1983] from the DB community here.

First of all, the system should check whether the version is revisable by reading its permissions (cf. authorization, Requirement R4.2). Then and only then, the system can and must check whether the transaction preserves the logical integrity of the version space before committing it to the server. The logical integrity consists of the following two strong criteria:

  • Is the revised ontology logically valid?
  • Is the formal transformation specification correctly defined or deferred?

And two less important ones:

  • Is the revised ontology backwards compatible with the old ontology?
  • Is the old ontology forward compatible with the revised ontology?

Finally, the system could ask:

  • Is the revised version of the ontology the last one (or the youngest) that was made?

The system could then decide whether or not to also allow updates of old versions of the ontology [Kim and Chou, 1988].

Any transaction that fails to preserve logical integrity is rolled back.

A last requirement is that each update of the version space should be persistent.
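The commit and roll-back behaviour described above can be sketched as follows. This is an in-memory illustration under simplifying assumptions: the class VersionRepository and the check is_not_empty are hypothetical, an ontology is a set of concept names, and persistence is not addressed.

```python
import copy


class VersionRepository:
    def __init__(self):
        self.versions = {}  # version id -> ontology (set of concept names)

    def commit(self, version_id, ontology, checks):
        """Store a version atomically: all checks pass or nothing changes."""
        snapshot = copy.deepcopy(self.versions)
        try:
            self.versions[version_id] = set(ontology)
            for check in checks:
                if not check(self.versions[version_id]):
                    raise ValueError(f"check {check.__name__} failed")
        except ValueError:
            # Roll back to the previous consistent state.
            self.versions = snapshot
            return False
        return True


def is_not_empty(ontology):
    """Stand-in for a real logical validity check."""
    return len(ontology) > 0


repo = VersionRepository()
committed = repo.commit("v1", {"Person"}, [is_not_empty])
aborted = repo.commit("v2", set(), [is_not_empty])
```

The second transaction fails its check, so the repository still contains only v1 afterwards. A real implementation would replace is_not_empty with the validity and compatibility criteria listed above, and would notify the user of the abort.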

In data schema evolution several principles were defined to keep the schema consistent after each change of its definition. These solution principles are referred to as semantics of change.

A possibility is to define invariant properties intrinsic to the model to ensure semantic and structural integrity; and then to define rules or primitives for effecting the changes, by preserving these invariants [todo - Banerjee et al., 1987]. Mostly there are multiple alternative ways to preserve the invariants; the transformation rules are then responsible for choosing the most meaningful way.

Another possibility is to carefully restrict the set of mutators. In the relational model, only atomic change operators that preserve consistency are allowed. More complex, well-ordered sequences of such atomic change operators are allowed if there are constraints on their application order. [Roddick, 1993] requires the atomic change operators to be expressible in relational algebra.

Propagation of Changes

Once the ontology has been revised, its interpretation mapping with some committing applications might get broken, resulting in wrong interpretations of instance data and semantics. The person responsible for the committing application has several alternatives:

  • she ignores the new version and keeps her application's commitment to the old version;
  • conversion: she evolves along by adapting her application model so that a commitment is established with the new version;
  • alignment: she evolves along by adapting only the interpretation mapping, so that the commitment is re-established with the new version while the application model is retained.

A change in an ontology is always a decision agreed on by a community. Each person responsible for an application then has to decide individually whether to follow the trend and change along. However, if the backwards/forwards compatibility requirement (see R8) is fulfilled, she does not have to revise her application model or interpretation mapping.

Dropping a concept. Suppose the decision is taken to drop the concept PERSON from an ontology. This means that the majority of the community members are no longer interested in interpreting the concept. Members who do not agree keep their commitment to the old version, and the new version is backwards compatible with the old one.

Adding a concept. A new concept is introduced in the ontology, resulting in a new version. Old application models will still be compatible with the new version.

Updating a concept. In some paradigms this corresponds to adding or dropping properties of a concept; in others it means dropping or adding relationships between concepts. Potential (consistency) problems must be anticipated here.

Changing constraints and rules. The support needed here depends on the expressiveness of the language. Here too, consistency problems might arise and must be anticipated.

A Feature for Analysing Differences between Versions

The user should be able to view differences between versions in an easy and intuitive manner. A good starting point here can be the work done in PROMPTDiff. The custom transformations mentioned in Requirement R7.2 could be used in the comparison.
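To give an idea of the kind of report such a feature could produce, the sketch below computes a naive structural difference between two versions. It is not the PROMPTDiff algorithm; it assumes, for illustration only, that a version is a dictionary mapping concept names to sets of slot names, and the function name diff_versions is hypothetical.

```python
def diff_versions(old, new):
    """Compare two versions given as dicts: concept name -> set of slots."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "dropped": sorted(old.keys() - new.keys()),
        "changed": sorted(c for c in shared if old[c] != new[c]),
    }


old = {"Person": {"name"}, "Address": {"street"}}
new = {"Person": {"name", "email"}, "Company": {"name"}}

report = diff_versions(old, new)
```

Such a name-based comparison cannot detect renamed concepts or moved slots, which is precisely where heuristics and custom transformations would be needed.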

This feature unlocks possibilities for new applications such as forking/joining parallel ontology versions. When an initial ontology is used and evolved several times by different independent user groups (forking), it could be interesting to investigate the differences between the two resulting versions, and merge them. Note that ontology merging is not within the scope of this WP.

Impact Analysis

The impact of a given ontology evolution, on both the conceptual and the instance level, should be calculated. Deployment of ontologies in applications typically comes down to committing [Guarino, 1998; Meersman, 2001] or mapping the application's information assets (such as data schemas) to the ontological model. Any change in the ontology might thus break the integrity of this commitment.

For example, if the ontology has undergone an evolution process, impact analysis means detecting that the semantics of the web services are broken, and that existing service clients may now be getting answers that are interpreted wrongly. Requirement R9 tackles this problem.

Further, the impact of change cascades needs to be calculated. If ontology A is included in another ontology B, then there may be consequences for B if A is revised.

Note that impact analysis only analyses the possible impact, and informs the engineer. It does not solve any of the problems it detected.
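Detecting which ontologies may be affected by a change cascade amounts to following inclusion relationships transitively. The sketch below assumes a hypothetical mapping from each ontology to the ontologies it includes; in line with the note above, it only reports candidates and repairs nothing.

```python
def impacted_by(changed, includes):
    """Return all ontologies that directly or indirectly include `changed`.

    `includes` maps an ontology name to the set of ontologies it includes.
    """
    impacted, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for ontology, included in includes.items():
            if current in included and ontology not in impacted:
                impacted.add(ontology)
                frontier.append(ontology)
    return impacted


# B includes A, C includes B, D includes nothing.
includes = {"B": {"A"}, "C": {"B"}, "D": set()}

affected = impacted_by("A", includes)
```

Here a revision of A may have consequences for B and, through B, for C, while D is unaffected.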

Distributed Environment Support

Ontology construction and deployment is an extensive task which is typically tackled by multiple teams of knowledge engineers and/or domain experts. They work concurrently on the same or different parts of the ontology, and regular synchronisations are needed, resulting in frequently generated stable versions.

Interface Requirements

This subsection specifies the essential client-side requirements. Much of it is based on the ideas in [De Leenheer, 2004] and [todo - Stojanovic, 2002].

Version Browser

The client needs a graphical browser for the version space (according to Requirement R3) that provides a convenient view of all first-class citizens (ontologies and their inter-relationships) stored in the server. Further, a zoom feature on all first-class citizens enables the following:

  • zooming on an ontology label in the version space returns the specification of the ontology in an appropriate representation or language;
  • zooming on an inter-relationship provides detailed information on the associated transformation (Requirement R7) and meta-information (Requirement R6).

Transformation Editor

The engineer needs an editor in which he can evolve ontologies in two ways:

  1. either he knows precisely how to change the ontology, in which case he can define the transformation syntactically;
  2. or he does not know exactly how to change the ontology, but has a clear idea of what the resulting ontology should look like, in which case the tool can try to generate the transformation for him based on the intended result.

Notification Agent

The tool needs an agent to manage and broadcast messages and notifications to the users. For example, when an engineer has committed a transformation, the notification agent is responsible for notifying users of possible implications such as cascading changes and forced roll-backs, as illustrated in Requirement R8.
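A notification agent of this kind can be sketched with a simple publish/subscribe structure. The class name NotificationAgent, its methods and the message texts are hypothetical; a real agent would deliver messages over the network rather than to an in-process list.

```python
class NotificationAgent:
    """Broadcasts each message to every subscribed user callback."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def broadcast(self, message):
        for callback in self.subscribers:
            callback(message)


inbox = []  # stands in for one user's message view
agent = NotificationAgent()
agent.subscribe(inbox.append)

agent.broadcast("transformation v1 -> v2 committed")
agent.broadcast("roll-back: v3 failed validation")
```

After the two broadcasts, the subscribed user has received both messages in order.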

Implementation Priority List

For the implementation we have distinguished the following phases:

  • Version 0: until Dec 2004
  • Version 1: between Jan 2005 and Jun 2005
  • Version 2: between Jul 2005 and Dec 2005
  • Version 3: between Jan 2006 and Jun 2006

An X indicates the phase in which a requirement is first tackled.

Req ID Versioning Requirement Version 0 Version 1 Version 2 Version 3 Priority
R1 Interoperability/Compatibility X (affects all implementation)
R2 Genericity X (affects all implementation)
R4 Versioning Strategy X HIGHEST
R5 Version Identification X HIGHEST
R6 Additional Meta Information X HIGH
R3 Version Space Representation X HIGHEST
R13 GUI Version Browser X HIGHEST
R7 Formal Transformation Specification X HIGHEST
R14 GUI Transformation Editor X HIGH
R10 Difference Analysis X HIGHEST
R9 Propagation of Changes X HIGH
R8 Version Validation Control X MEDIUM
R11 Impact Analysis X HIGH
R15 GUI Notification Agent X HIGH
R12 Distributed Environment Support X LOW

Acknowledgement

The work is funded by the European Commission under the projects DIP, Knowledge Web, Ontoweb, SEKT, SWWS, Esperonto and h-TechSight; by Science Foundation Ireland under the DERI-Lion project; and by the Vienna city government under the CoOperate programme.

The authors would like to thank all the members of the OMWG working group for their advice and input to this document.