A Global Ontology for a Truly Dynamic Web

  • This is a white paper that was presented at the Semantic Technology Conference at the Fairmont Hotel San Jose on March 6-9, 2006.

Introduction

  • This white paper is about a vision that much of the hard work of data collection can be moved to the point where the knowledge concerned is first recognized as such. Once recorded it would be available for any purpose anywhere. The knowledge concerned would be captured in a form that allows for all possible uses from simple record keeping up to and including the drawing of inferences.

A Global Knowledge Base - my vision for the Web

  • This paper is classed as "technical advanced", but what I have to say is actually very simple - simple but radical. I have picked up on the W3C aspiration that the Web should evolve from a "web of documents" to a "web of data about everything for everyone". In my view this requires a global knowledge base. By this I mean a global repository of facts and perceptions that would reduce the amount of knowledge that is locked up in spreadsheets and specialized databases.
  • The main content of this paper is about the management of semantics across the Web - but first I need to set the scene by explaining what this global knowledge base would look like and how it would work. My aspiration is that everyone with knowledge they wish to share would be motivated to use Semantic Web technology for its primary storage.
  • This may seem a bit extreme but it is not so far removed from the "Web of pages" that has become so successful. It differs in two significant ways:
    • It handles raw knowledge with specific meaning rather than presentations of information that must be taken at face value
    • It is a service for websites rather than for individual client users
  • This global knowledge base would operate behind the scenes to enable the owners of individual websites to share knowledge in a way that would be transparent to their clients. Up-to-date knowledge drawn from the global knowledge base would be inserted into appropriate pages automatically as they are downloaded to individual clients.
  • The physical storage of this global knowledge base would of course be shared between the participating sites. For good logistic reasons the data concerned would be distributed to consuming sites as soon as it is entered. In this way responsibility for storage would lie with the consumers and the cost to contributors would be minimised.
  • Another benefit of this arrangement is that the passing of knowledge from one site to another would occur only once for each new version of any detail. This is far better than conducting a search at the site of origin every time such information is needed for inclusion in a page requested by a client.
  • The result of this arrangement is that each participating site has its own non-exclusive subset of the potentially vast global knowledge base - this subset being finely tuned to its owners' perception of what they want to present in the pages on that site.
  • At this point I should mention that the technical requirements of such an arrangement are relatively simple. I should also point out that this idea and the standards that I hope will follow are intended for participation by every Web user with the skill level needed to create a simple home page. At the same time it should have the power to attract participation from those with sophisticated web applications.
  • To make this idea work effectively we need to focus on the underlying nature of knowledge as distinct from the many ways in which it can be presented.
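  • The distribution arrangement described above, in which each new assertion is passed once to the sites that want it and then stored locally by them, can be sketched as follows. This is only an illustration under stated assumptions: the class names ConsumerSite and ContributorSite, the dictionary layout of an assertion and the idea of filtering by property name are all hypothetical and are not part of any standard.

```python
class ConsumerSite:
    """A site that stores its own non-exclusive subset of the shared knowledge."""

    def __init__(self, name, interests):
        self.name = name
        self.interests = set(interests)  # property names the site owner cares about
        self.local_store = {}            # (subject, property, source) -> list of versions

    def wants(self, assertion):
        return assertion["property"] in self.interests

    def receive(self, assertion):
        key = (assertion["subject"], assertion["property"], assertion["source"])
        self.local_store.setdefault(key, []).append(assertion)


class ContributorSite:
    """A site that originates assertions and pushes each one out exactly once."""

    def __init__(self, name):
        self.name = name
        self.consumers = []  # hypothetical list of subscribed consumer sites

    def publish(self, assertion):
        """Send a new assertion to every interested consumer; return how many got it."""
        recipients = [c for c in self.consumers if c.wants(assertion)]
        for consumer in recipients:
            consumer.receive(assertion)
        return len(recipients)
```

  • Pages served by a consumer site would then be filled from its local_store, so no search at the site of origin is needed at page-request time.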

Knowledge as a Web Resource

  • The first thing to understand about knowledge is that it is never absolute. We acquire knowledge by a variety of means. Sometimes this is by direct observation or experience but even then we can be mistaken. More often we acquire knowledge second hand through what we read and hear.
  • None of these mechanisms is perfect and we all find it difficult to take on board knowledge that is based on unfamiliar concepts. On being told that a certain city is 200 miles away we will probably understand what is being said provided always that we know the language being used and are already familiar with the concepts of 'city' as a destination, 'away' as an indication of geographical separation and 'miles' as a measure of such separation.
  • If any of the above requirements for pre-knowledge are not met then there is a real risk of misunderstanding. Even where both parties share the language and the underlying concepts being used, the consumer still has to decide whether to trust the source as both competent and honest. If we are to obtain knowledge direct from the Web then all these issues must be addressed.
  • So how do we deal with this lack of certainty in a knowledge base that has contributors from all over the world with different cultural backgrounds and no vetting procedure? First we must accept that the 'knowledge' contained is comprised of opinions rather than facts. Truth is a perception that is formed in the mind of each person on the basis of evidence. All that can be shared through a global knowledge base is 'evidence' in the form of assertions.
  • In dealing with these assertions the easy bit is to decide how their 'payload' is to be represented as data. This is easy now but only because of the immense amount of effort that has already gone into defining standards for the representation of information over many decades. We now have XML and this will do very nicely thank you. But in a global knowledge base XML alone is not enough. We also have to deal with the issues of conceptual framework and trust.
  • In a truly global knowledge base containing 'knowledge about everything', the required conceptual framework is immeasurably complex and ever growing. So how do we deal with this? Here again a lot of effort has gone into defining standards and we can utilise the concepts developed for the 'semantic web'. These are grounded in the field of artificial intelligence where knowledge has to be expressed with sufficient rigor that it can be utilized by machines.

Keep It Simple

  • The key to making this work is "keep it simple" - remembering all the time that the global knowledge base is primarily for the users of knowledge not for highly skilled application programmers.
  • In adopting this existing work we must take account of the particular demands that stem from the unique purpose of the global knowledge base. We must so format the knowledge that its meaning can be effectively expressed by reference to the well understood semantic concepts of class and property.
  • What this means in practice is that there is a limit on the complexity of knowledge that can be shared. This should not be taken to imply that topics that are inherently complex should be avoided - only that all knowledge must be broken down as far as possible. Here I do not mean "as far as is easily done"; I mean "broken down to the point where any further division would create fragments that have no useful meaning on their own".
  • Breaking information down into fragments that are "indivisible and semantically complete" does not require technical skill like the design of databases - just a cool head and full understanding of the real-world concepts involved in each case.
  • For example a stocklist of ingredients available to a cook can be broken down into its individual items, but these cannot be further broken down into their three data elements which are: a number, a unit of measure and a reference to a commodity such as flour or butter. This is because such fragments are meaningless on their own.
  • Now consider the list of ingredients for a recipe. This looks similar but cannot be broken down at all because even the whole list is useless without the 'method'. Whether we change just one of the numbers, add a whole new item or slightly alter the wording of the 'method' we get a whole new version of the recipe. It takes a cook to know this - not a data analyst!
  • Now to deal with the matter of trust we need to ensure that every assertion is recorded with reference to its provenance and the context within which it applies. The provenance must include the immediate source with perhaps the ultimate source and/or a confidence factor. The context might include things like a (type of) activity, a (type of) location and/or time. It all depends on the nature of the knowledge concerned but must be sufficient for any would-be user to make an informed judgment on whether that source can be trusted to provide knowledge of that type - always bearing in mind the purpose that this particular user has in mind.
  • To make the semantics work in a simple way we must always treat each assertion as indivisible with just one subject, one meaning (reference to one property), one context and one provenance. It has just one value which can never be changed. If this value is found to be wrong or the thing described actually changes in the real world then the whole assertion should be superseded by a new one with a different time line. Similarly if some other source thinks it knows better it should create a competing assertion distinguished by its own provenance. This is because:
    • When a source selects the property for an assertion it takes responsibility for the applicability of that property to the subject concerned including all inheritance implications.
    • Assertions from sources that cannot be trusted to accept this responsibility are of dubious value and may be ignored.
  • The chief benefit that results from treating all assertions as indivisible and semantically complete is that they can be distributed individually with their meaning intact and without any need for synchronization.
  • The meaning of each assertion is determined absolutely by the definition of the cited property and is not affected in any way by the context in which it is presented. This much is axiomatic. Of course this definition may be inadequate and its creator may be aware of greater semantic depth depending on a variety of contextual factors but this counts for nothing in the global knowledge base.
  • In order to place such depth and variety on the Web, each of these contexts and meanings must be defined. There is nothing to prevent different assertions having the same value. There is also nothing to prevent an individual user from selecting a body of assertions, placing them in a new context and drawing inferences which comprise new knowledge with its own provenance. Indeed this goes to the heart of what the global knowledge base is all about. It is just a standard for the packaging of knowledge so that it can be readily selected, moved around and grouped for unforeseen purposes.
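  • The shape of an indivisible assertion, as described in this section, might be sketched as an immutable record. This is a minimal illustration only: the field names, the use of plain strings for URIs and the supersede helper are assumptions made for the example, not a proposed standard.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """One indivisible assertion whose value can never be changed."""
    subject: str            # URI of the one thing described
    prop: str               # URI of the one property that carries the meaning
    value: str              # the single asserted value
    context: str            # URI of the context in which the assertion applies
    source: str             # provenance: URI of the immediate source
    asserted_at: datetime   # start of this assertion's time line
    confidence: Optional[float] = None  # optional confidence factor


def supersede(old: Assertion, new_value: str, when: datetime) -> Assertion:
    """Create a replacement assertion; the old one is left untouched."""
    return replace(old, value=new_value, asserted_at=when)
```

  • A competing assertion from another source would simply be a further Assertion with the same subject and property but a different source value.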

Dynamic but Stateless

  • The envisioned global knowledge base is stateless like the Web itself. It is designed to contain beliefs about all possible states of everything. Where a person or process needs to know the state of some collection of things as at a point in time this is compiled by selecting the appropriate assertions and using them to create a coherent set of 'relations'.
  • The selection of appropriate assertions to form this view is based on the context and provenance attached to each. For this reason, success with the global knowledge base is critically dependent on having an adequate standard for the description of both context and provenance. This standard must be applied across the board so that a single selection algorithm will yield comparable results from any population of assertions.
  • At its simplest, the standard for context definition need be no more than an insistence that each context is defined as a distinct Web resource with its own URI. The description of a context could be rich or very basic. In either case a person or process making selection from the global knowledge base would first have to know and understand the way contexts have been applied to assertions by the sort of contributors whose knowledge it wishes to receive. This would work very well within a reasonably well defined field of interest.
  • The quality of description for these contexts would vary and selecting what context to cite when selecting assertions for a purpose would require human judgment based on wide knowledge. To develop beyond this simple start we need to analyse the way in which context is used and understood by people in general so that the description of contexts could be standardized. This could provide a basis for the design of context criteria that could be readily specified. Common factors would be things like geographic location, activity (type), organization (type) etc. Much work has been done on this in the field of battlespace data management.
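  • Compiling a view of the state of things "as at a point in time" could be sketched as a single selection function that filters on context and provenance and then keeps the latest available assertion for each subject and property. The record layout, the function name select and the exact matching of context URIs are assumptions made for this illustration; a real standard for context criteria would be richer.

```python
from datetime import date
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    prop: str
    value: str
    context: str       # URI of the context
    source: str        # URI of the immediate source (provenance)
    asserted_on: date


def select(assertions, context, acceptable_sources, as_at):
    """Return {(subject, prop): assertion}, keeping the latest one not after as_at."""
    chosen = {}
    for a in assertions:
        if a.context != context or a.source not in acceptable_sources:
            continue  # wrong context, or a source this user does not accept
        if a.asserted_on > as_at:
            continue  # not yet asserted at the requested point in time
        key = (a.subject, a.prop)
        if key not in chosen or a.asserted_on > chosen[key].asserted_on:
            chosen[key] = a
    return chosen
```

  • Because the same function is applied to any population of assertions, two users who specify the same context, sources and date obtain comparable views.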

Global Semantics

  • For valid comparison and free combination of assertions that may have been created at opposite ends of the Web, their expressions of meaning must all come from the same ontology. This would have to be an ontology with unlimited depth and breadth so that it can grow into a single global ontology containing all classes and all properties.
  • This global ontology must follow the standards developed for use by machines but must also make sense to people. This means it must reflect the way people use the concept of generalization to describe the world.
  • Another issue for a knowledge base designed to be equally friendly to humans and machines is that much of the classification of individuals is temporary and/or a matter of conflicting opinions. This makes no particular demands on the ontology but does affect the way it is used. How can a machine find inferences about individuals if much of their classification is transient, fuzzy and indirectly expressed? My answer to this question is that the selection of assertions for analysis should:
    • Resolve issues of timing through an algorithm such as "latest available".
    • Resolve issues of conflicting opinion by specifying one or more acceptable sources.
    • Resolve indirect expression of classification as explained under "Classification and its Implications".

Classification and its Implications

  • In order to connect with the way generalization is used in human thinking and dialogue it is necessary to go back to the basic idea that "individuals can be recognized as a group if they have something in common". This sort of group is known as a class. If we believe that a certain individual is "in a particular class" then we can make an assertion to that effect.
  • Knowledge of this sort can be expressed as a triple in the form <individual x> is a member of <class id>. Instead of "is a member of" we could use the words "is in", "is a" or their equivalents in other languages. These phrases are just different ways of citing the rather special predicate that represents the concept of generalization.
  • Assertions using the "is a" predicate may be thought of as classifying assertions while others such as "Fred is owner of car with registration number XYZ342" are purely descriptive. This sentence is a triple which breaks down into:
    • Subject: "Fred" - which identifies an individual.
    • Predicate: "is owner of car with registration number" - which identifies the property for which a value is being asserted.
    • Object: "XYZ342" - which is the value asserted for that property in respect of the subject "Fred".
  • Now the above case is a simple assertion based on a very user friendly predicate. In machine friendly terms it breaks down into the two triples:
    • Resource xxx has name "Fred".
    • Resource xxx is owner of car with registration number "XYZ342".
  • These triples both have predicates expressed in user friendly terms. If these are to make sense to machines then the phrases "has name" and "is owner of car with registration number" must be recognizable as properties defined somewhere with all their implications. This is one of the roles of an ontology.
  • If an ontology this user friendly is also to be useable by an inference engine then it will need to relate this user friendly property to something more basic:
    • The property "has name" is perhaps the most basic of all. In the global ontology it might be specified as applicable to instances of the "class of everything" and be inherited by classes like "person" and "car".
    • The property "is owner of car with registration number" might be defined as a sub-property of the more general property called "is owner of", which in turn would be defined as the inverse of "is owned by".
    • The property "is owner of" represents a concept that has wide use. It might also be specified as applicable to instances of the "class of everything" but its sub-property "is owner of car with registration number" should be applicable only to instances of the class "legal entity" with sub-classes such as "person" and "corporation".
  • With an ontology like this it would be clear to all, both machine and human, that the assertion "Resource xxx is owner of car with registration number XYZ342" implies that the source of this assertion believed that Resource xxx was something capable of owning a car, i.e. a person or corporation. This implication may be exploited by any user.
  • The point of this example is to show that a global ontology must make explicit the classifying implications that are hidden within descriptive properties.
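  • The way a descriptive property can carry a hidden classifying implication might be sketched as follows. The tables below are a hypothetical fragment of an ontology written as plain dictionaries; the identifiers (isOwnerOf, LegalEntity, Thing and so on) are invented for the example and stand in for the property and class definitions discussed above.

```python
# Hypothetical ontology fragment: property -> more general property.
SUB_PROPERTY_OF = {"isOwnerOfCarWithRegistrationNumber": "isOwnerOf"}

# Hypothetical ontology fragment: property -> class its subject must belong to.
APPLICABLE_TO = {
    "hasName": "Thing",
    "isOwnerOf": "Thing",
    "isOwnerOfCarWithRegistrationNumber": "LegalEntity",
}


def super_properties(prop):
    """List the more general properties of prop, nearest first."""
    found = []
    parent = SUB_PROPERTY_OF.get(prop)
    while parent is not None:
        found.append(parent)
        parent = SUB_PROPERTY_OF.get(parent)
    return found


def classifying_assertion(subject, prop):
    """Make explicit the 'is a' triple implied by using prop on subject."""
    return (subject, "isA", APPLICABLE_TO[prop])
```

  • Applied to the triple about Resource xxx, classifying_assertion yields the implied belief that xxx is a legal entity, which is exactly the inference a user or machine would otherwise have to guess.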

Classifying Schemes

  • The issue here is that a global ontology must accommodate the basic reality that there will never be agreement across the Web on the basis on which any class should be specialized. This is so obvious that one might wonder why something so basic and obvious should be ignored completely by a standard like OWL. I believe that this omission stems from the purpose by which its creators were motivated.
  • If the main driver for creating an ontology is to express a particular mindset so that knowledge of a particular kind can be interpreted by a machine, then the overriding concern is to express the logical paths of inference that this machine should follow. In this case the management of different views on how a class should be specialized is just not a consideration. The only issue of importance here is to define the classes, their inheritance, their properties and other rules.
  • For a global ontology there are other considerations:
    • Consider the classes: "US Citizen", "British Citizen" etc.
      • Both of these are sub-classes of person.
    • Now consider the classes "Christian", "Buddhist", etc.
      • These are also sub-classes of person but the basis of specialization is different.
  • A global ontology must embrace many such classifying schemes and express all associated implications.

Growth Within the Global Ontology

  • A key issue for the envisioned global ontology is provision for growth. If the ontology is required to grow so as to accommodate all types of knowledge about all types of thing then the end result is not what matters since it will never be reached. Instead we have to focus on the process of organic growth.
  • The envisioned global ontology will continue to expand without limit because the human capacity for invention of new ways to perceive the world is itself without limit. This need not mean a proliferation of independent ontologies. There is a much better way.
  • What we need for Web-wide semantics is a single global ontology which is designed for organic growth in the following three dimensions:
    • Add a new way of classifying things.
    • Add a new class.
    • Add a new property.
  • When adding a new way of classifying things it is important to specify what existing class describes the sort of thing that can be further classified on this basis with an assurance that the resulting description and inheritance would be meaningful.
  • When adding a new class it is important to determine and specify whether it is a new specialization within a basis that is already represented or is just one example of a wholly new way of seeing things. In either case the new class must be placed within an appropriate classifying scheme. If one cannot be found then a new one must be created.

Properties

  • The various properties associated with a class are either:
    • Class properties where all individuals in the class share the same value.
    • Instance properties where each individual in the class has its own value.
  • In both cases the value is expressed in an assertion having the property as predicate:
    • For class properties these assertions have the class as subject.
    • For instance properties these assertions have the individual as subject.
  • What really matters in a global ontology is to be clear which of these two descriptions applies:
    • Class property: this is a property which is defined for a class with just one value that applies to every individual within this class, such as the design weight for a type of vehicle - every individual vehicle of this type has the same design weight even though their individual weights may be different.
    • Instance property: this is a property defined for a class for which each instance of the class must or may have a value. The 'individual weight' mentioned above would be such a property because the value for each individual vehicle would be affected by variations in specification and will vary from time to time as fuel etc. is added and removed.
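  • The difference between the two kinds of property comes down to which subject the assertion is attached to. A minimal sketch, using invented vehicle identifiers, property names and weights purely as sample data:

```python
# Hypothetical declarations: which kind each property is.
PROPERTY_KIND = {"designWeightKg": "class", "individualWeightKg": "instance"}

# Hypothetical classification of individuals.
INSTANCE_OF = {"vehicle-001": "TruckModelT1", "vehicle-002": "TruckModelT1"}

# Assertions keyed by (subject, property). Class properties have the class
# as subject; instance properties have the individual as subject.
ASSERTED = {
    ("TruckModelT1", "designWeightKg"): 7500,
    ("vehicle-001", "individualWeightKg"): 7610,
    ("vehicle-002", "individualWeightKg"): 7480,
}


def value_for(individual, prop):
    """Look up a property value for an individual, honouring the property kind."""
    if PROPERTY_KIND[prop] == "class":
        subject = INSTANCE_OF[individual]  # one shared value held by the class
    else:
        subject = individual               # each individual holds its own value
    return ASSERTED.get((subject, prop))
```

  • Here both vehicles report the same design weight because it is asserted once against their class, while their individual weights differ.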

Acknowledgement

  • To the best of my belief the concept of classifying scheme as outlined above first emerged in a dialogue between myself and my colleague Ken Allen at Blandford in 1996 when we were working together for the British Army. After many twists and turns it is now to be found as a core concept in a modeling language called CBML that is used to articulate and correlate the many disparate but overlapping ontologies with which the Army has to work.
  • In the less controlled and even more disparate knowledge space of the World Wide Web this concept has merit as a way of allowing individual contributors to retain control over the specification of sub-classes that meet their own needs while incorporating suitable higher level classes which already exist as a Web resource.

OWL/RDF and the Semantic Web

  • OWL and RDF lie at the heart of the Semantic Web activity of W3C. RDF provides the standard for knowledge representation while OWL provides the standard for specifying the meaning and implications of properties and classes that may be cited in the RDF expression of knowledge.
  • These two languages have sound origins in the fields of artificial intelligence and knowledge engineering. They also benefit from the existence of highly effective software tools for the creation of ontologies as well as for the storage, browsing and analysis of knowledge bases.
  • The immediate benefit of these standards is that both knowledge and definitions of meaning and implication (ontologies) can be shared across the Web and processed on any platform. There is also a strong expectation that knowledge represented in the form of RDF files of different origin will 'snap together' to form aggregates for analysis.

Can RDF Go Global?

  • The RDF standard for knowledge representation is a triple comprised of one subject, one predicate and one object. This has strong echoes in the concept of assertion with its one subject, one meaning and one value - indeed the only difference of substance is that the assertion also has a context and provenance.
  • In order to support a global knowledge base as envisioned here this context and provenance are essential but the triple that makes up the payload of each assertion is a triple fully ready for inclusion in an RDF file for the purpose of analysis.
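  • As an illustration of how the payload of an assertion could be lifted out for analysis, the sketch below writes the subject, property and value as one line of N-Triples, leaving context and provenance behind. The dictionary layout and the function names are assumptions for the example, and the value is treated as a plain string literal.

```python
def escape_literal(text):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def payload_as_ntriple(assertion):
    """Serialise only the triple; context and provenance are not part of it."""
    return '<{}> <{}> "{}" .'.format(
        assertion["subject"],
        assertion["property"],
        escape_literal(assertion["value"]),
    )
```

  • Selection by context and provenance would happen before this step, so that only the chosen payloads reach the RDF file.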

Can OWL Go Global?

  • The OWL standard for representation of ontologies on the World Wide Web is based on the concepts of: Class, SubClassOf, Property and SubPropertyOf etc.
  • To express a global ontology as envisioned here it would be necessary to augment OWL with support for the concepts of:
    • Classifying Scheme: with reference to a parent class.
    • MemberOfScheme: a reference to be used in the description of classes in place of the existing 'SubClassOf' concept.
  • In effect this would replace the simple hierarchy of classes by a similar hierarchy with classifying schemes interposed at each level. The presence of these classifying schemes does not disrupt the underlying hierarchy of sub-classes or alter it in any way. All it does is group the sub-classes at each level on each leg according to the basis of specialization such that "Christian" and "Buddhist" would be in one group while "US Citizen" and "UK Citizen" would be in another - even though all of these are sub-classes of person.
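  • The grouping effect of classifying schemes can be sketched with two small tables. The scheme names "Citizenship" and "Religion" are labels invented for this example, and the dictionaries are only a stand-in for whatever syntax an augmented OWL might use.

```python
from collections import defaultdict

# Classifying scheme -> the parent class it specializes (hypothetical names).
SCHEME_PARENT = {"Citizenship": "Person", "Religion": "Person"}

# Class -> the scheme it belongs to, used in place of a direct SubClassOf.
MEMBER_OF_SCHEME = {
    "US Citizen": "Citizenship",
    "UK Citizen": "Citizenship",
    "Christian": "Religion",
    "Buddhist": "Religion",
}


def subclasses_grouped_by_scheme(parent_class):
    """Group the sub-classes of parent_class by their basis of specialization."""
    groups = defaultdict(list)
    for cls, scheme in MEMBER_OF_SCHEME.items():
        if SCHEME_PARENT[scheme] == parent_class:
            groups[scheme].append(cls)
    return dict(groups)
```

  • All four classes remain sub-classes of person; the scheme only records which of them are alternatives to one another.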

How Does the 'Global Ontology' Help?

  • Progressive growth of a global ontology as described above would greatly ease the problems of integrating relevant parts of the large amount of knowledge data that is already available across the world. I believe that our current inability to do this on any useful scale has costly consequences.
  • We should be clear that the role of the global ontology and that of individual ontologies is quite different as follows:
    • Individual ontologies are used chiefly to provide semantics for bodies of knowledge expressed in the form of RDF so that they can be used to make inferences.
    • The single global ontology is designed as a library from which definitions of classes and properties can be drawn so as to build individual ontologies. The virtue of using definitions which are widely available on the Web is that the resulting individual ontologies will have valuable compatibility in so far as they include the same types of knowledge.
  • The process of using a global ontology to assist in the forming of individual ontologies is one of extracting the relevant subset which is then extended to meet the needs of a particular purpose. During this process of extraction the classifying schemes used for global ontology management are quietly removed to leave a simple hierarchy of sub-classes. All that happens in this process is that class membership of classifying schemes is replaced by a direct "SubClassOf" relationship to the class defined as parent for that scheme.
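  • The extraction step described above, in which scheme membership is replaced by a direct "SubClassOf" relationship, might look like this in outline. The tables and the scheme names are hypothetical sample data; rdfs:subClassOf is the standard RDF Schema term that OWL ontologies use for this relationship.

```python
# Hypothetical fragment of the global ontology.
SCHEME_PARENT = {"Citizenship": "Person", "Religion": "Person"}
MEMBER_OF_SCHEME = {
    "US Citizen": "Citizenship",
    "UK Citizen": "Citizenship",
    "Christian": "Religion",
    "Buddhist": "Religion",
}


def extract_subclass_triples(wanted_classes):
    """Drop the schemes and return plain sub-class triples for the wanted classes."""
    triples = []
    for cls in wanted_classes:
        scheme = MEMBER_OF_SCHEME[cls]
        triples.append((cls, "rdfs:subClassOf", SCHEME_PARENT[scheme]))
    return triples
```

  • Two individual ontologies that both extract "US Citizen" would receive the same triple, which is the source of the compatibility claimed for overlapping extracts.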
  • Where the extracted elements incorporated into many individual ontologies happen to overlap there will be assured compatibility. This compatibility can be retained while expressing the ontologies in full accordance with the OWL standard so that existing software can be used.

Conclusions

  • While this short white paper does not attempt even a proof of concept, it does appear to show that the concept of a single global knowledge base is feasible. More importantly I hope that it indicates how this vastly ambitious idea can come into being through the combined effect of many small individual initiatives by Web users of all kinds.
  • All that is required to enable this to happen is quite modest extension of the existing W3C standards of RDF and OWL. In both cases the extensions would only apply to use in the context of the single global knowledge base and its corresponding single global ontology. They would not in any way disrupt the use of these standards for individual purposes as they stand.
  • It is my belief that progressive growth of the envisioned global ontology will increase semantic harmony across the Web. I also believe that the companion global knowledge base will help to create an effective "web of dynamic data" - all this with very little investment by anyone.