|
A Global Knowledge Base
- The envisioned knowledge base is designed to contain information of any type about anything. It is a single coherent resource for use by anyone who is willing to subscribe. Its scope, content and value are determined by the cumulative effect of self-motivated collaboration between site owners.
- It is built up and maintained through a form of publish and subscribe service with standards that ensure coherence and integrity. Information about any number of things, sourced through any number of sites, blends into a single virtual database which is distributed across subscribing sites in accordance with their own declarations of need.
- The meanings of all assertions are specified by reference to properties of classes. Both properties and classes have URIs and are themselves described by assertions on the dynamic web. These blend into a coherent whole because of the way new class specializing schemes are stitched into the existing body of definitions.
A Global Ontology
- Each class specializing scheme on the dynamic web is just a set of classes with three things in common:
- They are all sub-classes of the same parent class.
- They are all defined within the same site.
- They are all differentiated on the same basis of specialization.
- The class cited as parent for a class specializing scheme must already exist somewhere within the dynamic web. The owner of a class has no responsibility for any subordinate class specializing schemes and may be unaware of their existence.
- This structure enables all site owners to participate in a voluntary and unco-ordinated collaboration across the Web to grow a global ontology. This comprises a single hierarchy of class specializing schemes, each of which is created and maintained by one interested party through a particular site to represent a particular perception.
- The power of this global ontology comes about because the individual classes within each new classifying scheme inherit the definition of the class cited as parent of that scheme. Each site can use this inheritance to exploit what exists while remaining free to add extensions to meet the particular user needs that it is designed to serve.
- This global ontology is not dependent on any central policy or control but will evolve like a living organism. Class specializing schemes that serve a wide range of purposes will include member classes that come to be enriched with many subordinate schemes while those that acquire no popularity will remain as harmless backwaters.
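The structure described in this section can be sketched as a small data model. This is only an illustration under stated assumptions: the names `OntologyClass`, `SpecializingScheme` and `Ontology`, the `example` URIs, and the use of a URI prefix to test "defined within the same site" are all hypothetical and are not part of any dynamic web specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntologyClass:
    uri: str
    definition: str
    scheme_uri: Optional[str] = None  # the single scheme this class is a member of


@dataclass
class SpecializingScheme:
    uri: str
    site: str        # all member classes are defined within this site
    parent_uri: str  # the class being specialized; it may live on any site
    basis: str       # the one basis of specialization shared by all members
    member_uris: List[str] = field(default_factory=list)


class Ontology:
    def __init__(self, root: OntologyClass) -> None:
        self.classes: Dict[str, OntologyClass] = {root.uri: root}
        self.schemes: Dict[str, SpecializingScheme] = {}

    def add_scheme(self, scheme: SpecializingScheme,
                   members: List[OntologyClass]) -> None:
        # The parent must already exist; its owner is not consulted.
        if scheme.parent_uri not in self.classes:
            raise ValueError("parent class must already exist")
        # Assumption: "same site" is approximated by a shared URI prefix.
        for cls in members:
            if not cls.uri.startswith(scheme.site):
                raise ValueError("member classes must be defined within the scheme's site")
        self.schemes[scheme.uri] = scheme
        for cls in members:
            cls.scheme_uri = scheme.uri
            scheme.member_uris.append(cls.uri)
            self.classes[cls.uri] = cls

    def inherited_definitions(self, class_uri: str) -> List[str]:
        # Walk class -> scheme -> parent class up to the root,
        # collecting the definitions that the class inherits.
        chain: List[str] = []
        uri: Optional[str] = class_uri
        while uri is not None:
            cls = self.classes[uri]
            chain.append(cls.definition)
            scheme = self.schemes.get(cls.scheme_uri) if cls.scheme_uri else None
            uri = scheme.parent_uri if scheme else None
        return chain


root = OntologyClass("http://a.example/Thing",
                     "anything that can be described by information")
ontology = Ontology(root)
person = OntologyClass("http://a.example/Person", "a human being")
ontology.add_scheme(
    SpecializingScheme("http://a.example/thing-by-kind", "http://a.example/",
                       root.uri, "by kind of thing"),
    [person],
)
british = OntologyClass("http://b.example/BritishPerson",
                        "a person whose country of origin is Britain")
ontology.add_scheme(
    SpecializingScheme("http://b.example/person-by-country-of-origin",
                       "http://b.example/", person.uri, "by country of origin"),
    [british],
)
```

Here the second scheme is added by a different site without any involvement of the site that owns "Person", and `ontology.inherited_definitions(british.uri)` returns the definitions of "British Person", "Person" and the root class in that order.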
A Hierarchy of Flat Class Specializing Schemes
- The natural growth of this ontology into a global entity is made possible because each class may have any number of subordinate class specializing schemes while these do not interfere with each other. Each scheme represents a further refinement on one particular basis. When other users want to cite a class to classify a resource they can choose from the member classes of all these different schemes. There is no need to try to avoid the creation of schemes that compete with others having a similar purpose. Such competition will stimulate improvement and those that best fit a purpose will acquire the de facto authority that comes from popularity.
- The essential freedom of choice derives from the way each class specializing scheme is flat, i.e. restricted to a single level of sub-classes. This may seem odd because we are used to hierarchic taxonomies with many levels. In the global ontology of the dynamic web the only way to introduce a new level of sub-classes is to specify a new subordinate class specializing scheme.
- The main reason for this is to ensure that the basis of class specialization at each level is clearly specified. If we were to allow sub-classes to be specified without placing them in a class specializing scheme there would be nowhere for this 'basis of breakdown' to be expressed. Users would have to infer this basis by looking at all the sibling sub-classes.
- Of course the best hierarchic taxonomies come with a well designed structure that is supported by a high quality explanation of the basis of class specialization at all nodes.
The only problem with these is that they come as a package on a 'take it or leave it' basis. There is no mechanism through which a user can take those bits they like and add new specializations as and where they perceive a need. Other hierarchies of classes impose control on the way things are classified. I find this unacceptably restrictive and alien to the underlying philosophy of the web.
- For use in the dynamic web we need the explanations of the basis of class specialization to be expressed in a standard way. The procedure for adding an existing hierarchic taxonomy to the global ontology requires the basis of specialization at each level to be clearly expressed in a distinct flat class specializing scheme so that it is clear for all to see. If this should present any difficulty then there must be something wrong with the taxonomy and this will need to be sorted out before it can make a useful contribution to the global knowledge base.
Competing Specializations
- A major problem with hierarchic taxonomies is that users get locked in to the whole hierarchy including those parts that are poorly suited to their needs. What the dynamic web gives is a single hierarchy of single level class specializing schemes rather than a collection of competing class hierarchies.
- Competition on the dynamic web is helpful because it is localized, i.e. it is only between different ways of specializing a single existing sub-class - the most popular of these will come into wide use and be further specialized. This is quite unlike competition between class hierarchies which is seriously problematic as soon as one ventures beyond the bounds of a single tightly controlled academic discipline. The difficulty with class hierarchies is that there is only one basis of class specialization at each level on each leg - and this is fixed. Any user of such classes has to accept the hierarchy of sub-classes as a whole even though this precludes many possible specializations that are equally valid.
- For example suppose one user places in the global ontology a definition of the class "Person" together with a breakdown by ethnic origin. The rules of the global ontology ensure that this breakdown is kept separate from the definition of the class "Person" and is defined as a class specializing scheme called "Person by ethnic origin" linked to the parent class only by citing its URI.
- A second user may find this definition of the class "Person" and be content with it but wants a breakdown by say "country of origin". Because the existing breakdown is quite separate from the definition of the class "Person", it can be ignored by the second user who is therefore free to contribute a subtly different class specializing scheme called "Person by country of origin".
- How can this fragmentation be a good thing? The point is that in the global ontology of the dynamic web no one, not even the originator of a class, has the power to control how it is broken down. Any participant in the dynamic web can specialize any class in any way it likes. Each such specialization is represented by a separate class specializing scheme and does not interfere with other schemes in any way.
- If the dynamic web did not insist on having class specializing schemes at each level there would be no way of knowing which sub-classes are alternatives and which can be used together in describing a specific thing at a specific time. In the above example, if sub-classes were not placed in distinct single level class specializing schemes, sub-classes like "British Person" and "French Person" would be mixed up with sub-classes like "Anglo-Saxon Person" and "Chinese Person". How would anyone know whether the sub-class "Chinese Person" means "Citizen of the Chinese Republic" or "person of Chinese ethnic origin"? A given instance can be either or both of these.
- On the dynamic web this ambiguity cannot arise because every class is a member of one and only one class specializing scheme. This leaves no doubt what is meant because the basis of breakdown, e.g. "by citizenship" or "by ethnic origin", is fully explained in the definition of the scheme to which each class belongs as a member.
- In some cases there may be several class specializing schemes with little real difference in their basis of class specialization. This is bound to occur as a result of independent initiatives that happen to address similar needs without being aware of each other - or just because people find it difficult to agree.
- On the dynamic web, with its insistence on properly specified class specializing schemes, the resulting overlap is no problem because it will be there for all to see with no risk of confusion. Clearly, in those cases where there really is no useful difference, it is desirable that such overlap be resolved. There is always the possibility that the owners of two or more similar class specializing schemes may agree that their competing schemes be merged into one - but the dynamic web works because it does not depend on this happening. Instead it allows the owners of any scheme to make any number of unilateral declarations that any specified member class is to be regarded as a sub-class of one or more other classes within any other scheme.
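The "Chinese Person" example and the unilateral sub-class declarations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `example` URIs, the third scheme "by nationality", the function names, and the reading that members of the same scheme are alternatives while members of different schemes may be used together.

```python
from typing import Dict, Set, Tuple

PERSON = "http://x.example/Person"

# scheme URI -> (parent class URI, basis of specialization)
schemes: Dict[str, Tuple[str, str]] = {
    "http://a.example/person-by-ethnic-origin": (PERSON, "by ethnic origin"),
    "http://b.example/person-by-citizenship": (PERSON, "by citizenship"),
    "http://c.example/person-by-nationality": (PERSON, "by nationality"),
}

# class URI -> the one and only scheme it is a member of
member_of: Dict[str, str] = {
    "http://a.example/AngloSaxonPerson": "http://a.example/person-by-ethnic-origin",
    "http://a.example/ChinesePerson": "http://a.example/person-by-ethnic-origin",
    "http://b.example/BritishPerson": "http://b.example/person-by-citizenship",
    "http://b.example/FrenchPerson": "http://b.example/person-by-citizenship",
    "http://b.example/ChinesePerson": "http://b.example/person-by-citizenship",
    "http://c.example/BritishNational": "http://c.example/person-by-nationality",
}

# Unilateral declarations: class URI -> classes in other schemes
# that its owner says it is a sub-class of.
declared_subclass_of: Dict[str, Set[str]] = {}


def basis_of(class_uri: str) -> str:
    """The basis of breakdown comes from the scheme, not from the class name."""
    return schemes[member_of[class_uri]][1]


def are_alternatives(a: str, b: str) -> bool:
    """Two distinct classes in the same scheme are alternatives to each other."""
    return a != b and member_of[a] == member_of[b]


def declare_subclass(class_uri: str, other_class_uri: str) -> None:
    """Record a declaration made by the owner of class_uri's scheme alone."""
    if member_of[class_uri] == member_of[other_class_uri]:
        raise ValueError("the other class must belong to a different scheme")
    declared_subclass_of.setdefault(class_uri, set()).add(other_class_uri)


# Two overlapping schemes are linked without any agreement to merge them.
declare_subclass("http://b.example/BritishPerson",
                 "http://c.example/BritishNational")
```

With this layout the two "Chinese Person" classes have different URIs and `basis_of` reports "by ethnic origin" for one and "by citizenship" for the other, so a given person can be an instance of either or both without ambiguity.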
Properties and the Structure of Information
- Because the dynamic web is a web of knowledge, the main significance of these classes is that they indicate the types of knowledge applicable for the description of a thing or concept of that class. These "types of knowledge" are known on the dynamic web as properties. They are defined in the global ontology with reference to the class of thing that they describe.
- The definition of each property provides a complete specification for a type of knowledge element. When this is combined with the URI of a specific describable thing we have the definition of an actual knowledge element.
- When any participating site wishes to place a candidate value for one of these knowledge elements onto the dynamic web it must create an assertion comprising:
- The subject comprising the URI of the thing described.
- The predicate comprising the URI of the property that the object represents.
- The object - comprising the URI of another resource or a body of text with embedded images and/or links marked-up in accordance with the schema defined for the cited property.
- The context - specifying geographic, organizational, time and other constraints.
- The provenance - specifying immediate source etc.
- Composition of the provenance has to be standardized across the dynamic web so that participants can employ generic software to use this provenance as a factor in the selection of those versions it wants to receive.
- It may be helpful to think of an assertion as a document that expresses information of a specified type about a specified thing for a given context, is in the standard format for that type, and comes with its own provenance.
- For most types of information the standard format is very simple such as unstructured text, single number, image or link. Using an XHTML schema to define such simple formats may seem like overkill but it allows the dynamic web to use the same language to specify the value structure for all properties from the simplest to the most complex. This should reinforce the idea that breaking information down into properties is all about identifying elements that are by their nature logically indivisible and semantically complete - and to do this regardless of the structural complexity that may be required to meet this condition.
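The five parts of an assertion listed above can be sketched as an immutable record. This is a minimal illustration, not a defined format: the field names inside `Context` and `Provenance`, the `example.org` URIs and the helper `knowledge_element` are assumptions, and the object is held as a plain string rather than schema-validated markup.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Context:
    # Hypothetical fields for the geographic, organizational and time constraints.
    geographic: Optional[str] = None
    organizational: Optional[str] = None
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None


@dataclass(frozen=True)
class Provenance:
    # Hypothetical fields for the immediate source of the assertion.
    source_site: str
    entered_at: datetime


@dataclass(frozen=True)
class Assertion:
    subject: str    # URI of the thing described
    predicate: str  # URI of the property that the object represents
    object: str     # URI of another resource, or marked-up text
    context: Context
    provenance: Provenance


def knowledge_element(assertion: Assertion) -> Tuple[str, str]:
    """A property combined with the URI of a thing identifies a knowledge element."""
    return (assertion.subject, assertion.predicate)


location = Assertion(
    subject="http://example.org/people/alice",
    predicate="http://example.org/ontology/Person/location",
    object="London",
    context=Context(geographic="UK"),
    provenance=Provenance(
        source_site="http://example.org/",
        entered_at=datetime(2005, 3, 1, 12, 0, tzinfo=timezone.utc),
    ),
)
```

Freezing the dataclasses is a design choice of this sketch: an assertion is a candidate value offered at a point in time, so a different value is expressed as a new assertion for the same knowledge element rather than as a change to an existing one.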
The Independence of Assertions
- Knowledge is shared through information. If the dynamic web is to work as a coherent global repository of knowledge about everything for everyone, the information through which this knowledge is expressed must be broken into elements that are indivisible and semantically complete. This assertion is justified as follows:
- The elements must be regarded as indivisible because potential users need to be sure what is being said about what. If the body of words and/or pictures were split up in any way there would be doubt as to the meaning and/or subject of each fragment. Clues within the content cannot provide this assurance because the users and creators may not share the same mindset. Instead it is necessary for every body of words and/or pictures to include positive identification of:
- subject, i.e. the thing described.
- predicate, i.e. the property that the object represents.
- context of origin, i.e. the circumstances under which that body of words and/or pictures is deemed to be valid.
- The term "semantically complete" as used here implies that the meaning of an assertion is precisely as stated in the definition of the cited property and is not qualified in any way by the values of any other assertions. Any possibility of such dependency would make the information useless to subscribers because the subscribing site could never be sure that it has copies of the same versions of such 'other assertions' that may have been taken into account by the originator of the element in question.
- In general the representation of information about things must allow for change over time. Of course there are types of information, such as a person's date of birth, for which only one value is correct. Then again there are some that are inherently variable such as a person's location.
- On the dynamic web we allow for multiple versions of all knowledge elements including those for which common sense says that only one value can be correct. This policy is justified as follows:
- On the World Wide Web each site is autonomous. Every participating site has an equal right to contribute information to the global knowledge base. Creating the definition of a class or property and publishing it as part of the global ontology does not confer any exclusive rights over information created in accordance with that definition. Anyone can cite any property as the meaning for information about any thing that has a URI.
- This means that neither the owner of that definition nor the creator of the URI for the described thing nor anyone else can know how many versions, i.e. relevant assertions, exist or where they are held. For this reason no rational case can be made for replacing one version (assertion) by another.
- It follows that the concept of 'latest correct value' is meaningless. This is why the dynamic web is deemed to comprise all versions in their initial state. It must be left to each subscribing site to decide which versions it wishes to receive and then to decide which of these it wishes to insert in each page as it is downloaded to clients.
- This approach is consistent with the concept of knowledge bases right back to 1983 when J. Bubenko said: "The KB should also view its domain in a time perspective and not restrict its knowledge only to the current state of the UoD (Universe of Discourse), ... The KB should never 'forget' anything". He also said "There is no concept of 'modifiable store' in the KB and the concepts of updating and deletion consequently do not apply".
- User perception of the global knowledge base is that of a massive resource containing all kinds of individual assertions about all kinds of things. Access to this resource is by subscription only. The reason for this is more about selective access than payment.
- Today's chief problem with information is clutter. It takes a lot of effort to find the information we want from within the vast amount that is out there. The dynamic web offers a solution in that a subscriber can arrange to receive precisely those kinds of information it wants to see about specific kinds of thing and to get these only from sources it trusts.
- Delivery of information to a subscribing site is by mutual agreement with any number of individual publishing sites. These are bilateral arrangements that may appear to have little to do with any global knowledge base. The global aspect stems from the use of a single global ontology. This is single and global because it is a single hierarchy of class specializing schemes as explained above.
- This global ontology starts at the top with a class for "anything that can be described by information". This ultimate generic class of everything is then specialized in many ways by any participating site to match its own interests. Any site owner wishing to exploit the dynamic web as a global knowledge base can search the global ontology to find willing publishers of assertions that match their needs.
- Now although the global ontology is a single coherent entity, its physical representation is partially distributed across the web in accordance with subscription contracts just like the knowledge data it describes. Effective global search therefore requires a search engine operating on a site that has elected to receive copies of class and property definitions from a wide range of publishing sites.
- I imagine that it would be technically feasible for a site to hold copies of the whole global ontology in order to support a comprehensive search service. There would also be scope for more specialized search services restricted to the specializations of a particular class. There is no need to legislate for this - it will just happen in response to need.
- Ideally the results of such a search would show the range of classes and properties available together with the URLs of willing publishing sites. Now, in the absence of anything better it may be assumed that the publisher of a class or property definition is a likely candidate for publisher of information of that type but this does not necessarily follow. Instead the identification of willing publishers would mostly come from the publishing sites themselves choosing to advise the search site of what they have to offer.
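The policy of keeping every version and leaving selection to the subscriber, as set out earlier in this section, might look like the following sketch. The names `KnowledgeStore`, `Subscription` and `latest_trusted`, the flat `Assertion` tuple and the `example` URIs are all hypothetical. Picking the most recent trusted version is just one local policy that a subscribing site might adopt; it is not a global "latest correct value".

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Assertion(NamedTuple):
    subject: str
    predicate: str
    value: str
    source_site: str
    entered_at: str  # ISO 8601 UTC timestamp, so string order matches time order


class KnowledgeStore:
    """Append-only: there is deliberately no update or delete operation."""

    def __init__(self) -> None:
        self._assertions: List[Assertion] = []

    def add(self, assertion: Assertion) -> None:
        self._assertions.append(assertion)

    def versions(self, subject: str, predicate: str) -> List[Assertion]:
        return [a for a in self._assertions
                if a.subject == subject and a.predicate == predicate]


class Subscription(NamedTuple):
    predicates: FrozenSet[str]     # kinds of information the subscriber wants
    trusted_sites: FrozenSet[str]  # sources the subscriber trusts

    def wants(self, assertion: Assertion) -> bool:
        return (assertion.predicate in self.predicates
                and assertion.source_site in self.trusted_sites)


def latest_trusted(store: KnowledgeStore, subscription: Subscription,
                   subject: str, predicate: str) -> Optional[Assertion]:
    """One possible local selection policy: newest version from a trusted source."""
    candidates = [a for a in store.versions(subject, predicate)
                  if subscription.wants(a)]
    return max(candidates, key=lambda a: a.entered_at, default=None)


DOB = "http://example.org/ontology/Person/dateOfBirth"
ALICE = "http://example.org/people/alice"

store = KnowledgeStore()
store.add(Assertion(ALICE, DOB, "1970-01-01", "http://a.example/", "2005-03-01T12:00:00Z"))
store.add(Assertion(ALICE, DOB, "1970-01-02", "http://b.example/", "2006-07-15T09:30:00Z"))

subscription = Subscription(frozenset({DOB}), frozenset({"http://a.example/"}))
chosen = latest_trusted(store, subscription, ALICE, DOB)
```

Both versions of the date of birth remain in the store even though common sense says only one can be correct. The subscriber that trusts only the first site sees that site's value, and a subscriber with different trust settings could see the other.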
Variable and Implied Classification
- All classification of things is information. On the dynamic web, explicit statements that a specific thing is classified in a particular way are represented by assertions together with their own provenance. For all such information the property cited as predicate is a URI defined as "is instance of" and the object is the URI of the class concerned.
There may be any number of classifying assertions for a given subject.
- There can also be implicit classification where the fact of a thing being described with a value for one of the properties defined for a class is used to infer that the thing concerned must be an instance of that class.
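Explicit and implied classification can be contrasted in a short sketch. The `IS_INSTANCE_OF` URI, the `property_class` lookup (property URI to the class for which that property is defined) and the sample data are assumptions made for illustration, and assertions are reduced to subject, predicate and object for brevity.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical URI for the "is instance of" property.
IS_INSTANCE_OF = "http://example.org/ontology/isInstanceOf"


def explicit_classes(subject: str, triples: Iterable[Triple]) -> Set[str]:
    """Classes stated directly by classifying assertions about the subject."""
    return {o for s, p, o in triples if s == subject and p == IS_INSTANCE_OF}


def implied_classes(subject: str, triples: Iterable[Triple],
                    property_class: Dict[str, str]) -> Set[str]:
    """Classes inferred because the subject is described by one of their properties."""
    return {property_class[p] for s, p, _ in triples
            if s == subject and p in property_class}


PERSON = "http://example.org/ontology/Person"
EMPLOYEE = "http://example.org/ontology/Employee"
ALICE = "http://example.org/people/alice"

property_class = {
    "http://example.org/ontology/Person/dateOfBirth": PERSON,
    "http://example.org/ontology/Employee/payrollNumber": EMPLOYEE,
}

triples = [
    (ALICE, IS_INSTANCE_OF, PERSON),
    (ALICE, "http://example.org/ontology/Employee/payrollNumber", "40017"),
]
```

In this data the subject is explicitly classified as a "Person" and, because it has a value for a property defined for "Employee", is implicitly classified as an "Employee" as well.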
Integrity
- The integrity of the dynamic web does not require that publishing sites must check that all subscribing sites have received every new assertion or even that all these sites are up and running. This is because the dynamic web is designed to work without any synchronization between sites. It contains only individual statements of fact or opinion made at a point in time and these, of course, are history that can never be changed. The documents in which such statements are presented to clients do get updated but it is not the documents that are distributed. These remain on their own host site where their expression of any knowledge element always reflects the latest version available at that site.
- Because there is no attempt to maintain a single variable state for anything there is nothing to synchronize. This means that there is no need for the publisher to track where copies have been sent or whether they arrive. If a copy gets lost - no matter - this is just a minor degradation at the subscribing site concerned. No failure of the distribution system can threaten the integrity or operation of the dynamic web as a whole.
- Every assertion has a provenance which includes a permanent record of its entry to the dynamic web as a global entity. This provenance includes both a timestamp and the URI of the site at which the entry occurred.
- The mechanism through which information is distributed around the dynamic web is concerned primarily with newly created assertions. There are only two triggers for further distribution of an assertion after the initial burst of activity at its creation. These are when a new subscription contract is started and when an existing subscribing site needs to be restored.
- The dynamic web is not concerned with documents at all. It operates entirely behind the scenes as a resource for use by site builders. Inclusion of any information from the dynamic web within any document is determined entirely and independently by the design of pages on the sites involved. This applies as much to pages on the publishing site as it does to pages on the various subscribing sites.
- It is said above that "The integrity of the dynamic web does not require that publishing sites must check that all subscribing sites have received every new assertion". However the software that does the distribution should take reasonable steps to ensure reliability of service and this must take account of the fact that not all subscribing sites will be up and running at all times. For this reason, failure of any distribution message should be logged so that it can be resent when the listening service at the subscriber site comes back on line.
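The logging and resending of failed distribution messages could be sketched as follows. This is an illustration only: the `Distributor` class, its `send` callable and the use of `ConnectionError` to signal an unreachable subscriber are assumptions, and a real service would keep the failure log in durable storage rather than in memory.

```python
from typing import Callable, List, Tuple


class Distributor:
    """Best-effort delivery of new assertions with a log of failed messages."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        # send(subscriber_uri, assertion) is assumed to raise ConnectionError
        # when the listening service at the subscriber site is down.
        self._send = send
        self.failed: List[Tuple[str, str]] = []  # (subscriber_uri, assertion)

    def publish(self, assertion: str, subscribers: List[str]) -> None:
        for subscriber in subscribers:
            self._deliver(subscriber, assertion)

    def _deliver(self, subscriber: str, assertion: str) -> None:
        try:
            self._send(subscriber, assertion)
        except ConnectionError:
            # No synchronization is attempted; the failure is simply logged.
            self.failed.append((subscriber, assertion))

    def resend(self, subscriber: str) -> None:
        """Call when the subscriber's listening service comes back on line."""
        pending = [entry for entry in self.failed if entry[0] == subscriber]
        self.failed = [entry for entry in self.failed if entry[0] != subscriber]
        for _, assertion in pending:
            # A delivery that fails again is logged again by _deliver.
            self._deliver(subscriber, assertion)
```

Because assertions are never modified after entry, resending the same message later is harmless, and a message that is never delivered degrades only the subscribing site concerned.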
|