The Semantic Web and the QNET Wiki

From QNET

Introduction: The Semantic Web

First of all, one has to consider what the Semantic Web is intended to be. At the risk of venturing a crude definition, the Semantic Web is a means of linking together different information resources on the Internet so that they are accessible to machine reasoning systems. 'Link' here is certainly not meant in the sense of a hyperlink. Rather, a link here is closer to a 'relationship', something that ordinary hyperlinks do not necessarily express: a hyperlink only says that one page points to another, not how the two are related. By means of such relationships between, say, web pages, associations can be made between pieces of information that endow them with meaning.

The Semantic Web deals with resources identified by Uniform Resource Identifiers (URIs), such as:

http://www.complico.org/joe_bloggs

[Please don't click it if you don't want to disappoint your web browser...]

Clearly one is left with a great deal of ambiguity about what these 'resources' actually are.

In the example above, the possibilities are:

  • it could be a regular web page address, eg, the home page of someone called Joe Bloggs
  • perhaps an abstract URI identifying a person called Joe Bloggs who works for an organisation called 'complico'
  • a network device attached to a cat cheekily called 'joe bloggs'
  • a sensor in a non-network location
  • a reference to a special interest group where JOE_BLOGGS is an acronym for 'Joint Optimisation of Engineering Buildings and Leisure OrGanisation Group'
  • etc.

To find out more about 'joe bloggs' we would have to be given some attributes or properties of him or it. That is, we need to know its relationships to other entities to get a better understanding of what it actually is. These entities could be typed constants (strings, integers, etc.), collections, or references to other web resources.
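To make the three kinds of related entity concrete, here is a minimal sketch in plain Python. It assumes a home-made representation (the `URI` and `Literal` classes and the `describe` helper are hypothetical, not part of any RDF library) in which every fact is a (subject, property, object) triple. The `ex:age` and `ex:projects` properties and their values are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class URI:
    """A reference to another web resource (hypothetical helper type)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A typed constant, e.g. a string or an integer (hypothetical helper type)."""
    value: object
    datatype: str


# An object is a typed constant, a resource reference, or a collection
# (modelled here simply as a tuple of other objects).
Obj = Union[Literal, URI, tuple]

JOE = URI("http://www.complico.org/joe_bloggs")

# Each fact is a (subject, property, object) triple. The "ex:" properties
# and their values are hypothetical, for illustration only.
triples: List[Tuple[URI, URI, Obj]] = [
    (JOE, URI("foaf:name"), Literal("Joe Bloggs", "xsd:string")),
    (JOE, URI("foaf:homepage"), URI("http://www.joebloggs.com/")),
    (JOE, URI("ex:age"), Literal(42, "xsd:integer")),
    (JOE, URI("ex:projects"), (URI("http://example.org/projectA"),
                               URI("http://example.org/projectB"))),
]


def describe(subject: URI, facts) -> dict:
    """Collect every property of one subject as {property: [objects]}."""
    out: dict = {}
    for s, p, o in facts:
        if s == subject:
            out.setdefault(p.value, []).append(o)
    return out


for prop, values in describe(JOE, triples).items():
    print(prop, values)
```

The URI on its own says nothing; it is only the set of triples attached to it that tells us 'joe bloggs' is something with a name, a home page and a list of projects.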

Using these kinds of formal data models can lead to more rigorous and accurate search and retrieval of data on the Internet than is possible by keyword association alone. Because the machine reasoning is based on formally defined relationships, a higher degree of confidence and trust can be placed in its results. Compare this with Google, where keyword searches are used to find the relevant data and a human has to sort through the results.
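The difference between keyword association and a query over formal relationships can be sketched in a few lines of Python. This is a toy illustration, not how any particular search engine or triple store works: the `ex:` resources are hypothetical, `keyword_search` simply looks for a substring, and `match` is a basic triple-pattern matcher in which `None` acts as a wildcard.

```python
# Hypothetical facts written as (subject, property, object) strings.
triples = [
    ("ex:joe_bloggs", "rdf:type", "foaf:Person"),
    ("ex:joe_bloggs", "foaf:name", "Joe Bloggs"),
    ("ex:joe_bloggs", "foaf:knows", "ex:dave_golby"),
    ("ex:joe_group", "rdf:type", "ex:InterestGroup"),
    ("ex:joe_group", "foaf:name", "JOE_BLOGGS interest group"),
    ("ex:dave_golby", "rdf:type", "foaf:Person"),
]


def keyword_search(facts, word):
    """Return every subject whose facts mention the word anywhere."""
    word = word.lower()
    return sorted({s for s, p, o in facts if word in f"{s} {p} {o}".lower()})


def match(facts, s=None, p=None, o=None):
    """Return the triples matching a pattern; None matches anything."""
    return [t for t in facts
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Keyword search cannot tell the person apart from the interest group.
print(keyword_search(triples, "joe"))

# A structured query asks for resources typed as foaf:Person whose name
# mentions "joe", so only the person is returned.
people = {s for s, _, _ in match(triples, p="rdf:type", o="foaf:Person")}
joe_people = sorted(s for s, _, o in match(triples, p="foaf:name")
                    if s in people and "joe" in o.lower())
print(joe_people)

# Relationships can be followed directly, eg, whom does Joe know?
print([o for _, _, o in match(triples, s="ex:joe_bloggs", p="foaf:knows")])
```

The keyword search returns both `ex:joe_bloggs` and `ex:joe_group`, leaving a human to decide which was meant, whereas the typed query returns only the person.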

As an example, here is some RDF (in its RDF/XML syntax) that could express the facts that Joe Bloggs is a person, that he has a specific email address and home page, that he knows some other people, and so on. (I apologise for the code: this is how the data might be stored electronically, but semantic wikis allow people to express this in a less painful manner.) With enough of these descriptions of people, you could find all the people that Joe Bloggs knows and build up a network of his contacts, something that would be next to impossible with keyword searches.

   <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" 
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
       <foaf:Person rdf:about="http://www.complico.org/joe_bloggs">
           <foaf:name>Joe Bloggs</foaf:name>
           <foaf:mbox rdf:resource="mailto:jbloggs@complico.org" />
           <foaf:homepage rdf:resource="http://www.joebloggs.com/" />
           <foaf:nick>Joe</foaf:nick>
           <foaf:depiction rdf:resource="http://www.joebloggs.com/joebloggs.jpg" />
           <foaf:knows>
               <foaf:Person>
                   <foaf:name>Dave Golby</foaf:name> 
               </foaf:Person>
           </foaf:knows>
       </foaf:Person>
   </rdf:RDF>
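As a minimal sketch of how a program might pull Joe Bloggs's contacts out of such a description, the following uses only Python's standard `xml.etree.ElementTree` module. It assumes the nested layout used in the example above (a `foaf:knows` element wrapping a `foaf:Person`); RDF/XML can encode the same triples in several other ways, so a real application would normally use a dedicated RDF parser such as rdflib instead. The `known_people` function is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# The RDF/XML example from above, abbreviated to the parts used here.
doc = """<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <foaf:Person rdf:about="http://www.complico.org/joe_bloggs">
    <foaf:name>Joe Bloggs</foaf:name>
    <foaf:mbox rdf:resource="mailto:jbloggs@complico.org" />
    <foaf:knows>
      <foaf:Person><foaf:name>Dave Golby</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def known_people(xml_text, person_uri):
    """Return the names of the people that person_uri foaf:knows.

    Only handles the nested layout shown above; other RDF/XML forms
    (eg, foaf:knows with an rdf:resource attribute) are ignored.
    """
    root = ET.fromstring(xml_text)
    names = []
    # ElementTree expands prefixes to "{namespace}localname".
    for person in root.iter(f"{{{FOAF}}}Person"):
        if person.get(f"{{{RDF}}}about") != person_uri:
            continue
        for knows in person.findall(f"{{{FOAF}}}knows"):
            for friend in knows.findall(f"{{{FOAF}}}Person"):
                name = friend.find(f"{{{FOAF}}}name")
                if name is not None:
                    names.append(name.text)
    return names


print(known_people(doc, "http://www.complico.org/joe_bloggs"))
```

Running this over many such descriptions, and following each contact in turn, is one way the network of contacts mentioned above could be assembled.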

Some of the issues with this approach are:

  • Getting enough annotated data: users may be discouraged from making annotations, as the benefits may not be immediately obvious.
  • Getting people to agree on a common vocabulary: "foaf" (Friend Of A Friend) in the example above is a widely used vocabulary, but other people might choose to define their own terms for the same concepts, which makes data from different sources hard to combine.
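The vocabulary problem, and one common workaround, can be sketched as follows. Here a second data source is assumed to use a hypothetical local vocabulary (`http://example.org/staff#`) for the same ideas FOAF expresses. In OWL such an alignment can be stated with `owl:equivalentProperty`; this sketch simply uses a Python dictionary to rewrite properties before merging, and all the data is illustrative.

```python
# Hypothetical local vocabulary used by a second data source.
EX = "http://example.org/staff#"
FOAF = "http://xmlns.com/foaf/0.1/"
JOE = "http://www.complico.org/joe_bloggs"

# Mapping from local terms to their FOAF equivalents (a stand-in for
# owl:equivalentProperty statements).
EQUIVALENT = {
    EX + "fullName": FOAF + "name",
    EX + "email": FOAF + "mbox",
}

source_a = [(JOE, FOAF + "name", "Joe Bloggs")]
source_b = [(JOE, EX + "fullName", "Joe Bloggs"),
            (JOE, EX + "email", "mailto:jbloggs@complico.org")]


def normalise(triples):
    """Rewrite properties to their FOAF equivalents where a mapping exists."""
    return [(s, EQUIVALENT.get(p, p), o) for s, p, o in triples]


# Without the mapping, the name appears twice under different properties.
naive = set(source_a) | set(source_b)
# With it, the duplicate collapses into a single foaf:name fact.
merged = set(normalise(source_a)) | set(normalise(source_b))
print(len(naive), len(merged))
```

The mapping itself still has to be written and agreed by someone, which is why agreeing on a shared vocabulary in the first place is so valuable.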

The Semantic Web Stack

A (possibly oversimplified) description is that one starts with 'data models'. A data model formally expresses the relations between different data entities, eg, as in a traditional database schema. At this level, no inference can necessarily be drawn from these descriptions. Indeed, a data model is usually manipulated by specific applications that attach much greater significance to its associations than is apparent to an external observer. For example, one could conceive of a relational database that links a product to the datasets containing various analyses of it under certain flow regimes or operating conditions. A web site would use this database to search for and retrieve this information through application back-end code, eg, written in Java, PHP or a .NET language. In a certain sense, the knowledge of the types of, and relations between, the data entities is hidden in the application itself and is not visible to outsiders. Finally, such data models tend to be 'brittle': they break easily as business needs change.
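The product/dataset example can be sketched with Python's built-in `sqlite3` module. The schema, the product name 'Pump A', the URLs and the `ex:` property names are all hypothetical. The point is that the meaning of the link between the two tables lives only in the application's query, whereas publishing the same rows as explicit triples with named relationships makes that meaning visible to anyone reading the data.

```python
import sqlite3

# Hypothetical schema: nothing in it says *what* the link between a
# product and a dataset means.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, product_id INTEGER,
                          regime TEXT, url TEXT);
    INSERT INTO product VALUES (1, 'Pump A');
    INSERT INTO dataset VALUES (10, 1, 'turbulent', 'http://example.org/data/10');
""")


def analyses_for(product_name):
    """Application code: the 'analysis of' meaning is implicit in this join."""
    return db.execute(
        "SELECT d.url, d.regime FROM dataset d "
        "JOIN product p ON d.product_id = p.id WHERE p.name = ?",
        (product_name,),
    ).fetchall()


def as_triples():
    """Publish the same rows with the relationships named explicitly."""
    rows = db.execute(
        "SELECT p.id, d.url, d.regime FROM dataset d "
        "JOIN product p ON d.product_id = p.id"
    ).fetchall()
    triples = []
    for pid, url, regime in rows:
        product = f"http://example.org/product/{pid}"
        triples.append((url, "ex:isAnalysisOf", product))
        triples.append((url, "ex:flowRegime", regime))
    return triples


print(analyses_for("Pump A"))
print(as_triples())
```

Both functions read the same rows; only the second exposes the relationship in a form another application could reuse without re-implementing the join logic.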

As a consequence of this 'hidden knowledge' in the application, two things arise:

  • a great deal of logic is re-implemented across different applications
  • more critically, this knowledge is hidden from those who may wish to exploit it, eg, collaborators or even other parts of the same organisation.

QNET and the Semantic Web

Getting back to wikis, a particular challenge is organising the information they contain in a way that makes sense to the users of the site. A common solution, as used in this first version of the QNET Wiki, is to 'stove-pipe' the articles into a hierarchy of categories from the top level downwards. So AC and UFR articles belong to higher-level categories, and so on.

The problems with this approach are the following:

  • the administrative overheads: administrators have to be very diligent in classifying and organising articles, inserting links into tables etc, in order to make the articles accessible.
  • it fixes a particular view of the articles that may not be appropriate for certain users.
  • it overlooks other very interesting relationships that may be submerged within the articles.

Ideally, one would prefer to annotate the articles themselves, ie, assign them properties that can be used for automatic categorisation.
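As a rough sketch of how such annotations could drive automatic categorisation, the following Python extracts inline annotations written in a syntax loosely modelled on Semantic MediaWiki's `[[Property::Value]]` form and groups articles by every property value they carry. The article titles, texts and the property names 'Flow regime' and 'Application area' are hypothetical; a real semantic wiki would store and query these annotations itself.

```python
import re
from collections import defaultdict

# Hypothetical articles carrying inline [[Property::Value]] annotations.
ARTICLES = {
    "Jet in cross-flow": "A jet in cross-flow [[Flow regime::Free shear flow]] "
                         "studied for [[Application area::Combustion]].",
    "Axisymmetric jet": "An axisymmetric jet [[Flow regime::Free shear flow]] "
                        "in [[Application area::Turbomachinery]].",
    "Heated flat plate": "Flow over a heated plate [[Flow regime::Boundary layer]] "
                         "in [[Application area::Combustion]].",
}

# Property name: anything up to "::"; value: anything up to "]]".
ANNOTATION = re.compile(r"\[\[([^:\]]+)::([^\]]+)\]\]")


def categorise(articles):
    """Group article titles under every (property, value) pair they carry."""
    groups = defaultdict(list)
    for title, text in articles.items():
        for prop, value in ANNOTATION.findall(text):
            groups[(prop.strip(), value.strip())].append(title)
    return dict(groups)


for (prop, value), titles in categorise(ARTICLES).items():
    print(f"{prop} = {value}: {titles}")
```

Because each article is filed under every annotation it carries, the same content can be browsed by flow regime or by application area without an administrator maintaining either hierarchy by hand.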