Semantic Portals - The SWED Approach
The Semantic Community Portal approach aims to overcome a range of
limitations and problems with existing approaches to creating and maintaining
Web-based community information resources. These include high maintenance
costs and overheads, the limited ability of third parties to re-use the
information, the difficulty of adding new types of information, and others
detailed in the Background section.
The original specification for the demonstration system is given in the
project specification document. Below we present an overview of the approach,
in particular the aspects that distinguish it from traditional and existing
approaches.
Overview
Figure 1 gives a basic overview of the Semantic Community Portals Approach.
The most striking point when compared to traditional approaches to creating
directories (or other types of web-based information) is the separation
of data creation and storage from that of publication.
The data (in the standard format of the Semantic Web, RDF)
is created and hosted by the information provider. It can be produced
in different ways: by a web form that generates the RDF file (as is
generally the case with SWED), by conversion from existing data in a
database, or by hand using a text editor.
As long as the final file is in the correct RDF format and contains the
expected types of information, the next stage will work.

Figure 1 - Overview of Semantic Community Portals Approach
The data is harvested (i.e. the file is located, and a copy is made
and stored in a database). In the case of SWED it is stored along with
the thesauri/vocabularies that are used to categorise the organisations/projects
and the display templates associated with the information, although in
general these can all be stored independently, even on different servers
on the web.
The portal viewer system then imports the information and processes
it to display it for the user - dynamically generating the views (based
on the templates) as the user browses or searches the site. SWED has chosen
a 'faceted browse' interface, in which users can explore the information
using facets (classification categories) under which the organisations/projects
are classified.
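The idea of a faceted browse can be sketched in a few lines of Python. This is only an illustration, not the SWED implementation: the record names, the facet names ("topic", "kind") and their terms are invented for the example.

```python
from collections import Counter

# Hypothetical records: each facet maps to the set of terms
# the organisation is classified under.
RECORDS = [
    {"name": "Org A", "topic": {"pollution", "energy"}, "kind": {"charity"}},
    {"name": "Org B", "topic": {"pollution"}, "kind": {"company"}},
    {"name": "Org C", "topic": {"wildlife"}, "kind": {"charity"}},
]

def select(records, **chosen):
    """Keep the records classified under every chosen facet term."""
    return [
        r for r in records
        if all(term in r.get(facet, set()) for facet, term in chosen.items())
    ]

def facet_counts(records, facet):
    """Count how many records fall under each term of one facet."""
    return Counter(term for r in records for term in r.get(facet, set()))

charities = select(RECORDS, kind="charity")
print([r["name"] for r in charities])    # organisations matching the choice
print(facet_counts(charities, "topic"))  # topic terms left to refine by
```

Each selection narrows the set of records, and the counts for the remaining facets show the user which further refinements are available.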
Perhaps more important than the specific technical architecture
used by SWED, however, is that the data is now part of the larger Semantic
Web. Anyone can now harvest the information and make use of it,
for example to produce specialist directories and/or add specialist information
to the existing information (e.g. information about museum collections
or volunteering opportunities).
This can be done because the basic SWED data records are written in RDF
(see above) and use externally available data elements and classification
vocabularies. For example, address/contact data are represented using the
vCard standard. Other data and classification
terms are defined by SWED in a way that is linkable to other widely used
vocabularies, e.g. terms in the SWED 'types of activity' classification
can be mapped to the very widely used Standard
Industrial Classification (SIC) system.
The following sections provide more detail about the approach taken
by SWED, including the processes, the data format and why it is so easy
to add information to existing RDF-based data.
Data Creation and Storage
Figure 2 - Creation and Storage of Organisations Directory Information
Figure 2 illustrates the most basic difference between existing approaches
and the Semantic Portal approach. The data itself is created and hosted
(stored) by the organisations on their own web site (or that of a related
organisation, where they do not have a web site). This is exactly like
an organisational or project homepage on the Web. You can think of the
portal data file like a homepage for the Semantic Web.
The data file can be created in many different ways; Figure 2 shows two
of these:
- A facilitating organisation (e.g. a directory publisher) might provide
a Web form that the organisation can visit. When this is completed the
RDF file generated can be sent to or downloaded by the organisation.
- A member of the organisation's technical team might produce the RDF
using a simple text editor. How the file is created is independent of
the subsequent use of it.
In this approach the organisations themselves are responsible for the
publication of their own data. They create and update the data file.
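As a rough sketch of the Web form route, the following Python function turns a few form fields into an RDF/XML file using only the standard library. The property names (has_primary_prorg_name, has_primary_url, has_topic) are those mentioned in this document; the SWED namespace URI, the function name and the example values are placeholders assumed for illustration, and the real SWED form may generate its files quite differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SWED_NS = "http://example.org/swed#"  # placeholder, NOT the real SWED namespace

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("swed", SWED_NS)

def form_to_rdf(about, name, homepage, topic_uris):
    """Build an RDF/XML description of one organisation from form fields."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    org = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    # Simple literal-valued properties.
    ET.SubElement(org, f"{{{SWED_NS}}}has_primary_prorg_name").text = name
    ET.SubElement(org, f"{{{SWED_NS}}}has_primary_url").text = homepage
    # One has_topic property per chosen classification term.
    for uri in topic_uris:
        ET.SubElement(
            org, f"{{{SWED_NS}}}has_topic", {f"{{{RDF_NS}}}resource": uri}
        )
    return ET.tostring(root, encoding="unicode")

print(form_to_rdf(
    "http://www.example.com/#org",           # hypothetical identifier
    "Example Organisation",
    "http://www.example.com",
    ["http://example.org/swed#some_topic"],  # hypothetical topic term
))
```

The organisation would save the returned text as a file on its own Web site, where it can later be harvested.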
Figure 3 - Example of simple data file in RDF (XML syntax)
The data file is written in the standard language of the Semantic
Web, RDF (Resource Description Framework - see glossary).
This is the equivalent of HTML for normal Web pages. An example of
a simple file is shown in Figure 3.
The data contains various types of information about the organisation
(metadata - see glossary) including name,
contact details, the topics that describe its areas of interest, the kind
of organisation, etc. This rich metadata means that the SWED site can
provide a large number of ways of browsing and searching for organisations.
Although the file is not generally read by humans (it is
designed for easy automatic processing by computers), it is possible to
see that it is made up of properties of the organisation, e.g. the lines
with <swed:has_topic at the start are 'topics' that the organisation
is categorised under. The long URL-like values are used to indicate
unambiguously the specific concepts used in this classification scheme.
For example, the term 'enquiries' on its own is ambiguous; with the added
http://www.swed.org.uk/2004/etc. it becomes clear that the term 'enquiries'
is used in this case as it is used by the organisation that created or
controls the http://www.swed.org.uk/2004/etc. web domain. This is called
a namespace - see glossary. Ideally the
namespace URL would point to a human (and/or machine readable) definition
of the term.
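The effect of a namespace can be sketched as follows: a short prefixed name is expanded into a full URI, so the same word used by two different organisations yields two different identifiers. Both namespace URIs and the 'other' prefix below are placeholders invented for this example; they are not the real SWED namespace.

```python
# Placeholder namespace URIs, invented for this example.
PREFIXES = {
    "swed": "http://example.org/swed/",
    "other": "http://example.net/vocab/",
}

def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'swed:enquiries' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local

# The same local term, qualified by two namespaces, gives two distinct URIs.
print(expand("swed:enquiries"))
print(expand("other:enquiries"))
```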
The structure of this can be seen more clearly by looking
at a diagram illustrating these properties. Figure 4 shows a simplified
graphical representation of the data in Figure 3.

Figure 4 - Simplified graphical representation of the data from Figure 3 above
The central green oval represents the organisation or project
(with prorg_number of "prorg104"; this is only for internal
SWED use). Each of the purple lines represents a property of the organisation,
e.g. has_primary_prorg_name is the property that defines the name of the
organisation or project (in this case with the value "The Environment
Council"). The blank green ovals represent 'values' of properties
which are more complex, e.g. has_postal_address does not have a single
value but is made up of other properties such as Street, Locality etc.
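One way to picture this graph in code is as a list of (subject, property, value) triples, where a value beginning with "_:" stands for a blank node with properties of its own. The sketch below uses the property names and the two values quoted above; the node identifiers, the swed: and vcard: prefixes and the address values are placeholders assumed for illustration.

```python
# Each triple is (subject, property, value). "_:addr" is a blank node:
# it has no name of its own, only further properties.
TRIPLES = [
    ("ex:org", "swed:prorg_number", "prorg104"),
    ("ex:org", "swed:has_primary_prorg_name", "The Environment Council"),
    ("ex:org", "swed:has_postal_address", "_:addr"),
    ("_:addr", "vcard:Street", "(street placeholder)"),
    ("_:addr", "vcard:Locality", "(locality placeholder)"),
]

def describe(node, triples, indent=0):
    """List the properties of a node, descending into blank-node values."""
    lines = []
    for subject, prop, value in triples:
        if subject != node:
            continue
        if value.startswith("_:"):
            # Complex value: show the property, then the nested properties.
            lines.append(" " * indent + prop)
            lines.extend(describe(value, triples, indent + 2))
        else:
            lines.append(" " * indent + f"{prop} = {value}")
    return lines

print("\n".join(describe("ex:org", TRIPLES)))
```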
Once the data file is made available by the organisation
on its own (or another) Web site, it can be collected (harvested) using
Web-based computer programs similar to those used by search engines to collect
and index information from Web pages. This means that any directory organisation,
or indeed anyone with an Internet connection, can use (reuse) the information.
Collation and Publication
Figure 5 illustrates the collation and publication phases of the Semantic
Portal approach. The RDF data files are harvested from the organisations' own
Web sites. This is done using a software robot (bot) that systematically
retrieves the RDF files of all organisations that are known to the directory.
An organisation might be known because i) it has registered the location
of its file with the directory (as is the case with the SWED Directory),
ii) the bot located the file itself, or iii) the directory organisation
has used a third party index of the locations of the RDF files.
In most Semantic Web applications similar to the SWED project the data
files are harvested on a regular (often daily or hourly) basis. It may
also be that it is possible to prompt the system to harvest a particular
file.
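A harvesting robot of this kind can be sketched in Python as a loop over the registered file locations. This is a minimal illustration under stated assumptions, not the SWED harvester: the function names are invented, the 'database' is just a dictionary, and scheduling (e.g. a daily run) is left to whatever calls harvest().

```python
from urllib.request import urlopen

def fetch_url(url, timeout=10):
    """Download one file and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def harvest(registered_urls, fetch=fetch_url):
    """Retrieve every registered RDF file, keeping a copy of each.

    Returns (store, failed): store maps each URL to the harvested copy,
    failed lists the URLs that could not be retrieved this time round.
    """
    store, failed = {}, []
    for url in registered_urls:
        try:
            store[url] = fetch(url)
        except OSError:  # urllib's URLError and timeouts are OSError subclasses
            failed.append(url)
    return store, failed
```

Passing the download function as a parameter keeps the loop independent of the network, so a single file can also be re-harvested on request by calling harvest() with just that one location.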

Figure 5 - The Harvesting, Collation and Publication Stages of the Semantic Portals Approach
Once the files have been harvested (step 1 in Figure 5) they can be added
to the directory publisher's RDF database (or databases). This database holds
copies of the data. These copies are used to create the actual Web pages of the
directory Web site (steps 2 and 3 in Figure 5). The Web pages are generated
using a template-based system, allowing the easy creation and editing of
particular views of the information.
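Template-based page generation can be illustrated with Python's standard string.Template. The template text, the field names and the record layout below are hypothetical; SWED's own template system is not shown in this document and may work differently.

```python
from html import escape
from string import Template

# Hypothetical display template for one directory entry.
ENTRY = Template('<li><a href="$url">$name</a> ($topics)</li>')

def render_entry(record):
    """Fill the entry template from one record, escaping HTML characters."""
    return ENTRY.substitute(
        url=escape(record["url"], quote=True),
        name=escape(record["name"]),
        topics=escape(", ".join(sorted(record["topics"]))),
    )

def render_page(title, records):
    """Generate a simple listing page from the stored copies of the data."""
    items = "\n".join(render_entry(r) for r in records)
    return f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>"

print(render_page("Directory", [
    {"url": "http://www.example.com", "name": "Example Org",
     "topics": {"pollution", "energy"}},
]))
```

Changing the view of the information then means editing the template text rather than the code or the data.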
One other means of publication (more specifically, syndication in this
case) not detailed in Figure 5 is the use of RSS news feeds (RSS stands for
RDF Site Summary or Really Simple Syndication, depending on the particular
version). RSS is a standard machine-readable format. It is
widely used within the news industry for sharing and publishing categorised
summary news feeds, to alert news agencies and customers to timely, relevant
news items. Users can set up personal aggregators using various (often
freeware) software (e.g. http://disobey.com/amphetadesk/)
and choose which news feeds to collate. Using a form of RSS that uses
RDF, a SWED-type directory could publish the information so that it can
be harvested by users with RSS aggregators. This may be included in the
next phase of SWED development.
The Mechanism for Reusing and Enriching the Information
Reuse of information is an integral aspect of the Semantic Web. Because
the Semantic Community Portal is based on Semantic Web technical standards
(e.g. RDF) other directory organisations will find it easy to harvest
and collate the information. Figure 6 illustrates a number of ways that
this may happen.
In stage 1. (Figure 6) the directory organisation selectively harvests
the RDF files of organisations that are relevant to their particular area
of interest (e.g. species conservation, or pollution control). This is
possible because the RDF files contain the relevant classifications. The
directory organisation then collates the information as before. However
in stage 2a. they also add some additional specialist information themselves
- thus adding value to the information for their particular specialist
community of users (e.g. providing geographically related information).
They might use their own vocabulary for categorising or describing the
information.
In stage 2b. the directory organisation also harvests information from
a third party information provider (e.g. the particular types of pollution
control services the organisations provide), once again using their own
vocabulary for categorising or describing the information. This enriches
and adds value to the original information.

Figure 6 - Illustration of processes to reuse and enrich information
The directory provider will then publish their specialist and enriched
information to their Web site providing a set of customised views (e.g.
Web pages, navigation system, search interfaces, ...) on the information.
Enriching the Basic Data
Enriching the data by integrating it with related information is a central
aspect of the Semantic Web. Figure 7 illustrates how simple this can be.
Imagine that the 3rd party information provider in Figure 6 is providing
information related to specialist services offered by a particular type
(sub-category) of organisation, say a type of pollution control or
monitoring service. They simply need to create an RDF file with the additional
data stating that the relevant organisations offer the specialist service,
using their own properties and terms; a fragment of such a file is shown in
Figure 7.

Figure 7 - Adding 3rd Party RDF data
This basically says that the organisation that has the property 'swed:has_primary_url'
of "http://www.example.com", (i.e. the organisation with the
homepage www.example.com) also has the property 'thirdparty:service' of
"http://www.thirdparty.org.uk/terms#foobar". That is, it offers
a service, called 'foobar', that is categorised using the third party
organisation's own vocabulary.
The new data is simply added to the RDF database stored by the 3rd party
directory publisher, and can immediately be used to provide the additional
information on their Web site, with minimal changes to the software configuration.
This includes the ability to search or select organisations on the basis
of whether they provide the specialist service.
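The merging step can be sketched with plain (subject, property, value) triples. The sketch assumes, following the description above, that 'swed:has_primary_url' is used to recognise that two descriptions refer to the same organisation; the function names, the node identifiers and the base records are invented for the example, and a real RDF database would do this matching itself.

```python
def merge_on_key(base, extra, key="swed:has_primary_url"):
    """Attach third party properties to the matching base organisations.

    Two descriptions are treated as the same organisation when they share
    a value for `key` (an assumption made for this sketch).
    """
    base_by_key = {value: subj for subj, prop, value in base if prop == key}
    extra_keys = {subj: value for subj, prop, value in extra if prop == key}
    merged = list(base)
    for subj, prop, value in extra:
        if prop == key:
            continue  # the key itself is already in the base data
        target = base_by_key.get(extra_keys.get(subj))
        if target is not None:
            merged.append((target, prop, value))
    return merged

def subjects_with(triples, prop, value):
    """Select the organisations that have a given property value."""
    return sorted({s for s, p, v in triples if p == prop and v == value})

FOOBAR = "http://www.thirdparty.org.uk/terms#foobar"
base = [
    ("ex:org1", "swed:has_primary_url", "http://www.example.com"),
    ("ex:org2", "swed:has_primary_url", "http://www.example.org"),
]
third_party = [
    ("_:x", "swed:has_primary_url", "http://www.example.com"),
    ("_:x", "thirdparty:service", FOOBAR),
]
merged = merge_on_key(base, third_party)
print(subjects_with(merged, "thirdparty:service", FOOBAR))
```

No change to the base records is needed: the new property is simply one more triple about an organisation the database already knows.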
Technical Architecture of SWED
The system specification is described in a separate document which can
be found at:
http://www.w3.org/2001/sw/Europe/reports/requirements_demo_2/
The specification document covers the approach in more depth and provides
an overview of the system architecture at a technical level. It also includes
some examples of potential use cases. Below we simply give a high level
review of the system architecture.
Finding Out More
If you would like more information about the Semantic Portals approach,
the SWED project more generally, or SWAD-Europe projects, visit our contacts
page to find out who to contact for your particular inquiry.