15.7. Linked Data

In many cases, RDF data should be retrieved from remote sources only when it is really needed. For example, a scheduling application may read personal calendars from the personal sites of its users. Calendar data expires quickly, so there is no reason to re-load it frequently in the hope that it will be queried before it expires.

Virtuoso extends SPARQL so that it can download an RDF resource from a given IRI, parse it, and store the resulting triples in a graph; all three operations are performed during SPARQL query execution. The IRI of the graph in which the triples are stored is usually equal to the IRI from which the resource is downloaded, so the feature is named "IRI dereferencing".

There are two different use cases for this feature. In the simple case, a SPARQL query contains from clauses that enumerate the graphs to process, but DB.DBA.RDF_QUAD contains no triples for some of these graphs. Query execution starts by dereferencing these graphs, and the rest runs as usual.

In the more sophisticated case, the query is executed many times in a loop. Every execution produces a partial result. The SPARQL processor checks the result for IRIs of resources that may contain relevant data but are not yet loaded into DB.DBA.RDF_QUAD, and loads them. After some iteration, the partial result is identical to the result of the previous iteration, because there is no more data to retrieve. As the last step, the SPARQL processor builds the final result set.
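The iterative use case can be pictured as a fixed-point loop. The Python sketch below only illustrates that idea and is not Virtuoso's implementation: the in-memory REMOTE dictionary stands in for the Web, the store dictionary stands in for DB.DBA.RDF_QUAD, and the function name grab_until_stable and the example.org IRIs are hypothetical.

```python
# Toy model of iterative IRI dereferencing (illustrative only).
# REMOTE maps a resource IRI to the triples a download would return.
REMOTE = {
    "http://example.org/user1.ttl": [
        ("user1", "name", "Alice"),
        ("user1", "seeAlso", "http://example.org/user2.ttl"),
    ],
    "http://example.org/user2.ttl": [
        ("user2", "name", "Bob"),
        ("user2", "seeAlso", "http://example.org/user1.ttl"),
    ],
}


def grab_until_stable(start_iri, link_predicate="seeAlso"):
    """Load resources until an iteration discovers no unloaded IRI."""
    store = {}              # graph IRI -> triples (stand-in for the quad store)
    pending = {start_iri}
    iterations = 0
    while pending:
        iterations += 1
        # Dereference every IRI found so far that is not yet loaded.
        for iri in pending:
            store[iri] = REMOTE.get(iri, [])
        # "Run the query": collect IRIs that may hold more relevant data.
        found = {
            o
            for triples in store.values()
            for (_, p, o) in triples
            if p == link_predicate
        }
        pending = found - set(store)
    return store, iterations


store, iterations = grab_until_stable("http://example.org/user1.ttl")
names = sorted(o for ts in store.values() for (_, p, o) in ts if p == "name")
print(names, iterations)
```

The loop stops as soon as an iteration finds no IRI that is missing from the store, which corresponds to the partial result no longer changing.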

15.7.1. IRI Dereferencing For FROM Clauses, "define get:..." Pragmas

Virtuoso extends the SPARQL syntax of from and from named clauses. It allows an additional list of options at the end of the clause: option ( param1 value1, param2 value2, ... ), where parameter names are QNames that start with the get: prefix and values are "precode" expressions, i.e. expressions that do not contain variables other than external parameters. Names of allowed parameters are listed below.


15.7.2. IRI Dereferencing For Variables, "define input:grab-..." Pragmas

Consider a set of personal data in which one resource can list many persons and point to resources where those persons are described in more detail. For example, the resource about user1 describes that user and also contains statements that user2 and user3 are persons and that more data can be found in user2.ttl and user3.ttl; user3.ttl can in turn contain statements that user4 is also a person and that more data can be found in user4.ttl, and so on. The query should find as many users as possible and return their names and e-mails.

If all data about all users were loaded into the database, the query could be quite simple:

SQL>sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?id ?firstname ?nick
where
  {
    graph ?g
      {
        ?id rdf:type foaf:Person.
        ?id foaf:firstName ?firstname.
        ?id foaf:knows ?fn .
        ?fn foaf:nick ?nick.
      }
   }
limit 10;

id                                                      firstname  nick
VARCHAR                                                 VARCHAR    VARCHAR
_______________________________________________________________________________

http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    sdmonroe
http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    alexmidd
http://myopenlink.net/dataspace/person/abm#this         Alan       kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/igods#this       Cameron    kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/goern#this       Christoph  captsolo
http://myopenlink.net/dataspace/person/dangrig#this     Dan        rickbruner
http://myopenlink.net/dataspace/person/dangrig#this     Dan        sdmonroe
http://myopenlink.net/dataspace/person/dangrig#this     Dan        lszczepa
http://myopenlink.net/dataspace/person/dangrig#this     Dan        kidehen

10 Rows. -- 80 msec.

It is possible to enable IRI dereferencing in such a way that all appropriate resources are loaded during the query execution even if names of some of them are not known a priori.

SQL>sparql
  define input:grab-var "?more"
  define input:grab-depth 10
  define input:grab-limit 100
  define input:grab-base "http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1300"
  prefix foaf: <http://xmlns.com/foaf/0.1/>
  prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?id ?firstname ?nick
where {
    graph ?g {
               ?id rdf:type foaf:Person.
               ?id foaf:firstName ?firstname.
               ?id foaf:knows ?fn .
               ?fn foaf:nick ?nick.
               optional { ?id rdfs:seeAlso ?more }
            }
}
limit 10;

id                                                         firstname  nick
VARCHAR                                                    VARCHAR    VARCHAR
_______________________________________________________________________________

http://myopenlink.net/dataspace/person/ghard#this          Yrjänä     kidehen
http://inamidst.com/sbp/foaf#Sean                          Sean       d8uv
http://myopenlink.net/dataspace/person/dangrig#this        Dan        rickbruner
http://myopenlink.net/dataspace/person/dangrig#this        Dan        sdmonroe
http://myopenlink.net/dataspace/person/dangrig#this        Dan        lszczepa
http://myopenlink.net/dataspace/person/dangrig#this        Dan        kidehen
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      mortenf
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      danja
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      zool
http://myopenlink.net/dataspace/person/rickbruner#this     Rick       dangrig

10 Rows. -- 530 msec.

The IRI dereferencing is controlled by the following pragmas:

The default resolver procedure is DB.DBA.RDF_GRAB_RESOLVER_DEFAULT(). Note that the procedure produces two absolute URIs, abs_uri and dest_uri. The default procedure returns two equal strings, but other resolvers may return different values, e.g. the primary and permanent location of the resource as dest_uri and the fastest known mirror location as abs_uri, thus saving HTTP retrieval time. A resolver can even signal an error to block the download of an unwanted resource.

DB.DBA.RDF_GRAB_RESOLVER_DEFAULT (
  in base varchar,         -- base IRI as specified by input:grab-base pragma
  in rel_uri varchar,      -- IRI of the resource as it is specified by input:grab-iri or a value of a variable
  out abs_uri varchar,     -- the absolute IRI that should be downloaded
  out dest_uri varchar,    -- the graph IRI where triples should be stored after download
  out get_method varchar ) -- the HTTP method to use, should be "GET" or "MGET".
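
To make the roles of abs_uri and dest_uri concrete, here is a minimal Python sketch of the decision a custom resolver might make. It is an illustration only, not Virtuoso/PL and not the default resolver: the MIRRORS table, the blocked prefix and the function name resolve are all hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical mirror table: permanent location -> faster mirror.
MIRRORS = {
    "http://example.org/data/user2.ttl": "http://mirror.example.net/user2.ttl",
}
# Hypothetical list of sources that must never be downloaded.
BLOCKED_PREFIXES = ("http://unwanted.example/",)


def resolve(base, rel_uri):
    """Return (abs_uri, dest_uri, get_method), like the three out parameters."""
    dest_uri = urljoin(base, rel_uri)           # permanent location, used as graph IRI
    if dest_uri.startswith(BLOCKED_PREFIXES):
        raise ValueError("download blocked: " + dest_uri)
    abs_uri = MIRRORS.get(dest_uri, dest_uri)   # location actually downloaded
    return abs_uri, dest_uri, "GET"


base = "http://example.org/data/user1.ttl"
print(resolve(base, "user2.ttl"))
print(resolve(base, "user3.ttl"))
```

When no mirror is known, both values are equal, which matches the behaviour described for the default procedure.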

15.7.3. URL rewriting

URL rewriting is the act of modifying a source URL prior to the final processing of that URL by a Web Server.

The ability to rewrite URLs may be desirable for many reasons that include:

15.7.3.1. Using URL Rewriting to Solve Linked Data Deployment Challenges

URI naming schemes don't resolve the challenges associated with referencing data. To reiterate, this is demonstrated by the fact that the URIs http://demo.openlinksw.com/Northwind/Customer/ALFKI and http://demo.openlinksw.com/Northwind/Customer/ALFKI#this both appear as http://demo.openlinksw.com/Northwind/Customer/ALFKI to the Web Server, since data following the fragment identifier "#" never makes it that far.
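
The effect of the fragment identifier can be checked on the client side. The short Python sketch below uses the standard urllib.parse.urldefrag function to show which part of the entity URI would be sent to a server; it illustrates general HTTP client behaviour and is not specific to Virtuoso.

```python
from urllib.parse import urldefrag

entity_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"

# An HTTP client strips the fragment before issuing the request.
request_url, fragment = urldefrag(entity_uri)
print(request_url)
print(fragment)
```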

The only way to address data referencing is by pre-processing source URIs (e.g. via regular expression or sprintf substitutions) as part of a URL rewriting processing pipeline. The pipeline process has to take the form of a set of rules that cater for elements such as HTTP Accept headers, HTTP response code, HTTP response headers, and rule processing order.

An example of such a pipeline is depicted in the table below.

Table: 15.7.3.1.1. Pre-processing source URIs
URI Source (Regular Expression Pattern) | HTTP Accept Headers (Regular Expression) | HTTP Response Code | HTTP Response Headers | Rule Processing Order
/Northwind/Customer/([^#]*) | None (meaning default) | 200 or 303 redirect to a resource with default representation | None | Normal (order irrelevant)
/Northwind/Customer/([^#]*) | (text/rdf.n3) (application/rdf.xml) | 303 redirect to location of a descriptive and associated resource (e.g. RESTful Web Service that returns desired representation) | None |
/Northwind/Customer/([^#]*) | (text/html) (application/xhtml.xml) | 406 (Not Acceptable) or 303 redirect to location of resource in requested representation | Vary: negotiate, accept; Alternates: {"ALFKI" 0.9 {type application/rdf+xml}} |

The source URI patterns refer to virtual or physical directories, for example at http://demo.openlinksw.com/. Rules can be placed at the head or tail of the pipeline, or applied in the order they are declared, by specifying a Rule Processing Order of First, Last, or Normal, respectively. The decision as to which representation to return for URI http://demo.openlinksw.com/Northwind/Customer/ALFKI is based on the MIME type(s) specified in any Accept header accompanying the request.

In the case of the last rule, the Alternates response header applies only to response code 406. 406 would be returned if there were no (X)HTML representation available for the requested resource. In the example shown, an alternative representation is available in RDF/XML.
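
The way such a pipeline selects a response can be sketched in a few lines of Python. This is a simplified model, not the Virtuoso rewriter: it assumes the two Accept patterns in each row are alternatives, it places the default rule last so that the more specific rules are tried first, it picks 406 for the HTML rule, and the function name pick_rule is hypothetical.

```python
import re

# (path pattern, Accept pattern or None for "default", response code)
RULES = [
    (r"/Northwind/Customer/([^#]*)", r"(text/rdf.n3)|(application/rdf.xml)", 303),
    (r"/Northwind/Customer/([^#]*)", r"(text/html)|(application/xhtml.xml)", 406),
    (r"/Northwind/Customer/([^#]*)", None, 200),
]


def pick_rule(path, accept):
    """Return (response code, captured id) for the first rule that matches."""
    for path_re, accept_re, code in RULES:
        match = re.fullmatch(path_re, path)
        if match is None:
            continue
        if accept_re is None or re.search(accept_re, accept or ""):
            return code, match.group(1)
    return 404, None


print(pick_rule("/Northwind/Customer/ALFKI", "application/rdf+xml"))
print(pick_rule("/Northwind/Customer/ALFKI", "text/html"))
print(pick_rule("/Northwind/Customer/ALFKI", None))
```

Note that the dot in a pattern such as application/rdf.xml is a regular expression wildcard, so it matches the "+" in application/rdf+xml.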

When applied to matching HTTP requests, the last two rules might generate responses similar to those below:

$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI

HTTP/1.1 303 See Other
Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Mon, 16 Jul 2007 22:40:03 GMT
Accept-Ranges: bytes
Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
Content-Length: 0

In the cURL exchange depicted above, the target Virtuoso server redirects to a SPARQL endpoint that retrieves an RDF/XML representation of the requested entity.

$ curl -I -H "Accept: text/html" http://demo.openlinksw.com/Northwind/Customer/ALFKI

HTTP/1.1 406 Not Acceptable
Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Mon, 16 Jul 2007 22:40:23 GMT
Accept-Ranges: bytes
Vary: negotiate,accept
Alternates: {"ALFKI" 0.9 {type application/rdf+xml}}
Content-Length: 0

In this second cURL exchange, the target Virtuoso server indicates that there is no resource to deliver in the requested representation. It provides hints in the form of an alternate resource representation and URI that may be appropriate, i.e., an RDF/XML representation of the requested entity.


15.7.3.2. The Virtuoso Rules-Based URL Rewriter

Virtuoso provides a URL rewriter that can be enabled for URLs matching specified patterns. Coupled with customizable HTTP response headers and response codes, Data-Web server administrators can configure highly flexible rules for driving content negotiation and URL rewriting. The key elements of the URL rewriter are:


15.7.3.3. Virtual Domains (Hosts) & Directories

A Virtuoso virtual directory maps a logical path to a physical directory that is file system or WebDAV based. This mechanism allows physical locations to be hidden or simply reorganised. Virtual directory definitions are held in the system table DB.DBA.HTTP_PATH. Virtual directories can be administered in three basic ways:


15.7.3.4. "Nice" URLs vs. "Long" URLs

Although we are approaching the URL Rewriter from the perspective of deploying linked data, the Rewriter was developed with additional objectives in mind. These in turn have influenced the naming of some of the formal argument names in the Configuration API function prototypes. In the following sections, long URLs are those containing a query string with named parameters; nice (aka. source) URLs have data encoded in some other format. The primary goal of the Rewriter is to accept a nice URL from an application and convert this into a long URL, which then identifies the page that should actually be retrieved.
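
The nice-to-long conversion can be illustrated outside Virtuoso with a few lines of Python. This is only a sketch of the idea: the pattern, the target format and the function name nice_to_long are assumptions chosen to resemble the Northwind example, and the code does not use the Virtuoso rewriter API.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit

# Assumed nice URL shape: /Northwind/Customer/<id>
NICE_PATTERN = re.compile(r"^/Northwind/Customer/([^#/?]+)$")


def nice_to_long(nice_url, fmt="application/rdf+xml"):
    """Turn a nice URL into a long URL carrying named query parameters."""
    match = NICE_PATTERN.match(nice_url)
    if match is None:
        return None  # no rule applies; the URL is left alone
    entity = "http://demo.openlinksw.com/Northwind/Customer/%s#this" % match.group(1)
    query = (
        "CONSTRUCT { <%s> ?p ?o } "
        "FROM <http://demo.openlinksw.com/Northwind> "
        "WHERE { <%s> ?p ?o }" % (entity, entity)
    )
    return "/sparql?query=%s&format=%s" % (quote(query, safe=""), quote(fmt, safe=""))


long_url = nice_to_long("/Northwind/Customer/ALFKI")
params = parse_qs(urlsplit(long_url).query)
print(params["format"][0])
print(params["query"][0])
```

The data that was encoded in the path of the nice URL (the customer id) ends up inside a named parameter of the long URL.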


15.7.3.5. Rule Processing Mechanics

When an HTTP request is accepted by the Virtuoso HTTP server, the received nice URL is passed to an internal path translation function. This function takes the nice URL and, if the current virtual directory has a url_rewrite option set to an existing ruleset name, tries to match the corresponding rulesets and rules; that is, it performs a recursive traversal of any rulelist associated with it. For every rule in the rulelist, the same logic is applied (only the logic for regex-based rules is described; that for sprintf-based rules is very similar):

The path translation function described above is internal to the Web server, so its signature is not appropriate for Virtuoso/PL calls and thus is not published. Virtuoso/PL developers can harness the same functionality using the DB.DBA.URLREWRITE_APPLY API call.


15.7.3.6. Enabling URL Rewriting via the Virtuoso Conductor UI

Virtuoso is a full-blown HTTP server in its own right. The HTTP server functionality co-exists with the product core (i.e., DBMS Engine, Web Services Platform, WebDAV filesystem, and other components of the Universal Server). As a result, it has the ability to multi-home Web domains within a single instance across a variety of domain name and port combinations. In addition, it also enables the creation of multiple virtual directories per domain.

In addition to the basic functionality, Virtuoso facilitates the association of URL Rewriting rules with the virtual directories associated with a hosted Web domain.

In all cases, Virtuoso enables you to configure virtual domains, virtual directories and URL rewrite rules for one or more virtual directories, via the (X)HTML-based Conductor Admin User Interface or a collection of Virtuoso Stored Procedure Language (PL)-based APIs.

The steps for configuring URL Rewrite rules via the Virtuoso Conductor are as follows:

Figure: 15.7.3.6.1. URL-rewrite UI using Conductor

15.7.3.7. Enabling URL Rewriting via Virtuoso PL

The vhost_define() API is used to define virtual hosts and virtual paths hosted by the Virtuoso HTTP server. URL rewriting is enabled through this function's opts parameter. opts is of type ANY, e.g. a vector of field-value pairs. Numerous fields are recognized for controlling different options. The field name url_rewrite controls URL rewriting; the corresponding field value is the IRI of a rule list to apply.

15.7.3.7.1. Configuration API

Virtuoso includes the following functions for managing URL rewriting rules and rule lists. The names are self-explanatory.

-- Deletes a rewriting rule
DB.DBA.URLREWRITE_DROP_RULE

-- Creates a rewriting rule which uses sprintf-based pattern matching
DB.DBA.URLREWRITE_CREATE_SPRINTF_RULE

-- Creates a rewriting rule which uses regular expression (regex) based pattern matching
DB.DBA.URLREWRITE_CREATE_REGEX_RULE

-- Deletes a rewriting rule list
DB.DBA.URLREWRITE_DROP_RULELIST

-- Creates a rewriting rule list
DB.DBA.URLREWRITE_CREATE_RULELIST

-- Lists all the rules whose IRIs match the specified 'SQL like' pattern
DB.DBA.URLREWRITE_ENUMERATE_RULES

-- Lists all the rule lists whose IRIs match the specified 'SQL like' pattern
DB.DBA.URLREWRITE_ENUMERATE_RULELISTS

15.7.3.7.2. Creating Rewriting Rules

Rewriting rules take two forms: sprintf-based or regex-based. When used for nice URL to long URL conversion, the only difference between them is the syntax of format strings. The reverse long to nice conversion works only for sprintf-based rules, whereas regex-based rules are unidirectional.

For the purposes of describing how to make dereferenceable URIs for linked data, we will stick with the nice to long conversion using regex-based rules.

Regex rules are created using the URLREWRITE_CREATE_REGEX_RULE() function.



15.7.3.8. Example - URL Rewriting For the Northwind RDF View

The Northwind schema is comprised of commonly understood SQL Tables that include: Customers, Orders, Employees, Products, Product Categories, Shippers, Countries, Provinces etc.

An RDF View of SQL data is an RDF named graph (RDF data set) comprised of RDF Linked Data (triples) stored in a Virtuoso Quad Store (the native RDF Data Management realm of Virtuoso).

In this example we are going to interact with Linked Data deployed into the Data-Web from a live instance of Virtuoso, which uses the URL Rewrite rules from the prior section.

The components used in the example are as follows:

15.7.3.8.1. Northwind URL Rewriting Verification Using curl

The curl utility provides a useful tool for verifying HTTP server responses and rewriting rules. The curl exchanges below show the URL rewriting rules defined for the Northwind RDF view being applied.

Example 1:

$ curl -I -H "Accept: text/html" http://demo.openlinksw.com/Northwind/Customer/ALFKI

HTTP/1.1 303 See Other
Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Tue, 14 Aug 2007 13:30:02 GMT
Accept-Ranges: bytes
Location:  /isparql/execute.html?query=SELECT%20%3Fp%20%3Fo%20FROM%20%3Chttp%3A//demo.openlinksw.com/Northwind%3E%20WHERE%20{%20%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E%20%3Fp%20%3Fo%20}&endpoint=/sparql
Content-Length: 0

Example 2:

$ curl -I -H "Accept: application/rdf+xml" http://demo.openlinksw.com/Northwind/Customer/ALFKI

HTTP/1.1 303 See Other
Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Tue, 14 Aug 2007 13:30:22 GMT
Accept-Ranges: bytes
Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
Content-Length: 0

Example 3:

$ curl -I -H "Accept: text/html" http://demo.openlinksw.com/Northwind/Customer/ALFKI#this

HTTP/1.1 404 Not Found
Server: Virtuoso/05.00.3016 (Solaris) x86_64-sun-solaris2.10-64 PHP5
Connection: Keep-Alive
Content-Type: text/html; charset=ISO-8859-1
Date: Tue, 14 Aug 2007 13:31:01 GMT
Accept-Ranges: bytes
Content-Length: 0

The output above shows how RDF entities from the Data-Web, in this case customer ALFKI, are exposed in the Document Web. The power of SPARQL coupled with URL rewriting enables us to produce results in line with the desired representation. A SPARQL SELECT or CONSTRUCT query is used depending on whether the requested representation is text/html or application/rdf+xml, respectively.

The 404 response in Example 3 indicates that no HTML representation is available for entity ALFKI#this. In most cases, a URI of this form (containing a '#' fragment identifier) will not reach the server. This example supposes that it does, i.e. that the RDF client and network routing allow the suffixed request. The presence of the #this suffix implicitly states that this is a request for a data resource in the Data-Web realm, not a document resource from the Document Web.

Rather than return 404, we could instead choose to construct our rewriting rules to perform a 303 redirect, so that the response for ALFKI#this in Example 3 becomes the same as that for ALFKI in Example 1.



15.7.3.9. Transparent Content Negotiation

So as not to overload our preceding description of Linked Data deployment with excessive detail, the description of content negotiation presented thus far was kept deliberately brief. This section discusses content negotiation in more detail.

15.7.3.9.1. HTTP/1.1 Content Negotiation

Recall that a resource (conceptual entity) identified by a URI may be associated with more than one representation (e.g. multiple languages, data formats, sizes, resolutions). If multiple representations are available, the resource is referred to as negotiable and each of its representations is termed a variant. For instance, a Web document resource, named 'ALFKI' may have three variants: alfki.xml, alfki.html and alfki.txt all representing the same data. Content negotiation provides a mechanism for selecting the best variant.

As outlined in the earlier brief discussion of content negotiation, when a user agent requests a resource, it can include with the request Accept headers (Accept, Accept-Language, Accept-Charset, Accept-Encoding etc.) which express the user preferences and user agent capabilities. The server then chooses and returns the best variant based on the Accept headers. Because the selection of the best resource representation is made by the server, this scheme is classed as server-driven negotiation.


15.7.3.9.2. Transparent Content Negotiation

An alternative content negotiation mechanism is Transparent Content Negotiation (TCN), a protocol defined by RFC2295. TCN offers a number of benefits over standard HTTP/1.1 negotiation, for suitably enabled user agents.

RFC2295 introduces a number of new HTTP headers including the Negotiate request header, and the TCN and Alternates response headers. (Krishnamurthy et al. note that although the HTTP/1.1 specification reserved the Alternates header for use in agent driven negotiation, it was not fully specified. Consequently under a pure HTTP/1.1 implementation as defined by RFC2616, server-driven content negotiation is the only option. RFC2295 addresses this issue.)


15.7.3.9.3. Deficiencies of HTTP/1.1 Server-Driven Negotiation

Weaknesses of server-driven negotiation highlighted by RFCs 2295 and 2616 include:


15.7.3.9.4. Variant Selection By User Agent

Rather than rely on server-driven negotiation and variant selection by the server, a user agent can take full control over deciding the best variant by explicitly requesting transparent content negotiation through the Negotiate request header. The negotiation is 'transparent' because it makes all the variants on the server visible to the agent.

Under this scheme, the server sends the user agent a list, represented in an Alternates header, containing the available variants and their properties. The user agent can then choose the best variant itself. Consequently, the agent no longer needs to send large Accept headers describing in detail its capabilities and preferences. (However, unless caching is used, user-agent driven negotiation does suffer from the disadvantage of needing a second request to obtain the best representation. By sending its best guess as the first response, server driven negotiation avoids this second request if the initial best guess is acceptable.)
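
As an illustration of what a user agent has to do with such a list, the Python sketch below parses an Alternates header value into variant records. It is a deliberately narrow sketch: it only understands variants that carry a single type attribute, whereas RFC2295 allows further attributes, and the function name parse_alternates is hypothetical.

```python
import re

# One variant: {"<uri>" <source quality> {type <media type>}}
VARIANT_RE = re.compile(r'\{"([^"]+)"\s+([0-9.]+)\s+\{type\s+([^}\s]+)\}\}')


def parse_alternates(header_value):
    """Return a list of {'uri', 'qs', 'type'} records from an Alternates value."""
    return [
        {"uri": uri, "qs": float(qs), "type": mime}
        for uri, qs, mime in VARIANT_RE.findall(header_value)
    ]


header = (
    '{"page.html" 0.900000 {type text/html}}, '
    '{"page.txt" 0.500000 {type text/plain}}, '
    '{"page.xml" 1.000000 {type text/xml}}'
)
variants = parse_alternates(header)
for variant in variants:
    print(variant["uri"], variant["qs"], variant["type"])
```

Once the list is parsed, the agent can apply its own preferences and request the chosen variant URI directly.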


15.7.3.9.5. Variant Selection By Server

As well as variant selection by the user agent, TCN allows the server to choose on behalf of the user agent if the user agent explicitly allows it through the Negotiate request header. This option allows the user agent to send smaller Accept headers containing enough information to allow the server to choose the best variant and return it directly. The server's choice is controlled by a 'remote variant selection algorithm' as defined in RFC2296.


15.7.3.9.6. Variant Selection By End-User

A further option is to allow the end-user to select a variant, in case the choice made by the negotiation process is not optimal. For instance, the user agent could display an HTML-based 'pick list' of variants constructed from the variant list returned by the server. Alternatively the server could generate this pick list itself and include it in the response to a user agent's request for a variant list. (Virtuoso currently responds this way.)



15.7.3.10. Transparent Content Negotiation in Virtuoso HTTP Server

The following section describes the Virtuoso HTTP server's TCN implementation which is based on RFC2295, but without "Feature" negotiation. OpenLink's RDF rich clients, iSparql and the OpenLink RDF Browser, both support TCN. User agents which do not support transparent content negotiation continue to be handled using HTTP/1.1 style content negotiation (whereby server-side selection is the only option - the server selects the best variant and returns a list of variants in an Alternates response header).

15.7.3.10.1. Describing Resource Variants

In order to negotiate a resource, the server needs to be given information about each of the variants. Variant descriptions are held in SQL table HTTP_VARIANT_MAP. The descriptions themselves can be created, updated or deleted using Virtuoso/PL or through the Conductor UI. The table definition is as follows:

create table DB.DBA.HTTP_VARIANT_MAP (
  VM_ID integer identity, -- unique ID
  VM_RULELIST varchar, -- HTTP rule list name
  VM_URI varchar, -- name of requested resource e.g. 'page'
  VM_VARIANT_URI varchar, -- name of variant e.g. 'page.xml', 'page.de.html' etc.
  VM_QS float, -- Source quality, a number in the range 0.001-1.000, with 3 digit precision
  VM_TYPE varchar, -- Content type of the variant e.g. text/xml
  VM_LANG varchar, -- Content language e.g. 'en', 'de' etc.
  VM_ENC varchar, -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
  VM_DESCRIPTION long varchar, -- a human readable description about the variant e.g. 'Profile in RDF format'
  VM_ALGO int default 0, -- reserved for future use
  primary key (VM_RULELIST, VM_URI, VM_VARIANT_URI)
 );
create unique index HTTP_VARIANT_MAP_ID on DB.DBA.HTTP_VARIANT_MAP (VM_ID);

15.7.3.10.2. Configuration using Virtuoso/PL

Two functions are provided for managing variant descriptions in Virtuoso/PL, one for adding or updating a description and one for removing it:

-- Adding or Updating a Resource Variant:
DB.DBA.HTTP_VARIANT_ADD (
  in rulelist_uri varchar, -- HTTP rule list name
  in uri varchar, -- Requested resource name e.g. 'page'
  in variant_uri varchar, -- Variant name e.g. 'page.xml', 'page.de.html' etc.
  in mime varchar, -- Content type of the variant e.g. text/xml
  in qs float := 1.0, -- Source quality, a floating point number with 3 digit precision in 0.001-1.000 range
  in description varchar := null, -- a human readable description of the variant e.g. 'Profile in RDF format'
  in lang varchar := null, -- Content language e.g. 'en', 'bg', 'de' etc.
  in enc varchar := null -- Content encoding e.g. 'utf-8', 'ISO-8859-1' etc.
)


-- Removing a Resource Variant:
DB.DBA.HTTP_VARIANT_REMOVE (
  in rulelist_uri varchar, -- HTTP rule list name
  in uri varchar, -- Name of requested resource e.g. 'page'
  in variant_uri varchar := '%' -- Variant name filter
)

15.7.3.10.3. Configuration using Conductor UI

The Conductor 'Content negotiation' panel for describing resource variants and configuring content negotiation is depicted below. It can be reached by selecting the 'Virtual Domains & Directories' tab under the 'Web Application Server' menu item, then selecting the 'URL rewrite' option for a logical path listed amongst those for the relevant HTTP host, e.g. '{Default Web Site}'.

The input fields reflect the supported 'dimensions' of negotiation which include content type, language and encoding. Quality values corresponding to the options for 'Source Quality' are as follows:

Table: 15.7.3.10.3.1. Source Quality
Source Quality                                 Quality Value
perfect representation                         1.000
threshold of noticeable loss of quality        0.900
noticeable, but acceptable quality reduction   0.800
barely acceptable quality                      0.500
severely degraded quality                      0.300
completely degraded quality                    0.000


15.7.3.10.4. Variant Selection Algorithm

When a user agent instructs the server to select the best variant, Virtuoso does so using the selection algorithm below:

If a virtual directory has URL rewriting enabled (has the 'url_rewrite' option set), the web server:

The server may return the best-choice resource representation or a list of available resource variants. When a user agent requests transparent negotiation, the web server returns the TCN header "choice". When a user agent asks for a variant list, the server returns the TCN header "list".
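
One plausible way to rank variants, in the spirit of RFC2296, is to multiply each variant's source quality by the quality factor that the user agent assigns to its media type in the Accept header. The Python sketch below applies that idea to three variants with source qualities 0.9 (HTML), 0.5 (plain text) and 1.0 (XML). It is a simplified illustration under that assumption, not Virtuoso's implementation; it ignores language and encoding, and the function names are hypothetical.

```python
def parse_accept(header):
    """Return {media range: q} from an Accept header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # a media range without an explicit q counts as 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def q_for(mime, prefs):
    """Most specific matching media range wins: type/subtype, type/*, */*."""
    major = mime.split("/")[0]
    for key in (mime, major + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def best_variant(variants, accept):
    """Pick the variant URI with the highest (source quality * q) product."""
    prefs = parse_accept(accept)
    return max(variants, key=lambda v: v[2] * q_for(v[1], prefs))[0]


# (variant URI, media type, source quality)
VARIANTS = [
    ("page.html", "text/html", 0.9),
    ("page.txt", "text/plain", 0.5),
    ("page.xml", "text/xml", 1.0),
]

prefers_html = "text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3"
prefers_xml = "text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3"
print(best_variant(VARIANTS, prefers_html))
print(best_variant(VARIANTS, prefers_xml))
```

With an Accept header that prefers text/html, the HTML variant wins (0.9 x 1.0) even though the XML variant has the higher source quality (1.0 x 0.3).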


15.7.3.10.5. Examples

In this example we assume the following files have been uploaded to the Virtuoso WebDAV server, with each containing the same information but in different formats:

We add TCN rules and define a virtual directory:

DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.html','text/html', 0.900000, 'HTML variant');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.txt', 'text/plain', 0.500000, 'Text document');
DB.DBA.HTTP_VARIANT_ADD ('http_rule_list_1', 'page', 'page.xml', 'text/xml', 1.000000, 'XML variant');
DB.DBA.VHOST_DEFINE (lpath=>'/DAV/TCN/',
                     ppath=>'/DAV/TCN/',
                     is_dav=>1,
                     vsp_user=>'dba',
                     opts=>vector ('url_rewrite', 'http_rule_list_1'));

Having done this we can now test the setup with a suitable HTTP client, in this case the curl command line utility. In the following examples, the curl client supplies Negotiate request headers containing content negotiation directives which include:

The server returns a TCN response header signalling that the resource is transparently negotiated and either a choice or a list response as appropriate.

In the first curl exchange, the user agent indicates to the server that, of the formats it recognizes, HTML is preferred and it instructs the server to perform transparent content negotiation. In the response, the Vary header field expresses the parameters the server used to select a representation, i.e. only the Negotiate and Accept header fields are considered.

$ curl -i -H "Accept: text/xml;q=0.3,text/html;q=1.0,text/plain;q=0.5,*/*;q=0.3" -H "Negotiate: *" http://localhost:8890/DAV/TCN/page

HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2007 15:43:18 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.html
Content-Type: text/html
ETag: "14056a25c066a6e0a6e65889754a0602"
Content-Length: 49

<html> <body> some html </body> </html>

Next, the quality values in the Accept header are adjusted so that the user agent indicates that XML is its preferred format.

$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" -H "Negotiate: *" http://localhost:8890/DAV/TCN/page

HTTP/1.1 200 OK
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: Keep-Alive
Date: Wed, 31 Oct 2007 15:44:07 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Content-Location: page.xml
Content-Type: text/xml
ETag: "8b09f4b8e358fcb7fd1f0f8fa918973a"
Content-Length: 39

<?xml version="1.0" ?> <a>some xml</a>

In the final example, the user agent wants to decide for itself which representation is most suitable, so it asks for a list of variants. The server provides the list in the form of an Alternates response header and, in addition, sends an HTML representation of the list so that the end user can choose the preferred variant if the user agent is unable to.

$ curl -i -H "Accept: text/xml,text/html;q=0.7,text/plain;q=0.5,*/*;q=0.3" -H "Negotiate: vlist" http://localhost:8890/DAV/TCN/page

HTTP/1.1 300 Multiple Choices
Server: Virtuoso/05.00.3021 (Linux) i686-pc-linux-gnu VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Wed, 31 Oct 2007 15:44:35 GMT
Accept-Ranges: bytes
TCN: list
Vary: negotiate,accept
Alternates: {"page.html" 0.900000 {type text/html}}, {"page.txt" 0.500000 {type text/plain}}, {"page.xml" 1.000000 {type text/xml}}
Content-Length: 368

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>300 Multiple Choices</title>
</head>
<body>
<h1>Multiple Choices</h1>
Available variants:
<ul>
<li>
<a href="page.html">HTML variant</a>, type text/html</li>
<li><a href="page.txt">Text document</a>, type text/plain</li>
<li><a href="page.xml">XML variant</a>, type text/xml</li>
</ul>
</body>
</html>
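
A user agent that receives such a list can make the choice itself. The Python fragment below is a sketch under stated assumptions: it parses the Alternates value shown above and ranks the variants by the product of the source quality from the list and the q-value from the Accept header, in the style of the variant selection algorithm of RFC 2296. The names parse_alternates and best_variant are hypothetical, and whether Virtuoso computes its own choice in exactly this way is an assumption; the result merely agrees with the two "TCN: choice" responses above.

```python
import re

# Alternates value copied from the 300 response above.
ALTERNATES = ('{"page.html" 0.900000 {type text/html}}, '
              '{"page.txt" 0.500000 {type text/plain}}, '
              '{"page.xml" 1.000000 {type text/xml}}')

# One variant: {"uri" source-quality {type media/type}}
VARIANT_RE = re.compile(r'\{"([^"]+)"\s+([0-9.]+)\s+\{type\s+([^}\s]+)\}\}')


def parse_alternates(value):
    """Return a list of (uri, source_quality, media_type) tuples."""
    return [(uri, float(qs), mtype) for uri, qs, mtype in VARIANT_RE.findall(value)]


def best_variant(variants, accept_q):
    """Pick the URI whose source quality times Accept q-value is highest."""
    def overall(variant):
        _uri, source_q, mtype = variant
        # Fall back to the */* q-value when the type is not listed explicitly.
        return source_q * accept_q.get(mtype, accept_q.get("*/*", 0.0))
    return max(variants, key=overall)[0]


variants = parse_alternates(ALTERNATES)

# q-values from the Accept headers of the first and second exchanges.
first_q = {"text/xml": 0.3, "text/html": 1.0, "text/plain": 0.5, "*/*": 0.3}
second_q = {"text/xml": 1.0, "text/html": 0.7, "text/plain": 0.5, "*/*": 0.3}

print(best_variant(variants, first_q))   # page.html
print(best_variant(variants, second_q))  # page.xml
```

With the first set of q-values the HTML variant scores 0.9 x 1.0 = 0.9 against 0.3 for XML; with the second set the XML variant scores 1.0 against 0.63 for HTML.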



15.7.4. Examples of other Protocol Resolvers

Example of LSIDs: A scientific name from UBio

SQL>sparql
define get:soft "soft"
select *
from <urn:lsid:ubio.org:namebank:11815>
where { ?s ?p ?o }
limit 5;

s                                 p                                           o
VARCHAR                           VARCHAR                                     VARCHAR
_______________________________________________________________________________

urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/title       Pternistis leucoscepus
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/subject     Pternistis leucoscepus (Gray, GR) 1867
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/identifier  urn:lsid:ubio.org:namebank:11815
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/creator     http://www.ubio.org
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/type        Scientific Name

5 Rows. -- 741 msec.

Example of LSIDs: A segment of the human genome from GDB

SQL>sparql
define get:soft "soft"
select *
from <urn:lsid:gdb.org:GenomicSegment:GDB132938>
where { ?s ?p ?o }
limit 5;

s                                          p                                                     o
VARCHAR                                    VARCHAR                                               VARCHAR
_______________________________________________________________________________

urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:accessionID      GDB:132938
urn:lsid:gdb.org:GenomicSegment:GDB132938  http://www.ibm.com/LSID/2004/RDF/#lsidLink            urn:lsid:gdb.org:DBObject:GDB132938
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:objectClass      DBObject
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:displayName      D20S95
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:GenomicSegment-predicates:variantsQ  nodeID://1000027961

5 Rows. -- 822 msec.

Example of OAI: an institutional / departmental repository.

SQL>sparql
define get:soft "soft"
select *
from <oai:etheses.bham.ac.uk:23>
where { ?s ?p ?o }
limit 5;

s                           p                                           o
VARCHAR                     VARCHAR                                     VARCHAR
_____________________________________________________________________________

oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/title       A study of the role of ATM mutations in the pathogenesis of B-cell chronic lymphocytic leukaemia
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/date        2007-07
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/subject     RC0254 Neoplasms. Tumors. Oncology (including Cancer)
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/identifier  Austen, Belinda (2007) A study of the role of ATM mutations in the pathogenesis of B-cell chronic lymphocytic leukaemia. Ph.D. thesis, University of Birmingham.
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/identifier  http://etheses.bham.ac.uk/23/1/Austen07PhD.pdf

5 Rows. -- 461 msec.

Example of DOI

To execute queries that use the DOI resolver correctly, you need to have:

SQL>sparql
define get:soft "soft"
select *
from <doi:10.1045/march99-bunker>
where { ?s ?p ?o } ;

s                                                      p                                                 o
VARCHAR                                                VARCHAR                                           VARCHAR
_______________________________________________________________________________

http://www.dlib.org/dlib/march99/bunker/03bunker.html  http://www.w3.org/1999/02/22-rdf-syntax-ns#type   http://www.openlinksw.com/schemas/XHTML#
http://www.dlib.org/dlib/march99/bunker/03bunker.html  http://www.openlinksw.com/schemas/XHTML#title     Collaboration as a Key to Digital Library Development: High Performance Image Management at the University of Washington

2 Rows. -- 12388 msec.
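
The four resolver queries above share one shape: a get:soft pragma, a FROM clause naming the IRI to dereference, and a catch-all triple pattern. When such statements are generated by a client program, a small helper can assemble the text. The Python function below is only an illustrative sketch; the name soft_deref_query is hypothetical, and the function builds the statement text without executing it.

```python
from typing import Optional


def soft_deref_query(graph_iri: str, limit: Optional[int] = None) -> str:
    """Build an isql SPARQL statement that dereferences graph_iri on demand.

    The text mirrors the examples above: define get:soft "soft",
    FROM <iri>, and a { ?s ?p ?o } pattern, with an optional LIMIT.
    """
    # Reject characters that cannot appear inside an <...> IRI reference.
    if any(ch in graph_iri for ch in '<>" \n\r\t'):
        raise ValueError("IRI contains characters not allowed in <...>")
    lines = [
        "sparql",
        'define get:soft "soft"',
        "select *",
        f"from <{graph_iri}>",
        "where { ?s ?p ?o }",
    ]
    if limit is not None:
        lines.append(f"limit {int(limit)}")
    return "\n".join(lines) + ";"


print(soft_deref_query("urn:lsid:ubio.org:namebank:11815", limit=5))
print(soft_deref_query("doi:10.1045/march99-bunker"))
```

The same helper covers the LSID, OAI and DOI forms, because only the IRI inside the FROM clause differs between them.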

Other examples

SQL>sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX doap: <http://usefulinc.com/ns/doap#>
SELECT DISTINCT ?name ?mbox ?projectName
WHERE {
 <http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator> doap:developer ?dev .
 ?dev foaf:name ?name .
 OPTIONAL { ?dev foaf:mbox ?mbox }
 OPTIONAL { ?dev doap:project ?proj .
            ?proj foaf:name ?projectName }
};

name                 mbox     projectName
VARCHAR              VARCHAR  VARCHAR
_________________________________________

Adam Lerer           NULL     NULL
Dan Connolly         NULL     NULL
David Li             NULL     NULL
David Sheets         NULL     NULL
James Hollenbach     NULL     NULL
Joe Presbrey         NULL     NULL
Kenny Lu             NULL     NULL
Lydia Chilton        NULL     NULL
Ruth Dhanaraj        NULL     NULL
Sonia Nijhawan       NULL     NULL
Tim Berners-Lee      NULL     NULL
Timothy Berners-Lee  NULL     NULL
Yuhsin Joyce Chen    NULL     NULL

13 Rows. -- 491 msec.

SQL>sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?friendsname ?friendshomepage ?foafsname ?foafshomepage
WHERE
 {
  <http://myopenlink.net/dataspace/person/kidehen#this> foaf:knows ?friend .
  ?friend foaf:mbox_sha1sum ?mbox .
  ?friendsURI foaf:mbox_sha1sum ?mbox .
  ?friendsURI foaf:name ?friendsname .
  ?friendsURI foaf:homepage ?friendshomepage .
  OPTIONAL { ?friendsURI foaf:knows ?foaf .
              ?foaf foaf:name ?foafsname .
              ?foaf foaf:homepage ?foafshomepage .
           }
 }
LIMIT 10;




friendsname      friendshomepage                        foafsname       foafshomepage
ANY              ANY                                    ANY             ANY
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Dan Connolly    http://www.w3.org/People/Connolly/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Henry J. Story  http://bblfish.net/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Henry Story     http://bblfish.net/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Henry J. Story  http://bblfish.net/people/henry/card
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Henry Story     http://bblfish.net/people/henry/card
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Ruth Dhanaraj   http://web.mit.edu/ruthdhan/www
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Dan Brickley    http://danbri.org/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Dan Brickley    http://danbri.org/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Daniel Krech    http://eikeon.com/
Tim Berners Lee  http://www.w3.org/People/Berners-Lee/  Daniel Krech    http://eikeon.com/