14.17. RDF Graph Replication
This section demonstrates how to replicate graphs from one Virtuoso
instance to one or more other Virtuoso instances, using the RDF Replication feature.
Terms used in this section:
- Host Virtuoso Instance, aka the publisher: the instance where we
will insert RDF data into a Named Graph; then create a publication of this graph.
- Destination Virtuoso Instance, aka the subscriber: the instance
which will subscribe to the publication from the Host Virtuoso Instance.
The basic outline:
- First, use the Virtuoso Conductor on a Host Virtuoso Instance to publish a named
graph.
- Then, use the Virtuoso Conductor on a Destination Virtuoso Instance to subscribe
to deltas from the published graph.
- Finally, see how a change in the publisher's graph will appear in the subscriber's
graph.
14.17.1. Replication Scenarios
14.17.1.2. Introduction
In this section we will examine a proposed setup for a back-end server called MASTER, which
publishes a number of graphs to a set of front-end machines called FARM-1 .. FARM-n, and discuss
a couple of common scenarios, such as adding an extra machine to the farm or replacing a broken
instance of MASTER.
In this example we assume that each Virtuoso instance runs on its own machine, so all instances can
use the same port numbers for both the main server (default 1111) and the HTTP port
(default 8890), as each machine has a unique IP address. In the example we use MASTER-IP and
FARM-x-IP, which should be replaced by either the real IP address or the DNS name of the machine
in question.
Since there will be a reverse-proxy service in front of the farm, all Virtuoso instances
should have the URIQA DefaultHost set to the outside name of this service. In this example
we will use http://test.example.com as the web service we are trying to set up.
14.17.1.3. Setup
14.17.1.3.1. Installing Virtuoso
All machines in this setup should be installed with similar installation paths like:
- /opt/virtuoso
- /dbs/virtuoso
- /virtuoso
- ...
The partition should be big enough to have room for the Virtuoso binaries and libraries,
the transaction logs, backups and, if you do not want to use the striping feature of Virtuoso,
it will need to have room for the main database files as well.
Here are the quick installation steps:
- Log in as root.
- Create a local user called virtuoso using the chosen installation path as its home
directory.
- Log in as virtuoso.
- Extract virtuoso-universal-server-6.1.tar in the home directory.
- Run sh install.sh to install Virtuoso.
- Remove the files install.sh, virtuoso-universal-server-6.1.tar and virtuoso-server.taz if
not otherwise needed.
- Run bin/virtuoso-stop.sh to shut down this Virtuoso instance.
- Install virtuoso.lic for this system in the $HOME/bin directory.
As the replication process needs to make an ODBC connection to the MASTER machine, all
machines should have the following information in the $HOME/bin/odbc.ini:
[ODBC Data Sources]
..
MASTER_DSN = OpenLink Virtuoso
..
[MASTER_DSN]
Driver = OpenLink Virtuoso
Address = MASTER-IP:1111
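The same odbc.ini entries have to appear on every machine, so it can help to render them from one place. The following is a minimal sketch in Python, assuming the standard configparser module is acceptable for producing the fragment; the function name render_master_dsn is hypothetical, and the sketch only builds the text rather than touching an existing odbc.ini.

```python
import configparser
import io


def render_master_dsn(master_address: str) -> str:
    """Render the odbc.ini entries for MASTER_DSN as text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # configparser lowercases keys by default; keep the case used in odbc.ini.
    parser.optionxform = str
    parser["ODBC Data Sources"] = {"MASTER_DSN": "OpenLink Virtuoso"}
    parser["MASTER_DSN"] = {
        "Driver": "OpenLink Virtuoso",
        "Address": master_address,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# MASTER-IP is the placeholder used in this section; substitute the real address.
odbc_text = render_master_dsn("MASTER-IP:1111")
print(odbc_text)
```

The rendered text would still need to be merged by hand into any existing $HOME/bin/odbc.ini that already defines other data sources.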
14.17.1.3.2. Setting up MASTER
The MASTER machine is the back-end server machine. Various applications feed SPARQL data
into this machine, and it publishes a set of graphs using RDF Replication.
The MASTER machine should ideally be equipped with multiple redundant disks in RAID-1
or RAID-6 mode to minimize the risk that a single bad disk takes down the system. From a
Virtuoso point of view we will use online backups combined with the checkpoint
audit trail to back up the content of the database in a safe way. The online backups, the
checkpoint audit trail and the replication logs can also be copied to secondary
storage using the rsync command, which can easily be scripted as a cron job.
Changes to database/virtuoso.ini:
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = MASTER
ServerEnable = 1
QueueMax = 5000000
...
Once the MASTER is started using the bin/virtuoso-start.sh script, we must enable RDF
replication before we start adding data to the graphs we wish to replicate, so that every record is
accounted for by the replication process. If there is existing data in the graphs to be
published, this data needs to be added to each subscriber manually, since the
replication process only records the delta of changes made after publishing was enabled.
To enable publishing of the graph we use the isql program to connect to the MASTER
instance:
$ isql MASTER-IP:1111
-- and run the following commands:
-- enable this instance as a publisher
rdf_repl_start();
-- add graphs to replication list
rdf_repl_graph_ins('http://test.example.com');
Next we create a backup directory inside the database directory and set up the online
backup, again using the isql program:
$ cd database
$ mkdir backup
$ isql MASTER-IP:1111
-- and run the following commands:
-- clear any previous context
backup_context_clear();
-- start the backup
backup_online ('bkup-#', 1000000, 0, vector ('backup'));
The following files can now be backed up using rsync or similar tool to another machine:
Table: 14.17.1.3.2.1. Files that can be backed up using rsync or similar tool to another machine
Files | Description
database/backup/*.bp | the incremental backup files
database/virtuoso.trx | the main transaction log containing the most recent updates to the database that have not been checkpointed into the database
database/virtuosoTIMESTAMP.trx | all the previous transaction logs which can be used to reconstruct the database
database/__rdf_repl*.log | all the replication logs containing the changes to the published graph
NOTE: Since the database is constantly modified during operation, it is of NO use to
back up virtuoso.db using an rsync script unless the Virtuoso instance was shut down
beforehand, or certain extra precautions are taken, which we will explain later on.
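The file list above can be turned into a small filter that a backup script consults before copying anything. This is a sketch only, assuming paths are given relative to the Virtuoso home directory; the helper names and the sample file names (including the bkup-1.bp and timestamped .trx names) are hypothetical illustrations, not names taken from a real installation.

```python
from fnmatch import fnmatchcase

# Patterns taken from the table above; virtuoso.db is deliberately absent.
SAFE_PATTERNS = [
    "database/backup/*.bp",
    "database/virtuoso.trx",
    "database/virtuoso*.trx",
    "database/__rdf_repl*.log",
]


def safe_to_copy(path: str) -> bool:
    """Return True if the path matches one of the patterns listed as safe."""
    return any(fnmatchcase(path, pattern) for pattern in SAFE_PATTERNS)


def select_backup_files(paths):
    """Keep only the files that may be copied while the server is running."""
    return sorted(p for p in paths if safe_to_copy(p))


# Hypothetical directory listing used only to exercise the filter.
sample = [
    "database/virtuoso.db",
    "database/virtuoso.log",
    "database/virtuoso.trx",
    "database/virtuoso20240101120000.trx",
    "database/backup/bkup-1.bp",
    "database/__rdf_repl1.log",
]
selected = select_backup_files(sample)
print(selected)
```

The selected paths could then be handed to rsync or a similar tool; the live virtuoso.db file is filtered out, in line with the note above.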
14.17.1.3.3. Setup SPARE master
The SPARE machine is a replica of the MASTER machine. This machine subscribes to the
publication of the MASTER to keep an exact match of the RDF graphs, but also publishes
this data without any initial subscribers.
The SPARE machine should ideally be equipped similarly to the MASTER machine, with multiple
redundant disks in RAID-1 or RAID-6 mode to minimize the risk that a single bad disk takes down
the system. From a Virtuoso point of view we will use online backups combined
with the checkpoint audit trail to back up the content of the database in a safe way. The online
backups, the checkpoint audit trail and the replication logs can also be copied to
secondary storage using the rsync command, which can easily be scripted as a cron job.
Changes to database/virtuoso.ini:
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = SPARE
ServerEnable = 1
QueueMax = 5000000
...
We must enable RDF replication before we start adding data to the graphs we wish to
replicate, so that every record is accounted for by the replication process. If there is
existing data in the graphs to be published, this data needs to be added
to each subscriber manually, since the replication process only records the delta of changes
made after publishing was enabled.
To enable publishing of the graph, as well as subscribing to the MASTER, we first start
up this Virtuoso instance with bin/virtuoso-start.sh and then use the isql program to connect
to the SPARE instance:
$ bin/virtuoso-start.sh
$ isql SPARE-IP:1111
-- and run the following commands:
-- enable this instance as a publisher
rdf_repl_start();
-- add graphs to replication list
rdf_repl_graph_ins('http://test.example.com');
-- connect to master
repl_server ('MASTER', 'MASTER_DSN');
-- start subscribing to __rdf_repl
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
-- start initial replication
repl_sync_all ();
-- add subscription to scheduler
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);
Next we create a backup directory inside the database directory and set up the online backup,
again using the isql program:
$ cd database
$ mkdir backup
$ isql SPARE-IP:1111
-- and run the following commands:
-- clear any previous context
backup_context_clear();
-- start the backup
backup_online ('bkup-#', 1000000, 0, vector ('backup'));
The following files can now be backed up using rsync or similar tool to another machine:
Table: 14.17.1.3.3.1. Files that can be backed up using rsync or similar tool to another machine
Files | Description
database/backup/*.bp | the incremental backup files
database/virtuoso.trx | the main transaction log containing the most recent updates to the database that have not been checkpointed into the database
database/virtuosoTIMESTAMP.trx | all the previous transaction logs which can be used to reconstruct the database
database/__rdf_repl*.log | all the replication logs containing the changes to the published graph
NOTE: Since the database is constantly modified during operation, it is of NO use to
back up virtuoso.db using an rsync script unless the Virtuoso instance was shut down
beforehand, or certain extra precautions are taken, which we will explain later on.
14.17.1.3.4. Setup FARM-1
The FARM-1 machine is the first front-end server machine. It subscribes to the publication
of the MASTER instance to keep up-to-date.
The FARM-1 machine can run on simpler hardware than the MASTER instance. It does not
require the same level of redundancy in terms of hard disks etc., as there are a number of
these machines running in parallel, each capable of returning results to the proxy. If one
FARM machine dies, it can simply be taken off the reverse-proxy list and repaired or replaced
with a fresh machine before it is added back to the list of servers in the reverse proxy. As such
it does not need to be backed up separately, although we could make a backup of this
installation to quickly install the rest of the identical FARM boxes.
Change the database/virtuoso.ini file:
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = FARM-1 ; each FARM machine needs to have a unique replication name
ServerEnable = 1
QueueMax = 5000000
...
Next we start up the Virtuoso instance using the bin/virtuoso-start.sh command and
use the isql program to subscribe to the MASTER:
$ bin/virtuoso-start.sh
$ isql FARM-1-IP:1111
-- connect to master
repl_server ('MASTER', 'MASTER_DSN');
-- start subscribing to __rdf_repl
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
-- start initial replication
repl_sync_all ();
-- add subscription to scheduler
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);
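Since every subscriber runs the same four statements, a script can generate them per machine and feed them to isql. The sketch below only builds the statement text from the calls shown above; the function name subscription_statements is hypothetical, the third argument of DB.DBA.SUB_SCHEDULE is simply passed through as in the example, and the credentials are the ones used in this section and should be replaced in a real deployment.

```python
def subscription_statements(publisher: str, dsn: str, schedule_arg: int = 1):
    """Build the isql statements a subscriber runs, in the order shown above."""
    return [
        # register the publisher under its replication name and ODBC DSN
        f"repl_server ('{publisher}', '{dsn}');",
        # subscribe to the __rdf_repl publication (credentials as in the text)
        f"repl_subscribe ('{publisher}', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');",
        # perform the initial replication
        "repl_sync_all ();",
        # add the subscription to the scheduler
        f"DB.DBA.SUB_SCHEDULE ('{publisher}', '__rdf_repl', {schedule_arg});",
    ]


statements = subscription_statements("MASTER", "MASTER_DSN")
print("\n".join(statements))
```

The printed text could be saved to a file and passed to isql on each FARM-x machine, which keeps the statement order identical across the farm.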
At this point we can shut down this Virtuoso instance using the bin/virtuoso-stop.sh
command and make a copy of the whole Virtuoso installation as a blueprint to copy to
another FARM-x machine.
14.17.1.3.5. Setup FARM-2 from scratch
We can repeat the same steps we did for the FARM-1 machine, and just make sure we use
FARM-2 as the replication name in the database/virtuoso.ini file and use FARM-2-IP:1111 as
an argument to the isql program.
Change database/virtuoso.ini:
[Replication]
ServerName = FARM-2
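Because each machine needs its own replication name, a quick check over the planned names can catch a copied configuration that was never renamed. The following is a minimal sketch; both helper names are hypothetical, and the stanza simply repeats the [Replication] values used for the FARM machines in this section.

```python
from collections import Counter


def duplicate_server_names(server_names):
    """Return the replication names that are used by more than one machine."""
    counts = Counter(server_names)
    return sorted(name for name, seen in counts.items() if seen > 1)


def replication_stanza(server_name: str) -> str:
    """Render the [Replication] section for one machine."""
    return (
        "[Replication]\n"
        f"ServerName = {server_name}\n"
        "ServerEnable = 1\n"
        "QueueMax = 5000000\n"
    )


planned = ["MASTER", "SPARE", "FARM-1", "FARM-2"]
clashes = duplicate_server_names(planned)
if not clashes:
    print(replication_stanza("FARM-2"))
```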
14.17.1.3.6. Setup FARM-3 using blueprint from FARM-1 installation
Extract the tarred/zipped copy of the installation made at the end of the setup of FARM-1.
Before starting up the instance, we only need to give this instance a unique name for
replication:
Change database/virtuoso.ini:
[Replication]
ServerName = FARM-3
Next we start up the Virtuoso instance using the bin/virtuoso-start.sh command. Since
the subscription and its schedule were already created during the FARM-1 setup, we just
use the isql program to rename the replication server and perform a sync against the MASTER:
$ bin/virtuoso-start.sh
$ isql FARM-3-IP:1111
-- change replication name
DB.DBA.REPL_SERVER_RENAME ('FARM-1', 'FARM-3');
-- sync against master
repl_sync_all();
14.17.1.3.7. Setup FARM-4 using clone of FARM-1
If the system has been running for some time, it may not be practical to replicate
from the start, so there is an alternative way to set up a new FARM-4 machine.
We can either restore the blueprint backup we made at the end of the FARM-1 installation,
or do a fresh installation of Virtuoso on the FARM-4 machine.
In both cases we shut down the Virtuoso instance and remove the database, as we are going
to replace it.
$ bin/virtuoso-stop.sh
$ cd database
$ rm virtuoso.db virtuoso.trx virtuoso.log virtuoso.pxa
Change the database/virtuoso.ini file:
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = FARM-4 ; each FARM machine needs to have a unique replication name
ServerEnable = 1
QueueMax = 5000000
...
Next we temporarily disable checkpointing on the FARM-1 machine so
we can copy its database without risking corruption:
$ isql FARM-1-IP:1111
-- disable automatic checkpointing
checkpoint_interval (-1);
-- and do an explicit checkpoint
checkpoint;
It is now safe to copy the database across using the rsync command:
$ rsync -avz virtuoso@FARM-1-IP:/path/to/virtuoso/database/virtuoso.db database/virtuoso.db
Next we re-enable the checkpoint interval on FARM-1:
$ isql FARM-1-IP:1111
-- re-enable checkpointing
checkpoint_interval(60);
The last step is to start the database on FARM-4, rename the replication server and sync against the MASTER:
$ bin/virtuoso-start.sh
$ isql FARM-4-IP:1111
-- change replication name
DB.DBA.REPL_SERVER_RENAME ('FARM-1', 'FARM-4');
-- sync against master
repl_sync_all();
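When the cloning steps are scripted, the important property is that checkpointing on FARM-1 is switched back on even if the copy fails. The sketch below shows one way to guarantee that ordering with a context manager; the name checkpointing_paused and the run_sql callback are hypothetical, and here the callback only records the statements instead of sending them to isql.

```python
from contextlib import contextmanager


@contextmanager
def checkpointing_paused(run_sql, restore_interval=60):
    """Disable automatic checkpoints, checkpoint once, and always re-enable."""
    run_sql("checkpoint_interval (-1);")
    try:
        run_sql("checkpoint;")
        yield
    finally:
        # Runs whether the copy succeeded or raised an error.
        run_sql(f"checkpoint_interval ({restore_interval});")


# Record the steps instead of executing them, to show the resulting order.
steps = []
with checkpointing_paused(steps.append):
    steps.append(
        "rsync -avz virtuoso@FARM-1-IP:/path/to/virtuoso/database/virtuoso.db "
        "database/virtuoso.db"
    )

# Simulate a failed copy: checkpointing is still re-enabled afterwards.
failed_steps = []
try:
    with checkpointing_paused(failed_steps.append):
        raise RuntimeError("copy failed")
except RuntimeError:
    pass

print(steps)
print(failed_steps)
```

In a real script, run_sql would be replaced by a function that sends each statement to FARM-1-IP:1111, and the rsync line by an actual subprocess call.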
14.17.2. Replication Topologies
Typical replication topologies are Chains, Stars and Bi-directional. They can be achieved with
Virtuoso, by repeating the "Publish" and/or "Subscribe" steps on each relevant node.
14.17.2.1. Star Replication Topology
In a Star, there is one Publisher, and many Subscribers.
To set up a Star, follow these steps:
- Configure Instance #1 to Publish.
- Configure Instance #2 to Subscribe to #1.
- Repeat as necessary.
14.17.2.1.2. Star Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a Star Topology.
Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances with the following ini parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
- virtuoso3.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3
...
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
Install Conductor package
On each of the 3 Virtuoso instances install the conductor_dav.vad package.
Create a Publication on the Host Virtuoso Instance db1
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- A publication with the name RDF Publication should be created:
- Click the link which is the publication name.
- You will be shown the publication items page:
- Enter for Graph IRI:
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Insert Data into a Named Graph on the Host Virtuoso Instance
There are several ways to insert data into a Virtuoso Named Graph. In this example, we
will use the Virtuoso Conductor's Import RDF feature:
- In the Virtuoso Conductor, go to Linked Data -> Quad Store Upload:
- In the form:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- Enter for Named Graph IRI:
- Click Upload
- A successful upload will result in this message:
- Check the inserted triples by executing a query like the following against the SPARQL endpoint, http://cname:port/sparql:
SELECT *
FROM <http://example.org>
WHERE { ?s ?p ?o }
- See how many triples have been inserted in your graph:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
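The count query can also be issued from a script by passing it as the query parameter of a GET request, as defined by the SPARQL Protocol. The sketch below only builds the request URL; the function name count_query_url is hypothetical, and the endpoint http://localhost:8891/sparql assumes the db1 instance from the ini settings above.

```python
from urllib.parse import urlencode


def count_query(graph_iri: str) -> str:
    """Build the triple-count query used in this example for a given graph."""
    return f"SELECT COUNT(*) FROM <{graph_iri}> WHERE {{ ?s ?p ?o }}"


def count_query_url(endpoint: str, graph_iri: str) -> str:
    """Return a GET URL that carries the count query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": count_query(graph_iri)})


url = count_query_url("http://localhost:8891/sparql", "http://example.org")
print(url)
```

Fetching the URL, for example with urllib.request.urlopen, requires a running instance, so that step is left out of the sketch.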
Subscribe to the Publication on a Destination Virtuoso Instance (db2, db3, etc.)
- Go to Conductor -> Replication -> Transactional -> Subscriptions
- Click New Subscription
- Specify a new Data Source, or select the target data source from the available Connected Data Sources:
- Click Publications list
- Select the RDF Publication and click List Items
- Click Subscribe
- The subscription will be created
- Click Sync
- Check the retrieved triples by executing the following query:
SELECT *
FROM <http://example.org>
WHERE {?s ?p ?o}
- See how many triples have been inserted into your graph by executing the following query:
SELECT COUNT(*)
FROM <http://example.org>
WHERE {?s ?p ?o}
These steps may be repeated for any number of Subscribers.
Insert Triples into the Host Virtuoso Instance Graph and check availability at Destination Virtuoso Instance Graph
- To check the starting count, on the Destination Virtuoso Instance SPARQL Endpoint, execute:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- On the Host Virtuoso Instance go to Conductor -> Database -> Interactive SQL and execute the following statement:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- To confirm that the triple count has increased by the number of inserted triples, execute the following on the Destination Virtuoso Instance SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
14.17.2.2. Chain Replication Topology
In a Chain, there is one original Publisher, to which there is only one Subscriber. That
Subscriber may also serve as a Publisher, again with only one Subscriber. The chain ends with
a Subscriber which does not Publish.
To set up a Chain, follow these steps:
- Configure Instance #1 to Publish.
- Configure Instance #2 to Subscribe to #1.
- Configure Instance #2 to Publish.
- Configure Instance #3 to Subscribe to #2.
- Repeat as necessary.
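The Star and Chain recipes differ only in which instance each subscriber points at. As an illustration, the sketch below derives, from a list of (publisher, subscriber) pairs, which instances must run the "Publish" step and which publications each one subscribes to. The function name roles and the instance names db1 to db3 are used for illustration only.

```python
def roles(edges):
    """Map each instance to its role, given (publisher, subscriber) pairs."""
    publishers = {pub for pub, _ in edges}
    subscribers = {sub for _, sub in edges}
    result = {}
    for node in sorted(publishers | subscribers):
        result[node] = {
            "publishes": node in publishers,
            "subscribes_to": sorted(pub for pub, sub in edges if sub == node),
        }
    return result


# Chain: db1 -> db2 -> db3
chain = roles([("db1", "db2"), ("db2", "db3")])
# Star: db1 -> db2 and db1 -> db3
star = roles([("db1", "db2"), ("db1", "db3")])

for name, role in chain.items():
    print(name, role)
```

In the chain, db2 both publishes and subscribes, which matches steps 2 and 3 of the list above; in the star, only db1 publishes.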
14.17.2.2.2. Chain Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a
Chain Topology.
Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances with the following ini parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
- virtuoso3.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3
...
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
Install Conductor package
On each of the 3 Virtuoso instances install the conductor_dav.vad package.
Create Publication on db1
- Go to http://localhost:8891/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created
- Click the link which is the publication name.
- You will be shown the publication items page
- Enter for Graph IRI:
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db2 to db1's Publication
- Log in at http://localhost:8892/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db1
- Enter the dba user credentials for db1
- Click "Add Data Source"
- As a result, db1 will be shown in the "Connected Data Sources" list.
- Select db1 in the "Connected Data Sources" list and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Create Publication on db2
- Go to http://localhost:8892/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created
- Click the link which is the publication name.
- You will be shown the publication items page
- Enter for Graph IRI:
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db3 to db2's Publication
- Log in at http://localhost:8893/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db2
- Enter the dba user credentials for db2
- Click "Add Data Source"
- As a result, db2 will be shown in the "Connected Data Sources" list. Select it and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Insert Data into a Named Graph on the db1 Virtuoso Instance
- Log in at http://localhost:8891/conductor
- Go to Linked Data -> Quad Store Upload:
- In the form shown:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- Enter for Named Graph IRI:
- Click Upload
- A message is shown after a successful upload.
- Check the count of the inserted triples by executing a query like the following against the SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 57 as total.
Check data on the Destination instances db2 and db3
- To check the starting count, on each of the Destination Virtuoso Instances db2 and db3 from SPARQL Endpoint execute:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 57 as total.
Add new data on db1
- Disconnect db2 and db3.
- On the Host Virtuoso Instance db1, go to Conductor -> Database -> Interactive SQL and enter the following statements:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- Click "Execute"
- As a result, the triples will be inserted
- Check the count of triples in the db1 graph by executing the following query against the SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 60 as total.
Check data on the Destination instances db2 and db3
- Start instances db2 and db3
- To confirm that the triple count has increased by the number of inserted triples, execute the following on the SPARQL Endpoint of each Destination Virtuoso Instance, db2 and db3:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 60 as total.
14.17.2.3. Bi-directional Replication Topology
14.17.2.3.1. Bi-directional Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a
Bi-directional Topology.
db1 <---- db2
db1 ----> db2
Prerequisites
Database INI Parameters
Suppose there are 2 Virtuoso instances with the following ini parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for db1 and db2 with names db1 and db2 respectively.
Install Conductor package
On each of the 2 Virtuoso instances install the conductor_dav.vad package.
Create Publication on db2
- Go to http://localhost:8892/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created
- Click the link which is the publication name.
- You will be shown the publication items page
- Enter for Graph IRI:
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db1 to db2's Publication
- Log in at http://localhost:8891/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db2
- Enter the dba user credentials for db2
- Click "Add Data Source"
- As a result, db2 will be shown in the "Connected Data Sources" list.
- Select db2 in the "Connected Data Sources" list and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Create Publication on db1
- Go to http://localhost:8891/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created
- Click the link which is the publication name.
- You will be shown the publication items page
- Enter for Graph IRI:
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db2 to db1's Publication
- Log in at http://localhost:8892/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db1
- Enter the dba user credentials for db1
- Click "Add Data Source"
- As a result, db1 will be shown in the "Connected Data Sources" list. Select it and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Insert Data into a Named Graph on the db2 Virtuoso Instance
- Log in at http://localhost:8892/conductor
- Go to Linked Data -> Quad Store Upload:
- In the form shown:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- Enter for Named Graph IRI:
- Click Upload
- A message is shown after a successful upload.
- Check the count of the inserted triples by executing a query like the following against the SPARQL endpoint,
http://localhost:8892/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 57 as total.
Check data on the Destination instance db1
- To check the starting count, execute from db1's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 57 as total.
Add new data on db2
- Disconnect db1.
- On the Host Virtuoso Instance db2, go to Conductor -> Database -> Interactive SQL and enter the following statement:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
- Click "Execute"
- As a result, the triple will be inserted
- Check the count of triples in the db2 graph by executing the following query against the SPARQL endpoint,
http://localhost:8892/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
Check data on the Destination instance db1
- Start instance db1
- To confirm that the triple count has increased by the number of inserted triples, execute the following statement on db1's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
Add new data on db1
- Disconnect db2.
- On the Host Virtuoso Instance db1, go to Conductor -> Database -> Interactive SQL and enter the following statements:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- Click "Execute"
- The triples are inserted.
- Check the triple count of the graph on db1 by executing the following query against its SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 60 as total.
Check data on the Destination instance db2
- Start instance db2
- To confirm that the triple count has increased by the number of inserted triples, execute the following statement on db2's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 60 as total.
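Because the subscription syncs on an interval (1 minute in this walk-through), the counts on db1 and db2 may differ for a short time after an insert. The sketch below, in Python, polls two count functions until they agree. It is an illustration only: wait_until_equal is a hypothetical helper, the timeout and poll values are arbitrary, and the two callables are assumed to run the count query above against each instance's SPARQL endpoint.

```python
import time


def wait_until_equal(get_a, get_b, timeout_s=180.0, poll_s=5.0, sleep=time.sleep):
    """Poll two zero-argument count functions until they agree.

    Returns the shared count, or raises TimeoutError once timeout_s of
    polling time has been used up without the counts matching.
    """
    waited = 0.0
    while True:
        a, b = get_a(), get_b()
        if a == b:
            return a
        if waited >= timeout_s:
            raise TimeoutError(f"counts still differ: {a} != {b}")
        sleep(poll_s)
        waited += poll_s
```

The sleep parameter is injectable so the loop can be exercised without real delays.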
14.17.3. Set up RDF Replication via procedure calls
14.17.3.1. Example
The following example shows how to use SQL procedures to set up Virtuoso RDF Graph Replication in a Chain Topology.
This can also be done through the HTTP-based Virtuoso Conductor.
14.17.3.1.2. Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances on the same machine.
The first instance holds the master copy of the data and publishes its changes to all other instances that subscribe to this master.
The second instance subscribes to the publication of the master copy, but also publishes all of these changes to any instance that subscribes to it.
The third instance only subscribes to the publication of the second instance.
Each of these 3 servers needs its own ports, ServerName, and DefaultHost for this replication scheme to work properly. Although not required, this example also sets separate names for the database and related files. This results in the following ini parameter values (only the changes are shown; the rest can remain at their defaults):
- repl1/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1-r
...
- repl2/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2-r
...
- repl3/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3-r
...
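The three listings above follow one numbering pattern: instance n uses SQL port 1110+n, HTTP port 8890+n, and ServerName dbn-r. As a sketch of that pattern only, the following Python helpers generate the same overrides for any instance number. The function names are hypothetical, and the output contains only the changed parameters, so it has to be merged into a complete virtuoso.ini rather than used as one.

```python
def instance_overrides(n):
    """Return the ini overrides for instance n (1-based) as nested dicts."""
    return {
        "Database": {
            "DatabaseFile": f"virtuoso{n}.db",
            "TransactionFile": f"virtuoso{n}.trx",
            "ErrorLogFile": f"virtuoso{n}.log",
        },
        "Parameters": {"ServerPort": 1110 + n, "SchedulerInterval": 1},
        "HTTPServer": {"ServerPort": 8890 + n},
        "URIQA": {"DefaultHost": f"localhost:{8890 + n}"},
        "Replication": {"ServerName": f"db{n}-r"},
    }


def render_ini(overrides):
    """Render the overrides as ini text, one [Section] block per entry."""
    lines = []
    for section, values in overrides.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key} = {value}" for key, value in values.items())
    return "\n".join(lines)
```

Keeping the sections as nested dictionaries matters here because ServerPort appears in both [Parameters] and [HTTPServer] with different values.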
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
14.17.3.1.3. Configure Publishers and Subscribers
- Start the databases by running start.sh, which has the following content:
cd repl1
virtuoso -f &
cd ../repl2
virtuoso -f &
cd ../repl3
virtuoso -f &
cd ..
- Use the isql command to execute the following rep.sql file:
--
-- connect to the first database which is only a publisher
--
set DSN=localhost:1111;
reconnect;
--
-- start publishing the graph http://test.org
--
DB.DBA.RDF_REPL_START();
DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- connect to the second database in the chain, which is both a publisher and a subscriber
--
set DSN=localhost:1112;
reconnect;
--
-- start publishing the graph http://test.org
--
DB.DBA.RDF_REPL_START();
DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- contact the first database
--
repl_server ('db1-r', 'db1', 'localhost:1111');
--
-- subscribe to its RDF publication(s)
--
repl_subscribe ('db1-r', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
--
-- bring the replication service online
--
repl_sync_all();
--
-- and set scheduler to check every minute
--
DB.DBA.SUB_SCHEDULE ('db1-r', '__rdf_repl', 1);
--
-- connect to the third database in the chain, which is only a subscriber
--
set DSN=localhost:1113;
reconnect;
--
-- uncomment next 2 commands if this database should also be a publisher
--
--DB.DBA.RDF_REPL_START();
--DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- contact second database
--
repl_server ('db2-r', 'db2', 'localhost:1112');
--
-- subscribe to its RDF publication(s)
--
repl_subscribe ('db2-r', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
--
-- bring the replication service online
--
repl_sync_all();
--
-- and set schedule to check every minute
--
DB.DBA.SUB_SCHEDULE ('db2-r', '__rdf_repl', 1);
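The rep.sql file repeats the same statements for each link in the chain. As a sketch under stated assumptions, the Python function below generates an equivalent script for a chain of n instances. chain_script is a hypothetical helper. It assumes the port and naming pattern of this example (localhost:1110+i, ServerName dbi-r, DSN dbi), the default dav and dba credentials used above, and a 1 minute schedule. It emits only statements that already appear in rep.sql.

```python
def chain_script(n, graph="http://test.org"):
    """Return isql statements for a chain db1 -> db2 -> ... -> dbN."""
    out = []
    for i in range(1, n + 1):
        out.append(f"set DSN=localhost:{1110 + i};")
        out.append("reconnect;")
        if i < n:  # every node except the last one publishes the graph
            out.append("DB.DBA.RDF_REPL_START();")
            out.append(f"DB.DBA.RDF_REPL_GRAPH_INS ('{graph}');")
        if i > 1:  # every node except the first subscribes to its predecessor
            up = f"db{i - 1}-r"
            out.append(f"repl_server ('{up}', 'db{i - 1}', 'localhost:{1110 + i - 1}');")
            out.append(f"repl_subscribe ('{up}', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');")
            out.append("repl_sync_all();")
            out.append(f"DB.DBA.SUB_SCHEDULE ('{up}', '__rdf_repl', 1);")
    return "\n".join(out)
```

For n = 3, the output matches the statements of rep.sql without the comments, and the result can be saved to a file and run with isql in the same way.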