In this chapter, we introduce the basics of programming with the Sesame framework. We assume that you have at least a basic understanding of programming in Java and of how the Resource Description Framework models data.
The core of the Sesame framework is the RDF Model API (see the Model API Javadoc). This API defines how the building blocks of RDF (statements, URIs, blank nodes, literals) are represented.
RDF statements are represented by the org.openrdf.model.Statement interface. Each Statement has a subject, predicate, object and (optionally) a context (more about contexts below, in the section about the Repository API). Each of these four items is an org.openrdf.model.Value. The Value interface is further specialized into org.openrdf.model.Resource and org.openrdf.model.Literal. Resource represents any RDF value that is either a blank node or a URI (in fact, it specializes further into org.openrdf.model.URI and org.openrdf.model.BNode). Literal represents RDF literal values (strings, dates, integer numbers, and so on).
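To illustrate this hierarchy, the following sketch shows how a Value can be inspected with instanceof checks. The helper class and method names are our own, not part of the Sesame API:

```java
import org.openrdf.model.BNode;
import org.openrdf.model.Literal;
import org.openrdf.model.URI;
import org.openrdf.model.Value;

// Hypothetical helper illustrating the Value type hierarchy:
// every Value is either a Resource (URI or BNode) or a Literal.
public class ValueInspector {
   public static String describe(Value value) {
      if (value instanceof URI) {
         return "URI: " + value.stringValue();
      }
      else if (value instanceof BNode) {
         return "blank node: " + ((BNode)value).getID();
      }
      else if (value instanceof Literal) {
         return "literal: " + ((Literal)value).getLabel();
      }
      return "unknown value type";
   }
}
```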
To create new values and statements, we can use an org.openrdf.model.ValueFactory. You can use a default ValueFactory implementation called org.openrdf.model.impl.ValueFactoryImpl:
ValueFactory factory = ValueFactoryImpl.getInstance();
You can also obtain a ValueFactory from the Repository you are working with; in fact, this is the recommended approach. More about that in the next section.
Regardless of how you obtain your ValueFactory, once you have it, you can use it to create new URIs, Literals, and Statements:
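As a brief preview of that approach, obtaining the factory from a repository looks like this (assuming an already initialized Repository named repo, as created in the next section):

```java
// Using the repository's own ValueFactory ensures that the values you
// create are compatible with the store you are working with.
ValueFactory factory = repo.getValueFactory();
```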
URI bob = factory.createURI("http://example.org/bob");
URI name = factory.createURI("http://example.org/name");
Literal bobsName = factory.createLiteral("Bob");
Statement nameStatement = factory.createStatement(bob, name, bobsName);
The Model API also provides pre-defined URIs for several well-known vocabularies, such as RDF, RDFS, OWL, DC (Dublin Core), FOAF (Friend-of-a-Friend), and more. These constants can all be found in the org.openrdf.model.vocabulary package, and can be quite handy in quick creation of RDF statements (or in querying a Repository, as we shall see later):
Statement typeStatement = factory.createStatement(bob, RDF.TYPE, FOAF.PERSON);
The Repository API is the central access point for Sesame repositories. Its purpose is to give developer-friendly access to RDF repositories, offering various methods for querying and updating the data, while hiding a lot of the nitty-gritty details of the underlying machinery.
The interfaces for the Repository API can be found in package org.openrdf.repository. Several implementations for these interfaces exist in various sub-packages. The Javadoc reference for the API is available online and can also be found in the doc directory of the download.
If you need more information about how to set up your environment for working with the Sesame APIs, take a look at Chapter 4, Setting up to use the Sesame libraries.
The first step in any action that involves Sesame repositories is to create a Repository for it.
The central interface of the Repository API is the Repository interface. Several implementations of this interface are available. The three main ones are:
org.openrdf.repository.sail.SailRepository is a Repository that operates directly on top of a Sail. This is the class most commonly used when accessing/creating a local Sesame repository. SailRepository operates on a (stack of) Sail object(s) for storage and retrieval of RDF data. An important thing to remember is that the behaviour of a repository is determined by the Sail(s) that it operates on; for example, the repository will only support RDF Schema or OWL semantics if the Sail stack includes an inferencer for this.
org.openrdf.repository.http.HTTPRepository is, as the name implies, a Repository implementation that acts as a proxy to a Sesame repository available on a remote Sesame server, accessible through HTTP.
org.openrdf.repository.sparql.SPARQLRepository is a Repository implementation that acts as a proxy to any remote SPARQL endpoint (whether that endpoint is implemented using Sesame or not).
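No SPARQLRepository example appears later in this chapter, so here is a minimal sketch; the endpoint URL is of course a placeholder:

```java
import org.openrdf.repository.Repository;
import org.openrdf.repository.sparql.SPARQLRepository;

// Hypothetical endpoint URL; replace with a real SPARQL endpoint.
Repository repo = new SPARQLRepository("http://example.org/sparql");
repo.initialize();
```

From this point on, the repository can be queried through a RepositoryConnection just like a local one (updates depend on what the remote endpoint supports).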
In the following section, we will first take a look at the use of the SailRepository class in order to create and use a local Sesame repository.
One of the simplest configurations is a repository that just stores RDF data in main memory without applying any inferencing. This is also by far the fastest type of repository that can be used. The following code creates and initializes a non-inferencing main-memory repository:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
...
Repository repo = new SailRepository(new MemoryStore());
repo.initialize();
The constructor of the SailRepository class accepts any object of type Sail, so we simply pass it a new main-memory store object (which is, of course, a Sail implementation). Following this, the repository needs to be initialized to prepare the Sail(s) that it operates on.
The repository that is created by the above code is volatile: its contents are lost when the object is garbage collected or when your Java program is shut down. This is fine for cases where, for example, the repository is used as a means for manipulating an RDF model in memory.
Different types of Sail objects take parameters in their constructor that change their behaviour. The MemoryStore, for example, takes a data directory parameter that specifies a data directory for persistent storage. If specified, the MemoryStore will write its contents to this directory so that it can restore it when it is re-initialized in a future session:
File dataDir = new File("C:\\temp\\myRepository\\");
Repository repo = new SailRepository( new MemoryStore(dataDir) );
repo.initialize();
As you can see, we can fine-tune the configuration of our repository by passing parameters to the constructor of the Sail object. Some Sail types may offer additional configuration methods, all of which need to be called before the repository is initialized. The MemoryStore currently has one such method: setSyncDelay(long), which can be used to control the strategy that is used for writing to the data file, e.g.:
File dataDir = new File("C:\\temp\\myRepository\\");
MemoryStore memStore = new MemoryStore(dataDir);
memStore.setSyncDelay(1000L);
Repository repo = new SailRepository(memStore);
repo.initialize();
A Native RDF Repository does not keep its data in main memory, but instead stores it directly to disk (in a binary format optimized for compact storage and fast retrieval). It is an efficient, scalable and fast solution for RDF storage of datasets that are too large to keep entirely in memory.
The code for creation of a Native RDF repository is almost identical to that of a main memory repository:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.nativerdf.NativeStore;
...
File dataDir = new File("/path/to/datadir/");
Repository repo = new SailRepository(new NativeStore(dataDir));
repo.initialize();
By default, the Native store creates a set of two indexes (see Section 6.6.2, “Native store configuration”). To configure which indexes it should create, we can either use the NativeStore.setTripleIndexes(String) method, or we can directly supply an index configuration string to the constructor:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.nativerdf.NativeStore;
...
File dataDir = new File("/path/to/datadir/");
String indexes = "spoc,posc,cosp";
Repository repo = new SailRepository(new NativeStore(dataDir, indexes));
repo.initialize();
As we have seen, we can create Repository objects for any kind of back-end store by passing them a reference to the appropriate Sail object. We can pass any stack of Sails this way, allowing all kinds of repository configurations to be created quite easily. For example, to stack an RDF Schema inferencer on top of a memory store, we simply create a repository like so:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;
...
Repository repo = new SailRepository(
      new ForwardChainingRDFSInferencer(
      new MemoryStore()));
repo.initialize();
Each layer in the Sail stack is created by a constructor that takes the underlying Sail as a parameter. Finally, we create the SailRepository object as a functional wrapper around the Sail stack.
The ForwardChainingRDFSInferencer that is used in this example is a generic RDF Schema inferencer; it can be used on top of any Sail that supports the methods it requires. Both MemoryStore and NativeStore support these methods. However, a word of warning: the Sesame inferencers add a significant performance overhead when adding data to and removing data from a repository, an overhead that gets progressively worse as the total size of the repository increases. For small to medium-sized datasets it performs fine, but for larger datasets you are advised not to use it and to switch to alternatives.
Working with remote repositories is just as easy as working with local ones. We can simply use a different Repository object, the HTTPRepository, instead of the SailRepository class.
A requirement is of course that there is a Sesame 2 server running on some remote system, which is accessible over HTTP. For example, suppose that a Sesame server is running at http://example.org/openrdf-sesame/, and that it has a repository with the identifier 'example-db'. We can access this repository in our code as follows:
import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
...
String sesameServer = "http://example.org/openrdf-sesame/";
String repositoryID = "example-db";
Repository repo = new HTTPRepository(sesameServer, repositoryID);
repo.initialize();
Now that we have created a Repository, we want to do something with it. In Sesame 2, this is achieved through the use of RepositoryConnection objects, which can be created by the Repository.
A RepositoryConnection represents, as the name suggests, a connection to the actual store. We can issue operations over this connection, and close it when we are done to make sure we are not keeping resources unnecessarily occupied.
In the following sections, we will show some examples of basic operations.
The Repository API offers various methods for adding data to a repository. Data can be added by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.
We perform operations on a repository by requesting a RepositoryConnection from the repository. On this RepositoryConnection object we can perform various operations, such as query evaluation, getting, adding, or removing statements, etc.
The following example code adds two files, one local and one available through HTTP, to a repository:
import org.openrdf.OpenRDFException;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.rio.RDFFormat;
import java.io.File;
import java.net.URL;
...
File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";
try {
   RepositoryConnection con = repo.getConnection();
   try {
      con.add(file, baseURI, RDFFormat.RDFXML);
      URL url = new URL("http://example.org/example/remote.rdf");
      con.add(url, url.toString(), RDFFormat.RDFXML);
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
catch (java.io.IOException e) {
   // handle io exception
}
More information on other available methods can be found in the javadoc reference of the RepositoryConnection interface.
The Repository API has a number of methods for creating and evaluating queries. Three types of queries are distinguished: tuple queries, graph queries and boolean queries. The query types differ in the type of results that they produce.
The result of a tuple query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data. SPARQL SELECT queries are tuple queries.
The result of graph queries is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc. SPARQL CONSTRUCT and DESCRIBE queries are graph queries.
The result of boolean queries is a simple boolean value, i.e. true or false. This type of query can be used to check if a repository contains specific information. SPARQL ASK queries are boolean queries.
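No ASK example appears in the sections that follow, so here is a minimal sketch of preparing and evaluating a boolean query, assuming an open RepositoryConnection named con:

```java
import org.openrdf.query.BooleanQuery;
import org.openrdf.query.QueryLanguage;

// Check whether the repository contains any statement about Bob.
BooleanQuery askQuery = con.prepareBooleanQuery(QueryLanguage.SPARQL,
      "ASK { <http://example.org/bob> ?p ?o }");
boolean bobExists = askQuery.evaluate();
```

Note that, unlike tuple and graph queries, a boolean query returns its result directly and there is no result object to close.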
Note: Sesame 2 currently supports two query languages: SeRQL and SPARQL. The former is explained in Chapter 11, The SeRQL query language (revision 3.1), the specification for the latter is available online. In this chapter, we will use SPARQL queries in our examples.
To evaluate a tuple query we simply do the following:
import java.util.List;
import org.openrdf.OpenRDFException;
import org.openrdf.model.Value;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
...
try {
   RepositoryConnection con = repo.getConnection();
   try {
      String queryString = "SELECT ?x ?y WHERE { ?x ?p ?y } ";
      TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
      TupleQueryResult result = tupleQuery.evaluate();
      try {
         while (result.hasNext()) {
            BindingSet bindingSet = result.next();
            Value valueOfX = bindingSet.getValue("x");
            Value valueOfY = bindingSet.getValue("y");
            // do something interesting with the values here...
         }
      }
      finally {
         result.close();
      }
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
This evaluates a SPARQL query and returns a TupleQueryResult, which consists of a sequence of BindingSet objects. Each BindingSet contains a set of Binding objects. A binding is a pair relating a name (as used in the query's SELECT clause) with a value.
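A BindingSet is itself iterable over its Binding objects, so you can also inspect all bindings of a solution without knowing their names in advance. A small sketch, assuming a BindingSet named bindingSet as in the example above:

```java
import org.openrdf.model.Value;
import org.openrdf.query.Binding;

// Inspect every binding in a solution without knowing the names up front.
for (Binding binding : bindingSet) {
   String name = binding.getName();   // e.g. "x" or "y"
   Value value = binding.getValue();
   // do something with the name/value pair...
}
```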
As you can see, we use the TupleQueryResult to iterate over all results and get each individual result for x and y. We retrieve values by name rather than by an index. The names used should be the names of variables as specified in your query (note that we leave out the '?' or '$' prefixes used in SPARQL). The TupleQueryResult.getBindingNames() method returns a list of binding names, in the order in which they were specified in the query. To process the bindings in each binding set in the order specified by the projection, you can do the following:
List<String> bindingNames = result.getBindingNames();
while (result.hasNext()) {
   BindingSet bindingSet = result.next();
   Value firstValue = bindingSet.getValue(bindingNames.get(0));
   Value secondValue = bindingSet.getValue(bindingNames.get(1));
   // do something interesting with the values here...
}
Finally, it is important to invoke the close() operation on the TupleQueryResult after we are done with it. A TupleQueryResult evaluates lazily and keeps resources (such as connections to the underlying database) open. Closing the TupleQueryResult frees up these resources. Do not forget that iterating over a result may cause exceptions! The best way to make sure no connections are kept open unnecessarily is to invoke close() in the finally clause.
An alternative to producing a TupleQueryResult is to supply an object that implements the TupleQueryResultHandler interface to the query's evaluate() method. The main difference is that when using a return object, the caller has control over when the next answer is retrieved, whereas with the use of a handler, the connection simply pushes answers to the handler object as soon as it has them available.
As an example we will use SPARQLResultsXMLWriter, which is a TupleQueryResultHandler implementation that writes SPARQL Results XML documents to an output stream or to a writer:
import org.openrdf.query.resultio.sparqlxml.SPARQLResultsXMLWriter;
...
FileOutputStream out = new FileOutputStream("/path/to/result.srx");
try {
   SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(out);
   RepositoryConnection con = myRepository.getConnection();
   try {
      String queryString = "SELECT * FROM {x} p {y}";
      TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
      tupleQuery.evaluate(sparqlWriter);
   }
   finally {
      con.close();
   }
}
finally {
   out.close();
}
You can just as easily supply your own application-specific implementation of TupleQueryResultHandler, though.
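For instance, a handler does not have to serialize anything. The following sketch extends the convenience base class org.openrdf.query.TupleQueryResultHandlerBase and simply counts solutions; the class name SolutionCounter is our own:

```java
import org.openrdf.query.BindingSet;
import org.openrdf.query.TupleQueryResultHandlerBase;

// A hypothetical handler that counts the number of query solutions.
public class SolutionCounter extends TupleQueryResultHandlerBase {
   private int count = 0;

   @Override
   public void handleSolution(BindingSet bindingSet) {
      count++; // called once for each solution pushed by the connection
   }

   public int getCount() {
      return count;
   }
}
```

Such a handler is passed to evaluate() in the same way as the SPARQLResultsXMLWriter above.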
Lastly, an important warning: as soon as you are done with the RepositoryConnection object, you should close it. Notice that during processing of the TupleQueryResult object (for example, when iterating over its contents), the RepositoryConnection should still be open. We can invoke con.close() after we have finished with the result.
The following code evaluates a graph query on a repository:
import org.openrdf.query.GraphQueryResult;
...
GraphQueryResult graphResult = con.prepareGraphQuery(
      QueryLanguage.SPARQL,
      "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }").evaluate();
A GraphQueryResult is similar to TupleQueryResult in that it is an object that iterates over the query results. However, for graph queries the query results are RDF statements, so a GraphQueryResult iterates over Statement objects:
while (graphResult.hasNext()) {
   Statement st = graphResult.next();
   // ... do something with the resulting statement here.
}
The TupleQueryResultHandler equivalent for graph queries is org.openrdf.rio.RDFHandler. Again, this is a generic interface; each object implementing it can process the reported RDF statements in any way it wants.
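As with tuple queries, a custom handler can be sketched by extending the convenience base class org.openrdf.rio.helpers.RDFHandlerBase; the class name StatementCounter is our own:

```java
import org.openrdf.model.Statement;
import org.openrdf.rio.helpers.RDFHandlerBase;

// A hypothetical handler that counts the reported statements.
public class StatementCounter extends RDFHandlerBase {
   private int count = 0;

   @Override
   public void handleStatement(Statement st) {
      count++; // called once for each statement reported by the query
   }

   public int getCount() {
      return count;
   }
}
```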
All Rio writers (such as the RDFXMLWriter, TurtleWriter, TriXWriter, etc.) implement the RDFHandler interface. This allows them to be used in combination with querying quite easily. In the following example, we use a TurtleWriter to write the result of a SPARQL graph query to standard output in Turtle format:
import org.openrdf.rio.Rio;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFWriter;
...
RepositoryConnection con = repo.getConnection();
try {
   RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, System.out);
   con.prepareGraphQuery(QueryLanguage.SPARQL,
         "CONSTRUCT {?s ?p ?o } WHERE {?s ?p ?o } ").evaluate(writer);
}
finally {
   con.close();
}
Again, note that as soon as we are done with the result of the query (either after iterating over the contents of the GraphQueryResult or after invoking the RDFHandler), we invoke con.close() to close the connection and free resources.
In the previous sections we have simply created a query from a string and immediately evaluated it. However, the prepareTupleQuery and prepareGraphQuery methods return objects of type Query, specifically TupleQuery and GraphQuery.
A Query object, once created, can be (re)used. For example, we can evaluate a Query object, then add some data to our repository, and evaluate the same query again.
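Sketched out, such reuse could look like this (assuming an open connection con and the bob, name and bobsName values created in the Model API section earlier):

```java
// Prepare the query once...
TupleQuery query = con.prepareTupleQuery(QueryLanguage.SPARQL,
      "SELECT ?x ?y WHERE { ?x ?p ?y }");

// ...evaluate it...
TupleQueryResult result1 = query.evaluate();
result1.close();

// ...add some more data to the repository...
con.add(bob, name, bobsName);

// ...and then evaluate the very same Query object again.
TupleQueryResult result2 = query.evaluate();
result2.close();
```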
The Query object also has a setBinding method, which can be used to specify specific values for query variables. As a simple example, suppose we have a repository containing names and e-mail addresses of people, and we want to retrieve the e-mail address of each person, using a separate query for each person. This can be achieved using the setBinding functionality, as follows:
RepositoryConnection con = repo.getConnection();
try {
   // First, prepare a query that retrieves all names of persons
   TupleQuery nameQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ?name WHERE { ?person ex:name ?name . }");

   // Then, prepare another query that retrieves all e-mail addresses of persons:
   TupleQuery mailQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ?mail WHERE { ?person ex:mail ?mail ; ex:name ?name . }");

   // Evaluate the first query to get all names
   TupleQueryResult nameResult = nameQuery.evaluate();
   try {
      // Loop over all names, and retrieve the corresponding e-mail address.
      while (nameResult.hasNext()) {
         BindingSet bindingSet = nameResult.next();
         Value name = bindingSet.getValue("name");

         // Retrieve the matching mailbox, by setting the binding for
         // the variable 'name' to the retrieved value. Note that we
         // can set the same binding name again for each iteration, it will
         // overwrite the previous setting.
         mailQuery.setBinding("name", name);
         TupleQueryResult mailResult = mailQuery.evaluate();

         // mailResult now contains the e-mail addresses for one particular person
         try {
            ....
         }
         finally {
            // after we are done, close the result
            mailResult.close();
         }
      }
   }
   finally {
      nameResult.close();
   }
}
finally {
   con.close();
}
The values with which you perform the setBinding operation of course do not necessarily have to come from a previous query result (as they do in the above example). Using a ValueFactory you can create your own value objects. You can use this functionality to, for example, query for a particular keyword that is given by user input:
ValueFactory factory = myRepository.getValueFactory();

// In this example, we specify the keyword string. Of course, this
// could just as easily be obtained by user input, or by reading from
// a file, or...
String keyword = "foobar";

// We prepare a query that retrieves all documents for a keyword.
// Notice that in this query the 'keyword' variable is not bound to
// any specific value yet.
TupleQuery keywordQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ?document WHERE { ?document ex:keyword ?keyword . }");

// Then we set the binding to a literal representation of our keyword.
// Evaluation of the query object will now effectively be the same as
// if we had specified the query as follows:
// SELECT ?document WHERE { ?document ex:keyword "foobar". }
keywordQuery.setBinding("keyword", factory.createLiteral(keyword));

// We then evaluate the prepared query and can process the result:
TupleQueryResult keywordQueryResult = keywordQuery.evaluate();
The RepositoryConnection can also be used for adding, retrieving, removing or otherwise manipulating individual statements, or sets of statements.
To be able to add new statements, we can use a ValueFactory to create the Values out of which the statements consist. For example, we want to add a few statements about two resources, Alice and Bob:
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
...
ValueFactory f = myRepository.getValueFactory();

// create some resources and literals to make statements out of
URI alice = f.createURI("http://example.org/people/alice");
URI bob = f.createURI("http://example.org/people/bob");
URI name = f.createURI("http://example.org/ontology/name");
URI person = f.createURI("http://example.org/ontology/Person");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");

try {
   RepositoryConnection con = myRepository.getConnection();
   try {
      // alice is a person
      con.add(alice, RDF.TYPE, person);
      // alice's name is "Alice"
      con.add(alice, name, alicesName);

      // bob is a person
      con.add(bob, RDF.TYPE, person);
      // bob's name is "Bob"
      con.add(bob, name, bobsName);
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
Of course, it will not always be necessary to use a ValueFactory to create URIs. In practice, you will find that you quite often retrieve existing URIs from the repository (for example, by evaluating a query) and then use those values to add new statements. Or indeed, as we have seen in Section 8.1, “The RDF Model API”, for several well-known vocabularies we can simply reuse the predefined constants found in the org.openrdf.model.vocabulary package.
Retrieving statements works in a very similar way. One way of retrieving statements we have actually already seen: we can get a GraphQueryResult containing statements by evaluating a graph query.
However, we can also use direct method calls to retrieve (sets of) statements. For example, to retrieve all statements about Alice, we could do:
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
The additional boolean parameter at the end (set to 'true' in this example) indicates whether inferred triples should be included in the result. Of course, this parameter only makes a difference if your repository uses an inferencer.
The RepositoryResult is an iterator-like object that lazily retrieves each matching statement from the repository when its next() method is called. Note that, as is the case with QueryResult objects, iterating over a RepositoryResult may result in exceptions, which you should catch to make sure that the RepositoryResult is always properly closed after use:
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
try {
   while (statements.hasNext()) {
      Statement st = statements.next();
      ... // do something with the statement
   }
}
finally {
   statements.close(); // make sure the result object is closed properly
}
In the above method invocation, we see four parameters being passed. The first three represent the subject, predicate and object of the RDF statements which should be retrieved. A null value indicates a wildcard, so the above method call retrieves all statements which have as their subject Alice, and have any kind of predicate and object. The fourth parameter indicates whether or not inferred statements should be included.
Removing statements again works in a very similar fashion. Suppose we want to retract the statement that the name of Alice is "Alice":
con.remove(alice, name, alicesName);
Or, if we want to erase all statements about Alice completely, we can do:
con.remove(alice, null, null);
Most of these examples have been on the level of individual statements. However, the Repository API offers several methods that work with Collections of statements, allowing more batch-like update operations.
For example, in the following bit of code, we first retrieve all statements about Alice, put them in a org.openrdf.model.Graph (which is an implementation of java.util.Collection) and then remove them:
import info.aduna.iteration.Iterations;
import org.openrdf.model.Graph;
import org.openrdf.model.impl.GraphImpl;

// Retrieve all statements about Alice and put them in a Graph
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
Graph aboutAlice = Iterations.addAll(statements, new GraphImpl());

// Then, remove them from the repository
con.remove(aboutAlice);
As you can see, the info.aduna.iteration.Iterations class provides a convenient method that takes an Iteration (of which RepositoryResult is a subclass) and a Collection (of which GraphImpl is a subclass) as input, and returns the Collection with the contents of the iterator added to it. It also automatically closes the Iteration for you.
In the above code, you first retrieve all statements, put them in a list, and then remove them. Although this works fine, it can be done in an easier fashion, by simply supplying the resulting object directly:
con.remove(con.getStatements(alice, null, null, true));
The RepositoryConnection interface has several variations of add, retrieve and remove operations. See the RepositoryConnection Javadoc for a full overview of the options.
In the above example, we used a Graph (see Graph Javadoc) as a collection class for statements. The Graph class has several advantages over using just any old Collection class. First of all, it provides a match method that can be used to retrieve statements from the collection that match a specific subject, predicate and object. In addition, you can use GraphUtil to easily retrieve specific parts of information from the statement collection.
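For instance, the match method might be used like this, assuming a Graph named graph and the URI alice from earlier; null again acts as a wildcard:

```java
import java.util.Iterator;
import org.openrdf.model.Statement;

// Find all statements in the collection with 'alice' as subject.
Iterator<Statement> aboutAlice = graph.match(alice, null, null);
while (aboutAlice.hasNext()) {
   Statement st = aboutAlice.next();
   // do something with the statement...
}
```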
For example, imagine we have a Graph containing names and e-mails for several persons (using the FOAF vocabulary). To easily retrieve each e-mail address for each person, we can do something like this:
import org.openrdf.model.Graph;
import org.openrdf.model.Literal;
import org.openrdf.model.Resource;
import org.openrdf.model.util.GraphUtil;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.FOAF;

Graph graph = ... ; // we initialized our graph before (for example by doing a query on our repository)

for (Resource subject: GraphUtil.getSubjects(graph, RDF.TYPE, FOAF.PERSON)) {
   Literal nameOfSubject = GraphUtil.getUniqueObjectLiteral(graph, subject, FOAF.NAME);
   Literal mboxOfSubject = GraphUtil.getUniqueObjectLiteral(graph, subject, FOAF.MBOX);
}
Sesame 2 supports the notion of context, which you can think of as a way to group sets of statements together through a single group identifier (this identifier can be a blank node or a URI).
A very typical way to use context is tracking provenance of the statements in a repository, that is, which file these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that single file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.
In the following example, we add an RDF document from the Web to our repository, in a context. In the example, we make the context identifier equal to the Web location of the file being uploaded.
String location = "http://example.org/example/example.rdf";
String baseURI = location;
URL url = new URL(location);
URI context = f.createURI(location);
con.add(url, baseURI, RDFFormat.RDFXML, context);
We can now use the context mechanism to specifically address these statements in the repository for retrieve and remove operations:
// Get all statements in the context
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, context);
try {
   while (result.hasNext()) {
      Statement st = result.next();
      ... // do something interesting with the result
   }
}
finally {
   result.close();
}

// Export all statements in the context to System.out, in RDF/XML format
RDFHandler rdfxmlWriter = new RDFXMLWriter(System.out);
con.export(context, rdfxmlWriter);

// Remove all statements in the context from the repository
con.clear(context);
In most methods in the Repository API, the context parameter is a vararg, meaning that you can specify an arbitrary number (zero, one, or more) of context identifiers. This way, you can combine different contexts together. For example, we can very easily retrieve statements that appear in either 'context1' or 'context2'.
In the following example we add information about Bob and Alice again, but this time each has their own context. We also create a new property called 'creator' that has as its value the name of the person who is the creator of a particular context. However, we do not add the knowledge about the creators of the contexts to any particular context:
URI context1 = f.createURI("http://example.org/context1");
URI context2 = f.createURI("http://example.org/context2");
URI creator = f.createURI("http://example.org/ontology/creator");

// Add stuff about Alice to context1
con.add(alice, RDF.TYPE, person, context1);
con.add(alice, name, alicesName, context1);

// Alice is the creator of context1
con.add(context1, creator, alicesName);

// Add stuff about Bob to context2
con.add(bob, RDF.TYPE, person, context2);
con.add(bob, name, bobsName, context2);

// Bob is the creator of context2
con.add(context2, creator, bobsName);
Once we have this information in our repository, we can retrieve all statements about either Alice or Bob by using the context vararg:
// Get all statements in either context1 or context2
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, context1, context2);
You should observe that the above RepositoryResult will not contain the information that context1 was created by Alice and context2 by Bob. This is because those statements were added without any context; thus, they do not appear in either context1 or context2 themselves.
To explicitly retrieve statements that do not have an associated context, we do the following:
// Get all statements that do not have an associated context
RepositoryResult<Statement> result =
      con.getStatements(null, null, null, true, (Resource)null);
This will give us only the statements about the creators of the contexts, because those are the only statements that do not have an associated context. Note that we have to explicitly cast the null argument to Resource, because otherwise it is ambiguous whether we are specifying a single value or an entire array that is null (a vararg is internally treated as an array). Simply invoking getStatements(s, p, o, true, null) without an explicit cast will result in an IllegalArgumentException.
We can also get everything that either has no context or is in context1:
// Get all statements that do not have an associated context, or that are in context1
RepositoryResult<Statement> result =
      con.getStatements(null, null, null, true, (Resource)null, context1);
So as you can see, you can freely combine contexts in this fashion.
Important:
getStatements(null, null, null, true);
is not the same as:
getStatements(null, null, null, true, (Resource)null);
The former (without any context parameter at all) retrieves all statements in the repository, ignoring any context information. The latter only retrieves statements that explicitly have no associated context.
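To make this distinction concrete, the following sketch populates an in-memory repository with one statement in a named context and one without, then counts the results of both calls. This is illustrative code, not from the original text: the class name ContextNullDemo and the example URIs are our own, and we assume the in-memory SailRepository/MemoryStore classes from the Sesame distribution are on the classpath.

```java
import org.openrdf.model.Resource;
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryResult;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class ContextNullDemo {

   public static int[] run() throws Exception {
      Repository repo = new SailRepository(new MemoryStore());
      repo.initialize();
      RepositoryConnection con = repo.getConnection();
      try {
         ValueFactory f = repo.getValueFactory();
         URI context1 = f.createURI("http://example.org/context1");
         URI subject = f.createURI("http://example.org/subject");
         URI predicate = f.createURI("http://example.org/predicate");

         // One statement inside context1, one without any context
         con.add(subject, predicate, f.createLiteral("in context1"), context1);
         con.add(subject, predicate, f.createLiteral("no context"));

         // No context argument: matches all statements, regardless of context
         int all = count(con.getStatements(null, null, null, true));

         // Explicit (Resource)null: matches only the context-less statements
         int contextless = count(con.getStatements(null, null, null, true, (Resource)null));

         return new int[] { all, contextless };
      }
      finally {
         con.close();
         repo.shutDown();
      }
   }

   private static int count(RepositoryResult<Statement> result) throws Exception {
      int n = 0;
      try {
         while (result.hasNext()) {
            result.next();
            n++;
         }
      }
      finally {
         result.close();
      }
      return n;
   }

   public static void main(String[] args) throws Exception {
      int[] counts = run();
      System.out.println("all statements: " + counts[0]);           // 2
      System.out.println("context-less statements: " + counts[1]);  // 1
   }
}
```

With the two statements above, the first call matches both while the second matches only the one added without a context.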
So far, we have shown individual operations on repositories: adding statements, removing them, and so on. By default, each operation on a RepositoryConnection is immediately sent to the store and committed.

The RepositoryConnection interface supports a full transactional mechanism that allows one to group modification operations together and treat them as a single update: before the transaction is committed, none of the operations in the transaction has taken effect, and after, they all take effect. If something goes wrong at any point during a transaction, it can be rolled back so that the state of the repository is the same as before the transaction started. Bundling update operations in a single transaction often also improves update performance compared to multiple smaller transactions.
We can indicate that we want to begin a transaction by using the RepositoryConnection.begin() method. In the following example, we use a connection to bundle two file addition operations in a single transaction:
File inputFile1 = new File("/path/to/example1.rdf");
String baseURI1 = "http://example.org/example1/";
File inputFile2 = new File("/path/to/example2.rdf");
String baseURI2 = "http://example.org/example2/";

RepositoryConnection con = myRepository.getConnection();
try {
   con.begin();

   // Add the first file
   con.add(inputFile1, baseURI1, RDFFormat.RDFXML);

   // Add the second file
   con.add(inputFile2, baseURI2, RDFFormat.RDFXML);

   // If everything went as planned, we can commit the result
   con.commit();
}
catch (RepositoryException e) {
   // Something went wrong during the transaction, so we roll it back
   con.rollback();
}
finally {
   // Whatever happens, we want to close the connection when we are done.
   con.close();
}
In the above example, we use a transaction to add two files to the repository. Only if both files can be successfully added will the repository change. If one of the files cannot be added (for example, because it cannot be read), the entire transaction is rolled back and neither file is added to the repository.
It is important to note that a RepositoryConnection only supports one active transaction at a time. You can check at any time whether a transaction is active on your connection by using the isActive() method. If you need to cater for concurrent transactions, you will need to use separate RepositoryConnections.
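As a sketch of that restriction, the following example (our own illustration; the class name IsActiveDemo is hypothetical, and we again assume an in-memory SailRepository is available) shows that each connection tracks its own transaction state, which is why concurrent transactions need separate connections:

```java
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class IsActiveDemo {

   public static boolean[] run() throws Exception {
      Repository repo = new SailRepository(new MemoryStore());
      repo.initialize();

      // Concurrent transactions require separate connections
      RepositoryConnection con1 = repo.getConnection();
      RepositoryConnection con2 = repo.getConnection();
      try {
         con1.begin();
         boolean con1Active = con1.isActive(); // true: con1 has an open transaction
         boolean con2Active = con2.isActive(); // false: con2 has no transaction of its own

         con1.commit();
         boolean afterCommit = con1.isActive(); // false again once committed

         return new boolean[] { con1Active, con2Active, afterCommit };
      }
      finally {
         con1.close();
         con2.close();
         repo.shutDown();
      }
   }

   public static void main(String[] args) throws Exception {
      boolean[] states = run();
      System.out.println("con1 active during its transaction: " + states[0]);
      System.out.println("con2 active during con1's transaction: " + states[1]);
      System.out.println("con1 active after commit: " + states[2]);
   }
}
```

Because transaction state lives in the connection, con2 can begin and commit its own transaction entirely independently of con1.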