

Cheap Linked Data identifiers

December 26, 2009

This is a (short) technical post.

Every day I face the problem of obtaining Linked Data URIs that uniquely identify a “thing”, starting from an ambiguous, poor and flat keyword or description. One of the first steps in developing an application that consumes Linked Data is providing a mechanism to link our own data sets to one (or more) LoD bubbles. For a clear idea of why identifiers matter, I suggest reading this note from Dan Brickley: starting from some needs we encountered within the NoTube project, he clearly underlined the importance of LoD identifiers. Even though uniquely identifying words and terms falls into the broader category usually known as term disambiguation, I’d like to clarify that what I’m going to explain here addresses only a narrow part of the whole problem.

What I really need is a simple mechanism that converts one specific type of identifier into a set of Linked Data URIs.

For example, given a book’s ISBN, I need something that returns a set of URIs referring to that book. Or, given the title of a movie, I expect back some URIs (from DBpedia, LinkedMDB or whatever) that identify and describe it unambiguously.

Isn’t SPARQL enough for you to do that?

Yes, obviously the following SPARQL query may be sufficient:

but what I need is something quicker that I can invoke with an HTTP GET, like:

http://localhost:8080/resolver?value=978-0-374-16527-7&category=isbn

which returns a simple JSON response:

{ "mappings": [
"http://dbpedia.org/resource/Gomorrah_%28book%29"],
"status": "ok"
}
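For illustration, here is a minimal Java sketch of how a client could build such a request URL. The class and method names are hypothetical; the only assumption is that both parameters are sent URL-encoded, which starts to matter as soon as a value, such as a movie title, contains spaces or parentheses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper for the resolver's GET interface.
public class ResolverUrl {

    // URL-encode one query-string component. URLEncoder uses form encoding,
    // so a space becomes '+', which servlet containers decode back to a space.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Build <base>?value=...&category=... for a given value and category.
    static String build(String base, String value, String category) {
        return base + "?value=" + encode(value) + "&category=" + encode(category);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/resolver", "978-0-374-16527-7", "isbn"));
    }
}
```

For the ISBN above this prints exactly the URL shown earlier, since digits and hyphens need no encoding; a title such as "Gomorrah (film)" would be sent as Gomorrah+%28film%29.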

But the real issue here is the code overhead needed to support other kinds of identifiers. Imagine, for instance, that I have already implemented this kind of service and want to add another resolution category: I would have to hard-code another SPARQL query, modify the code so that the new query can be invoked as a service, and redeploy it.

I’m sure we could do better.

If we take a closer look at the above SPARQL query, we can easily see that the problem generalizes well. In fact, resolving this kind of identifier often means performing a SPARQL query that asks for URIs having a certain value for a certain property, such as dbprop:isbn in the ISBN case.
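To make the generalization concrete, the following Java sketch builds such a query from three inputs: a prefixed lookup property, the value, and whether that value is a literal or a URI. The prefix table and all names are assumptions for illustration, not the resolver's actual code.

```java
import java.util.Map;

// Hypothetical builder for "subjects having value V for property P" queries.
public class LookupQuery {

    enum Matching { LITERAL, URI }

    // Assumed prefix table (standard DBpedia namespaces).
    static final Map<String, String> PREFIXES = Map.of(
            "dbpedia-owl", "http://dbpedia.org/ontology/",
            "dbprop", "http://dbpedia.org/property/");

    // Escape backslashes and double quotes for a "..." SPARQL string literal.
    static String escapeLiteral(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String build(String lookup, String value, Matching matching) {
        int colon = lookup.indexOf(':');
        if (colon < 0) throw new IllegalArgumentException("expected a prefixed name: " + lookup);
        String prefix = lookup.substring(0, colon);
        String namespace = PREFIXES.get(prefix);
        if (namespace == null) throw new IllegalArgumentException("unknown prefix: " + prefix);

        String object;
        if (matching == Matching.LITERAL) {
            object = "\"" + escapeLiteral(value) + "\"";
        } else {
            // Reject characters that are not allowed inside <...> in SPARQL.
            for (char c : value.toCharArray()) {
                if ("<>\"{}|^`\\ ".indexOf(c) >= 0) throw new IllegalArgumentException("bad IRI: " + value);
            }
            object = "<" + value + ">";
        }
        return "PREFIX " + prefix + ": <" + namespace + ">\n"
                + "SELECT DISTINCT ?subject WHERE { ?subject " + lookup + " " + object + " . }";
    }

    public static void main(String[] args) {
        System.out.println(build("dbpedia-owl:isbn", "978-0-374-16527-7", Matching.LITERAL));
    }
}
```

Matching a plain literal this way is an exact comparison of RDF terms: if the endpoint stores the ISBN with a language tag, a different datatype or different hyphenation, the query returns nothing.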

And this is what I did over the last two days: the NoTube Identity Resolver.

It is a simple Web service (described in the figure below) that can be fully customized by editing an XML configuration file.

NoTube Identity Resolver architecture

The resolvers.xml file lets you describe each resolution policy, which then becomes accessible through a simple HTTP GET call.
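The request flow can be sketched roughly as follows. This Java sketch is hypothetical: it assumes each configured resolver is registered under its category and returns a list of URIs, and it uses a plain function as a stand-in for the SameAs.org lookup. Only the "ok" status appears in the post; the error status text is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical dispatcher: category -> resolver -> optional sameAs expansion.
public class Dispatcher {

    // A configured resolver: the lookup itself plus its <sameas> flag.
    record Entry(Function<String, List<String>> lookup, boolean sameAs) {}

    // Mirrors the {"mappings": [...], "status": "..."} response shape.
    record Result(String status, List<String> mappings) {}

    private final Map<String, Entry> byCategory;
    private final Function<String, List<String>> equivalents; // stand-in for SameAs.org

    Dispatcher(Map<String, Entry> byCategory, Function<String, List<String>> equivalents) {
        this.byCategory = byCategory;
        this.equivalents = equivalents;
    }

    Result resolve(String category, String value) {
        Entry entry = byCategory.get(category);
        if (entry == null) {
            // Assumed error status; the post only shows "ok".
            return new Result("unknown category: " + category, List.of());
        }
        // LinkedHashSet keeps first-seen order while dropping duplicate URIs.
        Set<String> uris = new LinkedHashSet<>(entry.lookup().apply(value));
        if (entry.sameAs()) {
            for (String uri : new ArrayList<>(uris)) {
                uris.addAll(equivalents.apply(uri));
            }
        }
        return new Result("ok", new ArrayList<>(uris));
    }
}
```

The point of this shape is that adding a category only means registering one more entry, which is what a configuration file can drive without code changes.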

Back to the ISBN example, the following piece of XML is enough to describe the resolver:

<resolver id="2" type="normal">
  <category>isbn</category>
  <endpoint>http://dbpedia.org/sparql</endpoint>
  <lookup>dbpedia-owl:isbn</lookup>
  <sameas>true</sameas>
  <matching>LITERAL</matching>
</resolver>

Where:

  • category is the value to be passed as the category parameter of the HTTP GET call in order to invoke this resolver
  • endpoint is the address of the SPARQL endpoint against which the resolution is performed
  • lookup is the name of the property whose value is matched against the value passed in the call (dbpedia-owl:isbn in the example above)
  • type (optional) is the rdf:type of the resources to be resolved
  • sameas is a boolean value that enables or disables calls to the SameAs.org service to retrieve equivalent URIs
  • matching (which accepts only URI or LITERAL) describes the type of the value to be resolved.
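As a rough idea of how such a file can be read, here is a Java sketch using the JDK's built-in DOM parser (javax.xml.parsers). It assumes the resolver elements sit under a common root element (called resolvers in the test, a guess), and it reads the optional type element separately from the type attribute, since the two appear to mean different things in the examples. The Resolver record and its field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for resolver definitions like the one above.
public class ResolversConfig {

    // One <resolver> entry; kind is the type attribute, rdfType the <type> element.
    record Resolver(String id, String kind, String category, String endpoint, String lookup,
                    String rdfType, boolean sameAs, String matching, String sparql) {}

    // Trimmed text of the first descendant element with this tag, or null if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static List<Resolver> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("resolver");
            List<Resolver> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                out.add(new Resolver(
                        e.getAttribute("id"),
                        e.getAttribute("type"),
                        text(e, "category"),
                        text(e, "endpoint"),
                        text(e, "lookup"),
                        text(e, "type"),
                        Boolean.parseBoolean(text(e, "sameas")), // missing -> false
                        text(e, "matching"),
                        text(e, "sparql"))); // CDATA content is included
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse resolvers.xml", ex);
        }
    }
}
```

Absent elements come back as null, and a missing sameas element is treated as false, because Boolean.parseBoolean(null) returns false.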

Moreover, the NoTube Identity Resolver also lets you specify more complex resolution policies through a SPARQL query, as shown below:

<resolver id="3" type="custom">
  <category>movie</category>
  <endpoint>http://dbpedia.org/sparql</endpoint>
  <sparql><![CDATA[SELECT DISTINCT ?subject
    WHERE { ?subject a <http://dbpedia.org/ontology/Film>.
            ?subject <http://dbpedia.org/property/title> ?title.
            FILTER (regex(?title, "#VALUE#")) }]]>
  </sparql>
  <sameas>true</sameas>
</resolver>

In other words, each resolver described in the resolvers.xml file enables one kind of resolution mechanism without writing a single line of Java code.
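The post does not say how #VALUE# is substituted into a custom query. A naive string replacement lets characters such as quotes or parentheses break the query or change the meaning of the regex, so the following Java sketch, with hypothetical names, escapes the value twice: first for the regular expression (SPARQL's regex follows XPath regex syntax), then for the double-quoted SPARQL string literal.

```java
// Hypothetical #VALUE# substitution with regex and literal escaping.
public class CustomQuery {

    static final String PLACEHOLDER = "#VALUE#";

    // Characters with special meaning in XPath/SPARQL regular expressions.
    private static final String REGEX_META = "\\|.-^?*+{}()[]$";

    // Backslash-escape regex metacharacters so the value is matched literally.
    static String escapeRegex(String v) {
        StringBuilder sb = new StringBuilder();
        for (char c : v.toCharArray()) {
            if (REGEX_META.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape backslashes and double quotes so the text fits inside "..." in SPARQL.
    static String escapeLiteral(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Replace every #VALUE# occurrence with the doubly escaped value.
    static String fill(String template, String value) {
        return template.replace(PLACEHOLDER, escapeLiteral(escapeRegex(value)));
    }

    public static void main(String[] args) {
        System.out.println(fill("FILTER (regex(?title, \"#VALUE#\"))", "Gomorrah (film)"));
    }
}
```

With this, the title Gomorrah (film) appears in the query text as "Gomorrah \\(film\\)", which the endpoint reads as a literal match on the parentheses. The pattern remains unanchored and case-sensitive, so it still matches any title that contains the value.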

Do you want to try?

Just download the war package, get this resolvers.xml (or write your own), export the RESOLVERS_XML_LOCATION environment variable pointing to the folder that contains resolvers.xml, deploy the war on your Apache Tomcat application server, start the application and try it out by pointing your browser to:

http://localhost:8080/notube-identity-resolver/resolver?value=978-0-374-16527-7&category=isbn
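If the page does not come up, a first thing to check is whether the service actually sees the variable. The following hypothetical Java sketch shows one way a service could locate the file from RESOLVERS_XML_LOCATION, assuming, as the instructions above suggest, that the variable names the folder and the file itself is called resolvers.xml.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical lookup of resolvers.xml from the RESOLVERS_XML_LOCATION folder.
public class ResolversLocation {

    static final String ENV_VAR = "RESOLVERS_XML_LOCATION";

    // Resolve <dir>/resolvers.xml from the variable's value; fail early if unset.
    static Path resolve(String dir) {
        if (dir == null || dir.isBlank()) {
            throw new IllegalStateException(ENV_VAR + " is not set");
        }
        return Path.of(dir).resolve("resolvers.xml");
    }

    public static void main(String[] args) {
        Path p = resolve(System.getenv(ENV_VAR));
        System.out.println(p + (Files.isReadable(p) ? " (found)" : " (missing)"));
    }
}
```

Because Tomcat reads the variable from its own process environment, it has to be exported in the shell that starts Tomcat, or set in Tomcat's bin/setenv.sh.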

That’s all, folks!