Many of the most exciting developments in information-based technologies today incorporate Semantic Web technologies. These include IBM’s Watson, Apple’s Siri (originally created and developed at SRI), and Google’s search engine. It’s not surprising, but perhaps unfortunate, to read in Egon Willighagen’s preprint “Accessing biological data in R with semantic web technologies” that “most new databases do not yet use semantic web technologies.”
Note: R is a freely available, open-source tool that is especially useful for interactive data analytics and visualization.
The happy news is that Egon Willighagen has once again contributed to the community tool chest. While his article is targeted at a life sciences audience, the tool (a set of R packages) he has written, known as rrdf, is useful to anyone who wants to import RDF data into R.
Want to start working with triples from within your R environment? Install the rrdf package:
install.packages("rrdf")
Installing the rrdf package also installs two dependencies: rJava and rrdflibs. The rrdf package provides its RDF and SPARQL functionality through Apache Jena. The rrdflibs package contains the Apache Jena libraries, which are written in Java. The rJava package provides the interface to Java that lets R call Jena. The rrdf package itself contains the R functions that wrap the Jena functionality and convert data into the appropriate R structures where needed.
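If loading rrdf fails later, it can help to confirm that all three packages actually made it into your library. The following is a minimal sketch using only base R; the variable names are illustrative and not part of rrdf.

```r
# Packages named in the text: rrdf plus its two dependencies.
pkgs <- c("rJava", "rrdflibs", "rrdf")

# Look each package up in the list of installed packages.
# This only checks for presence; it does not load Java or Jena.
installed <- setNames(pkgs %in% rownames(installed.packages()), pkgs)
print(installed)

# Report anything that still needs to be installed.
missing <- names(installed)[!installed]
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
}
```

A package reported as installed can still fail to load if no Java runtime is available, because rJava needs one at load time.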
Load the rrdf package into your R environment:
library(rrdf)
Now you’re able to query your favorite triple store and pull triples into your R environment. Here we will query Live DBpedia for a list of 40 programming languages. First provide the URL for the SPARQL endpoint:
endpoint <- "http://dbpedia.org/sparql"
Next provide the SPARQL query itself:
query <- "SELECT DISTINCT ?language WHERE { ?s ?o . ?o ?language } LIMIT 40"
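Each SPARQL triple pattern needs a subject, a predicate, and an object, so the two patterns in the query above each need a predicate IRI before an endpoint will accept them. The sketch below shows one way a complete query could be assembled. The IRIs it uses (rdf:type, the DBpedia ontology class ProgrammingLanguage, and rdfs:label) and the English-language filter are assumptions chosen for illustration, not necessarily the predicates used in the original query.

```r
# Assumed IRIs (illustrative; not taken from the original query).
rdf_type       <- "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
language_class <- "http://dbpedia.org/ontology/ProgrammingLanguage"
rdfs_label     <- "http://www.w3.org/2000/01/rdf-schema#label"

# Build the query text line by line so each triple pattern is easy to read.
query <- paste(
  "SELECT DISTINCT ?language WHERE {",
  sprintf("  ?s <%s> <%s> .", rdf_type, language_class),
  sprintf("  ?s <%s> ?language .", rdfs_label),
  "  FILTER (lang(?language) = 'en')",
  "} LIMIT 40",
  sep = "\n"
)

# Print the finished query for inspection before sending it anywhere.
cat(query, "\n")
```

Building the string with sprintf() keeps the angle-bracketed IRIs visible in one place, which makes it easier to swap in different predicates.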
Finally, carry out the query using the sparql.remote() function and assign the results to a variable:
data <- sparql.remote(endpoint, query)
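Once the call returns, you will probably want the results in a form that the rest of your R code can use. The sketch below assumes that sparql.remote() returns a character matrix with one column per query variable. The helper result_to_frame() is hypothetical, and the small matrix is a made-up stand-in for a real result so that the example runs without a network connection.

```r
# Hypothetical helper: convert a SPARQL result matrix into a data frame.
result_to_frame <- function(result) {
  # An empty or missing result becomes an empty data frame.
  if (is.null(result) || length(result) == 0) {
    return(data.frame())
  }
  as.data.frame(result, stringsAsFactors = FALSE)
}

# Made-up stand-in for the value returned by sparql.remote(endpoint, query).
example <- matrix(
  c("R", "Java"),
  ncol = 1,
  dimnames = list(NULL, "language")
)

languages <- result_to_frame(example)
print(dim(languages))   # number of rows and columns
print(head(languages))  # first few results
```

With a live result you would call result_to_frame(data) instead of passing the stand-in matrix.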
You’ve imported your first set of triple data into R!