Linked data and other Semantic Web technologies have advanced over the past few years, so it’s a good time to revisit how to work with linked data in R. A few years ago (March 13, 2014 post “R and RDF: Where Statistics and the Semantic Web Meet”) the discussion centered on the R package rrdf, which is no longer actively supported. Today the package to use is SPARQL, which works in much the same way as rrdf.
Install and load the SPARQL package, if you haven’t done so already:
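A minimal sketch of that step, assuming the package can still be installed from your configured CRAN mirror (if it cannot, it may have to come from the CRAN archive instead):

```r
# Install the SPARQL package only if it is not already present.
# Assumption: your configured repository still serves the package.
if (!requireNamespace("SPARQL", quietly = TRUE)) {
  install.packages("SPARQL")
}

# Load the package so that SPARQL() is available in this session.
library(SPARQL)
```

The requireNamespace() check simply avoids reinstalling the package each time the script runs.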
Assign the SPARQL endpoint to a variable. The SPARQL endpoint is the URL through which a particular data repository accepts SPARQL queries. In this case let’s use the WikiPathways endpoint, which provides data on biological pathways (see my June 27, 2017 post “WikiPathways: Open Biological Pathways Data on the Semantic Web”).
endpoint <- 'http://sparql.wikipathways.org'
Next, assign a SPARQL query to a query variable. Be sure to surround the query itself with quotes; single quotes work well here because the query contains a double-quoted literal.
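# Titles of all human pathways, sorted alphabetically
query <- 'PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT (str(?title) AS ?pathway)
WHERE {
  ?pw dc:title ?title ;
      wp:organism ?organism ;
      wp:organismName "Homo sapiens"^^xsd:string .
}
ORDER BY ?pathway'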
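If you want to run the same query for more than one organism, one option is to build the query text with a small function. The sketch below is only an illustration: build_pathway_query() is a hypothetical helper, not part of the SPARQL package, and the prefix IRIs are the ones I assume the WikiPathways vocabulary, Dublin Core, and XML Schema use.

```r
# Hypothetical helper (not part of the SPARQL package): returns the query
# text for a given organism name.
build_pathway_query <- function(organism = "Homo sapiens") {
  stopifnot(is.character(organism), length(organism) == 1)

  # Refuse characters that would end the SPARQL string literal early.
  has_quote <- grepl('"', organism, fixed = TRUE)
  has_backslash <- grepl("\\", organism, fixed = TRUE)
  if (has_quote || has_backslash) {
    stop("organism must not contain double quotes or backslashes")
  }

  # Assemble the query one line at a time; prefix IRIs are assumptions.
  paste(
    "PREFIX wp: <http://vocabularies.wikipathways.org/wp#>",
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
    "SELECT DISTINCT (str(?title) AS ?pathway)",
    "WHERE {",
    "  ?pw dc:title ?title ;",
    "      wp:organism ?organism ;",
    sprintf("      wp:organismName \"%s\"^^xsd:string .", organism),
    "}",
    "ORDER BY ?pathway",
    sep = "\n"
  )
}

# Print the query for the default organism to check it before sending it.
human_query <- build_pathway_query()
cat(human_query, "\n")
```

Passing another organism name, such as "Mus musculus", changes only the literal in the organismName line; whether the endpoint holds data for that name is something to check against the repository itself.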
Use the SPARQL() function to carry out the query.
data <- SPARQL(endpoint, query)
The results, the names of all human pathways in the repository, are now in the R environment. (SPARQL() returns a list; enter data$results to see the result table itself.)
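To work with the results, it helps to pull the data frame out of the list that SPARQL() returns. The sketch below uses a stand-in object with the same shape so it runs without a network connection; get_results() is a hypothetical helper and the pathway titles are made-up placeholders, not real WikiPathways output.

```r
# Hypothetical helper: return the results data frame from the list
# that SPARQL() returns (elements `results` and `namespaces`).
get_results <- function(res) {
  stopifnot(is.list(res), is.data.frame(res$results))
  res$results
}

# Stand-in for a SPARQL() return value; the titles are placeholders.
mock <- list(
  results = data.frame(
    pathway = c("Example pathway B", "Example pathway A"),
    stringsAsFactors = FALSE
  ),
  namespaces = NULL
)

pathways <- get_results(mock)
print(nrow(pathways))                 # number of rows returned
print(head(sort(pathways$pathway)))   # first few titles, alphabetically
```

With a real query you would call get_results(data) instead of get_results(mock).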
The R SPARQL package is a great tool for those who want to pull data from across the Semantic Web and use R to analyze and visualize the results.