Update on R and Semantic Web Technologies

Progress has been made with linked-data and other Semantic Web technologies over the past few years so it’s a great time to revisit how we may work with linked-data using R. A few years ago (March 13, 2014 post “R and RDF: Where Statistics and the Semantic Web Meet”) discussion was around the R package rrdf, which is no longer actively supported. Today SPARQL is the package to use. SPARQL works in very much the same way as rrdf.

Install and load the SPARQL package, if you haven’t done so already:


Enter a SPARQL endpoint variable. The SPARQL endpoint is the semantic search entry point for a particular data repository. In this case let’s use the WikiPathways endpoint that provides data on biological pathways (see my June 27, 2017 post “WikiPathways: Open Biological Pathways Data on the Semantic Web”).

endpoint <- 'http://sparql.wikipathways.org'

Next assign a SPARQL query to a query variable. Be sure to surround the query itself with quotes.

query <- 'PREFIX wp:
SELECT DISTINCT str(?title) as ?pathway
?pw dc:title ?title ;
wp:organism ?organism ;
wp:organismName "Homo sapiens"^^xsd:string .
ORDER BY ?pathway'

Use the SPARQL() function to carry out the query.

data <- SPARQL(endpoint, query)

The results, the names for all the repository's pathways for humans, are now in the R environment. (Enter 'data' to see the dataset.)

The R SPARQL package is a great tool for those who want to pull data from across the Semantic Web and use R to analyze and visualize the results.

WikiPathways: Open Biological Pathways Data on the Semantic Web

The above PathwayWidget provides interactive visualizations of pathways pulled from WikiPathways in real-time. Here we see human pathways active in Alzheimer’s Disease. At right are lists of mitochondrial RNA differentially expressed in Alzheimer’s Disease.

Genes, proteins, and small molecules interact in our bodies through biological pathways to carry out the moment-to-moment work that supports life. WikiPathways is an open, collaborative platform for depositing data gathered through research on biological pathways in all lifeforms including humans.

WikiPathways currently contains over 2400 pathways from over 25 different species. Created by research groups that need a platform to support high-throughput data analysis and visualization, WikiPathways provides a treasure trove of linked-data for data scientists and programmers.

A SPARQL query endpoint is available at the WikiPathways site. For example, you may get an alphabetical list of human pathways from the repository by using the following query.

PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT str(?title) as ?pathway
   ?pw dc:title ?title ;
      wp:organism ?organism ;
      wp:organismName "Homo sapiens"^^xsd:string .
ORDER BY ?pathway

WikiPathways is a site that offers many resources including linked-data and Semantic Web tools and has adopted the Creative Commons CC0 waiver. It’s well worth exploring.

The Science Behind AlphaGo: Games of Perfect Information

Full Depth of One Branch in Tic-Tac-Toe

Figure 1. A diagram of possible tic-tac-toe game states (nodes) from first moves to last possible moves. Diagram breadth has been severely limited so that we may follow a couple pipelines (sequences of moves) to their end states (leaf nodes).

A great place to begin contemplating the science behind a computer program’s mastery of Go is at the beginning of the recent paper “Mastering the game of Go with deep neural networks and tree search” (published January 28, 2016 in Nature).

The first sentence begins “All games of perfect information…”

When a player is about to make a move in games of perfect information, the player can know all of the previous events that occurred since the start of the game. The AlphaGo paper cited above moves quickly from this brief reference to more complex ideas. It is worth our effort to contemplate the the idea of games of perfect information more thoroughly.

A paper published five years ago “Monte-Carlo tree search and rapid action value estimation in computer Go” (published July 2011 in Artificial Intelligence) explores the idea more thoroughly while considering two-player, perfect-information, zero-sum games like tic-tac-toe and checkers.

Let’s contemplate tic-tac-toe. What could be easier?

The first of two players sets an ‘X’ (cross) in one square in a three-by-three grid. The first player has 9 possible squares to select from (see Figure 2 below).

Tic-Tac-Toe First Move

Figure 2. There are 9 and only 9 possible first moves in tic-tac-toe.

The second player then makes her first move by setting an ‘O’ (nought) in 1 of the 8 remaining squares in the three-by-three grid (see Figure 3 below).

Tic-Tac-Toe Second Move

Figure 3. There are 8 and only 8 possible first moves for the second player in tic-tac-toe. However, there are 9 different configurations of 8 possible moves depending on where the first player places an ‘X’ (cross). This diagram shows only when the first player placed a cross in the top-left-most square. Eight more tree branches may be displayed to show all possible second player moves depending on which of 9 moves the first player chose as their first move.

Each player continues to play alternate moves until there are 3 crosses in a row (first player wins) or 3 noughts in a row (second player wins) or all 9 squares are filled (game is a draw; no one wins).

It is possible to draw a diagram containing every possible move and every possible sequences of moves that may be played in a game by creating an upside-down tree (inverted tree) that starts with the 9 possible moves shown in Figure 2. Each of the 9 possible moves, each representing a node in our inverted tree diagram, would be connected to the 8 possible moves (8 nodes) by the second player so that by the end of our two players’ first move our diagram would show 9 x 8 = 72 nodes. Think of a Figure 3 diagram created for each possible move in Figure 2 and all displayed in a single diagram.

These 72 nodes represent every possible move that players one and two may make during each player’s first turn. The potential moves and pipelines (sequence of moves) are completely determined. We cannot know the moves and sequence of moves that players will decide on during a particular game before that game is played. However, we do know every possible move and sequence of moves that they can make at each step of the game.

The entire tic-tac-toe world of possibilities can be laid out before us and, depending on the history of moves taken up to the moment we’re considering, we can know exactly what we may do next. We have perfect information about our tic-tac-toe world and what we may do in it.

Going Deep

So far we’ve considered the number of possible locations each player may play their move during their turn (breadth of possibilities) and just the first move of players one and two deep (depth; the sequence of moves across alternate turns). However, the depth of our inverted tree continues until a player wins with 3 in a row or all of the 9 squares are filled. Figure 1 above shows a diagram with restricted breadth but full depth.

At top of Figure 1 is our first player’s first move. Recall that this is just 1 of 9 possible moves (1 from a total breadth of 9).

Next down is our second player’s first move. Again, recall that this is just 1 of 8 possible moves. Also, before our first player selected a move, there were 8 more branches of 8 potential first moves for our second player.

The third level down in Figure 1 shows our first player’s second move based on our second player placing a nought in top row center. This is 1 of 7 possible moves.

In the fourth level down, our second player responds with her second move. One of 6 possible moves.

A pattern is apparent with 9 x 8 x 7 x 6 … potential moves. If we continue to ignore symmetries and other aspects of the tic-tac-toe game space that you probably notice, the full tic-tac-toe potential play space is indeed 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 or 9! (9 factorial) in size or 362,880.

Notice in the fifth level down, our first player’s third turn, I widen the breadth a bit and show 2 of the player’s 5 possible moves so we can see some of the variability in the end game (leaf node) pattern.

On our players’ fourth move (seventh and eighth levels down from top in Figure 1) we begin to see leaf nodes where the game ends. Two of the 3 possible first player moves displayed in level seven of the left-most branch are end leaf nodes with three crosses in a row (our first player would win).

This glimpse at a piece of the complete tic-tac-toe inverted-tree game space shows us how all of the information about winning, losing, and draw exists before a game of perfect information is even begun. Players choose from a limited set of options that determine the options that follow.

Today’s computers can easily maintain the complete game space information for tic-tac-toe. However, as the game becomes more complex, the game space increases extremely quickly. Checkers has a space of about 500 billion billion, chess has about 1040 and Go, another game of perfect information, has a space larger than all of the atoms in our known universe!

Clearly it’s possible to play games like Go since humans have done it for a very long time. How does a human brain do it? How can we get computers to do it? We’ll continue to explore these problems right here on my ActionPotential blog.