GraphGrep1.0, 1.1


Rosalba Giugno
Universita' di Catania
Dipartimento di Matematica e Informatica

Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University

Date 1/1/2002.

 Installation    Usage    Glide    References



GraphGrep1.0 is implemented in ANSI C and in a high performance interpreted language called K. It has been ported on Unix, and  Windows  (Cygwin) platforms.
Follow these instructions to install and test GraphGrep1.0.

GraphGrep1.1 is implemented only in ANSI C. It has been ported on Unix, and  Windows  (Cygwin) platforms.
Follow these instructions to install and test GraphGrep1.0.


To run the examples go to the GraphGrep1.0 or GraphGrep1.1 directory and do the following:

  1. Run "graphbuild ex1/dbsmile.dat". This will  create a directory called graphgrepdata under your working directory with the abstract representation of the graphs.

  2. Search for subgraphs:

  3. Note: mquery.dat is represented as a list of node labels and edges; xquery2.dat and xquery.dat are expressed in Glide language. Use the command line option -x  if the query is in Glide language, otherwise use -m. The wildcards can be used only in the Glide language. 

  4. Replace an existing graph 

If you want to run GraphGrep1.0 in a different directory: (1) make sure that graphgrep and graphbuild are in your path; (2)  copy all *.k files in the new directory.   If you want to run GraphGrep1.1 in a different directory make sure that graphgrep and graphbuild are in your path.

For bugs and questions about GraphGrep please contact and



  To use GraphGrep1.0, 1.1 you have to: (1) create a dataset file; (2) preprocess the collection of graphs; and (3) run a query. The output of "graphgrep" contains all the occurrences of the query in the collection of graphs.



Glide (Graph LInear DEscription)  is a language designed to express graphs. Glide represents  a graph with a linear notation. The main idea  is to represent a graph as a set of branches where each node is presented only once.  The expressions in Glide, called regular graph expressions, allow the description of portions of graphs.  There are powerful  languages that work only with specific applications. Glide is designed to be as general as possible without compromising efficiency. 

Let us formulate the Glide regular graph expressions  using examples. First consider the graph in Fig. 1.

Fig. 1: Generic graph.

  1. A node is represented by  its label followed by the special character '/'  
  2.  a/
  3. An edge is represented by two consecutive node representations; a path of length n  is represented by n+1 consecutive node representations.
  4. a/b/
  5. Branches are  grouped using nested parentheses  '('  and ' )'.
  6. a/(h/c/)b/
  7. Cycles are broken by cutting an edge and labeling it with an integer. 


    1. The labels of the nodes  of the cut edge are followed by  '%', an integer  and '/'.
    2. c%1/ f/  i%1/
    3. The labels of the nodes of  the cut edge are followed by a list of '%'and integers. In this case the same node is a vertex of several cut edges.
    4. a%1/h/c%1%2/d/h/i%2/


Note: by node label we mean the value  used  in a matching algorithm to establish the equivalency between two nodes.

Glide uses  wildcards for approximate searches. The wildcards are used to match  single nodes or paths. Wildcards must be always followed by '/' . 
The wildcards used by Glide  and their semantics are given below.

Wildcard Semantic (matching with) Example
. any (single) node a/./c/  
* zero or more nodes a/*/c/
? zero or one node a/?/c/
+ one or more nodes a/+/c/

Glide manipulates any combination of wildcards. A recursive semantics is provided (see examples).

Glide expression Semantic
a/./././b/ any path of length 4 beginning with a node 'a 'and ending with a node 'b'
a/+/./b/ any path of length at least 3 beginning with a node 'a' and ending with a node 'b'
a/?/./ any path of length at least 2 and at most 3 starting with a node 'a'
././a/ any path of length 2 beginning with a node 'a '
a%1/(*/b/)./c/d%1/ any path where 'a' and 'c' are connected through a node, and a path between a and b exists.


The graph expressions (E's) in Glide are generated by the following grammar, where digit is a positive integer and label is a string.

E  <--  'label/'  | 'label{%digit}/'  | './'  |  '*/'  |  '?/' | '+/' 

E  <--  EE | '('E')'  


Graph Incorrect graph expression  Correct graph expression
a/((b/i/c%1)/h/c%1/) a/(b/i/c%1/)h/c%1/


Some details on Glide.




R. Giugno, D. Shasha, GraphGrep: A Fast and Universal Method for Querying Graphs.
Proceeding of the International Conference in Pattern recognition (ICPR), Quebec, Canada, August 2002. pdf

D. Shasha, J.T-L Wang, R. Giugno, Algorithmics and Applications of Tree and Graph Searching
Proceeding of the ACM Symposium on Principles of Database Systems (PODS), Madison, Wisconsin, June 2002. pdf