Universita' di Catania
Dipartimento di Matematica e Informatica
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
What is GraphGrep?
Compare GraphBlast Demo
GraphGrep is a software package to search for a query graph in a database of graphs. Given a collection of graphs and a pattern graph, GraphGrep finds all the occurrences of the pattern in each graph. The pattern is a subgraph and it can be also a tree, a path, or a node. The pattern is expressed as a list of nodes and a list of edges.
The components of GraphGrep are the following:
Why GraphGrep? Many applications in industry, science, and engineering depend on efficient information retrieval methods from structured or semistructured databases. These databases contain items with a complex structure where the relationships between the components (of the items) have to be maintained. The items are typically represented by graphs. For example, shapes, images (computer vision), XML or OEM data (web exchange data model), molecules and proteins (computational biology), graphical sentences and grammars (visual languages) are represented by graphs. By using graphs the problem of information retrieval on a database is transformed to that of finding subgraphs in a collection of graphs. This problem is well known to be NP complete. Although graph-to-graph matching algorithms can be used, efficiency reasons dictate the use of special techniques in order to reduce the search space and the time complexity.
To use any version of GraphGrep you have to: (1) create a dataset file; (2) preprocess the collection of graphs; and (3) run a query. The output of "graphgrep" contains all the occurrences of the query in the collection of graphs.
GraphGrep2.0 is implemented in C++, it uses an efficient storage system, it is fast and accurate. Compared to GraphGrep1.0 or GraphGrep1.1, GraphGrep2.0 is faster but it does not allow inexact matchings.
|Input Query Format||Output Format||Language||Storage System||Filtering
|GraphGrep2.1||LNE||LNE, Glide||List of corresponding nodes||C++||Berkeley DB||Compacted
|GraphGrep2.0||LNE||LNE||List of corresponding nodes||C++||Berkeley DB||Compacted
|LNE||LNE,Glide||List of corresponding paths||C, K
|Binary Files||Extended Hashed paths||Yes|
In order to build a very fast system we are exploring different filtering algorithms, storage and matching techniques. Please specify to us which version you would like to use.
To download GraphGrep, send email to firstname.lastname@example.org
and to email@example.com.
If you care to describe your application, we'd be glad to hear about it. In any case, we will send you instructions for downloading the program.
Follow these instructions to install GraphGrep2.0.
1. Download BerkeleyDB 4.01 at http://www.sleepycat.com/update/snapshot/db-4.1.25.NC.tar.gz
2. Unzip it to a folder
3. Switch to the inside folder "build_unix"
4. Type "../dist/configure --enable-cxx" and execute
5. Type "make" and execute
6. Type "make install" and execute
7. Make sure there is a directory "/usr/local/BerkeleyDB.4.1/" in your system.
8. Make sure "libdb_cxx-4.1.a" " libdb_cxx.a" are in the "/usr/lib" folder. You need this library to build GraphGrep
If you have any trouble building BerkelyDB or to see detail information, please check "http://www.sleepycat.com/docs/ref/build_unix/intro.html"
1. Install successfully BerkeleyDB.
2. Unzip the graphgrep2.0 package
3. Type "cd graphgrep2.0"
4. Type "cd src; make; cd .. "
5. Check if there are "graphgrep" and "graphbuild" existing in the current folder.
For bugs and questions please contact the firstname.lastname@example.org and email@example.com.
To run the examples go to the GraphGrep2.0 directory and do the
following (If you want to run GraphGrep2.0 in a different directory: make sure that
graphgrep and graphbuild are in your path):
R. Giugno, D. Shasha, GraphGrep: A Fast and Universal Method for Querying Graphs.
Proceeding of the International Conference in Pattern recognition (ICPR), Quebec, Canada, August 2002. pdf
D. Shasha, J.T-L Wang, R. Giugno,
Algorithmics and Applications of Tree and
Proceeding of the ACM Symposium on Principles of Database Systems (PODS), Madison, Wisconsin, June 2002. pdf
Chih-Yi, HSU, master in computer science at New York University 2002. Now he is working at AT&T lab, New Jersey.
This project is supported by
New York University, USA and
University of Catania, Italy.
Visit our CT-NYU project page for a further useful package of algorithms with applications in Bioinformatics and Mobile Stations.
This work has been partly supported by the U.S. National Science
Foundation under grants
This support is greatly appreciated.