GraphBlast |
Rosalba Giugno
Universita' di Catania
Dipartimento di Matematica e Informatica
giugno@dmi.unict.it
Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
shasha@cs.nyu.edu
GraphGrep is a software package to search for a query graph in a database of graphs. Given a collection of graphs and a pattern graph, GraphGrep finds all the occurrences of the pattern in each graph. The pattern is a subgraph and it can be also a tree, a path, or a node. The pattern is expressed as a list of nodes and a list of edges.
The components of GraphGrep are the following:
Why GraphGrep? Many applications in industry, science, and engineering depend on efficient information retrieval methods from structured or semistructured databases. These databases contain items with a complex structure where the relationships between the components (of the items) have to be maintained. The items are typically represented by graphs. For example, shapes, images (computer vision), XML or OEM data (web exchange data model), molecules and proteins (computational biology), graphical sentences and grammars (visual languages) are represented by graphs. By using graphs the problem of information retrieval on a database is transformed to that of finding subgraphs in a collection of graphs. This problem is well known to be NP complete. Although graph-to-graph matching algorithms can be used, efficiency reasons dictate the use of special techniques in order to reduce the search space and the time complexity.
To use any version of GraphGrep you have to: (1) create a dataset file; (2) preprocess the collection of graphs; and (3) run a query. The output of "graphgrep" contains all the occurrences of the query in the collection of graphs.
GraphGrep2.0 is implemented in C++, it uses an efficient storage system, it is fast and accurate. Compared to GraphGrep1.0 or GraphGrep1.1, GraphGrep2.0 is faster but it does not allow inexact matchings.
Input Data Format |
Input Query Format | Output Format | Language | Storage System | Filtering System |
Wildcards | |
GraphGrep2.1 | LNE | LNE, Glide | List of corresponding nodes | C++ | Berkeley DB | Compacted Hashed Paths Vector |
Yes |
GraphGrep2.0 | LNE | LNE | List of corresponding nodes | C++ | Berkeley DB | Compacted Hashed Paths Vector |
No |
GraphGrep1.0 GraphGrep1.1 |
LNE | LNE,Glide | List of corresponding paths | C, K C |
Binary Files | Extended Hashed paths | Yes |
In order to build a very fast system we are exploring different filtering algorithms, storage and matching techniques. Please specify to us which version you would like to use.
To download GraphGrep, send email to shasha@cs.nyu.edu
and to giugno@dmi.unict.it.
If you care to describe your application, we'd be glad to hear
about it.
In any case, we will send you instructions for downloading the program.
Follow these instructions to install GraphGrep2.0.
Install
BerkeleyDB
1. Download BerkeleyDB 4.01 at
http://www.sleepycat.com/update/snapshot/db-4.1.25.NC.tar.gz
2. Unzip it to a folder
3. Switch to the inside folder "build_unix"
4. Type "../dist/configure --enable-cxx" and execute
5. Type "make" and execute
6. Type "make install" and execute
7. Make sure there is a directory "/usr/local/BerkeleyDB.4.1/" in your
system.
8. Make sure "libdb_cxx-4.1.a" " libdb_cxx.a" are in the "/usr/lib" folder.
You need this library to build GraphGrep
If you have any trouble building BerkelyDB or to see detail
information, please check "http://www.sleepycat.com/docs/ref/build_unix/intro.html"
Build GraphGrep2.0
1. Install successfully BerkeleyDB.
2. Unzip the graphgrep2.0 package
3. Type "cd graphgrep2.0"
4. Type "cd src; make; cd .. "
5. Check if there are "graphgrep" and "graphbuild" existing in the current
folder.
For bugs and questions please contact the giugno@dmi.unict.it and shasha@cs.nyu.edu.
Run GraphGrep2.0
To run the examples go to the GraphGrep2.0 directory and do the
following (If you want to run GraphGrep2.0 in a different directory: make sure that
graphgrep and graphbuild are in your path):
R. Giugno, D. Shasha,
GraphGrep: A Fast and Universal Method for Querying
Graphs.
Proceeding of the International Conference in Pattern recognition
(ICPR), Quebec, Canada, August 2002.
pdf
D. Shasha, J.T-L Wang, R. Giugno,
Algorithmics and Applications of Tree and
Graph Searching
Proceeding of the ACM Symposium on Principles of Database
Systems (PODS), Madison, Wisconsin, June 2002.
pdf
Chih-Yi, HSU, master in computer science at New York University 2002. Now he is working at AT&T lab, New Jersey.
This project is supported by
New York University, USA and
University of Catania, Italy.
Visit our CT-NYU project page for
a further useful package of algorithms with applications in Bioinformatics
and Mobile Stations.
This work has been partly supported by the U.S. National Science
Foundation under grants
IIS-0414763,
DBI-0445666,
N2010 IOB-0519985,
N2010 DBI-0519984,
DBI-0421604,
and
MCB-0209754.
This support is greatly appreciated.