GraphClust

 

Diego Reforgiato Recupero
Universita' di Catania
Dipartimento di Matematica e Informatica
diegoref@dmi.unict.it

Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
shasha@cs.nyu.edu






What is GraphClust?

GraphClust is a tool that, given a labeled (directed and undirected) graph, clusters the nodes in the graph based on their topology. The GraphGrep software, by contrast, allows relatively small graphs to be used as queries into databases of usually larger graphs. That software finds matching subgraphs in the larger graphs very quickly.

GraphClust consists of 16 different algorithms broken down along four binary dimensions:



For all algorithms, the procedure starts in the same way. First, all substructures are found for each graph. Then a matrix is formed whose columns consist of all possible substructures and for which there is one row for each graph. Each entry [i,j] represents the number of substructures j in graph i. The following example illustrates this when the substructures are paths, the graphs are considered to be undirected, the number of clusters is 15, and the distance metric is Euclidean. So the options would be -g U -m E -kmeans 15 -s S.
INPUT: graph1 and graph2


Graph 1 Graph 2

For Graph1, we have the following shortest paths of length 1 up to LP=3.
starting from the upper node A : {A,AC,AB,ABA}
starting from the node B : {B,BC,BA,BA}
starting from the downer node A : {A,AB,ABC,ABA}
starting from the node C : {C,CA,CB,CBA}

For Graph2, we have the following shortest paths of length 1 up to LP=3.
starting from the node A : {A,AB,AC}
starting from the node B : {B,BA,BAC}
starting from the node C : {C,CA,CAB}

When edges are undirected, the path XYZ is equal to path ZYX we form the following matrix

Graph A AC AB ABA B BC ABC BAC
Graph1 2 2 4 2 1 2 2 0
Graph2 1 2 2 0 1 0 0 2

Once the matrix is created both algorithms take all rows and cluster them using distances - either inner product or Euclidean distance (Euclidean in this example), chosen by the user.
To use any algorithm of GraphClust you have to: (1) create a dataset file; (2) choose your options.
At the end of the process, a form of correlated substructures pairs is displayed.
The following image shows an example of such a form:



The substructures pairs are sorted decreasingly according to the correlation value. Greater the correlation value is, more correlated the two substructures are. The substructures pair chosen by the user in the left panel is displayed on the right. For each substructure, the nodes, painted as circles, have a label and the identification number. A circle containing the dot symbol is used as dummy and it means that the proposed substructure has only 1 node (the one linked to the circle with the dot symbol). Two input parameters are here involved: The output of "graphclust" includes a file named 'output' and a directory named 'correlation_files'.

The first one displays all the generated clusters and for each cluster it displays the centroid, all its elements and the distances of each element from the centroid.
The second one contains all the information already showed in the form.

Clustering is done in many areas of computer science, and there are a lot of different techniques. There are many applications for clustering graphs. Some examples are: Internet search engines, knowledge management systems, document databases and XML documents.

Home


Compare GraphClust

GraphClust is implemented in ANSI C, the graphical interface is implemented in JAVA and the substructures images are built by using the springgraph command.

 

Home


Download

To download GraphClust, please send email to shasha@cs.nyu.edu  and to diegoref@dmi.unict.it.
If you care to describe your application, we'd be glad to hear about it. In any case, we will send you instructions for downloading the program.

Home


Installation of GraphClust

GraphClust is implemented in ANSI C. It has been ported on Unix, and  Windows  platforms.

To install GraphClust:
  1. Unzip the GraphClust package
  2. type "cd GraphClust"
  3. type "cd src"
  4. type "make"
  5. type "cd .."
  6. Check if there is "graphclust_main", "graphclustA", "graphclustA_S", "graphclustB", "graphclustB_S" existing in the current folder
  7. Download the SubDue5.1 release from the download section at http://ailab.uta.edu/subdue/
  8. unzip the zip file where you want
  9. go to subdue-5.1.0/src directory
  10. type make
  11. copy binary files sgiso and subdue to the directory that graphclust_main, graphclustA, graphclustA_S and graphclustB_S are in.
  12. type ./graphclust_main without arguments to see all the possible options.
  13. Check if the springgraph command is present in own unix system. (Even without it, the sotware still works but it will not show the substructures images).

For bugs and questions please contact diegoref@dmi.unict.it or shasha@cs.nyu.edu.

Home


Usage of GraphClust

File formats

 

Running GraphClust

To run the examples go to the GraphClust directory and do the following

Home

 


Clustering References