|
SIGMOD 2008
2008 ACM SIGMOD International Conference on Management of Data
June 9-12 2008
Vancouver
Canada
|
Review: 1 |
Reviewer:
| Karl Aberer |
Email:
| karl.aberer@epfl.ch |
Organization:
| EPFL Lausanne |
Review:
| Question | Response |
1 | Overall Rating |
Weak Reject
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
High
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
to a limited extent
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Not Good
|
9 | Reviewer Confidence |
High
|
10 | Name of External Reviewer (if applicable) |
Oana Jurca
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The paper presents the problem of supporting select-join subscriptions in a distributed system over a multitude of data sources stored in a centralized location. The authors identify different types of data redundancy stemming from naïve approaches and propose a number of solutions for tuple (event) selection and dissemination. These solutions are detailed for the simpler case of two-way select-joins, and the extension to multi-way select-joins is briefly discussed. Finally, experiments show the improvement achieved by the various solutions over the naïve approaches.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1 Good formulation of the problem S2 Large range of possible solutions discussed S3 Extensive consideration of the optimization opportunities
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1 Typical publish/subscribe systems allow many subscribers and publishers; here, the data sources are centralized, allowing for only one publisher W2 The solutions are explained for the simple case of two-way select-joins, while the more probable case of multi-way select-joins is only briefly discussed; further more, it is not clear how straight-forward and what is the overall complexity of the extension. W3 The presentation lacks clarity: there are many notations, implicit definitions and different solutions and it is difficult to follow. A table recapitulating them would have been very useful. A number of things are omitted due to space constraints and we are referred to a technical report, which for obvious reasons, cannot be cited; maybe this type of submission was not the most appropriate for this work.
|
14 | Detailed comments (please number each point) |
1. The problem is very well presented, though I am surprised that there is no mention of the case when data sources are in different locations, as is the typical setting for publish/subscribe systems, to allow for multiple subscribers AND publishers 2. The presentation quality needs to be improved, as it is very difficult to grasp the differences between the many approaches. As a general comment, the settings must be specified more clearly, as should be the computational complexity of each solution and the space required for the data structures that is not discussed for any of them; only in the experiments chapter directory sizes are discussed, but it is not clear where the B-trees for semijoins are kept and how much space they take. Tables recapitulating the most important notations and the assumptions/ settings/ complexities/ data structures required, together with advantages and disadvantages of each solution would help. 3. Going back to the settings issue, the terms server, nodes, brokers must be clearly explained, as sometimes they are used interchangeably and some other times they describe different concepts. 4. The inter-subscription redundancy is typically transparently resolved in traditional publish-subscribe systems based on a network of brokers; this is not the case for pub/sub over structured overlays and must be treated separately, which is what happens in your solution; again, the setting should be clearer; still, you could have considered such kind of solutions, as in the beginning of the paper it seems that you keep state only on the server and not on the brokers (like in traditional pub/sub) and a lot of work has been done on minimizing such redundancy in pub/sub 5. Not clear where do you actually keep state, depending on the solution 6. Typos: page 2, col1, when defining redundancies: the some information; section 2.1 par 3 cross instead of across; sect. 3.1.1 clusteredness ; last 2 parags 3.2.1, missing ), to and s; sect 4, par 3 "the contents them"; sect 5, explanation of fig 4-center: with the payload instead of without the payload; sect 6.1 par1 "use main memory use main memory";, last but one par: "normal distributions distributions N_1..." 7. No point in mentioning the technical report when you are not allowed to reference it. 8. What is the difference between a semijoin in the case of two-way join and a binary semijoin in multi-way join? If there is no difference, then you only need as many semijoins as there are links in the join graph, not tables; if there are differences, than the brief extension explanation in section 4 is using undefined concepts (t_s affected R^S-semijoins) 9. What is the complexity of the greedy decomposition algorithm and how does it impact the overall computation complexity? 10. There is no actual comparison to related work, even though the cited papers use at least some of the same techniques you employ; maybe you could have made parallels between the related works and the naïve solutions you compare with.
|
15 | Comments for the Program Committee |
The authors propose a very complex solution, for which they can only provide the details in a very restrictive case (two-way and not multi-way select-joins), due to space constraints. The extension is not straight-forward and their brief discussion is not sufficient. I do not think they can clearly present everything in 12 pages, as the presentation already lacks clarity. Further more, they do not really compare with similar existing approaches, just with some naïve ones, from which it is extremely clear and expected that they will achieve better results.
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
Yes
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
See detailed comments 2,3,5,8,9,10
|
|
Review: 2 |
Reviewer:
| Alan Fekete |
Email:
| fekete@it.usyd.edu.au |
Organization:
| University of Sydney |
Review:
| Question | Response |
1 | Overall Rating |
Weak Accept
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
High
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
yes, allowing for space limitations
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
yes
|
8 | Presentation |
Adequate
|
9 | Reviewer Confidence |
Medium
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
A reasonable algorithm to optimize the computation of joins among event streams, taking account of sharing among subscriptions; this uses semijoins and skyline techniques. The problem mixes ideas from stream processing, distributed query processing, and pub/sub systems, and seems well-motivated; a range of algorithms are explored extensively, and there is a clear winner over much of the design space.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1. An interesting problem
S2. A good solution, melding several ideas from different parts of the db literature
S3. Extensive experiments to show the quality of the solution
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1. There are no error bars or confidence intervals in the graphs or discussed in the text (though the smoothness of the curves and the large gaps, suggests to me that the differences are indeed significant)
W2. It would be good to do experiments comparing to the performance from previous work like [14] or [17]; even though each is not aimed at the same problem, it can be applied here, and it would be good to know how it performs in your context.
|
14 | Detailed comments (please number each point) |
1. The problem is a complex mix of requirements from pub/sub, view maintainance, stream processing; it would help I think to put together all the requirements in one place (what inputs the system gets, how often they change, and what should happen)
2. It would be good to evaluate existing algorithms that were not designed with all these requirements, but can be applied in your context
3. It would be good to evaluate performance as the amount of overlap between subscriptions increases (while the number of subs is constant)
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
Review: 3 |
Reviewer:
| Nick Roussopoulos |
Email:
| nick@cs.umd.edu |
Organization:
| University of Maryland |
Review:
| Question | Response |
1 | Overall Rating |
Weak Accept
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
Medium
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
to a limited extent
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Not Good
|
9 | Reviewer Confidence |
Medium
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The paper presents a reduction in maintenance of notification of publish/subscribe systems. Many of the redundancies in such systems are discussed and solutions are provided for avoiding them.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
s1. Characterization of redundancies
s2. Use and maintenance of dual semi-joins
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
w1. The formulation could have been simpler
w2. The presentation is too coarse in some areas and too dense in others.
|
14 | Detailed comments (please number each point) |
The presentation is not very intuitive. There is some redundancy in it in many areas but not well explained in others. It may due to the space limitation but it requires an effort to follow the paper.
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
|