General notes

I commented as I was reading, so you'll have one data point on a sort of "focus group" experiment. Sometimes the comments space out, which means I got used to the flow/style of the text. Other times, I comment that I miss some explanation, just to find out that it is coming (in which case, perhaps a forward reference could help). I hope this style of comment is not too off-putting.

I tried to write a summary immediately after each chapter. The idea is to offer a glimpse of what this reader retained from the chapter.

Chapter 3 is the book's core, but it is pretty long. Couldn't the global discussion (up to 3.3) appear first and the local discussions, modification, etc., be in a separate chapter? MANOS_STRATOS: I like this idea. We need to discuss.

There's an interesting discussion on segmentation and model hierarchy in learned indices (3.3.6), which is excellent, but we see no similar discussion on other indices. For instance, the evolution of (vertical) compression and (horizontal) node adaptivity in tries is---in my super biased opinion---quite interesting as well. Again, I'm a buff for how these data structures evolved, and clearly learned indices deserve a lot of space. But I wonder if some "historical" notes on how these data structures evolved could fit. MANOS_STRATOS: Manos, would you know about this?

Overall, I feel a few missing data structures could have contributed to the narrative, B-epsilon trees and skip-lists mainly. MANOS_STRATOS: We should discuss.

Some illustrations might have been helpful, mainly at the beginning and end of chapter 3 and the start of chapter 4 (please see suggestions in the detailed notes). I understand what I'd ask next is a big undertaking, but illustrating each data structure in chapter 4 would have also been interesting. MANOS_STRATOS: Good idea.

Perhaps there could be some closing words on what's beyond the structures discussed here? Other indexing structures, such as geo structures and navigable small-world graphs for approximate nearest neighbor search, use many concepts described here but add a little "something else" to meet their requirements. Geo structures use different kinds of nested partitioning, and would small-world graphs be a new kind of global organization? MANOS_STRATOS: Need to discuss.

======== Detailed notes below:

Data Structures for Big Data: Tradeoffs and Design Guidelines

===== Abstract

"vary in time and location" - not sure what 'location' means here. DONE.

===== Chapter 1

p2: The 'Operations on Key-Value Structures' paragraphs appear "suddenly." Was it meant to be an example of how a class of data structures would be presented? It then cuts to '1.2 Audience & Prerequisites,' which is something commonly discussed in a preamble. DONE. Moved audience and prerequisites.

p6: "hard disk [...] have been replaced by shingled disks" - have they all? DONE

p6: Same sentence, "and flash." I'd call those flash-based devices (or disks). DONE

The chapter reads like a preamble rather than an introduction. LEAVE AS IS

===== Chapter 2

p9: Comma use in lists starts British (this, this and that), but here I see an American one ("access, size, and cost"). DONE

p9: "When power fails" may not always be the case. "When power is shut down," perhaps? DONE

p9: "Layers" and "levels" are used interchangeably. DONE

p10: "a nanosecond clock time processor" is an odd term. Do you mean a modern processor? A processor running at 1 GHz or more? DONE

p10: The assumption of having two layers of memory (slow/infinite and fast/limited) is interesting. Another aspect of these layers is the transfer sizes they support. They tend to get larger at deeper levels. My point is, (re- or un-)batching data as it goes up and down is also a theme in efficient data structures... MANOS_STRATOS: We should discuss
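(To make the batching point concrete, a back-of-the-envelope note of my own, not from the book: in a two-layer model where the layers exchange blocks of B entries, the block size shows up directly in the data-movement costs.)

```latex
% Reviewer's sketch, not from the book: costs in block transfers for
% N entries and block size B in a two-layer (external-memory) model.
\[
  \text{scan}(N, B) = \lceil N / B \rceil
  \qquad
  \text{binary search}(N, B) \approx \log_2 (N / B)
\]
% A larger B at a deeper level amortizes scans, but point accesses then
% move more untouched bytes -- hence the (re-/un-)batching theme above.
```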
p10: Is the metric for Reads vs. Updates "expected response times" only, or also operations per second? Latency and throughput are two facets of a data structure. DONE

p12: Typo: "Presents amplification *is* a data-size-sensitive metric". is -> as. DONE

p16: doesn't -> does not? DONE

Summary: The chapter introduces the notion of amplification as a metric to evaluate data structures. The assumption is that data structures that minimize amplification may benefit more fully from hardware speed than those that do not. The issue, the chapter contends, is that some amplification in some dimension(s) is always needed. The chapter then argues that the choice of amplification should reflect how a data structure is accessed, i.e., its workload. It offers a method to characterize the workload based on the primitive access patterns it presents. MANOS_STRATOS: Should we include the summary above or something like it? (I'm of two minds.)

===== Chapter 3

p18: Right at the beginning (first paragraph), the chapter mentions key-value data. This brings me back to the Introduction, in which operations on key-value structures appear "suddenly." This hints at key-value pairs being a central building block of data structures. If that's the case, I would now understand the emphasis on key-value operations from the get-go. I'd make this assumption more explicit in the Introduction. DONE

p18: Should items on the list be capitalized? "1. Global Organization" as opposed to "1. global organization"? DONE

p20: In the first paragraph on that page, the text mentions that there'll be a focus on data movement. It makes perfect sense, but somehow I left the previous chapter without registering that the analysis would be based on that. I'm sure it's there, but it didn't register with me. DONE

p20: "selectivity (***fraction of entries retrieved)" - there's an extra space. DONE

p20: "Table 3.1: *Parameters* capturing the various *parameters*"? List of the various parameters? DONE

The eight dimensions are pretty exciting, and I can see we'll talk about them at once in this chapter. As I started reading the 'global organization,' I wondered if there wouldn't be a way to portray the dimensions' options in a figure. It would help visualize what the text ahead is going to bring. It seems each dimension will have its own figure, e.g., Figure 3.1. Still, it would be nice to portray the whole classification at once. MANOS_STRATOS: We need to discuss this.

p22: "the previous collection" -> the latter collection? But now, in the next paragraph, I read "in our example dataset," and I realize I missed it. Was a dataset defined before? If it was, it did not register. DONE

p22: "the first two bits" -> the two most significant bits? DONE

p25: What's the "+1" for in the binary search complexity? If the "+1" is indeed justified, it does not appear in Table 3.2. DONE
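(In case it helps: my own reading of the +1, a standard result rather than anything from the book. Each probe halves the candidate range, and the final probe inspects the last remaining candidate.)

```latex
% Reviewer's aside: worst-case probe count of binary search over N
% sorted entries; the "+1" is the final probe on a single candidate.
\[
  C(N) = \lfloor \log_2 N \rfloor + 1,
  \qquad \text{e.g., } N = 8:\; 8 \to 4 \to 2 \to 1
  \;\Rightarrow\; 3 + 1 = 4 \text{ probes}.
\]
```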
p25: "partitions. and the cost of" -> extraneous period. DONE

p25: SIMD is defined in the "Optimizations" paragraph, but it appeared before on the same page. DONE

p25: "Direct Addressing" is missing a definition like the other methods received. It jumps straight to "Applicable to." DONE

p29: "Full Scan -> Filter Indexing." Is the idea to discuss "Full Scan using/under/... Filter Indexing"? A right-arrow in a subtitle can mean several things. Many subsequent subsections use the same title approach. MANOS_STRATOS: We should discuss. What do we think the arrow means? I think it might mean "can be improved by".

p31: "When we search that was not inserted in the Bloom filter"? The sentence is not reading well. DONE

p34: In Figure 3.10, I'm not sure I got what the ball represents. I mean, why a ball? DONE

p34: "...can be merged with the child." Cite Patricia trees? And, BTW, Viktor's ART is fantastic, but I think the first work to toy with trie node adaptations was Judy arrays. MANOS_STRATOS: We should discuss

p36: In the first paragraph, it would be nice to have an example of a model, e.g., explain how a simple linear model could, for instance, approximate the CDF curve of a well-behaved distribution. DONE
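(Something along these lines would do. A toy sketch of my own --- the data, names, and endpoint fit are all hypothetical, not from the book --- showing a linear model standing in for the CDF of a sorted key set, plus the error-bounded local search that learned indexes rely on:)

```python
# Reviewer's toy sketch (not from the book): a linear approximation of
# the keys' CDF predicts a position; a bounded local search finishes.
import bisect

keys = [3, 7, 12, 19, 24, 31, 38, 45, 52, 60]  # hypothetical sorted keys

# Fit position ~= slope * key + intercept through the two endpoints.
slope = (len(keys) - 1) / (keys[-1] - keys[0])
intercept = -slope * keys[0]

# The worst prediction error over the data bounds the local search.
err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))

def lookup(key):
    guess = int(slope * key + intercept)           # predicted position
    lo = max(0, guess - int(err) - 1)              # error-bounded window
    hi = min(len(keys), guess + int(err) + 2)
    j = bisect.bisect_left(keys, key, lo, hi)      # search only the window
    return j if j < len(keys) and keys[j] == key else None

assert lookup(24) == 4 and lookup(25) is None
```

For a well-behaved (near-uniform) distribution, the error bound stays small, which is exactly the chapter's point.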
p38: Perhaps use "(top)" and "(bottom)" in the figure caption to refer to the B+-Tree and the Learned Index, respectively? DONE

p40: "s%," - should 's' be italicized? DONE

p40: "Table Table 3.4." - 'Table' appears twice. DONE

Summary (first half, up to 3.3.7): Armed with the metric from the previous chapter, we can now use a more principled approach to analyzing the performance of data structures. In this first half, the current chapter discusses the "global organization" of data, i.e., it looks at how entire datasets are organized at the macro level, and it then looks into the cost of performing scanning and searching operations in the different data organizations. In particular, a search could be aided by indexing structures or not. The chapter covers both, giving special attention to the various indexing techniques available. MANOS_STRATOS: We should discuss.

===== Chapter 3 (cont)

p41-42: Sections 3.4 and 3.5 were brief. I understand that many of the global organization and search considerations apply here. However, the emphasis in global search is on eliminating content so that the search progresses fast. In contrast, local search focuses on data organization that better fits, for instance, cache structures or SIMD instructions. The algorithms at this level also have to contend with the fact that data movement into/from higher levels occurs at a different granularity than with lower ones. There could also be compression at this level. Lastly, local insertion and update may also focus on reducing or eliminating a new memory allocation at every operation---the Achilles' heel of many data structures. (Continuing reading, I realize the latter is addressed in 3.6.) DONE

p43: Was the "single copy invariant" introduced before, or is it being defined as it first appears here? I feel that a couple of sentences about this and the multi-copy organization could help. DONE

p43: Ditto for "tombstone." DONE

p44: In 3.6.2, I missed mentions of other commonly used out-of-place modifications such as multi-version or copy-on-write techniques. MANOS_STRATOS: We should discuss

p45: "An out-of-place modification appends to a buffer in memory, hence, it requires zero disk access." They're often logged, however. DONE

p46: Shameless plug about buffering/batching: our work on "Transaction Triaging" in VLDB'21 groups related transaction requests while they are traversing the network and presents them all at once to the database server. 100% fine if you don't feel it applies here. :) MANOS_STRATOS: We should discuss. Why not mention it?

p48: I came to the end of 3.7.3 and imagined the discussion would touch on B-epsilon trees. Couldn't reserving a portion of the nodes to log updates be considered buffering? MANOS_STRATOS: We should discuss.

p48: I was a bit surprised to see Key-Value Representation in 3.8. My recollection was that we were following the list of 8 dimensions. It's inconsequential, but I checked back with the initial list on page 19, and, indeed, buffering was discussed before KV representation. DONE

p49: Regarding partial key-records, the Bigtable paper (ToCS'08) calls these "column families." It is the first time I've seen that concept being given a name. (But I have not researched enough to determine whether they were the first.) MANOS_STRATOS: We should discuss.

p50: "Bit vectors can be aggressively compressed using variations of run-length encoding..." My recollection is that modern techniques such as Roaring rely more on substituting data for codes (plus on some hardware instructions) rather than strictly using run-length encoding. An advantage: they don't sacrifice random access. MANOS_STRATOS: We should discuss.

p51: Could Table 3.7 have the section numbers where each structure is discussed? It'd be a helpful index. MANOS_STRATOS: We should discuss. I agree.

p51-53: Would it make sense to present 3.10 as a flow chart? MANOS_STRATOS: We should discuss. I agree.

p52: Item 4, radix-based structures "like" skewed data. If the keys are uniformly distributed over the alphabet, these structures' "forest-like" flavor loses a bit of its compression power. Put differently, tries love skew. :) MANOS_STRATOS: We should discuss.

p53: "might require several entire partitions" -> might require several entire partition full scans (such that sorting benefits are lost on them)? DONE

p54: "B+****-tree-like design" -> extraneous space? DONE

p53-55: There's a tremendous amount of insight when presenting answers to the questions!! I wonder if one of the cases here couldn't appear much earlier, say, in Chapter 1, to give a glimpse of what kind of reasoning the reader will be able to do with the concepts that the book will introduce. MANOS_STRATOS: We should discuss.

p56: Further reading appears in a chapter for the first time (?). It is most welcome.

p57: Would the kind of "rebalancing" in skip-lists make them a candidate for the "Rotations and Rebalancing" further reading? MANOS_STRATOS: We should discuss.

Summary: This (half) chapter dealt with the remaining dimensions proposed before. I couldn't find a cohesive theme for the chapter, but updates and formats dominated the discussion. The chapter also adds a method that relies on the dimensions and cost metrics to produce new data structures or select among existing ones and access methods. Interestingly, the closing questions bring all the material discussed so far together by applying the notions the past chapters introduced in insight-inducing thought exercises. MANOS_STRATOS: To me, this is an argument to keep the whole chapter together.

===== Chapter 4

p60: It seems that what's ahead are several data points on a three-scale axis (defined on page 59). If so, an upfront figure with the solutions (big picture) would have been nice. MANOS_STRATOS: We should discuss. I agree.

p60: It would have been interesting to hear why the data points were selected this way. For instance, could the mix described in section 4.1 be that of a traditional OLTP system? And the one in 4.2, an in-memory OLTP system? And so on. DONE

p61: Would Michael Bender's cache-oblivious trees belong in 4.2 or 4.3? MANOS_STRATOS: We should discuss.

p62: Typo in "a new hybrid design that employ" -> employs. DONE

p63,64: What a great collection of data structures! Bounded disorder, Masstree, etc. Great to see these structures here. DONE

p69: Q1. Bloom filters are also particularly suited because the runs in LSM levels are immutable. DONE
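(To make the immutability point concrete, a minimal sketch of my own, not from the book; the sizes and hashing are toy choices. Because an LSM run is write-once, the filter is built once alongside the run and never has to support deletions, which plain Bloom filters cannot do without counting variants:)

```python
# Reviewer's toy sketch (not from the book): a minimal Bloom filter.
# Immutable LSM runs fit it well: build once, never unset a bit.
M, K = 1024, 3                      # toy sizes: bits, hash functions

bits = bytearray(M // 8)

def _positions(key):
    # K cheap, seed-varied hashes; a toy stand-in for real hash mixing.
    return [hash((seed, key)) % M for seed in range(K)]

def add(key):
    for p in _positions(key):
        bits[p // 8] |= 1 << (p % 8)

def might_contain(key):
    # False means "definitely absent": safe to skip this run entirely.
    return all(bits[p // 8] & (1 << (p % 8)) for p in _positions(key))

add("key42")
assert might_contain("key42")       # no false negatives, ever
# might_contain("other") is usually False; rarely, a false positive.
```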
p69: Q2. Some datasets compress better than others, e.g., when the keys are skewed. Otherwise, tries can be memory guzzlers. MANOS_STRATOS: We should discuss

p70: Q3. What is an interpolation-based method? Locating a key via an assumed distribution? DONE

p70: Q4. Reminds me of WiscKey from the Arpaci-Dusseau group. It works well for KVs on SSDs. MANOS_STRATOS: We should discuss whether we need another reference

p72: After all the examples of great data structures in this chapter, I feel the Further Reading list is a bit short. Would an in-memory database survey fit here, for instance? Would a survey of LSM variations? Luo and Carey in VLDB'20 come to mind. How about Tianzheng Wang's survey of trees in persistent memory? MANOS_STRATOS: We should discuss.

Summary: This chapter put the "flow chart" at the end of Chapter 3 to use by picking some representative workloads and following the recommendations on how to design/select an appropriate data structure. The selection of data structures was quite diverse and educative. MANOS_STRATOS: We should discuss.

===== Chapter 5

p73 or p76: As the topic is adaptivity and B+-trees, the idea of skip-lists came to mind. They present an adaptivity mechanism orthogonal to tweaking the leaves' organization. They also introduce the use of probabilistic approaches to simplify data structure design. Would they fit here? MANOS_STRATOS: We should discuss
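(For reference, the probabilistic machinery is tiny; a sketch of my own, not from the book, of the coin-flip level assignment that replaces rotations and rebalancing altogether:)

```python
# Reviewer's toy sketch (not from the book): skip-list level assignment.
# Each inserted key gets a random tower height; with P = 1/2, about half
# the nodes stop at level 1, a quarter at level 2, and so on, giving
# O(log n) expected search paths with no rotations at all.
import random

MAX_LEVEL, P = 16, 0.5

def random_level():
    level = 1
    while level < MAX_LEVEL and random.random() < P:
        level += 1
    return level
```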
p76: Typo: "PYRAMID" -> PyRUMID. DONE

p76: On the discussion of "no free lunch": adaptivity pegged to queries may generate uneven response times, even for different executions of the same query. (p78 somewhat comments on this but remarks that performance tends to improve with time.) MANOS_STRATOS: We should discuss

p85: No further reading? MANOS_STRATOS: We should discuss

Summary: This chapter presents the idea that data structure organizations (primarily local) can change as an effect of accessing the data structure (mainly reading). It offers exciting concepts of when to trigger adaptivity and how aggressive adaptivity should be. The chapter also discusses criteria to evaluate/benchmark different techniques. MANOS_STRATOS: We should discuss

===== Chapter 6

p89: "row-at-a-time"? Do you mean row-oriented organization? DONE

p89: Paragraph on the 80's initial veering towards column orientation: the Sybase IQ folks come to mind (Clark D. French in ICDE'97). They get little credit, but my understanding at the time was that they were ahead of everyone else. And, as you well mentioned, APL folks ate (and still eat) this kind of processing for breakfast. MANOS_STRATOS: We should discuss, but why not add the reference?

p91: Typo: "a columnar database systems" -> system. DONE

p92: I missed a mention in 6.1 of approximate data structures for big data, e.g., counting/uniques sketches. MANOS_STRATOS: We should discuss

p92: Another thought that came to mind is how difficult it is to update schemas, i.e., data structure layouts, in big data scenarios. I can share (it's public via their paper) that Google's Bigtable did something akin to what you call sideways cracking, slowly pushing a schema change to different regions of a data structure. MANOS_STRATOS: We should discuss

p93: I'm not sure ext3 uses B+-trees. Directories are maintained via a hierarchical structure called inodes. Ted Tso (the primary maintainer) has some papers about the internals. Also, modern file systems cannot manage objects at page granularity; the metadata becomes too big. They tend to use "extents" (a series of contiguous pages) as their base unit of allocation. This is why some file systems have provisions for very small files. MANOS_STRATOS: We should discuss

p93: Carlo Curino at MSR has some interesting papers about the data management lifecycle for machine learning pipelines. MANOS_STRATOS: We should discuss

p95: "the high cost of accessing data on remote sites." RDMA has been systematically chipping away at this cost to the point that, in an actual system, the cost of accessing local or remote data may be comparable. Heavy technology changes in the near future might make this difference negligible. MANOS_STRATOS: We should discuss

Summary: This chapter was exciting, but I failed to sense a unifying theme. As for 6.1 and 6.2, Big Data invariably involves distribution, a topic that wasn't discussed. MANOS_STRATOS: We should discuss

===== Chapter 7

p100: "compare-and-set." Do you mean compare-and-swap? DONE

p100: The best approach remains not to need locking at all, by designing structures that, for instance, take advantage of naturally serialized machine instructions. This is the case with Masstree, which leverages the TSO-ish nature of x86 machines. BTW, there's no standalone CAS in x86; the equivalent (a LOCK-prefixed CMPXCHG) causes an entire bus lock. What folks consider lock-less structures is, in practice, anything but. (Dennis, I learned a thing or two with Niall Dalton on the topic.) MANOS_STRATOS: DONE

p100: In 7.2, some assertions seem not to consider the recent changes that RDMA brought. There are two excellent papers about replication with RDMA (Tianzheng/Ippo VLDB'17 and Kraska/Stonebraker VLDB'19). MANOS_STRATOS: We should discuss

p102: No further reading? MANOS_STRATOS: We should discuss

Summary: This chapter touched on advanced topics that complement the material presented. It talks about concurrency, distribution, and efficient implementation techniques that can be applied after crafting a data structure for a particular workload. MANOS_STRATOS: We should discuss

===== Chapter 8

p105: "Data structure design remains an art. We hope a deeper understanding of its dimensions will make the new data structures you design both beautiful and efficient." Wow, what an inspiring closing!! DONE :)