2Q: a low-overhead, high-performance variant of the LRU/2 algorithm

From ted@squall.cis.ufl.edu Tue Sep  7 15:16:15 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA15314; Tue, 7 Sep 93 15:16:07 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA16862; Tue, 7 Sep 93 15:15:58 -0400
Date: Tue, 7 Sep 93 15:15:58 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309071915.AA16862@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Status: R

Dennis,
I just got an early version of the simulators working,
and the results are very promising.
I wrote two buffer modules, one for our algorithm (which I call 2-Q)
and one for LRU.
I wrote another module that feeds the buffers an IRM stream
(no correlation between requests), with a Zipf distribution
on the access probabilities.

To interpret the parameters and results:
number of iterations, database size, and seed are obvious.
Queue size is the total number of buffers.
Max FIFO queue size is the maximum number of buffers
that can be dedicated to the first FIFO queue.
I ask for it in the LRU simulations, but it's not used.
'a' is the parameter of the Zipf distribution (a=.8 is 80/20).

I report the total number of buffer hits, and give
a breakdown on whether the hit was to the FIFO queue or the LRU queue.
(there are always 0 FIFO hits in the LRU-only runs).

I also report a histogram of the accesses and hits for
different classes of data items.  I break the data items
into 10 classes, from hottest to coldest.
"class 0" is the hottest through the 50th-hottest data item,
"class 50" is the 51st through the 100th, etc.

The bottom line: a 20+% improvement in the hit rate.
2-Q discriminates against cold items in favor of hot items.
As we wanted.

OK, here are the raw results.
	Ted


squall% zipf2q
Simulator to test buffer algorithms.  Zipf distribution
Number of iterations?
100000
M=100000
Database size?
500
N=500
Queue size?
50
n=50
Max FIFO queue size?
25
k=25
******************* 2-Q algorithm ******************** 
what is a?
.5
a=0.500000
seed?
1
seed=1
Total of 20230 hits out of 100000 accesses
5863 were fifo, 14367 were LRU 

class 0, 29061 touch, 12248 lru, 2312 fifo
class 50, 13345 touch, 899 lru, 968 fifo
class 100, 10402 touch, 437 lru, 631 fifo
class 150, 8922 touch, 257 lru, 466 fifo
class 200, 7784 touch, 160 lru, 360 fifo
class 250, 6960 touch, 113 lru, 287 fifo
class 300, 6457 touch, 91 lru, 269 fifo
class 350, 5949 touch, 68 lru, 215 fifo
class 400, 5679 touch, 41 lru, 181 fifo
class 450, 5336 touch, 53 lru, 174 fifo


squall% zipf2q
Simulator to test buffer algorithms.  Zipf distribution
Number of iterations?
100000
M=100000
Database size?
500
N=500
Queue size?
50
n=50
Max FIFO queue size?
25
k=25
******************* 2-Q algorithm ******************** 
what is a?
.8
a=0.800000
seed?
1
seed=1
Total of 40123 hits out of 100000 accesses
6129 were fifo, 33994 were LRU 

class 0, 49787 touch, 32761 lru, 3312 fifo
class 50, 12991 touch, 837 lru, 1248 fifo
class 100, 8298 touch, 202 lru, 511 fifo
class 150, 6206 touch, 67 lru, 301 fifo
class 200, 5174 touch, 47 lru, 229 fifo
class 250, 4343 touch, 37 lru, 164 fifo
class 300, 3736 touch, 15 lru, 106 fifo
class 350, 3525 touch, 18 lru, 116 fifo
class 400, 3065 touch, 4 lru, 75 fifo
class 450, 2814 touch, 6 lru, 67 fifo





squall% zipflru
Simulator to test buffer algorithms.  Zipf distribution
Number of iterations?
100000
M=100000
Database size?
500
N=500
Queue size?
50
n=50
Max FIFO queue size?
25
k=25
********************************  LRU  **********************what is a?
.5
a=0.500000
seed?
1
seed=1
Total of 16072 hits out of 100000 accesses
0 were fifo, 16072 were LRU 

class 0, 29061 touch, 9657 lru, 0 fifo
class 50, 13345 touch, 1809 lru, 0 fifo
class 100, 10402 touch, 1146 lru, 0 fifo
class 150, 8922 touch, 842 lru, 0 fifo
class 200, 7784 touch, 628 lru, 0 fifo
class 250, 6960 touch, 516 lru, 0 fifo
class 300, 6457 touch, 458 lru, 0 fifo
class 350, 5949 touch, 377 lru, 0 fifo
class 400, 5679 touch, 330 lru, 0 fifo
class 450, 5336 touch, 309 lru, 0 fifo



squall% zipflru
Simulator to test buffer algorithms.  Zipf distribution
Number of iterations?
100000
M=100000
Database size?
500
N=500
Queue size?
50
n=50
Max FIFO queue size?
25
k=25
********************************  LRU  **********************what is a?
.8
a=0.800000
seed?
1
seed=1
Total of 34850 hits out of 100000 accesses
0 were fifo, 34850 were LRU 

class 0, 49787 touch, 30336 lru, 0 fifo
class 50, 12991 touch, 2065 lru, 0 fifo
class 100, 8298 touch, 826 lru, 0 fifo
class 150, 6206 touch, 496 lru, 0 fifo
class 200, 5174 touch, 357 lru, 0 fifo
class 250, 4343 touch, 232 lru, 0 fifo
class 300, 3736 touch, 175 lru, 0 fifo
class 350, 3525 touch, 163 lru, 0 fifo
class 400, 3065 touch, 104 lru, 0 fifo
class 450, 2814 touch, 96 lru, 0 fifo


From shasha@SHASHA.CS.NYU.EDU Tue Sep  7 15:44:07 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA15447; Tue, 7 Sep 93 15:44:06 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA03539; Tue, 7 Sep 93 15:44:04 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA15443; Tue, 7 Sep 93 15:44:03 -0400
Date: Tue, 7 Sep 93 15:44:03 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9309071944.AA15443@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: way to go...
Cc: shasha@cs.NYU.EDU
Status: R

Dear Ted,
Very encouraging results.
This suggests some next steps:

1. One thing I'm worried about is the case where we are doing, say,
a scan and there are k records per page, so one page looks like k accesses.
We would want to avoid promoting such a page to the LRU queue in
that case, but that might require a new parameter.

2. Pat and Betty O'Neil apparently got some traces from a Swiss bank.
It might be good to try them too.
Also, Pat had some tests with a B-tree;
those might be interesting too.
Pat's email is
alias patrickoneil poneil@cs.umb.edu

3. Finally, there is the need to compare 2-Q with LRU/2.

Dennis

From ted@squall.cis.ufl.edu Sun Sep 12 15:40:22 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA00237; Sun, 12 Sep 93 15:40:21 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA26366; Sun, 12 Sep 93 15:40:10 -0400
Date: Sun, 12 Sep 93 15:40:10 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309121940.AA26366@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Status: R

Dennis,
I ran a few experiments over the weekend.  I wanted to see how one
should set the max size of the FIFO portion of the buffer.
I graphed the results, and I'm sending them to you.  They're
in PostScript; they'll arrive next.

I set the number of data items to 5000, and varied the size of the 
buffer between 250 slots and 2000 slots.  I then varied the proportion
of the queue that can be allocated to the FIFO queue.
0% allocation means pure LRU.
I used two Zipf input distributions (p_i = c * i^(-a)), one
with a=.8 (80/20), the other with a=.5 (much less skewed).

I had expected that the optimal limit on the size of the FIFO part
would be about 30% of the total queue.  I found that 5-7% is about
optimal.  I was encouraged to see that this value is fairly
constant no matter how the parameters are varied, since that makes it
easy to give a tuning recommendation.

The FIFO queue should be large enough to admit hot data items to
the LRU queue, but small enough to exclude cold items.
The best caching strategy with IRM references is to lock the hottest
items in memory, with one buffer reserved to service a miss on a cold item.
As the number of buffers grows, you want to lock in colder and colder items.
Hence, the size of the FIFO queue should grow proportionally.
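Under that locking argument, the best possible IRM hit rate is just the probability mass of the pinned items, which gives an easy upper bound to compare the simulator against. A hypothetical sketch (my names, not the simulator's):

```python
def best_irm_hit_rate(n_items, n_buffers, a):
    # Upper bound for IRM references: pin the n_buffers hottest items,
    # so the hit rate is the total Zipf probability p_i = c * i**(-a)
    # of those items (dividing by sum(weights) supplies the constant c).
    weights = [(i + 1) ** (-a) for i in range(n_items)]
    return sum(weights[:n_buffers]) / sum(weights)
```

With the whole database pinned the bound is 1.0, and it grows monotonically with the number of buffers.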

I still need to test performance when the database size varies,
and perhaps for a non-zipf distribution.
I also hypothesize that the FIFO queue should usually be at its
maximum size (I need to collect those numbers).
If so, we can provide a self-tuning strategy.

Other things to note: a properly tuned 2-Q algorithm can double
your benefit: you'd need twice the number of LRU buffers to get the same hit
rate.  If you allocate too much of the queue to FIFO, your
performance can degrade seriously.
If your FIFO queue is too small and you have a large buffer,
your performance can also degrade.  That is, those blips in
the top line in the charts are really there; it's not a typo.

	Ted

From ted@squall.cis.ufl.edu Wed Sep 22 10:34:19 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23851; Wed, 22 Sep 93 10:34:17 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA21547; Wed, 22 Sep 93 10:34:00 -0400
Date: Wed, 22 Sep 93 10:34:00 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309221434.AA21547@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Status: R

Dennis,

Dear Ted,
Sounds like a lot of good work.
I like the idea that the algorithm is more subtle than I had
originally thought --- just as long as it remains easy to implement.
I'm not sure I understand everything, so please check my assertions.
  
  I'll keep all operations to O(1) with a small constant.

1. The experimental results are irrelevant since you ran the simulations
for too little time. Correct?


 Yes and no.  For the second set of runs I failed to capture the
 "long-term" behaviour.  I did capture a transient response.  I feel
 that we can't ignore the transient response, else we would conclude
 that a useless strategy is optimal.

2.  What do you mean by:
>>1) the `optimum' maximum number of FIFO buffers is 1.
        There should be only one entry?
  >>that is, assuming an IRM reference stream that doesn't change locality,
  >>and allowing enough reference before we take measurements.

 The optimal strategy for an IRM scheme is to lock the hottest items
 into memory, and provide a single buffer to service references to the
 remainder.  The closer you come to emulating this strategy, the better
 your performance is.  So, the stricter your admission policy is, the
 better your performance is.
 You need big assumptions to make this work, like arbitrarily long
 IRM reference streams and no change in locality.

 The point is, we can't just look at "highest hit rate eventually".
 We have to look at how soon we can react to change.

5. I understand this analysis, but why do you want admission on average
on the third access?

 The "third time" is arbitrary, but it seems good. 
 Note that you need at least 2 references to admit a data item to the
 LRU buffer, because the first reference admits the item to the FIFO
 buffer and the second to the LRU buffer.  If the probability of success
 is 1/2, then you need an average of 2 trials before a success (admission
 to the LRU buffer), i.e. 3 references in total.  If you want an average
 of 2.5 references in total, you need a probability of success of 2/3,
 but this increases the required size of the FIFO buffer.
 A large FIFO buffer is perhaps not bad if we keep only item IDs.
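The expected-trials arithmetic above can be sanity-checked with a toy Monte Carlo. This is purely illustrative: "success" here abstracts a later reference finding the ID still in the FIFO, with probability p.

```python
import random

def mean_refs_to_admit(p, trials=100_000, seed=1):
    # One reference enters the item into the FIFO; each later reference
    # is a trial that succeeds (admits the item to the LRU) with
    # probability p.  Expected total references = 1 + 1/p.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        refs = 1                      # the reference that enters the FIFO
        while rng.random() >= p:      # failed trials
            refs += 1
        total += refs + 1             # plus the successful reference
    return total / trials
```

p=1/2 gives about 3 references on average; p=2/3 gives about 2.5, matching the argument above.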

6. Why are you storing anything?  With LRU you only keep ids as well.
Do you mean that you page out all pages on the FIFO buffers?
That seems like a bad idea.

 I'm proposing to page out all items in the FIFO buffer, and only
 remember their IDs in the queue.
 The problem is the following: to get responsiveness, you need a large
 FIFO buffer.  But, your FIFO buffer steals space from the LRU buffer.
 If you make your FIFO buffer large for responsiveness, your long-term
 hit rate suffers because you have a buffer that is more FIFO than LRU.
 I've observed that when the 2-Q buffer is working well, the FIFO
 buffer acts as a screening device and has a low hit rate.
 In analogy to the LRU-2 algorithm, we should eagerly page out the
 items that are going through screening, and admit them only after
 they have shown that they are hot.

 If you don't actually store the pages in the FIFO queue, you can use
 that space for the LRU queue.  So, keeping a FIFO queue that is 70%
 of the size of the LRU queue doesn't steal pages from the LRU queue
 and will improve your long-term hit rate.
 Your buffer algorithm will be responsive to changes in locality
 because your screening device is large.
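A minimal sketch of this tag-only scheme (illustrative Python of my own; class and method names are invented). The FIFO remembers IDs only, so a FIFO match is still a miss; it merely admits the page to the LRU part.

```python
from collections import OrderedDict

class TagOnly2Q:
    # Sketch of the screening idea above: the FIFO (tag area) remembers
    # only page IDs, so real pages live only in the LRU part.
    def __init__(self, lru_size, fifo_size):
        self.lru = OrderedDict()    # resident pages, most recent at the end
        self.fifo = OrderedDict()   # IDs only, oldest first: the screen
        self.lru_size = lru_size
        self.fifo_size = fifo_size

    def access(self, page_id):
        # Returns True only on an LRU hit; a FIFO match is still a miss,
        # but it admits the page to the LRU part (the "second reference").
        if page_id in self.lru:
            self.lru.move_to_end(page_id)
            return True
        if page_id in self.fifo:
            del self.fifo[page_id]              # promoted out of the screen
            if len(self.lru) >= self.lru_size:
                self.lru.popitem(last=False)    # evict the LRU victim
            self.lru[page_id] = True            # stand-in for the real page
            return False
        self.fifo[page_id] = True               # first reference: tag only
        if len(self.fifo) > self.fifo_size:
            self.fifo.popitem(last=False)       # oldest tag ages out
        return False
```

Every operation is a dictionary lookup or an end insertion/removal, so the whole thing stays O(1) per reference with small constants, matching the goal stated earlier in the thread.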

 If you're still concerned about transient response, we can look at
 a hybrid: set aside 9% of the pages to store the top part of the
 FIFO queue, and set aside another 1% of the pages to store the
 bottom part of the FIFO queue, which acts as a screening device.
 The remaining 90% of the pages are used for the LRU queue.
 So, new very hot items get admitted to the LRU buffer immediately
 without an additional miss.  New moderately hot items get admitted
 after 1-2 additional misses.
 Cold items are still locked out, and the LRU buffer is still large.

	Ted


From shasha@SHASHA.CS.NYU.EDU Wed Sep 22 10:58:18 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23939; Wed, 22 Sep 93 10:58:17 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA13261; Wed, 22 Sep 93 10:58:17 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23935; Wed, 22 Sep 93 10:58:15 -0400
Date: Wed, 22 Sep 93 10:58:15 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9309221458.AA23935@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: 2Q algorithm
Cc: shasha@cs.NYU.EDU
Status: R

Ted,
By my count, the algorithm now has the following parameters:

0. Size of available buffer.

1. size of FIFO queue as a recording device (this is really just
a set, isn't it?)

2. size of FIFO queue as a place to store real pages
(for transient response).

3. size of LRU queue (which is determined from 0, 1, 2).

How well it works depends on

1. how stable the traffic is --- if IRM, then want a small FIFO.
If lots of transient changes, then want a big FIFO.

We would like easily measurable traffic variables that can
determine the parameters.
Is this understanding now correct?
Thanks,
Dennis

From ted@squall.cis.ufl.edu Wed Sep 22 11:24:12 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA24012; Wed, 22 Sep 93 11:24:10 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA21628; Wed, 22 Sep 93 11:23:55 -0400
Date: Wed, 22 Sep 93 11:23:55 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309221523.AA21628@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  2Q algorithm
Status: R

Dennis,
Yes, I'm proposing the third parameter.
But, the idea is to provide a one-size-fits-all strategy.
That's why my analysis for the size of the FIFO is distribution-independent.

For an alternative, you can store all FIFO pages in memory,
and set the FIFO to about 10% of the free buffers.
If you detect a locality change, increase the size of the FIFO allocation;
when the reference stream stabilizes, decrease it.
However, this requires an on-line tuning strategy.
I can't think of a good one: how to detect a locality change,
how to detect a stable period, how to keep the strategy from being fooled, etc.

I don't know if keeping part of the FIFO pages in memory and part out
of memory will help.  It'll improve performance for short, hot transients,
but do we expect to see those? 

It seems to me that an allocation of 5-10% of the pages to a FIFO
won't hurt the IRM response much, but will make the algorithm much
more robust.  An additional recording area will again hurt IRM response
by only a little, but will make the algorithm still more robust.
If we can say "set these parameters to 10% and 60%" and expect
that everyone gets good performance, that's better than saying
"examine your reference stream carefully, then set the parameter to X".
	Ted

From ted@squall.cis.ufl.edu Tue Sep 28 18:27:27 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA14927; Tue, 28 Sep 93 18:27:25 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA08257; Tue, 28 Sep 93 18:26:51 -0400
Date: Tue, 28 Sep 93 18:26:51 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309282226.AA08257@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Status: R

Dennis,
sorry, this is the file I meant to send.
For the figures,
1) the hit ratio never exceeds .8 because there aren't enough
buffers. I'll calculate the best-possible hit rate.
2) a is the parameter of the zipf distribution (p_{i}=c*i^{-a}).
3) FIFO buffers are free in the sense that
I'm leaving the number of LRU buffers fixed as I increase the number of FIFO
buffers.  I'm only counting the hit rate from the LRU buffers.

	Ted

---------------------------------------------------------------------------

Dennis,
yet another update.

I ran some simulations where I assumed that the FIFO buffer is used
only as a screening device.
(This was easy because I only had to modify the parameters and collect
the LRU hit rate.)

The results show that this is quite a good algorithm when you set the
number of FIFO screening positions to about 50% of the number of LRU buffers.
Further, the performance is not highly dependent on the value of the
tuning parameter.

I'm sending three postscript figures.
The first figure shows the hit rate when FIFO buffers are free, but
FIFO hits don't count towards the overall hit rate.
I collected these results after processing 1,000,000 references.
The hit rate sits at a high plateau when the number of FIFO
buffers is between 25% and 75% of the number of LRU buffers.

There is a sudden rise in the hit rate for 20,000 buffers and 18,000
FIFO entries.  I don't have a good explanation for it.

The second figure is the same as the first, except that the results
are collected after 100,000 references.
(with 50,000 data items and 10,000 buffers, 100,000 references
is a short reference string)
This figure shows that the modified 2Q algorithm is responsive,
since you get a high hit rate with the FIFO size set to 
25% or greater.

In the third figure, I plot hit rate vs. the progress through
a run of 4,000,000 references.  (I forgot to label it, but there are
10,000 buffers, or 20% of the number of data items.)
The plot shows that if the number of FIFO buffers is 25% or
more of the LRU buffers, then the 2Q algorithm is responsive
and has a high hit rate.
At the end of the run, the plots for .1 through .9 are well initialized.
It's hard to see, but .1 has the highest hit rate, .9 the lowest;
the difference is small.

I think that this is a good algorithm.
I don't think that there is a need to study the 2 parameter algorithm
for the SIGMOD paper (keep some physical fifo buffers around).
The modified 2Q algorithm is almost as responsive as LRU when the
number of FIFO buffers is about 25%-75% of the number of LRU buffers.
(I have some data on LRU, but I forgot to plot it last night, sorry.)

In addition, we can improve the initial responsiveness of the 2Q algorithm
by using a "fast load": If there is room in the LRU buffer,
store the referenced page.  If the LRU buffer is full, store the page
only if the page is listed in the FIFO area.
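The fast-load rule is only a few lines; here is a hedged sketch with invented names, using a plain dict and set as stand-ins for the real buffer structures.

```python
def fast_load(lru, lru_capacity, fifo_tags, page_id):
    # "Fast load" heuristic from the text above: while the LRU buffer has
    # free room, store a missed page immediately; once it is full, store
    # the page only if its ID is already listed in the FIFO tag area.
    if page_id in lru:
        return                            # already resident: nothing to do
    if len(lru) < lru_capacity:
        lru[page_id] = True               # free room: admit right away
    elif page_id in fifo_tags:
        fifo_tags.discard(page_id)        # screened page: admit it
        lru.pop(next(iter(lru)))          # evict oldest (dicts keep order)
        lru[page_id] = True
    else:
        fifo_tags.add(page_id)            # record the ID for screening
```

The point of the heuristic is that the strict admission policy only kicks in once the buffer is under pressure, so a cold start fills the LRU part as fast as plain LRU would.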

I'd like to look at the following issues next:
1) implement the fast load, look at performance.
2) next, run the simulation until the hit rate is stable,
then change the reference distribution.
3) throw some sequential scans into the stream.

	Ted


From shasha@SHASHA.CS.NYU.EDU Tue Sep 28 18:45:28 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA14978; Tue, 28 Sep 93 18:45:28 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA19069; Tue, 28 Sep 93 18:45:27 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA14974; Tue, 28 Sep 93 18:45:25 -0400
Date: Tue, 28 Sep 93 18:45:25 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9309282245.AA14974@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: a few questions
Cc: shasha@cs.NYU.EDU
Status: R

	Dear Ted,
	I like this test and this variant of the algorithm.
	Please check my understanding:

	1. You plan to study a one-parameter algorithm in which
	you look only at the size of the LRU buffers and assume
	there are 50% as many FIFO tags, but you don't count their
	size cost and don't count hits to FIFO.

	2. I don't really understand the third figure.


In the third figure, I plot hit rate vs. the progress through
a run of 4,000,000 references.  (I forgot to label it, but there are
10,000 buffers, or 20% of the number of data items.)
The plot shows that if the number of FIFO buffers is 25% or
more of the LRU buffers, then the 2Q algorithm is responsive
and has a high hit rate.
At the end of the run, the plots for .1 through .9 are well initialized.
It's hard to see, but .1 has the highest hit rate, .9 the lowest;
the difference is small.

	So, when you first see a group of references you have
	a better hit ratio than after the arrival distribution
	stabilizes?

	3. This is a good heuristic.


In addition, we can improve the initial responsiveness of the 2Q algorithm
by using a "fast load": If there is room in the LRU buffer,
store the referenced page.  If the LRU buffer is full, store the page
only if the page is listed in the FIFO area.

	4. Good next steps.
	Afterwards, the challenge is to compare it with LRU/2.

I'd like to look at the following issues next:
1) implement the fast load, look at performance.
2) next, run the simulation until the hit rate is stable,
then change the reference distribution.
3) throw some sequential scans into the stream.


	Thanks,
	Dennis


From ted@squall.cis.ufl.edu Wed Sep 29 08:26:48 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA15875; Wed, 29 Sep 93 08:26:46 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA08452; Wed, 29 Sep 93 08:26:11 -0400
Date: Wed, 29 Sep 93 08:26:11 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9309291226.AA08452@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  a few questions
Status: R

Dennis,

1) Yes, I think that for now we should limit ourselves to
storing FIFO tags only.  50% seems like a good rule.
I read over O'Neil et al.'s paper yesterday, and I had some thoughts
about the correlated hit problem.  I think that we can move toward
an automatic solution by using the 2-parameter 2Q algorithm.
Set aside 10% (or so) of the buffers for the first stage of the FIFO.
If a buffer in this stage gets 1 hit, don't add it to the LRU.
If it gets 2, add it to the LRU.
Set up a second-stage FIFO that has 40-50% as many tags as the LRU,
and run the usual algorithm with the second stage.
Exploring this issue could make a journal version stronger than the
conference version.

2) The third figure is a running count of the hit rate.
The first point on each line is the hit rate after 400,000 references,
the second is after 800,000 references, ..., the last after 4,000,000
references.  It shows the responsiveness of the 2Q algorithm
as a function of the number of FIFO tags.

3) LRU/2 - I agree, I'll try to get it coded today or tomorrow.
It looks easy.

	Ted

From ted@squall.cis.ufl.edu Fri Oct  1 08:19:33 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA26815; Fri, 1 Oct 93 08:19:32 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA10284; Fri, 1 Oct 93 08:18:49 -0400
Date: Fri, 1 Oct 93 08:18:49 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310011218.AA10284@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Subject: 2Q beats LRU/2
Status: R


Dennis,
I got the LRU/2 queue working.
I ran a couple of quick experiments.
I used a database of 50,000 items and 10,000 buffers.
I ran a trace of 1,000,000 references.
I used a=.5 and a=.8 for the parameter in the Zipf distribution
(a=.8 is close to 80/20; a=.5 has a much smaller skew).
From LRU/2, I got a hit rate of .355 for a=.5 and
a hit rate of .629 for a=.8

When I used the FIFO queue for screening only and used 50% for the FIFO size,
I got a hit rate of .361 for a=.5 and .632 for a=.8

Well, 2Q doesn't beat LRU/2 by much, but the performance is about the
same, and 2Q is easier and cheaper to implement.

Regarding "easier to implement": LRU/2 really does need the priority queue.
I tried to do without it and didn't get an answer after 20 hours.
The priority queue has to be able to do more than insert and delete-min;
you need to demote an entry in the queue (when you get a hit on
a buffered item).  This means that you must keep a table translating
from id number to position in the queue, or you must implement the priority
queue using pointers.  Fortunately, I had a priority queue with demotion
lying around from a previous project, else this would have taken another
couple of days.
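For reference, the kind of priority queue described above (insert, delete-min, and a key update driven by an id-to-position table) can be sketched as an indexed binary heap. This is a generic illustration, not the code from the previous project.

```python
class IndexedMinHeap:
    # Binary min-heap plus a page-id -> heap-position table, so that a
    # key can be updated ("demoted") in O(log n) when a hit occurs.
    def __init__(self):
        self.heap = []   # list of (key, page_id) pairs
        self.pos = {}    # page_id -> index into self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.heap[c][0] < self.heap[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def insert(self, page_id, key):
        self.heap.append((key, page_id))
        self.pos[page_id] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def update_key(self, page_id, key):
        # The "demotion" operation: the position table finds the entry.
        i = self.pos[page_id]
        old = self.heap[i][0]
        self.heap[i] = (key, page_id)
        self._sift_up(i) if key < old else self._sift_down(i)

    def delete_min(self):
        # Remove and return the page with the smallest key.
        top = self.heap[0]
        last = self.heap.pop()
        del self.pos[top[1]]
        if self.heap:
            self.heap[0] = last
            self.pos[last[1]] = 0
            self._sift_down(0)
        return top[1]
```

In LRU/2 the key would be the penultimate access time: a hit makes that key larger, so update_key sifts the entry down, away from the eviction end.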
        Ted

From ted@squall.cis.ufl.edu Fri Oct  8 15:00:22 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17748; Fri, 8 Oct 93 15:00:18 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA15193; Fri, 8 Oct 93 14:24:10 -0400
Date: Fri, 8 Oct 93 14:24:10 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310081824.AA15193@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Status: R

Dennis,
I implemented the scans to test the resilience of 2Q.
I set up the simulation so that after scanfreq IRM references,
the simulator would start a scan.
A scan is a non-repeating sequence of references,
with an average length of scanlen.
There are 2 types of scans: to buffers not accessed by
the IRM stream, or to buffers that are accessed by the
IRM stream.
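The scan mixer can be sketched roughly as follows. This is illustrative only: for simplicity it uses a fixed scan length, a uniform stand-in for the Zipf IRM burst, and draws scans from an ID range disjoint from the IRM items (the first scan type above).

```python
import random

def stream_with_scans(n_items, n_refs, scan_freq, scan_len, seed=1):
    # After every scan_freq IRM references, emit one non-repeating scan
    # of scan_len sequential page IDs.  Scan IDs start past the IRM
    # range, so these scans touch buffers the IRM stream never does.
    rng = random.Random(seed)
    refs = []
    next_scan_id = n_items
    while len(refs) < n_refs:
        for _ in range(scan_freq):              # a burst of IRM references
            refs.append(rng.randrange(n_items)) # uniform stand-in for Zipf
        for _ in range(scan_len):               # then one sequential scan
            refs.append(next_scan_id)
            next_scan_id += 1
    return refs[:n_refs]
```

Because the scan IDs never repeat, an ideal policy would refuse them all, which is what makes this a good stress test for the FIFO screen.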

I ran some simulations last night.  I used a database size of 50,000,
a buffer size of 5,000, and a skew of .5 in the Zipf distribution.
I varied the average length of a scan.
I also varied the probability of starting a scan to keep the number
of scan references at about 1/3 of the total number of references.
I ran the scan simulator on the LRU and the 2Q buffer algorithms.
Half the time, the scans were done in the buffers referenced by
the IRM stream, half the time to a different set of buffers.

1) LRU
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.182		.122		.125		.119		.113
% of no-scan	67.0		68.7		65.4		62.1

2) 2Q FIFO is 10%
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.224		.172		.182		.178		.167
% of no-scan	76.8		81.3		79.5		74.5

3) 2Q FIFO is 25%
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.247		.178		.187		.183		.173
% of no-scan	72.1		75.7		74.0		70.0

4) 2Q FIFO is 50%
no scan         scanlen=1000    scanlen=2000    scanlen=4000    scanlen=8000
.248		.175		.182		.179		.170
% of no-scan 	70.6		73.4		72.2		68.5

These results show that 2Q has a higher hit rate than LRU when
there are scans in the request stream.
In addition, the performance of 2Q doesn't suffer because of
the scans nearly as much as LRU does.
Since 1/3 of the references are scans, one should expect about
a 67% hit rate as compared to the case when the stream has no scans.
2Q does better because some of the scan references are to items
in the regular reference stream.

The performance of 2Q with a scan stream deteriorates when the
number of FIFO buffers increases.  Also, it suffers when the scans
become long.  This occurs because of repeat references across scans:
the references from the old scan are held in the FIFO buffer, and then
cause hits and admission to the LRU queue on the next scan.

In all cases, 2Q is clearly superior to LRU.
I'll look into the performance a little more carefully to make certain that
my hypotheses hold.  I think I'll restrict scans to executing in 
an auxiliary set of buffers, so we know that the best possible
hit rate is 67%.
Which do you think is more ``realistic'', fewer long scans or more short scans?


	Ted

From ted@squall.cis.ufl.edu Sat Oct  9 12:29:03 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA19117; Sat, 9 Oct 93 12:29:01 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA15884; Sat, 9 Oct 93 12:27:58 -0400
Date: Sat, 9 Oct 93 12:27:58 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310091627.AA15884@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
Here's the comparison with LRU/2 that you asked for.
The only problem is that these runs take a long time.

As you can see, 2Q has a much higher hit rate and suffers less from
scans than LRU/2.
	Ted

1) LRU
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.182		.122		.125		.119		.113
% of no-scan	67.0		68.7		65.4		62.1

2) 2Q FIFO is 10%
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.224		.172		.182		.178		.167
% of no-scan	76.8		81.3		79.5		74.5

3) 2Q FIFO is 25%
no scan		scanlen=1000	scanlen=2000	scanlen=4000	scanlen=8000
.247		.178		.187		.183		.173
% of no-scan	72.1		75.7		74.0		70.0

4) 2Q FIFO is 50%
no scan         scanlen=1000    scanlen=2000    scanlen=4000    scanlen=8000
.248		.175		.182		.179		.170
% of no-scan 	70.6		73.4		72.2		68.5

5) LRU2
no scan         scanlen=1000    scanlen=2000    scanlen=4000    scanlen=8000
.231		.163		.166		.160		.150
% of no-scan	70.6		71.9		69.3		64.9

From shasha@SHASHA.CS.NYU.EDU Sat Oct  9 13:29:41 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA19241; Sat, 9 Oct 93 13:29:40 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA02017; Sat, 9 Oct 93 13:29:39 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA19237; Sat, 9 Oct 93 13:29:38 -0400
Date: Sat, 9 Oct 93 13:29:38 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9310091729.AA19237@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: Re:  great news
Cc: shasha@cs.NYU.EDU
Status: RO

	Ted,

Dennis,
my thought was to fudge the waiting parameter for the SIGMOD paper,
then explore it for the ASPLOS paper.
	For SIGMOD, we should use the same scheme LRU/2 used.
We can try attaching a time to the first entry to the FIFO, and not
promoting to the LRU unless the 2nd hit is more than X seconds or X references
later.  A second possibility is to not promote if the item is in the
first Y buffers of the FIFO.

For the ASPLOS paper I was thinking of examining the 2-parameter algorithm.
Use, say, 10% of the buffers to hold the first part of the FIFO queue.
Don't promote from this queue unless there is a third hit, more than X seconds
from entry, etc.  Promote from the tag-only part of the FIFO queue.
The advantage is that it might be easily tunable.
	Ted

	Ultimately, the best may be to promote only if the second access
	to page p follows an access to a page p' at least for a given process.
	This does the right thing for scans.
	I think that simply means you don't promote the first item
	in a FIFO queue for a given process.
	Does this seem feasible?


	By the way, when you promote from the FIFO, do you delete from
	its middle or just mark it and let its FIFO rating degenerate?
	If not, the operation may not be constant time.

From shasha@SHASHA.CS.NYU.EDU Mon Oct 11 10:19:07 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA22011; Mon, 11 Oct 93 10:19:06 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA05771; Mon, 11 Oct 93 10:19:04 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA22007; Mon, 11 Oct 93 10:19:03 -0400
Date: Mon, 11 Oct 93 10:19:03 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9310111419.AA22007@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: Re:  great news
Cc: shasha@cs.NYU.EDU
Status: R

>>Ted,


Dennis,
        Ultimately, the best may be to promote only if the second access
        to page p follows an access to a page p' at least for a given process.
        This does the right thing for scans.
        I think that simply means you don't promote the first item
        in a FIFO queue for a given process.
        Does this seem feasible?

The O'Neils were worried about correlated accesses.
Looking for an intervening access seems difficult:
how to define it, how to detect it, whether it must
come from the same or from a different process, etc.
I'm suggesting the 2-parameter algorithm as a
brain-dead (on behalf of the user) but robust
solution to the problem.

>> If 2-parameter works, then fine. It seems difficult to prove that
>> there won't be any pathological results even for the scan vs.
>> random access case.
>> Maybe it will work though because at worst we will behave
>> more or less like LRU.
>> By the way defining the intervening access
>> is fairly straightforward: if it comes from the same thread of control
>> then it is intervening.

        By the way, when you promote from the FIFO, do you delete from
        its middle or just mark it and let its FIFO rating degenerate?
        If not, the operation may not be constant time.

I always promote.  Most VM systems keep page table entries locked in
memory (at least 4.3 BSD does).  The page table entry contains links
to implement LRU.  These links can be used to implement the
FIFO queue also.  All you need to do is define another page state or 2.
Everything is O(1) with small constants, and there is no need for elaborate
support structures.
>> good.
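Ted's O(1) claim amounts to intrusive-list splicing: when the list links live inside the page table entry itself, moving a page between the FIFO and LRU queues is a constant-time pointer operation plus a state change. A minimal sketch follows; the struct and function names here are assumed for illustration, not taken from any real simulator or VM code.

```c
#include <assert.h>
#include <stddef.h>

enum queue_id { ON_FIFO, ON_LRU };

struct pte {
    struct pte *prev, *next;   /* intrusive doubly-linked list links */
    enum queue_id state;       /* which queue this page is on */
};

struct queue {
    struct pte *head, *tail;
};

/* Unlink p from queue q: O(1) pointer splicing, no searching. */
static void unlink_pte(struct queue *q, struct pte *p)
{
    if (p->prev) p->prev->next = p->next; else q->head = p->next;
    if (p->next) p->next->prev = p->prev; else q->tail = p->prev;
    p->prev = p->next = NULL;
}

/* Push p at the head of q and record its new state. */
static void push_head(struct queue *q, struct pte *p, enum queue_id s)
{
    p->prev = NULL;
    p->next = q->head;
    if (q->head) q->head->prev = p; else q->tail = p;
    q->head = p;
    p->state = s;
}

/* Promote: move p from the FIFO queue to the LRU queue, O(1). */
static void promote(struct queue *fifo, struct queue *lru, struct pte *p)
{
    assert(p->state == ON_FIFO);
    unlink_pte(fifo, p);
    push_head(lru, p, ON_LRU);
}
```

Since the links are reused rather than allocated, no auxiliary structures are needed, which is the point Ted makes above.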
	Ted


From ted@squall.cis.ufl.edu Wed Oct 13 14:22:15 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA26201; Wed, 13 Oct 93 14:22:10 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA18586; Wed, 13 Oct 93 14:20:52 -0400
Date: Wed, 13 Oct 93 14:20:52 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310131820.AA18586@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: RO

Dennis,
I wrote a two-pool simulator to simulate random access through a
data structure.
The simulation input is identical to the two pool simulation
described on pages 12-13 of the O'Neil paper.
The simulator alternates between uniformly random samples from
a large pool and from a small pool.

2Q beats LRU/2 for <= 100 buffers, loses by a small margin for > 100 buffers.
I set FIFO to 50% of LRU for the 2Q runs.
In my simulations, I found LRU and LRU/2
results very close to those in the paper,
so I'll just list them here.

Buffers		LRU	LRU/2	2Q
60		.14	.291	.299
80		.18	.382	.395
100		.22	.459	.486
120		.26	.496	.500
140		.29	.502	.501
160		.32	.503	.502


I'd like to think about wrapping up the performance results section.
I have three reference sources: zipf, sequence, and two-pool.
Important things to show are:
1) performance with the zipf distribution
    a) sensitivity to changes in the FIFO size
    b) responsiveness
    c) comparison with LRU, LRU2 with different buffer sizes.
2) with stream
    - pick 1 or 2 FIFO sizes
    - pick 2 or 3 stream sizes (short, medium, long)
    - compare to LRU, LRU2 with different buffer sizes
3) two-pool
    - Use the same experiment as in the O'Neil paper,
	use a FIFO size of 50%.

Anything else?
If you get good traces I'll run those.
I can produce a run where we store the FIFO pages in memory instead of
as tags, to show the reason for the choice.
Also, I can derive some equations that show why 2Q works,
why you should pick the FIFO size as a % of the LRU size,
heuristic for 50%, etc.
	
	Ted

From ted@squall.cis.ufl.edu Sat Oct 16 12:46:35 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA02228; Sat, 16 Oct 93 12:46:33 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA20848; Sat, 16 Oct 93 12:45:12 -0400
Date: Sat, 16 Oct 93 12:45:12 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310161645.AA20848@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: windows traces
Status: R

Dennis,
I ran the windows traces through a preprocessing step, so that
I can map page references to a contiguous set of integers.
(I divided by 4096 to get the page number)
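That preprocessing step can be sketched as follows: a byte offset becomes a page number (offset / 4096), and each distinct page number is assigned the next unused small integer the first time it appears. The table size and names here are illustrative; a real run would size the table to the trace.

```c
#include <assert.h>

#define MAXPAGES 1024

static long seen[MAXPAGES];  /* raw page numbers, in first-seen order */
static int  nseen;

/* Map a raw byte offset to a contiguous page id (0, 1, 2, ...),
 * assigning ids in order of first appearance. */
static int page_id(long byte_offset)
{
    long page = byte_offset / 4096;
    for (int i = 0; i < nseen; i++)
        if (seen[i] == page)
            return i;
    seen[nseen] = page;
    return nseen++;
}
```

The linear scan is fine for traces that touch only a few dozen distinct pages, as these do.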

I found that the small trace references 38 distinct pages, and
the large trace references 42 distinct pages.
	Ted

From ted@squall.cis.ufl.edu Fri Oct 22 14:13:41 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA12820; Fri, 22 Oct 93 14:13:39 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA23969; Fri, 22 Oct 93 14:12:00 -0400
Date: Fri, 22 Oct 93 14:12:00 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310221812.AA23969@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  description of our algorithm
Status: R

Dennis,
The problem with the divided-A1 algorithm is that I'd need to make
some substantial modifications (the A1-in queue must be separate
from the A1-out queue).  The algorithm is simple, but debugging
and validation are time consuming.
I thought that we were going to explore this algorithm and the
intervening-access algorithm in the ASPLOS paper.

For a "real trace", I have access to the log files of the
NSSDC archive.  I can reduce those files and run them
through the simulators.  I think there are 18,000 file requests
in the log that I have.
In general, I think that an interesting application of 2Q is to
look at caching for mass storage systems.
There are two new twists: 1) you cache files, and these have varying
sizes. 2) the files have minimum-residency constraints (you might
give the requester a minimum of 1 hour, 1 day, etc., to transfer
the files from temporary disk).  We can also view the files as
"objects" of varying sizes.
	Ted

From ted@squall.cis.ufl.edu Fri Oct 22 14:35:40 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA12956; Fri, 22 Oct 93 14:35:37 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA24040; Fri, 22 Oct 93 14:34:03 -0400
Date: Fri, 22 Oct 93 14:34:03 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310221834.AA24040@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  description of our algorithm
Status: R

1) OK, I'll work up something to show to oracle.
But, how do we distinguish the ASPLOS paper from the SIGMOD paper?
2) I would like to create a large-storage caching algorithm as
a separate piece of work, in addition to the SIGMOD and ASPLOS papers.
It would make my NASA sponsors happy.                         
	Ted

From ted@squall.cis.ufl.edu Fri Oct 22 15:45:22 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA13159; Fri, 22 Oct 93 15:45:19 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA24168; Fri, 22 Oct 93 15:43:45 -0400
Date: Fri, 22 Oct 93 15:43:45 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310221943.AA24168@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  description of our algorithm
Status: R

Dennis,
I was thinking about the algorithm descriptions, and you should
include the fast-load heuristic.

Let S1 be the size of A1, Sn be the size of An.
Let K be the max size of A1.  Let N be the number of available buffers.

A1-in algorithm:
On a miss, add the referenced page to the head of A1.
If S1+Sn<N, don't kick out any page.

A1-out algorithm:
On a miss, if Sn<N, add to the head of the LRU queue.

A1-out will load as fast as LRU, but is likely to have a lower
value in the buffer for a while.  Overall, it should help.
I can run experiments to validate this.
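The fast-load rules above can be folded into a single miss handler, sketched below. The new page always enters the head of A1, and nothing is evicted while free frames remain; the choice of victim once the buffer is full follows the usual 2Q rule and is an assumption here, as are all the names.

```c
#include <assert.h>

enum evict { NO_EVICTION, EVICT_FROM_A1, EVICT_FROM_AM };

struct sizes {
    int s1;  /* current size of A1 */
    int sn;  /* current size of Am (the LRU queue) */
    int n;   /* total number of available buffers */
    int k;   /* max size of A1 */
};

/* Handle one miss under the fast-load heuristic. */
static enum evict on_miss(struct sizes *z)
{
    z->s1++;                       /* new page enters the head of A1 */
    if (z->s1 + z->sn <= z->n)
        return NO_EVICTION;        /* fast load: a free frame is used */
    if (z->s1 > z->k) {
        z->s1--;                   /* A1 over its cap: evict A1's tail */
        return EVICT_FROM_A1;
    }
    z->sn--;                       /* otherwise reclaim from Am */
    return EVICT_FROM_AM;
}
```

With this rule the buffer fills as fast as under LRU, which is the point of the heuristic.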
	Ted

From ted@squall.cis.ufl.edu Tue Oct 26 12:24:11 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA05760; Tue, 26 Oct 93 12:24:03 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA26667; Tue, 26 Oct 93 12:22:17 -0400
Date: Tue, 26 Oct 93 12:22:17 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310261622.AA26667@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  jcss reprijnts
Status: R

Dennis,
I played around with the trace files yesterday.
I can get some results out when I use a page size of 256 bytes
and I run the large trace.
With 8 pages, LRU gets about a 43% hit rate, and 2Q gets about 47%.
For this, I use 2 pages in A1in.
LRU2 falls in between, but it feels like an unfair comparison
because I'm not using any page-pinning heuristics.

The fast-load algorithm works quite well.  In fact so well that
the startup transients almost disappear.
I'm putting together a workload generator that changes
locality in the middle of the simulation for a fairer
test of responsiveness.
	Ted

From ted@squall.cis.ufl.edu Wed Oct 27 09:46:54 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA08005; Wed, 27 Oct 93 09:46:53 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA27420; Wed, 27 Oct 93 09:45:06 -0400
Date: Wed, 27 Oct 93 09:45:06 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310271345.AA27420@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  your code
Status: R

Dennis, no, there is no provision for ignoring second accesses.
I had some thoughts about using the A1in buffer for this:
that is, don't promote to LRU if you have a hit on A1in,
only on A1out.  An alternative is your intervening-access
algorithm, but I have no information about processes in the trace files.
My results on the supplied trace files only use the heuristic
of setting the size of A1in to 2, and I get hit rates that match
or beat that of LRU.

I use fifohistory to collect statistics; I analyze fifohistory
in the driver module.

There is a bug in the code I sent.  I added some comments to a working
version, and I have a typo in ending one of the comments
	*./ instead of */
If you make this change it works correctly.

	Ted

From ted@squall.cis.ufl.edu Wed Oct 27 11:46:53 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA08158; Wed, 27 Oct 93 11:46:51 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA27611; Wed, 27 Oct 93 11:45:06 -0400
Date: Wed, 27 Oct 93 11:45:06 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310271545.AA27611@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
Here are the results of running the trace files.
I use the larger file (the .libxt file), and
assumed 256-byte pages.  That gave me close to 300 pages.
I ran the file through LRU, LRU2 and 2Q and varied the number
of buffers (as I noted, the LRU2 comparison is a bit unfair).
For 2Q, I always used 2 buffers for A1in and half the number
of lru+a1in buffers for A1out.

# buffers	LRU	LRU/2	2Q
4		.428	.443	.444
6		.479	.504	.511
8		.536	.551	.565
10		.578	.601	.607
12		.655	.676	.694
14		.713	.737	.743
16		.730	.758	.771
18		.759	.785	.800
20		.772	.799	.821
22		.795	.825	.839
24		.825	.848	.848
26		.839	.860	.862
28		.849	.869	.878
30		.861	.876	.894

40		.946	.938	.944

So, 2Q beats LRU even when we use a simple parameter assignment strategy.
The exception is for 40 buffers, when the hit rate is very high.
With such a high hit rate, you usually have locked the hot items
into the buffer with LRU, and LRU beats LRU2 and this version of 2Q
because it can react faster to changes in locality.

I tried changing the size of the A1 buffer.  I left the total
number of FIFO buffers constant.
for 40 buffers
|A1in|		hit rate
1		.941
2		.944
3		.946
4		.946
5		.946	
6		.946
7		.946

for 20 buffers
|A1in|		hit rate
1		.821
2		.821
3		.821
4		.818


My conclusions:
If your hit rate is low, you need to concentrate on locking the hottest
items into the buffer.  So, LRU2 and 2Q always beat LRU by a large
margin.  If the hit rate is high, LRU is already locking the hot
items into the buffer.  What matters to get the last few percent in
the hit rate is to react quickly to a change in locality.
LRU2 fails here; its definition makes it a little slow to react
(though perhaps some addn'l hints can help).
2Q can match LRU if you make the size of the A1in queue large enough
(but not too large).  

It's not clear that 2Q is a better page replacement algorithm than
LRU for general programs, since you expect very high hit rates.
It will beat LRU for disk page buffering (you expect moderate hit
rates) and for database applications (since they have special
characteristics and often a relatively low hit rate).

	Ted

From ted@squall.cis.ufl.edu Wed Oct 27 12:42:33 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA08371; Wed, 27 Oct 93 12:42:31 -0400
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA27641; Wed, 27 Oct 93 12:40:41 -0400
Date: Wed, 27 Oct 93 12:40:41 -0400
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9310271640.AA27641@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  your results
Status: R

Dennis,
we can put the results in the paper, I can fill in
the details around 40 buffers.
I'll try to think about including some of the page pinning
heuristics in the lru2 algorithm.

WRT setting the parameters, this is my intuition:
Given N buffers,
set the max size (after startup) of A1in to N/10
set the max size of A1out to .4*N

that should give 2Q enough responsiveness to be useful.
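That intuition reduces to a trivial helper; the names are assumed and integer arithmetic stands in for the percentages.

```c
#include <assert.h>

/* Rule of thumb for 2Q's queue caps, given N total buffers:
 * A1in gets N/10 frames, A1out gets 0.4*N entries. */
static void set_2q_params(int n, int *kin, int *kout)
{
    *kin  = n / 10;        /* max size (after startup) of A1in */
    *kout = (4 * n) / 10;  /* max size of A1out */
}
```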

WRT the NASA traces, I was planning on creating a trace file from their
may '93 log this weekend. 
	Ted

From mess@almaden.ibm.com Tue Nov  2 13:33:19 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28322; Tue, 2 Nov 93 13:33:18 -0500
Received: from ALMADEN.IBM.COM by cs.NYU.EDU (5.61/1.34)
	id AA07040; Tue, 2 Nov 93 13:33:15 -0500
Message-Id: <9311021833.AA07040@cs.NYU.EDU>
Received: from almaden.ibm.com by almaden.ibm.com (IBM VM SMTP V2R2)
   with BSMTP id 4282; Tue, 02 Nov 93 10:33:22 PST
Date: Tue, 2 Nov 93 10:33:21 PST
From: "Ted Messinger" <mess@almaden.ibm.com>
To: SHASHA@cs.NYU.EDU
Subject: Trace data
Status: R

dennis,

to provide a unique identification for each page accessed in a db2
system you need a 3 digit (decimal) database id, a 3 digit tablespace
id, and an 8 digit page number, which means we need a total of 16 bytes
of data (2 blanks to separate the values) for each page request.
i have a tape full of customer trace data that contains approx. 1.9
million page requests. that means 32 mb of data. my mvs machine that
has access to that data does not have access to any outside network,
but it does have a network connection to a vm machine that has internet
access. i doubt that i could ever get a 32 mb file shipped across the
network (either local or internet) so i assume i will have to send it
in chunks. i could block the 16 byte entries into some resonable block
size (6400??) and create a file of n blocks, and then send you the file.
then there would be some number of files. how does that sound?

cheers,
ted

p.s. the data that i would send you will be created with a C program.


From ted@squall.cis.ufl.edu Sat Nov  6 17:35:36 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA11928; Sat, 6 Nov 93 17:35:34 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA02363; Sat, 6 Nov 93 17:35:20 -0500
Date: Sat, 6 Nov 93 17:35:20 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311062235.AA02363@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
I looked at the figures I sent you; yes, they are a mess.
I'll fix them up and send off another batch tomorrow.

I processed the traces from IBM.
They look good: 75514 unique pages out of 500000 references.
Now I need to try running the processed file through
the simulators.  Here are some preliminary results.


3000 buffers	LRU	LRU/2	2Q
A1in=300	.730	.737	.7413 	(A1out=1200)
A1in=500			.7422   (A1out=1600)

1500 buffers	LRU	LRU/2	2Q
A1in=250	.6831	.6926	.6967  (A1out=750)

1000 buffers    LRU     LRU/2   2Q
A1in=160	.6544	.6629	.6680  (A1out=500)

500 buffers	LRU	LRU/2	2Q
A1in=80		.6055	.6020	.6097  (A1out=250)

250 buffers	LRU	LRU/2	2Q
A1in=50		.5396	.5367	.5448  (A1out=125)


I need to run some tuning experiments to find
the best parameters for LRU/2 and 2Q.

	Ted



From ted@squall.cis.ufl.edu Mon Nov  8 17:28:00 1993
Received: from squall.cis.ufl.edu by shasha.cs.nyu.edu (5.61/1.34)
	id AA00322; Mon, 8 Nov 93 17:27:57 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA03471; Mon, 8 Nov 93 17:27:52 -0500
Date: Mon, 8 Nov 93 17:27:52 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311082227.AA03471@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: IBM trace study
Status: RO

Dennis,
I ran a few experiments and I found that setting A1in to 25%
and A1out to 35% gave good performance.
So, I ran a suite of experiments with that setting
(LRU/2 has a FIFO size of 25% also).
Here are the results.

# buffers       2Q      LRU2    LRU
50              .348    .345    .317
100             .436    .432    .413
300             .523    .516    .512
500             .616    .610    .606
700             .643    .639    .630
1000            .668    .665    .654
1200            .681    .678    .667
1500            .696    .693    .683
1700            .704    .702    .691
2000            .715    .712    .704
2500            .730    .727    .718
3000            .741    .738    .730
3500            .751    .747    .740
4000            .759    .755    .748
5000            .771    .768    .762
6000            .781    .778    .773
7000            .788    .786    .781
8000            .794    .792    .789
9000            .800    .798    .795

So,
2Q always beats LRU/2, and has a respectable lead over LRU.
This is comparable to the O'Neils' results.  They
don't run experiments where the hit rate is better than .47.
I'll send the chart tomorrow.

Also, I tried to print the charts from ghostview, and
I couldn't get anything.  Apparently, the printer doesn't
want to print the encapsulated postscript.  It'll
print if you import into a .tex file, though.
	Ted

From ted@squall.cis.ufl.edu Wed Nov 10 08:02:03 1993
Received: from squall.cis.ufl.edu by shasha.cs.nyu.edu (5.61/1.34)
	id AA03863; Wed, 10 Nov 93 08:02:01 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA04271; Wed, 10 Nov 93 08:01:58 -0500
Date: Wed, 10 Nov 93 08:01:58 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311101301.AA04271@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Subject: Re:  experiments section
Status: R

Dennis,
Let me re-send you the tables for the trace-based simulations.
The ones you have might be a little too preliminary for publication.

I'd like to argue for including one of the hit rate vs. time figures.
The performance of the algorithms on real traces depends on their
responsiveness.  Also, we need to provide a reason for wanting to set
A1in to 50%, not 5%.
If we run out of space (likely) we should cut these figures.
But, we still need to discuss the results, and
we should put these in the journal version.
	Ted

From ted@squall.cis.ufl.edu Mon Nov 15 11:42:27 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA16618; Mon, 15 Nov 93 11:42:25 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA07001; Mon, 15 Nov 93 11:42:20 -0500
Date: Mon, 15 Nov 93 11:42:20 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311151642.AA07001@squall.cis.ufl.edu>
To: shasha@shasha.cs.nyu.edu
Subject: Re:  first draft of paper
Status: R

Dennis,
The analysis shows
  1) that the A1 queue really does act like a filter
  2) that its size should be a given fraction of the Am size.
So, the tuning results from the paper can be applied to much
larger (smaller) systems.
I'll try to get an insensitivity result.

I've been thinking about A1in and the correlated reference period.
It seems to me that A1in should serve to filter out the correlated references.
That is, only promote to Am on a reference to a page in A1out;
don't promote from A1in.
I revised the simulator to reflect these changes.
without tuning, the new 2Q algorithm beat the old 2Q algorithm
for most buffer sizes, sometimes substantially.
I hope to get new results by tomorrow.
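The revised promotion rule can be summarized as a small state transition, sketched below. The enum names are illustrative; note that in the full algorithm A1out may hold only tags, so a "hit" there is really a miss that triggers promotion, while this sketch only tracks which queue remembers the page.

```c
#include <assert.h>

enum where { IN_A1IN, IN_A1OUT, IN_AM, NOT_FOUND };

/* Where a page belongs after one reference, under the revised rule:
 * a hit in A1in is treated as a correlated reference and changes
 * nothing; only a reference to a page remembered in A1out promotes
 * it to Am; a cold miss enters at the head of A1in. */
static enum where reference(enum where cur)
{
    switch (cur) {
    case IN_A1IN:  return IN_A1IN;   /* correlated hit: stay put */
    case IN_A1OUT: return IN_AM;     /* second "real" access: promote */
    case IN_AM:    return IN_AM;     /* ordinary LRU hit in Am */
    default:       return IN_A1IN;   /* miss: enter A1in */
    }
}
```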

I'm working on the paper, I'll update the algorithm description
to reflect the new approach.
	Ted

From ted@squall.cis.ufl.edu Fri Nov 19 14:59:06 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA18129; Fri, 19 Nov 93 14:59:04 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA09702; Fri, 19 Nov 93 14:59:01 -0500
Date: Fri, 19 Nov 93 14:59:01 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311191959.AA09702@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  analysis confusion
Status: R

Dennis,
Yes, I'm pretty happy with the paper, it came out better than
I expected.  Still, I want to make certain.

LRU can beat 2Q in a number of circumstances:
rapidly changing locality, most correlated references
not caught by A1in, etc.
But these are the exception, not the rule.
	Ted

From ted@squall.cis.ufl.edu Sat Nov 20 13:11:42 1993
Received: from [128.227.35.35] by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA20598; Sat, 20 Nov 93 13:11:35 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA10161; Sat, 20 Nov 93 13:11:31 -0500
Date: Sat, 20 Nov 93 13:11:31 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311201811.AA10161@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  new algorithm and changing locality
Status: R

Dennis,
yes, that's the idea.  You'll get one addn'l miss on A1out.
	Ted

From ted@squall.cis.ufl.edu Mon Nov 22 17:41:33 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28155; Mon, 22 Nov 93 17:41:30 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA11204; Mon, 22 Nov 93 17:41:22 -0500
Date: Mon, 22 Nov 93 17:41:22 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311222241.AA11204@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  tests
Status: R

Dennis,
On the simulations with the synthetic traces (i.e., zipf, zipf+scan, 2-pool),
there are no correlated references. Hence, I set Kin=1.
On the simulations with real traces (db2, window), there are plenty
of correlated references, and Kin=.2*B, Kin=.3*B both work well.
I always use Kout=.5*B, except to test the setting of Kout.

I revised some of the descriptions to make it clearer that I
set Kin=1 in the synthetic-trace experiments.
OOW ran their experiments the same way.
It makes the theoretical performance of 2Q and LRU/2 a bit clearer.
	Ted

From ted@squall.cis.ufl.edu Tue Nov 23 14:28:54 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA02502; Tue, 23 Nov 93 14:28:51 -0500
Received:  by squall.cis.ufl.edu (5.61ufl/4.12)
	id AA11615; Tue, 23 Nov 93 14:28:40 -0500
Date: Tue, 23 Nov 93 14:28:40 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <9311231928.AA11615@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  tests
Status: R

Dennis,
One addn'l experiment sounds like a good idea.  I'll
be able to report the results tomorrow.
	Ted

From margo@das.harvard.edu Thu Dec  9 16:56:42 1993
Received: from virtual61.harvard.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA13970; Thu, 9 Dec 93 16:56:39 -0500
Date: Thu, 9 Dec 93 16:56:36 EST
From: margo@das.harvard.edu
Message-Id: <9312092156.AA17274@virtual61.harvard.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  page access traces
Status: R

Cool!  The Sprite file system traces are publicly available,
but they are at a logical file level (i.e. byte stream).
It shouldn't be too tough to use them.  You can get them
via anonymous ftp at:
	sprite.berkeley.edu:sosp-traces

I can't think of any other source of traces off the top of my
head.  I know that IBM has tons of disk traces if you have
connections there.  Also, HP has some disk level tracing, but
I'm not sure how useful that would be.  You might try contacting
John Wilkes at hplabs (wilkes@hpl.hp.com).

- Margo


From wilkes@hplajw.hpl.hp.com Fri Dec 17 01:07:23 1993
Received: from hplms26.hpl.hp.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA04553; Fri, 17 Dec 93 01:07:19 -0500
Received: from hplms2.hpl.hp.com by hplms26.hpl.hp.com with SMTP
	(16.6/15.5+ECS 3.3+HPL1.1S) id AA16976; Thu, 16 Dec 93 22:07:51 -0800
Received: from hplajw.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.6/15.5+ECS 3.3+HPL1.1I) id AA08739; Thu, 16 Dec 93 22:07:10 -0800
Received: by hplajw.hpl.hp.com
	(1.37.109.4/15.5+IOS 3.14) id AA17541; Thu, 16 Dec 93 22:07:10 -0800
Date: Thu, 16 Dec 93 22:07:10 -0800
From: John Wilkes <wilkes@hplajw.hpl.hp.com>
Message-Id: <9312170607.AA17541@hplajw.hpl.hp.com>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re: page level traces
Status: R

Dennis,

I am afraid the only data at my disposal is disk-level data, rather
than processor physical/virtual-memory page accesses (which is I assume
what you want).  You might try contacting Sawsan Ghanem, at
Sawsan_Ghanem@hpg200.desk.hp.com, but bear in mind that she works in a
product group, and will need some convincing to spend the time
necessary to make this stuff available!

john wilkes


> From shasha@SHASHA.CS.NYU.EDU Fri Dec 10 04:27 PST 1993
> Received: from hplms26.hpl.hp.com by hplajw.hpl.hp.com with SMTP
> 	(1.37.109.4/15.5+IOS 3.14) id AA11693; Fri, 10 Dec 93 04:27:35 -0800
> Return-Path: <shasha@SHASHA.CS.NYU.EDU>
> Received: from SHASHA.CS.NYU.EDU by hplms26.hpl.hp.com with SMTP
> 	(16.6/15.5+ECS 3.3+HPL1.1S) id AA09551; Fri, 10 Dec 93 04:28:03 -0800
> Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
> 	id AA14473; Fri, 10 Dec 93 07:26:07 -0500
> Date: Fri, 10 Dec 93 07:26:07 -0500
> From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
> Message-Id: <9312101226.AA14473@SHASHA.CS.NYU.EDU>
> To: wilkes@hplms26.hpl.hp.com
> Subject: page level traces
> Cc: ted@squall.cis.ufl.edu
> 
> Dear Dr. Wilkes,
> We are working on a variant of lru that seems to do better
> buffer management.
> This seems to work well across many applications.
> We are trying to find other sources of real life page access data
> and wonder if you have any available.
> Thanks very much,
> Dennis
> 
> 
> Dennis Shasha
> Associate Professor
> Department of Computer Science
> Courant Institute of Mathematical Sciences
> New York University
> 251 Mercer Street
> New York, N.Y. 10012-1185
> U.S.A.
> Tel:  +1 (212) 998-3086
> Fax: 212-995-4123
> Internet: shasha@cs.nyu.edu
> 

From shasha@SHASHA.CS.NYU.EDU Tue Dec 21 06:47:19 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17363; Tue, 21 Dec 93 06:47:17 -0500
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA20338; Tue, 21 Dec 93 06:47:17 -0500
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17359; Tue, 21 Dec 93 06:47:15 -0500
Date: Tue, 21 Dec 93 06:47:15 -0500
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9312211147.AA17359@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: novell today
Cc: shasha@cs.NYU.EDU
Status: R

Dear Ted,
Since I will be at Novell today, I want to be able to discuss
with Doshi which algorithms you're going to try.
My recommendation is:

1. start with the simple one I sent. (I hope you see now that
it is indeed different from the clock algorithm since long
term popularity translates to long term survivorship).

2. go on to one that incorporates some notion of an A1out

3. continue with Doshi's variant that has a special correlated
reference sweep.

If you could tell me your plans today, that would be good.
Ideally, we would have results on traces in the next few weeks and then
Doshi and I would try it out on the real system using the most
promising results from the trace experiments.

Thanks,
Dennis

From ted@squall.cis.ufl.edu Mon Dec 27 15:28:10 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA27238; Mon, 27 Dec 93 15:27:58 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id PAA00587; Mon, 27 Dec 1993 15:27:52 -0500
Date: Mon, 27 Dec 1993 15:27:52 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312272027.PAA00587@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  second attempt
Status: R

Dennis,
Sorry for the late response, but my email was having problems for a while.

Regarding my plans, I've written a WS-clock simulator,
and I've gotten as far as feeding it a Zipf input.

The WSclock algorithm that you sent me won't work.
There is no guarantee that sweeping N/k pages will find an unreferenced page.
So, you must continue until you have freed up at least one page.

I was curious to see what the sweep increment should be.
I found that I got the best performance (best hit rate, fewest pages scanned)
when k=N.  That is, sweep until you find a page to free, then stop.
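The modification Ted describes amounts to sweeping until the clock hand lands on an unreferenced frame, clearing reference bits along the way, rather than sweeping a fixed N/k frames. A sketch, with the array layout and names assumed:

```c
#include <assert.h>

/* ref[i] is frame i's reference bit; *hand is the clock hand.
 * Sweep until an unreferenced frame is found, clearing each set bit
 * passed over; return the freed frame and leave the hand advanced.
 * Even if every bit starts set, the second lap finds a cleared frame,
 * so at least one page is always freed. */
static int sweep_until_free(int *ref, int n, int *hand)
{
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % n;
        if (!ref[i])
            return i;     /* unreferenced: free this frame, stop */
        ref[i] = 0;       /* referenced: clear the bit, keep going */
    }
}
```

Stopping at the first freed frame corresponds to the k=N setting that gave the best hit rate with the fewest pages scanned.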

That is as far as I got (my in-laws came down and I had to help entertain them).
For my plans, I would like to get something done in time to submit it to
ASPLOS.  My game plan is along the lines of what you describe.
I'd like to play with WSclock a little more, then add in some reference
memory, then add in A1out, then look at Doshi's algorithm.
I don't teach this semester, so I should be able to make fast progress.
	Ted

From ted@squall.cis.ufl.edu Mon Dec 27 17:33:03 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA27603; Mon, 27 Dec 93 17:32:55 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id RAA00735; Mon, 27 Dec 1993 17:32:51 -0500
Date: Mon, 27 Dec 1993 17:32:51 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312272232.RAA00735@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  second attempt
Status: R

Dennis:
work 904-392-1492
home    -373-5046

Yes, WSclock works with the mod I described.
I have not put in the history maintenance yet, but the hooks are there
and the mods are near-trivial.
I note that there are two ways to age the survivorship info:
decrementing (as you suggest) and halving (as Doshi suggests).
I can try both.
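The two aging rules are one-liners; the survive_count semantics are assumed from the surrounding discussion.

```c
#include <assert.h>

/* Aging the survivorship count on each sweep:
 * decrementing (Dennis's suggestion) vs halving (Doshi's). */
static int age_decrement(int c) { return c > 0 ? c - 1 : 0; }
static int age_halve(int c)     { return c / 2; }
```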
The ASPLOS deadline is mid march, so I should try to tell you something
by the end of January.
WRT the SPRITE traces, I downloaded some of them at one point,
but the internet was slow and crises interrupted, so I still
need to finish it.

I haven't sent you anything else.
Did the IDA contact you? They're scheduling an interview for me.
	Ted

From ted@squall.cis.ufl.edu Tue Dec 28 14:56:06 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28127; Tue, 28 Dec 93 14:56:04 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id OAA00520; Tue, 28 Dec 1993 14:56:01 -0500
Date: Tue, 28 Dec 1993 14:56:01 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312281956.OAA00520@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  second attempt
Status: R

Dennis,
Yes, those are the questions I need answers to.

I implemented the "survivor" function last night
(as per your suggestion, aging is done by decrementing)
I made some tests using the windowing traces.
Some quite preliminary results are:

#buffers	clock hit rate		clock/history hit rate
10		.5777			.5801
20		.7667			.7753
30		.8600			.8676
40		.9377			.9457
50		.9613			.9659

The gain in hit rate is small, but it continues throughout the
range of buffer sizes.  These hit rates are comparable to LRU
(you can check the 2Q paper).
The improvement in hit rate is kind of small, about .5% - 1.5%.
The improvement in miss rate, however, is much better,
about 10-20% for high hit rates.  For a VM system, that translates
into a big gain in efficiency.
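As a quick check of that arithmetic using the 40-buffer row of the table: hit rates of .9377 vs .9457 mean miss rates of .0623 vs .0543, a cut of roughly 13% in misses. A sketch of the calculation:

```c
#include <assert.h>

/* Relative reduction in miss rate given old and new hit rates:
 * a small hit-rate gain at high hit rates is a large miss-rate cut. */
static double miss_reduction(double hit_old, double hit_new)
{
    return ((1.0 - hit_old) - (1.0 - hit_new)) / (1.0 - hit_old);
}
```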

So, keeping access histories is worth the extra work
(I found that you need to scan 50% - 100% more pages).
But, it seems to me that we can do better, we're not at all close to 2Q
hit rates.
	Ted

From shasha@SHASHA.CS.NYU.EDU Wed Dec 29 15:52:40 1993
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA00399; Wed, 29 Dec 93 15:52:39 -0500
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA18978; Wed, 29 Dec 93 15:52:38 -0500
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA00395; Wed, 29 Dec 93 15:52:36 -0500
Date: Wed, 29 Dec 93 15:52:36 -0500
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9312292052.AA00395@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: Re:  note from grey
Cc: shasha@cs.NYU.EDU
Status: R

	Ted,

Dennis,
Did you send Jim Grey a copy of the 2Q report?

	Yes, sorry not to have told you.
	I sent a copy to him, Weikum, O'Neil, Ted Messinger (IBM),
	and Mike Hartstein (Oracle).

thanks for composing the reply.
The dynamic pool idea looks interesting;
the problem, though, is coming up with realistic test cases.
Ted

	I think we should defer that until doing further
	studies on the clock algorithm using the ideas
	we've already laid out.
	The problems ought to be pretty orthogonal.
	It is interesting to ask how the paging rate of the
	successive clock algorithms compares with pure 2Q.

	Dennis


From ted@squall.cis.ufl.edu Wed Dec 29 15:58:06 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA00432; Wed, 29 Dec 93 15:58:04 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id PAA01271; Wed, 29 Dec 1993 15:58:02 -0500
Date: Wed, 29 Dec 1993 15:58:02 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312292058.PAA01271@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  note from grey
Status: R

Dennis,
adding history to clock makes it competitive with LRU,
but it's still far behind 2Q.

I modified clock/history to age the survivor counts by halving.
It's not as good as decrementing, at least on the windowing traces.

	Ted

From ted@squall.cis.ufl.edu Thu Dec 30 20:24:48 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA04644; Thu, 30 Dec 93 20:24:43 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id UAA00453; Thu, 30 Dec 1993 20:24:38 -0500
Date: Thu, 30 Dec 1993 20:24:38 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312310124.UAA00453@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,

I implemented an idea I had for turning 2Q into a clock algorithm.
I want to divide the pages into three classes, A1in, A1out, and Am.
A page should be in Am if it's referenced again shortly
after its correlated reference period.

When a miss occurs, bring in the page and mark the page "IN".
Every reference marks the page's reference bit.
When you hunt for pages to free, apply the following algorithm:

1) if the page is marked "IN": reset the reference bit, mark the page "IN1".

2) if the page is marked "IN1",
   a) if the reference bit is not set, kick out this page and reuse the frame.
   b) if the reference bit is set, mark the page "INmany", set 
       survivor_count=1, reference=0

3) if the page is marked "INmany"
    a) if the reference bit is set, increment survivor_count up to s_max,
	reset the reference bit.
    b) if the reference bit isn't set, and the survivor count is >0,
	decrement the survivor count
    c) if the survivor count is zero, and you're reclaiming INmany pages,
	kick out this page and reuse the frame.

So, pages marked "IN" are like A1in pages, "IN1" like A1out,
and INmany like Am.
You reclaim INmany pages when there are too many of them.  I set it up
so that you start cleaning when an upper bound is reached and you stop
when a lower bound is reached.
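For concreteness, the per-page test above can be sketched as a small C state machine.  This is my reading of the steps, not the simulator code; the state names, struct fields, and the clean_inmany flag are made up.

```c
#include <assert.h>

enum state { ST_IN, ST_IN1, ST_INMANY, ST_FREE };

struct page {
    enum state st;
    int ref;       /* hardware reference bit */
    int survive;   /* survivor count, capped at s_max */
};

/* One visit by the clock hand; returns 1 if the frame was reclaimed. */
static int visit(struct page *p, int s_max, int clean_inmany)
{
    switch (p->st) {
    case ST_IN:               /* newly faulted: give it a test cycle */
        p->ref = 0;
        p->st = ST_IN1;
        return 0;
    case ST_IN1:              /* second visit decides its fate */
        if (p->ref) {         /* re-referenced: promote to INmany */
            p->ref = 0;
            p->survive = 1;
            p->st = ST_INMANY;
            return 0;
        }
        p->st = ST_FREE;      /* unreferenced: kick it out */
        return 1;
    case ST_INMANY:
        if (p->ref) {         /* referenced: age up, capped at s_max */
            if (p->survive < s_max)
                p->survive++;
            p->ref = 0;
        } else if (p->survive > 0) {
            p->survive--;     /* unreferenced: age down */
        } else if (clean_inmany) {
            p->st = ST_FREE;  /* count hit zero while cleaning is on */
            return 1;
        }
        return 0;
    default:
        return 0;
    }
}
```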

results from the windowing trace:

# buffers	clock	clock/survivor	clock/classes
20		.7667	.7765		.7993
40		.9377	.9459		.9494
60		.9726	.9753		.9762

for clock/survivor, I used a max survivor count of 8,
for clock/classes, I used a max survivor count of 5 and set
the INmany range to 30%-40% of the number of pages.

If you look at the miss rates, clock/classes reduces the miss rate
from clock/survivor by 3% - 10%, and is 14%-18% better than clock.

I think I can do better still by storing references to the A1out
pages instead of the pages, but at least this shows that we can
improve on 2nd-chance by a good margin.

	Ted


Number of page slots &	LRU/2 &	LRU	 & 2Q	& 2Q \\
                     &        &          & Kin=30\% & Kin=20\% \\ \hline
20         &  .809   &   .772  &      .832   &     .836 \\ \hline
40         &  .943   &   .946  &      .946   &     .946 \\ \hline
From ted@squall.cis.ufl.edu Fri Dec 31 11:57:33 1993
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA10036; Fri, 31 Dec 93 11:57:30 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id LAA00664; Fri, 31 Dec 1993 11:57:27 -0500
Date: Fri, 31 Dec 1993 11:57:27 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199312311657.LAA00664@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  simple 2q clock vs. 2q
Status: R

Actually, it beats 2Q for 40 buffers (hit rate of .949 vs. .946)

I think that 2Q will be very hard to beat if the miss rate is large
(say, >10%), but can be beaten for small miss rates.
This is OK because 2Q is appropriate for managing disk caches, but
not for page tables.
I can probably tweak 2Q to get better performance by
  1) reducing the size of A1in - a few blocks go a long way when the
      miss rate is small
  2) storing some of the A1out pages in memory.  When the miss rate is large,
      misses on A1out pages don't cost much, but they can be expensive when
      the miss rate is small.
I'm working on a more direct clock-style emulation of 2Q,
and I'll probably incorporate these tweaks.
	Ted

From ted@squall.cis.ufl.edu Sun Jan  2 22:26:06 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA11525; Sun, 2 Jan 94 22:26:04 -0500
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.12)
	id WAA01669; Sun, 2 Jan 1994 22:25:59 -0500
Date: Sun, 2 Jan 1994 22:25:59 -0500
From: "Ted Johnson" <ted@squall.cis.ufl.edu>
Message-Id: <199401030325.WAA01669@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
I wrote another reference bit-style algorithm, one that more
closely emulates 2Q.

This new algorithm keeps three classes of queues: A1, A2, and AM.
A1 handles correlated references, A2 is the filter, and AM is the main buffer.

On a miss, the page is brought in and put into the A1 queue.
A1 is kept to size IN1; after adding a page to the head of A1, we remove
a page from the tail of A1 and put it into A2.  At this point we reset
the reference bit.

A2 has a memory-resident part (A2in) and a tags-only part (A2out).
The memory-resident part is kept at a constant size, the tags-only part
limited to a maximum size.  When a page from A1 is added to the head of A2in,
the page at the tail of A2in is examined.  If the reference bit is set,
put it on the head of Am.  Else, put the frame on the free frame list,
and put the page in A2out (kicking out the page at the tail of A2out).
If a fault occurs on a page in A2out, it is put in Am directly.

When a page is added to the head of Am, Am is cleaned using the clock/history
algorithm to find free pages.


results from the windowing trace:

# buffers	clock	clock/survivor	clock/classes	clock/2Q
40		.9377	.9459		.9494		.9463
60		.9726	.9753		.9762		.9765
80		.9825	.9850		.9855		.9862

for clock/survivor, I used a max survivor count of 8,
for clock/classes, I used a max survivor count of 5 and set
the INmany range to 30%-40% of the number of frames.
for clock/2Q, I used a max survivor count of 7, and set A1 to 1/3 of
the frames, A2in to 2 pages, A2out to 1/5 the frames

The clock/2Q algorithm works better as the number of pages increases.
For 80 pages, it's 25% better than clock, 12% better than clock/history,
and 5% better than clock/classes
(better in terms of miss rate).
Also, it requires less work, even though it seems more complex.
Most of the time, you get replacement pages from A2in.
It's only when you add a page to Am that you need to do any cleaning,
so you wind up touching only about 50% of the pages that clock/survivor
or clock/classes do.

I'll explore the parameters a little more and then create a table
using both the windowing and the db2 traces, so you can show
it to Gray et al.  I'll also write up the algorithms a little
more algorithmically.
BTW, I downloaded some of the Stanford SOSP traces;
loading one set took 65M.
Unfortunately, they are collected at the servers, so they've already
been filtered through the client caches.
But then, Ousterhout et al have a reason for claiming that most ops are
writes.
	Ted

From ted@cis.ufl.edu Sat Jan 15 15:08:25 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA02509; Sat, 15 Jan 94 15:08:19 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id PAA06482; Sat, 15 Jan 1994 15:00:50 -0500
Date: Sat, 15 Jan 1994 15:00:50 -0500
Message-Id: <199401152000.PAA06482@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis, Heres the comparison I promised.
I tried five algorithms.

1) clock: on a miss, scan through the pages.  if the reference bit is set,
reset it.  else reclaim that page.

2) clock+history (clkhst or hst): similar to clock, but keep a reference
history count.  if the reference bit is set, increment up to a parameter
(I indicate the parameter in the table).  Else, decrement.  if the
history count is zero, reclaim the page.

3) clock+halving (clkhlv or hlv):  similar to clock+history, but maintain
the history by shifting instead of increment/decrement.
If the reference bit is set, shift left and
add the parameterized value (a power of 2).  If the reference bit is not set,
shift left.  If zero, reclaim the page.

4) clock/class: the first 2Q approximation

I implemented an idea I had for turning 2Q into a clock algorithm.
I want to divide the pages into three classes, A1in, A1out, and Am.
A page should be in Am if it's referenced again shortly
after its correlated reference period.

When a miss occurs, bring in the page and mark the page "IN".
Every reference marks the page's reference bit.
When you hunt for pages to free, apply the following algorithm:

1) if the page is marked "IN": reset the reference bit, mark the page "IN1".

2) if the page is marked "IN1",
   a) if the reference bit is not set, kick out this page and reuse the frame.
   b) if the reference bit is set, mark the page "INmany", set
       survivor_count=1, reference=0


3) if the page is marked "INmany"
    a) if the reference bit is set, increment survivor_count up to s_max,
        reset the reference bit.
    b) if the reference bit isn't set, and the survivor count is >0,
        decrement the survivor count
    c) if the survivor count is zero, and you're reclaiming INmany pages,
        kick out this page and reuse the frame.

So, pages marked "IN" are like A1in pages, "IN1" like A1out,
and INmany like Am.
You reclaim INmany pages when there are too many of them.  I set it up
so that you start cleaning when an upper bound is reached and you stop
when a lower bound is reached.

5) clock/2Q: the second attempt at simulating 2Q.
Doshi: This is a lot like your algorithm.

This new algorithm keeps three classes of queues: A1, A2, and AM.
A1 handles correlated references, A2 is the filter, and AM is the main buffer.

On a miss, the page is brought in and put into the A1 queue.
A1 is kept to size IN1; after adding a page to the head of A1, we remove
a page from the tail of A1 and put it into A2.  At this point we reset
the reference bit.

A2 has a memory-resident part (A2in) and a tags-only part (A2out).
The memory-resident part is kept at a constant size, the tags-only part
limited to a maximum size.  When a page from A1 is added to the head of A2in,
the page at the tail of A2in is examined.  If the reference bit is set,
put it on the head of Am.  Else, put the frame on the free frame list,
and put the page in A2out (kicking out the page at the tail of A2out).
If a fault occurs on a page in A2out, it is put in Am directly.

When a page is added to the head of Am, Am is cleaned using the clock/history
algorithm to find free pages.
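For concreteness, the two history-aging rules in algorithms 2 and 3 can be sketched as below.  The names are mine.  One caveat: the description above says "shift left", but since the algorithm is named "halving" (and the earlier mail described aging the counts by halving), I've written the shift-based rule as a right shift; treat that direction as my assumption.  Either way, a page is reclaimable when its count reaches zero.

```c
#include <assert.h>

/* clkhst: increment on a reference (capped at max), else decrement. */
static int hst_update(int count, int refbit, int max)
{
    if (refbit)
        return count < max ? count + 1 : count;
    return count > 0 ? count - 1 : 0;
}

/* clkhlv: age by shifting; a reference adds a power-of-2 bonus.
   (Right shift = "halving" is my reading; the addend is the
   parameterized value from the description.) */
static int hlv_update(int count, int refbit, int addend)
{
    count >>= 1;          /* every sweep halves the history */
    if (refbit)
        count += addend;  /* a reference tops the count back up */
    return count;
}
```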


	RESULTS:

window trace

This is the windowing code trace from Novell.
- For the clock/class algorithm, I used a max history count of 5,
and allowed the number of AM pages to vary between 30% and 40% of the
total number of pages.
- For the clock/2Q algorithm, I used a max history count of 7,
used 1/3 of the buffer for A1, 3 buffers for A2in, and set A2out
to 20% of the total number of buffers.
- for clkhst and clkhlv, I varied the maximum history count to
find the best value.


buffers		clk2q	clkclass hlv:2	hst:7	clock
20		.8159	.7988	.7664	.7761	.7667
40		.9468	.9494	.9394	.9457	.9378
60		.9765	.9761	.9728	.9753	.9726
80		.9862	.9855	.9829	.9851	.9826
100		.9903	.9896	.9883	.9894	.9882
120		.9939	.9934	.9926	.9931	.9925


clkhlv	1	2	4	8
20	.7667	.7664	.7662	.7663
40	.9393	.9394	.9394	.9394
60	.9728	.9728	.9729	.9729
80	.9829	.9829	.9830	.9830
100	.9882	.9883	.9883	.9883
120	.9926	.9926	.9926	.9926

clkhst	1	3	7	9
20	.7667	.7737	.7761	.7765
40	.9393	.9448	.9457	.9459
60	.9728	.9748	.9753	.9751
80	.9829	.9848	.9851	.9851
100	.9882 	.9894	.9894	.9893
120	.9926	.9929	.9931	.9932


DB2 trace

This is the trace from IBM.
- for the clock/class algorithm, I used a max history count of 7,
and let Am range between 30% and 40% of the available buffers.
- for clock/2Q, I used 45% of the buffers for A1,
1% of the buffers for A2in, and set A2out to store references to 60% of
the buffers


buffers		clk2q	clkclass hlv:2	hst:7	clock
500		.6346	.6200	 .6032	.6117	.5990
1000		.6859	.6709	 .6525	.6589	.6490
2000		.7271	.7170	 .7015	.7072	.6984
5000		.7777	.7705	 .7589	.7634	.7564
10000		.8056	.8038	.7959	.7996	.7949
20000		.8280	.8272	.8233	.8250	.8231

clkhlv	2	4
500	.6026	.6032
1000	.6522	.6525
2000	.7011	.7015
5000	.7587	.7589
10000	.7959	.7959
20000	.8233	.8233


Conclusions:
1) keeping some history gives a big improvement over clock.
2) increment/decrement is better than shift-left
3) You can get better performance still by implementing a 2Q idea
of preferring to clean from newly added pages.
4) clock/2Q and clock/class show promise, but more study is
needed to determine their value and how to set their parameters.





From ted@cis.ufl.edu Mon Jan 17 19:10:01 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA06123; Mon, 17 Jan 94 19:09:58 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id TAA07382; Mon, 17 Jan 1994 19:09:53 -0500
Date: Mon, 17 Jan 1994 19:09:53 -0500
Message-Id: <199401180009.TAA07382@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
I ran the algorithms with the "scan traces".
That is, normal references are Zipf (80/20), and
occasionally there is a scan.
As before, 1/3 of the references are scans.
The normal database size is 50,000;
the scans can also access a second pool of 30,000 pages.

For the experiments, I used a max survive count of 3.
for clkcls, the AM pages ranged between 30% and 40% of the total
number of buffers.
for clk2q, 40% of the buffers are for A1, 1% for a2in,
and a2out stores tags equivalent to 50% of the total number
of buffers.
(I didn't tune the parameters, I just used settings
that worked well on the traces)

average scan length=500
buffers		clock	clkhst	clkcls	clk2q
1000		.2369	.2466	.2892	.3220
5000		.2895	.2956	.3315	.3642
10000		.4335	.4436	.4632	.4826
20000		.5242	.5336	.5440	.5534

average scan length=2000
buffers         clock   clkhst  clkcls  clk2q
1000		.2621	.2674	.3069	.3339
5000		.3726	.3827	.4131	.4387
10000		.4442	.4546	.4763	.4954
20000		.5359	.5449	.5555	.5656

So, the algorithms do show scan resistance.
	Ted


Dear Ted,

I saw Jim Gray today and told him about some of our stuff.
He liked it a lot (so I think you could ask him to recommend you
later if you wanted).
One thing came up in our discussion and I think it would
be interesting to test: raw page accesses aren't really the main issue.
The real issue is the number of reads.
Sometimes one might want to read several pages if one knows they
will be read sequentially.
At the least this avoids rotational delay but it can do more
if there is a seek in between sequential accesses.

So, one thing to see is whether there is significant
sequential access activity in our runs.
If so, then how about trying the following heuristic:

if the last time we accessed i, we also accessed i+1, i+2, ..., i+k,
then fetch those other pages immediately when we access i.

There is a separate simpler heuristic of just detecting and prefetching
sequentially accessed pages.
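One reading of this heuristic could be prototyped in a simulator roughly as follows (all names and sizes here are hypothetical): remember, for each page, how long a sequential run followed it last time, and on the next access to that page return that run length as a prefetch hint.

```c
#include <assert.h>

#define NPAGES 64

static int next_run[NPAGES];           /* run length seen after page i */
static int last_page = -1;
static int run_start = -1, run_len = 0;

/* Record sequential runs; return how many pages past p to prefetch. */
static int access_page(int p)
{
    int prefetch = next_run[p];        /* hint from the previous visit */
    if (last_page >= 0 && p == last_page + 1) {
        run_len++;                     /* the sequential run continues */
    } else {
        if (run_start >= 0)
            next_run[run_start] = run_len;  /* close out the old run */
        run_start = p;
        run_len = 0;
    }
    last_page = p;
    return prefetch;
}
```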

Dennis
From ted@cis.ufl.edu Mon Jan 24 11:42:45 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA21317; Mon, 24 Jan 94 11:42:43 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id LAA12431; Mon, 24 Jan 1994 11:42:40 -0500
Date: Mon, 24 Jan 1994 11:42:40 -0500
Message-Id: <199401241642.LAA12431@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: copy of letter to Doshi
Status: R


 clock-2Q

Doshi,
Hi, I'm Ted Johnson, the person Dennis is developing the
2Q algorithms with.

I have two clock-2Q variants, and I think that Dennis has
discussed their performance with you.

You asked about the idea of increasing the survivor count on a 
reference, and halving it if there is no reference.
I found that this method doesn't give good performance - at
least with my traces and with my simulator.

I can send you the code for the two algorithms
(they both have a feel that's similar to your suggestion).
the clock/class algorithm is fairly simple,
and the clock/2Q algorithm is a bit more complex,
but simpler than it sounds.
the clock/2Q seems to do less work than the others because you
usually find free pages by popping them off of the A2in list
instead of scanning for free pages.

I implemented the A2out as a FIFO of page names.
When I put a page into A2out, I mark it as being in A2out.
If there is a fault on a page that is marked A2out,
I put it in Am instead of A1.
(but I don't remove its entry from A2out).
When a page is removed from A2out, I reset the A2out mark in
the page table.
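That bookkeeping amounts to a small circular buffer of page names plus a mark bit per page-table entry; a sketch follows (sizes and names are illustrative, not the simulator's):

```c
#include <assert.h>

#define A2OUT_SIZE 4
#define NPAGES 32

static int fifo[A2OUT_SIZE];   /* circular FIFO of page names */
static int head = 0, n = 0;    /* head indexes the oldest tag */
static int in_a2out[NPAGES];   /* the A2out mark kept in the page table */

/* Push page p's name into A2out; the evicted tail loses its mark. */
static void a2out_push(int p)
{
    if (n == A2OUT_SIZE) {
        in_a2out[fifo[head]] = 0;      /* reset the page-table mark */
        head = (head + 1) % A2OUT_SIZE;
        n--;
    }
    fifo[(head + n) % A2OUT_SIZE] = p;
    n++;
    in_a2out[p] = 1;
}

/* On a fault, a marked page goes straight to Am; note the mark
   (and its FIFO entry) is deliberately left in place. */
static int fault_goes_to_am(int p)
{
    return in_a2out[p];
}
```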

        Ted

From doshi@usl.novell.com Mon Jan 24 16:17:07 1994
Received: from usl.usl.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23465; Mon, 24 Jan 94 16:17:02 -0500
Message-Id: <9401242117.AA23465@SHASHA.CS.NYU.EDU>
Date: Mon, 24 Jan 94 16:15 EST
From: doshi@usl.com
To: ted@cis.ufl.edu
Cc: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU
Subject: 2Q
Received: from usl by summit.novell.com; Mon, 24 Jan 1994 16:15 EST
Content-Type: text
Content-Length: 1581
Status: R


Hello Ted,

Thanks for your mail. Dennis had already forwarded
a description of the 2Q clock algorithms and performance data.
	
I have a couple of questions and a comment about clk2q:

(1) I am assuming that A1, A2in, and A2out are all Fifo, based on
the description. Is that a correct assumption?

(2) Did you find the algorithm very sensitive to the size of A2out?

The second question arises from practical considerations. For a
Unix kernel, setting aside 20% of memory for tag storage would be
objectionable. As a reference, the memory for caching file system 
buffers is usually tuned to be around 10% of the total memory pool. 
However, if A2out can be sized as some factor of the number of 
frames in Am, and performance stays much the same, the algorithm 
becomes more practical, for example:
     For a system with a page pool of 1200 pages (4Kbyte/page)
     one would set aside storage for about 4800 tags; at about 
     12 bytes per tag, this would amount to less than 15 pages,
     or just above a percent of the page pool reserved for tags. 
I was wondering how easy it may be to establish that this is 
sufficient for the improved performance.

Considering dedicated storage, unless A2out can be made
to consume only a small amount, it appears that clkclass has the
advantage that it needs no tag memory. In effect, it gambles
that anything that is referenced in two consecutive
sweeps IS going to be referenced more often. It seems a 
reasonably good bet to make.

I would like to receive the code. Dennis and I will probably meet
tomorrow and talk about it.

-K. Doshi.




From doshi@usl.novell.com Mon Jan 24 16:32:48 1994
Received: from usl.usl.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23881; Mon, 24 Jan 94 16:32:46 -0500
Message-Id: <9401242132.AA23881@SHASHA.CS.NYU.EDU>
Date: Mon, 24 Jan 94 16:31 EST
From: doshi@usl.com
To: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Subject: Re: progress so far
Received: from usl by summit.novell.com; Mon, 24 Jan 1994 16:31 EST
Content-Type: text
Content-Length: 294
Status: R

Dear Dennis-

Yes, we can meet Tuesday, either late morning or after lunch.
I feel concerned about clk2q's space allocation for 
A2out. I am hoping that we can formulate it as a number of tags
that is some small multiple of the frames in Am, rather than as
20% of main memory.
	
	See you,
	-K. Doshi.


From ted@cis.ufl.edu Mon Jan 24 17:54:03 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA24293; Mon, 24 Jan 94 17:54:01 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id RAA12772; Mon, 24 Jan 1994 17:53:57 -0500
Date: Mon, 24 Jan 1994 17:53:57 -0500
Message-Id: <199401242253.RAA12772@squall.cis.ufl.edu>
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU
Subject:  2Q
Status: R

Doshi,

1) Yes, A1, A2in, and A2out are all FIFO.

2) The performance is not overly sensitive to the size of A2out.
Also, the size of A2out should be a percentage of the size of Am;
50% is a good guess.  So, if there are 1200 pages, you should
set aside room for 600 tags, or (at 12 bytes per tag) about .1% of
the memory in the page pool.

Making A2out small means that it's harder to enter Am.
If Am is small, you want only hot pages in it, so set A2out small.
If Am is large, you want to let the merely warm pages enter,
so make A2out large.
So, it's intuitive that A2out should scale with Am.

I think that clkclass is probably easier to implement and to justify,
but it isn't as scan resistant as clock2Q.

It's relatively easy to modify a simulator
(as compared to a VM system),
so if there are any modifications of these algorithms you'd like
for me to test I can take a shot at implementing them and running
experiments.  My phone is 904-392-1492 if you want to call
with a question.

I'll send the code of the simulators in the next few messages.
	Ted


From Dennis to Ted on Jan. 25


Dear Ted,
1. Doshi plans to start with clock/class.

2. One question about the algorithm you implemented for clock/class.
You say:
the page is marked "IN": reset the reference bit, mark
the page "IN1".

If the reference bit is already reset, then the algorithm
just kicks out the page, right?

3. Doshi requests that we run simulations (using Zipf,
two pool and so on) to test for good parameter settings.

Thanks,
Dennis
From ted@cis.ufl.edu Tue Jan 25 11:52:27 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA26341; Tue, 25 Jan 94 11:52:23 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id LAA13086; Tue, 25 Jan 1994 11:49:47 -0500
Date: Tue, 25 Jan 1994 11:49:47 -0500
Message-Id: <199401251649.LAA13086@squall.cis.ufl.edu>
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU
Subject: re: 2Q
Status: R

Doshi,

>From doshi@usl.novell.com Tue Jan 25 09:45:09 1994
Received:  from usl.novell.com  by cis.ufl.edu (8.6.4/4.11)
        id JAA13311; Tue, 25 Jan 1994 09:45:05 -0500
Message-Id: <199401251445.JAA13311@cis.ufl.edu>
Date: Tue, 25 Jan 94 09:43 EST
From: doshi@usl.com
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU, <ted@cis.ufl.edu>
Subject: Re: 2Q
Received: from usl by summit.novell.com; Tue, 25 Jan 1994 09:43 EST
Content-Type: text
Content-Length: 313
Status: R

Ted,
        Thanks. I feel, too, that clkclass is relatively easy
to implement and justify.
        What I will do is think about the data you have sent,
and start planning out a scheme for changing VM. As I do this, I
will probably come up with variants that I can send to you for
possible simulation, if that is okay.
        Doshi


Sounds good.
I've been thinking that a good application for the algorithms
is scientific computing (scan resistance will help for
multiplying matrices, etc.).
I'm going to try to come up with a synthetic trace based on
a matrix multiply.

Also, I meant to send the simulations last night, but
I got tied up in other things.
I'm going to document them a little more, then
send them.

Dennis, I'm going to mail the simulators to Doshi,
if you'd like a copy just ask.
	Ted

From shasha@SHASHA.CS.NYU.EDU Tue Jan 25 14:03:42 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA27279; Tue, 25 Jan 94 14:03:41 -0500
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA20971; Tue, 25 Jan 94 14:03:40 -0500
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA27274; Tue, 25 Jan 94 14:03:37 -0500
Date: Tue, 25 Jan 94 14:03:37 -0500
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9401251903.AA27274@SHASHA.CS.NYU.EDU>
To: doshi@usl.com, ted@squall.cis.ufl.edu
Subject: case IN
Cc: shasha@cs.NYU.EDU
Status: R

Dear Ted and Doshi,
If I understand the code correctly (I've excerpted the relevant part),
the page is not kicked out if it's unreferenced while in state IN.
Now, I understand that this might be to keep a page in
memory while correlated references occur, but it might
not be for the best.
Reason: if a page is put in a frame, then the scan for free pages
must have hit every other page in the buffer before coming back.
This is likely to be longer than the correlated reference period.
So this favoritism to pages in the IN state may be excessive.
Thanks,
Dennis


switch(page[scan_ptr].free){
	    case IN:	/* new page, let's see if it is referenced again */
		page[scan_ptr].reference=0;
		page[scan_ptr].free=IN1;
		break;

	    case IN1:	/* we're testing this page for the 2nd reference */
	      if(page[scan_ptr].reference){
		  page[scan_ptr].reference=0;
		  page[scan_ptr].survive=1;
		  page[scan_ptr].free=INMANY;
		  numAM++;
/*	If there are too many Am pages, turn on the Am cleaning	*/
		  if(!cleanAM && numAM==AMhi)	
			cleanAM=1;
	      }
	      else {
		  item[page[scan_ptr].item].state=OUT;
	.....

From ted@cis.ufl.edu Tue Jan 25 14:10:44 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA27488; Tue, 25 Jan 94 14:10:43 -0500
Received: from squall.cis.ufl.edu by cs.NYU.EDU (5.61/1.34)
	id AA21009; Tue, 25 Jan 94 14:10:38 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id OAA13274; Tue, 25 Jan 1994 14:10:31 -0500
Date: Tue, 25 Jan 1994 14:10:31 -0500
Message-Id: <199401251910.OAA13274@squall.cis.ufl.edu>
To: doshi@usl.com, shasha@SHASHA.CS.NYU.EDU, ted@squall.cis.ufl.edu
Subject: Re:  case IN
Cc: shasha@cs.NYU.EDU
Status: R


Yes, perhaps a way to improve the algorithm is to scan the IN and IN1
pages more quickly.
I'll think about it.
        Ted

From ted@cis.ufl.edu Tue Jan 25 15:36:18 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28880; Tue, 25 Jan 94 15:36:14 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id PAA13486; Tue, 25 Jan 1994 15:36:13 -0500
Date: Tue, 25 Jan 1994 15:36:13 -0500
Message-Id: <199401252036.PAA13486@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  case IN
Status: R

Dennis, Doshi,
Regarding Dennis' comment that the clock/class algorithm 
doesn't clean the IN and IN1 pages vigorously enough:

If you want to clean the IN pages more vigorously, then you
increase the number of pages allocated to INMANY.
But, I found the best performance occurs when the number of INMANY
pages is between 30% and 40% of the total.

The reason there is an improvement with clock/class
is that it prefers to clean IN and IN1 pages instead of INMANY pages.
A weakness is that the control system is perhaps a bit too crude,
since INMANY cleaning is either on or off.
The quest for a better control system led me to clock/2Q.
(and I note that clock/class is only 1/2 as good as clock/2Q for
scan resistance).

Perhaps a better algorithm is the following:
Keep the IN and IN1 pages in a FIFO, the INMANY pages in
a circular list.
On a miss, mark the page IN, put it on the head of the IN FIFO.
If you need to obtain more free pages, then do the
following until enough free pages have been reclaimed:
  1) transfer K1 pages from the tail of IN to the head of IN1,
     resetting their reference bits.
  2) take K1 pages from the tail of IN1.  For each of these pages,
     look at the reference bit.  If set, put it on INMANY.  Else,
     put it on the free page list.
  3) scan the Am pages on the INMANY list.
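A rough C rendering of steps 1 and 2, to make the proposal concrete (the queue layout and names are mine, and step 3, the INMANY clock scan, is omitted):

```c
#include <assert.h>

#define QCAP 128

struct fifo { int buf[QCAP]; int head, n; };

static void push(struct fifo *q, int v)
{
    q->buf[(q->head + q->n) % QCAP] = v;
    q->n++;
}

static int pop(struct fifo *q)
{
    int v = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->n--;
    return v;
}

static struct fifo in_q, in1_q, free_q;
static int refbit[QCAP], inmany[QCAP];   /* indexed by page id */

/* One reclaim round with batch size k1. */
static void reclaim_round(int k1)
{
    for (int i = 0; i < k1 && in_q.n > 0; i++) {   /* step 1 */
        int p = pop(&in_q);
        refbit[p] = 0;                 /* reset on entering IN1 */
        push(&in1_q, p);
    }
    for (int i = 0; i < k1 && in1_q.n > 0; i++) {  /* step 2 */
        int p = pop(&in1_q);
        if (refbit[p])
            inmany[p] = 1;             /* promote to the INMANY list */
        else
            push(&free_q, p);          /* reclaim the frame */
    }
    /* step 3, scanning the INMANY circular list, is omitted here */
}
```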

I can write up this algorithm and see how it performs.
I'd appreciate any input about this or other possibilities.
	Ted

From doshi@usl.novell.com Wed Jan 26 20:19:57 1994
Received: from usl.usl.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA07577; Wed, 26 Jan 94 20:19:52 -0500
Date: Wed, 26 Jan 94 20:18 EST
Message-Id: <9401262018.AA06123@summit.novell.com>
From: doshi@summit.novell.com
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Received: from usl by summit.novell.com; Wed, 26 Jan 94 20:18 EST
Subject: re: 2Q
Content-Length: 761
Content-Type: text/plain
Status: R

Ted,
	I think I have received all the code files (5 of them),
though I cannot easily tell which is supposed to be zipf.c and
which is clk2q.c by superficial examination.

	As I start to implement clkclass for VM (in Unixware 1.0),
I will follow your description of the algorithm along with the
amendment that Dennis noted with respect to unreferenced IN 
pages -- so I will reclaim the memory for such pages.

	Just to give you and Dennis some idea of the time 
needed: I think the VM implementation will take up the next two weeks
in conjunction with other priorities. So I should be able to
report back by the middle of February.  Once this is done, if we
should decide to pursue the clk2q as well, I am confident that it
will only take an additional week to implement.



From shasha Thu Jan 27 04:35:42 1994
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA11125; Thu, 27 Jan 94 04:35:37 -0500
Date: Thu, 27 Jan 94 04:35:37 -0500
From: shasha (Dennis Shasha)
Message-Id: <9401270935.AA11125@SHASHA.CS.NYU.EDU>
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Subject: re: 2Q
Status: R

Dear Doshi,
Thanks for your note.
Ted pointed out that any page with IN will always be tagged as referenced,
since a page is IN as the result of a fault and the first time
it is visited by the daemon it will be moved to IN1.
So, every page that faults into IN will move soon to IN1.

Dennis

From ted@cis.ufl.edu Thu Jan 27 12:48:38 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA13911; Thu, 27 Jan 94 12:48:35 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id MAA14097; Thu, 27 Jan 1994 12:48:30 -0500
Date: Thu, 27 Jan 1994 12:48:30 -0500
Message-Id: <199401271748.MAA14097@squall.cis.ufl.edu>
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Subject: re: 2Q
Status: R

Doshi,
Sorry about not labeling the files better, but I did mail them
in the order listed.

rand.c contains all of the random number generator stuff.
zseq.c contains the main() function.
clk2q.c and clkclass.c look similar,
but clk2q.c is about 9k while clkclass.c is about 3k,
and there is a printf in the init_queue() routine that
identifies the algorithm being simulated.

I have been talking to Dennis about an algorithm that is more
aggressive about cleaning from the non-AM pages, but which doesn't
require A2out or the equivalent.
I think that the clock/2Q algorithm is actually what we want;
just don't use A2out, and make A2in large.
I'll run some simulations to find good parameter settings.
Also, I'll be able to make a more formal tuning study 
of clk/class within 2 weeks.

Regarding Dennis' amendment, I made the assumption that the
faulting reference will set the reference bit,
and even if it didn't, a second reference is almost certain.
I've been known to make incorrect assumptions,
so if it's an easy test it seems worthwhile.
	Ted

From ted@cis.ufl.edu Mon Jan 31 13:49:52 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA24850; Mon, 31 Jan 94 13:49:47 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id NAA16491; Mon, 31 Jan 1994 13:49:46 -0500
Date: Mon, 31 Jan 1994 13:49:46 -0500
Message-Id: <199401311849.NAA16491@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  are we interested?
Status: R

Dennis, I'm in principle interested,
let me ask the people here who run VMS if they know what this TK-50/TK-70
is.  If we can decode it, it sounds pretty good.

BTW,
1) I ftp'd Gerhard Weikum's OLTP trace file.
It's 38M; it should be useful.
2) I ran the A2out-less clock/2Q algorithm on the DB2 trace file
and got very good results, better than the previous clock/2Q results.
I suspect that since there are fewer parameters, removing
A2out makes clock/2Q easier to tune.
	Ted

From doshi@usl.novell.com Wed Feb  2 15:27:41 1994
Received: from usl.usl.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA05342; Wed, 2 Feb 94 15:27:37 -0500
Date: Wed, 2 Feb 94 15:26 EST
Message-Id: <9402021527.AA00784@summit.novell.com>
From: doshi@summit.novell.com
To: shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Received: from usl by summit.novell.com; Wed,  2 Feb 94 15:26 EST
Subject: clkclass: a perturbation
Content-Length: 1526
Content-Type: text/plain
Status: R


Ted:
Dennis:
	I have just started on the VM implementation of clkclass, and
immediately noticed one difference from the algorithm used in the
simulations, which may be significant. 

	In kernel, the free list of pages gets populated for two reasons:
pages get freed because of process terminations, and due to forced
freeing by the daemon. For a large percentage of pages that are freed for
either reason, the backing store identity can be (and is) preserved while
the pages are on the free list; the identity is destroyed only when a
page is redeployed to cache new data/text. Under moderate load, one can
expect that many pages will not lose identity between the time that they
are freed and the time that they are rediscovered during the process of
resolving a fault. Thus, while changes in page freeing directly influence
miss rate, a higher miss rate does not always suggest more disk reads. 
	Thus the metric of greater interest is the number of disk reads that
are forced by a particular paging policy, along with the basic metric of the
number of misses. If pages are forced to lose identity at the instant of
being freed, then disk reads and the number of misses are synonymous.

	However, there is typically much identity preservation and reuse
in a workload characterized by processes terminating and releasing pages.
In addition, clock algorithms tend to have a delay between the time that
pages are freed and when they are used to service misses. 

	I will proceed to track both performance statistics.

	-doshi-	

From doshi@usl.novell.com Sun Feb  6 01:35:03 1994
Received: from usl.usl.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA26880; Sun, 6 Feb 94 01:35:00 -0500
Date: Sun, 6 Feb 94 01:34 EST
Message-Id: <9402060134.AA03412@summit.novell.com>
From: doshi@summit.novell.com
To: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Cc: doshi@summit.novell.com
Received: from usl by summit.novell.com; Sun,  6 Feb 94 01:34 EST
Subject: re: 2Q
Content-Length: 357
Content-Type: text/plain
Status: R


Dennis:
	Sharing of pages, and deferred identity removal of pages,
require a slightly different semantic in converting the VM algorithm
to clkclass.
	I would like to describe this and see what you think. Would
you be coming here this Tuesday? (Or any other time this week?)
	If you were not planning to be here, we can talk on the phone.

Thanks!

-Doshi.

From ted@cis.ufl.edu Mon Feb  7 15:26:46 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA04274; Mon, 7 Feb 94 15:26:43 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id PAA02472; Mon, 7 Feb 1994 15:26:45 -0500
Date: Mon, 7 Feb 1994 15:26:45 -0500
Message-Id: <199402072026.PAA02472@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: 2Q in main memory caches
Status: R

Dennis,
I asked Madhu Gopinathan to try 2Q on main memory caching.
I just got this from him.
	Ted

>From madgop@cis.ufl.edu Mon Feb  7 15:24:36 1994
Received:  from sloop.cis.ufl.edu  by cis.ufl.edu (8.6.4/4.11)
        id PAA29095; Mon, 7 Feb 1994 15:24:34 -0500
From: "Madhu Gopinathan" <madgop@cis.ufl.edu>
Received:  from localhost  by sloop.cis.ufl.edu (8.6.4/4.11)
        id PAA06928; Mon, 7 Feb 1994 15:25:10 -0500
Date: Mon, 7 Feb 1994 15:25:10 -0500
Message-Id: <199402072025.PAA06928@sloop.cis.ufl.edu>
To: ted
Subject: TWO Q
Status: R


Dr. Johnson,
                It works. Two_Q outperformed LRU for the traces I tested 
(I used only 2 trace files). I couldn't find you in your office. I'll
conduct more tests today and will meet you tomorrow.


From ted@cis.ufl.edu Fri Feb 11 11:33:07 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23308; Fri, 11 Feb 94 11:33:03 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id LAA04969; Fri, 11 Feb 1994 11:33:01 -0500
Date: Fri, 11 Feb 1994 11:33:01 -0500
Message-Id: <199402111633.LAA04969@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: VLDB
Status: R

Dennis, the VLDB deadline is Feb. 23.

I found the Dewitt paper.  It seems to mostly say that you
should have a buffer hint when a scan starts and not cache those
pages.
	Ted

From ted@cis.ufl.edu Tue Feb 15 12:53:51 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA22956; Tue, 15 Feb 94 12:53:46 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id MAA06801; Tue, 15 Feb 1994 12:53:45 -0500
Date: Tue, 15 Feb 1994 12:53:45 -0500
Message-Id: <199402151753.MAA06801@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  anything for doshi?
Status: R

Dennis,
I have found a low/high percentage of 30/40 to work well in most
situations.  I've gotten delayed by preparing to resubmit the rejections,
so I never made an exhaustive study.

I'm pretty disappointed by the two rejections,
especially since I thought they were both very good pieces of work.
Also, a rejection due to a lack of unspecified citations
that might not exist is very annoying.
I've talked to some statisticians here, and I should be able to
include the requisite references in the next submission.

I did find that the clock/2Q with |A2OUT|=0 works pretty well,
and I can suggest some guidelines for that.
I'm happy to hear that Doshi made good progress with the implementation,
I'll set up some tuning experiments now.
	Ted

From ted@cis.ufl.edu Tue Feb 15 14:34:39 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA24464; Tue, 15 Feb 94 14:34:31 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id OAA06979; Tue, 15 Feb 1994 14:34:10 -0500
Date: Tue, 15 Feb 1994 14:34:10 -0500
Message-Id: <199402151934.OAA06979@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  anything for doshi?
Status: R

Dennis,
Yes, I got your note.
I can't make a fair comparison between 2Q and DBMIN in time for the
VLDB conference.  DBMIN incorporates a load control policy,
which is a big part of why DeWitt found it better than the
other algorithms (they didn't have load control).
In order for DBMIN to make sense, you need an elaborate driver which
specifies exactly what kind of operations are being executed,
and on what files.

Only 1 person wanted to see DBMIN, and that person really wanted
a comparison between 2Q and GCLOCK (=clock/history).
However, 2 people asked for more references.
I am going to write a background section that includes
some specific qualitative comparisons to DBMIN, and a discussion
of why 2Q does not preclude the use of buffer hints.
	Ted

From ted@cis.ufl.edu Wed Feb 16 13:47:13 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA03469; Wed, 16 Feb 94 13:47:11 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id NAA07949; Wed, 16 Feb 1994 13:46:36 -0500
Date: Wed, 16 Feb 1994 13:46:36 -0500
Message-Id: <199402161846.NAA07949@squall.cis.ufl.edu>
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis, Doshi,
I ran some tuning experiments last night.
I used as input the DB2 and the windowing trace files.
For the DB2 trace I used 5000, 10000, and 20000 buffers;
for the windowing trace I used 60, 90, and 120 buffers.
I set MAX_HIST (the upper bound of the history count) to 7.
I modified the history count by increment-on-reference/
decrement-on-no-reference.
I varied the upper and lower bounds on the number of AM buffers
in increments of 10%:
  lower bound	upper bound
	20	30-80
	30	40-80
	40	50-80
	50	60-90
	60	70-90

Since the two traces come from very different sources,
I'm hoping that the results are representative.
But, I did not use a free list, I didn't simulate multiprogramming,
these are only 2 traces, etc.
I hope it helps.


	DB2 trace

5000 buffers

    upper bound	30	40	50	60	70	80	90
lower bound	
20		.7668	.7678	.7672	.7674	.7678	.7682
30			.7705	.7712	.7709	.7719	.7704
40				.7697	.7698	.7700	.7700
50					.7691	.7691	.7687	.7653
60						.7691	.7660	.7640



10000 buffers

    upper bound	30	40	50	60	70	80	90
lower bound	
20		.8025	.8031	.8041	.8043	.8041	.8039
30			.8038	.8048	.8043	.8042	.8039
40				.8046	.8036	.8039	.8039
50					.8041	.8036	.8039	.8039
60						.8036	.8039	.8039


20000 buffers

    upper bound	30	40	50	60	70	80	90
lower bound	
20		.8269	.8271	.8272	.8272	.8272	.8272
30			.8272	.8272	.8272	.8272	.8272
40				.8272	.8272	.8272	.8272
50					.8272	.8272	.8272	.8272
60						.8272	.8272	.8272


	Window trace

60 buffers

    upper bound 30      40      50      60      70      80      90
lower bound     
20		.9758	.9758	.9758	.9759	.9761	.9760
30			.9762	.9761	.9760	.9758	.9757
40				.9759	.9761	.9756	.9739
50					.9750	.9748	.9734	.9702
60						.9739	.9702	.9664


90 buffers

    upper bound 30      40      50      60      70      80      90
lower bound     
20		.9880	.9881	.9880	.9880	.9881	.9882
30			.9881	.9879	.9880	.9881	.9882
40				.9880	.9882	.9883	.9882
50					.9879	.9878	.9881	.9870
60						.9878	.9875	.9860


120 buffers

    upper bound 30      40      50      60      70      80      90
lower bound     
20		.9933	.9933	.9932	.9933	.9933	.9934
30			.9933	.9932	.9933	.9933	.9934
40				.9932	.9933	.9933	.9934
50					.9932	.9934	.9933	.9933
60						


Conclusions:
1) the performance isn't too sensitive to the upper/lower settings
2) the best settings are 30%-40%, 30%-50%, 40%-50%, 40%-60%.

	Ted


From doshi@usl.Summit.Novell.COM Thu Feb 17 13:55:20 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA16202; Thu, 17 Feb 94 13:55:19 -0500
Received: from usl.summit.novell.com by cs.NYU.EDU (5.61/1.34)
	id AA01168; Thu, 17 Feb 94 13:55:14 -0500
Date: Thu, 17 Feb 94 13:54 EST
Message-Id: <9402171354.AA01870@summit.novell.com>
From: doshi@summit.novell.com
To: dennis@summit.novell.com
Received: from usl by summit.novell.com; Thu, 17 Feb 94 13:54 EST
Subject: preliminary results
Content-Length: 812
Content-Type: text/plain
Status: R



Dennis-
	I finished a few pilot measurements under the Gaede benchmark.
This is a system benchmark that consists of a mix of Unix system
commands comprising file copying, editing, text processing, date,
compilation, etc. 

	The bad news is that under memory stress, a slight loss
is observed with the clkclass policy. What seems to be happening
is this: the clkclass policy does retain more of the pages that
would otherwise be recycled. However, to make up for the lack of
these pages, the system swapper kicks in more often (about 50% more),
causing more swap transitions. This hurts performance and negates
the gains from the clkclass policy.

	This analysis is based on sar data. Next I will add in the
counters we had identified, and gather fresh observations to confirm
that the analysis is correct.

-doshi-

From ted@cis.ufl.edu Fri Feb 18 17:19:44 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA29680; Fri, 18 Feb 94 17:19:42 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id RAA09623; Fri, 18 Feb 1994 17:19:49 -0500
Date: Fri, 18 Feb 1994 17:19:49 -0500
Message-Id: <199402182219.RAA09623@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Dennis,
Status: RO

How is Doshi doing with clkclass, etc.?

I was thinking about his mail.
if clkclass works well, it should be better in tight memory situations:
Suppose you have X processes each with Yi pages, i=1 .. X,
and with miss rates of Mi, i=1 .. X when running under clock.
Since clkclass has a lower miss rate, you can run the X processes
with yi pages i=1 .. X and get the same miss rates Mi,
where yi<Yi.
So, clkclass should require less swapping, not more.

I think the problem is occurring because clkclass works by not freeing
the Am pages during the cleaning cycles.  You need to respond by 
cleaning all pages more frequently.
If the size of Am ranges between AMhi and AMlo,
then you have an average of (AMhi+AMlo)/2 AM pages in the system
(or at least that's my guess).  Half the time you clean Am pages, half
the time you don't.  So, the page cleaning daemon
should either scan (AMhi+AMlo)/(4M) more pages at a time,
or be invoked that much more often.
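Ted's back-of-envelope formula can be written out directly. The sketch below is only an illustration of the arithmetic in this message, treating AMhi and AMlo as absolute page counts and M as the total number of pages; the function name is invented and does not come from the simulator code.

```python
def extra_scan_fraction(am_hi, am_lo, total_pages):
    """Ted's estimate of how much more the page-cleaning daemon must
    scan when Am pages are exempt from cleaning half the time."""
    avg_am = (am_hi + am_lo) / 2       # average number of resident Am pages
    # half of the scans skip over Am pages, giving (AMhi + AMlo) / (4M)
    return avg_am / (2 * total_pages)
```

For example, with watermarks of 30 and 40 pages out of 100, the daemon would need to scan about 17.5% more pages per pass, or run that much more often.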

One cost of clkclass is that you have to do more work to find free pages.
One advantage of clk/2Q is that you don't have to scan nearly as much.
	Ted

From ted@cis.ufl.edu Wed Mar  9 18:43:57 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA07670; Wed, 9 Mar 94 18:43:50 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id SAA21303; Wed, 9 Mar 1994 18:43:49 -0500
Date: Wed, 9 Mar 1994 18:43:49 -0500
Message-Id: <199403092343.SAA21303@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

Dennis,
Here is the report I put together for Jim Gray.
What do you think?
I explicitly say why I'm sending the comparison,
and I discuss the new algorithms.
	Ted

Jim,
Dennis and I have submitted our 2Q paper to VLDB (it was rejected by SIGMOD).
We see that you are on the program committee.
In the VLDB paper, we do not include a comparison to any of the clock
algorithms -- we regard 2Q as a buffer management algorithm (where we
have time to manipulate buffer pointers) rather than a page management
algorithm (where we rely on reference bits).
	Ted: I would rather that we ask his advice on whether we should
	include this stuff in the VLDB paper.
	The difference is not that great, is it?
We do make a comparison to the implementation of GCLOCK described
by Nicola, Dan and Dias in the 1992 SIGMETRICS, which sets the history
count of the referenced item on every reference (we find that the
performance of GCLOCK is poor).
	Ted: please eliminate the parens above. Jim may have
	some emotional commitment to it.
In case you are wondering about the relative performance of the algorithms,
we put together a comparison table.

Algorithms:

   Buffer management algorithms

1) LRU
2) LRU/2
3) 2Q

   	Clock algorithms

4) Clock : Arrange the page frames in a ring.  On a hit, set the reference bit.
   On a miss, scan through the pages.  If the reference bit is set, reset it;
   else reclaim that page.
5) Clock+halving : same as clock, but if you find the reference bit set,
  reset it then set the history count to MAX.  If the reference bit isn't set,
  halve the history count.  If zero, reclaim that page.
6) Clock+history : same as clock, but if you find the reference bit set,
  increment the history count (up to MAX).  If the reference bit isn't set,
  decrement the history count.  If negative, reclaim that page.
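As a concrete illustration, the clock+history rule can be sketched as below. This is not the C simulator (clk2q.c, clkclass.c); it is a minimal Python sketch with invented names, assuming a fixed-size frame table.

```python
class ClockHistory:
    """Clock with a per-frame history count: on finding the reference
    bit set, clear it and increment the count (up to MAX); otherwise
    decrement, and reclaim the frame when the count is exhausted."""
    def __init__(self, nframes, max_hist=7):
        self.max_hist = max_hist
        self.frames = [None] * nframes   # page held by each frame
        self.ref = [0] * nframes         # reference bits
        self.hist = [0] * nframes        # history counts
        self.where = {}                  # page -> frame index
        self.hand = 0

    def access(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.where:
            self.ref[self.where[page]] = 1   # hit: set the reference bit
            return True
        f = self._reclaim()                  # miss: find a victim frame
        if self.frames[f] is not None:
            del self.where[self.frames[f]]
        self.frames[f] = page
        self.where[page] = f
        self.ref[f], self.hist[f] = 0, 0
        return False

    def _reclaim(self):
        while True:
            f, self.hand = self.hand, (self.hand + 1) % len(self.frames)
            if self.frames[f] is None:
                return f
            if self.ref[f]:                  # referenced: second chance
                self.ref[f] = 0
                self.hist[f] = min(self.hist[f] + 1, self.max_hist)
            elif self.hist[f] > 0:
                self.hist[f] -= 1
            else:
                return f                     # count exhausted: reclaim
```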

	New clock algorithms based on 2Q

7) Clock/class : Partition the pages into 3 classes: New, accessed once,
  accessed many.  The algorithm will reclaim accessed-many pages only
  if there are "too many" accessed-many pages.  This is controlled by
  high and low watermarks.  If the number of accessed-many pages hits
  the high watermark, start reclaiming them.  If the number of accessed-many
  pages hits the low watermark, stop reclaiming them.

  On a miss, bring in that page and mark it new.
  To find free frames, scan through the page frames as in a clock 
  algorithm.  The scanning daemon takes the following actions:
    1) if the page is marked "new", reset the reference bit, mark the
       page "accessed once".
    2) if the page is marked "accessed once", test the reference bit.
       if set, set the history count to 1, clear the reference bit, 
       mark the page "accessed many".  Else reclaim the page.
    3) If the page is marked "accessed many", test the reference bit.
	if set, clear the reference bit and increment the history count.
        else decrement the history count to a floor of zero.
        If the algorithm is reclaiming accessed-many pages and the
        history count was zero before decrementing, reclaim that page.
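The scanning-daemon actions above can be sketched as follows. This is an illustrative Python rendering only (the actual simulators were C); the names are invented, the miss path is omitted, and the watermarks are taken as fractions of the frame count.

```python
NEW, ONCE, MANY = 0, 1, 2   # page classes

class ClockClass:
    """Clock/class scanning daemon: accessed-many frames are reclaimed
    only while 'reclaiming' is on, toggled by high/low watermarks."""
    def __init__(self, nframes, lo=0.3, hi=0.4, max_hist=5):
        self.n, self.max_hist = nframes, max_hist
        self.lo, self.hi = int(lo * nframes), int(hi * nframes)
        self.page = [None] * nframes
        self.cls = [NEW] * nframes
        self.ref = [0] * nframes
        self.hist = [0] * nframes
        self.many = 0                  # current count of accessed-many frames
        self.reclaiming = False
        self.hand = 0

    def _step(self):
        """One clock-hand step; returns a frame index if one was freed."""
        f, self.hand = self.hand, (self.hand + 1) % self.n
        if self.page[f] is None:
            return f
        if self.cls[f] == NEW:                    # 1) new -> accessed once
            self.ref[f] = 0
            self.cls[f] = ONCE
        elif self.cls[f] == ONCE:                 # 2) referenced? promote
            if self.ref[f]:
                self.ref[f], self.hist[f], self.cls[f] = 0, 1, MANY
                self.many += 1
                if self.many >= self.hi:          # hit high watermark
                    self.reclaiming = True
            else:
                return self._free(f)              # else reclaim
        else:                                     # 3) accessed many
            if self.ref[f]:
                self.ref[f] = 0
                self.hist[f] = min(self.hist[f] + 1, self.max_hist)
            else:
                was_zero = self.hist[f] == 0
                self.hist[f] = max(self.hist[f] - 1, 0)
                if self.reclaiming and was_zero:
                    self.many -= 1
                    if self.many <= self.lo:      # hit low watermark
                        self.reclaiming = False
                    return self._free(f)
        return None

    def _free(self, f):
        self.page[f] = None
        return f
```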

8) Clock/2Q: Partition the pages into 3 classes: New (A1), accessed once (A2),
  accessed many (Am).  Instead of putting all the pages in the
  same list, use a different list for A1 pages, A2 pages, and Am pages.
  Manage each list as a FIFO queue.

  On a miss:
   1) Push the new page into the tail of the A1 queue, pop page P1
      from the head of the A1 queue.
   2) clear the reference bit of P1.  Push P1 onto the tail of A2,
      pop P2 from the head of A2.
      The tail of A2 (A2in) stores actual pages, the head stores page
      references (A2out).
   3) Examine the reference bit of P2.  If not set, reclaim that page.
      Else, reset the reference bit, set the history count to 1,
      and push it into the tail of Am.  Scan through Am (popping from
      the head and pushing into the tail) using the clock/history
      algorithm until you find a free page.
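The miss path above might look like the following. Again a Python sketch with invented names, not the C simulator: A2out (the half of A2 that holds only page references) and fixed queue sizing are omitted for brevity, so the queues must be pre-populated to mimic a full buffer.

```python
from collections import deque

class Clock2Q:
    """Clock/2Q miss path: pages flow A1 -> A2 -> Am through FIFO
    queues; Am is scanned with the clock/history rule."""
    def __init__(self, max_hist=7):
        self.a1, self.a2, self.am = deque(), deque(), deque()
        self.ref, self.hist = {}, {}
        self.max_hist = max_hist

    def hit(self, page):
        self.ref[page] = 1                 # hardware sets the reference bit

    def miss(self, page):
        """Bring in `page`; return the page reclaimed to make room."""
        self.ref[page] = 0
        self.a1.append(page)               # 1) push new page onto A1,
        p1 = self.a1.popleft()             #    pop P1 from its head
        self.ref[p1] = 0
        self.a2.append(p1)                 # 2) clear P1's bit, move to A2,
        p2 = self.a2.popleft()             #    pop P2 from the head of A2
        if not self.ref[p2]:
            return self._drop(p2)          # 3) unreferenced: reclaim P2
        self.ref[p2], self.hist[p2] = 0, 1
        self.am.append(p2)                 #    else promote P2 into Am and
        while True:                        #    scan Am with clock/history
            q = self.am.popleft()
            if self.ref[q]:
                self.ref[q] = 0
                self.hist[q] = min(self.hist[q] + 1, self.max_hist)
                self.am.append(q)
            elif self.hist[q] > 0:
                self.hist[q] -= 1
                self.am.append(q)
            else:
                return self._drop(q)       # history exhausted: reclaim

    def _drop(self, page):
        self.ref.pop(page, None)
        self.hist.pop(page, None)
        return page
```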

   
Parameters:
LRU/2, 2Q : these results are from the VLDB paper.
clock/halve : we found that setting MAX=2 gave the best performance,
 but there is little sensitivity to the parameter.
clock/history : MAX=7
clock/class : MAX=5, the watermarks are 30% and 40% of the total number
 of pages.
clock/2Q : MAX=7, A1 has 1/3 of the frames, A2in has 1% of the frames,
   A2out has space to store references to pages in 50% of the frames.

	Ted: say that the numbers are the hit ratio

Windowing trace:
buffers         clk2q   clkclass hlv:2  hst:7   clock	LRU	LRU/2	2Q
20              .8159   .7988   .7664   .7761   .7667	.772	.809	.832
40              .9468   .9494   .9394   .9457   .9378	.946	.946	.946
60              .9765   .9761   .9728   .9753   .9726
80              .9862   .9855   .9829   .9851   .9826
100             .9903   .9896   .9883   .9894   .9882
120             .9939   .9934   .9926   .9931   .9925

DB2 trace:
buffers         clk2q   clkclass hlv:2  hst:7   clock	LRU	LRU/2	2Q
500             .6346   .6200    .6032  .6117   .5990	.606	.610	.624
1000            .6859   .6709    .6525  .6589   .6490	.654	.665	.686
2000            .7271   .7170    .7015  .7072   .6984	.704	.712	.729
5000            .7777   .7705    .7589  .7634   .7564	.762	.768	.778
10000           .8056   .8038   .7959   .7996   .7949	.801	.802	.801
20000           .8280   .8272   .8233   .8250   .8231


Scans (same parameters as in the VLDB paper):
average scan length=2000
buffers         clock   clkhst  clkcls  clk2q	LRU	LRU/2	2Q
1000            .2621   .2674   .3069   .3339
5000            .3726   .3827   .4131   .4387	.3826	.4448	.4673
10000           .4442   .4546   .4763   .4954	.4526	.5017	.5149
20000           .5359   .5449   .5555   .5656


	Ted: Great. Give a qualitative conclusion.

From shasha@SHASHA.CS.NYU.EDU Thu Mar 10 18:25:31 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17404; Thu, 10 Mar 94 18:25:30 -0500
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA20122; Thu, 10 Mar 94 18:25:29 -0500
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17400; Thu, 10 Mar 94 18:25:26 -0500
Date: Thu, 10 Mar 94 18:25:26 -0500
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9403102325.AA17400@SHASHA.CS.NYU.EDU>
To: ted@cis.ufl.edu
Subject: jim gray
Cc: shasha@cs.NYU.EDU
Status: R

Dear Ted,
Looks good.
I have modified only a tiny bit for the sake of tone.
Dennis

------

Jim,
Dennis and I have submitted our 2Q paper to VLDB (it was rejected by SIGMOD).
In the submission, we have not included a comparison to any of the clock
algorithms.
We do make a comparison to the implementation of GCLOCK described
by Nicola, Dan and Dias in the 1992 SIGMETRICS, which sets the history
count of the referenced item on every reference.
In case you are wondering about the relative performance of the algorithms,
we put together a comparison table.
Do you think that we should add a comparison to the (general) clock algorithms
to the VLDB paper?

Algorithms:

   Buffer management algorithms

1) LRU
2) LRU/2
3) 2Q

   	Clock algorithms ("general" clock algorithms)

4) Clock : Arrange the page frames in a ring.  On a hit, set the reference bit.
   On a miss, scan through the pages.  If the reference bit is set, reset it;
   else reclaim that page.
5) Clock+halving : same as clock, but if you find the reference bit set,
  reset it then set the history count to MAX.  If the reference bit isn't set,
  halve the history count.  If zero, reclaim that page.
6) Clock+history : same as clock, but if you find the reference bit set,
  increment the history count (up to MAX).  If the reference bit isn't set,
  decrement the history count.  If negative, reclaim that page.

	New clock algorithms based on 2Q

7) Clock/class : Partition the pages into 3 classes: New, accessed once,
  accessed many.  The algorithm will reclaim accessed-many pages only
  if there are "too many" accessed-many pages.  This is controlled by
  high and low watermarks.  If the number of accessed-many pages hits
  the high watermark, start reclaiming them.  If the number of accessed-many
  pages hits the low watermark, stop reclaiming them.

  On a miss, bring in that page and mark it new.
  To find free frames, scan through the page frames as in a clock 
  algorithm.  The scanning daemon takes the following actions:
    1) if the page is marked "new", reset the reference bit, mark the
       page "accessed once".
    2) if the page is marked "accessed once", test the reference bit.
       if set, set the history count to 1, clear the reference bit, 
       mark the page "accessed many".  Else reclaim the page.
    3) If the page is marked "accessed many", test the reference bit.
	if set, clear the reference bit and increment the history count.
        else decrement the history count to a floor of zero.
        If the algorithm is reclaiming accessed-many pages and the
        history count was zero before decrementing, reclaim that page.

8) Clock/2Q: Partition the pages into 3 classes: New (A1), accessed once (A2),
  accessed many (Am).  Instead of putting all the pages in the
  same list, use a different list for A1 pages, A2 pages, and Am pages.
  Manage each list as a FIFO queue.

  On a miss:
   1) Push the new page into the tail of the A1 queue, pop page P1
      from the head of the A1 queue.
   2) clear the reference bit of P1.  Push P1 onto the tail of A2,
      pop P2 from the head of A2.
      The tail of A2 (A2in) stores actual pages, the head stores page
      references (A2out).
   3) Examine the reference bit of P2.  If not set, reclaim that page.
      Else, reset the reference bit, set the history count to 1,
      and push it into the tail of Am.  Scan through Am (popping from
      the head and pushing into the tail) using the clock/history
      algorithm until you find a free page.

   
Parameters:
LRU/2, 2Q : these results are from the VLDB paper.
clock/halve : we found that setting MAX=2 gave the best performance,
 but there is little sensitivity to the parameter.
clock/history : MAX=7
clock/class : MAX=5, the watermarks are 30% and 40% of the total number
 of pages.
clock/2Q : MAX=7, A1 has 1/3 of the frames, A2in has 1% of the frames,
   A2out has space to store references to pages in 50% of the frames.

The tables show the hit rates of the algorithms under different inputs
and with varying numbers of buffers.

Windowing trace:
buffers         clk2q   clkclass hlv:2  hst:7   clock	LRU	LRU/2	2Q
20              .8159   .7988   .7664   .7761   .7667	.772	.809	.832
40              .9468   .9494   .9394   .9457   .9378	.946	.946	.946
60              .9765   .9761   .9728   .9753   .9726
80              .9862   .9855   .9829   .9851   .9826
100             .9903   .9896   .9883   .9894   .9882
120             .9939   .9934   .9926   .9931   .9925

DB2 trace:
buffers         clk2q   clkclass hlv:2  hst:7   clock	LRU	LRU/2	2Q
500             .6346   .6200    .6032  .6117   .5990	.606	.610	.624
1000            .6859   .6709    .6525  .6589   .6490	.654	.665	.686
2000            .7271   .7170    .7015  .7072   .6984	.704	.712	.729
5000            .7777   .7705    .7589  .7634   .7564	.762	.768	.778
10000           .8056   .8038   .7959   .7996   .7949	.801	.802	.801
20000           .8280   .8272   .8233   .8250   .8231


Scans (same parameters as in the VLDB paper):
average scan length=2000
buffers         clock   clkhst  clkcls  clk2q	LRU	LRU/2	2Q
1000            .2621   .2674   .3069   .3339
5000            .3726   .3827   .4131   .4387	.3826	.4448	.4673
10000           .4442   .4546   .4763   .4954	.4526	.5017	.5149
20000           .5359   .5449   .5555   .5656


We find that of the general clock algorithms, clock+history
can usually beat LRU by a small margin, but the others are not as good
as LRU.  However, these algorithms are usually not as good as LRU/2
or 2Q.  The clock algorithms that mimic 2Q (clock/class and clock/2Q)
are competitive with and sometimes superior to LRU/2 and 2Q.

A couple of months ago you commented that an advantage of the clock algorithm
is that you can measure the memory need of the system by measuring
the clock velocity.  You asked:

>> So, here are some interesting questions:
>>  Is 2Q better than clock?
	Yes, both 2Q and the clock-style implementations of 2Q are better
	than the general clock algorithms.
>>  What is the analog of the clock velocity to decide memory pressure 
>>   and drive automatic buffer growth?
	For 2Q, you would need to use a page fault frequency mechanism.
	For clock/2Q or clock/class, you have a clock pointer, and you
	can use it.




From ted@cis.ufl.edu Fri Mar 11 15:12:39 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA21332; Fri, 11 Mar 94 15:12:37 -0500
From: ted@cis.ufl.edu
Received:  from localhost  by squall.cis.ufl.edu (8.6.4/4.11)
	id PAA22233; Fri, 11 Mar 1994 15:12:34 -0500
Date: Fri, 11 Mar 1994 15:12:34 -0500
Message-Id: <199403112012.PAA22233@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Gray's response
Status: R


>From jimgray@sfbay.ENET.dec.com Thu Mar 10 20:13:53 1994
Received:  from mts-gw.pa.dec.com  by cis.ufl.edu (8.6.4/4.11)
        id UAA09011; Thu, 10 Mar 1994 20:13:50 -0500
Received: by mts-gw.pa.dec.com (5.65/13Jan94)
        id AA20264; Thu, 10 Mar 94 17:13:11 -0800
Message-Id: <9403110113.AA20264@mts-gw.pa.dec.com>
Received: from sfbay.enet; by decpa.enet; Thu, 10 Mar 94 17:13:12 PST
Date: Thu, 10 Mar 94 17:13:12 PST
From: "Jim Gray, Digital dtn:542-3955. 455 Market, 7th fl. SF Ca 94133 tel: 415-882-3955, fax:..x3991  10-Mar-1994 1711" <jimgray@sfbay.ENET.dec.com>
To: ted@cis.ufl.edu
Cc: carr_richard@tandem.com
Apparently-To: carr_richard@tandem.com, ted@cis.ufl.edu
Subject: clocks and 2Q
Status: RO

ted:
 Great!!
 Yes, comparing your results to general clock and WSclock IS
     useful, especially since you show yours is better on some
     "real" workloads.
Jim


---Ted

From ted@cis.ufl.edu Fri Mar 25 12:29:44 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA10969; Fri, 25 Mar 94 12:29:42 -0500
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id MAA00500; Fri, 25 Mar 1994 12:10:40 -0500
Date: Fri, 25 Mar 1994 12:10:40 -0500
Message-Id: <199403251710.MAA00500@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: 2Q for memory caches
Status: R

Dennis,
If you remember, I asked a student to look at applying 2Q to multiprocessor
caches.  He has generated some results, and they aren't that great.
Using an 8-way associative cache, he can get a small improvement
for the right set of parameters, but sometimes the performance is worse.

There is a new faculty member here, Jih-Kwon Peir, who does a lot of work
in cache design.  He has built a facility to generate traces from an
executing program and use it to evaluate a cache design.
I talked to him about the 2Q-cache idea.
He thinks that LRU is very hard to beat, but that it might be possible
to show a good result.

The advantage of working with Peir is that he can generate good traces
(the student is using a standard set generated by ATUM,
which might not be good for anything other than teaching).
Also, he knows a lot about cache design.
If I pursue this project with him, both he and his postdoc will
(naturally) expect to have their names on any papers.
OTOH, if I don't work with him, it's not likely that I can make the
idea work.

How do you feel about it?  

Also, any luck with Doshi?

	Ted

From shasha@SHASHA.CS.NYU.EDU Tue Apr  5 14:41:54 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA25096; Tue, 5 Apr 94 14:41:53 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA19293; Tue, 5 Apr 94 14:41:51 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA25092; Tue, 5 Apr 94 14:41:50 -0400
Date: Tue, 5 Apr 94 14:41:50 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9404051841.AA25092@SHASHA.CS.NYU.EDU>
To: ted@squall.cis.ufl.edu
Subject: unexpected competition
Cc: shasha@cs.NYU.EDU
Status: R

Dear Ted,
There is an article in the March 1994 Computer p. 43
that looks a lot like 2Q.
It works like this: it has a probationary segment and a protected segment.

A page, when faulted in, enters the most-recently-used side of the
probationary segment.
If never accessed again, it is discarded.
If hit, it is promoted to the most-recently-accessed side
of the protected segment.
If never accessed again, it is demoted to the probationary segment
(most-recently-accessed part).

There is no history of old pages, but otherwise it is 
uncomfortably close.

Here is the algorithm in detail:

newly faulted page goes to front of probationary queue (A1in)

on hit:
  page goes to front of protected queue (AM)

on need for space in AM:
  move page at back of AM to front of A1

on need for space in A1in:
   discard page.
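The steps above are easy to simulate. Here is a minimal Python sketch, with invented names and illustrative segment sizes (the article's actual parameters are not given in this note):

```python
from collections import OrderedDict

class SegmentedLRU:
    """Probationary/protected scheme as described above: faults enter
    the MRU end of probation; a hit promotes the page to the MRU end
    of the protected segment; a page pushed off the protected segment
    re-enters probation; pages falling off probation are discarded."""
    def __init__(self, prob_size, prot_size):
        self.prob_size, self.prot_size = prob_size, prot_size
        self.prob = OrderedDict()   # LRU order: first item = oldest
        self.prot = OrderedDict()

    def access(self, page):
        """Return True on a hit, False on a fault."""
        if page in self.prot:               # hit in protected: refresh
            self.prot.move_to_end(page)
            return True
        if page in self.prob:               # hit in probation: promote
            del self.prob[page]
            self.prot[page] = None
            if len(self.prot) > self.prot_size:
                demoted, _ = self.prot.popitem(last=False)
                self._into_prob(demoted)    # demote back to probation
            return True
        self._into_prob(page)               # fault: enter probation
        return False

    def _into_prob(self, page):
        self.prob[page] = None              # MRU end of probation
        if len(self.prob) > self.prob_size:
            self.prob.popitem(last=False)   # discard the oldest
```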

Do you think you could simulate this?
Thanks,
Dennis


From shasha@SHASHA.CS.NYU.EDU Sat Apr 16 10:57:04 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA16278; Sat, 16 Apr 94 10:57:03 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA16277; Sat, 16 Apr 94 10:57:02 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA16274; Sat, 16 Apr 94 10:56:53 -0400
Date: Sat, 16 Apr 94 10:56:53 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9404161456.AA16274@SHASHA.CS.NYU.EDU>
To: rjtm@almaden.ibm.com
Subject: page traces from some ibm system
Cc: mess@svpal.org, shasha@cs.NYU.EDU, ted@squall.cis.ufl.edu
Status: R

Dear Dr. Morris,

Ted Johnson and I have been working on new paging algorithms
based on the LRU/k algorithm, but with constant time complexity.
We have such an algorithm called 2Q that seems to perform well
on traces that Ted Messinger had provided before he retired.
At one point, he offered us more data, but we didn't have disk space
at the time.
Now we have the space, but Ted has retired.
He thinks you might have traces or be able to get the ones
he has.
I enclose his email on that score below.
If you could get us page traces from a system, we would appreciate
it very much.
We think our algorithm is good and would like further evidence
of its virtues so industrial types might think about using it.

Many thanks,
Dennis
>From mess@svpal.org Sat Apr 16 10:39:16 1994
Received: from svpal.mvhs.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA16208; Sat, 16 Apr 94 10:39:15 -0400
Received: by svpal.org (Smail3.1.28.1 #4)
	id m0psBV5-00020WC; Sat, 16 Apr 94 07:37 PDT
Date: Sat, 16 Apr 1994 07:37:39 -0700 (PDT)
From: Ted Messinger <mess@svpal.org>
Subject: Re: Traces
To: Dennis Shasha <shasha@SHASHA.CS.NYU.EDU>
In-Reply-To: <9404140913.AA02563@SHASHA.CS.NYU.EDU>
Message-Id: <Pine.3.88.9404160745.A26298-0100000@svpal.svpal.org>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: R

Dennis,

I should have included Robert's ID. It's RJTM at almaden.ibm.com.
He knows where the tapes are. The real problem is going to be finding
a person who knows how to get the data off the tapes. To be honest
with you, I don't think anyone will be able to do it. It requires
knowledge of the collected data (the tapes contain more than the 
buffer manager data), my 'preprocessing' program, and the MVS system.
I don't believe that knowledge exists at Almaden.

Cheers,
Ted


From rjtm@almaden.ibm.com Tue Apr 19 19:19:50 1994
Received: from ALMADEN.IBM.COM by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA08894; Tue, 19 Apr 94 19:19:49 -0400
Received: from ALMADEN by almaden.ibm.com (IBM VM SMTP V2R2) with BSMTP id 8526;
   Tue, 19 Apr 94 16:17:53 PDT
Received: by ALMADEN (XAGENTA 3.0) id 1670; Tue, 19 Apr 1994 16:17:52 -0700
Received: by rjtm.almaden.ibm.com (AIX 3.2/UCB 5.64/4.03)
          id AA19619; Tue, 19 Apr 1994 16:19:32 -0700
Date: Tue, 19 Apr 1994 16:19:32 -0700
From: <rjtm@almaden.ibm.com> (Robert J. T. Morris)
Message-Id: <9404192319.AA19619@rjtm.almaden.ibm.com>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  page traces from some ibm system
Cc: mess@svpal.org
Status: RO


Dennis,
I think that Ted is right: although we have the
data on tapes here, and would be willing to give it to you,
there is no one with Ted's mainframe expertise here now
at Almaden who would be able to process it and get it to
you. I would not want to send you the raw tapes; we have
to be somewhat careful that we know what we are sending you
and are not violating our agreement with the customer sites
where we collected it.

The only ideas I have are the following:
1) Ted might be willing to collaborate with you on
a "retirement research project".
2) We have been trying to get Ted back with us on
a part time basis (because of his rare skills), but
various regulations prevent us from doing that before
year end. Even at that time, Ted may or may not want
to do it.

Sorry I can't be more helpful. I've seen a paper on LRU/n,
and it looks interesting and sensible. Good luck.

Robert



From ted@cis.ufl.edu Mon Apr 25 11:45:55 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA09589; Mon, 25 Apr 94 11:45:53 -0400
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id LAA11598; Mon, 25 Apr 1994 11:45:49 -0400
Date: Mon, 25 Apr 1994 11:45:49 -0400
Message-Id: <199404251545.LAA11598@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  vldb
Status: R

OK, I should be able to make the new experiments by the end of next week.
We'll see about Bates.
	Ted

Weikum + Tandem second chance.

From ted@cis.ufl.edu Wed May  4 13:11:56 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA14499; Wed, 4 May 94 13:07:29 -0400
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id NAA03911; Wed, 4 May 1994 13:07:31 -0400
Date: Wed, 4 May 1994 13:07:31 -0400
Message-Id: <199405041707.NAA03911@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Weikum's trace
Status: R

Dennis,
I just finished running & tabulating the results from Weikum's file.
The results are the same as for the other traces.

\begin{table}
\begin{center}
\begin{tabular}{|c|cccccc|}
\hline
Number of page slots &  LRU/2 & LRU & Gclock & 2nd Chance & 2Q  & 2Q \\
                     &        &     &        &   & Kin=30\% & Kin=20\% \\ \hline
100     & .086  & .083  & .083  & .083  & .096  & .090 \\ \hline
200     & .164  & .144  & .144  & .141  & .196  & .181 \\ \hline
500     & .284  & .234  & .236  & .223  & .334  & .329 \\ \hline
1000    & .384  & .328  & .327  & .318  & .405  & .405 \\ \hline
2000    & .454  & .425  & .425  & .418  & .465  & .464 \\ \hline
5000    & .544  & .537  & .538  & .532  & .556  & .557 \\ \hline
10000   & .616  & .607  & .607  & .602  & .626  & .624 \\ \hline
20000   & .678  & .671  & .671  & .665  & .681  & .680 \\ \hline
\end{tabular}
\caption{OLTP trace hit rate comparison of LRU/2, LRU, Gclock, Second Chance, and 2Q.}
\label{oltp.tab}
\end{center}
\end{table}

I need to run 2nd chance on the other 2 traces, but that is easy
and I don't expect any surprises.
I got the formatting instructions from VLDB;
with any luck I can have a draft done by Friday.
	Ted

From ted@cis.ufl.edu Mon May  9 12:12:52 1994
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA17557; Mon, 9 May 94 12:12:48 -0400
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id MAA21592; Mon, 9 May 1994 12:12:38 -0400
Date: Mon, 9 May 1994 12:12:38 -0400
Message-Id: <199405091612.MAA21592@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  vldb paper question
Status: R

Dennis,
I'll try to contact Gerhard Weikum to get an explanation.
I'm getting higher hit rates than they did (for both LRU and LRU/2), so
I don't know how to explain the differences.
I note that for the 2-pool experiment, I get exactly the
same hit rates.
	Ted

From shasha@SHASHA.CS.NYU.EDU Tue May 17 20:46:13 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA03234; Tue, 17 May 94 20:46:12 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA01337; Tue, 17 May 94 20:45:57 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA03225; Tue, 17 May 94 20:45:45 -0400
Date: Tue, 17 May 94 20:45:45 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9405180045.AA03225@SHASHA.CS.NYU.EDU>
To: bga@sybase.com
Subject: Re:  2Q paper
Cc: shasha@cs.NYU.EDU
Status: R

		Dear Brian,
		Thanks for looking at the paper.
		I think we must not have made the algorithm clear enough.
		Dennis

>From bga@sybase.com Tue May 17 14:36:34 1994

Hi,
	I finally sat down and read your 2Q paper.  It was very interesting and
	I'm thinking of possibly trying it out in our product.  I have some
	questions and comments about the paper.

	*)  Comment:  I found the explanation of the algorithm a bit confusing.
	The pseudo-code helped, but I think a picture of the three queues and
	how things are moving around would improve the readability a bit.

		You are right. We do that in the transparencies.

	*)  Question:  Do you have a patent on this?

		No. Public domain algorithm. To appear in VLDB 94 in fact.

	*)  Question:  I was reading the March '94 issue of Computer and in
	there is an algorithm called SLRU (segmented least recently used).  Can
	you comment on the differences between your algorithm and this one?  They
	seem similar; you might want to discuss it in your paper.

		We thought about discussing the differences, but
		I think we forgot to.
		Basic difference: they don't remember stuff that
		has been kicked out.
		Also, they give no numbers, so it is hard to evaluate.
		It would be reasonable to try it out, but we think
		it would perform like our simplified version.

	*)  Question:  you talk about solving the correlated reference problem
	by not promoting items in the A1in queue.  What seems strange to me is
	that there is no correlation in the Am queue:  if a buffer page makes it
	to that queue then it is always reset to the head of the queue.  What
	is the reason for the lack of correlation?

		By the time a page makes it to the Am queue,
		it is truly popular (it has long-term popularity
		as opposed to the popularity of a page that has been
		hit a lot by a single process doing, say, a scan).
		So, we don't think we need correlation.
		Our tests with traces bear this out.

	*)  Question:  You do not promote items in the A1in queue in order to
	detect correlation.  However, there seems to be some strange behavior of
	this replacement strategy in a simple nested loop join.   I have a page
	for relation R with tuples that I am using to scan pages of relation
	Q.  It seems that the access to the R page would be correlated since the
	page is in A1in.  The Q pages are being faulted in and are moving the R
	page to the tail of the A1in queue.  Say that the buffer is full, so 
	eventually the R page makes it to the end of A1in:  R page is ejected
	and a reference to it is put on A1out.  The next access to R, faults the
	page in and moves the page to the Am queue, where it is not correlated
	anymore.  The question is:  why was the R page faulted out at all?  This
	is a very common case where 2Q would eject a page that I need.  Am I
	missing something?

		You are right that we fault the R page out unnecessarily
		in this case. But we do this only once. If the page has
		k tuples (and so is accessed k times), we take two faults
		instead of one. The algorithm does the right thing in
		that it gives preference to Q pages.

The last two items relate to a worry I have with 2Q.  The correlation in
A1in and the lack of correlation in Am seem to create a situation where
a heavily accessed page is in A1in, but gets ejected because of the large
correlation window, while the lightly accessed page languishes in memory
because it is sparsely accessed over time and has no correlation.  I
understand that all of these algorithms are heuristics and one can
always construct cases where the strategy performs poorly.  Basically if
you can answer my last question,  I'll feel a lot better about 2Q.

		But if the heavily accessed page has long-term popularity
		it will end up in Am. We are just trying to avoid
		flashes in the pan.

-- 
Brian G. Anderson           email:  brian.anderson@sybase.com
Sybase, Inc.                voice:  (510) 922-0937
2000 Powell St., 5th Floor     fax:  (510) 922-5388
Emeryville, CA 94608


		Best,
		Dennis

From doshi@usl.novell.com Sun May 22 19:06:23 1994
Received: from usl.summit.novell.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA01195; Sun, 22 May 94 19:06:17 -0400
Date: Sun, 22 May 94 19:06 EDT
Message-Id: <9405221907.AA17453@summit.novell.com>
From: doshi@summit.novell.com
To: doshi@summit.novell.com, shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Received: from usl by summit.novell.com; Sun, 22 May 94 19:06 EDT
Subject: Re: which date is better for you?
Content-Length: 224
Content-Type: text/plain
Status: R


Dear Dennis-
	Just completed Linpack runs with Gaede as background. There is
an 8% improvement with clkcls+hd over base+hd. I have yet to do one more
run, which would be base, to see what the difference will be.

	-doshi-


From shasha@SHASHA.CS.NYU.EDU Tue Jun 28 17:00:23 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28279; Tue, 28 Jun 94 17:00:22 -0400
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA01792; Tue, 28 Jun 94 17:00:20 -0400
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA28273; Tue, 28 Jun 94 17:00:14 -0400
Date: Tue, 28 Jun 94 17:00:14 -0400
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9406282100.AA28273@SHASHA.CS.NYU.EDU>
To: shasha@cs.NYU.EDU, ted@squall.cis.ufl.edu
Subject: things to do
Status: R


Dear Ted,

Publication goal: TOCS article showing 2Q from caching
to OS to commercial database to tertiary storage.

Tertiary storage: big question is browsing patterns
and how to use them.

Possible techniques include:
2Q per file
file as unit of access, but then the sizes are different

In general, how to use browsing patterns in coordination with
2Q.

Assumption is that a platter is the biggest unit for a file. 


In the main queue, kick out files as a function of size.
That is, make it more likely to kick out a big file than
a smaller one.

A1in gets replaced by minimum residency requirement.

Dennis



From ted@cis.ufl.edu Tue Jul  5 16:38:21 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA18085; Tue, 5 Jul 94 16:38:15 -0400
Received: from squall.cis.ufl.edu by cs.NYU.EDU (5.61/1.34)
	id AA27303; Tue, 5 Jul 94 16:38:14 -0400
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id QAA24327; Tue, 5 Jul 1994 16:38:03 -0400
Date: Tue, 5 Jul 1994 16:38:03 -0400
Message-Id: <199407052038.QAA24327@squall.cis.ufl.edu>
To: doshi@usl.Summit.Novell.COM, shasha@cs.NYU.EDU
Status: R

Dennis, Doshi

WRT Craig's comments, I agree with Doshi:
it seems to me that the algorithm looks for repeated use, etc.,
but it is very expensive to run.

I promised a more algorithmic description of clock/2Q,
so here it is.  Also, I can send the simulator:

1) on a page miss
  a) get a frame from the free frame pool
  b) load the page onto the frame, put on the head of A1.

2) when the cleaning daemon is called,
  a) let M be the number of misses since the last time
     the page daemon was called.
  b) take M pages from the tail of A1, reset their reference bits,
     put them on the head of A2in.
  c) take M pages from the tail of A2in.
     for each of these pages,
       i) if the reference bit is set, put the page in Am.
       ii) else, put the page in the free frame pool.
  d) let L be the number of pages added to Am in step c).
     clean L pages from Am using the usual clock algorithm.

extensions:
  1) half-modify: put them in Am (?).
  2) using tags:  If a page is cleaned from A2in,
    push it on the head of A2out.  If you miss on a page whose
    tag is in A2out, put it in Am directly.
    Note: we can manage A2out as a guess.
    Manage A2out as a FIFO.  When there is a hit, there is no need
    to remove the tag from A2out.
    When you put a page
    on A2out, mark its VM entry.  When you remove the
    tag from A2out, unmark the entry.
    This is how I ran the simulations.
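The numbered steps above can be sketched as an executable simulation. The following Python is illustrative only: the A2in quota (`a2in_target`) and the rule of draining A2in only past that quota are assumptions added to handle cold start (the email assumes steady state, with A2in sized at roughly 40% of memory), and the A2out tag extension is omitted.

```python
from collections import deque

class ClockTwoQ:
    """Sketch of the clock/2Q cleaning-daemon scheme described above."""

    def __init__(self, free_frames, a2in_target):
        self.free = free_frames      # frames in the free frame pool
        self.a1 = deque()            # head at the left, tail at the right
        self.a2in = deque()
        self.am = deque()            # clock list; the hand sits at the tail
        self.ref = {}                # resident pages -> reference bit
        self.a2in_target = a2in_target
        self.misses = 0

    def access(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.ref:
            self.ref[page] = True    # set the reference bit
            return True
        assert self.free > 0, "run clean() before the free pool empties"
        self.free -= 1               # 1a) take a frame from the free pool
        self.misses += 1
        self.ref[page] = True
        self.a1.appendleft(page)     # 1b) load at the head of A1
        return False

    def clean(self):
        m, self.misses = self.misses, 0
        # 2b) move M pages from the tail of A1 to A2in, clearing ref bits
        for _ in range(min(m, len(self.a1))):
            page = self.a1.pop()
            self.ref[page] = False
            self.a2in.appendleft(page)
        # 2c) examine pages at the tail of A2in (only past the quota here)
        promoted = 0
        for _ in range(min(m, max(0, len(self.a2in) - self.a2in_target))):
            page = self.a2in.pop()
            if self.ref[page]:       # referenced while in A2in: promote
                self.am.appendleft(page)
                promoted += 1
            else:                    # untouched: return the frame
                del self.ref[page]
                self.free += 1
        # 2d) clean as many pages from Am with the usual clock sweep
        freed = 0
        while freed < promoted and self.am:
            page = self.am.pop()     # page under the hand
            if self.ref[page]:
                self.ref[page] = False       # second chance
                self.am.appendleft(page)
            else:
                del self.ref[page]
                self.free += 1
                freed += 1
```

A page touched while sitting in A2in is promoted to Am by the next daemon pass; an untouched page gives its frame back to the free pool.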

From doshi@usl.Summit.Novell.COM Mon Aug  1 10:26:21 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA08916; Mon, 1 Aug 94 10:26:21 -0400
Received: from usl.summit.novell.com by cs.NYU.EDU (5.61/1.34)
	id AA21858; Mon, 1 Aug 94 10:26:14 -0400
Date: Mon, 1 Aug 94 10:24 EDT
Message-Id: <9408011024.AA04742@summit.novell.com>
From: doshi@summit.novell.com
To: ted@cis.ufl.edu
Cc: dennis@summit.novell.com
Received: from usl by summit.novell.com; Mon,  1 Aug 94 10:24 EDT
Subject: some questions
Content-Length: 794
Content-Type: text
Status: R


Ted:
	I am just now getting around to starting work on the Clk-2Q
implementation. I have a couple of questions, listed below:

	(a) I realize now that I received a partial listing of simulation
            code from you, consisting only of the functions init_queue()
            and access(), so I could not find out how you are setting
            sizes for A1 and A2. Can you indicate how A1 and A2 are
	    sized (if at all)?
	(b) Is it correct that Am is scanned as often as needed, without
            waiting, to reclaim the desired number of pages? 

	Is there a phone number that I can call you at for a brief
discussion of this? Also, I have been trying to understand at an
intuitive level why clk-2Q would be expected to be better at selecting
replacement pages.

	Thanks!

	-doshi-


From shasha Mon Sep 26 11:50:37 1994
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA23490; Mon, 26 Sep 94 11:50:36 -0400
Date: Mon, 26 Sep 94 11:50:36 -0400
From: shasha (Dennis Shasha)
Message-Id: <9409261550.AA23490@SHASHA.CS.NYU.EDU>
To: shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Subject: Re:  so how did it go?
Status: R

	Dear Ted,

Dennis,
Not too many people were in the audience; I was up against Jim Gray's
tutorial and a panel on scientific databases that I wanted to attend myself.
I did talk to Jim Gray for a few minutes -- he said that he was
glad that the paper got published, and he was surprised at the
"controversy" about the paper.  I also talked to Christos Faloutsos,
	Did he mention what the controversy was, e.g., who its source was?
who liked the paper.  Finally, Kurt Brown (Carey's student)
mentioned that they were going to use 2Q in some of their resource
experiments -- because they feel that 2Q is the best buffering algorithm
available.
	That sounds good --- our fame will grow slowly.

Not much progress on the file caching.
NSSDC changes the format of its log file every 2 months or so,
etc.
I do have 6 months' worth of data available,
and I'm also starting to get data from an up-and-running
version of the EOSDIS archive.

	That's promising.

There are some papers in VLDB about object caching -- the VLDB paper
on "dual buffering" is concerend with decisions about moving small
objects from their page to an object caching area, and back.
So, two ideas are:
  1) can we apply a 2Q-like statistical analysis to improve a 
     dual-buffering algorithm?
  2) can we describe a file caching strategy as also being a strategy
     for "large objects"?  Mass storage people sometimes talk
     about their file caching this way.

	Ted: I think that approach 2 is better, because statistical
	analyses tend to be controversial.

Also, I think that file caching is useful in distributed file systems.
I disagree with your colleague that "unix semantics" are practical
in large file systems.  If you cache 64K at a time, his example of
repeatedly looking at the tail of a file won't work.  I can't get
his example to work when both processes are running on my workstation,
and they cache 8k at a time.

Ted

	I agree with you (though I can't recall who suggested that).
	Progress remains slow at USL. High priority interrupts....
	Still, Doshi wants to publish so I think it will happen.
	Dennis


From doshi@usl.novell.com Mon Sep 26 15:02:15 1994
Received: from usl.summit.novell.com by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA25321; Mon, 26 Sep 94 15:02:06 -0400
Date: Mon, 26 Sep 94 15:00 EDT
Message-Id: <9409261500.AA12172@summit.novell.com>
From: doshi@summit.novell.com
Received: from usl by summit.novell.com; Mon, 26 Sep 94 15:00 EDT
To: dennis@summit.novell.com, doshi@summit.novell.com,
        shasha@SHASHA.CS.NYU.EDU, ted@cis.ufl.edu
Content-Length: 4387
Content-Type: text
Status: R




Ted:
Dennis:

I apologize for not having communicated for a while. I was
unable to spend any time on the clk-2q work because of
intervening fires.

Anyway, this weekend I got back to clk/2q. I started to sketch
out the needed implementation, and needed to deal with an
issue apparently not dealt with in Ted's simulations. So
let me describe the problem and some of my thoughts, and hear
back what you think.

Ted's algorithm is:
------------------------------------------------------------------
1) on a page miss
  a) get a frame from the free frame pool
  b) load the page onto the frame, put on the head of A1.

2) when the cleaning daemon is called,
  a) let M be the number of misses since the last time
     the page daemon was called.
  b) take M pages from the tail of A1, reset their reference bits,
     put them on the head of A2in.
  c) take M pages from the tail of A2in.
     for each of these pages,
       i) if the reference bit is set, put the page in Am.
       ii) else, put the page in the free frame pool.
  d) let L be the number of pages added to Am in step c).
     clean L pages from Am using the usual clock algorithm.

extensions:
  1) half-modify: put them in Am (?).
  2) using tags:  If a page is cleaned from A2in,
    push it on the head of A2out.  If you miss on a page whose
    tag is in A2out, put it in Am directly.
    Note: we can manage A2out as a guess.
    Manage A2out as a FIFO.  When there is a hit, there is no need
    to remove the tag from A2out.
    When you put a page
    on A2out, mark its VM entry.  When you remove the
    tag from A2out, unmark the entry.

Ted recommended A1 = 7% of memory, A2 = 40% of memory (all A2in).
A2out would not be there.
------------------------------------------------------------------

The problem is what to do in accounting for page traffic from
process exits and file closes, as well as refaults of identity-full
pages back from free lists.  

I am working with the following page states:

	A1
	A2in
	Am

	F-A1 	(identityful and freed from A1 due to exit/close)
	F-A2in	(................ ......... A2in ...............)
	F-Am	(.......................... Am .................)
	P-A2in  (identityful and freed from A2in by pageout)
	P-Am	(...........................Am ............)

	Free-NoId (free pages without identity).
		When recycling (aborting identity), the OS always 
		takes pages from Free-NoId first, and from the 
		identity-full pool only if there are
		no identity-less pages available. 

- My thought is to treat a refaulting page without transition. That is,
  pages from F-A1 would go back to the state A1 when refaulted; same with
  others.

- Since many pages could leave the A1, A2in, Am states temporarily and
  be refaulted back (because of file closures and process exits), the original
  algorithm cannot be applied as stated, without causing a lot of unneeded
  freeing from the Am queue.

  So I was wondering whether the following would be closer in spirit to
  the desired clk-2q semantic:

	(a) In each daemon interval, we move all pages that entered
	    A1 in the previous interval into A2. This includes pages
	    of all categories (fresh faults and refaults).

	(b) For each A2 page in a given interval, we promote the page
	    to Am at the end of the interval if it was touched. Otherwise
	    we move it to the appropriate free state. (Same treatment for
	    pages that were refaulted into A2 over the interval?)

	(c) If L pages get promoted in step (b) from A2 to Am, then we
	    must free up L pages from Am.
	    Issue: What if a very large number of pages get promoted
		   from A2 to Am? Should we firewall against freeing 
		   too many pages out of Am?

  In this proposal, there would be no fixed quanta for A1 or A2.

- I started comparing the restated algorithm above with the clkclass
  algorithm. Steps (a) and (b) exactly match what clkclass does, but step
  (c) is less restrained about freeing pages in comparison with clkclass,
  since there is no use of a survivor count. The pageout daemon would need 
  to walk the lists A2 and Am separately, instead of walking the global
  page list. 

What do you think? I guess Dennis and I can talk about this tomorrow,
assuming he will be here. If you feel that we can stay with Ted's 
algorithm by some appropriate handling of spontaneously freed pages,
I would like to know. 

	-doshi-





From ted@cis.ufl.edu Fri Sep 30 15:09:17 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA02468; Fri, 30 Sep 94 15:09:16 -0400
Received: from squall.cis.ufl.edu by cs.NYU.EDU (5.61/1.34)
	id AA21100; Fri, 30 Sep 94 15:09:13 -0400
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id PAA06447; Fri, 30 Sep 1994 15:08:22 -0400
Date: Fri, 30 Sep 1994 15:08:22 -0400
Message-Id: <199409301908.PAA06447@squall.cis.ufl.edu>
To: doshi@usl.Summit.Novell.COM, shasha@cs.NYU.EDU
Status: RO

Doshi,
Sorry about my slow response, but I get too many interrupts also.

My comments are indicated by
 >> (comment)


------------------------------------------------------------------
1) on a page miss
  a) get a frame from the free frame pool
  b) load the page onto the frame, put on the head of A1.

2) when the cleaning daemon is called,
  a) let M be the number of misses since the last time
     the page daemon was called.
  b) take M pages from the tail of A1, reset their reference bits,
     put them on the head of A2in.
  c) take M pages from the tail of A2in.
     for each of these pages,
       i) if the reference bit is set, put the page in Am.
       ii) else, put the page in the free frame pool.
  d) let L be the number of pages added to Am in step c).
     clean L pages from Am using the usual clock algorithm.

extensions:
  1) half-modify: put them in Am (?).
  2) using tags:  If a page is cleaned from A2in,
    push it on the head of A2out.  If you miss on a page whose
    tag is in A2out, put it in Am directly.
    Note: we can manage A2out as a guess.
    Manage A2out as a FIFO.  When there is a hit, there is no need
    to remove the tag from A2out.
    When you put a page
    on A2out, mark its VM entry.  When you remove the
    tag from A2out, unmark the entry.


Ted recommended A1 = 7% of memory, A2 = 40% of memory (all A2in).
A2out would not be there.
 >> I thought that tags are too difficult to implement.
 >> Let me dig up my A2out experiments.
------------------------------------------------------------------

The problem is what to do in accounting for page traffic from
process exits and file closes, as well as refaults of identity-full
pages back from free lists.  

I am working with the following page states:

	A1
	A2in
	Am

	F-A1 	(identityful and freed from A1 due to exit/close)
	F-A2in	(................ ......... A2in ...............)
	F-Am	(.......................... Am .................)
	P-A2in  (identityful and freed from A2in by pageout)
	P-Am	(...........................Am ............)

	Free-NoId (free pages without identity).
		When recycling (aborting identity), the OS always 
		takes pages from Free-NoId first, and from the 
		identity-full pool only if there are
		no identity-less pages available. 

- My thought is to treat a refaulting page without transition. That is,
  pages from F-A1 would go back to the state A1 when refaulted; same with
  others.
  >> My nice clean simulations did not have to deal with free lists.
  >> Perhaps we can use them to our advantage.  In particular
  >> when you fault on a P-A2in page put it in Am.
  >> That way the A2in queue can be smaller.

- Since many pages could leave the A1, A2in, Am states temporarily and
  be refaulted back (because of file closures and process exits), the original
  algorithm cannot be applied as stated, without causing a lot of unneeded
  freeing from the Am queue.

  So I was wondering whether the following would be closer in spirit to
  the desired clk-2q semantic:

	(a) In each daemon interval, we move all pages that entered
	    A1 in the previous interval into A2. This includes pages
	    of all categories (fresh faults and refaults).

	(b) For each A2 page in a given interval, we promote the page
	    to Am at the end of the interval if it was touched. Otherwise
	    we move it to the appropriate free state. (Same treatment for
	    pages that were refaulted into A2 over the interval?)

	(c) If L pages get promoted in step (b) from A2 to Am, then we
	    must free up L pages from Am.
	    Issue: What if a very large number of pages get promoted
		   from A2 to Am? Should we firewall against freeing 
		   too many pages out of Am?

   >> Perhaps you free enough pages to make the free lists comfortably large?

  In this proposal, there would be no fixed quanta for A1 or A2.

  >> OK, but it depends on how long the daemon interval is.
  >> If the interval is not "long enough", perhaps you don't
  >> guard against correlated references and perhaps the test
  >> for admission to Am is too tight.
  >> One solution is to keep the pages from 2 intervals in A1, A2in.
  >> Or, admit re-faulted P-A2in pages to Am.
  >> 1 daemon interval is probably sufficient for A1,
  >> and perhaps the free list will provide the necessary
  >> length for A2in.

- I started comparing the restated algorithm above with the clkclass
  algorithm. Steps (a) and (b) exactly match what clkclass does, but step
  (c) is less restrained about freeing pages in comparison with clkclass,
  since there is no use of a survivor count. The pageout daemon would need 
  to walk the lists A2 and Am separately, instead of walking the global
  page list. 

What do you think? I guess Dennis and I can talk about this tomorrow,
assuming he will be here. If you feel that we can stay with Ted's 
algorithm by some appropriate handling of spontaneously freed pages,
I would like to know. 

  >> It looks like a good algorithm to me.
  >> I did use a survivor count in Am though.
  >> Perhaps it is not needed?

	-doshi-






From usl!doshi@summit.novell.com Thu Nov 17 21:06:47 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA26910; Thu, 17 Nov 94 21:06:46 -0500
Received: from usl.summit.novell.com by cs.NYU.EDU (5.61/1.34)
	id AA29438; Thu, 17 Nov 94 21:06:43 -0500
Date: Thu, 17 Nov 94 21:04 EST
Message-Id: <9411172104.AA13222@summit.novell.com>
From: doshi@summit.novell.com
To: dennis@summit.novell.com, ted@cis.ufl.edu
Cc: doshi@summit.novell.com
Received: from usl by summit.novell.com; Thu, 17 Nov 94 21:07 EST
Subject: results
Content-Length: 1667
Content-Type: text
Status: R



Dear Ted, Dennis:

	As I conveyed to Dennis, the modified clock-class (along the
lines we had discussed for getting clock-2Q-like behavior) showed no
difference over the clock-class algorithm for the Hot-Cold benchmark
("dennis") or for the linpack matrix multiply. I did not do more testing,
since the algorithm failed to give improvements for the "friendly"
workloads.

	Dennis suggested that we try to collect some more data to show
that clock-class is safe. He suggested running full kernel builds, and
I agreed that this would be an easy and valuable data point to gather.

	For the kernel builds, we had to reduce memory down to 7MB in 
order to provoke paging. The base algorithm was better than clock-class 
by 2%; this appeared to be due to increased swapper activity with 
clock class. Disabling the swapper reduced the difference to less 
than 1%, which is in the noise range. Note also that clock-class as 
currently implemented is more expensive because we have to keep duplicate 
state outside the page structures; if integrated with the OS by proper 
design, its overhead would be the same as that of the base kernel. 
In many ways, kernel builds have the same "bad" properties as gaede: 
they make excessive use of segmap and have a significant
number of process creations and exits (> 7200, which surprised me!).

	Dennis and I discussed what to do in the immediate term. We are
to seek participation in Amadeus to see if we can "sell" the clk-class
paging policy for Amadeus nucleus in exchange for a commitment to carry
out the development. We thought we should also begin documenting the work.

	Let me know your thoughts.

	-K. Doshi-




From ted@cis.ufl.edu Fri Nov 18 14:32:47 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA01194; Fri, 18 Nov 94 14:32:46 -0500
Received: from usl.summit.novell.com by cs.NYU.EDU (5.61/1.34)
	id AA03876; Fri, 18 Nov 94 14:32:04 -0500
From: ted@cis.ufl.edu
Received: by summit.novell.com; Fri Nov 18 14:29 EST 1994
Received: by squall.cis.ufl.edu (8.6.7/8.6.7-cis.ufl.edu)
	id OAA26722; Fri, 18 Nov 1994 14:30:28 -0500
Date: Fri, 18 Nov 1994 14:30:28 -0500
Original-From: ted@cis.ufl.edu
Content-Length: 862
Content-Type: text
Message-Id: <199411181930.OAA26722@squall.cis.ufl.edu>
To: dennis@summit.novell.com, doshi@summit.novell.com, ted@cis.ufl.edu
Subject: Re:  results
Status: R

Doshi,
I'm disappointed that clock/2Q gave no improvement.
But real applications are different from simulations.

It would be nice to put the algorithm to actual use;
then it would be easier to see any benefit.

I've been looking into file migration algorithms.
Apparently, the most popular method is to
take a time space product:
  let R_i be the time since the last reference to file i
  let S_i be the size of file i
When you need to choose a file for replacement,
choose the one with the largest R_i*S_i.

In practice, this is implemented by putting files into bins based
on size, keeping the files in each bin in a LRU list.
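As a sketch of the binned implementation just described: files go into bins by the log of their size, each bin is an LRU list, and the victim is the oldest file in whichever bin maximizes size times time since last reference. The log base (2) and the explicit `now` clock below are illustrative assumptions; the email only says bins are "based on size".

```python
import math
from collections import OrderedDict

class STbin:
    """Sketch of the binned space-time replacement scheme above."""

    def __init__(self):
        self.bins = {}   # size class -> OrderedDict(name -> (size, last_ref))

    def touch(self, name, size, now):
        """Record a reference to a file at time `now`."""
        bin_ = self.bins.setdefault(int(math.log2(max(size, 1))), OrderedDict())
        bin_.pop(name, None)
        bin_[name] = (size, now)     # most recently used at the right end

    def victim(self, now):
        # Only the LRU tail of each bin is a candidate, so choosing a
        # victim costs O(number of bins), not O(number of files).
        best, best_product = None, -1.0
        for bin_ in self.bins.values():
            if not bin_:
                continue
            name, (size, last_ref) = next(iter(bin_.items()))  # oldest in bin
            product = size * (now - last_ref)                  # S_i * R_i
            if product > best_product:
                best, best_product = name, product
        return best

    def evict(self, name):
        for bin_ in self.bins.values():
            if bin_.pop(name, None) is not None:
                return
```

The design choice is exactly the one the email alludes to: the full space-time product is expensive to maintain globally, but within a bin size varies little, so the bin's LRU tail is a good local maximizer and only one candidate per bin needs comparing.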

Perhaps there is room for a second-reference algorithm,
but early studies by Smith claimed that there is no
statistical connection between inter-reference times.
I'm a bit skeptical of Smith's results; we'll see what happens.
	Ted

From shasha@SHASHA.CS.NYU.EDU Mon Nov 28 16:55:29 1994
Received: from CS.NYU.EDU by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA20736; Mon, 28 Nov 94 16:55:29 -0500
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (5.61/1.34)
	id AA14864; Mon, 28 Nov 94 16:55:27 -0500
Received: by SHASHA.CS.NYU.EDU (5.61/1.34)
	id AA20732; Mon, 28 Nov 94 16:55:26 -0500
Date: Mon, 28 Nov 94 16:55:26 -0500
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9411282155.AA20732@SHASHA.CS.NYU.EDU>
To: shasha@cs.NYU.EDU, ted@cis.ufl.edu
Subject: nasa
Status: R

>> Dear Ted,
>> Thanks for your note. My comments follow below.

Dennis,
I've finally made some progress on the file/object caching algorithms.
I have to write a paper analyzing the NASA traces by Jan 9,
so it seemed like time to get moving (I need to put a caching analysis
in the paper).

I look at 5 algorithms:
1) LRU -- the usual.

2) STWS (Space Time Working Set) : choose for replacement the
  file with the largest size*(time since last reference) product.
  Like LRU, STWS is the mirror image of the optimal algorithm:
  Replace the file with the largest size*(time to next reference) product.
    -- This algorithm is good, but very expensive

>> My understanding is that this is optimal assuming that replacing
>> any object costs the same amount. If bigger objects cost more,
>> then their effective size should be made smaller than their actual
>> size. 

3) STbin : put files in bins based on the log of their size.
   Manage each bin using LRU.  To pick a file for replacement, look
   at the tail file in each bin.  Choose for replacement
   the file with the largest space*time product.
    -- This is the algorithm most often implemented.

4) 2Qbin : manage A1 as a FIFO, A2 as an STbin.  Limit A1 to X% of memory.

5) 2bin : manage A1 as an STbin, A2 as an STbin.  Limit A1 to X% of memory.
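To make the two-area structure concrete, here is a rough sketch of the A1/A2 idea behind 2Qbin and 2bin. The promotion rule (first reference goes to A1, a repeat reference moves the file to A2) is my assumption, carried over from 2Q, and for brevity each area is evicted by a full space-time scan rather than by bins:

```python
class TwoAreaCache:
    """Hypothetical sketch of the two-area (A1/A2) structure: A1 holds
    first-time references and is limited to X% of memory; a repeat
    reference promotes a file to A2.  Victims are chosen by the
    largest size * idle-time product within an area."""

    def __init__(self, capacity, x):
        self.capacity = capacity
        self.a1_limit = int(capacity * x)   # A1 capped at X% of memory
        self.a1, self.a2 = {}, {}           # name -> (size, last_ref)
        self.clock = 0

    def _size(self, area):
        return sum(s for s, _ in area.values())

    def _evict_max_st(self, area):
        # Full scan for the largest space-time product (bins would avoid this).
        victim = max(area, key=lambda n: area[n][0] * (self.clock - area[n][1]))
        del area[victim]

    def reference(self, name, size):
        self.clock += 1
        if name in self.a1 or name in self.a2:
            self.a1.pop(name, None)         # repeat reference: promote to A2
            self.a2[name] = (size, self.clock)
        else:
            self.a1[name] = (size, self.clock)
            while self._size(self.a1) > self.a1_limit and len(self.a1) > 1:
                self._evict_max_st(self.a1)
        while self._size(self.a1) + self._size(self.a2) > self.capacity:
            self._evict_max_st(self.a2 if self.a2 else self.a1)
```

The point of the A1 cap is that a file referenced only once can never push hot files out of A2; it can only displace other cold files inside A1's X% share.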

For the trace files I used the log files from NSSDC; since 1/1/94 they
report file sizes.
I broke the data into 3-month chunks; tr_1-3.94 means data from 1/94 to 3/94,
etc.
2Qbin does not perform well, so I just report the X=30% experiments.
2bin does perform well, so I report runs for X=30%, X=50%, X=70%.
The block size is 1K, and I ran 1G - 5G experiments.

                Hit Rates

tr_1-3.94         77415 accesses

blocks          LRU     STWS    STbin   2Qbin   2bin:.3 2bin:.5 2bin:.7         
1048576         0.144   0.234   0.195   0.140   0.176   0.189   0.201
2097152         0.243   0.314   0.267   0.176   0.225   0.232   0.244
3145728         0.288   0.341   0.337   0.212   0.235   0.288   0.333
4194304         0.309   0.355   0.350   0.236   0.269   0.336   0.355
5242880         0.320   0.364   0.360   0.263   0.330   0.360   0.361

tr_4-6.94       92325 accesses

blocks          LRU     STWS    STbin   2Qbin   2bin:.3 2bin:.5 2bin:.7
1048576         0.155   0.235   0.189   0.190   0.210   0.214   0.212
2097152         0.202   0.313   0.194   0.285   0.274   0.278   0.268
3145728         0.272   0.362   0.286   0.303   0.307   0.329   0.324
4194304         0.292   0.423   0.448   0.320   0.338   0.343   0.401
5242880         0.304   0.471   0.500   0.355   0.475   0.481   0.473

tr_7-9.94       113594 accesses

blocks          LRU     STWS    STbin   2Qbin   2bin:.3 2bin:.5 2bin:.7
1048576         0.173   0.270   0.205   0.164   0.196   0.209   0.200
2097152         0.248   0.308   0.260   0.200   0.263   0.266   0.262
3145728         0.271   0.328   0.309   0.243   0.295   0.311   0.320
4194304         0.302   0.340   0.340   0.308   0.298   0.350   0.346
5242880         0.327   0.366   0.377   0.328   0.339   0.381   0.374

   
So, you can see that LRU has lousy performance, and unsurprisingly
2Qbin is lousy also.
STWS has the best performance, but it does a lot of computation.
STbin generally does a good job, and sometimes is better than STWS.
2bin does well.  The fair comparison is to STbin, since both require
comparable work.  By varying the value of X, I can make 2bin beat STbin
in most cases.

It is a promising start.
My thoughts about algorithm development are:
  1) see if LRU/2 + STWS will work.  I.e., select for replacement
     the file whose product of the file size & time from the penultimate
     reference is the largest.
>> Yes, good idea.
  2) look at a STWS+2Q algorithm -- i.e., implement STWS in both queues,
     instead of using the bins.
>> Yes, good idea. This would be interesting.
  3) look over some of the automatic tuning ideas we were talking about
     a few months ago.
>> Which ones?
  4) Archives tend to have a minimum residency requirement : the file
     must stay on-line for at least 2 days, etc.  Try to take advantage
     of that to look for repeat references "for free".
>>Absolutely, this could be a big win.
  5) In the NSSDC trace, the file sizes range from 10K to 1G.
     In a very large file, the transfer cost dominates the faulting cost.
     So, STWS is not the mirror image of the optimal algorithm anymore.
     Can we account for this?
>> By reducing the benefit of knocking out large files, e.g. 
>> if it costs twice as much as a standard file 
>> to bring in f because of its large
>> size then the benefit of freeing it should be 1/2 what it would
>> be if f only cost unit time to be brought in. 
>> See my comment above.

Also, I'd like to get a different source of references.
Any suggestions?
>> Postgres people at Berkeley. 
>> Here is one address that I believe is valid.

>> alias lindaanne rehun@ucbqal.berkeley.edu rehun@jade.berkeley.edu

	Ted

>> Thanks, Dennis.

From shasha@SHASHA.CS.NYU.EDU Wed Feb  1 21:16:15 1995
Received: from cs.NYU.EDU by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA00597; Wed, 1 Feb 95 21:16:14 EST
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (4.1/1.34)
	id AA01448; Wed, 1 Feb 95 21:16:09 EST
Received: by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA00594; Wed, 1 Feb 95 21:16:09 EST
Date: Wed, 1 Feb 95 21:16:09 EST
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9502020216.AA00594@SHASHA.CS.NYU.EDU>
To: ted@cis.ufl.edu
Cc: shasha@cs.NYU.EDU
Status: R

	Dear Ted,
	Nice to hear from you.

Dennis,
A few things:

1) How is the page replacement project coming along?
I've seen some announcements for SOSP; perhaps
we can send a writeup there?

	No more news I'm afraid. Doshi was called off to a 
	very high priority project and then recently had a baby.
	I know this is unsatisfactory, but I think that this 
	will be a strong OS paper only if it is implemented in a real
	system. That is currently the plan for some future releases
	of the operating system, which is great.
	The bad part is that we can't publish about it soon.
	Do you agree?

2) I received the following note a few days ago

-----------------------------------------

                                        Young Chul Park
                                        Dept. of Computer Science
                                        College of Natural Science
                                        Kyungpook National University
                                        Taegu, Korea, 709-701
                                        ycpark@bh.kyungpook.ac.kr

                                        January 25, 1995.

Dear Professor Theodore Johnson,

   I am a professor at Kyungpook National University in Korea.
I received my Ph.D. degree in December 1989 under Professor Peter Scheuermann
at Northwestern University in Evanston, IL, U.S.A.

   Recently I read your paper co-authored with Dennis Shasha, titled
"2Q: A Low Overhead High Performance Buffer Management Replacement
Algorithm", published at the 1994 VLDB Conference.
I have been implementing a relational DBMS called "BADA DBMS",
sponsored by the Korean Government, and I have not been at all satisfied
with the previous schemes for buffer replacement, so when I read your paper
I was delighted to find such an excellent algorithm.
I finally decided to implement your algorithm
on BADA DBMS.

   While implementing your 2Q algorithm, I got some new ideas
for improving it.  I modified your algorithm accordingly, into what
I call the "modified 2Q algorithm", and implemented that modified version
on BADA DBMS.

   These days, almost all new algorithms for already-known problems
should present simulation results.  Since I have no experience with
such simulations and you already have such an environment,
if you are interested in running simulations on my modification of the 2Q
algorithm, I will send you my "modified 2Q" algorithm.

   If you are interested in this work, please let me know.
Depending on your response, I will send you my "modified 2Q algorithm".
After reading my modification, if you are still interested,
we can work together on the simulation study and also publish a paper on the result.
I will be available from February 2; in the meantime
I will be out of town.
I hope for your positive response as soon as possible, and I sincerely want
the chance to work with you.
If you have any questions, please let me know.

                        Best Regards,


                        Young Chul Park

--------------------------------------------------------------------

I told Dr. Park that I'd be interested in hearing about his 
improvement to 2Q and executing the simulations.
I hope this is OK.

	Yes, this is fun.

3) I saw Stonebraker last week, at a NASA conference.
I asked him about tertiary storage traces.
He said that Sequoia 2K breaks its data into same-size chunks
for more convenient access, and for other reasons his traces would
not help me.

While I was at this conference, I talked to Ethan Miller, who is
at UMBC now.  Ethan was one of Katz's students and did a lot of
work on monitoring the IO behavior of supercomputing installations.
Ethan has a number of traces appropriate for file migration algorithm
studies. I proposed working with him to develop and analyze some
algorithms.  Again, I hope that this is OK. I told Ethan that
we have been discussing the project, and the work would be a joint
project between the three of us.
I think that Ethan will help get the file migration project off the ground
because he has the traces we need, he is a good experimentalist,
and he knows a lot about tertiary storage systems.

		Great. This is really hot stuff.

Also, I saw John Turek at this conference; it was for NASA EOSDIS
projects, and John is the PI for IBM's project.

	Ted

		Yes, I know. He took over from Harold Stone.
		Best,
		Dennis


From m-ke0082@sparky.cs.nyu.edu Thu Mar 16 10:53:01 1995
Received: from cs.NYU.EDU by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA16281; Thu, 16 Mar 95 10:53:00 EST
Received: from SPARKY.CS.NYU.EDU by cs.NYU.EDU (4.1/1.34)
	id AA20694; Thu, 16 Mar 95 10:53:00 EST
Received: by SPARKY.CS.NYU.EDU (4.1/1.34)
	id AA09609; Thu, 16 Mar 95 10:52:58 EST
Date: Thu, 16 Mar 95 10:52:58 EST
From: m-ke0082@sparky.cs.nyu.edu (Ken Estes)
Message-Id: <9503161552.AA09609@SPARKY.CS.NYU.EDU>
To: shasha@cs.NYU.EDU
Status: R


I have been looking at your 2Q paper and I have an exploratory data question.

When analyzing a long page trace for the first time, it is convenient to
create a graphical plot of theoretical hit ratio vs. buffer size.
Traditionally, for LRU algorithms this involves creating a histogram of the
distance string (see Tanenbaum, "Modern Operating Systems", p. 117).
This distance string is easily generalizable to the LRU/k case by defining the
distance of a particular page reference to be the size the LRU/k buffer would
have to be for this reference to be a cache hit. By looking at a histogram
of this distance data, one can predict what the benefit of any particular
cache size would be for this application. To create the distance string, one
need only simulate an infinite stack while analyzing the logfile.
Note in particular that during the analysis we explore all possible buffer
sizes.
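The infinite-stack computation described above can be sketched as follows (a minimal LRU-only sketch; the LRU/k generalization would record depths in an LRU/k-ordered stack instead):

```python
def lru_distance_string(trace):
    """Simulate an unbounded LRU stack and record, for each reference,
    the page's depth -- i.e. the smallest LRU buffer size for which
    this reference would have been a hit.  First references get an
    infinite distance (a cold miss at any buffer size)."""
    stack, distances = [], []
    for page in trace:
        if page in stack:
            depth = stack.index(page) + 1   # required buffer size
            stack.remove(page)
        else:
            depth = float("inf")
        distances.append(depth)
        stack.insert(0, page)               # page becomes most recent
    return distances

def hit_ratio(distances, buffer_size):
    """Predicted hit ratio for a given buffer size, from the distance string."""
    return sum(1 for d in distances if d <= buffer_size) / len(distances)
```

One pass over the trace yields the distance string; a histogram of it then gives the predicted hit ratio for every buffer size at once, which is the property that fails for non-stack algorithms like 2Q.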

In the case of the 2Q algorithm it is not clear how one could implement
the distance string. The movement of the data between the three queues is
quite complicated and seems to preclude an analysis over all possible
buffer sizes. If we fix the tuning parameters as you suggest
(Kin at 25% of the page slots, Kout holding 50% of the buffer), this leaves
only the buffer size as an independent variable. The distance string
for any page reference is still the size the buffer would need
to be for this page reference to be a cache hit. The problem is that
in different size buffers this page reference could potentially be
on any of the queues. I do not have a clear idea of a data structure
to keep track of all the possible places this page could be stored.


Ken
m-ke0082


From m-ke0082@SPARKY.CS.NYU.EDU Thu Mar 16 12:19:27 1995
Received: from SPARKY.CS.NYU.EDU by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA17701; Thu, 16 Mar 95 12:19:25 EST
Received: by SPARKY.CS.NYU.EDU (4.1/1.34)
	id AA14849; Thu, 16 Mar 95 12:19:23 EST
Date: Thu, 16 Mar 95 12:19:23 EST
From: m-ke0082@SPARKY.CS.NYU.EDU (Ken Estes)
Message-Id: <9503161719.AA14849@SPARKY.CS.NYU.EDU>
To: shasha@SHASHA.CS.NYU.EDU
Status: R

One more question. Do you have any good articles on (or ways of analyzing)
hierarchical caches?

The next question would be how much a hierarchy of caches would help.
I have no references for any information on cache hierarchies.

Ken
m-ke0082


From ted@cis.ufl.edu Thu Mar 16 12:22:04 1995
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA17750; Thu, 16 Mar 95 12:22:01 EST
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/cis.ufl.edu)
	id MAA27683; Thu, 16 Mar 1995 12:20:35 -0500
Date: Thu, 16 Mar 1995 12:20:35 -0500
Message-Id: <199503161720.MAA27683@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  do we know the answer to this?
Status: R

Dennis,
2Q is not a 'stack' algorithm, so the distance string technique won't work.

OTOH, the LRU/k algorithm, as one would want to use it, isn't
a stack algorithm either.  Why?  As you increase the number of buffers,
you should increase the correlated reference period.
LRU/k is a stack algorithm only when the size of the correlated
reference period is fixed.

Perhaps if you filter correlated references by time-of-reference,
LRU/k is a stack algorithm.  But I'm not comfortable with
that magic number.

	Ted

From ted@cis.ufl.edu Thu Mar 16 14:37:50 1995
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA19029; Thu, 16 Mar 95 14:37:47 EST
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/cis.ufl.edu)
	id OAA27784; Thu, 16 Mar 1995 14:36:55 -0500
Date: Thu, 16 Mar 1995 14:36:55 -0500
Message-Id: <199503161936.OAA27784@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  do we know the answer to this?
Status: R

Ken's second question is a little out of my league.
Usually, 2-level (k-level) caches are main memory caches,
so there is a small but very fast cache on-chip,
a larger but slower off-chip cache, etc.

There are some ideas about hierarchical caching in hierarchical
storage management, where the hierarchies are something like
local disk / remote disk / fast jukebox / slow jukebox / warehouse.
There is not much good work done on file caching, much less
hierarchical file caching.
	Ted

BTW, why is Ken so interested in these topics?

Also, what was that idea about parallel databases that you had?
	Ted

From shasha@SHASHA.CS.NYU.EDU Thu Mar 16 14:50:48 1995
Received: from cs.NYU.EDU by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA19092; Thu, 16 Mar 95 14:50:46 EST
Received: from SHASHA.CS.NYU.EDU by cs.NYU.EDU (4.1/1.34)
	id AA21978; Thu, 16 Mar 95 14:50:45 EST
Received: by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA19089; Thu, 16 Mar 95 14:50:44 EST
Date: Thu, 16 Mar 95 14:50:44 EST
From: shasha@SHASHA.CS.NYU.EDU (Dennis Shasha)
Message-Id: <9503161950.AA19089@SHASHA.CS.NYU.EDU>
To: ted@cis.ufl.edu
Subject: Re:  do we know the answer to this?
Cc: shasha@cs.NYU.EDU
Status: R

	Dear Ted,

Ken's second question is a little out of my league.
Usually, 2-level (k-level) caches are main memory caches,
so there is a small but very fast cache on-chip,
a larger but slower off-chip cache, etc.

There are some ideas about hierarchical caching in hierarchical
storage management, where the hierarchies are something like
local disk / remote disk / fast jukebox / slow jukebox / warehouse.
There is not much good work done on file caching, much less
hierarchical file caching.
Ted

BTW, why is Ken so interested in these topics?

	I think it has to do with Mosaic and caching pages in
	Mosaic systems.

Also, what was that idea about parallel databases that you had?
Ted

	Sorry not to have gotten back to you about this.
	The history is this: Jim Gray and I are thinking
	about writing a book about parallel databases and some things
	have come up concerning storing redundant data for decision
	support queries and the like.
	I'm not sure whether he wants to explore them or not.

	Assuming he doesn't, might you be?
	If so, can I send you a PowerPoint 4 doc describing
	a new data structure?

	Thanks,
	Dennis

From ted@cis.ufl.edu Thu Mar 16 14:58:34 1995
Received: from squall.cis.ufl.edu by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA19170; Thu, 16 Mar 95 14:58:30 EST
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/cis.ufl.edu)
	id OAA27824; Thu, 16 Mar 1995 14:57:44 -0500
Date: Thu, 16 Mar 1995 14:57:44 -0500
Message-Id: <199503161957.OAA27824@squall.cis.ufl.edu>
To: shasha@SHASHA.CS.NYU.EDU
Subject: Re:  do we know the answer to this?
Status: R

Yes, I'd be interested in exploring the data structure.
I do not have PowerPoint; can you send me
an ascii or a postscript file instead?
	Ted

From ted@cis.ufl.edu Tue Apr 11 12:47:03 1995
Received: from cs.NYU.EDU by SHASHA.CS.NYU.EDU (4.1/1.34)
	id AA05086; Tue, 11 Apr 95 12:47:02 EDT
Received: from squall.cis.ufl.edu by cs.NYU.EDU (4.1/1.34)
	id AA13160; Tue, 11 Apr 95 12:47:00 EDT
From: ted@cis.ufl.edu
Received:  by squall.cis.ufl.edu (8.6.7/cis.ufl.edu)
	id MAA18940; Tue, 11 Apr 1995 12:46:10 -0400
Date: Tue, 11 Apr 1995 12:46:10 -0400
Message-Id: <199504111646.MAA18940@squall.cis.ufl.edu>
To: shasha@cs.NYU.EDU
Status: R

Dennis, should I help this guy?

>From noh@cs.UMD.EDU Tue Apr 11 00:04:06 1995
Received:  from seine.cs.UMD.EDU  by cis.ufl.edu (8.6.12/cis.ufl.edu)
        id BAA02477; Tue, 11 Apr 1995 01:04:03 -0400
Received: by seine.cs.UMD.EDU (8.6.11/UMIACS-0.9/04-05-88)
        id BAA05158; Tue, 11 Apr 1995 01:04:00 -0400
Date: Tue, 11 Apr 1995 01:04:00 -0400
From: noh@cs.UMD.EDU (Sam H. Noh)
Message-Id: <199504110504.BAA05158@seine.cs.UMD.EDU>
To: ted@cis.ufl.edu
Subject: your help
Cc: noh@cs.UMD.EDU
Status: RO

Hi there.
I'm sending this mail in regards to the 2Q paper by
yourself and Prof. Shasha.

I was wondering if I could obtain the traces that you used
in your experiments.
I noticed that the traces were all obtained from other sources.
Still, I would be much obliged if you could provide them.
Otherwise, if you can tell me where you got them,
I will try to contact those sources directly.

Your prompt reply will be greatly appreciated.
Thanks.

-- sam noh

Also, nice to hear that 2Q will go into Sybase.
	Ted

