2001-07-02 18:42:42

by Rik van Riel

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Thu, 28 Jun 2001, Marco Colombo wrote:

> I'm not sure that, in general, recent pages with only one access are
> still better eviction candidates compared to 8 hours old pages. Here
> we need either another way to detect one-shot activity (like the one
> performed by updatedb),

Fully agreed, but there is one problem with this idea.
Suppose you have a maximum of 20% of your RAM for these
"one-shot" things, now how are you going to be able to
page in an application with a working set of, say, 25%
the size of RAM ?

If you don't have any special measures, the pages from
this "new" application will always be treated as one-shot
pages and the process will never be able to be cached in
memory completely...

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [email protected] (spam digging piggy)


2001-07-03 10:34:20

by Marco Colombo

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Mon, 2 Jul 2001, Rik van Riel wrote:

> On Thu, 28 Jun 2001, Marco Colombo wrote:
>
> > I'm not sure that, in general, recent pages with only one access are
> > still better eviction candidates compared to 8 hours old pages. Here
> > we need either another way to detect one-shot activity (like the one
> > performed by updatedb),
>
> Fully agreed, but there is one problem with this idea.
> Suppose you have a maximum of 20% of your RAM for these
> "one-shot" things, now how are you going to be able to
> page in an application with a working set of, say, 25%
> the size of RAM ?
>
> If you don't have any special measures, the pages from
> this "new" application will always be treated as one-shot
> pages and the process will never be able to be cached in
> memory completely...

I see your point. Running Gnome on a 64MB box means you have most
of the pages that are "warm" (using my definition), so there's little
room for "cold" (new) pages, and maybe they don't get a chance of
being accessed a second time before they are evicted, which leads to
thrashing if you're trying to start something really big (well, I guess
the access pattern within a typical ws is not uniformly distributed, so
some pages will get accessed twice, but I see the problem).

I'll try and make my point a bit clearer.
I was referring to background aging only. When aging
is caused by pressure, you don't make any difference between pages.
I don't know how the idea to give high values for page->age on the second
access instead of the first is going to be implemented, but I'm assuming
that new pages are going to be placed on the active list with a low age
value (PAGE_AGE_START_FIRST ?), maybe even 0 (well, I'm not a guru of
course). I'm just saying that, to avoid Mike's "problem" (which BTW
I don't believe is a big one, really), we could stop background aging
on interactive pages (short form for "pages that belong to the ws of an
interactive process") at a certain miminum age, say
PAGE_AGE_BG_INTERACTIVE_MINIMUM, with PAGE_AGE_BG_INTERACTIVE_MINIMUM
> PAGE_AGE_START_FIRST). Weighting the difference between the two
ages, you can give long-standing interactive pages some advantage vs
new pages. But they will be aged below PAGE_AGE_START_FIRST and eventually
moved to the inactive list. After all, they *are* good candidates.
Does this make some sense?

Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
when background aging, maybe it's not enough to keep processes like
updatedb from causing interactive pages to be evicted.
That's why I said we should have another way to detect that kind of
activity... well, the application could just let us know (no need to
embed an autotuning-genetic-page-replacement-optimizer into the kernel).
We should just drop all FS metadata accessed by updatedb, since we
know that's one-shot only, without raising pressure at all. Just like
(not that I'm proposing it) putting those "one-shot" pages directly on
the inactive-clean list instead of the active list. How an application
could declare such a behaviour is an open question, of course. Maybe it's
even possible to detect it. And BTW that's really fine tuning.
Evicting an 8 hours old page may be a mistake sometime, but it's never
a *big* mistake.

>
> Rik
> --
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
>
> http://www.surriel.com/ http://distro.conectiva.com/
>
> Send all your spam to [email protected] (spam digging piggy)

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ [email protected]



2001-07-03 15:01:50

by Daniel Phillips

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> when background aging, maybe it's not enough to keep processes like
> updatedb from causing interactive pages to be evicted.
> That's why I said we should have another way to detect that kind of
> activity... well, the application could just let us know (no need to
> embed an autotuning-genetic-page-replacement-optimizer into the kernel).
> We should just drop all FS metadata accessed by updatedb, since we
> know that's one-shot only, without raising pressure at all.

Note that some of updatedb's metadata pages are of the accessed-often kind,
e.g., directory blocks and inodes. A blanket low priority on all the pages
updatedb touches just won't do.

> Just like
> (not that I'm proposing it) putting those "one-shot" pages directly on
> the inactive-clean list instead of the active list. How an application
> could declare such a behaviour is an open question, of course. Maybe it's
> even possible to detect it. And BTW that's really fine tuning.
> Evicting an 8 hours old page may be a mistake sometime, but it's never
> a *big* mistake.

IMHO, updatedb *should* evict all the "interactive" pages that aren't
actually doing anything[1]. That way it should run faster, provided of
course its accessed-once pages are properly given low priority.

I see three page priority levels:

0 - accessed-never/aged to zero
1 - accessed-once/just loaded
2 - accessed-often

with these transitions:

0 -> 1, if a page is accessed
1 -> 2, if a page is accessed a second time
1 -> 0, if a page gets old
2 -> 0, if a page gets old

The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned
clock-wise, relying on the age computation[2]. Eviction candidates are taken
from the cold end of the 0 level list, unless it is empty, in which case they
are taken from the 1 level list. In desperation, eviction candidates are
taken from the 2 level list, i.e., random eviction policy, as opposed to what
we do now which is to initiate an emergency scan of the active list for new
inactive candidates - rather like calling a quick board meeting when the
building is on fire.

Note that the above is only a very slight departure from the current design.
And by the way, this is just brainstorming, it hasn't reached the 'proposal'
stage yet.

[1] It would be nice to have a mechanism whereby the evicted 'interactive'
pages are automatically reloaded when updatedb has finished its work. This
is a case of scavenging unused disk bandwidth for something useful, i.e.,
improving the interactive experience.

[2] I much prefer the hot/cold terminology over old/young. The latter gets
confusing because a 'high' age is 'young'. I'd rather think of a high value
as being 'hot'.

--
Daniel

2001-07-03 18:20:52

by Daniel Phillips

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

An ammendment to my previous post...

> I see three page priority levels:
>
> 0 - accessed-never/aged to zero
> 1 - accessed-once/just loaded
> 2 - accessed-often
>
> with these transitions:
>
> 0 -> 1, if a page is accessed
> 1 -> 2, if a page is accessed a second time
> 1 -> 0, if a page gets old
> 2 -> 0, if a page gets old

Better:

1 -> 0, if a page was not referenced before arriving at the old end
1 -> 2, if it was

Meaning that multiple accesses to pages on the level 1 list are treated as a
single access. In addition, this reflects what we can do practically with
the hardware referenced bit.

--
Daniel

2001-07-03 18:26:12

by Daniel Phillips

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Monday 02 July 2001 20:42, Rik van Riel wrote:
> On Thu, 28 Jun 2001, Marco Colombo wrote:
> > I'm not sure that, in general, recent pages with only one access are
> > still better eviction candidates compared to 8 hours old pages. Here
> > we need either another way to detect one-shot activity (like the one
> > performed by updatedb),
>
> Fully agreed, but there is one problem with this idea.
> Suppose you have a maximum of 20% of your RAM for these
> "one-shot" things, now how are you going to be able to
> page in an application with a working set of, say, 25%
> the size of RAM ?

Easy. What's the definition of working set? Those pages that are frequently
referenced. So as the application starts up some of its pages will get
promoted from used-once to used-often. (On the other hand, the target
behavior here conflicts with the goal of grouping together several
temporally-related accesses to the same page together as one access, so
there's a subtle distinction to be made here, see below.)

The point here is that there are such things as run-once program pages, just
as there are use-once file pages. Both should get low priority and be
evicted early, regardless of the fact they were just loaded.

> If you don't have any special measures, the pages from
> this "new" application will always be treated as one-shot
> pages and the process will never be able to be cached in
> memory completely...

The self-balancing way of doing this is to promote pages from the old end of
the used-once list to the used-often (active) list at a rate corresponding to
the fault-in rate so we get more aggressive promotion of referenced-often
pages during program loading, and conversely, aggressive demotion of
referenced-once pages.

--
Daniel

--
Daniel

2001-07-04 08:13:10

by Ari Heitner

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0


On Tue, 3 Jul 2001, Daniel Phillips wrote:

> And by the way, this is just brainstorming, it hasn't reached the 'proposal'
> stage yet.

So while we're here, an idea someone proposed in #debian while discussion this
thread ([email protected], you know who you are): QoS for application paging
on desktops. Basically you designate to the kernel which applications you want
to give priviledges, and it avoids swapping them out, even if they've been idle
for a long time. You designate your desktop apps, and then when updatedb comes
along they don't get kicked (but something more intensive like a kernel compile
would claim the pages). Maybe it would be as simple as a category of apps whose
pages won't get kicked before a singly-touched page (like and updatedb or
streaming media run).

For the record, I'm impressed with the new VM design, and I think its unbiased
behaviour (once the bugs are ironed out) will be exactly what I'm looking for
in life (traditional Unix "the fair way") :) Currently using a 4-way RS/6000
running AIX 4.2 which has been up for a long time and is running a lot of
programs (even though the active set is quite reasonable), and decides to swap
at evil times :)

Looking forward to the tweaks/settings options that will appear on this VM over
the next little while...



Cheers,

Ari Heitner



2001-07-04 08:33:02

by Marco Colombo

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Tue, 3 Jul 2001, Daniel Phillips wrote:

> On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > I'm not sure that, in general, recent pages with only one access are
> > > still better eviction candidates compared to 8 hours old pages. Here
> > > we need either another way to detect one-shot activity (like the one
> > > performed by updatedb),
> >
> > Fully agreed, but there is one problem with this idea.
> > Suppose you have a maximum of 20% of your RAM for these
> > "one-shot" things, now how are you going to be able to
> > page in an application with a working set of, say, 25%
> > the size of RAM ?
>
> Easy. What's the definition of working set? Those pages that are frequently
> referenced. So as the application starts up some of its pages will get
> promoted from used-once to used-often. (On the other hand, the target
> behavior here conflicts with the goal of grouping together several
> temporally-related accesses to the same page together as one access, so
> there's a subtle distinction to be made here, see below.)
[...]

In Rik example, the ws is larger than available memory. Part of it
(the "hottest" one) will get double-accesses, but other pages will keep
condending the few available (physical) pages with no chance of being
accessed twice. But see my previous posting...

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ [email protected]

2001-07-04 09:41:58

by Marco Colombo

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Tue, 3 Jul 2001, Daniel Phillips wrote:

> On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > when background aging, maybe it's not enough to keep processes like
> > updatedb from causing interactive pages to be evicted.
> > That's why I said we should have another way to detect that kind of
> > activity... well, the application could just let us know (no need to
> > embed an autotuning-genetic-page-replacement-optimizer into the kernel).
> > We should just drop all FS metadata accessed by updatedb, since we
> > know that's one-shot only, without raising pressure at all.
>
> Note that some of updatedb's metadata pages are of the accessed-often kind,
> e.g., directory blocks and inodes. A blanket low priority on all the pages
> updatedb touches just won't do.

Remember that the first message was about a laptop. At 4:00AM there's
no activity but the updatedb one (and the other cron jobs). Simply,
there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
metadata touched by updatedb won't be accessed at all in the future.
Laptop users don't do find /usr/share/terminfo/ so often.

> > Just like
> > (not that I'm proposing it) putting those "one-shot" pages directly on
> > the inactive-clean list instead of the active list. How an application
> > could declare such a behaviour is an open question, of course. Maybe it's
> > even possible to detect it. And BTW that's really fine tuning.
> > Evicting an 8 hours old page may be a mistake sometime, but it's never
> > a *big* mistake.
>
> IMHO, updatedb *should* evict all the "interactive" pages that aren't
> actually doing anything[1]. That way it should run faster, provided of
> course its accessed-once pages are properly given low priority.

So in the morning you find your Gnome session completely on swap,
and at the same time a lot of free mem.

> I see three page priority levels:
>
> 0 - accessed-never/aged to zero
> 1 - accessed-once/just loaded
> 2 - accessed-often
>
> with these transitions:
>
> 0 -> 1, if a page is accessed
> 1 -> 2, if a page is accessed a second time
> 1 -> 0, if a page gets old
> 2 -> 0, if a page gets old
>
> The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned
> clock-wise, relying on the age computation[2]. Eviction candidates are taken
> from the cold end of the 0 level list, unless it is empty, in which case they
> are taken from the 1 level list. In desperation, eviction candidates are
> taken from the 2 level list, i.e., random eviction policy, as opposed to what
> we do now which is to initiate an emergency scan of the active list for new
> inactive candidates - rather like calling a quick board meeting when the
> building is on fire.

Well, it's just aging faster when it's needed. Random evicting is not
good. List 2 is ordered by age, and there're always better candidates
at the end of the list than at the front. The higher the pressure,
the shorter is the time a page has to rest idle to get at the end of the
list. But the list *is* ordered.

> Note that the above is only a very slight departure from the current design.
> And by the way, this is just brainstorming, it hasn't reached the 'proposal'
> stage yet.
>
> [1] It would be nice to have a mechanism whereby the evicted 'interactive'
> pages are automatically reloaded when updatedb has finished its work. This
> is a case of scavenging unused disk bandwidth for something useful, i.e.,
> improving the interactive experience.

updatedb doesn't really need all the memory it takes. All it needs is
a small buffer to sequentially scan all the disk. So we should just
drop all the pages it references, since we already know they won't be
referenced again by noone else.

> [2] I much prefer the hot/cold terminology over old/young. The latter gets
> confusing because a 'high' age is 'young'. I'd rather think of a high value
> as being 'hot'.

True. s/page->age/page->temp/g B-)

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ [email protected]

2001-07-04 14:42:03

by Daniel Phillips

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Wednesday 04 July 2001 10:32, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > > I'm not sure that, in general, recent pages with only one access are
> > > > still better eviction candidates compared to 8 hours old pages. Here
> > > > we need either another way to detect one-shot activity (like the one
> > > > performed by updatedb),
> > >
> > > Fully agreed, but there is one problem with this idea.
> > > Suppose you have a maximum of 20% of your RAM for these
> > > "one-shot" things, now how are you going to be able to
> > > page in an application with a working set of, say, 25%
> > > the size of RAM ?
> >
> > Easy. What's the definition of working set? Those pages that are
> > frequently referenced. So as the application starts up some of its pages
> > will get promoted from used-once to used-often. (On the other hand, the
> > target behavior here conflicts with the goal of grouping together several
> > temporally-related accesses to the same page together as one access, so
> > there's a subtle distinction to be made here, see below.)
>
> [...]
>
> In Rik example, the ws is larger than available memory. Part of it
> (the "hottest" one) will get double-accesses, but other pages will keep
> condending the few available (physical) pages with no chance of being
> accessed twice. But see my previous posting...

But that's exactly what we want. Note that the idea of reserving a fixed
amount of memory for "one-shot" pages wasn't mine. I see no reason to set a
limit. There's only one critereon: does a page get referenced between the
time it's created and when its probation period expires?

Once a page makes it into the active (level 2) set it's on an equal footing
with lots of others and it's up to our intrepid one-hand clock to warm it up
or cool it down as appropriate. On the other hand, if the page gets sent to
death row it still has a few chances to prove its worth before being cleaned
up and sent to the aba^H^H^H^H^H^H^H^H reclaimed. (Apologies for the
multiplying metaphors ;-)

--
Daniel

2001-07-04 15:00:13

by Daniel Phillips

[permalink] [raw]
Subject: Re: VM Requirement Document - v0.0

On Wednesday 04 July 2001 11:41, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > > when background aging, maybe it's not enough to keep processes like
> > > updatedb from causing interactive pages to be evicted.
> > > That's why I said we should have another way to detect that kind of
> > > activity... well, the application could just let us know (no need to
> > > embed an autotuning-genetic-page-replacement-optimizer into the
> > > kernel). We should just drop all FS metadata accessed by updatedb,
> > > since we know that's one-shot only, without raising pressure at all.
> >
> > Note that some of updatedb's metadata pages are of the accessed-often
> > kind, e.g., directory blocks and inodes. A blanket low priority on all
> > the pages updatedb touches just won't do.
>
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

The problem is when you have a directory block, say, that has to stay around
quite a few seconds before dropping into disuse. You sure don't want that
block treated as 'accessed-once'.

The goal here is to get through the updatedb as quickly as possible. Getting
the user's "interactive" programs loaded back in afterwards is a separate,
much more difficult problem IMHO, but no doubt still has a reasonable
solution. I'm not that worried about it, my feeling is: if we fix up the MM
so it doesn't bog down with a lot of pages in cache and, in addition, do
better readahead, interactive performance will be just fine.

> > > Just like
> > > (not that I'm proposing it) putting those "one-shot" pages directly on
> > > the inactive-clean list instead of the active list. How an application
> > > could declare such a behaviour is an open question, of course. Maybe
> > > it's even possible to detect it. And BTW that's really fine tuning.
> > > Evicting an 8 hours old page may be a mistake sometime, but it's never
> > > a *big* mistake.
> >
> > IMHO, updatedb *should* evict all the "interactive" pages that aren't
> > actually doing anything[1]. That way it should run faster, provided of
> > course its accessed-once pages are properly given low priority.
>
> So in the morning you find your Gnome session completely on swap,
> and at the same time a lot of free mem.
>
> > I see three page priority levels:
> >
> > 0 - accessed-never/aged to zero
> > 1 - accessed-once/just loaded
> > 2 - accessed-often
> >
> > with these transitions:
> >
> > 0 -> 1, if a page is accessed
> > 1 -> 2, if a page is accessed a second time
> > 1 -> 0, if a page gets old
> > 2 -> 0, if a page gets old
> >
> > The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> > scanned clock-wise, relying on the age computation[2]. Eviction
> > candidates are taken from the cold end of the 0 level list, unless it is
> > empty, in which case they are taken from the 1 level list. In
> > desperation, eviction candidates are taken from the 2 level list, i.e.,
> > random eviction policy, as opposed to what we do now which is to initiate
> > an emergency scan of the active list for new inactive candidates - rather
> > like calling a quick board meeting when the building is on fire.
>
> Well, it's just aging faster when it's needed. Random evicting is not
> good.

It's better than getting bogged down in scanning latency just at the point
you should be starting new writeouts. Obviously, it's a tradeoff.

> List 2 is ordered by age, and there're always better candidates
> at the end of the list than at the front. The higher the pressure,
> the shorter is the time a page has to rest idle to get at the end of the
> list. But the list *is* ordered.

No, list 2 is randomly ordered. Pages move from the initial trial list to
the active list with 0 temperature, and drop in just behind the one-hand scan
pointer (which we actually implement as the head of the list). After that
they get "aged" up or down as we do now. (New improved terminology: heated
or cooled according to the referenced bit.)

> > Note that the above is only a very slight departure from the current
> > design. And by the way, this is just brainstorming, it hasn't reached the
> > 'proposal' stage yet.
> >
> > [1] It would be nice to have a mechanism whereby the evicted
> > 'interactive' pages are automatically reloaded when updatedb has finished
> > its work. This is a case of scavenging unused disk bandwidth for
> > something useful, i.e., improving the interactive experience.
>
> updatedb doesn't really need all the memory it takes. All it needs is
> a small buffer to sequentially scan all the disk. So we should just
> drop all the pages it references, since we already know they won't be
> referenced again by noone else.

I hope it's clear how the method I'm describing does that.

> > [2] I much prefer the hot/cold terminology over old/young. The latter
> > gets confusing because a 'high' age is 'young'. I'd rather think of a
> > high value as being 'hot'.
>
> True. s/page->age/page->temp/g B-)

Yep.

--
Daniel