2006-02-09 21:10:47

by Ram Gupta

Subject: RSS Limit implementation issue

I am working on implementing enforcement of RSS limits for a process. I am
planning to add a check for the RSS limit when setting up a pte. If the
limit is crossed, I see a couple of different ways of handling it.

1. Kill the process. In this case there is no swapping problem.

2. Don't kill the process, but don't allocate the memory, and yield as we
do for the init process. Modify the scheduler not to choose a process
that has already allocated RSS up to its limit. When RSS usage falls
below the limit, the scheduler may choose it again to run.
There is also a scenario where no page of the process has been freed or
swapped out because there were enough free pages; then we need a way
to reschedule the process by forcefully freeing some pages, or we need
to kill the process.
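
Roughly, the check I have in mind at pte setup time might look like the
sketch below (the enum and helper names are illustrative only, not
existing kernel interfaces):

#include <stdbool.h>

enum rss_action { RSS_OK, RSS_KILL, RSS_YIELD };

/*
 * Hypothetical check made while setting up a pte.  Option 1 above maps
 * to RSS_KILL; option 2 maps to RSS_YIELD (refuse the page and let the
 * scheduler skip the task until its RSS drops below the limit again).
 */
static enum rss_action check_rss_on_pte_setup(unsigned long rss_pages,
					      unsigned long rss_limit,
					      bool kill_on_limit)
{
	if (rss_pages < rss_limit)
		return RSS_OK;		/* under the limit: map the page */

	return kill_on_limit ? RSS_KILL : RSS_YIELD;
}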

I am looking forward to your comments, the pros and cons of both
approaches, and any other alternatives you might come up with.

Thanks
Ram Gupta


2006-02-09 23:12:45

by be-news06

Subject: Re: RSS Limit implementation issue

Ram Gupta <[email protected]> wrote:
> planning to add a check for the RSS limit when setting up a pte. If the
> limit is crossed, I see a couple of different ways of handling it.
>
> 1. Kill the process. In this case there is no swapping problem.

This signal would happen on random page access, right?

> 2. Don't kill the process, but don't allocate the memory, and yield as we
> do for the init process. Modify the scheduler not to choose a process
> that has already allocated RSS up to its limit.

Yes, that behaviour looks good. That would keep the system responsive.
Basically the same state as waiting for a page to be paged in.

(However, a user-based RSS limit would be more interesting than a
process-based one.)

Regards
Bernd

2006-02-10 13:09:32

by Alan

Subject: Re: RSS Limit implementation issue

On Iau, 2006-02-09 at 15:10 -0600, Ram Gupta wrote:
> I am working on implementing enforcement of RSS limits for a process. I am
> planning to add a check for the RSS limit when setting up a pte. If the
> limit is crossed, I see a couple of different ways of handling it.
>
> 1. Kill the process. In this case there is no swapping problem.

Not good as the process isn't responsible for the RSS size so it would
be rather random.

> 2. Don't kill the process, but don't allocate the memory, and yield as we
> do for the init process. Modify the scheduler not to choose a process
> that has already allocated RSS up to its limit. When RSS usage falls
> below the limit, the scheduler may choose it again to run.
> There is also a scenario where no page of the process has been freed or
> swapped out because there were enough free pages; then we need a way
> to reschedule the process by forcefully freeing some pages, or we need
> to kill the process.

That is what I would expect. Or perhaps even allowing the process to
exceed the RSS but using the RSS limit as a swapper target, so that the
process is victimised early. No point forcing swapping to enforce the RSS
limit when there is free memory, only when the resource is contended.
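
A minimal sketch of that preference (the struct and the watermark name
below are illustrative assumptions, not the actual reclaim code):

struct task_rss {
	unsigned long rss;	/* current resident pages */
	unsigned long limit;	/* RSS limit, used only as a reclaim hint */
};

/*
 * Higher return value means a better reclaim victim.  The limit only
 * matters once free memory has fallen below the (hypothetical) low
 * watermark, i.e. when the resource is actually contended.
 */
static int rss_reclaim_priority(const struct task_rss *t,
				unsigned long free_pages,
				unsigned long low_watermark)
{
	if (free_pages > low_watermark)
		return 0;	/* memory not contended: ignore RSS limits */
	if (t->rss <= t->limit)
		return 1;	/* within its target: normal candidate */
	return 2;		/* over its target: victimise early */
}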

2006-02-10 14:50:41

by Ram Gupta

Subject: Re: RSS Limit implementation issue

On 2/9/06, Alan Cox <[email protected]> wrote:

>
> That is what I would expect. Or perhaps even allowing the process to
> exceed the RSS but using the RSS limit as a swapper target, so that the
> process is victimised early. No point forcing swapping to enforce the
> RSS limit when there is free memory, only when the resource is contended.
>
>

So we will need some kind of free-memory threshold. If free memory is
above it, we can let the RSS exceed the limit, and the scheduler can also
schedule the process in this situation, but not if free memory is below
the threshold. Also we need to figure out a way for the swapper to target
pages based on the RSS limit. One possible disadvantage I can think of is
that as the swapper swaps out a page based on the RSS limit, the
process's RSS will come back within the limit, the scheduler will
schedule this process again, and possibly the same page might have to
be brought back in. This may cause an increase in swapping. How
realistic do you think this scenario is?
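
In code, the gate I am describing might look roughly like this (the
threshold and helper names are assumptions for illustration, not
existing kernel symbols):

#include <stdbool.h>

/*
 * Decide whether a task that has reached its RSS limit may still get a
 * new page (and keep being scheduled).  free_pages_threshold is the
 * hypothetical free-memory trigger point discussed above.
 */
static bool rss_allows_allocation(unsigned long rss,
				  unsigned long rss_limit,
				  unsigned long free_pages,
				  unsigned long free_pages_threshold)
{
	if (rss < rss_limit)
		return true;	/* under the limit: always OK */

	/* Over the limit: tolerate it only while memory is not contended. */
	return free_pages > free_pages_threshold;
}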

2006-02-10 16:31:16

by Kyle Moffett

Subject: Re: RSS Limit implementation issue

On Feb 10, 2006, at 09:50, Ram Gupta wrote:
> On 2/9/06, Alan Cox <[email protected]> wrote:
>> That is what I would expect. Or perhaps even allowing the process
>> to exceed the RSS but using the RSS limit as a swapper target, so
>> that the process is victimised early. No point forcing swapping
>> to enforce the RSS limit when there is free memory, only when the
>> resource is contended.
>
> So we will need some kind of free-memory threshold. If free memory
> is above it, we can let the RSS exceed the limit, and the scheduler
> can also schedule the process in this situation, but not if free
> memory is below the threshold. Also we need to figure out a way for
> the swapper to target pages based on the RSS limit. One possible
> disadvantage I can think of is that as the swapper swaps out a page
> based on the RSS limit, the process's RSS will come back within the
> limit, the scheduler will schedule this process again, and possibly
> the same page might have to be brought back in. This may cause an
> increase in swapping. How realistic do you think this scenario is?

Just use a basic hysteresis device:

When allocating resources:
	if (resource > limit + delta)
		disable_process();

When freeing resources:
	if (resource < limit - delta)
		enable_process();

If the delta is set to something reasonable (say 1 or 2 pages), then
the process will only be rescheduled when it gets enough free RSS
(one page to satisfy its latest request and a few spare). Even
better, you could use a running average of "time between RSS-triggered
pauses" to figure out how much memory you should keep free.
Pseudocode below:

[Tuneables]
unsigned int time_quantum_factor;
unsigned int limit;
unsigned int max_delta;

[Per-process state]
unsigned int pages;
unsigned int delta;
unsigned long long last_limit_time;

[When allocating resources]
if (pages > limit + delta) {
	/* How long did we run since the last RSS-triggered pause? */
	int time_factor = log2(now - last_limit_time)
			  - time_quantum_factor;
	last_limit_time = now;

	/* Paused again too soon: widen the hysteresis band. */
	if (time_factor < 0 && delta < max_delta)
		delta <<= 1;
	/* Ran comfortably long: narrow the band again. */
	else if (time_factor > 0 && delta > 1)
		delta >>= 1;

	put_process_to_sleep();
}

[When freeing resources]
if (pages < limit - delta)
	enable_process();

The effect of this code would be that the RSS code would avoid
rescheduling a process more often than every 1<<time_quantum_factor
microseconds. It would attempt to provide a safe hysteresis delta
such that the process would have enough pages free that it could
probably run for at least the minimum amount of time. Note that the
code would _only_ have an effect if the process is already about to
sleep on an RSS limit, otherwise that code path would never get hit.
Obviously it's possible to adjust this algorithm to react more slowly
or quickly by adjusting the shift values, but it should work well
enough for a beta as-is.

Cheers,
Kyle Moffett

--
I didn't say it would work as a defense, just that they can spin that
out for years in court if it came to it.
-- Rob Landley



2006-02-10 21:39:00

by Bill Davidsen

Subject: Re: RSS Limit implementation issue

Ram Gupta wrote:
> I am working on implementing enforcement of RSS limits for a process. I am
> planning to add a check for the RSS limit when setting up a pte. If the
> limit is crossed, I see a couple of different ways of handling it.
>
> 1. Kill the process. In this case there is no swapping problem.

Since the process has little or no control over that, it seems
impractical. And it works the wrong way: when there is a ton of free
memory the process would get a large RSS and be killed, while on a
loaded system it would run.
>
> 2. Don't kill the process, but don't allocate the memory, and yield as we
> do for the init process. Modify the scheduler not to choose a process
> that has already allocated RSS up to its limit. When RSS usage falls
> below the limit, the scheduler may choose it again to run.
> There is also a scenario where no page of the process has been freed or
> swapped out because there were enough free pages; then we need a way
> to reschedule the process by forcefully freeing some pages, or we need
> to kill the process.
>
> I am looking forward to your comments, the pros and cons of both
> approaches, and any other alternatives you might come up with.

First, someone did some work on this a few years ago; you might be able
to find info by looking at rmap posts from the mid-2.4 days.

Second, I think this limitation needs to be enforced only when free
memory is below some trigger point, at which point candidates for reclaim
would be drawn from processes over their RSS target.

Finally, it would be good to be aggressive about cleaning dirty pages of
a process over the target, so pages could be reclaimed quickly. There are
a lot of factors in that, useless disk activity being one possible side
effect.


--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2006-02-10 22:20:39

by Rik van Riel

Subject: Re: RSS Limit implementation issue

On Fri, 10 Feb 2006, Ram Gupta wrote:

> Also we need to figure out a way for the swapper to target pages based
> on the RSS limit.

Indeed. You do not want an RSS limited process to get stuck
on an idle system, because nothing wants to free its memory.

> One possible disadvantage I can think of is that as the swapper
> swaps out a page based on the RSS limit, the process's RSS will come
> back within the limit, the scheduler will schedule this process again,
> and possibly the same page might have to be brought back in. This may
> cause an increase in swapping. How realistic do you think this
> scenario is?

Thanks to the swap cache, this should not be an issue.

You don't need to actually write the page to disk when removing
the page from the process RSS - you simply add it to the swap
cache, unmap it and move it to the far end of the inactive list,
where kswapd will run into it quickly if the system needs memory
again in the future.
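
As a rough illustration of that path (the structure and fields below are
stand-ins, not the real struct page or the actual mm calls):

/* Illustrative model of a page; not the kernel's struct page. */
struct fake_page {
	int in_swap_cache;	/* swap slot reserved, data still in RAM */
	int mapped;		/* counted in some process's RSS */
	int lru_position;	/* larger = closer to the inactive list tail */
};

/*
 * Trim a page from a task's RSS without any disk I/O: put it in the
 * swap cache, unmap it, and park it at the far end of the inactive
 * list.  kswapd only writes it out if memory actually gets tight.
 */
static void rss_trim_page(struct fake_page *page, int inactive_tail)
{
	page->in_swap_cache = 1;
	page->mapped = 0;			/* RSS drops immediately */
	page->lru_position = inactive_tail;	/* reclaimed first if needed */
}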

--
All Rights Reversed

2006-02-13 14:52:26

by Ram Gupta

Subject: Re: RSS Limit implementation issue

On 2/11/06, Chris Siebenmann <[email protected]> wrote:

> I suggest a third method: steal another page from the process itself.
> This automatically keeps the process within its own RSS, slows down its
> activities, *and* lets it keep running.
>
> Under the name 'paging against itself', I believe this has probably
> already been done by various people in the early 1990s.

This method may cause an increased amount of swapping, affecting
overall system performance.
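
For reference, Chris's "paging against itself" could be sketched roughly
like this (the victim-selection and eviction hooks are hypothetical
stand-ins, not real kernel functions); the sketch also shows why the
swap I/O happens immediately:

struct rss_task {
	unsigned long rss;
	unsigned long rss_limit;
};

/*
 * "Paging against itself": before mapping a new page for a task that is
 * already at its RSS limit, evict one of the task's own pages.  The two
 * hooks are assumed callbacks; note that the eviction (and thus any
 * swap I/O) happens right away.
 */
static void map_new_page(struct rss_task *t,
			 unsigned long (*pick_own_victim)(struct rss_task *),
			 void (*evict_page)(unsigned long pfn))
{
	if (t->rss >= t->rss_limit) {
		evict_page(pick_own_victim(t));
		t->rss--;
	}
	t->rss++;		/* account for the newly mapped page */
}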

Thanks
Ram Gupta

2006-02-13 20:38:01

by Ram Gupta

Subject: Re: RSS Limit implementation issue

On 2/13/06, Chris Siebenmann <[email protected]> wrote:

>
> I believe that this is the inevitable result of anything that doesn't
> kill the process on the spot. When you put the process to sleep, you
> effectively reduce its RSS by reducing its page-touching activity; if
> the system is under memory pressure, other things will then steal pages
> from it anyways.
>
True, but with your approach swapping occurs right away. The objective
here is to reduce the chances of swapping as much as possible.

Regards
Ram Gupta

2006-03-23 16:56:45

by Ram Gupta

Subject: Re: RSS Limit implementation issue

On 2/9/06, Alan Cox <[email protected]> wrote:
> On Iau, 2006-02-09 at 15:10 -0600, Ram Gupta wrote:
> > I am working on implementing enforcement of RSS limits for a process. I am
> > planning to add a check for the RSS limit when setting up a pte. If the
> > limit is crossed, I see a couple of different ways of handling it.
> >
> > 1. Kill the process. In this case there is no swapping problem.
>
> Not good as the process isn't responsible for the RSS size so it would
> be rather random.
>

Maybe I am missing some point here. I don't understand why the
process isn't responsible for its RSS size. This limit is process-specific,
and the RSS count increases when the process maps some page into its
page table.

Thanks
Ram Gupta

2006-03-24 23:07:50

by Bodo Eggert

Subject: Re: RSS Limit implementation issue

Ram Gupta <[email protected]> wrote:
> On 2/9/06, Alan Cox <[email protected]> wrote:
>> On Iau, 2006-02-09 at 15:10 -0600, Ram Gupta wrote:

>> > I am working on implementing enforcement of RSS limits for a process. I am
>> > planning to add a check for the RSS limit when setting up a pte. If the
>> > limit is crossed, I see a couple of different ways of handling it.
>> >
>> > 1. Kill the process. In this case there is no swapping problem.
>>
>> Not good as the process isn't responsible for the RSS size so it would
>> be rather random.
>>
>
> Maybe I am missing some point here. I don't understand why the
> process isn't responsible for its RSS size. This limit is process-specific,
> and the RSS count increases when the process maps some page into its
> page table.

It can't be responsible, because the kernel controls the RSS size, e.g.
by prefetching or swapping pages out. The process has very few means of
influencing these mechanisms, nor is there a way to actively shrink the RSS.

(It's perfectly legal to, e.g., mmap a large file (1 GB) and memcpy that
area into another mmapped file, and the kernel is expected to stay sane
even on a 16 MB machine.)

You may introduce a mechanism that allows a process to keep its RSS < n,
but you'll have to deal with legacy processes at least for the next 20 years.

--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

2006-03-30 20:27:38

by Bill Davidsen

Subject: Re: RSS Limit implementation issue

Ram Gupta wrote:
> On 2/9/06, Alan Cox <[email protected]> wrote:
>> On Iau, 2006-02-09 at 15:10 -0600, Ram Gupta wrote:
>>> I am working on implementing enforcement of RSS limits for a process. I am
>>> planning to add a check for the RSS limit when setting up a pte. If the
>>> limit is crossed, I see a couple of different ways of handling it.
>>>
>>> 1. Kill the process. In this case there is no swapping problem.
>> Not good as the process isn't responsible for the RSS size so it would
>> be rather random.
>>
>
> Maybe I am missing some point here. I don't understand why the
> process isn't responsible for its RSS size. This limit is process-specific,
> and the RSS count increases when the process maps some page into its
> page table.

A process has no control over its RSS size, only its virtual size. I'm
not sure you're clear on that, or just not saying it clearly. Therefore
the same process, say a largish perl run, may be 175 MB in vsize and
during the day have an RSS of perhaps half that. At night, with next to
no load on the machine, the RSS is 175 MB because there is a bunch of
free memory available.

If you want to make rss a hard limit the result should be swapping, not
failure to run. I'm not sure the limit in that form is a good idea, and
before someone reminds me, I do remember liking it better a few years ago.

If you can come up with a better way to adjust RSS to get greater
overall throughput while being fair to all processes, go to it. But in
general these things are a tradeoff; like swappiness, you tune until the
volume of complaints reaches a minimum.

You could do tuning to get minimum page faults overall, or ensure a
minimum size, or... I think there's room for improvement, particularly
for servers, but a hard limit doesn't seem to be it.

Didn't Rik do something in this area back in 2.4 and decide there
weren't many fish in that pond?

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-03-30 21:28:44

by Roger Heflin

Subject: RE: RSS Limit implementation issue



>
> A process has no control over its RSS size, only its virtual
> size. I'm not sure you're clear on that, or just not saying
> it clearly. Therefore the same process, say a largish perl
> run, may be 175 MB in vsize and during the day have an RSS of
> perhaps half that. At night, with next to no load on the
> machine, the RSS is 175 MB because there is a bunch of free
> memory available.
>
> If you want to make rss a hard limit the result should be
> swapping, not failure to run. I'm not sure the limit in that
> form is a good idea, and before someone reminds me, I do
> remember liking it better a few years ago.

Working-set-size limits sucked on VMS. The OS would limit a process to
its working set size and cause the entire machine to swap
even though there was adequate free memory. I believe they
had a normal working-set-size variable and a max working-set-size one;
one indicated how much RAM you could get on a memory-limited
system, the other indicated the most it would ever let you get even if
there was plenty of free RAM. The max working-set size caused
a lot of issues, as the default appeared to be defined for
much smaller systems than we were using at the time, and so
was much too low, causing unnecessary swapping. Part of the
issue was that the admin would need to know what he was
doing to use the feature, and most don't.

The argument from the admins at the time was that this limited
the damage to other processes by preventing certain processes
from getting too much memory; they ignored the fact that
anything swapping (even only the one process) unnecessarily
*KILLED* performance for the entire machine, since swapping
is rather expensive for the OS.

Roger

2006-03-31 03:00:27

by Peter Chubb

Subject: Re: RSS Limit implementation issue

>>>>> "Bill" == Bill Davidsen <[email protected]> writes:

Bill> Ram Gupta wrote:

Bill> If you want to make rss a hard limit the result should be
Bill> swapping, not failure to run. I'm not sure the limit in that
Bill> form is a good idea, and before someone reminds me, I do
Bill> remember liking it better a few years ago.

Bill> If you can come up with a better way to adjust RSS to get
Bill> greater overall throughput while being fair to all processes, go
Bill> to it. But in general these things are a tradeoff; like
Bill> swappiness, you tune until the volume of complaints reaches a
Bill> minimum.

What I did in one experiment was to:
1. delay swapin requests if the process was over its rsslimit,
until it fell below, and
2. Poke the swapper to try to swap out the current process's
pages in that case.

The problem with the approach is that it behaved poorly under memory
pressure. If a process's optimum working set was larger than its RSS
limit, then either it was delayed to the point of glaciality, or it
could saturate the swap device (and so disturb other processes'
operation).
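
In outline, the two steps looked something like this (the hooks passed
in are hypothetical placeholders, not the interfaces actually used in
the experiment):

#include <stdbool.h>

struct task_rss_state {
	unsigned long rss;
	unsigned long rss_limit;
};

/* Step 1: a faulting task that is over its limit may not page in yet. */
static bool may_swap_in_now(const struct task_rss_state *t)
{
	return t->rss < t->rss_limit;
}

/* Step 2: while it waits, keep asking reclaim to take its own pages. */
static void rss_fault_slow_path(struct task_rss_state *t,
				void (*poke_swapper)(struct task_rss_state *),
				void (*wait_for_rss_drop)(struct task_rss_state *))
{
	while (!may_swap_in_now(t)) {
		poke_swapper(t);	/* push this task's pages toward swap */
		wait_for_rss_drop(t);	/* sleep until RSS falls below limit */
	}
}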

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia

2006-03-31 03:27:06

by Bill Davidsen

Subject: Re: RSS Limit implementation issue

Peter Chubb wrote:

>>>>>>"Bill" == Bill Davidsen <[email protected]> writes:
>>>>>>
>>>>>>
>
>Bill> Ram Gupta wrote:
>
>Bill> If you want to make rss a hard limit the result should be
>Bill> swapping, not failure to run. I'm not sure the limit in that
>Bill> form is a good idea, and before someone reminds me, I do
>Bill> remember liking it better a few years ago.
>
>Bill> If you can come up with a better way to adjust RSS to get
>Bill> greater overall throughput while being fair to all processes, go
>Bill> to it. But in general these things are a tradeoff; like
>Bill> swappiness, you tune until the volume of complaints reaches a
>Bill> minimum.
>
>What I did in one experiment was to:
> 1. delay swapin requests if the process was over its rsslimit,
> until it fell below, and
> 2. Poke the swapper to try to swap out the current process's
> pages in that case.
>
>The problem with the approach is that it behaved poorly under memory
>pressure. If a process's optimum working set was larger than its RSS
>limit, then either it was delayed to the point of glaciality, or it
>could saturate the swap device (and so disturb other processes'
>operation).
>
>
>
I'm paying close attention, but that's the kind of problem people have:
memory pressure gets high and the processes don't run. I thought of
"swap out two to swap in one" as a way to dribble the RSS down, but I
doubt it's a magic solution. Swapping kills, but so does not swapping
when there is no memory. The obvious solution is to make the RSS limit
hard and keep it small, but I don't think that's the answer.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979