> On 11/03/2010 06:40 PM, Mandeep Singh Baines wrote:
>
> > I've created a patch which takes a slightly different approach.
> > Instead of limiting how fast pages get reclaimed, the patch limits
> > how fast the active list gets scanned. This should result in the
> > active list being a better measure of the working set. I've seen
> > fairly good results with this patch and a scan inteval of 1
> > centisecond. I see no thrashing when the scan interval is non-zero.
> >
> > I've made it a tunable because I don't know what to set the scan
> > interval. The final patch could set the value based on HZ and some
> > other system parameters. Maybe relate it to sched_period?
>
> I like your approach. For file pages it looks like it
> could work fine, since new pages always start on the
> inactive file list.
>
> However, for anonymous pages I could see your patch
> leading to problems, because all anonymous pages start
> on the active list. With a scan interval of 1
> centiseconds, that means there would be a limit of 3200
> pages, or 12MB of anonymous memory that can be moved to
> the inactive list a second.
>
> I have seen systems with single SATA disks push out
> several times that to swap per second, which matters
> when someone starts up a program that is just too big
> to fit in memory and requires that something is pushed
> out.
>
> That would reduce the size of the inactive list to
> zero, reducing our page replacement to a slow FIFO
> at best, causing false OOM kills at worst.
>
> Staying with a default of 0 would of course not do
> anything, which would make merging the code not too
> useful.
>
> I believe we absolutely need to preserve the ability
> to evict pages quickly, when new pages are brought
> into memory or allocated quickly.
>
> However, speed limits are probably a very good idea
> once a cache has been reduced to a smaller size, or
> when most IO bypasses the reclaim-speed-limited cache.

Yeah.

But I doubt fixed rate limit is good thing. When playing movie case
(aka streaming I/O case), We don't want any throttle. I think.
Also, I don't like jiffies dependency. CPU hardware improvement naturally
will break such heuristics.

btw, now congestion_wait() already has jiffies dependency. but we should
kill such strange timeout eventually. I think.

2010-11-09 02:53:15

by KOSAKI Motohiro

[permalink] [raw]

Subject: Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set

> > I don't think current VM behavior has a problem.
> > Current problem is that you use up many memory than real memory.
> > As system memory without swap is low, VM doesn't have a many choice.
> > It ends up evict your working set to meet for user request. It's very
> > natural result for greedy user.
> >
> > Rather than OOM notifier, what we need is memory notifier.
> > AFAIR, before some years ago, KOSAKI tried similar thing .
> > http://lwn.net/Articles/268732/
>
> Thanks! This is perfect. I wonder why its not merged. Was a different
> solution eventually implemented? Is there another way of doing the
> same thing?

Now memcg has memory threshold notification feature and almost people
are using it. If you think notification fit your case, can you please
try this feature at first?
And if it doesn't fit your case and we will get a feedback from you,
we probably can extend such one.

Thanks.