From: Mandeep Singh Baines
To: Rik van Riel
Cc: KOSAKI Motohiro, Andrew Morton, Mel Gorman, Minchan Kim,
	Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	wad@chromium.org, olofj@chromium.org, hughd@chromium.org
Date: Mon, 1 Nov 2010 12:43:03 -0700
Subject: Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set

On Mon, Nov 1, 2010 at 11:50 AM, Rik van Riel wrote:
> On 11/01/2010 02:24 PM, Mandeep Singh Baines wrote:
>
>> Under memory pressure, I see the active list get smaller and smaller.
>> It's getting smaller because we're scanning it faster and faster,
>> causing more and more page faults, which slow forward progress and
>> result in the active list getting smaller still. One way to approach
>> this might be to make the scan rate constant and configurable. It
>> doesn't seem right that we scan memory faster and faster under low
>> memory. For us, we'd rather OOM than evict pages that are likely to
>> be accessed again, so we'd prefer to make a conservative estimate as
>> to what belongs in the working set. Other folks (long computations)
>> might want to reclaim more aggressively.
>
> Have you actually read the code?
>

I have, but only recently. I consider myself an mm newbie, so take any
conclusion I make with a grain of salt.

> The active file list is only ever scanned when it is larger
> than the inactive file list.
>

Yes, this prevents you from reclaiming the active list all at once. But
if the memory pressure doesn't go away, you'll start to reclaim the
active list little by little. First you'll empty the inactive list, and
then you'll start scanning the active list, moving pages from active to
inactive. The problem is that there is no minimum time a page will sit
on the inactive list before it is reclaimed; that depends only on the
scan rate, which does not depend on time. In my experiments, I saw the
active list get smaller and smaller over time until eventually it was
only a few MB, at which point the system came grinding to a halt due to
thrashing.

I played around with making the active/inactive ratio configurable. I
sent a patch out for an inactive_file_ratio sysctl. So instead of the
default 50%, you'd make the ratio configurable:

  inactive_file_ratio = (inactive * 100) / (inactive + active)

I saw less thrashing at 10%, but this patch wasn't nearly as effective
as min_filelist_kbytes. I can resend the patch if you think it's
interesting.
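To make that concrete, here is a minimal sketch of what such a knob could
look like in mm/vmscan.c, modeled on the 2.6.36-era
inactive_file_is_low_global(). The sysctl variable name and its wiring are
assumptions on my part (this is not the patch I sent); it just shows where
a tunable ratio would replace the fixed "active > inactive" test:

/*
 * Sketch only: sysctl_inactive_file_ratio is a hypothetical knob, not
 * the actual patch. Upstream's test is effectively a fixed 50% (scan
 * the active file list only when active > inactive); here the
 * threshold is tunable.
 */
int sysctl_inactive_file_ratio = 50;	/* percent of file pages */

static int inactive_file_is_low_global(struct zone *zone)
{
	unsigned long active, inactive;

	active = zone_page_state(zone, NR_ACTIVE_FILE);
	inactive = zone_page_state(zone, NR_INACTIVE_FILE);

	if (active + inactive == 0)
		return 0;

	/* inactive_file_ratio = (inactive * 100) / (inactive + active) */
	return inactive * 100 / (active + inactive) <
		(unsigned long)sysctl_inactive_file_ratio;
}

At the default of 50 this behaves like the existing active > inactive
check (modulo integer rounding); at 10, the active list is left alone
until the inactive list has shrunk to a tenth of the file pages.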
>>> Q2: In the above you used min_filelist_kbytes=50000. How did you
>>> decide on that value? Can other users calculate a proper value?
>>>
>>
>> 50M was small enough that we were comfortable with keeping 50M of
>> file pages in memory, and large enough that it is bigger than the
>> working set. I tested by loading up a bunch of popular web sites in
>> chrome and then observing what happened when I ran out of memory.
>> With 50M, I saw almost no thrashing and the system stayed responsive
>> even under low memory. But I wanted to be conservative, since I'm
>> really just guessing.
>>
>> Other users could calculate their value by doing something similar.
>
> Maybe we can scale this by memory amount?
>
> Say, make sure the total amount of page cache in the system
> is at least 2x as much as the sum of all the zone->pages_high
> watermarks, and refuse to evict page cache if we have less
> than that?
>
> This may need to be tunable for a few special use cases,
> like HPC and virtual machine hosting nodes, but it may just
> do the right thing for everybody else.
>
> Another alternative could be to really slow down the
> reclaiming of page cache once we hit this level, so virt
> hosts and HPC nodes can still decrease the page cache to
> something really small ... but only if it is not being
> used.
>
> Andrew, could a hack like the above be "good enough"?
>
> Anybody - does the above hack inspire you to come up with
> an even better idea?
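For the record, a rough sketch of Rik's first suggestion, with the
assumptions spelled out: the helper name page_cache_over_floor() is
invented, "page cache" is approximated by the global NR_FILE_PAGES count,
and zone->pages_high is read through the high_wmark_pages() accessor it
turned into. A sketch of the idea, not a proposed patch:

/*
 * Sketch of Rik's heuristic, not a real patch: treat 2x the sum of
 * the zones' high watermarks as a floor for the page cache, and
 * report whether we are still above it. NR_FILE_PAGES is used as an
 * approximation of "total page cache".
 */
static bool page_cache_over_floor(void)
{
	struct zone *zone;
	unsigned long floor = 0;

	for_each_populated_zone(zone)
		floor += high_wmark_pages(zone);

	return global_page_state(NR_FILE_PAGES) > 2 * floor;
}

get_scan_count() could then zero the file scan targets whenever this
returns false, much as min_filelist_kbytes does, or merely scale them
down sharply to get Rik's "really slow down reclaim" variant for
virt/HPC hosts. Either way, the floor scales with memory size instead
of being a hand-picked 50M.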