Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757387AbXHVHpe (ORCPT ); Wed, 22 Aug 2007 03:45:34 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753877AbXHVHp0 (ORCPT ); Wed, 22 Aug 2007 03:45:26 -0400 Received: from mx3.mail.elte.hu ([157.181.1.138]:54306 "EHLO mx3.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753993AbXHVHpZ (ORCPT ); Wed, 22 Aug 2007 03:45:25 -0400 Date: Wed, 22 Aug 2007 09:45:08 +0200 From: Ingo Molnar To: Christoph Lameter Cc: Peter Zijlstra , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [RFC 0/7] Postphone reclaim laundry to write at high water marks Message-ID: <20070822074508.GA3160@elte.hu> References: <20070820215040.937296148@sgi.com> <1187692586.6114.211.camel@twins> <1187730812.5463.12.camel@lappy> <1187734144.5463.35.camel@lappy> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.14 (2007-02-12) X-ELTE-VirusStatus: clean X-ELTE-SpamScore: -1.0 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-1.0 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.1.7-deb -1.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2966 Lines: 54 * Christoph Lameter wrote: > > I want slab to fail when a similar page alloc would fail, no magic. > > Yes I know. I do not want allocations to fail but that reclaim occurs > in order to avoid failing any allocation. We need provisions that make > sure that we never get into such a bad memory situation that would > cause severe slowless and usually end up in a livelock anyways. Could you outline the "big picture" as you see it? To me your argument that reclaim can always be done instantly and that the cases where it cannot be done are pathological and need to be avoided is fundamentally dangerous and quite a bit short-sighted at first glance. The big picture way to think about this is the following: the free page pool is the "cache" of the MM. It's what "greases" the mechanism and bridges the inevitable reclaim latency and makes "atomic memory" available to the reclaim mechanism itself. We _cannot_ remove that cache without a conceptual replacement (or a _very_ robust argument and proof that the free pages pool is not needed at all - this would be a major design change (and a stupid mistake IMO)). Your patchset, in essence, tries to claim that we dont really need this cache and that all that matters is to keep enough clean pagecache pages around. That approach misses the full picture and i dont think we can progress without agreeing on the fundamentals first. That "cache" cannot be handled in your scheme: a fully or mostly anonymous workload (tons of apps are like that) instantly destroys the "there is always a minimal amount of atomically reclaimable pages around" property of freelists, and this cannot be talked or tweaked around by twiddling any existing property of anonymous reclaim. Anonymous memory is dirty and takes ages to reclaim. The fact that your patchset causes an easy anonymous OOM further underlines this flaw of your thinking. Not making anonymous workloads OOM is the _hardest_ part of the MM, by far. Pagecache reclaim is a breeze in comparison :-) So there is a large and fundamental rift between having pages on the freelist (instantly available to any context) and having them on the (current) LRU where they might or might not be clean, etc. The freelists are an implicit guarantee of buffering and atomicity and they can and do save the day if everything else fails to keep stuff insta-freeable. (And then we havent even considered the performance and scalability differences between picking from the pcp freelists versus picking pages from the LRU, havent considered the better higher-order page allocation property of the buddy pool and havent considered the atomicity of in-irq-handler allocations.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/