Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756045AbXHUWJW (ORCPT ); Tue, 21 Aug 2007 18:09:22 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752807AbXHUWJP (ORCPT ); Tue, 21 Aug 2007 18:09:15 -0400 Received: from pentafluge.infradead.org ([213.146.154.40]:47366 "EHLO pentafluge.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751673AbXHUWJO (ORCPT ); Tue, 21 Aug 2007 18:09:14 -0400 Subject: Re: [RFC 0/7] Postphone reclaim laundry to write at high water marks From: Peter Zijlstra To: Christoph Lameter Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org In-Reply-To: References: <20070820215040.937296148@sgi.com> <1187692586.6114.211.camel@twins> <1187730812.5463.12.camel@lappy> Content-Type: text/plain Date: Wed, 22 Aug 2007 00:09:03 +0200 Message-Id: <1187734144.5463.35.camel@lappy> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3893 Lines: 95 On Tue, 2007-08-21 at 14:29 -0700, Christoph Lameter wrote: > On Tue, 21 Aug 2007, Peter Zijlstra wrote: > > > It quickly ends up with all of memory in the laundry list and then > > recursing into __alloc_pages which will fail to make progress and OOMs. > > Hmmmm... Okay that needs to be addressed. Reserves need to be used and we > only should enter reclaim if that runs out (like the first patch that I > did). > > > But aside from the numerous issues with the patch set as presented, I'm > > not seeing the seeing the big picture, why are you doing this. > > I want general improvements to reclaim to address the issues that you see > and other issues related to reclaim instead of the strange code that makes > PF_MEMALLOC allocs compete for allocations from a single slab and putting > logic into the kernel to decide which allocs to fail. We can reclaim after > all. Its just a matter of finding the right way to do this. The latest patch I posted got rid of that global slab. Also, all I want is for slab to honour gfp flags like page allocation does, nothing more, nothing less. (well, actually slightly less, since I'm only really interrested in the ALLOC_MIN|ALLOC_HIGH|ALLOC_HARDER -> ALLOC_NO_WATERMARKS transition and not all higher ones) I want slab to fail when a similar page alloc would fail, no magic. Strictly speaking: if: page = alloc_page(gfp); fails but: obj = kmem_cache_alloc(s, gfp); succeeds then its a bug. But I'm not actually needing it that strict, just the ALLOC_NO_WATERMARK part needs to be done, ALLOC_HARDER, ALLOC_HIGH those may fudge a bit. > > Anonymous pages are a there to stay, and we cannot tell people how to > > use them. So we need some free or freeable pages in order to avoid the > > vm deadlock that arises from all memory dirty. > > No one is trying to abolish Anonymous pages. Free memory is readily > available on demand if one calls reclaim. Your scheme introduces complex > negotiations over a few scraps of memory when large amounts of memory > would still be readily available if one would do the right thing and call > into reclaim. This is the thing I contend, there need not be large amounts of memory around. In my test prog the hot code path fits into a single page, the rest can be anonymous. > > 'Optimizing' this by switching to freeable pages has mainly > > disadvantages IMHO, finding them scrambles LRU order and complexifies > > relcaim and all that for a relatively small gain in space for clean > > pagecache pages. > > Sounds like you would like to change the way we handle memory in general > in the VM? Reclaim (and thus finding freeable pages) is basic to Linux > memory management. Not quite, currently we have free pages in the reserves, if you want to replace some (or all) of that by freeable pages then that is a change. I'm just using the reserves. > > Please, stop writing patches and write down a solid proposal of how you > > envision the VM working in the various scenarios and why its better than > > the current approach. > > Sorry I just got into this a short time ago and I may need a few cycles > to get this all straight. An approach that uses memory instead of > ignoring available memory is certainly better. Sure if and when possible. There will always be need to fall back to the reserves. A bit off-topic, re that reclaim from atomic context: Currently we try to hold spinlocks only for short periods of time so that reclaim can be preempted, if you run all of reclaim from a non-preemptible context you get very large preemption latencies and if done from int context it'd also generate large int latencies. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/