Subject: Re: [PATCH] remove throttle_vm_writeout()
From: Peter Zijlstra
To: Andrew Morton
Cc: Miklos Szeredi, wfg@mail.ustc.edu.cn, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Date: Thu, 04 Oct 2007 20:10:10 +0200
Message-Id: <1191521410.5574.36.camel@lappy>
In-Reply-To: <20071004104650.d158121f.akpm@linux-foundation.org>
References: <1191501626.22357.14.camel@twins>
	<1191504186.22357.20.camel@twins>
	<1191516427.5574.7.camel@lappy>
	<20071004104650.d158121f.akpm@linux-foundation.org>

On Thu, 2007-10-04 at 10:46 -0700, Andrew Morton wrote:
> On Thu, 04 Oct 2007 18:47:07 +0200 Peter Zijlstra wrote:
>
> > static int may_write_to_queue(struct backing_dev_info *bdi)
> > {
> > 	if (current->flags & PF_SWAPWRITE)
> > 		return 1;
> > 	if (!bdi_write_congested(bdi))
> > 		return 1;
> > 	if (bdi == current->backing_dev_info)
> > 		return 1;
> > 	return 0;
> > }
> >
> > Which will write to congested queues. Anybody know why?

OK, I guess I could have found that :-/
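To make sure I'm reading those three exemptions right (annotations
mine, so take them with salt):

	static int may_write_to_queue(struct backing_dev_info *bdi)
	{
		/*
		 * kswapd and pdflush run with PF_SWAPWRITE set; per
		 * the commit below they are allowed to block on the
		 * queue, so congestion is no reason to skip the page.
		 */
		if (current->flags & PF_SWAPWRITE)
			return 1;
		/*
		 * The queue is below its congestion watermark;
		 * writing one more page will not block anybody.
		 */
		if (!bdi_write_congested(bdi))
			return 1;
		/*
		 * current is in the middle of a write(2) to this very
		 * device (the write path sets
		 * current->backing_dev_info); let the dirtier
		 * throttle itself on its own queue.
		 */
		if (bdi == current->backing_dev_info)
			return 1;
		/* Everybody else skips the page rather than block. */
		return 0;
	}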
> commit c4e2d7ddde9693a4c05da7afd485db02c27a7a09
> Author: akpm
> Date: Sun Dec 22 01:07:33 2002 +0000
>
> [PATCH] Give kswapd writeback higher priority than pdflush
>
> The `low latency page reclaim' design works by preventing page
> allocators from blocking on request queues (and by preventing them
> from blocking against writeback of individual pages, but that is
> immaterial here).
>
> This has a problem under some situations. pdflush (or a write(2)
> caller) could be saturating the queue with highmem pages. This
> prevents anyone from writing back ZONE_NORMAL pages. We end up
> doing enormous amounts of scanning.
>
> A test case is to mmap(MAP_SHARED) almost all of a 4G machine's
> memory, then kill the mmapping applications. The machine instantly
> goes from 0% of memory dirty to 95% or more.

With dirty page tracking this is not supposed to happen anymore.

> pdflush kicks in and starts writing the least-recently-dirtied
> pages, which are all highmem.

With highmem >> normal, and user pages preferring highmem, this will
likely still be true.

> The queue is congested so nobody will write back ZONE_NORMAL pages.
> kswapd chews 50% of the CPU scanning past dirty ZONE_NORMAL pages
> and page reclaim efficiency (pages_reclaimed/pages_scanned) falls
> to 2%.

So the problem is a heavy writer vs. swap, which is still possible.

> So this patch changes the policy for kswapd. kswapd may use all of
> a request queue, and is prepared to block on request queues.

So request queues have a limit, above the congestion level, at which
they will block? NFS doesn't have that, AFAIK.

> What will now happen in the above scenario is:
>
> 1: The page allocator scans some pages, fails to reclaim enough
>    memory and takes a nap in blk_congestion_wait().
>
> 2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
>    back pages. (These pages will be rotated to the tail of the
>    inactive list at IO-completion interrupt time).
>
> This writeback will saturate the queue with ZONE_NORMAL pages.
> Conveniently, pdflush will avoid the congested queues. So we end
> up writing the correct pages.
>
> In this test, kswapd CPU utilisation falls from 50% to 2%, page
> reclaim efficiency rises from 2% to 40% and things are generally a
> lot happier.
>
> The downside is that kswapd may now do a lot less page reclaim,
> increasing page allocation latency, causing more direct reclaim,
> increasing lock contention in the VM, etc. But I have not been
> able to demonstrate that in testing.
>
> The other problem is that there is only one kswapd, and there are
> lots of disks. That is a generic problem - without being able to
> co-opt user processes we don't have enough threads to keep lots of
> disks saturated.
>
> One fix for this would be to add an additional "really congested"
> threshold in the request queues, so kswapd can still perform
> nonblocking writeout. This gives kswapd priority over pdflush
> while allowing kswapd to feed many disk queues. I doubt if this
> will be called for.

I could do that.
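Something like the below, perhaps. Completely untested sketch, and
bdi_write_really_congested() is made up - it stands for a congestion
check against a second, higher watermark in the request queue:

	static int may_write_to_queue(struct backing_dev_info *bdi)
	{
		/*
		 * kswapd/pdflush: keep writing past the normal
		 * congestion point, but back off before the queue
		 * would actually block us (the hypothetical second
		 * threshold).
		 */
		if (current->flags & PF_SWAPWRITE)
			return !bdi_write_really_congested(bdi);
		if (!bdi_write_congested(bdi))
			return 1;
		if (bdi == current->backing_dev_info)
			return 1;
		return 0;
	}

That would keep kswapd ahead of pdflush without ever parking it on a
single spindle, and since it never blocks it might also cover NFS-like
BDIs that have no request queue to block on.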