Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762186AbXJDWjs (ORCPT ); Thu, 4 Oct 2007 18:39:48 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1760581AbXJDWjk (ORCPT ); Thu, 4 Oct 2007 18:39:40 -0400 Received: from mail-gw1.sa.eol.hu ([212.108.200.67]:38983 "EHLO mail-gw1.sa.eol.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760516AbXJDWjj (ORCPT ); Thu, 4 Oct 2007 18:39:39 -0400 To: akpm@linux-foundation.org CC: miklos@szeredi.hu, wfg@mail.ustc.edu.cn, a.p.zijlstra@chello.nl, linux-mm@kvack.org, linux-kernel@vger.kernel.org In-reply-to: <20071004145640.18ced770.akpm@linux-foundation.org> (message from Andrew Morton on Thu, 4 Oct 2007 14:56:40 -0700) Subject: Re: [PATCH] remove throttle_vm_writeout() References: <20071004145640.18ced770.akpm@linux-foundation.org> Message-Id: From: Miklos Szeredi Date: Fri, 05 Oct 2007 00:39:16 +0200 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2555 Lines: 65 > None of the above. > > [PATCH] vm: pageout throttling > > With silly pageout testcases it is possible to place huge amounts of memory > under I/O. With a large request queue (CFQ uses 8192 requests) it is > possible to place _all_ memory under I/O at the same time. > > This means that all memory is pinned and unreclaimable and the VM gets > upset and goes oom. > > The patch limits the amount of memory which is under pageout writeout to be > a little more than the amount of memory at which balance_dirty_pages() > callers will synchronously throttle. > > This means that heavy pageout activity can starve heavy writeback activity > completely, but heavy writeback activity will not cause starvation of > pageout. Because we don't want a simple `dd' to be causing excessive > latencies in page reclaim. > > afaict that problem is still there. It is possible to get all of > ZONE_NORMAL dirty on a highmem machine. With a large queue (or lots of > queues), vmscan can them place all of ZONE_NORMAL under IO. > > It could be that we've fixed this problem via other means in the interrim, > but from a quick peek to seems to me that the scanner will still do a 100% > CPU burn when all of a zone's pages are under writeback. Ah, OK. I did read the changelog, but you added quite a bit of translation ;) > throttle_vm_writeout() should be a per-zone thing, I guess. Perhaps fixing > that would fix your deadlock. That's doubtful, but I don't know anything > about your deadlock so I cannot say. No, doing the throttling per-zone won't in itself fix the deadlock. Here's a deadlock example: Total memory = 32M /proc/sys/vm/dirty_ratio = 10 dirty_threshold = 3M ratelimit_pages = 1M Some program dirties 4M (dirty_threshold + ratelimit_pages) of mmap on a fuse fs. Page balancing is called which turns all these into writeback pages. Then userspace filesystem gets a write request, and tries to allocate memory needed to complete the writeout. That will possibly trigger direct reclaim, and throttle_vm_writeout() will be called. That will block until nr_writeback goes below 3.3M (dirty_threshold + 10%). But since all 4M of writeback is from the fuse fs, that will never happen. Does that explain it better? Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/