Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759251AbYCBWSx (ORCPT ); Sun, 2 Mar 2008 17:18:53 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752602AbYCBWSm (ORCPT ); Sun, 2 Mar 2008 17:18:42 -0500 Received: from ns2.suse.de ([195.135.220.15]:53326 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751708AbYCBWSl (ORCPT ); Sun, 2 Mar 2008 17:18:41 -0500 From: Neil Brown To: Peter Zijlstra Date: Mon, 3 Mar 2008 09:18:47 +1100 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <18379.10183.427082.282748@notabene.brown> Cc: Andrew Morton , Linus Torvalds , linux-kernel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, trond.myklebust@fys.uio.no Subject: Re: [PATCH 00/28] Swap over NFS -v16 In-Reply-To: message from Peter Zijlstra on Friday February 29 References: <20080220144610.548202000@chello.nl> <20080223000620.7fee8ff8.akpm@linux-foundation.org> <18371.43950.150842.429997@notabene.brown> <1204023042.6242.271.camel@lappy> <18372.64081.995262.986841@notabene.brown> <1204099113.6242.353.camel@lappy> <18375.24558.876276.255804@notabene.brown> <1204280500.6243.70.camel@lappy> X-Mailer: VM 7.19 under Emacs 21.4.1 X-face: [Gw_3E*Gng}4rRrKRYotwlE?.2|**#s9D X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2765 Lines: 66 On Friday February 29, a.p.zijlstra@chello.nl wrote: > > The tx path is a bit fuzzed. I assume it has an upper limit, take a stab > at that upper limit and leave it at that. > > It should be full of holes, and there is some work on writeout > throttling to fill some of them - but I haven't seen any lockups in this > area for a long long while. I think this is very interesting and useful. You seem to be saying that the write-throttling is enough to avoid any starvation on the transmit path. i.e. the VM is limiting the amount of dirty memory so that when we desperately need memory on the writeout path we can always get it, without lots of careful accounting. So why doesn't this work on for the receive side? What - exactly - is the difference. I think the difference is that on the receive side we have to hand out memory before we know how it will be used (i.e. whether it is for a SK_MEMALLOC socket or not) and so emergency memory could get stuck in some non-emergency usage. So suppose we forgot about all the allocation tracking (that doesn't seem to be needed on the send side so maybe isn't on the receive side) and just focus on not letting emergency memory get used for the wrong thing. So: Have some global flag that says "we are into the emergency pool" which gets set the first time an allocation has to dip below the low water mark, and cleared when an allocation succeeds without needing to dip that low. Then whenever we have memory that might have been allocated from below the watermark (i.e. an incoming packet) and we find out that it isn't required for write-out (i.e. it gets attached to a socket for which SK_MEMALLOC is not set) we test the global flag and if it is set, we drop the packet and free the memory. To clarify: no new accounting no new reservations just "drop non-writeout packets when we are low on memory" Is there any chance that could work reliably? I call this the "Linus" approach because I remember reading somewhere (that google cannot find for me) where Linus said he didn't think a provably correct implementation was the way to go - just something that made it very likely that we won't run out of memory at an awkward time. I guess my position is that any accounting that we do needs to have a clear theoretical model underneath it so we can reason about it. I cannot see the clear model in beneath the current code so I'm trying to come up with models that seem to capture the important elements of the code, and to explore them. NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/