Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763398AbXHCOxw (ORCPT ); Fri, 3 Aug 2007 10:53:52 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1762281AbXHCOxm (ORCPT ); Fri, 3 Aug 2007 10:53:42 -0400 Received: from pentafluge.infradead.org ([213.146.154.40]:39290 "EHLO pentafluge.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1761564AbXHCOxk (ORCPT ); Fri, 3 Aug 2007 10:53:40 -0400 Subject: Re: Distributed storage. From: Peter Zijlstra To: Evgeniy Polyakov Cc: Daniel Phillips , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Arnaldo Carvalho de Melo In-Reply-To: <20070803134943.GA21221@2ka.mipt.ru> References: <20070731171347.GA14267@2ka.mipt.ru> <200708021408.24876.phillips@phunq.net> <20070803102629.GB10089@2ka.mipt.ru> <20070803105747.GE10089@2ka.mipt.ru> <1186144072.11797.55.camel@lappy> <20070803134943.GA21221@2ka.mipt.ru> Content-Type: text/plain Date: Fri, 03 Aug 2007 16:53:37 +0200 Message-Id: <1186152817.12034.128.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2955 Lines: 53 On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: > On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra (peterz@infradead.org) wrote: > > On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote: > > > > > For receiving situation is worse, since system does not know in advance > > > to which socket given packet will belong to, so it must allocate from > > > global pool (and thus there must be independent global reserve), and > > > then exchange part of the socket's reserve to the global one (or just > > > copy packet to the new one, allocated from socket's reseve is it was > > > setup, or drop it otherwise). Global independent reserve is what I > > > proposed when stopped to advertise network allocator, but it seems that > > > it was not taken into account, and reserve was always allocated only > > > when system has serious memory pressure in Peter's patches without any > > > meaning for per-socket reservation. > > > > This is not true. I have a global reserve which is set-up a priori. You > > cannot allocate a reserve when under pressure, that does not make sense. > > I probably did not cut enough details - my main position is to allocate > per socket reserve from socket's queue, and copy data there from main > reserve, all of which are allocated either in advance (global one) or > per sockoption, so that there would be no fairness issues what to mark > as special and what to not. > > Say we have a page per socket, each socket can assign a reserve for > itself from own memory, this accounts both tx and rx side. Tx is not > interesting, it is simple, rx has global reserve (always allocated on > startup or sometime way before reclaim/oom)where data is originally > received (including skb, shared info and whatever is needed, page is > just an exmaple), then it is copied into per-socket reserve and reused > for the next packet. Having per-socket reserve allows to have progress > in any situation not only in cases where single action must be > received/processed, and allows to be completely fair for all users, but > not only special sockets, thus admin for example would be allowed to > login, ipsec would work and so on... Ah, I think I understand now. Yes this is indeed a good idea! It would be quite doable to implement this on top of that I already have. We would need to extend the socket with a sock_opt that would reserve a specified amount of data for that specific socket. And then on socket demux check if the socket has a non zero reserve and has not yet exceeded said reserve. If so, process the packet. This would also quite neatly work for -rt where we would not want incomming packet processing to be delayed by memory allocations. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/