From: Daniel Phillips
To: Evgeniy Polyakov
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Peter Zijlstra
Subject: Re: Distributed storage.
Date: Thu, 2 Aug 2007 14:08:24 -0700
Message-Id: <200708021408.24876.phillips@phunq.net>
In-Reply-To: <20070731171347.GA14267@2ka.mipt.ru>

On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote:
> Hi.
>
> I'm pleased to announce first release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

Excellent!  This is precisely what the doctor ordered for the OCFS2-based
distributed storage system I have been mumbling about for some time.  In
fact, the dd in ddsnap and ddraid stands for "distributed data".  The
ddsnap/raid devices do not include an actual network transport; that is
expected to be provided by a specialized block device, which up till now
has been NBD.  But NBD has various deficiencies, as you note, in addition
to its tendency to deadlock when accessed locally.  Your new code base may
be just the thing we always wanted.  We (zumastor et al) will take it for
a drive and see if anything breaks.

Memory deadlock is a concern, of course.  From a cursory glance through,
it looks like this code is pretty vm-friendly and you have thought quite a
lot about it; however, I respectfully invite peterz (obsessive/compulsive
memory deadlock hunter) to help give it a good going over with me.  I see
bits that worry me, e.g.:

    + req = mempool_alloc(st->w->req_pool, GFP_NOIO);

which seems to be callable in response to a local request, just the case
where NBD deadlocks.  Your mempool strategy can work reliably only if you
can prove that the pool allocations of the maximum number of requests you
can have in flight do not exceed the size of the pool.  In other words, if
you ever take the pool's fallback path to normal allocation, you risk
deadlock.

Anyway, if this is as grand as it seems, then I would think we ought to
factor out a common transfer core that can be used by all of NBD, iSCSI,
ATAoE and your own kernel server, in place of the roll-yer-own code those
things have now.

Regards,

Daniel
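
To make the pool-sizing point above concrete, here is a minimal sketch of
what "never take the fallback path" looks like in practice: reserve the
pool for the worst-case number of in-flight requests, and cap submissions
so that bound actually holds.  DST_MAX_INFLIGHT, struct dst_request and
req_get() are illustrative assumptions, not code from the posted patch.

    /*
     * Illustrative sketch only -- the names, fields and the in-flight
     * bound are hypothetical, not taken from the posted patch.
     */
    #include <linux/mempool.h>
    #include <linux/gfp.h>
    #include <linux/errno.h>

    #define DST_MAX_INFLIGHT 64	/* assumed hard cap on outstanding requests */

    struct dst_request {
    	void *bio;		/* placeholder fields */
    	void *private;
    };

    static mempool_t *req_pool;

    static int req_pool_init(void)
    {
    	/*
    	 * Size the reserve for the worst case.  If more than
    	 * DST_MAX_INFLIGHT requests can ever be outstanding,
    	 * mempool_alloc() falls back to the page allocator under
    	 * GFP_NOIO, and a request issued on the writeout path can
    	 * then block on memory that will only be freed when that
    	 * same request completes -- the deadlock described above.
    	 */
    	req_pool = mempool_create_kmalloc_pool(DST_MAX_INFLIGHT,
    					       sizeof(struct dst_request));
    	return req_pool ? 0 : -ENOMEM;
    }

    /*
     * On the request path: safe against local-writeout deadlock only
     * because the submission code enforces the DST_MAX_INFLIGHT cap,
     * so an element is always returned to the pool by a completion.
     */
    static struct dst_request *req_get(void)
    {
    	return mempool_alloc(req_pool, GFP_NOIO);
    }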