Subject: Re: [PATCH 7/9] exofs: mkexofs
From: James Bottomley
To: Boaz Harrosh
Cc: Andrew Morton, avishay@gmail.com, jeff@garzik.org, viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org, osd-dev@open-osd.org, linux-kernel@vger.kernel.org, linux-scsi
Date: Wed, 31 Dec 2008 15:57:32 +0000
Message-Id: <1230739053.3408.74.camel@localhost.localdomain>

On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
> Andrew Morton wrote:
> > On Tue, 16 Dec 2008 17:33:48 +0200
> > Boaz Harrosh wrote:
> >
> >> We need a mechanism to prepare the file system (mkfs).
> >> I chose to implement that by means of a couple of
> >> mount-options. Because there is no user-mode API for committing
> >> OSD commands. And also, all this stuff is highly internal to
> >> the file system itself.
> >>
> >> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
> >>   can be executed by kernel code just before mount. An mkexofs utility
> >>   can now be implemented by means of a script that mounts and unmounts the
> >>   file system with proper options.
> >
> > Doing mkfs in-kernel is unusual. I don't think the above description
> > sufficiently helps the uninitiated understand why mkfs cannot be done
> > in userspace as usual. Please flesh it out a bit.
>
> There are a few main reasons.
> - There is no user-mode API for initiating OSD commands. Such a subsystem
>   would be hundredfold bigger than the mkfs code submitted. I think it would be
>   hard and stupid to maintain a complex user-mode API just for creating
>   a couple of objects and writing a couple of on-disk structures.

This is really a reflection of the whole problem with the OSD paradigm.

In theory, a filesystem on OSD is a thin layer of metadata mapping
objects to files. Get this right and the storage will manage things,
like security and access and attributes (there's even a natural mapping
to the VFS concept of extended attributes). Plus, the storage has
enough information to manage persistence, backups and replication.

The real problem is that no-one has actually managed to come up with a
useful VFS<->OSD mapping layer (even by extending or altering the VFS).
Every filesystem that currently uses OSD has a separate direct OSD
speaking interface (i.e. it slices out the block layer to do this and
talks directly to the storage).

I suppose this could be taken to show that such a layer is impossibly
complex, as you assert, but its lack is reflected in strange looking
design decisions like in-kernel mkfs. It would also mean that there
would be very little layered code sharing between OSD-based filesystems.

> - I intend to refactor the code further to make use of more super.c services,
>   so to make this addition even smaller. Also future direction of raid over
>   multiple objects will make even more kernel infrastructure needed which
>   will need even more user-mode code duplication.
> - I anticipate problems that are not yet addressed in this body of work
>   but will be in the future, mainly that a single OSD-target (lun) can
>   be shared by lots of FSs, and a single FS can span many OSD-targets.
>   Some central management is much easier to do in Kernel.
> >
> > What are the dependencies for this filesystem code? I assume that it
> > depends on various block- and scsi-level patches? Which ones, and
> > what is their status, and is this code even compileable without them?
> >
>
> This OSD-based file system is dependent on the open-osd initiator library
> code that I've submitted for inclusion for 2.6.29. It has been sitting
> in linux-next for a while now, and has not been receiving any comments
> for the last two updated patchsets I've sent to scsi-misc/lkml. However
> it has not yet been submitted into James's scsi-misc git tree, and James
> is the ultimate maintainer that should submit this work. I hope it will
> still be submitted into 2.6.29, as this code is totally self sufficient
> and does not endanger or change any other Kernel subsystems.
> (All the needed ground work was already submitted to Linus since 2.6.26)
> So why should it not?

I don't like it mainly because it's not truly a useful general framework
for others to build on. However, as argued above, there might not
actually be such a useful framework, so as long as the only two
consumers (you and Lustre) want an interface like this, I'll put it in.

James