Message-ID: <495C92C8.5040702@garzik.org>
Date: Thu, 01 Jan 2009 04:54:16 -0500
From: Jeff Garzik
To: Benny Halevy
CC: James Bottomley, open-osd development, Boaz Harrosh, linux-scsi,
    linux-kernel@vger.kernel.org, avishay@gmail.com,
    viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org, Andrew Morton
Subject: Re: [osd-dev] [PATCH 7/9] exofs: mkexofs
In-Reply-To: <495C8B65.4010202@panasas.com>

Benny Halevy wrote:
> On Dec. 31, 2008, 17:57 +0200, James Bottomley wrote:
>> On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
>>> Andrew Morton wrote:
>>>> On Tue, 16 Dec 2008 17:33:48 +0200
>>>> Boaz Harrosh wrote:
>>>>
>>>>> We need a mechanism to prepare the file system (mkfs).
>>>>> I chose to implement that by means of a couple of
>>>>> mount-options. Because there is no user-mode API for committing
>>>>> OSD commands.
>>>>> And also, all this stuff is highly internal to
>>>>> the file system itself.
>>>>>
>>>>> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
>>>>>   can be executed by kernel code just before mount. An mkexofs utility
>>>>>   can now be implemented by means of a script that mounts and unmounts the
>>>>>   file system with proper options.
>>>> Doing mkfs in-kernel is unusual. I don't think the above description
>>>> sufficiently helps the uninitiated understand why mkfs cannot be done
>>>> in userspace as usual. Please flesh it out a bit.
>>> There are a few main reasons.
>>> - There is no user-mode API for initiating OSD commands. Such a subsystem
>>>   would be a hundredfold bigger than the mkfs code submitted. I think it would be
>>>   hard and stupid to maintain a complex user-mode API just for creating
>>>   a couple of objects and writing a couple of on-disk structures.
>> This is really a reflection of the whole problem with the OSD paradigm.
>>
>> In theory, a filesystem on OSD is a thin layer of metadata mapping
>> objects to files. Get this right and the storage will manage things,
>> like security and access and attributes (there's even a natural mapping
>> to the VFS concept of extended attributes). Plus, the storage has
>> enough information to manage persistence, backups and replication.
>>
>> The real problem is that no-one has actually managed to come up with a
>> useful VFS<->OSD mapping layer (even by extending or altering the VFS).
>> Every filesystem that currently uses OSD has a separate direct OSD
>> speaking interface (i.e. it slices out the block layer to do this and
>> talks directly to the storage).
>>
>> I suppose this could be taken to show that such a layer is impossibly
>> complex, as you assert, but its lack is reflected in strange looking
>> design decisions like in-kernel mkfs. It would also mean that there
>> would be very little layered code sharing between OSD based filesystems.
>
> I think that we may need to gain some more experience to extract the
> commonalities of such file systems. Currently we came up with the
> lowest possible denominator: the osd initiator library that deals
> with command formatting and execution, including attrs, sense status,
> and security.

Not putting words in James' mouth, but I definitely agree that the
in-kernel mkfs raises a red flag or two.

mkfs.ext3 for block-based filesystems has direct and intimate knowledge
of ext3 filesystem structure, and it writes that information from
userland directly to the block(s) necessary.

Similarly, mkfs for an object-based filesystem should be issuing SCSI
commands to the OSD device from userland, AFAICS.

> To provide a higher level abstraction that would help with "administrative"
> tasks like mkfs and the like we already tossed an idea in the past -
> a file system that will represent the contents of an OSD in a namespace,
> for example: partition_id / object_id / {data, attrs / ..., ctl / ...}.
> Such a file system could provide a generic mapping which one could
> use to easily develop management applications for the OSD. That said,
> it's out of the scope of exofs which focuses mostly on the filesystem
> data and metadata paths.

That's far too complex for what is necessary. Just issue SCSI commands
from userland.

We don't need an abstract interface specifically for low-level details.
The VFS is that abstract interface; anything else should be low-level
and purpose-built.

	Jeff
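
For illustration only, here is a minimal sketch of what "issuing SCSI
commands from userland" can look like on Linux, using the SG_IO ioctl
declared in <scsi/sg.h>. It is not code from this thread. It sends a
standard 6-byte INQUIRY as a stand-in, because the OSD CDB layout is not
spelled out here, and whether this particular interface can carry the
longer CDBs that OSD commands need is left open. The helper names
(build_inquiry_cdb, prepare_request, send_cdb) are hypothetical.

```python
import ctypes
import fcntl
import os

SG_IO = 0x2285           # ioctl request number, from <scsi/sg.h>
SG_DXFER_FROM_DEV = -3   # data flows from the device to the host


class SgIoHdr(ctypes.Structure):
    """Mirror of struct sg_io_hdr from <scsi/sg.h>."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def build_inquiry_cdb(alloc_len):
    """Return a 6-byte standard INQUIRY CDB (opcode 0x12)."""
    if not 0 <= alloc_len <= 0xFF:
        raise ValueError("allocation length must fit in one byte here")
    return bytes([0x12, 0x00, 0x00, 0x00, alloc_len, 0x00])


def prepare_request(cdb, resp_len, timeout_ms=5000):
    """Fill in an sg_io_hdr for a device-to-host command.

    The buffers are returned with the header so they stay alive while
    the header holds raw pointers to them.
    """
    cmd = (ctypes.c_ubyte * len(cdb)).from_buffer_copy(cdb)
    data = (ctypes.c_ubyte * resp_len)()
    sense = (ctypes.c_ubyte * 32)()
    hdr = SgIoHdr()
    hdr.interface_id = ord("S")          # required: 'S' for SCSI generic
    hdr.dxfer_direction = SG_DXFER_FROM_DEV
    hdr.cmd_len = len(cdb)
    hdr.mx_sb_len = len(sense)
    hdr.dxfer_len = resp_len
    hdr.dxferp = ctypes.addressof(data)
    hdr.cmdp = ctypes.addressof(cmd)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = timeout_ms             # milliseconds
    return hdr, cmd, data, sense


def send_cdb(dev_path, cdb, resp_len):
    """Send one CDB to dev_path via SG_IO and return the response bytes."""
    hdr, cmd, data, sense = prepare_request(cdb, resp_len)
    fd = os.open(dev_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, SG_IO, hdr)      # kernel updates hdr in place
    finally:
        os.close(fd)
    if hdr.status != 0:
        raise OSError("SCSI status 0x%02x" % hdr.status)
    return bytes(data)[: resp_len - hdr.resid]
```

With a suitable device node and permissions, a call would look like
send_cdb("/dev/sg1", build_inquiry_cdb(96), 96), where /dev/sg1 is only
a placeholder path. A userland mkfs along these lines would build its
own CDBs in place of the INQUIRY and send them the same way.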