2006-02-24 14:44:31

by Steven Whitehouse

Subject: GFS2 Filesystem [0/16]

Hi,

The following 16 patches make up the GFS2 filesystem as contained in the
git tree at:

http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=summary

Please consider GFS2 for inclusion in your -mm series of kernel patches.
The DLM is not included in this patch series since that is already in
-mm. The patches are relative to Linus' latest kernel (well, as of
yesterday, when I last updated the git tree).

There are two slight differences between the DLM in the git tree and
that in -mm. At Ingo Molnar's suggestion, it has been moved to the
fs/dlm directory in the git tree, and the (unused) range locking
feature has been removed in the git tree version. Otherwise the two are
identical.

Below are some release notes which explain a bit more about GFS2 along
with pointers to documentation etc. I believe that we've taken into
account all the points which were raised in the comments from our last
posting to linux-kernel but see below for the detailed list,

Steve.

-------------------------------------------------------------------------------------------------
Release notes / State of the Union for GFS2

1. Relationship with GFS1
2. New features
3. Known issues (to be fixed before submission to Linus)
4. Some items from our TODO list
5. Where to find things....

1. Relationship with GFS1

Following a review of the metadata in GFS2, most of the metadata is
now compatible between GFS1 and GFS2, making the writing of an
upgrade tool a relatively trivial operation. The differences in the
ondisk metadata between GFS1 and GFS2 are:

a) The superblock has different magic numbers to indicate the new
filesystem format.
b) The indirect pointer blocks have pointers starting at a different
offset to GFS1.
c) The addition of the .gfs2_admin directory means that some new
inodes would need to be added in order to upgrade a filesystem.
The journals are now represented on disk as normal inodes as opposed
to the extent based system of GFS1.
d) The ondisk format for data has been changed _only_ in the case where
that data is journaled. The new format for journaled data is in fact
identical to the format for non-journaled data (i.e. the metadata
header which used to be at the start of every journaled block is now
no longer used for data blocks). Note that this change has resulted
in a number of advantages outlined below (see 2(a)).
e) In some cases, fields used in GFS1 are no longer used. These are
left as padding fields in order to ease the upgrade procedure.


2. New features (since last posting to the kernel list)

a) Journaled data files can now be:
i) mmap()ed
ii) exported via NFS
iii) converted to/from normal files at any time
(N.B. GFS1 had a restriction that conversion could only happen
when files were zero sized)

b) The .gfs2_admin directory exposes the internal files that GFS uses
to store various bits of file system related information. This means
that we've been able to remove virtually all the ioctl() calls from
GFS2. There is one ioctl() call left which relates to
getting/setting GFS2 specific flags on files. The various GFS2 tools
will be updated in due course to use this new interface.

c) Sparse annotation for the ondisk structures. (See also 3(e))

d) vm_walk() and friends removed. All I/O is via the page cache now
(aside from direct I/O of course).

e) Recovery should be slightly faster since we now no longer need to
read disk blocks from the journal which appear in the revoke list
at recovery time.

f) Many minor bug fixes and cleanups

g) The code has also got smaller since the last posting to linux-kernel
by approx 40k
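
Item (e) above can be pictured with a small userspace model. This is
not GFS2 code: the JournalEntry record, the replay() function and the
read_block callback are hypothetical names, and the sketch assumes a
simplified journal in which the full revoke list is known before
replay starts. It only illustrates that a block whose number appears
in the revoke list never has to be read from the journal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical record: where a journaled copy of a block lives."""
    block_no: int      # final on-disk location of the block
    journal_pos: int   # position of the journaled copy inside the journal


def replay(entries: Iterable[JournalEntry],
           revoked: Set[int],
           read_block: Callable[[int], bytes]) -> Dict[int, bytes]:
    """Return the block contents to write back, skipping revoked blocks.

    read_block() stands in for a disk read from the journal. It is only
    called for entries whose block number is not in the revoke list.
    """
    recovered: Dict[int, bytes] = {}
    for entry in entries:
        if entry.block_no in revoked:
            continue  # revoked: no journal read is issued for this entry
        # A later entry for the same block overwrites an earlier one.
        recovered[entry.block_no] = read_block(entry.journal_pos)
    return recovered
```

In this model the saving is one avoided read per revoked entry, which
is why the gain is described as slight rather than dramatic.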

3. Known issues (to be fixed before submission to Linus)

a) Deadlock between page locks and GFS2's glocks.
We intend fixing this in the same way that the OCFS2 file system
does, i.e. adding the AOP_TRUNCATED_PAGE return code into the
glock code at a suitable point.

b) Protection of GFS2 system files under .gfs2_admin. Currently, due
to the way in which GFS2's locking works, it's possible under some
circumstances to hang a process by accessing a system file that's
in use. This is mainly a problem with the journal files. We
intend to add some special casing to prevent this from happening.

c) SELinux support will be integrated

d) Various userland tools to be updated; currently mkfs is the only
working userland program for GFS2.

e) Remove the remainder of the endian conversion functions which are
in ondisk.c (quite a few have gone already) in favour of changing
the fields directly. This will remove a lot of sparse annotation
warnings.
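
For item (a), the general shape of a try-lock-and-retry fix can be
sketched in userspace. This is an assumption about the pattern, not
the eventual GFS2 patch: readpage() and do_read are hypothetical
names, two threading.Lock objects stand in for the page lock and the
glock, and the AOP_TRUNCATED_PAGE value below is an arbitrary
sentinel rather than the kernel's constant. The idea is that a path
entered with the page locked never blocks on the glock; if the glock
is contended it drops the page lock, waits, and asks the caller to
start again.

```python
import threading
from typing import Callable, Union

# Stand-in sentinel for the kernel's AOP_TRUNCATED_PAGE return code;
# the value here is arbitrary and not the kernel's.
AOP_TRUNCATED_PAGE = "AOP_TRUNCATED_PAGE"


def readpage(page_lock: threading.Lock,
             glock: threading.Lock,
             do_read: Callable[[], bytes]) -> Union[bytes, str]:
    """Model of a read path that is entered with page_lock already held.

    The page lock is always released before returning, either after the
    read completes or because the caller has to retry.
    """
    if glock.acquire(blocking=False):
        # Fast path: no contention, so holding both locks is safe.
        try:
            return do_read()
        finally:
            glock.release()
            page_lock.release()

    # Slow path: blocking here with the page locked could deadlock
    # against a thread that holds the glock and wants the page lock.
    page_lock.release()
    glock.acquire()   # wait for the current holder to finish
    glock.release()   # only waited; the retry takes the lock again
    return AOP_TRUNCATED_PAGE
```

A caller that sees the sentinel would relock the page and call
readpage() again, by which time the glock is likely to be free.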

4. Some items from our TODO list (probably post-integration, but things
we would like to do)

a) Support for denying of write access to currently executing binaries.
(Currently only works correctly on single node file systems, see the
thread "Re: FMODE_EXEC or alike?" on linux-kernel/linux-fsdevel)

b) Moving list of resource groups into a tree or similar structure
sorted by disk location. This should then allow removal of the
various sorts done in the deallocation code (since the resource
groups will be pre-sorted) and also remove the requirement for
the associated memory allocations.
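
The idea in 4(b) can be sketched as follows. This is a userspace
illustration, not the planned kernel data structure: the
ResourceGroupIndex class and its add() and find() methods are
hypothetical names, and a sorted Python list with bisect stands in
for the "tree or similar structure". The point is that if the groups
are kept ordered by disk address as they are inserted, code that
walks them sees them in disk order without a separate sort or a
temporary array.

```python
import bisect
from typing import Iterator, List, Optional, Tuple


class ResourceGroupIndex:
    """Resource groups kept ordered by starting disk address (hypothetical)."""

    def __init__(self) -> None:
        self._starts: List[int] = []   # sorted start addresses
        self._lengths: List[int] = []  # length of each group, same order

    def add(self, start: int, length: int) -> None:
        """Insert a group at its sorted position."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._lengths.insert(i, length)

    def find(self, block: int) -> Optional[Tuple[int, int]]:
        """Return (start, length) of the group containing block, if any."""
        i = bisect.bisect_right(self._starts, block) - 1
        if i >= 0 and block < self._starts[i] + self._lengths[i]:
            return self._starts[i], self._lengths[i]
        return None

    def __iter__(self) -> Iterator[Tuple[int, int]]:
        """Iterate groups in disk order; no sort is needed at this point."""
        return iter(zip(self._starts, self._lengths))
```

A list makes insertion linear in the number of groups, which a real
tree would avoid; the sketch only shows the ordering property.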

5. Where to find things....

GFS2 and DLM kernel code is in a GIT tree at kernel.org:

http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=summary

The mkfs program is currently in the CVS head, details can be found at:

http://sources.redhat.com/cluster/

Also I'll put a tarball version of mkfs in my directory on kernel.org.
mkfs is not currently hooked into the build system in CVS. Just a simple
make, make install (after editing the Makefile to point it at your
kernel source) should do the trick. This is all you should need to test
GFS2 in single node mode.

To use GFS2 in clustered mode, see the more detailed instructions on
the cluster page (url above).





2006-02-24 21:37:42

by Christoph Hellwig

Subject: Re: GFS2 Filesystem [0/16]

oh, and please look at Andrew's guidelines for submitting patches;
giving every mail the same subject modulo the patch numbering is
not exactly helpful.

2006-02-24 21:36:10

by Christoph Hellwig

Subject: Re: GFS2 Filesystem [0/16]

> b) The .gfs2_admin directory exposes the internal files that GFS uses
> to store various bits of file system related information. This means
> that we've been able to remove virtually all the ioctl() calls from
> GFS2. There is one ioctl() call left which relates to
> getting/setting GFS2 specific flags on files. The various GFS2 tools
> will be updated in due course to use this new interface.

Without even looking at the code a strong NACK here. This is polluting
the namespace which is not acceptable. Please implement a second
filesystem type gfsmeta to do this kind of admin work. Search for ext2meta
which did something similar. Or use a completely different approach,
I'd need to look at the actual functionality provided to give better
advice, but currently I'm lacking the time for that.

2006-02-24 23:50:34

by Andrew Morton

Subject: Re: GFS2 Filesystem [0/16]

Steven Whitehouse wrote:
>
> The following 16 patches make up the GFS2 filesystem as contained in the
> git tree at:
>
> http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=summary
>
> Please consider GFS2 for inclusion in your -mm series of kernel patches.

Once the various review comments are sorted out I'd prefer that both DLM
and GFS be maintained by you in your git tree (like OCFS2 prior to and
after merge).

That's the most convenient thing for both you and me. It has the downside
that putting things into git trees tends to hide them from view. And GFS
needs a lot of viewing before it can proceed further. That's an ongoing
problem with the git trees.

So, in a way, maintaining DLM and GFS in git trees as I suggest is likely
to retard an upstream merge. But it would be more convenient.

Helpful, aren't I?

2006-02-27 08:58:04

by Steven Whitehouse

Subject: Re: GFS2 Filesystem [0/16]

Hi,

On Fri, 2006-02-24 at 21:35 +0000, Christoph Hellwig wrote:
> > b) The .gfs2_admin directory exposes the internal files that GFS uses
> > to store various bits of file system related information. This means
> > that we've been able to remove virtually all the ioctl() calls from
> > GFS2. There is one ioctl() call left which relates to
> > getting/setting GFS2 specific flags on files. The various GFS2 tools
> > will be updated in due course to use this new interface.
>
> Without even looking at the code a strong NACK here. This is polluting
> the namespace which is not acceptable. Please implement a second
> filesystem type gfsmeta to do this kind of admin work. Search for ext2meta
> which did something similar. Or use a completely different approach,
> I'd need to look at the actual functionality provided to give a better
> advice, but currently I'm lacking the time for that.
>
Of all the comments we've received so far, this one raises the most
issues for us. Let me think about this one for a day or two and I'll get
back to you. Ideally we'd like to do it the way you propose, but I need
to check that it doesn't raise any other problems before I commit to
actually doing it,

Steve.


2006-02-28 17:19:54

by Phillip Susi

Subject: Re: GFS2 Filesystem [0/16]

I'm a bit confused. Why exactly is this unacceptable, and what exactly
do you propose instead? Having an entirely separate mount point that is
sort of parallel to the main one, but with extra metadata exposed? So
instead of /path/to/foo/.gfs2_admin/metafile you'd prefer having a
separate mount point like /proc/fs/gfs/path/to/foo/metafile?


Christoph Hellwig wrote:
>> b) The .gfs2_admin directory exposes the internal files that GFS uses
>> to store various bits of file system related information. This means
>> that we've been able to remove virtually all the ioctl() calls from
>> GFS2. There is one ioctl() call left which relates to
>> getting/setting GFS2 specific flags on files. The various GFS2 tools
>> will be updated in due course to use this new interface.
>
> Without even looking at the code a strong NACK here. This is polluting
> the namespace which is not acceptable. Please implement a second
> filesystem type gfsmeta to do this kind of admin work. Search for ext2meta
> which did something similar. Or use a completely different approach,
> I'd need to look at the actual functionality provided to give a better
> advice, but currently I'm lacking the time for that.
>

2006-03-02 09:58:01

by Steven Whitehouse

Subject: Re: GFS2 Filesystem [0/16]

Hi,

On Tue, Feb 28, 2006 at 12:18:31PM -0500, Phillip Susi wrote:
> I'm a bit confused. Why exactly is this unacceptable, and what exactly
> do you propose instead? Having an entirely separate mount point that is
> sort of parallel to the main one, but with extra metadata exposed? So
> instead of /path/to/foo/.gfs2_admin/metafile you'd prefer having a
> separate mount point like /proc/fs/gfs/path/to/foo/metafile?
>
I believe that is what Christoph is proposing. It does simplify certain
things, not least preventing someone from moving the .gfs2_admin directory
to somewhere other than the root directory of the filesystem or even
removing it completely which would otherwise need to be added as special
cases.

On the other hand, it's not clear to me at the moment exactly how to
implement this, bearing in mind that both the "normal" filesystem and
the metadata filesystem are really one and the same as far as journaling
and locking are concerned. Perhaps what's needed is one fs with two
different roots. I'm still looking into the best way to do this,

Steve.

>
> Christoph Hellwig wrote:
> >> b) The .gfs2_admin directory exposes the internal files that GFS uses
> >> to store various bits of file system related information. This means
> >> that we've been able to remove virtually all the ioctl() calls from
> >> GFS2. There is one ioctl() call left which relates to
> >> getting/setting GFS2 specific flags on files. The various GFS2 tools
> >> will be updated in due course to use this new interface.
> >
> >Without even looking at the code a strong NACK here. This is polluting
> >the namespace which is not acceptable. Please implement a second
> >filesystem type gfsmeta to do this kind of admin work. Search for ext2meta
> >which did something similar. Or use a completely different approach,
> >I'd need to look at the actual functionality provided to give a better
> >advice, but currently I'm lacking the time for that.
> >

2006-03-02 10:36:15

by Al Viro

Subject: Re: GFS2 Filesystem [0/16]

On Thu, Mar 02, 2006 at 10:12:19AM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Tue, Feb 28, 2006 at 12:18:31PM -0500, Phillip Susi wrote:
> > I'm a bit confused. Why exactly is this unacceptable, and what exactly
> > do you propose instead? Having an entirely separate mount point that is
> > sort of parallel to the main one, but with extra metadata exposed? So
> > instead of /path/to/foo/.gfs2_admin/metafile you'd prefer having a
> > separate mount point like /proc/fs/gfs/path/to/foo/metafile?
> >
> I believe that is what Christoph is proposing. It does simplify certain
> things, not least preventing someone from moving the .gfs2_admin directory
> to somewhere other than the root directory of the filesystem or even
> removing it completely which would otherwise need to be added as special
> cases.
>
> On the otherhand, its not clear to me at the moment, exactly how to
> implement this bearing in mind that both the "normal" filesystem and
> the metadata filesystem are really one and the same as far as journaling
> and locking are concerned. Perhaps what's needed is one fs with two
> different roots. I'm still looking into the best way to do this,

Two superblocks, one keeping a reference to another. Filesystem driver is,
of course, the single piece of code, with common locking. There's no need
to have the common struct super_block for that and no benefit in doing so -
only extra complications. You can easily register two filesystem types
in the same driver and have ->get_sb() for your metadata fs parse its
arguments in any way it likes. E.g. by doing pathname lookup on what would
normally be a device name and seeing if it's on a filesystem of the primary
type; if it is - grab a reference to struct super_block of that fs
and work with it.
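
The arrangement described above can be modelled in userspace to make
the relationships concrete. This is a sketch and not kernel code: the
Driver and SuperBlock classes, mount_primary(), mount_meta() and the
refcount field are hypothetical stand-ins for the driver, struct
super_block, the two ->get_sb() methods and the superblock reference,
and the type names "gfs2" and "gfsmeta" are placeholders. It assumes
the metadata mount is handed a path in place of a device name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SuperBlock:
    fs_type: str
    source: str                             # device name or path argument
    refcount: int = 1
    primary: Optional["SuperBlock"] = None  # set only on the metadata sb


class Driver:
    """One driver registering two filesystem types (names are hypothetical)."""

    PRIMARY = "gfs2"
    META = "gfsmeta"

    def __init__(self) -> None:
        self.mounts: Dict[str, SuperBlock] = {}  # mount point -> superblock

    def mount_primary(self, device: str, mount_point: str) -> SuperBlock:
        sb = SuperBlock(self.PRIMARY, device)
        self.mounts[mount_point] = sb
        return sb

    def _lookup(self, path: str) -> Optional[SuperBlock]:
        """Find the superblock of the longest mount point containing path."""
        best: Optional[str] = None
        for mp in self.mounts:
            prefix = mp.rstrip("/") + "/"
            if path == mp or path.startswith(prefix):
                if best is None or len(mp) > len(best):
                    best = mp
        return self.mounts[best] if best is not None else None

    def mount_meta(self, path: str, mount_point: str) -> SuperBlock:
        """Treat the 'device' argument as a path on a mounted primary fs."""
        target = self._lookup(path)
        if target is None or target.fs_type != self.PRIMARY:
            raise ValueError(f"{path} is not on a {self.PRIMARY} filesystem")
        target.refcount += 1  # the metadata sb pins the primary sb
        sb = SuperBlock(self.META, path, primary=target)
        self.mounts[mount_point] = sb
        return sb
```

Because the metadata superblock only holds a reference to the primary
one, journaling and locking state stay in a single place, while each
mount still gets its own root.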