2001-04-19 20:10:29

by Theodore Tso

Subject: Re: [Ext2-devel] ext2 inode size (on-disk)

On Thu, Apr 19, 2001 at 07:55:20AM -0400, Alexander Viro wrote:
> Erm... Folks, can ->s_inode_size be not a power of 2? Both
> libext2fs and kernel break in that case.

This was a project that was never completed. I thought at one point
of allowing the inode size to be not a power of 2, but if you do that,
you really want to avoid letting an inode cross a block boundary ---
for reliability and performance reasons if nothing else.
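A minimal sketch of that constraint, in Python for brevity rather than the kernel's C: if each block holds only as many whole inodes as fit, no inode straddles a block boundary, at the cost of some unused bytes at the end of every block. The function name and parameters are illustrative assumptions, not code from the kernel or e2fsprogs.

```python
def inode_position(index, block_size, inode_size):
    """Return (block, byte offset) of inode `index` within an inode table,
    packing only whole inodes into each block."""
    per_block = block_size // inode_size   # whole inodes that fit in one block
    if per_block == 0:
        raise ValueError("inode is larger than a block")
    block = index // per_block
    offset = (index % per_block) * inode_size
    return block, offset


# Hypothetical 192-byte inodes on 1024-byte blocks: 5 inodes per block,
# so the last 64 bytes of each block stay unused.
for i in (0, 4, 5):
    print(i, inode_position(i, 1024, 192))
```

With a power-of-two inode size the division is exact and nothing is wasted; with any other size the tail of each block is simply skipped.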

It may simply be easiest at this point to require that the inode size
be a power of two, at least as far as going from 128 to 256 bytes,
just for compatibility reasons. (Although if we do that, the folks
who want to use extra space in the inode will come pouring out of the
woodwork, and we're going to have to be careful to control who uses what
parts of the extended inode.)

In the long run, it probably makes sense to adjust the algorithms to
allow for non-power-of-two inode sizes, but require an incompatible
filesystem feature flag (so that older kernels and filesystem
utilities won't choke when mounting filesystems with non-standard
sized inodes).
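The idea behind an incompatible feature flag can be sketched as follows: an implementation refuses to mount whenever the superblock carries an incompat bit it does not recognize. This is only an illustration; the bit values and names below are hypothetical placeholders, not real ext2 constants.

```python
# Hypothetical bit values, for illustration only.
INCOMPAT_KNOWN_FEATURE = 0x0001
INCOMPAT_ODD_INODE_SIZE = 0x8000   # assumed flag for non-power-of-two inodes


def can_mount(feature_incompat, supported_mask):
    """Mount only if every incompat bit set on disk is understood."""
    unknown = feature_incompat & ~supported_mask
    return unknown == 0


on_disk = INCOMPAT_KNOWN_FEATURE | INCOMPAT_ODD_INODE_SIZE
old_code = INCOMPAT_KNOWN_FEATURE
new_code = INCOMPAT_KNOWN_FEATURE | INCOMPAT_ODD_INODE_SIZE

print(can_mount(on_disk, old_code))   # old code does not know the new bit
print(can_mount(on_disk, new_code))
```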

- Ted


2001-04-20 02:24:20

by Alexander Viro

Subject: Re: [Ext2-devel] ext2 inode size (on-disk)



On Thu, 19 Apr 2001 [email protected] wrote:

> This was a project that was never completed. I thought at one point
> of allowing the inode size to be not a power of 2, but if you do that,
> you really want to avoid letting an inode cross a block boundary ---
> for reliability and performance reasons if nothing else.

Agreed.

> In the long run, it probably makes sense to adjust the algorithms to
> allow for non-power-of-two inode sizes, but require an incompatible
> filesystem feature flag (so that older kernels and filesystem
> utilities won't choke when mounting filesystems with non-standard
> sized inodes).

I don't think that it's needed - old kernels (up to -CURRENT ;-) will
simply refuse to mount if ->s_inode_size != 128. Old utilities may be
trickier, though...

I'm somewhat concerned about the following: the last block of an inode
table fragment may have fewer inodes than the rest. Reason: the number
of inodes per group should be a multiple of 8, and with inodes bigger
than 128 bytes that can leave a partly filled block. Comments?
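The arithmetic behind this concern can be sketched in a few lines. Whether the last block comes up short depends on whether the number of inodes per block divides the per-group inode count. The function name and the sample sizes are assumptions chosen for illustration.

```python
def inode_table_layout(inodes_per_group, block_size, inode_size):
    """Return (table blocks, inodes in the last block, inodes per block)."""
    per_block = block_size // inode_size
    blocks = -(-inodes_per_group // per_block)          # ceiling division
    in_last = inodes_per_group - (blocks - 1) * per_block
    return blocks, in_last, per_block


# (inodes per group, block size, inode size); the 192-byte case is hypothetical
for args in ((1016, 1024, 256), (40, 4096, 128), (8, 1024, 192)):
    blocks, in_last, per_block = inode_table_layout(*args)
    print(args, "->", blocks, "blocks, last holds", in_last, "of", per_block)
```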

I would really, really like to end up with an accurate description of
the inode table layout somewhere in Documentation/filesystems. Heck, I
volunteer to write it down and submit it into the tree ;-)

Al

2001-04-20 05:38:20

by Andreas Dilger

Subject: Re: [Ext2-devel] ext2 inode size (on-disk)

Al writes:
> I don't think that it's needed - old kernels (up to -CURRENT ;-) will
> simply refuse to mount if ->s_inode_size != 128. Old utilities may be
> trickier, though...

Changing the inode size would probably need an incompat flag anyway,
and old utilities wouldn't set that flag.

> I'm somewhat concerned about the following: the last block of an inode
> table fragment may have fewer inodes than the rest. Reason: the number
> of inodes per group should be a multiple of 8, and with inodes bigger
> than 128 bytes that can leave a partly filled block. Comments?

I don't _think_ that there is a requirement for a multiple-of-8 inodes
per group. OK, looking into mke2fs (actually lib/ext2fs/initialize.c)
it _does_ show that it needs to be a multiple of 8, but I'm not sure
exactly what the "bitmap splicing code" mentioned in the comment is.

In the end, it doesn't really matter much - if we go with power-of-two
inode sizes, all it means is that we may need a multiple of 2 (or
possibly 4, for 512-byte inodes in a 1k block filesystem) inode table
blocks in each group. Not a big deal. The code already handles this.
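Those multiples can be checked with a short calculation: take the smallest inode count that is both a multiple of 8 and fills whole blocks, and see how many blocks it occupies. This is a sketch under the assumption that inode table blocks are always completely filled; the function name is made up for illustration.

```python
from math import gcd


def table_block_granularity(block_size, inode_size, multiple=8):
    """Smallest number of completely filled inode table blocks whose
    inode count is a multiple of `multiple`."""
    per_block = block_size // inode_size
    lcm = multiple * per_block // gcd(multiple, per_block)
    return lcm // per_block


for inode_size in (128, 256, 512):
    print(inode_size, table_block_granularity(1024, inode_size))
```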

> I would really, really like to end up with an accurate description of
> the inode table layout somewhere in Documentation/filesystems. Heck, I
> volunteer to write it down and submit it into the tree ;-)

I can write a few words as well.

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2001-04-20 06:35:56

by Theodore Tso

Subject: Re: [Ext2-devel] ext2 inode size (on-disk)

On Thu, Apr 19, 2001 at 10:23:39PM -0400, Alexander Viro wrote:
>
> I'm somewhat concerned about the following: the last block of an inode
> table fragment may have fewer inodes than the rest. Reason: the number
> of inodes per group should be a multiple of 8, and with inodes bigger
> than 128 bytes that can leave a partly filled block. Comments?

Yup, that's right. That shouldn't be too bad, though, since we
already calculate things by dividing by INODES_PER_BLOCK_GROUP. So
the fact that the last block of the inode table may have some unused
space shouldn't be a problem.
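A rough sketch of why the unused tail is harmless: an inode is located by dividing its number by the per-group count first, so the index within a group never reaches the unused slots at the end of that group's table. The function below is illustrative only and assumes inode numbers start at 1; it is not the kernel's code.

```python
def locate_inode(ino, inodes_per_group, block_size, inode_size):
    """Return (group, block within that group's inode table, byte offset)."""
    group, index = divmod(ino - 1, inodes_per_group)   # inode numbers start at 1
    per_block = block_size // inode_size
    block_in_table, slot = divmod(index, per_block)
    return group, block_in_table, slot * inode_size


# Hypothetical layout: 8 inodes per group, 192-byte inodes, 1024-byte blocks.
# Inode 8 is the last one in group 0; inode 9 starts group 1's table afresh,
# so the two unused slots after inode 8 are never addressed.
print(locate_inode(8, 8, 1024, 192))
print(locate_inode(9, 8, 1024, 192))
```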

> I would really, really like to end up with an accurate description of
> the inode table layout somewhere in Documentation/filesystems. Heck, I
> volunteer to write it down and submit it into the tree ;-)

The "design and implementation of ext2" paper has a pretty good
explanation of the inode table, but of course it assumed a convenient
inode size of 128, and didn't really go into the issues of what might
happen if the inode size were larger, or not a power of two.

So yeah, getting something which explains how things work now that
things have gotten a bit more complicated would be a good thing.

- Ted


2001-04-20 06:38:06

by Theodore Tso

Subject: Re: [Ext2-devel] ext2 inode size (on-disk)

On Thu, Apr 19, 2001 at 11:35:40PM -0600, Andreas Dilger wrote:
> I don't _think_ that there is a requirement for a multiple-of-8 inodes
> per group. OK, looking into mke2fs (actually lib/ext2fs/initialize.c)
> it _does_ show that it needs to be a multiple of 8, but I'm not sure
> exactly what the "bitmap splicing code" mentioned in the comment is.

It has to be a multiple of 8 because of how e2fsprogs handles
bitmaps --- that is, it takes the various pieces of all of the
bitmaps, and butts them up together in memory. It would be possible
to remove this restriction by reworking the e2fsprogs library code,
but quite frankly, I don't think the restriction is all that
unreasonable.

- Ted