2008-02-19 00:07:01

by Theodore Ts'o

Subject: How were some of the lustre e2fsprogs test cases generated?


I've started testing my in-development extents code against the test cases
found in clusterfs's e2fsprogs patches, and I noticed that with
f_extents (the first one I tried), some of the inodes had non-zero
ee_start_hi fields. (That is to say, they had block numbers in the
extent fields that were much larger than 1 << 32.)
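
For reference, here is the on-disk leaf extent layout, and how a
48-bit-aware fsck assembles the physical block number from the two
pieces. The struct follows the on-disk format (the fields are
little-endian on disk); the helper is just my sketch of the
arithmetic, not code from either tree:

#include <stdint.h>

/* On-disk ext4 leaf extent entry (little-endian on disk) */
struct ext4_extent {
        uint32_t ee_block;    /* first logical block the extent covers */
        uint16_t ee_len;      /* number of blocks covered */
        uint16_t ee_start_hi; /* high 16 bits of the physical block */
        uint32_t ee_start;    /* low 32 bits of the physical block */
};

/* Combining both fields, a stray non-zero ee_start_hi yields a
 * physical block number far above 1 << 32. */
static uint64_t ext_pblock(const struct ext4_extent *ex)
{
        return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start;
}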

The clusterfs e2fsprogs code doesn't notice this, because it apparently
ignores the ee_start_hi field entirely. But when I try running it with my
version that has (incomplete) 64-bit support, I get the following:

e2fsck 1.40.6 (09-Feb-2008)
Pass 1: Checking inodes, blocks, and sizes
Inode 12 is in extent format, but superblock is missing EXTENTS feature
Fix? yes

Inode 12 has an invalid extent
(logical block 0, invalid physical block 21994527527949, len 17)
Clear? yes
...

In contrast, e2fsprogs-interim and the clusterfs patches interpret the
physical block as 5133, because they don't have any pretense of 48-bit
block number support at all. This means the results of the test run are
quite different. From the clusterfs f_extents/expect.1 file:

Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 12: 5133 5124 5125 5129 5132 5133 5142 5143 5144 5145
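
Spelled out, the arithmetic behind the two readings of that test image
(values taken from the e2fsck output above):

uint64_t phys = 21994527527949ULL;    /* the full 48-bit reading       */
uint32_t lo = (uint32_t)phys;         /* = 5133, the 32-bit-only view  */
uint16_t hi = (uint16_t)(phys >> 32); /* = 5121, the stray ee_start_hi */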

Anyway, no big deal, I'll just regenerate test cases as necessary, or
just use them as they are with different expect logs. But this just brings
up one question --- are we 100% sure that for all deployed versions of
the clusterfs extents code, the kernel-side implementation was
always careful to clear the ee_start_hi and ee_leaf_hi fields?

- Ted


2008-02-19 00:36:46

by Theodore Ts'o

Subject: Re: How were some of the lustre e2fsprogs test cases generated?

On Mon, Feb 18, 2008 at 07:06:58PM -0500, Theodore Ts'o wrote:
>
> The clusterfs e2fsprogs code doesn't notice this, because it apparently
> ignores the ee_start_hi field entirely.

One minor correction --- the clusterfs e2fsprogs extents code checks
to see if the ee_start_hi field is non-zero, and complains if so.
However, it ignores the ee_leaf_hi field for interior (non-leaf)
nodes in the extent tree, and a number of tests do have non-zero
ee_leaf_hi fields which cause my version of e2fsprogs to (rightly)
complain.
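
For the record, the two kinds of entries carry their high bits in
different fields. The index entry struct below follows the on-disk
format (the field is spelled ei_leaf_hi in the upstream headers);
the check is only a sketch of the missing half of the verification,
not the actual CFS code:

#include <stdint.h>

/* On-disk index entry in an interior extent tree node */
struct ext4_extent_idx {
        uint32_t ei_block;   /* first logical block covered */
        uint32_t ei_leaf;    /* low 32 bits of the child block */
        uint16_t ei_leaf_hi; /* high 16 bits of the child block */
        uint16_t ei_unused;
};

/* A 32-bit-only fsck should complain about non-zero high bits here,
 * just as it does for ee_start_hi in the leaf extents. */
static int idx_has_hi_bits(const struct ext4_extent_idx *ix)
{
        return ix->ei_leaf_hi != 0;
}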

If you fix this, a whole bunch of tests will fail as a result, and
will no longer exercise the code paths that the tests were apparently
trying to exercise. That is what is causing me to worry a bit and to
wonder how those test cases were originally generated....

Regards,

- Ted

2008-02-19 11:28:43

by Andreas Dilger

Subject: Re: How were some of the lustre e2fsprogs test cases generated?

On Feb 18, 2008 19:36 -0500, Theodore Ts'o wrote:
> One minor correction --- the clusterfs e2fsprogs extents code checks
> to see if the ee_start_hi field is non-zero, and complains if so.
> However, it ignores the ee_leaf_hi field for interior (non-leaf)
> nodes in the extent tree, and a number of tests do have non-zero
> ee_leaf_hi fields which cause my version of e2fsprogs to (rightly)
> complain.
>
> If you fix this, a whole bunch of tests will fail as a result, and
> will no longer exercise the code paths that the tests were apparently
> trying to exercise. That is what is causing me to worry a bit and to
> wonder how those test cases were originally generated....

The original CFS extents kernel patch had a bug where the _hi fields
were not correctly initialized to zero. The CFS extents e2fsck
patches would clear the _hi fields in the extents and index blocks,
but I disabled that in the upstream patch submission because it would
be incorrect for 48-bit filesystems.

That's the "high_bits_ok" check in e2fsck_ext_block_verify() for error
PR_1_EXTENT_HI, that only allows the high bits when there are > 2^32
blocks in the filesystem. It's possible I made a mistake when I added
that part of the patch, but the regression tests still passed.
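
Roughly, a sketch of that logic (simplified, not the actual e2fsck
source):

#include <stdint.h>

/* Non-zero _hi fields are only legitimate when the filesystem is big
 * enough that block numbers can exceed 32 bits. */
static int high_bits_ok(uint64_t s_blocks_count)
{
        return s_blocks_count > 0xFFFFFFFFULL;
}

/* Flag PR_1_EXTENT_HI when high bits appear on a filesystem too
 * small to need them. */
static int extent_hi_is_error(uint16_t hi_bits, uint64_t s_blocks_count)
{
        return hi_bits != 0 && !high_bits_ok(s_blocks_count);
}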

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-02-19 11:40:35

by Andreas Dilger

Subject: Re: How were some of the lustre e2fsprogs test cases generated?

On Feb 18, 2008 19:06 -0500, Theodore Ts'o wrote:
> Anyway, no big deal, I'll just regenerate test cases as necessary, or
> just use them as they are with different expect logs. But this just brings
> up one question --- are we 100% sure that for all deployed versions of
> the clusterfs extents code, the kernel-side implementation was
> always careful to clear the ee_start_hi and ee_leaf_hi fields?

No, it hasn't always been true that we cleared the _hi fields in the
kernel code. But it has been a year or more since we found this bug,
all CFS e2fsprogs releases since then have cleared the _hi fields,
and no other e2fsprogs has ever supported extents, so we expect that
there are no filesystems left in the field with this issue. Even if
some remain, the current code prefers to clear the _hi bits rather
than consider the whole extent corrupt.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-02-19 12:29:27

by Theodore Ts'o

Subject: Re: How were some of the lustre e2fsprogs test cases generated?

On Tue, Feb 19, 2008 at 04:40:32AM -0700, Andreas Dilger wrote:
>
> No, it hasn't always been true that we cleared the _hi fields in the
> kernel code. But it has been a year or more since we found this bug,
> all CFS e2fsprogs releases since then have cleared the _hi fields,
> and no other e2fsprogs has ever supported extents, so we expect that
> there are no filesystems left in the field with this issue. Even if
> some remain, the current code prefers to clear the _hi bits rather
> than consider the whole extent corrupt.
>

I checked again, and it looks like the interim code is indeed clearing
the _hi bits. I managed to confuse myself into thinking it didn't for
index nodes, but on closer inspection it is doing the right thing.

The reason why I asked is that the extents code in the 'next' branch
of e2fsprogs *does* consider the whole extent to be corrupt. In the
long run, once we start using 48-bit block numbers in extent blocks,
if the physical block number (including the high 16 bits) is greater
than s_blocks_count, simply masking off the high 16 bits of the
48-bit block number is probably not the right way of dealing with
the problem.
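
In other words, the 'next' branch policy amounts to something like
this (a sketch with illustrative names, not the literal code):

#include <stdint.h>

/* Once the full 48-bit physical block is in use, an out-of-range
 * value means the extent is corrupt; masking off the high 16 bits
 * could silently repoint the extent at an unrelated, in-use block. */
static int extent_block_sane(uint64_t phys_48bit, uint64_t s_blocks_count)
{
        return phys_48bit < s_blocks_count;
}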

I think that's probably a safe thing to do, since all of your
customers who might have had a filesystem with non-zero _hi fields
have almost certainly run e2fsck to clear the _hi bits at least once;
do you concur that this is a safe assumption? Or would you prefer
that I add some code that tries to clear just the _hi bits, perhaps
controlled by a configuration flag in e2fsck.conf?
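
Such a knob might look something like this in e2fsck.conf (the option
name clear_extent_hi_bits is purely hypothetical, just to sketch the
idea; no such option exists):

[options]
        clear_extent_hi_bits = true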

Regards,

- Ted

2008-02-19 21:13:56

by Andreas Dilger

Subject: Re: How were some of the lustre e2fsprogs test cases generated?

On Feb 19, 2008 07:29 -0500, Theodore Ts'o wrote:
> On Tue, Feb 19, 2008 at 04:40:32AM -0700, Andreas Dilger wrote:
> > No, it hasn't always been true that we cleared the _hi fields in the
> > kernel code. But it has been a year or more since we found this bug,
> > all CFS e2fsprogs releases since then have cleared the _hi fields,
> > and no other e2fsprogs has ever supported extents, so we expect that
> > there are no filesystems left in the field with this issue. Even if
> > some remain, the current code prefers to clear the _hi bits rather
> > than consider the whole extent corrupt.
>
> I checked again, and it looks like the interim code is indeed clearing
> the _hi bits. I managed to confuse myself into thinking it didn't for
> index nodes, but on closer inspection it is doing the right thing.
>
> The reason why I asked is that the extents code in the 'next' branch
> of e2fsprogs *does* consider the whole extent to be corrupt. In the
> long run, once we start using 48-bit block numbers in extent blocks,
> if the physical block number (including the high 16 bits) is greater
> than s_blocks_count, simply masking off the high 16 bits of the
> 48-bit block number is probably not the right way of dealing with
> the problem.
>
> I think that's probably a safe thing to do, since all of your
> customers who might have had a filesystem with non-zero _hi fields
> have almost certainly run e2fsck to clear the _hi bits at least once;
> do you concur that this is a safe assumption? Or would you prefer
> that I add some code that tries to clear just the _hi bits, perhaps
> controlled by a configuration flag in e2fsck.conf?

I'm OK with either. We might consider patching e2fsck to return to the
more permissive CFS behaviour with _hi bits for our own releases, or
just leave it. Checking back in our patches, we fixed the kernel code
in July '06 and the e2fsck code in Jan '07, so I hope people have run
an e2fsck on their filesystems in the last 1.5 years.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.