2008-11-21 10:39:30

by Solofo.Ramangalahy

[permalink] [raw]
Subject: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

The inode table has been zeroed in setup_new_group_blocks().
Mark it as such in ext4_group_add().

As a side note, online resize and inode zeroing are "dual".

In order to obtain a filesystem with faster formating times one can
do:
. either format a smaller fs and then resize it,
. or format the fs with lazy_itable_init

---
fs/ext4/resize.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.28-rc4-itable_init/fs/ext4/resize.c
===================================================================
--- linux-2.6.28-rc4-itable_init.orig/fs/ext4/resize.c
+++ linux-2.6.28-rc4-itable_init/fs/ext4/resize.c
@@ -865,7 +865,7 @@ int ext4_group_add(struct super_block *s
gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb));
gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);


2008-11-24 23:26:04

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

On Nov 21, 2008 11:23 +0100, [email protected] wrote:
> The inode table has been zeroed in setup_new_group_blocks().
> Mark it as such in ext4_group_add().
>
> As a side note, online resize and inode zeroing are "dual".
>
> In order to obtain a filesystem with faster formating times one can
> do:
> . either format a smaller fs and then resize it,
> . or format the fs with lazy_itable_init

As discussed on the concall, it probably makes more sense to have the
resize code work by marking the inode tables UNINIT (if GDT_CSUM feature
is enabled) and then start the "itable zeroing" thread, if it isn't
already running, to do the zeroing of the itable.

If GDT_CSUM isn't set then the below fix is the right solution.

> Index: linux-2.6.28-rc4-itable_init/fs/ext4/resize.c
> ===================================================================
> --- linux-2.6.28-rc4-itable_init.orig/fs/ext4/resize.c
> +++ linux-2.6.28-rc4-itable_init/fs/ext4/resize.c
> @@ -865,7 +865,7 @@ int ext4_group_add(struct super_block *s
> gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
> gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb));
> gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
> -
> + gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
> /*
> * We can allocate memory for mb_alloc based on the new group
> * descriptor

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-25 11:27:06

by Solofo.Ramangalahy

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

Andreas Dilger writes:
> > As a side note, online resize and inode zeroing are "dual".
> >
> > In order to obtain a filesystem with faster formating times one can
> > do:
> > . either format a smaller fs and then resize it,
> > . or format the fs with lazy_itable_init
>
> As discussed on the concall, it probably makes more sense to have the
> resize code work by marking the inode tables UNINIT (if GDT_CSUM feature
> is enabled)

If I understand correctly, this is already the case:
#define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */
#define EXT4_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not in use */
#define EXT4_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */

As the EXT4_BG_INODE_ZEROED is not present, the inode table is UNINIT.

By the way, is there any reason the #defines are like this, instead of:
#define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */
#define EXT4_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not in use */
#define EXT4_BG_ITABLE_UNINIT 0x0004 /* On-disk itable not initialized */
?

> and then start the "itable zeroing" thread, if it isn't
> already running, to do the zeroing of the itable.

Yes.

Is there other resize changes you could think of?
While working on this, I noted this checkpatch error
"ERROR: do not use assignment in if condition"
(but I am not sure of the exact justification).

Thanks,
--
solofo

2008-11-25 21:18:48

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

On Nov 25, 2008 12:27 +0100, [email protected] wrote:
> Andreas Dilger writes:
> > As discussed on the concall, it probably makes more sense to have the
> > resize code work by marking the inode tables UNINIT (if GDT_CSUM feature
> > is enabled)
>
> If I understand correctly, this is already the case:
> #define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */
> #define EXT4_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not in use */
> #define EXT4_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */

> As the EXT4_BG_INODE_ZEROED is not present, the inode table is UNINIT.

Ah, I suppose you are correct.

> By the way, is there any reason the #defines are like this, instead of:
> #define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */
> #define EXT4_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not in use */
> #define EXT4_BG_ITABLE_UNINIT 0x0004 /* On-disk itable not initialized */
> ?

No particular reason, that is just how the implementation was done. In
hindsight that probably would make more sense...

> While working on this, I noted this checkpatch error
> "ERROR: do not use assignment in if condition"
> (but I am not sure of the exact justification).

I'm not sure what you are asking. The reason not to use "assignment in if"
is because of possible coding error like:

if (x = 6) {
/* do something if x is 6 */
}

when coder actually meant to write:

if (x == 6) {
/* do something if x is 6 */
}

The first one will now generate a warning in GCC unless written as:

if ((x = 6)) {
/* do something if x is 6 */
}

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-27 04:50:50

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

On Fri, Nov 21, 2008 at 11:23:10AM +0100, [email protected] wrote:
> The inode table has been zeroed in setup_new_group_blocks().
> Mark it as such in ext4_group_add().

This patch makes sense to apply right away. However:

1) You didn't include a Developer's Certification of Origin (i.e., a
"Signed-off-by" header). Since this is a one line patch, and it seems
pretty clear your intention is to submit this to Linus, I added one on
your behalf so you can see how it should be done. However, in general
you should never add a signoff for someone else, so I need an explicit
OK from you that you agree with the terms of the Developer's
Certification of Origin as found in the Linux kernel source code,
Documentation/SubmittingPatches before I can push this patch to Linus.
This is very important legally.

2) You need to set the flag *before* you calculate the block group
checksum, not afterwards.

So the corrected patch should look like this....

- Ted

ext4: When resizing set the EXT4_BG_INODE_ZEROED flag for new block groups

From: [email protected]

The inode table has been zeroed in setup_new_group_blocks(). Mark it as
such in ext4_group_add(). Since we are currently clearing inode table
for the new block group, we should set the EXT4_BG_INODE_ZEROED flag.
If at some point in the future we don't immediately zero out the inode
table as part of the resize operation, then obviously we shouldn't do
this.

Signed-off-by: [email protected]
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index b6ec184..d448eb1 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -864,6 +864,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb));
+ gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);

/*

2008-11-27 09:30:35

by Solofo.Ramangalahy

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

Hi Ted,

Theodore Tso writes:
> 2) You need to set the flag *before* you calculate the block group
> checksum, not afterwards.

Sorry about this. I forgot to do the quilt refresh (and check that the
code I submit is the same that the code I run).

> 1) You didn't include a Developer's Certification of Origin (i.e., a
> "Signed-off-by" header). Since this is a one line patch, and it seems
> pretty clear your intention is to submit this to Linus,

This was really an RFC, as you also pointed out.
Regarding this patch, the discussion raised the question of whether
EXT4_BG_INODE_UNINIT or EXT4_BG_ITABLE_UNINIT would be more coherent
than EXT4_BG_INODE_ZEROED wrt. EXT4_BG_INODE_UNINIT and
EXT4_BG_BLOCK_UNINIT.
This was also used as an example for the discussion about doing the
initialization outside of an init thread (which turned up not to be a
good idea).
This is also the first use of EXT4_BG_INODE_ZEROED in the kernel, so
an occasion to revisit the name.
I did not look carefully in the progs (EXT2_BG_INODE_ZEROED) to see if
it is desirable and easy to change it. cscope indicates that it may be
easy (4 instances).

> So the corrected patch should look like this....

Thank you, that's settled then,
--
solofo

2008-11-27 22:35:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

On Thu, Nov 27, 2008 at 10:30:31AM +0100, [email protected] wrote:
> This was really an RFC, as you also pointed out.
> Regarding this patch, the discussion raised the question of whether
> EXT4_BG_INODE_UNINIT or EXT4_BG_ITABLE_UNINIT would be more coherent
> than EXT4_BG_INODE_ZEROED wrt. EXT4_BG_INODE_UNINIT and
> EXT4_BG_BLOCK_UNINIT.

EXT2_BG_ITABLE_UNINIT (or EXT2_BG_ITABLE_PARTIALLY_UNINIT, to be more
correct) would have been better, yes. That way legacy filesystems
that didn't enable uninit_bg would have bg_flags == 0, and we would
know that inode table was properly initialized. Unfortunately we did
it the other way, where EXT2_BG_INODE_ZEROED is set when the inode
table is initialized, instead of the other way around.

> This is also the first use of EXT4_BG_INODE_ZEROED in the kernel, so
> an occasion to revisit the name.

Unfortunately, we've been shipping mke2fs in e2fsprogs that sets the
EXT4_BG_INODE_ZERO for newly created filesystem, and if the
lazy_itable_init configuration parameter is set, it doesn't initialize
the inode table and leaves bg_flags set to EXT2_BG_INODE_UNINIT and
EXT2_BG_BLOCK_UNINIT.

Distributions are already shipping e2fsprogs with this, and there are
ext4 filesystems out there in the wild, so it is indeed probably way
too late to change this.

- Ted

2008-11-27 23:09:57

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC 1/2] ext4 resize: Mark the added group with EXT4_BG_INODE_ZEROED flag

On Nov 27, 2008 17:35 -0500, Theodore Ts'o wrote:
> Unfortunately, we've been shipping mke2fs in e2fsprogs that sets the
> EXT4_BG_INODE_ZERO for newly created filesystem, and if the
> lazy_itable_init configuration parameter is set, it doesn't initialize
> the inode table and leaves bg_flags set to EXT2_BG_INODE_UNINIT and
> EXT2_BG_BLOCK_UNINIT.
>
> Distributions are already shipping e2fsprogs with this, and there are
> ext4 filesystems out there in the wild, so it is indeed probably way
> too late to change this.

I suppose it would be possible to add a new flag and deprecate the old
INODE_ZERO usage after a couple of years. Until we start doing anything
with the kernel we've done everything with mke2fs to zero the inode table,
so that would be a reasonable assumption for the kernel to make.

I agree that having the flag with the opposite meaning would have been
better, but hindsight is 20-20.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.