2008-05-14 18:47:16

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: printk stack trace on ext4_error, ext4_abort and ext4_warning.

This helps in better debugging of the problem reported.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/super.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cd7cac0..93f4820 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -238,6 +238,7 @@ void ext4_error (struct super_block * sb, const char * function,
vprintk(fmt, args);
printk("\n");
va_end(args);
+ dump_stack();

ext4_handle_error(sb);
}
@@ -320,6 +321,7 @@ void ext4_abort (struct super_block * sb, const char * function,
vprintk(fmt, args);
printk("\n");
va_end(args);
+ dump_stack();

if (test_opt(sb, ERRORS_PANIC))
panic("EXT4-fs panic from previous error\n");
@@ -345,6 +347,7 @@ void ext4_warning (struct super_block * sb, const char * function,
vprintk(fmt, args);
printk("\n");
va_end(args);
+ dump_stack();
}

void ext4_update_dynamic_rev(struct super_block *sb)
--
1.5.5.1.211.g65ea3.dirty



2008-05-14 18:47:16

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Fix use of uninitialized data

Fix use of uninitialized data with debug enabled.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/mballoc.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 7871f46..4e79e48 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2748,8 +2748,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
sbi = EXT4_SB(sb);
es = sbi->s_es;

- ext4_debug("using block group %lu(%d)\n", ac->ac_b_ex.fe_group,
- gdp->bg_free_blocks_count);

err = -EIO;
bitmap_bh = read_block_bitmap(sb, ac->ac_b_ex.fe_group);
@@ -2765,6 +2763,9 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
if (!gdp)
goto out_err;

+ ext4_debug("using block group %lu(%d)\n", ac->ac_b_ex.fe_group,
+ gdp->bg_free_blocks_count);
+
err = ext4_journal_get_write_access(handle, gdp_bh);
if (err)
goto out_err;
@@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
struct ext4_prealloc_space *pa)
{
- unsigned len = ac->ac_o_ex.fe_len;

2008-05-14 18:47:17

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Fix FLEX_BG and uninit group usage.

With FLEX_BG we allocate block bitmap, inode bitmap, and
inode table outside the group. So when initialzing the
uninit block group we don't need to set bits corresponding
to these meta-data in the bitmaps. Also return the right
number of free blocks when counting the available free
blocks in uninit group.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/balloc.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 5c80eb5..fb63f01 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -109,7 +109,14 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,

for (bit = 0; bit < bit_max; bit++)
ext4_set_bit(bit, bh->b_data);
-
+ /*
+ * With FLEX_BG uninit group we have all the
+ * blocks available for use. So no need
+ * to set any bits in bitmap
+ */
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb,
+ EXT4_FEATURE_INCOMPAT_FLEX_BG))
+ return free_blocks;
start = ext4_group_first_block_no(sb, block_group);

/* Set bits for block and inode bitmaps, and inode table */
@@ -126,6 +133,12 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
*/
mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
}
+ /*
+ * With FLEX_BG uninit group we have all the
+ * blocks available for use.
+ */
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
+ return free_blocks;

return free_blocks - sbi->s_itb_per_group - 2;
}
--
1.5.5.1.211.g65ea3.dirty


2008-05-14 19:08:05

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix FLEX_BG and uninit group usage.

On Thu, 15 May 2008 00:17:12 +0530
"Aneesh Kumar K.V" <[email protected]> wrote:

> With FLEX_BG we allocate block bitmap, inode bitmap, and
> inode table outside the group. So when initialzing the
> uninit block group we don't need to set bits corresponding
> to these meta-data in the bitmaps. Also return the right
> number of free blocks when counting the available free
> blocks in uninit group.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/balloc.c | 15 ++++++++++++++-
> 1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 5c80eb5..fb63f01 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -109,7 +109,14 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
>
> for (bit = 0; bit < bit_max; bit++)
> ext4_set_bit(bit, bh->b_data);
> -
> + /*
> + * With FLEX_BG uninit group we have all the
> + * blocks available for use. So no need
> + * to set any bits in bitmap
> + */
> + if (EXT4_HAS_INCOMPAT_FEATURE(sb,
> + EXT4_FEATURE_INCOMPAT_FLEX_BG))
> + return free_blocks;
> start = ext4_group_first_block_no(sb, block_group);
>
> /* Set bits for block and inode bitmaps, and inode table */
> @@ -126,6 +133,12 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> */
> mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
> }
> + /*
> + * With FLEX_BG uninit group we have all the
> + * blocks available for use.
> + */
> + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> + return free_blocks;
>
> return free_blocks - sbi->s_itb_per_group - 2;
> }

This assumes that if the FLEX_BG feature is enable that all block
groups have no bitmaps or inode tables (which is wrong).

Something like this (ignore the fact that doesnt handle hi bits) should be better.

used_blocks = sbi->s_itb_per_group + 2;
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
ext4_get_group_no_and_offset(sbi, bgd->bg_block_bitmap_lo, tmp, 0);
if (tmp != block_group)
used_blocks--;
ext4_get_group_no_and_offset(sbi, bgd->bg_inode_bitmap_lo, tmp, 0);
if (tmp != block_group)
used_blocks--;
ext4_get_group_no_and_offset(sbi, bgd->bg_inode_table_lo, tmp, 0);
if (tmp != block_group)
used_blocks -= sbi->s_itb_per_group;
}

return free_blocks - used_blocks;
}

-JRS

2008-05-14 19:08:21

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: printk stack trace on ext4_error, ext4_abort and ext4_warning.

Aneesh Kumar K.V wrote:
> This helps in better debugging of the problem reported.

ext4_error happens potentially often in some scenarios, and if I chose
errors=continue I'm not sure I'd want to dump this much.

Would it be worth limiting how often this goes off (maybe just once per fs?)

-Eric


> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/super.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index cd7cac0..93f4820 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -238,6 +238,7 @@ void ext4_error (struct super_block * sb, const char * function,
> vprintk(fmt, args);
> printk("\n");
> va_end(args);
> + dump_stack();
>
> ext4_handle_error(sb);
> }
> @@ -320,6 +321,7 @@ void ext4_abort (struct super_block * sb, const char * function,
> vprintk(fmt, args);
> printk("\n");
> va_end(args);
> + dump_stack();
>
> if (test_opt(sb, ERRORS_PANIC))
> panic("EXT4-fs panic from previous error\n");
> @@ -345,6 +347,7 @@ void ext4_warning (struct super_block * sb, const char * function,
> vprintk(fmt, args);
> printk("\n");
> va_end(args);
> + dump_stack();
> }
>
> void ext4_update_dynamic_rev(struct super_block *sb)


2008-05-14 19:45:37

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: printk stack trace on ext4_error, ext4_abort and ext4_warning.

On Wed, May 14, 2008 at 02:07:14PM -0500, Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
> > This helps in better debugging of the problem reported.
>
> ext4_error happens potentially often in some scenarios, and if I chose
> errors=continue I'm not sure I'd want to dump this much.
>
> Would it be worth limiting how often this goes off (maybe just once per fs?)

We should really do an audit of the calls to ext4_error() and
determine when it is due to a filesystem corruption (where the stack
dump is not useful, and in fact the ext4_error messages should be
detailed enough so that it is obvious where the corruption is --- in
some cases I've wished that the inode number or block group number
where the problem was detected was included) and where it is more of
an assertion failure, where the stack dump is useful.

If there seems to be a need to include dump_stack(), the first
question I would ask is whether ext4_error() was really the right way
of flagging a problem in the first place, as opposed to using a WARN()
or WARN_ON().

The same is true for ext4_warning(). There should be a very clear
distinction between filesystem corruption, I/O errors, and programming
faults, as much as possible.

- Ted

2008-05-15 04:07:03

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix FLEX_BG and uninit group usage.

On Wed, May 14, 2008 at 02:08:02PM -0500, Jose R. Santos wrote:
> On Thu, 15 May 2008 00:17:12 +0530
> "Aneesh Kumar K.V" <[email protected]> wrote:
>
> > With FLEX_BG we allocate block bitmap, inode bitmap, and
> > inode table outside the group. So when initialzing the
> > uninit block group we don't need to set bits corresponding
> > to these meta-data in the bitmaps. Also return the right
> > number of free blocks when counting the available free
> > blocks in uninit group.
> >
> > Signed-off-by: Aneesh Kumar K.V <[email protected]>
> > ---
> > fs/ext4/balloc.c | 15 ++++++++++++++-
> > 1 files changed, 14 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> > index 5c80eb5..fb63f01 100644
> > --- a/fs/ext4/balloc.c
> > +++ b/fs/ext4/balloc.c
> > @@ -109,7 +109,14 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> >
> > for (bit = 0; bit < bit_max; bit++)
> > ext4_set_bit(bit, bh->b_data);
> > -
> > + /*
> > + * With FLEX_BG uninit group we have all the
> > + * blocks available for use. So no need
> > + * to set any bits in bitmap
> > + */
> > + if (EXT4_HAS_INCOMPAT_FEATURE(sb,
> > + EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > + return free_blocks;
> > start = ext4_group_first_block_no(sb, block_group);
> >
> > /* Set bits for block and inode bitmaps, and inode table */
> > @@ -126,6 +133,12 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> > */
> > mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
> > }
> > + /*
> > + * With FLEX_BG uninit group we have all the
> > + * blocks available for use.
> > + */
> > + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > + return free_blocks;
> >
> > return free_blocks - sbi->s_itb_per_group - 2;
> > }
>
> This assumes that if the FLEX_BG feature is enable that all block
> groups have no bitmaps or inode tables (which is wrong).
>
> Something like this (ignore the fact that doesnt handle hi bits) should be better.
>
> used_blocks = sbi->s_itb_per_group + 2;
> if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
> ext4_get_group_no_and_offset(sbi, bgd->bg_block_bitmap_lo, tmp, 0);
> if (tmp != block_group)
> used_blocks--;
> ext4_get_group_no_and_offset(sbi, bgd->bg_inode_bitmap_lo, tmp, 0);
> if (tmp != block_group)
> used_blocks--;
> ext4_get_group_no_and_offset(sbi, bgd->bg_inode_table_lo, tmp, 0);
> if (tmp != block_group)
> used_blocks -= sbi->s_itb_per_group;
> }
>
> return free_blocks - used_blocks;
> }

But with FLEX_BG won't we mark the group as initialized if we are
placing the bitmap or inode table in the group ?

-aneesh

2008-05-15 04:26:07

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: printk stack trace on ext4_error, ext4_abort and ext4_warning.

On Wed, May 14, 2008 at 02:07:14PM -0500, Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
> > This helps in better debugging of the problem reported.
>
> ext4_error happens potentially often in some scenarios, and if I chose
> errors=continue I'm not sure I'd want to dump this much.
>
> Would it be worth limiting how often this goes off (maybe just once per fs?)
>

I actually thought of doing that. But won't rate limiting hide different
scenarios in which we can hit the error ? What we would like to know is
what system call actually caused the file system error. So that we can
try to reproduce the same. Rate limiting would prevent multiple possible
errors.

As Ted mentioned the right fix would be audit all the
ext4_error/warning/abort call sites and add a WARN_ON or WARN_ON_ONCE
where ever we find it useful.

-aneesh


2008-05-15 16:32:23

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix FLEX_BG and uninit group usage.

On Thu, 15 May 2008 09:36:58 +0530
"Aneesh Kumar K.V" <[email protected]> wrote:

> On Wed, May 14, 2008 at 02:08:02PM -0500, Jose R. Santos wrote:
> > On Thu, 15 May 2008 00:17:12 +0530
> > "Aneesh Kumar K.V" <[email protected]> wrote:
> >
> > > With FLEX_BG we allocate block bitmap, inode bitmap, and
> > > inode table outside the group. So when initialzing the
> > > uninit block group we don't need to set bits corresponding
> > > to these meta-data in the bitmaps. Also return the right
> > > number of free blocks when counting the available free
> > > blocks in uninit group.
> > >
> > > Signed-off-by: Aneesh Kumar K.V <[email protected]>
> > > ---
> > > fs/ext4/balloc.c | 15 ++++++++++++++-
> > > 1 files changed, 14 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> > > index 5c80eb5..fb63f01 100644
> > > --- a/fs/ext4/balloc.c
> > > +++ b/fs/ext4/balloc.c
> > > @@ -109,7 +109,14 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> > >
> > > for (bit = 0; bit < bit_max; bit++)
> > > ext4_set_bit(bit, bh->b_data);
> > > -
> > > + /*
> > > + * With FLEX_BG uninit group we have all the
> > > + * blocks available for use. So no need
> > > + * to set any bits in bitmap
> > > + */
> > > + if (EXT4_HAS_INCOMPAT_FEATURE(sb,
> > > + EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > > + return free_blocks;
> > > start = ext4_group_first_block_no(sb, block_group);
> > >
> > > /* Set bits for block and inode bitmaps, and inode table */
> > > @@ -126,6 +133,12 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> > > */
> > > mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
> > > }
> > > + /*
> > > + * With FLEX_BG uninit group we have all the
> > > + * blocks available for use.
> > > + */
> > > + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > > + return free_blocks;
> > >
> > > return free_blocks - sbi->s_itb_per_group - 2;
> > > }
> >
> > This assumes that if the FLEX_BG feature is enable that all block
> > groups have no bitmaps or inode tables (which is wrong).
> >
> > Something like this (ignore the fact that doesnt handle hi bits) should be better.
> >
> > used_blocks = sbi->s_itb_per_group + 2;
> > if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
> > ext4_get_group_no_and_offset(sbi, bgd->bg_block_bitmap_lo, tmp, 0);
> > if (tmp != block_group)
> > used_blocks--;
> > ext4_get_group_no_and_offset(sbi, bgd->bg_inode_bitmap_lo, tmp, 0);
> > if (tmp != block_group)
> > used_blocks--;
> > ext4_get_group_no_and_offset(sbi, bgd->bg_inode_table_lo, tmp, 0);
> > if (tmp != block_group)
> > used_blocks -= sbi->s_itb_per_group;
> > }
> >
> > return free_blocks - used_blocks;
> > }
>
> But with FLEX_BG won't we mark the group as initialized if we are
> placing the bitmap or inode table in the group ?

The problem is that we define FLEX_BG to mean that bitmpas and inode
tables may not reside in the same block group regardless of whether we
use bg meta-data grouping. Your patch cover the only the meta-data
grouping case but it may break if someone writes another usage case for
flex_bg.

There was also a bug on your patch since it skipped setting bits at the
end of the bitmap if the blocks within the group were less than
blocksize * 8.

How does the following (untested) patch look to you?

-JRS



With FLEX_BG block bitmaps, inode bitmaps and inode tables
_MAY_ be allocated outside the group. So, when initializing
the uninit block group, we need to check the location of this
blocks before setting the corresponding bits in the block
bitmap of the newly initialized group. Also return the right
number of free blocks when counting the available free blocks
in uninit group.

Signed-off-by: Jose R. Santos <[email protected]>
---

fs/ext4/balloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index da99437..38367f4 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -43,6 +43,44 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,

}

+int ext4_block_in_group(struct super_block *sb, ext4_fsblk_t block,
+ ext4_group_t block_group)
+{
+ ext4_group_t actual_group;
+ ext4_get_group_no_and_offset(sb, block, &actual_group, 0);
+ if (actual_group == block_group)
+ return 1;
+ return 0;
+}
+
+int ext4_group_used_meta_blocks(struct super_block *sb,
+ ext4_group_t block_group)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ int used_blocks = sbi->s_itb_per_group + 2;
+
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+ struct ext4_group_desc *gdp;
+ struct buffer_head *bh;
+
+ gdp = ext4_get_group_desc(sb, block_group, &bh);
+
+ if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp),
+ block_group))
+ used_blocks--;
+
+ if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp),
+ block_group))
+ used_blocks--;
+
+ if (ext4_block_in_group(sb, ext4_inode_table(sb, gdp),
+ block_group))
+ used_blocks -= sbi->s_itb_per_group;
+
+ }
+
+ return used_blocks;
+}
/* Initializes an uninitialized block bitmap if given, and returns the
* number of blocks free in the group. */
unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
@@ -105,19 +143,33 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
free_blocks = group_blocks - bit_max;

if (bh) {
- ext4_fsblk_t start;
+ ext4_fsblk_t start, tmp;
+ int flex_bg = 0;

for (bit = 0; bit < bit_max; bit++)
ext4_set_bit(bit, bh->b_data);

start = ext4_group_first_block_no(sb, block_group);

+ if (EXT4_HAS_INCOMPAT_FEATURE(sb,
+ EXT4_FEATURE_INCOMPAT_FLEX_BG))
+ flex_bg = 1;
+
/* Set bits for block and inode bitmaps, and inode table */
- ext4_set_bit(ext4_block_bitmap(sb, gdp) - start, bh->b_data);
- ext4_set_bit(ext4_inode_bitmap(sb, gdp) - start, bh->b_data);
- for (bit = (ext4_inode_table(sb, gdp) - start),
- bit_max = bit + sbi->s_itb_per_group; bit < bit_max; bit++)
- ext4_set_bit(bit, bh->b_data);
+ tmp = ext4_block_bitmap(sb, gdp);
+ if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+ ext4_set_bit(tmp - start, bh->b_data);
+
+ tmp = ext4_inode_bitmap(sb, gdp);
+ if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+ ext4_set_bit(tmp - start, bh->b_data);
+
+ tmp = ext4_inode_table(sb, gdp);
+ if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+ for (bit = (tmp - start),
+ bit_max = bit + sbi->s_itb_per_group;
+ bit < bit_max; bit++)
+ ext4_set_bit(bit, bh->b_data);

/*
* Also if the number of blocks within the group is
@@ -127,7 +179,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
}

- return free_blocks - sbi->s_itb_per_group - 2;
+ return free_blocks - ext4_group_used_meta_blocks(sb, block_group);
}



2008-06-02 00:10:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> struct ext4_prealloc_space *pa)
> {
> - unsigned len = ac->ac_o_ex.fe_len;
> -
> + unsigned int len = ac->ac_o_ex.fe_len;
> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
> &ac->ac_b_ex.fe_group,
> &ac->ac_b_ex.fe_start);
> --

This change had nothing to do with fixing the use of unitialized data,
but when I started looking more closely, it raised a potential signed
vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
is an int.

So here we are assigning an int to an unsigned int. Later, len is
assigned to ac_b_ex.len, which means assigning an unsigned int to an
int. In other places, fe_len (an int) is compared against pa_free
(which is an unsigned short), and fe_len gets assined to pa_free, once
again mixing signed and unsigned.

Can someone who is really familiar with this code check this out? I
think the following pseudo-patch to mballoc.h might be in order:

struct ext4_free_extent {
ext4_lblk_t fe_logical;
ext4_grpblk_t fe_start;
ext4_group_t fe_group;
- int fe_len;
+ unsigned int fe_len;
};


- Ted

2008-06-02 09:00:04

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

On Sun, Jun 01, 2008 at 08:08:42PM -0400, Theodore Tso wrote:
> On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
> > @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> > static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> > struct ext4_prealloc_space *pa)
> > {
> > - unsigned len = ac->ac_o_ex.fe_len;
> > -
> > + unsigned int len = ac->ac_o_ex.fe_len;
> > ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
> > &ac->ac_b_ex.fe_group,
> > &ac->ac_b_ex.fe_start);
> > --
>
> This change had nothing to do with fixing the use of unitialized data,
> but when I started looking more closely, it raised a potential signed
> vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
> is an int.
>
> So here we are assigning an int to an unsigned int. Later, len is
> assigned to ac_b_ex.len, which means assigning an unsigned int to an
> int. In other places, fe_len (an int) is compared against pa_free
> (which is an unsigned short), and fe_len gets assined to pa_free, once
> again mixing signed and unsigned.
>
> Can someone who is really familiar with this code check this out? I
> think the following pseudo-patch to mballoc.h might be in order:
>
> struct ext4_free_extent {
> ext4_lblk_t fe_logical;
> ext4_grpblk_t fe_start;
> ext4_group_t fe_group;
> - int fe_len;
> + unsigned int fe_len;
> };
>

Looks correct. We have some BUG_ON that is doing fe_len <= 0. May be we
need to remove the < part.

-aneesh

2008-06-02 10:05:01

by Shen Feng

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data



Theodore Tso Wrote:
> On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
>> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
>> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
>> struct ext4_prealloc_space *pa)
>> {
>> - unsigned len = ac->ac_o_ex.fe_len;
>> -
>> + unsigned int len = ac->ac_o_ex.fe_len;
>> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
>> &ac->ac_b_ex.fe_group,
>> &ac->ac_b_ex.fe_start);
>> --
>
> This change had nothing to do with fixing the use of unitialized data,
> but when I started looking more closely, it raised a potential signed
> vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
> is an int.
>
> So here we are assigning an int to an unsigned int. Later, len is
> assigned to ac_b_ex.len, which means assigning an unsigned int to an
> int. In other places, fe_len (an int) is compared against pa_free
> (which is an unsigned short), and fe_len gets assined to pa_free, once
> again mixing signed and unsigned.
>
> Can someone who is really familiar with this code check this out? I
> think the following pseudo-patch to mballoc.h might be in order:
>
> struct ext4_free_extent {
> ext4_lblk_t fe_logical;
> ext4_grpblk_t fe_start;
> ext4_group_t fe_group;
> - int fe_len;
> + unsigned int fe_len;
> };
>

I'm studying the ext4 code these days.
The data types always confuse me.

The length of a ext4_extent ee_len is define as unsigned short.

struct ext4_extent {
__le32 ee_block; /* first logical block extent covers */
__le16 ee_len; /* number of blocks covered by extent */
__le16 ee_start_hi; /* high 16 bits of physical block */
__le32 ee_start_lo; /* low 32 bits of physical block */
};

So I think fe_len should also be defined as unsigned short.
Is that right?

-Shen Feng

2008-06-02 10:32:44

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

On Mon, Jun 02, 2008 at 06:02:21PM +0800, Shen Feng wrote:
>
>
> Theodore Tso Wrote:
> > On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
> >> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> >> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> >> struct ext4_prealloc_space *pa)
> >> {
> >> - unsigned len = ac->ac_o_ex.fe_len;
> >> -
> >> + unsigned int len = ac->ac_o_ex.fe_len;
> >> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
> >> &ac->ac_b_ex.fe_group,
> >> &ac->ac_b_ex.fe_start);
> >> --
> >
> > This change had nothing to do with fixing the use of unitialized data,
> > but when I started looking more closely, it raised a potential signed
> > vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
> > is an int.
> >
> > So here we are assigning an int to an unsigned int. Later, len is
> > assigned to ac_b_ex.len, which means assigning an unsigned int to an
> > int. In other places, fe_len (an int) is compared against pa_free
> > (which is an unsigned short), and fe_len gets assined to pa_free, once
> > again mixing signed and unsigned.
> >
> > Can someone who is really familiar with this code check this out? I
> > think the following pseudo-patch to mballoc.h might be in order:
> >
> > struct ext4_free_extent {
> > ext4_lblk_t fe_logical;
> > ext4_grpblk_t fe_start;
> > ext4_group_t fe_group;
> > - int fe_len;
> > + unsigned int fe_len;
> > };
> >
>
> I'm studying the ext4 code these days.
> The data types always confuse me.
>
> The length of a ext4_extent ee_len is define as unsigned short.
>
> struct ext4_extent {
> __le32 ee_block; /* first logical block extent covers */
> __le16 ee_len; /* number of blocks covered by extent */
> __le16 ee_start_hi; /* high 16 bits of physical block */
> __le32 ee_start_lo; /* low 32 bits of physical block */
> };
>
> So I think fe_len should also be defined as unsigned short.
> Is that right?

Extents and each prealloc space have at max 2**16 blocks. So the length
of both should be unsigned short. With respect to ext4_free_extent we
use fe_len to store the number of blocks requested for allocation.
( ext4_mb_initialize_context )

The allocated extent will definitely have <= 2**16. But the requested
number of blocks may not.

-aneesh

2008-06-02 13:47:34

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

Theodore Tso wrote:
> On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
>> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
>> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
>> struct ext4_prealloc_space *pa)
>> {
>> - unsigned len = ac->ac_o_ex.fe_len;
>> -
>> + unsigned int len = ac->ac_o_ex.fe_len;
>> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
>> &ac->ac_b_ex.fe_group,
>> &ac->ac_b_ex.fe_start);
>> --
>
> This change had nothing to do with fixing the use of unitialized data,
> but when I started looking more closely, it raised a potential signed
> vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
> is an int.
>
> So here we are assigning an int to an unsigned int. Later, len is
> assigned to ac_b_ex.len, which means assigning an unsigned int to an
> int. In other places, fe_len (an int) is compared against pa_free
> (which is an unsigned short), and fe_len gets assined to pa_free, once
> again mixing signed and unsigned.
>
> Can someone who is really familiar with this code check this out? I
> think the following pseudo-patch to mballoc.h might be in order:
>
> struct ext4_free_extent {
> ext4_lblk_t fe_logical;
> ext4_grpblk_t fe_start;
> ext4_group_t fe_group;
> - int fe_len;
> + unsigned int fe_len;
> };

Hm, ok, so what's going on here:

ext4_mb_normalize_group_request()
{
...
if (EXT4_SB(sb)->s_stripe)
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
else
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
...
}

and that's a long:

unsigned long s_mb_group_prealloc;

Oh, but that's only ever assigned as

sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC;

which is

/*
* default group prealloc size 512 blocks
*/
#define MB_DEFAULT_GROUP_PREALLOC 512


so it's fine... but why are we carrying around a field in the sbi to
hold a constant that cannot be changed runtime?

-Eric

2008-06-02 14:20:54

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

On Mon, Jun 02, 2008 at 08:42:24AM -0500, Eric Sandeen wrote:
> Theodore Tso wrote:
> > On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
> >> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> >> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
> >> struct ext4_prealloc_space *pa)
> >> {
> >> - unsigned len = ac->ac_o_ex.fe_len;
> >> -
> >> + unsigned int len = ac->ac_o_ex.fe_len;
> >> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
> >> &ac->ac_b_ex.fe_group,
> >> &ac->ac_b_ex.fe_start);
> >> --
> >
> > This change had nothing to do with fixing the use of unitialized data,
> > but when I started looking more closely, it raised a potential signed
> > vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
> > is an int.
> >
> > So here we are assigning an int to an unsigned int. Later, len is
> > assigned to ac_b_ex.len, which means assigning an unsigned int to an
> > int. In other places, fe_len (an int) is compared against pa_free
> > (which is an unsigned short), and fe_len gets assined to pa_free, once
> > again mixing signed and unsigned.
> >
> > Can someone who is really familiar with this code check this out? I
> > think the following pseudo-patch to mballoc.h might be in order:
> >
> > struct ext4_free_extent {
> > ext4_lblk_t fe_logical;
> > ext4_grpblk_t fe_start;
> > ext4_group_t fe_group;
> > - int fe_len;
> > + unsigned int fe_len;
> > };
>
> Hm, ok, so what's going on here:
>
> ext4_mb_normalize_group_request()
> {
> ...
> if (EXT4_SB(sb)->s_stripe)
> ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
> else
> ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
> ...
> }
>
> and that's a long:
>
> unsigned long s_mb_group_prealloc;
>
> Oh, but that's only ever assigned as
>
> sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC;
>
> which is
>
> /*
> * default group prealloc size 512 blocks
> */
> #define MB_DEFAULT_GROUP_PREALLOC 512
>
>
> so it's fine... but why are we carrying around a field in the sbi to
> hold a constant that cannot be changed runtime?

We can tune that via MB_PROC_FOPS(group_prealloc);


-aneesh


2008-06-02 14:27:02

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

Aneesh Kumar K.V wrote:
> On Mon, Jun 02, 2008 at 08:42:24AM -0500, Eric Sandeen wrote:

>> so it's fine... but why are we carrying around a field in the sbi to
>> hold a constant that cannot be changed runtime?
>
> We can tune that via MB_PROC_FOPS(group_prealloc);

MB_PROC_VALUE_WRITE()....

ah, cleverly hidden from cscope with a macro. :)

Ok, so technically then this could be big enough to overflow fe_len:

value = simple_strtol(str, NULL, 0); \
if (value <= 0) \
return -ERANGE; \
sbi->s_mb_##name = value; \


but I guess it's probably not the first thing we need to worry about.

-Eric

2008-06-03 01:00:32

by Shen Feng

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data



Aneesh Kumar K.V Wrote:
> On Mon, Jun 02, 2008 at 06:02:21PM +0800, Shen Feng wrote:
>>
>> Theodore Tso Wrote:
>>> On Thu, May 15, 2008 at 12:17:11AM +0530, Aneesh Kumar K.V wrote:
>>>> @@ -3134,8 +3135,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
>>>> static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
>>>> struct ext4_prealloc_space *pa)
>>>> {
>>>> - unsigned len = ac->ac_o_ex.fe_len;
>>>> -
>>>> + unsigned int len = ac->ac_o_ex.fe_len;
>>>> ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
>>>> &ac->ac_b_ex.fe_group,
>>>> &ac->ac_b_ex.fe_start);
>>>> --
>>> This change had nothing to do with fixing the use of unitialized data,
>>> but when I started looking more closely, it raised a potential signed
>>> vs. unsigned issue: ac_o_ex is a struct ext4_free_extent, and fe_len
>>> is an int.
>>>
>>> So here we are assigning an int to an unsigned int. Later, len is
>>> assigned to ac_b_ex.len, which means assigning an unsigned int to an
>>> int. In other places, fe_len (an int) is compared against pa_free
>>> (which is an unsigned short), and fe_len gets assined to pa_free, once
>>> again mixing signed and unsigned.
>>>
>>> Can someone who is really familiar with this code check this out? I
>>> think the following pseudo-patch to mballoc.h might be in order:
>>>
>>> struct ext4_free_extent {
>>> ext4_lblk_t fe_logical;
>>> ext4_grpblk_t fe_start;
>>> ext4_group_t fe_group;
>>> - int fe_len;
>>> + unsigned int fe_len;
>>> };
>>>
>> I'm studying the ext4 code these days.
>> The data types always confuse me.
>>
>> The length of a ext4_extent ee_len is define as unsigned short.
>>
>> struct ext4_extent {
>> __le32 ee_block; /* first logical block extent covers */
>> __le16 ee_len; /* number of blocks covered by extent */
>> __le16 ee_start_hi; /* high 16 bits of physical block */
>> __le32 ee_start_lo; /* low 32 bits of physical block */
>> };
>>
>> So I think fe_len should also be defined as unsigned short.
>> Is that right?
>
> Extents and each prealloc space have at max 2**16 blocks. So the length
> of both should be unsigned short. With respect to ext4_free_extent we
> use fe_len to store the number of blocks requested for allocation.
> ( ext4_mb_initialize_context )

In ext4_mb_initialize_context, we have

/* just a dirty hack to filter too big requests */
if (len >= EXT4_BLOCKS_PER_GROUP(sb) - 10)
len = EXT4_BLOCKS_PER_GROUP(sb) - 10;

This means that we cannot allocate blocks which is bigger then
EXT4_BLOCKS_PER_GROUP(sb) - 10 ( max 2**16-10 ) with MBALLOC.
But ext4_new_blocks_old can do that.

So ext4_new_blocks may be changed as

ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t goal, unsigned long *count, int *errp)
{
struct ext4_allocation_request ar;
ext4_fsblk_t ret;

- if (!test_opt(inode->i_sb, MBALLOC)) {
+ if (!test_opt(inode->i_sb, MBALLOC) ||
+ (*count >= EXT4_BLOCKS_PER_GROUP(inode->i_sb) - 10)) {
ret = ext4_new_blocks_old(handle, inode, goal, count, errp);
return ret;
}

memset(&ar, 0, sizeof(ar));
ar.inode = inode;
ar.goal = goal;
ar.len = *count;
ret = ext4_mb_new_blocks(handle, &ar, errp);
*count = ar.len;
return ret;
}


-Shen Feng

2008-06-03 20:03:03

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix use of uninitialized data

On Jun 03, 2008 08:57 +0800, Shen Feng wrote:
> Aneesh Kumar K.V Wrote:
> >> Theodore Tso Wrote:
> >>> Can someone who is really familiar with this code check this out? I
> >>> think the following pseudo-patch to mballoc.h might be in order:
> >>>
> >>> struct ext4_free_extent {
> >>> ext4_lblk_t fe_logical;
> >>> ext4_grpblk_t fe_start;
> >>> ext4_group_t fe_group;
> >>> - int fe_len;
> >>> + unsigned int fe_len;
> >>> };
> >>>
> >> I'm studying the ext4 code these days.
> >> The data types always confuse me.
> >>
> >> The length of a ext4_extent ee_len is define as unsigned short.
> >>
> >> struct ext4_extent {
> >> __le32 ee_block; /* first logical block extent covers */
> >> __le16 ee_len; /* number of blocks covered by extent */
> >> __le16 ee_start_hi; /* high 16 bits of physical block */
> >> __le32 ee_start_lo; /* low 32 bits of physical block */
> >> };
> >>
> >> So I think fe_len should also be defined as unsigned short.
> >> Is that right?
> >
> > Extents and each prealloc space have at max 2**16 blocks. So the length
> > of both should be unsigned short. With respect to ext4_free_extent we
> > use fe_len to store the number of blocks requested for allocation.
> > ( ext4_mb_initialize_context )

I agree that we _could_ use an unsigned short here, but this is not a
native type on some CPUs, and the use of an "int" is more optimal.
Making this an unsigned int (and removing BUG_ON()) is one way to do this.

> In ext4_mb_initialize_context, we have
>
> /* just a dirty hack to filter too big requests */
> if (len >= EXT4_BLOCKS_PER_GROUP(sb) - 10)
> len = EXT4_BLOCKS_PER_GROUP(sb) - 10;
>
> This means that we cannot allocate blocks which is bigger then
> EXT4_BLOCKS_PER_GROUP(sb) - 10 ( max 2**16-10 ) with MBALLOC.
> But ext4_new_blocks_old can do that.

Once we have FLEX_BG in the mix, it should be possible to allocate
a full group worth of blocks at one time. The "- 10" part was only
to take into account some small number of metadata blocks (bitmap,
inode tables, etc) but will actually hurt allocation with FLEX_BG.

> So ext4_new_blocks may be changed as
>
> ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
> ext4_fsblk_t goal, unsigned long *count, int *errp)
> {
> struct ext4_allocation_request ar;
> ext4_fsblk_t ret;
>
> - if (!test_opt(inode->i_sb, MBALLOC)) {
> + if (!test_opt(inode->i_sb, MBALLOC) ||
> + (*count >= EXT4_BLOCKS_PER_GROUP(inode->i_sb) - 10)) {
> ret = ext4_new_blocks_old(handle, inode, goal, count, errp);
> return ret;

In light of the above, I'd prefer if this is change to be:

if (!test_opt(inode->i_sb, MBALLOC) ||
(*count > EXT4_BLOCKS_PER_GROUP(inode->i_sb))) {
ret = ext4_new_blocks_old(handle, inode, goal, count, errp);

or much better would be to split the allocation into several BLOCKS_PER_GROUP
chunks and stick with mballoc, since we don't want to fall back to the slower
ext4_new_blocks_old() just when there are allocations that mballoc is best
suited for.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.