2009-09-04 22:17:47

by Eric Sandeen

Subject: [PATCH, RFC] ext4: limit block allocations for indirect-block files to < 2^32

Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.
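
The truncation can be sketched in userspace C. This is only an illustration, assuming the on-disk slot for an indirect-block pointer is 32 bits wide; `store_in_indirect_slot` is a hypothetical name and not an ext4 function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: models storing a 64-bit physical block number
 * in a 32-bit indirect-block slot. Not part of ext4.
 */
static uint32_t store_in_indirect_slot(uint64_t phys_block)
{
	/* The conversion keeps only the low 32 bits. */
	return (uint32_t)phys_block;
}
```

With this model, block 2^32 + 5 is recorded as block 5, so a later read through the indirect block would land on an unrelated low block.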

This patch limits such allocations to < 2^32, and adds
WARN_ONs (maybe should be BUG_ONs) if we do get blocks
larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some WARN_ONs
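
The range test shared by the preallocation check and the WARN_ONs in the list above can be sketched as follows. This is a simplified userspace model, not the kernel code; `MAX_BLOCK_FILE_PHYS` and `range_too_high` are hypothetical names, and the limit stands in for UINT_MAX on a platform with 32-bit int.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the highest block a 32-bit pointer can hold. */
#define MAX_BLOCK_FILE_PHYS UINT64_C(0xFFFFFFFF)

/*
 * True when the run of len blocks starting at start should be refused
 * for a file that is not extent-mapped. Same shape as the patch's
 * "start + len > limit" comparison.
 */
static bool range_too_high(uint64_t start, uint64_t len)
{
	return start + len > MAX_BLOCK_FILE_PHYS;
}
```

Note that the comparison is conservative by one block: a run whose last block is exactly 0xFFFFFFFF is also refused, because start + len then equals 2^32.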

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes. Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

Perhaps an ext4-specific #define would be better than UINT_MAX?

The allocator being what it is, I may have missed some spots,
so I'd welcome review.

Thanks,
-Eric

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..cda3f8d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -551,15 +551,21 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
*
* Normally this function find the preferred place for block allocation,
* returns it.
+ * Because this is only used for bitmap files, we limit the block nr
+ * to 32 bits.
*/
static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
Indirect *partial)
{
+ ext4_fsblk_t goal;
+
/*
* XXX need to get goal block from mballoc's data structures
*/

- return ext4_find_near(inode, partial);
+ goal = ext4_find_near(inode, partial);
+ goal = goal % UINT_MAX;
+ return goal;
}

/**
@@ -640,6 +646,8 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
if (*err)
goto failed_out;

+ WARN_ON(current_block + count > UINT_MAX);
+
target -= count;
/* allocate blocks for indirect blocks */
while (index < indirect_blks && count) {
@@ -674,6 +682,7 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ar.flags = EXT4_MB_HINT_DATA;

current_block = ext4_mb_new_blocks(handle, &ar, err);
+ WARN_ON(current_block + ar.len > UINT_MAX);

if (*err && (target == blks)) {
/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd25846..bb10f88 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1943,6 +1943,10 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
sb = ac->ac_sb;
sbi = EXT4_SB(sb);
ngroups = ext4_get_groups_count(sb);
+ /* bitmap files are limited to low blocks */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
+ ngroups %= (UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb));
+
BUG_ON(ac->ac_status == AC_STATUS_FOUND);

/* first, try the goal */
@@ -3382,6 +3386,10 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
continue;

+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) &&
+ pa->pa_pstart + pa->pa_len > UINT_MAX)
+ continue;
+
/* found preallocated blocks, use them */
spin_lock(&pa->pa_lock);
if (pa->pa_deleted == 0 && pa->pa_free) {
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 62b31c2..9ed0f12 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -810,12 +810,22 @@ inserted:
get_bh(new_bh);
} else {
/* We need to allocate a new block */
- ext4_fsblk_t goal = ext4_group_first_block_no(sb,
+ ext4_fsblk_t goal, block;
+
+ goal = ext4_group_first_block_no(sb,
EXT4_I(inode)->i_block_group);
- ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ goal = goal % UINT_MAX;
+
+ block = ext4_new_meta_blocks(handle, inode,
goal, NULL, &error);
if (error)
goto cleanup;
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ WARN_ON(block > UINT_MAX);
+
ea_idebug(inode, "creating block %d", block);

new_bh = sb_getblk(sb, block);



2009-09-05 03:21:50

by Eric Sandeen

Subject: [PATCH, RFC V2] ext4: limit block allocations for indirect-block files to < 2^32

Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.

This patch limits such allocations to < 2^32, and adds
WARN_ONs (maybe should be BUG_ONs) if we do get blocks
larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some WARN_ONs

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes. Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

Perhaps an ext4-specific #define would be better than UINT_MAX?

The allocator being what it is, I may have missed some spots,
so I'd welcome review.

Thanks,
-Eric

Signed-off-by: Eric Sandeen <[email protected]>
---

V2: got modulo-happy in ext4_mb_regular_allocator, just limit
ngroups to no more than UINT_MAX.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..cda3f8d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -551,15 +551,21 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
*
* Normally this function find the preferred place for block allocation,
* returns it.
+ * Because this is only used for non-extent files, we limit the block nr
+ * to 32 bits.
*/
static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
Indirect *partial)
{
+ ext4_fsblk_t goal;
+
/*
* XXX need to get goal block from mballoc's data structures
*/

- return ext4_find_near(inode, partial);
+ goal = ext4_find_near(inode, partial);
+ goal = goal % UINT_MAX;
+ return goal;
}

/**
@@ -640,6 +646,8 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
if (*err)
goto failed_out;

+ WARN_ON(current_block + count > UINT_MAX);
+
target -= count;
/* allocate blocks for indirect blocks */
while (index < indirect_blks && count) {
@@ -674,6 +682,7 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ar.flags = EXT4_MB_HINT_DATA;

current_block = ext4_mb_new_blocks(handle, &ar, err);
+ WARN_ON(current_block + ar.len > UINT_MAX);

if (*err && (target == blks)) {
/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd25846..10384c3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1943,6 +1943,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
sb = ac->ac_sb;
sbi = EXT4_SB(sb);
ngroups = ext4_get_groups_count(sb);
+ /* non-extent files are limited to low blocks/groups */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
+ ngroups = min_t(unsigned long, ngroups,
+ (UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));
+
BUG_ON(ac->ac_status == AC_STATUS_FOUND);

/* first, try the goal */
@@ -3382,6 +3387,10 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
continue;

+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) &&
+ pa->pa_pstart + pa->pa_len > UINT_MAX)
+ continue;
+
/* found preallocated blocks, use them */
spin_lock(&pa->pa_lock);
if (pa->pa_deleted == 0 && pa->pa_free) {
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 62b31c2..9ed0f12 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -810,12 +810,22 @@ inserted:
get_bh(new_bh);
} else {
/* We need to allocate a new block */
- ext4_fsblk_t goal = ext4_group_first_block_no(sb,
+ ext4_fsblk_t goal, block;
+
+ goal = ext4_group_first_block_no(sb,
EXT4_I(inode)->i_block_group);
- ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ goal = goal % UINT_MAX;
+
+ block = ext4_new_meta_blocks(handle, inode,
goal, NULL, &error);
if (error)
goto cleanup;
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ WARN_ON(block > UINT_MAX);
+
ea_idebug(inode, "creating block %d", block);

new_bh = sb_getblk(sb, block);


2009-09-05 16:45:36

by Andreas Dilger

Subject: Re: [PATCH, RFC V2] ext4: limit block allocations for indirect-block files to < 2^32

On Sep 04, 2009 22:21 -0500, Eric Sandeen wrote:
> Today, the ext4 allocator will happily allocate blocks past
> 2^32 for indirect-block files, which results in the block
> numbers getting truncated, and corruption ensues.
>
> This patch limits such allocations to < 2^32, and adds
> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
> larger than that.

Eric, thanks for making the patch.

> This should address RH Bug 519471, ext4 bitmap allocator must limit
> blocks to < 2^32
>
> * ext4_find_goal() is modified to choose a goal < UINT_MAX,
> so that our starting point is in an acceptable range.
>
> * ext4_xattr_block_set() is modified such that the goal block
> is < UINT_MAX, as above.

Using UINT_MAX probably isn't wholly safe, as I know of systems
that have e.g. 64-bit ints (though I guess none that have Linux
kernel ports). It should use (u32)~0 or ((1 << 32) - 1) directly.

> Perhaps an ext4-specific #define would be better than UINT_MAX?

I think yes, since we know the maximum value is tied specifically
to the u32 indirect block pointers, and not necessarily to an "int".

> static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
> Indirect *partial)
> {
> + goal = ext4_find_near(inode, partial);
> + goal = goal % UINT_MAX;
> + return goal;

Using "% UINT_MAX" here will result in a 64-bit division on 32-bit
platforms, since ext4_fsblk_t is declared as an unsigned long long.
This should instead be "(u32)" or "& 0xffffffff".
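
The two reductions are not interchangeable in their results, which a small userspace sketch can show. The function names are hypothetical, and the constant is written out as a 64-bit value on the assumption that the limit is 0xFFFFFFFF.

```c
#include <stdint.h>

/* Goal reduced with a 64-bit modulo, in the style of the V2 patch. */
static uint64_t goal_by_modulo(uint64_t goal)
{
	return goal % UINT64_C(0xFFFFFFFF);
}

/* Goal reduced by keeping the low 32 bits, as suggested above. */
static uint64_t goal_by_mask(uint64_t goal)
{
	return goal & UINT64_C(0xFFFFFFFF);
}
```

For a goal of 2^32 the modulo gives 1 and the mask gives 0; for a goal of 0xFFFFFFFF the modulo gives 0 and the mask leaves it unchanged. Either way the result fits in 32 bits, but only the mask avoids a 64-bit division. When spelling the constant as a shift, a 64-bit operand such as `(1ULL << 32) - 1` is needed, since shifting a 32-bit int by 32 is undefined in C.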

> @@ -1943,6 +1943,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> + /* non-extent files are limited to low blocks/groups */
> + if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
> + ngroups = min_t(unsigned long, ngroups,
> + (UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));

Since EXT4_BLOCKS_PER_GROUP() is a run-time variable, but is constant
for the life of the filesystem, this could be computed once and stored
in the superblock?
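
A sketch of the value that would be computed once at mount time, in userspace C. The function name `blockfile_groups` is hypothetical, and the limit of 0xFFFFFFFF is an assumption standing in for UINT_MAX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of block groups usable by files whose block pointers are
 * 32 bits wide: the smaller of the real group count and the number
 * of whole groups that fit below the 32-bit limit.
 */
static uint32_t blockfile_groups(uint32_t groups_count,
				 uint32_t blocks_per_group)
{
	uint32_t limit;

	assert(blocks_per_group > 0);
	limit = (uint32_t)(UINT64_C(0xFFFFFFFF) / blocks_per_group);
	return groups_count < limit ? groups_count : limit;
}
```

With 32768 blocks per group the limit works out to 131071 groups, one fewer than 2^32 / 32768, because the dividend is 2^32 - 1 rather than 2^32.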

> +++ b/fs/ext4/xattr.c
> @@ -810,12 +810,22 @@ inserted:
> + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> + goal = goal % UINT_MAX;

As above.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-09-05 18:16:38

by Eric Sandeen

Subject: Re: [PATCH, RFC V2] ext4: limit block allocations for indirect-block files to < 2^32

Andreas Dilger wrote:
> On Sep 04, 2009 22:21 -0500, Eric Sandeen wrote:
>> Today, the ext4 allocator will happily allocate blocks past
>> 2^32 for indirect-block files, which results in the block
>> numbers getting truncated, and corruption ensues.
>>
>> This patch limits such allocations to < 2^32, and adds
>> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
>> larger than that.
>
> Eric, thanks for making the patch.
>
>> This should address RH Bug 519471, ext4 bitmap allocator must limit
>> blocks to < 2^32
>>
>> * ext4_find_goal() is modified to choose a goal < UINT_MAX,
>> so that our starting point is in an acceptable range.
>>
>> * ext4_xattr_block_set() is modified such that the goal block
>> is < UINT_MAX, as above.
>
> Using UINT_MAX probably isn't wholly safe, as I know of systems
> that have e.g. 64-bit ints (though I guess none that have Linux
> kernel ports). It should use (u32)~0 or ((1 << 32) - 1) directly.
>
>> Perhaps an ext4-specific #define would be better than UINT_MAX?
>
> I think yes, since we know the maximum value is tied specifically
> to the u32 indirect block pointers, and not necessarily to an "int".

yep, I had considered that, I should have just done it :) (esp
considering the patch I sent a while back to get rid of similar things) :)

>> static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
>> Indirect *partial)
>> {
>> + goal = ext4_find_near(inode, partial);
>> + goal = goal % UINT_MAX;
>> + return goal;
>
> Using "% UINT_MAX" here will result in a 64-bit division on 32-bit
> platforms, since ext4_fsblk_t is declared as an unsigned long long.
> This should instead be "(u32)" or "& 0xffffffff".

whoops good point. I wasn't thinking of 32-bit boxes, thinking they
can't go past 16T but for smaller blocks we still could go past 2^32
blocks... and it is a 64-bit modulo regardless.
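
The arithmetic behind the smaller-blocks remark, as a userspace sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes covered by the first 2^32 blocks at a given block size. */
static uint64_t bytes_below_2_32_blocks(uint32_t block_size)
{
	return (UINT64_C(1) << 32) * block_size;
}
```

With 4096-byte blocks this is 16 TiB, matching the 16T figure above. With 1024-byte blocks it is only 4 TiB, so a filesystem between 4 TiB and 16 TiB with 1K blocks already has block numbers past 2^32.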

>> @@ -1943,6 +1943,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> + /* non-extent files are limited to low blocks/groups */
>> + if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
>> + ngroups = min_t(unsigned long, ngroups,
>> + (UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));
>
> Since EXT4_BLOCKS_PER_GROUP() is a run-time variable, but is constant
> for the life of the filesystem, this could be computed once and stored
> in the superblock?

ok.

>> +++ b/fs/ext4/xattr.c
>> @@ -810,12 +810,22 @@ inserted:
>> + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
>> + goal = goal % UINT_MAX;
>
> As above.

Thanks for the review, will fix those up.

-Eric

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.


2009-09-10 16:02:15

by Eric Sandeen

Subject: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Today, the ext4 allocator will happily allocate blocks past
232 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.

This patch limits such allocations to < 232, and adds
WARN_ONs (maybe should be BUG_ONs) if we do get blocks
larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 232

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some WARN_ONs

No attempt has been made to limit inode locations to < 232,
so we may wind up with blocks far from their inodes. Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

Perhaps an ext4-specific #define would be better than UINT_MAX?

The allocator being what it is, I may have missed some spots,
so I'd welcome review.

Thanks,
-Eric

Signed-off-by: Eric Sandeen <[email protected]>
---

V2: got modulo-happy in ext4_mb_regular_allocator, just limit
ngroups to no more than UINT_MAX.

V3: address some of Andreas' review points
But I think we need some better macro & sb info member names...


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9714db3..1147994 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -386,6 +386,9 @@ struct ext4_mount_options {
#endif
};

+/* Max physical block we can address w/o extents */
+#define EXT4_MAX_BLOCK_FILE_PHYS 0xFFFFFFFF
+
/*
* Structure of an inode on the disk
*/
@@ -841,6 +844,7 @@ struct ext4_sb_info {
unsigned long s_gdb_count; /* Number of group descriptor blocks */
unsigned long s_desc_per_block; /* Number of group descriptors per block */
ext4_group_t s_groups_count; /* Number of groups in the fs */
+ ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */
unsigned long s_overhead_last; /* Last calculated overhead */
unsigned long s_blocks_last; /* Last seen block count */
loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..f716d49 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -551,15 +551,21 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
*
* Normally this function find the preferred place for block allocation,
* returns it.
+ * Because this is only used for non-extent files, we limit the block nr
+ * to 32 bits.
*/
static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
Indirect *partial)
{
+ ext4_fsblk_t goal;
+
/*
* XXX need to get goal block from mballoc's data structures
*/

- return ext4_find_near(inode, partial);
+ goal = ext4_find_near(inode, partial);
+ goal = goal & EXT4_MAX_BLOCK_FILE_PHYS;
+ return goal;
}

/**
@@ -640,6 +646,8 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
if (*err)
goto failed_out;

+ WARN_ON(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS);
+
target -= count;
/* allocate blocks for indirect blocks */
while (index < indirect_blks && count) {
@@ -674,6 +682,7 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ar.flags = EXT4_MB_HINT_DATA;

current_block = ext4_mb_new_blocks(handle, &ar, err);
+ WARN_ON(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS);

if (*err && (target == blks)) {
/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd25846..b87854b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1943,6 +1943,10 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
sb = ac->ac_sb;
sbi = EXT4_SB(sb);
ngroups = ext4_get_groups_count(sb);
+ /* non-extent files are limited to low blocks/groups */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
+ ngroups = sbi->s_blockfile_groups;
+
BUG_ON(ac->ac_status == AC_STATUS_FOUND);

/* first, try the goal */
@@ -3382,6 +3386,11 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
continue;

+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) &&
+ pa->pa_pstart + pa->pa_len > EXT4_MAX_BLOCK_FILE_PHYS)
+ continue;
+
/* found preallocated blocks, use them */
spin_lock(&pa->pa_lock);
if (pa->pa_deleted == 0 && pa->pa_free) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8f4f079..8dcdded 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2595,6 +2595,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount;
}
sbi->s_groups_count = blocks_count;
+ sbi->s_blockfile_groups = min_t(ext4_group_t, sbi->s_groups_count,
+ (EXT4_MAX_BLOCK_FILE_PHYS / EXT4_BLOCKS_PER_GROUP(sb)));
db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
EXT4_DESC_PER_BLOCK(sb);
sbi->s_group_desc = kmalloc(db_count * sizeof(struct buffer_head *),
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 62b31c2..6bce3f8 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -810,12 +810,23 @@ inserted:
get_bh(new_bh);
} else {
/* We need to allocate a new block */
- ext4_fsblk_t goal = ext4_group_first_block_no(sb,
+ ext4_fsblk_t goal, block;
+
+ goal = ext4_group_first_block_no(sb,
EXT4_I(inode)->i_block_group);
- ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
+
+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ goal = goal & EXT4_MAX_BLOCK_FILE_PHYS;
+
+ block = ext4_new_meta_blocks(handle, inode,
goal, NULL, &error);
if (error)
goto cleanup;
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ WARN_ON(block > EXT4_MAX_BLOCK_FILE_PHYS);
+
ea_idebug(inode, "creating block %d", block);

new_bh = sb_getblk(sb, block);


2009-09-10 16:53:31

by Theodore Ts'o

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Thu, Sep 10, 2009 at 11:02:15AM -0500, Eric Sandeen wrote:
> Today, the ext4 allocator will happily allocate blocks past
> 232 for indirect-block files, which results in the block
> numbers getting truncated, and corruption ensues.

Everywhere where you say 232, you mean 2^32, right?

- Ted

2009-09-10 16:56:52

by Eric Sandeen

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Theodore Tso wrote:
> On Thu, Sep 10, 2009 at 11:02:15AM -0500, Eric Sandeen wrote:
>> Today, the ext4 allocator will happily allocate blocks past
>> 232 for indirect-block files, which results in the block
>> numbers getting truncated, and corruption ensues.
>
> Everywhere where you say 232, you mean 2^32, right?

sorry, cut and paste error, yes.

(email client helpfully turned 2^32 into something prettier, but then
didn't copy it right on the resend)

-Eric

2009-09-10 21:10:11

by Andreas Dilger

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Sep 10, 2009 11:02 -0500, Eric Sandeen wrote:
> This patch limits such allocations to < 232, and adds
> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
> larger than that.

Given that this may corrupt the filesystem (e.g. block
2^32 turning into block 0 and overwriting the superblock)
I think a BUG_ON() is probably more appropriate. This
should only happen with software bugs, so it is more
appropriate than ext4_error() I think.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-09-10 21:16:30

by Eric Sandeen

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Andreas Dilger wrote:
> On Sep 10, 2009 11:02 -0500, Eric Sandeen wrote:
>> This patch limits such allocations to < 232, and adds
>> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
>> larger than that.
>
> Given that this may corrupt the filesystem (e.g. block
> 2^32 turning into block 0 and overwriting the superblock)
> I think a BUG_ON() is probably more appropriate. This
> should only happen with software bugs, so it is more
> appropriate than ext4_error() I think.

Ok, fine by me. I can send an update.

Any suggestions on the naming issues? (what's the official name for a
"not-extent-based-file?")

I ran it a lot through a mkfs/mount/fsstress/unmount/fsck cycle, and all
seemed well. mkfs was without extents, so I was thinking we were in
good shape.

However, Ric just ran a massive fs_mark test on a 60T filesystem that he
created with "mke2fs" (no extents and no journal - accidentally) and we
got no corruption even without this patch.

I need to see if a filesystem w/o the extents feature (at all, vs. some
old-format files on an extents fs) never even tries to allocate past
2^32; I didn't think so, but now not so sure.

I probably need to do more testing ...

-Eric

2009-09-10 21:33:44

by Theodore Ts'o

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Thu, Sep 10, 2009 at 04:16:32PM -0500, Eric Sandeen wrote:
> Ok, fine by me. I can send an update.
>
> Any suggestions on the naming issues? (what's the official name for a
> "not-extent-based-file?")

What I normally use is "extent-mapped file" and "indirect block mapped
file", or "non-extent-mapped file".

- Ted

2009-09-10 21:42:53

by Eric Sandeen

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Theodore Tso wrote:
> On Thu, Sep 10, 2009 at 04:16:32PM -0500, Eric Sandeen wrote:
>> Ok, fine by me. I can send an update.
>>
>> Any suggestions on the naming issues? (what's the official name for a
>> "not-extent-based-file?")
>
> What I normally use is "extent-mapped file" and "indirect block mapped
> file", or "non-extent-mapped file".

I'll see if that can fit in a macro name nicely :)

Thanks,
-Eric

> - Ted


2009-09-10 21:52:02

by Andreas Dilger

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Sep 10, 2009 16:16 -0500, Eric Sandeen wrote:
> Any suggestions on the naming issues? (what's the official name for a
> "not-extent-based-file?")

I've always used "block mapped" (i.e. mapped block-by-block) vs.
"extent mapped".

> However, Ric just ran a massive fs_mark test on a 60T filesystem that he
> created with "mke2fs" (no extents and no journal - accidentally) and we
> got no corruption even without this patch.
>
> I need to see if a filesystem w/o the extents feature (at all, vs. some
> old-format files on an extents fs) never even tries to allocate past
> 2^32; I didn't think so, but now not so sure.

Well, it may depend a lot on which inodes are in use. That will set the
goal block, and may prevent any above-16TB allocations. Either you could
fill the bitmaps with 0xff (and zero the free blocks counters, to avoid
problems with mballoc), or actually fill the first 16TB of the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-09-10 21:57:36

by Eric Sandeen

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Andreas Dilger wrote:
> On Sep 10, 2009 16:16 -0500, Eric Sandeen wrote:
>> Any suggestions on the naming issues? (what's the official name for a
>> "not-extent-based-file?")
>
> I've always used "block mapped" (i.e. mapped block-by-block) vs.
> "extent mapped".
>
>> However, Ric just ran a massive fs_mark test on a 60T filesystem that he
>> created with "mke2fs" (no extents and no journal - accidentally) and we
>> got no corruption even without this patch.
>>
>> I need to see if a filesystem w/o the extents feature (at all, vs. some
>> old-format files on an extents fs) never even tries to allocate past
>> 2^32; I didn't think so, but now not so sure.
>
> Well, it may depend a lot on which inodes are in use. That will set the
> goal block, and may prevent any above-16TB allocations. Either you could

yep, though I had many, many inodes in the high groups ...

Problem is I don't quite trust debugfs etc to get it right, so when I
see < 32 bits, I'm not sure if it's really there, or if the
reporting/debug tool wrapped it ;)

-Eric

2009-09-10 22:01:01

by Andreas Dilger

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Sep 10, 2009 23:51 +0200, Andreas Dilger wrote:
> Well, it may depend a lot on which inodes are in use. That will set the
> goal block, and may prevent any above-16TB allocations. Either you could
> fill the bitmaps with 0xff (and zero the free blocks counters, to avoid
> problems with mballoc), or actually fill the first 16TB of the filesystem.

Or, just start creating top-level directories until you get one past
16TB and use that for your test... We have a patch for allowing a
goal inode to be specified, and it might make sense to add a mount
option to allow setting the inode goal for testing...

Hey, look, I even posted that patch, I now recall:
http://osdir.com/ml/linux-ext4/2009-06/msg00233.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-09-10 23:19:52

by Theodore Ts'o

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

On Thu, Sep 10, 2009 at 04:57:38PM -0500, Eric Sandeen wrote:
>
> Problem is I don't quite trust debugfs etc to get it right, so when I
> see < 32 bits, I'm not sure if it's really there, or if the
> reporting/debug tool wrapped it ;)

Debugfs from pu branch definitely does get it right --- I've checked.

- Ted

2009-09-11 14:15:44

by Eric Sandeen

Subject: Re: [PATCH, RFC V3] ext4: limit block allocations for indirect-block files to < 2^32

Theodore Tso wrote:
> On Thu, Sep 10, 2009 at 04:57:38PM -0500, Eric Sandeen wrote:
>> Problem is I don't quite trust debugfs etc to get it right, so when I
>> see < 32 bits, I'm not sure if it's really there, or if the
>> reporting/debug tool wrapped it ;)
>
> Debugfs from pu branch definitely does get it right --- I've checked.
>
> - Ted

I couldn't get debugfs from pu to even load a large filesystem, odd...

# debugfs/debugfs ../bigfile
debugfs 1.41.9 (22-Aug-2009)
../bigfile: Filesystem too large to use legacy bitmaps while reading
block bitmap

I haven't yet looked into this...

-Eric

2009-09-14 20:03:48

by Eric Sandeen

Subject: [PATCH, RFC V4] ext4: limit block allocations for indirect-block files to < 2^32

Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.

This patch limits such allocations to < 2^32, and adds
BUG_ONs if we do get blocks larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes. Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

The allocator being what it is, I may have missed some spots,
so I'd welcome review.

Thanks,
-Eric

Signed-off-by: Eric Sandeen <[email protected]>
---

V2: got modulo-happy in ext4_mb_regular_allocator, just limit
ngroups to no more than UINT_MAX.

V3: address some of Andreas' review points
But I think we need some better macro & sb info member names...

V4: Change to BUG_ONs per Andreas's further review

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9714db3..1147994 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -386,6 +386,9 @@ struct ext4_mount_options {
#endif
};

+/* Max physical block we can address w/o extents */
+#define EXT4_MAX_BLOCK_FILE_PHYS 0xFFFFFFFF
+
/*
* Structure of an inode on the disk
*/
@@ -841,6 +844,7 @@ struct ext4_sb_info {
unsigned long s_gdb_count; /* Number of group descriptor blocks */
unsigned long s_desc_per_block; /* Number of group descriptors per block */
ext4_group_t s_groups_count; /* Number of groups in the fs */
+ ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */
unsigned long s_overhead_last; /* Last calculated overhead */
unsigned long s_blocks_last; /* Last seen block count */
loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..9431c8f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -551,15 +551,21 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
*
* Normally this function find the preferred place for block allocation,
* returns it.
+ * Because this is only used for non-extent files, we limit the block nr
+ * to 32 bits.
*/
static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
Indirect *partial)
{
+ ext4_fsblk_t goal;
+
/*
* XXX need to get goal block from mballoc's data structures
*/

- return ext4_find_near(inode, partial);
+ goal = ext4_find_near(inode, partial);
+ goal = goal & EXT4_MAX_BLOCK_FILE_PHYS;
+ return goal;
}

/**
@@ -640,6 +646,8 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
if (*err)
goto failed_out;

+ BUG_ON(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS);
+
target -= count;
/* allocate blocks for indirect blocks */
while (index < indirect_blks && count) {
@@ -674,6 +682,7 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ar.flags = EXT4_MB_HINT_DATA;

current_block = ext4_mb_new_blocks(handle, &ar, err);
+ BUG_ON(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS);

if (*err && (target == blks)) {
/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd25846..b87854b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1943,6 +1943,10 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
sb = ac->ac_sb;
sbi = EXT4_SB(sb);
ngroups = ext4_get_groups_count(sb);
+ /* non-extent files are limited to low blocks/groups */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
+ ngroups = sbi->s_blockfile_groups;
+
BUG_ON(ac->ac_status == AC_STATUS_FOUND);

/* first, try the goal */
@@ -3382,6 +3386,11 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
continue;

+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) &&
+ pa->pa_pstart + pa->pa_len > EXT4_MAX_BLOCK_FILE_PHYS)
+ continue;
+
/* found preallocated blocks, use them */
spin_lock(&pa->pa_lock);
if (pa->pa_deleted == 0 && pa->pa_free) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8f4f079..8dcdded 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2595,6 +2595,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount;
}
sbi->s_groups_count = blocks_count;
+ sbi->s_blockfile_groups = min_t(ext4_group_t, sbi->s_groups_count,
+ (EXT4_MAX_BLOCK_FILE_PHYS / EXT4_BLOCKS_PER_GROUP(sb)));
db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
EXT4_DESC_PER_BLOCK(sb);
sbi->s_group_desc = kmalloc(db_count * sizeof(struct buffer_head *),
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 62b31c2..fed5b01 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -810,12 +810,23 @@ inserted:
get_bh(new_bh);
} else {
/* We need to allocate a new block */
- ext4_fsblk_t goal = ext4_group_first_block_no(sb,
+ ext4_fsblk_t goal, block;
+
+ goal = ext4_group_first_block_no(sb,
EXT4_I(inode)->i_block_group);
- ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
+
+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ goal = goal & EXT4_MAX_BLOCK_FILE_PHYS;
+
+ block = ext4_new_meta_blocks(handle, inode,
goal, NULL, &error);
if (error)
goto cleanup;
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ BUG_ON(block > EXT4_MAX_BLOCK_FILE_PHYS);
+
ea_idebug(inode, "creating block %d", block);

new_bh = sb_getblk(sb, block);


2009-09-16 18:54:22

by Theodore Ts'o

Subject: Re: [PATCH, RFC V4] ext4: limit block allocations for indirect-block files to < 2^32

Added to the ext4 patch queue, thanks.

- Ted