2009-10-16 14:46:29

by Vyacheslav Dubeyko

Subject: Patch on algorithm of place allocation for inode tables in mke2fs

Hi all:

Let's create an 8 GB raw partition with fdisk and then format it (for example, mkfs.ext4 -b 4096 -L ext4_4K_8G /dev/sdb1). With 2098482 blocks on the volume, a 4 KB block size, and a flex block group size of 16, we get 65 groups on the volume. The last group (which has 1329 blocks) is the first and only group in the last flex group.

The current mke2fs code produces the following allocation for the last group: Block bitmap at 2097152 (+0), Inode bitmap at 2097168 (+16), Inode table at 8626 - 9130. The inode table of the last group is placed at the beginning of the volume because mke2fs cannot allocate enough blocks there for the inode tables of a whole flex group. I propose the following patch for the mke2fs utility:

diff --git a/lib/ext2fs/alloc_tables.c b/lib/ext2fs/alloc_tables.c
index 8547ad6..4639527 100644
--- a/lib/ext2fs/alloc_tables.c
+++ b/lib/ext2fs/alloc_tables.c
@@ -181,10 +181,15 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
 		blk_t prev_block = 0;
 		if (group && fs->group_desc[group-1].bg_inode_table)
 			prev_block = fs->group_desc[group-1].bg_inode_table;
+		int requsted_size = 0;
+		if ((group+1) == fs->group_desc_count &&
+			(fs->group_desc_count % flexbg_size) == 1)
+			requsted_size = fs->inode_blocks_per_group;
+		else
+			requsted_size = fs->inode_blocks_per_group * rem_grps;
 		group_blk = flexbg_offset(fs, group, prev_block, bmap,
 						 flexbg_size * 2,
-						 fs->inode_blocks_per_group *
-						 rem_grps,
+						 requsted_size,
 						 fs->inode_blocks_per_group);
 		last_blk = ext2fs_group_last_block(fs, last_grp);
 	}

The patch fixes the allocation of the inode table for the last group in this situation.

--

Vyacheslav Dubeyko <[email protected]>
Acronis


2009-10-16 18:05:51

by Andreas Dilger

Subject: Re: Patch on algorithm of place allocation for inode tables in mke2fs

On 16-Oct-09, at 08:45, Vyacheslav Dubeyko wrote:
> Let's try to make raw partition with 8 Gb size by fdisk and then to
> format it (for example, mkfs.ext4 -b 4096 -L ext4_4K_8G /dev/sdb1).
> If we have 2098482 block count on volume with 4 Kb block size and
> flex block group size as 16 then we will have 65 groups on volume.
> The last group (that has 1329 blocks) will be the first and sole
> group in last flex group. Current mke2fs code makes such allocation
> scheme in last group: Block bitmap at 2097152 (+0), Inode bitmap at
> 2097168 (+16), Inode table at 8626 - 9130. The inode table of the
> last group will be placed at the volume begin because of we can't
> allocate sufficient block count for all inode tables in flex group.
> I offer the patch for mke2fs utility:
>
> diff --git a/lib/ext2fs/alloc_tables.c b/lib/ext2fs/alloc_tables.c
> index 8547ad6..4639527 100644
> --- a/lib/ext2fs/alloc_tables.c
> +++ b/lib/ext2fs/alloc_tables.c
> @@ -181,10 +181,15 @@ errcode_t
> ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
> blk_t prev_block = 0;
> if (group && fs->group_desc[group-1].bg_inode_table)
> prev_block = fs->group_desc[group-1].bg_inode_table;
> + int requsted_size = 0;

This C++ style declaration needs to go at the beginning of the function.

> + if ((group+1) == fs->group_desc_count &&
> + (fs->group_desc_count % flexbg_size) == 1)
> + requsted_size = fs->inode_blocks_per_group;
> + else
> + requsted_size = fs->inode_blocks_per_group * rem_grps;
> group_blk = flexbg_offset(fs, group, prev_block, bmap,
> flexbg_size * 2,
> - fs->inode_blocks_per_group *
> - rem_grps,
> + requsted_size,
> fs->inode_blocks_per_group);
> last_blk = ext2fs_group_last_block(fs, last_grp);
> }


I think the rest of this looks reasonable.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-10-19 07:12:47

by Vyacheslav Dubeyko

Subject: RE: Patch on algorithm of place allocation for inode tables in mke2fs

Hello all,

I have corrected the patch:

[root@localhost ext2fs]# git diff alloc_tables.c
diff --git a/lib/ext2fs/alloc_tables.c b/lib/ext2fs/alloc_tables.c
index 8547ad6..8ef4b05 100644
--- a/lib/ext2fs/alloc_tables.c
+++ b/lib/ext2fs/alloc_tables.c
@@ -84,7 +84,7 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
 	errcode_t retval;
 	blk_t group_blk, start_blk, last_blk, new_blk, blk;
 	dgrp_t last_grp = 0;
-	int j, rem_grps = 0, flexbg_size = 0;
+	int j, rem_grps = 0, flexbg_size = 0, inode_table_size = 0;
 
 	group_blk = ext2fs_group_first_block(fs, group);
 	last_blk = ext2fs_group_last_block(fs, group);
@@ -181,10 +181,14 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
 		blk_t prev_block = 0;
 		if (group && fs->group_desc[group-1].bg_inode_table)
 			prev_block = fs->group_desc[group-1].bg_inode_table;
+		if ((group+1) == fs->group_desc_count &&
+			(fs->group_desc_count % flexbg_size) == 1)
+			inode_table_size = fs->inode_blocks_per_group;
+		else
+			inode_table_size = fs->inode_blocks_per_group * rem_grps;
 		group_blk = flexbg_offset(fs, group, prev_block, bmap,
 						 flexbg_size * 2,
-						 fs->inode_blocks_per_group *
-						 rem_grps,
+						 inode_table_size,
 						 fs->inode_blocks_per_group);
 		last_blk = ext2fs_group_last_block(fs, last_grp);
 	}

--
Vyacheslav Dubeyko <[email protected]>
Acronis


2009-11-30 18:32:28

by Theodore Ts'o

Subject: Re: Patch on algorithm of place allocation for inode tables in mke2fs

On Fri, Oct 16, 2009 at 06:45:39PM +0400, Vyacheslav Dubeyko wrote:
> Hi all:
>
> Let's try to make raw partition with 8 Gb size by fdisk and then to
> format it (for example, mkfs.ext4 -b 4096 -L ext4_4K_8G
> /dev/sdb1). If we have 2098482 block count on volume with 4 Kb block
> size and flex block group size as 16 then we will have 65 groups on
> volume. The last group (that has 1329 blocks) will be the first and
> sole group in last flex group. Current mke2fs code makes such
> allocation scheme in last group: Block bitmap at 2097152 (+0), Inode
> bitmap at 2097168 (+16), Inode table at 8626 - 9130. The inode table
> of the last group will be placed at the volume begin because of we
> can't allocate sufficient block count for all inode tables in flex
> group.

Hi Vyacheslav,

My apologies for the delay in getting back to you; I've had a very
intense travel schedule in October and November, and some things had
fallen through the cracks.

Thanks for pointing this out! I've decided to use the following patch
as a cleaner and simpler fix for the problem. What it does is to use
the true size for the inode table, instead of the expected size of the
inode table should the file system get resized, to avoid the problem
you've pointed out.

Best regards,

- Ted

commit bbb60e4fefdd404d8d696369804b556b404bb0c1
Author: Theodore Ts'o <[email protected]>
Date: Mon Nov 30 12:24:59 2009 -0500

libext2fs: Improve flex_bg inode table placement algorithm

When trying to find the best place for the inode table in the last
flex block group, use the true size for the flex_bg's portion of the
inode table instead of the worst case required size of the inode table
fragment if the file system is resized. This fixes a corner case
where if the size of the filesystem is just big enough that there is
only room for a single block group in the last flex_bg, and that
partial block group is too small for the full portion of the inode
table, the inode table is placed in the very first block group:

Group 64: (Blocks 2097152-2099199) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0xd305, unused inodes 8080
  Block bitmap at 2097152 (+0), Inode bitmap at 2097168 (+16)
  Inode table at 8626-9130 (+4292878770)
                 ^^^^^^^^^

Thanks to Vyacheslav Dubeyko for pointing this out.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/lib/ext2fs/alloc_tables.c b/lib/ext2fs/alloc_tables.c
index 8547ad6..55e6174 100644
--- a/lib/ext2fs/alloc_tables.c
+++ b/lib/ext2fs/alloc_tables.c
@@ -181,6 +181,8 @@ errcode_t ext2fs_allocate_group_table(ext2_filsys fs, dgrp_t group,
 		blk_t prev_block = 0;
 		if (group && fs->group_desc[group-1].bg_inode_table)
 			prev_block = fs->group_desc[group-1].bg_inode_table;
+		if (last_grp == fs->group_desc_count)
+			rem_grps = last_grp - group;
 		group_blk = flexbg_offset(fs, group, prev_block, bmap,
 						 flexbg_size * 2,
 						 fs->inode_blocks_per_group *