2006-09-08 04:11:54

by Takashi Sato

Subject: [RFC][1/4] ext2/3/4: enlarge blocksize

Hi all,

On July 7, 2006, sho wrote:
> On Jun 29, 2006, Andreas wrote:
> > On Jun 28, 2006 17:50 +0200, Johann Lombardi wrote:
> > ext2/ext3_dir_entry_2 has a 16-bit field (rec_len), and it
> > would overflow
> > with a 64KB blocksize. This patch prevents the overflow by limiting
> > rec_len to 65532.
> > Having a max rec_len of 65532 is rather unfortunate, since the dir
> > blocks always need to be filled with dir entries. 65536 - 65532 = 4,
> > and the minimum ext3_dir_entry size is 8 bytes. I would instead
> > make this maybe 64 bytes less so that there is room for a filename
> > in the "tail" dir_entry.
>
> The fix, adding a dummy entry at the tail of a directory block, needs
> the kernel to regenerate the dummy entry when all of the entries are
> removed.
> In e2fsprogs, e2fsck needs to do the same when the entry is destroyed
> for some reason. Thus the procedures get more complicated.
> So I updated the patch to limit rec_len to 65532 (64K - 4). The
> difference from the previous patch is that the end of a directory
> block is changed to 65532 (64K - 4) with a 64K blocksize.
> This is simpler and less tweaky. It necessarily makes the last 4 bytes
> of a directory block useless, but 4 bytes is negligible
> compared to 64KB. Who cares?

In response to Mingming's ext4 patches, I updated my patches.
These patches support large blocksizes up to PAGESIZE (max 64KB).
NOTE:
Only when a 64KB block is used, they limit the end of a directory
block to 65532 (64K - 4) so that rec_len does not overflow.
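
(Illustration only, not part of the patch set.) The overflow mentioned in
the NOTE can be shown with a small user-space sketch. The names
DIR_REC_LEN_LIMIT, naive_rec_len() and dir_block_end() are made up for this
example; only the 16-bit width of rec_len and the 65532 limit come from the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the largest value used for rec_len, 64K - 4.
 * It fits in 16 bits and keeps the 4-byte alignment of directory entries. */
#define DIR_REC_LEN_LIMIT 65532u

/* What happens if the block size is stored directly in a 16-bit field. */
static uint16_t naive_rec_len(uint32_t blocksize)
{
    return (uint16_t)blocksize;   /* 65536 wraps around to 0 */
}

/* Usable end of a directory block: the whole block, except for 64KB blocks. */
static uint32_t dir_block_end(uint32_t blocksize)
{
    assert(blocksize <= 65536u);
    return blocksize > DIR_REC_LEN_LIMIT ? DIR_REC_LEN_LIMIT : blocksize;
}
```

With a 32KB block nothing changes; with a 64KB block the last 4 bytes of
each directory block are simply left unused.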

The difference from the previous patches is as follows.
- add ext4 support

This patch applies on top of Mingming's patches (against
ext3dev-2.6.18-rc4.patch), which were posted at:
http://ext2.sourceforge.net/48bitext3/patches/latest/

mke2fs creates the ".." entry of the root directory with a rec_len equal
to the blocksize. With a 64KB blocksize, fixing only the kernel therefore
triggers errors in ext2_check_page() and similar checks. For that reason
I tested with a provisional fix to e2fsprogs. This patch doesn't include
that fix.

I tested I/O performance with 4K-64K blocksize on ext3.

my box:
model          : NX-7700i
CPU type       : Itanium2
number of CPUs : 1
architecture   : ia64
memory size    : 8309152KB
disk size      : 70007.196(MB)

Results:
blocksize   Read(MB/sec)   Write(MB/sec)
4K          58.2           59.7
8K          64.3           60.7
16K         66.8           62.2
32K         65.3           60.2
64K         65.4           60.4

I don't know why the 16K blocksize gives the highest numbers. In any
case, without this patch, blocksizes larger than 4KB can't be used at all :)

The Patch-set consists of the following 4 patches.
[1/4] ext2/3/4: enlarge blocksize
- Allow blocksize up to pagesize

[2/4] ext2: fix rec_len overflow
- prevent rec_len from overflowing with 64KB blocksize

[3/4] ext3: fix rec_len overflow
- ditto

[4/4] ext4: fix rec_len overflow
- ditto


Signed-off-by: Takashi Sato [email protected]
---
diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/fs/ext2/super.c linux-2.6.18-rc4-mingming-tnes/fs/ext2/super.c
--- linux-2.6.18-rc4-mingming/fs/ext2/super.c 2006-08-07 03:20:11.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes/fs/ext2/super.c 2006-09-08 09:00:40.000000000 +0900
@@ -725,7 +725,7 @@ static int ext2_fill_super(struct super_
brelse(bh);

if (!sb_set_blocksize(sb, blocksize)) {
- printk(KERN_ERR "EXT2-fs: blocksize too small for device.\n");
+ printk(KERN_ERR "EXT2-fs: bad blocksize %d.\n", blocksize);
goto failed_sbi;
}

diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/fs/ext3/super.c linux-2.6.18-rc4-mingming-tnes/fs/ext3/super.c
--- linux-2.6.18-rc4-mingming/fs/ext3/super.c 2006-08-07 03:20:11.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes/fs/ext3/super.c 2006-09-08 09:00:02.000000000 +0900
@@ -1486,7 +1486,10 @@ static int ext3_fill_super (struct super
}

brelse (bh);
- sb_set_blocksize(sb, blocksize);
+ if (!sb_set_blocksize(sb, blocksize)) {
+ printk(KERN_ERR "EXT3-fs: bad blocksize %d.\n", blocksize);
+ goto out_fail;
+ }
logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
bh = sb_bread(sb, logic_sb_block);
diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/fs/ext4/super.c linux-2.6.18-rc4-mingming-tnes/fs/ext4/super.c
--- linux-2.6.18-rc4-mingming/fs/ext4/super.c 2006-08-29 16:29:05.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes/fs/ext4/super.c 2006-09-08 09:00:45.000000000 +0900
@@ -1497,7 +1497,10 @@ static int ext4_fill_super (struct super
}

brelse (bh);
- sb_set_blocksize(sb, blocksize);
+ if (!sb_set_blocksize(sb, blocksize)) {
+ printk(KERN_ERR "EXT4-fs: bad blocksize %d.\n", blocksize);
+ goto out_fail;
+ }
logic_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = sector_div(logic_sb_block, blocksize);
bh = sb_bread(sb, logic_sb_block);
diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/include/linux/ext2_fs.h linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext2_fs.h
--- linux-2.6.18-rc4-mingming/include/linux/ext2_fs.h 2006-08-07 03:20:11.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext2_fs.h 2006-09-04 11:26:26.000000000 +0900
@@ -90,8 +90,8 @@ static inline struct ext2_sb_info *EXT2_
* Macro-instructions used to manage several block sizes
*/
#define EXT2_MIN_BLOCK_SIZE 1024
-#define EXT2_MAX_BLOCK_SIZE 4096
-#define EXT2_MIN_BLOCK_LOG_SIZE 10
+#define EXT2_MAX_BLOCK_SIZE 65536
+#define EXT2_MIN_BLOCK_LOG_SIZE 10
#ifdef __KERNEL__
# define EXT2_BLOCK_SIZE(s) ((s)->s_blocksize)
#else
diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/include/linux/ext3_fs.h linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext3_fs.h
--- linux-2.6.18-rc4-mingming/include/linux/ext3_fs.h 2006-08-07 03:20:11.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext3_fs.h 2006-09-04 11:26:51.000000000 +0900
@@ -80,8 +80,8 @@
* Macro-instructions used to manage several block sizes
*/
#define EXT3_MIN_BLOCK_SIZE 1024
-#define EXT3_MAX_BLOCK_SIZE 4096
-#define EXT3_MIN_BLOCK_LOG_SIZE 10
+#define EXT3_MAX_BLOCK_SIZE 65536
+#define EXT3_MIN_BLOCK_LOG_SIZE 10
#ifdef __KERNEL__
# define EXT3_BLOCK_SIZE(s) ((s)->s_blocksize)
#else
diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/include/linux/ext4_fs.h linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext4_fs.h
--- linux-2.6.18-rc4-mingming/include/linux/ext4_fs.h 2006-08-29 16:29:05.000000000 +0900
+++ linux-2.6.18-rc4-mingming-tnes-no_compile/include/linux/ext4_fs.h 2006-09-04 11:27:13.000000000 +0900
@@ -81,8 +81,8 @@
* Macro-instructions used to manage several block sizes
*/
#define EXT4_MIN_BLOCK_SIZE 1024
-#define EXT4_MAX_BLOCK_SIZE 4096
-#define EXT4_MIN_BLOCK_LOG_SIZE 10
+#define EXT4_MAX_BLOCK_SIZE 65536
+#define EXT4_MIN_BLOCK_LOG_SIZE 10
#ifdef __KERNEL__
# define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize)
#else


Cheers, sho


2006-09-08 07:01:10

by Mingming Cao

Subject: Updated ext4 patches for 2.6.18-rc6

Hello,

Just to give you all an update on the latest ext4 patches before I leave
for vacation: the latest ext4 patches (clone ext4 + 48bit ext4) are
against 2.6.18-rc6 and, as usual, can be found at:

http://ext2.sourceforge.net/ext4/patches/latest/

I haven't done serious testing yet, but the fsx test ran fine for a few
hours on an ext4dev filesystem mounted with extents :)

Change log since the last release (2.6.18-rc4):

rebase ext4/jbd2 clone patches to 2.6.18-rc6 (Mingming Cao <[email protected]>)
rename ext3dev to ext4dev (Randy Dunlap <[email protected]>, Mingming Cao <[email protected]>)
register-ext4dev.patch
+register-jbd2.patch

*comment fixes in extent patch (Randy Dunlap <[email protected]>)
+extents_comment_fix.patch

*change some macro and inline functions to C functions (Avantika Mathur <[email protected]>)
+64bitmetadata_inline_funcs_fix.patch

*change ext4/jbd2 block type from sector_t to unsigned long long (Mingming Cao <[email protected]>); remove sector_fmt.patch
+ext4_blk_type_from_sector_t_to_ulonglong.patch
+ext4_remove_sector_t_bits_check.patch
+jbd2_blks_type_from_sector_t_to_ull.patch
-sector_fmt.patch

Andrew, you can pull all the patches (in quilt style, as a series of
patches) from here:
http://ext2.sourceforge.net/ext4/patches/latest/broken-out/

Shaggy has kindly offered to maintain and forward all these patches from
here while I am out. Thanks, Shaggy :)

Thanks,
Mingming


2006-09-08 10:17:38

by Takashi Sato

Subject: RE: Updated ext4 patches for 2.6.18-rc6

Hi Mingming,

I found a trivial bug in your ext4 patch titled
"ext4-extents-48bit.patch".

It introduces a type conflict:

+int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, ext4_fsblk_t iblock,

The extern declaration of ext4_ext_get_blocks needs to be updated to match.


Cheers, sho

2006-09-08 16:13:36

by Alexandre Ratchov

Subject: Re: Updated ext4 patches for 2.6.18-rc6

On Fri, Sep 08, 2006 at 12:01:08AM -0700, Mingming Cao wrote:
> Hello,
>
> Just give you all an update about the latest ext4 patches before I leave
> for vacation: The latest ext4 patches (clone ext4 + 48bit ext4) is
> against 2.6.18-rc6, as usual, could be found at:
>
> http://ext2.sourceforge.net/ext4/patches/latest/
>
> I haven't done serious testing yet, but the fsx test ran fine for a few
> hours on an ext4dev filesystem mounted with extents :)
>
> change log since last release (2.6.18-rc4)
>
> rebase ext4/jbd2 clone patches to 2.6.18-rc6 (Mingming Cao <[email protected]>)
> rename ext3dev to ext4dev (Randy Dunlap <[email protected]>, Mingming Cao <[email protected]>)
> register-ext4dev.patch
> +register-jbd2.patch
>
> *comment fixes in extent patch (Randy Dunlap <[email protected]>)
> +extents_comment_fix.patch
>
> *change some macro and inline functions to C functions (Avantika Mathur <[email protected]>)
> +64bitmetadata_inline_funcs_fix.patch
>
> *change ext4/jbd2 block type from sector_t to unsigned long long (Mingming Cao <[email protected]>); remove sector_fmt.patch
> +ext4_blk_type_from_sector_t_to_ulonglong.patch
> +ext4_remove_sector_t_bits_check.patch
> +jbd2_blks_type_from_sector_t_to_ull.patch
> -sector_fmt.patch
>
> Andrew, you could pull all the patches(in quilt style) from here(a
> series of patches)
> http://ext2.sourceforge.net/ext4/patches/latest/broken-out/
>
> Shaggy has nicely offered to maintain and forward all these patches from
> here while I am out, thanks, Shaggy:)
>

hi,

there are 2 more patches:

* ext4_remove_relative_block_numbers:

use 48bit absolute block numbers instead of mixed relative/absolute block
numbers. This is simpler and seems to fix issues with large file systems.

* ext4_allow_larger_descriptor_size:

allow larger block group descriptors: this patch makes it possible to add
new features that need more space in the block group descriptor.
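
(Illustration only, not taken from the patches.) The first item stores each
metadata location as an absolute 48-bit block number, split into a 32-bit
low part and a 16-bit high part. A minimal user-space sketch of that split
follows. The type blk48 and its helpers are hypothetical names, and values
are kept in host byte order, whereas the on-disk fields are little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one location field pair of a group descriptor. */
struct blk48 {
    uint32_t lo;   /* low 32 bits of the block number  */
    uint16_t hi;   /* high 16 bits of the block number */
};

/* Split an absolute block number into its two stored parts. */
static struct blk48 blk48_set(uint64_t blk)
{
    struct blk48 b;

    assert(blk < (1ULL << 48));   /* must fit in 48 bits */
    b.lo = (uint32_t)blk;
    b.hi = (uint16_t)(blk >> 32);
    return b;
}

/* Rebuild the absolute block number from the two stored parts. */
static uint64_t blk48_get(struct blk48 b)
{
    return (uint64_t)b.lo | ((uint64_t)b.hi << 32);
}
```

No group base is needed to decode a value, which is what makes this simpler
than the mixed relative/absolute encoding.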

here is the complete patch set:

http://www.bullopensource.org/ext4/20060908/ext4-linux-2.6.18-rc6.tar.gz

there's also a patch set for the latest e2fsprogs that is in sync with the
kernel patches:

http://www.bullopensource.org/ext4/20060908/ext4-e2fsprogs-1.39.tar.gz

cheers,

-- Alexandre

2006-09-08 16:15:35

by Alexandre Ratchov

Subject: [patch 1/2] Re: Updated ext4 patches for 2.6.18-rc6

Index: linux-2.6.18-rc6/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.18-rc6.orig/include/linux/ext4_fs.h 2006-09-08 14:37:44.000000000 +0200
+++ linux-2.6.18-rc6/include/linux/ext4_fs.h 2006-09-08 14:38:02.000000000 +0200
@@ -132,34 +132,16 @@ struct ext4_group_desc
__le16 bg_free_blocks_count; /* Free blocks count */
__le16 bg_free_inodes_count; /* Free inodes count */
__le16 bg_used_dirs_count; /* Directories count */
- __u16 bg_pad;
- __le32 bg_reserved[3];
+ __u16 bg_flags; /* reserved for fsck */
+ __le16 bg_block_bitmap_hi; /* Blocks bitmap block MSB */
+ __le16 bg_inode_bitmap_hi; /* Inodes bitmap block MSB */
+ __le16 bg_inode_table_hi; /* Inodes table block MSB */
+ __u16 bg_reserved[3];
};

#ifdef __KERNEL__
#include <linux/ext4_fs_i.h>
#include <linux/ext4_fs_sb.h>
-
-#define EXT4_BLOCK_BITMAP(bg, group_base) \
- ext4_relative_decode(group_base, le32_to_cpu((bg)->bg_block_bitmap))
-#define EXT4_INODE_BITMAP(bg, group_base) \
- ext4_relative_decode(group_base, le32_to_cpu((bg)->bg_inode_bitmap))
-#define EXT4_INODE_TABLE(bg, group_base) \
- ext4_relative_decode(group_base, le32_to_cpu((bg)->bg_inode_table))
-
-#define EXT4_BLOCK_BITMAP_SET(bg, group_base, value) \
- do {(bg)->bg_block_bitmap = ext4_relative_encode(group_base, value);} while(0)
-#define EXT4_INODE_BITMAP_SET(bg, group_base, value) \
- do {(bg)->bg_inode_bitmap = ext4_relative_encode(group_base, value);} while(0)
-#define EXT4_INODE_TABLE_SET(bg, group_base, value) \
- do {(bg)->bg_inode_table = ext4_relative_encode(group_base, value);} while(0)
-
-#define EXT4_IS_USED_BLOCK_BITMAP(bg) \
- ((bg)->bg_block_bitmap != 0)
-#define EXT4_IS_USED_INODE_BITMAP(bg) \
- ((bg)->bg_inode_bitmap != 0)
-#define EXT4_IS_USED_INODE_TABLE(bg) \
- ((bg)->bg_inode_table != 0)
#endif
/*
* Macro-instructions used to manage group descriptors
@@ -223,9 +205,9 @@ struct ext4_group_desc
/* Used to pass group descriptor data when online resize is done */
struct ext4_new_group_input {
__u32 group; /* Group number for this data */
- __u32 block_bitmap; /* Absolute block number of block bitmap */
- __u32 inode_bitmap; /* Absolute block number of inode bitmap */
- __u32 inode_table; /* Absolute block number of inode table start */
+ __u64 block_bitmap; /* Absolute block number of block bitmap */
+ __u64 inode_bitmap; /* Absolute block number of inode bitmap */
+ __u64 inode_table; /* Absolute block number of inode table start */
__u32 blocks_count; /* Total number of blocks in this group */
__u16 reserved_blocks; /* Number of reserved blocks in this group */
__u16 unused;
@@ -234,9 +216,9 @@ struct ext4_new_group_input {
/* The struct ext4_new_group_input in kernel space, with free_blocks_count */
struct ext4_new_group_data {
__u32 group;
- __u32 block_bitmap;
- __u32 inode_bitmap;
- __u32 inode_table;
+ __u64 block_bitmap;
+ __u64 inode_bitmap;
+ __u64 inode_table;
__u32 blocks_count;
__u16 reserved_blocks;
__u16 unused;
@@ -911,8 +893,12 @@ extern void ext4_warning (struct super_b
extern void ext4_update_dynamic_rev (struct super_block *sb);
extern ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es);
extern ext4_fsblk_t ext4_r_blocks_count(struct ext4_super_block *es);
-extern u32 ext4_relative_encode(ext4_fsblk_t group_base, ext4_fsblk_t fs_block);
-extern ext4_fsblk_t ext4_relative_decode(ext4_fsblk_t group_base, u32 gdp_block);
+extern ext4_fsblk_t ext4_block_bitmap(struct ext4_group_desc *bg);
+extern ext4_fsblk_t ext4_inode_bitmap(struct ext4_group_desc *bg);
+extern ext4_fsblk_t ext4_inode_table(struct ext4_group_desc *bg);
+extern void ext4_block_bitmap_set(struct ext4_group_desc *bg, ext4_fsblk_t blk);
+extern void ext4_inode_bitmap_set(struct ext4_group_desc *bg, ext4_fsblk_t blk);
+extern void ext4_inode_table_set(struct ext4_group_desc *bg, ext4_fsblk_t blk);

#define ext4_std_error(sb, errno) \
do { \
Index: linux-2.6.18-rc6/fs/ext4/balloc.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/balloc.c 2006-09-08 14:37:48.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/balloc.c 2006-09-08 14:42:54.000000000 +0200
@@ -87,16 +87,13 @@ read_block_bitmap(struct super_block *sb
desc = ext4_get_group_desc (sb, block_group, NULL);
if (!desc)
goto error_out;
- bh = sb_bread(sb,
- EXT4_BLOCK_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)));
+ bh = sb_bread(sb, ext4_block_bitmap(desc));
if (!bh)
ext4_error (sb, "read_block_bitmap",
"Cannot read block bitmap - "
"block_group = %d, block_bitmap = %llu",
block_group,
- EXT4_BLOCK_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)));
+ ext4_block_bitmap(desc));
error_out:
return bh;
}
@@ -338,7 +335,7 @@ void ext4_free_blocks_sb(handle_t *handl
goto error_return;
}

- ext4_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1);
+ ext4_debug ("freeing block(s) %llu-%llu\n", block, block + count - 1);

do_more:
overflow = 0;
@@ -359,20 +356,10 @@ do_more:
if (!desc)
goto error_return;

- if (in_range (EXT4_BLOCK_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)),
- block, count) ||
- in_range (EXT4_INODE_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)),
- block, count) ||
- in_range (block,
- EXT4_INODE_TABLE(desc,
- ext4_group_first_block_no(sb, block_group)),
- sbi->s_itb_per_group) ||
- in_range (block + count - 1,
- EXT4_INODE_TABLE(desc,
- ext4_group_first_block_no(sb, block_group)),
- sbi->s_itb_per_group))
+ if (in_range(ext4_block_bitmap(desc), block, count) ||
+ in_range(ext4_inode_bitmap(desc), block, count) ||
+ in_range(block, ext4_inode_table(desc), sbi->s_itb_per_group) ||
+ in_range(block + count - 1, ext4_inode_table(desc), sbi->s_itb_per_group))
ext4_error (sb, "ext4_free_blocks",
"Freeing blocks in system zones - "
"Block = %llu, count = %lu",
@@ -1372,16 +1359,12 @@ allocated:

ret_block = grp_alloc_blk + ext4_group_first_block_no(sb, group_no);

- if (in_range(EXT4_BLOCK_BITMAP(gdp, ext4_group_first_block_no(sb, group_no)),
- ret_block, num) ||
- in_range(EXT4_BLOCK_BITMAP(gdp, ext4_group_first_block_no(sb, group_no)),
- ret_block, num) ||
- in_range(ret_block, EXT4_INODE_TABLE(gdp,
- ext4_group_first_block_no(sb, group_no)),
- EXT4_SB(sb)->s_itb_per_group) ||
- in_range(ret_block + num - 1, EXT4_INODE_TABLE(gdp,
- ext4_group_first_block_no(sb, group_no)),
- EXT4_SB(sb)->s_itb_per_group))
+ if (in_range(ext4_block_bitmap(gdp), ret_block, num) ||
+ in_range(ext4_inode_bitmap(gdp), ret_block, num) ||
+ in_range(ret_block, ext4_inode_table(gdp),
+ EXT4_SB(sb)->s_itb_per_group) ||
+ in_range(ret_block + num - 1, ext4_inode_table(gdp),
+ EXT4_SB(sb)->s_itb_per_group))
ext4_error(sb, "ext4_new_block",
"Allocating block in system zone - "
"blocks from %llu, length %lu",
Index: linux-2.6.18-rc6/fs/ext4/resize.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/resize.c 2006-09-08 14:37:48.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/resize.c 2006-09-08 14:38:02.000000000 +0200
@@ -829,12 +829,9 @@ int ext4_group_add(struct super_block *s
/* Update group descriptor block for new group */
gdp = (struct ext4_group_desc *)primary->b_data + gdb_off;

- EXT4_BLOCK_BITMAP_SET(gdp, ext4_group_first_block_no(sb, gdb_num),
- input->block_bitmap); /* LV FIXME */
- EXT4_INODE_BITMAP_SET(gdp, ext4_group_first_block_no(sb, gdb_num),
- input->inode_bitmap); /* LV FIXME */
- EXT4_INODE_TABLE_SET(gdp, ext4_group_first_block_no(sb, gdb_num),
- input->inode_table); /* LV FIXME */
+ ext4_block_bitmap_set(gdp, input->block_bitmap); /* LV FIXME */
+ ext4_inode_bitmap_set(gdp, input->inode_bitmap); /* LV FIXME */
+ ext4_inode_table_set(gdp, input->inode_table); /* LV FIXME */
gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb));

Index: linux-2.6.18-rc6/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/ialloc.c 2006-09-08 14:37:44.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/ialloc.c 2006-09-08 14:38:02.000000000 +0200
@@ -60,14 +60,12 @@ read_inode_bitmap(struct super_block * s
if (!desc)
goto error_out;

- bh = sb_bread(sb, EXT4_INODE_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)));
+ bh = sb_bread(sb, ext4_inode_bitmap(desc));
if (!bh)
ext4_error(sb, "read_inode_bitmap",
"Cannot read inode bitmap - "
"block_group = %lu, inode_bitmap = %llu",
- block_group, EXT4_INODE_BITMAP(desc,
- ext4_group_first_block_no(sb, block_group)));
+ block_group, ext4_inode_bitmap(desc));
error_out:
return bh;
}
Index: linux-2.6.18-rc6/fs/ext4/inode.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/inode.c 2006-09-08 14:37:54.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/inode.c 2006-09-08 14:38:02.000000000 +0200
@@ -2434,8 +2434,7 @@ static ext4_fsblk_t ext4_get_inode_block
*/
offset = ((ino - 1) % EXT4_INODES_PER_GROUP(sb)) *
EXT4_INODE_SIZE(sb);
- block = EXT4_INODE_TABLE((gdp+desc),
- ext4_group_first_block_no(sb, block_group)) +
+ block = ext4_inode_table(gdp + desc) +
(offset >> EXT4_BLOCK_SIZE_BITS(sb));

iloc->block_group = block_group;
@@ -2502,10 +2501,8 @@ static int __ext4_get_inode_loc(struct i
if (!desc)
goto make_io;

- bitmap_bh = sb_getblk(inode->i_sb,
- EXT4_INODE_BITMAP(desc,
- ext4_group_first_block_no(inode->i_sb,
- block_group)));
+ bitmap_bh = sb_getblk(inode->i_sb,
+ ext4_inode_bitmap(desc));
if (!bitmap_bh)
goto make_io;

Index: linux-2.6.18-rc6/fs/ext4/super.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/super.c 2006-09-08 14:37:44.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/super.c 2006-09-08 14:56:00.000000000 +0200
@@ -74,31 +74,41 @@ ext4_fsblk_t ext4_r_blocks_count(struct
(__u64)le32_to_cpu(es->s_r_blocks_count));
}

-u32 ext4_relative_encode(ext4_fsblk_t group_base, ext4_fsblk_t fs_block)
-{
- s32 gdp_block;
-
- if (fs_block < (1ULL<<32) && group_base < (1ULL<<32))
- return fs_block;
-
- gdp_block = (fs_block - group_base);
- BUG_ON ((group_base + gdp_block) != fs_block);
-
- return gdp_block;
-}
-
-ext4_fsblk_t ext4_relative_decode(ext4_fsblk_t group_base, u32 gdp_block)
-{
- if (group_base >= (1ULL<<32))
- return group_base + (s32) gdp_block;
-
- if ((s32) gdp_block >= 0 && gdp_block < group_base &&
- group_base + gdp_block >= (1ULL<<32))
- return group_base + gdp_block;
-
- return gdp_block;
-}
-
+ext4_fsblk_t ext4_block_bitmap(struct ext4_group_desc *bg)
+{
+ return le32_to_cpu(bg->bg_block_bitmap) |
+ ((ext4_fsblk_t)le16_to_cpu(bg->bg_block_bitmap_hi) << 32);
+}
+
+ext4_fsblk_t ext4_inode_bitmap(struct ext4_group_desc *bg)
+{
+ return le32_to_cpu(bg->bg_inode_bitmap) |
+ ((ext4_fsblk_t)le16_to_cpu(bg->bg_inode_bitmap_hi) << 32);
+}
+
+ext4_fsblk_t ext4_inode_table(struct ext4_group_desc *bg)
+{
+ return le32_to_cpu(bg->bg_inode_table) |
+ ((ext4_fsblk_t)le16_to_cpu(bg->bg_inode_table_hi) << 32);
+}
+
+void ext4_block_bitmap_set(struct ext4_group_desc *bg, ext4_fsblk_t blk)
+{
+ bg->bg_block_bitmap = cpu_to_le32((u32)blk);
+ bg->bg_block_bitmap_hi = cpu_to_le16(blk >> 32);
+}
+
+void ext4_inode_bitmap_set(struct ext4_group_desc *bg, ext4_fsblk_t blk)
+{
+ bg->bg_inode_bitmap = cpu_to_le32((u32)blk);
+ bg->bg_inode_bitmap_hi = cpu_to_le16(blk >> 32);
+}
+
+void ext4_inode_table_set(struct ext4_group_desc *bg, ext4_fsblk_t blk)
+{
+ bg->bg_inode_table = cpu_to_le32((u32)blk);
+ bg->bg_inode_table_hi = cpu_to_le16(blk >> 32);
+}

static void ext4_free_blocks_count_set(struct ext4_super_block *es, __u32 v)
{
@@ -1194,41 +1204,32 @@ static int ext4_check_descriptors (struc
if ((i % EXT4_DESC_PER_BLOCK(sb)) == 0)
gdp = (struct ext4_group_desc *)
sbi->s_group_desc[desc_block++]->b_data;
- if (EXT4_BLOCK_BITMAP(gdp, ext4_group_first_block_no(sb, i)) <
- block ||
- EXT4_BLOCK_BITMAP(gdp, ext4_group_first_block_no(sb, i)) >=
- block + EXT4_BLOCKS_PER_GROUP(sb))
+ if (ext4_block_bitmap(gdp) < block ||
+ ext4_block_bitmap(gdp) >= block + EXT4_BLOCKS_PER_GROUP(sb))
{
ext4_error (sb, "ext4_check_descriptors",
"Block bitmap for group %d"
- " not in group (block %lu)!",
- i, (unsigned long)
- EXT4_BLOCK_BITMAP(gdp, ext4_group_first_block_no(sb, i)));
+ " not in group (block %llu)!",
+ i, ext4_block_bitmap(gdp));
return 0;
}
- if (EXT4_INODE_BITMAP(gdp, ext4_group_first_block_no(sb, i)) <
- block ||
- EXT4_INODE_BITMAP(gdp, ext4_group_first_block_no(sb, i)) >=
- block + EXT4_BLOCKS_PER_GROUP(sb))
+ if (ext4_inode_bitmap(gdp) < block ||
+ ext4_inode_bitmap(gdp) >= block + EXT4_BLOCKS_PER_GROUP(sb))
{
ext4_error (sb, "ext4_check_descriptors",
"Inode bitmap for group %d"
- " not in group (block %lu)!",
- i, (unsigned long)
- EXT4_INODE_BITMAP(gdp, ext4_group_first_block_no(sb, i)));
+ " not in group (block %llu)!",
+ i, ext4_inode_bitmap(gdp));
return 0;
}
- if (EXT4_INODE_TABLE(gdp, ext4_group_first_block_no(sb, i)) <
- block ||
- EXT4_INODE_TABLE(gdp, ext4_group_first_block_no(sb, i)) +
- sbi->s_itb_per_group >=
+ if (ext4_inode_table(gdp) < block ||
+ ext4_inode_table(gdp) + sbi->s_itb_per_group >=
block + EXT4_BLOCKS_PER_GROUP(sb))
{
ext4_error (sb, "ext4_check_descriptors",
"Inode table for group %d"
- " not in group (block %lu)!",
- i, (unsigned long)
- EXT4_INODE_TABLE(gdp, ext4_group_first_block_no(sb, i)));
+ " not in group (block %llu)!",
+ i, ext4_inode_table(gdp));
return 0;
}
block += EXT4_BLOCKS_PER_GROUP(sb);
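
(Illustration only, not part of the patch.) The "system zone" checks that
this patch simplifies in balloc.c can be sketched in user space as below.
The struct group_meta and the function touches_system_zone() are
hypothetical names. The sketch also assumes that in_range(b, first, len) is
true when b lies in [first, first + len - 1]; that definition is not shown
in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: true when b lies in [first, first + len - 1]. */
static int in_range(uint64_t b, uint64_t first, uint64_t len)
{
    assert(len > 0);
    return b >= first && b <= first + len - 1;
}

/* Hypothetical host-order summary of one group's metadata locations. */
struct group_meta {
    uint64_t block_bitmap;
    uint64_t inode_bitmap;
    uint64_t inode_table;     /* first block of the inode table  */
    uint64_t itb_per_group;   /* inode table length in blocks    */
};

/* Does the extent [block, block + count - 1] touch a metadata block?
 * Tests both bitmaps against the extent, and both ends of the extent
 * against the inode table. */
static int touches_system_zone(const struct group_meta *g,
                               uint64_t block, uint64_t count)
{
    return in_range(g->block_bitmap, block, count) ||
           in_range(g->inode_bitmap, block, count) ||
           in_range(block, g->inode_table, g->itb_per_group) ||
           in_range(block + count - 1, g->inode_table, g->itb_per_group);
}
```

Because the accessors return absolute block numbers, no group base has to
be passed around for these comparisons.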

2006-09-08 16:18:14

by Alexandre Ratchov

Subject: [patch 2/2] Re: Updated ext4 patches for 2.6.18-rc6

Index: linux-2.6.18-rc6/fs/ext4/balloc.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/balloc.c 2006-09-08 18:29:57.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/balloc.c 2006-09-08 18:30:17.000000000 +0200
@@ -66,10 +66,12 @@ struct ext4_group_desc * ext4_get_group_
return NULL;
}

- desc = (struct ext4_group_desc *) sbi->s_group_desc[group_desc]->b_data;
+ desc = (struct ext4_group_desc *)(
+ (__u8 *)sbi->s_group_desc[group_desc]->b_data +
+ offset * EXT4_DESC_SIZE(sb));
if (bh)
*bh = sbi->s_group_desc[group_desc];
- return desc + offset;
+ return desc;
}

/*
Index: linux-2.6.18-rc6/fs/ext4/inode.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/inode.c 2006-09-08 18:29:57.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/inode.c 2006-09-08 18:34:34.000000000 +0200
@@ -2428,14 +2428,16 @@ static ext4_fsblk_t ext4_get_inode_block
return 0;
}

- gdp = (struct ext4_group_desc *)bh->b_data;
+ gdp = (struct ext4_group_desc *)((__u8 *)bh->b_data +
+ desc * EXT4_DESC_SIZE(sb));
/*
* Figure out the offset within the block group inode table
*/
offset = ((ino - 1) % EXT4_INODES_PER_GROUP(sb)) *
EXT4_INODE_SIZE(sb);
- block = ext4_inode_table(gdp + desc) +
- (offset >> EXT4_BLOCK_SIZE_BITS(sb));
+ block = ext4_inode_table(gdp) + (offset >> EXT4_BLOCK_SIZE_BITS(sb));
+
+

iloc->block_group = block_group;
iloc->offset = offset & (EXT4_BLOCK_SIZE(sb) - 1);
Index: linux-2.6.18-rc6/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.18-rc6.orig/include/linux/ext4_fs.h 2006-09-08 18:29:57.000000000 +0200
+++ linux-2.6.18-rc6/include/linux/ext4_fs.h 2006-09-08 18:30:17.000000000 +0200
@@ -146,6 +146,9 @@ struct ext4_group_desc
/*
* Macro-instructions used to manage group descriptors
*/
+#define EXT4_MIN_DESC_SIZE 32
+#define EXT4_MAX_DESC_SIZE EXT4_MIN_BLOCK_SIZE
+#define EXT4_DESC_SIZE(s) (EXT4_SB(s)->s_desc_size)
#ifdef __KERNEL__
# define EXT4_BLOCKS_PER_GROUP(s) (EXT4_SB(s)->s_blocks_per_group)
# define EXT4_DESC_PER_BLOCK(s) (EXT4_SB(s)->s_desc_per_block)
@@ -153,7 +156,7 @@ struct ext4_group_desc
# define EXT4_DESC_PER_BLOCK_BITS(s) (EXT4_SB(s)->s_desc_per_block_bits)
#else
# define EXT4_BLOCKS_PER_GROUP(s) ((s)->s_blocks_per_group)
-# define EXT4_DESC_PER_BLOCK(s) (EXT4_BLOCK_SIZE(s) / sizeof (struct ext4_group_desc))
+# define EXT4_DESC_PER_BLOCK(s) (EXT4_BLOCK_SIZE(s) / EXT4_DESC_SIZE(s))
# define EXT4_INODES_PER_GROUP(s) ((s)->s_inodes_per_group)
#endif

@@ -461,7 +464,7 @@ struct ext4_super_block {
* things it doesn't understand...
*/
__le32 s_first_ino; /* First non-reserved inode */
- __le16 s_inode_size; /* size of inode structure */
+ __le16 s_inode_size; /* size of inode structure */
__le16 s_block_group_nr; /* block group # of this superblock */
__le32 s_feature_compat; /* compatible feature set */
/*60*/ __le32 s_feature_incompat; /* incompatible feature set */
@@ -487,7 +490,7 @@ struct ext4_super_block {
__le32 s_hash_seed[4]; /* HTREE hash seed */
__u8 s_def_hash_version; /* Default hash version to use */
__u8 s_reserved_char_pad;
- __u16 s_reserved_word_pad;
+ __le16 s_desc_size; /* size of group descriptor */
/*100*/ __le32 s_default_mount_opts;
__le32 s_first_meta_bg; /* First metablock block group */
__le32 s_mkfs_time; /* When the filesystem was created */
Index: linux-2.6.18-rc6/fs/ext4/super.c
===================================================================
--- linux-2.6.18-rc6.orig/fs/ext4/super.c 2006-09-08 18:29:57.000000000 +0200
+++ linux-2.6.18-rc6/fs/ext4/super.c 2006-09-08 18:30:17.000000000 +0200
@@ -1233,7 +1233,8 @@ static int ext4_check_descriptors (struc
return 0;
}
block += EXT4_BLOCKS_PER_GROUP(sb);
- gdp++;
+ gdp = (struct ext4_group_desc *)
+ ((__u8 *)gdp + EXT4_DESC_SIZE(sb));
}

ext4_free_blocks_count_set(sbi->s_es, ext4_count_free_blocks(sb));
@@ -1585,7 +1586,18 @@ static int ext4_fill_super (struct super
sbi->s_frag_size, blocksize);
goto failed_mount;
}
- sbi->s_frags_per_block = 1;
+ sbi->s_desc_size = le16_to_cpu(es->s_desc_size);
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_64BIT)) {
+ if (sbi->s_desc_size < EXT4_MIN_DESC_SIZE ||
+ sbi->s_desc_size > EXT4_MAX_DESC_SIZE ||
+ sbi->s_desc_size & (sbi->s_desc_size - 1)) {
+ printk(KERN_ERR
+ "EXT4-fs: unsupported descriptor size %d\n",
+ sbi->s_desc_size);
+ goto failed_mount;
+ }
+ } else
+ sbi->s_desc_size = EXT4_MIN_DESC_SIZE;
sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
@@ -1596,7 +1608,7 @@ static int ext4_fill_super (struct super
goto cantfind_ext4;
sbi->s_itb_per_group = sbi->s_inodes_per_group /
sbi->s_inodes_per_block;
- sbi->s_desc_per_block = blocksize / sizeof(struct ext4_group_desc);
+ sbi->s_desc_per_block = blocksize / EXT4_DESC_SIZE(sb);
sbi->s_sbh = bh;
sbi->s_mount_state = le16_to_cpu(es->s_state);
sbi->s_addr_per_block_bits = log2(EXT4_ADDR_PER_BLOCK(sb));
Index: linux-2.6.18-rc6/include/linux/ext4_fs_sb.h
===================================================================
--- linux-2.6.18-rc6.orig/include/linux/ext4_fs_sb.h 2006-09-08 18:29:57.000000000 +0200
+++ linux-2.6.18-rc6/include/linux/ext4_fs_sb.h 2006-09-08 18:30:17.000000000 +0200
@@ -29,6 +29,7 @@
*/
struct ext4_sb_info {
unsigned long s_frag_size; /* Size of a fragment in bytes */
+ unsigned long s_desc_size; /* Size of a group descriptor in bytes */
unsigned long s_frags_per_block;/* Number of fragments per block */
unsigned long s_inodes_per_block;/* Number of inodes per block */
unsigned long s_frags_per_group;/* Number of fragments in a group */
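
(Illustration only, not part of the patch.) Two ideas in this patch are the
size check done at mount time and the switch from struct-pointer arithmetic
to byte offsets. A user-space sketch under stated assumptions:
desc_size_ok() and desc_at() are hypothetical names, and the limits 32 and
1024 mirror EXT4_MIN_DESC_SIZE and EXT4_MIN_BLOCK_SIZE from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_DESC_SIZE 32u
#define MAX_DESC_SIZE 1024u   /* EXT4_MIN_BLOCK_SIZE in the patch */

/* Accept only power-of-two sizes within [MIN_DESC_SIZE, MAX_DESC_SIZE]. */
static int desc_size_ok(uint32_t size)
{
    return size >= MIN_DESC_SIZE && size <= MAX_DESC_SIZE &&
           (size & (size - 1)) == 0;
}

/* Address of the n-th descriptor in a descriptor block. The stride is the
 * size recorded in the superblock, not sizeof(struct ...). */
static const uint8_t *desc_at(const uint8_t *block, uint32_t desc_size,
                              uint32_t n)
{
    assert(desc_size_ok(desc_size));
    return block + (size_t)n * desc_size;
}
```

Stepping by the recorded size is what lets an older, smaller struct
definition still find the right descriptor in a filesystem with larger ones.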

2006-09-08 18:09:36

by Andreas Dilger

Subject: Re: Updated ext4 patches for 2.6.18-rc6

On Sep 08, 2006 18:13 +0200, Alexandre Ratchov wrote:
> there are 2 more patches:
>
> * ext4_remove_relative_block_numbers:
>
> use 48bit absolute block numbers instead of mixed relative/absolute block
> numbers. This is simpler and seems to fix issues with large file systems.
>
> * ext4_allow_larger_descriptor_size:
>
> allow larger block group descriptors: this patch will allow to add new
> features that need more space in the block descriptor.

Hmm, I'm a bit confused. If we are adding larger block group descriptors,
why wouldn't we put 32-bit "high" block numbers into the larger descriptor
space? That could be part of the INCOMPAT_64BIT support.
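
(Illustration only.) One way to read this suggestion is sketched below: the
first 32 bytes keep the existing layout, and the high 32 bits live in the
extra space, read only when the descriptor is large enough. The offsets,
BASE_DESC_SIZE and block_bitmap_of() are hypothetical and chosen for this
example; values are read in host byte order for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BASE_DESC_SIZE 32u

/* Hypothetical layout: bytes 0-3 hold the low 32 bits of the block bitmap
 * location; bytes 32-35 hold the high 32 bits and exist only when the
 * descriptor is larger than 32 bytes. */
static uint64_t block_bitmap_of(const uint8_t *desc, uint32_t desc_size)
{
    uint32_t lo, hi = 0;

    assert(desc_size >= BASE_DESC_SIZE);
    memcpy(&lo, desc, sizeof lo);
    if (desc_size >= BASE_DESC_SIZE + sizeof hi)
        memcpy(&hi, desc + BASE_DESC_SIZE, sizeof hi);
    return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

A 32-byte descriptor then behaves exactly as before, and only filesystems
with the larger descriptor carry the high bits.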

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


2007-06-21 22:34:47

by Andreas Dilger

Subject: Re: [RFC][1/4] ext2/3/4: enlarge blocksize

On Sep 08, 2006 13:11 +0900, [email protected] wrote:
> On July 7, 2006, sho wrote:
> These patches support large blocksize up to PAGESIZE (max 64KB).
> NOTE:
> They limit the end of a directory block to 65532(64K - 4)
> to avoid overflow only when using 64KB block.

Takashi, in light of the (very exciting ;-) patches to handle 64kB
PAGE_SIZE on x86 systems, could you please update the large blocksize
patch to the latest ext4 tree?

> diff -upNr -X linux-2.6.18-rc4-mingming/Documentation/dontdiff linux-2.6.18-rc4-mingming/fs/ext2/super.c linux-2.6.18-rc4-mingming-tnes/fs/ext2/super.c
> --- linux-2.6.18-rc4-mingming/fs/ext2/super.c 2006-08-07 03:20:11.000000000 +0900
> +++ linux-2.6.18-rc4-mingming-tnes/fs/ext2/super.c 2006-09-08 09:00:40.000000000 +0900
> @@ -725,7 +725,7 @@ static int ext2_fill_super(struct super_
> brelse(bh);
>
> if (!sb_set_blocksize(sb, blocksize)) {
> - printk(KERN_ERR "EXT2-fs: blocksize too small for device.\n");
> + printk(KERN_ERR "EXT2-fs: bad blocksize %d.\n", blocksize);
> goto failed_sbi;
> }

We need a check in ext2 (like ext3/ext4) to ensure that blocksize does not
exceed EXT2_MAX_BLOCK_SIZE. I think the limit could be raised to 32768
without danger; only the directory problem prevents it from working with 65536.
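
(Illustration only.) Such a check could look like the user-space sketch
below. blocksize_ok() is a hypothetical name, the limits mirror
EXT2_MIN_BLOCK_SIZE and the proposed EXT2_MAX_BLOCK_SIZE, and treating the
upper bound as inclusive is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_BLOCK_SIZE 1024u
#define MAX_BLOCK_SIZE 65536u

/* A block size is accepted when it is a power of two within the
 * filesystem limits and no larger than the page size. */
static int blocksize_ok(uint32_t blocksize, uint32_t page_size)
{
    assert(page_size != 0);
    return blocksize >= MIN_BLOCK_SIZE &&
           blocksize <= MAX_BLOCK_SIZE &&
           blocksize <= page_size &&
           (blocksize & (blocksize - 1)) == 0;
}
```

Lowering MAX_BLOCK_SIZE to 32768 in this sketch would give the more
conservative ext2 limit mentioned above.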

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.