2009-02-10 03:56:33

by Wei Yongjun

[permalink] [raw]
Subject: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

The rec_len field in the directory entry is 16 bits, so if the
filesystem is completely empty, rec_len of 0 is used to designate
65536, for the case where the directory entry takes the entire 64k
block.

But if empty directory block is read, error message will be output
by current kernel. You can do the following commands to reproduct it.

> mkfs.ext4 -b $(( 64 * 1024 )) /dev/sda1
> mount -t ext4 /dev/sda1 /mnt
> cd /mnt/lost+found
> ll
> tail /var/log/messages
EXT4-fs error (device sdc1): ext4_readdir: bad entry in \
directory #11: rec_len is smaller than minimal - offset=0, \
inode=0, rec_len=0, name_len=0

This patch fix to treat rec_len of 0 as 65536, like what e2fsprogs do.

Signed-off-by: Wei Yongjun <[email protected]>
---
fs/ext4/ext4.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index aafc9eb..b0c87dc 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -868,7 +868,7 @@ static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
{
unsigned len = le16_to_cpu(dlen);

- if (len == EXT4_MAX_REC_LEN)
+ if (len == EXT4_MAX_REC_LEN || len == 0)
return 1 << 16;
return len;
}
--
1.5.3.8





2009-02-10 08:26:45

by Wei Yongjun

[permalink] [raw]
Subject: [PATCHv2] ext4: Fix support for empty directory blocks in 64k blocksize filesystems

The rec_len field in the directory entry is 16 bits, so if the
filesystem is completely empty, rec_len of 0 is used to designate
65536 in e2fsprogs, for the case where the directory entry takes
the entire 64k block.

But if empty directory is read, error message will be output by
current kernel. You can do the following commands to reproduct it.

- mkfs.ext4 -b $(( 64 * 1024 )) /dev/sdc1
- mount -t ext4 /dev/sda1 /mnt
- cd /mnt/lost+found
- ll
- tail /var/log/messages
EXT4-fs error (device sdc1): ext4_readdir: bad entry in \
directory #11: rec_len is smaller than minimal - offset=0, \
inode=0, rec_len=0, name_len=0

This patch fix to treat rec_len of 0 as 65536, like what e2fsprogs do.

Signed-off-by: Wei Yongjun <[email protected]>
---
fs/ext4/ext4.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index aafc9eb..c64ab94 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -868,7 +868,7 @@ static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
{
unsigned len = le16_to_cpu(dlen);

- if (len == EXT4_MAX_REC_LEN)
+ if (len == EXT4_MAX_REC_LEN || len == 0)
return 1 << 16;
return len;
}
@@ -876,7 +876,7 @@ static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
static inline __le16 ext4_rec_len_to_disk(unsigned len)
{
if (len == (1 << 16))
- return cpu_to_le16(EXT4_MAX_REC_LEN);
+ return cpu_to_le16(0);
else if (len > (1 << 16))
BUG();
return cpu_to_le16(len);
--
1.5.3.8




2009-02-10 18:21:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

This is a good patch, but the patch description doesn't tell the
complete tale. I've replaced it with the following.

- Ted

ext4: Fix to read empty directory blocks correctly in 64k

From: Wei Yongjun <[email protected]>

The rec_len field in the directory entry is 16 bits, so there was a
problem representing rec_len for filesystems with a 64k block size in
the case where the directory entry takes the entire 64k block.
Unfortunately, there were two schemes that were proposed; one where
all zeros meant 65536 and one where all ones (65535) meant 65536.
E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately
this case happens extremely rarely, with the most common case being
the lost+found directory, created by mke2fs.

So we will be liberal in what we accept, and accept both encodings,
but we will continue to encode 65536 as 65535. This will require a
change in e2fsprogs, but with fortunately ext4 filesystems normally
have the dir_index feature enabled, which precludes having a
completely empty directory block.

Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>


2009-02-11 05:48:26

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

On Feb 10, 2009 10:20 -0500, Theodore Ts'o wrote:
> The rec_len field in the directory entry is 16 bits, so there was a
> problem representing rec_len for filesystems with a 64k block size in
> the case where the directory entry takes the entire 64k block.
> Unfortunately, there were two schemes that were proposed; one where
> all zeros meant 65536 and one where all ones (65535) meant 65536.
> E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately
> this case happens extremely rarely, with the most common case being
> the lost+found directory, created by mke2fs.
>
> So we will be liberal in what we accept, and accept both encodings,
> but we will continue to encode 65536 as 65535. This will require a
> change in e2fsprogs, but with fortunately ext4 filesystems normally
> have the dir_index feature enabled, which precludes having a
> completely empty directory block.

I'm glad that the MAX_REC_LEN value is being kept, because "0" is
too easily hit due to disk corruption.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-02-11 15:16:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

On Wed, Feb 11, 2009 at 12:48:16AM -0500, Andreas Dilger wrote:
>
> I'm glad that the MAX_REC_LEN value is being kept, because "0" is
> too easily hit due to disk corruption.
>

I'm not too worried about that, actually. If there is a disk
corruption, we will detect it easily enough no matter which encoding
we use. I am thinking though that neither 65535 nor 0 is the best way
of encoding 65536. In fact, I would suggest the encoding 0x01.
Specifically, I suggest:

(rec_len & 65532) | ((rec_len >> 16) & 3)

This encodes valid rec_len values in the range 0 through 2**18-4, and
allows for maximum block sizes of up to 256k. To retain backwards
compatibility, and to allow for a 256k blocksize, we can have a
special case where a rec_len of either 0 or 65535 means
EXT4_BLOCK_SIZE(s).

It's unlikely we'll see VM pages of up to 256k, but at some point we
might find that the Linux VM has been enhanced to support filesystem
block sizes > than the VM page size, at which point it might be useful
for some applications to allow very large filesystem block sizes.

- Ted

2009-02-12 06:43:00

by Wei Yongjun

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

Theodore Tso wrote:
> On Wed, Feb 11, 2009 at 12:48:16AM -0500, Andreas Dilger wrote:
>
>> I'm glad that the MAX_REC_LEN value is being kept, because "0" is
>> too easily hit due to disk corruption.
>>
>>
>
> I'm not too worried about that, actually. If there is a disk
> corruption, we will detect it easily enough no matter which encoding
> we use. I am thinking though that neither 65535 nor 0 is the best way
> of encoding 65536. In fact, I would suggest the encoding 0x01.
> Specifically, I suggest:
>
> (rec_len & 65532) | ((rec_len >> 16) & 3)
>
> This encodes valid rec_len values in the range 0 through 2**18-4, and
> allows for maximum block sizes of up to 256k. To retain backwards
> compatibility, and to allow for a 256k blocksize, we can have a
> special case where a rec_len of either 0 or 65535 means
> EXT4_BLOCK_SIZE(s).
>
> It's unlikely we'll see VM pages of up to 256k, but at some point we
> might find that the Linux VM has been enhanced to support filesystem
> block sizes > than the VM page size, at which point it might be useful
> for some applications to allow very large filesystem block sizes.
>
> - Ted
>

Hi, Ted

As your advice, I have changed the patch. Thanks.

[PATCH] ext4: Fix support for empty directory blocks in 64k blocksize filesystems

The rec_len field in the directory entry is 16 bits, so there was a
problem representing rec_len for filesystems with a 64k block size in
the case where the directory entry takes the entire 64k block.
Unfortunately, there were two schemes that were proposed; one where
all zeros meant 65536 and one where all ones (65535) meant 65536.
E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately
this case happens extremely rarely, with the most common case being
the lost+found directory, created by mke2fs.

So we will be liberal in what we accept, and accept both encodings,
but we will encode 65536 as 0x01. Use this algorithm:

(rec_len & 65532) | ((rec_len >> 16) & 3)

This encodes valid rec_len values in the range 0 through 2**18-4, and
allows for maximum block sizes of up to 256k. To retain backwards
compatibility, and to allow for a 256k blocksize, we can have a
special case where a rec_len of either 0 or 65535 means
EXT4_BLOCK_SIZE(s).

It's unlikely we'll see VM pages of up to 256k, but at some point we
might find that the Linux VM has been enhanced to support filesystem
block sizes > than the VM page size, at which point it might be useful
for some applications to allow very large filesystem block sizes.

Signed-off-by: Wei Yongjun <[email protected]>
---
fs/ext4/dir.c | 16 ++++++++-----
fs/ext4/ext4.h | 16 +++++++-------
fs/ext4/namei.c | 63 ++++++++++++++++++++++++++++++------------------------
3 files changed, 53 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 2df2e40..5bc0199 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -67,7 +67,8 @@ int ext4_check_dir_entry(const char *function, struct inode *dir,
unsigned int offset)
{
const char *error_msg = NULL;
- const int rlen = ext4_rec_len_from_disk(de->rec_len);
+ const int rlen = ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);

if (rlen < EXT4_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -178,10 +179,11 @@ revalidate:
* least that it is non-zero. A
* failure will be detected in the
* dirent test below. */
- if (ext4_rec_len_from_disk(de->rec_len)
- < EXT4_DIR_REC_LEN(1))
+ if (ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize) < EXT4_DIR_REC_LEN(1))
break;
- i += ext4_rec_len_from_disk(de->rec_len);
+ i += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -203,7 +205,8 @@ revalidate:
ret = stored;
goto out;
}
- offset += ext4_rec_len_from_disk(de->rec_len);
+ offset += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
* if the data destination is
@@ -225,7 +228,8 @@ revalidate:
goto revalidate;
stored++;
}
- filp->f_pos += ext4_rec_len_from_disk(de->rec_len);
+ filp->f_pos += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
}
offset = 0;
brelse(bh);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index aafc9eb..e0c5962 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -864,22 +864,22 @@ struct ext4_dir_entry_2 {
~EXT4_DIR_ROUND)
#define EXT4_MAX_REC_LEN ((1<<16)-1)

-static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
+static inline unsigned ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize)
{
unsigned len = le16_to_cpu(dlen);

- if (len == EXT4_MAX_REC_LEN)
- return 1 << 16;
- return len;
+ if (len == EXT4_MAX_REC_LEN || len == 0)
+ return blocksize;
+ return (len & 65532) | ((len & 3) << 16);
}

static inline __le16 ext4_rec_len_to_disk(unsigned len)
{
- if (len == (1 << 16))
- return cpu_to_le16(EXT4_MAX_REC_LEN);
- else if (len > (1 << 16))
+ if (len == (1 << 18))
+ return cpu_to_le16(0);
+ else if (len > (1 << 18))
BUG();
- return cpu_to_le16(len);
+ return cpu_to_le16((len & 65532) | ((len >> 16) & 3));
}

/*
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index ba702bd..165b378 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -184,10 +184,10 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
* p is at least 6 bytes before the end of page
*/
static inline struct ext4_dir_entry_2 *
-ext4_next_entry(struct ext4_dir_entry_2 *p)
+ext4_next_entry(struct ext4_dir_entry_2 *p, unsigned long blocksize)
{
return (struct ext4_dir_entry_2 *)((char *)p +
- ext4_rec_len_from_disk(p->rec_len));
+ ext4_rec_len_from_disk(p->rec_len, blocksize));
}

/*
@@ -294,7 +294,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent
space += EXT4_DIR_REC_LEN(de->name_len);
names++;
}
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, size);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -585,7 +585,7 @@ static int htree_dirblock_to_tree(struct file *dir_file,
top = (struct ext4_dir_entry_2 *) ((char *) de +
dir->i_sb->s_blocksize -
EXT4_DIR_REC_LEN(0));
- for (; de < top; de = ext4_next_entry(de)) {
+ for (; de < top; de = ext4_next_entry(de, dir->i_sb->s_blocksize)) {
if (!ext4_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
(block<<EXT4_BLOCK_SIZE_BITS(dir->i_sb))
+((char *)de - bh->b_data))) {
@@ -663,7 +663,7 @@ int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
}
if (start_hash < 2 || (start_hash ==2 && start_minor_hash==0)) {
de = (struct ext4_dir_entry_2 *) frames[0].bh->b_data;
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, dir->i_sb->s_blocksize);
if ((err = ext4_htree_store_dirent(dir_file, 2, 0, de)) != 0)
goto errout;
count++;
@@ -732,7 +732,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, size);
}
return count;
}
@@ -832,7 +832,8 @@ static inline int search_dirblock(struct buffer_head *bh,
return 1;
}
/* prevent looping on a bad block */
- de_len = ext4_rec_len_from_disk(de->rec_len);
+ de_len = ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);
if (de_len <= 0)
return -1;
offset += de_len;
@@ -996,7 +997,7 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, const struct q
de = (struct ext4_dir_entry_2 *) bh->b_data;
top = (struct ext4_dir_entry_2 *) ((char *) de + sb->s_blocksize -
EXT4_DIR_REC_LEN(0));
- for (; de < top; de = ext4_next_entry(de)) {
+ for (; de < top; de = ext4_next_entry(de, sb->s_blocksize)) {
int off = (block << EXT4_BLOCK_SIZE_BITS(sb))
+ ((char *) de - bh->b_data);

@@ -1137,7 +1138,7 @@ static struct ext4_dir_entry_2* dx_pack_dirents(char *base, int size)

prev = to = de;
while ((char*)de < base + size) {
- next = ext4_next_entry(de);
+ next = ext4_next_entry(de, size);
if (de->inode && de->name_len) {
rec_len = EXT4_DIR_REC_LEN(de->name_len);
if (de > to)
@@ -1287,7 +1288,8 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
return -EEXIST;
}
nlen = EXT4_DIR_REC_LEN(de->name_len);
- rlen = ext4_rec_len_from_disk(de->rec_len);
+ rlen = ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);
if ((de->inode? rlen - nlen: rlen) >= reclen)
break;
de = (struct ext4_dir_entry_2 *)((char *)de + rlen);
@@ -1306,7 +1308,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,

/* By now the buffer is marked for journaling */
nlen = EXT4_DIR_REC_LEN(de->name_len);
- rlen = ext4_rec_len_from_disk(de->rec_len);
+ rlen = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
if (de->inode) {
struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
de1->rec_len = ext4_rec_len_to_disk(rlen - nlen);
@@ -1380,7 +1382,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
/* The 0th block becomes the root, move the dirents out */
fde = &root->dotdot;
de = (struct ext4_dir_entry_2 *)((char *)fde +
- ext4_rec_len_from_disk(fde->rec_len));
+ ext4_rec_len_from_disk(fde->rec_len, blocksize));
if ((char *) de >= (((char *) root) + blocksize)) {
ext4_error(dir->i_sb, __func__,
"invalid rec_len for '..' in inode %lu",
@@ -1402,7 +1404,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
memcpy (data1, de, len);
de = (struct ext4_dir_entry_2 *) data1;
top = data1 + len;
- while ((char *)(de2 = ext4_next_entry(de)) < top)
+ while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top)
de = de2;
de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
/* Initialize the root; the dot dirents already exist */
@@ -1652,8 +1654,10 @@ static int ext4_delete_entry(handle_t *handle,
ext4_journal_get_write_access(handle, bh);
if (pde)
pde->rec_len = ext4_rec_len_to_disk(
- ext4_rec_len_from_disk(pde->rec_len) +
- ext4_rec_len_from_disk(de->rec_len));
+ ext4_rec_len_from_disk(pde->rec_len,
+ dir->i_sb->s_blocksize) +
+ ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize));
else
de->inode = 0;
dir->i_version++;
@@ -1661,9 +1665,10 @@ static int ext4_delete_entry(handle_t *handle,
ext4_handle_dirty_metadata(handle, dir, bh);
return 0;
}
- i += ext4_rec_len_from_disk(de->rec_len);
+ i += ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);
pde = de;
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, dir->i_sb->s_blocksize);
}
return -ENOENT;
}
@@ -1827,7 +1832,7 @@ retry:
de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len));
strcpy(de->name, ".");
ext4_set_de_type(dir->i_sb, de, S_IFDIR);
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, dir->i_sb->s_blocksize);
de->inode = cpu_to_le32(dir->i_ino);
de->rec_len = ext4_rec_len_to_disk(inode->i_sb->s_blocksize -
EXT4_DIR_REC_LEN(1));
@@ -1885,7 +1890,7 @@ static int empty_dir(struct inode *inode)
return 1;
}
de = (struct ext4_dir_entry_2 *) bh->b_data;
- de1 = ext4_next_entry(de);
+ de1 = ext4_next_entry(de, sb->s_blocksize);
if (le32_to_cpu(de->inode) != inode->i_ino ||
!le32_to_cpu(de1->inode) ||
strcmp(".", de->name) ||
@@ -1896,9 +1901,9 @@ static int empty_dir(struct inode *inode)
brelse(bh);
return 1;
}
- offset = ext4_rec_len_from_disk(de->rec_len) +
- ext4_rec_len_from_disk(de1->rec_len);
- de = ext4_next_entry(de1);
+ offset = ext4_rec_len_from_disk(de->rec_len, sb->s_blocksize) +
+ ext4_rec_len_from_disk(de1->rec_len, sb->s_blocksize);
+ de = ext4_next_entry(de1, sb->s_blocksize);
while (offset < inode->i_size) {
if (!bh ||
(void *) de >= (void *) (bh->b_data+sb->s_blocksize)) {
@@ -1927,8 +1932,8 @@ static int empty_dir(struct inode *inode)
brelse(bh);
return 0;
}
- offset += ext4_rec_len_from_disk(de->rec_len);
- de = ext4_next_entry(de);
+ offset += ext4_rec_len_from_disk(de->rec_len, sb->s_blocksize);
+ de = ext4_next_entry(de, sb->s_blocksize);
}
brelse(bh);
return 1;
@@ -2297,8 +2302,8 @@ retry:
return err;
}

-#define PARENT_INO(buffer) \
- (ext4_next_entry((struct ext4_dir_entry_2 *)(buffer))->inode)
+#define PARENT_INO(buffer, size) \
+ (ext4_next_entry((struct ext4_dir_entry_2 *)(buffer), size)->inode)

/*
* Anybody can rename anything with this: the permission checks are left to the
@@ -2358,7 +2363,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
dir_bh = ext4_bread(handle, old_inode, 0, 0, &retval);
if (!dir_bh)
goto end_rename;
- if (le32_to_cpu(PARENT_INO(dir_bh->b_data)) != old_dir->i_ino)
+ if (le32_to_cpu(PARENT_INO(dir_bh->b_data,
+ old_dir->i_sb->s_blocksize)) != old_dir->i_ino)
goto end_rename;
retval = -EMLINK;
if (!new_inode && new_dir != old_dir &&
@@ -2430,7 +2436,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (dir_bh) {
BUFFER_TRACE(dir_bh, "get_write_access");
ext4_journal_get_write_access(handle, dir_bh);
- PARENT_INO(dir_bh->b_data) = cpu_to_le32(new_dir->i_ino);
+ PARENT_INO(dir_bh->b_data, new_dir->i_sb->s_blocksize) =
+ cpu_to_le32(new_dir->i_ino);
BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
ext4_handle_dirty_metadata(handle, old_dir, dir_bh);
ext4_dec_count(handle, old_dir);
--
1.5.3.8





2009-02-15 05:33:13

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

I've modified your patch somewhat, since the encoding and decoding
functions were getting to big to be inlined, and because I wanted to
preserve better backwards compatibility for encoding 64k blocksize
filesystems.

I also wanted to keep a minimal simple patch that I can push to Linus
right away, while this much larger patch will have to wait until the
next merge window. Since this is needed only for 128k and 256k
blocksizes, it's not as urgent.

This patch may also be considered too invasive for ext3; so a simpler
patch which enables 64k block sizes may be more appropriate for ext3.

- Ted

ext4: New rec_len encoding for very large blocksizes

From: Wei Yongjun <[email protected]>

The rec_len field in the directory entry is 16 bits, so to encode
blocksizes larger than 64k becomes problematic. This patch allows us
to supprot block sizes up to 256k, by using the low 2 bits to extend
the range of rec_len to 2**18-1 (since valid rec_len sizes must be a
multiple of 4). We use the convention that a rec_len of 0 or 65535
means the filesystem block size, for compatibility with older kernels.

It's unlikely we'll see VM pages of up to 256k, but at some point we
might find that the Linux VM has been enhanced to support filesystem
block sizes > than the VM page size, at which point it might be useful
for some applications to allow very large filesystem block sizes.

Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 2df2e40..b647899 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -67,7 +67,8 @@ int ext4_check_dir_entry(const char *function, struct inode *dir,
unsigned int offset)
{
const char *error_msg = NULL;
- const int rlen = ext4_rec_len_from_disk(de->rec_len);
+ const int rlen = ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);

if (rlen < EXT4_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -178,10 +179,11 @@ revalidate:
* least that it is non-zero. A
* failure will be detected in the
* dirent test below. */
- if (ext4_rec_len_from_disk(de->rec_len)
- < EXT4_DIR_REC_LEN(1))
+ if (ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize) < EXT4_DIR_REC_LEN(1))
break;
- i += ext4_rec_len_from_disk(de->rec_len);
+ i += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -203,7 +205,8 @@ revalidate:
ret = stored;
goto out;
}
- offset += ext4_rec_len_from_disk(de->rec_len);
+ offset += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
* if the data destination is
@@ -225,7 +228,8 @@ revalidate:
goto revalidate;
stored++;
}
- filp->f_pos += ext4_rec_len_from_disk(de->rec_len);
+ filp->f_pos += ext4_rec_len_from_disk(de->rec_len,
+ sb->s_blocksize);
}
offset = 0;
brelse(bh);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e1a7cf4..b75a3c3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -854,24 +854,6 @@ struct ext4_dir_entry_2 {
~EXT4_DIR_ROUND)
#define EXT4_MAX_REC_LEN ((1<<16)-1)

-static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
-{
- unsigned len = le16_to_cpu(dlen);
-
- if (len == EXT4_MAX_REC_LEN || len == 0)
- return 1 << 16;
- return len;
-}
-
-static inline __le16 ext4_rec_len_to_disk(unsigned len)
-{
- if (len == (1 << 16))
- return cpu_to_le16(EXT4_MAX_REC_LEN);
- else if (len > (1 << 16))
- BUG();
- return cpu_to_le16(len);
-}
-
/*
* Hash Tree Directory indexing
* (c) Daniel Phillips, 2001
@@ -1095,7 +1077,10 @@ extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);

/* migrate.c */
extern int ext4_ext_migrate(struct inode *);
+
/* namei.c */
+extern unsigned int ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize);
+extern __le16 ext4_rec_len_to_disk(unsigned len, unsigned blocksize);
extern int ext4_orphan_add(handle_t *, struct inode *);
extern int ext4_orphan_del(handle_t *, struct inode *);
extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index e9b6a96..0cade98 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -165,7 +165,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
struct dx_hash_info *hinfo, struct dx_map_entry map[]);
static void dx_sort_map(struct dx_map_entry *map, unsigned count);
static struct ext4_dir_entry_2 *dx_move_dirents(char *from, char *to,
- struct dx_map_entry *offsets, int count);
+ struct dx_map_entry *offsets, int count, unsigned blocksize);
static struct ext4_dir_entry_2* dx_pack_dirents(char *base, unsigned blocksize);
static void dx_insert_block(struct dx_frame *frame,
u32 hash, ext4_lblk_t block);
@@ -180,14 +180,38 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir,
static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
struct inode *inode);

+unsigned int ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize)
+{
+ unsigned len = le16_to_cpu(dlen);
+
+ if (len == EXT4_MAX_REC_LEN || len == 0)
+ return blocksize;
+ return (len & 65532) | ((len & 3) << 16);
+}
+
+__le16 ext4_rec_len_to_disk(unsigned len, unsigned blocksize)
+{
+ if ((len > blocksize) || (blocksize > (1 << 18)) || (len & 3))
+ BUG();
+ if (len < 65536)
+ return cpu_to_le16(len);
+ if (len == blocksize) {
+ if (blocksize == 65536)
+ return cpu_to_le16(EXT4_MAX_REC_LEN);
+ else
+ return cpu_to_le16(0);
+ }
+ return cpu_to_le16((len & 65532) | ((len >> 16) & 3));
+}
+
/*
* p is at least 6 bytes before the end of page
*/
static inline struct ext4_dir_entry_2 *
-ext4_next_entry(struct ext4_dir_entry_2 *p)
+ext4_next_entry(struct ext4_dir_entry_2 *p, unsigned long blocksize)
{
return (struct ext4_dir_entry_2 *)((char *)p +
- ext4_rec_len_from_disk(p->rec_len));
+ ext4_rec_len_from_disk(p->rec_len, blocksize));
}

/*
@@ -294,7 +318,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent
space += EXT4_DIR_REC_LEN(de->name_len);
names++;
}
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, size);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -585,7 +609,7 @@ static int htree_dirblock_to_tree(struct file *dir_file,
top = (struct ext4_dir_entry_2 *) ((char *) de +
dir->i_sb->s_blocksize -
EXT4_DIR_REC_LEN(0));
- for (; de < top; de = ext4_next_entry(de)) {
+ for (; de < top; de = ext4_next_entry(de, dir->i_sb->s_blocksize)) {
if (!ext4_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
(block<<EXT4_BLOCK_SIZE_BITS(dir->i_sb))
+((char *)de - bh->b_data))) {
@@ -663,7 +687,7 @@ int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
}
if (start_hash < 2 || (start_hash ==2 && start_minor_hash==0)) {
de = (struct ext4_dir_entry_2 *) frames[0].bh->b_data;
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, dir->i_sb->s_blocksize);
if ((err = ext4_htree_store_dirent(dir_file, 2, 0, de)) != 0)
goto errout;
count++;
@@ -732,7 +756,7 @@ static int dx_make_map(struct ext4_dir_entry_2 *de, unsigned blocksize,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, blocksize);
}
return count;
}
@@ -832,7 +856,8 @@ static inline int search_dirblock(struct buffer_head *bh,
return 1;
}
/* prevent looping on a bad block */
- de_len = ext4_rec_len_from_disk(de->rec_len);
+ de_len = ext4_rec_len_from_disk(de->rec_len,
+ dir->i_sb->s_blocksize);
if (de_len <= 0)
return -1;
offset += de_len;
@@ -996,7 +1021,7 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, const struct q
de = (struct ext4_dir_entry_2 *) bh->b_data;
top = (struct ext4_dir_entry_2 *) ((char *) de + sb->s_blocksize -
EXT4_DIR_REC_LEN(0));
- for (; de < top; de = ext4_next_entry(de)) {
+ for (; de < top; de = ext4_next_entry(de, sb->s_blocksize)) {
int off = (block << EXT4_BLOCK_SIZE_BITS(sb))
+ ((char *) de - bh->b_data);

@@ -1109,7 +1134,8 @@ static inline void ext4_set_de_type(struct super_block *sb,
* Returns pointer to last entry moved.
*/
static struct ext4_dir_entry_2 *
-dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
+dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count,
+ unsigned blocksize)
{
unsigned rec_len = 0;

@@ -1118,7 +1144,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
rec_len = EXT4_DIR_REC_LEN(de->name_len);
memcpy (to, de, rec_len);
((struct ext4_dir_entry_2 *) to)->rec_len =
- ext4_rec_len_to_disk(rec_len);
+ ext4_rec_len_to_disk(rec_len, blocksize);
de->inode = 0;
map++;
to += rec_len;
@@ -1137,12 +1163,12 @@ static struct ext4_dir_entry_2* dx_pack_dirents(char *base, unsigned blocksize)

prev = to = de;
while ((char*)de < base + blocksize) {
- next = ext4_next_entry(de);
+ next = ext4_next_entry(de, blocksize);
if (de->inode && de->name_len) {
rec_len = EXT4_DIR_REC_LEN(de->name_len);
if (de > to)
memmove(to, de, rec_len);
- to->rec_len = ext4_rec_len_to_disk(rec_len);
+ to->rec_len = ext4_rec_len_to_disk(rec_len, blocksize);
prev = to;
to = (struct ext4_dir_entry_2 *) (((char *) to) + rec_len);
}
@@ -1215,10 +1241,12 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
hash2, split, count-split));

/* Fancy dance to stay within two buffers */
- de2 = dx_move_dirents(data1, data2, map + split, count - split);
+ de2 = dx_move_dirents(data1, data2, map + split, count - split, blocksize);
de = dx_pack_dirents(data1, blocksize);
- de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
- de2->rec_len = ext4_rec_len_to_disk(data2 + blocksize - (char *) de2);
+ de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de,
+ blocksize);
+ de2->rec_len = ext4_rec_len_to_disk(data2 + blocksize - (char *) de2,
+ blocksize);
dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data1, blocksize, 1));
dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data2, blocksize, 1));

@@ -1268,6 +1296,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
const char *name = dentry->d_name.name;
int namelen = dentry->d_name.len;
unsigned int offset = 0;
+ unsigned int blocksize = dir->i_sb->s_blocksize;
unsigned short reclen;
int nlen, rlen, err;
char *top;
@@ -1275,7 +1304,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
reclen = EXT4_DIR_REC_LEN(namelen);
if (!de) {
de = (struct ext4_dir_entry_2 *)bh->b_data;
- top = bh->b_data + dir->i_sb->s_blocksize - reclen;
+ top = bh->b_data + blocksize - reclen;
while ((char *) de <= top) {
if (!ext4_check_dir_entry("ext4_add_entry", dir, de,
bh, offset)) {
@@ -1287,7 +1316,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
return -EEXIST;
}
nlen = EXT4_DIR_REC_LEN(de->name_len);
- rlen = ext4_rec_len_from_disk(de->rec_len);
+ rlen = ext4_rec_len_from_disk(de->rec_len, blocksize);
if ((de->inode? rlen - nlen: rlen) >= reclen)
break;
de = (struct ext4_dir_entry_2 *)((char *)de + rlen);
@@ -1306,11 +1335,11 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,

/* By now the buffer is marked for journaling */
nlen = EXT4_DIR_REC_LEN(de->name_len);
- rlen = ext4_rec_len_from_disk(de->rec_len);
+ rlen = ext4_rec_len_from_disk(de->rec_len, blocksize);
if (de->inode) {
struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
- de1->rec_len = ext4_rec_len_to_disk(rlen - nlen);
- de->rec_len = ext4_rec_len_to_disk(nlen);
+ de1->rec_len = ext4_rec_len_to_disk(rlen - nlen, blocksize);
+ de->rec_len = ext4_rec_len_to_disk(nlen, blocksize);
de = de1;
}
de->file_type = EXT4_FT_UNKNOWN;
@@ -1380,7 +1409,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
/* The 0th block becomes the root, move the dirents out */
fde = &root->dotdot;
de = (struct ext4_dir_entry_2 *)((char *)fde +
- ext4_rec_len_from_disk(fde->rec_len));
+ ext4_rec_len_from_disk(fde->rec_len, blocksize));
if ((char *) de >= (((char *) root) + blocksize)) {
ext4_error(dir->i_sb, __func__,
"invalid rec_len for '..' in inode %lu",
@@ -1402,12 +1431,14 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
memcpy (data1, de, len);
de = (struct ext4_dir_entry_2 *) data1;
top = data1 + len;
- while ((char *)(de2 = ext4_next_entry(de)) < top)
+ while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top)
de = de2;
- de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
+ de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de,
+ blocksize);
/* Initialize the root; the dot dirents already exist */
de = (struct ext4_dir_entry_2 *) (&root->dotdot);
- de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2));
+ de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2),
+ blocksize);
memset (&root->info, 0, sizeof(root->info));
root->info.info_length = sizeof(root->info);
root->info.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version;
@@ -1488,7 +1519,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
return retval;
de = (struct ext4_dir_entry_2 *) bh->b_data;
de->inode = 0;
- de->rec_len = ext4_rec_len_to_disk(blocksize);
+ de->rec_len = ext4_rec_len_to_disk(blocksize, blocksize);
return add_dirent_to_buf(handle, dentry, inode, de, bh);
}

@@ -1551,7 +1582,8 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
goto cleanup;
node2 = (struct dx_node *)(bh2->b_data);
entries2 = node2->entries;
- node2->fake.rec_len = ext4_rec_len_to_disk(sb->s_blocksize);
+ node2->fake.rec_len = ext4_rec_len_to_disk(sb->s_blocksize,
+ sb->s_blocksize);
node2->fake.inode = 0;
BUFFER_TRACE(frame->bh, "get_write_access");
err = ext4_journal_get_write_access(handle, frame->bh);
@@ -1639,6 +1671,7 @@ static int ext4_delete_entry(handle_t *handle,
struct buffer_head *bh)
{
struct ext4_dir_entry_2 *de, *pde;
+ unsigned int blocksize = dir->i_sb->s_blocksize;
int i;

i = 0;
@@ -1652,8 +1685,11 @@ static int ext4_delete_entry(handle_t *handle,
ext4_journal_get_write_access(handle, bh);
if (pde)
pde->rec_len = ext4_rec_len_to_disk(
- ext4_rec_len_from_disk(pde->rec_len) +
- ext4_rec_len_from_disk(de->rec_len));
+ ext4_rec_len_from_disk(pde->rec_len,
+ blocksize) +
+ ext4_rec_len_from_disk(de->rec_len,
+ blocksize),
+ blocksize);
else
de->inode = 0;
dir->i_version++;
@@ -1661,9 +1697,9 @@ static int ext4_delete_entry(handle_t *handle,
ext4_handle_dirty_metadata(handle, dir, bh);
return 0;
}
- i += ext4_rec_len_from_disk(de->rec_len);
+ i += ext4_rec_len_from_disk(de->rec_len, blocksize);
pde = de;
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, blocksize);
}
return -ENOENT;
}
@@ -1793,6 +1829,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
struct inode *inode;
struct buffer_head *dir_block;
struct ext4_dir_entry_2 *de;
+ unsigned int blocksize = dir->i_sb->s_blocksize;
int err, retries = 0;

if (EXT4_DIR_LINK_MAX(dir))
@@ -1824,13 +1861,14 @@ retry:
de = (struct ext4_dir_entry_2 *) dir_block->b_data;
de->inode = cpu_to_le32(inode->i_ino);
de->name_len = 1;
- de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len));
+ de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len),
+ blocksize);
strcpy(de->name, ".");
ext4_set_de_type(dir->i_sb, de, S_IFDIR);
- de = ext4_next_entry(de);
+ de = ext4_next_entry(de, blocksize);
de->inode = cpu_to_le32(dir->i_ino);
- de->rec_len = ext4_rec_len_to_disk(inode->i_sb->s_blocksize -
- EXT4_DIR_REC_LEN(1));
+ de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(1),
+ blocksize);
de->name_len = 2;
strcpy(de->name, "..");
ext4_set_de_type(dir->i_sb, de, S_IFDIR);
@@ -1885,7 +1923,7 @@ static int empty_dir(struct inode *inode)
return 1;
}
de = (struct ext4_dir_entry_2 *) bh->b_data;
- de1 = ext4_next_entry(de);
+ de1 = ext4_next_entry(de, sb->s_blocksize);
if (le32_to_cpu(de->inode) != inode->i_ino ||
!le32_to_cpu(de1->inode) ||
strcmp(".", de->name) ||
@@ -1896,9 +1934,9 @@ static int empty_dir(struct inode *inode)
brelse(bh);
return 1;
}
- offset = ext4_rec_len_from_disk(de->rec_len) +
- ext4_rec_len_from_disk(de1->rec_len);
- de = ext4_next_entry(de1);
+ offset = ext4_rec_len_from_disk(de->rec_len, sb->s_blocksize) +
+ ext4_rec_len_from_disk(de1->rec_len, sb->s_blocksize);
+ de = ext4_next_entry(de1, sb->s_blocksize);
while (offset < inode->i_size) {
if (!bh ||
(void *) de >= (void *) (bh->b_data+sb->s_blocksize)) {
@@ -1927,8 +1965,8 @@ static int empty_dir(struct inode *inode)
brelse(bh);
return 0;
}
- offset += ext4_rec_len_from_disk(de->rec_len);
- de = ext4_next_entry(de);
+ offset += ext4_rec_len_from_disk(de->rec_len, sb->s_blocksize);
+ de = ext4_next_entry(de, sb->s_blocksize);
}
brelse(bh);
return 1;
@@ -2297,8 +2335,8 @@ retry:
return err;
}

-#define PARENT_INO(buffer) \
- (ext4_next_entry((struct ext4_dir_entry_2 *)(buffer))->inode)
+#define PARENT_INO(buffer, size) \
+ (ext4_next_entry((struct ext4_dir_entry_2 *)(buffer), size)->inode)

/*
* Anybody can rename anything with this: the permission checks are left to the
@@ -2358,7 +2396,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
dir_bh = ext4_bread(handle, old_inode, 0, 0, &retval);
if (!dir_bh)
goto end_rename;
- if (le32_to_cpu(PARENT_INO(dir_bh->b_data)) != old_dir->i_ino)
+ if (le32_to_cpu(PARENT_INO(dir_bh->b_data,
+ old_dir->i_sb->s_blocksize)) != old_dir->i_ino)
goto end_rename;
retval = -EMLINK;
if (!new_inode && new_dir != old_dir &&
@@ -2430,7 +2469,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (dir_bh) {
BUFFER_TRACE(dir_bh, "get_write_access");
ext4_journal_get_write_access(handle, dir_bh);
- PARENT_INO(dir_bh->b_data) = cpu_to_le32(new_dir->i_ino);
+ PARENT_INO(dir_bh->b_data, new_dir->i_sb->s_blocksize) =
+ cpu_to_le32(new_dir->i_ino);
BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
ext4_handle_dirty_metadata(handle, old_dir, dir_bh);
ext4_dec_count(handle, old_dir);

2009-02-16 23:33:11

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

On Feb 11, 2009 10:15 -0500, Theodore Ts'o wrote:
> On Wed, Feb 11, 2009 at 12:48:16AM -0500, Andreas Dilger wrote:
> > I'm glad that the MAX_REC_LEN value is being kept, because "0" is
> > too easily hit due to disk corruption.
>
> I'm not too worried about that, actually. If there is a disk
> corruption, we will detect it easily enough no matter which encoding
> we use. I am thinking though that neither 65535 nor 0 is the best way
> of encoding 65536. In fact, I would suggest the encoding 0x01.
> Specifically, I suggest:
>
> (rec_len & 65532) | ((rec_len >> 16) & 3)
>
> This encodes valid rec_len values in the range 0 through 2**18-4, and
> allows for maximum block sizes of up to 256k. To retain backwards
> compatibility, and to allow for a 256k blocksize, we can have a
> special case where a rec_len of either 0 or 65535 means
> EXT4_BLOCK_SIZE(s).
>
> It's unlikely we'll see VM pages of up to 256k, but at some point we
> might find that the Linux VM has been enhanced to support filesystem
> block sizes > than the VM page size, at which point it might be useful
> for some applications to allow very large filesystem block sizes.

There are probably a dozen other places in the ext* code that expect
blocksize <= 65536, so I don't think this new encoding is really helping
us at all. We are already restricted to 2^32-1 inodes due to the dirent
format and I expect we will have changed the dirent format by that time
anyways.

I'd rather keep this change as simple as possible (i.e. the original
65535 or 0 values, preferring 65535).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-02-17 15:59:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix to read empty directory blocks correctly in 64k blocksize filesystems

On Mon, Feb 16, 2009 at 04:32:56PM -0700, Andreas Dilger wrote:
> There are probably a dozen other places in the ext* code that expect
> blocksize <= 65536, so I don't think this new encoding is really helping
> us at all. We are already restricted to 2^32-1 inodes due to the dirent
> format and I expect we will have changed the dirent format by that time
> anyways.

I can only find one other place, actually, which would be the extended
attribute code --- e_value_offs is a 16 bit offset (although we could
steal bits form e_value block if necessary).

> I'd rather keep this change as simple as possible (i.e. the original
> 65535 or 0 values, preferring 65535).

For 64k block sizes, I've kept this, although I preferred 0 since it's
a cleaner change if we want to allow bigger blocksize alternative. I
don't think 65535 is any worse than 0 in terms of detecting directory
corruption.

- Ted