2014-11-11 08:00:37

by Xiaoguang Wang

[permalink] [raw]
Subject: [PATCH 1/4] e2fsprogs/libext2fs: replace ext2fs_free_inode_cache() argument

When we call ext2fs_free_inode_cache(fs->icache) to free the inode
cache, indeed it will not make fs->icache be 0, it just makes the
local argument icache be 0, after this call, we need another
'fs->icache = 0' statement. So here we pass the 'ext2_filsys fs' as
arguments directly, to make the free inode cache operation more natural.

Signed-off-by: Xiaoguang Wang <[email protected]>
---
lib/ext2fs/ext2fs.h | 2 +-
lib/ext2fs/freefs.c | 2 +-
lib/ext2fs/inode.c | 10 ++++++----
3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 506d43b..580440f 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1403,7 +1403,7 @@ extern errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
/* inode.c */
extern errcode_t ext2fs_create_inode_cache(ext2_filsys fs,
unsigned int cache_size);
-extern void ext2fs_free_inode_cache(struct ext2_inode_cache *icache);
+extern void ext2fs_free_inode_cache(ext2_filsys fs);
extern errcode_t ext2fs_flush_icache(ext2_filsys fs);
extern errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan,
ext2_ino_t *ino,
diff --git a/lib/ext2fs/freefs.c b/lib/ext2fs/freefs.c
index ea9742e..9c40c34 100644
--- a/lib/ext2fs/freefs.c
+++ b/lib/ext2fs/freefs.c
@@ -52,7 +52,7 @@ void ext2fs_free(ext2_filsys fs)
ext2fs_free_dblist(fs->dblist);

if (fs->icache)
- ext2fs_free_inode_cache(fs->icache);
+ ext2fs_free_inode_cache(fs);

if (fs->mmp_buf)
ext2fs_free_mem(&fs->mmp_buf);
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index 4310b82..f938c37 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -79,10 +79,13 @@ errcode_t ext2fs_flush_icache(ext2_filsys fs)
/*
* Free the inode cache structure
*/
-void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
+void ext2fs_free_inode_cache(ext2_filsys fs)
{
int i;
+ struct ext2_inode_cache *icache = fs->icache;

+ if (!icache)
+ return;
if (--icache->refcount)
return;
if (icache->buffer)
@@ -92,7 +95,7 @@ void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
if (icache->cache)
ext2fs_free_mem(&icache->cache);
icache->buffer_blk = 0;
- ext2fs_free_mem(&icache);
+ ext2fs_free_mem(&fs->icache);
}

errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
@@ -131,8 +134,7 @@ errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
ext2fs_flush_icache(fs);
return 0;
errout:
- ext2fs_free_inode_cache(fs->icache);
- fs->icache = 0;
+ ext2fs_free_inode_cache(fs);
return retval;
}

--
1.8.2.1



2014-11-11 08:00:36

by Xiaoguang Wang

[permalink] [raw]
Subject: [PATCH 2/4] e2fsprogs/tune2fs: fix memory leak in inode_scan_and_fix()

When we use ext2fs_open_inode_scan() to iterate inodes and finish
jobs, we also need a ext2fs_close_inode_scan(scan) operation, but in
inode_scan_and_fix(), we forgot to call it, fix this error.

Signed-off-by: Xiaoguang Wang <[email protected]>
---
misc/tune2fs.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 7fee870..065b483 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2147,6 +2147,7 @@ static int inode_scan_and_fix(ext2_filsys fs, ext2fs_block_bitmap bmap)

err_out:
ext2fs_free_mem(&block_buf);
+ ext2fs_close_inode_scan(scan);

return retval;
}
--
1.8.2.1


2014-11-11 08:00:37

by Xiaoguang Wang

[permalink] [raw]
Subject: [PATCH 4/4] e2fsprogs/tune2fs: fix memory write overflow

If we apply this patch 'e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size',
we will trigger a segfault, this is because of the inode cache issues.

Firstly we should notice that in expand_inode_table(), we have change the super block's s_inode_size
to new inode size(for example, 256).

Then we re-compute metadata checksums, see below code flow:
|-->rewrite_metadata_checksums
|----->rewrite_inodes
|-------->ext2fs_write_inode_full
In ext2fs_write_inode_full(), if an inode cache is hit, the below code will be executed:
/* Check to see if the inode cache needs to be updated */
if (fs->icache) {
for (i=0; i < fs->icache->cache_size; i++) {
if (fs->icache->cache[i].ino == ino) {
memcpy(fs->icache->cache[i].inode, inode,
(bufsize > length) ? length : bufsize);
break;
}
}
}

Before executing rewrite_inodes(), actually the inode in inode cache is allocated by
old inode size(for example, 128), but here the memcpy will obviously write overflow,
'(bufsize > length) ? length : bufsize' here will return 256(new inode size), so this is
wrong, we need to fix this.
I think we should call ext2fs_free_inode_cache() in expand_inode_table(), to drop the
inode cache, because inode size has changed, if necessary, we will re-create this inode cache.

Steps to reproduce this bug(apply 'e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size' first).
dd if=/dev/zero of=file.img bs=1M count=128
device_name=$(/sbin/losetup -f)
/sbin/losetup -f file.img
mkfs.ext4 -I 128 -O ^flex_bg $device_name
tune2fs -I 256 $device_name

Signed-off-by: Xiaoguang Wang <[email protected]>
---
misc/tune2fs.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 91dc7c1..78b3806 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2244,6 +2244,8 @@ static int expand_inode_table(ext2_filsys fs, unsigned long new_ino_size)

/* Update the meta data */
fs->inode_blocks_per_group = new_ino_blks_per_grp;
+
+ ext2fs_free_inode_cache(fs);
fs->super->s_inode_size = new_ino_size;

err_out:
--
1.8.2.1


2014-11-11 08:00:38

by Xiaoguang Wang

[permalink] [raw]
Subject: [PATCH 3/4] e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size

When we use tune2fs -I new_ino_size to change inode size, if everything is OK,
the corresponding ext4_group_desc.bg_free_blocks_count will be decreased, so
obviously, we need to re-compute the group descriptor checksums, fix this. If
not doing this, mount operation will fail.

Meanwhile, the patch will trigger an existing memory write overflow, which will
casue segfault, please see the next patch.

Signed-off-by: Xiaoguang Wang <[email protected]>
---
misc/tune2fs.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 065b483..91dc7c1 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -2908,8 +2908,7 @@ retry_open:
EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
rewrite_checksums = 1;
}
- if (rewrite_checksums)
- rewrite_metadata_checksums(fs);
+
if (I_flag) {
if (mount_flags & EXT2_MF_MOUNTED) {
fputs(_("The inode size may only be "
@@ -2935,6 +2934,7 @@ retry_open:
if (resize_inode(fs, new_inode_size) == 0) {
printf(_("Setting inode size %lu\n"),
new_inode_size);
+ rewrite_checksums = 1;
} else {
printf("%s", _("Failed to change inode size\n"));
rc = 1;
@@ -2942,6 +2942,9 @@ retry_open:
}
}

+ if (rewrite_checksums)
+ rewrite_metadata_checksums(fs);
+
if (l_flag)
list_super(sb);
if (stride_set) {
--
1.8.2.1


2014-11-11 08:15:02

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 1/4] e2fsprogs/libext2fs: replace ext2fs_free_inode_cache() argument

On Tue, Nov 11, 2014 at 02:59:26PM +0800, Xiaoguang Wang wrote:
> When we call ext2fs_free_inode_cache(fs->icache) to free the inode
> cache, indeed it will not make fs->icache be 0, it just makes the
> local argument icache be 0, after this call, we need another
> 'fs->icache = 0' statement. So here we pass the 'ext2_filsys fs' as
> arguments directly, to make the free inode cache operation more natural.
>
> Signed-off-by: Xiaoguang Wang <[email protected]>
> ---
> lib/ext2fs/ext2fs.h | 2 +-
> lib/ext2fs/freefs.c | 2 +-
> lib/ext2fs/inode.c | 10 ++++++----
> 3 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 506d43b..580440f 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -1403,7 +1403,7 @@ extern errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
> /* inode.c */
> extern errcode_t ext2fs_create_inode_cache(ext2_filsys fs,
> unsigned int cache_size);
> -extern void ext2fs_free_inode_cache(struct ext2_inode_cache *icache);
> +extern void ext2fs_free_inode_cache(ext2_filsys fs);

This change breaks the libext2fs ABI. If you want to change the pointer type
of the parameter, please declare a new function:

extern void ext2fs_free_inode_cache2(ext2_filsys fs);

Ideally you'd change the old function to return an error code, but it returns
void... sigh. I guess client programs are on their own.

Really, free_inode_cache is a poor interface since the ctor doesn't return a
struct ext2_inode_cache * directly, preferring to attach it to fs->icache
instead. Why are the inode cache ctor/dtor exported in ext2fs.h anyway?
Nobody seems to use them. Ted?

</rant>

Otherwise, the patch looks reasonable.

--D

> extern errcode_t ext2fs_flush_icache(ext2_filsys fs);
> extern errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan,
> ext2_ino_t *ino,
> diff --git a/lib/ext2fs/freefs.c b/lib/ext2fs/freefs.c
> index ea9742e..9c40c34 100644
> --- a/lib/ext2fs/freefs.c
> +++ b/lib/ext2fs/freefs.c
> @@ -52,7 +52,7 @@ void ext2fs_free(ext2_filsys fs)
> ext2fs_free_dblist(fs->dblist);
>
> if (fs->icache)
> - ext2fs_free_inode_cache(fs->icache);
> + ext2fs_free_inode_cache(fs);
>
> if (fs->mmp_buf)
> ext2fs_free_mem(&fs->mmp_buf);
> diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
> index 4310b82..f938c37 100644
> --- a/lib/ext2fs/inode.c
> +++ b/lib/ext2fs/inode.c
> @@ -79,10 +79,13 @@ errcode_t ext2fs_flush_icache(ext2_filsys fs)
> /*
> * Free the inode cache structure
> */
> -void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
> +void ext2fs_free_inode_cache(ext2_filsys fs)
> {
> int i;
> + struct ext2_inode_cache *icache = fs->icache;
>
> + if (!icache)
> + return;
> if (--icache->refcount)
> return;
> if (icache->buffer)
> @@ -92,7 +95,7 @@ void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
> if (icache->cache)
> ext2fs_free_mem(&icache->cache);
> icache->buffer_blk = 0;
> - ext2fs_free_mem(&icache);
> + ext2fs_free_mem(&fs->icache);
> }
>
> errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
> @@ -131,8 +134,7 @@ errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
> ext2fs_flush_icache(fs);
> return 0;
> errout:
> - ext2fs_free_inode_cache(fs->icache);
> - fs->icache = 0;
> + ext2fs_free_inode_cache(fs);
> return retval;
> }
>
> --
> 1.8.2.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-11-11 08:33:37

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 3/4] e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size

On Tue, Nov 11, 2014 at 02:59:28PM +0800, Xiaoguang Wang wrote:
> When we use tune2fs -I new_ino_size to change inode size, if everything is OK,
> the corresponding ext4_group_desc.bg_free_blocks_count will be decreased, so
> obviously, we need to re-compute the group descriptor checksums, fix this. If
> not doing this, mount operation will fail.
>
> Meanwhile, the patch will trigger an existing memory write overflow, which will
> casue segfault, please see the next patch.
>
> Signed-off-by: Xiaoguang Wang <[email protected]>
> ---
> misc/tune2fs.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index 065b483..91dc7c1 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -2908,8 +2908,7 @@ retry_open:
> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> rewrite_checksums = 1;
> }
> - if (rewrite_checksums)
> - rewrite_metadata_checksums(fs);
> +
> if (I_flag) {
> if (mount_flags & EXT2_MF_MOUNTED) {
> fputs(_("The inode size may only be "
> @@ -2935,6 +2934,7 @@ retry_open:
> if (resize_inode(fs, new_inode_size) == 0) {
> printf(_("Setting inode size %lu\n"),
> new_inode_size);
> + rewrite_checksums = 1;

rewrite_metadata_checksums() was designed to rewrite checksums on every
metadata object in the filesystem (extents, ACLs, directory blocks), which I
think does more work than necessary for your use case (if I'm reading this
correctly) of simply rewriting the block group descriptor checksums. Would it
suffice to call ext2fs_group_desc_csum_set() when setting bg_free_blocks_count?
I think this could be done in ext2fs_calculate_summary_stats().

--D

> } else {
> printf("%s", _("Failed to change inode size\n"));
> rc = 1;
> @@ -2942,6 +2942,9 @@ retry_open:
> }
> }
>
> + if (rewrite_checksums)
> + rewrite_metadata_checksums(fs);
> +
> if (l_flag)
> list_super(sb);
> if (stride_set) {
> --
> 1.8.2.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-11-11 08:37:22

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 4/4] e2fsprogs/tune2fs: fix memory write overflow

On Tue, Nov 11, 2014 at 02:59:29PM +0800, Xiaoguang Wang wrote:
> If we apply this patch 'e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size',
> we will trigger a segfault, this is because of the inode cache issues.
>
> Firstly we should notice that in expand_inode_table(), we have change the super block's s_inode_size
> to new inode size(for example, 256).
>
> Then we re-compute metadata checksums, see below code flow:
> |-->rewrite_metadata_checksums
> |----->rewrite_inodes
> |-------->ext2fs_write_inode_full
> In ext2fs_write_inode_full(), if an inode cache is hit, the below code will be executed:
> /* Check to see if the inode cache needs to be updated */
> if (fs->icache) {
> for (i=0; i < fs->icache->cache_size; i++) {
> if (fs->icache->cache[i].ino == ino) {
> memcpy(fs->icache->cache[i].inode, inode,
> (bufsize > length) ? length : bufsize);
> break;
> }
> }
> }
>
> Before executing rewrite_inodes(), actually the inode in inode cache is allocated by
> old inode size(for example, 128), but here the memcpy will obviously write overflow,
> '(bufsize > length) ? length : bufsize' here will return 256(new inode size), so this is
> wrong, we need to fix this.
> I think we should call ext2fs_free_inode_cache() in expand_inode_table(), to drop the
> inode cache, because inode size has changed, if necessary, we will re-create this inode cache.
>
> Steps to reproduce this bug(apply 'e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size' first).
> dd if=/dev/zero of=file.img bs=1M count=128
> device_name=$(/sbin/losetup -f)
> /sbin/losetup -f file.img
> mkfs.ext4 -I 128 -O ^flex_bg $device_name
> tune2fs -I 256 $device_name
>
> Signed-off-by: Xiaoguang Wang <[email protected]>
> ---
> misc/tune2fs.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index 91dc7c1..78b3806 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -2244,6 +2244,8 @@ static int expand_inode_table(ext2_filsys fs, unsigned long new_ino_size)
>
> /* Update the meta data */
> fs->inode_blocks_per_group = new_ino_blks_per_grp;
> +
> + ext2fs_free_inode_cache(fs);

Aha, yes, we do need to expose the ctor/dtor APIs for the inode cache!

Patches 2 & 4 look ok to me.

--D

> fs->super->s_inode_size = new_ino_size;
>
> err_out:
> --
> 1.8.2.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-11-11 09:10:58

by Xiaoguang Wang

[permalink] [raw]
Subject: Re: [PATCH 3/4] e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size

Hi,

On 11/11/2014 04:33 PM, Darrick J. Wong wrote:
> On Tue, Nov 11, 2014 at 02:59:28PM +0800, Xiaoguang Wang wrote:
>> When we use tune2fs -I new_ino_size to change inode size, if everything is OK,
>> the corresponding ext4_group_desc.bg_free_blocks_count will be decreased, so
>> obviously, we need to re-compute the group descriptor checksums, fix this. If
>> not doing this, mount operation will fail.
>>
>> Meanwhile, the patch will trigger an existing memory write overflow, which will
>> casue segfault, please see the next patch.
>>
>> Signed-off-by: Xiaoguang Wang <[email protected]>
>> ---
>> misc/tune2fs.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
>> index 065b483..91dc7c1 100644
>> --- a/misc/tune2fs.c
>> +++ b/misc/tune2fs.c
>> @@ -2908,8 +2908,7 @@ retry_open:
>> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
>> rewrite_checksums = 1;
>> }
>> - if (rewrite_checksums)
>> - rewrite_metadata_checksums(fs);
>> +
>> if (I_flag) {
>> if (mount_flags & EXT2_MF_MOUNTED) {
>> fputs(_("The inode size may only be "
>> @@ -2935,6 +2934,7 @@ retry_open:
>> if (resize_inode(fs, new_inode_size) == 0) {
>> printf(_("Setting inode size %lu\n"),
>> new_inode_size);
>> + rewrite_checksums = 1;
>
> rewrite_metadata_checksums() was designed to rewrite checksums on every
> metadata object in the filesystem (extents, ACLs, directory blocks), which I
> think does more work than necessary for your use case (if I'm reading this
> correctly) of simply rewriting the block group descriptor checksums. Would it
> suffice to call ext2fs_group_desc_csum_set() when setting bg_free_blocks_count?
> I think this could be done in ext2fs_calculate_summary_stats().

For a new created file system, I think ext2fs_group_desc_csum_set() would be sufficient, because
the needed blocks(used for new inode table, they must be continuous) will be free. But for a file
system, if it already has many directories and files, these blocks may be in use, then we need to
replace these blocks with other blocks, so the ACLs and extent tree may also be modified, see
inode_scan_and_fix(), so I think here we need a rewrite_metadata_checksums(), in other words, it's
the safest :) I'm still new to ext4(learning the code now), so I may also miss something, thanks!

Regards,
Xiaoguang Wang



>
> --D
>
>> } else {
>> printf("%s", _("Failed to change inode size\n"));
>> rc = 1;
>> @@ -2942,6 +2942,9 @@ retry_open:
>> }
>> }
>>
>> + if (rewrite_checksums)
>> + rewrite_metadata_checksums(fs);
>> +
>> if (l_flag)
>> list_super(sb);
>> if (stride_set) {
>> --
>> 1.8.2.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> .
>


2014-11-11 09:51:34

by Xiaoguang Wang

[permalink] [raw]
Subject: Re: [PATCH 1/4] e2fsprogs/libext2fs: replace ext2fs_free_inode_cache() argument

Hi,

On 11/11/2014 04:14 PM, Darrick J. Wong wrote:
> On Tue, Nov 11, 2014 at 02:59:26PM +0800, Xiaoguang Wang wrote:
>> When we call ext2fs_free_inode_cache(fs->icache) to free the inode
>> cache, indeed it will not make fs->icache be 0, it just makes the
>> local argument icache be 0, after this call, we need another
>> 'fs->icache = 0' statement. So here we pass the 'ext2_filsys fs' as
>> arguments directly, to make the free inode cache operation more natural.
>>
>> Signed-off-by: Xiaoguang Wang <[email protected]>
>> ---
>> lib/ext2fs/ext2fs.h | 2 +-
>> lib/ext2fs/freefs.c | 2 +-
>> lib/ext2fs/inode.c | 10 ++++++----
>> 3 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
>> index 506d43b..580440f 100644
>> --- a/lib/ext2fs/ext2fs.h
>> +++ b/lib/ext2fs/ext2fs.h
>> @@ -1403,7 +1403,7 @@ extern errcode_t ext2fs_inline_data_set(ext2_filsys fs, ext2_ino_t ino,
>> /* inode.c */
>> extern errcode_t ext2fs_create_inode_cache(ext2_filsys fs,
>> unsigned int cache_size);
>> -extern void ext2fs_free_inode_cache(struct ext2_inode_cache *icache);
>> +extern void ext2fs_free_inode_cache(ext2_filsys fs);
>
> This change breaks the libext2fs ABI. If you want to change the pointer type
> of the parameter, please declare a new function:
>
> extern void ext2fs_free_inode_cache2(ext2_filsys fs);

Oh, I see, thanks!
I'll send a v2 version later :)

Regards,
Xiaoguang Wang

>
> Ideally you'd change the old function to return an error code, but it returns
> void... sigh. I guess client programs are on their own.
>
> Really, free_inode_cache is a poor interface since the ctor doesn't return a
> struct ext2_inode_cache * directly, preferring to attach it to fs->icache
> instead. Why are the inode cache ctor/dtor exported in ext2fs.h anyway?
> Nobody seems to use them. Ted?
>
> </rant>
>
> Otherwise, the patch looks reasonable.
>
> --D
>
>> extern errcode_t ext2fs_flush_icache(ext2_filsys fs);
>> extern errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan,
>> ext2_ino_t *ino,
>> diff --git a/lib/ext2fs/freefs.c b/lib/ext2fs/freefs.c
>> index ea9742e..9c40c34 100644
>> --- a/lib/ext2fs/freefs.c
>> +++ b/lib/ext2fs/freefs.c
>> @@ -52,7 +52,7 @@ void ext2fs_free(ext2_filsys fs)
>> ext2fs_free_dblist(fs->dblist);
>>
>> if (fs->icache)
>> - ext2fs_free_inode_cache(fs->icache);
>> + ext2fs_free_inode_cache(fs);
>>
>> if (fs->mmp_buf)
>> ext2fs_free_mem(&fs->mmp_buf);
>> diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
>> index 4310b82..f938c37 100644
>> --- a/lib/ext2fs/inode.c
>> +++ b/lib/ext2fs/inode.c
>> @@ -79,10 +79,13 @@ errcode_t ext2fs_flush_icache(ext2_filsys fs)
>> /*
>> * Free the inode cache structure
>> */
>> -void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
>> +void ext2fs_free_inode_cache(ext2_filsys fs)
>> {
>> int i;
>> + struct ext2_inode_cache *icache = fs->icache;
>>
>> + if (!icache)
>> + return;
>> if (--icache->refcount)
>> return;
>> if (icache->buffer)
>> @@ -92,7 +95,7 @@ void ext2fs_free_inode_cache(struct ext2_inode_cache *icache)
>> if (icache->cache)
>> ext2fs_free_mem(&icache->cache);
>> icache->buffer_blk = 0;
>> - ext2fs_free_mem(&icache);
>> + ext2fs_free_mem(&fs->icache);
>> }
>>
>> errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
>> @@ -131,8 +134,7 @@ errcode_t ext2fs_create_inode_cache(ext2_filsys fs, unsigned int cache_size)
>> ext2fs_flush_icache(fs);
>> return 0;
>> errout:
>> - ext2fs_free_inode_cache(fs->icache);
>> - fs->icache = 0;
>> + ext2fs_free_inode_cache(fs);
>> return retval;
>> }
>>
>> --
>> 1.8.2.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> .
>


2014-11-11 16:19:12

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/4] e2fsprogs/libext2fs: replace ext2fs_free_inode_cache() argument

On Tue, Nov 11, 2014 at 12:14:51AM -0800, Darrick J. Wong wrote:
> Ideally you'd change the old function to return an error code, but it returns
> void... sigh. I guess client programs are on their own.
>
> Really, free_inode_cache is a poor interface since the ctor doesn't return a
> struct ext2_inode_cache * directly, preferring to attach it to fs->icache
> instead. Why are the inode cache ctor/dtor exported in ext2fs.h anyway?
> Nobody seems to use them. Ted?

Looking at the history, the change happened in commit 603e5ebc, which
changed the internal representation of the ext2_inode_cache. At that
point, I moved the inode cache freeing function from
lib/ext2fs/freefs.c to lib/ext2fs/inode.c, on the theory that it made
more sense to keep all of the code that handled the representation of
the inode cache was kept in the same place.

I agree I should have declared the function in ext2fsP.h instead of
ext2fs.h; the interface was not great, but that was probably because
it was originally intended as an internal interface, and I didn't
bother to fix it up before moving it function to another .c file.

At this point, we have a couple of choices.

(1) define a new interface with a new name (ext2fs_free_inode_cache2)

(2) move the function back to freefs.c and make it be static, so that
the function signature disappears from the shared library.

(3) ignore the problem because it's highly unlikely anyone outside of
libext2fs will need to use it, and improving the API/ABI doesn't
really matter all that much. Instead, just change the callers to
clear fs->icache after calling ext2fs_free_inode_cache().

My preference is (2) or (3).

Cheers,

- Ted

2014-11-12 09:41:01

by Xiaoguang Wang

[permalink] [raw]
Subject: Re: [PATCH 1/4] e2fsprogs/libext2fs: replace ext2fs_free_inode_cache() argument

Hi,

On 11/12/2014 12:12 AM, Theodore Ts'o wrote:
> On Tue, Nov 11, 2014 at 12:14:51AM -0800, Darrick J. Wong wrote:
>> Ideally you'd change the old function to return an error code, but it returns
>> void... sigh. I guess client programs are on their own.
>>
>> Really, free_inode_cache is a poor interface since the ctor doesn't return a
>> struct ext2_inode_cache * directly, preferring to attach it to fs->icache
>> instead. Why are the inode cache ctor/dtor exported in ext2fs.h anyway?
>> Nobody seems to use them. Ted?
>
> Looking at the history, the change happened in commit 603e5ebc, which
> changed the internal representation of the ext2_inode_cache. At that
> point, I moved the inode cache freeing function from
> lib/ext2fs/freefs.c to lib/ext2fs/inode.c, on the theory that it made
> more sense to keep all of the code that handled the representation of
> the inode cache was kept in the same place.
>
> I agree I should have declared the function in ext2fsP.h instead of
> ext2fs.h; the interface was not great, but that was probably because
> it was originally intended as an internal interface, and I didn't
> bother to fix it up before moving it function to another .c file.
>
> At this point, we have a couple of choices.
>
> (1) define a new interface with a new name (ext2fs_free_inode_cache2)
>
> (2) move the function back to freefs.c and make it be static, so that
> the function signature disappears from the shared library.
>
> (3) ignore the problem because it's highly unlikely anyone outside of
> libext2fs will need to use it, and improving the API/ABI doesn't
> really matter all that much. Instead, just change the callers to
> clear fs->icache after calling ext2fs_free_inode_cache().
>
> My preference is (2) or (3).

OK, I'll choose (3) :) thanks!
Version 2 will be sent soon.

Regards,
Xiaoguang Wang
>
> Cheers,
>
> - Ted
> .
>


2014-11-12 21:30:54

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 3/4] e2fsprogs/tune2fs: rewrite metadata checksums when resizing inode size

On Tue, Nov 11, 2014 at 05:08:42PM +0800, Xiaoguang Wang wrote:
> Hi,
>
> On 11/11/2014 04:33 PM, Darrick J. Wong wrote:
> > On Tue, Nov 11, 2014 at 02:59:28PM +0800, Xiaoguang Wang wrote:
> >> When we use tune2fs -I new_ino_size to change inode size, if everything is OK,
> >> the corresponding ext4_group_desc.bg_free_blocks_count will be decreased, so
> >> obviously, we need to re-compute the group descriptor checksums, fix this. If
> >> not doing this, mount operation will fail.
> >>
> >> Meanwhile, the patch will trigger an existing memory write overflow, which will
> >> casue segfault, please see the next patch.
> >>
> >> Signed-off-by: Xiaoguang Wang <[email protected]>
> >> ---
> >> misc/tune2fs.c | 7 +++++--
> >> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> >> index 065b483..91dc7c1 100644
> >> --- a/misc/tune2fs.c
> >> +++ b/misc/tune2fs.c
> >> @@ -2908,8 +2908,7 @@ retry_open:
> >> EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> >> rewrite_checksums = 1;
> >> }
> >> - if (rewrite_checksums)
> >> - rewrite_metadata_checksums(fs);
> >> +
> >> if (I_flag) {
> >> if (mount_flags & EXT2_MF_MOUNTED) {
> >> fputs(_("The inode size may only be "
> >> @@ -2935,6 +2934,7 @@ retry_open:
> >> if (resize_inode(fs, new_inode_size) == 0) {
> >> printf(_("Setting inode size %lu\n"),
> >> new_inode_size);
> >> + rewrite_checksums = 1;
> >
> > rewrite_metadata_checksums() was designed to rewrite checksums on every
> > metadata object in the filesystem (extents, ACLs, directory blocks), which
> > I think does more work than necessary for your use case (if I'm reading
> > this correctly) of simply rewriting the block group descriptor checksums.
> > Would it suffice to call ext2fs_group_desc_csum_set() when setting
> > bg_free_blocks_count? I think this could be done in
> > ext2fs_calculate_summary_stats().
>
> For a new created file system, I think ext2fs_group_desc_csum_set() would be
> sufficient, because the needed blocks(used for new inode table, they must be
> continuous) will be free. But for a file system, if it already has many
> directories and files, these blocks may be in use, then we need to replace
> these blocks with other blocks, so the ACLs and extent tree may also be
> modified, see inode_scan_and_fix(), so I think here we need a
> rewrite_metadata_checksums(), in other words, it's the safest :) I'm still
> new to ext4(learning the code now), so I may also miss something, thanks!

The checksums for extent blocks, ACLs, and directories don't care about the LBA
of the block, which means that they can be moved around the FS without having
to rewrite the checksum. I still think you can get away with just:

for (dgrp_t i = 0; i < fs->group_desc_count; i++)
ext2fs_group_desc_csum_set(fs, i);

to fix your problem. That said, do you have a test case you could send me?
I'm having trouble reproducing the symptoms on the -next branch. 1.42.12 can
reproduce it reliably... but the funny thing is that 1.42.* doesn't know about
metadata_csum at all.

--D

>
> Regards,
> Xiaoguang Wang
>
>
>
> >
> > --D
> >
> >> } else {
> >> printf("%s", _("Failed to change inode size\n"));
> >> rc = 1;
> >> @@ -2942,6 +2942,9 @@ retry_open:
> >> }
> >> }
> >>
> >> + if (rewrite_checksums)
> >> + rewrite_metadata_checksums(fs);
> >> +
> >> if (l_flag)
> >> list_super(sb);
> >> if (stride_set) {
> >> --
> >> 1.8.2.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > .
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html