2009-09-02 03:18:56

by Akira Fujita

[permalink] [raw]
Subject: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

ext4: Return exchanged blocks count to user space in failure

From: Akira Fujita <[email protected]>

Return exchanged blocks count (moved_len) to user space,
if ext4_move_extents() failed on the way.

Signed-off-by: Akira Fujita <[email protected]>
---
fs/ext4/ioctl.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index b0b434b..efd11c8 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -255,10 +255,9 @@ setversion_out:
me.donor_start, me.len, &me.moved_len);
fput(donor_filp);

- if (!err)
- if (copy_to_user((struct move_extent *)arg,
- &me, sizeof(me)))
- return -EFAULT;
+ if (copy_to_user((struct move_extent *)arg, &me, sizeof(me)))
+ return -EFAULT;
+
return err;
}


2009-09-02 15:54:19

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Akira,

Akira Fujita wrote:
> ext4: Return exchanged blocks count to user space in failure
>
> From: Akira Fujita <[email protected]>
>
> Return exchanged blocks count (moved_len) to user space,
> if ext4_move_extents() failed on the way.
Even with the patch, I still don't see how users can fix EXT4_IOC_MOVE_EXT failures,
because the orig file itself may be broken.
>
> Signed-off-by: Akira Fujita <[email protected]>
> ---
> fs/ext4/ioctl.c | 7 +++----
> 1 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index b0b434b..efd11c8 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -255,10 +255,9 @@ setversion_out:
> me.donor_start, me.len, &me.moved_len);
> fput(donor_filp);
>
> - if (!err)
> - if (copy_to_user((struct move_extent *)arg,
> - &me, sizeof(me)))
> - return -EFAULT;
> + if (copy_to_user((struct move_extent *)arg, &me, sizeof(me)))
> + return -EFAULT;
> +
> return err;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


--
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-02 20:59:17

by Greg Freemyer

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

On Wed, Sep 2, 2009 at 11:54 AM, Peng Tao<[email protected]> wrote:
> Hi, Akira,
>
> Akira Fujita wrote:
>> ext4: Return exchanged blocks count to user space in failure
>>
>> From: Akira Fujita <[email protected]>
>>
>> Return exchanged blocks count (moved_len) to user space,
>> if ext4_move_extents() failed on the way.
> Even with the patch, I still don't see how users can fix EXT4_IOC_MOVE_EXT failures,
> because the orig file itself may be broken.

Peng,

I have not looked at the code very closely, but can you tell me where
a file corruption can take place? Not completing the replacement of
extents with donor extents is one thing. Corrupting the original file
contents is another.

Clearly we need EXT4_IOC_MOVE_EXT to fail gracefully and not corrupt
the original file the vast majority of the time.

Greg

2009-09-03 05:13:01

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Greg,

On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
> Peng,
>
> I have not looked at the code very closely, but can you tell me where
> a file corruption can take place?   Not completing the replacement of
> extents with donor extents is one thing.  Corrupting the original file
> contents is another.
The file corruption is mainly because of the half done replacement.

My test case is here:
http://marc.info/?l=linux-ext4&m=124992522305319&w=2

With Akira's previous patch
(http://marc.info/?l=linux-ext4&m=124937430627867&w=2),
EXT4_IOC_MOVE_EXT does not panic the kernel any more. But it leaves
the orig file's extent tree corrupted.

>
> Clearly we need EXT4_IOC_MOVE_EXT to fail gracefully and not corrupt
> the original file the vast majority of the time.
>
> Greg
>



--
Cheers,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-03 13:47:59

by Greg Freemyer

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

On Thu, Sep 3, 2009 at 1:13 AM, Peng Tao<[email protected]> wrote:
> Hi, Greg,
>
> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>> Peng,
>>
>> I have not looked at the code very closely, but can you tell me where
>> a file corruption can take place? ? Not completing the replacement of
>> extents with donor extents is one thing. ?Corrupting the original file
>> contents is another.
> The file corruption is mainly because of the half done replacement.
>
> My test case is here:
> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>
> With Akira's previous patch
> (http://marc.info/?l=linux-ext4&m=124937430627867&w=2),
> EXT4_IOC_MOVE_EXT does not panic the kernel any more. But it leaves
> the orig file's extent tree corrupted.

Is this highly repeatable, e4defrag using EXT4_IOC_MOVE_EXT corrupts
sparse files?

If so, it seems like a pretty major bug that will be exposed to
userspace when 2.6.31 goes final.

It seems to me at a minimum a Kconfig option should be added to enable
the ioctl to userspace and that it should have depends on EXPERIMENTAL
and default to NO for now.

We don't want people thinking that this feature is stable in 2.6.31.

Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

2009-09-04 03:35:54

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

On Thu, Sep 3, 2009 at 9:48 PM, Greg Freemyer<[email protected]> wrote:
> On Thu, Sep 3, 2009 at 1:13 AM, Peng Tao<[email protected]> wrote:
>> Hi, Greg,
>>
>> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>>> Peng,
>>>
>>> I have not looked at the code very closely, but can you tell me where
>>> a file corruption can take place?   Not completing the replacement of
>>> extents with donor extents is one thing.  Corrupting the original file
>>> contents is another.
>> The file corruption is mainly because of the half done replacement.
>>
>> My test case is here:
>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>>
>> With Akira's previous patch
>> (http://marc.info/?l=linux-ext4&m=124937430627867&w=2),
>> EXT4_IOC_MOVE_EXT does not panic the kernel any more. But it leaves
>> the orig file's extent tree corrupted.
>
> Is this highly repeatable, e4defrag using EXT4_IOC_MOVE_EXT corrupts
> sparse files?
It is the EXT4_IOC_MOVE_EXT ioctl corrupts sparse files because it
doesn't handle well with file holes.
The e4defrag program loops to call EXT4_IOC_MOVE_EXT for each
non-continuous parts of a file, so it doesn't expose file holes to
kernel.
But since the ioctl interface is publicly usable, I do suggest making
it behave better, at lease don't bug the kernel, because I did hit
null pointer reference in my last time testing the new ioctl. So IMHO,
Akira's previous patch (http://patchwork.ozlabs.org/patch/30707/) is
really necessary.

>
> If so, it seems like a pretty major bug that will be exposed to
> userspace when 2.6.31 goes final.
>
> It seems to me at a minimum a Kconfig option should be added to enable
> the ioctl to userspace and that it should have depends on EXPERIMENTAL
> and default to NO for now.
>
> We don't want people thinking that this feature is stable in 2.6.31.
I do agree that the feature is still unstable ATM.

>
> Greg
> --
> Greg Freemyer
> Head of EDD Tape Extraction and Processing team
> Litigation Triage Solutions Specialist
> http://www.linkedin.com/in/gregfreemyer
> Preservation and Forensic processing of Exchange Repositories White Paper -
> <http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>
>
> The Norcross Group
> The Intersection of Evidence & Technology
> http://www.norcrossgroup.com
>



--
Cheers,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-04 09:56:00

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi Peng,
Peng Tao wrote:
> Hi, Greg,
>
> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>> Peng,
>>
>> I have not looked at the code very closely, but can you tell me where
>> a file corruption can take place? Not completing the replacement of
>> extents with donor extents is one thing. Corrupting the original file
>> contents is another.
> The file corruption is mainly because of the half done replacement.
>
> My test case is here:
> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>

This patch solves your test case problem.

>$dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>$dd if=../609xp.img of=first.img bs=10M count=1
>$dd if=/dev/zero of=first.img bs=10M count=0 seek=50
>$dd if=../609xp.img of=last.img bs=10M count=1 seek=49
>$dd if=../609xp.img of=middle.img bs=10M count=1 seek=24
>$dd if=/dev/zero of=middle.img bs=10M count=0 seek=50


This problem is caused by the fact that logical offset of
orig file is different from donor file's.
To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
add checks to mext_calc_swap_extents() and handles it as error,
since data exchange must be done between the same blocks.

Note: This problem does not happen in ext4 online defrag
(means with e4defrag command), because the donor file
which is created by e4defrag in user space is
file constitution same as orig file.

And add the extent null check to ext_get_path() for
followings test case.
>$dd if=/dev/zero of=zero.img bs=10M count=0 seek=50

More test cases are needed for EXT4_IOC_MOVE_EXT,
so this patch may not be complete,
but the problem you reported is fixed at least.
I am now testing EXT4_IOC_MOVE_EXT hard.

BTW, I'm now looking into the empty extent issue which
you reported, so I will release the patch soon.
http://marc.info/?l=linux-ext4&m=124975192830024&w=2

Could you do your test case again with this patch?

# Before you apply this patch,
# apply following patch which I sent before:
http://marc.info/?l=linux-ext4&m=125186152727422&w=2

Regards,
Akira Fujita

---
fs/ext4/move_extent.c | 56 ++++++++++++++++++++++++++++++++----------------
1 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 0d10f8b..052acd9 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -25,6 +25,8 @@
if (IS_ERR(path)) { \
ret = PTR_ERR(path); \
path = NULL; \
+ } else if (path[ext_depth(inode)].p_ext == NULL) { \
+ ret = -ENODATA; \
} \
} while (0)

@@ -284,7 +286,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,

if (new_flag) {
get_ext_path(orig_path, orig_inode, eblock, err);
- if (orig_path == NULL)
+ if (err)
goto out;

if (ext4_ext_insert_extent(handle, orig_inode,
@@ -295,7 +297,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
if (end_flag) {
get_ext_path(orig_path, orig_inode,
le32_to_cpu(end_ext->ee_block) - 1, err);
- if (orig_path == NULL)
+ if (err)
goto out;

if (ext4_ext_insert_extent(handle, orig_inode,
@@ -554,8 +556,10 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
* @orig_off: block offset of original inode
* @donor_off: block offset of donor inode
* @max_count: the maximun length of extents
+ *
+ * Return 0 on success, or a negative error value on failure.
*/
-static void
+static int
mext_calc_swap_extents(struct ext4_extent *tmp_dext,
struct ext4_extent *tmp_oext,
ext4_lblk_t orig_off, ext4_lblk_t donor_off,
@@ -564,6 +568,19 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
ext4_lblk_t diff, orig_diff;
struct ext4_extent dext_old, oext_old;

+ BUG_ON(orig_off != donor_off);
+
+ /* original and donor extents have to cover the same block offset */
+ if (orig_off < le32_to_cpu(tmp_oext->ee_block) ||
+ le32_to_cpu(tmp_oext->ee_block) +
+ ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off)
+ return -ENODATA;
+
+ if (orig_off < le32_to_cpu(tmp_dext->ee_block) ||
+ le32_to_cpu(tmp_dext->ee_block) +
+ ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off)
+ return -ENODATA;
+
dext_old = *tmp_dext;
oext_old = *tmp_oext;

@@ -591,6 +608,8 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,

copy_extent_status(&oext_old, tmp_dext);
copy_extent_status(&dext_old, tmp_oext);
+
+ return 0;
}

/**
@@ -632,12 +651,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,

/* Get the original extent for the block "orig_off" */
get_ext_path(orig_path, orig_inode, orig_off, err);
- if (orig_path == NULL)
+ if (err)
goto out;

/* Get the donor extent for the head */
get_ext_path(donor_path, donor_inode, donor_off, err);
- if (donor_path == NULL)
+ if (err)
goto out;
depth = ext_depth(orig_inode);
oext = orig_path[depth].p_ext;
@@ -647,8 +666,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
dext = donor_path[depth].p_ext;
tmp_dext = *dext;

- mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
+ err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
donor_off, count);
+ if (err)
+ goto out;

/* Loop for the donor extents */
while (1) {
@@ -679,7 +700,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
if (orig_path)
ext4_ext_drop_refs(orig_path);
get_ext_path(orig_path, orig_inode, orig_off, err);
- if (orig_path == NULL)
+ if (err)
goto out;
depth = ext_depth(orig_inode);
oext = orig_path[depth].p_ext;
@@ -694,7 +715,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
ext4_ext_drop_refs(donor_path);
get_ext_path(donor_path, donor_inode,
donor_off, err);
- if (donor_path == NULL)
+ if (err)
goto out;
depth = ext_depth(donor_inode);
dext = donor_path[depth].p_ext;
@@ -705,9 +726,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
}
tmp_dext = *dext;

- mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
- donor_off,
- count - replaced_count);
+ err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
+ donor_off, count - replaced_count);
+ if (err)
+ goto out;
}

out:
@@ -1147,20 +1169,16 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
len -= block_end - file_end;

get_ext_path(orig_path, orig_inode, block_start, ret);
- if (orig_path == NULL)
+ if (ret)
goto out2;

/* Get path structure to check the hole */
get_ext_path(holecheck_path, orig_inode, block_start, ret);
- if (holecheck_path == NULL)
+ if (ret)
goto out;

depth = ext_depth(orig_inode);
ext_cur = holecheck_path[depth].p_ext;
- if (ext_cur == NULL) {
- ret = -EINVAL;
- goto out;
- }

/*
* Get proper extent whose ee_block is beyond block_start
@@ -1283,7 +1301,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
ext4_ext_drop_refs(holecheck_path);
get_ext_path(holecheck_path, orig_inode,
seq_start, ret);
- if (holecheck_path == NULL)
+ if (ret)
break;
depth = holecheck_path->p_depth;

@@ -1291,7 +1309,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
if (orig_path)
ext4_ext_drop_refs(orig_path);
get_ext_path(orig_path, orig_inode, seq_start, ret);
- if (orig_path == NULL)
+ if (ret)
break;

ext_cur = holecheck_path[depth].p_ext;

2009-09-04 16:43:35

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Akira,

Akira Fujita wrote:
> Hi Peng,
> Peng Tao wrote:
>> Hi, Greg,
>>
>> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>>> Peng,
>>>
>>> I have not looked at the code very closely, but can you tell me where
>>> a file corruption can take place? Not completing the replacement of
>>> extents with donor extents is one thing. Corrupting the original file
>>> contents is another.
>> The file corruption is mainly because of the half done replacement.
>>
>> My test case is here:
>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>>
>
> This patch solves your test case problem.
>
>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>> $dd if=../609xp.img of=first.img bs=10M count=1
>> $dd if=/dev/zero of=first.img bs=10M count=0 seek=50
>> $dd if=../609xp.img of=last.img bs=10M count=1 seek=49
>> $dd if=../609xp.img of=middle.img bs=10M count=1 seek=24
>> $dd if=/dev/zero of=middle.img bs=10M count=0 seek=50
>
>
> This problem is caused by the fact that logical offset of
> orig file is different from donor file's.
> To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
> add checks to mext_calc_swap_extents() and handles it as error,
> since data exchange must be done between the same blocks.
>
> Note: This problem does not happen in ext4 online defrag
> (means with e4defrag command), because the donor file
> which is created by e4defrag in user space is
> file constitution same as orig file.
>
> And add the extent null check to ext_get_path() for
> followings test case.
>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>
> More test cases are needed for EXT4_IOC_MOVE_EXT,
> so this patch may not be complete,
> but the problem you reported is fixed at least.
> I am now testing EXT4_IOC_MOVE_EXT hard.
>
> BTW, I'm now looking into the empty extent issue which
> you reported, so I will release the patch soon.
> http://marc.info/?l=linux-ext4&m=124975192830024&w=2
>
> Could you do your test case again with this patch?
After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:

Sep 4 23:21:05 bergwolf -- MARK --
[ 3183.602852] Modules linked in: ext4 ppdev lp parport binfmt_misc i915 kvm_intel kvm uinput ipv6 cpufreq_userspace cpufreq_conservative cpufreq_powersave jbd2 crc16 fuse dm_snapshot dm_mirror dm_region_hash dm_log dm_mod zlib_deflate crc32c acpi_cpufreq sbp2 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss pcmcia rtc_cmos snd_pcm rtc_core i2c_i801 rtc_lib snd_timer snd_page_alloc psmouse yenta_socket rsrc_nonstatic thinkpad_acpi pcmcia_core serio_raw evdev uhci_hcd firewire_ohci firewire_core crc_itu_t video output ehci_hcd e1000e usbcore [last unloaded: ext4]
[ 3183.602951]
[ 3183.602958] Pid: 6937, comm: a.out Not tainted (2.6.31-rc2-drm-intel-next #2) 7676A26
[ 3183.602965] EIP: 0060:[<c01cfa26>] EFLAGS: 00210287 CPU: 1
[ 3183.602977] EIP is at journal_start+0x39/0xb9
[ 3183.602982] EAX: f61a2a80 EBX: f26f048c ECX: f6995200 EDX: f6995000
[ 3183.602988] ESI: f26f048c EDI: f6f59c88 EBP: f1a77c90 ESP: f1a77c7c
[ 3183.602994] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 3183.603010] 00000002 f6995000 f6f59c88 f26f048c f6f59c88 f1a77c98 c01c9cc6 f1a77cac
[ 3183.603024] <0> c01c0b51 f6f59c88 00000001 f6995200 f1a77cc0 c0192799 004cba00 f6f59c88
[ 3183.603039] <0> f68b3840 f1a77cd4 c018ab14 004cba00 00000000 ff7fc000 f1a77d44 c015c7ec
[ 3183.603070] [<c01c9cc6>] ? ext3_journal_start_sb+0x40/0x42
[ 3183.603076] [<c01c0b51>] ? ext3_dirty_inode+0x24/0x67
[ 3183.603087] [<c0192799>] ? __mark_inode_dirty+0x23/0xc6
[ 3183.603097] [<c018ab14>] ? file_update_time+0x7a/0xa3
[ 3183.603108] [<c015c7ec>] ? __generic_file_aio_write_nolock+0x2d6/0x3fe
[ 3183.603151] [<fa29d3b4>] ? ext4_ext_find_extent+0x3f/0x230 [ext4]
[ 3183.603161] [<c015d0e3>] ? generic_file_aio_write+0x57/0xb4
[ 3183.603200] [<fa2a6c26>] ? mext_replace_branches+0x31f/0x329 [ext4]
[ 3183.603209] [<c01bef56>] ? ext3_file_write+0x1a/0x88
[ 3183.603219] [<c017b6e2>] ? do_sync_write+0xab/0xe9
[ 3183.603229] [<c0137403>] ? autoremove_wake_function+0x0/0x33
[ 3183.603239] [<c013dbda>] ? getnstimeofday+0x52/0xda
[ 3183.603249] [<c014d027>] ? do_acct_process+0x68d/0x6b2
[ 3183.603257] [<c015b46c>] ? find_get_page+0x1d/0x81
[ 3183.603268] [<c018df9f>] ? mntput_no_expire+0x19/0xb3
[ 3183.603276] [<c017c9c7>] ? __fput+0x17c/0x184
[ 3183.603286] [<c014d09f>] ? acct_process+0x53/0x66
[ 3183.603294] [<c012a318>] ? do_exit+0x174/0x573
[ 3183.603303] [<c012a778>] ? do_group_exit+0x61/0x88
[ 3183.603311] [<c012a7b2>] ? sys_exit_group+0x13/0x17
[ 3183.603320] [<c0102994>] ? sysenter_do_call+0x12/0x28
[ 3183.603419] ---[ end trace cba419e95b73d96f ]---

I'm not sure why ext3 journal is involved. I've run the case twice and both
turned out with the same trace messages.

>
> # Before you apply this patch,
> # apply following patch which I sent before:
> http://marc.info/?l=linux-ext4&m=125186152727422&w=2
>
> Regards,
> Akira Fujita
>
> ---
> fs/ext4/move_extent.c | 56 ++++++++++++++++++++++++++++++++----------------
> 1 files changed, 37 insertions(+), 19 deletions(-)
>
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 0d10f8b..052acd9 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -25,6 +25,8 @@
> if (IS_ERR(path)) { \
> ret = PTR_ERR(path); \
> path = NULL; \
> + } else if (path[ext_depth(inode)].p_ext == NULL) { \
> + ret = -ENODATA; \
> } \
> } while (0)
>
> @@ -284,7 +286,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
>
> if (new_flag) {
> get_ext_path(orig_path, orig_inode, eblock, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -295,7 +297,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
> if (end_flag) {
> get_ext_path(orig_path, orig_inode,
> le32_to_cpu(end_ext->ee_block) - 1, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -554,8 +556,10 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
> * @orig_off: block offset of original inode
> * @donor_off: block offset of donor inode
> * @max_count: the maximun length of extents
> + *
> + * Return 0 on success, or a negative error value on failure.
> */
> -static void
> +static int
> mext_calc_swap_extents(struct ext4_extent *tmp_dext,
> struct ext4_extent *tmp_oext,
> ext4_lblk_t orig_off, ext4_lblk_t donor_off,
> @@ -564,6 +568,19 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
> ext4_lblk_t diff, orig_diff;
> struct ext4_extent dext_old, oext_old;
>
> + BUG_ON(orig_off != donor_off);
> +
> + /* original and donor extents have to cover the same block offset */
> + if (orig_off < le32_to_cpu(tmp_oext->ee_block) ||
> + le32_to_cpu(tmp_oext->ee_block) +
> + ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off)
> + return -ENODATA;
> +
> + if (orig_off < le32_to_cpu(tmp_dext->ee_block) ||
> + le32_to_cpu(tmp_dext->ee_block) +
> + ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off)
> + return -ENODATA;
> +
> dext_old = *tmp_dext;
> oext_old = *tmp_oext;
>
> @@ -591,6 +608,8 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
>
> copy_extent_status(&oext_old, tmp_dext);
> copy_extent_status(&dext_old, tmp_oext);
> +
> + return 0;
> }
>
> /**
> @@ -632,12 +651,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>
> /* Get the original extent for the block "orig_off" */
> get_ext_path(orig_path, orig_inode, orig_off, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> /* Get the donor extent for the head */
> get_ext_path(donor_path, donor_inode, donor_off, err);
> - if (donor_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(orig_inode);
> oext = orig_path[depth].p_ext;
> @@ -647,8 +666,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> dext = donor_path[depth].p_ext;
> tmp_dext = *dext;
>
> - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> donor_off, count);
> + if (err)
> + goto out;
>
> /* Loop for the donor extents */
> while (1) {
> @@ -679,7 +700,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> if (orig_path)
> ext4_ext_drop_refs(orig_path);
> get_ext_path(orig_path, orig_inode, orig_off, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(orig_inode);
> oext = orig_path[depth].p_ext;
> @@ -694,7 +715,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> ext4_ext_drop_refs(donor_path);
> get_ext_path(donor_path, donor_inode,
> donor_off, err);
> - if (donor_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(donor_inode);
> dext = donor_path[depth].p_ext;
> @@ -705,9 +726,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> }
> tmp_dext = *dext;
>
> - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> - donor_off,
> - count - replaced_count);
> + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> + donor_off, count - replaced_count);
> + if (err)
> + goto out;
> }
>
> out:
> @@ -1147,20 +1169,16 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> len -= block_end - file_end;
>
> get_ext_path(orig_path, orig_inode, block_start, ret);
> - if (orig_path == NULL)
> + if (ret)
> goto out2;
>
> /* Get path structure to check the hole */
> get_ext_path(holecheck_path, orig_inode, block_start, ret);
> - if (holecheck_path == NULL)
> + if (ret)
> goto out;
>
> depth = ext_depth(orig_inode);
> ext_cur = holecheck_path[depth].p_ext;
> - if (ext_cur == NULL) {
> - ret = -EINVAL;
> - goto out;
> - }
>
> /*
> * Get proper extent whose ee_block is beyond block_start
> @@ -1283,7 +1301,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> ext4_ext_drop_refs(holecheck_path);
> get_ext_path(holecheck_path, orig_inode,
> seq_start, ret);
> - if (holecheck_path == NULL)
> + if (ret)
> break;
> depth = holecheck_path->p_depth;
>
> @@ -1291,7 +1309,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> if (orig_path)
> ext4_ext_drop_refs(orig_path);
> get_ext_path(orig_path, orig_inode, seq_start, ret);
> - if (orig_path == NULL)
> + if (ret)
> break;
>
> ext_cur = holecheck_path[depth].p_ext;
>


--
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-06 03:13:54

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

On Wed, Sep 02, 2009 at 12:18:02PM +0900, Akira Fujita wrote:
> ext4: Return exchanged blocks count to user space in failure
>
> From: Akira Fujita <[email protected]>
>
> Return exchanged blocks count (moved_len) to user space,
> if ext4_move_extents() failed on the way.
>
> Signed-off-by: Akira Fujita <[email protected]>

Added to the ext4 patch queue.


2009-09-06 03:37:17

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

On Fri, Sep 04, 2009 at 06:55:37PM +0900, Akira Fujita wrote:
>
> More test cases are needed for EXT4_IOC_MOVE_EXT,
> so this patch may not be complete,
> but the problem you reported is fixed at least.
> I am now testing EXT4_IOC_MOVE_EXT hard.

I've not added this patch to the patch queue, yet, based on the fact
that are still doing more testing and Pen Tao seems to have found more
issues.

I have applied your original four patches since I've looked over the
patches and they look good.

Two comments I have of the move_extents() code; it would be preferable
if you replace BUG_ON calls with a call to ext4_error(); that way
instead of crashing the entire kernel, you print an error and then
stop making changes to the file system in question. Users will be
much happier if the system doesn't completely crash, and it also
becomes easier to grab the system messages after an ext4_error(),
compared to BUG_ON().

Secondly, it would be really nice to replace get_ext_path() with an
inline function. The problem with get_ext_path() is that for someone
who is just reading through the code it looks like a function call,
but the first and fourth arguments get modified. But if someone
doesn't jump up to the beginning of the call, this isn't obvious.

If I can't look at the #define, it's not obvious that orig_path and
err will be modified.

get_ext_path(orig_path, orig_inode, eblock, err);

A calling path like this is much more obvious:

err = get_ext_path(orig_inode, eblock, &orig_path);

See? Just one look at the 2nd calling pattern makes it obvious that
err and orig_path functions will be modified. And returning the error
code (as a negative errno code) is a common calling convention used in
the kernel.

Rusty's slides about interface design are especially good to review in
this context:

http://ozlabs.org/~rusty/ols-2003-keynote/img27.html

(His whole slide deck are good to review, but this section is
especially valuable.)

- Ted

2009-09-08 04:12:16

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi Peng,

Peng Tao wrote:
> Hi, Akira,
>
> Akira Fujita wrote:
>> Hi Peng,
>> Peng Tao wrote:
>>> Hi, Greg,
>>>
>>> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>>>> Peng,
>>>>
>>>> I have not looked at the code very closely, but can you tell me where
>>>> a file corruption can take place? Not completing the replacement of
>>>> extents with donor extents is one thing. Corrupting the original file
>>>> contents is another.
>>> The file corruption is mainly because of the half done replacement.
>>>
>>> My test case is here:
>>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>>>
>> This patch solves your test case problem.
>>
>>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>>> $dd if=../609xp.img of=first.img bs=10M count=1
>>> $dd if=/dev/zero of=first.img bs=10M count=0 seek=50
>>> $dd if=../609xp.img of=last.img bs=10M count=1 seek=49
>>> $dd if=../609xp.img of=middle.img bs=10M count=1 seek=24
>>> $dd if=/dev/zero of=middle.img bs=10M count=0 seek=50
>>
>> This problem is caused by the fact that logical offset of
>> orig file is different from donor file's.
>> To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
>> add checks to mext_calc_swap_extents() and handles it as error,
>> since data exchange must be done between the same blocks.
>>
>> Note: This problem does not happen in ext4 online defrag
>> (means with e4defrag command), because the donor file
>> which is created by e4defrag in user space is
>> file constitution same as orig file.
>>
>> And add the extent null check to ext_get_path() for
>> followings test case.
>>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>> More test cases are needed for EXT4_IOC_MOVE_EXT,
>> so this patch may not be complete,
>> but the problem you reported is fixed at least.
>> I am now testing EXT4_IOC_MOVE_EXT hard.
>>
>> BTW, I'm now looking into the empty extent issue which
>> you reported, so I will release the patch soon.
>> http://marc.info/?l=linux-ext4&m=124975192830024&w=2
>>
>> Could you do your test case again with this patch?
> After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
> last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:

I could not reproduce this panic.
Would you tell me about your test environment (1-5)?

1. What is your kernel version? (2.6.31-rc2 + ext4 patch queue + my patch?)
2. What FS mount options are enabled?
3. What options are enabled to create ext4?
4. Are image files (first.img, middle.img and last.img)
same as your previous mail?
http://marc.info/?l=linux-ext4&m=124992522305319&w=2
5. What arguments are set to EXT4_IOC_MOVE_EXT in your test case?

Regards,
Akira Fujita


> Sep 4 23:21:05 bergwolf -- MARK --
> [ 3183.602852] Modules linked in: ext4 ppdev lp parport binfmt_misc i915 kvm_intel kvm uinput ipv6 cpufreq_userspace cpufreq_conservative cpufreq_powersave jbd2 crc16 fuse dm_snapshot dm_mirror dm_region_hash dm_log dm_mod zlib_deflate crc32c acpi_cpufreq sbp2 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss pcmcia rtc_cmos snd_pcm rtc_core i2c_i801 rtc_lib snd_timer snd_page_alloc psmouse yenta_socket rsrc_nonstatic thinkpad_acpi pcmcia_core serio_raw evdev uhci_hcd firewire_ohci firewire_core crc_itu_t video output ehci_hcd e1000e usbcore [last unloaded: ext4]
> [ 3183.602951]
> [ 3183.602958] Pid: 6937, comm: a.out Not tainted (2.6.31-rc2-drm-intel-next #2) 7676A26
> [ 3183.602965] EIP: 0060:[<c01cfa26>] EFLAGS: 00210287 CPU: 1
> [ 3183.602977] EIP is at journal_start+0x39/0xb9
> [ 3183.602982] EAX: f61a2a80 EBX: f26f048c ECX: f6995200 EDX: f6995000
> [ 3183.602988] ESI: f26f048c EDI: f6f59c88 EBP: f1a77c90 ESP: f1a77c7c
> [ 3183.602994] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 3183.603010] 00000002 f6995000 f6f59c88 f26f048c f6f59c88 f1a77c98 c01c9cc6 f1a77cac
> [ 3183.603024] <0> c01c0b51 f6f59c88 00000001 f6995200 f1a77cc0 c0192799 004cba00 f6f59c88
> [ 3183.603039] <0> f68b3840 f1a77cd4 c018ab14 004cba00 00000000 ff7fc000 f1a77d44 c015c7ec
> [ 3183.603070] [<c01c9cc6>] ? ext3_journal_start_sb+0x40/0x42
> [ 3183.603076] [<c01c0b51>] ? ext3_dirty_inode+0x24/0x67
> [ 3183.603087] [<c0192799>] ? __mark_inode_dirty+0x23/0xc6
> [ 3183.603097] [<c018ab14>] ? file_update_time+0x7a/0xa3
> [ 3183.603108] [<c015c7ec>] ? __generic_file_aio_write_nolock+0x2d6/0x3fe
> [ 3183.603151] [<fa29d3b4>] ? ext4_ext_find_extent+0x3f/0x230 [ext4]
> [ 3183.603161] [<c015d0e3>] ? generic_file_aio_write+0x57/0xb4
> [ 3183.603200] [<fa2a6c26>] ? mext_replace_branches+0x31f/0x329 [ext4]
> [ 3183.603209] [<c01bef56>] ? ext3_file_write+0x1a/0x88
> [ 3183.603219] [<c017b6e2>] ? do_sync_write+0xab/0xe9
> [ 3183.603229] [<c0137403>] ? autoremove_wake_function+0x0/0x33
> [ 3183.603239] [<c013dbda>] ? getnstimeofday+0x52/0xda
> [ 3183.603249] [<c014d027>] ? do_acct_process+0x68d/0x6b2
> [ 3183.603257] [<c015b46c>] ? find_get_page+0x1d/0x81
> [ 3183.603268] [<c018df9f>] ? mntput_no_expire+0x19/0xb3
> [ 3183.603276] [<c017c9c7>] ? __fput+0x17c/0x184
> [ 3183.603286] [<c014d09f>] ? acct_process+0x53/0x66
> [ 3183.603294] [<c012a318>] ? do_exit+0x174/0x573
> [ 3183.603303] [<c012a778>] ? do_group_exit+0x61/0x88
> [ 3183.603311] [<c012a7b2>] ? sys_exit_group+0x13/0x17
> [ 3183.603320] [<c0102994>] ? sysenter_do_call+0x12/0x28
> [ 3183.603419] ---[ end trace cba419e95b73d96f ]---
>
> I'm not sure why ext3 journal is involved. I've run the case twice and both
> turned out with the same trace messages.


2009-09-08 08:01:00

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Akira,

Akira Fujita wrote:
> Hi Peng,
>
> Peng Tao wrote:
>> Hi, Akira,
>>
>> Akira Fujita wrote:
>>> Hi Peng,
>>> Peng Tao wrote:
>>>> Hi, Greg,
>>>>
>>>> On Thu, Sep 3, 2009 at 4:59 AM, Greg Freemyer<[email protected]> wrote:
>>>>> Peng,
>>>>>
>>>>> I have not looked at the code very closely, but can you tell me where
>>>>> a file corruption can take place? Not completing the replacement of
>>>>> extents with donor extents is one thing. Corrupting the original file
>>>>> contents is another.
>>>> The file corruption is mainly because of the half done replacement.
>>>>
>>>> My test case is here:
>>>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>>>>
>>> This patch solves your test case problem.
>>>
>>>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>>>> $dd if=../609xp.img of=first.img bs=10M count=1
>>>> $dd if=/dev/zero of=first.img bs=10M count=0 seek=50
>>>> $dd if=../609xp.img of=last.img bs=10M count=1 seek=49
>>>> $dd if=../609xp.img of=middle.img bs=10M count=1 seek=24
>>>> $dd if=/dev/zero of=middle.img bs=10M count=0 seek=50
>>> This problem is caused by the fact that logical offset of
>>> orig file is different from donor file's.
>>> To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
>>> add checks to mext_calc_swap_extents() and handles it as error,
>>> since data exchange must be done between the same blocks.
>>>
>>> Note: This problem does not happen in ext4 online defrag
>>> (means with e4defrag command), because the donor file
>>> which is created by e4defrag in user space is
>>> file constitution same as orig file.
>>>
>>> And add the extent null check to ext_get_path() for
>>> followings test case.
>>>> $dd if=/dev/zero of=zero.img bs=10M count=0 seek=50
>>> More test cases are needed for EXT4_IOC_MOVE_EXT,
>>> so this patch may not be complete,
>>> but the problem you reported is fixed at least.
>>> I am now testing EXT4_IOC_MOVE_EXT hard.
>>>
>>> BTW, I'm now looking into the empty extent issue which
>>> you reported, so I will release the patch soon.
>>> http://marc.info/?l=linux-ext4&m=124975192830024&w=2
>>>
>>> Could you do your test case again with this patch?
>> After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
>> last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:
>
> I could not reproduce this panic.
> Would you tell me about your test environment (1-5)?
>
> 1. What is your kernel version? (2.6.31-rc2 + ext4 patch queue + my patch?)
Only 2.6.31-rc2 from linus tree + your two patches. I didn't apply ext4 patch queue.
> 2. What FS mount options are enabled?
rw,noatime,relatime,commit=360
> 3. What options are enabled to create ext4?
[[email protected]~]$sudo tune2fs -l /dev/sda9
tune2fs 1.41.9 (22-Aug-2009)
Filesystem volume name: <none>
Last mounted on: /other
Filesystem UUID: 90548cb8-5748-4b18-bbe9-e7254439cb82
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 125184
Block count: 500015
Reserved block count: 25000
Free blocks: 299959
Free inodes: 125162
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 122
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 7824
Inode blocks per group: 489
Flex block group size: 16
Filesystem created: Sun Sep 7 15:13:09 2008
Last mount time: Tue Sep 8 15:19:44 2009
Last write time: Tue Sep 8 15:19:44 2009
Mount count: 13
Maximum mount count: 36
Last checked: Fri Sep 4 20:56:50 2009
Check interval: 15552000 (6 months)
Next check after: Wed Mar 3 20:56:50 2010
Lifetime writes: 1128 MB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 3c5f2a77-c446-4124-94f7-958ba6155f37
Journal backup: inode blocks
> 4. Are image files (first.img, middle.img and last.img)
> same as your previous mail?
> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
Yes.
> 5. What arguments are set to EXT4_IOC_MOVE_EXT in your test case?
move_data.donor_fd = donor_fd;
move_data.orig_start = 0;
move_data.donor_start = 0;
move_data.len = SECTOR_TO_BLOCK(statbuf.st_blocks, fs.f_bsize);

err = ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data);
>
> Regards,
> Akira Fujita
>
>
>> Sep 4 23:21:05 bergwolf -- MARK --
>> [ 3183.602852] Modules linked in: ext4 ppdev lp parport binfmt_misc i915 kvm_intel kvm uinput ipv6 cpufreq_userspace cpufreq_conservative cpufreq_powersave jbd2 crc16 fuse dm_snapshot dm_mirror dm_region_hash dm_log dm_mod zlib_deflate crc32c acpi_cpufreq sbp2 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss pcmcia rtc_cmos snd_pcm rtc_core i2c_i801 rtc_lib snd_timer snd_page_alloc psmouse yenta_socket rsrc_nonstatic thinkpad_acpi pcmcia_core serio_raw evdev uhci_hcd firewire_ohci firewire_core crc_itu_t video output ehci_hcd e1000e usbcore [last unloaded: ext4]
>> [ 3183.602951]
>> [ 3183.602958] Pid: 6937, comm: a.out Not tainted (2.6.31-rc2-drm-intel-next #2) 7676A26
>> [ 3183.602965] EIP: 0060:[<c01cfa26>] EFLAGS: 00210287 CPU: 1
>> [ 3183.602977] EIP is at journal_start+0x39/0xb9
>> [ 3183.602982] EAX: f61a2a80 EBX: f26f048c ECX: f6995200 EDX: f6995000
>> [ 3183.602988] ESI: f26f048c EDI: f6f59c88 EBP: f1a77c90 ESP: f1a77c7c
>> [ 3183.602994] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> [ 3183.603010] 00000002 f6995000 f6f59c88 f26f048c f6f59c88 f1a77c98 c01c9cc6 f1a77cac
>> [ 3183.603024] <0> c01c0b51 f6f59c88 00000001 f6995200 f1a77cc0 c0192799 004cba00 f6f59c88
>> [ 3183.603039] <0> f68b3840 f1a77cd4 c018ab14 004cba00 00000000 ff7fc000 f1a77d44 c015c7ec
>> [ 3183.603070] [<c01c9cc6>] ? ext3_journal_start_sb+0x40/0x42
>> [ 3183.603076] [<c01c0b51>] ? ext3_dirty_inode+0x24/0x67
>> [ 3183.603087] [<c0192799>] ? __mark_inode_dirty+0x23/0xc6
>> [ 3183.603097] [<c018ab14>] ? file_update_time+0x7a/0xa3
>> [ 3183.603108] [<c015c7ec>] ? __generic_file_aio_write_nolock+0x2d6/0x3fe
>> [ 3183.603151] [<fa29d3b4>] ? ext4_ext_find_extent+0x3f/0x230 [ext4]
>> [ 3183.603161] [<c015d0e3>] ? generic_file_aio_write+0x57/0xb4
>> [ 3183.603200] [<fa2a6c26>] ? mext_replace_branches+0x31f/0x329 [ext4]
>> [ 3183.603209] [<c01bef56>] ? ext3_file_write+0x1a/0x88
>> [ 3183.603219] [<c017b6e2>] ? do_sync_write+0xab/0xe9
>> [ 3183.603229] [<c0137403>] ? autoremove_wake_function+0x0/0x33
>> [ 3183.603239] [<c013dbda>] ? getnstimeofday+0x52/0xda
>> [ 3183.603249] [<c014d027>] ? do_acct_process+0x68d/0x6b2
>> [ 3183.603257] [<c015b46c>] ? find_get_page+0x1d/0x81
>> [ 3183.603268] [<c018df9f>] ? mntput_no_expire+0x19/0xb3
>> [ 3183.603276] [<c017c9c7>] ? __fput+0x17c/0x184
>> [ 3183.603286] [<c014d09f>] ? acct_process+0x53/0x66
>> [ 3183.603294] [<c012a318>] ? do_exit+0x174/0x573
>> [ 3183.603303] [<c012a778>] ? do_group_exit+0x61/0x88
>> [ 3183.603311] [<c012a7b2>] ? sys_exit_group+0x13/0x17
>> [ 3183.603320] [<c0102994>] ? sysenter_do_call+0x12/0x28
>> [ 3183.603419] ---[ end trace cba419e95b73d96f ]---
>>
>> I'm not sure why ext3 journal is involved. I've run the case twice and both
>> turned out with the same trace messages.
>
>


--
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-11 05:07:21

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi Peng,
Peng Tao wrote:
>>> After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
>>> last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:
>> I could not reproduce this panic.
>> Would you tell me about your test environment (1-5)?
>>
>> 1. What is your kernel version? (2.6.31-rc2 + ext4 patch queue + my patch?)
> Only 2.6.31-rc2 from linus tree + your two patches. I didn't apply ext4 patch queue.
>> 2. What FS mount options are enabled?
> rw,noatime,relatime,commit=360
>> 3. What options are enabled to create ext4?
> [[email protected]~]$sudo tune2fs -l /dev/sda9
> tune2fs 1.41.9 (22-Aug-2009)
> Filesystem volume name: <none>
> Last mounted on: /other
> Filesystem UUID: 90548cb8-5748-4b18-bbe9-e7254439cb82
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 125184
> Block count: 500015
> Reserved block count: 25000
> Free blocks: 299959
> Free inodes: 125162
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 122
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 7824
> Inode blocks per group: 489
> Flex block group size: 16
> Filesystem created: Sun Sep 7 15:13:09 2008
> Last mount time: Tue Sep 8 15:19:44 2009
> Last write time: Tue Sep 8 15:19:44 2009
> Mount count: 13
> Maximum mount count: 36
> Last checked: Fri Sep 4 20:56:50 2009
> Check interval: 15552000 (6 months)
> Next check after: Wed Mar 3 20:56:50 2010
> Lifetime writes: 1128 MB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: tea
> Directory Hash Seed: 3c5f2a77-c446-4124-94f7-958ba6155f37
> Journal backup: inode blocks
>> 4. Are image files (first.img, middle.img and last.img)
>> same as your previous mail?
>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
> Yes.
>> 5. What arguments are set to EXT4_IOC_MOVE_EXT in your test case?
> move_data.donor_fd = donor_fd;
> move_data.orig_start = 0;
> move_data.donor_start = 0;
> move_data.len = SECTOR_TO_BLOCK(statbuf.st_blocks, fs.f_bsize);
>
> err = ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data);

I tried to reproduce your problem with the following steps,
but I couldn't. Please check my procedure.

1. Test environment
linux2.6.31-rc2 + two patches as follows:
http://marc.info/?l=linux-ext4&m=125186152727422&w=3
http://marc.info/?l=linux-ext4&m=125205817410548&w=3

2. Create ext4 filesystem and mount it
mke2fs -t ext4 -b 4096 /dev/XXX
mount -t ext4 -O rw,noatime,relatime,commit=360 /dev/XXX /mnt/XXX

3. Create orig and donor files
dd if=/dev/urandom of=/mnt/XXX/first.img bs=10M count=1
dd if=/dev/zero of=/mnt/XXX/first.img bs=10M count=0 seek=50
dd if=/dev/urandom of=/mnt/XXX/last.img bs=10M count=1 seek=49

4. Call EXT4_MOVE_MOVE_EXT ioctl to files with the following arguments
(orig_file = /mnt/XXX/first.img, donor_file = /mnt/XXX/last.img)
move_data.orig_start = 0;
move_data.donor_start = 0;
move_data.len = 12152;
ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data)

However, EXT4_IOC_MOVE_EXT returned ENODATA (this is a fine result by my patch)
and it didn't occur the kernel panic which you got.
If I chose a wrong step, please correct me.

I appreciate if you could test following environment.
- 2.6.31-rc8 + ext4 patch queue
(commit:bdacccd94d08a3f89e7fca676b28aa262c6d75fa) + attached patch

Regards,
Akira Fujita

---
fs/ext4/move_extent.c | 72 +++++++++++++++++++++++++++++++++---------------
1 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 7bfbd58..7266247 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -25,6 +25,8 @@
if (IS_ERR(path)) { \
ret = PTR_ERR(path); \
path = NULL; \
+ } else if (path[ext_depth(inode)].p_ext == NULL) { \
+ ret = -ENODATA; \
} \
} while (0)

@@ -284,7 +286,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,

if (new_flag) {
get_ext_path(orig_path, orig_inode, eblock, err);
- if (orig_path == NULL)
+ if (err)
goto out;

if (ext4_ext_insert_extent(handle, orig_inode,
@@ -295,7 +297,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
if (end_flag) {
get_ext_path(orig_path, orig_inode,
le32_to_cpu(end_ext->ee_block) - 1, err);
- if (orig_path == NULL)
+ if (err)
goto out;

if (ext4_ext_insert_extent(handle, orig_inode,
@@ -554,8 +556,10 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
* @orig_off: block offset of original inode
* @donor_off: block offset of donor inode
* @max_count: the maximun length of extents
+ *
+ * Return 0 on success, or a negative error value on failure.
*/
-static void
+static int
mext_calc_swap_extents(struct ext4_extent *tmp_dext,
struct ext4_extent *tmp_oext,
ext4_lblk_t orig_off, ext4_lblk_t donor_off,
@@ -564,6 +568,19 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
ext4_lblk_t diff, orig_diff;
struct ext4_extent dext_old, oext_old;

+ BUG_ON(orig_off != donor_off);
+
+ /* original and donor extents have to cover the same block offset */
+ if (orig_off < le32_to_cpu(tmp_oext->ee_block) ||
+ le32_to_cpu(tmp_oext->ee_block) +
+ ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off)
+ return -ENODATA;
+
+ if (orig_off < le32_to_cpu(tmp_dext->ee_block) ||
+ le32_to_cpu(tmp_dext->ee_block) +
+ ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off)
+ return -ENODATA;
+
dext_old = *tmp_dext;
oext_old = *tmp_oext;

@@ -591,6 +608,8 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,

copy_extent_status(&oext_old, tmp_dext);
copy_extent_status(&dext_old, tmp_oext);
+
+ return 0;
}

/**
@@ -632,12 +651,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,

/* Get the original extent for the block "orig_off" */
get_ext_path(orig_path, orig_inode, orig_off, err);
- if (orig_path == NULL)
+ if (err)
goto out;

/* Get the donor extent for the head */
get_ext_path(donor_path, donor_inode, donor_off, err);
- if (donor_path == NULL)
+ if (err)
goto out;
depth = ext_depth(orig_inode);
oext = orig_path[depth].p_ext;
@@ -647,8 +666,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
dext = donor_path[depth].p_ext;
tmp_dext = *dext;

- mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
+ err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
donor_off, count);
+ if (err)
+ goto out;

/* Loop for the donor extents */
while (1) {
@@ -679,7 +700,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
if (orig_path)
ext4_ext_drop_refs(orig_path);
get_ext_path(orig_path, orig_inode, orig_off, err);
- if (orig_path == NULL)
+ if (err)
goto out;
depth = ext_depth(orig_inode);
oext = orig_path[depth].p_ext;
@@ -694,7 +715,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
ext4_ext_drop_refs(donor_path);
get_ext_path(donor_path, donor_inode,
donor_off, err);
- if (donor_path == NULL)
+ if (err)
goto out;
depth = ext_depth(donor_inode);
dext = donor_path[depth].p_ext;
@@ -705,9 +726,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
}
tmp_dext = *dext;

- mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
- donor_off,
- count - replaced_count);
+ err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
+ donor_off, count - replaced_count);
+ if (err)
+ goto out;
}

out:
@@ -1156,27 +1178,27 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
len -= block_end - file_end;

get_ext_path(orig_path, orig_inode, block_start, ret);
- if (orig_path == NULL)
+ if (ret)
goto out2;

/* Get path structure to check the hole */
get_ext_path(holecheck_path, orig_inode, block_start, ret);
- if (holecheck_path == NULL)
+ if (ret)
goto out;

depth = ext_depth(orig_inode);
ext_cur = holecheck_path[depth].p_ext;
- if (ext_cur == NULL) {
- ret = -EINVAL;
- goto out;
- }

/*
- * Get proper extent whose ee_block is beyond block_start
- * if block_start was within the hole.
+ * Get proper starting location of block replacement if block_start was
+ * within the hole.
*/
if (le32_to_cpu(ext_cur->ee_block) +
ext4_ext_get_actual_len(ext_cur) - 1 < block_start) {
+ /*
+ * The hole exists between extents or the tail of
+ * original file.
+ */
last_extent = mext_next_extent(orig_inode,
holecheck_path, &ext_cur);
if (last_extent < 0) {
@@ -1189,8 +1211,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
ret = last_extent;
goto out;
}
- }
- seq_start = block_start;
+ seq_start = le32_to_cpu(ext_cur->ee_block);
+ } else if (le32_to_cpu(ext_cur->ee_block) > block_start)
+ /* The hole exists at the beginning of original file. */
+ seq_start = le32_to_cpu(ext_cur->ee_block);
+ else
+ seq_start = block_start;

/* No blocks within the specified range. */
if (le32_to_cpu(ext_cur->ee_block) > block_end) {
@@ -1292,7 +1318,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
ext4_ext_drop_refs(holecheck_path);
get_ext_path(holecheck_path, orig_inode,
seq_start, ret);
- if (holecheck_path == NULL)
+ if (ret)
break;
depth = holecheck_path->p_depth;

@@ -1300,7 +1326,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
if (orig_path)
ext4_ext_drop_refs(orig_path);
get_ext_path(orig_path, orig_inode, seq_start, ret);
- if (orig_path == NULL)
+ if (ret)
break;

ext_cur = holecheck_path[depth].p_ext;

2009-09-11 05:28:48

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Arika,

2009/9/11 Akira Fujita <[email protected]>:
> Hi Peng,
> Peng Tao wrote:
>>>> After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
>>>> last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:
>>> I could not reproduce this panic.
>>> Would you tell me about your test environment (1-5)?
>>>
>>> 1. What is your kernel version? (2.6.31-rc2 + ext4 patch queue + my patch?)
>> Only 2.6.31-rc2 from linus tree + your two patches. I didn't apply ext4 patch queue.
>>> 2. What FS mount options are enabled?
>> rw,noatime,relatime,commit=360
>>> 3. What options are enabled to create ext4?
>>  [[email protected]~]$sudo tune2fs -l /dev/sda9
>> tune2fs 1.41.9 (22-Aug-2009)
>> Filesystem volume name:   <none>
>> Last mounted on:          /other
>> Filesystem UUID:          90548cb8-5748-4b18-bbe9-e7254439cb82
>> Filesystem magic number:  0xEF53
>> Filesystem revision #:    1 (dynamic)
>> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>> Filesystem flags:         signed_directory_hash
>> Default mount options:    (none)
>> Filesystem state:         clean
>> Errors behavior:          Continue
>> Filesystem OS type:       Linux
>> Inode count:              125184
>> Block count:              500015
>> Reserved block count:     25000
>> Free blocks:              299959
>> Free inodes:              125162
>> First block:              0
>> Block size:               4096
>> Fragment size:            4096
>> Reserved GDT blocks:      122
>> Blocks per group:         32768
>> Fragments per group:      32768
>> Inodes per group:         7824
>> Inode blocks per group:   489
>> Flex block group size:    16
>> Filesystem created:       Sun Sep  7 15:13:09 2008
>> Last mount time:          Tue Sep  8 15:19:44 2009
>> Last write time:          Tue Sep  8 15:19:44 2009
>> Mount count:              13
>> Maximum mount count:      36
>> Last checked:             Fri Sep  4 20:56:50 2009
>> Check interval:           15552000 (6 months)
>> Next check after:         Wed Mar  3 20:56:50 2010
>> Lifetime writes:          1128 MB
>> Reserved blocks uid:      0 (user root)
>> Reserved blocks gid:      0 (group root)
>> First inode:              11
>> Inode size:             256
>> Required extra isize:     28
>> Desired extra isize:      28
>> Journal inode:            8
>> Default directory hash:   tea
>> Directory Hash Seed:      3c5f2a77-c446-4124-94f7-958ba6155f37
>> Journal backup:           inode blocks
>>> 4. Are image files  (first.img, middle.img and last.img)
>>>    same as your previous mail?
>>>    http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>> Yes.
>>> 5. What arguments are set to EXT4_IOC_MOVE_EXT in your test case?
>>         move_data.donor_fd = donor_fd;
>>         move_data.orig_start = 0;
>>         move_data.donor_start = 0;
>>         move_data.len = SECTOR_TO_BLOCK(statbuf.st_blocks, fs.f_bsize);
>>
>>         err = ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data);
>
> I tried to reproduce your problem with the following steps,
> but I couldn't. Please check my procedure.
>
> 1. Test environment
> linux2.6.31-rc2 + two patches as follows:
> http://marc.info/?l=linux-ext4&m=125186152727422&w=3
> http://marc.info/?l=linux-ext4&m=125205817410548&w=3
>
> 2. Create ext4 filesystem and mount it
> mke2fs -t ext4 -b 4096 /dev/XXX
> mount -t ext4 -O rw,noatime,relatime,commit=360 /dev/XXX /mnt/XXX
>
> 3. Create orig and donor files
> dd if=/dev/urandom of=/mnt/XXX/first.img bs=10M count=1
> dd if=/dev/zero of=/mnt/XXX/first.img bs=10M count=0 seek=50
> dd if=/dev/urandom of=/mnt/XXX/last.img bs=10M count=1 seek=49
>
> 4. Call EXT4_MOVE_MOVE_EXT ioctl to files with the following arguments
> (orig_file = /mnt/XXX/first.img, donor_file = /mnt/XXX/last.img)
> move_data.orig_start = 0;
> move_data.donor_start = 0;
> move_data.len = 12152;
> ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data)
>
> However, EXT4_IOC_MOVE_EXT returned ENODATA (this is a fine result by my patch)
> and it didn't occur the kernel panic which you got.
> If I chose a wrong step, please correct me.
Just a quick response.
I don't see any wrong step above. I'll review my test later today when
I am back home and see if I was missing anything.
>
> I appreciate if you could test following environment.
> - 2.6.31-rc8 + ext4 patch queue
>  (commit:bdacccd94d08a3f89e7fca676b28aa262c6d75fa) + attached patch
Will give it a shot later.

>
> Regards,
> Akira Fujita
>
> ---
>  fs/ext4/move_extent.c |   72 +++++++++++++++++++++++++++++++++---------------
>  1 files changed, 49 insertions(+), 23 deletions(-)
>
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 7bfbd58..7266247 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -25,6 +25,8 @@
>                if (IS_ERR(path)) {                                     \
>                        ret = PTR_ERR(path);                            \
>                        path = NULL;                                    \
> +               } else if (path[ext_depth(inode)].p_ext == NULL) {      \
> +                       ret = -ENODATA;                                 \
>                }                                                       \
>        } while (0)
>
> @@ -284,7 +286,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
>
>        if (new_flag) {
>                get_ext_path(orig_path, orig_inode, eblock, err);
> -               if (orig_path == NULL)
> +               if (err)
>                        goto out;
>
>                if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -295,7 +297,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
>        if (end_flag) {
>                get_ext_path(orig_path, orig_inode,
>                                      le32_to_cpu(end_ext->ee_block) - 1, err);
> -               if (orig_path == NULL)
> +               if (err)
>                        goto out;
>
>                if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -554,8 +556,10 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
>  * @orig_off:          block offset of original inode
>  * @donor_off:         block offset of donor inode
>  * @max_count:         the maximun length of extents
> + *
> + * Return 0 on success, or a negative error value on failure.
>  */
> -static void
> +static int
>  mext_calc_swap_extents(struct ext4_extent *tmp_dext,
>                              struct ext4_extent *tmp_oext,
>                              ext4_lblk_t orig_off, ext4_lblk_t donor_off,
> @@ -564,6 +568,19 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
>        ext4_lblk_t diff, orig_diff;
>        struct ext4_extent dext_old, oext_old;
>
> +       BUG_ON(orig_off != donor_off);
> +
> +       /* original and donor extents have to cover the same block offset */
> +       if (orig_off < le32_to_cpu(tmp_oext->ee_block) ||
> +           le32_to_cpu(tmp_oext->ee_block) +
> +                       ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off)
> +               return -ENODATA;
> +
> +       if (orig_off < le32_to_cpu(tmp_dext->ee_block) ||
> +           le32_to_cpu(tmp_dext->ee_block) +
> +                       ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off)
> +               return -ENODATA;
> +
>        dext_old = *tmp_dext;
>        oext_old = *tmp_oext;
>
> @@ -591,6 +608,8 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
>
>        copy_extent_status(&oext_old, tmp_dext);
>        copy_extent_status(&dext_old, tmp_oext);
> +
> +       return 0;
>  }
>
>  /**
> @@ -632,12 +651,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>
>        /* Get the original extent for the block "orig_off" */
>        get_ext_path(orig_path, orig_inode, orig_off, err);
> -       if (orig_path == NULL)
> +       if (err)
>                goto out;
>
>        /* Get the donor extent for the head */
>        get_ext_path(donor_path, donor_inode, donor_off, err);
> -       if (donor_path == NULL)
> +       if (err)
>                goto out;
>        depth = ext_depth(orig_inode);
>        oext = orig_path[depth].p_ext;
> @@ -647,8 +666,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>        dext = donor_path[depth].p_ext;
>        tmp_dext = *dext;
>
> -       mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> +       err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
>                                      donor_off, count);
> +       if (err)
> +               goto out;
>
>        /* Loop for the donor extents */
>        while (1) {
> @@ -679,7 +700,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>                if (orig_path)
>                        ext4_ext_drop_refs(orig_path);
>                get_ext_path(orig_path, orig_inode, orig_off, err);
> -               if (orig_path == NULL)
> +               if (err)
>                        goto out;
>                depth = ext_depth(orig_inode);
>                oext = orig_path[depth].p_ext;
> @@ -694,7 +715,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>                        ext4_ext_drop_refs(donor_path);
>                get_ext_path(donor_path, donor_inode,
>                                      donor_off, err);
> -               if (donor_path == NULL)
> +               if (err)
>                        goto out;
>                depth = ext_depth(donor_inode);
>                dext = donor_path[depth].p_ext;
> @@ -705,9 +726,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>                }
>                tmp_dext = *dext;
>
> -               mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> -                                             donor_off,
> -                                             count - replaced_count);
> +               err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> +                                          donor_off, count - replaced_count);
> +               if (err)
> +                       goto out;
>        }
>
>  out:
> @@ -1156,27 +1178,27 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>                len -= block_end - file_end;
>
>        get_ext_path(orig_path, orig_inode, block_start, ret);
> -       if (orig_path == NULL)
> +       if (ret)
>                goto out2;
>
>        /* Get path structure to check the hole */
>        get_ext_path(holecheck_path, orig_inode, block_start, ret);
> -       if (holecheck_path == NULL)
> +       if (ret)
>                goto out;
>
>        depth = ext_depth(orig_inode);
>        ext_cur = holecheck_path[depth].p_ext;
> -       if (ext_cur == NULL) {
> -               ret = -EINVAL;
> -               goto out;
> -       }
>
>        /*
> -        * Get proper extent whose ee_block is beyond block_start
> -        * if block_start was within the hole.
> +        * Get proper starting location of block replacement if block_start was
> +        * within the hole.
>         */
>        if (le32_to_cpu(ext_cur->ee_block) +
>                ext4_ext_get_actual_len(ext_cur) - 1 < block_start) {
> +               /*
> +                * The hole exists between extents or the tail of
> +                * original file.
> +                */
>                last_extent = mext_next_extent(orig_inode,
>                                        holecheck_path, &ext_cur);
>                if (last_extent < 0) {
> @@ -1189,8 +1211,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>                        ret = last_extent;
>                        goto out;
>                }
> -       }
> -       seq_start = block_start;
> +               seq_start = le32_to_cpu(ext_cur->ee_block);
> +       } else if (le32_to_cpu(ext_cur->ee_block) > block_start)
> +               /* The hole exists at the beginning of original file. */
> +               seq_start = le32_to_cpu(ext_cur->ee_block);
> +       else
> +               seq_start = block_start;
>
>        /* No blocks within the specified range. */
>        if (le32_to_cpu(ext_cur->ee_block) > block_end) {
> @@ -1292,7 +1318,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>                        ext4_ext_drop_refs(holecheck_path);
>                get_ext_path(holecheck_path, orig_inode,
>                                      seq_start, ret);
> -               if (holecheck_path == NULL)
> +               if (ret)
>                        break;
>                depth = holecheck_path->p_depth;
>
> @@ -1300,7 +1326,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>                if (orig_path)
>                        ext4_ext_drop_refs(orig_path);
>                get_ext_path(orig_path, orig_inode, seq_start, ret);
> -               if (orig_path == NULL)
> +               if (ret)
>                        break;
>
>                ext_cur = holecheck_path[depth].p_ext;
>



--
Cheers,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.

2009-09-11 16:57:30

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi, Akira,
Akira Fujita wrote:
> Hi Peng,
> Peng Tao wrote:
>>>> After applying the two patches, I run my test case with first.img as the orig file (and middle.img or
>>>> last.img as donor file). My kernel panics and I find following message in /var/log/messages after reboot:
>>> I could not reproduce this panic.
>>> Would you tell me about your test environment (1-5)?
>>>
>>> 1. What is your kernel version? (2.6.31-rc2 + ext4 patch queue + my patch?)
>> Only 2.6.31-rc2 from linus tree + your two patches. I didn't apply ext4 patch queue.
>>> 2. What FS mount options are enabled?
>> rw,noatime,relatime,commit=360
>>> 3. What options are enabled to create ext4?
>> [[email protected]~]$sudo tune2fs -l /dev/sda9
>> tune2fs 1.41.9 (22-Aug-2009)
>> Filesystem volume name: <none>
>> Last mounted on: /other
>> Filesystem UUID: 90548cb8-5748-4b18-bbe9-e7254439cb82
>> Filesystem magic number: 0xEF53
>> Filesystem revision #: 1 (dynamic)
>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>> Filesystem flags: signed_directory_hash
>> Default mount options: (none)
>> Filesystem state: clean
>> Errors behavior: Continue
>> Filesystem OS type: Linux
>> Inode count: 125184
>> Block count: 500015
>> Reserved block count: 25000
>> Free blocks: 299959
>> Free inodes: 125162
>> First block: 0
>> Block size: 4096
>> Fragment size: 4096
>> Reserved GDT blocks: 122
>> Blocks per group: 32768
>> Fragments per group: 32768
>> Inodes per group: 7824
>> Inode blocks per group: 489
>> Flex block group size: 16
>> Filesystem created: Sun Sep 7 15:13:09 2008
>> Last mount time: Tue Sep 8 15:19:44 2009
>> Last write time: Tue Sep 8 15:19:44 2009
>> Mount count: 13
>> Maximum mount count: 36
>> Last checked: Fri Sep 4 20:56:50 2009
>> Check interval: 15552000 (6 months)
>> Next check after: Wed Mar 3 20:56:50 2010
>> Lifetime writes: 1128 MB
>> Reserved blocks uid: 0 (user root)
>> Reserved blocks gid: 0 (group root)
>> First inode: 11
>> Inode size: 256
>> Required extra isize: 28
>> Desired extra isize: 28
>> Journal inode: 8
>> Default directory hash: tea
>> Directory Hash Seed: 3c5f2a77-c446-4124-94f7-958ba6155f37
>> Journal backup: inode blocks
>>> 4. Are image files (first.img, middle.img and last.img)
>>> same as your previous mail?
>>> http://marc.info/?l=linux-ext4&m=124992522305319&w=2
>> Yes.
>>> 5. What arguments are set to EXT4_IOC_MOVE_EXT in your test case?
>> move_data.donor_fd = donor_fd;
>> move_data.orig_start = 0;
>> move_data.donor_start = 0;
>> move_data.len = SECTOR_TO_BLOCK(statbuf.st_blocks, fs.f_bsize);
>>
>> err = ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data);
>
> I tried to reproduce your problem with the following steps,
> but I couldn't. Please check my procedure.
>
> 1. Test environment
> linux2.6.31-rc2 + two patches as follows:
> http://marc.info/?l=linux-ext4&m=125186152727422&w=3
> http://marc.info/?l=linux-ext4&m=125205817410548&w=3
>
> 2. Create ext4 filesystem and mount it
> mke2fs -t ext4 -b 4096 /dev/XXX
> mount -t ext4 -O rw,noatime,relatime,commit=360 /dev/XXX /mnt/XXX
>
> 3. Create orig and donor files
> dd if=/dev/urandom of=/mnt/XXX/first.img bs=10M count=1
> dd if=/dev/zero of=/mnt/XXX/first.img bs=10M count=0 seek=50
> dd if=/dev/urandom of=/mnt/XXX/last.img bs=10M count=1 seek=49
>
> 4. Call EXT4_MOVE_MOVE_EXT ioctl to files with the following arguments
> (orig_file = /mnt/XXX/first.img, donor_file = /mnt/XXX/last.img)
> move_data.orig_start = 0;
> move_data.donor_start = 0;
> move_data.len = 12152;
> ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data)
>
> However, EXT4_IOC_MOVE_EXT returned ENODATA (this is a fine result by my patch)
> and it didn't occur the kernel panic which you got.
> If I chose a wrong step, please correct me.
Sorry, I didn't notice that I was at commit 7638d5322bd89d49e013a03fe2afaeb6d214fabd
of linus tree that includes a few patches after 2.6.31-rc2.
>
> I appreciate if you could test following environment.
> - 2.6.31-rc8 + ext4 patch queue
> (commit:bdacccd94d08a3f89e7fca676b28aa262c6d75fa) + attached patch
I can't reproduce the panic in the above environment. I believe the problem
is fixed. Thanks for your work!
>
> Regards,
> Akira Fujita
>
> ---
> fs/ext4/move_extent.c | 72 +++++++++++++++++++++++++++++++++---------------
> 1 files changed, 49 insertions(+), 23 deletions(-)
>
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 7bfbd58..7266247 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -25,6 +25,8 @@
> if (IS_ERR(path)) { \
> ret = PTR_ERR(path); \
> path = NULL; \
> + } else if (path[ext_depth(inode)].p_ext == NULL) { \
> + ret = -ENODATA; \
> } \
> } while (0)
>
> @@ -284,7 +286,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
>
> if (new_flag) {
> get_ext_path(orig_path, orig_inode, eblock, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -295,7 +297,7 @@ mext_insert_across_blocks(handle_t *handle, struct inode *orig_inode,
> if (end_flag) {
> get_ext_path(orig_path, orig_inode,
> le32_to_cpu(end_ext->ee_block) - 1, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> if (ext4_ext_insert_extent(handle, orig_inode,
> @@ -554,8 +556,10 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
> * @orig_off: block offset of original inode
> * @donor_off: block offset of donor inode
> * @max_count: the maximun length of extents
> + *
> + * Return 0 on success, or a negative error value on failure.
> */
> -static void
> +static int
> mext_calc_swap_extents(struct ext4_extent *tmp_dext,
> struct ext4_extent *tmp_oext,
> ext4_lblk_t orig_off, ext4_lblk_t donor_off,
> @@ -564,6 +568,19 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
> ext4_lblk_t diff, orig_diff;
> struct ext4_extent dext_old, oext_old;
>
> + BUG_ON(orig_off != donor_off);
> +
> + /* original and donor extents have to cover the same block offset */
> + if (orig_off < le32_to_cpu(tmp_oext->ee_block) ||
> + le32_to_cpu(tmp_oext->ee_block) +
> + ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off)
> + return -ENODATA;
> +
> + if (orig_off < le32_to_cpu(tmp_dext->ee_block) ||
> + le32_to_cpu(tmp_dext->ee_block) +
> + ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off)
> + return -ENODATA;
> +
> dext_old = *tmp_dext;
> oext_old = *tmp_oext;
>
> @@ -591,6 +608,8 @@ mext_calc_swap_extents(struct ext4_extent *tmp_dext,
>
> copy_extent_status(&oext_old, tmp_dext);
> copy_extent_status(&dext_old, tmp_oext);
> +
> + return 0;
> }
>
> /**
> @@ -632,12 +651,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>
> /* Get the original extent for the block "orig_off" */
> get_ext_path(orig_path, orig_inode, orig_off, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
>
> /* Get the donor extent for the head */
> get_ext_path(donor_path, donor_inode, donor_off, err);
> - if (donor_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(orig_inode);
> oext = orig_path[depth].p_ext;
> @@ -647,8 +666,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> dext = donor_path[depth].p_ext;
> tmp_dext = *dext;
>
> - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> donor_off, count);
> + if (err)
> + goto out;
>
> /* Loop for the donor extents */
> while (1) {
> @@ -679,7 +700,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> if (orig_path)
> ext4_ext_drop_refs(orig_path);
> get_ext_path(orig_path, orig_inode, orig_off, err);
> - if (orig_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(orig_inode);
> oext = orig_path[depth].p_ext;
> @@ -694,7 +715,7 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> ext4_ext_drop_refs(donor_path);
> get_ext_path(donor_path, donor_inode,
> donor_off, err);
> - if (donor_path == NULL)
> + if (err)
> goto out;
> depth = ext_depth(donor_inode);
> dext = donor_path[depth].p_ext;
> @@ -705,9 +726,10 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> }
> tmp_dext = *dext;
>
> - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> - donor_off,
> - count - replaced_count);
> + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
> + donor_off, count - replaced_count);
> + if (err)
> + goto out;
> }
>
> out:
> @@ -1156,27 +1178,27 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> len -= block_end - file_end;
>
> get_ext_path(orig_path, orig_inode, block_start, ret);
> - if (orig_path == NULL)
> + if (ret)
> goto out2;
>
> /* Get path structure to check the hole */
> get_ext_path(holecheck_path, orig_inode, block_start, ret);
> - if (holecheck_path == NULL)
> + if (ret)
> goto out;
>
> depth = ext_depth(orig_inode);
> ext_cur = holecheck_path[depth].p_ext;
> - if (ext_cur == NULL) {
> - ret = -EINVAL;
> - goto out;
> - }
>
> /*
> - * Get proper extent whose ee_block is beyond block_start
> - * if block_start was within the hole.
> + * Get proper starting location of block replacement if block_start was
> + * within the hole.
> */
> if (le32_to_cpu(ext_cur->ee_block) +
> ext4_ext_get_actual_len(ext_cur) - 1 < block_start) {
> + /*
> + * The hole exists between extents or the tail of
> + * original file.
> + */
> last_extent = mext_next_extent(orig_inode,
> holecheck_path, &ext_cur);
> if (last_extent < 0) {
> @@ -1189,8 +1211,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> ret = last_extent;
> goto out;
> }
> - }
> - seq_start = block_start;
> + seq_start = le32_to_cpu(ext_cur->ee_block);
> + } else if (le32_to_cpu(ext_cur->ee_block) > block_start)
> + /* The hole exists at the beginning of original file. */
> + seq_start = le32_to_cpu(ext_cur->ee_block);
> + else
> + seq_start = block_start;
>
> /* No blocks within the specified range. */
> if (le32_to_cpu(ext_cur->ee_block) > block_end) {
> @@ -1292,7 +1318,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> ext4_ext_drop_refs(holecheck_path);
> get_ext_path(holecheck_path, orig_inode,
> seq_start, ret);
> - if (holecheck_path == NULL)
> + if (ret)
> break;
> depth = holecheck_path->p_depth;
>
> @@ -1300,7 +1326,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
> if (orig_path)
> ext4_ext_drop_refs(orig_path);
> get_ext_path(orig_path, orig_inode, seq_start, ret);
> - if (orig_path == NULL)
> + if (ret)
> break;
>
> ext_cur = holecheck_path[depth].p_ext;
>


--
Best Regards,
Peng Tao
State Key Laboratory of Networking and Switching Technology
Beijing Univ. of Posts and Telecoms.




2009-09-14 06:17:24

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 3/4]ext4: Return exchanged blocks count to user space in failure

Hi Peng,

Peng Tao wrote:
>> I tried to reproduce your problem with the following steps,
>> but I couldn't. Please check my procedure.
>>
>> 1. Test environment
>> linux2.6.31-rc2 + two patches as follows:
>> http://marc.info/?l=linux-ext4&m=125186152727422&w=3
>> http://marc.info/?l=linux-ext4&m=125205817410548&w=3
>>
>> 2. Create ext4 filesystem and mount it
>> mke2fs -t ext4 -b 4096 /dev/XXX
>> mount -t ext4 -O rw,noatime,relatime,commit=360 /dev/XXX /mnt/XXX
>>
>> 3. Create orig and donor files
>> dd if=/dev/urandom of=/mnt/XXX/first.img bs=10M count=1
>> dd if=/dev/zero of=/mnt/XXX/first.img bs=10M count=0 seek=50
>> dd if=/dev/urandom of=/mnt/XXX/last.img bs=10M count=1 seek=49
>>
>> 4. Call EXT4_MOVE_MOVE_EXT ioctl to files with the following arguments
>> (orig_file = /mnt/XXX/first.img, donor_file = /mnt/XXX/last.img)
>> move_data.orig_start = 0;
>> move_data.donor_start = 0;
>> move_data.len = 12152;
>> ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &move_data)
>>
>> However, EXT4_IOC_MOVE_EXT returned ENODATA (this is a fine result by my patch)
>> and it didn't occur the kernel panic which you got.
>> If I chose a wrong step, please correct me.
> Sorry, I didn't notice that I was at commit 7638d5322bd89d49e013a03fe2afaeb6d214fabd
> of linus tree that includes a few patches after 2.6.31-rc2.
>> I appreciate if you could test following environment.
>> - 2.6.31-rc8 + ext4 patch queue
>> (commit:bdacccd94d08a3f89e7fca676b28aa262c6d75fa) + attached patch
> I can't reproduce the panic in the above environment. I believe the problem
> is fixed. Thanks for your work!

Ok, I will re-base the patch and then send it to the list.
Thanks for your work, too.

Regards,
Akira Fujita.