2008-10-20 04:26:18

by Hidehiro Kawai

Subject: status of the ext3/jbd error handling enhancement patches

Hi Andrew,

Thank you for pushing my ext3/jbd patches to Linus. However, some of the
patches in -mm (the following three) haven't been sent to Linus yet.

#jbd-fix-error-handling-for-checkpoint-io.patch: double-check this
jbd-fix-error-handling-for-checkpoint-io.patch
ext3-add-checks-for-errors-from-jbd.patch
jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch

They address filesystem corruption problems, and they are needed by
mission-critical systems.

Could you tell me what your concern is (you commented "double-check
this")? Is there anything I can help with? For example, I can provide
some SystemTap scripts to help with testing.

Thank you,
--
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center




2008-10-20 05:14:13

by Andrew Morton

Subject: Re: status of the ext3/jbd error handling enhancement patches

On Mon, 20 Oct 2008 13:25:01 +0900 Hidehiro Kawai <[email protected]> wrote:

> Hi Andrew,
>
> Thank you for pushing my ext3/jbd patches to Linus. However, some of the
> patches in -mm (the following three) haven't been sent to Linus yet.
>
> #jbd-fix-error-handling-for-checkpoint-io.patch: double-check this
> jbd-fix-error-handling-for-checkpoint-io.patch
> ext3-add-checks-for-errors-from-jbd.patch
> jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch
>
> They address filesystem corruption problems, and they are needed by
> mission-critical systems.
>
> Could you tell me what your concern is (you commented "double-check
> this")? Is there anything I can help with? For example, I can provide
> some SystemTap scripts to help with testing.

I forget my reasons for that - would need to go back and review the
discussions when the patch was first merged.

One of the above patches (I forget which) breaks the build because it
expects the presence of Al Viro's VFS tree, and that hasn't been merged
yet. I need to wait and see if that merge will be happening (seems
unlikely) and if not, rework the patch against mainline.

Quite a few patches (5%?) tend to get delayed like this due to missing
dependencies.

2008-10-20 12:41:53

by Theodore Ts'o

Subject: Re: status of the ext3/jbd error handling enhancement patches

On Sun, Oct 19, 2008 at 10:13:31PM -0700, Andrew Morton wrote:
> > #jbd-fix-error-handling-for-checkpoint-io.patch: double-check this
> > jbd-fix-error-handling-for-checkpoint-io.patch
> > ext3-add-checks-for-errors-from-jbd.patch
> > jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch
> >
> > They address filesystem corruption problems, and they are needed by
> > mission-critical systems.
> >
> > Could you tell me what your concern is (you commented "double-check
> > this")? Is there anything I can help with? For example, I can provide
> > some SystemTap scripts to help with testing.
>
> I forget my reasons for that - would need to go back and review the
> discussions when the patch was first merged.
>
> One of the above patches (I forget which) breaks the build because it
> expects the presence of Al Viro's VFS tree, and that hasn't been merged
> yet. I need to wait and see if that merge will be happening (seems
> unlikely) and if not, rework the patch against mainline.

Strange; I don't recall any of the ext4 variants of the error handling
patches requiring Al's VFS tree. And I thought all of them have been
merged into Linus's tree at this point.

Kawai-san, could you double check and see if I processed all of your
patches and pushed them to Linus? I was pretty sure I had...

- Ted

2008-10-20 17:17:19

by Simon Kirby

Subject: Re: status of the ext3/jbd error handling enhancement patches

On Mon, Oct 20, 2008 at 01:25:01PM +0900, Hidehiro Kawai wrote:

> Hi Andrew,
>
> Thank you for pushing my ext3/jbd patches to Linus. However, some of the
> patches in -mm (the following three) haven't been sent to Linus yet.
>
> #jbd-fix-error-handling-for-checkpoint-io.patch: double-check this
> jbd-fix-error-handling-for-checkpoint-io.patch
> ext3-add-checks-for-errors-from-jbd.patch
> jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch
>
> They address filesystem corruption problems, and they are needed by
> mission-critical systems.

Hi Hidehiro,

It seems you have been looking at the code behind the problems I reported
(see the linux-ext4 post "EXT3 way too happy with write errors", October
14th).

Are you aware of any patches that look at failed writes outside of JBD,
where write errors also go unnoticed? It seems that not all write errors
cause EXT3 to abort the journal, which seems like a very bad idea (there
is an example in my previous posting, from testing with fault injection).

Anyway, I would be very happy to test out any patches in this area, and
if none exist, I will try to track down why it is ignoring some of these
errors.

Thanks,

Simon-

2008-10-21 06:24:56

by Hidehiro Kawai

Subject: Re: status of the ext3/jbd error handling enhancement patches

Theodore Tso wrote:

> On Sun, Oct 19, 2008 at 10:13:31PM -0700, Andrew Morton wrote:
>
>>> #jbd-fix-error-handling-for-checkpoint-io.patch: double-check this
>>> jbd-fix-error-handling-for-checkpoint-io.patch
>>> ext3-add-checks-for-errors-from-jbd.patch
>>> jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch
>>>
>>>They address filesystem corruption problems, and they are needed by
>>>mission-critical systems.
>>>
>>>Could you tell me what your concern is (you commented "double-check
>>>this")? Is there anything I can help with? For example, I can provide
>>>some SystemTap scripts to help with testing.
>>
>>I forget my reasons for that - would need to go back and review the
>>discussions when the patch was first merged.
>>
>>One of the above patches (I forget which) breaks the build because it
>>expects the presence of Al Viro's VFS tree, and that hasn't been merged
>>yet. I need to wait and see if that merge will be happening (seems
>>unlikely) and if not, rework the patch against mainline.

I see. Thank you for your work.

> Strange; I don't recall any of the ext4 variants of the error handling
> patches requiring Al's VFS tree. And I thought all of them have been
> merged into Linus's tree at this point.

I checked Al's VFS tree and found that commit
6ac465f99b29f74ca5a62bc32a8772985d9a071b changes code near a spot that
one of my patches modifies (in ext3/ext4_quota_on()), but not the same code.

> Kawai-san, could you double check and see if I processed all of your
> patches and pushed them to Linus? I was pretty sure I had...

I confirmed that all of my ext4/jbd2 patches have been merged into
mainline. Thank you very much!

Regards,
--
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center


2008-10-21 06:37:51

by Andrew Morton

Subject: Re: status of the ext3/jbd error handling enhancement patches

On Tue, 21 Oct 2008 15:24:47 +0900 Hidehiro Kawai <[email protected]> wrote:

> >>One of the above patches (I forget which) breaks the build because it
> >>expects the presence of Al Viro's VFS tree, and that hasn't been merged
> >>yet. I need to wait and see if that merge will be happening (seems
> >>unlikely) and if not, rework the patch against mainline.
>
> I see. Thank you for your work.
>
> > Strange; I don't recall any of the ext4 variants of the error handling
> > patches requiring Al's VFS tree. And I thought all of them have been
> > merged into Linus's tree at this point.
>
> I checked Al's VFS tree and found that commit
> 6ac465f99b29f74ca5a62bc32a8772985d9a071b changes code near a spot that
> one of my patches modifies (in ext3/ext4_quota_on()), but not the same code.

OK.

Don't worry about it - I'll fix things up some time this week.

2008-10-22 04:54:43

by Hidehiro Kawai

Subject: Re: status of the ext3/jbd error handling enhancement patches

Hi Simon,

Simon Kirby wrote:

> Hi Hidehiro,
>
> It seems you have been looking at the code behind the problems I reported
> (see the linux-ext4 post "EXT3 way too happy with write errors", October
> 14th).
>
> Are you aware of any patches that look at failed writes outside of JBD,
> where write errors also go unnoticed? It seems that not all write errors
> cause EXT3 to abort the journal, which seems like a very bad idea (there
> is an example in my previous posting, from testing with fault injection).

Which kernel did you use for testing? If you use the latest -mm kernel
(2.6.27-rc5-mm1) with the patch at
http://userweb.kernel.org/~akpm/mmotm/broken-out/jbd-test-bh_write_eio-to-detect-errors-on-metadata-buffers.patch,
most of these problems may be solved. But some additional work is still
needed; nobody checks for I/O errors when updating the journal superblock.
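The patch name states the idea: when checkpointing, test each metadata buffer's write-error state (BH_Write_EIO) to decide whether its write to disk failed, and abort the journal if so. The remark above adds that a journal superblock update needs the same kind of check. Below is a minimal user-space C model under those assumptions. struct meta_buf, struct journal, abort_journal(), check_checkpoint_io() and handle_sb_write_result() are hypothetical illustrations, not jbd's actual code or API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a journaled metadata buffer (NOT the kernel's
 * struct buffer_head). "uptodate" and "write_eio" loosely mirror the
 * BH_Uptodate and BH_Write_EIO state bits and are tracked independently. */
struct meta_buf {
    unsigned long blocknr;
    bool uptodate;   /* in-memory contents are valid */
    bool write_eio;  /* the last write of this buffer to disk failed */
};

/* Hypothetical journal state: once aborted, nothing more may commit. */
struct journal {
    bool aborted;
};

/* Mark the journal aborted; report only the first error. */
static void abort_journal(struct journal *j, const char *what,
                          unsigned long blocknr)
{
    if (!j->aborted)
        fprintf(stderr, "I/O error writing %s (block %lu): aborting journal\n",
                what, blocknr);
    j->aborted = true;
}

/* After checkpoint writeback, scan the buffers and abort on any write
 * error. The decision uses write_eio, not uptodate: in this model a buffer
 * can be uptodate in memory even though its write to disk failed. */
static int check_checkpoint_io(struct journal *j,
                               const struct meta_buf *bufs, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].write_eio) {
            abort_journal(j, "metadata buffer", bufs[i].blocknr);
            ret = -EIO;
        }
    }
    return ret;
}

/* Superblock updates need the same treatment. write_result stands for the
 * (hypothetical) superblock write's return value: 0 or a negative errno.
 * Ignoring it would let a failed update go unnoticed. */
static int handle_sb_write_result(struct journal *j, unsigned long sb_blocknr,
                                  int write_result)
{
    if (write_result)
        abort_journal(j, "journal superblock", sb_blocknr);
    return write_result;
}
```

Keeping the abort decision in one helper means every write path, checkpoint or superblock, ends in the same state once an error is seen, which is the behaviour the thread is asking for.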

> Anyway, I would be very happy to test out any patches in this area, and
> if none exist, I will try to track down why it is ignoring some of these
> errors.

Thanks,
--
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center