2004-10-05 15:08:24

by Jeffrey Mahoney

Subject: [PATCH 0/4] I/O Error Handling for ReiserFS v3

Hey all -

One of the most common complaints I've heard about ReiserFS is how
graceless it is in handling critical I/O errors.

ext[23] can handle I/O errors anywhere, with the results being up to the
system admin to determine: continue, go read only, or panic.

ReiserFS doesn't offer the admin any such choice, instead panicking on
any I/O error in the journal.

With these patches, the available options for ReiserFS are read only or
panic, since ReiserFS does not currently support operation without the
journal.
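
As a rough illustration of the three policies, here is a minimal user-space
sketch. It is not code from the patches: the enum, the struct and both
function names are hypothetical, and abort() merely stands in for a kernel
panic. It assumes a filesystem that cannot run without its journal refuses
the "continue" policy outright.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy names, for illustration only. */
enum io_error_policy {
    ON_ERROR_CONTINUE,  /* keep going despite the error */
    ON_ERROR_READONLY,  /* switch the filesystem to read-only */
    ON_ERROR_PANIC      /* stop the machine */
};

/* Hypothetical stand-in for a mounted filesystem. */
struct fake_fs {
    int journal_required;        /* 1 if the fs cannot operate unjournaled */
    int read_only;               /* 1 once the fs has gone read-only */
    enum io_error_policy policy;
};

/* Returns 0 if the policy was accepted, -1 if the fs cannot honour it. */
int set_error_policy(struct fake_fs *fs, enum io_error_policy p)
{
    assert(fs != NULL);
    if (p == ON_ERROR_CONTINUE && fs->journal_required)
        return -1;               /* "continue" would mean running unjournaled */
    fs->policy = p;
    return 0;
}

/* Applies the policy after a critical I/O error.
 * Returns 0 if the caller may carry on, -1 if the fs is now read-only. */
int handle_io_error(struct fake_fs *fs, const char *what)
{
    assert(fs != NULL);
    fprintf(stderr, "I/O error: %s\n", what);
    switch (fs->policy) {
    case ON_ERROR_CONTINUE:
        return 0;
    case ON_ERROR_READONLY:
        fs->read_only = 1;
        return -1;
    case ON_ERROR_PANIC:
        abort();                 /* stand-in for panic() */
    }
    return -1;
}
```

With journal_required set, set_error_policy() rejects ON_ERROR_CONTINUE,
which mirrors why only read-only and panic are offered here.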

In the four messages that follow, you'll find:
* reiserfs-cleanup-buffer-heads.diff
  - Cleans up handling of buffer head bitfields - uses
    the kernel-supplied BUFFER_FNS macros instead.
* reiserfs-cleanup-sb-journal.diff
  - Cleans up access to the journal structure, preferring
    to create a temporary variable in functions that access
    the journal structure non-trivially. Should make 0 difference
    at compile time.
* reiserfs-io-error-handling.diff
  - Allows ReiserFS to gracefully handle I/O errors in critical
    code paths. The admin has the option to go read-only or panic.
    Since ReiserFS has no option to ignore the use of the journal,
    the "continue" method is not enabled.
* reiserfs-write-lock.diff
  - Fixes two missing reiserfs_write_unlock() calls on error paths
    that are unrelated to reiserfs-io-error-handling.diff
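
For readers unfamiliar with the buffer head macros mentioned in the first
item, the sketch below shows the general idea: one macro invocation
generates the set/clear/test accessors for a state bit. It is modelled on
the kernel's BUFFER_FNS(bit, name) but is a simplified user-space version.
The struct is a stand-in, the bit names are hypothetical, and the accessors
here are not atomic, unlike the kernel's.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    unsigned long b_state;   /* one bit per buffer flag */
};

/* Hypothetical flag bits, for illustration only. */
enum {
    BH_Demo_Dirty = 0,
    BH_Demo_Prepared = 1
};

/* Generates set_buffer_<name>(), clear_buffer_<name>() and buffer_<name>()
 * for the bit BH_<bit>. Plain (non-atomic) bit operations in this sketch. */
#define DEMO_BUFFER_FNS(bit, name)                                   \
static inline void set_buffer_##name(struct buffer_head *bh)         \
{                                                                    \
    bh->b_state |= 1UL << BH_##bit;                                  \
}                                                                    \
static inline void clear_buffer_##name(struct buffer_head *bh)       \
{                                                                    \
    bh->b_state &= ~(1UL << BH_##bit);                               \
}                                                                    \
static inline int buffer_##name(const struct buffer_head *bh)        \
{                                                                    \
    return (bh->b_state >> BH_##bit) & 1UL;                          \
}

DEMO_BUFFER_FNS(Demo_Dirty, demo_dirty)
DEMO_BUFFER_FNS(Demo_Prepared, demo_prepared)
```

Callers then write set_buffer_demo_dirty(bh) or test buffer_demo_dirty(bh)
instead of open-coding the bit manipulation at each site.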

These patches have seen a lot of testing in the SuSE Linux Enterprise
Server 9 kernel, and are considered ready for mainline.

They've also received approval[1] from the ReiserFS maintainers.

Andrew - Apologies for the previous format; please apply.

Thanks.

-Jeff

[1] http://marc.theaimsgroup.com/?l=reiserfs&m=109587254714180

--
Jeff Mahoney
SuSE Labs


2004-10-05 16:00:35

by Hans Reiser

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

Jeffrey Mahoney wrote:

>Hey all -
>
>One of the most common complaints I've heard about ReiserFS is how
>graceless it is in handling critical I/O errors.

I would like to thank Jeff for writing these. They are much needed.

2004-10-05 15:50:48

by Hans Reiser

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

These have received design approval from zam (and thus me), but zam, did
they receive stress testing by Elena under your guidance?

Hans


2004-10-05 17:30:39

by Alexander Zarochentsev

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

On Tue, Oct 05, 2004 at 08:44:22AM -0700, Hans Reiser wrote:
> These have received design approval from zam (and thus me), but zam, did
> they receive stress testing by Elena under your guidance?

No. We have a long queue of test tasks: fsck.reiser4 testing,
reiser4/dmapper crashes and the benchmarks are all in the queue.


--
Alex.

2004-10-05 17:41:12

by Hans Reiser

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

Alex Zarochentsev wrote:

>On Tue, Oct 05, 2004 at 08:44:22AM -0700, Hans Reiser wrote:
>
>>These have received design approval from zam (and thus me), but zam, did
>>they receive stress testing by Elena under your guidance?
>
>No. We have a long queue of test tasks. There are fsck.reiser4 testing,
>reiser4/dmapper crashes and the benchmarks in the queue.

Well, we cannot let our process be a barrier to good patches getting in,
so let me ask, Jeff, did you test each of these conditions you
improved? How? Did anyone else test them?


2004-10-05 18:17:52

by Jeffrey Mahoney

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hans Reiser wrote:
| Alex Zarochentsev wrote:
|
|> On Tue, Oct 05, 2004 at 08:44:22AM -0700, Hans Reiser wrote:
|>
|>
|>> These have received design approval from zam (and thus me), but zam,
|>> did they receive stress testing by Elena under your guidance?
|>>
|>
|>
|> No. We have a long queue of test tasks. There are fsck.reiser4 testing,
|> reiser4/dmapper crashes and the benchmarks in the queue.
|>
| Well, we cannot let our process be a barrier to good patches getting in,
| so let me ask, Jeff, did you test each of these conditions you
| improved? How? Did anyone else test them?

The "testing" version of the code had another conditional added to
each of the !buffer_uptodate tests that allowed me to trigger I/O error
handling at each error point. The I/O error path is obviously more
difficult to test in real-world conditions, as I/O errors could be caused
by any number of failures.

The testing was done using fsx-linux, the LTP fsstress program, and
stress.sh, sometimes all at once.

This code has also been active in the SUSE Linux Enterprise Server 9
kernel for some time and has seen real-world testing to show that the
normal path is still working as expected.

The end result for the I/O error path is that the write operations still
happen, but the commit block is never written, so the outcome is
essentially the same as a power outage at the point of failure. The
filesystem then stays read-only until the user decides to umount and
correct the problem that caused the I/O error in the first place.
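
The reason a missing commit block is safe can be sketched with a generic
write-ahead journal replay loop. This is an illustration of the general
technique only, not ReiserFS's replay code; the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical on-disk transaction summary. */
struct transaction {
    int id;
    int has_commit_block;   /* 0 if writing stopped before the commit block */
};

/* Replays transactions in log order and stops at the first one without a
 * commit block. Returns how many were replayed; *last_applied receives the
 * id of the last replayed transaction and is left untouched if none were. */
size_t replay_journal(const struct transaction *log, size_t n,
                      int *last_applied)
{
    size_t i;

    assert(log != NULL && last_applied != NULL);
    for (i = 0; i < n; i++) {
        if (!log[i].has_commit_block)
            break;          /* incomplete: treated as if it never happened */
        *last_applied = log[i].id;
    }
    return i;
}
```

A transaction cut short by an I/O error looks, at replay time, just like
one cut short by a power loss: it has no commit block, so it is skipped.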

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBYuYfLPWxlyuTD7IRAt1OAJ9RgkYWrCKikftGephpWWGlS+acSQCgjDwm
cxcXvSVyldRsJZdagvatw0Y=
=DuY9
-----END PGP SIGNATURE-----

2004-10-05 18:42:25

by Hans Reiser

Subject: Re: [PATCH 0/4] I/O Error Handling for ReiserFS v3

Well, in a perfect world with all the resources we deserve, we would
have a second person test it. This is the real world though, and Elena
is backed up with things to test, and so we should just take the patch.
Thanks Jeff.

Hans
