Improve the error handling when writes to a swap page fail.
Currently, the kernel repeatedly retries the write, which is unlikely
ever to succeed. Instead, we allow the page to be unused and then mark
it as bad, which prevents reuse. It should hopefully be suitable for
testing in -mm.
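As a rough illustration of the idea only (a userspace toy model, not code from the patches): every name below is hypothetical, and the bad-slot sentinel is merely modelled on the kernel's SWAP_MAP_BAD marker. A slot whose write failed is marked bad when its last reference is dropped, and the allocator then skips it.

```c
#include <assert.h>

#define NR_SLOTS   8
#define SLOT_FREE  0
#define SLOT_BAD   0x8000  /* hypothetical sentinel: slot must never be reused */

/* Toy stand-in for a swap map: one use count per swap slot. */
static unsigned short swap_map[NR_SLOTS];

/* Allocate the first free slot and return its index, or -1 if none is
 * free. Bad slots are skipped because their entry is not SLOT_FREE. */
static int toy_get_swap_slot(void)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        if (swap_map[i] == SLOT_FREE) {
            swap_map[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Drop one reference to a slot. If that was the last reference and the
 * write to the slot had failed, mark the slot bad instead of free. */
static void toy_swap_free(int slot, int write_failed)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    assert(swap_map[slot] != SLOT_FREE && swap_map[slot] != SLOT_BAD);

    if (--swap_map[slot] == 0 && write_failed)
        swap_map[slot] = SLOT_BAD;
}
```

In this model a failed slot is simply lost until the map is rebuilt; recording bad slots permanently (for example in the swap header) is a separate step.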
Hugh Dickins (on a previous incarnation of this series):
> No, not this way, I'm afraid. Sorry, I don't remember the prior
> discussion on LKML, must have flooded past when my attention was
> elsewhere.
I think you were cc'd on some of it but you never commented. Anyhow,
I've reworked this patch series based on your comments. The hints were
appreciated, thanks. This was the way I'd originally hoped to be able to
work things, I just couldn't find the right way to do it.
> Is it worth doing this at all? Probably, but I've no experience
> whatsoever of swap write errors, so it's hard for me to judge: my
> guess is that many cases would turn out to be software errors (e.g.
> lower level needing more memory to perform the write). But you'd
> be right to counter: let's assume they're hardware errors, and
> then fix up any software errors when reported.
I have a swap block driver where hardware write errors are more likely
and hence have a need to handle them more gracefully than IO loops. It
seems like a good idea to avoid the IO loops anyway.
> If it is worth doing this, then you'll need to add code to write
> back the swap header, to note the bad pages permanently: you may
> well have been waiting to see what reception the patches so far
> get, before embarking on that.
You can't proceed to do that until you're able to identify the bad pages
so this would be a necessary first step towards that, yes.
> I was uneasy with 2/4, wondered if swap_free(entry, page) would
> be a better direction to go than your swap_free_markbad(entry).
Agreed, see the following 1/4.
Patch 4/4 in this series is optional but it's appended in hope. It cleans
up code at the expense of what looks like a performance optimisation. I
found the code as it stands rather confusing as a newcomer to that code.
Richard
Richard Purdie wrote:
>>No, not this way, I'm afraid. Sorry, I don't remember the prior
>>discussion on LKML, must have flooded past when my attention was
>>elsewhere.
>
>
> I think you were cc'd on some of it but you never commented. Anyhow,
> I've reworked this patch series based on your comments. The hints were
> appreciated, thanks. This was the way I'd originally hoped to be able to
> work things, I just couldn't find the right way to do it.
IMO it seems a bit complex for so small a benefit. Last time I was
working on this, I thought it would be almost as good to do something
simple like stop trying to write out the page if PG_error is set (and
clear that bit in delete_from_swap_cache or try_to_unuse somewhere).
This way the admin could swapoff and scan the swap device at some
point.
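A minimal sketch of that simpler approach, as a userspace toy model rather than kernel code: the struct, the flag and all function names are hypothetical stand-ins, and the device write is hard-wired to fail. A page with its error flag set is never written again until the flag is cleared when the page leaves the swap cache.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PG_ERROR (1u << 0)  /* hypothetical stand-in for PG_error */

struct toy_page {
    unsigned int flags;
    int write_attempts;         /* how many times the device was asked to write */
};

/* Hypothetical device write that always fails, to model a bad block. */
static bool toy_device_write(struct toy_page *p)
{
    p->write_attempts++;
    return false;
}

/* Try to write the page out. Returns true if a write was issued,
 * false if it was skipped because an earlier write already failed. */
static bool toy_writepage(struct toy_page *p)
{
    assert(p);
    if (p->flags & TOY_PG_ERROR)
        return false;               /* do not retry: avoids the IO loop */
    if (!toy_device_write(p))
        p->flags |= TOY_PG_ERROR;   /* remember the failure */
    return true;
}

/* Clear the error flag when the page leaves the swap cache, so a later
 * write to a different slot is allowed. */
static void toy_delete_from_swap_cache(struct toy_page *p)
{
    assert(p);
    p->flags &= ~TOY_PG_ERROR;
}
```

Note that this model stops the retries but does nothing to keep the failing slot from being handed out again; finding the bad blocks is left to the admin, as described above.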
>>Is it worth doing this at all? Probably, but I've no experience
>>whatsoever of swap write errors, so it's hard for me to judge: my
>>guess is that many cases would turn out to be software errors (e.g.
>>lower level needing more memory to perform the write). But you'd
>>be right to counter: let's assume they're hardware errors, and
>>then fix up any software errors when reported.
>
>
> I have a swap block driver where hardware write errors are more likely
> and hence have a need to handle them more gracefully than IO loops. It
> seems like a good idea to avoid the IO loops anyway.
>
>
>>If it is worth doing this, then you'll need to add code to write
>>back the swap header, to note the bad pages permanently: you may
>>well have been waiting to see what reception the patches so far
>>get, before embarking on that.
>
>
> You can't proceed to do that until you're able to identify the bad pages
> so this would be a necessary first step towards that, yes.
Agreed here, FWIW. I think that might be just as well done in
userspace?
Nick
--
SUSE Labs, Novell Inc.
On Thu, 2007-01-11 at 09:32 +1100, Nick Piggin wrote:
> Richard Purdie wrote:
> > I think you were cc'd on some of it but you never commented. Anyhow,
> > I've reworked this patch series based on your comments. The hints were
> > appreciated, thanks. This was the way I'd originally hoped to be able to
> > work things, I just couldn't find the right way to do it.
>
> IMO it seems a bit complex for so small a benefit. Last time I was
> working on this, I thought it would be almost as good to do something
> simple like stop trying to write out the page if PG_error is set (and
> clear that bit in delete_from_swap_cache or try_to_unuse somewhere).
> This way the admin could swapoff and scan the swap device at some
> point.
FWIW, the patches have got a lot less invasive and I was pleased with
the way the last set I posted worked out.
1/4 is a lot of noise adding the page parameter to swap_free but doesn't
actually change much.
2/4 is the guts of the solution in the form of two new functions.
3/4 just hooks it in.
4/4 is an optional cleanup.
I guess the key point for me is that the lack of proper handling of this
was bringing one of my systems to its knees due to IO loops before these
patches. Yes, there are ways to minimise the impact but why not fix it
properly? This proposal is certainly nowhere near as invasive as the
previous ones and since it's got this far...
> > You can't proceed to do that until you're able to identify the bad pages
> > so this would be a necessary first step towards that, yes.
>
> Agreed here, FWIW. I think that might be just as well done in
> userspace?
Maybe, I haven't made my mind up about that yet. I'd have to see how the
code looked I guess.
Cheers,
Richard