2006-05-20 13:03:44

by Peter Lundkvist

[permalink] [raw]
Subject: [PATCH] Page writeback broken after resume: wb_timer lost

Hi,
I have noticed for some time that nr_dirty never drops but
increases except when VM pressure forces it down. This only
occurs after a resume, never on a freshly booted system.

It seems the wb_timer is lost when the timer function is
trying to start a frozen pdflush thread, and this occurs
during suspend or resume.

I have included a patch which work for me. Don't know if the
test also should include a check for freezing to be safe, ie
if ( !frozen(..) && !freezing(..) )



diff -ru linux-2.6.17.org/mm/pdflush.c linux-2.6.17/mm/pdflush.c
--- linux-2.6.17.org/mm/pdflush.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.17/mm/pdflush.c 2006-05-20 14:22:35.000000000 +0200
@@ -213,12 +213,16 @@
struct pdflush_work *pdf;

pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
- list_del_init(&pdf->list);
- if (list_empty(&pdflush_list))
- last_empty_jifs = jiffies;
- pdf->fn = fn;
- pdf->arg0 = arg0;
- wake_up_process(pdf->who);
+ if (!frozen(pdf->who)) {
+ list_del_init(&pdf->list);
+ if (list_empty(&pdflush_list))
+ last_empty_jifs = jiffies;
+ pdf->fn = fn;
+ pdf->arg0 = arg0;
+ wake_up_process(pdf->who);
+ }
+ else
+ ret = -1;
spin_unlock_irqrestore(&pdflush_lock, flags);
}
return ret;

--
Peter Lundkvist


2006-05-20 17:37:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Page writeback broken after resume: wb_timer lost

Peter Lundkvist <[email protected]> wrote:
>
> Hi,
> I have noticed for some time that nr_dirty never drops but
> increases except when VM pressure forces it down. This only
> occurs after a resume, never on a freshly booted system.
>
> It seems the wb_timer is lost when the timer function is
> trying to start a frozen pdflush thread, and this occurs
> during suspend or resume.
>
> I have included a patch which work for me. Don't know if the
> test also should include a check for freezing to be safe, ie
> if ( !frozen(..) && !freezing(..) )
>
>
>
> diff -ru linux-2.6.17.org/mm/pdflush.c linux-2.6.17/mm/pdflush.c
> --- linux-2.6.17.org/mm/pdflush.c 2006-03-20 06:53:29.000000000 +0100
> +++ linux-2.6.17/mm/pdflush.c 2006-05-20 14:22:35.000000000 +0200
> @@ -213,12 +213,16 @@
> struct pdflush_work *pdf;
>
> pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
> - list_del_init(&pdf->list);
> - if (list_empty(&pdflush_list))
> - last_empty_jifs = jiffies;
> - pdf->fn = fn;
> - pdf->arg0 = arg0;
> - wake_up_process(pdf->who);
> + if (!frozen(pdf->who)) {
> + list_del_init(&pdf->list);
> + if (list_empty(&pdflush_list))
> + last_empty_jifs = jiffies;
> + pdf->fn = fn;
> + pdf->arg0 = arg0;
> + wake_up_process(pdf->who);
> + }
> + else
> + ret = -1;
> spin_unlock_irqrestore(&pdflush_lock, flags);
> }
> return ret;

Maybe the code over in page-writeback.c should just rearm the timee within
the timer handler rather than waiting for a pdflush thread to do it. I'll
think about that.

But the main questions is: what on earth is going on here? We've taken a
kernel thread and we've done a wake_up_process() on it, but because it was
in a frozen state it just never gets to run, even after the resume.
Presumably it goes back into interruptible sleep after the resume. We took
it off the list (in the expectation that it'd run again) so we've lost
control of it.

Pavel, Rafael: this amounts to a lost wakeup. What's the story?

2006-05-20 22:51:01

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Page writeback broken after resume: wb_timer lost

Hi!

> > I have noticed for some time that nr_dirty never drops but
> > increases except when VM pressure forces it down. This only
> > occurs after a resume, never on a freshly booted system.
> >
> > It seems the wb_timer is lost when the timer function is
> > trying to start a frozen pdflush thread, and this occurs
> > during suspend or resume.
> >
> > I have included a patch which work for me. Don't know if the
> > test also should include a check for freezing to be safe, ie
> > if ( !frozen(..) && !freezing(..) )

Yep, I have seen this too. Sync took *way* too long and I believe I
lost some data because of this problem.

> Maybe the code over in page-writeback.c should just rearm the timee within
> the timer handler rather than waiting for a pdflush thread to do it. I'll
> think about that.
>
> But the main questions is: what on earth is going on here? We've taken a
> kernel thread and we've done a wake_up_process() on it, but because it was
> in a frozen state it just never gets to run, even after the resume.
> Presumably it goes back into interruptible sleep after the resume. We took
> it off the list (in the expectation that it'd run again) so we've lost
> control of it.

I guess you should not try to wake up process while it is frozen. Such
wakeups are likely to get lost. Should we add some BUG_ON() somewhere?

...we have to eat some wakeups, because we fake some.

Or perhaps we should do WARN_ON(frozen(current)) just after schedule()
below?

> Pavel, Rafael: this amounts to a lost wakeup. What's the story?

Pavel

Refrigerator looks like this:

/* Refrigerator is place where frozen processes are stored :-). */
void refrigerator(void)
{
/* Hmm, should we be allowed to suspend when there are
realtime
processes around? */
long save;
save = current->state;
pr_debug("%s entered refrigerator\n", current->comm);
printk("=");

frozen_process(current);
spin_lock_irq(&current->sighand->siglock);
recalc_sigpending(); /* We sent fake signal, clean it up */
spin_unlock_irq(&current->sighand->siglock);

while (frozen(current)) {
current->state = TASK_UNINTERRUPTIBLE;
schedule();
}
pr_debug("%s left refrigerator\n", current->comm);
current->state = save;
}

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-05-21 00:13:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Page writeback broken after resume: wb_timer lost

Pavel Machek <[email protected]> wrote:
>
> Refrigerator looks like this:
>
> /* Refrigerator is place where frozen processes are stored :-). */
> void refrigerator(void)
> {
> /* Hmm, should we be allowed to suspend when there are
> realtime
> processes around? */
> long save;
> save = current->state;
> pr_debug("%s entered refrigerator\n", current->comm);
> printk("=");
>
> frozen_process(current);
> spin_lock_irq(&current->sighand->siglock);
> recalc_sigpending(); /* We sent fake signal, clean it up */
> spin_unlock_irq(&current->sighand->siglock);
>
> while (frozen(current)) {
> current->state = TASK_UNINTERRUPTIBLE;
> schedule();
> }
> pr_debug("%s left refrigerator\n", current->comm);
> current->state = save;
> }

Well that's a crock, isn't it?


Peter, does this fix it?


From: Andrew Morton <[email protected]>

pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that it
hasn't been given any work to do then this is considered an error.

That all broke when swsusp came along - because a timer-delivered wakeup to a
frozen pdflush thread will just get lost. This causes the pdflush thread to
get lost as well: the writeback timer is supposed to be re-armed by pdflush in
process context, but pdflush doesn't execute the callout which does this.

Fix that up by ignoring the return value from try_to_freeze(): jsut proceed,
see if we have any work pending and only go back to sleep if that is not the
case.


Signed-off-by: Andrew Morton <[email protected]>
---

mm/pdflush.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)

diff -puN mm/pdflush.c~pdflush-handle-resume-wakeups mm/pdflush.c
--- devel/mm/pdflush.c~pdflush-handle-resume-wakeups 2006-05-20 17:02:21.000000000 -0700
+++ devel-akpm/mm/pdflush.c 2006-05-20 17:11:25.000000000 -0700
@@ -104,13 +104,8 @@ static int __pdflush(struct pdflush_work
list_move(&my_work->list, &pdflush_list);
my_work->when_i_went_to_sleep = jiffies;
spin_unlock_irq(&pdflush_lock);
-
schedule();
- if (try_to_freeze()) {
- spin_lock_irq(&pdflush_lock);
- continue;
- }
-
+ try_to_freeze();
spin_lock_irq(&pdflush_lock);
if (!list_empty(&my_work->list)) {
printk("pdflush: bogus wakeup!\n");
@@ -118,7 +113,7 @@ static int __pdflush(struct pdflush_work
continue;
}
if (my_work->fn == NULL) {
- printk("pdflush: NULL work function\n");
+ printk("pflush: resuming\n");
continue;
}
spin_unlock_irq(&pdflush_lock);
_



2006-05-21 06:53:00

by Peter Lundkvist

[permalink] [raw]
Subject: Re: [PATCH] Page writeback broken after resume: wb_timer lost

On Sat, May 20, 2006 at 05:12:44PM -0700, Andrew Morton wrote:
>
> Peter, does this fix it?
>

Yes, it does fix the problem.

Hopefully we'll see this fix in the next stable update, because
of the risk of data loss.

>
> From: Andrew Morton <[email protected]>
>
> pdflush is carefully designed to ensure that all wakeups have some
> corresponding work to do - if a woken-up pdflush thread discovers that it
> hasn't been given any work to do then this is considered an error.
>
> That all broke when swsusp came along - because a timer-delivered wakeup to a
> frozen pdflush thread will just get lost. This causes the pdflush thread to
> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
> process context, but pdflush doesn't execute the callout which does this.
>
> Fix that up by ignoring the return value from try_to_freeze(): jsut proceed,
> see if we have any work pending and only go back to sleep if that is not the
> case.
>
>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/pdflush.c | 9 ++-------
> 1 files changed, 2 insertions(+), 7 deletions(-)
>
> diff -puN mm/pdflush.c~pdflush-handle-resume-wakeups mm/pdflush.c
> --- devel/mm/pdflush.c~pdflush-handle-resume-wakeups 2006-05-20 17:02:21.000000000 -0700
> +++ devel-akpm/mm/pdflush.c 2006-05-20 17:11:25.000000000 -0700
> @@ -104,13 +104,8 @@ static int __pdflush(struct pdflush_work
> list_move(&my_work->list, &pdflush_list);
> my_work->when_i_went_to_sleep = jiffies;
> spin_unlock_irq(&pdflush_lock);
> -
> schedule();
> - if (try_to_freeze()) {
> - spin_lock_irq(&pdflush_lock);
> - continue;
> - }
> -
> + try_to_freeze();
> spin_lock_irq(&pdflush_lock);
> if (!list_empty(&my_work->list)) {
> printk("pdflush: bogus wakeup!\n");
> @@ -118,7 +113,7 @@ static int __pdflush(struct pdflush_work
> continue;
> }
> if (my_work->fn == NULL) {
> - printk("pdflush: NULL work function\n");
> + printk("pflush: resuming\n");
> continue;
> }
> spin_unlock_irq(&pdflush_lock);
> _
>
>
>

2006-05-21 10:09:04

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Page writeback broken after resume: wb_timer lost

Hi!

> Well that's a crock, isn't it?
>
>
> Peter, does this fix it?
>
>
> From: Andrew Morton <[email protected]>
>
> pdflush is carefully designed to ensure that all wakeups have some
> corresponding work to do - if a woken-up pdflush thread discovers that it
> hasn't been given any work to do then this is considered an error.
>
> That all broke when swsusp came along - because a timer-delivered wakeup to a
> frozen pdflush thread will just get lost. This causes the pdflush thread to
> get lost as well: the writeback timer is supposed to be re-armed by pdflush in
> process context, but pdflush doesn't execute the callout which does this.
>
> Fix that up by ignoring the return value from try_to_freeze(): jsut proceed,
> see if we have any work pending and only go back to sleep if that is not the
> case.

Looks okay to me.

Pavel
(who wonders what are the other places where we have similar
problems)
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html