LinuxLists.cc - Bisected: s2disk (uswsusp only) hangs just before poweroff

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

Justin P. Mattock wrote:
> On 12/01/09 11:59, Alan Jenkins wrote:
>> Hi
>>
>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
>> got around to bisecting it, which blamed the following commit by Mel:
>>
>> 5f8dcc2 "page-allocator: split per-cpu list into
>> one-list-per-migrate-type"
>>
>> I was able to confirm this by reverting the commit, which fixed the
>> hang. I had to revert one other commit first to avoid a conflict:
>>
>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>> the PCP"
>>
>> -- detail --
>>
>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>> out the hibernation image. The system is still able to resume from this
>> image (after working around the hang by pressing the power button).
>> This is specific to s2disk from the uswsusp package (which is now
>> installed by default on debian unstable). It doesn't happen if I
>> uninstall uswsusp and use the in-kernel suspend instead.
>>
>> The hang doesn't happen if I boot with "init=/bin/bash" and run s2disk.
>> Nor does it happen if I boot normally, then switch to single user mode
>> ("telinit 12").
>>
>> It only happens if I've logged in to KDE. In the past, this has
>> indicated a problem in a network driver, since NetworkManager only made
>> a connection once I logged in. But it still hangs if I remove both ath5k
>> and atl2 before I log into KDE. (I actually tried removing as many
>> modules as possible: atl2, ath5k, usbcore, snd-hda-intel, psmouse,
>> pcspkr, battery, ac, themal, fan, and eeepc-laptop). Perhaps it's
>> something to do with the size of the hibernation image.
>>
>> -- confidence in the bisection result --
>>
>> The randomness was a bit annoying, but it's not too bad. The hang would
>> normally show up in the first 3 hibernation cycles; I don't remember
>> having to wait more than 6.
>>
>> I wrote a script to do s2disk + rtcwake so I could leave it testing
>> without constantly hitting the power button. This let me test 2.6.32-rc8
>> with the reverts for at least 20 hibernation cycles.
>>
>> Regards
>> Alan
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
> hmm.. maybe this is what I'm seeing over here
> i.g. on my macbook(ati gpu) the first s2ram will
> lagg for about 30/40 secs before waking up
> then every other cycle afterwards reacts as it should.
> Threw in kernels 2.6.30,31 before doing anything
> but showed the same results, leading me to beleive maybe
> I have some userspace issue(libx86)going on.
>
> I'll try reverting that commit to see.
>
> Thanks
>
> Justin P. Mattock

I don't expect so. My problem doesn't affect s2ram. And as far as I
can tell, my hang is forever... it's more than just 40 seconds.

Regards
Alan

2009-12-01 21:14:18

by Justin P. Mattock

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On 12/01/09 12:27, Alan Jenkins wrote:
> Justin P. Mattock wrote:
>> On 12/01/09 11:59, Alan Jenkins wrote:
>>> Hi
>>>
>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
>>> got around to bisecting it, which blamed the following commit by Mel:
>>>
>>> 5f8dcc2 "page-allocator: split per-cpu list into
>>> one-list-per-migrate-type"
>>>
>>> I was able to confirm this by reverting the commit, which fixed the
>>> hang. I had to revert one other commit first to avoid a conflict:
>>>
>>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>>> the PCP"
>>>
>>> -- detail --
>>>
>>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>>> out the hibernation image. The system is still able to resume from this
>>> image (after working around the hang by pressing the power button).
>>> This is specific to s2disk from the uswsusp package (which is now
>>> installed by default on debian unstable). It doesn't happen if I
>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>
>>> The hang doesn't happen if I boot with "init=/bin/bash" and run s2disk.
>>> Nor does it happen if I boot normally, then switch to single user mode
>>> ("telinit 12").
>>>
>>> It only happens if I've logged in to KDE. In the past, this has
>>> indicated a problem in a network driver, since NetworkManager only made
>>> a connection once I logged in. But it still hangs if I remove both ath5k
>>> and atl2 before I log into KDE. (I actually tried removing as many
>>> modules as possible: atl2, ath5k, usbcore, snd-hda-intel, psmouse,
>>> pcspkr, battery, ac, themal, fan, and eeepc-laptop). Perhaps it's
>>> something to do with the size of the hibernation image.
>>>
>>> -- confidence in the bisection result --
>>>
>>> The randomness was a bit annoying, but it's not too bad. The hang would
>>> normally show up in the first 3 hibernation cycles; I don't remember
>>> having to wait more than 6.
>>>
>>> I wrote a script to do s2disk + rtcwake so I could leave it testing
>>> without constantly hitting the power button. This let me test 2.6.32-rc8
>>> with the reverts for at least 20 hibernation cycles.
>>>
>>> Regards
>>> Alan
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-kernel" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>>>
>>
>> hmm.. maybe this is what I'm seeing over here
>> i.g. on my macbook(ati gpu) the first s2ram will
>> lagg for about 30/40 secs before waking up
>> then every other cycle afterwards reacts as it should.
>> Threw in kernels 2.6.30,31 before doing anything
>> but showed the same results, leading me to beleive maybe
>> I have some userspace issue(libx86)going on.
>>
>> I'll try reverting that commit to see.
>>
>> Thanks
>>
>> Justin P. Mattock
>
> I don't expect so. My problem doesn't affect s2ram. And as far as I can
> tell, my hang is forever... it's more than just 40 seconds.
>
> Regards
> Alan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

then weve two totally different issues.

looking more into it, I think my problem
is somewhere in udev i.e.
I get stuckage during boot becuase of some
rule in rules.d for about the same time
as what I see with s2ram.

my guess udev is sitting with some command
in *.rules in of which doesn't
know what to do(reason for 30 secs)before
going onto the next task.
(but could be wrong).

interesting thing is my other machine
all though different, is running the same
udev and rules with no problem.

Justin P. Mattock

2009-12-01 21:45:31

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
> Hi
>
> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
> got around to bisecting it, which blamed the following commit by Mel:
>
> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>
> I was able to confirm this by reverting the commit, which fixed the
> hang. I had to revert one other commit first to avoid a conflict:
>
> a6f9edd "page-allocator: maintain rolling count of pages to free from
> the PCP"
>

Which RC kernel? Specifically, are the commits

cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER

applied?

The latter one in particular might make a difference if s2disk is
pushing the system far below the watermarks. I don't suppose you know
where it's hanging? i.e. is it hanging in the allocator itself?

If those patches are applied, then one difference that 5f8dcc2 makes is
that pages on the PCP lists but not of the right migratetype are not
used. Prior to that commit, an allocation might succeed even if the
buddy lists were empty because one of the other PCP page types would be
used.

> -- detail --
>
> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
> out the hibernation image. The system is still able to resume from this
> image (after working around the hang by pressing the power button).
>
> This is specific to s2disk from the uswsusp package (which is now
> installed by default on debian unstable). It doesn't happen if I
> uninstall uswsusp and use the in-kernel suspend instead.
>

This leads me to believe that uswsusp is able to push available pages
far below what is expected. It's a total guess though, I have no idea
how uswsusp is implemented or how it differs from what is in kernel.

> The hang doesn't happen if I boot with "init=/bin/bash" and run s2disk.
> Nor does it happen if I boot normally, then switch to single user mode
> ("telinit 12").
>
> It only happens if I've logged in to KDE. In the past, this has
> indicated a problem in a network driver, since NetworkManager only made
> a connection once I logged in. But it still hangs if I remove both
> ath5k and atl2 before I log into KDE. (I actually tried removing as
> many modules as possible: atl2, ath5k, usbcore, snd-hda-intel, psmouse,
> pcspkr, battery, ac, themal, fan, and eeepc-laptop). Perhaps it's
> something to do with the size of the hibernation image.
>

I believe you are correct in that it's something to do with the size of
the hibernation image and how close to the edge the kernel gets pushed
as a result.

Please confirm first that the two commits I mentioned above are in your
kernel. If not, would you mind trying the following patch?
Unfortunately, it's totally untested. The intention of the patch is to
use other PCP lists if the desired one cannot be refilled.

Thanks.

==== CUT HERE ====
page allocator: Allow use of other pcp lists if the buddy lists are depleted

Commit [5f8dcc2 page-allocator: split per-cpu list into
one-list-per-migrate-type] has been identified as a cause of lockups with
the uswsusp package. One possibility is that something in uswsusp allows
watermarks to be pushed far below expectations. Prior to this commit, a
failure to get new pages for the PCP lists could result in pages of another
migratetype being used instead. This patch attempts to restore that behaviour
in case it was required for uswsusp. It's untested.

Experimental-patch-by: Mel Gorman <[email protected]>
---
page_alloc.c | 36 ++++++++++++++++++++++++++++++++++--
1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2bc2ac6..a485963 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1171,6 +1171,35 @@ void split_page(struct page *page, unsigned int order)
}

/*
+ * If the PCP lists for the desired migrate type are empty but the buddy
+ * allocator cannot allocate more pages, then this function is used to
+ * allocate a page of another migratetype. This impacts anti-fragmentation
+ * but in the event the system has totally locked itself up, it saves
+ * itself from deadlock
+ */
+static noinline struct list_head *find_next_pcplist(struct per_cpu_pages *pcp,
+ int start_migratetype)
+{
+ int i, migratetype;
+ struct list_head *list;
+
+ for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+ migratetype = fallbacks[start_migratetype][i];
+ if (migratetype >= MIGRATE_PCPTYPES) {
+ list = NULL;
+ break;
+ }
+
+ list = &pcp->lists[migratetype];
+
+ if (!list_empty(list))
+ break;
+ }
+
+ return list;
+}
+
+/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
* or two.
@@ -1198,8 +1227,11 @@ again:
pcp->count += rmqueue_bulk(zone, 0,
pcp->batch, list,
migratetype, cold);
- if (unlikely(list_empty(list)))
- goto failed;
+ if (unlikely(list_empty(list))) {
+ list = find_next_pcplist(pcp, migratetype);
+ if (!list)
+ goto failed;
+ }
}

if (cold)

2009-12-01 21:52:53

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On Tuesday 01 December 2009, Mel Gorman wrote:
> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
> > Hi
> >
> > Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
> > got around to bisecting it, which blamed the following commit by Mel:
> >
> > 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
> >
> > I was able to confirm this by reverting the commit, which fixed the
> > hang. I had to revert one other commit first to avoid a conflict:
> >
> > a6f9edd "page-allocator: maintain rolling count of pages to free from
> > the PCP"
> >
>
> Which RC kernel? Specifically, are the commits
>
> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>
> applied?
>
> The latter one in particular might make a difference if s2disk is
> pushing the system far below the watermarks. I don't suppose you know
> where it's hanging? i.e. is it hanging in the allocator itself?
>
> If those patches are applied, then one difference that 5f8dcc2 makes is
> that pages on the PCP lists but not of the right migratetype are not
> used. Prior to that commit, an allocation might succeed even if the
> buddy lists were empty because one of the other PCP page types would be
> used.
>
> > -- detail --
> >
> > When I suspend my EeePc 701 to disk, it sometimes hangs after writing
> > out the hibernation image. The system is still able to resume from this
> > image (after working around the hang by pressing the power button).
> >
> > This is specific to s2disk from the uswsusp package (which is now
> > installed by default on debian unstable). It doesn't happen if I
> > uninstall uswsusp and use the in-kernel suspend instead.
> >
>
> This leads me to believe that uswsusp is able to push available pages
> far below what is expected. It's a total guess though, I have no idea
> how uswsusp is implemented or how it differs from what is in kernel.

It doesn't differ at all in that respect. Actually, it uses the same code, but
the distro configuration may be such that it leaves fewer available pages
than the default in-kernel hibernation.

Thanks,
Rafael

2009-12-02 08:57:07

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

Mel Gorman wrote:
> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>
>> Hi
>>
>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
>> got around to bisecting it, which blamed the following commit by Mel:
>>
>> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>>
>> I was able to confirm this by reverting the commit, which fixed the
>> hang. I had to revert one other commit first to avoid a conflict:
>>
>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>> the PCP"
>>
>>
>
> Which RC kernel? Specifically, are the commits
>
> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>
> applied?
>
> The latter one in particular might make a difference if s2disk is
> pushing the system far below the watermarks. I don't suppose you know
> where it's hanging? i.e. is it hanging in the allocator itself?
>
> If those patches are applied, then one difference that 5f8dcc2 makes is
> that pages on the PCP lists but not of the right migratetype are not
> used. Prior to that commit, an allocation might succeed even if the
> buddy lists were empty because one of the other PCP page types would be
> used.
>
>
>> -- detail --
>>
>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>> out the hibernation image. The system is still able to resume from this
>> image (after working around the hang by pressing the power button).
>>
>> This is specific to s2disk from the uswsusp package (which is now
>> installed by default on debian unstable). It doesn't happen if I
>> uninstall uswsusp and use the in-kernel suspend instead.
>>
>>
>
> This leads me to believe that uswsusp is able to push available pages
> far below what is expected. It's a total guess though, I have no idea
> how uswsusp is implemented or how it differs from what is in kernel.
>
>
>> The hang doesn't happen if I boot with "init=/bin/bash" and run s2disk.
>> Nor does it happen if I boot normally, then switch to single user mode
>> ("telinit 12").
>>
>> It only happens if I've logged in to KDE. In the past, this has
>> indicated a problem in a network driver, since NetworkManager only made
>> a connection once I logged in. But it still hangs if I remove both
>> ath5k and atl2 before I log into KDE. (I actually tried removing as
>> many modules as possible: atl2, ath5k, usbcore, snd-hda-intel, psmouse,
>> pcspkr, battery, ac, themal, fan, and eeepc-laptop). Perhaps it's
>> something to do with the size of the hibernation image.
>>
>>
>
> I believe you are correct in that it's something to do with the size of
> the hibernation image and how close to the edge the kernel gets pushed
> as a result.
>
> Please confirm first that the two commits I mentioned above are in your
> kernel. If not, would you mind trying the following patch?
> Unfortunately, it's totally untested. The intention of the patch is to
> use other PCP lists if the desired one cannot be refilled.
>
> Thanks.
>

The hang happens on 2.6.32-rc8, which includes the two commits above.

Thanks!
Alan

2009-12-02 10:35:39

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On Wed, Dec 02, 2009 at 08:57:05AM +0000, Alan Jenkins wrote:
> Mel Gorman wrote:
>> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>>
>>> Hi
>>>
>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I
>>> finally got around to bisecting it, which blamed the following
>>> commit by Mel:
>>>
>>> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>>>
>>> I was able to confirm this by reverting the commit, which fixed the
>>> hang. I had to revert one other commit first to avoid a conflict:
>>>
>>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>>> the PCP"
>>>
>>>
>>
>> Which RC kernel? Specifically, are the commits
>>
>> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
>> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>>
>> applied?
>>
>> The latter one in particular might make a difference if s2disk is
>> pushing the system far below the watermarks. I don't suppose you know
>> where it's hanging? i.e. is it hanging in the allocator itself?
>>
>> If those patches are applied, then one difference that 5f8dcc2 makes is
>> that pages on the PCP lists but not of the right migratetype are not
>> used. Prior to that commit, an allocation might succeed even if the
>> buddy lists were empty because one of the other PCP page types would be
>> used.
>>
>>
>>> -- detail --
>>>
>>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>>> out the hibernation image. The system is still able to resume from
>>> this image (after working around the hang by pressing the power
>>> button).
>>>
>>> This is specific to s2disk from the uswsusp package (which is now
>>> installed by default on debian unstable). It doesn't happen if I
>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>
>>>
>>
>> This leads me to believe that uswsusp is able to push available pages
>> far below what is expected. It's a total guess though, I have no idea
>> how uswsusp is implemented or how it differs from what is in kernel.
>>
>>
>>> The hang doesn't happen if I boot with "init=/bin/bash" and run
>>> s2disk. Nor does it happen if I boot normally, then switch to
>>> single user mode ("telinit 12").
>>>
>>> It only happens if I've logged in to KDE. In the past, this has
>>> indicated a problem in a network driver, since NetworkManager only
>>> made a connection once I logged in. But it still hangs if I remove
>>> both ath5k and atl2 before I log into KDE. (I actually tried
>>> removing as many modules as possible: atl2, ath5k, usbcore,
>>> snd-hda-intel, psmouse, pcspkr, battery, ac, themal, fan, and
>>> eeepc-laptop). Perhaps it's something to do with the size of the
>>> hibernation image.
>>>
>>>
>>
>> I believe you are correct in that it's something to do with the size of
>> the hibernation image and how close to the edge the kernel gets pushed
>> as a result.
>>
>> Please confirm first that the two commits I mentioned above are in your
>> kernel. If not, would you mind trying the following patch?
>> Unfortunately, it's totally untested. The intention of the patch is to
>> use other PCP lists if the desired one cannot be refilled.
>>
>> Thanks.
>>
>
> The hang happens on 2.6.32-rc8, which includes the two commits above.
>

Ok, that was somewhat expected as they only had an impact if there was a
storm of interrupts which was unlikely in this case. How about the
additional patch?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-02 11:11:11

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

2009-12-02 11:35:04

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

Mel Gorman wrote:
> On Wed, Dec 02, 2009 at 08:57:05AM +0000, Alan Jenkins wrote:
>
>> Mel Gorman wrote:
>>
>>> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I
>>>> finally got around to bisecting it, which blamed the following
>>>> commit by Mel:
>>>>
>>>> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>>>>
>>>> I was able to confirm this by reverting the commit, which fixed the
>>>> hang. I had to revert one other commit first to avoid a conflict:
>>>>
>>>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>>>> the PCP"
>>>>
>>>>
>>>>
>>> Which RC kernel? Specifically, are the commits
>>>
>>> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
>>> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>>>
>>> applied?
>>>
>>> The latter one in particular might make a difference if s2disk is
>>> pushing the system far below the watermarks. I don't suppose you know
>>> where it's hanging? i.e. is it hanging in the allocator itself?
>>>
>>> If those patches are applied, then one difference that 5f8dcc2 makes is
>>> that pages on the PCP lists but not of the right migratetype are not
>>> used. Prior to that commit, an allocation might succeed even if the
>>> buddy lists were empty because one of the other PCP page types would be
>>> used.
>>>
>>>
>>>
>>>> -- detail --
>>>>
>>>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>>>> out the hibernation image. The system is still able to resume from
>>>> this image (after working around the hang by pressing the power
>>>> button).
>>>>
>>>> This is specific to s2disk from the uswsusp package (which is now
>>>> installed by default on debian unstable). It doesn't happen if I
>>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>>
>>>>
>>>>
>>> This leads me to believe that uswsusp is able to push available pages
>>> far below what is expected. It's a total guess though, I have no idea
>>> how uswsusp is implemented or how it differs from what is in kernel.
>>>
>>>
>>>
>>>> The hang doesn't happen if I boot with "init=/bin/bash" and run
>>>> s2disk. Nor does it happen if I boot normally, then switch to
>>>> single user mode ("telinit 12").
>>>>
>>>> It only happens if I've logged in to KDE. In the past, this has
>>>> indicated a problem in a network driver, since NetworkManager only
>>>> made a connection once I logged in. But it still hangs if I remove
>>>> both ath5k and atl2 before I log into KDE. (I actually tried
>>>> removing as many modules as possible: atl2, ath5k, usbcore,
>>>> snd-hda-intel, psmouse, pcspkr, battery, ac, themal, fan, and
>>>> eeepc-laptop). Perhaps it's something to do with the size of the
>>>> hibernation image.
>>>>
>>>>
>>>>
>>> I believe you are correct in that it's something to do with the size of
>>> the hibernation image and how close to the edge the kernel gets pushed
>>> as a result.
>>>
>>> Please confirm first that the two commits I mentioned above are in your
>>> kernel. If not, would you mind trying the following patch?
>>> Unfortunately, it's totally untested. The intention of the patch is to
>>> use other PCP lists if the desired one cannot be refilled.
>>>
>>> Thanks.
>>>
>>>
>> The hang happens on 2.6.32-rc8, which includes the two commits above.
>>
>>
>
> Ok, that was somewhat expected as they only had an impact if there was a
> storm of interrupts which was unlikely in this case. How about the
> additional patch?
>

The patch doesn't help. I still see the same hang (and the hung task
backtraces look the same as before).

Thanks
Alan

2009-12-02 11:49:46

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

Rafael J. Wysocki wrote:
> On Tuesday 01 December 2009, Mel Gorman wrote:
>
>> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>>
>>> Hi
>>>
>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I finally
>>> got around to bisecting it, which blamed the following commit by Mel:
>>>
>>> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>>>
>>> I was able to confirm this by reverting the commit, which fixed the
>>> hang. I had to revert one other commit first to avoid a conflict:
>>>
>>> a6f9edd "page-allocator: maintain rolling count of pages to free from
>>> the PCP"
>>>
>>>
>> Which RC kernel? Specifically, are the commits
>>
>> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
>> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>>
>> applied?
>>
>> The latter one in particular might make a difference if s2disk is
>> pushing the system far below the watermarks. I don't suppose you know
>> where it's hanging? i.e. is it hanging in the allocator itself?
>>
>> If those patches are applied, then one difference that 5f8dcc2 makes is
>> that pages on the PCP lists but not of the right migratetype are not
>> used. Prior to that commit, an allocation might succeed even if the
>> buddy lists were empty because one of the other PCP page types would be
>> used.
>>
>>
>>> -- detail --
>>>
>>> When I suspend my EeePc 701 to disk, it sometimes hangs after writing
>>> out the hibernation image. The system is still able to resume from this
>>> image (after working around the hang by pressing the power button).
>>>
>>> This is specific to s2disk from the uswsusp package (which is now
>>> installed by default on debian unstable). It doesn't happen if I
>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>
>>>
>> This leads me to believe that uswsusp is able to push available pages
>> far below what is expected. It's a total guess though, I have no idea
>> how uswsusp is implemented or how it differs from what is in kernel.
>>
>
> It doesn't differ at all in that respect. Actually, it uses the same code, but
> the distro configuration may be such that it leaves fewer available pages
> than the default in-kernel hibernation.
>
> Thanks,
> Rafael
>

It seems unintuitive that lack of memory is a problem _after we've
written out the hibernation image_. The backtrace I captured shows the
hang happens within hibernation_platform_enter()...

Hmm. Doesn't the in-kernel suspend free the in-memory image before
powering off?

int hibernate(void)
...
pr_debug("PM: writing image.\n");
error = swsusp_write(flags);
swsusp_free();
if (!error)
power_down();

Would that explain why only uswsusp is affected? Do we want to fix
snapshot_read() in user.c, so that it calls swsusp_free() once all the
data has been read?

Regards
Alan

2009-12-02 12:20:20

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On Wed, Dec 02, 2009 at 11:49:47AM +0000, Alan Jenkins wrote:
> Rafael J. Wysocki wrote:
>> On Tuesday 01 December 2009, Mel Gorman wrote:
>>
>>> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>>>
>>>> Hi
>>>>
>>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I
>>>> finally got around to bisecting it, which blamed the following
>>>> commit by Mel:
>>>>
>>>> 5f8dcc2 "page-allocator: split per-cpu list into one-list-per-migrate-type"
>>>>
>>>> I was able to confirm this by reverting the commit, which fixed the
>>>> hang. I had to revert one other commit first to avoid a conflict:
>>>>
>>>> a6f9edd "page-allocator: maintain rolling count of pages to free
>>>> from the PCP"
>>>>
>>>>
>>> Which RC kernel? Specifically, are the commits
>>>
>>> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when restarting an allocation attempt
>>> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use ALLOC_HARDER
>>>
>>> applied?
>>>
>>> The latter one in particular might make a difference if s2disk is
>>> pushing the system far below the watermarks. I don't suppose you know
>>> where it's hanging? i.e. is it hanging in the allocator itself?
>>>
>>> If those patches are applied, then one difference that 5f8dcc2 makes is
>>> that pages on the PCP lists but not of the right migratetype are not
>>> used. Prior to that commit, an allocation might succeed even if the
>>> buddy lists were empty because one of the other PCP page types would be
>>> used.
>>>
>>>
>>>> -- detail --
>>>>
>>>> When I suspend my EeePc 701 to disk, it sometimes hangs after
>>>> writing out the hibernation image. The system is still able to
>>>> resume from this image (after working around the hang by pressing
>>>> the power button).
>>>>
>>>> This is specific to s2disk from the uswsusp package (which is now
>>>> installed by default on debian unstable). It doesn't happen if I
>>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>>
>>>>
>>> This leads me to believe that uswsusp is able to push available pages
>>> far below what is expected. It's a total guess though, I have no idea
>>> how uswsusp is implemented or how it differs from what is in kernel.
>>>
>>
>> It doesn't differ at all in that respect. Actually, it uses the same code, but
>> the distro configuration may be such that it leaves fewer available pages
>> than the default in-kernel hibernation.
>>
>> Thanks,
>> Rafael
>>
>
> It seems unintuitive that lack of memory is a problem _after we've
> written out the hibernation image_. The backtrace I captured shows the
> hang happens within hibernation_platform_enter()...
>

I think the backtrace is also showing that it's trying to create a kernel
thread. For this to be getting locked up, memory must be exceptionally
tight. One thing that the patch changes is that in certain circumstances,
an additional 128K of memory per-CPU could be on each the PCP lists.

Ordinarily it doesn't matter because reclaim would resolve the situation
or the PCP lists would be drained very shortly after. However, if the
CPUs were no longer being used but still have pages pinned, it could be
causing a problem.

> Hmm. Doesn't the in-kernel suspend free the in-memory image before
> powering off?
>
> int hibernate(void)
> ...
> pr_debug("PM: writing image.\n");
> error = swsusp_write(flags);
> swsusp_free();
> if (!error)
> power_down();
>
>
>
> Would that explain why only uswsusp is affected? Do we want to fix
> snapshot_read() in user.c, so that it calls swsusp_free() once all the
> data has been read?
>

Could you try it please?

Another possibility would be to call drain_all_pages() before powering
off. If that makes a difference, it would confirm that pages are pinned
on PCP lists of inactive processors.

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-02 14:25:24

[permalink] [raw]

Subject: Re: Bisected: s2disk (uswsusp only) hangs just before poweroff

On 12/2/09, Mel Gorman <[email protected]> wrote:
> On Wed, Dec 02, 2009 at 11:49:47AM +0000, Alan Jenkins wrote:
>> Rafael J. Wysocki wrote:
>>> On Tuesday 01 December 2009, Mel Gorman wrote:
>>>
>>>> On Tue, Dec 01, 2009 at 07:59:40PM +0000, Alan Jenkins wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Suspend to disk is (sometimes) hanging for me in 2.6.32-rc. I
>>>>> finally got around to bisecting it, which blamed the following
>>>>> commit by Mel:
>>>>>
>>>>> 5f8dcc2 "page-allocator: split per-cpu list into
>>>>> one-list-per-migrate-type"
>>>>>
>>>>> I was able to confirm this by reverting the commit, which fixed the
>>>>> hang. I had to revert one other commit first to avoid a conflict:
>>>>>
>>>>> a6f9edd "page-allocator: maintain rolling count of pages to free
>>>>> from the PCP"
>>>>>
>>>>>
>>>> Which RC kernel? Specifically, are the commits
>>>>
>>>> cc4a6851466039a8a688c843962a05689059ff3b always wake kswapd when
>>>> restarting an allocation attempt
>>>> 9d0ed60fe9cd1fbf57f755cd27a23ae9114d7210 Do not allow interrupts to use
>>>> ALLOC_HARDER
>>>>
>>>> applied?
>>>>
>>>> The latter one in particular might make a difference if s2disk is
>>>> pushing the system far below the watermarks. I don't suppose you know
>>>> where it's hanging? i.e. is it hanging in the allocator itself?
>>>>
>>>> If those patches are applied, then one difference that 5f8dcc2 makes is
>>>> that pages on the PCP lists but not of the right migratetype are not
>>>> used. Prior to that commit, an allocation might succeed even if the
>>>> buddy lists were empty because one of the other PCP page types would be
>>>> used.
>>>>
>>>>
>>>>> -- detail --
>>>>>
>>>>> When I suspend my EeePc 701 to disk, it sometimes hangs after
>>>>> writing out the hibernation image. The system is still able to
>>>>> resume from this image (after working around the hang by pressing
>>>>> the power button).
>>>>>
>>>>> This is specific to s2disk from the uswsusp package (which is now
>>>>> installed by default on debian unstable). It doesn't happen if I
>>>>> uninstall uswsusp and use the in-kernel suspend instead.
>>>>>
>>>>>
>>>> This leads me to believe that uswsusp is able to push available pages
>>>> far below what is expected. It's a total guess though, I have no idea
>>>> how uswsusp is implemented or how it differs from what is in kernel.
>>>>
>>>
>>> It doesn't differ at all in that respect. Actually, it uses the same
>>> code, but
>>> the distro configuration may be such that it leaves fewer available pages
>>> than the default in-kernel hibernation.
>>>
>>> Thanks,
>>> Rafael
>>>
>>
>> It seems unintuitive that lack of memory is a problem _after we've
>> written out the hibernation image_. The backtrace I captured shows the
>> hang happens within hibernation_platform_enter()...
>>
>
> I think the backtrace is also showing that it's trying to create a kernel
> thread. For this to be getting locked up, memory must be exceptionally
> tight. One thing that the patch changes is that in certain circumstances,
> an additional 128K of memory per-CPU could be on each the PCP lists.
>
> Ordinarily it doesn't matter because reclaim would resolve the situation
> or the PCP lists would be drained very shortly after. However, if the
> CPUs were no longer being used but still have pages pinned, it could be
> causing a problem.
>
>> Hmm. Doesn't the in-kernel suspend free the in-memory image before
>> powering off?
>>
>> int hibernate(void)
>> ...
>> pr_debug("PM: writing image.\n");
>> error = swsusp_write(flags);
>> swsusp_free();
>> if (!error)
>> power_down();
>>
>>
>>
>> Would that explain why only uswsusp is affected? Do we want to fix
>> snapshot_read() in user.c, so that it calls swsusp_free() once all the
>> data has been read?
>>
>
> Could you try it please?

Yes, that fixes it. I left it running over lunch, and it did 24
hibernations cycles without hanging. I'll post it and we'll see what
Rafael thinks. It's only four lines of code, and I think there's a
strong case for it.

> Another possibility would be to call drain_all_pages() before powering
> off. If that makes a difference, it would confirm that pages are pinned
> on PCP lists of inactive processors.

Probably not, since this is a single processor machine :). It's the
original EeePC model with a Celeron processor, no fancy dual cores or
hyperthreading.

Thanks
Alan

2009-12-02 14:28:12

[permalink] [raw]

Subject: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

The original in-kernel suspend (swsusp) frees the in-memory hibernation
image before powering off the machine. s2disk doesn't, so there is
_much_ less free memory when it tries to power off.

This is a gratuitous difference. The userspace suspend interface
/dev/snapshot only allows the hibernation image to be read once.
Once the s2disk program has read the last page, we can free the entire
image.

This avoids a hang after writing the hibernation image which was
triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
"page-allocator: split per-cpu list into one-list-per-migrate-type":

[top of trace lost due to screen height]
? shrink_zone
? try_to_free_pages
? isolate_pages_global
? __alloc_pages_nodemask
? kthread
? __get_free_pages
? copy_process
? kthread
? do_fork
...

INFO: task s2disk:2036 blocked for more than 120 seconds
...
Call Trace:
...
? wait_for_common
? default_wake_function
? kthread_create
? worker_thread
? create_workqueue_thread
? worker_thread
? __create_workqueue_key
? stop_machine_create
? disable_nonboot_cpus
? hibernation_platform_enter
? snapshot_ioctl
...
? sys_ioctl
...

Signed-off-by: Alan Jenkins <[email protected]>
---
kernel/power/user.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index bf0014d..94d0210 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -165,6 +165,10 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
res = -EFAULT;
else
*offp = data->handle.offset;
+ } else {
+ swsusp_free();
+ memset(&data->handle, 0, sizeof(struct snapshot_handle));
+ data->ready = 0;
}

Unlock:
--
1.6.3.3

2009-12-02 21:11:10

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> The original in-kernel suspend (swsusp) frees the in-memory hibernation
> image before powering off the machine. s2disk doesn't, so there is
> _much_ less free memory when it tries to power off.
>
> This is a gratuitous difference. The userspace suspend interface
> /dev/snapshot only allows the hibernation image to be read once.
> Once the s2disk program has read the last page, we can free the entire
> image.
>
> This avoids a hang after writing the hibernation image which was
> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> "page-allocator: split per-cpu list into one-list-per-migrate-type":

Yes, you work around page-allocator hang. But is it right thing to do?

I mean... Power down should not be too memory-critical. Why can't new
page allocator handle that?

(And yes, when page allocator is fixed we may want to do something
like that, but...)

Pavel

> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -165,6 +165,10 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
> res = -EFAULT;
> else
> *offp = data->handle.offset;
> + } else {
> + swsusp_free();
> + memset(&data->handle, 0, sizeof(struct snapshot_handle));
> + data->ready = 0;
> }
>
> Unlock:

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-12-02 21:46:39

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wednesday 02 December 2009, Alan Jenkins wrote:
> The original in-kernel suspend (swsusp) frees the in-memory hibernation
> image before powering off the machine. s2disk doesn't, so there is
> _much_ less free memory when it tries to power off.
>
> This is a gratuitous difference. The userspace suspend interface
> /dev/snapshot only allows the hibernation image to be read once.
> Once the s2disk program has read the last page, we can free the entire
> image.
>
> This avoids a hang after writing the hibernation image which was
> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> "page-allocator: split per-cpu list into one-list-per-migrate-type":
>
> [top of trace lost due to screen height]
> ? shrink_zone
> ? try_to_free_pages
> ? isolate_pages_global
> ? __alloc_pages_nodemask
> ? kthread
> ? __get_free_pages
> ? copy_process
> ? kthread
> ? do_fork
> ...
>
> INFO: task s2disk:2036 blocked for more than 120 seconds
> ...
> Call Trace:
> ...
> ? wait_for_common
> ? default_wake_function
> ? kthread_create
> ? worker_thread
> ? create_workqueue_thread
> ? worker_thread
> ? __create_workqueue_key
> ? stop_machine_create
> ? disable_nonboot_cpus
> ? hibernation_platform_enter
> ? snapshot_ioctl
> ...
> ? sys_ioctl
> ...
>
> Signed-off-by: Alan Jenkins <[email protected]>

Thanks for tracking this, great job!

I'm going to push this patch to Linus in the next batch (before 2.6.32 if I
manage to).

Best,
Rafael

> ---
> kernel/power/user.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index bf0014d..94d0210 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -165,6 +165,10 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
> res = -EFAULT;
> else
> *offp = data->handle.offset;
> + } else {
> + swsusp_free();
> + memset(&data->handle, 0, sizeof(struct snapshot_handle));
> + data->ready = 0;
> }
>
> Unlock:
>

2009-12-02 22:07:18

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> > The original in-kernel suspend (swsusp) frees the in-memory hibernation
> > image before powering off the machine. s2disk doesn't, so there is
> > _much_ less free memory when it tries to power off.
> >
> > This is a gratuitous difference. The userspace suspend interface
> > /dev/snapshot only allows the hibernation image to be read once.
> > Once the s2disk program has read the last page, we can free the entire
> > image.
> >
> > This avoids a hang after writing the hibernation image which was
> > triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> > "page-allocator: split per-cpu list into one-list-per-migrate-type":
>
> Yes, you work around page-allocator hang. But is it right thing to do?
>

What's wrong with it? The hang is likely because the allocator has no
memory to work with. The patch in question makes small changes to the
amount of available memory but it shouldn't matter on uni-core. Some
structures are slightly larger but it's extremely borderline. I'm at a
loss to explain actually why it makes a difference untill things were
extremely borderline to begin with.

> I mean... Power down should not be too memory-critical. Why can't new
> page allocator handle that?
>

The allocator isn't new. I wouldn't have expected power-down to make any
allocation at all but clearly it is.

> (And yes, when page allocator is fixed we may want to do something
> like that, but...)
>

At the moment, I'm not sure what needs fixing in there. There is no
memory and it gets stuck in a loop. I better stand up if I'm going to
pull something out of my ass :/

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-02 22:15:31

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
> On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> > On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> > > The original in-kernel suspend (swsusp) frees the in-memory hibernation
> > > image before powering off the machine. s2disk doesn't, so there is
> > > _much_ less free memory when it tries to power off.
> > >
> > > This is a gratuitous difference. The userspace suspend interface
> > > /dev/snapshot only allows the hibernation image to be read once.
> > > Once the s2disk program has read the last page, we can free the entire
> > > image.
> > >
> > > This avoids a hang after writing the hibernation image which was
> > > triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> > > "page-allocator: split per-cpu list into one-list-per-migrate-type":
> >
> > Yes, you work around page-allocator hang. But is it right thing to do?
> >
>
> What's wrong with it? The hang is likely because the allocator has no
> memory to work with. The patch in question makes small changes to the
> amount of available memory but it shouldn't matter on uni-core. Some
> structures are slightly larger but it's extremely borderline. I'm at a
> loss to explain actually why it makes a difference untill things were
> extremely borderline to begin with.

We reserve 4MB, for such purposes, and we already wrote image to disk
with such constrains, so memory should not be _too_ tight.

Can you try increasing PAGES_FOR_IO to 8MB or something like that?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-12-02 22:25:17

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
> On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
> > On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> > > On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> > > > The original in-kernel suspend (swsusp) frees the in-memory hibernation
> > > > image before powering off the machine. s2disk doesn't, so there is
> > > > _much_ less free memory when it tries to power off.
> > > >
> > > > This is a gratuitous difference. The userspace suspend interface
> > > > /dev/snapshot only allows the hibernation image to be read once.
> > > > Once the s2disk program has read the last page, we can free the entire
> > > > image.
> > > >
> > > > This avoids a hang after writing the hibernation image which was
> > > > triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> > > > "page-allocator: split per-cpu list into one-list-per-migrate-type":
> > >
> > > Yes, you work around page-allocator hang. But is it right thing to do?
> > >
> >
> > What's wrong with it? The hang is likely because the allocator has no
> > memory to work with. The patch in question makes small changes to the
> > amount of available memory but it shouldn't matter on uni-core. Some
> > structures are slightly larger but it's extremely borderline. I'm at a
> > loss to explain actually why it makes a difference untill things were
> > extremely borderline to begin with.
>
> We reserve 4MB, for such purposes, and we already wrote image to disk
> with such constrains, so memory should not be _too_ tight.
>
> Can you try increasing PAGES_FOR_IO to 8MB or something like that?
>

What's wrong with just freeing the memory that is no longer required?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-02 23:22:13

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wednesday 02 December 2009, Mel Gorman wrote:
> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
> > On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
> > > On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> > > > On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> > > > > The original in-kernel suspend (swsusp) frees the in-memory hibernation
> > > > > image before powering off the machine. s2disk doesn't, so there is
> > > > > _much_ less free memory when it tries to power off.
> > > > >
> > > > > This is a gratuitous difference. The userspace suspend interface
> > > > > /dev/snapshot only allows the hibernation image to be read once.
> > > > > Once the s2disk program has read the last page, we can free the entire
> > > > > image.
> > > > >
> > > > > This avoids a hang after writing the hibernation image which was
> > > > > triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> > > > > "page-allocator: split per-cpu list into one-list-per-migrate-type":
> > > >
> > > > Yes, you work around page-allocator hang. But is it right thing to do?
> > > >
> > >
> > > What's wrong with it? The hang is likely because the allocator has no
> > > memory to work with. The patch in question makes small changes to the
> > > amount of available memory but it shouldn't matter on uni-core. Some
> > > structures are slightly larger but it's extremely borderline. I'm at a
> > > loss to explain actually why it makes a difference untill things were
> > > extremely borderline to begin with.
> >
> > We reserve 4MB, for such purposes, and we already wrote image to disk
> > with such constrains, so memory should not be _too_ tight.
> >
> > Can you try increasing PAGES_FOR_IO to 8MB or something like that?
> >
>
> What's wrong with just freeing the memory that is no longer required?

Nothing, and the Alan's patch is going to Linus.

Thanks,
Rafael

2009-12-03 07:53:06

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Wed 2009-12-02 22:25:16, Mel Gorman wrote:
> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
> > On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
> > > On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> > > > On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> > > > > The original in-kernel suspend (swsusp) frees the in-memory hibernation
> > > > > image before powering off the machine. s2disk doesn't, so there is
> > > > > _much_ less free memory when it tries to power off.
> > > > >
> > > > > This is a gratuitous difference. The userspace suspend interface
> > > > > /dev/snapshot only allows the hibernation image to be read once.
> > > > > Once the s2disk program has read the last page, we can free the entire
> > > > > image.
> > > > >
> > > > > This avoids a hang after writing the hibernation image which was
> > > > > triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> > > > > "page-allocator: split per-cpu list into one-list-per-migrate-type":
> > > >
> > > > Yes, you work around page-allocator hang. But is it right thing to do?
> > > >
> > >
> > > What's wrong with it? The hang is likely because the allocator has no
> > > memory to work with. The patch in question makes small changes to the
> > > amount of available memory but it shouldn't matter on uni-core. Some
> > > structures are slightly larger but it's extremely borderline. I'm at a
> > > loss to explain actually why it makes a difference untill things were
> > > extremely borderline to begin with.
> >
> > We reserve 4MB, for such purposes, and we already wrote image to disk
> > with such constrains, so memory should not be _too_ tight.
> >
> > Can you try increasing PAGES_FOR_IO to 8MB or something like that?
> >
>
> What's wrong with just freeing the memory that is no longer required?

Nothing. But 4MB was enough to power down before, it is not enough
now, and I'd like to understand why.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-12-03 13:03:06

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

Pavel Machek wrote:
> On Wed 2009-12-02 22:25:16, Mel Gorman wrote:
>
>> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
>>
>>> On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
>>>
>>>> On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
>>>>
>>>>> On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
>>>>>
>>>>>> The original in-kernel suspend (swsusp) frees the in-memory hibernation
>>>>>> image before powering off the machine. s2disk doesn't, so there is
>>>>>> _much_ less free memory when it tries to power off.
>>>>>>
>>>>>> This is a gratuitous difference. The userspace suspend interface
>>>>>> /dev/snapshot only allows the hibernation image to be read once.
>>>>>> Once the s2disk program has read the last page, we can free the entire
>>>>>> image.
>>>>>>
>>>>>> This avoids a hang after writing the hibernation image which was
>>>>>> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
>>>>>> "page-allocator: split per-cpu list into one-list-per-migrate-type":
>>>>>>
>>>>> Yes, you work around page-allocator hang. But is it right thing to do?
>>>>>
>>>>>
>>>> What's wrong with it? The hang is likely because the allocator has no
>>>> memory to work with. The patch in question makes small changes to the
>>>> amount of available memory but it shouldn't matter on uni-core. Some
>>>> structures are slightly larger but it's extremely borderline. I'm at a
>>>> loss to explain actually why it makes a difference untill things were
>>>> extremely borderline to begin with.
>>>>
>>> We reserve 4MB, for such purposes, and we already wrote image to disk
>>> with such constrains, so memory should not be _too_ tight.
>>>
>>> Can you try increasing PAGES_FOR_IO to 8MB or something like that?
>>>
>>>
>> What's wrong with just freeing the memory that is no longer required?
>>
>
> Nothing. But 4MB was enough to power down before, it is not enough
> now, and I'd like to understand why.
> Pavel
>

Here's a new datum:

Applying this patch has left a less frequent hang. So far it has
happened twice. (Once playing last night, and once today testing
hibernation with KMS enabled).

This hang happens at a different point. It happens _before_ writing out
the hibernation image. That is, I don't see the textual progress bar,
and if I force a power-cycle then it doesn't resume (and complains about
uncleanly unmounted filesystems).

Here is the backtrace:

[top of screen]
s2disk D c1c05580 0 5988 5809 0x00000000
...
Call Trace:
...
? wait_for_common
? default_wake_function
? kthread_create
? worker_thread
? create_workqueue_thread
? worker_thread
? __create_workqueue_thread
? stop_machine_create
? disable_nonboot_cpus
? hibernation_snapshot
? snapshot_ioctl
...
? sys_ioctl

It looks like hibernation_snapshot() calls disable_nonboot_cpus()
_before_ we allocate the hibernation image. (I.e. before
swsusp_arch_suspend(), which calls swsusp_save()).

So I think Pavel's right, we still need to work out what's happening here.

Regards
Alan

2009-12-03 14:50:20

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Thu, Dec 03, 2009 at 12:57:28PM +0000, Alan Jenkins wrote:
> Pavel Machek wrote:
>> On Wed 2009-12-02 22:25:16, Mel Gorman wrote:
>>
>>> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
>>>
>>>> On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
>>>>
>>>>> On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
>>>>>
>>>>>> On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
>>>>>>
>>>>>>> The original in-kernel suspend (swsusp) frees the in-memory hibernation
>>>>>>> image before powering off the machine. s2disk doesn't, so there is
>>>>>>> _much_ less free memory when it tries to power off.
>>>>>>>
>>>>>>> This is a gratuitous difference. The userspace suspend interface
>>>>>>> /dev/snapshot only allows the hibernation image to be read once.
>>>>>>> Once the s2disk program has read the last page, we can free the entire
>>>>>>> image.
>>>>>>>
>>>>>>> This avoids a hang after writing the hibernation image which was
>>>>>>> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
>>>>>>> "page-allocator: split per-cpu list into one-list-per-migrate-type":
>>>>>>>
>>>>>> Yes, you work around page-allocator hang. But is it right thing to do?
>>>>>>
>>>>>>
>>>>> What's wrong with it? The hang is likely because the allocator has no
>>>>> memory to work with. The patch in question makes small changes to the
>>>>> amount of available memory but it shouldn't matter on uni-core. Some
>>>>> structures are slightly larger but it's extremely borderline. I'm at a
>>>>> loss to explain actually why it makes a difference untill things were
>>>>> extremely borderline to begin with.
>>>>>
>>>> We reserve 4MB, for such purposes, and we already wrote image to disk
>>>> with such constrains, so memory should not be _too_ tight.
>>>>
>>>> Can you try increasing PAGES_FOR_IO to 8MB or something like that?
>>>>
>>>>
>>> What's wrong with just freeing the memory that is no longer required?
>>>
>>
>> Nothing. But 4MB was enough to power down before, it is not enough
>> now, and I'd like to understand why.
>> Pavel
>>
>
> Here's a new datum:
>
> Applying this patch has left a less frequent hang. So far it has
> happened twice. (Once playing last night, and once today testing
> hibernation with KMS enabled).
>
> This hang happens at a different point. It happens _before_ writing out
> the hibernation image. That is, I don't see the textual progress bar,
> and if I force a power-cycle then it doesn't resume (and complains about
> uncleanly unmounted filesystems).
>
> Here is the backtrace:
>
> [top of screen]
> s2disk D c1c05580 0 5988 5809 0x00000000
> ...
> Call Trace:
> ...
> ? wait_for_common
> ? default_wake_function
> ? kthread_create
> ? worker_thread
> ? create_workqueue_thread
> ? worker_thread
> ? __create_workqueue_thread
> ? stop_machine_create
> ? disable_nonboot_cpus
> ? hibernation_snapshot
> ? snapshot_ioctl
> ...
> ? sys_ioctl
>

Can you reconfirm that backing out both of those patches makes this 100%
reliable or is it just a lot harder to trigger. It does not even appear
that it's locked up within the page allocator at this trace message.
Assuming c1c05580 is where it's stuck at, where does addr2line say that
is (requires CONFIG_DEBUG_INFO) ?

> It looks like hibernation_snapshot() calls disable_nonboot_cpus()
> _before_ we allocate the hibernation image. (I.e. before
> swsusp_arch_suspend(), which calls swsusp_save()).
>

I'm not that familiar with the area but considering where we are getting
stuck and what the path affected, I thought it might be CPU related.
There is a patch below that prints debugging messages to show how the
CPU is being taken down with respect to PCP draining in case something
has changed there. It also puts in some debugging code in the most
likely place to be infinite looping due to the patch.

> So I think Pavel's right, we still need to work out what's happening here.
>

Can you apply the following patch please and retry?

Two things to watch out for. First, do either of the BUG_ON triggers?
Second, for the TRACE messages, do they always appear in the order of
"draining pages" and then "deleting pagesets"?

Thanks

==== CUT HERE ====
page allocator,suspend: Debugging patch

---
mm/page_alloc.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b11915d..f36d7bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -557,10 +557,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
zone->pages_scanned = 0;

+ BUG_ON(count > pcp->count);
+
__mod_zone_page_state(zone, NR_FREE_PAGES, count);
while (count) {
struct page *page;
struct list_head *list;
+ int debug_migratetype = -1;

/*
* Remove pages from lists in a round-robin fashion. A
@@ -573,6 +576,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
batch_free++;
if (++migratetype == MIGRATE_PCPTYPES)
migratetype = 0;
+ if (debug_migratetype == -1)
+ debug_migratetype = migratetype;
+ else
+ BUG_ON(migratetype == debug_migratetype);
list = &pcp->lists[migratetype];
} while (list_empty(list));

@@ -3251,6 +3258,7 @@ static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
case CPU_UP_CANCELED_FROZEN:
case CPU_DEAD:
case CPU_DEAD_FROZEN:
+ printk("TRACE: CPU %d deleting pagesets\n", cpu);
free_zone_pagesets(cpu);
break;
default:
@@ -4549,6 +4557,7 @@ static int page_alloc_cpu_notify(struct notifier_block *self,
int cpu = (unsigned long)hcpu;

if (action == CPU_DEAD || action == CPU_DEAD_FROZEN) {
+ printk("TRACE: CPU %d draining pages\n", cpu);
drain_pages(cpu);

/*

2009-12-03 19:49:50

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Thursday 03 December 2009, Alan Jenkins wrote:
> Pavel Machek wrote:
> > On Wed 2009-12-02 22:25:16, Mel Gorman wrote:
> >
> >> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
> >>
> >>> On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
> >>>
> >>>> On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
> >>>>
> >>>>> On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
> >>>>>
> >>>>>> The original in-kernel suspend (swsusp) frees the in-memory hibernation
> >>>>>> image before powering off the machine. s2disk doesn't, so there is
> >>>>>> _much_ less free memory when it tries to power off.
> >>>>>>
> >>>>>> This is a gratuitous difference. The userspace suspend interface
> >>>>>> /dev/snapshot only allows the hibernation image to be read once.
> >>>>>> Once the s2disk program has read the last page, we can free the entire
> >>>>>> image.
> >>>>>>
> >>>>>> This avoids a hang after writing the hibernation image which was
> >>>>>> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
> >>>>>> "page-allocator: split per-cpu list into one-list-per-migrate-type":
> >>>>>>
> >>>>> Yes, you work around page-allocator hang. But is it right thing to do?
> >>>>>
> >>>>>
> >>>> What's wrong with it? The hang is likely because the allocator has no
> >>>> memory to work with. The patch in question makes small changes to the
> >>>> amount of available memory but it shouldn't matter on uni-core. Some
> >>>> structures are slightly larger but it's extremely borderline. I'm at a
> >>>> loss to explain actually why it makes a difference untill things were
> >>>> extremely borderline to begin with.
> >>>>
> >>> We reserve 4MB, for such purposes, and we already wrote image to disk
> >>> with such constrains, so memory should not be _too_ tight.
> >>>
> >>> Can you try increasing PAGES_FOR_IO to 8MB or something like that?
> >>>
> >>>
> >> What's wrong with just freeing the memory that is no longer required?
> >>
> >
> > Nothing. But 4MB was enough to power down before, it is not enough
> > now, and I'd like to understand why.
> > Pavel
> >
>
> Here's a new datum:
>
> Applying this patch has left a less frequent hang. So far it has
> happened twice. (Once playing last night, and once today testing
> hibernation with KMS enabled).
>
> This hang happens at a different point. It happens _before_ writing out
> the hibernation image. That is, I don't see the textual progress bar,
> and if I force a power-cycle then it doesn't resume (and complains about
> uncleanly unmounted filesystems).
>
> Here is the backtrace:
>
> [top of screen]
> s2disk D c1c05580 0 5988 5809 0x00000000
> ...
> Call Trace:
> ...
> ? wait_for_common
> ? default_wake_function
> ? kthread_create
> ? worker_thread
> ? create_workqueue_thread
> ? worker_thread
> ? __create_workqueue_thread
> ? stop_machine_create
> ? disable_nonboot_cpus
> ? hibernation_snapshot
> ? snapshot_ioctl
> ...
> ? sys_ioctl
>
>
> It looks like hibernation_snapshot() calls disable_nonboot_cpus()
> _before_ we allocate the hibernation image. (I.e. before
> swsusp_arch_suspend(), which calls swsusp_save()).
>
> So I think Pavel's right, we still need to work out what's happening here.

OK, patch dropped.

Thanks,
Rafael

2009-12-03 20:16:18

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

Hi!

> >>>>What's wrong with it? The hang is likely because the allocator has no
> >>>>memory to work with. The patch in question makes small changes to the
> >>>>amount of available memory but it shouldn't matter on uni-core. Some
> >>>>structures are slightly larger but it's extremely borderline. I'm at a
> >>>>loss to explain actually why it makes a difference untill things were
> >>>>extremely borderline to begin with.
> >>>We reserve 4MB, for such purposes, and we already wrote image to disk
> >>>with such constrains, so memory should not be _too_ tight.
> >>>
> >>>Can you try increasing PAGES_FOR_IO to 8MB or something like that?
> >>>
> >>What's wrong with just freeing the memory that is no longer required?
> >
> >Nothing. But 4MB was enough to power down before, it is not enough
> >now, and I'd like to understand why.
>
> Here's a new datum:
>
> Applying this patch has left a less frequent hang. So far it has
> happened twice. (Once playing last night, and once today testing
> hibernation with KMS enabled).
>
> This hang happens at a different point. It happens _before_ writing
> out the hibernation image. That is, I don't see the textual
> progress bar, and if I force a power-cycle then it doesn't resume
> (and complains about uncleanly unmounted filesystems).

Can you drop the patches, and try increasing PAGES_FOR_IO to 8MB? That
should give you enough memory...
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-12-08 00:37:34

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On 12/3/09, Mel Gorman <[email protected]> wrote:
> On Thu, Dec 03, 2009 at 12:57:28PM +0000, Alan Jenkins wrote:
>> Pavel Machek wrote:
>>> On Wed 2009-12-02 22:25:16, Mel Gorman wrote:
>>>
>>>> On Wed, Dec 02, 2009 at 11:15:24PM +0100, Pavel Machek wrote:
>>>>
>>>>> On Wed 2009-12-02 22:07:18, Mel Gorman wrote:
>>>>>
>>>>>> On Wed, Dec 02, 2009 at 10:11:07PM +0100, Pavel Machek wrote:
>>>>>>
>>>>>>> On Wed 2009-12-02 14:28:12, Alan Jenkins wrote:
>>>>>>>
>>>>>>>> The original in-kernel suspend (swsusp) frees the in-memory
>>>>>>>> hibernation
>>>>>>>> image before powering off the machine. s2disk doesn't, so there is
>>>>>>>> _much_ less free memory when it tries to power off.
>>>>>>>>
>>>>>>>> This is a gratuitous difference. The userspace suspend interface
>>>>>>>> /dev/snapshot only allows the hibernation image to be read once.
>>>>>>>> Once the s2disk program has read the last page, we can free the
>>>>>>>> entire
>>>>>>>> image.
>>>>>>>>
>>>>>>>> This avoids a hang after writing the hibernation image which was
>>>>>>>> triggered by commit 5f8dcc21211a3d4e3a7a5ca366b469fb88117f61
>>>>>>>> "page-allocator: split per-cpu list into one-list-per-migrate-type":
>>>>>>>>
>>>>>>> Yes, you work around page-allocator hang. But is it right thing to
>>>>>>> do?

>> Here's a new datum:
>>
>> Applying this patch has left a less frequent hang. So far it has
>> happened twice. (Once playing last night, and once today testing
>> hibernation with KMS enabled).
>>
>> This hang happens at a different point. It happens _before_ writing out
>> the hibernation image. That is, I don't see the textual progress bar,
>> and if I force a power-cycle then it doesn't resume (and complains about
>> uncleanly unmounted filesystems).
>>
>> Here is the backtrace:
>>
>> [top of screen]
>> s2disk D c1c05580 0 5988 5809 0x00000000
>> ...
>> Call Trace:
>> ...
>> ? wait_for_common
>> ? default_wake_function
>> ? kthread_create
>> ? worker_thread
>> ? create_workqueue_thread
>> ? worker_thread
>> ? __create_workqueue_thread
>> ? stop_machine_create
>> ? disable_nonboot_cpus
>> ? hibernation_snapshot
>> ? snapshot_ioctl
>> ...
>> ? sys_ioctl
>>

> Can you reconfirm that backing out both of those patches makes this 100%
> reliable or is it just a lot harder to trigger. It does not even appear
> that it's locked up within the page allocator at this trace message.
> Assuming c1c05580 is where it's stuck at, where does addr2line say that
> is (requires CONFIG_DEBUG_INFO) ?

The new hang happened with only one patch applied (my "uswsusp:
automatically free the in-memory image once s2disk has finished with
it").

I was able to capture a longer version of the above backtrace by using
KMS [1]. This pre-writeout hang is similar to the post-writeout hang
which occurred on vanilla 2-6.32-rc8 [2]. In both cases the s2disk
process is hanging in disable_nonboot_cpus(). [Which is in turn
blocked on stop_machine_create(), which is apparently failing to
allocate pages for a new task]. The only difference is where
disable_nonboot_cpus() is called from.

And then, the problem went away :-(. I was unable to reproduce either
hang, even using the same unpatched kernel binaries as before. Sorry.

[1] Infrequent pre-writeout hang (new, longer backtrace):
<http://picasaweb.google.com/Alan.Christopher.Jenkins/Screenshots#5412613393538769410>

[2] Frequent post-writeout hang:
<http://picasaweb.google.com/Alan.Christopher.Jenkins/Screenshots#5410594126006567282>

> On Thu, Dec 03, 2009 at 12:57:28PM +0000, Alan Jenkins wrote:
>> It looks like hibernation_snapshot() calls disable_nonboot_cpus()
>> _before_ we allocate the hibernation image. (I.e. before
>> swsusp_arch_suspend(), which calls swsusp_save()).
>>

Sorry, I was wrong here. The hang occurs after "PM: Preallocating
image memory...". So it's a bit less mysterious; we can expect to be
low on memory at this point (although it's still a mystery why we
should run out completely).

> I'm not that familiar with the area but considering where we are getting
> stuck and what the path affected, I thought it might be CPU related.
> There is a patch below that prints debugging messages to show how the
> CPU is being taken down with respect to PCP draining in case something
> has changed there. It also puts in some debugging code in the most
> likely place to be infinite looping due to the patch.
>
>> So I think Pavel's right, we still need to work out what's happening here.
>>
>
> Can you apply the following patch please and retry?
>
> Two things to watch out for. First, do either of the BUG_ON triggers?
> Second, for the TRACE messages, do they always appear in the order of
> "draining pages" and then "deleting pagesets"?

I went ahead and tried this, even though I couldn't reproduce the hang anymore.

It didn't BUG. It didn't show any TRACEs either. I guess the cpu
notifiers weren't called at all, since no cpu hotplug is necessary on
my uni-core system.

So...
It looks like I can't provide any more data.

I can confidently say that post-writeout hangs would be avoided by my
patch. But I don't think we want to apply it, because it didn't
solve the pre-writeout hang - which appears to have a similar root
cause. The post-writeout hang happened to be easier to reproduce, and
it was better in that it didn't cause data loss / fsck (the system
could still resume).

As a curious tester, I would favour not increasing PAGES_FOR_IO on
similar grounds. Call me naive but 4Mb should be plenty, at least for
this system. That said, I wouldn't mind if we reserve an extra 4Mb to
avoid the hang, _and then abort the hibernation if we actually have to
use it_. (We can't simply print a warning message; no-one would see
it because it wouldn't survive the power-down).

Thanks
Alan

2009-12-11 10:53:54

[permalink] [raw]

Subject: Re: [PATCH] uswsusp: automatically free the in-memory image once s2disk has finished with it

On Tue, Dec 08, 2009 at 12:37:36AM +0000, Alan Jenkins wrote:
> >> <SNIP>
> >> Here's a new datum:
> >>
> >> Applying this patch has left a less frequent hang. So far it has
> >> happened twice. (Once playing last night, and once today testing
> >> hibernation with KMS enabled).
> >>
> >> This hang happens at a different point. It happens _before_ writing out
> >> the hibernation image. That is, I don't see the textual progress bar,
> >> and if I force a power-cycle then it doesn't resume (and complains about
> >> uncleanly unmounted filesystems).
> >>
> >> Here is the backtrace:
> >>
> >> [top of screen]
> >> s2disk D c1c05580 0 5988 5809 0x00000000
> >> ...
> >> Call Trace:
> >> ...
> >> ? wait_for_common
> >> ? default_wake_function
> >> ? kthread_create
> >> ? worker_thread
> >> ? create_workqueue_thread
> >> ? worker_thread
> >> ? __create_workqueue_thread
> >> ? stop_machine_create
> >> ? disable_nonboot_cpus
> >> ? hibernation_snapshot
> >> ? snapshot_ioctl
> >> ...
> >> ? sys_ioctl
> >>
>
> > Can you reconfirm that backing out both of those patches makes this 100%
> > reliable or is it just a lot harder to trigger. It does not even appear
> > that it's locked up within the page allocator at this trace message.
> > Assuming c1c05580 is where it's stuck at, where does addr2line say that
> > is (requires CONFIG_DEBUG_INFO) ?
>
> The new hang happened with only one patch applied (my "uswsusp:
> automatically free the in-memory image once s2disk has finished with
> it").
>

Ok. I'm learning towards believing that the system is extremely
borderline and what c1c05580 is doing is changing very slightly how many
pages are available. Why it makes a difference on uni-core, I have no
idea but it could be very small differences in available memory as it
does increase the size of some in-kernel structures.

> I was able to capture a longer version of the above backtrace by using
> KMS [1]. This pre-writeout hang is similar to the post-writeout hang
> which occurred on vanilla 2-6.32-rc8 [2]. In both cases the s2disk
> process is hanging in disable_nonboot_cpus(). [Which is in turn
> blocked on stop_machine_create(), which is apparently failing to
> allocate pages for a new task]. The only difference is where
> disable_nonboot_cpus() is called from.
>
> And then, the problem went away :-(. I was unable to reproduce either
> hang, even using the same unpatched kernel binaries as before. Sorry.
>
> [1] Infrequent pre-writeout hang (new, longer backtrace):
> <http://picasaweb.google.com/Alan.Christopher.Jenkins/Screenshots#5412613393538769410>
>
> [2] Frequent post-writeout hang:
> <http://picasaweb.google.com/Alan.Christopher.Jenkins/Screenshots#5410594126006567282>
>
> > On Thu, Dec 03, 2009 at 12:57:28PM +0000, Alan Jenkins wrote:
> >> It looks like hibernation_snapshot() calls disable_nonboot_cpus()
> >> _before_ we allocate the hibernation image. (I.e. before
> >> swsusp_arch_suspend(), which calls swsusp_save()).
> >>
>
> Sorry, I was wrong here. The hang occurs after "PM: Preallocating
> image memory...". So it's a bit less mysterious; we can expect to be
> low on memory at this point (although it's still a mystery why we
> should run out completely).
>
> > I'm not that familiar with the area but considering where we are getting
> > stuck and what the path affected, I thought it might be CPU related.
> > There is a patch below that prints debugging messages to show how the
> > CPU is being taken down with respect to PCP draining in case something
> > has changed there. It also puts in some debugging code in the most
> > likely place to be infinite looping due to the patch.
> >
> >> So I think Pavel's right, we still need to work out what's happening here.
> >>
> >
> > Can you apply the following patch please and retry?
> >
> > Two things to watch out for. First, do either of the BUG_ON triggers?
> > Second, for the TRACE messages, do they always appear in the order of
> > "draining pages" and then "deleting pagesets"?
>
> I went ahead and tried this, even though I couldn't reproduce the hang anymore.
>
> It didn't BUG. It didn't show any TRACEs either. I guess the cpu
> notifiers weren't called at all, since no cpu hotplug is necessary on
> my uni-core system.
>

Ok, at least it's not something that is obviously very wrong.

> So...
> It looks like I can't provide any more data.
>
> I can confidently say that post-writeout hangs would be avoided by my
> patch. But I don't think we want to apply it, because it didn't
> solve the pre-writeout hang - which appears to have a similar root
> cause.

I think the underlying cause is very tight memory space. A reasonable
approach is to apply your patch for the post-writeout case because why
hold onto a large chunk of memory that is not in use? For the
pre-writeout pause, up the PAGES_FOR_IO. It wouldn't be the first time
the kernels memory requirements grew :(

> The post-writeout hang happened to be easier to reproduce, and
> it was better in that it didn't cause data loss / fsck (the system
> could still resume).
>
> As a curious tester, I would favour not increasing PAGES_FOR_IO on
> similar grounds. Call me naive but 4Mb should be plenty, at least for
> this system. That said, I wouldn't mind if we reserve an extra 4Mb to
> avoid the hang, _and then abort the hibernation if we actually have to
> use it_. (We can't simply print a warning message; no-one would see
> it because it wouldn't survive the power-down).
>

At one level, I can see your point. It'd prove for example that the low
memory was the problem but how should a user respond when hibernation
fails because 4MB was not enough?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-14 11:08:49