2004-09-24 02:23:18

by Kevin Fenzi

Subject: 2.6.9-rc2-mm1 swsusp bug report.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned for a
while, but didn't hibernate. Here are the messages.

I do have PREEMPT and HIGHMEM enabled.

Sep 23 16:53:37 voldemort kernel: Stopping tasks: ==================================================
=================================================|
Sep 23 16:53:37 voldemort kernel: Freeing memory... [progress spinner output trimmed]
Sep 23 16:53:37 voldemort kernel: ..................................................................
....................................................................................................
.........................swsusp: Need to copy 34850 pages
Sep 23 16:53:37 voldemort kernel: hibernate: page allocation failure. order:8, mode:0x120
Sep 23 16:53:37 voldemort kernel: [<c013fc1e>] __alloc_pages+0x21e/0x3e0
Sep 23 16:53:37 voldemort kernel: [<c013fe05>] __get_free_pages+0x25/0x3f
Sep 23 16:53:37 voldemort kernel: [<c01373b5>] alloc_pagedir+0x1f/0x6b
Sep 23 16:53:37 voldemort kernel: [<c01374e3>] swsusp_alloc+0x2c/0x62
Sep 23 16:53:37 voldemort kernel: [<c0137549>] suspend_prepare_image+0x30/0x6e
Sep 23 16:53:37 voldemort kernel: [<c0284fea>] swsusp_arch_suspend+0x2a/0x2c
Sep 23 16:53:37 voldemort kernel: [<c01375d5>] swsusp_suspend+0x24/0x33
Sep 23 16:53:37 voldemort kernel: [<c01379c2>] pm_suspend_disk+0x28/0x7e
Sep 23 16:53:37 voldemort kernel: [<c0135fd0>] enter_state+0x91/0x95
Sep 23 16:53:39 voldemort kernel: [<c013fc30>] __alloc_pages+0x230/0x3e0
Sep 23 16:53:39 voldemort kernel: [<c01360fb>] state_store+0xb1/0xc8
Sep 23 16:53:39 voldemort kernel: [<c0192748>] subsys_attr_store+0x3a/0x3e
Sep 23 16:53:39 voldemort kernel: [<c01929ce>] flush_write_buffer+0x3e/0x4a
Sep 23 16:53:39 voldemort kernel: [<c0192a5c>] sysfs_write_file+0x82/0x98
Sep 23 16:53:39 voldemort kernel: [<c01929da>] sysfs_write_file+0x0/0x98
Sep 23 16:53:39 voldemort kernel: [<c015926d>] vfs_write+0xd0/0x135
Sep 23 16:53:39 voldemort kernel: [<c015882b>] filp_close+0x59/0x86
Sep 23 16:53:39 voldemort kernel: [<c01593a3>] sys_write+0x51/0x80
Sep 23 16:53:39 voldemort kernel: [<c0106019>] sysenter_past_esp+0x52/0x71
Sep 23 16:53:39 voldemort kernel: swsusp: Restoring Highmem
Sep 23 16:53:39 voldemort kernel: ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ
11
Sep 23 16:53:39 voldemort kernel: ACPI: PCI interrupt 0000:00:1f.5[B] -> GSI 5 (level, low) -> IRQ 5
Sep 23 16:53:39 voldemort kernel: PCI: Setting latency timer of device 0000:00:1f.5 to 64
Sep 23 16:53:39 voldemort kernel: Restarting tasks... done

kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBU4RM3imCezTjY0ERAgEQAJ4qj2PmNsL/5ao+3swoOxioop7G1gCeLLUg
ZOWJKH6q3k9UMG8xaYDGLx0=
=Csmg
-----END PGP SIGNATURE-----


2004-09-24 20:40:16

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned for a
> while, but didn't hibernate. Here are the messages.
>
> ....................................................................................................
> .........................swsusp: Need to copy 34850 pages
> Sep 23 16:53:37 voldemort kernel: hibernate: page allocation failure. order:8, mode:0x120
> Sep 23 16:53:37 voldemort kernel:
Out of memory... Try again with less loaded system.
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-09-24 21:11:26

by Kevin Fenzi

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "Pavel" == Pavel Machek <[email protected]> writes:

Pavel> Hi!
>> Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned
>> for a while, but didn't hibernate. Here are the messages.
>>
>> ....................................................................................................
>> .........................swsusp: Need to copy 34850 pages Sep 23
>> 16:53:37 voldemort kernel: hibernate: page allocation
>> failure. order:8, mode:0x120 Sep 23 16:53:37 voldemort kernel:
Pavel> Out of memory... Try again with less loaded system.

The system was no more loaded than usual. I have 1GB memory and 4GB of
swap defined. I almost never touch swap. It might have been 100MB into
the 4GB of swap when this happened.

What would cause it to be out of memory?
swsusp needs to be reliable... rebooting when you are using your memory
kinda defeats the purpose of swsusp.

Felipe W Damasio <[email protected]> sent me a patch, but I
haven't had a chance to try it yet:

- --- linux-2.6.9-rc2-mm2/kernel/power/swsusp.c.orig 2004-09-23 23:46:49.292975768 -0300
+++ linux-2.6.9-rc2-mm2/kernel/power/swsusp.c 2004-09-24 00:07:01.933626368 -0300
@@ -657,6 +657,9 @@
int diff = 0;
int order = 0;

+ order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages));
+ nr_copy_pages += 1 << order;
+
do {
diff = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)) - order;
if (diff) {


kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBVI0m3imCezTjY0ERAgI1AJ0VatDEm27SAh2dvS65XwNNpReSEACeNBkn
uRXNP9tQcUlEZ1BAKON1nSo=
=3rnm
-----END PGP SIGNATURE-----

2004-09-24 23:40:29

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sat, 2004-09-25 at 07:09, Kevin Fenzi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> >>>>> "Pavel" == Pavel Machek <[email protected]> writes:
>
> Pavel> Hi!
> >> Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned
> >> for a while, but didn't hibernate. Here are the messages.
> >>
> >> ....................................................................................................
> >> .........................swsusp: Need to copy 34850 pages Sep 23
> >> 16:53:37 voldemort kernel: hibernate: page allocation
> >> failure. order:8, mode:0x120 Sep 23 16:53:37 voldemort kernel:
> Pavel> Out of memory... Try again with less loaded system.
>
> The system was no more loaded than usual. I have 1GB memory and 4GB of
> swap defined. I almost never touch swap. It might have been 100mb into
> the 4Gb of swap when this happened.
>
> What would cause it to be out of memory?
> swsusp needs to be reliable... rebooting when you are using your memory
> kinda defeats the purpose of swsusp.

The problem isn't really that you're out of memory. Rather, the memory
is so fragmented that swsusp is unable to get an order 8 allocation in
which to store its metadata. There isn't really anything you can do to
avoid this issue apart from eating memory (which swsusp is doing
anyway).

Regards,

Nigel
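
[Editorial sketch] The order in the failure message can be reproduced from the page count in the log. The following is only a rough model: the 4 KiB page size, the 16-byte directory entry, and the divide-then-add-one rounding are assumptions based on 32-bit x86, not facts stated in this thread. The helper names are hypothetical stand-ins for the SUSPEND_PD_PAGES and get_bitmask_order calls that appear in the patch quoted earlier.

```c
/* Assumed values, not taken from the thread: 4 KiB pages and a
 * 16-byte page-directory entry, as on 32-bit x86. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PBE_SIZE  16UL

/* Pages needed for a directory describing nr_copy_pages pages.
 * Assumed rounding: integer divide, then add one page. */
static unsigned long pagedir_pages(unsigned long nr_copy_pages)
{
    return (nr_copy_pages * SKETCH_PBE_SIZE) / SKETCH_PAGE_SIZE + 1;
}

/* Position of the highest set bit, counting from 1 (0 for x == 0).
 * Hypothetical stand-in for an fls()-style order helper. */
static int bitmask_order(unsigned long x)
{
    int order = 0;

    while (x) {
        order++;
        x >>= 1;
    }
    return order;
}

/* Bytes in one physically contiguous allocation of the given order. */
static unsigned long order_bytes(int order)
{
    return SKETCH_PAGE_SIZE << order;
}
```

Under these assumptions, pagedir_pages(34850) is 137, bitmask_order(137) is 8, and order_bytes(8) is 1048576, i.e. one contiguous megabyte. The same arithmetic gives order 7 for a 30519-page image.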

2004-09-25 01:06:06

by Pascal Schmidt

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

On Sat, 25 Sep 2004 01:50:08 +0200, you wrote in linux.kernel:

> The problem isn't really that you're out of memory. Rather, the memory
> is so fragmented that swsusp is unable to get an order 8 allocation in
> which to store its metadata. There isn't really anything you can do to
> avoid this issue apart from eating memory (which swsusp is doing
> anyway).

That's one megabyte, right? Can't we preallocate that on boot, while
there's still chance to get that much contiguous memory? If the
user has swsusp compiled into his kernel, he probably wants it to
function, so it's not really "wasted".

--
Ciao,
Pascal

2004-09-25 01:47:30

by Kevin Fenzi

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "Nigel" == Nigel Cunningham <[email protected]> writes:

Nigel> Hi. On Sat, 2004-09-25 at 07:09, Kevin Fenzi wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> >>>>> "Pavel" == Pavel Machek <[email protected]> writes:
>>
Pavel> Hi!
>> >> Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned
>> >> for a while, but didn't hibernate. Here are the messages.
>> >>
>> >>
>> ....................................................................................................
>> >> .........................swsusp: Need to copy 34850 pages Sep 23
>> >> 16:53:37 voldemort kernel: hibernate: page allocation >>
>> failure. order:8, mode:0x120 Sep 23 16:53:37 voldemort kernel:
Pavel> Out of memory... Try again with less loaded system.
>> The system was no more loaded than usual. I have 1GB memory and 4GB
>> of swap defined. I almost never touch swap. It might have been
>> 100mb into the 4Gb of swap when this happened.
>>
>> What would cause it to be out of memory? swsusp needs to be
>> reliable... rebooting when you are using your memory kinda defeats
>> the purpose of swsusp.

Nigel> The problem isn't really that you're out of memory. Rather, the
Nigel> memory is so fragmented that swsusp is unable to get an order 8
Nigel> allocation in which to store its metadata. There isn't really
Nigel> anything you can do to avoid this issue apart from eating
Nigel> memory (which swsusp is doing anyway).

Odd. I have never run into this before with either swsusp2 or
swsusp1.

What causes memory to be so fragmented?
Nothing can be done to prevent it?

Nigel> Regards,
Nigel> Nigel

kevin


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBVM3K3imCezTjY0ERAh75AJ43eMOWlXc2HFQGhTBfgO9G+nI8tACgl7t9
bc6Lo+2guz9WRcu5FhInWMc=
=hunr
-----END PGP SIGNATURE-----

2004-09-25 10:15:43

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> >> Was trying to swsusp my 2.6.9-rc2-mm1 laptop tonight. It churned
> >> for a while, but didn't hibernate. Here are the messages.
> >>
> >> ....................................................................................................
> >> .........................swsusp: Need to copy 34850 pages Sep 23
> >> 16:53:37 voldemort kernel: hibernate: page allocation
> >> failure. order:8, mode:0x120 Sep 23 16:53:37 voldemort kernel:
> Pavel> Out of memory... Try again with less loaded system.
>
> The system was no more loaded than usual. I have 1GB memory and 4GB of
> swap defined. I almost never touch swap. It might have been 100mb into
> the 4Gb of swap when this happened.
>
> What would cause it to be out of memory?
> swsusp needs to be reliable... rebooting when you are using your memory
> kinda defeats the purpose of swsusp.

Read FAQ.

> Felipe W Damasio <[email protected]> sent me a patch, but I
> haven't had a chance to try it yet:
>
> - --- linux-2.6.9-rc2-mm2/kernel/power/swsusp.c.orig 2004-09-23 23:46:49.292975768 -0300
> +++ linux-2.6.9-rc2-mm2/kernel/power/swsusp.c 2004-09-24 00:07:01.933626368 -0300
> @@ -657,6 +657,9 @@
> int diff = 0;
> int order = 0;
>
> + order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages));
> + nr_copy_pages += 1 << order;
> +
> do {
> diff = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)) - order;
> if (diff) {
>
>

That does not look like it could help. I do not see why this patch
should be a good thing.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-09-25 10:17:08

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > The problem isn't really that you're out of memory. Rather, the memory
> > is so fragmented that swsusp is unable to get an order 8 allocation in
> > which to store its metadata. There isn't really anything you can do to
> > avoid this issue apart from eating memory (which swsusp is doing
> > anyway).
>
> That's one megabyte, right? Can't we preallocate that on boot, while
> there's still chance to get that much contiguous memory? If the
> user has swsusp compiled into his kernel, he probably wants it to
> function, so it's not really "wasted".

You do not know how much you should preallocate, because it depends on
the amount of memory used. You could preallocate the maximum possible
amount...

OTOH this is the first report of this failure. If it fails once in a blue
moon, it is probably better to let it fail than waste memory.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-09-25 11:51:56

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sat, 2004-09-25 at 11:45, Kevin Fenzi wrote:
> Nigel> The problem isn't really that you're out of memory. Rather, the
> Nigel> memory is so fragmented that swsusp is unable to get an order 8
> Nigel> allocation in which to store its metadata. There isn't really
> Nigel> anything you can do to avoid this issue apart from eating
> Nigel> memory (which swsusp is doing anyway).
>
> Odd. I have never run into this before with either swsusp2 or
> swsusp1.

You won't run into it with suspend2 because it doesn't use high order
allocations. There might be one exception, but apart from that, all of
suspend2's data is stored in order zero allocated pages, so
fragmentation is not an issue. This is the real solution to the problem.
I had to do it this way because I aim to have suspend work without
eating any memory.

> What causes memory to be so fragmented?

Normal usage; the pattern of pages being freed and allocated inevitably
leads to fragmentation. The buddy allocator does a good job of
minimising it, but what is really needed is a run-time defragmenter. I
saw mention of this recently, but it's probably not that practical to
implement IMHO.

> Nothing can be done to prevent it?

Apart from the above, no, sorry.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. True tolerance, however, can cope with others
being intolerant.
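
[Editorial sketch] A toy model may help show why plenty of free memory does not guarantee a high-order block. It treats memory as an array of frames and looks for the largest naturally aligned power-of-two run that is entirely free. This is a simplified illustration with hypothetical names, not the kernel's buddy allocator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of free frames in the map (free_map[i] is true when frame i is free). */
static size_t count_free(const bool *free_map, size_t nframes)
{
    size_t n = 0;

    for (size_t i = 0; i < nframes; i++)
        if (free_map[i])
            n++;
    return n;
}

/* Largest order k such that some naturally aligned run of 2^k frames is
 * entirely free. Returns -1 when no frame is free. */
static int max_free_order(const bool *free_map, size_t nframes)
{
    int best = -1;

    for (int order = 0; ((size_t)1 << order) <= nframes; order++) {
        size_t block = (size_t)1 << order;
        bool found = false;

        for (size_t start = 0; start + block <= nframes && !found; start += block) {
            bool all_free = true;

            for (size_t i = start; i < start + block; i++) {
                if (!free_map[i]) {
                    all_free = false;
                    break;
                }
            }
            found = all_free;
        }
        /* An aligned free block of order k+1 contains two of order k,
         * so once an order is missing no larger order can exist. */
        if (!found)
            break;
        best = order;
    }
    return best;
}
```

With 512 frames where every other frame is free, count_free reports 256 free frames but max_free_order is 0. With the same 256 free frames packed into the first half, max_free_order is 8.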

2004-09-25 12:22:29

by Nick Piggin

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-09-25 at 11:45, Kevin Fenzi wrote:

>>What causes memory to be so fragmented?
>
>
> Normal usage; the pattern of pages being freed and allocated inevitably
> leads to fragmentation. The buddy allocator does a good job of
> minimising it, but what is really needed is a run-time defragmenter. I
> saw mention of this recently, but it's probably not that practical to
> implement IMHO.
>

Well, by this stage it looks like memory is already pretty well shrunk
as much as it is going to be, which means that even a pretty capable
defragmenter won't be able to do anything.

2004-09-25 12:56:30

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sat, 2004-09-25 at 22:22, Nick Piggin wrote:
> Nigel Cunningham wrote:
> > Hi.
> >
> > On Sat, 2004-09-25 at 11:45, Kevin Fenzi wrote:
>
> >>What causes memory to be so fragmented?
> >
> >
> > Normal usage; the pattern of pages being freed and allocated inevitably
> > leads to fragmentation. The buddy allocator does a good job of
> > minimising it, but what is really needed is a run-time defragmenter. I
> > saw mention of this recently, but it's probably not that practical to
> > implement IMHO.
> >
>
> Well, by this stage it looks like memory is already pretty well shrunk
> as much as it is going to be, which means that even a pretty capable
> defragmenter won't be able to do anything.

Surely it would be able to rearrange pages to get a contiguous megabyte?
Regardless, not using order 8 allocations seems to me to be a better
solution (but then I have a patch to push once I finish my current round
of cleanups :>).

Nigel

2004-09-25 13:01:23

by William Lee Irwin III

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Nigel Cunningham wrote:
>> Normal usage; the pattern of pages being freed and allocated inevitably
>> leads to fragmentation. The buddy allocator does a good job of
>> minimising it, but what is really needed is a run-time defragmenter. I
>> saw mention of this recently, but it's probably not that practical to
>> implement IMHO.
>
On Sat, Sep 25, 2004 at 10:22:22PM +1000, Nick Piggin wrote:
> Well, by this stage it looks like memory is already pretty well shrunk
> as much as it is going to be, which means that even a pretty capable
> defragmenter won't be able to do anything.

For however useful defragmentation may be to make speculative use of
physically or virtually contiguous memory more probable to succeed, it
can never be made deterministic or even reliable, not even in pageable
kernels (which Linux is not). Fallback to allocations no larger than
the kernel's internal allocation unit, potentially in tandem with
scatter/gather capabilities, is essential.


-- wli
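
[Editorial sketch] The fallback described above, allocations no larger than one page that are addressed indirectly, could look roughly like the following userspace model. The entry layout, the names, and the 4096-byte chunk size are all hypothetical; this is not how swsusp or suspend2 actually store their metadata.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096 /* assumed page size */

/* Hypothetical directory entry. */
struct entry {
    unsigned long address;
    unsigned long orig_address;
};

#define ENTRIES_PER_CHUNK (CHUNK_BYTES / sizeof(struct entry))

/* A table built from page-sized chunks instead of one large block. */
struct chunked_table {
    struct entry **chunks; /* one pointer per page-sized chunk */
    size_t nchunks;
    size_t nentries;
};

/* Release every chunk and the pointer array. */
static void table_free(struct chunked_table *t)
{
    for (size_t i = 0; i < t->nchunks; i++)
        free(t->chunks[i]);
    free(t->chunks);
    t->chunks = NULL;
    t->nchunks = 0;
    t->nentries = 0;
}

/* Allocate room for nentries entries, one page-sized chunk at a time.
 * Returns 0 on success, -1 if any allocation fails. */
static int table_init(struct chunked_table *t, size_t nentries)
{
    size_t n = (nentries + ENTRIES_PER_CHUNK - 1) / ENTRIES_PER_CHUNK;

    t->nchunks = 0;
    t->nentries = 0;
    t->chunks = calloc(n ? n : 1, sizeof(*t->chunks));
    if (!t->chunks)
        return -1;
    for (size_t i = 0; i < n; i++) {
        t->chunks[i] = calloc(1, CHUNK_BYTES);
        if (!t->chunks[i]) {
            table_free(t);
            return -1;
        }
        t->nchunks++;
    }
    t->nentries = nentries;
    return 0;
}

/* Entry i, or NULL when i is out of range. */
static struct entry *table_at(struct chunked_table *t, size_t i)
{
    if (i >= t->nentries)
        return NULL;
    return &t->chunks[i / ENTRIES_PER_CHUNK][i % ENTRIES_PER_CHUNK];
}
```

Each lookup costs one extra division and pointer dereference, but no allocation is larger than a page. The pointer array is itself contiguous in this sketch; a linked list of pages would remove even that requirement.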

2004-09-25 13:14:02

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sat, 2004-09-25 at 22:22, Nick Piggin wrote:
> Nigel Cunningham wrote:
> > Hi.
> >
> > On Sat, 2004-09-25 at 11:45, Kevin Fenzi wrote:
>
> >>What causes memory to be so fragmented?
> >
> >
> > Normal usage; the pattern of pages being freed and allocated inevitably
> > leads to fragmentation. The buddy allocator does a good job of
> > minimising it, but what is really needed is a run-time defragmenter. I
> > saw mention of this recently, but it's probably not that practical to
> > implement IMHO.
> >
>
> Well, by this stage it looks like memory is already pretty well shrunk
> as much as it is going to be, which means that even a pretty capable
> defragmenter won't be able to do anything.

Surely it would be able to rearrange pages to get a contiguous megabyte?
Regardless, not using order 8 allocations seems to me to be a better
solution (but then I have a barrow^H^H^H^H^H^Hpatch to push once I finish my current round
of cleanups :>).

Nigel

2004-09-25 13:21:05

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sat, 2004-09-25 at 22:56, William Lee Irwin III wrote:
> Nigel Cunningham wrote:
> >> Normal usage; the pattern of pages being freed and allocated inevitably
> >> leads to fragmentation. The buddy allocator does a good job of
> >> minimising it, but what is really needed is a run-time defragmenter. I
> >> saw mention of this recently, but it's probably not that practical to
> >> implement IMHO.
> >
> On Sat, Sep 25, 2004 at 10:22:22PM +1000, Nick Piggin wrote:
> > Well, by this stage it looks like memory is already pretty well shrunk
> > as much as it is going to be, which means that even a pretty capable
> > defragmenter won't be able to do anything.
>
> For however useful defragmentation may be to make speculative use of
> physically or virtually contiguous memory more probable to succeed, it
> can never be made deterministic or even reliable, not even in pageable
> kernels (which Linux is not). Fallback to allocations no larger than
> the kernel's internal allocation unit, potentially in tandem with
> scatter/gather capabilities, is essential.

I fully agree. That's why I do it :>

Regards,

Nigel

2004-09-25 13:38:28

by Nick Piggin

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-09-25 at 22:22, Nick Piggin wrote:
>

>>
>>Well, by this stage it looks like memory is already pretty well shrunk
>>as much as it is going to be, which means that even a pretty capable
>>defragmenter won't be able to do anything.
>
>
> Surely it would be able to rearrange pages to get a contiguous megabyte?

For lots of stuff it is just infeasible. Just about all kernel memory,
for example.

But yeah, regardless, really the best thing is not to use such large
allocations at all.

2004-09-25 15:45:49

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> >>What causes memory to be so fragmented?
> >
> >
> >Normal usage; the pattern of pages being freed and allocated inevitably
> >leads to fragmentation. The buddy allocator does a good job of
> >minimising it, but what is really needed is a run-time defragmenter. I
> >saw mention of this recently, but it's probably not that practical to
> >implement IMHO.
>
> Well, by this stage it looks like memory is already pretty well shrunk
> as much as it is going to be, which means that even a pretty capable
> defragmenter won't be able to do anything.

True, a defragmenter would not help.

Anyway, conversion from order-8 allocation should be pretty easy, but
I have never seen that failure case and this is the first report... So I'm
not doing that work just yet. [There's a big chunk of changes waiting in
-mm that needs to be merged before any other work should be done.]

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-09-25 22:01:43

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sun, 2004-09-26 at 01:45, Pavel Machek wrote:
> Hi!
>
> > >>What causes memory to be so fragmented?
> > >
> > >
> > >Normal usage; the pattern of pages being freed and allocated inevitably
> > >leads to fragmentation. The buddy allocator does a good job of
> > >minimising it, but what is really needed is a run-time defragmenter. I
> > >saw mention of this recently, but it's probably not that practical to
> > >implement IMHO.
> >
> > Well, by this stage it looks like memory is already pretty well shrunk
> > as much as it is going to be, which means that even a pretty capable
> > defragmenter won't be able to do anything.
>
> True, a defragmenter would not help.
>
> Anyway, conversion from order-8 allocation should be pretty easy, but
> I have never seen that failure case and this is the first report... So I'm
> not doing that work just yet. [There's a big chunk of changes waiting in
> -mm that needs to be merged before any other work should be done.]

Are we still planning on having suspend2 replace swsusp eventually? It
was a lot of work to switch from those high order allocations, and if we
are still going to replace swsusp, perhaps it would be a better use of
your time to do other things?

Regards,

Nigel

2004-09-26 10:07:38

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > True, a defragmenter would not help.
> >
> > Anyway, conversion from order-8 allocation should be pretty easy, but
> > I have never seen that failure case and this is the first report... So I'm
> > not doing that work just yet. [There's a big chunk of changes waiting in
> > -mm that needs to be merged before any other work should be done.]
>
> Are we still planning on having suspend2 replace swsusp eventually? It
> was a lot of work to switch from those high order allocations, and if we
> are still going to replace swsusp, perhaps it would be a better use of
> your time to do other things?

I do not know if I'm more scared of swsusp1 to kill order-8
allocations or if suspend2's two page sets scare me more. (Hooks
suspend2 needs to stop all page cache activity are scary...)

I certainly want some parts of suspend2 (like improved freezer, if it
can be made small enough), but I'm no longer sure I want all of it. I
expected many people complaining about highmem problems in swsusp1,
and that just did not happen; SMP support turned out to be reasonably
simple...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-09-26 17:58:58

by Marcelo Tosatti

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

On Sat, Sep 25, 2004 at 09:53:55PM +1000, Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-09-25 at 11:45, Kevin Fenzi wrote:
> > Nigel> The problem isn't really that you're out of memory. Rather, the
> > Nigel> memory is so fragmented that swsusp is unable to get an order 8
> > Nigel> allocation in which to store its metadata. There isn't really
> > Nigel> anything you can do to avoid this issue apart from eating
> > Nigel> memory (which swsusp is doing anyway).
> >
> > Odd. I have never run into this before with either swsusp2 or
> > swsusp1.
>
> You won't run into it with suspend2 because it doesn't use high order
> allocations. There might be one exception, but apart from that, all of
> suspend2's data is stored in order zero allocated pages, so
> fragmentation is not an issue. This is the real solution to the problem.
> I had to do it this way because I aim to have suspend work without
> eating any memory.
>
> > What causes memory to be so fragmented?
>
> Normal usage; the pattern of pages being freed and allocated inevitably
> leads to fragmentation. The buddy allocator does a good job of
> minimising it, but what is really needed is a run-time defragmenter. I
> saw mention of this recently, but it's probably not that practical to
> implement IMHO.

I think it is possible to have a defragmenter: allocate new page,
invalidate mapped pte's, invalidate radix tree entry (and block radix lookups),
copy data from oldpage to newpage, remap pte's, insert radix tree
entry, free oldpage.

The memory hotplug patches do it - I'm trying to implement a similar version
to free physically nearby pages and form high order pages.

2004-09-26 18:39:34

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > > What causes memory to be so fragmented?
> >
> > Normal usage; the pattern of pages being freed and allocated inevitably
> > leads to fragmentation. The buddy allocator does a good job of
> > minimising it, but what is really needed is a run-time defragmenter. I
> > saw mention of this recently, but it's probably not that practical to
> > implement IMHO.
>
> I think it is possible to have a defragmenter: allocate new page,
> invalidate mapped pte's, invalidate radix tree entry (and block radix lookups),
> copy data from oldpage to newpage, remap pte's, insert radix tree
> entry, free oldpage.
>
> The memory hotplug patches do it - I'm trying to implement a similar version
> to free physically nearby pages and form high order pages.

Well, swsusp is kind of special case. If it is possible to swap that
page out or discard, it is swapped out/discarded already. What remains
are things like kmalloc(), and you can't move them...

Anyway, the solution for swsusp is to avoid using such big pages; it is
less complex than doing a defragmenter.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
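
[Editorial sketch] The disagreement above comes down to whether the frames blocking a contiguous run can be moved. A toy model, with hypothetical names and no relation to the memory hotplug patches, classifies each frame as free, movable, or unmovable (for example kmalloc()ed memory) and asks whether any aligned block could be emptied by migrating its movable frames elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>

enum frame_state { FRAME_FREE, FRAME_MOVABLE, FRAME_UNMOVABLE };

/* True if some naturally aligned block of 2^order frames contains no
 * unmovable frame and there are enough free frames outside the block
 * to receive every movable frame inside it. */
static bool can_assemble_block(const enum frame_state *frames,
                               size_t nframes, int order)
{
    size_t block = (size_t)1 << order;
    size_t total_free = 0;

    for (size_t i = 0; i < nframes; i++)
        if (frames[i] == FRAME_FREE)
            total_free++;

    for (size_t start = 0; start + block <= nframes; start += block) {
        size_t movable = 0, free_inside = 0;
        bool pinned = false;

        for (size_t i = start; i < start + block; i++) {
            if (frames[i] == FRAME_UNMOVABLE)
                pinned = true;
            else if (frames[i] == FRAME_MOVABLE)
                movable++;
            else
                free_inside++;
        }
        if (pinned)
            continue; /* one unmovable frame spoils the whole block */
        if (movable <= total_free - free_inside)
            return true;
    }
    return false;
}
```

In this model a single unmovable frame in every aligned block defeats migration no matter how much memory is free, which is the situation Pavel describes once everything swappable has already been swapped out or discarded.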

2004-09-26 21:59:51

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Sun, 2004-09-26 at 20:04, Pavel Machek wrote:
> > Are we still planning on having suspend2 replace swsusp eventually? It
> > was a lot of work to switch from those high order allocations, and if we
> > are still going to replace swsusp, perhaps it would be a better use of
> > your time to do other things?
>
> I do not know if I'm more scared of swsusp1 to kill order-8
> allocations or if suspend2's two page sets scare me more. (Hooks
> suspend2 needs to stop all page cache activity are scary...)

Hooks to stop all page cache activity? I'm not sure what you mean.

> I certainly want some parts of suspend2 (like improved freezer, if it
> can be made small enough), but I'm no longer sure I want all of it. I
> expected many people complaining about highmem problems in swsusp1,
> and that just did not happen; SMP support turned out to be reasonably
> simple...

Okay. There are other advantages too, of course :>

Nigel

2004-09-26 22:44:00

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > > Are we still planning on having suspend2 replace swsusp eventually? It
> > > was a lot of work to switch from those high order allocations, and if we
> > > are still going to replace swsusp, perhaps it would be a better use of
> > > your time to do other things?
> >
> > I do not know if I'm more scared of swsusp1 to kill order-8
> > allocations or if suspend2's two page sets scare me more. (Hooks
> > suspend2 needs to stop all page cache activity are scary...)
>
> Hooks to stop all page cache activity? I'm not sure what you mean.

You have a system where you write the image in two parts, and there are
some pretty special rules about what you may not touch when writing the
first part, IIRC. That is what scares me...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-09-27 10:17:42

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Mon, 2004-09-27 at 08:43, Pavel Machek wrote:
> You have system where you write image in two parts, and there are some
> pretty special rules what you may not touch when writing first part,
> IIRC. That is what scares me...

No special rules. It's just the LRU that shouldn't change, and it won't
because all other activity is stopped and I'm using direct bio submits
to do the reading and writing. I really should get around to finishing
that 'how-it-works' document so I can clear up all the FUD. :>

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. True tolerance, however, can cope with others
being intolerant.

2004-10-10 18:17:08

by Jan Rychter

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

>>>>> "Pavel" == Pavel Machek <[email protected]> writes:
Pavel> Hi!
> The problem isn't really that you're out of memory. Rather, the
> memory is so fragmented that swsusp is unable to get an order 8
> allocation in which to store its metadata. There isn't really
> anything you can do to avoid this issue apart from eating memory
> (which swsusp is doing anyway).
>>
>> That's one megabyte, right? Can't we preallocate that on boot, while
>> there's still chance to get that much contiguous memory? If the user
>> has swsusp compiled into his kernel, he probably wants it to
>> function, so it's not really "wasted".

Pavel> You do not know how much you should preallocate, because it
Pavel> depends on the amount of memory used. You could preallocate the
Pavel> maximum possible amount...

Pavel> OTOH this is the first report of this failure. If it fails once
Pavel> in a blue moon, it is probably better to let it fail than waste
Pavel> memory.

This is *exactly* why I choose to use swsusp2. There is a marked
difference in the maintainer's approach to these kinds of problems.

The net result is that swsusp2 has worked for me very well for many
months now: I have been suspending and resuming happily for several
months, with exactly 0 swsusp-caused crashes or failures.

BTW, on a related note, I believe there is too much acceptance for
crashes and failures in the Linux world recently. Take an example: I can
bring down any of my machines (kernels 2.4 or 2.6) in less than 10
minutes just by plugging in and unplugging USB devices. There is
something fundamentally wrong with the USB subsystem if it is possible
to do that.

--J.

2004-10-11 13:32:36

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> Pavel> You do not know how much you should preallocate, because it
> Pavel> depends on the amount of memory used. You could preallocate the
> Pavel> maximum possible amount...
>
> Pavel> OTOH this is first report of this failure. If it fails once in a
> Pavel> blue moon, it is probably better to let it fail than waste
> Pavel> memory.
>
> This is *exactly* why I choose to use swsusp2. There is a marked
> difference in the maintainer's approach to these kinds of problems.

Okay, and do you have something to say or do you want to start a
flamewar? That is also why swsusp2 is 10 times the code size of swsusp...

Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2004-10-11 13:53:41

by Stefan Seyfried

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi,

Pavel Machek wrote:

> OTOH this is first report of this failure. If it fails once in a blue
> moon, it is probably better to let it fail than waste memory.

PM: Attempting to suspend to disk.
PM: snapshotting memory.
swsusp: critical section:
swsusp: Saving Highmem
[nosave pfn 0x3be]<7>[nosave pfn 0x3bf]swsusp: Need to copy 30519 pages
suspend: (pages needed: 30519 + 512 free: 100469)
do_acpi_sleep: page allocation failure. order:7, mode:0x120
[<c013a628>] __alloc_pages+0x3a8/0x3b0
[<c013a648>] __get_free_pages+0x18/0x30
[<c0132c37>] alloc_pagedir+0x17/0x60
[<c0132ddb>] swsusp_alloc+0x4b/0xa0
[<c0132e63>] suspend_prepare_image+0x33/0x80
[<c028beda>] swsusp_arch_suspend+0x2a/0x30
[<c0132f1b>] swsusp_suspend+0x2b/0x40
[<c01332ad>] pm_suspend_disk+0x3d/0xb0
[<c0131765>] enter_state+0x85/0x90
[<c01318b1>] state_store+0xc1/0xc3
[<c01317f0>] state_store+0x0/0xc3
[<c01852e6>] subsys_attr_store+0x26/0x30
[<c018548d>] flush_write_buffer+0x1d/0x30
[<c01854c9>] sysfs_write_file+0x29/0x40
[<c01854a0>] sysfs_write_file+0x0/0x40
[<c0150c7f>] vfs_write+0x9f/0x100
[<c0150d8c>] sys_write+0x3c/0x70
[<c0105c69>] sysenter_past_esp+0x52/0x79
suspend: Allocating pagedir failed.
swsusp: Restoring Highmem

This happened just now, after running fine over the weekend and doing a
successful suspend/resume cycle this morning.
It was a "battery critical" suspend, so this is not nice :-( I had about
2 minutes left until hard powerdown, during which I tried to get it to
suspend but failed. Yes, userspace should handle the "failed
battery-critical suspend" case better and probably call "shutdown -h now".
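
For reference, the order of the failed allocation in the log above follows
from the page count. A minimal sketch of the arithmetic, assuming 4 KiB pages
and a 16-byte pagedir entry on 32-bit x86 (the entry size is an assumption,
not something the log shows); `pagedir_order` is a hypothetical helper name:

```python
PAGE_SIZE = 4096   # bytes per page on i386
ENTRY_SIZE = 16    # ASSUMED size of one pagedir entry on 32-bit x86


def pagedir_order(nr_pages, entry_size=ENTRY_SIZE, page_size=PAGE_SIZE):
    """Smallest order whose 2**order pages hold one entry per page to copy."""
    needed_bytes = nr_pages * entry_size
    needed_pages = -(-needed_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < needed_pages:
        order += 1
    return order


print(pagedir_order(30519))  # page count from the log above
print(pagedir_order(34850))  # page count from the original report
```

Under these assumptions, 30519 pages need 120 pages of pagedir, which rounds
up to an order-7 block (128 contiguous pages, 512 KiB). The 34850 pages of
the original report need order 8 (256 contiguous pages, 1 MiB).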

Stefan

2004-10-11 14:57:27

by Jan Rychter

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

>>>>> "Pavel" == Pavel Machek writes:
Pavel> Hi! You do not know how much you should preallocate, because it
Pavel> depends on the amount of memory used. You could preallocate the
Pavel> maximum possible amount...
>>
Pavel> OTOH this is first report of this failure. If it fails once in a
Pavel> blue moon, it is probably better to let it fail than waste
Pavel> memory.
>>
>> This is *exactly* why I choose to use swsusp2. There is a marked
>> difference in the maintainer's approach to these kinds of problems.

Pavel> Okay, and do you have something to say or do you want to start
Pavel> flamewar? That is also why swsusp2 is 10 times code size of
Pavel> swsusp...

Sure, flame me if you think this is the right thing to do. But I will
continue to pitch in with a users' opinion sometimes, because I really
believe it is important.

It is easy to lose sight of the user perspective on these things if all
you deal with is kernel development. You probably reboot your machine
dozens of times a day anyway. However, for some users crashes and
reboots are *very* expensive. These people (myself included) consider
sprinkling the code with panics, crashing and failing an unacceptable
thing to do.

I also believe your reply shows how important it is for me to actually
write things like these from time to time (even risking getting
flamed). As a user I don't care whatsoever what the code size
is. Actually, I don't care that much about its performance, either. What
I do care about is that my operating system doesn't crash from under me,
doesn't lose my data, and doesn't fail on me with suspending when I
really need it to suspend now. Give me a userspace USB implementation
that works 10x slower and is 10x larger but doesn't crash my machine and
I'll take it any day.

--J.

2004-10-11 15:04:55

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > OTOH this is first report of this failure. If it fails once in a blue
> > moon, it is probably better to let it fail than waste memory.
>
> PM: Attempting to suspend to disk.
> PM: snapshotting memory.
> swsusp: critical section:
> swsusp: Saving Highmem
> [nosave pfn 0x3be]<7>[nosave pfn 0x3bf]swsusp: Need to copy 30519 pages
> suspend: (pages needed: 30519 + 512 free: 100469)
> do_acpi_sleep: page allocation failure. order:7, mode:0x120
> [<c013a628>] __alloc_pages+0x3a8/0x3b0
> [<c013a648>] __get_free_pages+0x18/0x30
> [<c0132c37>] alloc_pagedir+0x17/0x60
> [<c0132ddb>] swsusp_alloc+0x4b/0xa0
> [<c0132e63>] suspend_prepare_image+0x33/0x80
> [<c028beda>] swsusp_arch_suspend+0x2a/0x30
> [<c0132f1b>] swsusp_suspend+0x2b/0x40
> [<c01332ad>] pm_suspend_disk+0x3d/0xb0
> [<c0131765>] enter_state+0x85/0x90
> [<c01318b1>] state_store+0xc1/0xc3
> [<c01317f0>] state_store+0x0/0xc3
> [<c01852e6>] subsys_attr_store+0x26/0x30
> [<c018548d>] flush_write_buffer+0x1d/0x30
> [<c01854c9>] sysfs_write_file+0x29/0x40
> [<c01854a0>] sysfs_write_file+0x0/0x40
> [<c0150c7f>] vfs_write+0x9f/0x100
> [<c0150d8c>] sys_write+0x3c/0x70
> [<c0105c69>] sysenter_past_esp+0x52/0x79
> suspend: Allocating pagedir failed.
> swsusp: Restoring Highmem
>
> this happened right now, after running fine over the weekend and doing a
> successful suspend/resume cycle this morning.
> It was a "battery critical" suspend, so this is not nice :-( I had about
> 2 minutes left until hard powerdown during which i tried to get it to
> suspend but failed. Yes, userspace should handle the "failed
> battery-critical suspend" case better and probably call "shutdown -h now".

Ok... And I guess it is nearly impossible to trigger this on demand,
right?

I do not think I can use vmalloc easily because reallocate_pagedir
depends on it being contiguous. Switching to a linked list is "just a
simple matter of coding", but it is going to be quite a lot of
changes.
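
The linked-list idea can be illustrated with a rough model (Python, not
kernel code). It assumes 4 KiB pages and 16-byte entries, reserves the last
slot of each page for the link to the next page, and uses hypothetical names
throughout. The point is only that no chunk ever needs more than one page:

```python
PAGE_SIZE = 4096
ENTRY_SIZE = 16                                   # ASSUMED entry size
ENTRIES_PER_CHUNK = PAGE_SIZE // ENTRY_SIZE - 1   # last slot holds the link


class Chunk:
    """One page worth of pagedir entries plus a link to the next chunk."""

    def __init__(self):
        self.entries = []
        self.next = None


class ChainedPagedir:
    """Pagedir built from single-page chunks instead of one contiguous block."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.nr_chunks = 0

    def append(self, entry):
        # Start a new chunk when there is none yet or the last one is full.
        if self.tail is None or len(self.tail.entries) == ENTRIES_PER_CHUNK:
            chunk = Chunk()              # models one order-0 allocation
            if self.tail is None:
                self.head = chunk
            else:
                self.tail.next = chunk
            self.tail = chunk
            self.nr_chunks += 1
        self.tail.entries.append(entry)

    def __iter__(self):
        # Walk the chain chunk by chunk, yielding entries in insertion order.
        chunk = self.head
        while chunk is not None:
            yield from chunk.entries
            chunk = chunk.next


pagedir = ChainedPagedir()
for pfn in range(30519):                 # page count from the failing suspend
    pagedir.append(pfn)
print(pagedir.nr_chunks)
```

The cost is that any code indexing the pagedir as a flat array has to walk
the chain instead.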
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-10-11 17:26:17

by Stefan Seyfried

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi,

Pavel Machek wrote:

> Ok... And I guess it is nearly impossible to trigger this on demand,
> right?

Of course. I just wanted to say "yes, it does happen". I did not say
fixing it would be easy ;-)

Stefan

2004-10-11 19:57:19

by Rafael J. Wysocki

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

On Monday 11 of October 2004 19:18, Stefan Seyfried wrote:
> Hi,
>
> Pavel Machek wrote:
>
> > Ok... And I guess it is nearly impossible to trigger this on demand,
> > right?

I think it is possible. Seemingly, on my box it's only a question of the
number of apps started. I think I can work out a method to trigger it 90% of
the time or so. Please let me know if it's worth doing.

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-12 09:02:44

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> > > Ok... And I guess it is nearly impossible to trigger this on demand,
> > > right?
>
> I think it is possible. Seemingly, on my box it's only a question of the
> number of apps started. I think I can work out a method to trigger it 90% of
> the time or so. Please let me know if it's worthy of doing.

Yes, it would certainly help with testing...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-10-13 17:27:29

by Rafael J. Wysocki

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

On Tuesday 12 of October 2004 10:55, Pavel Machek wrote:
> Hi!
>
> > > > Ok... And I guess it is nearly impossible to trigger this on demand,
> > > > right?
> >
> > I think it is possible. Seemingly, on my box it's only a question of the
> > number of apps started. I think I can work out a method to trigger it
> > 90% of the time or so. Please let me know if it's worthy of doing.
>
> Yes, it would certainly help with testing...

So far, the most reliable method seems to be to use the box for a day after a
successful suspend/resume cycle (I've got an 8-order allocation failure 3
times out of 3 attempts). Still, I'm working on something that's less
time-consuming. ;-)

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-14 21:46:54

by Rafael J. Wysocki

Subject: swsusp: 8-order allocation failure on demand (was: Re: 2.6.9-rc2-mm1 swsusp bug report.)

On Wednesday 13 of October 2004 19:29, Rafael J. Wysocki wrote:
> On Tuesday 12 of October 2004 10:55, Pavel Machek wrote:
> > Hi!
> >
> > > > > Ok... And I guess it is nearly impossible to trigger this on demand,
> > > > > right?
> > >
> > > > I think it is possible. Seemingly, on my box it's only a question of
> > > > the number of apps started. I think I can work out a method to trigger
> > > > it 90% of the time or so. Please let me know if it's worthy of doing.
> >
> > Yes, it would certainly help with testing...

Well, I can do that, it seems, 100% of the time.

The method is to do "init 5" (my default runlevel is 3, because vts become
unreadable after I start X), log into KDE (as a non-root user), start some X
apps at random (e.g. I run gkrellm, kmail, konqueror, Mozilla FireFox 32-bit
w/ Flash plugin, and konsole with "su -") and run updatedb (as root, of
course).

Apparently, running updatedb is essential. After it finishes, on my box, you
can forget about suspending to disk from under the X+KDE combo, even if the X
apps (i.e. kmail, konqueror, FireFox) are stopped beforehand. However, if
updatedb is not run, the box usually suspends successfully.

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-14 22:14:47

by Rafael J. Wysocki

Subject: Re: swsusp: 8-order allocation failure on demand (update)

On Thursday 14 of October 2004 23:47, Rafael J. Wysocki wrote:
> On Wednesday 13 of October 2004 19:29, Rafael J. Wysocki wrote:
> > On Tuesday 12 of October 2004 10:55, Pavel Machek wrote:
> > > Hi!
> > >
> > > > > > > Ok... And I guess it is nearly impossible to trigger this on
> > > > > > > demand, right?
> > > >
> > > > > I think it is possible. Seemingly, on my box it's only a question of
> > > > > the number of apps started. I think I can work out a method to
> > > > > trigger it 90% of the time or so. Please let me know if it's worthy
> > > > > of doing.
> > >
> > > Yes, it would certainly help with testing...
>
> Well, I can do that, it seems, 100% of the time.
>
> The method is to do "init 5" (my default runlevel is 3, because vts become
> unreadable after I start X), log into KDE (as a non-root), start some X apps
> at random (eg. I run gkrellm, kmail, konqueror, Mozilla FireFox 32-bit w/
> Flash plugin, and konsole with "su -") and run updatedb (as root, of
> course).

To be precise, the method always leads to a failure, but it seems to be either
8-order or 9-order page allocation failure.

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-16 16:43:50

by Pavel Machek

Subject: Re: swsusp: 8-order allocation failure on demand (update)

Hi!

> > > > > > > > Ok... And I guess it is nearly impossible to trigger this on
> > > > > > > > demand, right?
> > > > >
> > > > > > I think it is possible. Seemingly, on my box it's only a question
> > > > > > of the number of apps started. I think I can work out a method to
> > > > > > trigger it 90% of the time or so. Please let me know if it's
> > > > > > worthy of doing.
> > > >
> > > > Yes, it would certainly help with testing...
> >
> > Well, I can do that, it seems, 100% of the time.
> >
> > The method is to do "init 5" (my default runlevel is 3, because vts become
> > unreadable after I start X), log into KDE (as a non-root), start some X apps
> > at random (eg. I run gkrellm, kmail, konqueror, Mozilla FireFox 32-bit w/
> > Flash plugin, and konsole with "su -") and run updatedb (as root, of
> > course).
>
> To be precise, the method always leads to a failure, but it seems to be either
> 8-order or 9-order page allocation failure.

Okay, you could probably pre-allocate a 512K block during bootup, then
just use that instead of allocating a new one during suspend.

Unfortunately that's rather ugly. You'd need ~32 bytes per 4K page, that's
almost 1% overhead, not nice. A better solution (but more work) is to
switch to linked lists or integrate swsusp2.
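
A quick check of those numbers, as a sketch assuming 4 KiB pages and the
~32 bytes per entry quoted above (`pages_covered` is a hypothetical helper):

```python
PAGE_SIZE = 4096    # bytes per page
ENTRY_SIZE = 32     # bytes of pagedir metadata per page, figure quoted above

# Fraction of memory spent on pagedir metadata.
overhead = ENTRY_SIZE / PAGE_SIZE


def pages_covered(block_bytes, entry_size=ENTRY_SIZE):
    """Number of pages a preallocated block of this size can describe."""
    return block_bytes // entry_size


covered = pages_covered(512 * 1024)
print(f"overhead: {overhead:.2%}")
print(f"512K block describes {covered} pages "
      f"({covered * PAGE_SIZE // 2**20} MiB to copy)")
```

At these sizes the overhead is about 0.78%, and a 512K block describes 16384
pages, that is 64 MiB of memory to copy.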
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2004-10-16 19:32:46

by Rafael J. Wysocki

Subject: Re: swsusp: 8-order allocation failure on demand (update)

On Saturday 16 of October 2004 18:43, Pavel Machek wrote:
> Hi!
>
> > > > > > > > Ok... And I guess it is nearly impossible to trigger this on
> > > > > > > > demand, right?
> > > > > >
> > > > > > I think it is possible. Seemingly, on my box it's only a question
> > > > > > of the number of apps started. I think I can work out a method
> > > > > > to trigger it 90% of the time or so. Please let me know if it's
> > > > > > worthy of doing.
> > > > >
> > > > > Yes, it would certainly help with testing...
> > >
> > > Well, I can do that, it seems, 100% of the time.
> > >
> > > The method is to do "init 5" (my default runlevel is 3, because vts
> > > become unreadable after I start X), log into KDE (as a non-root),
> > > start some X apps at random (eg. I run gkrellm, kmail, konqueror,
> > > Mozilla FireFox 32-bit w/ Flash plugin, and konsole with "su -") and
> > > run updatedb (as root, of course).
> >
> > To be precise, the method always leads to a failure, but it seems to
> > be either 8-order or 9-order page allocation failure.
>
> Okay, you could probably pre-allocate 512K block during bootup, then
> just use that instead of allocating new one during suspend.
>
> Unfortunately that's rather ugly. You'd need ~32 bytes per 4K page, that's
> almost 1% overhead, not nice. Better solution (but more work) is to
> switch to linked lists or integrate swsusp2.

Well, I wonder if the page allocation failures are a swsusp problem, really.
I've just tried it the other way around and ran updatedb _first_, then
started X+KDE (no additional apps) and tried to suspend from under it. Guess
what: a 9-order page allocation failure, here you go.

It seems to me that updatedb leaves a mess in memory, which IMO should not
happen or at least the kernel should be able to clean it, but apparently it
is not. I'd be grateful if someone could explain to me why that is so,
please.
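
One way to look at this kind of fragmentation is through the per-order
free-block counts in /proc/buddyinfo. The sketch below parses that format and
counts free blocks at or above a given order. The sample line is made up for
illustration, and the availability of the file on a given kernel is an
assumption:

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zones[(int(fields[1]), fields[3])] = [int(x) for x in fields[4:]]
    return zones


def free_blocks_at_least(counts, order):
    """Number of free blocks of the given order or larger."""
    return sum(counts[order:])


# Made-up sample line, NOT taken from any machine in this thread.
SAMPLE = "Node 0, zone   Normal   812  301  97  40  12  3  1  0  0  0  0\n"
zones = parse_buddyinfo(SAMPLE)
print(free_blocks_at_least(zones[(0, "Normal")], 8))
```

On a live system the text would come from reading /proc/buddyinfo before and
after running updatedb; a zero at order 8 and above means an order-8 request
cannot be satisfied without reclaim.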

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-16 20:40:47

by Pavel Machek

Subject: Re: swsusp: 8-order allocation failure on demand (update)

Hi!

> > Unfortunately that's rather ugly. You'd need ~32 bytes per 4K page, that's
> > almost 1% overhead, not nice. Better solution (but more work) is to
> > switch to linked lists or integrate swsusp2.
>
> Well, I wonder if the page allocation failures are a swsusp problem,
> really.

Yes, they are. Kernel memory allocation is not designed to do 8-order
allocations properly. swsusp really should not use them.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-10-16 21:06:06

by Rafael J. Wysocki

Subject: Re: swsusp: 8-order allocation failure on demand (update)

On Saturday 16 of October 2004 22:40, Pavel Machek wrote:
> Hi!
>
> > > Unfortunately that's rather ugly. You'd need ~32 bytes per 4K page, that's
> > > almost 1% overhead, not nice. Better solution (but more work) is to
> > > switch to linked lists or integrate swsusp2.
> >
> > Well, I wonder if the page allocation failures are a swsusp problem,
> > really.
>
> Yes, they are. Kernel memory allocation is not designed to do 8-order
> allocations properly. swsusp really should not use them.

Now that's clear, thanks. Could you tell me, please, what I need to know to
understand the swsusp code and what I should start with?

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-10-16 21:25:24

by Pavel Machek

Subject: Re: swsusp: 8-order allocation failure on demand (update)

Hi!

> > > > Unfortunately that's rather ugly. You'd need ~32 bytes per 4K page, that's
> > > > almost 1% overhead, not nice. Better solution (but more work) is to
> > > > switch to linked lists or integrate swsusp2.
> > >
> > > Well, I wonder if the page allocation failures are a swsusp problem,
> > > really.
> >
> > Yes, they are. Kernel memory allocation is not designed to do 8-order
> > allocations properly. swsusp really should not use them.
>
> Now that's clear, thanks. Could you tell me, please, what I need to know to
> understand the swsusp code and what I should start with?

On bootup, preallocate, say, an order-9 allocation.

In alloc_pagedir(), do not allocate anything, but check that the data fit
in the preallocated area, and fail if not.

There's free_pages( pagedir_save, ...) somewhere, remove that so the
pagedir is never freed and you can resume multiple times.
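
The three steps can be modelled roughly as follows. This is a Python sketch
of the bookkeeping only, not kernel code; the class and method names are
hypothetical, and the 32-byte entry size and 4 KiB page size are assumptions:

```python
PAGE_SIZE = 4096
ENTRY_SIZE = 32   # ASSUMED bytes per pagedir entry


class PreallocatedPagedir:
    """Model of a pagedir block reserved once and reused for every suspend."""

    def __init__(self, order):
        # Step 1: reserve one block of 2**order pages at boot.
        self.capacity = (PAGE_SIZE << order) // ENTRY_SIZE
        self.in_use = 0

    def alloc_pagedir(self, nr_pages):
        # Step 2: allocate nothing; only check that the entries fit.
        if nr_pages > self.capacity:
            raise MemoryError("pagedir does not fit in preallocated area")
        self.in_use = nr_pages

    def release(self):
        # Step 3: never free the block, just mark it unused for the next cycle.
        self.in_use = 0


pagedir = PreallocatedPagedir(order=9)
pagedir.alloc_pagedir(34850)   # page count from the original report
pagedir.release()
pagedir.alloc_pagedir(34850)   # a second suspend reuses the same block
print(pagedir.capacity)
```

With these assumed sizes an order-9 block holds 65536 entries, so a suspend
that needs more pages than that would still have to fail.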
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-10-17 19:14:19

by Pavel Machek

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi!

> Sure, flame me if you think this is the right thing to do. But I will
> continue to pitch in with a users' opinion sometimes, because I really
> believe it is important.
>
> It is easy to lose sight of the user perspective on these things if all
> you deal with is kernel development. You probably reboot your machine
> dozens of times a day anyway. However, for some users crashes and
> reboots are *very* expensive. These people (myself included) consider
> sprinkling the code with panics, crashing and failing an unacceptable
> thing to do.

You can have code that does not panic, does not crash, does not
corrupt your data, never fails to suspend and is in Linus' tree.

...no, that is too good. It sounds like a fairy tale.

So pick any four.
Pavel

PS: And it is real. We have conflicting goals here and I consider
"refuses to suspend" least critical.
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-10-17 21:43:34

by Nigel Cunningham

Subject: Re: 2.6.9-rc2-mm1 swsusp bug report.

Hi.

On Mon, 2004-10-18 at 05:10, Pavel Machek wrote:
> You can have code that does not panic, does not crash, does not
> corrupt your data, never fails to suspend and is in Linus' tree.
>
> ...no, that is too good. It sounds like a fairy tale.
>
> So pick any four.
> Pavel
>
> PS: And it is real. We have conflicting goals here and I consider
> "refuses to suspend" least critical.

I'm going for all five! You're probably right, nevertheless. I can't say
suspend2 _never_ fails to suspend. It's just very rare.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. True tolerance, however, can cope with others
being intolerant.