2010-08-26 10:37:14

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernationto hang

On 26/08/10 11:09 AM, KOSAKI Motohiro wrote:
>> If there is anything I can do to debug this problem please let me
>> know.
>
> Vefa, your above log is very useful. really thanks.
> So, I'm convinced two things.
> - my patch is purely unrelated.
> - how to fix your issue.
>
> Can you please try attached patch? I bet this can solve your issue.
>
> Thanks for your patience.

Hello!

First of all, thanks a lot for your help - I really appreciate it.

I applied your new patches on top of your old patches. Hopefully that
was okay.

Unfortunately, it didn't work this time. Here's a sample output from the
new patch.

=== 8< ===
[58.050208] PM: Preallocating image memory...
[58.159881] shrink_all_memory start
[58.232411] PM: shrink memory: pass=1, req:312373 reclaimed:15864 free:358420
[58.342041] PM: shrink memory: pass=2, req:296509 reclaimed:21837 free:362167
[60.690035] PM: shrink memory: pass=3, req:274672 reclaimed:25982 free:348006
[61.754931] PM: shrink memory: pass=4, req:248690 reclaimed:49623 free:371589
[64.361714] PM: shrink memory: pass=5, req:199067 reclaimed:74683 free:396695
[64.361769] shrink_all_memory: req:124384 reclaimed:74683 free:396695
=== >8 ===

The interesting thing is that even though there is a lot of free memory at the
end, it still hangs. I also included the timestamps; note the one and two second
delays between the passes.

Please let me know if there is anything I can do.

Regards,

M. Vefa Bicakci


2010-08-30 02:28:11

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernationto hang

> On 26/08/10 11:09 AM, KOSAKI Motohiro wrote:
> >> If there is anything I can do to debug this problem please let me
> >> know.
> >
> > Vefa, your above log is very useful. really thanks.
> > So, I'm convinced two things.
> > - my patch is purely unrelated.
> > - how to fix your issue.
> >
> > Can you please try attached patch? I bet this can solve your issue.
> >
> > Thanks for your patience.
>
> Hello!
>
> First of all, thanks a lot for your help - I really appreciate it.
>
> I applied your new patches on top of your old patches. Hopefully that
> was okay.
>
> Unfortunately, it didn't work this time. Here's a sample output from the
> new patch.
>
> === 8< ===
> [58.050208] PM: Preallocating image memory...
> [58.159881] shrink_all_memory start
> [58.232411] PM: shrink memory: pass=1, req:312373 reclaimed:15864 free:358420
> [58.342041] PM: shrink memory: pass=2, req:296509 reclaimed:21837 free:362167
> [60.690035] PM: shrink memory: pass=3, req:274672 reclaimed:25982 free:348006
> [61.754931] PM: shrink memory: pass=4, req:248690 reclaimed:49623 free:371589
> [64.361714] PM: shrink memory: pass=5, req:199067 reclaimed:74683 free:396695
> [64.361769] shrink_all_memory: req:124384 reclaimed:74683 free:396695
> === >8 ===
>
> The interesting thing is that even though there is a lot of free memory at the
> end, it still hangs.

Grr. I'm surprised this result ;-)
shrink_all_memory() finish to shrink memory successfully. but your
system still hang immediately after. I have no idea why this mysterious
occur.

I prepared next debugging patch. It added prenty debug printk. I hope
it enlighten up which path makes system hang-up.

1. apply my new patch

2. Enable following PM debug option in Kconfig

[*] Power Management support
[*] Power Management Debug Support
[*] Extra PM attributes in sysfs for low-level debugging/testing
[*] Verbose Power Management debugging

3. append following kernel boot option into grub configration file

no_console_suspend=1

3. kernel build and reboot
4. some prepare
# echo 8 > /proc/sysrq-trigger
# cd /sys/power
# echo 1 > pm_trace
# echo 0 > pm_async

5. run your test program


> I also included the timestamps; note the one and two second
> delays between the passes.

This is expected result because tmpfs shrink need swap-out. then
we need i/o time.

>
> Please let me know if there is anything I can do.

Please send me your .config and full dmesg.


Thanks many and many help us!


Attachments:
0001-debug-messages.patch (2.77 kB)

2010-08-30 16:55:32

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernation to hang

On 30/08/10 05:28 AM, KOSAKI Motohiro wrote:
>> [snip]
>>
>> The interesting thing is that even though there is a lot of free memory at the
>> end, it still hangs.
>
> Grr. I'm surprised this result ;-)
> shrink_all_memory() finish to shrink memory successfully. but your
> system still hang immediately after. I have no idea why this mysterious
> occur.
>
> I prepared next debugging patch. It added prenty debug printk. I hope
> it enlighten up which path makes system hang-up.
>
> 1. apply my new patch
>
> 2. Enable following PM debug option in Kconfig
>
> [*] Power Management support
> [*] Power Management Debug Support
> [*] Extra PM attributes in sysfs for low-level debugging/testing
> [*] Verbose Power Management debugging
>
> 3. append following kernel boot option into grub configration file
>
> no_console_suspend=1
>
> 3. kernel build and reboot
> 4. some prepare
> # echo 8 > /proc/sysrq-trigger
> # cd /sys/power
> # echo 1 > pm_trace
> # echo 0 > pm_async
>
> 5. run your test program
>
>
>> I also included the timestamps; note the one and two second
>> delays between the passes.
>
> This is expected result because tmpfs shrink need swap-out. then
> we need i/o time.
>
>>
>> Please let me know if there is anything I can do.
>
> Please send me your .config and full dmesg.
>
>
> Thanks many and many help us!

Hello,

I have followed your instructions, with one exception: I have also
enabled CONFIG_PM_TRACE so that I would have /sys/power/pm_trace.

This time I had some more output, as expected. I double checked what
I typed while looking at the screen-shot I took with my camera. Here's
the output:

=== 8< ===
PM: Marking nosave pages: ...0009f000 - ...000100000
PM: basic memory bitmaps created
PM: Syncing filesystems ... done
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
PM: Preallocating image memory...
shrink_all_memory start
PM: shrink memory: pass=1, req:310171 reclaimed:15492 free:360936
PM: shrink memory: pass=2, req:294679 reclaimed:28864 free:373981
PM: shrink memory: pass=3, req:265815 reclaimed:60311 free:405374
PM: shrink memory: pass=4, req:205504 reclaimed:97870 free:443024
PM: shrink memory: pass=5, req:107634 reclaimed:146948 free:492141
shrink_all_memory: req:107634 reclaimed:146948 free:492141
PM: preallocate_image_highmem 556658 278329
PM: preallocate_image_memory 103139 103139
PM: preallocate_highmem_fraction 183908 556658 760831 -> 183908
=== >8 ===

According to your patch, the next output should have been
"preallocate_image_memory ...", but it never gets printed, so the
hang point should be that function.

I am attaching my dmesg output which I got after the failed hibernation
attempt and my .config file. Please note that the attached .config file
is a trimmed version of the .config I usually use on my computer. I trimmed
it so that it compiles faster, but (mostly) has support for devices I might
use.

Thanks a lot for your help, and please let me know if I can do anything else.

Regards,

M. Vefa Bicakci


Attachments:
config.txt (78.32 kB)
dmesg.txt (80.79 kB)
Download all attachments

2010-08-31 06:35:38

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernation to hang

> On 30/08/10 05:28 AM, KOSAKI Motohiro wrote:
> >> [snip]
> >>
> >> The interesting thing is that even though there is a lot of free memory at the
> >> end, it still hangs.
> >
> > Grr. I'm surprised this result ;-)
> > shrink_all_memory() finish to shrink memory successfully. but your
> > system still hang immediately after. I have no idea why this mysterious
> > occur.
> >
> > I prepared next debugging patch. It added prenty debug printk. I hope
> > it enlighten up which path makes system hang-up.
> >
> > 1. apply my new patch
> >
> > 2. Enable following PM debug option in Kconfig
> >
> > [*] Power Management support
> > [*] Power Management Debug Support
> > [*] Extra PM attributes in sysfs for low-level debugging/testing
> > [*] Verbose Power Management debugging
> >
> > 3. append following kernel boot option into grub configration file
> >
> > no_console_suspend=1
> >
> > 3. kernel build and reboot
> > 4. some prepare
> > # echo 8 > /proc/sysrq-trigger
> > # cd /sys/power
> > # echo 1 > pm_trace
> > # echo 0 > pm_async
> >
> > 5. run your test program
> >
> >
> >> I also included the timestamps; note the one and two second
> >> delays between the passes.
> >
> > This is expected result because tmpfs shrink need swap-out. then
> > we need i/o time.
> >
> >>
> >> Please let me know if there is anything I can do.
> >
> > Please send me your .config and full dmesg.
> >
> >
> > Thanks many and many help us!
>
> Hello,
>
> I have followed your instructions, with one exception: I have also
> enabled CONFIG_PM_TRACE so that I would have /sys/power/pm_trace.
>
> This time I had some more output, as expected. I double checked what
> I typed while looking at the screen-shot I took with my camera. Here's
> the output:
>
> === 8< ===
> PM: Marking nosave pages: ...0009f000 - ...000100000
> PM: basic memory bitmaps created
> PM: Syncing filesystems ... done
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> PM: Preallocating image memory...
> shrink_all_memory start
> PM: shrink memory: pass=1, req:310171 reclaimed:15492 free:360936
> PM: shrink memory: pass=2, req:294679 reclaimed:28864 free:373981
> PM: shrink memory: pass=3, req:265815 reclaimed:60311 free:405374
> PM: shrink memory: pass=4, req:205504 reclaimed:97870 free:443024
> PM: shrink memory: pass=5, req:107634 reclaimed:146948 free:492141
> shrink_all_memory: req:107634 reclaimed:146948 free:492141
> PM: preallocate_image_highmem 556658 278329
> PM: preallocate_image_memory 103139 103139
> PM: preallocate_highmem_fraction 183908 556658 760831 -> 183908
> === >8 ===
>
> According to your patch, the next output should have been
> "preallocate_image_memory ...", but it never gets printed, so the
> hang point should be that function.

Great!
I've attached more verbose debug message patch and trial bug fixing patch.
Could you please to try this?



> I am attaching my dmesg output which I got after the failed hibernation
> attempt and my .config file. Please note that the attached .config file
> is a trimmed version of the .config I usually use on my computer. I trimmed
> it so that it compiles faster, but (mostly) has support for devices I might
> use.
>
> Thanks a lot for your help, and please let me know if I can do anything else.
>


Attachments:
0001-debug2.patch (5.14 kB)
0002-add-gfp_noretry.patch (825.00 B)
Download all attachments

2010-08-31 06:54:16

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernation to hang

> > === 8< ===
> > PM: Marking nosave pages: ...0009f000 - ...000100000
> > PM: basic memory bitmaps created
> > PM: Syncing filesystems ... done
> > Freezing user space processes ... (elapsed 0.01 seconds) done.
> > Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> > PM: Preallocating image memory...
> > shrink_all_memory start
> > PM: shrink memory: pass=1, req:310171 reclaimed:15492 free:360936
> > PM: shrink memory: pass=2, req:294679 reclaimed:28864 free:373981
> > PM: shrink memory: pass=3, req:265815 reclaimed:60311 free:405374
> > PM: shrink memory: pass=4, req:205504 reclaimed:97870 free:443024
> > PM: shrink memory: pass=5, req:107634 reclaimed:146948 free:492141
> > shrink_all_memory: req:107634 reclaimed:146948 free:492141
> > PM: preallocate_image_highmem 556658 278329
> > PM: preallocate_image_memory 103139 103139
> > PM: preallocate_highmem_fraction 183908 556658 760831 -> 183908
> > === >8 ===
> >
> > According to your patch, the next output should have been
> > "preallocate_image_memory ...", but it never gets printed, so the
> > hang point should be that function.
>
> Great!
> I've attached more verbose debug message patch and trial bug fixing patch.
> Could you please to try this?

Oops, please apply attached patch instead 0002-add-gfp_noretry.patch.

Thanks.


Attachments:
0001-do_try_to_free_pages-oom_killer_disabled.patch (955.00 B)

2010-08-31 11:26:08

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernationto hang

On 31/08/10 09:54 AM, KOSAKI Motohiro wrote:
>> Great!
>> I've attached more verbose debug message patch and trial bug fixing patch.
>> Could you please to try this?
>
> Oops, please apply attached patch instead 0002-add-gfp_noretry.patch.
>
> Thanks.

Hello!

I have applied the patches you mentioned, and rebuilt and tested the
2.6.35.4 kernel. I am really happy to say that your patches (cumulatively)
fixed the issue!

Unfortunately, because the hibernation is rather quick, I am having a
hard time getting screen-shots with my camera. If you would like, I can
try to put some sleeps around the code so that I can get the output for
you.

For the record, the attached patch is the cumulative version of all of
your patches. It applies cleanly to 2.6.35.4, and most importantly, it
fixes the issue.

All in all, thanks a lot!

Is there anything else I can do? Would you like me to try a trimmed
version of your patch, maybe without the debugging parts and the 5-pass
swap-out procedure, which I am not sure is essential or not?

Thanks again,

M. Vefa Bicakci


Attachments:
mm-patch-debug+fix-full.txt (10.14 kB)

2010-09-01 00:48:55

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [Bisected Regression in 2.6.35] A full tmpfs filesystem causeshibernation to hang

> === 8< ===
> PM: Marking nosave pages: ...0009f000 - ...000100000
> PM: basic memory bitmaps created
> PM: Syncing filesystems ... done
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> PM: Preallocating image memory...
> shrink_all_memory start
> PM: shrink memory: pass=1, req:310171 reclaimed:15492 free:360936
> PM: shrink memory: pass=2, req:294679 reclaimed:28864 free:373981
> PM: shrink memory: pass=3, req:265815 reclaimed:60311 free:405374
> PM: shrink memory: pass=4, req:205504 reclaimed:97870 free:443024
> PM: shrink memory: pass=5, req:107634 reclaimed:146948 free:492141
> shrink_all_memory: req:107634 reclaimed:146948 free:492141
> PM: preallocate_image_highmem 556658 278329
> PM: preallocate_image_memory 103139 103139
> PM: preallocate_highmem_fraction 183908 556658 760831 -> 183908
> === >8 ===

Rafael, this log mean hibernate_preallocate_memory() has a bug.
It allocate memory as following order.
1. preallocate_image_highmem() (i.e. __GFP_HIGHMEM)
2. preallocate_image_memory() (i.e. GFP_KERNEL)
3. preallocate_highmem_fraction (i.e. __GFP_HIGHMEM)
4. preallocate_image_memory() (i.e. GFP_KERNEL)

But, please imazine following scenario (as Vefa's scenario).
- system has 3GB memory. 1GB is normal. 2GB is highmem.
- all normal memory is free
- 1.5GB memory of highmem are used for tmpfs. rest 500MB is free.

At that time, hibernate_preallocate_memory() works as following.

1. call preallocate_image_highmem(1GB)
2. call preallocate_image_memory(500M) total 1.5GB allocated
3. call preallocate_highmem_fraction(660M) total 2.2GB allocated

then, all of normal zone memory was exhaust. next preallocate_image_memory()
makes OOM, and oom_killer_disabled makes infinite loop.
(oom_killer_disabled careless is vmscan bug. I'll fix it soon)

The problem is, alloc_pages(__GFP_HIGHMEM) -> alloc_pages(GFP_KERNEL) is
wrong order. alloc_pages(__GFP_HIGHMEM) may allocate page from lower zone.
then, next alloc_pages(GFP_KERNEL) lead to OOM.

Please consider alloc_pages(GFP_KERNEL) -> alloc_pages(__GFP_HIGHMEM) order.
Even though vmscan fix can avoid infinite loop, OOM situation might makes
big slow down on highmem machine. It seems no good.


Thanks.