2011-05-11 22:43:05

by Andrew Lutomirski

[permalink] [raw]
Subject: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

For the last few days (since moving my disk to a new laptop), my
system has been hanging, usually unrecoverably, under light memory
pressure. When this happens, I usually see soft lockups and no OOM
kill. Mouse and keyboard input stop working. Sometimes I can switch
VTs; sometimes I can't. If I just wait it out, sometimes the system
comes back after a couple of minutes but usually even ten minutes or
so isn't enough. If I force an OOM kill (Alt-SysRq-F), my system
sometimes recovers. I've attached the dmesg from when that happened
(in that case the freeze was triggered by linking a kernel and the OOM
killer killed ld.)

I can trigger it about half of the time by building a kernel (it
usually dies while linking or doing the .tmp_* stuff) and 100% of the
time by running the attached script with parameters "1500 1400 1".
The script creates a 1500M file on a ramfs, sets up dm-crypt over
loopback on that file, formats it as ext4, and mounts it, then starts
writing a 1400M file over and over on the ext4 partition.
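
For reference, the write half of the script boils down to something like
this (a minimal C sketch, not the attached test_mempressure.sh, which does
the ramfs/dm-crypt/ext4 setup in shell; the mount point and chunk size are
illustrative assumptions):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const size_t chunk = 1 << 20;           /* 1 MiB per write() */
        const size_t total_mb = 1400;           /* the "1400" parameter */
        char *buf = malloc(chunk);

        if (!buf)
                return 1;
        memset(buf, 0xab, chunk);

        for (;;) {                              /* "over and over" */
                int fd = open("/mnt/test/bigfile",
                              O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0)
                        return 1;
                for (size_t i = 0; i < total_mb; i++)
                        if (write(fd, buf, chunk) != (ssize_t)chunk)
                                break;
                close(fd);
        }
}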

I cannot trigger the problem by running the same script on a different
machine (with 8 GB RAM) with parameters 6000 5500 1. I can't trigger
it on this machine from initramfs (same kernel image) or from
systemd's emergency shell. I can trigger it some of the time from
systemd's rescue shell (which has a little bit more stuff running).
The problem seems about equally prevalent with AHCI or compatibility
mode and with aesni-intel enabled and disabled. (aesni-intel causes
cryptd to get pulled in, so I thought that might be the issue.)

I can sometimes (but not always) trigger this by enabling swap and
running dirty_ram 2048 (attached). (One time it took the system down
completely. I have ~8 GB of swap, all of which was empty when I ran
the program.)
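
For reference, dirty_ram does essentially this (a minimal C sketch in the
spirit of the attached dirty_ram.cc, not its actual source): allocate the
requested number of MiB and dirty every page.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        size_t mb = argc > 1 ? strtoul(argv[1], NULL, 10) : 2048;
        size_t len = mb << 20;
        char *p = malloc(len);

        if (!p) {
                perror("malloc");
                return 1;
        }
        memset(p, 0x5a, len);           /* dirty every page */
        printf("dirtied %zu MiB\n", mb);
        return 0;
}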

I see this problem on 2.6.38.{5,6}, 2.6.39-<something from today>, and
Fedora 15's kernel, so I doubt it's an oddity of my kernel config.

I also had this problem while running Fedora 15's installer to upgrade
from Fedora 14 to 15, which rules out a lot of weird userspace issues.

This box is a Lenovo X220 Sandy Bridge laptop with 2G of RAM (the old
box had more) and runs ext4 on LVM on dm-crypt on an SSD. I see the
problem with and without a swap partition. I've also tried unloading
most drivers and the test still fails. Memtest passes.

If I had to guess, I'd say that the VM gets confused when it's forced
to write data out to my LVM-over-dm-crypt partition and either starts
OOM-killing things when it's not out of memory or deadlocks because it
runs out of available RAM and can't service new dm-crypt and block
requests.

Please help fix/debug this. It's making my shiny new laptop almost useless.

--Andy


Attachments:
successful-oom-kill.txt (86.14 kB)
test_mempressure.sh (1.95 kB)
OOM-with-lots-of-swap.txt (33.86 kB)
dirty_ram.cc (583.00 B)

2011-05-11 23:08:04

by Andi Kleen

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

Andrew Lutomirski <[email protected]> writes:
>
> I can sometimes (but not always) trigger this by enabling swap and
> running dirty_ram 2048 (attached). (One time it took the system down
> completely. I have ~8 GB of swap, all of which was empty when I ran

Never configure that much swap (> 1*RAM). It will just make any OOM more
painful because it'll thrash forever. If you're 4x overcommitted
no workload will be happy.

> This box is a Lenovo X220 Sandy Bridge laptop with 2G of RAM (the old
> box had more) and runs ext4 on LVM on dm-crypt on an SSD. I see the

FWIW I had problems with swapping over dmcrypt for a long time -- not
quite as severe as yours. I never really tracked it down.

But I suspect just not doing the swap over dmcrypt would make
it a lot more usable.

> If I had to guess, I'd say that the VM gets confused when it's forced
> to write data out to my LVM-over-dm-crypt partition and either starts
> OOM-killing things when it's not out of memory or deadlocks because it
> runs out of available RAM and can't service new dm-crypt and block
> requests.
>
> Please help fix/debug this. It's making my shiny new laptop almost useless.

I would add some tracing to the dmcrypt paths and then log
it over the network during the problem. Most likely some part
of it stalls or tries to allocate more memory.

-Andi

--
[email protected] -- Speaking for myself only

2011-05-11 23:29:08

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 11, 2011 at 7:07 PM, Andi Kleen <[email protected]> wrote:
> Andrew Lutomirski <[email protected]> writes:
>>
>> I can sometimes (but not always) trigger this by enabling swap and
>> running dirty_ram 2048 (attached). (One time it took the system down
>> completely. I have ~8 GB of swap, all of which was empty when I ran
>
> Never configure that much swap (> 1*RAM). It will just make any OOM more
> painful because it'll thrash forever. If you're 4x overcommited
> no workload will be happy.

Agreed. But I only need to overcommit by a little to get it to crash.

>
>> This box is a Lenovo X220 Sandy Bridge laptop with 2G of RAM (the old
>> box had more) and runs ext4 on LVM on dm-crypt on an SSD. I see the
>
> FWIW i had problems in swapping over dmcrypt for a long time -- not
> quite as severe as you. Never really tracked it down.
>
> But I suspect just not doing the swap over dmcrypt would make
> it a lot more usable.

Maybe. But I can get it to crash just fine without any swap at all,
which I think ought to be the most stable configuration.

>
>> If I had to guess, I'd say that the VM gets confused when it's forced
>> to write data out to my LVM-over-dm-crypt partition and either starts
>> OOM-killing things when it's not out of memory or deadlocks because it
>> runs out of available RAM and can't service new dm-crypt and block
>> requests.
>>
>> Please help fix/debug this. It's making my shiny new laptop almost useless.
>
> I would add some tracing to the dmcrypt paths and then log
> it over the network during the problem. Most likely some part
> of it stalls or tries to allocate more memory.

Yep, that's next. I just added some instrumentation in mempool_alloc
to warn if it can't satisfy an allocation for five seconds and it
didn't trigger. Most of the dm-crypt allocations I could find go
through mempool, so I think they're ruled out.
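
Roughly, the instrumentation had this shape (a sketch of the idea, not the
exact change; the helper name is made up):

#include <linux/bug.h>
#include <linux/jiffies.h>

/* 'start' is jiffies sampled on entry to mempool_alloc() in mm/mempool.c;
 * this check sits in the retry loop before it goes back to sleep. */
static inline void mempool_warn_if_stuck(unsigned long start)
{
        WARN_ONCE(time_after(jiffies, start + 5 * HZ),
                  "mempool_alloc stuck for more than 5 seconds\n");
}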

Do softlockups in kswapd0 mean anything? I think I can rule out a
traditional vm deadlock, because the machine is currently stuck with
tons of things hitting the softlockup warning but with 809M of DMA32
space free (as well as 8M DMA and 16kB normal).

Here's a nice picture of alt-sysrq-m with lots of memory free but the
system mostly hung. I can still switch VTs.

http://web.mit.edu/luto/www/meminfo.jpg

alt-sysrq-j to thaw filesystems caused the system to start printing
"Emergency Thaw on dm-2" in an infinite loop. Time to power off and
go home...

--Andy

>
> -Andi
>
> --
> [email protected] -- Speaking for myself only
>

2011-05-12 05:46:37

by Andi Kleen

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> Here's a nice picture of alt-sysrq-m with lots of memory free but the
> system mostly hung. I can still switch VTs.

Would rather need backtraces. Try setting up netconsole or crashdump
first.

-Andi

--
[email protected] -- Speaking for myself only.

2011-05-12 11:54:50

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 12, 2011 at 1:46 AM, Andi Kleen <[email protected]> wrote:
>> Here's a nice picture of alt-sysrq-m with lots of memory free but the
>> system mostly hung. I can still switch VTs.
>
> Would rather need backtraces. Try setting up netconsole or crashdump
> first.

Here are some logs for two different failure modes.

incorrect_oom_kill.txt is an OOM kill when there was lots of available
swap to use. AFAICT the kernel should not have OOM killed at all.

stuck_xyz is when the system is wedged with plenty (~300MB) free
memory but no swap. The sysrq files are self-explanatory.
stuck-sysrq-f.txt is after the others so that it won't have corrupted
the output. After taking all that data, I waited a while and started
getting soft lockup messages.

I'm having trouble reproducing the "stuck" failure mode on my
lockdep-enabled kernel right now (the OOM kill is easy), so no lock
state trace. But I got one yesterday and IIRC it showed a few tty
locks and either kworker or kcryptd holding (kqueue) and
((&io->work)).

I compressed the larger files.

--Andy

>
> -Andi
>
> --
> [email protected] -- Speaking for myself only.
>


Attachments:
stuck-sysrq-m.txt (3.27 kB)
incorrect_oom_kill.txt.xz (24.52 kB)
stuck-sysrq-t.txt.xz (30.28 kB)
stuck-sysrq-w.txt (28.14 kB)
stuck-sysrq-f.txt (9.54 kB)
stuck-softlockup.txt (25.08 kB)

2011-05-14 15:46:21

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

[cc linux-mm]

On Thu, May 12, 2011 at 7:54 AM, Andrew Lutomirski <[email protected]> wrote:
> On Thu, May 12, 2011 at 1:46 AM, Andi Kleen <[email protected]> wrote:
>>> Here's a nice picture of alt-sysrq-m with lots of memory free but the
>>> system mostly hung. I can still switch VTs.
>>
>> Would rather need backtraces. Try setting up netconsole or crashdump
>> first.
>
> Here are some logs for two different failure modes.
>
> incorrect_oom_kill.txt is an OOM kill when there was lots of available
> swap to use. AFAICT the kernel should not have OOM killed at all.
>
> stuck_xyz is when the system is wedged with plenty (~300MB) free
> memory but no swap. The sysrq files are self-explanatory.
> stuck-sysrq-f.txt is after the others so that it won't have corrupted
> the output. After taking all that data, I waited a while and started
> getting soft lockup messages.
>
> I'm having trouble reproducing the "stuck" failure mode on my
> lockdep-enabled kernel right now (the OOM kill is easy), so no lock
> state trace. But I got one yesterday and IIRC it showed a few tty
> locks and either kworker or kcryptd holding (kqueue) and
> ((&io->work)).
>
> I compressed the larger files.
>
> --Andy

2011-05-14 16:53:49

by Andi Kleen

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> > Here are some logs for two different failure modes.
> >
> > incorrect_oom_kill.txt is an OOM kill when there was lots of available
> > swap to use. AFAICT the kernel should not have OOM killed at all.
> >
> > stuck_xyz is when the system is wedged with plenty (~300MB) free
> > memory but no swap. The sysrq files are self-explanatory.
> > stuck-sysrq-f.txt is after the others so that it won't have corrupted
> > the output. After taking all that data, I waited a while and started
> > getting soft lockup messages.
> >
> > I'm having trouble reproducing the "stuck" failure mode on my
> > lockdep-enabled kernel right now (the OOM kill is easy), so no lock
> > state trace. But I got one yesterday and IIRC it showed a few tty
> > locks and either kworker or kcryptd holding (kqueue) and
> > ((&io->work)).
> >
> > I compressed the larger files.

One quick observation is that pretty much all the OOMed allocations
in your log are in readahead (swap and VM). Perhaps we should throttle
readahead when the system is under high memory pressure?

(copying Fengguang)

One theory on why it could happen more often with dm_crypt is that
dm_crypt increases the latency, so more IO will be in flight.

Another thing is that the dmcrypt IOs will likely do their own
readahead, so you may end up with multiplied readahead
from several levels. Perhaps we should disable RA for the low level
encrypted dmcrypt IOs?

One thing I would try is to disable readahead like in this patch
and see if it solves the problem.

Subject: [PATCH] disable swap and VM readahead

diff --git a/mm/filemap.c b/mm/filemap.c
index c641edf..1f41b4f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1525,6 +1525,8 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	unsigned long ra_pages;
 	struct address_space *mapping = file->f_mapping;
 
+	return;
+
 	/* If we don't want any read-ahead, don't bother */
 	if (VM_RandomReadHint(vma))
 		return;
diff --git a/mm/readahead.c b/mm/readahead.c
index 2c0cc48..85e5b8d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -504,6 +504,8 @@ void page_cache_sync_readahead(struct address_space *mapping,
 			       struct file_ra_state *ra, struct file *filp,
 			       pgoff_t offset, unsigned long req_size)
 {
+	return;
+
 	/* no read-ahead */
 	if (!ra->ra_pages)
 		return;
@@ -540,6 +542,8 @@ page_cache_async_readahead(struct address_space *mapping,
 			   struct page *page, pgoff_t offset,
 			   unsigned long req_size)
 {
+	return;
+
 	/* no read-ahead */
 	if (!ra->ra_pages)
 		return;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4668046..37c2f2f 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -386,6 +386,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	 * more likely that neighbouring swap pages came from the same node:
 	 * so use the same "addr" to choose the same node for each swap read.
 	 */
+#if 0
 	nr_pages = valid_swaphandles(entry, &offset);
 	for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
 		/* Ok, do the async read-ahead now */
@@ -395,6 +396,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			break;
 		page_cache_release(page);
 	}
+#endif
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }



-Andi

example:

[ 524.814816] Out of memory: Kill process 867 (gpm) score 1 or sacrifice child
[ 524.815782] Killed process 867 (gpm) total-vm:6832kB, anon-rss:0kB, file-rss:0kB
[ 525.006050] systemd-cgroups invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[ 525.007089] systemd-cgroups cpuset=/ mems_allowed=0
[ 525.008119] Pid: 2167, comm: systemd-cgroups Not tainted 2.6.38.6-no-fpu+ #6
[ 525.009168] Call Trace:
[ 525.010210] [<ffffffff8147b722>] ? _raw_spin_unlock+0x28/0x2c
[ 525.011276] [<ffffffff810c75d5>] ? dump_header+0x84/0x256
[ 525.012346] [<ffffffff8107531b>] ? trace_hardirqs_on+0xd/0xf
[ 525.013423] [<ffffffff8121a8b0>] ? ___ratelimit+0xe0/0xf0
[ 525.014491] [<ffffffff810c7a20>] ? oom_kill_process+0x50/0x244
[ 525.015575] [<ffffffff810c80ef>] ? out_of_memory+0x2eb/0x367
[ 525.016657] [<ffffffff810cc08b>] ? __alloc_pages_nodemask+0x606/0x78b
[ 525.017748] [<ffffffff810f5979>] ? alloc_pages_current+0xbe/0xd6
[ 525.018844] [<ffffffff810c56fb>] ? __page_cache_alloc+0x7e/0x85
[ 525.019940] [<ffffffff810cda40>] ? __do_page_cache_readahead+0xb5/0x1cb
[ 525.021028] [<ffffffff810cddfa>] ? ra_submit+0x21/0x25

2011-05-15 15:27:50

by Fengguang Wu

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
> > Copying back linux-mm.
> >
> >> Recently, we added following patch.
> >> https://lkml.org/lkml/2011/4/26/129
> >> If it's a culprit, the patch should solve the problem.
> >
> > It would be probably better to not do the allocations at all under
> > memory pressure.  Even if the RA allocation doesn't go into reclaim
>
> Fair enough.
> I think we can do it easily now.
> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
> RA window size or turn off a while. The point is that we can use the
> fail of __do_page_cache_readahead as sign of memory pressure.
> Wu, What do you think?

No, disabling readahead can hardly help.

The sequential readahead memory consumption can be estimated by

2 * (number of concurrent read streams) * (readahead window size)

And you can double that when there are two levels of readahead.

Since there are hardly any concurrent read streams in Andy's case,
the readahead memory consumption will be ignorable.
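
For concreteness, a toy calculation with assumed numbers (the default
128 KiB readahead window, one read stream, two readahead levels; these are
illustrative, not measurements from Andy's machine):

#include <stdio.h>

int main(void)
{
        unsigned long window_kb = 128;  /* assumed default readahead window */
        unsigned long streams = 1;      /* roughly one sequential stream here */
        unsigned long levels = 2;       /* fs-level plus dm-crypt-level RA */

        printf("~%lu KiB\n", 2 * streams * window_kb * levels);
        return 0;
}

That is about half a megabyte, negligible against 2 GB of RAM.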

Typically readahead thrashing will happen long before excessive
GFP_NORETRY failures, so the reasonable solutions are to

- shrink readahead window on readahead thrashing
(current readahead heuristic can somehow do this, and I have patches
to further improve it)

- prevent abnormal GFP_NORETRY failures
(when there are many reclaimable pages)


Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are

- 8MB active+inactive file pages
- 160MB active+inactive anon pages
- 1GB shmem pages
- 1.4GB unevictable pages

Hmm, why are there so many unevictable pages? How come the shmem
pages became unevictable when there is plenty of swap space?

Thanks,
Fengguang

2011-05-15 15:59:35

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
> On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
>> > Copying back linux-mm.
>> >
>> >> Recently, we added following patch.
>> >> https://lkml.org/lkml/2011/4/26/129
>> >> If it's a culprit, the patch should solve the problem.
>> >
>> > It would be probably better to not do the allocations at all under
> memory pressure. Even if the RA allocation doesn't go into reclaim
>>
>> Fair enough.
>> I think we can do it easily now.
>> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
>> RA window size or turn off a while. The point is that we can use the
>> fail of __do_page_cache_readahead as sign of memory pressure.
>> Wu, What do you think?
>
> No, disabling readahead can hardly help.
>
> The sequential readahead memory consumption can be estimated by
>
>                2 * (number of concurrent read streams) * (readahead window size)
>
> And you can double that when there are two level of readaheads.
>
> Since there are hardly any concurrent read streams in Andy's case,
> the readahead memory consumption will be ignorable.
>
> Typically readahead thrashing will happen long before excessive
> GFP_NORETRY failures, so the reasonable solutions are to
>
> - shrink readahead window on readahead thrashing
>  (current readahead heuristic can somehow do this, and I have patches
>  to further improve it)
>
> - prevent abnormal GFP_NORETRY failures
>  (when there are many reclaimable pages)
>
>
> Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are
>
> - 8MB   active+inactive file pages
> - 160MB active+inactive anon pages
> - 1GB   shmem pages
> - 1.4GB unevictable pages
>
> Hmm, why are there so many unevictable pages?  How come the shmem
> pages become unevictable when there are plenty of swap space?

I have no clue, but this patch (from Minchan, whitespace-damaged) seems to help:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f6b435c..4d24828 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t
*pgdat, int order, long remaining,
unsigned long balanced = 0;
bool all_zones_ok = true;

+ /* If kswapd has been running too long, just sleep */
+ if (need_resched())
+ return false;
+
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
return true;
@@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
*pgdat, int order, long remaining,
* must be balanced
*/
if (order)
- return pgdat_balanced(pgdat, balanced, classzone_idx);
+ return !pgdat_balanced(pgdat, balanced, classzone_idx);
else
return !all_zones_ok;
}

I haven't tested it very thoroughly, but it's survived much longer
than an unpatched kernel probably would have under moderate use.

I have no idea what the patch does :)

I'm happy to run any tests. I'm also planning to upgrade from 2GB to
8GB RAM soon, which might change something.

--Andy

2011-05-15 16:13:01

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
> On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
>> > Copying back linux-mm.
>> >
>> >> Recently, we added following patch.
>> >> https://lkml.org/lkml/2011/4/26/129
>> >> If it's a culprit, the patch should solve the problem.
>> >
>> > It would be probably better to not do the allocations at all under
>> > memory pressure. Even if the RA allocation doesn't go into reclaim
>>
>> Fair enough.
>> I think we can do it easily now.
>> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
>> RA window size or turn off a while. The point is that we can use the
>> fail of __do_page_cache_readahead as sign of memory pressure.
>> Wu, What do you think?
>
> No, disabling readahead can hardly help.
>
> The sequential readahead memory consumption can be estimated by
>
>                2 * (number of concurrent read streams) * (readahead window size)
>
> And you can double that when there are two level of readaheads.
>
> Since there are hardly any concurrent read streams in Andy's case,
> the readahead memory consumption will be ignorable.
>
> Typically readahead thrashing will happen long before excessive
> GFP_NORETRY failures, so the reasonable solutions are to
>
> - shrink readahead window on readahead thrashing
>  (current readahead heuristic can somehow do this, and I have patches
>  to further improve it)
>
> - prevent abnormal GFP_NORETRY failures
>  (when there are many reclaimable pages)
>
>
> Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are
>
> - 8MB   active+inactive file pages
> - 160MB active+inactive anon pages
> - 1GB   shmem pages
> - 1.4GB unevictable pages
>
> Hmm, why are there so many unevictable pages?  How come the shmem
> pages become unevictable when there are plenty of swap space?

That was probably because one of my testcases creates a 1.4GB file on
ramfs. (I can provoke the problem without doing evil things like
that, but the test script is rather reliable at killing my system and
it works fine on my other machines.)

If you want, I can try to generate a trace that isn't polluted with
the evil ramfs file.

--Andy

2011-05-15 22:40:44

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 16, 2011 at 12:27 AM, Wu Fengguang <[email protected]> wrote:
> On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
>> > Copying back linux-mm.
>> >
>> >> Recently, we added following patch.
>> >> https://lkml.org/lkml/2011/4/26/129
>> >> If it's a culprit, the patch should solve the problem.
>> >
>> > It would be probably better to not do the allocations at all under
>> > memory pressure.  Even if the RA allocation doesn't go into reclaim
>>
>> Fair enough.
>> I think we can do it easily now.
>> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
>> RA window size or turn off a while. The point is that we can use the
>> fail of __do_page_cache_readahead as sign of memory pressure.
>> Wu, What do you think?
>
> No, disabling readahead can hardly help.

I don't mean we have to disable RA.
As I said, the point is that we can use a __GFP_NORETRY allocation failure
as a _sign_ of memory pressure.

>
> The sequential readahead memory consumption can be estimated by
>
>                2 * (number of concurrent read streams) * (readahead window size)
>
> And you can double that when there are two level of readaheads.
>
> Since there are hardly any concurrent read streams in Andy's case,
> the readahead memory consumption will be ignorable.
>
> Typically readahead thrashing will happen long before excessive
> GFP_NORETRY failures, so the reasonable solutions are to

If so, RA thrashing could be a better sign than a __GFP_NORETRY failure.
If we can do it easily, I don't object. :)

>
> - shrink readahead window on readahead thrashing
>  (current readahead heuristic can somehow do this, and I have patches
>  to further improve it)

Good to hear. :)
I don't want RA to steal high-order pages under memory pressure.
My patch and shrinking the RA window help in this case.

--
Kind regards,
Minchan Kim

2011-05-15 22:58:04

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 16, 2011 at 12:59 AM, Andrew Lutomirski <[email protected]> wrote:
> I have no clue, but this patch (from Minchan, whitespace-damaged) seems to help:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f6b435c..4d24828 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t
> *pgdat, int order, long remaining,
>       unsigned long balanced = 0;
>       bool all_zones_ok = true;
>
> +       /* If kswapd has been running too long, just sleep */
> +       if (need_resched())
> +               return false;
> +
>       /* If a direct reclaimer woke kswapd within HZ/10, it's premature */
>       if (remaining)
>               return true;
> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
> *pgdat, int order, long remaining,
>        * must be balanced
>        */
>       if (order)
> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>       else
>               return !all_zones_ok;
>  }
>
> I haven't tested it very thoroughly, but it's survived much longer
> than an unpatched kernel probably would have under moderate use.
>
> I have no idea what the patch does :)

The reason I sent this is that I think your problem is similar to
James's recent one.
https://lkml.org/lkml/2011/4/27/361

What the patch does is [1] fix the "wrong pgdat_balanced return value"
bug and [2] fix the "infinite kswapd bug on non-preemption kernels" for
high-order pages.

About [1], kswapd has to sleep once zone balancing is complete, but in
1741c877 [mm: kswapd: keep kswapd awake for high-order allocations
until a percentage of the node is balanced] we made a mistake and
return the wrong value.
Then, although zone balancing is complete, kswapd doesn't sleep and
calls balance_pgdat again. In this case, balance_pgdat returns without
doing any work and kswapd could repeat this loop infinitely.
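
To illustrate [1] with a toy userspace example (plain C, not the kernel
code; the bool parameter just stands in for the real zone checks):
sleeping_prematurely() answers "is it premature for kswapd to sleep?", so
once the node is balanced it must return false.

#include <stdbool.h>
#include <stdio.h>

static bool pgdat_balanced(bool node_is_balanced)
{
        return node_is_balanced;        /* stand-in for the real check */
}

/* buggy: "premature to sleep" == "node is balanced", which is backwards */
static bool sleeping_prematurely_buggy(bool balanced)
{
        return pgdat_balanced(balanced);
}

/* fixed: it is only premature to sleep while the node is NOT balanced */
static bool sleeping_prematurely_fixed(bool balanced)
{
        return !pgdat_balanced(balanced);
}

int main(void)
{
        bool balanced = true;           /* zone balancing has completed */

        printf("buggy: premature=%d -> kswapd keeps running\n",
               sleeping_prematurely_buggy(balanced));
        printf("fixed: premature=%d -> kswapd goes to sleep\n",
               sleeping_prematurely_fixed(balanced));
        return 0;
}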


>
> I'm happy to run any tests.  I'm also planning to upgrade from 2GB to
> 8GB RAM soon, which might change something.
>
> --Andy
>



--
Kind regards,
Minchan Kim

2011-05-16 08:51:27

by Mel Gorman

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 16, 2011 at 07:58:01AM +0900, Minchan Kim wrote:
> On Mon, May 16, 2011 at 12:59 AM, Andrew Lutomirski <[email protected]> wrote:
> > I have no clue, but this patch (from Minchan, whitespace-damaged) seems to help:
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index f6b435c..4d24828 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> >       unsigned long balanced = 0;
> >       bool all_zones_ok = true;
> >
> > +       /* If kswapd has been running too long, just sleep */
> > +       if (need_resched())
> > +               return false;
> > +
> >       /* If a direct reclaimer woke kswapd within HZ/10, it's premature */
> >       if (remaining)
> >               return true;
> > @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> >        * must be balanced
> >        */
> >       if (order)
> > -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> > +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
> >       else
> >               return !all_zones_ok;
> >  }
> >
> > I haven't tested it very thoroughly, but it's survived much longer
> > than an unpatched kernel probably would have under moderate use.
> >
> > I have no idea what the patch does :)
>
> The reason I sent this is that I think your problem is similar to
> recent Jame's one.
> https://lkml.org/lkml/2011/4/27/361
>
> What the patch does is [1] fix of "wrong pgdat_balanced return value"
> bug and [2] fix of "infinite kswapd bug of non-preemption kernel" on
> high-order page.
>

If it turns out the patch works (which is patches 1 and 4 from the
series related to James) for more than one tester, I'll push it
separately and drop the SLUB changes.

--
Mel Gorman
SUSE Labs

2011-05-17 05:52:10

by Fengguang Wu

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 16, 2011 at 07:40:42AM +0900, Minchan Kim wrote:
> On Mon, May 16, 2011 at 12:27 AM, Wu Fengguang <[email protected]> wrote:
> > On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
> >> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
> >> > Copying back linux-mm.
> >> >
> >> >> Recently, we added following patch.
> >> >> https://lkml.org/lkml/2011/4/26/129
> >> >> If it's a culprit, the patch should solve the problem.
> >> >
> >> > It would be probably better to not do the allocations at all under
> >> > memory pressure.  Even if the RA allocation doesn't go into reclaim
> >>
> >> Fair enough.
> >> I think we can do it easily now.
> >> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
> >> RA window size or turn off a while. The point is that we can use the
> >> fail of __do_page_cache_readahead as sign of memory pressure.
> >> Wu, What do you think?
> >
> > No, disabling readahead can hardly help.
>
> I don't mean we have to disable RA.
> As I said, the point is that we can use __GFP_NORETRY alloc fail as
> _sign_ of memory pressure.

I see.

> >
> > The sequential readahead memory consumption can be estimated by
> >
> >                2 * (number of concurrent read streams) * (readahead window size)
> >
> > And you can double that when there are two level of readaheads.
> >
> > Since there are hardly any concurrent read streams in Andy's case,
> > the readahead memory consumption will be ignorable.
> >
> > Typically readahead thrashing will happen long before excessive
> > GFP_NORETRY failures, so the reasonable solutions are to
>
> If it is, RA thrashing could be better sign than failure of __GFP_NORETRY.
> If we can do it easily, I don't object it. :)

Yeah, RA thrashing is a much better sign because it not only happens
long before normal __GFP_NORETRY failures, but also offers a hint of how
tight the memory pressure is. We can then shrink the readahead window
adaptively to the available page cache memory :)

> >
> > - shrink readahead window on readahead thrashing
> >  (current readahead heuristic can somehow do this, and I have patches
> >  to further improve it)
>
> Good to hear. :)
> I don't want RA steals high order page in memory pressure.

More often than not it won't be RA's fault :) When you see RA page
allocations stealing high-order pages, it may actually be reflecting
some more general order-0-steals-order-N problem.

> My patch and shrinking RA window helps this case.

Thanks,
Fengguang

2011-05-17 06:00:09

by Fengguang Wu

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 15, 2011 at 12:12:36PM -0400, Andrew Lutomirski wrote:
> On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
> > On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
> >> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
> >> > Copying back linux-mm.
> >> >
> >> >> Recently, we added following patch.
> >> >> https://lkml.org/lkml/2011/4/26/129
> >> >> If it's a culprit, the patch should solve the problem.
> >> >
> >> > It would be probably better to not do the allocations at all under
> >> > memory pressure.  Even if the RA allocation doesn't go into reclaim
> >>
> >> Fair enough.
> >> I think we can do it easily now.
> >> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
> >> RA window size or turn off a while. The point is that we can use the
> >> fail of __do_page_cache_readahead as sign of memory pressure.
> >> Wu, What do you think?
> >
> > No, disabling readahead can hardly help.
> >
> > The sequential readahead memory consumption can be estimated by
> >
> >                2 * (number of concurrent read streams) * (readahead window size)
> >
> > And you can double that when there are two level of readaheads.
> >
> > Since there are hardly any concurrent read streams in Andy's case,
> > the readahead memory consumption will be ignorable.
> >
> > Typically readahead thrashing will happen long before excessive
> > GFP_NORETRY failures, so the reasonable solutions are to
> >
> > - shrink readahead window on readahead thrashing
> >  (current readahead heuristic can somehow do this, and I have patches
> >  to further improve it)
> >
> > - prevent abnormal GFP_NORETRY failures
> >  (when there are many reclaimable pages)
> >
> >
> > Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are
> >
> > - 8MB   active+inactive file pages
> > - 160MB active+inactive anon pages
> > - 1GB   shmem pages
> > - 1.4GB unevictable pages
> >
> > Hmm, why are there so many unevictable pages?  How come the shmem
> > pages become unevictable when there are plenty of swap space?
>
> That was probably because one of my testcases creates a 1.4GB file on
> ramfs. (I can provoke the problem without doing evil things like
> that, but the test script is rather reliable at killing my system and
> it works fine on my other machines.)

Ah, I didn't read your first email. I'm now running

./test_mempressure.sh 1500 1400 1

with mem=2G and no swap, but cannot reproduce OOM.

What's your kconfig?

> If you want, I can try to generate a trace that isn't polluted with
> the evil ramfs file.

No, thanks. However, it would be valuable if you could retry with this
patch _alone_ (without the "if (need_resched()) return false;" change,
as I don't see how it helps your case).

@@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
*pgdat, int order, long remaining,
* must be balanced
*/
if (order)
- return pgdat_balanced(pgdat, balanced, classzone_idx);
+ return !pgdat_balanced(pgdat, balanced, classzone_idx);
else
return !all_zones_ok;
}

Thanks,
Fengguang

2011-05-17 06:26:19

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Tue, May 17, 2011 at 2:52 PM, Wu Fengguang <[email protected]> wrote:
> On Mon, May 16, 2011 at 07:40:42AM +0900, Minchan Kim wrote:
>> On Mon, May 16, 2011 at 12:27 AM, Wu Fengguang <[email protected]> wrote:
>> > On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> >> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
>> >> > Copying back linux-mm.
>> >> >
>> >> >> Recently, we added following patch.
>> >> >> https://lkml.org/lkml/2011/4/26/129
>> >> >> If it's a culprit, the patch should solve the problem.
>> >> >
>> >> > It would be probably better to not do the allocations at all under
>> >> > memory pressure.  Even if the RA allocation doesn't go into reclaim
>> >>
>> >> Fair enough.
>> >> I think we can do it easily now.
>> >> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
>> >> RA window size or turn off a while. The point is that we can use the
>> >> fail of __do_page_cache_readahead as sign of memory pressure.
>> >> Wu, What do you think?
>> >
>> > No, disabling readahead can hardly help.
>>
>> I don't mean we have to disable RA.
>> As I said, the point is that we can use __GFP_NORETRY alloc fail as
>> _sign_ of memory pressure.
>
> I see.
>
>> >
>> > The sequential readahead memory consumption can be estimated by
>> >
>> >                2 * (number of concurrent read streams) * (readahead window size)
>> >
>> > And you can double that when there are two level of readaheads.
>> >
>> > Since there are hardly any concurrent read streams in Andy's case,
>> > the readahead memory consumption will be ignorable.
>> >
>> > Typically readahead thrashing will happen long before excessive
>> > GFP_NORETRY failures, so the reasonable solutions are to
>>
>> If it is, RA thrashing could be better sign than failure of __GFP_NORETRY.
>> If we can do it easily, I don't object it. :)
>
> Yeah, the RA thrashing is much better sign because it not only happens
> long before normal __GFP_NORETRY failures, but also offers hint on how
> tight memory pressure it is. We can then shrink the readahead window
> adaptively to the available page cache memory :)
>
>> >
>> > - shrink readahead window on readahead thrashing
>> >  (current readahead heuristic can somehow do this, and I have patches
>> >  to further improve it)
>>
>> Good to hear. :)
>> I don't want RA steals high order page in memory pressure.
>
> More often than not it won't be RA's fault :)  When you see RA page
> allocations stealing high order pages, it may actually be reflecting
> some more general order-0 steal order-N problem..

Agreed.
As I said to Andy, it's a general problem, but RA has a chance of
reducing it while the others don't have any solution. :(

--
Kind regards,
Minchan Kim

2011-05-17 06:35:52

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Tue, May 17, 2011 at 3:00 PM, Wu Fengguang <[email protected]> wrote:
> On Sun, May 15, 2011 at 12:12:36PM -0400, Andrew Lutomirski wrote:
>> On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
>> > On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> >> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <[email protected]> wrote:
>> >> > Copying back linux-mm.
>> >> >
>> >> >> Recently, we added following patch.
>> >> >> https://lkml.org/lkml/2011/4/26/129
>> >> >> If it's a culprit, the patch should solve the problem.
>> >> >
>> >> > It would be probably better to not do the allocations at all under
>> >> > memory pressure.  Even if the RA allocation doesn't go into reclaim
>> >>
>> >> Fair enough.
>> >> I think we can do it easily now.
>> >> If page_cache_alloc_readahead(ie, GFP_NORETRY) is fail, we can adjust
>> >> RA window size or turn off a while. The point is that we can use the
>> >> fail of __do_page_cache_readahead as sign of memory pressure.
>> >> Wu, What do you think?
>> >
>> > No, disabling readahead can hardly help.
>> >
>> > The sequential readahead memory consumption can be estimated by
>> >
>> >                2 * (number of concurrent read streams) * (readahead window size)
>> >
>> > And you can double that when there are two level of readaheads.
>> >
>> > Since there are hardly any concurrent read streams in Andy's case,
>> > the readahead memory consumption will be ignorable.
>> >
>> > Typically readahead thrashing will happen long before excessive
>> > GFP_NORETRY failures, so the reasonable solutions are to
>> >
>> > - shrink readahead window on readahead thrashing
>> >  (current readahead heuristic can somehow do this, and I have patches
>> >  to further improve it)
>> >
>> > - prevent abnormal GFP_NORETRY failures
>> >  (when there are many reclaimable pages)
>> >
>> >
>> > Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are
>> >
>> > - 8MB   active+inactive file pages
>> > - 160MB active+inactive anon pages
>> > - 1GB   shmem pages
>> > - 1.4GB unevictable pages
>> >
>> > Hmm, why are there so many unevictable pages?  How come the shmem
>> > pages become unevictable when there are plenty of swap space?
>>
>> That was probably because one of my testcases creates a 1.4GB file on
>> ramfs.  (I can provoke the problem without doing evil things like
>> that, but the test script is rather reliable at killing my system and
>> it works fine on my other machines.)
>
> Ah I didn't read your first email.. I'm now running
>
> ./test_mempressure.sh 1500 1400 1
>
> with mem=2G and no swap, but cannot reproduce OOM.
>
> What's your kconfig?
>
>> If you want, I can try to generate a trace that isn't polluted with
>> the evil ramfs file.
>
> No, thanks. However it would be valuable if you can retry with this
> patch _alone_ (without the "if (need_resched()) return false;" change,
> as I don't see how it helps your case).

Yes. I was curious about that. The experiment would be very valuable.

In James's case, he hit the problem again without need_resched:
https://lkml.org/lkml/2011/5/12/547

But I am not sure exactly what he meant by 'livelock'.
I expect he hit a soft lockup again.

Still, I think the possibility of skipping all the cond_resched calls
sprinkled through vmscan.c is _very_ low. How can such a soft lockup happen?
So I am really curious about what's going on that I'm not seeing.

>
> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
> *pgdat, int order, long remaining,
>        * must be balanced
>        */
>       if (order)
> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>       else
>               return !all_zones_ok;
>  }
>
> Thanks,
> Fengguang
>



--
Kind regards,
Minchan Kim

2011-05-17 19:22:58

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Tue, May 17, 2011 at 2:00 AM, Wu Fengguang <[email protected]> wrote:
> On Sun, May 15, 2011 at 12:12:36PM -0400, Andrew Lutomirski wrote:
>> On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
>>
>> That was probably because one of my testcases creates a 1.4GB file on
>> ramfs. (I can provoke the problem without doing evil things like
>> that, but the test script is rather reliable at killing my system and
>> it works fine on my other machines.)
>
> Ah I didn't read your first email.. I'm now running
>
> ./test_mempressure.sh 1500 1400 1
>
> with mem=2G and no swap, but cannot reproduce OOM.

Do you have a Sandy Bridge laptop? There was a recent thread on lkml
suggesting that only Sandy Bridge laptops saw this problem. Something
else must be needed to trigger it, though, because I can't reproduce it
from an initramfs I made to demonstrate the problem.

>
> What's your kconfig?

Attached. This is 2.6.38.6.

>
>> If you want, I can try to generate a trace that isn't polluted with
>> the evil ramfs file.
>
> No, thanks. However it would be valuable if you can retry with this
> patch _alone_ (without the "if (need_resched()) return false;" change,
> as I don't see how it helps your case).
>
> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
> *pgdat, int order, long remaining,
>        * must be balanced
>        */
>       if (order)
> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>       else
>               return !all_zones_ok;
>  }

Done.

I logged in, added swap, and ran a program that allocated 1900MB of
RAM and memset it. The system lagged a bit but survived. kswapd
showed 10% CPU (which is odd, IMO, since I'm using aesni-intel and I
think that all the crypt happens in kworker when aesni-intel is in
use).

Then I started Firefox, loaded gmail, and ran test_mempressure.sh.
Kaboom! (I.e. system was hung) SysRq-F saved the system and produced
the attached dump. I had 6GB swap available, so there shouldn't have
been any OOM.

--Andy


Attachments:
messages.txt.xz (14.86 kB)
.config (86.42 kB)

2011-05-18 05:17:20

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <[email protected]> wrote:
> On Tue, May 17, 2011 at 2:00 AM, Wu Fengguang <[email protected]> wrote:
>> On Sun, May 15, 2011 at 12:12:36PM -0400, Andrew Lutomirski wrote:
>>> On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <[email protected]> wrote:
>>>
>>> That was probably because one of my testcases creates a 1.4GB file on
>>> ramfs.  (I can provoke the problem without doing evil things like
>>> that, but the test script is rather reliable at killing my system and
>>> it works fine on my other machines.)
>>
>> Ah I didn't read your first email.. I'm now running
>>
>> ./test_mempressure.sh 1500 1400 1
>>
>> with mem=2G and no swap, but cannot reproduce OOM.
>
> Do you have a Sandy Bridge laptop?  There was a recent thread on lkml
> suggesting that only Sandy Bridge laptops saw this problem.  Although
> there's something else needed to trigger it, because I can't do it
> from an initramfs I made that tried to show this problem.
>
>>
>> What's your kconfig?
>
> Attached.  This is 2.6.38.6.
>
>>
>>> If you want, I can try to generate a trace that isn't polluted with
>>> the evil ramfs file.
>>
>> No, thanks. However it would be valuable if you can retry with this
>> patch _alone_ (without the "if (need_resched()) return false;" change,
>> as I don't see how it helps your case).
>>
>> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
>> *pgdat, int order, long remaining,
>>        * must be balanced
>>        */
>>       if (order)
>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>       else
>>               return !all_zones_ok;
>>  }
>
> Done.
>
> I logged in, added swap, and ran a program that allocated 1900MB of
> RAM and memset it.  The system lagged a bit but survived.  kswapd
> showed 10% CPU (which is odd, IMO, since I'm using aesni-intel and I
> think that all the crypt happens in kworker when aesni-intel is in
> use).

I think kswapd could well use 10% CPU just doing reclaim.

>
> Then I started Firefox, loaded gmail, and ran test_mempressure.sh.
> Kaboom!  (I.e. system was hung)  SysRq-F saved the system and produced

Hang?
Does that mean you see a soft lockup of kswapd, or that the
mouse/keyboard doesn't respond?

> the attached dump.  I had 6GB swap available, so there shouldn't have
> been any OOM.

Yes. It's strange, but we have seen such cases several times, AFAIR.

Let's look at your first OOM message.
(Intentionally, I don't inline the OOM message, as web Gmail mangles it,
which is very annoying for whoever reads it.)

Considering the min/low/high watermarks of the zones, no zone can meet
your allocation request (order-0, GFP_WAIT|IO|FS|HIGHMEM), so the result
is natural.
But the thing I wonder about is that we have lots of free swap space, as
you said. Why doesn't the VM swap out anon pages of the DMA32 zone
instead of going OOM?

We are isolating anon pages of DMA32, as the log says (ie,
isolated(anon):408kB),
so I think the VM is behaving correctly.
The thing is, the tasks' allocation rate is faster than the swapout
rate, so the swap device is very congested and most of the swapped-out
pages remain in PG_writeback. In the end, shrink_page_list returns 0.

In high-order page reclaim, we can throttle tasks via should_reclaim_stall.
But for order-0 pages, should_reclaim_stall returns _false_, so in the end
we see the OOM message even though swap has lots of free space.
Does my guess make sense?
If so, does it make sense that OOM happens even though we have lots of
swap space, in the order-0 case?
How about this?

Andrew, could you test this patch together with the !pgdat_balanced patch?
I think we shouldn't see an OOM if we have lots of free swap space.

== CUT_HERE ==
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f73b865..cc23f04 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1341,10 +1341,6 @@ static inline bool
should_reclaim_stall(unsigned long nr_taken,
if (current_is_kswapd())
return false;

- /* Only stall on lumpy reclaim */
- if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
- return false;
-
/* If we have relaimed everything on the isolated list, no stall */
if (nr_freed == nr_taken)
return false;



Then, if you don't see any unnecessary OOM but still see the hang,
could you apply this patch on top of the previous one?

== CUT_HERE ==

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f73b865..703380f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2697,6 +2697,7 @@ static int kswapd(void *p)
if (!ret) {
trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
order = balance_pgdat(pgdat, order, &classzone_idx);
+ cond_resched();
}
}
return 0;

--
Kind regards,
Minchan Kim

2011-05-19 02:16:15

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <[email protected]> wrote:
> On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <[email protected]> wrote:
>>> No, thanks. However it would be valuable if you can retry with this
>>> patch _alone_ (without the "if (need_resched()) return false;" change,
>>> as I don't see how it helps your case).
>>>
>>> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
>>> *pgdat, int order, long remaining,
>>>        * must be balanced
>>>        */
>>>       if (order)
>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>       else
>>>               return !all_zones_ok;
>>>  }
>>
>> Done.
>>
>> I logged in, added swap, and ran a program that allocated 1900MB of
>> RAM and memset it. The system lagged a bit but survived. kswapd
>> showed 10% CPU (which is odd, IMO, since I'm using aesni-intel and I
>> think that all the crypt happens in kworker when aesni-intel is in
>> use).
>
> I think kswapd could use 10% enough for reclaim.
>
>>
>> Then I started Firefox, loaded gmail, and ran test_mempressure.sh.
>> Kaboom! (I.e. system was hung) SysRq-F saved the system and produced
>
> Hang?
> It means you see softhangup of kswapd? or mouse/keyboard doesn't move?

Mouse and keyboard dead.

> Andrew, Could you test this patch with !pgdat_balanced patch?
> I think we shouldn't see OOM message if we have lots of free swap space.
>
> == CUT_HERE ==
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f73b865..cc23f04 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1341,10 +1341,6 @@ static inline bool
> should_reclaim_stall(unsigned long nr_taken,
>        if (current_is_kswapd())
>                return false;
>
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -
>        /* If we have relaimed everything on the isolated list, no stall */
>        if (nr_freed == nr_taken)
>                return false;
>
>
>
> Then, if you don't see any unnecessary OOM but still see the hangup,
> could you apply this patch based on previous?

With this patch, I started GNOME and Firefox, turned on swap, and ran
test_mempressure.sh 1500 1400 1. Instant panic (or OOPS and hang or
something -- didn't get the top part). Picture attached -- it looks
like memcg might be involved. I'm running F15, so it might even be
doing something.

I won't be able to get netconsole dumps until next week because I'm
out of town and only have this one computer here.

I haven't tried the other patch.

Also, the !pgdat_balanced fix plus the if (need_resched()) return
false patch just hung once on 2.6.37-rc9. I don't know what triggered
it. Maybe yum.

--Andy


Attachments:
IMG_20110518_184222.jpg (92.32 kB)

2011-05-19 02:37:48

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, 18 May 2011 22:15:53 -0400
Andrew Lutomirski <[email protected]> wrote:

> On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <[email protected]> wrote:
> > On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <[email protected]> wrote:

> > Andrew, Could you test this patch with !pgdat_balanced patch?
> > I think we shouldn't see OOM message if we have lots of free swap space.
> >
> > == CUT_HERE ==
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index f73b865..cc23f04 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1341,10 +1341,6 @@ static inline bool
> > should_reclaim_stall(unsigned long nr_taken,
> >        if (current_is_kswapd())
> >                return false;
> >
> > -       /* Only stall on lumpy reclaim */
> > -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> > -               return false;
> > -
> >        /* If we have relaimed everything on the isolated list, no stall */
> >        if (nr_freed == nr_taken)
> >                return false;
> >
> >
> >
> > Then, if you don't see any unnecessary OOM but still see the hangup,
> > could you apply this patch based on previous?
>
> With this patch, I started GNOME and Firefox, turned on swap, and ran
> test_mempressure.sh 1500 1400 1. Instant panic (or OOPS and hang or
> something -- didn't get the top part). Picture attached -- it looks
> like memcg might be involved. I'm running F15, so it might even be
> doing something.
>

Hmm, what kernel version do you use?
I think memcg is not guilty because the RIP is in shrink_page_list().
But OK, I'll dig into this. Could you give us your .config?

Thanks,
-Kame


> I won't be able to get netconsole dumps until next week because I'm
> out of town and only have this one computer here.
>
> I haven't tried the other patch.
>
> Also, the !pgdat_balanced fix plus the if (need_resched()) return
> false patch just hung once on 2.6.37-rc9. I don't know what triggered
> it. Maybe yum.
>
> --Andy

2011-05-19 02:41:23

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 18, 2011 at 10:30 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 18 May 2011 22:15:53 -0400
> Andrew Lutomirski <[email protected]> wrote:
>
>> On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <[email protected]> wrote:
>> > On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <[email protected]> wrote:
>
>> > Andrew, Could you test this patch with !pgdat_balanced patch?
>> > I think we shouldn't see OOM message if we have lots of free swap space.
>> >
>> > == CUT_HERE ==
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index f73b865..cc23f04 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1341,10 +1341,6 @@ static inline bool
>> > should_reclaim_stall(unsigned long nr_taken,
>> >        if (current_is_kswapd())
>> >                return false;
>> >
>> > -       /* Only stall on lumpy reclaim */
>> > -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> > -               return false;
>> > -
>> >        /* If we have relaimed everything on the isolated list, no stall */
>> >        if (nr_freed == nr_taken)
>> >                return false;
>> >
>> >
>> >
>> > Then, if you don't see any unnecessary OOM but still see the hangup,
>> > could you apply this patch based on previous?
>>
>> With this patch, I started GNOME and Firefox, turned on swap, and ran
>> test_mempressure.sh 1500 1400 1. Instant panic (or OOPS and hang or
>> something -- didn't get the top part). Picture attached -- it looks
>> like memcg might be involved. I'm running F15, so it might even be
>> doing something.
>>
>
> Hmm, what kernel version do you use ?
> I think memcg is not guilty because RIP is shrink_page_list().
> But ok, I'll dig this. Could you give us your .config ?

Attached.

The address in shrink_page_list is the ud2, from (I think)
VM_BUG_ON(PageActive(page)). The sequence is:

0xffffffff810d24cc <+202>: callq 0xffffffff810cf930 <test_and_set_bit>
0xffffffff810d24d1 <+207>: test %eax,%eax
0xffffffff810d24d3 <+209>: jne 0xffffffff810d2aa5 <shrink_page_list+1699>
0xffffffff810d24d9 <+215>: mov -0x28(%rbx),%rax
0xffffffff810d24dd <+219>: test $0x40,%al
0xffffffff810d24df <+221>: je 0xffffffff810d24e3 <shrink_page_list+225>
0xffffffff810d24e1 <+223>: ud2


--Andy


Attachments:
.config (86.42 kB)

2011-05-19 02:54:08

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 11:15 AM, Andrew Lutomirski <[email protected]> wrote:
> On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <[email protected]> wrote:
>> On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <[email protected]> wrote:
>>>> No, thanks. However it would be valuable if you can retry with this
>>>> patch _alone_ (without the "if (need_resched()) return false;" change,
>>>> as I don't see how it helps your case).
>>>>
>>>> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t
>>>> *pgdat, int order, long remaining,
>>>>        * must be balanced
>>>>        */
>>>>       if (order)
>>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>       else
>>>>               return !all_zones_ok;
>>>>  }
>>>
>>> Done.
>>>
>>> I logged in, added swap, and ran a program that allocated 1900MB of
>>> RAM and memset it.  The system lagged a bit but survived.  kswapd
>>> showed 10% CPU (which is odd, IMO, since I'm using aesni-intel and I
>>> think that all the crypt happens in kworker when aesni-intel is in
>>> use).
>>
>> I think 10% CPU is plausible for kswapd doing reclaim.
>>
>>>
>>> Then I started Firefox, loaded gmail, and ran test_mempressure.sh.
>>> Kaboom!  (I.e. system was hung)  SysRq-F saved the system and produced
>>
>> Hang?
>> Do you mean a soft lockup of kswapd, or that the mouse/keyboard stops responding?
>
> Mouse and keyboard dead.
>
>> Andrew, Could you test this patch with !pgdat_balanced patch?
>> I think we shouldn't see OOM message if we have lots of free swap space.
>>
>> == CUT_HERE ==
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f73b865..cc23f04 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1341,10 +1341,6 @@ static inline bool
>> should_reclaim_stall(unsigned long nr_taken,
>>        if (current_is_kswapd())
>>                return false;
>>
>> -       /* Only stall on lumpy reclaim */
>> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> -               return false;
>> -
>>        /* If we have relaimed everything on the isolated list, no stall */
>>        if (nr_freed == nr_taken)
>>                return false;
>>
>>
>>
>> Then, if you don't see any unnecessary OOM but still see the hangup,
>> could you apply this patch based on previous?
>
> With this patch, I started GNOME and Firefox, turned on swap, and ran
> test_mempressure.sh 1500 1400 1.  Instant panic (or OOPS and hang or
> something -- didn't get the top part).  Picture attached -- it looks
> like memcg might be involved.  I'm running F15, so it might even be
> doing something.

I can't figure out why the OOPS happens.
Let me know your kernel version and config.
Kame, is there anything related to memcg that you suspect?

In addition, the patch I gave you was flawed.
The goal is to wait for dirty page writeback in (order-0 | high
priority) reclaim.
(I don't think that's the ideal solution to this problem, but it should
be enough to prove where the problem is.)
However, although we pass sync as 1 into set_reclaim_mode, it is ignored.
So the fix is as follows. (NOTE: this is not related to your OOPS.)
But before any further experiments, let's fix your oops.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..69d317e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
 		sc->reclaim_mode |= syncmode;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
+	else if ((sc->order && priority < DEF_PRIORITY - 2) ||
+			priority <= DEF_PRIORITY / 3)
 		sc->reclaim_mode |= syncmode;
 	else
 		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
@@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
 	/* If we have relaimed everything on the isolated list, no stall */
 	if (nr_freed == nr_taken)
 		return false;



>
> I won't be able to get netconsole dumps until next week because I'm
> out of town and only have this one computer here.

No problem. :)
We need to get rid of the OOPS before we can go on with the experiment.


>
> I haven't tried the other patch.
>
> Also, the !pgdat_balanced fix plus the if (need_resched()) return
> false patch just hung once on 2.6.37-rc9.  I don't know what triggered

Thanks for the good information.
It seems the need_resched patch isn't a good candidate for fixing the
current problem. We have already weeded it out.

Thank you very much for the testing!

> it.  Maybe yum.
>
> --Andy
>



--
Kind regards,
Minchan Kim

2011-05-19 14:17:12

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

I just booted 2.6.38.6 with exactly two patches applied. Config was
the same as I emailed yesterday. Userspace is F15. First was
"aesni-intel: Merge with fpu.ko" because dracut fails to boot my
system without it. Second was this (sorry for whitespace damage):

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..3f44b81 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -307,7 +307,7 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
 		sc->reclaim_mode |= syncmode;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
+	else if ((sc->order && priority < DEF_PRIORITY - 2) || priority <= DEF_PRIORITY / 3)
 		sc->reclaim_mode |= syncmode;
 	else
 		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
@@ -1342,10 +1342,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
 	/* If we have relaimed everything on the isolated list, no stall */
 	if (nr_freed == nr_taken)
 		return false;

I started GNOME and Firefox, enabled swap, and ran test_mempressure.sh
1500 1400 1. The system quickly gave the attached oops.

The oops was the ud2 here:

0xffffffff810d251b <+215>: mov -0x28(%rbx),%rax
0xffffffff810d251f <+219>: test $0x40,%al
0xffffffff810d2521 <+221>: je 0xffffffff810d2525 <shrink_page_list+225>
0xffffffff810d2523 <+223>: ud2

Please let me know what the next test to run is.

--Andy


Attachments:
IMG_20110519_094454.jpg (94.19 kB)

2011-05-19 14:51:51

by Fengguang Wu

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> > I had 6GB swap available, so there shouldn't have
> > been any OOM.
>
> Yes. It's strange but we have seen such case several times, AFAIR.

I noticed that the test script mounted a "ramfs" not "tmpfs", hence
the 1.4G pages won't be swapped?

Thanks,
Fengguang

2011-05-19 15:00:45

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 10:51 AM, Wu Fengguang <[email protected]> wrote:
>> > I had 6GB swap available, so there shouldn't have
>> > been any OOM.
>>
>> Yes. It's strange but we have seen such case several times, AFAIR.
>
> I noticed that the test script mounted a "ramfs" not "tmpfs", hence
> the 1.4G pages won't be swapped?

That's intentional.

I run LVM over dm-crypt on an SSD, and I thought that might be part of
the problem. I wanted a script that would see if I could reproduce
the problem without stressing that system too much, so I created a
second backing store on dm-crypt over ramfs so that no real I/O will
happen. The script is quite effective at bringing down my system, so
I haven't changed it.

(I have 6GB of "real" swap on the LVM, so pinning 1500MB into RAM
ought to cause some thrashing but not take the system down. And this
script with a larger ramfs does not take down my desktop, which is an
8GB Sandy Bridge box. But whatever the underlying bug is seems to
mainly affect Sandy Bridge *laptops*, so maybe that's expected.)

--Andy

>
> Thanks,
> Fengguang
>

2011-05-20 00:17:16

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 11:16 PM, Andrew Lutomirski <[email protected]> wrote:
> I just booted 2.6.38.6 with exactly two patches applied.  Config was
> the same as I emailed yesterday.  Userspace is F15.  First was
> "aesni-intel: Merge with fpu.ko" because dracut fails to boot my
> system without it.  Second was this (sorry for whitespace damage):
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0665520..3f44b81 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -307,7 +307,7 @@ static void set_reclaim_mode(int priority, struct
> scan_control *sc,
>         */
>        if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>                sc->reclaim_mode |= syncmode;
> -       else if (sc->order && priority < DEF_PRIORITY - 2)
> +       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
> priority <= DEF_PRIORITY / 3)
>                sc->reclaim_mode |= syncmode;
>        else
>                sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> @@ -1342,10 +1342,6 @@ static inline bool
> should_reclaim_stall(unsigned long nr_taken,
>        if (current_is_kswapd())
>                return false;
>
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -
>        /* If we have relaimed everything on the isolated list, no stall */
>        if (nr_freed == nr_taken)
>                return false;
>
> I started GNOME and Firefox, enabled swap, and ran test_mempressure.sh
> 1500 1400 1.  The system quickly gave the attached oops.
>
> The oops was the ud2 here:
>
>   0xffffffff810d251b <+215>:   mov    -0x28(%rbx),%rax
>   0xffffffff810d251f <+219>:   test   $0x40,%al
>   0xffffffff810d2521 <+221>:   je     0xffffffff810d2525 <shrink_page_list+225>
>   0xffffffff810d2523 <+223>:   ud2
>
> Please let me know what the next test to run is.

Okay. My first patch (!pgdat_balanced plus cond_resched right after
balance_pgdat) that I sent you worked, but the version with
cond_resched removed hung.

Let's not make the problem more complex, so let's set my patch above
aside for now.

Would you be willing to run one more test with the patch below?
(Of course, it will be whitespace-damaged; I can't do anything about
that from my office. Sorry.)
If the patch below still fixes your problem like my first patch did,
we will push it into mainline.

Thanks, Andrew.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..1663d24 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 1;	/* Assume we'll be able to shrink next time */
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		/* Assume we'll be able to shrink next time */
+		ret = 1;
+		goto out;
+	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
+out:
+	cond_resched();
 	return ret;
 }
 
@@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		return !pgdat_balanced(pgdat, balanced, classzone_idx);
 	else
 		return !all_zones_ok;
 }



>
> --Andy
>



--
Kind regards,
Minchan Kim

2011-05-20 00:20:12

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 11:51 PM, Wu Fengguang <[email protected]> wrote:
>> > I had 6GB swap available, so there shouldn't have
>> > been any OOM.
>>
>> Yes. It's strange but we have seen such case several times, AFAIR.
>
> I noticed that the test script mounted a "ramfs" not "tmpfs", hence
> the 1.4G pages won't be swapped?

Right, ramfs pages cannot be swapped out.
But in the log, the 200M of anon in DMA32 doesn't include the
unevictable 1.4GB, so we can still swap out that 200M.

>
> Thanks,
> Fengguang
>



--
Kind regards,
Minchan Kim

2011-05-20 02:59:11

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 10:16 AM, Andrew Lutomirski <[email protected]> wrote:
> I just booted 2.6.38.6 with exactly two patches applied.  Config was
> the same as I emailed yesterday.  Userspace is F15.  First was
> "aesni-intel: Merge with fpu.ko" because dracut fails to boot my
> system without it.  Second was this (sorry for whitespace damage):
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0665520..3f44b81 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -307,7 +307,7 @@ static void set_reclaim_mode(int priority, struct
> scan_control *sc,
>         */
>        if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>                sc->reclaim_mode |= syncmode;
> -       else if (sc->order && priority < DEF_PRIORITY - 2)
> +       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
> priority <= DEF_PRIORITY / 3)
>                sc->reclaim_mode |= syncmode;
>        else
>                sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> @@ -1342,10 +1342,6 @@ static inline bool
> should_reclaim_stall(unsigned long nr_taken,
>        if (current_is_kswapd())
>                return false;
>
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -
>        /* If we have relaimed everything on the isolated list, no stall */
>        if (nr_freed == nr_taken)
>                return false;
>
> I started GNOME and Firefox, enabled swap, and ran test_mempressure.sh
> 1500 1400 1.  The system quickly gave the attached oops.
>

I haven't applied Minchan's latest patch yet, but given the OOPS it
seems like the root cause might be something other than kswapd not
going to sleep. So I applied this additional patch:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f44b81..1beea0f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -729,7 +729,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!trylock_page(page))
 			goto keep;
 
-		VM_BUG_ON(PageActive(page));
+		if (PageActive(page)) {
+			printk(KERN_ERR "shrink_page_list (nr_scanned=%lu nr_reclaimed=%lu nr_to_reclaim=%lu gfp_mask=%X) found inactive page %p with flags=%lX\n",
+			       sc->nr_scanned, sc->nr_reclaimed,
+			       sc->nr_to_reclaim, sc->gfp_mask, page,
+			       page->flags);
+			//VM_BUG_ON(PageActive(page));
+			msleep(1);
+			continue;
+		}
 		VM_BUG_ON(page_zone(page) != zone);
 
 		sc->nr_scanned++;

and saw:

[ 63.609661] Adding 6291452k swap on /dev/mapper/vg_antithesis-swap.
Priority:-1 extents:1 across:6291452k
[ 70.148767] shrink_page_list (nr_scanned=33620 nr_reclaimed=2122
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea00014220d0
with flags=100000000008005D
[ 70.148929] shrink_page_list (nr_scanned=23477 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001423f38
with flags=100000000008005D
[ 70.150036] shrink_page_list (nr_scanned=33620 nr_reclaimed=2122
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0001422060
with flags=100000000008005D
[ 70.150132] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea00014249f0
with flags=100000000008005D
[ 70.152032] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424a28
with flags=100000000008005D
[ 70.152123] shrink_page_list (nr_scanned=33632 nr_reclaimed=2122
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea00014224c0
with flags=100000000008005D
[ 70.154027] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424a60
with flags=100000000008005D
[ 70.154180] shrink_page_list (nr_scanned=33733 nr_reclaimed=2122
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0001424bb0
with flags=100000000008005D
[ 70.156022] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424a98
with flags=100000000008005D
[ 70.156247] shrink_page_list (nr_scanned=33930 nr_reclaimed=2168
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea000125e860
with flags=100000000002004D
[ 70.158035] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424ad0
with flags=100000000008005D
[ 70.158101] shrink_page_list (nr_scanned=33930 nr_reclaimed=2168
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea000125f238
with flags=100000000002004D
[ 70.160010] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424b08
with flags=100000000008005D
[ 70.160075] shrink_page_list (nr_scanned=33930 nr_reclaimed=2168
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea000125f200
with flags=100000000002004D
[ 70.162013] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424b40
with flags=100000000008005D
[ 70.162080] shrink_page_list (nr_scanned=33930 nr_reclaimed=2168
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea000125f1c8
with flags=100000000002004D
[ 70.164015] shrink_page_list (nr_scanned=23507 nr_reclaimed=2198
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0001424b78
with flags=100000000008005D
[ 70.168859] shrink_page_list (nr_scanned=24706 nr_reclaimed=2239
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea00012ae030
with flags=1000000000080049
[ 70.168959] shrink_page_list (nr_scanned=40170 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea000125b488
with flags=100000000008005D
[ 70.170004] shrink_page_list (nr_scanned=24706 nr_reclaimed=2239
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea00012adf88
with flags=1000000000080049
[ 70.175980] shrink_page_list (nr_scanned=566 nr_reclaimed=81
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000e00f18
with flags=100000000002004D
[ 70.176140] shrink_page_list (nr_scanned=846 nr_reclaimed=94
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000df2428
with flags=100000000002004D
[ 70.176160] shrink_page_list (nr_scanned=41061 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000df29d8
with flags=100000000002004D
[ 70.176364] shrink_page_list (nr_scanned=28440 nr_reclaimed=2350
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000de9240
with flags=100000000002004D
[ 70.178086] shrink_page_list (nr_scanned=41061 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000df2a10
with flags=100000000002004D
[ 70.178161] shrink_page_list (nr_scanned=846 nr_reclaimed=94
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000df2268
with flags=100000000002004D
[ 70.178189] shrink_page_list (nr_scanned=28493 nr_reclaimed=2350
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000de92b0
with flags=100000000002004D
[ 70.178215] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de98d0
with flags=100000000002004D
[ 70.180063] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9908
with flags=100000000002004D
[ 70.180081] shrink_page_list (nr_scanned=28493 nr_reclaimed=2350
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000de9320
with flags=100000000002004D
[ 70.180192] shrink_page_list (nr_scanned=897 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dea0e8
with flags=100000000002004D
[ 70.180197] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deac80
with flags=100000000002004D
[ 70.182031] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9940
with flags=100000000002004D
[ 70.182048] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deacb8
with flags=100000000002004D
[ 70.182063] shrink_page_list (nr_scanned=28493 nr_reclaimed=2350
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000de9358
with flags=100000000002004D
[ 70.182079] shrink_page_list (nr_scanned=897 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dea120
with flags=100000000002004D
[ 70.183986] shrink_page_list (nr_scanned=28493 nr_reclaimed=2350
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000de9828
with flags=100000000002004D
[ 70.183990] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deacf0
with flags=100000000002004D
[ 70.183993] shrink_page_list (nr_scanned=897 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dea158
with flags=100000000002004D
[ 70.185982] shrink_page_list (nr_scanned=897 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dea190
with flags=100000000002004D
[ 70.185986] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deadd0
with flags=100000000002004D
[ 70.186117] shrink_page_list (nr_scanned=28621 nr_reclaimed=2382
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da5118
with flags=100000000002004D
[ 70.187991] shrink_page_list (nr_scanned=897 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dea1c8
with flags=100000000002004D
[ 70.187994] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deae40
with flags=100000000002004D
[ 70.187998] shrink_page_list (nr_scanned=28621 nr_reclaimed=2382
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da5348
with flags=100000000002004D
[ 70.189977] shrink_page_list (nr_scanned=28621 nr_reclaimed=2382
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da5540
with flags=100000000002004D
[ 70.189980] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deae78
with flags=100000000002004D
[ 70.190026] shrink_page_list (nr_scanned=950 nr_reclaimed=136
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000da5b98
with flags=100000000002004D
[ 70.191975] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deaeb0
with flags=100000000002004D
[ 70.191982] shrink_page_list (nr_scanned=28621 nr_reclaimed=2382
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da5578
with flags=100000000002004D
[ 70.192096] shrink_page_list (nr_scanned=1149 nr_reclaimed=170
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000da5c78
with flags=100000000002004D
[ 70.193973] shrink_page_list (nr_scanned=41119 nr_reclaimed=2787
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000deaee8
with flags=100000000002004D
[ 70.194025] shrink_page_list (nr_scanned=1213 nr_reclaimed=170
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000da5ff8
with flags=100000000002004D
[ 70.194190] shrink_page_list (nr_scanned=28849 nr_reclaimed=2414
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da6a78
with flags=100000000002004D
[ 70.195970] shrink_page_list (nr_scanned=1213 nr_reclaimed=170
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000da6378
with flags=100000000002004D
[ 70.195981] shrink_page_list (nr_scanned=28849 nr_reclaimed=2414
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da6ab0
with flags=100000000002004D
[ 70.196022] shrink_page_list (nr_scanned=41176 nr_reclaimed=2821
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000da7178
with flags=100000000002004D
[ 70.197975] shrink_page_list (nr_scanned=1213 nr_reclaimed=170
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000da66c0
with flags=100000000002004D
[ 70.197982] shrink_page_list (nr_scanned=28849 nr_reclaimed=2414
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000da7140
with flags=100000000002004D
[ 70.198197] shrink_page_list (nr_scanned=41527 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000daa198
with flags=100000000002004D
[ 70.199965] shrink_page_list (nr_scanned=41527 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000daa4a8
with flags=100000000002004D
[ 70.200070] shrink_page_list (nr_scanned=1341 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000daaa58
with flags=100000000002004D
[ 70.200116] shrink_page_list (nr_scanned=28963 nr_reclaimed=2414
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d90188
with flags=100000000002004D
[ 70.201962] shrink_page_list (nr_scanned=1341 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000daaac8
with flags=100000000002004D
[ 70.201965] shrink_page_list (nr_scanned=41527 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000daa4e0
with flags=100000000002004D
[ 70.202069] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d907e0
with flags=100000000002004D
[ 70.203959] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d90818
with flags=100000000002004D
[ 70.203964] shrink_page_list (nr_scanned=41527 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000daa630
with flags=100000000002004D
[ 70.204009] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90b28
with flags=100000000002004D
[ 70.205955] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90b98
with flags=100000000002004D
[ 70.205959] shrink_page_list (nr_scanned=41527 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000daa8d0
with flags=100000000002004D
[ 70.205962] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d90850
with flags=100000000002004D
[ 70.207962] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90bd0
with flags=100000000002004D
[ 70.207968] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d90888
with flags=100000000002004D
[ 70.208015] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d90f88
with flags=100000000002004D
[ 70.209950] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90d20
with flags=100000000002004D
[ 70.209954] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d917a0
with flags=100000000002004D
[ 70.210095] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d908f8
with flags=100000000002004D
[ 70.211948] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90d58
with flags=100000000002004D
[ 70.211952] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d91ab0
with flags=100000000002004D
[ 70.211955] shrink_page_list (nr_scanned=29077 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d90af0
with flags=100000000002004D
[ 70.213946] shrink_page_list (nr_scanned=1399 nr_reclaimed=205
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d90f18
with flags=100000000002004D
[ 70.213949] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d91c70
with flags=100000000002004D
[ 70.214034] shrink_page_list (nr_scanned=29165 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d92648
with flags=100000000002004D
[ 70.215944] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d91ca8
with flags=100000000002004D
[ 70.215948] shrink_page_list (nr_scanned=29165 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d92680
with flags=100000000002004D
[ 70.216002] shrink_page_list (nr_scanned=1462 nr_reclaimed=247
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d92728
with flags=100000000002004D
[ 70.217949] shrink_page_list (nr_scanned=1462 nr_reclaimed=247
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d92760
with flags=100000000002004D
[ 70.217952] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d925d8
with flags=100000000002004D
[ 70.218017] shrink_page_list (nr_scanned=29202 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d93bf0
with flags=100000000002004D
[ 70.219939] shrink_page_list (nr_scanned=41591 nr_reclaimed=2920
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d92610
with flags=100000000002004D
[ 70.220036] shrink_page_list (nr_scanned=29266 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d94018
with flags=100000000002004D
[ 70.220054] shrink_page_list (nr_scanned=1562 nr_reclaimed=290
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000dcdbe0
with flags=100000000002004D
[ 70.221934] shrink_page_list (nr_scanned=29266 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d940c0
with flags=100000000002004D
[ 70.221938] shrink_page_list (nr_scanned=1562 nr_reclaimed=290
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d95470
with flags=100000000002004D
[ 70.222585] shrink_page_list (nr_scanned=42665 nr_reclaimed=3127
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d8d7f8
with flags=100000000002004D
[ 70.223931] shrink_page_list (nr_scanned=29266 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d94130
with flags=100000000002004D
[ 70.223935] shrink_page_list (nr_scanned=42665 nr_reclaimed=3127
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d8d830
with flags=100000000002004D
[ 70.223976] shrink_page_list (nr_scanned=1612 nr_reclaimed=290
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d8f238
with flags=100000000002004D
[ 70.225929] shrink_page_list (nr_scanned=42665 nr_reclaimed=3127
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d8f468
with flags=100000000002004D
[ 70.225932] shrink_page_list (nr_scanned=1612 nr_reclaimed=290
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d8f158
with flags=100000000002004D
[ 70.225935] shrink_page_list (nr_scanned=29266 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d941a0
with flags=100000000002004D
[ 70.227934] shrink_page_list (nr_scanned=29266 nr_reclaimed=2460
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d944b0
with flags=100000000002004D
[ 70.228134] shrink_page_list (nr_scanned=42824 nr_reclaimed=3199
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d76cd8
with flags=100000000002004D
[ 70.228427] shrink_page_list (nr_scanned=2225 nr_reclaimed=409
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d7ad28
with flags=100000000002004D
[ 70.230232] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d695d0
with flags=100000000002004D
[ 70.230251] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d69870
with flags=100000000002004D
[ 70.230446] shrink_page_list (nr_scanned=29609 nr_reclaimed=2544
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6b978
with flags=100000000002004D
[ 70.231920] shrink_page_list (nr_scanned=29609 nr_reclaimed=2544
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6b898
with flags=100000000002004D
[ 70.231924] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a018
with flags=100000000002004D
[ 70.231927] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69608
with flags=100000000002004D
[ 70.233918] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69640
with flags=100000000002004D
[ 70.233921] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a050
with flags=100000000002004D
[ 70.233925] shrink_page_list (nr_scanned=29609 nr_reclaimed=2544
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6b6a0
with flags=100000000002004D
[ 70.235916] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69720
with flags=100000000002004D
[ 70.235920] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a088
with flags=100000000002004D
[ 70.236115] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6cf58
with flags=100000000002004D
[ 70.237922] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6cf90
with flags=100000000002004D
[ 70.237926] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69758
with flags=100000000002004D
[ 70.237929] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a130
with flags=100000000002004D
[ 70.239910] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6cfc8
with flags=100000000002004D
[ 70.239914] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d697c8
with flags=100000000002004D
[ 70.239917] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a168
with flags=100000000002004D
[ 70.241908] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69800
with flags=100000000002004D
[ 70.241911] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a1a0
with flags=100000000002004D
[ 70.241917] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d000
with flags=100000000002004D
[ 70.243906] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d038
with flags=100000000002004D
[ 70.243909] shrink_page_list (nr_scanned=43013 nr_reclaimed=3247
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d69838
with flags=100000000002004D
[ 70.243913] shrink_page_list (nr_scanned=2405 nr_reclaimed=458
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6a408
with flags=100000000002004D
[ 70.245906] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d070
with flags=100000000002004D
[ 70.245977] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e1f0
with flags=100000000002004D
[ 70.245982] shrink_page_list (nr_scanned=2456 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6d428
with flags=100000000002004D
[ 70.247909] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e228
with flags=100000000002004D
[ 70.247912] shrink_page_list (nr_scanned=2456 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d6d508
with flags=100000000002004D
[ 70.247915] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d0a8
with flags=100000000002004D
[ 70.249897] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d230
with flags=100000000002004D
[ 70.249901] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e260
with flags=100000000002004D
[ 70.249941] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d70330
with flags=100000000002004D
[ 70.251895] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e298
with flags=100000000002004D
[ 70.251899] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d702f8
with flags=100000000002004D
[ 70.251911] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d2a0
with flags=100000000002004D
[ 70.253891] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d2d8
with flags=100000000002004D
[ 70.253895] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d70288
with flags=100000000002004D
[ 70.253898] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e2d0
with flags=100000000002004D
[ 70.255888] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d310
with flags=100000000002004D
[ 70.255893] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e308
with flags=100000000002004D
[ 70.255896] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d70250
with flags=100000000002004D
[ 70.257896] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e420
with flags=100000000002004D
[ 70.257900] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d70218
with flags=100000000002004D
[ 70.257903] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d348
with flags=100000000002004D
[ 70.259885] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e458
with flags=100000000002004D
[ 70.259889] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d701a8
with flags=100000000002004D
[ 70.259892] shrink_page_list (nr_scanned=29846 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d6d380
with flags=100000000002004D
[ 70.261883] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e490
with flags=100000000002004D
[ 70.261886] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d70138
with flags=100000000002004D
[ 70.261971] shrink_page_list (nr_scanned=29929 nr_reclaimed=2578
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d726a0
with flags=100000000002004D
[ 70.263882] shrink_page_list (nr_scanned=2510 nr_reclaimed=502
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d700c8
with flags=100000000002004D
[ 70.263976] shrink_page_list (nr_scanned=43067 nr_reclaimed=3282
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d6e650
with flags=100000000002004D
[ 70.264520] shrink_page_list (nr_scanned=30546 nr_reclaimed=2709
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d4dad8
with flags=100000000002004D
[ 70.266038] shrink_page_list (nr_scanned=30674 nr_reclaimed=2741
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d50bd8
with flags=100000000002004D
[ 70.266122] shrink_page_list (nr_scanned=43361 nr_reclaimed=3364
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d51818
with flags=100000000002004D
[ 70.266387] shrink_page_list (nr_scanned=2848 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d57890
with flags=100000000002004D
[ 70.268009] shrink_page_list (nr_scanned=30754 nr_reclaimed=2741
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d57d28
with flags=100000000002004D
[ 70.268014] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d42eb0
with flags=100000000002004D
[ 70.268070] shrink_page_list (nr_scanned=43559 nr_reclaimed=3419
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d433b8
with flags=100000000002004D
[ 70.269875] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d42ee8
with flags=100000000002004D
[ 70.270288] shrink_page_list (nr_scanned=44119 nr_reclaimed=3492
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d1f350
with flags=100000000002004D
[ 70.270814] shrink_page_list (nr_scanned=31538 nr_reclaimed=2904
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d08bb0
with flags=100000000002004D
[ 70.271870] shrink_page_list (nr_scanned=44119 nr_reclaimed=3492
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d1f318
with flags=100000000002004D
[ 70.271874] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d42f20
with flags=100000000002004D
[ 70.271963] shrink_page_list (nr_scanned=31617 nr_reclaimed=2904
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d08c58
with flags=100000000002004D
[ 70.273867] shrink_page_list (nr_scanned=44119 nr_reclaimed=3492
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d1f2e0
with flags=100000000002004D
[ 70.273870] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d42fc8
with flags=100000000002004D
[ 70.273874] shrink_page_list (nr_scanned=31617 nr_reclaimed=2904
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d08d38
with flags=100000000002004D
[ 70.275864] shrink_page_list (nr_scanned=44119 nr_reclaimed=3492
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000d1f2a8
with flags=100000000002004D
[ 70.275867] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d431c0
with flags=100000000002004D
[ 70.275870] shrink_page_list (nr_scanned=31617 nr_reclaimed=2904
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000d08ec0
with flags=100000000002004D
[ 70.277926] shrink_page_list (nr_scanned=2904 nr_reclaimed=627
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000d431f8
with flags=100000000002004D
[ 70.278125] shrink_page_list (nr_scanned=44344 nr_reclaimed=3492
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000cf79d0
with flags=100000000002004D
[ 70.278222] shrink_page_list (nr_scanned=31962 nr_reclaimed=2978
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000cf7e30
with flags=100000000002004D
[ 70.279858] shrink_page_list (nr_scanned=31962 nr_reclaimed=2978
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000cf7f80
with flags=100000000002004D
[ 70.279930] shrink_page_list (nr_scanned=2954 nr_reclaimed=664
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000cf8fb0
with flags=100000000002004D
[ 70.281855] shrink_page_list (nr_scanned=31962 nr_reclaimed=2978
nr_to_reclaim=32 gfp_mask=11212) found inactive page ffffea0000cf7fb8
with flags=100000000002004D
[ 70.286255] shrink_page_list (nr_scanned=6204 nr_reclaimed=1203
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000eca388
with flags=100000000002004D
[ 70.287863] shrink_page_list (nr_scanned=6204 nr_reclaimed=1203
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000eca350
with flags=100000000002004D
[ 70.289847] shrink_page_list (nr_scanned=6204 nr_reclaimed=1203
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000eca318
with flags=100000000002004D
[ 70.290123] shrink_page_list (nr_scanned=58419 nr_reclaimed=4751
nr_to_reclaim=32 gfp_mask=11210) found inactive page ffffea0000ed8200
with flags=1000000000000841
[ 70.291845] shrink_page_list (nr_scanned=6204 nr_reclaimed=1203
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea0000eca2e0
with flags=100000000002004D
[ 70.400259] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9eb8
with flags=100000000002004D
[ 70.403707] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9ef0
with flags=100000000002004D
[ 70.406705] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9f60
with flags=100000000002004D
[ 70.409706] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9f98
with flags=100000000002004D
[ 70.412711] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000de9fd0
with flags=100000000002004D
[ 70.415697] shrink_page_list (nr_scanned=618 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0000dea008
with flags=100000000002004D
[ 70.418828] shrink_page_list (nr_scanned=682 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea0001a4f650
with flags=1000000000020849
[ 70.421696] shrink_page_list (nr_scanned=682 nr_reclaimed=117
nr_to_reclaim=32 gfp_mask=2005A) found inactive page ffffea00000824b0
with flags=1000000000020849

Right after that happened, I hit ctrl-c to kill test_mempressure.sh.
The system was OK until I typed sync, and then everything hung.

I'm really confused. shrink_inactive_list in
RECLAIM_MODE_LUMPYRECLAIM will call one of the isolate_pages functions
with ISOLATE_BOTH. The resulting list goes into shrink_page_list,
which does VM_BUG_ON(PageActive(page)).

How is that supposed to work?

--Andy

2011-05-20 03:12:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> Right after that happened, I hit ctrl-c to kill test_mempressure.sh.
> The system was OK until I typed sync, and then everything hung.
>
> I'm really confused. shrink_inactive_list in
> RECLAIM_MODE_LUMPYRECLAIM will call one of the isolate_pages functions
> with ISOLATE_BOTH. The resulting list goes into shrink_page_list,
> which does VM_BUG_ON(PageActive(page)).
>
> How is that supposed to work?

Usually clear_active_flags() clears PG_active before calling shrink_page_list().

shrink_inactive_list()
    isolate_pages_global()
    update_isolated_counts()
        clear_active_flags()
    shrink_page_list()
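
For reference, that helper in the 2.6.38-era mm/vmscan.c looked roughly
like the sketch below (reconstructed from memory, so treat it as an
approximation rather than the exact source): it counts the isolated
pages per LRU type and clears PG_active so that shrink_page_list()
should never see an active page.

/*
 * Sketch of the 2.6.38-era clear_active_flags() (approximate, from memory):
 * count pages on the isolated list per LRU type and clear PG_active.
 */
static unsigned long clear_active_flags(struct list_head *page_list,
					unsigned int *count)
{
	int nr_active = 0;
	int lru;
	struct page *page;

	list_for_each_entry(page, page_list, lru) {
		int numpages = hpage_nr_pages(page);

		lru = page_lru_base_type(page);
		if (PageActive(page)) {
			lru += LRU_ACTIVE;
			ClearPageActive(page);
			nr_active += numpages;
		}
		if (count)
			count[lru] += numpages;
	}

	return nr_active;
}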


2011-05-20 03:38:30

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Thu, May 19, 2011 at 11:12 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> Right after that happened, I hit ctrl-c to kill test_mempressure.sh.
>> The system was OK until I typed sync, and then everything hung.
>>
>> I'm really confused.  shrink_inactive_list in
>> RECLAIM_MODE_LUMPYRECLAIM will call one of the isolate_pages functions
>> with ISOLATE_BOTH.  The resulting list goes into shrink_page_list,
>> which does VM_BUG_ON(PageActive(page)).
>>
>> How is that supposed to work?
>
> Usually clear_active_flags() clears PG_active before calling
> shrink_page_list().
>
> shrink_inactive_list()
>     isolate_pages_global()
>     update_isolated_counts()
>         clear_active_flags()
>     shrink_page_list()
>
>

That makes sense. And I have CONFIG_COMPACTION=y, so the lumpy mode
doesn't get set anyway.

But the pages I'm seeing have flags=100000000008005D. If I'm reading
it right, that means locked,referenced,uptodate,dirty,active. How
does a page like that end up in shrink_page_list? I don't see how a
page that's !PageLRU can get marked Active. Nonetheless, I'm hitting
that VM_BUG_ON.
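
(For reference, the decoding above can be reproduced with a small
user-space helper like this hypothetical one; it assumes the 2.6.38-era
x86_64 ordering of the low page-flag bits, which is an assumption on my
part rather than something taken from the oops itself.)

/* decode_flags.c -- hypothetical helper, assuming the low bits are
 * ordered: locked, error, referenced, uptodate, dirty, lru, active, slab. */
#include <stdio.h>
#include <stdlib.h>

static const char *names[] = {
	"locked", "error", "referenced", "uptodate",
	"dirty", "lru", "active", "slab",
};

int main(int argc, char **argv)
{
	unsigned long flags;
	int i;

	if (argc < 2)
		return 1;
	flags = strtoul(argv[1], NULL, 16);
	for (i = 0; i < 8; i++)
		if (flags & (1UL << i))
			printf("%s ", names[i]);
	printf("\n");
	return 0;
}

/* $ ./decode_flags 100000000008005D
 * locked referenced uptodate dirty active
 */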

Is there a race somewhere?

--Andy

2011-05-20 04:20:17

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 12:38 PM, Andrew Lutomirski <[email protected]> wrote:
> On Thu, May 19, 2011 at 11:12 PM, KOSAKI Motohiro
> <[email protected]> wrote:
>>> Right after that happened, I hit ctrl-c to kill test_mempressure.sh.
>>> The system was OK until I typed sync, and then everything hung.
>>>
>>> I'm really confused.  shrink_inactive_list in
>>> RECLAIM_MODE_LUMPYRECLAIM will call one of the isolate_pages functions
>>> with ISOLATE_BOTH.  The resulting list goes into shrink_page_list,
>>> which does VM_BUG_ON(PageActive(page)).
>>>
>>> How is that supposed to work?
>>
>> Usually clear_active_flags() clears PG_active before calling
>> shrink_page_list().
>>
>> shrink_inactive_list()
>>    isolate_pages_global()
>>    update_isolated_counts()
>>        clear_active_flags()
>>    shrink_page_list()
>>
>>
>
> That makes sense.  And I have CONFIG_COMPACTION=y, so the lumpy mode
> doesn't get set anyway.

Do you still see the problem with CONFIG_COMPACTION disabled?

>
> But the pages I'm seeing have flags=100000000008005D.  If I'm reading
> it right, that means locked,referenced,uptodate,dirty,active.  How
> does a page like that end up in shrink_page_list?  I don't see how a
> page that's !PageLRU can get marked Active.  Nonetheless, I'm hitting
> that VM_BUG_ON.

Thanks for confirming that it's not a problem with my latest patch.

>
> Is there a race somewhere?

First of all, let's finish off your original problem, the hang. :)
And let's start another thread to fix this new problem.

I think this is a severe problem because 2.6.39 includes my deactivate_pages work
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=315601809d124d046abd6c3ffa346d0dbd7aa29d),
which touches page state even more. (2.6.38.6 doesn't include it, so
this is not a problem with deactivate_pages itself.)
And the inorder-putback series, which I will push for 2.6.40, touches
it more still.

So I want to resolve your problem ASAP.
We haven't seen any other report of this. Could you do a git bisect?
FYI, the recent big changes in mm are compaction and transparent huge pages.
Kame, could you point out anything related to memcg, if anything comes to mind?

>
> --Andy
>



--
Kind regards,
Minchan Kim

2011-05-20 05:15:47

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, 20 May 2011 13:20:15 +0900
Minchan Kim <[email protected]> wrote:

> So I want to resolve your problem ASAP.
> We haven't seen any other report of this. Could you do a git bisect?
> FYI, the recent big changes in mm are compaction and transparent huge pages.
> Kame, could you point out anything related to memcg, if anything comes to mind?
>

I don't suspect memcg at this stage because it never modifies page->flags.
Considering the case, PageActive() is being set on off-LRU pages after
clear_active_flags() has cleared it.

Hmm, I don't think I fully understand the locking here, but... what do
you think of this?

==

When splitting a hugepage, the routine marks all pmds as "splitting".

But assume a racy case where two threads run into the split at the
same time: one thread wins compound_lock() and does the split; the
other thread should not touch the already-split pages.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Index: mmotm-May11/mm/huge_memory.c
===================================================================
--- mmotm-May11.orig/mm/huge_memory.c
+++ mmotm-May11/mm/huge_memory.c
@@ -1150,7 +1150,7 @@ static int __split_huge_page_splitting(s
 	return ret;
 }
 
-static void __split_huge_page_refcount(struct page *page)
+static bool __split_huge_page_refcount(struct page *page)
 {
 	int i;
 	unsigned long head_index = page->index;
@@ -1161,6 +1161,11 @@ static void __split_huge_page_refcount(s
 	spin_lock_irq(&zone->lru_lock);
 	compound_lock(page);
 
+	if (!PageCompound(page)) {
+		compound_unlock(page);
+		spin_unlock_irq(&zone->lru_lock);
+		return false;
+	}
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		struct page *page_tail = page + i;
 
@@ -1258,6 +1263,7 @@ static void __split_huge_page_refcount(s
 	 * to be pinned by the caller.
 	 */
 	BUG_ON(page_count(page) <= 0);
+	return true;
 }
 
 static int __split_huge_page_map(struct page *page,
@@ -1367,7 +1373,8 @@ static void __split_huge_page(struct pag
 			mapcount, page_mapcount(page));
 	BUG_ON(mapcount != page_mapcount(page));
 
-	__split_huge_page_refcount(page);
+	if (!__split_huge_page_refcount(page))
+		return;
 
 	mapcount2 = 0;
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {

2011-05-20 05:36:15

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 2:08 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Fri, 20 May 2011 13:20:15 +0900
> Minchan Kim <[email protected]> wrote:
>
>> So I want to resolve your problem ASAP.
>> We haven't seen any other report of this. Could you do a git bisect?
>> FYI, the recent big changes in mm are compaction and transparent huge pages.
>> Kame, could you point out anything related to memcg, if anything comes to mind?
>>
>
> I don't suspect memcg at this stage because it never modifies page->flags.
> Considering the case, PageActive() is being set on off-LRU pages after
> clear_active_flags() has cleared it.
>
> Hmm, I don't think I fully understand the locking here, but... what do
> you think of this?
>
> ==
>
> When splitting a hugepage, the routine marks all pmds as "splitting".
>
> But assume a racy case where two threads run into the split at the
> same time: one thread wins compound_lock() and does the split; the
> other thread should not touch the already-split pages.

Sorry, I don't have time to review it in detail right now.
At a rough look, page_lock_anon_vma should prevent it.
But Andrea needs to look at this problem, and he will catch anything we missed. :)


>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> Index: mmotm-May11/mm/huge_memory.c
> ===================================================================
> --- mmotm-May11.orig/mm/huge_memory.c
> +++ mmotm-May11/mm/huge_memory.c
> @@ -1150,7 +1150,7 @@ static int __split_huge_page_splitting(s
>        return ret;
>  }
>
> -static void __split_huge_page_refcount(struct page *page)
> +static bool __split_huge_page_refcount(struct page *page)
>  {
>        int i;
>        unsigned long head_index = page->index;
> @@ -1161,6 +1161,11 @@ static void __split_huge_page_refcount(s
>        spin_lock_irq(&zone->lru_lock);
>        compound_lock(page);
>
> +       if (!PageCompound(page)) {
> +               compound_unlock(page);
> +               spin_unlock_irq(&zone->lru_lock);
> +               return false;
> +       }
>        for (i = 1; i < HPAGE_PMD_NR; i++) {
>                struct page *page_tail = page + i;
>
> @@ -1258,6 +1263,7 @@ static void __split_huge_page_refcount(s
>         * to be pinned by the caller.
>         */
>        BUG_ON(page_count(page) <= 0);
> +       return true;
>  }
>
>  static int __split_huge_page_map(struct page *page,
> @@ -1367,7 +1373,8 @@ static void __split_huge_page(struct pag
>                       mapcount, page_mapcount(page));
>        BUG_ON(mapcount != page_mapcount(page));
>
> -       __split_huge_page_refcount(page);
> +       if (!__split_huge_page_refcount(page))
> +               return;
>
>        mapcount2 = 0;
>        list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
>
>



--
Kind regards,
Minchan Kim

2011-05-20 07:50:48

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, 20 May 2011 14:36:13 +0900
Minchan Kim <[email protected]> wrote:

> On Fri, May 20, 2011 at 2:08 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Fri, 20 May 2011 13:20:15 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> So I want to resolve your problem ASAP.
> >> We haven't seen any other report of this. Could you do a git bisect?
> >> FYI, the recent big changes in mm are compaction and transparent huge pages.
> >> Kame, could you point out anything related to memcg, if anything comes to mind?
> >>
> >
> > I don't suspect memcg at this stage because it never modifies page->flags.
> > Considering the case, PageActive() is being set on off-LRU pages after
> > clear_active_flags() has cleared it.
> >
> > Hmm, I don't think I fully understand the locking here, but... what do
> > you think of this?
> >
> > ==
> >
> > When splitting a hugepage, the routine marks all pmds as "splitting".
> >
> > But assume a racy case where two threads run into the split at the
> > same time: one thread wins compound_lock() and does the split; the
> > other thread should not touch the already-split pages.
>
> Sorry, I don't have time to review it in detail right now.
> At a rough look, page_lock_anon_vma should prevent it.
> But Andrea is following this problem and he will catch anything we missed. :)
>
Hmm, maybe I'm missing something... I need to build a test environment on my side.
But I'm not sure I can reproduce it.

Thanks,
-Kame

2011-05-20 10:11:35

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 02:08:56PM +0900, KAMEZAWA Hiroyuki wrote:
> +       if (!PageCompound(page)) {
> +               compound_unlock(page);
> +               spin_unlock_irq(&zone->lru_lock);
> +               return false;
> +       }

If you turn this into a BUG_ON(!PageCompound(page)) I'm ok with it. But it
was never supposed to happen, so the check above shouldn't be needed.

This very check is done in split_huge_page after taking the root
anon_vma lock. And every other thread or process sharing the page has
to take the anon_vma lock, and then check PageCompound too before it
can proceed into __split_huge_page. So I don't see a problem but
please add the BUG_ON if you are concerned. A BUG_ON definitely can't
hurt. Also note, __split_huge_page is static and is only called by
split_huge_page which does the check after proper locking.

        if (!PageCompound(page))
                goto out_unlock;
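
For context, a rough sketch of the ordering described above, simplified
from the 2.6.38-era split_huge_page() (error paths trimmed; treat this
as an illustration, not a verbatim quote):

/* Sketch: the PageCompound() check only happens after the root
 * anon_vma lock is taken, so two racing splitters serialize here
 * and the loser sees !PageCompound() and bails out. */
int split_huge_page(struct page *page)
{
        struct anon_vma *anon_vma;
        int ret = 1;

        anon_vma = page_lock_anon_vma(page);    /* every splitter takes this */
        if (!anon_vma)
                goto out;
        ret = 0;
        if (!PageCompound(page))                /* already split by the winner */
                goto out_unlock;

        __split_huge_page(page, anon_vma);      /* safe: the lock is held */
        BUG_ON(PageCompound(page));
out_unlock:
        page_unlock_anon_vma(anon_vma);
out:
        return ret;
}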

I figure it's not easily reproducible but you can easily rule out THP
issues by reproducing at least once after booting with
transparent_hugepage=never or by building the kernel with
CONFIG_TRANSPARENT_HUGEPAGE=n.

I'm afraid we might have some lru active/inactive/isolated accounting
issue around vmstat.c, so that's the part of the code I'd recommend
reviewing (I checked it and didn't see anything wrong yet, not even in
the THP context, but I'm still worried we have a statistics issue
somewhere). During -rc I had a bug report from two people (one was a UP
build and one was an SMP build), not easily reproducible either, that
hinted at a possible nr_isolated* or nr_inactive* counter (or both)
being wrong (not sure if _anon or _file; it could be just one lru type
or both). If the stats are off, that may also trigger the oom killer by
making the VM shrinking path (which also activates the swapping) bail
out early, thinking it can't shrink any more. It could be the same
statistics problem that sometimes makes the VM think it can't shrink any
more and leads to early oom killing, and at other times loops
indefinitely in too_many_isolated if nr_isolated_X > nr_inactive_X stays
true for __GFP_NO_KSWAPD allocations (kswapd is immune from that loop,
so if kswapd is allowed to run, it probably then increases nr_inactive
by deactivating enough pages to unblock it). Just a wild guess...
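
For reference, the too_many_isolated() throttle being alluded to looks
roughly like this (condensed from the 2.6.38-era mm/vmscan.c; details
such as the global-LRU check are omitted):

static int too_many_isolated(struct zone *zone, int file,
                             struct scan_control *sc)
{
        unsigned long inactive, isolated;

        /* kswapd is never throttled here, as noted above */
        if (current_is_kswapd())
                return 0;

        if (file) {
                inactive = zone_page_state(zone, NR_INACTIVE_FILE);
                isolated = zone_page_state(zone, NR_ISOLATED_FILE);
        } else {
                inactive = zone_page_state(zone, NR_INACTIVE_ANON);
                isolated = zone_page_state(zone, NR_ISOLATED_ANON);
        }

        /* if these counters are skewed, this can stay true forever */
        return isolated > inactive;
}

and its caller in shrink_inactive_list() spins on it:

        while (unlikely(too_many_isolated(zone, file, sc))) {
                congestion_wait(BLK_RW_ASYNC, HZ/10);
                if (fatal_signal_pending(current))
                        return SWAP_CLUSTER_MAX;
        }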

2011-05-20 10:40:21

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 11, 2011 at 04:07:40PM -0700, Andi Kleen wrote:
> FWIW I had problems swapping over dm-crypt for a long time -- not
> quite as severe as yours. Never really tracked it down.

I use swap over dm-crypt (cryptsetup on a raw device) without apparent
problems; I'd like to consider that a safe setup.

2011-05-20 14:12:09

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 6:11 AM, Andrea Arcangeli <[email protected]> wrote:
> I figure it's not easily reproducible but you can easily rule out THP
> issues by reproducing at least once after booting with
> transparent_hugepage=never or by building the kernel with
> CONFIG_TRANSPARENT_HUGEPAGE=n.

Reproduced with CONFIG_TRANSPARENT_HUGEPAGE=n with and without
compaction and migration.

I applied the attached patch (which includes Minchan's !pgdat_balanced
and need_resched changes). I see:

[ 121.468339] firefox shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea00019217a8) w/ prev = 100000000002000D
[ 121.469236] firefox shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea00016596b8) w/ prev = 100000000002000D
[ 121.470207] firefox: shrink_page_list (nr_scanned=94
nr_reclaimed=19 nr_to_reclaim=32 gfp_mask=201DA) found inactive page
ffffea00019217a8 with flags=100000000002004D
[ 121.472451] firefox: shrink_page_list (nr_scanned=94
nr_reclaimed=19 nr_to_reclaim=32 gfp_mask=201DA) found inactive page
ffffea00016596b8 with flags=100000000002004D
[ 121.482782] dd shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea00013a8938) w/ prev = 100000000002000D
[ 121.489820] dd shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea00017a4e88) w/ prev = 1000000000000801
[ 121.490626] dd shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000005edb0) w/ prev = 1000000000000801
[ 121.491499] dd: shrink_page_list (nr_scanned=62 nr_reclaimed=0
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea00017a4e88
with flags=1000000000000841
[ 121.494337] dd: shrink_page_list (nr_scanned=62 nr_reclaimed=0
nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea000005edb0
with flags=1000000000000841
[ 121.499219] dd shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000129c788) w/ prev = 1000000000080009
[ 121.500363] dd shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000129c830) w/ prev = 1000000000080009
[ 121.502270] kswapd0 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0001146470) w/ prev = 100000000008001D
[ 121.661545] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0000058168) w/ prev = 1000000000000801
[ 121.662791] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000166f288) w/ prev = 1000000000000801
[ 121.665727] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0001681c40) w/ prev = 1000000000000801
[ 121.666857] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0001693130) w/ prev = 1000000000000801
[ 121.667988] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0000c790d8) w/ prev = 1000000000000801
[ 121.669105] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000113fe48) w/ prev = 1000000000000801
[ 121.670238] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea0000058168 with flags=1000000000000841
[ 121.674061] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea000166f288 with flags=1000000000000841
[ 121.678054] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea0001681c40 with flags=1000000000000841
[ 121.682069] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea0001693130 with flags=1000000000000841
[ 121.686074] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea0000c790d8 with flags=1000000000000841
[ 121.690045] kworker/1:1: shrink_page_list (nr_scanned=102
nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
ffffea000113fe48 with flags=1000000000000841
[ 121.866205] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000165d5b8) w/ prev = 100000000002000D
[ 121.868204] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0001661288) w/ prev = 100000000002000D
[ 121.870203] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0001661250) w/ prev = 100000000002000D
[ 121.872195] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea000100cee8) w/ prev = 100000000002000D
[ 121.873486] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0000eafab8) w/ prev = 100000000002000D
[ 121.874718] test_mempressur shrink_page_list+0x4f3/0x5ca:
SetPageActive(ffffea0000eafaf0) w/ prev = 100000000002000D

This is interesting: it looks like shrink_page_list is making its way
through the list more than once. It could be reentering itself
somehow or it could have something screwed up with the linked list.

I'll keep slowly debugging, but maybe this is enough for someone
familiar with this code to beat me to it.

Minchan, I think this means that your fixes are just hiding and not
fixing the underlying problem.
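
The exact instrumentation is in the attached vm_tests.patch; roughly, it
amounts to a check of the following shape near the top of the
shrink_page_list() loop (this sketch is illustrative only -- the message
format and placement here are assumptions, not the actual patch):

        /* illustrative only: complain if a page that a previous pass
         * already activated is handed to shrink_page_list() again */
        if (PageActive(page))
                printk(KERN_WARNING
                       "%s: shrink_page_list found active page %p flags=%lX\n",
                       current->comm, page, page->flags);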


Attachments:
vm_tests.patch (3.00 kB)

2011-05-20 15:33:57

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 10:11:47AM -0400, Andrew Lutomirski wrote:
> On Fri, May 20, 2011 at 6:11 AM, Andrea Arcangeli <[email protected]> wrote:
> > I figure it's not easily reproducible but you can easily rule out THP
> > issues by reproducing at least once after booting with
> > transparent_hugepage=never or by building the kernel with
> > CONFIG_TRANSPARENT_HUGEPAGE=n.
>
> Reproduced with CONFIG_TRANSPARENT_HUGEPAGE=n with and without
> compaction and migration.
>
> I applied the attached patch (which includes Minchan's !pgdat_balanced
> and need_resched changes). I see:
>
> [ 121.468339] firefox shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea00019217a8) w/ prev = 100000000002000D
> [ 121.469236] firefox shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea00016596b8) w/ prev = 100000000002000D
> [ 121.470207] firefox: shrink_page_list (nr_scanned=94
> nr_reclaimed=19 nr_to_reclaim=32 gfp_mask=201DA) found inactive page
> ffffea00019217a8 with flags=100000000002004D
> [ 121.472451] firefox: shrink_page_list (nr_scanned=94
> nr_reclaimed=19 nr_to_reclaim=32 gfp_mask=201DA) found inactive page
> ffffea00016596b8 with flags=100000000002004D
> [ 121.482782] dd shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea00013a8938) w/ prev = 100000000002000D
> [ 121.489820] dd shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea00017a4e88) w/ prev = 1000000000000801
> [ 121.490626] dd shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000005edb0) w/ prev = 1000000000000801
> [ 121.491499] dd: shrink_page_list (nr_scanned=62 nr_reclaimed=0
> nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea00017a4e88
> with flags=1000000000000841
> [ 121.494337] dd: shrink_page_list (nr_scanned=62 nr_reclaimed=0
> nr_to_reclaim=32 gfp_mask=200D2) found inactive page ffffea000005edb0
> with flags=1000000000000841
> [ 121.499219] dd shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000129c788) w/ prev = 1000000000080009
> [ 121.500363] dd shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000129c830) w/ prev = 1000000000080009
> [ 121.502270] kswapd0 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0001146470) w/ prev = 100000000008001D
> [ 121.661545] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0000058168) w/ prev = 1000000000000801
> [ 121.662791] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000166f288) w/ prev = 1000000000000801
> [ 121.665727] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0001681c40) w/ prev = 1000000000000801
> [ 121.666857] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0001693130) w/ prev = 1000000000000801
> [ 121.667988] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0000c790d8) w/ prev = 1000000000000801
> [ 121.669105] kworker/1:1 shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000113fe48) w/ prev = 1000000000000801
> [ 121.670238] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea0000058168 with flags=1000000000000841
> [ 121.674061] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea000166f288 with flags=1000000000000841
> [ 121.678054] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea0001681c40 with flags=1000000000000841
> [ 121.682069] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea0001693130 with flags=1000000000000841
> [ 121.686074] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea0000c790d8 with flags=1000000000000841
> [ 121.690045] kworker/1:1: shrink_page_list (nr_scanned=102
> nr_reclaimed=20 nr_to_reclaim=32 gfp_mask=11212) found inactive page
> ffffea000113fe48 with flags=1000000000000841
> [ 121.866205] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000165d5b8) w/ prev = 100000000002000D
> [ 121.868204] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0001661288) w/ prev = 100000000002000D
> [ 121.870203] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0001661250) w/ prev = 100000000002000D
> [ 121.872195] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea000100cee8) w/ prev = 100000000002000D
> [ 121.873486] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0000eafab8) w/ prev = 100000000002000D
> [ 121.874718] test_mempressur shrink_page_list+0x4f3/0x5ca:
> SetPageActive(ffffea0000eafaf0) w/ prev = 100000000002000D
>
> This is interesting: it looks like shrink_page_list is making its way
> through the list more than once. It could be reentering itself
> somehow or it could have something screwed up with the linked list.
>
> I'll keep slowly debugging, but maybe this is enough for someone
> familiar with this code to beat me to it.
>
> Minchan, I think this means that your fixes are just hiding and not
> fixing the underlying problem.

Could you test with the patch below?

If this patch fixes it, I don't know why we are only seeing this problem now.
It should have been a problem for a long time.

>From b7d7ca54b3ed914723cc54d1c3bcd937e5f08e3a Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Sat, 21 May 2011 00:28:00 +0900
Subject: [BUG fix] vmscan: Clear PageActive before synchronous shrink_page_list

Normally, shrink_page_list doesn't reclaim working-set pages (i.e., PG_referenced),
so such pages should go back to the active lru list;
to that end, shrink_page_list does SetPageActive on them.
Sometimes we can ignore that and try to reclaim them anyway when we reclaim high-order pages
through a consecutive, second, synchronous call of shrink_page_list.
At that point, the pages which have PG_active set can be caught by VM_BUG_ON(PageActive(page))
in shrink_page_list.

This patch clears PG_active before entering the synchronous shrink_page_list.

Reported-by: Andrew Lutomirski <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/vmscan.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8bfd450..a5c01e9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1430,7 +1430,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,

/* Check if we should syncronously wait for writeback */
if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
+ unsigned long nr_active;
set_reclaim_mode(priority, sc, true);
+ nr_active = clear_active_flags(&page_list, NULL);
+ count_vm_events(PGDEACTIVATE, nr_active);
nr_reclaimed += shrink_page_list(&page_list, zone, sc);
}

--
1.7.1

--
Kind regards,
Minchan Kim

2011-05-20 16:01:33

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 11:33 AM, Minchan Kim <[email protected]> wrote:

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8bfd450..a5c01e9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1430,7 +1430,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>
>        /* Check if we should syncronously wait for writeback */
>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> +               unsigned long nr_active;
>                set_reclaim_mode(priority, sc, true);
> +               nr_active = clear_active_flags(&page_list, NULL);
> +               count_vm_events(PGDEACTIVATE, nr_active);
>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>        }
>
> --

I'm now running that patch *without* the pgdat_balanced fix or the
need_resched check. The VM_BUG_ON doesn't happen but I still get
incorrect OOM kills.

However, if I replace the check with:

        if (false && should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {

then my system lags under bad memory pressure but recovers without
OOMs or oopses.

Is that expected?

--Andy

> 1.7.1
>
> --
> Kind regards,
> Minchan Kim
>

2011-05-20 16:19:45

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 12:01:12PM -0400, Andrew Lutomirski wrote:
> On Fri, May 20, 2011 at 11:33 AM, Minchan Kim <[email protected]> wrote:
>
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 8bfd450..a5c01e9 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1430,7 +1430,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >
> >        /* Check if we should syncronously wait for writeback */
> >        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> > +               unsigned long nr_active;
> >                set_reclaim_mode(priority, sc, true);
> > +               nr_active = clear_active_flags(&page_list, NULL);
> > +               count_vm_events(PGDEACTIVATE, nr_active);
> >                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
> >        }
> >
> > --
>
> I'm now running that patch *without* the pgdat_balanced fix or the
> need_resched check. The VM_BUG_ON doesn't happen but I still get

Please forget about need_resched.
Instead, could you test the shrink_slab patch together with the !pgdat_balanced fix?

@@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 1;	/* Assume we'll be able to shrink next time */
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		/* Assume we'll be able to shrink next time */
+		ret = 1;
+		goto out;
+	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
+out:
+	cond_resched();
 	return ret;
 }

> incorrect OOM kills.
>
> However, if I replace the check with:
>
>       if (false && should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>
> then my system lags under bad memory pressure but recovers without
> OOMs or oopses.

I can see how you'd get an OOM, but an oops? Did you see any oops?

>
> Is that expected?


No.. :(

It's totally the opposite.
That routine is there to reclaim memory even though we lose latency.
It's another issue. :(

>
> --Andy
>
> > 1.7.1
> >
> > --
> > Kind regards,
> > Minchan Kim
> >

--
Kind regards,
Minchan Kim

2011-05-20 18:10:05

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 12:19 PM, Minchan Kim <[email protected]> wrote:
> On Fri, May 20, 2011 at 12:01:12PM -0400, Andrew Lutomirski wrote:
>> On Fri, May 20, 2011 at 11:33 AM, Minchan Kim <[email protected]> wrote:
>>
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 8bfd450..a5c01e9 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1430,7 +1430,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>> >
>> >        /* Check if we should syncronously wait for writeback */
>> >        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>> > +               unsigned long nr_active;
>> >                set_reclaim_mode(priority, sc, true);
>> > +               nr_active = clear_active_flags(&page_list, NULL);
>> > +               count_vm_events(PGDEACTIVATE, nr_active);
>> >                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>> >        }
>> >
>> > --
>>
>> I'm now running that patch *without* the pgdat_balanced fix or the
>> need_resched check.  The VM_BUG_ON doesn't happen but I still get
>
> Please forget about need_resched.
> Instead, could you test the shrink_slab patch together with the !pgdat_balanced fix?
>
> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>       if (scanned == 0)
>               scanned = SWAP_CLUSTER_MAX;
>
> -       if (!down_read_trylock(&shrinker_rwsem))
> -               return 1;       /* Assume we'll be able to shrink next time */
> +       if (!down_read_trylock(&shrinker_rwsem)) {
> +               /* Assume we'll be able to shrink next time */
> +               ret = 1;
> +               goto out;
> +       }
>
>       list_for_each_entry(shrinker, &shrinker_list, list) {
>               unsigned long long delta;
> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>               shrinker->nr += total_scan;
>       }
>       up_read(&shrinker_rwsem);
> +out:
> +       cond_resched();
>       return ret;
>  }
>
>> incorrect OOM kills.
>>
>> However, if I replace the check with:
>>
>>       if (false && should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>>
>> then my system lags under bad memory pressure but recovers without
>> OOMs or oopses.
>
> I can see how you'd get an OOM, but an oops? Did you see any oops?

No oops. I've now reproduced the OOPS with both the if (false) change
and the clear_active_flags change.

Also, would this version be better? I think your version overcounts
nr_scanned, but I'm not sure what effect that would have.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f44b81..d1dabc9 100644
@@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
+		unsigned long nr_active, old_nr_scanned;
 		set_reclaim_mode(priority, sc, true);
+		nr_active = clear_active_flags(&page_list, NULL);
+		count_vm_events(PGDEACTIVATE, nr_active);
+		old_nr_scanned = sc->nr_scanned;
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
+		sc->nr_scanned = old_nr_scanned;
 	}
 
 	local_irq_disable();

I just tested 2.6.38.6 with the attached patch. It survived dirty_ram
and test_mempressure without any problems other than slowness, but
when I hit ctrl-c to stop test_mempressure, I got the attached oom.

--Andy


Attachments:
test.patch (2.54 kB)
oom.txt.xz (19.23 kB)
Download all attachments

2011-05-20 18:40:51

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Fri, May 20, 2011 at 2:09 PM, Andrew Lutomirski <[email protected]> wrote:
> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
> and test_mempressure without any problems other than slowness, but
> when I hit ctrl-c to stop test_mempressure, I got the attached oom.

Reproduced with CONFIG_CGROUP_MEM_RES_CTLR=n.

--Andy

2011-05-21 12:04:54

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3f44b81..d1dabc9 100644
> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan,
> struct zone *zone,
>
>        /* Check if we should syncronously wait for writeback */
>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> +               unsigned long nr_active, old_nr_scanned;
>                set_reclaim_mode(priority, sc, true);
> +               nr_active = clear_active_flags(&page_list, NULL);
> +               count_vm_events(PGDEACTIVATE, nr_active);
> +               old_nr_scanned = sc->nr_scanned;
>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
> +               sc->nr_scanned = old_nr_scanned;
>        }
>
>        local_irq_disable();
>
> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
> and test_mempressure without any problems other than slowness, but
> when I hit ctrl-c to stop test_mempressure, I got the attached oom.

Minchan,

I'm confused now.
If pages got SetPageActive(), should_reclaim_stall() should never return true.
Can you please explain which bad scenario happened?

-----------------------------------------------------------------------------------------------------
static void reset_reclaim_mode(struct scan_control *sc)
{
	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
}

shrink_page_list()
{
 (snip)
 activate_locked:
		SetPageActive(page);
		pgactivate++;
		unlock_page(page);
		reset_reclaim_mode(sc);                  /// here
		list_add(&page->lru, &ret_pages);
	}
-----------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------
bool should_reclaim_stall()
{
 (snip)

	/* Only stall on lumpy reclaim */
	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
		return false;
-----------------------------------------------------------------------------------------------------
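
In other words, the invariant being pointed at is: any page sent to
activate_locked also resets the reclaim mode to RECLAIM_MODE_SINGLE,
which should make should_reclaim_stall() return false, so the activated
pages never reach a second, synchronous shrink_page_list() pass. A
condensed sketch of the intended flow in shrink_inactive_list() (not a
literal quote):

	nr_reclaimed = shrink_page_list(&page_list, zone, sc);
	/* any page that hit activate_locked: now has PG_active set
	 * and sc->reclaim_mode forced back to RECLAIM_MODE_SINGLE */

	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
		/* expected to be unreachable once RECLAIM_MODE_SINGLE is set;
		 * if it is reached anyway, the second pass below walks the
		 * same page_list and trips VM_BUG_ON(PageActive(page)) */
		set_reclaim_mode(priority, sc, true);
		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
	}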

2011-05-21 13:35:10

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 3f44b81..d1dabc9 100644
>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan,
>> struct zone *zone,
>>
>>        /* Check if we should syncronously wait for writeback */
>>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>> +               unsigned long nr_active, old_nr_scanned;
>>                set_reclaim_mode(priority, sc, true);
>> +               nr_active = clear_active_flags(&page_list, NULL);
>> +               count_vm_events(PGDEACTIVATE, nr_active);
>> +               old_nr_scanned = sc->nr_scanned;
>>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>> +               sc->nr_scanned = old_nr_scanned;
>>        }
>>
>>        local_irq_disable();
>>
>> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
>> and test_mempressure without any problems other than slowness, but
>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>
> Minchan,
>
> I'm confused now.
> If pages got SetPageActive(), should_reclaim_stall() should never return true.
> Can you please explain which bad scenario happened?
>
> -----------------------------------------------------------------------------------------------------
> static void reset_reclaim_mode(struct scan_control *sc)
> {
>        sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> }
>
> shrink_page_list()
> {
>  (snip)
>  activate_locked:
>                SetPageActive(page);
>                pgactivate++;
>                unlock_page(page);
>                reset_reclaim_mode(sc);                  /// here
>                list_add(&page->lru, &ret_pages);
>        }
> -----------------------------------------------------------------------------------------------------
>
>
> -----------------------------------------------------------------------------------------------------
> bool should_reclaim_stall()
> {
> ?(snip)
>
>        /* Only stall on lumpy reclaim */
>        if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
>                return false;
> -----------------------------------------------------------------------------------------------------
>

I did some tracing and the oops happens from the second call to
shrink_page_list after should_reclaim_stall returns true and it hits
the same pages in the same order that the earlier call just finished
calling SetPageActive on. I have *not* confirmed that the two calls
happened from the same call to shrink_inactive_list, but something's
certainly wrong in there.

This is very easy to reproduce on my laptop.

--Andy

2011-05-21 14:14:34

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

> I did some tracing and the oops happens from the second call to
> shrink_page_list after should_reclaim_stall returns true and it hits
> the same pages in the same order that the earlier call just finished
> calling SetPageActive on.

Can you please share your tracing patch and raw tracing result log?

Thanks.

> I have *not* confirmed that the two calls
> happened from the same call to shrink_inactive_list, but something's
> certainly wrong in there.
>
> This is very easy to reproduce on my laptop.

2011-05-21 14:31:48

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sat, May 21, 2011 at 9:04 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 3f44b81..d1dabc9 100644
>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan,
>> struct zone *zone,
>>
>>        /* Check if we should syncronously wait for writeback */
>>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>> +               unsigned long nr_active, old_nr_scanned;
>>                set_reclaim_mode(priority, sc, true);
>> +               nr_active = clear_active_flags(&page_list, NULL);
>> +               count_vm_events(PGDEACTIVATE, nr_active);
>> +               old_nr_scanned = sc->nr_scanned;
>>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>> +               sc->nr_scanned = old_nr_scanned;
>>        }
>>
>>        local_irq_disable();
>>
>> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
>> and test_mempressure without any problems other than slowness, but
>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>
> Minchan,
>
> I'm confused now.
> If pages got SetPageActive(), should_reclaim_stall() should never return true.

Hi KOSAKI,
You're absolutely right.
I missed that, so the problem should not happen. :(

--
Kind regards,
Minchan Kim

2011-05-21 14:44:06

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

Hi Andrew.

On Sat, May 21, 2011 at 10:34 PM, Andrew Lutomirski <[email protected]> wrote:
> On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
> <[email protected]> wrote:
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 3f44b81..d1dabc9 100644
>>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan,
>>> struct zone *zone,
>>>
>>>        /* Check if we should syncronously wait for writeback */
>>>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>>> +               unsigned long nr_active, old_nr_scanned;
>>>                set_reclaim_mode(priority, sc, true);
>>> +               nr_active = clear_active_flags(&page_list, NULL);
>>> +               count_vm_events(PGDEACTIVATE, nr_active);
>>> +               old_nr_scanned = sc->nr_scanned;
>>>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>>> +               sc->nr_scanned = old_nr_scanned;
>>>        }
>>>
>>>        local_irq_disable();
>>>
>>> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
>>> and test_mempressure without any problems other than slowness, but
>>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>>
>> Minchan,
>>
>> I'm confused now.
>> If pages got SetPageActive(), should_reclaim_stall() should never return true.
>> Can you please explain which bad scenario happened?
>>
>> -----------------------------------------------------------------------------------------------------
>> static void reset_reclaim_mode(struct scan_control *sc)
>> {
>>        sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
>> }
>>
>> shrink_page_list()
>> {
>>  (snip)
>>  activate_locked:
>>                SetPageActive(page);
>>                pgactivate++;
>>                unlock_page(page);
>>                reset_reclaim_mode(sc);                  /// here
>>                list_add(&page->lru, &ret_pages);
>>        }
>> -----------------------------------------------------------------------------------------------------
>>
>>
>> -----------------------------------------------------------------------------------------------------
>> bool should_reclaim_stall()
>> {
>>  (snip)
>>
>>        /* Only stall on lumpy reclaim */
>>        if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
>>                return false;
>> -----------------------------------------------------------------------------------------------------
>>
>
> I did some tracing and the oops happens from the second call to
> shrink_page_list after should_reclaim_stall returns true and it hits
> the same pages in the same order that the earlier call just finished
> calling SetPageActive on.  I have *not* confirmed that the two calls
> happened from the same call to shrink_inactive_list, but something's
> certainly wrong in there.
>
> This is very easy to reproduce on my laptop.

I would like to confirm this problem.
Could you show the diff between vanilla 2.6.38.6 and your current tree?
(i.e., I would like to know which patches you have applied on top of vanilla
2.6.38.6 to reproduce this problem.)
I believe you added my crap patch below. Right?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..69d317e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
 		sc->reclaim_mode |= syncmode;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
+	else if ((sc->order && priority < DEF_PRIORITY - 2) ||
+				priority <= DEF_PRIORITY / 3)
 		sc->reclaim_mode |= syncmode;
 	else
 		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
@@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
 	/* If we have relaimed everything on the isolated list, no stall */
 	if (nr_freed == nr_taken)
 		return false;


--
Kind regards,
Minchan Kim

2011-05-22 12:22:49

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sat, May 21, 2011 at 10:44 AM, Minchan Kim <[email protected]> wrote:
> Hi Andrew.
>
> On Sat, May 21, 2011 at 10:34 PM, Andrew Lutomirski <[email protected]> wrote:
>> On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
>> <[email protected]> wrote:
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 3f44b81..d1dabc9 100644
>>>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan,
>>>> struct zone *zone,
>>>>
>>>>        /* Check if we should syncronously wait for writeback */
>>>>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>>>> +               unsigned long nr_active, old_nr_scanned;
>>>>                set_reclaim_mode(priority, sc, true);
>>>> +               nr_active = clear_active_flags(&page_list, NULL);
>>>> +               count_vm_events(PGDEACTIVATE, nr_active);
>>>> +               old_nr_scanned = sc->nr_scanned;
>>>>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>>>> +               sc->nr_scanned = old_nr_scanned;
>>>>        }
>>>>
>>>>        local_irq_disable();
>>>>
>>>> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
>>>> and test_mempressure without any problems other than slowness, but
>>>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>>>
>>> Minchan,
>>>
>>> I'm confused now.
>>> If pages got SetPageActive(), should_reclaim_stall() should never return true.
>>> Can you please explain which bad scenario happened?
>>>
>>> -----------------------------------------------------------------------------------------------------
>>> static void reset_reclaim_mode(struct scan_control *sc)
>>> {
>>>        sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
>>> }
>>>
>>> shrink_page_list()
>>> {
>>>  (snip)
>>>  activate_locked:
>>>                SetPageActive(page);
>>>                pgactivate++;
>>>                unlock_page(page);
>>>                reset_reclaim_mode(sc);                  /// here
>>>                list_add(&page->lru, &ret_pages);
>>>        }
>>> -----------------------------------------------------------------------------------------------------
>>>
>>>
>>> -----------------------------------------------------------------------------------------------------
>>> bool should_reclaim_stall()
>>> {
>>> ?(snip)
>>>
>>>        /* Only stall on lumpy reclaim */
>>>        if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
>>>                return false;
>>> -----------------------------------------------------------------------------------------------------
>>>
>>
>> I did some tracing and the oops happens from the second call to
>> shrink_page_list after should_reclaim_stall returns true and it hits
>> the same pages in the same order that the earlier call just finished
>> calling SetPageActive on.  I have *not* confirmed that the two calls
>> happened from the same call to shrink_inactive_list, but something's
>> certainly wrong in there.
>>
>> This is very easy to reproduce on my laptop.
>
> I would like to confirm this problem.
> Could you show the diff between vanilla 2.6.38.6 and your current tree?
> (i.e., I would like to know which patches you have applied on top of vanilla
> 2.6.38.6 to reproduce this problem.)
> I believe you added my crap patch below. Right?
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 292582c..69d317e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct
> scan_control *sc,
>        */
>       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>               sc->reclaim_mode |= syncmode;
> -       else if (sc->order && priority < DEF_PRIORITY - 2)
> +       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
> +                               priority <= DEF_PRIORITY / 3)
>               sc->reclaim_mode |= syncmode;
>       else
>               sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> @@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>       if (current_is_kswapd())
>               return false;
>
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -

Bah. It's this last hunk. Without this I can't reproduce the oops.
With this hunk, the reset_reclaim_mode doesn't work and
shrink_page_list is incorrectly called twice.

So we're back to the original problem...

--Andy

2011-05-22 23:12:53

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 22, 2011 at 9:22 PM, Andrew Lutomirski <[email protected]> wrote:
> On Sat, May 21, 2011 at 10:44 AM, Minchan Kim <[email protected]> wrote:
>> I would like to confirm this problem.
>> Could you show the diff between vanilla 2.6.38.6 and your current tree?
>> (i.e., I would like to know which patches you have applied on top of vanilla
>> 2.6.38.6 to reproduce this problem.)
>> I believe you added my crap patch below. Right?
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 292582c..69d317e 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct
>> scan_control *sc,
>>        */
>>       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>>               sc->reclaim_mode |= syncmode;
>> -       else if (sc->order && priority < DEF_PRIORITY - 2)
>> +       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
>> +                               priority <= DEF_PRIORITY / 3)
>>               sc->reclaim_mode |= syncmode;
>>       else
>>               sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
>> @@ -1349,10 +1350,6 @@ static inline bool
>> should_reclaim_stall(unsigned long nr_taken,
>>       if (current_is_kswapd())
>>               return false;
>>
>> -       /* Only stall on lumpy reclaim */
>> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> -               return false;
>> -
>
> Bah.  It's this last hunk.  Without this I can't reproduce the oops.
> With this hunk, the reset_reclaim_mode doesn't work and
> shrink_page_list is incorrectly called twice.

OMG! I should have said it more clearly. The patch above is totally _crap_.
I thought you had run the test without that crap patch. :(
Sorry for consuming the time of many mm guys.
My apologies.

I want to resolve your original problem (i.e., the hang) before digging into the
OOM problem.

>
> So we're back to the original problem...

Could you test the patch below, based on vanilla 2.6.38.6?
The expected result is that the system hang should never happen.
I hope this is the last test for the hang.

Thanks.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..1663d24 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 1;	/* Assume we'll be able to shrink next time */
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		/* Assume we'll be able to shrink next time */
+		ret = 1;
+		goto out;
+	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
+out:
+	cond_resched();
 	return ret;
 }

@@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		return !pgdat_balanced(pgdat, balanced, classzone_idx);
 	else
 		return !all_zones_ok;
 }
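
For clarity, sleeping_prematurely() answers the question "would it be
premature for kswapd to sleep now?", so for order > 0 it has to return
true when the node is *not* yet balanced -- hence the negation in the
hunk above. A hedged sketch of the intended semantics (the watermark
checks that compute all_zones_ok and balanced are elided):

/* Sketch only: true  -> keep kswapd running (sleep would be premature)
 *              false -> kswapd may go to sleep */
static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
				 int classzone_idx)
{
	/* ... per-zone watermark checks set all_zones_ok and balanced ... */

	if (order)
		/* high-order: premature unless enough of the node is balanced */
		return !pgdat_balanced(pgdat, balanced, classzone_idx);
	else
		return !all_zones_ok;
}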

--
Kind regards,
Minchan Kim

2011-05-23 16:42:45

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 23, 2011 at 08:12:50AM +0900, Minchan Kim wrote:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 292582c..1663d24 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> if (scanned == 0)
> scanned = SWAP_CLUSTER_MAX;
>
> - if (!down_read_trylock(&shrinker_rwsem))
> - return 1; /* Assume we'll be able to shrink next time */
> + if (!down_read_trylock(&shrinker_rwsem)) {
> + /* Assume we'll be able to shrink next time */
> + ret = 1;
> + goto out;
> + }

It looks cleaner to return -1 here to differentiate the failure to take
the lock from the case where we take the lock and just 1 object is
freed. Callers seem to be ok with -1 already, and it is more intuitive
for the while (nr > 10) loops too (those loops could be changed to
"while (nr > 0)" if all shrinkers are accurate and not doing something
inaccurate like the above code did; I haven't checked the shrinkers'
return values yet).

> up_read(&shrinker_rwsem);
> +out:
> + cond_resched();
> return ret;
> }

If we enter the loop, some of the shrinkers will reschedule, but it
looks good for the last iteration, which may still have run for some
time before returning. The actual failure to take shrinker_rwsem seems
only theoretical though (it's ok to cover it too with the cond_resched,
but in practice this matters more for the case where shrinker_rwsem
doesn't fail).

> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
> *pgdat, int order, long remaining,
> * must be balanced
> */
> if (order)
> - return pgdat_balanced(pgdat, balanced, classzone_idx);
> + return !pgdat_balanced(pgdat, balanced, classzone_idx);
> else
> return !all_zones_ok;
> }

I now wonder if this is why compaction in kswapd didn't work out well
and kswapd would spin at 100% load so much when compaction was added,
plus with kswapd-compaction patch I think this code should be changed
to:

	if (!COMPACTION_BUILD && order)
		return !pgdat_balanced();
	else
		return !all_zones_ok;

(but only with kswapd-compaction)

I should probably give kswapd-compaction another spin after fixing
this, because with compaction kswapd should be super successful at
satisfying zone_watermark_ok_safe(zone, _order_...) in the
sleeping_prematurely high watermark check, leading to pgdat_balanced
returning true most of the time (which would make kswapd spin like crazy
instead of stopping as it was supposed to). Mel, do you also think
it's worth another try with a fixed sleeping_prematurely like above?

Another thing: I'm not excited about the schedule_timeout(HZ/10) in
kswapd_try_to_sleep(); it seems to be all for the statistics.

Thanks,
Andrea

2011-05-23 17:35:43

by Mel Gorman

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 23, 2011 at 06:42:25PM +0200, Andrea Arcangeli wrote:
> On Mon, May 23, 2011 at 08:12:50AM +0900, Minchan Kim wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 292582c..1663d24 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> > if (scanned == 0)
> > scanned = SWAP_CLUSTER_MAX;
> >
> > - if (!down_read_trylock(&shrinker_rwsem))
> > - return 1; /* Assume we'll be able to shrink next time */
> > + if (!down_read_trylock(&shrinker_rwsem)) {
> > + /* Assume we'll be able to shrink next time */
> > + ret = 1;
> > + goto out;
> > + }
>
> It looks cleaner to return -1 here to differentiate the failure in
> taking the lock from when we take the lock and just 1 object is
> freed. Callers seems to be ok with -1 already and more intuitive for
> the while (nr > 10) loops too (those loops could be changed to "while
> (nr > 0)" if all shrinkers are accurate and not doing something
> inaccurate like the above code did, the shrinkers retvals I didn't
> check yet).
>

Only one caller reads the value of shrink_slab() and while it would
survive -1 being returned, it gains nothing. I don't see it as being
much clearer than the existing return value of 1.

> > up_read(&shrinker_rwsem);
> > +out:
> > + cond_resched();
> > return ret;
> > }
>
> If we enter the loop some of the shrinkers will reschedule but it
> looks good for the last iteration that may have still run for some
> time before returning.

Yes.

> The actual failure of shrinker_rwsem seems only
> theoretical though (but ok to cover it too with the cond_resched, but
> in practice this should be more for the case where shrinker_rwsem
> doesn't fail).
>

Profiles from some users imply that this condition is being hit. I
can't 100% prove it as I can't reproduce the problem locally
(it seems to require a Sandy Bridge laptop for some reason). Tests did
show that kswapd CPU usage was reduced, as well as the likelihood
of hanging, when shrink_slab used cond_resched() like this. See
https://lkml.org/lkml/2011/5/17/274 .

> > @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> > * must be balanced
> > */
> > if (order)
> > - return pgdat_balanced(pgdat, balanced, classzone_idx);
> > + return !pgdat_balanced(pgdat, balanced, classzone_idx);
> > else
> > return !all_zones_ok;
> > }
>
> I now wonder if this is why compaction in kswapd didn't work out well
> and kswapd would spin at 100% load so much when compaction was added,

It's possible.

> plus with kswapd-compaction patch I think this code should be changed
> to:
>
> if (!COMPACTION_BUILD && order)
> return !pgdat_balanced();
> else
> return !all_zones_ok;
>
> (but only with kswapd-compaction)
>

Why? kswapd can enter lumpy reclaim when !COMPACTION_BUILD. While this
is hardly desirable, I don't see why kswapd should use different logic
for balancing depending on whether compaction is used or not.

> I should probably give kswapd-compaction another spin after fixing
> this, because with compaction kswapd should be super successful at
> satisfying zone_watermark_ok_safe(zone, _order_...) in the
> sleeping_prematurely high watermark check, leading to pgdat_balanced
> returning true most of the time (which would make kswapd go crazy spin
> instead of stopping as it was supposed to). Mel, do you also think
> it's worth another try with a fixed sleeping_prematurely like above?
>

It's worth a try anyway although I think it's more important to figure
out if all_unreclaimable is being improperly set or not.

> Another thing, I'm not excited of the schedule_timeout(HZ/10) in
> kswapd_try_to_sleep(), it seems all for the statistics.

It's to catch the case where kswapd balances a zone but continual allocations
put the zone back under the high watermark quickly. It keeps kswapd awake to
reduce the likelihood that processes hit the min watermark and
stall.
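
For reference, a rough sketch of the kswapd_try_to_sleep() behaviour
being described (simplified from the 2.6.38-era code; freezer and
kthread_should_stop handling omitted): a short HZ/10 nap first, then a
re-check before committing to a long sleep.

static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
{
	long remaining = 0;
	DEFINE_WAIT(wait);

	prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);

	/* Short nap first: if allocations push the zone back under the
	 * high watermark right away, kswapd wakes up again quickly. */
	if (!sleeping_prematurely(pgdat, order, remaining, classzone_idx))
		remaining = schedule_timeout(HZ/10);

	/* Only sleep for real if the node still looks balanced. */
	if (!sleeping_prematurely(pgdat, order, remaining, classzone_idx))
		schedule();

	finish_wait(&pgdat->kswapd_wait, &wait);
}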

--
Mel Gorman
SUSE Labs

2011-05-24 01:20:05

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <[email protected]> wrote:
> Could you test the patch below, based on vanilla 2.6.38.6?
> The expected result is that the system hang should never happen.
> I hope this is the last test for the hang.
>
> Thanks.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 292582c..1663d24 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>       if (scanned == 0)
>               scanned = SWAP_CLUSTER_MAX;
>
> -       if (!down_read_trylock(&shrinker_rwsem))
> -               return 1;       /* Assume we'll be able to shrink next time */
> +       if (!down_read_trylock(&shrinker_rwsem)) {
> +               /* Assume we'll be able to shrink next time */
> +               ret = 1;
> +               goto out;
> +       }
>
>       list_for_each_entry(shrinker, &shrinker_list, list) {
>               unsigned long long delta;
> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>               shrinker->nr += total_scan;
>       }
>       up_read(&shrinker_rwsem);
> +out:
> +       cond_resched();
>       return ret;
>  }
>
> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>        * must be balanced
>        */
>       if (order)
> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>       else
>               return !all_zones_ok;
>  }

So far with this patch I can't reproduce the hang or the bogus OOM.

To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
running 2.6.38.6, and I have exactly two patches applied. One is the
attached patch and the other is the fpu.ko/aesni_intel.ko merger
which I need to get dracut to boot my box.

For fun, I also upgraded to 8GB of RAM and it still works.

--Andy

>
> --
> Kind regards,
> Minchan Kim
>


Attachments:
minchan-patch-v3.patch (1.19 kB)

2011-05-24 01:34:26

by Minchan Kim

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <[email protected]> wrote:
> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <[email protected]> wrote:
>> Could you test the patch below, based on vanilla 2.6.38.6?
>> The expected result is that the system hang should never happen.
>> I hope this is the last test for the hang.
>>
>> Thanks.
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 292582c..1663d24 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>       if (scanned == 0)
>>               scanned = SWAP_CLUSTER_MAX;
>>
>> -       if (!down_read_trylock(&shrinker_rwsem))
>> -               return 1;       /* Assume we'll be able to shrink next time */
>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>> +               /* Assume we'll be able to shrink next time */
>> +               ret = 1;
>> +               goto out;
>> +       }
>>
>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>               unsigned long long delta;
>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>               shrinker->nr += total_scan;
>>       }
>>       up_read(&shrinker_rwsem);
>> +out:
>> +       cond_resched();
>>       return ret;
>>  }
>>
>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
>> *pgdat, int order, long remaining,
>>        * must be balanced
>>        */
>>       if (order)
>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>       else
>>               return !all_zones_ok;
>>  }
>
> So far with this patch I can't reproduce the hang or the bogus OOM.
>
> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
> running 2.6.38.6, and I have exactly two patches applied.  One is the
> attached patch and the other is the fpu.ko/aesni_intel.ko merger
> which I need to get dracut to boot my box.
>
> For fun, I also upgraded to 8GB of RAM and it still works.
>

Hmm. Could you test it with THP enabled and 2G of RAM?
Isn't that the original test environment?
Please don't change the test environment. :)

Thanks for your effort, Andrew.

--
Kind regards,
Minchan Kim

2011-05-24 11:24:36

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Mon, May 23, 2011 at 9:34 PM, Minchan Kim <[email protected]> wrote:
> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <[email protected]> wrote:
>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <[email protected]> wrote:
>>> Could you test the patch below, based on vanilla 2.6.38.6?
>>> The expected result is that the system hang should never happen.
>>> I hope this is the last test for the hang.
>>>
>>> Thanks.
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 292582c..1663d24 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>       if (scanned == 0)
>>>               scanned = SWAP_CLUSTER_MAX;
>>>
>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>> -               return 1;       /* Assume we'll be able to shrink next time */
>>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>>> +               /* Assume we'll be able to shrink next time */
>>> +               ret = 1;
>>> +               goto out;
>>> +       }
>>>
>>>       list_for_each_entry(shrinker, &shrinker_list, list) {
>>>               unsigned long long delta;
>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>               shrinker->nr += total_scan;
>>>       }
>>>       up_read(&shrinker_rwsem);
>>> +out:
>>> +       cond_resched();
>>>       return ret;
>>>  }
>>>
>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>        * must be balanced
>>>        */
>>>       if (order)
>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>       else
>>>               return !all_zones_ok;
>>>  }
>>
>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>
>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>> running 2.6.38.6, and I have exactly two patches applied.  One is the
>> attached patch and the other is the fpu.ko/aesni_intel.ko merger
>> which I need to get dracut to boot my box.
>>
>> For fun, I also upgraded to 8GB of RAM and it still works.
>>
>
> Hmm. Could you test it with THP enabled and 2G of RAM?
> Isn't that the original test environment?
> Please don't change the test environment. :)

The test that passed last night was an environment (hardware and
config) that I had confirmed earlier as failing without the patch.

I just re-tested my original config (from a backup -- migration,
compaction, and thp "always" are enabled). I get bogus OOMs but not a
hang. (I'm running with mem=2G right now -- I'll swap the DIMMs back
out later on if you want.)

I attached the bogus OOM (actually several that happened in sequence).
They look readahead-related. There was plenty of free swap space.

--Andy

2011-05-24 11:55:44

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Tue, May 24, 2011 at 7:24 AM, Andrew Lutomirski <[email protected]> wrote:
> On Mon, May 23, 2011 at 9:34 PM, Minchan Kim <[email protected]> wrote:
>> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <[email protected]> wrote:
>>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <[email protected]> wrote:
>>>> Could you test the patch below, based on vanilla 2.6.38.6?
>>>> The expected result is that the system hang should never happen.
>>>> I hope this is the last test for the hang.
>>>>
>>>> Thanks.
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 292582c..1663d24 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>        if (scanned == 0)
>>>>                scanned = SWAP_CLUSTER_MAX;
>>>>
>>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>>> -               return 1;       /* Assume we'll be able to shrink next time */
>>>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>>>> +               /* Assume we'll be able to shrink next time */
>>>> +               ret = 1;
>>>> +               goto out;
>>>> +       }
>>>>
>>>>        list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>                unsigned long long delta;
>>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>                shrinker->nr += total_scan;
>>>>        }
>>>>        up_read(&shrinker_rwsem);
>>>> +out:
>>>> +       cond_resched();
>>>>        return ret;
>>>>  }
>>>>
>>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>>         * must be balanced
>>>>         */
>>>>        if (order)
>>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>        else
>>>>                return !all_zones_ok;
>>>>  }
>>>
>>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>>
>>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>>> running 2.6.38.6, and I have exactly two patches applied. One is the
>>> attached patch and the other is the fpu.ko/aesni_intel.ko merger
>>> which I need to get dracut to boot my box.
>>>
>>> For fun, I also upgraded to 8GB of RAM and it still works.
>>>
>>
>> Hmm. Could you test it with THP enabled and 2G RAM?
>> Isn't that the original test environment?
>> Please don't change the test environment. :)
>
> The test that passed last night was an environment (hardware and
> config) that I had confirmed earlier as failing without the patch.
>
> I just re-tested my original config (from a backup -- migration,
> compaction, and thp "always" are enabled). I get bogus OOMs but not a
> hang. (I'm running with mem=2G right now -- I'll swap the DIMMs back
> out later on if you want.)
>
> I attached the bogus OOM (actually several that happened in sequence).
> They look readahead-related. There was plenty of free swap space.

Now with log actually attached.

>
> --Andy
>


Attachments:
bogus_oom.txt.xz (20.56 kB)

2011-05-25 00:44:06

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

(2011/05/24 20:55), Andrew Lutomirski wrote:
> On Tue, May 24, 2011 at 7:24 AM, Andrew Lutomirski <[email protected]> wrote:
>> On Mon, May 23, 2011 at 9:34 PM, Minchan Kim <[email protected]> wrote:
>>> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <[email protected]> wrote:
>>>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <[email protected]> wrote:
>>>>> Could you test the patch below, based on vanilla 2.6.38.6?
>>>>> The expected result is that the system hang should never happen.
>>>>> I hope this is the last test for the hang.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 292582c..1663d24 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>>        if (scanned == 0)
>>>>>                scanned = SWAP_CLUSTER_MAX;
>>>>>
>>>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>>>> -               return 1;       /* Assume we'll be able to shrink next time */
>>>>> +       if (!down_read_trylock(&shrinker_rwsem)) {
>>>>> +               /* Assume we'll be able to shrink next time */
>>>>> +               ret = 1;
>>>>> +               goto out;
>>>>> +       }
>>>>>
>>>>>        list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>>                unsigned long long delta;
>>>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>>                shrinker->nr += total_scan;
>>>>>        }
>>>>>        up_read(&shrinker_rwsem);
>>>>> +out:
>>>>> +       cond_resched();
>>>>>        return ret;
>>>>>  }
>>>>>
>>>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>>>         * must be balanced
>>>>>         */
>>>>>        if (order)
>>>>> -               return pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>> +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>>        else
>>>>>                return !all_zones_ok;
>>>>>  }
>>>>
>>>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>>>
>>>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>>>> running 2.6.38.6, and I have exactly two patches applied. One is the
>>>> attached patch and the other is the fpu.ko/aesni_intel.ko merger
>>>> which I need to get dracut to boot my box.
>>>>
>>>> For fun, I also upgraded to 8GB of RAM and it still works.
>>>>
>>>
>>> Hmm. Could you test it with THP enabled and 2G RAM?
>>> Isn't that the original test environment?
>>> Please don't change the test environment. :)
>>
>> The test that passed last night was an environment (hardware and
>> config) that I had confirmed earlier as failing without the patch.
>>
>> I just re-tested my original config (from a backup -- migration,
>> compaction, and thp "always" are enabled). I get bogus OOMs but not a
>> hang. (I'm running with mem=2G right now -- I'll swap the DIMMs back
>> out later on if you want.)
>>
>> I attached the bogus OOM (actually several that happened in sequence).
>> They look readahead-related. There was plenty of free swap space.
>
> Now with log actually attached.

Unfortunately, this log doesn't tell us why DM doesn't issue any swap I/O. ;-)
I suspect it's a DM issue. Can you please try putting swap outside of DM?
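
In case it helps with that experiment, a trivial memory dirtier along the
lines of the one below is usually enough to force steady swap-out once swap
sits on a raw partition. This is only a minimal sketch (the 4096-byte page
stride and the default size are assumptions, and it is not one of the tools
already posted in this thread); run it with a size comfortably above RAM and
interrupt it with Ctrl-C.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	/* Size in megabytes, ideally well above physical RAM. */
	size_t mb = (argc > 1) ? (size_t)strtoul(argv[1], NULL, 10) : 4096;
	size_t len = mb << 20;
	size_t off;
	char *buf = malloc(len);

	if (!buf) {
		perror("malloc");
		return 1;
	}

	/* Keep every page dirty so reclaim has to write it to swap instead
	 * of simply dropping it, producing continuous swap I/O. */
	for (;;)
		for (off = 0; off < len; off += 4096)
			buf[off] = (char)(off >> 12);
}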