2008-01-21 05:33:06

by Matthias Wolle

Subject: LowFree/LowMem problem

Hi,

my company is running several servers with kernel 2.6.23.12. These are dual
quad-core servers (Intel CPUs) with 16GB RAM using a 32-bit kernel.
After running fine for some days, the OOM killer killed our processes.
Our investigation showed that free low memory had dropped to about 11MB. We
found that the decrease in low memory is related to the number of
programs executed and ended.

I could reproduce the same bug on a machine with 2GB RAM without PAE (option
4GB HIGHMEM). The following script reproduces the bug.

#!/bin/sh

# Each iteration starts two short-lived processes (cat and grep).
while true; do
    cat /proc/meminfo | grep LowFree
done

Please run this script only on a test system. It reduces the low memory quite
fast (within minutes).
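
For anyone who wants a number rather than a scrolling display, here is a minimal sketch that records LowFree before and after a fixed number of process launches. The function names and the `count` parameter are illustrative assumptions and not part of the original report; on a kernel without highmem support the LowFree field is absent and the sketch simply says so.

```shell
#!/bin/sh
# Print the LowFree value (in kB) from a meminfo-style file.
# Prints nothing if the field is absent (e.g. on a 64-bit kernel).
lowfree_kb() {
    awk '$1 == "LowFree:" { print $2 }' "${1:-/proc/meminfo}"
}

# Start `count` short-lived processes, then report how far LowFree moved.
# `count` is a hypothetical parameter chosen for illustration.
measure_drop() {
    count=$1
    before=$(lowfree_kb)
    if [ -z "$before" ]; then
        echo "no LowFree field (not a highmem kernel?)"
        return 0
    fi
    i=0
    while [ "$i" -lt "$count" ]; do
        cat /proc/meminfo > /dev/null   # one external process per iteration
        i=$((i + 1))
    done
    after=$(lowfree_kb)
    echo "LowFree: ${before} kB -> ${after} kB after ${count} execs"
}
```

Calling `measure_drop 10000` on a test system would then show the change for a known number of launches, which makes runs on different kernels easier to compare.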

The latest available kernel 2.6.24-rc8 is also affected.

Please feel free to request more information from me.
Please always CC to me, I'm not subscribed.

Thanks for reading

Matthias


2008-01-21 16:45:22

by Parag Warudkar

Subject: Re: LowFree/LowMem problem

Matthias Wolle <Matthias.Wolle <at> gmx.de> writes:

>
> Hi,
>
> my company is running several servers with kernel 2.6.23.12. These are dual
> quad-core servers (Intel CPUs) with 16GB RAM using a 32-bit kernel.

This is a common problem when running 32-bit kernels with more than 8GB of RAM.
(Search the archives and you will find similar problem reports.)

Anyone in your situation has two options. The preferred one is to switch
to a 64-bit kernel. In most cases you can just upgrade the kernel to
64-bit and keep the same 32-bit userspace. It works fine.

If for some reason you cannot upgrade to a 64-bit kernel, the second
option is to try RHEL/CentOS kernels, which according to Alan
Cox are better tuned for this kind of setup, so you have a better
chance that they will work.

HTH

Parag

2008-01-23 22:06:59

by Andrew Morton

Subject: Re: LowFree/LowMem problem

> On Mon, 21 Jan 2008 06:32:41 +0100 Matthias Wolle wrote:
> Hi,
>
> my company is running several servers with kernel 2.6.23.12. These are dual
> quad-core servers (Intel CPUs) with 16GB RAM using a 32-bit kernel.
> After running fine for some days, the OOM killer killed our processes.
> Our investigation showed that free low memory had dropped to about 11MB. We
> found that the decrease in low memory is related to the number of
> programs executed and ended.
>
> I could reproduce the same bug on a machine with 2GB RAM without PAE (option
> 4GB HIGHMEM). The following code reproduces the bug.
>
> #!/bin/sh
>
> while [ true ]; do
> cat /proc/meminfo| grep LowFree
> done
>
> Please run this script only on a test system. It reduces the low memory quite
> fast (within minutes).

That would be very strange.

> The latest available kernel 2.6.24-rc8 is also affected.
>

Can you please send the full dmesg output from one such oom-killing event?

2008-01-23 22:41:27

by Matthias Wolle

Subject: Re: LowFree/LowMem problem

On Wednesday 23 January 2008 23:06 Andrew Morton wrote:
> Can you please send the full dmesg output from one such oom-killing event?

Jan 17 23:31:58 franklin72 kernel: sshd invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Jan 17 23:31:58 franklin72 kernel: cat invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
Jan 17 23:31:58 franklin72 kernel: [<c014af39>] out_of_memory+0x69/0x1a4
Jan 17 23:31:58 franklin72 kernel: [<c014c588>] __alloc_pages+0x20a/0x291
Jan 17 23:31:58 franklin72 kernel: [<c0153618>] __pte_alloc+0x11/0x94
Jan 17 23:31:58 franklin72 kernel: [<c015439e>] handle_mm_fault+0xa7/0x7bd
Jan 17 23:31:58 franklin72 kernel: [<c0119aa2>] do_page_fault+0x0/0x751
Jan 17 23:31:58 franklin72 kernel: [<c0119d56>] do_page_fault+0x2b4/0x751
Jan 17 23:31:58 franklin72 kernel: [<c0157442>] mmap_region+0x32f/0x3eb
Jan 17 23:31:58 franklin72 kernel: [<c0119aa2>] do_page_fault+0x0/0x751
Jan 17 23:31:58 franklin72 kernel: [<c02e03d2>] error_code+0x72/0x78
Jan 17 23:31:58 franklin72 kernel: [<c01e3091>] clear_user+0x27/0x32
Jan 17 23:31:58 franklin72 kernel: [<c0188691>] padzero+0x16/0x24
Jan 17 23:31:58 franklin72 kernel: [<c0189082>] load_elf_binary+0x7d4/0x142b
Jan 17 23:31:58 franklin72 kernel: [<c0154d07>] get_user_pages+0x253/0x2be
Jan 17 23:31:58 franklin72 kernel: [<c015190e>] page_address+0x78/0x98
Jan 17 23:31:58 franklin72 kernel: [<c0151a66>] kmap_high+0x1a/0x171
Jan 17 23:31:58 franklin72 kernel: [<c015190e>] page_address+0x78/0x98
Jan 17 23:31:58 franklin72 kernel: [<c016656e>] copy_strings+0x169/0x173
Jan 17 23:31:58 franklin72 kernel: [<c0166623>] search_binary_handler+0x84/0x19c
Jan 17 23:31:58 franklin72 kernel: [<c0167905>] do_execve+0x13b/0x1a4
Jan 17 23:31:58 franklin72 kernel: [<c01026b8>] sys_execve+0x2f/0x7b
Jan 17 23:31:58 franklin72 kernel: [<c0103d0e>] sysenter_past_esp+0x5f/0x85
Jan 17 23:31:58 franklin72 kernel: [<c02e0000>] __down_interruptible+0xba/0x10c
Jan 17 23:31:58 franklin72 kernel: =======================
Jan 17 23:31:58 franklin72 kernel: Mem-info:
Jan 17 23:31:58 franklin72 kernel: DMA per-cpu:
Jan 17 23:31:58 franklin72 kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 4: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 5: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 6: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 7: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 17 23:31:58 franklin72 kernel: Normal per-cpu:
Jan 17 23:31:58 franklin72 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 26 Cold: hi: 62, btch: 15 usd: 49
Jan 17 23:31:58 franklin72 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 165 Cold: hi: 62, btch: 15 usd: 53
Jan 17 23:31:58 franklin72 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 28 Cold: hi: 62, btch: 15 usd: 4
Jan 17 23:31:58 franklin72 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 174 Cold: hi: 62, btch: 15 usd: 10
Jan 17 23:31:58 franklin72 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 9
Jan 17 23:31:58 franklin72 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 161 Cold: hi: 62, btch: 15 usd: 7
Jan 17 23:31:58 franklin72 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 12 Cold: hi: 62, btch: 15 usd: 12
Jan 17 23:31:58 franklin72 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 178 Cold: hi: 62, btch: 15 usd: 3
Jan 17 23:31:58 franklin72 kernel: HighMem per-cpu:
Jan 17 23:31:58 franklin72 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 41 Cold: hi: 62, btch: 15 usd: 8
Jan 17 23:31:58 franklin72 kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 182 Cold: hi: 62, btch: 15 usd: 10
Jan 17 23:31:58 franklin72 kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 17 Cold: hi: 62, btch: 15 usd: 7
Jan 17 23:31:58 franklin72 kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 156 Cold: hi: 62, btch: 15 usd: 4
Jan 17 23:31:58 franklin72 kernel: CPU 4: Hot: hi: 186, btch: 31 usd: 19 Cold: hi: 62, btch: 15 usd: 1
Jan 17 23:31:58 franklin72 kernel: CPU 5: Hot: hi: 186, btch: 31 usd: 165 Cold: hi: 62, btch: 15 usd: 13
Jan 17 23:31:58 franklin72 kernel: CPU 6: Hot: hi: 186, btch: 31 usd: 2 Cold: hi: 62, btch: 15 usd: 0
Jan 17 23:31:58 franklin72 kernel: CPU 7: Hot: hi: 186, btch: 31 usd: 182 Cold: hi: 62, btch: 15 usd: 3
Jan 17 23:31:58 franklin72 kernel: Active:490332 inactive:764 dirty:0 writeback:0 unstable:0
Jan 17 23:31:58 franklin72 kernel: free:3472003 slab:2279 mapped:858 pagetables:1152 bounce:0
Jan 17 23:31:58 franklin72 kernel: DMA free:3560kB min:68kB low:84kB high:100kB active:4kB inactive:0kB present:16256kB pages_scanned:21 all_unreclaimable? yes
Jan 17 23:31:58 franklin72 kernel: lowmem_reserve[]: 0 873 17002 17002
Jan 17 23:31:58 franklin72 kernel: Normal free:3688kB min:3744kB low:4680kB high:5616kB active:92kB inactive:0kB present:894080kB pages_scanned:224 all_unreclaimable? yes
Jan 17 23:31:58 franklin72 kernel: lowmem_reserve[]: 0 0 129032 129032
Jan 17 23:31:58 franklin72 kernel: HighMem free:13880764kB min:512kB low:17820kB high:35128kB active:1961232kB inactive:3064kB present:16516096kB pages_scanned:0 all_unreclaimable? no
Jan 17 23:31:58 franklin72 kernel: lowmem_reserve[]: 0 0 0 0
Jan 17 23:31:58 franklin72 kernel: DMA: 2*4kB 2*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3560kB
Jan 17 23:31:58 franklin72 kernel: Normal: 1*4kB 6*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4100kB
Jan 17 23:31:58 franklin72 kernel: HighMem: 641*4kB 375*8kB 197*16kB 77*32kB 68*64kB 55*128kB 36*256kB 23*512kB 23*1024kB 9*2048kB 3368*4096kB = 13880876kB
Jan 17 23:31:58 franklin72 kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Jan 17 23:31:58 franklin72 kernel: Free swap = 15623204kB
Jan 17 23:31:58 franklin72 kernel: Total swap = 15623204kB
Jan 17 23:31:58 franklin72 kernel: Free swap: 15623204kB
Jan 17 23:31:58 franklin72 kernel: 4390911 pages of RAM
Jan 17 23:31:58 franklin72 kernel: 4161535 pages of HIGHMEM
Jan 17 23:31:58 franklin72 kernel: 232469 reserved pages
Jan 17 23:31:58 franklin72 kernel: 4302 pages shared
Jan 17 23:31:58 franklin72 kernel: 0 pages swap cached
Jan 17 23:31:58 franklin72 kernel: 0 pages dirty
Jan 17 23:31:58 franklin72 kernel: 0 pages writeback
Jan 17 23:31:58 franklin72 kernel: 858 pages mapped
Jan 17 23:31:58 franklin72 kernel: 2281 pages slab
Jan 17 23:31:58 franklin72 kernel: 1138 pages pagetables
Jan 17 23:31:58 franklin72 kernel: Out of memory: kill process 3177 (bash) score 15352 or a child
Jan 17 23:31:58 franklin72 kernel: Killed process 3286 (memtest)

At that time no programs were running except some standard services (sshd, ntpd, syslogd ...) and the test script.
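
The zone lines in the report above can be read mechanically: the Normal zone has 3688kB free against a min watermark of 3744kB, while HighMem still has more than 13GB free. As a rough sketch (the function name `zones_below_min` is made up for illustration, and the parsing assumes the `free:NkB min:NkB` layout seen in this log), the following lists every zone whose free memory is below its min watermark:

```shell
#!/bin/sh
# Read OOM-report lines on stdin and print "zone free_kB min_kB" for each
# zone whose free memory is below its min watermark.
# Assumes the "<zone> free:<n>kB min:<n>kB ..." layout shown in this thread.
zones_below_min() {
    awk '{
        for (i = 2; i < NF; i++) {
            if ($i ~ /^free:[0-9]+kB$/ && $(i + 1) ~ /^min:[0-9]+kB$/) {
                f = $i; m = $(i + 1)
                gsub(/[^0-9]/, "", f)   # keep only the digits
                gsub(/[^0-9]/, "", m)
                if (f + 0 < m + 0) print $(i - 1), f, m
            }
        }
    }'
}

# Two zone lines taken from the report above, shortened to the fields used.
zones_below_min <<'EOF'
kernel: DMA free:3560kB min:68kB low:84kB high:100kB
kernel: Normal free:3688kB min:3744kB low:4680kB high:5616kB
EOF
```

On a live system the same function could be fed from `dmesg`, for example `dmesg | zones_below_min`.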

Kind regards
Matthias

2008-01-24 07:16:54

by Andi Kleen

Subject: Re: LowFree/LowMem problem

Matthias Wolle writes:

> Jan 17 23:31:58 franklin72 kernel: sshd invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
> Jan 17 23:31:58 franklin72 kernel: cat invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> Jan 17 23:31:58 franklin72 kernel: [<c014af39>] out_of_memory+0x69/0x1a4
> Jan 17 23:31:58 franklin72 kernel: [<c014c588>] __alloc_pages+0x20a/0x291
> Jan 17 23:31:58 franklin72 kernel: [<c0153618>] __pte_alloc+0x11/0x94

Do you perhaps have a kernel compiled without CONFIG_HIGHPTE? Normally
__pte_alloc should be able to allocate highmem unless that option is
not set. Before HIGHPTE was implemented running out of low memory
due to page tables was pretty common.
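
A quick way to answer this is to look at the config the running kernel was built with. The sketch below is only an illustration: the helper name `highpte_status` is invented here, and the path `/boot/config-$(uname -r)` is an assumption that holds on many distributions but not all (kernels built with CONFIG_IKCONFIG_PROC expose the config as `/proc/config.gz` instead).

```shell
#!/bin/sh
# Report whether CONFIG_HIGHPTE is set in a given kernel config file.
# Prints "enabled", "disabled", or "unknown" (file missing or unreadable).
highpte_status() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo "unknown"
    elif grep -q '^CONFIG_HIGHPTE=y' "$cfg"; then
        echo "enabled"
    else
        # Covers both "# CONFIG_HIGHPTE is not set" and a missing option.
        echo "disabled"
    fi
}

# Assumed location of the running kernel's config; adjust as needed.
highpte_status "/boot/config-$(uname -r)"
```

Note that "disabled" is also what a 64-bit config reports, since the option only exists for 32-bit highmem builds.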

BTW the ultimate fix for most lowmem problems is to go 64bit

-Andi

2008-01-25 01:45:48

by Matthias Wolle

Subject: Re: LowFree/LowMem problem

On Thursday 24 January 2008 08:16 Andi Kleen wrote:
> Do you perhaps have a kernel compiled without CONFIG_HIGHPTE? Normally
> __pte_alloc should be able to allocate highmem unless that option is
> not set. Before HIGHPTE was implemented running out of low memory
> due to page tables was pretty common.

The kernel from 17 January 2008 didn't have CONFIG_HIGHPTE enabled. Yesterday we
tested a 2.6.23.14 kernel with CONFIG_HIGHPTE enabled. The 4GB test machine
showed the same rapid drop in low memory. LowFree stopped falling at 400MB,
just as it did without CONFIG_HIGHPTE. On a 2GB machine LowFree stopped at 690MB. The
difference between LowTotal and LowFree is shown as used memory. In the case of the
4GB machine, this means I have 400MB of used memory which is not related to any
process.
I hope this helps. I can't test on the 16GB machines anymore, because they
are in production use now.
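
The used low memory mentioned above is just LowTotal minus LowFree. A small sketch of that subtraction, with a hypothetical helper name and an optional file argument so it can be run against a saved copy of /proc/meminfo:

```shell
#!/bin/sh
# Print used low memory in kB (LowTotal - LowFree) from a meminfo-style file.
# Prints nothing if the kernel does not report LowTotal (no highmem support).
lowmem_used_kb() {
    awk '$1 == "LowTotal:" { total = $2 }
         $1 == "LowFree:"  { free = $2 }
         END { if (total != "") print total - free }' "${1:-/proc/meminfo}"
}

lowmem_used_kb
```

Logging this value before and after a test run gives a direct figure for how much low memory stays in use once the spawned processes have exited.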

> BTW the ultimate fix for most lowmem problems is to go 64bit

We switched back to Debian's default 2.6.18-bigmem kernel, which doesn't
have this problem. On this kernel the test script has no effect
on low memory.

Regards
Matthias
