We have a database server which we've had some problems with, we've
had some serious crashes (particularly during postgres vacuuming)
which couldn't be fixed without hardware reboot. We assumed that the
problem would go away by itself after upgrading the server to a new
box with 32G of RAM ... but, alas, one of the first days in operation
it first killed lots of postgres children during the evening, and
finally it crashed and required hardware reboot. The box is certainly
not out of memory, but after reading some posts here, I've understood
that memory management is a complex issue.
We're running 32 bits linux, maybe it would help to upgrade to 64 bits?
I'm pretty sure that everything we need to know is located in
/var/log/kernel.log - if we could only interpret it, that is. I'm
particularly curious about those lines:
Nov 8 18:15:17 localhost kernel: DMA free:3556kB min:68kB low:84kB
high:100kB active:52kB inactive:0kB present:16256kB pages_scanned:208
all_unreclaimable? yes
Nov 8 18:15:17 localhost kernel: lowmem_reserve[]: 0 873 33512 33512
Nov 8 18:15:17 localhost kernel: Normal free:3676kB min:3744kB
low:4680kB high:5616kB active:892kB inactive:1792kB present:894080kB
pages_scanned:3468 all_unreclaimable? yes
Nov 8 18:15:17 localhost kernel: lowmem_reserve[]: 0 0 261112 261112
Nov 8 18:15:17 localhost kernel: HighMem free:7347404kB min:512kB
low:35536kB high:70560kB active:15982108kB inactive:9299956kB
present:33422336kB pages_scanned:0 all_unreclaimable? no
Nov 8 18:15:17 localhost kernel: lowmem_reserve[]: 0 0 0 0
Nov 8 18:15:17 localhost kernel: DMA: 7*4kB 15*8kB 5*16kB 0*32kB
0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3556kB
Nov 8 18:15:17 localhost kernel: Normal: 1*4kB 0*8kB 24*16kB 2*32kB
1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Nov 8 18:15:17 localhost kernel: HighMem: 2054*4kB 60332*8kB
54264*16kB 14863*32kB 4487*64kB 3716*128kB 1243*256kB 3228*512kB
80*1024kB 979*2048kB 169*4096kB = 7347608kB
I've copied out relevant files on
http://oppetid.no/~tobixen/oom_problem/ - it's the complete kernel
log, /proc/config.gz, /proc/slabinfo, /proc/meminfo and dmesg. Those
are of course fetched after the reboot.
atop shows a relatively stable situation with plenty of free RAM and
caches all the evening.
(CCs preferred)
"Tobias Brox" <[email protected]> writes:
> We have a database server which we've had some problems with, we've
> had some serious crashes (particularly during postgres vacuuming)
> which couldn't be fixed without hardware reboot. We assumed that the
> problem would go away by itself after upgrading the server to a new
> box with 32G of RAM ... but, alas, one of the first days in operation
> it first killed lots of postgres children during the evening, and
> finally it crashed and required hardware reboot. The box is certainly
> not out of memory, but after reading some posts here, I've understood
> that memory management is a complex issue.
>
> We're running 32 bits linux, maybe it would help to upgrade to 64 bits?
Big-time. There are a lot of "artificial" internal kernel memory
limits in the 32-bit architecture, so you can run up against those
(triggering the OOM killer) even if you have plenty of application
memory.
32GB in the box actually might be counter-productive unless you're
running 64-bit, because just tracking that much memory will consume
precious low kernel RAM.
-Doug
Hi,
On Fri, Nov 09, 2007 at 10:35:14PM +0100, Tobias Brox wrote:
> We're running 32 bits linux, maybe it would help to upgrade to 64 bits?
Although you can address up to 64GB of RAM with PAE, the address space
of directly accessible memory is still at 32bit (4GB) and access to
higher memory costs low memory and processing overhead for the mapping
into the 32bit address space.
> I've copied out relevant files on
> http://oppetid.no/~tobixen/oom_problem/ - it's the complete kernel
> log, /proc/config.gz, /proc/slabinfo, /proc/meminfo and dmesg. Those
> are of course fetched after the reboot.
kern.log has wrong permissions, slabinfo and meminfo are empty.
Hannes
[Tobias Brox]
> > We're running 32 bits linux, maybe it would help to upgrade to 64 bits?
[Johannes Weiner]
> Although you can address up to 64GB of RAM with PAE, the address space
> of directly accessible memory is still at 32bit (4GB) and access to
> higher memory costs low memory and processing overhead for the mapping
> into the 32bit address space.
Point taken. I've gotten quite some replies on this, also off the
list. We will upgrade to 64 bits, but I don't know when we will get
that done, so now we're wondering what we can do for short-term
solutions.
[Tobias Brox]
> > I've copied out relevant files on
> > http://oppetid.no/~tobixen/oom_problem/ - it's the complete kernel
> > log, /proc/config.gz, /proc/slabinfo, /proc/meminfo and dmesg. Those
> > are of course fetched after the reboot.
[Hannes]
> kern.log has wrong permissions, slabinfo and meminfo are empty.
My oh my. I'm really awfully bad at testing things. Sorry for that.
Links should work now. Ok, now it's even tested ;-)