I've been receiving the error messages below when the machine is close
to running out of memory. The machine runs a bare minimum of processes
plus our own application, a threaded, long-running process (1 day, 5:39
so far) that consumes most of the machine's resources. Oddly enough, the
mallinfo() figures for this process differ from what ps and top report
by about 650 MB.
This process handles a large number of TCP connections and does a lot of
dynamic memory allocation, so I assumed the difference was due to memory
fragmentation on our part. However, I expected kswapd to reclaim that
memory once it started swapping it out.
Another oddity is that the bogomips values reported for the two CPUs
differ slightly from each other.
The message being received is:
Nov 3 10:04:30 n175 kernel: __alloc_pages: 0-order allocation failed.
Nov 3 10:04:30 n175 last message repeated 363 times
The kernel version:
Linux version 2.4.0-test10 (root@stp) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 SMP Tue Oct 31 13:13:05 PST 2000
What free reports:
             total       used       free     shared    buffers     cached
Mem:       1028256    1024172       4084          0        148      59296
-/+ buffers/cache:     964728      63528
Swap:       136512      75588      60924
ps and top report that the process taking up all this memory has an RSS
of 967656 KB and a VSIZE of 1005892 KB, but mallinfo() on the process
shows a completely different number:
Memory statistics from mallinfo:
Total space allocated from system: 361406208
Number of non-inuse chunks: 1079273
Number of mmapped regions: 0
Total space in mmapped regions: 0
Total allocated space: 235536032
Total non-inuse space: 125870176
Top-most, releasable (via malloc_trim) space: 68776
All memory in this process is allocated via new or malloc (and new calls
malloc), and the numbers that mallinfo() and ps report agree to within
about 0.5% up until a sort of "slide" period during which they diverge
fairly quickly.
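
As a rough sanity check on the figures above, the following Python sketch
redoes the arithmetic: it confirms that the mallinfo() in-use and
non-inuse totals add up to the space obtained from the system, and
computes how far that total is from the RSS and VSIZE that ps and top
show. All numbers are copied from this report; the variable names are
made up for illustration, and 1 MB is taken as 1024 KB.

```python
# Figures copied from the report (mallinfo values in bytes, ps/top in KB).
ARENA_BYTES = 361406208    # "Total space allocated from system"
IN_USE_BYTES = 235536032   # "Total allocated space"
FREE_BYTES = 125870176     # "Total non-inuse space"
RSS_KB = 967656            # RSS according to ps/top
VSIZE_KB = 1005892         # VSIZE according to ps/top


def kb_to_mb(kb):
    """Convert KB to MB, assuming 1 MB = 1024 KB."""
    return kb / 1024.0


arena_kb = ARENA_BYTES / 1024.0

# The heap's own accounting should be self-consistent.
heap_consistent = (IN_USE_BYTES + FREE_BYTES == ARENA_BYTES)

# Memory visible to ps/top that the malloc arena does not account for.
rss_gap_mb = kb_to_mb(RSS_KB - arena_kb)
vsize_gap_mb = kb_to_mb(VSIZE_KB - arena_kb)

# Share of the arena that is free but still held by the allocator.
free_fraction = FREE_BYTES / float(ARENA_BYTES)

print("heap accounting consistent:", heap_consistent)
print("arena size:        %.1f MB" % kb_to_mb(arena_kb))
print("RSS   minus arena: %.1f MB" % rss_gap_mb)
print("VSIZE minus arena: %.1f MB" % vsize_gap_mb)
print("free share of arena: %.1f%%" % (100 * free_fraction))
```

With these inputs the gap comes out at roughly 600 MB against RSS and
roughly 638 MB against VSIZE, which is in the same range as the 650 MB
mentioned at the top. About a third of the arena is non-inuse space,
spread over more than a million chunks, while only 68776 bytes are
releasable via malloc_trim.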
$ cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache 68 68 232 4 4 1 : 252 126
nfs_read_data 0 0 352 0 0 1 : 124 62
nfs_write_data 0 0 384 0 0 1 : 124 62
nfs_page 0 0 96 0 0 1 : 252 126
nfs_fh 80 80 96 2 2 1 : 252 126
tcp_tw_bucket 39 80 96 2 2 1 : 252 126
tcp_bind_bucket 33 452 32 4 4 1 : 252 126
tcp_open_request 403 413 64 7 7 1 : 252 126
inet_peer_cache 1 59 64 1 1 1 : 252 126
ip_fib_hash 11 113 32 1 1 1 : 252 126
ip_dst_cache 15813 23856 160 994 994 1 : 252 126
arp_cache 46 90 128 3 3 1 : 252 126
blkdev_requests 768 800 96 20 20 1 : 252 126
dnotify cache 0 0 20 0 0 1 : 252 126
file lock cache 0 0 92 0 0 1 : 252 126
fasync cache 0 0 16 0 0 1 : 252 126
uid_cache 3 226 32 2 2 1 : 252 126
skbuff_head_cache 2840 12984 160 541 541 1 : 252 126
sock 7757 10310 800 2062 2062 1 : 124 62
inode_cache 7616 11000 384 1100 1100 1 : 124 62
bdev_cache 7 118 64 2 2 1 : 252 126
sigqueue 58 58 132 2 2 1 : 252 126
kiobuf 0 0 128 0 0 1 : 252 126
dentry_cache 7730 12420 128 414 414 1 : 252 126
filp 11758 11800 96 295 295 1 : 252 126
names_cache 2 2 4096 2 2 1 : 60 30
buffer_head 15200 25520 96 637 638 1 : 252 126
mm_struct 72 72 160 3 3 1 : 252 126
vm_area_struct 1048 1062 64 18 18 1 : 252 126
fs_cache 118 118 64 2 2 1 : 252 126
files_cache 27 27 416 3 3 1 : 124 62
signal_act 24 24 1312 8 8 1 : 60 30
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 0 0 32768 0 0 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 0 0 16384 0 0 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 3 3 8192 3 3 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 12 12 4096 12 12 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 661 1600 2048 507 800 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 1258 2096 1024 523 524 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 90 104 512 12 13 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 469 615 256 41 41 1 : 252 126
size-128(DMA) 0 0 128 0 0 1 : 252 126
size-128 1423 4170 128 139 139 1 : 252 126
size-64(DMA) 0 0 64 0 0 1 : 252 126
size-64 637 1298 64 22 22 1 : 252 126
size-32(DMA) 0 0 32 0 0 1 : 252 126
size-32 1192 4068 32 36 36 1 : 252 126
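
To see how much memory the larger slab caches hold, the slabinfo lines
can be totalled with a short Python sketch like the one below. It
assumes the version 1.1 (SMP) column order is: name, active objects,
total objects, object size, active slabs, total slabs, pages per slab,
then limit and batchcount after the colon. It also assumes 4 KB pages.
The function names are hypothetical, and only a few of the lines above
are included as sample input.

```python
PAGE_KB = 4  # assumption: 4 KB pages, as on i386


def parse_slabinfo_line(line):
    """Parse one slabinfo 1.1 (SMP) data line into a dict.

    Cache names may contain spaces ("file lock cache"), so the six
    numeric columns are taken from the right of the part before " : ".
    """
    left, _, _ = line.partition(" : ")
    fields = left.split()
    nums = [int(x) for x in fields[-6:]]
    return {
        "name": " ".join(fields[:-6]),
        "active_objs": nums[0],
        "total_objs": nums[1],
        "obj_size": nums[2],
        "active_slabs": nums[3],
        "total_slabs": nums[4],
        "pages_per_slab": nums[5],
    }


def slab_kb(entry):
    """Memory held by a cache: slabs * pages per slab * page size."""
    return entry["total_slabs"] * entry["pages_per_slab"] * PAGE_KB


SAMPLE = """\
ip_dst_cache 15813 23856 160 994 994 1 : 252 126
file lock cache 0 0 92 0 0 1 : 252 126
skbuff_head_cache 2840 12984 160 541 541 1 : 252 126
sock 7757 10310 800 2062 2062 1 : 124 62
inode_cache 7616 11000 384 1100 1100 1 : 124 62
size-2048 661 1600 2048 507 800 1 : 60 30
"""

entries = [parse_slabinfo_line(l) for l in SAMPLE.splitlines() if l.strip()]
by_name = dict((e["name"], e) for e in entries)

# Largest caches first.
for e in sorted(entries, key=slab_kb, reverse=True):
    print("%-20s %6d KB  (%d of %d objects in use)"
          % (e["name"], slab_kb(e), e["active_objs"], e["total_objs"]))
```

On this sample the sock cache is the largest at 2062 pages, followed by
inode_cache and ip_dst_cache. Each is a few megabytes, so under these
assumptions the slab caches listed here are far too small to explain a
gap of several hundred megabytes.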
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 601.000371
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 1199.31
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 601.000371
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips : 1202.59
$ cat /proc/interrupts
CPU0 CPU1
0: 5383441 5395290 IO-APIC-edge timer
1: 0 2 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
8: 1 0 IO-APIC-edge rtc
13: 0 0 XT-PIC fpu
14: 0 4 IO-APIC-edge ide0
16: 97170576 97185015 IO-APIC-level eth0
19: 220372 221162 IO-APIC-level aic7xxx
NMI: 10778640 10778640
LOC: 10778106 10778105
ERR: 0
$ cat /proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cc000-000d17ff : Extension ROM
000f0000-000fffff : System ROM
00100000-3fffcfff : System RAM
00100000-0027e137 : Kernel code
0027e138-003151bf : Kernel data
3fffd000-3fffefff : ACPI Tables
3ffff000-3fffffff : ACPI Non-volatile Storage
de800000-df7fffff : PCI Bus #02
de800000-de8fffff : Intel Corporation 82557 [Ethernet Pro 100] (#2)
df000000-df0fffff : Intel Corporation 82557 [Ethernet Pro 100]
df800000-dfbfffff : Trident Microsystems 3DIm`age 975
e0000000-e001ffff : Trident Microsystems 3DIm`age 975
e0800000-e0bfffff : Trident Microsystems 3DIm`age 975
e1000000-e1000fff : Adaptec AHA-2940U2/W / 7890
e1f00000-e3efffff : PCI Bus #02
e2000000-e2000fff : Intel Corporation 82557 [Ethernet Pro 100] (#2)
e2000000-e2000fff : eepro100
e3000000-e3000fff : Intel Corporation 82557 [Ethernet Pro 100]
e3000000-e3000fff : eepro100
e4000000-e7ffffff : Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved
$ cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 1052934144 1046482944 6451200 0 196608 121032704
Swap: 139788288 134557696 5230592
MemTotal: 1028256 kB
MemFree: 6300 kB
MemShared: 0 kB
Buffers: 192 kB
Cached: 118196 kB
Active: 115828 kB
Inact_dirty: 276 kB
Inact_clean: 2284 kB
Inact_target: 2332 kB
HighTotal: 131060 kB
HighFree: 4232 kB
LowTotal: 897196 kB
LowFree: 2068 kB
SwapTotal: 136512 kB
SwapFree: 5108 kB
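
The meminfo figures above can be summarised with a small Python sketch
such as the following. It parses only the "Name: value kB" lines (the
older byte-count lines at the top are skipped) and reports how little
low memory and swap remain. The sample text is copied from this report;
the function name is hypothetical.

```python
SAMPLE = """\
total: used: free: shared: buffers: cached:
Mem: 1052934144 1046482944 6451200 0 196608 121032704
Swap: 139788288 134557696 5230592
MemTotal: 1028256 kB
MemFree: 6300 kB
HighTotal: 131060 kB
HighFree: 4232 kB
LowTotal: 897196 kB
LowFree: 2068 kB
SwapTotal: 136512 kB
SwapFree: 5108 kB
"""


def parse_meminfo(text):
    """Return {name: kB} for every line of the form 'Name: <int> kB'."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            values[parts[0][:-1]] = int(parts[1])
    return values


info = parse_meminfo(SAMPLE)

# The zone totals should add up to the overall figures.
zones_consistent = (
    info["HighTotal"] + info["LowTotal"] == info["MemTotal"]
    and info["HighFree"] + info["LowFree"] == info["MemFree"]
)

low_free_pct = 100.0 * info["LowFree"] / info["LowTotal"]
swap_used_pct = 100.0 * (info["SwapTotal"] - info["SwapFree"]) / info["SwapTotal"]

print("zone totals consistent:", zones_consistent)
print("low memory free: %.2f%%" % low_free_pct)
print("swap in use:     %.1f%%" % swap_used_pct)
```

On these numbers well under 1% of low memory is free and swap is more
than 96% used at the time of the snapshot.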
Just for laughs, here is the status file for the long running process:
$ cat /proc/410/status
Name: napd
State: R (running)
Pid: 410
PPid: 409
TracerPid: 0
Uid: 150 150 150 150
Gid: 150 150 150 150
FDSize: 16384
Groups:
VmSize: 1005892 kB
VmLck: 0 kB
VmRSS: 978924 kB
VmData: 999880 kB
VmStk: 32 kB
VmExe: 4760 kB
VmLib: 1056 kB
SigPnd: 0000000000000000
SigBlk: 0000000080000000
SigIgn: 8000000000081004
SigCgt: 0000000380014002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Any help would be greatly appreciated.
Jordan