2012-06-04 15:27:19

by Dave Jones

Subject: oomkillers gone wild.

I managed to drive a machine into a state of constant oom killing.
Looking over the logs, I see it started killing a bunch of processes that look
'wrong' to me..

Here's the first example. From here it just gets worse, taking out
sshd & NetworkManager, and other useful things rather than the memory hogs.

Dave

[21622.923039] modprobe invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
[21622.924540] modprobe cpuset=/ mems_allowed=0
[21622.925989] Pid: 10940, comm: modprobe Not tainted 3.5.0-rc1+ #51
[21622.927447] Call Trace:
[21622.928984] [<ffffffff816426b7>] dump_header+0x81/0x2d8
[21622.930427] [<ffffffff810b7cd5>] ? trace_hardirqs_on_caller+0x115/0x1a0
[21622.931860] [<ffffffff8164d742>] ? _raw_spin_unlock_irqrestore+0x42/0x80
[21622.933238] [<ffffffff8131ee0e>] ? ___ratelimit+0x9e/0x130
[21622.934705] [<ffffffff81144402>] oom_kill_process+0x282/0x2b0
[21622.936165] [<ffffffff81144919>] out_of_memory+0x249/0x410
[21622.937688] [<ffffffff8114a85d>] __alloc_pages_nodemask+0xa6d/0xab0
[21622.939215] [<ffffffff81189cb6>] alloc_pages_vma+0xb6/0x190
[21622.940720] [<ffffffff81168940>] __do_fault+0x2f0/0x530
[21622.942235] [<ffffffff81086f81>] ? get_parent_ip+0x11/0x50
[21622.943802] [<ffffffff81086f81>] ? get_parent_ip+0x11/0x50
[21622.945313] [<ffffffff8116b253>] handle_pte_fault+0x93/0x9d0
[21622.946819] [<ffffffff810b2450>] ? lock_release_holdtime.part.24+0x90/0x140
[21622.948527] [<ffffffff8116c39f>] handle_mm_fault+0x21f/0x2e0
[21622.950080] [<ffffffff81650a8b>] do_page_fault+0x13b/0x4b0
[21622.951620] [<ffffffff81079c03>] ? up_write+0x23/0x40
[21622.953089] [<ffffffff8164de03>] ? error_sti+0x5/0x6
[21622.954648] [<ffffffff813264fd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[21622.956244] [<ffffffff8164dc0f>] page_fault+0x1f/0x30
[21622.957866] Mem-Info:
[21622.959468] Node 0 DMA per-cpu:
[21622.961067] CPU 0: hi: 0, btch: 1 usd: 0
[21622.962682] CPU 1: hi: 0, btch: 1 usd: 0
[21622.964277] CPU 2: hi: 0, btch: 1 usd: 0
[21622.965877] CPU 3: hi: 0, btch: 1 usd: 0
[21622.967463] CPU 4: hi: 0, btch: 1 usd: 0
[21622.969056] CPU 5: hi: 0, btch: 1 usd: 0
[21622.970673] CPU 6: hi: 0, btch: 1 usd: 0
[21622.972272] CPU 7: hi: 0, btch: 1 usd: 0
[21622.973840] Node 0 DMA32 per-cpu:
[21622.975414] CPU 0: hi: 186, btch: 31 usd: 46
[21622.977075] CPU 1: hi: 186, btch: 31 usd: 0
[21622.978557] CPU 2: hi: 186, btch: 31 usd: 90
[21622.980103] CPU 3: hi: 186, btch: 31 usd: 102
[21622.981609] CPU 4: hi: 186, btch: 31 usd: 0
[21622.983067] CPU 5: hi: 186, btch: 31 usd: 70
[21622.984445] CPU 6: hi: 186, btch: 31 usd: 63
[21622.985854] CPU 7: hi: 186, btch: 31 usd: 0
[21622.987208] Node 0 Normal per-cpu:
[21622.988498] CPU 0: hi: 186, btch: 31 usd: 0
[21622.989772] CPU 1: hi: 186, btch: 31 usd: 0
[21622.990972] CPU 2: hi: 186, btch: 31 usd: 0
[21622.992201] CPU 3: hi: 186, btch: 31 usd: 0
[21622.993348] CPU 4: hi: 186, btch: 31 usd: 0
[21622.994475] CPU 5: hi: 186, btch: 31 usd: 0
[21622.995528] CPU 6: hi: 186, btch: 31 usd: 0
[21622.996551] CPU 7: hi: 186, btch: 31 usd: 0
[21622.997520] active_anon:38553 inactive_anon:39448 isolated_anon:224
[21622.997520] active_file:84 inactive_file:233 isolated_file:0
[21622.997520] unevictable:85 dirty:0 writeback:37692 unstable:0
[21622.997520] free:19526 slab_reclaimable:23656 slab_unreclaimable:553126
[21622.997520] mapped:50 shmem:1391 pagetables:2892 bounce:0
[21623.001856] Node 0 DMA free:15768kB min:264kB low:328kB high:396kB active_anon:20kB inactive_anon:100kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15632kB mlocked:0kB dirty:0kB writeback:100kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? no
[21623.004699] lowmem_reserve[]: 0 2643 3878 3878
[21623.005684] Node 0 DMA32 free:51344kB min:45888kB low:57360kB high:68832kB active_anon:153016kB inactive_anon:152172kB active_file:268kB inactive_file:916kB unevictable:340kB isolated(anon):896kB isolated(file):0kB present:2707204kB mlocked:340kB dirty:0kB writeback:149916kB mapped:200kB shmem:44kB slab_reclaimable:3696kB slab_unreclaimable:1524768kB kernel_stack:725000kB pagetables:6976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[21623.008972] lowmem_reserve[]: 0 0 1234 1234
[21623.010090] Node 0 Normal free:10992kB min:21424kB low:26780kB high:32136kB active_anon:1176kB inactive_anon:5520kB active_file:68kB inactive_file:16kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1264032kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:5520kB slab_reclaimable:90928kB slab_unreclaimable:687736kB kernel_stack:306280kB pagetables:4592kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:154144 all_unreclaimable? yes
[21623.013696] lowmem_reserve[]: 0 0 0 0
[21623.014957] Node 0 DMA: 3*4kB 0*8kB 1*16kB 0*32kB 4*64kB 1*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15772kB
[21623.016244] Node 0 DMA32: 740*4kB 306*8kB 113*16kB 101*32kB 219*64kB 104*128kB 39*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 50832kB
[21623.017559] Node 0 Normal: 1356*4kB 382*8kB 5*16kB 0*32kB 2*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 10992kB
[21623.018880] 41885 total pagecache pages
[21623.020193] 39819 pages in swap cache
[21623.021507] Swap cache stats: add 51603753, delete 51563934, find 307374090/308668676
[21623.022846] Free swap = 5162300kB
[21623.024181] Total swap = 6029308kB
[21623.035243] 1041904 pages RAM
[21623.036660] 70330 pages reserved
[21623.038082] 9037 pages shared
[21623.039509] 814628 pages non-shared
[21623.040899] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[21623.042271] [ 335] 0 335 7906 1 2 0 0 systemd-journal
[21623.043645] [ 341] 0 341 6590 1 0 -17 -1000 udevd
[21623.045019] [ 466] 0 466 86847 2 6 0 0 NetworkManager
[21623.046421] [ 470] 0 470 4866 3 0 0 0 smartd
[21623.047891] [ 476] 0 476 29599 10 2 0 0 crond
[21623.049313] [ 487] 0 487 7068 1 7 0 0 systemd-logind
[21623.050768] [ 488] 38 488 8301 10 6 0 0 ntpd
[21623.052189] [ 501] 81 501 5435 1 7 -13 -900 dbus-daemon
[21623.053606] [ 504] 0 504 47106 1 2 -13 -900 polkitd
[21623.054998] [ 507] 0 507 27239 1 7 0 0 agetty
[21623.056575] [ 509] 0 509 17260 1 3 -13 -900 modem-manager
[21623.058055] [ 520] 0 520 4790 2 4 0 0 rpcbind
[21623.059532] [ 530] 0 530 19403 1 5 -17 -1000 sshd
[21623.060988] [ 554] 0 554 60782 1 0 0 0 rsyslogd
[21623.062421] [ 562] 29 562 5883 1 4 0 0 rpc.statd
[21623.063957] [ 570] 0 570 6361 1 1 0 0 rpc.idmapd
[21623.065440] [ 571] 0 571 5993 1 3 0 0 rpc.mountd
[21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient
[21623.068409] [ 613] 0 613 21273 26 7 0 0 sendmail
[21623.069905] [ 633] 51 633 20167 10 1 0 0 sendmail
[21623.071427] [ 736] 1000 736 6212 139 7 0 0 tmux
[21623.072933] [ 737] 1000 737 29011 1 4 0 0 bash
[21623.074461] [ 811] 1000 811 28022 2 3 0 0 test-random.sh
[21623.075977] [ 818] 1000 818 4295 7 3 0 0 trinity
[21623.077512] [ 820] 1000 820 4311 16 5 0 0 trinity
[21623.079038] [ 821] 1000 821 4309 11 6 0 0 trinity
[21623.080572] [ 822] 1000 822 4317 9 1 0 0 trinity
[21623.082114] [ 823] 1000 823 4301 9 5 0 0 trinity
[21623.083689] [ 825] 1000 825 4229 8 4 0 0 trinity
[21623.085225] [ 826] 1000 826 4335 14 4 0 0 trinity
[21623.086784] [ 827] 1000 827 4403 13 6 0 0 trinity
[21623.088311] [ 828] 1000 828 4343 10 5 0 0 trinity
[21623.089849] [ 829] 1000 829 4287 16 6 0 0 trinity
[21623.091393] [ 830] 1000 830 4319 9 3 0 0 trinity
[21623.092927] [ 831] 1000 831 4321 7 5 0 0 trinity
[21623.094467] [ 832] 1000 832 4327 15 2 0 0 trinity
[21623.096003] [ 833] 1000 833 4389 14 4 0 0 trinity
[21623.097496] [ 889] 0 889 6589 1 1 -17 -1000 udevd
[21623.098983] [ 899] 0 899 6589 1 4 -17 -1000 udevd
[21623.100424] [23414] 1000 23414 4309 14 2 0 0 trinity-watchdo
[21623.101840] [23752] 1000 23752 4319 9 4 0 0 trinity-watchdo
[21623.103213] [23753] 1000 23753 4343 17 4 0 0 trinity-watchdo
[21623.104583] [23978] 1000 23978 4327 18 0 0 0 trinity-watchdo
[21623.105902] [24392] 1000 24392 4229 10 3 0 0 trinity-watchdo
[21623.107162] [24408] 1000 24408 4389 20 3 0 0 trinity-watchdo
[21623.108408] [24593] 1000 24593 4403 13 3 0 0 trinity-watchdo
[21623.109607] [24656] 1000 24656 4321 10 5 0 0 trinity-watchdo
[21623.110744] [24686] 1000 24686 4335 16 1 0 0 trinity-watchdo
[21623.111838] [25172] 1000 25172 4287 18 4 0 0 trinity-watchdo
[21623.112893] [25229] 1000 25229 4317 17 2 0 0 trinity-watchdo
[21623.113837] [25338] 1000 25338 4301 14 2 0 0 trinity-watchdo
[21623.114758] [27690] 1000 27690 4311 24 6 0 0 trinity-watchdo
[21623.115728] [29047] 1000 29047 4295 12 7 0 0 trinity-watchdo
[21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3
[21623.117430] [ 8230] 1000 8230 5070 50 5 0 0 trinity-child5
[21623.118269] [ 8510] 1000 8510 8696 119 5 0 0 trinity-child5
[21623.119088] [ 8572] 1000 8572 4758 30 0 0 0 trinity-child0
[21623.119887] [ 9084] 1000 9084 4998 36 0 0 0 trinity-child0
[21623.120687] [ 9138] 1000 9138 4343 48 0 0 0 trinity-child0
[21623.121479] [ 9518] 1000 9518 4317 45 6 0 0 trinity-child6
[21623.122280] [ 9563] 1000 9563 4857 216 0 0 0 trinity-child0
[21623.123077] [ 9626] 1000 9626 4317 40 1 0 0 trinity-child1
[21623.123875] [ 9700] 1000 9700 4317 27 2 0 0 trinity-child2
[21623.124658] [ 9967] 1000 9967 4747 40 4 0 0 trinity-child4
[21623.125422] [10011] 1000 10011 4317 36 7 0 0 trinity-child7
[21623.126182] [10064] 1000 10064 4311 80 7 0 0 trinity-child7
[21623.126928] [10072] 1000 10072 4623 346 6 0 0 trinity-child6
[21623.127668] [10077] 1000 10077 4403 231 5 0 0 trinity-child5
[21623.128402] [10079] 1000 10079 4834 241 7 0 0 trinity-child7
[21623.129137] [10084] 1000 10084 4733 510 4 0 0 trinity-child4
[21623.129869] [10085] 1000 10085 4403 20 3 0 0 trinity-child3
[21623.130594] [10086] 1000 10086 4858 27 3 0 0 trinity-child3
[21623.131310] [10099] 1000 10099 4321 27 5 0 0 trinity-child5
[21623.132022] [10100] 1000 10100 4387 26 7 0 0 trinity-child7
[21623.132733] [10102] 1000 10102 5520 27 2 0 0 trinity-child2
[21623.133432] [10117] 1000 10117 532719 293 5 0 0 trinity-child5
[21623.134136] [10143] 1000 10143 4319 25 4 0 0 trinity-child4
[21623.134834] [10144] 1000 10144 4287 121 6 0 0 trinity-child6
[21623.135531] [10178] 1000 10178 4295 25 2 0 0 trinity-child2
[21623.136228] [10224] 1000 10224 4319 24 6 0 0 trinity-child6
[21623.136926] [10235] 1000 10235 4319 24 3 0 0 trinity-child3
[21623.137617] [10243] 1000 10243 4499 25 5 0 0 trinity-child5
[21623.138298] [10250] 1000 10250 4229 27 6 0 0 trinity-child6
[21623.138981] [10251] 1000 10251 4389 109 0 0 0 trinity-child0
[21623.139658] [10254] 1000 10254 4319 27 1 0 0 trinity-child1
[21623.140336] [10369] 1000 10369 5010 45 3 0 0 trinity-child3
[21623.141007] [10389] 1000 10389 4371 76 2 0 0 trinity-child2
[21623.141679] [10458] 1000 10458 1052885 46 6 0 0 trinity-child6
[21623.142343] [10460] 1000 10460 5431 39 2 0 0 trinity-child2
[21623.143016] [10461] 1000 10461 4804 38 7 0 0 trinity-child7
[21623.143689] [10462] 1000 10462 4309 30 3 0 0 trinity-child3
[21623.144358] [10475] 1000 10475 4229 23 1 0 0 trinity-child1
[21623.145033] [10476] 1000 10476 4499 284 0 0 0 trinity-child0
[21623.145711] [10478] 1000 10478 4301 111 3 0 0 trinity-child3
[21623.146382] [10482] 1000 10482 4337 86 7 0 0 trinity-child7
[21623.147064] [10485] 1000 10485 4862 116 2 0 0 trinity-child2
[21623.147743] [10499] 1000 10499 4424 82 5 0 0 trinity-child5
[21623.148423] [10526] 1000 10526 5136 36 1 0 0 trinity-child1
[21623.149111] [10538] 1000 10538 4403 137 1 0 0 trinity-child1
[21623.149796] [10553] 1000 10553 4951 355 1 0 0 trinity-child1
[21623.150483] [10554] 1000 10554 4472 25 2 0 0 trinity-child2
[21623.151155] [10555] 1000 10555 4321 40 0 0 0 trinity-child0
[21623.151840] [10650] 1000 10650 5152 208 1 0 0 trinity-child1
[21623.152515] [10685] 1000 10685 4429 366 4 0 0 trinity-child4
[21623.153180] [10693] 1000 10693 4301 68 6 0 0 trinity-child6
[21623.153855] [10694] 1000 10694 4522 64 6 0 0 trinity-child6
[21623.154532] [10712] 1000 10712 4723 656 3 0 0 trinity-child3
[21623.155202] [10713] 1000 10713 4335 219 5 0 0 trinity-child5
[21623.155877] [10714] 1000 10714 4335 251 4 0 0 trinity-child4
[21623.156555] [10715] 1000 10715 4813 682 7 0 0 trinity-child7
[21623.157226] [10761] 1000 10761 4335 234 2 0 0 trinity-child2
[21623.157904] [10777] 1000 10777 4336 279 6 0 0 trinity-child6
[21623.158580] [10788] 1000 10788 4311 48 0 0 0 trinity-child0
[21623.159250] [10789] 1000 10789 4575 466 2 0 0 trinity-child2
[21623.159927] [10790] 1000 10790 4311 60 4 0 0 trinity-child4
[21623.160607] [10791] 1000 10791 4344 276 6 0 0 trinity-child6
[21623.161279] [10792] 1000 10792 4617 574 3 0 0 trinity-child3
[21623.161950] [10793] 1000 10793 4311 55 5 0 0 trinity-child5
[21623.162618] [10794] 1000 10794 4311 52 1 0 0 trinity-child1
[21623.163277] [10795] 1000 10795 5094 974 1 0 0 trinity-child1
[21623.163939] [10802] 1000 10802 4335 82 0 0 0 trinity-child0
[21623.164605] [10813] 1000 10813 4327 113 1 0 0 trinity-child1
[21623.165271] [10846] 1000 10846 4521 273 7 0 0 trinity-child7
[21623.165938] [10856] 1000 10856 4422 217 4 0 0 trinity-child4
[21623.166605] [10880] 1000 10880 4287 71 2 0 0 trinity-child2
[21623.167266] [10884] 1000 10884 4287 53 0 0 0 trinity-child0
[21623.167936] [10894] 1000 10894 4308 53 0 0 0 trinity-child0
[21623.168604] [10897] 1000 10897 4287 77 4 0 0 trinity-child4
[21623.169257] [10905] 1000 10905 4309 52 4 0 0 trinity-child4
[21623.169920] [10907] 1000 10907 4403 300 6 0 0 trinity-child6
[21623.170581] [10924] 1000 10924 4287 69 7 0 0 trinity-child7
[21623.171242] [10931] 1000 10931 4389 143 3 0 0 trinity-child3
[21623.171910] [10938] 0 10938 3182 53 7 0 0 modprobe
[21623.172576] [10940] 0 10940 3726 24 1 0 0 modprobe
[21623.173239] [10942] 0 10942 3182 52 4 0 0 modprobe
[21623.173907] [10944] 0 10944 3726 43 3 0 0 modprobe
[21623.174577] [10945] 0 10945 3726 34 4 0 0 modprobe
[21623.175237] [10946] 1000 10946 4287 19 2 0 0 trinity
[21623.175906] Out of memory: Kill process 588 (dhclient) score 1732579 or sacrifice child
[21623.176588] Killed process 588 (dhclient) total-vm:88824kB, anon-rss:0kB, file-rss:4kB



wtf ?

we picked this..

[21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient

over say..

[21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3

What went wrong here ?

And why does that score look so.. weird.


If the death spiral of endless oom killing sounds interesting, I have about 300MB of logfile
so far which I can upload somewhere.

Dave


2012-06-04 23:31:01

by David Rientjes

Subject: Re: oomkillers gone wild.

On Mon, 4 Jun 2012, Dave Jones wrote:

> we picked this..
>
> [21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient
>
> over say..
>
> [21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3
>
> What went wrong here ?
>
> And why does that score look so.. weird.
>

It sounds like it's because pid 588 has uid=0 and the adjustment for root
processes is causing an overflow. I assume this fixes it?
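
(For illustration, a minimal stand-alone sketch of that wraparound, with made-up numbers: the pre-patch code keeps the running score in an unsigned long, so subtracting the 3% root bonus, points -= 30 * totalpages / 1000, from a tiny rss-based score wraps around to a huge value instead of going negative.)

/* Illustration only, not the kernel code: the unsigned wraparound described above. */
#include <stdio.h>

int main(void)
{
	unsigned long totalpages = 2500000;	/* made-up figure: RAM + swap, in pages */
	unsigned long points = 50;		/* tiny rss + pagetables + swapents */

	points -= 30 * totalpages / 1000;	/* 50 - 75000 on an unsigned type... */

	printf("%lu\n", points);		/* ...prints 18446744073709476666 on 64-bit */
	return 0;
}
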
---
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -183,7 +183,7 @@ static bool oom_unkillable_task(struct task_struct *p,
 unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 			  const nodemask_t *nodemask, unsigned long totalpages)
 {
-	unsigned long points;
+	long points;

 	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
@@ -223,7 +223,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * Never return 0 for an eligible task regardless of the root bonus and
 	 * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here).
 	 */
-	return points ? points : 1;
+	return points > 0 ? points : 1;
 }

 /*

2012-06-05 17:45:03

by Dave Jones

Subject: Re: oomkillers gone wild.

On Mon, Jun 04, 2012 at 04:30:57PM -0700, David Rientjes wrote:
> On Mon, 4 Jun 2012, Dave Jones wrote:
>
> > we picked this..
> >
> > [21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient
> >
> > over say..
> >
> > [21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3
> >
> > What went wrong here ?
> >
> > And why does that score look so.. weird.
> >
>
> It sounds like it's because pid 588 has uid=0 and the adjustment for root
> processes is causing an overflow. I assume this fixes it?

Still doesn't seem right..

eg..

[42309.542776] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
..
[42309.553933] [ 500] 81 500 5435 1 4 -13 -900 dbus-daemon
..
[42309.597531] [ 9054] 1000 9054 528677 14540 3 0 0 trinity-child3
..

[42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
[42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB

and a slew of similar 'wrong process' death spiral kills follows..


Dave

2012-06-05 18:52:47

by Dave Jones

Subject: Re: oomkillers gone wild.

On Tue, Jun 05, 2012 at 01:44:54PM -0400, Dave Jones wrote:
> On Mon, Jun 04, 2012 at 04:30:57PM -0700, David Rientjes wrote:
> > On Mon, 4 Jun 2012, Dave Jones wrote:
> >
> > > we picked this..
> > >
> > > [21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient
> > >
> > > over say..
> > >
> > > [21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3
> > >
> > > What went wrong here ?
> > >
> > > And why does that score look so.. weird.
> > >
> >
> > It sounds like it's because pid 588 has uid=0 and the adjustment for root
> > processes is causing an overflow. I assume this fixes it?
>
> Still doesn't seem right..
>
> eg..
>
> [42309.542776] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> ..
> [42309.553933] [ 500] 81 500 5435 1 4 -13 -900 dbus-daemon
> ..
> [42309.597531] [ 9054] 1000 9054 528677 14540 3 0 0 trinity-child3
> ..
>
> [42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
> [42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB
>
> and a slew of similar 'wrong process' death spiral kills follows..

So after manually killing all the greedy processes, and getting the box to stop oom-killing
random things, it settled down. But I noticed something odd, that I think I also saw a few
weeks ago..

# free
             total       used       free     shared    buffers     cached
Mem:       3886296    3666924     219372          0       2904      20008
-/+ buffers/cache:    3644012     242284
Swap:      6029308      14488    6014820

What's using up that memory ?

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
142524 142420 99% 9.67K 47510 3 1520320K task_struct
142560 142417 99% 1.75K 7920 18 253440K signal_cache
142428 142302 99% 1.19K 5478 26 175296K task_xstate
306064 289292 94% 0.36K 6956 44 111296K debug_objects_cache
143488 143306 99% 0.50K 4484 32 71744K cred_jar
142560 142421 99% 0.50K 4455 32 71280K task_delay_info
150753 145021 96% 0.45K 4308 35 68928K kmalloc-128

Why so many task_structs ? There's only 128 processes running, and most of them
are kernel threads.

/sys/kernel/slab/task_struct/alloc_calls shows..

142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7

I get the impression that the oom-killer hasn't cleaned up properly after killing some of
those forked processes.

any thoughts ?

Dave

2012-06-08 19:57:40

by David Rientjes

Subject: Re: oomkillers gone wild.

On Tue, 5 Jun 2012, Dave Jones wrote:

> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 142524 142420 99% 9.67K 47510 3 1520320K task_struct
> 142560 142417 99% 1.75K 7920 18 253440K signal_cache
> 142428 142302 99% 1.19K 5478 26 175296K task_xstate
> 306064 289292 94% 0.36K 6956 44 111296K debug_objects_cache
> 143488 143306 99% 0.50K 4484 32 71744K cred_jar
> 142560 142421 99% 0.50K 4455 32 71280K task_delay_info
> 150753 145021 96% 0.45K 4308 35 68928K kmalloc-128
>
> Why so many task_structs ? There's only 128 processes running, and most of them
> are kernel threads.
>

Do you have CONFIG_OPROFILE enabled?

> /sys/kernel/slab/task_struct/alloc_calls shows..
>
> 142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
>
> I get the impression that the oom-killer hasn't cleaned up properly after killing some of
> those forked processes.
>
> any thoughts ?
>

If we're leaking task_struct's, meaning that put_task_struct() isn't
actually freeing them when the refcount goes to 0, then it's certainly not
because of the oom killer which only sends a SIGKILL to the selected
process.

Have you tried kmemleak?

2012-06-08 20:03:43

by Dave Jones

Subject: Re: oomkillers gone wild.

On Fri, Jun 08, 2012 at 12:57:36PM -0700, David Rientjes wrote:
> On Tue, 5 Jun 2012, Dave Jones wrote:
>
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > 142524 142420 99% 9.67K 47510 3 1520320K task_struct
> > 142560 142417 99% 1.75K 7920 18 253440K signal_cache
> > 142428 142302 99% 1.19K 5478 26 175296K task_xstate
> > 306064 289292 94% 0.36K 6956 44 111296K debug_objects_cache
> > 143488 143306 99% 0.50K 4484 32 71744K cred_jar
> > 142560 142421 99% 0.50K 4455 32 71280K task_delay_info
> > 150753 145021 96% 0.45K 4308 35 68928K kmalloc-128
> >
> > Why so many task_structs ? There's only 128 processes running, and most of them
> > are kernel threads.
> >
>
> Do you have CONFIG_OPROFILE enabled?

it's modular (though I should just turn it off, I never use it these days), but not loaded.

> > /sys/kernel/slab/task_struct/alloc_calls shows..
> >
> > 142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
> >
> > I get the impression that the oom-killer hasn't cleaned up properly after killing some of
> > those forked processes.
> >
> > any thoughts ?
> >
>
> If we're leaking task_struct's, meaning that put_task_struct() isn't
> actually freeing them when the refcount goes to 0, then it's certainly not
> because of the oom killer which only sends a SIGKILL to the selected
> process.
>
> Have you tried kmemleak?

I'll give that a shot on Monday. thanks,

Dave

2012-06-08 20:15:55

by David Rientjes

Subject: Re: oomkillers gone wild.

On Tue, 5 Jun 2012, Dave Jones wrote:

> Still doesn't seem right..
>
> eg..
>
> [42309.542776] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> ..
> [42309.553933] [ 500] 81 500 5435 1 4 -13 -900 dbus-daemon
> ..
> [42309.597531] [ 9054] 1000 9054 528677 14540 3 0 0 trinity-child3
> ..
>
> [42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
> [42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB
>
> and a slew of similar 'wrong process' death spiral kills follows..
>

On a system not under oom conditions, i.e. before you start trinity, can
you send the output of

cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
grep RSS /proc/$(pidof dbus-daemon)/status

2012-06-08 20:37:21

by Thomas Gleixner

Subject: Re: oomkillers gone wild.

On Fri, 8 Jun 2012, David Rientjes wrote:
> On Tue, 5 Jun 2012, Dave Jones wrote:
>
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > 142524 142420 99% 9.67K 47510 3 1520320K task_struct
> > 142560 142417 99% 1.75K 7920 18 253440K signal_cache
> > 142428 142302 99% 1.19K 5478 26 175296K task_xstate
> > 306064 289292 94% 0.36K 6956 44 111296K debug_objects_cache
> > 143488 143306 99% 0.50K 4484 32 71744K cred_jar
> > 142560 142421 99% 0.50K 4455 32 71280K task_delay_info
> > 150753 145021 96% 0.45K 4308 35 68928K kmalloc-128
> >
> > Why so many task_structs ? There's only 128 processes running, and most of them
> > are kernel threads.
> >
>
> Do you have CONFIG_OPROFILE enabled?
>
> > /sys/kernel/slab/task_struct/alloc_calls shows..
> >
> > 142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
> >
> > I get the impression that the oom-killer hasn't cleaned up properly after killing some of
> > those forked processes.
> >
> > any thoughts ?
> >
>
> If we're leaking task_struct's, meaning that put_task_struct() isn't
> actually freeing them when the refcount goes to 0, then it's certainly not
> because of the oom killer which only sends a SIGKILL to the selected
> process.

I rather suspect that this is an asymmetry between get_ and
put_task_struct and the refcount just doesn't go to zero.
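
(To make the suspected asymmetry concrete: a minimal userspace sketch of a get/put pair where one code path takes a reference and never drops it, so the count never reaches zero and the object is never freed. The names are generic, not the kernel's.)

/* Illustration only: a generic get/put refcount pattern, not kernel code. */
#include <stdio.h>
#include <stdlib.h>

struct obj {
	int refcount;			/* the kernel would use an atomic type */
};

static struct obj *alloc_obj(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		abort();
	o->refcount = 1;		/* the creator holds the first reference */
	return o;
}

static void get_obj(struct obj *o)
{
	o->refcount++;			/* every get must be paired with a put */
}

static void put_obj(struct obj *o)
{
	if (--o->refcount == 0)
		free(o);		/* only the final put frees the object */
}

/* Buggy caller: takes a reference but skips the put on the error path. */
static int use_obj(struct obj *o, int fail)
{
	get_obj(o);
	if (fail)
		return -1;		/* missing put_obj(o): the reference leaks */
	put_obj(o);
	return 0;
}

int main(void)
{
	struct obj *o = alloc_obj();

	use_obj(o, 1);			/* error path taken, reference leaked */
	put_obj(o);			/* the creator drops its reference */
	printf("refcount is now %d; the object is never freed\n", o->refcount);
	return 0;
}
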

Thanks,

tglx

2012-06-08 21:03:39

by Dave Jones

Subject: Re: oomkillers gone wild.

On Fri, Jun 08, 2012 at 01:15:50PM -0700, David Rientjes wrote:
> On Tue, 5 Jun 2012, Dave Jones wrote:
>
> > Still doesn't seem right..
> >
> > eg..
> >
> > [42309.542776] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> > ..
> > [42309.553933] [ 500] 81 500 5435 1 4 -13 -900 dbus-daemon
> > ..
> > [42309.597531] [ 9054] 1000 9054 528677 14540 3 0 0 trinity-child3
> > ..
> >
> > [42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
> > [42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB
> >
> > and a slew of similar 'wrong process' death spiral kills follows..
> >
>
> On a system not under oom conditions, i.e. before you start trinity, can
> you send the output of
>
> cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> grep RSS /proc/$(pidof dbus-daemon)/status

# cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
-900
7441500919753
# grep RSS /proc/$(pidof dbus-daemon)/status
VmRSS: 1660 kB



Dave

2012-06-10 02:15:48

by David Rientjes

Subject: Re: oomkillers gone wild.

On Fri, 8 Jun 2012, Thomas Gleixner wrote:

> > If we're leaking task_struct's, meaning that put_task_struct() isn't
> > actually freeing them when the refcount goes to 0, then it's certainly not
> > because of the oom killer which only sends a SIGKILL to the selected
> > process.
>
> I rather suspect that this is an asymmetry between get_ and
> put_task_struct and the refcount just doesn't go to zero.
>

We found an actual task_struct leak within put_task_struct() because of
the free task notifiers during the 3.4 -rc cycle, so if kmemleak doesn't
show anything, would it be possible to bisect the problem, Dave?

2012-06-10 02:21:06

by David Rientjes

Subject: Re: oomkillers gone wild.

On Fri, 8 Jun 2012, Dave Jones wrote:

> > On a system not under oom conditions, i.e. before you start trinity, can
> > you send the output of
> >
> > cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> > grep RSS /proc/$(pidof dbus-daemon)/status
>
> # cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> -900
> 7441500919753
> # grep RSS /proc/$(pidof dbus-daemon)/status
> VmRSS: 1660 kB

I'm suspecting you don't have my patch that changes the type of the
automatic variable in oom_badness() to signed. Could you retry this with
that patch or pull 3.5-rc2 which already includes it?

2012-06-10 03:21:51

by KOSAKI Motohiro

Subject: Re: oomkillers gone wild.

(6/9/12 10:21 PM), David Rientjes wrote:
> On Fri, 8 Jun 2012, Dave Jones wrote:
>
>> > On a system not under oom conditions, i.e. before you start trinity, can
>> > you send the output of
>> >
>> > cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
>> > grep RSS /proc/$(pidof dbus-daemon)/status
>>
>> # cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
>> -900
>> 7441500919753
>> # grep RSS /proc/$(pidof dbus-daemon)/status
>> VmRSS: 1660 kB
>
> I'm suspecting you don't have my patch that changes the type of the
> automatic variable in oom_badness() to signed. Could you retry this with
> that patch or pull 3.5-rc2 which already includes it?

Yes. Dave (Jones), as far as I can tell from your log, you are using x86_64, right?
As far as my testing goes, the current Linus tree works fine, at least in the normal case.
Please retest.

2012-06-10 20:11:07

by Dave Jones

Subject: Re: oomkillers gone wild.

On Sat, Jun 09, 2012 at 11:21:47PM -0400, KOSAKI Motohiro wrote:
> (6/9/12 10:21 PM), David Rientjes wrote:
> > On Fri, 8 Jun 2012, Dave Jones wrote:
> >
> >> > On a system not under oom conditions, i.e. before you start trinity, can
> >> > you send the output of
> >> >
> >> > cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> >> > grep RSS /proc/$(pidof dbus-daemon)/status
> >>
> >> # cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> >> -900
> >> 7441500919753
> >> # grep RSS /proc/$(pidof dbus-daemon)/status
> >> VmRSS: 1660 kB
> >
> > I'm suspecting you don't have my patch that changes the type of the
> > automatic variable in oom_badness() to signed. Could you retry this with
> > that patch or pull 3.5-rc2 which already includes it?

that was with the unsigned long -> long patch.

> Yes. Dave (Jones), as far as I can tell from your log, you are using x86_64, right?

yes.

> As far as my testing goes, the current Linus tree works fine, at least in the normal case.
> Please retest.

To double check, here it is in rc2 (which has that patch)..

$ uname -r
3.5.0-rc2+
$ cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
-900
7441500919753
$ grep RSS /proc/$(pidof dbus-daemon)/status
VmRSS: 1604 kB

2012-06-10 23:52:54

by David Rientjes

Subject: Re: oomkillers gone wild.

On Sun, 10 Jun 2012, Dave Jones wrote:

> To double check, here it is in rc2 (which has that patch)..
>
> $ uname -r
> 3.5.0-rc2+
> $ cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> -900
> 7441500919753
> $ grep RSS /proc/$(pidof dbus-daemon)/status
> VmRSS: 1604 kB

Eek, yes, that's definitely wrong. The following should fix it.
---
mm/oom_kill.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -184,6 +184,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 			  const nodemask_t *nodemask, unsigned long totalpages)
 {
 	long points;
+	long adj;

 	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
@@ -192,7 +193,8 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	if (!p)
 		return 0;

-	if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+	adj = p->signal->oom_score_adj;
+	if (adj == OOM_SCORE_ADJ_MIN) {
 		task_unlock(p);
 		return 0;
 	}
@@ -210,14 +212,11 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * implementation used by LSMs.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30 * totalpages / 1000;
+		adj -= 30;

-	/*
-	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
-	 * either completely disable oom killing or always prefer a certain
-	 * task.
-	 */
-	points += p->signal->oom_score_adj * totalpages / 1000;
+	/* Normalize to oom_score_adj units */
+	adj *= totalpages / 1000;
+	points += adj;

 	/*
 	 * Never return 0 for an eligible task regardless of the root bonus and

2012-06-11 00:46:10

by Dave Jones

Subject: Re: oomkillers gone wild.

On Sun, Jun 10, 2012 at 04:52:50PM -0700, David Rientjes wrote:
> On Sun, 10 Jun 2012, Dave Jones wrote:
>
> > To double check, here it is in rc2 (which has that patch)..
> >
> > $ uname -r
> > 3.5.0-rc2+
> > $ cat /proc/$(pidof dbus-daemon)/oom_score{_adj,}
> > -900
> > 7441500919753
> > $ grep RSS /proc/$(pidof dbus-daemon)/status
> > VmRSS: 1604 kB
>
> Eek, yes, that's definitely wrong. The following should fix it.

now prints..

-900
0

which I assume is much better ;-)

Dave

2012-06-11 09:11:56

by David Rientjes

Subject: [patch 3.5-rc2] mm, oom: fix and cleanup oom score calculations

The divide in p->signal->oom_score_adj * totalpages / 1000 within
oom_badness() was causing an overflow of the signed long data type.

This adds both the root bias and p->signal->oom_score_adj before doing the
normalization which fixes the issue and also cleans up the calculation.

Tested-by: Dave Jones <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
mm/oom_kill.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -184,6 +184,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 			  const nodemask_t *nodemask, unsigned long totalpages)
 {
 	long points;
+	long adj;

 	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
@@ -192,7 +193,8 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	if (!p)
 		return 0;

-	if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+	adj = p->signal->oom_score_adj;
+	if (adj == OOM_SCORE_ADJ_MIN) {
 		task_unlock(p);
 		return 0;
 	}
@@ -210,14 +212,11 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * implementation used by LSMs.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		points -= 30 * totalpages / 1000;
+		adj -= 30;

-	/*
-	 * /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may
-	 * either completely disable oom killing or always prefer a certain
-	 * task.
-	 */
-	points += p->signal->oom_score_adj * totalpages / 1000;
+	/* Normalize to oom_score_adj units */
+	adj *= totalpages / 1000;
+	points += adj;

 	/*
 	 * Never return 0 for an eligible task regardless of the root bonus and
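
(As a sanity check of the reworked arithmetic above, a minimal sketch with made-up numbers, roughly the RAM plus swap visible in the logs: for a task like dbus-daemon with oom_score_adj -900, the normalized adjustment swamps the small rss-based score, the total goes negative, and the "never return 0 for an eligible task" clamp leaves the badness at 1, making it the least preferred eligible victim.)

/* Illustration only: mirrors the shape of the patched calculation with made-up numbers. */
#include <stdio.h>

int main(void)
{
	long totalpages = 2500000;	/* made-up: roughly RAM + swap, in pages */
	long points = 500;		/* small rss + pagetables + swapents */
	long adj = -900;		/* dbus-daemon's oom_score_adj */

	/* no root bonus applied: dbus-daemon runs as uid 81, not root */

	adj *= totalpages / 1000;	/* normalize: -900 * 2500 = -2250000 */
	points += adj;			/* 500 - 2250000, clearly negative */

	printf("badness = %ld\n", points > 0 ? points : 1);	/* prints badness = 1 */
	return 0;
}
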

2012-06-11 19:14:10

by Dave Jones

Subject: Re: [patch 3.5-rc2] mm, oom: fix and cleanup oom score calculations

On Mon, Jun 11, 2012 at 02:11:50AM -0700, David Rientjes wrote:
> The divide in p->signal->oom_score_adj * totalpages / 1000 within
> oom_badness() was causing an overflow of the signed long data type.
>
> This adds both the root bias and p->signal->oom_score_adj before doing the
> normalization which fixes the issue and also cleans up the calculation.
>
> Tested-by: Dave Jones <[email protected]>
> Signed-off-by: David Rientjes <[email protected]>

Even with this, I'm still seeing the wrong processes getting killed..


[17384.172181] trinity-child4 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
[17384.172435] trinity-child4 cpuset=/ mems_allowed=0
[17384.172463] Pid: 32149, comm: trinity-child4 Not tainted 3.5.0-rc2+ #86
[17384.172495] Call Trace:
[17384.172515] [<ffffffff8164e2f5>] ? _raw_spin_unlock+0x55/0x60
[17384.172547] [<ffffffff81144014>] dump_header+0x84/0x350
[17384.172576] [<ffffffff810b7f35>] ? trace_hardirqs_on_caller+0x115/0x1a0
[17384.172610] [<ffffffff8164e262>] ? _raw_spin_unlock_irqrestore+0x42/0x80
[17384.172646] [<ffffffff8131face>] ? ___ratelimit+0x9e/0x130
[17384.172676] [<ffffffff8114486e>] oom_kill_process+0x1ee/0x340
[17384.172707] [<ffffffff81144ea9>] out_of_memory+0x249/0x410
[17384.172737] [<ffffffff8114aded>] __alloc_pages_nodemask+0xa6d/0xab0
[17384.172770] [<ffffffff8118a9e6>] alloc_pages_vma+0xb6/0x190
[17384.172802] [<ffffffff81169c0a>] do_wp_page+0xca/0x790
[17384.172830] [<ffffffff8164e2d5>] ? _raw_spin_unlock+0x35/0x60
[17384.172861] [<ffffffff8116becd>] handle_pte_fault+0x73d/0x9d0
[17384.172891] [<ffffffff810b26b0>] ? lock_release_holdtime.part.24+0x90/0x140
[17384.172928] [<ffffffff8116c96f>] handle_mm_fault+0x21f/0x2e0
[17384.172960] [<ffffffff8165158b>] do_page_fault+0x13b/0x4b0
[17384.172990] [<ffffffff81086f91>] ? get_parent_ip+0x11/0x50
[17384.173019] [<ffffffff81086f91>] ? get_parent_ip+0x11/0x50
[17384.173049] [<ffffffff81651c59>] ? sub_preempt_count+0x79/0xd0
[17384.173080] [<ffffffff8164e903>] ? error_sti+0x5/0x6
[17384.173991] [<ffffffff813271bd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[17384.174894] [<ffffffff8164e70f>] page_fault+0x1f/0x30
[17384.175830] [<ffffffff810b26b0>] ? lock_release_holdtime.part.24+0x90/0x140
[17384.176751] [<ffffffff810b1fde>] ? put_lock_stats.isra.23+0xe/0x40
[17384.177643] [<ffffffff8132705d>] ? __put_user_4+0x1d/0x30
[17384.178517] [<ffffffff8106611c>] ? sys_getresuid+0x7c/0xd0
[17384.179383] [<ffffffff81655c52>] system_call_fastpath+0x16/0x1b
[17384.180238] Mem-Info:
[17384.180996] Node 0 DMA per-cpu:
[17384.181652] CPU 0: hi: 0, btch: 1 usd: 0
[17384.182306] CPU 1: hi: 0, btch: 1 usd: 0
[17384.182944] CPU 2: hi: 0, btch: 1 usd: 0
[17384.183567] CPU 3: hi: 0, btch: 1 usd: 0
[17384.184177] CPU 4: hi: 0, btch: 1 usd: 0
[17384.184772] CPU 5: hi: 0, btch: 1 usd: 0
[17384.185442] CPU 6: hi: 0, btch: 1 usd: 0
[17384.186020] CPU 7: hi: 0, btch: 1 usd: 0
[17384.186591] Node 0 DMA32 per-cpu:
[17384.187158] CPU 0: hi: 186, btch: 31 usd: 1
[17384.187723] CPU 1: hi: 186, btch: 31 usd: 4
[17384.188271] CPU 2: hi: 186, btch: 31 usd: 35
[17384.188809] CPU 3: hi: 186, btch: 31 usd: 47
[17384.189338] CPU 4: hi: 186, btch: 31 usd: 21
[17384.189857] CPU 5: hi: 186, btch: 31 usd: 85
[17384.190367] CPU 6: hi: 186, btch: 31 usd: 134
[17384.190854] CPU 7: hi: 186, btch: 31 usd: 82
[17384.191328] Node 0 Normal per-cpu:
[17384.191795] CPU 0: hi: 186, btch: 31 usd: 5
[17384.192260] CPU 1: hi: 186, btch: 31 usd: 0
[17384.192717] CPU 2: hi: 186, btch: 31 usd: 35
[17384.193169] CPU 3: hi: 186, btch: 31 usd: 43
[17384.193611] CPU 4: hi: 186, btch: 31 usd: 46
[17384.194045] CPU 5: hi: 186, btch: 31 usd: 82
[17384.194472] CPU 6: hi: 186, btch: 31 usd: 31
[17384.194892] CPU 7: hi: 186, btch: 31 usd: 36
[17384.195345] active_anon:20877 inactive_anon:20816 isolated_anon:542
[17384.195345] active_file:142 inactive_file:490 isolated_file:0
[17384.195345] unevictable:97 dirty:0 writeback:20142 unstable:0
[17384.195345] free:22184 slab_reclaimable:29060 slab_unreclaimable:581537
[17384.195345] mapped:164 shmem:888 pagetables:3103 bounce:0
[17384.197465] Node 0 DMA free:15776kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:44kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15632kB mlocked:0kB dirty:0kB writeback:60kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:48kB kernel_stack:8kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2264 all_unreclaimable? yes
[17384.199388] lowmem_reserve[]: 0 2643 3878 3878
[17384.199895] Node 0 DMA32 free:51032kB min:45888kB low:57360kB high:68832kB active_anon:49052kB inactive_anon:49232kB active_file:8kB inactive_file:0kB unevictable:208kB isolated(anon):1504kB isolated(file):208kB present:2707204kB mlocked:208kB dirty:0kB writeback:49584kB mapped:404kB shmem:24kB slab_reclaimable:4304kB slab_unreclaimable:1914140kB kernel_stack:602920kB pagetables:3132kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:234420 all_unreclaimable? yes
[17384.202166] lowmem_reserve[]: 0 0 1234 1234
[17384.202751] Node 0 Normal free:21928kB min:21424kB low:26780kB high:32136kB active_anon:34460kB inactive_anon:33988kB active_file:560kB inactive_file:2140kB unevictable:180kB isolated(anon):664kB isolated(file):0kB present:1264032kB mlocked:180kB dirty:0kB writeback:30924kB mapped:252kB shmem:3528kB slab_reclaimable:111936kB slab_unreclaimable:411960kB kernel_stack:492392kB pagetables:9280kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:142173 all_unreclaimable? yes
[17384.205361] lowmem_reserve[]: 0 0 0 0
[17384.206038] Node 0 DMA: 1*4kB 3*8kB 4*16kB 0*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15772kB
[17384.206747] Node 0 DMA32: 785*4kB 310*8kB 495*16kB 273*32kB 340*64kB 47*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 52100kB
[17384.208180] Node 0 Normal: 240*4kB 2342*8kB 6*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 21968kB
[17384.208952] 25433 total pagecache pages
[17384.209725] 23956 pages in swap cache
[17384.210503] Swap cache stats: add 1989506, delete 1965550, find 7882958/7981228
[17384.211307] Free swap = 5833752kB
[17384.212115] Total swap = 6029308kB
[17384.220271] 1041904 pages RAM
[17384.221089] 70330 pages reserved
[17384.221894] 17795 pages shared
[17384.222698] 806729 pages non-shared
[17384.223507] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[17384.224410] [ 327] 0 327 7866 1 7 0 0 systemd-journal
[17384.225345] [ 334] 0 334 6566 1 2 -17 -1000 udevd
[17384.226282] [ 482] 0 482 86847 1 0 0 0 NetworkManager
[17384.227205] [ 485] 0 485 4866 1 0 0 0 smartd
[17384.228082] [ 487] 0 487 29598 1 0 0 0 crond
[17384.228953] [ 489] 0 489 7068 1 6 0 0 systemd-logind
[17384.229823] [ 490] 38 490 8301 1 5 0 0 ntpd
[17384.230746] [ 503] 81 503 5435 1 4 -13 -900 dbus-daemon
[17384.231678] [ 506] 0 506 47106 1 7 -13 -900 polkitd
[17384.232614] [ 510] 0 510 17260 1 0 -13 -900 modem-manager
[17384.233560] [ 511] 0 511 27239 1 7 0 0 agetty
[17384.234564] [ 522] 0 522 4790 16 3 0 0 rpcbind
[17384.235628] [ 530] 0 530 22206 1 0 0 0 dhclient
[17384.236634] [ 536] 0 536 19403 1 6 -17 -1000 sshd
[17384.237599] [ 578] 29 578 5883 1 2 0 0 rpc.statd
[17384.238561] [ 585] 0 585 5993 1 1 0 0 rpc.mountd
[17384.239522] [ 588] 0 588 5845 1 3 0 0 rpc.idmapd
[17384.240479] [ 598] 0 598 60782 1 5 0 0 rsyslogd
[17384.241427] [ 616] 0 616 21273 38 2 0 0 sendmail
[17384.242358] [ 636] 51 636 20167 1 4 0 0 sendmail
[17384.243267] [ 786] 1000 786 6100 78 7 0 0 tmux
[17384.244158] [ 787] 1000 787 28989 1 4 0 0 bash
[17384.244996] [ 861] 1000 861 28022 2 2 0 0 test-random.sh
[17384.245892] [ 868] 1000 868 4433 19 1 0 0 trinity
[17384.246744] [ 869] 1000 869 4417 5 4 0 0 trinity
[17384.247574] [ 870] 1000 870 4365 18 4 0 0 trinity
[17384.248332] [ 871] 1000 871 4347 1 7 0 0 trinity
[17384.249117] [ 872] 1000 872 4387 19 2 0 0 trinity
[17384.249885] [ 873] 1000 873 4285 19 7 0 0 trinity
[17384.250582] [ 874] 1000 874 4293 14 0 0 0 trinity
[17384.251258] [ 875] 1000 875 4423 20 4 0 0 trinity
[17384.251910] [ 876] 1000 876 4365 16 4 0 0 trinity
[17384.252537] [ 877] 1000 877 4391 19 2 0 0 trinity
[17384.253149] [ 878] 1000 878 4377 10 1 0 0 trinity
[17384.253741] [ 879] 1000 879 4347 9 7 0 0 trinity
[17384.254313] [ 880] 1000 880 4401 10 4 0 0 trinity
[17384.254877] [ 881] 1000 881 4429 18 1 0 0 trinity
[17384.255406] [ 882] 1000 882 4389 7 4 0 0 trinity
[17384.255913] [ 883] 1000 883 4389 17 2 0 0 trinity
[17384.256414] [ 973] 0 973 6566 1 6 -17 -1000 udevd
[17384.256912] [ 974] 0 974 6549 1 3 -17 -1000 udevd
[17384.257399] [ 4895] 1000 4895 4387 30 6 0 0 trinity-watchdo
[17384.257897] [ 4952] 1000 4952 4365 28 4 0 0 trinity-watchdo
[17384.258394] [ 4953] 1000 4953 4433 31 1 0 0 trinity-watchdo
[17384.258890] [ 4954] 1000 4954 4347 23 5 0 0 trinity-watchdo
[17384.259389] [ 5300] 1000 5300 4417 16 1 0 0 trinity-watchdo
[17384.259886] [ 5688] 1000 5688 4389 15 6 0 0 trinity-watchdo
[17384.260373] [ 5868] 1000 5868 4429 30 3 0 0 trinity-watchdo
[17384.260847] [ 5972] 1000 5972 4377 20 2 0 0 trinity-watchdo
[17384.261314] [ 5973] 1000 5973 4389 29 4 0 0 trinity-watchdo
[17384.261775] [ 6125] 1000 6125 4347 19 4 0 0 trinity-watchdo
[17384.262232] [ 6136] 1000 6136 4293 18 2 0 0 trinity-watchdo
[17384.262688] [ 6137] 1000 6137 4401 17 6 0 0 trinity-watchdo
[17384.263143] [ 6138] 1000 6138 4423 29 6 0 0 trinity-watchdo
[17384.263593] [ 6139] 1000 6139 4285 30 4 0 0 trinity-watchdo
[17384.264036] [ 6227] 1000 6227 4365 31 6 0 0 trinity-watchdo
[17384.264476] [ 6288] 1000 6288 4391 30 3 0 0 trinity-watchdo
[17384.264930] [30721] 1000 30721 5239 24 7 0 0 trinity-child7
[17384.265361] [30881] 1000 30881 4897 27 2 0 0 trinity-child2
[17384.265791] [30921] 1000 30921 4389 242 0 0 0 trinity-child0
[17384.266216] [30928] 1000 30928 5479 36 0 0 0 trinity-child0
[17384.266637] [30939] 1000 30939 4864 33 1 0 0 trinity-child1
[17384.267052] [31051] 1000 31051 4365 171 2 0 0 trinity-child2
[17384.267467] [31052] 1000 31052 5849 1345 1 0 0 trinity-child1
[17384.267882] [31103] 1000 31103 6277 1764 6 0 0 trinity-child6
[17384.268297] [31109] 1000 31109 4429 109 7 0 0 trinity-child7
[17384.268709] [31137] 1000 31137 4465 303 0 0 0 trinity-child0
[17384.269124] [31141] 1000 31141 5220 65 3 0 0 trinity-child3
[17384.269535] [31158] 1000 31158 4425 70 3 0 0 trinity-child3
[17384.269948] [31160] 1000 31160 5978 1899 7 0 0 trinity-child7
[17384.270355] [31178] 1000 31178 4429 228 0 0 0 trinity-child0
[17384.270761] [31245] 1000 31245 4391 208 0 0 0 trinity-child0
[17384.271162] [31284] 1000 31284 5089 109 1 0 0 trinity-child1
[17384.271575] [31333] 1000 31333 5201 309 2 0 0 trinity-child2
[17384.271984] [31349] 1000 31349 4781 189 0 0 0 trinity-child0
[17384.272393] [31358] 1000 31358 4743 24 5 0 0 trinity-child5
[17384.272806] [31359] 1000 31359 4347 34 3 0 0 trinity-child3
[17384.273219] [31360] 1000 31360 4644 34 6 0 0 trinity-child6
[17384.273634] [31362] 1000 31362 4347 34 1 0 0 trinity-child1
[17384.274049] [31363] 1000 31363 4347 29 2 0 0 trinity-child2
[17384.274466] [31364] 1000 31364 4347 24 4 0 0 trinity-child4
[17384.274892] [31365] 1000 31365 4347 17 0 0 0 trinity-child0
[17384.275305] [31409] 1000 31409 4347 1 3 0 0 trinity-child3
[17384.275719] [31620] 1000 31620 6661 31 7 0 0 trinity-child7
[17384.276134] [31621] 1000 31621 7161 43 5 0 0 trinity-child5
[17384.276550] [31654] 1000 31654 5576 35 4 0 0 trinity-child4
[17384.276972] [31666] 1000 31666 4377 31 0 0 0 trinity-child0
[17384.277385] [31739] 1000 31739 4347 30 6 0 0 trinity-child6
[17384.277797] [31742] 1000 31742 4816 312 7 0 0 trinity-child7
[17384.278203] [31743] 1000 31743 4865 505 0 0 0 trinity-child0
[17384.278609] [31744] 1000 31744 4864 26 3 0 0 trinity-child3
[17384.279016] [31745] 1000 31745 4863 47 5 0 0 trinity-child5
[17384.279422] [31746] 1000 31746 4863 28 4 0 0 trinity-child4
[17384.279829] [31747] 1000 31747 4898 25 7 0 0 trinity-child7
[17384.280235] [31748] 1000 31748 4901 27 5 0 0 trinity-child5
[17384.280642] [31749] 1000 31749 4293 27 4 0 0 trinity-child4
[17384.281049] [31750] 1000 31750 4524 311 0 0 0 trinity-child0
[17384.281460] [31751] 1000 31751 4389 23 3 0 0 trinity-child3
[17384.281870] [31752] 1000 31752 5479 83 2 0 0 trinity-child2
[17384.282279] [31753] 1000 31753 5913 506 6 0 0 trinity-child6
[17384.282689] [31754] 1000 31754 5908 479 4 0 0 trinity-child4
[17384.283103] [31755] 1000 31755 4951 61 7 0 0 trinity-child7
[17384.283518] [31756] 1000 31756 4389 23 5 0 0 trinity-child5
[17384.283936] [31757] 1000 31757 4389 23 1 0 0 trinity-child1
[17384.284350] [31758] 1000 31758 5128 196 6 0 0 trinity-child6
[17384.284786] [31779] 1000 31779 4326 70 7 0 0 trinity-child7
[17384.285206] [31838] 1000 31838 4464 371 5 0 0 trinity-child5
[17384.285620] [31839] 1000 31839 4543 427 1 0 0 trinity-child1
[17384.286037] [31840] 1000 31840 4431 313 6 0 0 trinity-child6
[17384.286454] [31859] 1000 31859 4541 427 2 0 0 trinity-child2
[17384.286867] [31865] 1000 31865 4327 94 6 0 0 trinity-child6
[17384.287281] [31876] 1000 31876 4374 382 6 0 0 trinity-child6
[17384.287693] [31877] 1000 31877 4365 348 0 0 0 trinity-child0
[17384.288105] [31878] 1000 31878 4365 344 1 0 0 trinity-child1
[17384.288516] [31879] 1000 31879 4365 333 5 0 0 trinity-child5
[17384.288922] [31880] 1000 31880 4374 381 7 0 0 trinity-child7
[17384.289328] [31882] 1000 31882 4364 298 4 0 0 trinity-child4
[17384.289735] [31884] 1000 31884 4364 344 3 0 0 trinity-child3
[17384.290139] [31895] 1000 31895 4431 352 3 0 0 trinity-child3
[17384.290543] [31979] 1000 31979 4389 369 4 0 0 trinity-child4
[17384.290944] [31980] 1000 31980 4425 280 5 0 0 trinity-child5
[17384.291348] [31981] 1000 31981 5216 1164 2 0 0 trinity-child2
[17384.291750] [31983] 1000 31983 5546 1489 1 0 0 trinity-child1
[17384.292154] [31984] 1000 31984 4753 737 5 0 0 trinity-child5
[17384.292557] [31985] 1000 31985 5021 989 1 0 0 trinity-child1
[17384.292962] [31986] 1000 31986 4389 419 7 0 0 trinity-child7
[17384.293369] [31987] 1000 31987 4984 952 3 0 0 trinity-child3
[17384.293778] [31988] 1000 31988 4389 445 6 0 0 trinity-child6
[17384.294190] [31989] 1000 31989 6310 2260 6 0 0 trinity-child6
[17384.294603] [32072] 1000 32072 4648 568 2 0 0 trinity-child2
[17384.295026] [32073] 1000 32073 4285 227 1 0 0 trinity-child1
[17384.295441] [32074] 1000 32074 4648 609 3 0 0 trinity-child3
[17384.295856] [32079] 1000 32079 4429 292 3 0 0 trinity-child3
[17384.296267] [32083] 1000 32083 4429 292 4 0 0 trinity-child4
[17384.296677] [32084] 1000 32084 4549 538 5 0 0 trinity-child5
[17384.297083] [32085] 1000 32085 4648 567 6 0 0 trinity-child6
[17384.297490] [32094] 1000 32094 4429 128 5 0 0 trinity-child5
[17384.297899] [32095] 1000 32095 4429 34 3 0 0 trinity-child2
[17384.298306] [32104] 1000 32104 4465 330 7 0 0 trinity-child7
[17384.298715] [32117] 1000 32117 4648 577 4 0 0 trinity-child4
[17384.299128] [32128] 1000 32128 4458 382 4 0 0 trinity-child4
[17384.299537] [32131] 1000 32131 4433 126 6 0 0 trinity-child6
[17384.299948] [32132] 1000 32132 4433 120 5 0 0 trinity-child5
[17384.300356] [32133] 1000 32133 4432 373 7 0 0 trinity-child7
[17384.300764] [32135] 1000 32135 4433 121 1 0 0 trinity-child1
[17384.301172] [32136] 1000 32136 4432 344 2 0 0 trinity-child2
[17384.301581] [32138] 1000 32138 4387 152 2 0 0 trinity-child2
[17384.301987] [32146] 1000 32146 4387 148 3 0 0 trinity-child3
[17384.302391] [32148] 1000 32148 4433 74 0 0 0 trinity-child0
[17384.302795] [32149] 1000 32149 4433 103 4 0 0 trinity-child4
[17384.303200] [32150] 1000 32150 4433 115 3 0 0 trinity-child3
[17384.303606] [32151] 1000 32151 4285 200 7 0 0 trinity-child7
[17384.304011] [32162] 1000 32162 4365 160 4 0 0 trinity-child4
[17384.304417] [32163] 1000 32163 4387 156 1 0 0 trinity-child1
[17384.304832] [32178] 1000 32178 4387 35 1 0 0 trinity-child6
[17384.305238] [32179] 1000 32179 4391 214 2 0 0 trinity-child2
[17384.305646] [32180] 1000 32180 4387 147 0 0 0 trinity-child0
[17384.306053] [32181] 1000 32181 4387 153 4 0 0 trinity-child4
[17384.306459] [32182] 1000 32182 4423 225 4 0 0 trinity-child4
[17384.306865] [32183] 1000 32183 4885 621 5 0 0 trinity-child5
[17384.307270] [32184] 1000 32184 4885 632 1 0 0 trinity-child1
[17384.307676] [32185] 1000 32185 4423 233 6 0 0 trinity-child6
[17384.308086] [32186] 1000 32186 4423 142 2 0 0 trinity-child2
[17384.308493] [32187] 1000 32187 4423 118 0 0 0 trinity-child0
[17384.308897] [32188] 1000 32188 4423 185 7 0 0 trinity-child7
[17384.309302] [32189] 1000 32189 4423 232 3 0 0 trinity-child3
[17384.309707] [32190] 1000 32190 4387 35 4 0 0 trinity-child5
[17384.310116] Out of memory: Kill process 503 (dbus-daemon) score 511952 or sacrifice child
[17384.310534] Killed process 503 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB