I looked up nothing useful with Google, so I'm here for help.
When this happens: I use memcg to limit the memory use of a process, and
when the memcg cgroup was out of memory,
the process was OOM-killed; however, it cannot really complete the
exiting. Here is some information.
OS version: CentOS 6.2, kernel 2.6.32.220.7.1
/proc/pid/stack
---------------------------------------------------------------
[<ffffffff810597ca>] __cond_resched+0x2a/0x40
[<ffffffff81121569>] unmap_vmas+0xb49/0xb70
[<ffffffff8112822e>] exit_mmap+0x7e/0x140
[<ffffffff8105b078>] mmput+0x58/0x110
[<ffffffff81061aad>] exit_mm+0x11d/0x160
[<ffffffff81061c9d>] do_exit+0x1ad/0x860
[<ffffffff81062391>] do_group_exit+0x41/0xb0
[<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
[<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
[<ffffffff8100b281>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
/proc/pid/stat
---------------------------------------------------------------
11337 (CF_user_based) R 1 11314 11314 0 -1 4203524 7753602 0 0 0 622 1806
0 0 -2 0 1 0 324381340 0 0 18446744073709551615 0 0 0 0 0 0 0 0 66784 0 0
0 17 3 1 1 0 0 0
/proc/pid/status
Name: CF_user_based
State: R (running)
Tgid: 11337
Pid: 11337
PPid: 1
TracerPid: 0
Uid: 32114 32114 32114 32114
Gid: 32114 32114 32114 32114
Utrace: 0
FDSize: 128
Groups: 32114
Threads: 1
SigQ: 2/2325005
SigPnd: 0000000000000000
ShdPnd: 0000000000004100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 00000001800104e0
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: ffffffff
Cpus_allowed_list: 0-31
Mems_allowed: 00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4300
nonvoluntary_ctxt_switches: 77
/var/log/messages
---------------------------------------------------------------
Oct 17 15:22:19 hpc16 kernel: CF_user_based invoked oom-killer:
gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Oct 17 15:22:19 hpc16 kernel: CF_user_based cpuset=/ mems_allowed=0-1
Oct 17 15:22:19 hpc16 kernel: Pid: 3909, comm: CF_user_based Not tainted
2.6.32-2.0.0.1 #4
Oct 17 15:22:19 hpc16 kernel: Call Trace:
Oct 17 15:22:19 hpc16 kernel: [<ffffffff810fd915>] ? dump_header+0x85/0x1a0
Oct 17 15:22:19 hpc16 kernel: [<ffffffff810fde4e>] ?
oom_kill_process+0x25e/0x2a0
Oct 17 15:22:19 hpc16 kernel: [<ffffffff810fdf5e>] ?
select_bad_process+0xce/0x110
Oct 17 15:22:19 hpc16 kernel: [<ffffffff810fe448>] ?
out_of_memory+0x1a8/0x390
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8110cb0a>] ?
__alloc_pages_nodemask+0x73a/0x750
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8114d4f5>] ?
__mem_cgroup_commit_charge+0x45/0x90
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8113d7fa>] ?
alloc_pages_vma+0x9a/0x190
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8112443c>] ?
handle_pte_fault+0x4cc/0xa90
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8113cefb>] ?
alloc_pages_current+0xab/0x110
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8100bbae>] ?
invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [<ffffffff81124b2a>] ?
handle_mm_fault+0x12a/0x1b0
Oct 17 15:22:19 hpc16 kernel: [<ffffffff814a6789>] ?
do_page_fault+0x199/0x550
Oct 17 15:22:19 hpc16 kernel: [<ffffffff812500a8>] ?
call_rwsem_wake+0x18/0x30
Oct 17 15:22:19 hpc16 kernel: [<ffffffff8100bbae>] ?
invalidate_interrupt5+0xe/0x20
Oct 17 15:22:19 hpc16 kernel: [<ffffffff814a3965>] ? page_fault+0x25/0x30
Oct 17 15:22:19 hpc16 kernel: Mem-Info:
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 9: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 23: hi: 186, btch: 31 usd: 18
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 12: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 13: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 14: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 15: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 16: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 17: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 18: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 19: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 20: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 21: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 22: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 23: hi: 0, btch: 1 usd: 0
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA32 per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 9: hi: 186, btch: 31 usd: 55
Oct 17 15:22:19 hpc16 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 23: hi: 186, btch: 31 usd: 2
Oct 17 15:22:19 hpc16 kernel: Node 1 Normal per-cpu:
Oct 17 15:22:19 hpc16 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 9: hi: 186, btch: 31 usd: 55
Oct 17 15:22:19 hpc16 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 12: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 13: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 14: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 15: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 16: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 17: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 18: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 19: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 20: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 21: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 22: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: CPU 23: hi: 186, btch: 31 usd: 0
Oct 17 15:22:19 hpc16 kernel: active_anon:71580973 inactive_anon:228137
isolated_anon:0
Oct 17 15:22:19 hpc16 kernel: active_file:509 inactive_file:805
isolated_file:0
Oct 17 15:22:19 hpc16 kernel: unevictable:1882 dirty:0 writeback:0
unstable:0
Oct 17 15:22:19 hpc16 kernel: free:162389 slab_reclaimable:8722
slab_unreclaimable:10370
Oct 17 15:22:19 hpc16 kernel: mapped:681 shmem:48 pagetables:1612154
bounce:0
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal free:32512kB min:32768kB
low:40960kB high:49152kB active_anon:146614460kB inactive_anon:353888kB
active_file:2036kB inactive_file:2380kB unevictable:3348kB
isolated(anon):0kB isolated(file):0kB present:148930560kB mlocked:3348kB
dirty:0kB writeback:0kB mapped:504kB shmem:116kB slab_reclaimable:14076kB
slab_unreclaimable:23092kB kernel_stack:3728kB pagetables:292144kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Oct 17 15:22:19 hpc16 kernel: lowmem_reserve[]: 0 0 0 0
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA free:15916kB min:0kB low:0kB
high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
Oct 17 15:22:19 hpc16 kernel: lowmem_reserve[]: 0 3243 145401 145401
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA32 free:569668kB min:728kB
low:908kB high:1092kB active_anon:503032kB inactive_anon:2804kB
active_file:0kB inactive_file:124kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:3321540kB mlocked:0kB dirty:0kB writeback:0kB
mapped:0kB shmem:0kB slab_reclaimable:4388kB slab_unreclaimable:528kB
kernel_stack:0kB pagetables:724kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 17 15:22:19 hpc16 kernel: lowmem_reserve[]: 0 0 142157 142157
Oct 17 15:22:19 hpc16 kernel: Node 1 Normal free:31460kB min:32028kB
low:40032kB high:48040kB active_anon:139206456kB inactive_anon:555856kB
active_file:0kB inactive_file:716kB unevictable:4180kB isolated(anon):0kB
isolated(file):0kB present:145569280kB mlocked:4180kB dirty:0kB
writeback:0kB mapped:2220kB shmem:76kB slab_reclaimable:16424kB
slab_unreclaimable:17860kB kernel_stack:1000kB pagetables:6155748kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Oct 17 15:22:19 hpc16 kernel: lowmem_reserve[]: 0 0 0 0
Oct 17 15:22:19 hpc16 kernel: Node 0 Normal: 7383*4kB 0*8kB 1*16kB 2*32kB
1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 33516kB
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA: 1*4kB 1*8kB 2*16kB 2*32kB 1*64kB
1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15916kB
Oct 17 15:22:19 hpc16 kernel: Node 1 DMA32: 1057*4kB 920*8kB 774*16kB
709*32kB 544*64kB 398*128kB 200*256kB 56*512kB 25*1024kB 12*2048kB
75*4096kB = 569668kB
Oct 17 15:22:19 hpc16 kernel: Node 1 Normal: 6885*4kB 11*8kB 0*16kB 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 31724kB
Oct 17 15:22:19 hpc16 kernel: 1422 total pagecache pages
Oct 17 15:22:19 hpc16 kernel: 0 pages in swap cache
Oct 17 15:22:19 hpc16 kernel: Swap cache stats: add 0, delete 0, find 0/0
Oct 17 15:22:19 hpc16 kernel: Free swap = 0kB
Oct 17 15:22:19 hpc16 kernel: Total swap = 0kB
Oct 17 15:22:19 hpc16 kernel: 75497471 pages RAM
Oct 17 15:22:19 hpc16 kernel: 1093826 pages reserved
Oct 17 15:22:19 hpc16 kernel: 11054 pages shared
Oct 17 15:22:19 hpc16 kernel: 74234502 pages non-shared
Oct 17 15:22:19 hpc16 kernel: [ pid ] uid tgid total_vm rss cpu
oom_adj oom_score_adj name
Oct 17 15:22:19 hpc16 kernel: [ 673] 0 673 2679 187
9 -17 -1000 udevd
Oct 17 15:22:19 hpc16 kernel: [ 1292] 0 1292 2678 174
3 -17 -1000 udevd
Oct 17 15:22:19 hpc16 kernel: [ 1295] 0 1295 2678 169
17 -17 -1000 udevd
Oct 17 15:22:19 hpc16 kernel: [ 1464] 0 1464 16017 357
6 0 0 sshd
Oct 17 15:22:19 hpc16 kernel: [ 1695] 0 1695 1553 142
12 0 0 portreserve
Oct 17 15:22:19 hpc16 kernel: [ 1702] 0 1702 62187 287
0 0 0 rsyslogd
Oct 17 15:22:19 hpc16 kernel: [ 1731] 0 1731 2301 146
15 0 0 irqbalance
Oct 17 15:22:19 hpc16 kernel: [ 1749] 32 1749 4768 206
8 0 0 rpcbind
Oct 17 15:22:19 hpc16 kernel: [ 1769] 29 1769 5800 240
0 0 0 rpc.statd
Oct 17 15:22:19 hpc16 kernel: [ 1828] 0 1828 6859 106
15 0 0 rpc.idmapd
Oct 17 15:22:19 hpc16 kernel: [ 1919] 81 1919 5392 159
13 0 0 dbus-daemon
Oct 17 15:22:19 hpc16 kernel: [ 1943] 0 1943 1033 156
12 0 0 acpid
Oct 17 15:22:19 hpc16 kernel: [ 1952] 68 1952 6343 426
15 0 0 hald
Oct 17 15:22:19 hpc16 kernel: [ 1953] 0 1953 4540 170
0 0 0 hald-runner
Oct 17 15:22:19 hpc16 kernel: [ 1982] 0 1982 5069 152
2 0 0 hald-addon-inpu
Oct 17 15:22:19 hpc16 kernel: [ 1989] 68 1989 4465 190
0 0 0 hald-addon-acpi
Oct 17 15:22:19 hpc16 kernel: [ 2005] 0 2005 49312 1149
13 0 0 snmpd
Oct 17 15:22:19 hpc16 kernel: [ 2013] 38 2013 7552 305
16 0 0 ntpd
Oct 17 15:22:19 hpc16 kernel: [ 2101] 0 2101 19669 422
20 0 0 master
Oct 17 15:22:19 hpc16 kernel: [ 2120] 89 2120 19732 418
16 0 0 qmgr
Oct 17 15:22:19 hpc16 kernel: [ 2133] 0 2133 29710 205
1 0 0 abrtd
Oct 17 15:22:19 hpc16 kernel: [ 2141] 0 2141 2304 145
0 0 0 abrt-dump-oops
Oct 17 15:22:19 hpc16 kernel: [ 2157] 32114 2157 237261 1782
0 0 0 python
Oct 17 15:22:19 hpc16 kernel: [ 2164] 0 2164 29311 291
10 0 0 crond
Oct 17 15:22:19 hpc16 kernel: [ 2185] 0 2185 5373 110
12 0 0 atd
Oct 17 15:22:19 hpc16 kernel: [ 2202] 0 2202 47595 1416
8 0 0 certmaster
Oct 17 15:22:19 hpc16 kernel: [ 2211] 0 2211 82597 4644
9 0 0 funcd
Oct 17 15:22:19 hpc16 kernel: [ 2218] 0 2218 996 89
13 0 0 supervise
Oct 17 15:22:19 hpc16 kernel: [ 2228] 0 2228 27049 203
2 0 0 run
Oct 17 15:22:19 hpc16 kernel: [ 2232] 0 2232 41699 3851
0 0 0 perl
Oct 17 15:22:19 hpc16 kernel: [ 2242] 0 2242 1029 132
19 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2244] 0 2244 1029 132
1 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2246] 0 2246 1029 132
15 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2248] 0 2248 1029 131
16 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2250] 0 2250 1029 131
17 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2252] 0 2252 1029 132
15 0 0 mingetty
Oct 17 15:22:19 hpc16 kernel: [ 2278] 0 2278 44389 4667
9 0 0 perl
Oct 17 15:22:19 hpc16 kernel: [ 2283] 0 2283 44389 4535
9 0 0 perl
Oct 17 15:22:19 hpc16 kernel: [ 2301] 0 2301 23312 184
15 -17 -1000 auditd
Oct 17 15:22:19 hpc16 kernel: [21038] 0 21038 3976 1883
6 0 0 pbs_mom
Oct 17 15:22:19 hpc16 kernel: [11314] 32114 11314 2704 196
15 0 0 cglimit
Oct 17 15:22:19 hpc16 kernel: [11319] 32114 11319 11089 394
14 0 0 orted
Oct 17 15:22:19 hpc16 kernel: [11337] 32114 11337 842071788 71791393
1 0 0 CF_user_based
Oct 17 15:22:19 hpc16 kernel: [24741] 89 24741 19689 414
16 0 0 pickup
Oct 17 15:22:19 hpc16 kernel: Out of memory: Kill process 11337
(CF_user_based) score 986 or sacrifice child
Oct 17 15:22:19 hpc16 kernel: Killed process 11337, UID 32114,
(CF_user_based) total-vm:3368287152kB, anon-rss:287164596kB, file-rss:976kB
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
On Wed 17-10-12 18:23:34, gaoqiang wrote:
> I looked up nothing useful with google,so I'm here for help..
>
> when this happens: I use memcg to limit the memory use of a
> process,and when the memcg cgroup was out of memory,
> the process was oom-killed however,it cannot really complete the
> exiting. here is the some information
How many tasks are in the group and what kind of memory do they use?
Is it possible that you were hit by the same issue as described in
79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> OS version: centos6.2 2.6.32.220.7.1
Your kernel is quite old and you should probably be asking your
distribution to help you out. There have been many fixes since 2.6.32.
Are you able to reproduce the same issue with the current vanilla kernel?
> /proc/pid/stack
> ---------------------------------------------------------------
>
> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
> [<ffffffff8105b078>] mmput+0x58/0x110
> [<ffffffff81061aad>] exit_mm+0x11d/0x160
> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
> [<ffffffff81062391>] do_group_exit+0x41/0xb0
> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
> [<ffffffff8100b281>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
This looks strange because this is just the exit path, which shouldn't
deadlock or anything. Is this stack stable? Have you tried checking
it more times?
--
Michal Hocko
SUSE Labs
I don't know whether the process will exit eventually, but this stack
lasts for hours, which is obviously abnormal.
The situation: we use a command called "cglimit" to fork-and-exec the
worker process, and "cglimit" will
set some limitations on the worker with cgroups. For now, we limit the
memory, and we also use the cpu cgroup, but with
no limitation, so when the worker is running, the cgroup directories look
like the following:
/cgroup/memory/worker : this directory limits the memory
/cgroup/cpu/worker : no limit, but the worker process is in it.
For some reason (some other process we didn't consider), the worker
process invokes the global oom-killer,
not the cgroup oom-killer. Then the worker process hangs there.
Actually, if we don't put the worker process into the cpu cgroup,
this never happens.
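(For reference, a minimal sketch of the kind of setup described above, using
the /cgroup mount points shown in this thread; the group name "worker", the
700M figure and the ./worker command are placeholders, not the actual cglimit
implementation:)

  # memory controller: enforce a hard limit on the group
  mkdir /cgroup/memory/worker
  echo 700M > /cgroup/memory/worker/memory.limit_in_bytes

  # cpu controller: the group is created but nothing is tuned, so a
  # child group's cpu.rt_runtime_us stays at its default of 0
  mkdir /cgroup/cpu/worker

  # start the worker and move it into both groups
  ./worker &
  echo $! > /cgroup/memory/worker/tasks
  echo $! > /cgroup/cpu/worker/tasks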
On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
>
> On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > I looked up nothing useful with google,so I'm here for help..
> >
> > when this happens: I use memcg to limit the memory use of a
> > process,and when the memcg cgroup was out of memory,
> > the process was oom-killed however,it cannot really complete the
> > exiting. here is the some information
>
> How many tasks are in the group and what kind of memory do they use?
> Is it possible that you were hit by the same issue as described in
> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>
> > OS version: centos6.2 2.6.32.220.7.1
>
> Your kernel is quite old and you should be probably asking your
> distribution to help you out. There were many fixes since 2.6.32.
> Are you able to reproduce the same issue with the current vanila kernel?
>
> > /proc/pid/stack
> > ---------------------------------------------------------------
> >
> > [<ffffffff810597ca>] __cond_resched+0x2a/0x40
> > [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
> > [<ffffffff8112822e>] exit_mmap+0x7e/0x140
> > [<ffffffff8105b078>] mmput+0x58/0x110
> > [<ffffffff81061aad>] exit_mm+0x11d/0x160
> > [<ffffffff81061c9d>] do_exit+0x1ad/0x860
> > [<ffffffff81062391>] do_group_exit+0x41/0xb0
> > [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
> > [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
> > [<ffffffff8100b281>] int_signal+0x12/0x17
> > [<ffffffffffffffff>] 0xffffffffffffffff
>
> This looks strange because this is just an exit part which shouldn't
> deadlock or anything. Is this stack stable? Have you tried to take check
> it more times?
>
> --
> Michal Hocko
> SUSE Labs
On Mon, Oct 22, 2012 at 7:46 AM, Qiang Gao <[email protected]> wrote:
> I don't know whether the process will exit finally, bug this stack lasts
> for hours, which is obviously unnormal.
> The situation: we use a command calld "cglimit" to fork-and-exec the worker
> process,and the "cglimit" will
> set some limitation on the worker with cgroup. for now,we limit the
> memory,and we also use cpu cgroup,but with
> no limiation,so when the worker is running, the cgroup directory looks like
> following:
>
> /cgroup/memory/worker : this directory limit the memory
> /cgroup/cpu/worker :with no limit,but worker process is in.
>
> for some reason(some other process we didn't consider), the worker process
> invoke global oom-killer,
> not cgroup-oom-killer. then the worker process hangs there.
>
> Actually, if we didn't set the worker process into the cpu cgroup, this will
> never happens.
>
You said you don't use CPU limits, right? Can you also send in the
output of /proc/sched_debug? Can you also send in your
/etc/cgconfig.conf? If the OOM is not caused by the cgroup memory limit
and the global system is under pressure in 2.6.32, it can trigger an
OOM.
Also:
1. Have you turned off swapping (it seems like it), right?
2. Do you have a NUMA policy set up for this task?
Can you also share the .config (not sure if any special patches are
being used) for the version you've mentioned.
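(For reference, something like the following should gather that information;
the kernel config path is an assumption and varies by distro:)

  cat /proc/sched_debug
  cat /etc/cgconfig.conf
  cat /boot/config-$(uname -r)   # or wherever the distro ships the config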
Balbir
On Mon 22-10-12 10:16:43, Qiang Gao wrote:
> I don't know whether the process will exit finally, bug this stack lasts
> for hours, which is obviously unnormal.
> The situation: we use a command calld "cglimit" to fork-and-exec the
> worker process,and the "cglimit" will
> set some limitation on the worker with cgroup. for now,we limit the
> memory,and we also use cpu cgroup,but with
> no limiation,so when the worker is running, the cgroup directory looks like
> following:
>
> /cgroup/memory/worker : this directory limit the memory
> /cgroup/cpu/worker :with no limit,but worker process is in.
>
> for some reason(some other process we didn't consider), the worker process
> invoke global oom-killer,
Are you sure that this is really global oom? What was the limit for the
group?
> not cgroup-oom-killer. then the worker process hangs there.
>
> Actually, if we didn't set the worker process into the cpu cgroup, this
> will never happens.
Strange, and it smells like a misconfiguration. Could you provide the
complete settings for both controllers?
grep . -r /cgroup/
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
>
> > On Wed 17-10-12 18:23:34, gaoqiang wrote:
> > > I looked up nothing useful with google,so I'm here for help..
> > >
> > > when this happens: I use memcg to limit the memory use of a
> > > process,and when the memcg cgroup was out of memory,
> > > the process was oom-killed however,it cannot really complete the
> > > exiting. here is the some information
> >
> > How many tasks are in the group and what kind of memory do they use?
> > Is it possible that you were hit by the same issue as described in
> > 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> >
> > > OS version: centos6.2 2.6.32.220.7.1
> >
> > Your kernel is quite old and you should be probably asking your
> > distribution to help you out. There were many fixes since 2.6.32.
> > Are you able to reproduce the same issue with the current vanila kernel?
> >
> > > /proc/pid/stack
> > > ---------------------------------------------------------------
> > >
> > > [<ffffffff810597ca>] __cond_resched+0x2a/0x40
> > > [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
> > > [<ffffffff8112822e>] exit_mmap+0x7e/0x140
> > > [<ffffffff8105b078>] mmput+0x58/0x110
> > > [<ffffffff81061aad>] exit_mm+0x11d/0x160
> > > [<ffffffff81061c9d>] do_exit+0x1ad/0x860
> > > [<ffffffff81062391>] do_group_exit+0x41/0xb0
> > > [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
> > > [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
> > > [<ffffffff8100b281>] int_signal+0x12/0x17
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > This looks strange because this is just an exit part which shouldn't
> > deadlock or anything. Is this stack stable? Have you tried to take check
> > it more times?
> >
> > --
> > Michal Hocko
> > SUSE Labs
> >
--
Michal Hocko
SUSE Labs
Information about the system is in the attached file "information.txt".
I cannot reproduce it with the upstream 3.6.0 kernel.
On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>> I looked up nothing useful with google,so I'm here for help..
>>
>> when this happens: I use memcg to limit the memory use of a
>> process,and when the memcg cgroup was out of memory,
>> the process was oom-killed however,it cannot really complete the
>> exiting. here is the some information
>
> How many tasks are in the group and what kind of memory do they use?
> Is it possible that you were hit by the same issue as described in
> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>
>> OS version: centos6.2 2.6.32.220.7.1
>
> Your kernel is quite old and you should be probably asking your
> distribution to help you out. There were many fixes since 2.6.32.
> Are you able to reproduce the same issue with the current vanila kernel?
>
>> /proc/pid/stack
>> ---------------------------------------------------------------
>>
>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>> [<ffffffff8105b078>] mmput+0x58/0x110
>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>> [<ffffffff8100b281>] int_signal+0x12/0x17
>> [<ffffffffffffffff>] 0xffffffffffffffff
>
> This looks strange because this is just an exit part which shouldn't
> deadlock or anything. Is this stack stable? Have you tried to take check
> it more times?
>
> --
> Michal Hocko
> SUSE Labs
On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <[email protected]> wrote:
> information about the system is in the attach file "information.txt"
>
> I can not reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google,so I'm here for help..
>>>
>>> when this happens: I use memcg to limit the memory use of a
>>> process,and when the memcg cgroup was out of memory,
>>> the process was oom-killed however,it cannot really complete the
>>> exiting. here is the some information
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version: centos6.2 2.6.32.220.7.1
>>
>> Your kernel is quite old and you should be probably asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanila kernel?
>>
>>> /proc/pid/stack
>>> ---------------------------------------------------------------
>>>
>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>>> [<ffffffff8105b078>] mmput+0x58/0x110
>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> This looks strange because this is just an exit part which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to take check
>> it more times?
Looking at information.txt, I found something interesting
rt_rq[0]:/1314
.rt_nr_running : 1
.rt_throttled : 1
.rt_time : 0.856656
.rt_runtime : 0.000000
cfs_rq[0]:/1314
.exec_clock : 8738.133429
.MIN_vruntime : 0.000001
.min_vruntime : 8739.371271
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -9792.255554
.nr_spread_over : 1
.nr_running : 0
.load : 0
.load_avg : 7376.722880
.load_period : 7.203830
.load_contrib : 1023
.load_tg : 1023
.se->exec_start : 282004.715064
.se->vruntime : 18435.664560
.se->sum_exec_runtime : 8738.133429
.se->wait_start : 0.000000
.se->sleep_start : 0.000000
.se->block_start : 0.000000
.se->sleep_max : 0.000000
.se->block_max : 0.000000
.se->exec_max : 77.977054
.se->slice_max : 0.000000
.se->wait_max : 2.664779
.se->wait_sum : 29.970575
.se->wait_count : 102
.se->load.weight : 2
So 1314 is a real-time process, and
cpu.rt_period_us:
1000000
----------------------
cpu.rt_runtime_us:
0
When did tt move to being a real-time process (hint: see rt_nr_running
and rt_throttled)?
Balbir
This process was moved to the RT-priority queue when the global oom-killer
happened, to boost the recovery
of the system, but it wasn't properly dealt with. I still have
no idea where the problem is.
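(A quick way to check this on a live system, as a sketch; the pid is the hung
worker's from this report and the cgroup path is a placeholder:)

  # scheduling policy and RT priority of the hung task
  chrt -p 11337

  # RT budget and throttle state of its cpu group
  cat /cgroup/cpu/worker/cpu.rt_runtime_us
  grep -A 4 'rt_rq\[' /proc/sched_debug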
On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh <[email protected]> wrote:
> On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <[email protected]> wrote:
>> information about the system is in the attach file "information.txt"
>>
>> I can not reproduce it in the upstream 3.6.0 kernel..
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>> I looked up nothing useful with google,so I'm here for help..
>>>>
>>>> when this happens: I use memcg to limit the memory use of a
>>>> process,and when the memcg cgroup was out of memory,
>>>> the process was oom-killed however,it cannot really complete the
>>>> exiting. here is the some information
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>>
>>>> OS version: centos6.2 2.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should be probably asking your
>>> distribution to help you out. There were many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanila kernel?
>>>
>>>> /proc/pid/stack
>>>> ---------------------------------------------------------------
>>>>
>>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>>>> [<ffffffff8105b078>] mmput+0x58/0x110
>>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> This looks strange because this is just an exit part which shouldn't
>>> deadlock or anything. Is this stack stable? Have you tried to take check
>>> it more times?
>
> Looking at information.txt, I found something interesting
>
> rt_rq[0]:/1314
> .rt_nr_running : 1
> .rt_throttled : 1
> .rt_time : 0.856656
> .rt_runtime : 0.000000
>
>
> cfs_rq[0]:/1314
> .exec_clock : 8738.133429
> .MIN_vruntime : 0.000001
> .min_vruntime : 8739.371271
> .max_vruntime : 0.000001
> .spread : 0.000000
> .spread0 : -9792.255554
> .nr_spread_over : 1
> .nr_running : 0
> .load : 0
> .load_avg : 7376.722880
> .load_period : 7.203830
> .load_contrib : 1023
> .load_tg : 1023
> .se->exec_start : 282004.715064
> .se->vruntime : 18435.664560
> .se->sum_exec_runtime : 8738.133429
> .se->wait_start : 0.000000
> .se->sleep_start : 0.000000
> .se->block_start : 0.000000
> .se->sleep_max : 0.000000
> .se->block_max : 0.000000
> .se->exec_max : 77.977054
> .se->slice_max : 0.000000
> .se->wait_max : 2.664779
> .se->wait_sum : 29.970575
> .se->wait_count : 102
> .se->load.weight : 2
>
> So 1314 is a real time process and
>
> cpu.rt_period_us:
> 1000000
> ----------------------
> cpu.rt_runtime_us:
> 0
>
> When did tt move to being a Real Time process (hint: see nr_running
> and nr_throttled)?
>
> Balbir
On Tue 23-10-12 11:35:52, Qiang Gao wrote:
> I'm sure this is a global-oom,not cgroup-oom. [the dmesg output in the end]
Yes this is the global oom killer because:
> cglimit -M 700M ./tt
> then after global-oom,the process hangs..
> 179184 pages RAM
So you have ~700M of RAM so the memcg limit is basically pointless as it
cannot be reached...
--
Michal Hocko
SUSE Labs
On 10/23/2012 11:35 AM, Qiang Gao wrote:
> information about the system is in the attach file "information.txt"
>
> I can not reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko<[email protected]> wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google,so I'm here for help..
>>>
>>> when this happens: I use memcg to limit the memory use of a
>>> process,and when the memcg cgroup was out of memory,
>>> the process was oom-killed however,it cannot really complete the
>>> exiting. here is the some information
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version: centos6.2 2.6.32.220.7.1
>> Your kernel is quite old and you should be probably asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanila kernel?
>>
>>> /proc/pid/stack
>>> ---------------------------------------------------------------
>>>
>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>>> [<ffffffff8105b078>] mmput+0x58/0x110
>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>> This looks strange because this is just an exit part which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to take check
>> it more times?
>>
Does the machine only have about 700M of memory? I also found something
in the log file:
Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB..
lowmem_reserve[]: 0 674 674 674
Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB present:690712kB ..
lowmem_reserve[]: 0 0 0 0
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
179184 pages RAM ==> 179184 * 4 / 1024 = *700M*
6773 pages reserved
Note that the free memory of DMA32 (3172kB) is lower than the min watermark,
which means the whole system is under memory pressure now. What's more, swap
is off, so the global OOM is normal behavior.
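(As a sketch, the same comparison can be made on a live system from
/proc/zoneinfo, which reports the per-zone free pages and min/low/high
watermarks in units of pages, 4kB each here:)

  # rough filter: zone names, free page counts and watermarks
  grep -E 'zone|pages free|min|low|high' /proc/zoneinfo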
Thanks,
Sha
This is just an example to show how to reproduce it. Actually, the first time
I saw this situation was on a machine with 288G of RAM, with many tasks running
and each limited to 30G. But in the end no task exceeded its limit, yet the
system went OOM.
On Tue, Oct 23, 2012 at 4:35 PM, Michal Hocko <[email protected]> wrote:
> On Tue 23-10-12 11:35:52, Qiang Gao wrote:
>> I'm sure this is a global-oom,not cgroup-oom. [the dmesg output in the end]
>
> Yes this is the global oom killer because:
>> cglimit -M 700M ./tt
>> then after global-oom,the process hangs..
>
>> 179184 pages RAM
>
> So you have ~700M of RAM so the memcg limit is basically pointless as it
> cannot be reached...
> --
> Michal Hocko
> SUSE Labs
Global OOM is the right thing to do, but the OOM-killed process hanging in
do_exit is not normal behavior.
On Tue, Oct 23, 2012 at 5:01 PM, Sha Zhengju <[email protected]> wrote:
> On 10/23/2012 11:35 AM, Qiang Gao wrote:
>>
>> information about the system is in the attach file "information.txt"
>>
>> I can not reproduce it in the upstream 3.6.0 kernel..
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko<[email protected]> wrote:
>>>
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>>
>>>> I looked up nothing useful with google,so I'm here for help..
>>>>
>>>> when this happens: I use memcg to limit the memory use of a
>>>> process,and when the memcg cgroup was out of memory,
>>>> the process was oom-killed however,it cannot really complete the
>>>> exiting. here is the some information
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>>
>>>> OS version: centos6.2 2.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should be probably asking your
>>> distribution to help you out. There were many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanila kernel?
>>>
>>>> /proc/pid/stack
>>>> ---------------------------------------------------------------
>>>>
>>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>>>> [<ffffffff8105b078>] mmput+0x58/0x110
>>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> This looks strange because this is just an exit part which shouldn't
>>> deadlock or anything. Is this stack stable? Have you tried to take check
>>> it more times?
>>>
>
> Does the machine only have about 700M memory? I also find something
> in the log file:
>
> Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB..
> lowmem_reserve[]: 0 674 674 674
> Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB
> present:690712kB ..
> lowmem_reserve[]: 0 0 0 0
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 179184 pages RAM ==> 179184 * 4 / 1024 = *700M*
> 6773 pages reserved
>
>
> Note that the free memory of DMA32(3172KB) is lower than min watermark,
> which means the global is under pressure now. What's more the swap is off,
> so the global oom is normal behavior.
>
>
> Thanks,
> Sha
On Tue 23-10-12 17:08:40, Qiang Gao wrote:
> this is just an example to show how to reproduce. actually,the first time I saw
> this situation was on a machine with 288G RAM with many tasks running and
> we limit 30G for each. but finanlly, no one exceeds this limit the the system
> oom.
Yes, but mentioning the memory controller then might be misleading... It
seems that the only factor in your load is the cpu controller.
And please stop top-posting. It makes the discussion messy.
--
Michal Hocko
SUSE Labs
On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> This process was moved to RT-priority queue when global oom-killer
> happened to boost the recovery of the system..
Who did that? oom killer doesn't boost the priority (scheduling class)
AFAIK.
> but it wasn't get properily dealt with. I still have no idea why where
> the problem is ..
Well your configuration says that there is no runtime reserved for the
group.
Please refer to Documentation/scheduler/sched-rt-group.txt for more
information.
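(As a sketch of what reserving runtime means in practice, using the mount
point from earlier in the thread; the 950000us value is only an example and
has to fit within the parent group's allocation:)

  # the period defaults to 1s; give the group most of it as RT budget
  cat /cgroup/cpu/worker/cpu.rt_period_us       # 1000000
  echo 950000 > /cgroup/cpu/worker/cpu.rt_runtime_us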
> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh <[email protected]> wrote:
> > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <[email protected]> wrote:
> >> information about the system is in the attach file "information.txt"
> >>
> >> I can not reproduce it in the upstream 3.6.0 kernel..
> >>
> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
> >>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
> >>>> I looked up nothing useful with google,so I'm here for help..
> >>>>
> >>>> when this happens: I use memcg to limit the memory use of a
> >>>> process,and when the memcg cgroup was out of memory,
> >>>> the process was oom-killed however,it cannot really complete the
> >>>> exiting. here is the some information
> >>>
> >>> How many tasks are in the group and what kind of memory do they use?
> >>> Is it possible that you were hit by the same issue as described in
> >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
> >>>
> >>>> OS version: centos6.2 2.6.32.220.7.1
> >>>
> >>> Your kernel is quite old and you should be probably asking your
> >>> distribution to help you out. There were many fixes since 2.6.32.
> >>> Are you able to reproduce the same issue with the current vanila kernel?
> >>>
> >>>> /proc/pid/stack
> >>>> ---------------------------------------------------------------
> >>>>
> >>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
> >>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
> >>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
> >>>> [<ffffffff8105b078>] mmput+0x58/0x110
> >>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
> >>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
> >>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
> >>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
> >>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
> >>>> [<ffffffff8100b281>] int_signal+0x12/0x17
> >>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>> This looks strange because this is just an exit part which shouldn't
> >>> deadlock or anything. Is this stack stable? Have you tried to take check
> >>> it more times?
> >
> > Looking at information.txt, I found something interesting
> >
> > rt_rq[0]:/1314
> > .rt_nr_running : 1
> > .rt_throttled : 1
> > .rt_time : 0.856656
> > .rt_runtime : 0.000000
> >
> >
> > cfs_rq[0]:/1314
> > .exec_clock : 8738.133429
> > .MIN_vruntime : 0.000001
> > .min_vruntime : 8739.371271
> > .max_vruntime : 0.000001
> > .spread : 0.000000
> > .spread0 : -9792.255554
> > .nr_spread_over : 1
> > .nr_running : 0
> > .load : 0
> > .load_avg : 7376.722880
> > .load_period : 7.203830
> > .load_contrib : 1023
> > .load_tg : 1023
> > .se->exec_start : 282004.715064
> > .se->vruntime : 18435.664560
> > .se->sum_exec_runtime : 8738.133429
> > .se->wait_start : 0.000000
> > .se->sleep_start : 0.000000
> > .se->block_start : 0.000000
> > .se->sleep_max : 0.000000
> > .se->block_max : 0.000000
> > .se->exec_max : 77.977054
> > .se->slice_max : 0.000000
> > .se->wait_max : 2.664779
> > .se->wait_sum : 29.970575
> > .se->wait_count : 102
> > .se->load.weight : 2
> >
> > So 1314 is a real time process and
> >
> > cpu.rt_period_us:
> > 1000000
> > ----------------------
> > cpu.rt_runtime_us:
> > 0
> >
> > When did tt move to being a Real Time process (hint: see nr_running
> > and nr_throttled)?
> >
> > Balbir
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Michal Hocko
SUSE Labs
On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
> On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> This process was moved to RT-priority queue when global oom-killer
>> happened to boost the recovery of the system..
>
> Who did that? oom killer doesn't boost the priority (scheduling class)
> AFAIK.
>
>> but it wasn't get properily dealt with. I still have no idea why where
>> the problem is ..
>
> Well your configuration says that there is no runtime reserved for the
> group.
> Please refer to Documentation/scheduler/sched-rt-group.txt for more
> information.
>
>> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh <[email protected]> wrote:
>> > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <[email protected]> wrote:
>> >> information about the system is in the attach file "information.txt"
>> >>
>> >> I can not reproduce it in the upstream 3.6.0 kernel..
>> >>
>> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <[email protected]> wrote:
>> >>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>> >>>> I looked up nothing useful with google,so I'm here for help..
>> >>>>
>> >>>> when this happens: I use memcg to limit the memory use of a
>> >>>> process,and when the memcg cgroup was out of memory,
>> >>>> the process was oom-killed however,it cannot really complete the
>> >>>> exiting. here is the some information
>> >>>
>> >>> How many tasks are in the group and what kind of memory do they use?
>> >>> Is it possible that you were hit by the same issue as described in
>> >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>> >>>
>> >>>> OS version: centos6.2 2.6.32.220.7.1
>> >>>
>> >>> Your kernel is quite old and you should be probably asking your
>> >>> distribution to help you out. There were many fixes since 2.6.32.
>> >>> Are you able to reproduce the same issue with the current vanila kernel?
>> >>>
>> >>>> /proc/pid/stack
>> >>>> ---------------------------------------------------------------
>> >>>>
>> >>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
>> >>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
>> >>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
>> >>>> [<ffffffff8105b078>] mmput+0x58/0x110
>> >>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
>> >>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
>> >>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
>> >>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
>> >>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
>> >>>> [<ffffffff8100b281>] int_signal+0x12/0x17
>> >>>> [<ffffffffffffffff>] 0xffffffffffffffff
>> >>>
>> >>> This looks strange because this is just an exit part which shouldn't
>> >>> deadlock or anything. Is this stack stable? Have you tried to take check
>> >>> it more times?
>> >
>> > Looking at information.txt, I found something interesting
>> >
>> > rt_rq[0]:/1314
>> > .rt_nr_running : 1
>> > .rt_throttled : 1
>> > .rt_time : 0.856656
>> > .rt_runtime : 0.000000
>> >
>> >
>> > cfs_rq[0]:/1314
>> > .exec_clock : 8738.133429
>> > .MIN_vruntime : 0.000001
>> > .min_vruntime : 8739.371271
>> > .max_vruntime : 0.000001
>> > .spread : 0.000000
>> > .spread0 : -9792.255554
>> > .nr_spread_over : 1
>> > .nr_running : 0
>> > .load : 0
>> > .load_avg : 7376.722880
>> > .load_period : 7.203830
>> > .load_contrib : 1023
>> > .load_tg : 1023
>> > .se->exec_start : 282004.715064
>> > .se->vruntime : 18435.664560
>> > .se->sum_exec_runtime : 8738.133429
>> > .se->wait_start : 0.000000
>> > .se->sleep_start : 0.000000
>> > .se->block_start : 0.000000
>> > .se->sleep_max : 0.000000
>> > .se->block_max : 0.000000
>> > .se->exec_max : 77.977054
>> > .se->slice_max : 0.000000
>> > .se->wait_max : 2.664779
>> > .se->wait_sum : 29.970575
>> > .se->wait_count : 102
>> > .se->load.weight : 2
>> >
>> > So 1314 is a real time process and
>> >
>> > cpu.rt_period_us:
>> > 1000000
>> > ----------------------
>> > cpu.rt_runtime_us:
>> > 0
>> >
>> > When did tt move to being a Real Time process (hint: see nr_running
>> > and nr_throttled)?
>> >
>> > Balbir
>> --
>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Michal Hocko
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Maybe this is not an upstream-kernel bug. The CentOS/Red Hat kernel
boosts the process to RT priority when the process is selected
by the oom-killer.
I think I should report this to Red Hat/CentOS. Thanks for your attention.
On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >> This process was moved to RT-priority queue when global oom-killer
> >> happened to boost the recovery of the system..
> >
> > Who did that? oom killer doesn't boost the priority (scheduling class)
> > AFAIK.
> >
> >> but it wasn't get properily dealt with. I still have no idea why where
> >> the problem is ..
> >
> > Well your configuration says that there is no runtime reserved for the
> > group.
> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> > information.
> >
[...]
> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> would boost the process to RT prio when the process was selected
> by oom-killer.
This still looks like your cpu controller is misconfigured. Even if the
task is promoted to be realtime.
--
Michal Hocko
SUSE Labs
On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko <[email protected]> wrote:
> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >> This process was moved to RT-priority queue when global oom-killer
>> >> happened to boost the recovery of the system..
>> >
>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> > AFAIK.
>> >
>> >> but it wasn't get properily dealt with. I still have no idea why where
>> >> the problem is ..
>> >
>> > Well your configuration says that there is no runtime reserved for the
>> > group.
>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> > information.
>> >
> [...]
>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>> would boost the process to RT prio when the process was selected
>> by oom-killer.
>
> This still looks like your cpu controller is misconfigured. Even if the
> task is promoted to be realtime.
Precisely! You need to have RT bandwidth enabled for RT tasks to run.
As a workaround, please give the groups some RT bandwidth, and then work
out the migration to RT and what the defaults should be on the distro.
Balbir
On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh <[email protected]> wrote:
> On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko <[email protected]> wrote:
>> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
>>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>>> >> This process was moved to RT-priority queue when global oom-killer
>>> >> happened to boost the recovery of the system..
>>> >
>>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>>> > AFAIK.
>>> >
>>> >> but it wasn't get properily dealt with. I still have no idea why where
>>> >> the problem is ..
>>> >
>>> > Well your configuration says that there is no runtime reserved for the
>>> > group.
>>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>>> > information.
>>> >
>> [...]
>>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>>> would boost the process to RT prio when the process was selected
>>> by oom-killer.
>>
>> This still looks like your cpu controller is misconfigured. Even if the
>> task is promoted to be realtime.
>
>
> Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> as a workaround please give the groups some RT bandwidth and then work
> out the migration to RT and what should be the defaults on the distro.
>
> Balbir
see https://patchwork.kernel.org/patch/719411/
On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh <[email protected]> wrote:
> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko <[email protected]> wrote:
> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >>> >> This process was moved to RT-priority queue when global oom-killer
> >>> >> happened to boost the recovery of the system..
> >>> >
> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
> >>> > AFAIK.
> >>> >
> >>> >> but it wasn't get properily dealt with. I still have no idea why where
> >>> >> the problem is ..
> >>> >
> >>> > Well your configuration says that there is no runtime reserved for the
> >>> > group.
> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> >>> > information.
> >>> >
> >> [...]
> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> >>> would boost the process to RT prio when the process was selected
> >>> by oom-killer.
> >>
> >> This still looks like your cpu controller is misconfigured. Even if the
> >> task is promoted to be realtime.
> >
> >
> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> > as a workaround please give the groups some RT bandwidth and then work
> > out the migration to RT and what should be the defaults on the distro.
> >
> > Balbir
>
>
> see https://patchwork.kernel.org/patch/719411/
The patch surely "fixes" your problem but the primary fault here is the
mis-configured cpu cgroup. If the value for the bandwidth is zero by
default then all realtime processes in the group a screwed. The value
should be set to something more reasonable.
I am not familiar with the cpu controller but it seems that
alloc_rt_sched_group needs some treat. Care to look into it and send a
patch to the cpu controller and cgroup maintainers, please?
--
Michal Hocko
SUSE Labs
On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko <[email protected]> wrote:
> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh <[email protected]> wrote:
>> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko <[email protected]> wrote:
>> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
>> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >>> >> This process was moved to RT-priority queue when global oom-killer
>> >>> >> happened to boost the recovery of the system..
>> >>> >
>> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> >>> > AFAIK.
>> >>> >
>> >>> >> but it wasn't get properily dealt with. I still have no idea why where
>> >>> >> the problem is ..
>> >>> >
>> >>> > Well your configuration says that there is no runtime reserved for the
>> >>> > group.
>> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> >>> > information.
>> >>> >
>> >> [...]
>> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
>> >>> would boost the process to RT prio when the process was selected
>> >>> by oom-killer.
>> >>
>> >> This still looks like your cpu controller is misconfigured. Even if the
>> >> task is promoted to be realtime.
>> >
>> >
>> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
>> > as a workaround please give the groups some RT bandwidth and then work
>> > out the migration to RT and what should be the defaults on the distro.
>> >
>> > Balbir
>>
>>
>> see https://patchwork.kernel.org/patch/719411/
>
> The patch surely "fixes" your problem but the primary fault here is the
> mis-configured cpu cgroup. If the value for the bandwidth is zero by
> default then all realtime processes in the group a screwed. The value
> should be set to something more reasonable.
> I am not familiar with the cpu controller but it seems that
> alloc_rt_sched_group needs some treat. Care to look into it and send a
> patch to the cpu controller and cgroup maintainers, please?
>
> --
> Michal Hocko
> SUSE Labs
I'm trying to fix the problem, but there is no substantive progress yet.
On Fri, 2012-10-26 at 10:42 +0800, Qiang Gao wrote:
> On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko <[email protected]> wrote:
> > On Wed 24-10-12 11:44:17, Qiang Gao wrote:
> >> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh <[email protected]> wrote:
> >> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko <[email protected]> wrote:
> >> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
> >> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko <[email protected]> wrote:
> >> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> >> >>> >> This process was moved to RT-priority queue when global oom-killer
> >> >>> >> happened to boost the recovery of the system..
> >> >>> >
> >> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
> >> >>> > AFAIK.
> >> >>> >
> >> >>> >> but it wasn't get properily dealt with. I still have no idea why where
> >> >>> >> the problem is ..
> >> >>> >
> >> >>> > Well your configuration says that there is no runtime reserved for the
> >> >>> > group.
> >> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
> >> >>> > information.
> >> >>> >
> >> >> [...]
> >> >>> maybe this is not a upstream-kernel bug. the centos/redhat kernel
> >> >>> would boost the process to RT prio when the process was selected
> >> >>> by oom-killer.
> >> >>
> >> >> This still looks like your cpu controller is misconfigured. Even if the
> >> >> task is promoted to be realtime.
> >> >
> >> >
> >> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
> >> > as a workaround please give the groups some RT bandwidth and then work
> >> > out the migration to RT and what should be the defaults on the distro.
> >> >
> >> > Balbir
> >>
> >>
> >> see https://patchwork.kernel.org/patch/719411/
> >
> > The patch surely "fixes" your problem but the primary fault here is the
> > mis-configured cpu cgroup. If the value for the bandwidth is zero by
> > default then all realtime processes in the group a screwed. The value
> > should be set to something more reasonable.
> > I am not familiar with the cpu controller but it seems that
> > alloc_rt_sched_group needs some treat. Care to look into it and send a
> > patch to the cpu controller and cgroup maintainers, please?
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> I'm trying to fix the problem. but no substantive progress yet.
The throttle tracks a finite resource for an arbitrary number of groups,
so there's no sane rt_runtime default other than zero.
Most folks only want the top-level throttle warm fuzzy, so a complete
runtime RT_GROUP_SCHED on/off switch defaulting to off, i.e. rt tasks
cannot be moved until it is switched on, would fix some annoying "Oopsie, I
forgot" allocation troubles. If you turn it on, shame on you if you
fail to allocate; you asked for it, and you're not just stuck with it
because your distro enabled it in their config.
Or, perhaps just make zero rt_runtime always mean traverse up to the first
non-zero rt_runtime, i.e. zero-allocation children may consume parental
runtime as they see fit on a first-come-first-served basis; when it's
gone, tough, parent/children all wait for a refill.
Or whatever, as long as you don't bust distribution/tracking for those
crazy people who intentionally use RT_GROUP_SCHED ;-)
The bug is in the patch that used sched_setscheduler_nocheck(). Plain
sched_setscheduler() would have replied -EGOAWAY.
-Mike
On Fri, 2012-10-26 at 10:03 -0700, Mike Galbraith wrote:
> The bug is in the patch that used sched_setscheduler_nocheck(). Plain
> sched_setscheduler() would have replied -EGOAWAY.
sched_setscheduler_nocheck() should say go away too methinks. This
isn't about permissions, it's about not being stupid in general.
sched: fix __sched_setscheduler() RT_GROUP_SCHED conditionals
Remove user and rt_bandwidth_enabled() RT_GROUP_SCHED conditionals in
__sched_setscheduler(). The end result of kernel OR user promoting a
task in a group with zero rt_runtime allocated is the same bad thing,
and the throttle switch position matters little. It's safer to just say
no based solely upon bandwidth existence; it may save the user a nasty
surprise if he later flips the throttle switch to 'on'.
The commit below came about due to sched_setscheduler_nocheck()
allowing a task in a task group with zero rt_runtime allocated to
be promoted by the kernel oom logic, thus marooning it forever.
<quote>
commit 341aea2bc48bf652777fb015cc2b3dfa9a451817
Author: KOSAKI Motohiro <[email protected]>
Date: Thu Apr 14 15:22:13 2011 -0700
oom-kill: remove boost_dying_task_prio()
This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").
That commit dramatically improved oom killer logic when a fork-bomb
occurs. But I've found that it has nasty corner case. Now cpu cgroup has
strange default RT runtime. It's 0! That said, if a process under cpu
cgroup promote RT scheduling class, the process never run at all.
</quote>
Signed-off-by: Mike Galbraith <[email protected]>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..d3a35f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3810,17 +3810,14 @@ recheck:
}
#ifdef CONFIG_RT_GROUP_SCHED
- if (user) {
- /*
- * Do not allow realtime tasks into groups that have no runtime
- * assigned.
- */
- if (rt_bandwidth_enabled() && rt_policy(policy) &&
- task_group(p)->rt_bandwidth.rt_runtime == 0 &&
- !task_group_is_autogroup(task_group(p))) {
- task_rq_unlock(rq, p, &flags);
- return -EPERM;
- }
+ /*
+ * Do not allow realtime tasks into groups that have no runtime
+ * assigned.
+ */
+ if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0 &&
+ !task_group_is_autogroup(task_group(p))) {
+ task_rq_unlock(rq, p, &flags);
+ return -EPERM;
}
#endif