Date: Tue, 23 Oct 2012 10:10:42 +0530
Subject: Re: process hangs on do_exit when oom happens
From: Balbir Singh
To: Qiang Gao
Cc: Michal Hocko, "linux-kernel@vger.kernel.org", "linux-mmc@vger.kernel.org", "cgroups@vger.kernel.org", linux-mm@kvack.org
References: <20121019160425.GA10175@dhcp22.suse.cz>

On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao wrote:
> information about the system is in the attached file "information.txt"
>
> I cannot reproduce it in the upstream 3.6.0 kernel..
>
> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>> I looked up nothing useful with google, so I'm here for help..
>>>
>>> when this happens: I use memcg to limit the memory use of a
>>> process, and when the memcg cgroup was out of memory,
>>> the process was oom-killed; however, it cannot really complete the
>>> exiting. here is some information
>>
>> How many tasks are in the group and what kind of memory do they use?
>> Is it possible that you were hit by the same issue as described in
>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter.
>>
>>> OS version: centos6.2 2.6.32.220.7.1
>>
>> Your kernel is quite old and you should probably be asking your
>> distribution to help you out. There were many fixes since 2.6.32.
>> Are you able to reproduce the same issue with the current vanilla kernel?
>>
>>> /proc/pid/stack
>>> ---------------------------------------------------------------
>>>
>>> [] __cond_resched+0x2a/0x40
>>> [] unmap_vmas+0xb49/0xb70
>>> [] exit_mmap+0x7e/0x140
>>> [] mmput+0x58/0x110
>>> [] exit_mm+0x11d/0x160
>>> [] do_exit+0x1ad/0x860
>>> [] do_group_exit+0x41/0xb0
>>> [] get_signal_to_deliver+0x1e8/0x430
>>> [] do_notify_resume+0xf4/0x8b0
>>> [] int_signal+0x12/0x17
>>> [] 0xffffffffffffffff
>>
>> This looks strange because this is just an exit part which shouldn't
>> deadlock or anything. Is this stack stable? Have you tried to check
>> it more times?

Looking at information.txt, I found something interesting

rt_rq[0]:/1314
  .rt_nr_running                 : 1
  .rt_throttled                  : 1
  .rt_time                       : 0.856656
  .rt_runtime                    : 0.000000

cfs_rq[0]:/1314
  .exec_clock                    : 8738.133429
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 8739.371271
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -9792.255554
  .nr_spread_over                : 1
  .nr_running                    : 0
  .load                          : 0
  .load_avg                      : 7376.722880
  .load_period                   : 7.203830
  .load_contrib                  : 1023
  .load_tg                       : 1023
  .se->exec_start                : 282004.715064
  .se->vruntime                  : 18435.664560
  .se->sum_exec_runtime          : 8738.133429
  .se->wait_start                : 0.000000
  .se->sleep_start               : 0.000000
  .se->block_start               : 0.000000
  .se->sleep_max                 : 0.000000
  .se->block_max                 : 0.000000
  .se->exec_max                  : 77.977054
  .se->slice_max                 : 0.000000
  .se->wait_max                  : 2.664779
  .se->wait_sum                  : 29.970575
  .se->wait_count                : 102
  .se->load.weight               : 2

So 1314 is a real-time process, and its group is throttled with no
RT runtime:

cpu.rt_period_us:
1000000
----------------------
cpu.rt_runtime_us:
0

When did it move to being a Real Time process (hint: see
rt_nr_running and rt_throttled)?

Balbir
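
The check done by eye on the rt_rq section above can be sketched as a small script. This is only an illustration, not a kernel tool: the function name starved_rt_groups and the regular expressions are my own, and it assumes the dump uses the "rt_rq[cpu]:/group" header and ".field : value" layout shown in information.txt. It flags any group that has a runnable RT task, is throttled, and has a zero RT runtime budget, which is the combination that would keep an exiting task from ever getting CPU time.

```python
import re

# Header of a per-group RT runqueue section, e.g. "rt_rq[0]:/1314".
RT_HEADER = re.compile(r"\brt_rq\[(\d+)\]:(\S+)")
# Any runqueue header; used to find where an rt_rq section ends.
ANY_HEADER = re.compile(r"\b(?:rt_rq|cfs_rq)\[\d+\]:\S+")
# A ".name : number" field as printed in the dump.
FIELD = re.compile(r"\.(\w+)\s*:\s*(-?\d+(?:\.\d+)?)")


def starved_rt_groups(sched_debug_text):
    """Return (cpu, group) pairs whose RT runqueue has runnable tasks,
    is throttled, and has an RT runtime budget of zero.

    Hypothetical helper: the layout of the input is assumed to match
    the excerpt quoted in this thread.
    """
    starved = []
    for match in RT_HEADER.finditer(sched_debug_text):
        # The section runs until the next rt_rq/cfs_rq header (or end of text).
        nxt = ANY_HEADER.search(sched_debug_text, match.end())
        end = nxt.start() if nxt else len(sched_debug_text)
        section = sched_debug_text[match.end():end]
        fields = {name: float(value) for name, value in FIELD.findall(section)}
        if (fields.get("rt_nr_running", 0) > 0
                and fields.get("rt_throttled", 0) == 1
                and fields.get("rt_runtime") == 0):
            starved.append((int(match.group(1)), match.group(2)))
    return starved


# The rt_rq excerpt quoted in this thread, followed by the start of cfs_rq.
SAMPLE = """
rt_rq[0]:/1314
  .rt_nr_running                 : 1
  .rt_throttled                  : 1
  .rt_time                       : 0.856656
  .rt_runtime                    : 0.000000

cfs_rq[0]:/1314
  .exec_clock                    : 8738.133429
  .nr_running                    : 0
"""

if __name__ == "__main__":
    print(starved_rt_groups(SAMPLE))
```

Run against the excerpt, this should report group /1314 on CPU 0. A group with a non-zero rt_runtime, or with no runnable RT task, is not reported.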