Date: Tue, 23 Oct 2012 17:10:06 +0800
Subject: Re: process hangs on do_exit when oom happens
From: Qiang Gao
To: Sha Zhengju
Cc: Michal Hocko, "linux-kernel@vger.kernel.org", "linux-mmc@vger.kernel.org", "cgroups@vger.kernel.org", linux-mm@kvack.org, bsingharora@gmail.com
In-Reply-To: <50865D06.5090605@gmail.com>
References: <20121019160425.GA10175@dhcp22.suse.cz> <50865D06.5090605@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A global OOM is the right thing to do, but an OOM-killed process hanging in
do_exit is not normal behavior.

On Tue, Oct 23, 2012 at 5:01 PM, Sha Zhengju wrote:
> On 10/23/2012 11:35 AM, Qiang Gao wrote:
>>
>> Information about the system is in the attached file "information.txt".
>>
>> I cannot reproduce it on the upstream 3.6.0 kernel.
>>
>> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko wrote:
>>>
>>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
>>>>
>>>> I found nothing useful with Google, so I'm here for help.
>>>>
>>>> When this happens: I use memcg to limit the memory use of a
>>>> process, and when the memcg cgroup ran out of memory,
>>>> the process was OOM-killed. However, it cannot really complete
>>>> exiting. Here is some information:
>>>
>>> How many tasks are in the group and what kind of memory do they use?
>>> Is it possible that you were hit by the same issue as described in
>>> 79dfdacc ("memcg: make oom_lock 0 and 1 based rather than counter")?
>>>
>>>> OS version: CentOS 6.2, 2.6.32.220.7.1
>>>
>>> Your kernel is quite old and you should probably be asking your
>>> distribution to help you out. There have been many fixes since 2.6.32.
>>> Are you able to reproduce the same issue with the current vanilla kernel?
>>>
>>>> /proc/pid/stack
>>>> ---------------------------------------------------------------
>>>>
>>>> [] __cond_resched+0x2a/0x40
>>>> [] unmap_vmas+0xb49/0xb70
>>>> [] exit_mmap+0x7e/0x140
>>>> [] mmput+0x58/0x110
>>>> [] exit_mm+0x11d/0x160
>>>> [] do_exit+0x1ad/0x860
>>>> [] do_group_exit+0x41/0xb0
>>>> [] get_signal_to_deliver+0x1e8/0x430
>>>> [] do_notify_resume+0xf4/0x8b0
>>>> [] int_signal+0x12/0x17
>>>> [] 0xffffffffffffffff
>>>
>>> This looks strange because this is just the exit path, which shouldn't
>>> deadlock or anything. Is this stack stable? Have you tried checking
>>> it more than once?
>>>
>
> Does the machine have only about 700M of memory? I also found something
> in the log file:
>
> Node 0 DMA free:2772kB min:72kB low:88kB high:108kB present:15312kB ..
> lowmem_reserve[]: 0 674 674 674
> Node 0 DMA32 free:*3172kB* min:3284kB low:4104kB high:4924kB
> present:690712kB ..
> lowmem_reserve[]: 0 0 0 0
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 179184 pages RAM ==> 179184 * 4 / 1024 = *700M*
> 6773 pages reserved
>
>
> Note that the free memory of DMA32 (3172kB) is lower than the min
> watermark, which means the system as a whole is under memory pressure
> now. What's more, swap is off, so the global OOM is normal behavior.
>
>
> Thanks,
> Sha
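
The arithmetic in the quoted log excerpt can be checked with a short script.
This is only a sketch: it assumes 4 kB pages (as the "* 4" in the quoted
calculation implies), and the helper names pages_to_mib and
below_min_watermark are made up for illustration. The numbers are copied
from the log lines quoted above.

```python
PAGE_SIZE_KB = 4  # assumption: 4 kB pages, as in the quoted "179184 * 4 / 1024"


def pages_to_mib(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> float:
    """Convert a page count to MiB (hypothetical helper)."""
    return pages * page_size_kb / 1024


def below_min_watermark(free_kb: int, min_kb: int) -> bool:
    """True when a zone's free memory is under its min watermark."""
    return free_kb < min_kb


# Per-zone values taken from the quoted log excerpt.
zones = {
    "DMA": {"free_kb": 2772, "min_kb": 72},
    "DMA32": {"free_kb": 3172, "min_kb": 3284},
}

total_ram_mib = pages_to_mib(179184)
pressured = [
    name
    for name, zone in zones.items()
    if below_min_watermark(zone["free_kb"], zone["min_kb"])
]

print(f"total RAM: about {total_ram_mib:.0f} MiB")
print(f"zones below min watermark: {pressured}")
```

Only DMA32 comes out below its min watermark, which matches the reading that
the machine as a whole was short of memory, not just the memcg.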