In-Reply-To: <20121025095719.GA11105@dhcp22.suse.cz>
References: <20121019160425.GA10175@dhcp22.suse.cz> <20121023095028.GD15397@dhcp22.suse.cz> <20121023101500.GE15397@dhcp22.suse.cz> <20121025095719.GA11105@dhcp22.suse.cz>
Date: Fri, 26 Oct 2012 10:42:37 +0800
Subject: Re: process hangs on do_exit when oom happens
From: Qiang Gao
To: Michal Hocko
Cc: Balbir Singh, "linux-kernel@vger.kernel.org", "linux-mmc@vger.kernel.org", "cgroups@vger.kernel.org", linux-mm@kvack.org

On Thu, Oct 25, 2012 at 5:57 PM, Michal Hocko wrote:
> On Wed 24-10-12 11:44:17, Qiang Gao wrote:
>> On Wed, Oct 24, 2012 at 1:43 AM, Balbir Singh wrote:
>> > On Tue, Oct 23, 2012 at 3:45 PM, Michal Hocko wrote:
>> >> On Tue 23-10-12 18:10:33, Qiang Gao wrote:
>> >>> On Tue, Oct 23, 2012 at 5:50 PM, Michal Hocko wrote:
>> >>> > On Tue 23-10-12 15:18:48, Qiang Gao wrote:
>> >>> >> This process was moved to RT-priority queue when global oom-killer
>> >>> >> happened to boost the recovery of the system..
>> >>> >
>> >>> > Who did that? oom killer doesn't boost the priority (scheduling class)
>> >>> > AFAIK.
>> >>> >
>> >>> >> but it wasn't properly dealt with. I still have no idea where
>> >>> >> the problem is ..
>> >>> >
>> >>> > Well your configuration says that there is no runtime reserved for the
>> >>> > group.
>> >>> > Please refer to Documentation/scheduler/sched-rt-group.txt for more
>> >>> > information.
>> >>> >
>> >> [...]
>> >>> maybe this is not an upstream-kernel bug. the centos/redhat kernel
>> >>> would boost the process to RT prio when the process was selected
>> >>> by oom-killer.
>> >>
>> >> This still looks like your cpu controller is misconfigured. Even if the
>> >> task is promoted to be realtime.
>> >
>> >
>> > Precisely! You need to have rt bandwidth enabled for RT tasks to run,
>> > as a workaround please give the groups some RT bandwidth and then work
>> > out the migration to RT and what should be the defaults on the distro.
>> >
>> > Balbir
>>
>>
>> see https://patchwork.kernel.org/patch/719411/
>
> The patch surely "fixes" your problem but the primary fault here is the
> mis-configured cpu cgroup. If the value for the bandwidth is zero by
> default then all realtime processes in the group are screwed. The value
> should be set to something more reasonable.
> I am not familiar with the cpu controller but it seems that
> alloc_rt_sched_group needs some treat. Care to look into it and send a
> patch to the cpu controller and cgroup maintainers, please?
>
> --
> Michal Hocko
> SUSE Labs

I'm trying to fix the problem, but there is no substantive progress yet.
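
For reference, the workaround suggested above (giving the group some RT bandwidth) can be sketched roughly as follows. This is only a minimal sketch, assuming a cgroup v1 cpu controller built with CONFIG_RT_GROUP_SCHED, where each group directory exposes cpu.rt_runtime_us and cpu.rt_period_us as described in Documentation/scheduler/sched-rt-group.txt. The mount point, the group name "mygroup", the 100000 us value, and the helper names are all hypothetical and not taken from this thread.

```python
import os

def read_int(path):
    """Read a single integer value from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())

def rt_bandwidth(group_dir):
    """Return (runtime_us, period_us) for a cpu cgroup directory."""
    runtime = read_int(os.path.join(group_dir, "cpu.rt_runtime_us"))
    period = read_int(os.path.join(group_dir, "cpu.rt_period_us"))
    return runtime, period

def rt_tasks_can_run(group_dir):
    """A runtime of zero means RT tasks in this group get no CPU time."""
    runtime, _ = rt_bandwidth(group_dir)
    return runtime != 0

def grant_rt_runtime(group_dir, runtime_us):
    """Give the group some RT runtime per period (needs root on a real system)."""
    with open(os.path.join(group_dir, "cpu.rt_runtime_us"), "w") as f:
        f.write(str(runtime_us))

if __name__ == "__main__":
    # Hypothetical path: adjust to wherever the cpu controller is mounted.
    group = "/sys/fs/cgroup/cpu/mygroup"
    if os.path.isdir(group):
        print("before:", rt_bandwidth(group), "RT runnable:", rt_tasks_can_run(group))
        if not rt_tasks_can_run(group):
            grant_rt_runtime(group, 100000)  # illustrative value only
            print("after:", rt_bandwidth(group))
```

On a real system the kernel may reject the write if the parent group does not have enough RT runtime left to hand out, so the value has to fit within what the hierarchy allows.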