Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756049AbXKJXOA (ORCPT ); Sat, 10 Nov 2007 18:14:00 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755417AbXKJXNw (ORCPT ); Sat, 10 Nov 2007 18:13:52 -0500 Received: from e6.ny.us.ibm.com ([32.97.182.146]:54231 "EHLO e6.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755574AbXKJXNu (ORCPT ); Sat, 10 Nov 2007 18:13:50 -0500 Date: Sat, 10 Nov 2007 15:13:47 -0800 From: sukadev@us.ibm.com To: "Serge E. Hallyn" Cc: Srivatsa Vaddagiri , dhaval@linux.vnet.ibm.com, ckrm-tech@lists.sourceforge.net, efault@gmx.de, linux-kernel@vger.kernel.org, Dmitry Adamushko , Containers , Ingo Molnar Subject: Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y Message-ID: <20071110231347.GA446@us.ibm.com> References: <20071108234805.GA2240@us.ibm.com> <20071109070214.GO3143@linux.vnet.ibm.com> <20071109101437.GA22544@linux.vnet.ibm.com> <20071109160541.GA16523@sergelap.austin.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071109160541.GA16523@sergelap.austin.ibm.com> User-Agent: Mutt/1.4.2.2i X-Operating-System: Linux 2.0.32 on an i486 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2826 Lines: 70 Serge E. Hallyn [serue@us.ibm.com] wrote: | Quoting Srivatsa Vaddagiri (vatsa@linux.vnet.ibm.com): | > On Fri, Nov 09, 2007 at 09:45:21AM +0100, Dmitry Adamushko wrote: | > > Humm... the 'current' is not kept within the tree but | > > current->se.on_rq is supposed to be '1' , | > > so the old code looks ok to me (at least for the 'leaf' elements). | > | > You are damned right! Sorry my mistake with the previous analysis and | > (as I now find out) testing :( | > | > There are couple of problems discovered by Suka's test: | > | > - The test requires the cgroup filesystem to be mounted with | > atleast the cpu and ns options (i.e both namespace and cpu | > controllers are active in the same hierarchy). | > | > # mkdir /dev/cpuctl | > # mount -t cgroup -ocpu,ns none cpuctl | > (or simply) | > # mount -t cgroup none cpuctl -> Will activate all controllers | > in same hierarchy. | > | > - The test invokes clone() with CLONE_NEWNS set. This causes a a new child | > to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone-> | > cgroup_clone) and the child is attached to the new group (cgroup_clone-> | > attach_task->sched_move_task). At this point in time, the child's scheduler | > related fields are uninitialized (including its on_rq field, which it has | > inherited from parent). As a result sched_move_task thinks its on | > runqueue, when it isn't. | > | > As a solution to this problem, I moved sched_fork() call, which | > initializes scheduler related fields on a new task, before | > copy_namespaces(). I am not sure though whether moving up will | > cause other side-effects. Do you see any issue? | > | > - The second problem exposed by this test is that task_new_fair() | > assumes that parent and child will be part of the same group (which | > needn't be as this test shows). As a result, cfs_rq->curr can be NULL | > for the child. | > | > The solution is to test for curr pointer being NULL in | > task_new_fair(). | > | > | > With the patch below, I could run ns_exec() fine w/o a crash. | > | > Suka, can you verify whether this patch fixes your problem? | | Works on my machine. Thanks! And mine too. Thanks, | | > -- | > | > Fix copy_namespace() <-> sched_fork() dependency in do_fork, by moving | > up sched_fork(). | > | > Also introduce a NULL pointer check for 'curr' in task_new_fair(). | > | > Signed-off-by : Srivatsa Vaddagiri | | Tested-by: Serge Hallyn Tested-by: Sukadev Bhattiprolu - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/