Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756268AbYFXV1e (ORCPT ); Tue, 24 Jun 2008 17:27:34 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754515AbYFXV1Z (ORCPT ); Tue, 24 Jun 2008 17:27:25 -0400 Received: from smtp-out.google.com ([216.239.33.17]:63256 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753779AbYFXV1Y (ORCPT ); Tue, 24 Jun 2008 17:27:24 -0400 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=received:message-id:date:from:to:subject:cc:in-reply-to: mime-version:content-type:content-transfer-encoding: content-disposition:references; b=SWx0h/zgtJiuFWwkA/a8SW0cT3XC98uNJDb7z7Xz+5UHsPeXkWFn/cjRpcWhHVgxJ ZJCWQi0iQpnXKY8gS0tDA== Message-ID: <6599ad830806241427n4e174994oce21e93a034b7051@mail.gmail.com> Date: Tue, 24 Jun 2008 14:27:10 -0700 From: "Paul Menage" To: "Matt Helsley" Subject: Re: [patch 3/4] Container Freezer: Implement freezer cgroup subsystem Cc: "Rafael J. Wysocki" , "Pavel Machek" , Linux-Kernel , linux-pm@lists.linux-foundation.org, "Linux Containers" , "Cedric Le Goater" In-Reply-To: <20080624135824.420875709@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080624135823.417569216@us.ibm.com> <20080624135824.420875709@us.ibm.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4895 Lines: 142 On Tue, Jun 24, 2008 at 6:58 AM, Matt Helsley wrote: > From: Cedric Le Goater > Subject: [patch 3/4] Container Freezer: Implement freezer cgroup subsystem > > This patch implements a new freezer subsystem for Paul Menage's > control groups framework. You can s/Paul Menage's// now that it's in mainline. > +static const char *freezer_state_strs[] = { > + "RUNNING", > + "FREEZING", > + "FROZEN", > +}; > + > +/* Check and update whenever adding new freezer states. Currently is: > + strlen("FREEZING") */ > +#define STATE_MAX_STRLEN 8 > + That's a bit nasty ... But hopefully it could go away when the write_string() method is available in cgroups? (See my patchset from earlier this week). > + > +struct cgroup_subsys freezer_subsys; > + > +/* Locking and lock ordering: > + * > + * can_attach(), cgroup_frozen(): > + * rcu (task->cgroup, freezer->state) > + * > + * freezer_fork(): > + * rcu (task->cgroup, freezer->state) > + * freezer->lock > + * task_lock > + * sighand->siglock > + * > + * freezer_read(): > + * rcu (freezer->state) > + * freezer->lock (upgrade to write) > + * read_lock css_set_lock > + * > + * freezer_write() > + * cgroup_lock > + * rcu > + * freezer->lock > + * read_lock css_set_lock > + * task_lock > + * sighand->siglock > + * > + * freezer_create(), freezer_destroy(): > + * cgroup_lock [ by cgroup core ] > + */ > +static int freezer_can_attach(struct cgroup_subsys *ss, > + struct cgroup *new_cgroup, > + struct task_struct *task) > +{ > + struct freezer *freezer; > + int retval = 0; > + > + /* > + * The call to cgroup_lock() in the freezer.state write method prevents > + * a write to that file racing against an attach, and hence the > + * can_attach() result will remain valid until the attach completes. > + */ > + rcu_read_lock(); > + freezer = cgroup_freezer(new_cgroup); > + if (freezer->state == STATE_FROZEN) > + retval = -EBUSY; Is it meant to be OK to move a task into a cgroup that's currently in the FREEZING state but not yet fully frozen? > + struct freezer *freezer; > + > + rcu_read_lock(); /* needed to fetch task's cgroup > + can't use task_lock() here because > + freeze_task() grabs that */ I'm not sure that RCU is the right thing for this. All that the RCU lock will guarantee is that the freezer structure you get a pointer to doesn't go away. It doesn't guarantee that the task doesn't move cgroup, or that the cgroup doesn't get a freeze request via a write. But in this case, the fork callback is called before the task is added to the task_list/pidhash, or to its cgroups' linked lists. So it shouldn't be able to change groups. Racing against a concurrent write to the cgroup's freeze file may be more of an issue. Can you add a __freeze_task() that has to be called with task_lock(p) already held? > + freezer = task_freezer(task); Maybe BUG_ON(freezer->state == STATE_FROZEN) ? > + > +static ssize_t freezer_read(struct cgroup *cgroup, > + struct cftype *cft, > + struct file *file, char __user *buf, > + size_t nbytes, loff_t *ppos) > +{ > + struct freezer *freezer; > + enum freezer_state state; > + > + rcu_read_lock(); > + freezer = cgroup_freezer(cgroup); > + state = freezer->state; > + if (state == STATE_FREEZING) { > + /* We change from FREEZING to FROZEN lazily if the cgroup was > + * only partially frozen when we exitted write. */ > + spin_lock_irq(&freezer->lock); > + if (freezer_check_if_frozen(cgroup)) { > + freezer->state = STATE_FROZEN; > + state = STATE_FROZEN; > + } > + spin_unlock_irq(&freezer->lock); > + } > + rcu_read_unlock(); > + > + return simple_read_from_buffer(buf, nbytes, ppos, > + freezer_state_strs[state], > + strlen(freezer_state_strs[state])); > +} Technically this could return weird results if someone read it byte-by-byte and the status changed between reads. If you used read_seq_string rather than read you'd avoid that. > + return -EIO; > + > + cgroup_lock(); If you're taking cgroup_lock() here in freezer_write(), there's no need for the rcu_read_lock() in freezer_freeze() Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/