Date: Fri, 24 Jul 2009 14:06:57 -0700
From: Matt Helsley
To: Benjamin Blum
Cc: Paul Menage, Matt Helsley, linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org, akpm@linux-foundation.org,
	serue@us.ibm.com, lizf@cn.fujitsu.com
Subject: Re: [PATCH 5/6] Makes procs file writable to move all threads by tgid at once
Message-ID: <20090724210657.GL5878@count0.beaverton.ibm.com>
In-Reply-To: <2f86c2480907241353l63818dfehb20c9d4918a3f069@mail.gmail.com>

On Fri, Jul 24, 2009 at 01:53:53PM -0700, Benjamin Blum wrote:
> On Fri, Jul 24, 2009 at 1:47 PM, Paul Menage wrote:
> > On Fri, Jul 24, 2009 at 10:23 AM, Matt Helsley wrote:
> >>
> >> Well, I imagine holding tasklist_lock is worse than cgroup_mutex in some
> >> ways since it's used even more widely. Makes sense not to use it here.
> >
> > Just to clarify - the new "procs" code doesn't use cgroup_mutex for
> > its critical section, it uses a new cgroup_fork_mutex, which is only
> > taken for write during cgroup_proc_attach() (after all setup has been
> > done, to ensure that no new threads are created while we're updating
> > all the existing threads). So in general there'll be zero contention
> > on this lock - the cost will be the cache misses due to the rwlock
> > bouncing between the different CPUs that are taking it in read mode.
>
> Right. The different options so far are:
>
> Global rwsem: only needs one lock, but prevents all forking when a
> write is in progress. It should be quick enough, if it's just "iterate
> down the threadgroup list in O(n)". In the good case, fork() slows
> down by a cache miss when taking the lock in read mode.

I noticed your point that only one process contends for the write side
of the new semaphore, since cgroup_mutex is also held there. However,
won't there be cacheline bouncing as lots of readers contend, not for
the read side of the lock itself, but for the cacheline needed to take
it?

> Threadgroup-local rwsem: Needs adding a field to task_struct. Only
> forks within the same threadgroup would block on a write to the procs
> file, and the zero-contention case is the same as before.

This seems like it would be better.
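For concreteness, a minimal sketch of what the threadgroup-local rwsem
could look like is below. It is only an illustration: the struct, the
helper names, and where the field lives are my assumptions, not code
from this patch series (Ben says task_struct; something shared by the
whole group, such as signal_struct, would presumably also work).

#include <linux/rwsem.h>

/*
 * Illustrative stand-in for wherever the per-threadgroup field would
 * really live.
 */
struct threadgroup_state {
	struct rw_semaphore fork_lock;
};

static inline void threadgroup_state_init(struct threadgroup_state *tg)
{
	init_rwsem(&tg->fork_lock);
}

/*
 * fork() path: read side, so concurrent forks never serialize against
 * each other and only block while their own group is being moved.
 */
static inline void threadgroup_fork_lock(struct threadgroup_state *tg)
{
	down_read(&tg->fork_lock);
}

static inline void threadgroup_fork_unlock(struct threadgroup_state *tg)
{
	up_read(&tg->fork_lock);
}

/*
 * "procs" writer path: write side, held only while every thread in
 * this one group is being attached, so other groups' forks are
 * unaffected.
 */
static inline void threadgroup_attach_lock(struct threadgroup_state *tg)
{
	down_write(&tg->fork_lock);
}

static inline void threadgroup_attach_unlock(struct threadgroup_state *tg)
{
	up_write(&tg->fork_lock);
}

With that shape, fork() takes the read side around the window that
races with the procs writer, cgroup_proc_attach() takes the write side
while it moves the group, and unrelated threadgroups never touch each
other's lock cacheline.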
> Using tasklist_lock: Currently, the call to cgroup_fork() (which
> starts the race) is very far above where tasklist_lock is taken in
> fork, so taking tasklist_lock earlier is very infeasible. Could
> cgroup_fork() be moved downwards to inside it, and if so, how much
> restructuring would be needed? Even if so, this still adds stuff that
> is being done (unnecessarily) while holding a global mutex.

Yup.

> > What happened to the big-reader lock concept from 2.4.x? That would be
> > applicable here - minimizing the overhead on the critical path when
> > the write operation is expected to be very rare.

Supplanted by RCU perhaps? *shrug*

Cheers,
	-Matt Helsley
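P.S. Re the big-reader question: a rough sketch of the 2.4-style brlock
idea using current primitives follows, purely for illustration (all
names are made up, and it spins rather than sleeps, so it is not
directly what the procs writer wants). The point it shows is why the
read side stays cheap: each CPU's readers only ever dirty their own
lock's cacheline, and the rare writer pays by taking every CPU's lock.

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

/* One rwlock per CPU. */
static DEFINE_PER_CPU(rwlock_t, brlock_slot);

static void brlock_example_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		rwlock_init(&per_cpu(brlock_slot, cpu));
}

/* Hot path (e.g. fork): touch only this CPU's lock. */
static int brlock_example_read_lock(void)
{
	int cpu = get_cpu();	/* pin to this CPU while we hold it */

	read_lock(&per_cpu(brlock_slot, cpu));
	return cpu;
}

static void brlock_example_read_unlock(int cpu)
{
	read_unlock(&per_cpu(brlock_slot, cpu));
	put_cpu();
}

/* Rare, slow path (e.g. the procs writer): take every CPU's lock. */
static void brlock_example_write_lock(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		write_lock(&per_cpu(brlock_slot, cpu));
}

static void brlock_example_write_unlock(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		write_unlock(&per_cpu(brlock_slot, cpu));
}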