Date: Mon, 14 May 2007 13:11:54 +0530
From: Dipankar Sarma <dipankar@in.ibm.com>
To: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
       Gautham R Shenoy <ego@in.ibm.com>, "Rafael J. Wysocki" <rjw@sisk.pl>,
       Pavel Machek <pavel@ucw.cz>, Oleg Nesterov <oleg@tv-sign.ru>,
       Andrew Morton <akpm@linux-foundation.org>,
       LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
       Paul E McKenney <paulmck@us.ibm.com>
Subject: Re: [RFD] Freezing of kernel threads
Message-ID: <20070514074154.GA28827@in.ibm.com>
Reply-To: dipankar@in.ibm.com
References: <200705122017.32792.rjw@sisk.pl> <20070512193609.GA31426@in.ibm.com> <alpine.LFD.0.98.0705121241300.3986@woody.linux-foundation.org> <20070513083357.GA12985@in.ibm.com> <alpine.LFD.0.98.0705130925290.6739@woody.linux-foundation.org> <20070514061846.GA30625@in.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070514061846.GA30625@in.ibm.com>
User-Agent: Mutt/1.5.11
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2461
Lines: 57

On Mon, May 14, 2007 at 11:48:46AM +0530, Srivatsa Vaddagiri wrote:
> On Sun, May 13, 2007 at 09:33:41AM -0700, Linus Torvalds wrote:
> > On Sun, 13 May 2007, Gautham R Shenoy wrote:
> > > RFC #1: Use get_hot_cpus()- put_hot_cpus() , which follow the
> > > well known refcounting model.
> > 
> > Yes. And usign the "preempt count" as a refcount is fairly natural, no? 
> > We do already depend on that in many code-paths.
> 
> 
> Tackling that requires some state bit in task_struct to educate
> read_lock to be a no-op if write lock is already held by the thread.
> 
> In summary, get/put_hot_cpu() will need to be (slightly) more complex than
> something like get/put_cpu(). Perhaps this complexity was what put off
> Andrew when he suggested the use of freezer (http://lkml.org/lkml/2006/11/1/400)

Atleast with get_hot_cpu()/put_hot_cpu(), complexity is limited
to the implementation of those interfaces and not all over the
kernel. Having worked on CPU hotplug in the past, I have no doubt
that this is the most simple *locking model* for cpu
hotplug events.


> > For example, since all users of cpu_online_map should be pure *readers* 
> > (apart from a couple of cases that actually bring up a CPU), you can do 
> > things like
> > 
> > 	#define cpu_online_map check_cpu_online_map()
> > 
> > 	static inline cpumask_t check_cpu_online_map(void)
> > 	{
> > 		WARN_ON(!preempt_safe()); /* or whatever lock we decide on */
> > 		return __real_cpu_online_map;
> > 	}
> 
> I remember Rusty had a similar function to check for unsafe references
> to cpu_online_map way back when cpu hotplug was being developed. It will
> be a good idea to reintroduce that back.

One possibility is to have a generation counter for all cpu events
and save it in cpumask_t (or some debug version of it) every
time we snapshot the online cpu mask. We can then
put checks in places where we access them whether the generation
counter matches the current generation or not. If it doesn't
it would indicate that saved cpumask not being "updated" during
cpu hotplug events. In the cpu hotplug event handlers, we can
force all saved online cpumasks to be "updated". Such debug
code may help detect the locking violators.

Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/