Date: Mon, 14 May 2007 13:11:54 +0530
From: Dipankar Sarma
To: Srivatsa Vaddagiri
Cc: Linus Torvalds, Gautham R Shenoy, "Rafael J. Wysocki", Pavel Machek, Oleg Nesterov, Andrew Morton, LKML, Ingo Molnar, Paul E McKenney
Subject: Re: [RFD] Freezing of kernel threads
Message-ID: <20070514074154.GA28827@in.ibm.com>
Reply-To: dipankar@in.ibm.com
References: <200705122017.32792.rjw@sisk.pl> <20070512193609.GA31426@in.ibm.com> <20070513083357.GA12985@in.ibm.com> <20070514061846.GA30625@in.ibm.com>
In-Reply-To: <20070514061846.GA30625@in.ibm.com>

On Mon, May 14, 2007 at 11:48:46AM +0530, Srivatsa Vaddagiri wrote:
> On Sun, May 13, 2007 at 09:33:41AM -0700, Linus Torvalds wrote:
> > On Sun, 13 May 2007, Gautham R Shenoy wrote:
> > > RFC #1: Use get_hot_cpus()- put_hot_cpus() , which follow the
> > > well known refcounting model.
> >
> > Yes. And using the "preempt count" as a refcount is fairly natural, no?
> > We do already depend on that in many code-paths.
> >
> Tackling that requires some state bit in task_struct to educate
> read_lock to be a no-op if write lock is already held by the thread.
>
> In summary, get/put_hot_cpu() will need to be (slightly) more complex than
> something like get/put_cpu().
> Perhaps this complexity was what put off
> Andrew when he suggested the use of freezer (http://lkml.org/lkml/2006/11/1/400)

At least with get_hot_cpu()/put_hot_cpu(), the complexity is limited to the
implementation of those interfaces and is not spread all over the kernel.
Having worked on CPU hotplug in the past, I have no doubt that this is the
simplest *locking model* for cpu hotplug events.

> > For example, since all users of cpu_online_map should be pure *readers*
> > (apart from a couple of cases that actually bring up a CPU), you can do
> > things like
> >
> > 	#define cpu_online_map check_cpu_online_map()
> >
> > 	static inline cpumask_t check_cpu_online_map(void)
> > 	{
> > 		WARN_ON(!preempt_safe()); /* or whatever lock we decide on */
> > 		return __real_cpu_online_map;
> > 	}
>
> I remember Rusty had a similar function to check for unsafe references
> to cpu_online_map way back when cpu hotplug was being developed. It will
> be a good idea to reintroduce that.

One possibility is to have a generation counter for all cpu hotplug events
and save it in cpumask_t (or some debug version of it) every time we
snapshot the online cpu mask. We can then put checks in the places where
we access such a snapshot to verify that its saved generation matches the
current generation. If it doesn't, that indicates the saved cpumask was
not "updated" during a cpu hotplug event. In the cpu hotplug event
handlers, we can force all saved online cpumasks to be "updated". Such
debug code may help detect the locking violators.

Thanks
Dipankar
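
To make the generation counter idea above a bit more concrete, here is a
rough userspace sketch. It is only an illustration under stated
assumptions: every name in it (struct debug_cpumask, cpu_hotplug_gen,
snapshot_online_cpus() and friends) is made up for this example and is
not an existing kernel interface, the mask is a plain unsigned long
standing in for cpumask_t, and no locking is shown.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug wrapper: a mask plus the generation it was taken at. */
struct debug_cpumask {
	unsigned long bits;	/* stand-in for cpumask_t */
	unsigned long gen;	/* hotplug generation at snapshot time */
};

static unsigned long online_mask = 0xfUL;	/* pretend CPUs 0-3 are online */
static unsigned long cpu_hotplug_gen;		/* bumped on every hotplug event */

/* Snapshot the online mask and remember the current generation. */
static struct debug_cpumask snapshot_online_cpus(void)
{
	struct debug_cpumask m = { .bits = online_mask, .gen = cpu_hotplug_gen };
	return m;
}

/* True if no hotplug event has happened since the snapshot was taken. */
static bool debug_cpumask_is_current(const struct debug_cpumask *m)
{
	return m->gen == cpu_hotplug_gen;
}

/* Checked accessor: complain when a stale snapshot is being used. */
static unsigned long debug_cpumask_bits(const struct debug_cpumask *m)
{
	if (!debug_cpumask_is_current(m))
		fprintf(stderr, "stale cpumask: snapshot gen %lu, current gen %lu\n",
			m->gen, cpu_hotplug_gen);
	return m->bits;
}

/* Simulated hotplug event: take a CPU offline and bump the generation. */
static void cpu_down_event(int cpu)
{
	online_mask &= ~(1UL << cpu);
	cpu_hotplug_gen++;
}

/* What a hotplug event handler would do to a saved mask it owns. */
static void debug_cpumask_update(struct debug_cpumask *m)
{
	*m = snapshot_online_cpus();
}
```

A caller that snapshots the mask, sees a simulated cpu_down_event(), and
then reads the snapshot through debug_cpumask_bits() gets the warning;
a caller whose handler ran debug_cpumask_update() does not. A real
version would presumably need the counter update and the mask update to
be ordered properly with respect to readers, which this sketch ignores.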