Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754004AbbBNIf7 (ORCPT ); Sat, 14 Feb 2015 03:35:59 -0500 Received: from hofr.at ([212.69.189.236]:38842 "EHLO mail.hofr.at" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753243AbbBNIf6 (ORCPT ); Sat, 14 Feb 2015 03:35:58 -0500 Date: Sat, 14 Feb 2015 09:35:55 +0100 From: Nicholas Mc Guire To: Oleg Nesterov Cc: Davidlohr Bueso , paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org, waiman.long@hp.com, peterz@infradead.org, raghavendra.kt@linux.vnet.ibm.com Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9 Message-ID: <20150214083555.GA30176@opentech.at> References: <20150212003430.GA28656@linux.vnet.ibm.com> <1423710911.2046.50.camel@stgolabs.net> <20150212172805.GA20850@redhat.com> <20150212174144.GA21714@redhat.com> <20150212191009.GA26275@opentech.at> <20150212193734.GA28499@redhat.com> <20150212212746.GB30430@redhat.com> <20150213181752.GB11953@opentech.at> <20150213185328.GA19746@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150213185328.GA19746@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3004 Lines: 79 On Fri, 13 Feb 2015, Oleg Nesterov wrote: > On 02/13, Nicholas Mc Guire wrote: > > > > On Thu, 12 Feb 2015, Oleg Nesterov wrote: > > > > > Nicholas, sorry, I sent the patch but forgot to CC you. > > > See https://lkml.org/lkml/2015/2/12/587 > > > > > > And please note that "completion" was specially designed to guarantee > > > that complete() can't play with this memory after wait_for_completion/etc > > > returns. > > > > > > > hmmm.... I guess that "falling out of context" can happen in a number of cases > > with completion - any of the timeout/interruptible variants e.g: > > > > void xxx(void) > > { > > struct completion c; > > > > init_completion(&c); > > > > expose_this_completion(&c); > > > > wait_for_completion_timeout(&c,A_FEW_JIFFIES); > > } > > > > and if the other side did not call complete() within A_FEW_JIFFIES then > > it would result in the same failure - I don't think the API can prevent > > this type of bug. > > Yes sure, but in this case the user of wait_for_completion_timeout() should > blame itself, it is simply buggy. > > > Tt has to be ensured by additional locking > > Yes, but > > > drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting > > the completion to NULL and testing for !NULL before calling complete() > > with appropriate locking protection access. > > I don't understand this code, I can be easily wrong. but at first glance it > doesn't need completion at all. Exactly because it relies on the additional > fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could > simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process(). > Nevermind, this is off-topic and most probably I misread this code. > this is unfortunately true for a few other places as well - so the problem of going out of scope with the _timeout/interruptible variants is quite general and there is no clean solution. you are right that its the users code that is buggy if the struct completion drops out of context - was jsut thinking if it were not a resonable extension of the competion API to eliminate that need to mess with locks to resolve this by adding a caccelation mechanism that would resolve this at the API level. Basically if you call wait_for_completion_timeout and the timeout condition occures you always need some way of notifying the completing end that it should no longer call complete()/complete_all(). > > Never the less of course the proposed change in completion_done() was a bug - > > many thanks for catching that so quickly ! > > OK, perhaps you can ack the fix I sent? > the only question I still have is that there would be no matching smp_wmb() to the smp_rmb() you are using (atleast I did not figure out where). thx! hofrat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/