Date: Mon, 20 Apr 2015 05:17:29 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar
Cc: Linus Torvalds, Guenter Roeck, Rabin Vincent,
    Linux Kernel Mailing List, Peter Zijlstra, Thomas Gleixner
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
Message-ID: <20150420121729.GN5561@linux.vnet.ibm.com>
In-Reply-To: <20150420053954.GA9923@gmail.com>

On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds wrote:
>
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent
> > > CPU? It's not like we don't know which CPU is up and down.
> >
> > I agree that the qemu-arm behavior smells like a bug, plain and
> > simple. Nobody sane should send random IPI's to CPU's that they
> > don't even know are up or not.
>
> Yeah, and a warning would have caught that bug a bit earlier, at the
> cost of restricting the API:
>
> > That said, I could imagine that we would have some valid case where
> > we just do a cross-cpu call to (for example) do lock wakeup, and the
> > code would use some optimistic algorithm that prods the CPU after
> > the lock has been released, and there could be some random race
> > where the lock data structure has already been released (ie imagine
> > the kind of optimistic unlocked racy access to "sem->owner" that we
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> >
> > So I _could_ imagine that somebody would want to do optimistic "prod
> > other cpu" calls that in all normal cases are for existing cpus, but
> > could be racy in theory.
>
> Yes, and I don't disagree with such optimizations in principle (it
> allows less references to be taken in the fast path), but is it really
> safe?
>
> If a CPU is going down and we potentially race against that, and send
> off an IPI, the IPI might be 'in flight' for an indeterminate amount
> of time, especially on wildly non-deterministic hardware like virtual
> platforms.
>
> So if the CPU goes down during that time, the typical way a CPU goes
> down is that it ignores all IPIs that arrive after that. End result:
> the sender of the IPI may hang indefinitely (for the synchronous API),
> waiting for the CSD lock to be released...
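To make that hang concrete: for the synchronous API, the sender spins
waiting for the target CPU to run the function and release the CSD lock,
along the lines of the simplified sketch below.  (The function name here
is made up for illustration; this is not the exact kernel/smp.c code.)

	/*
	 * Simplified sketch of the synchronous wait, not the exact
	 * kernel/smp.c code: the sender spins until the target CPU's
	 * IPI handler has run the requested function and cleared
	 * CSD_FLAG_LOCK.  If the target CPU has gone offline and
	 * ignores the IPI, nothing ever clears the flag, and the
	 * sender spins here forever.
	 */
	static void csd_lock_wait_sketch(struct call_single_data *csd)
	{
		while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
			cpu_relax();
	}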
>
> For the non-wait API we could also hang, but in an even more
> interesting way: we won't hang trying to send to the downed CPU, we'd
> hang on the _next_ cross-call call, possibly sending to another CPU,
> because the CSD_FLAG_LOCK of the sender CPU is never released by the
> offlined CPU which was supposed to unlock it.
>
> Also note the existing warning we already have in
> flush_smp_call_function_queue():
>
>         /* There shouldn't be any pending callbacks on an offline CPU. */
>         if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
>                      !warned && !llist_empty(head))) {
>                 warned = true;
>                 WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>
> Couldn't this trigger in the opportunistic, imprecise IPI case, if
> IRQs are ever enabled when or after the CPU is marked offline?
>
> My suggested warning would simply catch this kind of unsafe looking
> race a bit sooner, instead of silently ignoring the cross-call
> attempt.
>
> Now your suggestion could be made to work by:
>
>  - polling for CPU status in the CSD-lock code as well. (We don't do
>    this currently.)
>
>  - making sure that if a late IPI arrives to an already-down CPU it
>    does not attempt to execute CSD functions. (I.e. silence the
>    above, already existing warning.)
>
>  - auditing the CPU-down path to make sure it does not get surprised
>    by late external IPIs. (I think this should already be so, given
>    the existing warning.)
>
>  - IPIs can be pending indefinitely, so make sure a pending IPI won't
>    confuse the machinery after a CPU has been onlined again.
>
>  - pending IPIs may consume hardware resources when not received
>    properly. For example I think x86 will keep trying to resend it.
>    Not sure there's any timeout mechanism on the hardware level - the
>    sending APIC might even get stuck? Audit all hardware for this
>    kind of possibility.
>
> So unless I'm missing something, to me it looks like that the current
> code is only safe to be used on offline CPUs if the 'offline' CPU is
> never ever brought online in the first place.
>
> > It doesn't sound like the qemu-arm case is that kind of situation,
> > though. That one just sounds like a stupid "let's send an ipi to a
> > cpu whether it exists or not".
> >
> > But I committed it without any new warning, because I could in theory
> > see it as being a valid use.
>
> So my point is that we already have a 'late' warning, and I think it
> actively hid the qemu-arm bug, because the 'offline' CPU was so
> offline (due to being non-existent) that it never had a chance to
> print a warning.
>
> With an 'early' warning we could flush out such bugs (or code
> uncleanlinesses: clearly the ARM code was at least functional before)
> a bit sooner.
>
> But I don't have strong feelings about it, SMP cross-call users are a
> lot less frequent than say locking facility users, so the strength of
> debugging isn't nearly as important and it's fine to me either way!

In the long term, supporting "send an IPI to a CPU that might or might
not be leaving" will probably require an API that returns a success or
failure indication.  This would probably simplify things a bit, because
the caller currently has to participate in the IPI's synchronization.
With a conditional primitive, callers could instead simply rely on the
return value.

							Thanx, Paul
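PS: Such a conditional primitive might look roughly like the sketch
below.  The _cond() name and its exact semantics are made up for
illustration, not an existing kernel API, and the online check would
still need to be made safe against a concurrent offline (for example by
having the CPU-down path flush or fail any pending CSDs).

	/*
	 * Sketch only, not an existing kernel API: a conditional variant
	 * of the async cross-call.  Rather than requiring the caller to
	 * know whether the target CPU might be going away, report the
	 * situation via the return value: 0 if the cross-call was handed
	 * off, -ENXIO if the target CPU was not online at the check.
	 *
	 * Note that the cpu_online() check is only advisory unless CPU
	 * hotplug is excluded or the offline path cleans up pending
	 * CSDs, which is exactly the auditing discussed above.
	 */
	static int smp_call_function_single_async_cond(int cpu,
						       struct call_single_data *csd)
	{
		int ret = -ENXIO;

		preempt_disable();
		if (cpu_online(cpu))
			ret = smp_call_function_single_async(cpu, csd);
		preempt_enable();

		return ret;
	}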