Date: Mon, 20 Apr 2015 07:39:55 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Guenter Roeck, Rabin Vincent, Linux Kernel Mailing List,
    Peter Zijlstra, Thomas Gleixner, "Paul E. McKenney"
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
Message-ID: <20150420053954.GA9923@gmail.com>
References: <20150418234050.GA5987@roeck-us.net>
 <55330B32.4010907@roeck-us.net>
 <20150419033940.GA25145@debian>
 <20150419093112.GA6139@gmail.com>
 <5533B6D7.9050101@roeck-us.net>
 <20150419180140.GA8934@gmail.com>

* Linus Torvalds wrote:

> On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar wrote:
> >
> > That's all fine and good, but why is an IPI sent to a non-existent
> > CPU? It's not like we don't know which CPU is up and down.
>
> I agree that the qemu-arm behavior smells like a bug, plain and
> simple. Nobody sane should send random IPI's to CPU's that they
> don't even know are up or not.

Yeah, and a warning would have caught that bug a bit earlier, at the
cost of restricting the API:

> That said, I could imagine that we would have some valid case where
> we just do a cross-cpu call to (for example) do lock wakeup, and the
> code would use some optimistic algorithm that prods the CPU after
> the lock has been released, and there could be some random race
> where the lock data structure has already been released (ie imagine
> the kind of optimistic unlocked racy access to "sem->owner" that we
> discussed as part of the rwsem_spin_on_owner() thread recently).
>
> So I _could_ imagine that somebody would want to do optimistic "prod
> other cpu" calls that in all normal cases are for existing cpus, but
> could be racy in theory.

Yes, and I don't disagree with such optimizations in principle (they
allow fewer references to be taken in the fast path), but is it
really safe?

If a CPU is going down and we potentially race against that and send
off an IPI, the IPI might be 'in flight' for an indeterminate amount
of time, especially on wildly non-deterministic hardware like virtual
platforms.

And if the CPU does go down during that window: the typical way a CPU
goes down is that it ignores all IPIs that arrive after that point.
End result: the sender of the IPI may hang indefinitely (for the
synchronous API), waiting for the CSD lock to be released ...

For the non-wait API we could also hang, but in an even more
interesting way: we won't hang trying to send to the downed CPU, we'd
hang on the _next_ cross-call, possibly sending to another CPU,
because the CSD_FLAG_LOCK of the sender CPU is never released by the
offlined CPU which was supposed to unlock it.
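( To make that failure mode concrete, here is a simplified sketch of
  the cross-call sequence - illustration only, not the actual
  kernel/smp.c code, and the helper names are approximate: )

  /*
   * Rough sketch of what a single cross-CPU call does.
   * Illustration only, not the real kernel/smp.c implementation.
   */
  static void sketch_cross_call(int cpu, smp_call_func_t func,
                                void *info, int wait)
  {
          struct call_single_data *csd = this_cpu_ptr(&csd_data);

          csd_lock(csd);                  /* sets CSD_FLAG_LOCK */

          csd->func = func;
          csd->info = info;

          /* Queue the request on the target CPU and kick it: */
          if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
                  arch_send_call_function_single_ipi(cpu);

          /*
           * CSD_FLAG_LOCK is only cleared by the *target* CPU, when it
           * runs the callback from its IPI handler.  If the target went
           * offline and ignores the IPI, the flag is never cleared:
           *
           *  - wait == 1: we spin below, forever;
           *  - wait == 0: the _next_ cross-call from this CPU spins in
           *    csd_lock(), even if it targets a healthy, online CPU.
           */
          if (wait)
                  csd_lock_wait(csd);     /* spins while CSD_FLAG_LOCK is set */
  }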
Also note the existing warning we already have in
flush_smp_call_function_queue():

        /* There shouldn't be any pending callbacks on an offline CPU. */
        if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
                     !warned && !llist_empty(head))) {
                warned = true;
                WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

Couldn't this trigger in the opportunistic, imprecise IPI case, if
IRQs are ever enabled when or after the CPU is marked offline?

My suggested warning would simply catch this kind of unsafe-looking
race a bit sooner, instead of silently ignoring the cross-call
attempt.

Now your suggestion could be made to work by:

 - polling for CPU status in the CSD-lock code as well. (We don't do
   this currently.)

 - making sure that if a late IPI arrives at an already-down CPU it
   does not attempt to execute CSD functions. (I.e. silence the
   above, already existing warning.)

 - auditing the CPU-down path to make sure it does not get surprised
   by late external IPIs. (I think this should already be so, given
   the existing warning.)

 - IPIs can be pending indefinitely, so make sure a pending IPI won't
   confuse the machinery after a CPU has been onlined again.

 - pending IPIs may consume hardware resources when not received
   properly. For example I think x86 will keep trying to resend it.
   Not sure there's any timeout mechanism on the hardware level - the
   sending APIC might even get stuck? Audit all hardware for this
   kind of possibility.

So unless I'm missing something, to me it looks like the current code
is only safe to use on offline CPUs if the 'offline' CPU is never
ever brought online in the first place.

> It doesn't sound like the qemu-arm case is that kind of situation,
> though. That one just sounds like a stupid "let's send an ipi to a
> cpu whether it exists or not".
>
> But I committed it without any new warning, because I could in
> theory see it as being a valid use.

So my point is that we already have a 'late' warning, and I think it
actively hid the qemu-arm bug, because the 'offline' CPU was so
offline (due to being non-existent) that it never had a chance to
print a warning.

With an 'early' warning we could flush out such bugs (or code
uncleanlinesses: clearly the ARM code was at least functional before)
a bit sooner.

But I don't have strong feelings about it: SMP cross-call users are a
lot less frequent than, say, locking facility users, so the strength
of debugging isn't nearly as important, and it's fine to me either
way!

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/