Date: Mon, 20 Apr 2015 05:17:29 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar
Cc: Linus Torvalds, Guenter Roeck, Rabin Vincent,
    Linux Kernel Mailing List, Peter Zijlstra, Thomas Gleixner
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
Message-ID: <20150420121729.GN5561@linux.vnet.ibm.com>
In-Reply-To: <20150420053954.GA9923@gmail.com>

On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds wrote:
>
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent
> > > CPU? It's not like we don't know which CPU is up and down.
> >
> > I agree that the qemu-arm behavior smells like a bug, plain and
> > simple. Nobody sane should send random IPI's to CPU's that they
> > don't even know are up or not.
>
> Yeah, and a warning would have caught that bug a bit earlier, at the
> cost of restricting the API:
>
> > That said, I could imagine that we would have some valid case where
> > we just do a cross-cpu call to (for example) do lock wakeup, and the
> > code would use some optimistic algorithm that prods the CPU after
> > the lock has been released, and there could be some random race
> > where the lock data structure has already been released (ie imagine
> > the kind of optimistic unlocked racy access to "sem->owner" that we
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> >
> > So I _could_ imagine that somebody would want to do optimistic "prod
> > other cpu" calls that in all normal cases are for existing cpus, but
> > could be racy in theory.
>
> Yes, and I don't disagree with such optimizations in principle (it
> allows less references to be taken in the fast path), but is it really
> safe?
>
> If a CPU is going down and we potentially race against that, and send
> off an IPI, the IPI might be 'in flight' for an indeterminate amount
> of time, especially on wildly non-deterministic hardware like virtual
> platforms.
>
> So if the CPU goes down during that time, the typical way a CPU goes
> down is that it ignores all IPIs that arrive after that. End result:
> the sender of the IPI may hang indefinitely (for the synchronous API),
> waiting for the CSD lock to be released...
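To make that hang concrete: for the synchronous API, the sender spins
waiting for the target CPU to run the function and release the CSD lock,
along the lines of the simplified sketch below.  (The function name here
is made up for illustration; this is not the exact kernel/smp.c code.)

	/*
	 * Simplified sketch of the synchronous wait, not the exact
	 * kernel/smp.c code: the sender spins until the target CPU's
	 * IPI handler has run the requested function and cleared
	 * CSD_FLAG_LOCK.  If the target CPU has gone offline and
	 * ignores the IPI, nothing ever clears the flag, and the
	 * sender spins here forever.
	 */
	static void csd_lock_wait_sketch(struct call_single_data *csd)
	{
		while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
			cpu_relax();
	}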
>
> For the non-wait API we could also hang, but in an even more
> interesting way: we won't hang trying to send to the downed CPU, we'd
> hang on the _next_ cross-call call, possibly sending to another CPU,
> because the CSD_FLAG_LOCK of the sender CPU is never released by the
> offlined CPU which was supposed to unlock it.
>
> Also note the existing warning we already have in
> flush_smp_call_function_queue():
>
>         /* There shouldn't be any pending callbacks on an offline CPU. */
>         if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
>                      !warned && !llist_empty(head))) {
>                 warned = true;
>                 WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>
> Couldn't this trigger in the opportunistic, imprecise IPI case, if
> IRQs are ever enabled when or after the CPU is marked offline?
>
> My suggested warning would simply catch this kind of unsafe looking
> race a bit sooner, instead of silently ignoring the cross-call
> attempt.
>
> Now your suggestion could be made to work by:
>
>  - polling for CPU status in the CSD-lock code as well. (We don't do
>    this currently.)
>
>  - making sure that if a late IPI arrives to an already-down CPU it
>    does not attempt to execute CSD functions. (I.e. silence the
>    above, already existing warning.)
>
>  - auditing the CPU-down path to make sure it does not get surprised
>    by late external IPIs. (I think this should already be so, given
>    the existing warning.)
>
>  - IPIs can be pending indefinitely, so make sure a pending IPI won't
>    confuse the machinery after a CPU has been onlined again.
>
>  - pending IPIs may consume hardware resources when not received
>    properly. For example I think x86 will keep trying to resend it.
>    Not sure there's any timeout mechanism on the hardware level - the
>    sending APIC might even get stuck? Audit all hardware for this
>    kind of possibility.
>
> So unless I'm missing something, to me it looks like that the current
> code is only safe to be used on offline CPUs if the 'offline' CPU is
> never ever brought online in the first place.
>
> > It doesn't sound like the qemu-arm case is that kind of situation,
> > though. That one just sounds like a stupid "let's send an ipi to a
> > cpu whether it exists or not".
> >
> > But I committed it without any new warning, because I could in theory
> > see it as being a valid use.
>
> So my point is that we already have a 'late' warning, and I think it
> actively hid the qemu-arm bug, because the 'offline' CPU was so
> offline (due to being non-existent) that it never had a chance to
> print a warning.
>
> With an 'early' warning we could flush out such bugs (or code
> uncleanlinesses: clearly the ARM code was at least functional before)
> a bit sooner.
>
> But I don't have strong feelings about it, SMP cross-call users are a
> lot less frequent than say locking facility users, so the strength of
> debugging isn't nearly as important and it's fine to me either way!

In the long term, supporting "send an IPI to a CPU that might or might
not be leaving" will probably require an API that returns a success or
failure indication.  This would probably simplify things a bit, because
the caller currently has to participate in the IPI's synchronization.
With a conditional primitive, callers could instead simply rely on the
return value.

							Thanx, Paul
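PS: Such a conditional primitive might look roughly like the sketch
below.  The _cond() name and its exact semantics are made up for
illustration, not an existing kernel API, and the online check would
still need to be made safe against a concurrent offline (for example by
having the CPU-down path flush or fail any pending CSDs).

	/*
	 * Sketch only, not an existing kernel API: a conditional variant
	 * of the async cross-call.  Rather than requiring the caller to
	 * know whether the target CPU might be going away, report the
	 * situation via the return value: 0 if the cross-call was handed
	 * off, -ENXIO if the target CPU was not online at the check.
	 *
	 * Note that the cpu_online() check is only advisory unless CPU
	 * hotplug is excluded or the offline path cleans up pending
	 * CSDs, which is exactly the auditing discussed above.
	 */
	static int smp_call_function_single_async_cond(int cpu,
						       struct call_single_data *csd)
	{
		int ret = -ENXIO;

		preempt_disable();
		if (cpu_online(cpu))
			ret = smp_call_function_single_async(cpu, csd);
		preempt_enable();

		return ret;
	}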