Date: Fri, 20 Feb 2015 20:41:14 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael David Tinoco <inaddy@ubuntu.com>, Peter Anvin <hpa@zytor.com>,
        Jiang Liu <jiang.liu@linux.intel.com>,
        Peter Zijlstra <peterz@infradead.org>,
        LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Gema Gomez <gema.gomez-solano@canonical.com>,
        Christopher Arges <chris.j.arges@canonical.com>,
        the arch/x86 maintainers <x86@kernel.org>
Subject: Re: smp_call_function_single lockups
Message-ID: <20150220194114.GA3603@gmail.com>
References: <CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.com>
 <20150218222544.GA17717@twins.programming.kicks-ass.net>
 <CAMiJ5CV--EFGnZSvJcrUrYVjy1PWueCQq5i5D+i0=p9BArPnjw@mail.gmail.com>
 <CA+55aFwWT--5mgKqryfFAbgaoEacsZn8dZ0POWH3xpdNgRMuRw@mail.gmail.com>
 <CAMiJ5CU+rvQr-_Ejd3m3ha3HsiSKu0Sq_fTaE2Ws_c_01=qbLQ@mail.gmail.com>
 <CA+55aFxWBKHth7x3FJ+dpGfy0ZT7SUhHnX7tDfgDo-wXTeX5Lg@mail.gmail.com>
 <CA+55aFx2n9zsqwuW=p6KJF62rXp+9_M-HF3wbeJRA-MeT0XLLw@mail.gmail.com>
 <CA+55aFyv1pJod7bhetc0ikmuCKzE=uhmT14KMju_fTbP93gLWA@mail.gmail.com>
 <20150220093000.GA22661@gmail.com>
 <CA+55aFyspvwLbkqktHHib7LB7pWW9a1CS-rc4oLJoz_Z9kQSRw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFyspvwLbkqktHHib7LB7pWW9a1CS-rc4oLJoz_Z9kQSRw@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3071
Lines: 75


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Feb 20, 2015 at 1:30 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So if my memory serves me right, I think it was for 
> > local APICs, and even there mostly it was a performance 
> > issue: if an IO-APIC sent more than 2 IRQs per 'level' 
> > to a local APIC then the IO-APIC might be forced to 
> > resend those IRQs, leading to excessive message traffic 
> > on the relevant hardware bus.
> 
> Hmm. I have a distinct memory of interrupts actually 
> being lost, but I really can't find anything to support 
> that memory, so it's probably some drug-induced confusion 
> of mine. I don't find *anything* about interrupt "levels" 
> any more in modern Intel documentation on the APIC, but 
> maybe I missed something. But it might all have been an 
> IO-APIC thing.

So I just found an older discussion of it:

  http://www.gossamer-threads.com/lists/linux/kernel/1554815?do=post_view_threaded#1554815

While it's not a comprehensive description, it matches what 
I remember: with 3 vectors within a level of 16 vectors 
we'd get excessive "retries" sent by the IO-APIC through 
the (then rather slow) APIC bus.

( It was possible for the same phenomenon to occur with 
  IPIs as well, when a CPU sent an APIC message to another
  CPU, if the affected vectors fell within the same level of 16 - but
  this was rare IIRC because most systems were dual CPU so
  only two IPIs could have occurred. )
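
To make the "level" arithmetic concrete, here is a rough 
userspace sketch (not kernel code). It assumes a level is a 
block of 16 consecutive vectors, i.e. vector >> 4, and that 
more than 2 vectors in use within one level is what 
triggers the retries. The names vector_level(), 
count_crowded_levels() and MAX_PER_LEVEL are made up for 
illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a "level" is a block of 16 consecutive vectors. */
#define VECTORS_PER_LEVEL 16
#define NUM_VECTORS       256
#define NUM_LEVELS        (NUM_VECTORS / VECTORS_PER_LEVEL)

/* Assumption taken from the discussion: more than 2 per level is bad. */
#define MAX_PER_LEVEL     2

/* Map a vector number to its level (upper four bits of the vector). */
static unsigned int vector_level(unsigned int vector)
{
	assert(vector < NUM_VECTORS);
	return vector / VECTORS_PER_LEVEL;
}

/* Count the levels that have more than MAX_PER_LEVEL vectors in use. */
static unsigned int count_crowded_levels(const unsigned char *vectors, size_t n)
{
	unsigned int per_level[NUM_LEVELS] = { 0 };
	unsigned int crowded = 0;
	size_t i;

	for (i = 0; i < n; i++)
		per_level[vector_level(vectors[i])]++;

	for (i = 0; i < NUM_LEVELS; i++)
		if (per_level[i] > MAX_PER_LEVEL)
			crowded++;

	return crowded;
}
```

With vectors 0x31, 0x35 and 0x39 all in level 3, this 
reports one crowded level; spreading them out as 0x31, 
0x39, 0x41, 0x49 reports none.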

> Well, the attached patch for that seems pretty trivial. 
> And seems to work for me (my machine also defaults to 
> x2apic clustered mode), and allows the APIC code to start 
> doing a "send to specific cpu" thing one by one, since it 
> falls back to the send_IPI_mask() function if no 
> individual CPU IPI function exists.
> 
> NOTE! There's a few cases in 
> arch/x86/kernel/apic/vector.c that also do that 
> "apic->send_IPI_mask(cpumask_of(i), .." thing, but they 
> aren't that important, so I didn't bother with them.
> 
> NOTE2! I've tested this, and it seems to work, but maybe 
> there is something seriously wrong. I skipped the 
> "disable interrupts" part when doing the "send_IPI", for 
> example, because I think it's entirely unnecessary for 
> that case. But this has certainly *not* gotten any real 
> stress-testing.

I'm not so sure about that aspect: I think disabling IRQs 
might be necessary with some APICs (if lower levels don't 
disable IRQs), to make sure the 'local APIC busy' bit isn't 
set:

we typically do a wait_icr_idle() call before sending an 
IPI - and if IRQs are not off then the APIC might no longer 
be idle by the time we actually send. (Because a hardirq 
that arrives after the wait_icr_idle() but before the 
actual IPI sending may itself send out an IPI, leaving the 
queue full.)

So the IPI sending should be atomic in that sense.
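
A toy userspace model of that window, just to illustrate 
the ordering argument. Everything here (struct fake_apic 
and the model_*() helpers) is hypothetical and only stands 
in for the real thing: the irqs_enabled flag plays the role 
of local_irq_save()/local_irq_restore(), and icr_busy plays 
the role of the 'local APIC busy' bit:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of this is the kernel's real API. */
struct fake_apic {
	bool irqs_enabled;            /* models the CPU interrupt flag */
	bool icr_busy;                /* models the 'local APIC busy' bit */
	bool pending_hardirq;         /* a hardirq that will send its own IPI */
	unsigned int sent_ok;         /* ICR writes that found the APIC idle */
	unsigned int sent_while_busy; /* ICR writes that raced with a busy APIC */
};

/* Models waiting for idle: once this returns, the busy bit is clear. */
static void model_wait_icr_idle(struct fake_apic *a)
{
	a->icr_busy = false;
}

/* A hardirq can only run in the window if IRQs are enabled. */
static void model_hardirq_window(struct fake_apic *a)
{
	if (a->irqs_enabled && a->pending_hardirq) {
		a->pending_hardirq = false;
		a->icr_busy = true; /* the handler sent an IPI of its own */
	}
}

/* Models the ICR write and records whether the APIC was still idle. */
static void model_write_icr(struct fake_apic *a)
{
	if (a->icr_busy)
		a->sent_while_busy++;
	else
		a->sent_ok++;
	a->icr_busy = true;
}

static void model_send_ipi(struct fake_apic *a, bool disable_irqs)
{
	bool saved;

	assert(a);
	saved = a->irqs_enabled;

	if (disable_irqs)
		a->irqs_enabled = false; /* stands in for local_irq_save() */

	model_wait_icr_idle(a);
	model_hardirq_window(a);     /* the window between wait and send */
	model_write_icr(a);

	a->irqs_enabled = saved;     /* stands in for local_irq_restore() */
}
```

In this model, sending with IRQs left on and a hardirq 
pending ends up writing the ICR while it is busy; with IRQs 
off the hardirq stays pending until afterwards and the 
write finds the APIC idle.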

Thanks,

	Ingo