Date: Mon, 23 Feb 2015 21:50:43 +0100
From: Peter Zijlstra
To: Linus Torvalds
Cc: Rafael David Tinoco, LKML, Thomas Gleixner, Jens Axboe, Frederic Weisbecker, Gema Gomez, Christopher Arges
Subject: Re: smp_call_function_single lockups
Message-ID: <20150223205043.GF5029@twins.programming.kicks-ass.net>

On Mon, Feb 23, 2015 at 11:32:50AM -0800, Linus Torvalds wrote:
> On Mon, Feb 23, 2015 at 6:01 AM, Rafael David Tinoco wrote:
> >
> > This is v3.19 + your patch (smp acquire/release)
> > - (nested kvm with 2 vcpus on top of proliant with x2apic cluster mode
> > and acpi_idle)
>
> Hmm. There is absolutely nothing else going on on that machine, except
> for the single call to smp_call_function_single() that is waiting for
> the CSD to be released.
>
> > * It looks like we got locked because of reentrant flush_tlb_* through
> > smp_call_*
> > but I'll leave it to you.
>
> No, that is all a perfectly regular callchain:
>
>  .. native_flush_tlb_others -> smp_call_function_many ->
> smp_call_function_single
>
> but the stack contains some stale addresses (one is probably just from
> smp_call_function_single() calling into "generic_exec_single()", and
> thus the stack contains the return address inside
> smp_call_function_single() in _addition_ to the actual place where the
> watchdog timer then interrupted it).
>
> It all really looks very regular and sane, and looks like
> smp_call_function_single() is happily just waiting for the IPI to
> finish in the (inlined) csd_lock_wait().
>
> I see nothing wrong at all.

[11396.096002] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011

But it's a virtual machine, right? It's not running bare metal; it's
running a !virt kernel on a virt machine, so maybe some of the virt muck
is borked?

A very subtly broken APIC emulation would be heaps of 'fun'.