Date: Tue, 2 Jul 2019 00:41:40 +0200 (CEST)
From: Thomas Gleixner
To: Rong Chen
cc: Feng Tang, x86@kernel.org, LKML, "H. Peter Anvin", "tipbuild@zytor.com", "lkp@01.org", Ingo Molnar, kvm@vger.kernel.org, Paolo Bonzini, Radim Krčmář, Fenghua Yu
Subject: [BUG] kvm: APIC emulation problem - was Re: [LKP] [x86/hotplug] ...
References: <20190628063231.GA7766@shbuild999.sh.intel.com> <20190630130347.GB93752@shbuild999.sh.intel.com> <20190701083654.GB12486@shbuild999.sh.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Folks,

after chasing a 0-day test failure for a couple of days, I was finally able
to reproduce the issue.

Background:

   In preparation for supporting IPI shorthands I changed the CPU offline
   code to software disable the local APIC instead of just masking it.
   That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
   register.

Failure:

   When the CPU comes back online the startup code occasionally triggers
   the warning in apic_pending_intr_clear(). That complains that the IRRs
   are not empty.

   The offending vector is the local APIC timer vector whose IRR bit is set
   and stays set.

It took me quite some time to reproduce the issue locally, but now I can
see what happens.

It requires apicv_enabled=0, i.e. full APIC emulation. With apicv_enabled=1
(and hardware support) it behaves correctly.

Here is the series of events:

    Guest CPU

    goes down
      native_cpu_disable()
        apic_soft_disable();

    play_dead()

    ....

    startup()
      if (apic_enabled())
        apic_pending_intr_clear()    <- Not taken

      enable APIC

      apic_pending_intr_clear()      <- Triggers warning because IRR is stale

When this happens, the deadline timer or the regular APIC timer (it happens
with both) has fired shortly before the APIC was disabled, but the
interrupt was not serviced because the guest CPU was in an interrupt
disabled region at that point.
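The series of events above can be sketched as a toy state model. This is only an illustration of the described sequence, not kernel or KVM code: struct toy_apic and every toy_*() function are hypothetical names, and toy_pending_intr_check() is a simplified stand-in that only reports whether the timer IRR bit is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one emulated local APIC. All names are hypothetical. */
struct toy_apic {
    bool sw_enabled;   /* models the APIC_SPIV_APIC_ENABLED bit */
    bool timer_irr;    /* IRR bit of the timer vector */
    bool timer_isr;    /* ISR bit of the timer vector (stays 0 here) */
};

/* Timer fires while the CPU has interrupts disabled: IRR set, not serviced. */
static void toy_timer_fires_masked(struct toy_apic *a)
{
    assert(a->sw_enabled);
    a->timer_irr = true;
}

/* Soft disable as observed with full emulation: the IRR bit is left alone. */
static void toy_soft_disable(struct toy_apic *a)
{
    a->sw_enabled = false;
}

/* Simplified stand-in for the check; true means the warning would fire. */
static bool toy_pending_intr_check(const struct toy_apic *a)
{
    return a->timer_irr;
}

/* Startup path from the event list; returns true if the warning fires. */
static bool toy_startup(struct toy_apic *a)
{
    bool warned = false;

    if (a->sw_enabled)                    /* not taken after a soft disable */
        warned |= toy_pending_intr_check(a);

    a->sw_enabled = true;                 /* "enable APIC" */
    warned |= toy_pending_intr_check(a);  /* sees the stale IRR bit */
    return warned;
}

/* Offline followed by online, with or without a pending timer interrupt. */
bool toy_offline_online(bool timer_pending)
{
    struct toy_apic a = { .sw_enabled = true };

    if (timer_pending)
        toy_timer_fires_masked(&a);
    toy_soft_disable(&a);                 /* native_cpu_disable() */
    /* play_dead() would run here */
    return toy_startup(&a);
}
```

In this model toy_offline_online(true) reports the warning and toy_offline_online(false) does not, which matches the observation that the failure needs a timer interrupt that was raised but not serviced before the soft disable.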

The state of the timer vector ISR/IRR bits:

                                  ISR     IRR
    before apic_soft_disable()    0       1
    after apic_soft_disable()     0       1

    On startup                    0       1

Now one would assume that the IRR is cleared after the INIT reset, but the
failure happens only on CPU0.

Why?

Because our CPU0 hotplug is just for testing to make sure nothing breaks
and goes through an NMI wakeup vehicle because INIT would send it through
the bootstrap code which is not really working if that CPU was not
physically unplugged.

Now looking at a real world APIC the situation in that case is:

                                  ISR     IRR
    before apic_soft_disable()    0       1
    after apic_soft_disable()     0       1

    On startup                    0       0

Why?

Once the dying CPU reenables interrupts the pending interrupt gets
delivered as a spurious interrupt and then the state is clear.

While that CPU0 hotplug test case is surely an esoteric issue, the APIC
emulation is still wrong. Even if the play_dead() code would not enable
interrupts then the pending IRR bit would turn into an ISR .. interrupt
when the APIC is reenabled on startup.

Thanks,

	tglx