Date: Tue, 13 Feb 2018 15:39:02 +0100 (CET)
From: Thomas Gleixner
To: Tvrtko Ursulin
Cc: Ingo Molnar, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
References: <1696e2c6-8d0a-e954-1205-439d70a81f77@ursulin.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Feb 2018, Tvrtko Ursulin wrote:
> On 07/02/18 12:48, Tvrtko Ursulin wrote:
> > We are seeing failures to online the CPU0 on Apollo Lake in the form of:
> >
> > <6>[ 126.508783] smpboot: CPU 0 is now offline
> > <6>[ 127.520746] smpboot: Booting Node 0 Processor 0 APIC 0x0
> > <3>[ 137.521036] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> >
> > I unfortunately cannot say with which kernel version this started since
> > we added a test which does this only recently. I also have no local
> > access to this machine. (It is part of a test farm for i915 driver
> > development testing.) But we recently added a test which off-lines, and
> > on-lines back, CPUs and started seeing this. Small reproducer looks like
> > this (without boilerplate):
>
> Any hints on how to debug this? Could it be firmware? Try some boot options or
> something?

There are issues with CPU0 hotplug on commodity hardware. I have systems
where it does not work, but TBH I never bothered to investigate it.

Some years ago we had issues with suspend/resume when it was not running on
CPU0. These were related to firmware assumptions about CPU0. So I wouldn't
be too surprised if there are general issues with unplugging CPU0.

CPU0 unplug is really only relevant for systems which support physical
hotplug, so testing it on commodity hardware does not have much value.
Testing in VMs for increasing the test coverage works well enough.

Thanks,

	tglx
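
For reference, the offline/online cycle discussed in this thread can be sketched roughly as below. This is not the reproducer from the original report, which is not quoted here; it is a minimal illustration that assumes the standard sysfs layout under /sys/devices/system/cpu, where each hotpluggable CPU has an `online` file. The helper names (`online_file`, `can_hotplug`, `set_online`, `is_online`, `cycle_cpu`) are hypothetical, and the `root` parameter exists only so the logic can be exercised against a scratch directory instead of a live system.

```python
from pathlib import Path

# Assumed default location of the per-CPU sysfs directories on Linux.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def online_file(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Return the path of the 'online' control file for the given CPU."""
    return Path(root) / f"cpu{cpu}" / "online"


def can_hotplug(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """A CPU without an 'online' file is treated as not hotpluggable."""
    return online_file(cpu, root).is_file()


def set_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Request an online (True) or offline (False) transition.

    On a real system this needs root privileges, and a failed transition
    would be expected to surface as an OSError from the write.
    """
    online_file(cpu, root).write_text("1\n" if online else "0\n")


def is_online(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Read back the current state from the 'online' file."""
    return online_file(cpu, root).read_text().strip() == "1"


def cycle_cpu(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Offline the CPU, online it again, and report whether it came back.

    Returns False without touching anything if the CPU has no 'online' file.
    """
    if not can_hotplug(cpu, root):
        return False
    set_online(cpu, False, root)
    set_online(cpu, True, root)
    return is_online(cpu, root)
```

Calling `cycle_cpu(1)` as root would exercise a non-boot CPU. Given the points above, running the same cycle on CPU0 is probably best confined to VMs or to machines that support physical hotplug.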