Date: Thu, 5 Nov 2015 12:32:43 +1100
From: David Gibson
To: Laurent Vivier
Cc: Hari Bathini, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, thuth@redhat.com, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online
Message-ID: <20151105123243.47dda843@voom.fritz.box>
In-Reply-To: <563A0E2B.4090404@redhat.com>
References: <1444935658-27319-1-git-send-email-lvivier@redhat.com> <5639FB4A.7020508@linux.vnet.ibm.com> <563A0E2B.4090404@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 4 Nov 2015 14:54:51 +0100
Laurent Vivier wrote:

>
>
> On 04/11/2015 13:34, Hari Bathini wrote:
> > On 10/16/2015 12:30 AM, Laurent Vivier wrote:
> >> On kexec, all secondary offline CPUs are onlined before
> >> starting the new kernel, this is not done in the case of kdump.
> >>
> >> If kdump is configured and a kernel crash occurs whereas
> >> some secondaries CPUs are offline (SMT=off),
> >> the new kernel is not able to start them and displays some
> >> "Processor X is stuck.".
> >>
> >> Starting with POWER8, subcore logic relies on all threads of
> >> core being booted. So, on startup kernel tries to start all
> >> threads, and asks OPAL (or RTAS) to start all CPUs (including
> >> threads).
> >> If a CPU has been offlined by the previous kernel,
> >> it has not been returned to OPAL, and thus OPAL cannot restart
> >> it: this CPU has been lost...
> >>
> >> Signed-off-by: Laurent Vivier
> >
> >
> > Hi Laurent,
>
> Hi Hari,
>
> > Sorry for jumping too late into this.
>
> better late than never :)
>
> > Are you seeing this issue even with the below patches:
> >
> > pseries:
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55

Unfortunately, this is unlikely to be relevant - this fixes a failure
while setting up the kexec. The problem we see occurs once we've
booted the second kernel and it's attempting to bring up secondary CPUs.

> > opal/powernv:
> > https://github.com/open-power/skiboot/commit/9ee56b5
>
> Very interesting. Is there a way to have a firmware with the fix ?

From Laurent's analysis of the crash, I don't think this will be
relevant either, but I'm not sure. It would be very interesting to
know which (if any) released firmwares include this patch so we can
test it.

-- 
David Gibson
Senior Software Engineer, Virtualization, Red Hat