Date: Wed, 3 Jan 2024 14:22:23 -0800
From: "Paul E. McKenney"
To: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar
Subject: [BUG] Guest OSes die simultaneously (bisected)
Message-ID: <3d8f5987-e09c-4dd2-a9c0-8ba22c9e948a@paulmck-laptop>
Reply-To: paulmck@kernel.org

Hello!

Since some time between v5.19 and v6.4, long-running rcutorture tests
would (rarely, but intolerably often) have all guests on a given host
die simultaneously with something like an instruction fault or a
segmentation violation.

Each bisection step required 20 hosts running 10 hours each, and this
eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
IA32_PEBS_ENABLE MSR emulation for extended PEBS").  Although this
commit is certainly messing with things that could possibly cause all
manner of mischief, I don't immediately see a smoking gun.  Except that
the commit prior to this one is rock solid.

Just to make things a bit more exciting, bisection in mainline proved
to be problematic due to bugs of various kinds that hid this one.
I was therefore forced to bisect among the commits backported to the
internal v5.19-based kernel, which fingered the backported version of
the patch called out above.

Please note that this is not (yet) an emergency.  I will just continue
to run rcutorture on v5.19-based hypervisors in the meantime.

Any suggestions for debugging or fixing?

							Thanx, Paul