Date: Wed, 3 Jan 2024 15:27:29 -0800
From: "Paul E. McKenney"
To: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar
Subject: Re: [BUG] Guest OSes die simultaneously (bisected)
Message-ID: <88f49775-2b56-48cc-81b8-651a940b7d6b@paulmck-laptop>
Reply-To: paulmck@kernel.org
In-Reply-To: <3d8f5987-e09c-4dd2-a9c0-8ba22c9e948a@paulmck-laptop>

On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> Hello!
>
> Since some time between v5.19 and v6.4, long-running rcutorture tests
> would (rarely but intolerably often) have all guests on a given host die
> simultaneously with something like an instruction fault or a segmentation
> violation.
>
> Each bisection step required 20 hosts running 10 hours each, and
> this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> IA32_PEBS_ENABLE MSR emulation for extended PEBS").  Although this commit
> is certainly messing with things that could possibly cause all manner
> of mischief, I don't immediately see a smoking gun.  Except that the
> commit prior to this one is rock solid.
>
> Just to make things a bit more exciting, bisection in mainline proved
> to be problematic due to bugs of various kinds that hid this one.  I was
> therefore forced to bisect among the commits backported to the internal
> v5.19-based kernel, which fingered the backported version of the patch
> called out above.

Ah, and so why do I believe that this is a problem in mainline rather
than just (say) a backporting mistake?  Because this issue was first
located in v6.4, which already has this commit included.

							Thanx, Paul

> Please note that this is not (yet) an emergency.  I will just continue
> to run rcutorture on v5.19-based hypervisors in the meantime.
>
> Any suggestions for debugging or fixing?
>
> 							Thanx, Paul