Date: Thu, 4 Jan 2024 11:23:52 -0800
From: "Paul E. McKenney"
To: Paolo Bonzini
Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang,
 Peter Zijlstra, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Breno Leitao,
 Arnaldo Carvalho de Melo, Ingo Molnar
Subject: Re: [BUG] Guest OSes die simultaneously (bisected)
Reply-To: paulmck@kernel.org
References: <3d8f5987-e09c-4dd2-a9c0-8ba22c9e948a@paulmck-laptop>
 <88f49775-2b56-48cc-81b8-651a940b7d6b@paulmck-laptop>
 <77d7a3e3-f35e-4507-82c2-488405b25fa4@paulmck-laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 04, 2024 at 05:32:34PM +0100, Paolo Bonzini wrote:
> On 1/4/24 17:06, Paul E. McKenney wrote:
> > Although I am happy to have been able to locate the commit (and even
> > happier that Sean spotted the problem and that you quickly pushed the
> > fix to mainline!), chasing this consumed a lot of time and systems over
> > an embarrassingly large number of months. As in I first spotted this
> > bug in late July.
> > Despite a number of increasingly complex attempts,
> > bisection became feasible only after the buggy commit was backported to
> > our internal v5.19 code base.
>
> Yes, this strikes two sore points.
>
> One is that I have also experienced being able to bisect only with a
> somewhat more linear history (namely the CentOS Stream 9 aka c9s
> frankenkernel [1]) and not with upstream. Even if the c9s kernel is not a
> fully linear set of commits, there's some benefit from merge commits that
> consist of slightly more curated set of patches, where each merge commit
> includes both new features and bugfixes. Unfortunately, whether you'll be
> able to do this with the c9s kernel depends a lot on the subsystems involved
> and on the bug. Both are factors that may or may not be known in advance.

I guess I am glad that it is not just me. ;-)

> The other, of course, is testing. The KVM selftests infrastructure is meant
> for this kind of white box problem, but the space of tests that can be
> written is so large, that there's always too few tests. It shines when you
> have a clear bisection but an unclear fix (in the past I have had cases
> where spending two days to write a test led me to writing a fix in thirty
> minutes), but boosting the reproducibility is always a good thing.

Agreed, validation never will be perfect, and so improving the test suite
based on production experience is a good thing, as is creating test cases
based on the behavior of important production workloads for those who
run them.

> > And please understand that I am not casting shade on those who wrote,
> > reviewed, and committed that buggy commit. As in I freely confess that
> > I had to stare at Sean's fix for a few minutes before I figured out what
> > was going on.
>
> Oh don't worry about that---rather, I am going to cast a shade on those that
> did not review the commit, namely me.
> I am somewhat obsessed with Boolean
> logic and *probably* I would have caught it, or would have asked to split
> the use of designated initializers to a separate patch. Any of the two
> could, at least potentially, have saved you quite some time.

We have all done similar things. I certainly have!

> > Instead, the point I am trying to make is that carefully
> > constructed tests can serve as tireless and accurate code reviewers.
> > This won't ever replace actual code review, but my experience indicates
> > that it will help find more bugs more quickly and more easily.
>
> TBH this (conflict between virtual addresses on the host and the guest
> leading to corruption of the guest) is probably not the kind of adversarial
> test that one would have written or suggested right off the bat. But it
> should be written now indeed.

Very good, looking forward to seeing it!

							Thanx, Paul

> Paolo
>
> [1]
> https://www.theregister.com/2023/06/30/enterprise_distro_feature_devconf/
>