Date: Thu, 4 Jan 2024 17:32:34 +0100
From: Paolo Bonzini
To: paulmck@kernel.org
Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang,
 Peter Zijlstra, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Breno Leitao,
 Arnaldo Carvalho de Melo, Ingo Molnar
Subject: Re: [BUG] Guest OSes die simultaneously (bisected)
References: <3d8f5987-e09c-4dd2-a9c0-8ba22c9e948a@paulmck-laptop>
 <88f49775-2b56-48cc-81b8-651a940b7d6b@paulmck-laptop>
 <77d7a3e3-f35e-4507-82c2-488405b25fa4@paulmck-laptop>

On 1/4/24 17:06, Paul E. McKenney wrote:
> Although I am happy to have been able to locate the commit (and even
> happier that Sean spotted the problem and that you quickly pushed the
> fix to mainline!), chasing this consumed a lot of time and systems over
> an embarrassingly large number of months.  As in I first spotted this
> bug in late July.  Despite a number of increasingly complex attempts,
> bisection became feasible only after the buggy commit was backported to
> our internal v5.19 code base.

Yes, this strikes two sore points.

One is that I have also experienced being able to bisect only with a
somewhat more linear history (namely the CentOS Stream 9 aka c9s
frankenkernel [1]) and not with upstream.  Even if the c9s kernel is not
a fully linear set of commits, there's some benefit from merge commits
that consist of a slightly more curated set of patches, where each merge
commit includes both new features and bugfixes.  Unfortunately, whether
you'll be able to do this with the c9s kernel depends a lot on the
subsystems involved and on the bug.  Both are factors that may or may
not be known in advance.

The other, of course, is testing.  The KVM selftests infrastructure is
meant for this kind of white box problem, but the space of tests that
can be written is so large that there are always too few tests.  It
shines when you have a clear bisection but an unclear fix (in the past I
have had cases where spending two days to write a test led me to writing
a fix in thirty minutes), but boosting the reproducibility is always a
good thing.

> And please understand that I am not casting shade on those who wrote,
> reviewed, and committed that buggy commit.  As in I freely confess that
> I had to stare at Sean's fix for a few minutes before I figured out what
> was going on.

Oh, don't worry about that.  Rather, I am going to cast shade on those
who did not review the commit, namely me.
I am somewhat obsessed with Boolean logic and *probably* I would have
caught it, or would have asked to split the use of designated
initializers into a separate patch.  Either of the two could, at least
potentially, have saved you quite some time.

> Instead, the point I am trying to make is that carefully
> constructed tests can serve as tireless and accurate code reviewers.
> This won't ever replace actual code review, but my experience indicates
> that it will help find more bugs more quickly and more easily.

TBH this (conflict between virtual addresses on the host and the guest
leading to corruption of the guest) is probably not the kind of
adversarial test that one would have written or suggested right off the
bat.  But it should indeed be written now.

Paolo

[1] https://www.theregister.com/2023/06/30/enterprise_distro_feature_devconf/