Date: Mon, 13 Jul 2020 12:06:50 -0700
From: Sean Christopherson
To: Alex Williamson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Xiong Zhang,
 Wayne Boyer, Zhenyu Wang, Jun Nakajima
Subject: Re: [PATCH] KVM: x86/mmu: Add capability
 to zap only sptes for the affected memslot
Message-ID: <20200713190649.GE29725@linux.intel.com>
In-Reply-To: <20200713122226.28188f93@x1.home>

On Mon, Jul 13, 2020 at 12:22:26PM -0600, Alex Williamson wrote:
> On Thu, 9 Jul 2020 21:29:22 -0700
> Sean Christopherson wrote:
>
> > +Alex, whom I completely spaced on Cc'ing.
> >
> > Alex, this is related to the dreaded VFIO memslot zapping issue from last
> > year. Start of thread: https://patchwork.kernel.org/patch/11640719/.
> >
> > The TL;DR of below: can you try the attached patch with your reproducer
> > from the original bug[*]? I honestly don't know whether it has a legitimate
> > chance of working, but it's the one thing in all of this that I know was
> > definitely a bug. I'd like to test it out if only to sate my curiosity.
> > Absolutely no rush.
>
> Mixed results, maybe you can provide some guidance. Running this
> against v5.8-rc4, I haven't reproduced the glitch. But it's been a
> long time since I tested this previously, so I went back to v5.3-rc5 to
> make sure I still have a recipe to trigger it. I can still get the
> failure there as the selective flush commit was reverted in rc6. Then
> I wondered, can I take broken v5.3-rc5 and apply this fix to prove that
> it works? No, v5.3-rc5 + this patch still glitches. So I thought
> maybe I could make v5.8-rc4 break by s/true/false/ in this patch.
> Nope. Then I applied the original patch from[1] to try to break it.
> Nope. So if anything, I think the evidence suggests this was broken
> elsewhere and is now fixed, or maybe it is a timing issue that I can't
> trigger on newer kernels. If the reproducer wasn't so touchy and time
> consuming, I'd try to bisect, but I don't have that sort of bandwidth.

Ow. That manages to be both a best case and worst case scenario.

I can't think of any clever way to avoid bisecting. There have been a
number of fixes in tangentially related code since 5.3, e.g. memslots,
MMU, TLB, etc..., but trying to isolate which one, if any of them, fixed
the bug has a high probability of being a wild goose chase.

The only ideas I have going forward are to:

  a) Reproduce the bug outside of your environment and find a resource
     that can go through the painful bisection.

  b) Add a module param to toggle the new behavior and see if anything
     breaks.

I can ask internally if it's possible to get a resource on my end to go
after (a). (b) is a question for Paolo.

Thanks much for testing!

> Thanks,
>
> Alex
>
> [1] https://patchwork.kernel.org/patch/10798453/
>
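
For reference, the bisection in (a) runs in the opposite direction from the usual one: the old kernel (v5.3-rc5) glitches and the new one (v5.8-rc4) does not, so the search is for the first commit where the glitch stops reproducing. With git this is typically done by giving `git bisect start` custom terms via `--term-old` and `--term-new`. The Python sketch below only illustrates the search itself. The commit names, the position of the fix, and the `glitches` callback are all made up for illustration, and it assumes a linear history in which the behaviour flips exactly once, which may not hold if this is a timing issue.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], glitches: Callable[[str], bool]) -> str:
    """Return the earliest commit at which the glitch no longer reproduces.

    Assumes commits are in chronological order, the first one glitches,
    the last one does not, and the behaviour flips exactly once.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is broken, commits[hi] is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if glitches(commits[mid]):
            lo = mid  # still broken: the fix landed later
        else:
            hi = mid  # already fixed: the fix is here or earlier
    return commits[hi]


# Hypothetical history: c0 stands in for v5.3-rc5, c15 for v5.8-rc4.
history = [f"c{i}" for i in range(16)]
FIX_INDEX = 11  # made-up location of the fixing commit
tested = []


def glitches(commit: str) -> bool:
    """Stand-in for building a kernel and running the reproducer."""
    tested.append(commit)
    return int(commit[1:]) < FIX_INDEX


result = first_fixed(history, glitches)
print(result, "found after", len(tested), "reproducer runs")
```

Each step costs one full run of the reproducer, so the number of runs grows with the logarithm of the number of commits between the two kernels. That is why a touchy, time-consuming reproducer makes the bisection painful even though the search itself is simple.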