Date: Mon, 22 Jun 2020 23:53:48 -0700
From: Sean Christopherson
To: Peter Feiner
Cc: Jon Cargille, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, "H. Peter Anvin", x86@kernel.org, kvm list, LKML
Subject: Re: [PATCH] kvm: x86 mmu: avoid mmu_page_hash lookup for direct_map-only VM
Message-ID: <20200623065348.GA23054@linux.intel.com>
References: <20200508182425.69249-1-jcargill@google.com>
	<20200508201355.GS27052@linux.intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 12, 2020 at 03:36:21PM -0700, Peter Feiner wrote:
> On Fri, May 8, 2020 at 1:14 PM Sean Christopherson wrote:
> >
> > On Fri, May 08, 2020 at 11:24:25AM -0700, Jon Cargille wrote:
> > > From: Peter Feiner
> > >
> > > Optimization for avoiding lookups in mmu_page_hash. When there's a
> > > single direct root, a shadow page has at most one parent SPTE
> > > (non-root SPs have exactly one; the root has none). Thus, if an SPTE
> > > is non-present, it can be linked to a newly allocated SP without
> > > first checking if the SP already exists.
> >
> > Some mechanical comments below.  I'll think through the actual logic
> > next week, my brain needs to be primed anytime the MMU is involved :-)
> >
> > > This optimization has proven significant in batch large SP shattering
> > > where the hash lookup accounted for 95% of the overhead.

Is it the hash lookup or the hlist walk that is expensive?  If it's the
hash lookup, then a safer fix would be to do the hash lookup once in
kvm_mmu_get_page() instead of doing it for both the walk and the
insertion.  Assuming that's the case, I'll send a patch.

Actually, I'll probably send a patch no matter what, I've been looking
for an excuse to get rid of that obnoxiously long hash lookup line. :-)
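Roughly what the lookup-once idea looks like as a standalone userspace toy (all names are illustrative, not actual KVM code): the bucket is computed a single time and reused for both the search and the head insertion, instead of hashing in a lookup helper and again in an insert helper.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64

/* toy stand-in for struct kvm_mmu_page on an hlist */
struct sp {
	unsigned long gfn;
	struct sp *next;
};

static struct sp *buckets[NBUCKETS];

/*
 * Lookup-once: compute the bucket a single time and reuse it for both
 * the walk and the insertion, so a miss pays for exactly one hash.
 */
static struct sp *get_or_alloc(unsigned long gfn)
{
	struct sp **bucket = &buckets[gfn % NBUCKETS];
	struct sp *s;

	for (s = *bucket; s; s = s->next)
		if (s->gfn == gfn)
			return s;	/* existing SP, no allocation */

	s = calloc(1, sizeof(*s));
	s->gfn = gfn;
	s->next = *bucket;		/* add at the front, hlist-style */
	*bucket = s;
	return s;
}
```

The shape of the fix is independent of whether the hash or the walk is the hot part; it only removes the redundant hash computation.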
> > > Signed-off-by: Peter Feiner
> > > Signed-off-by: Jon Cargille
> > > Reviewed-by: Jim Mattson
> > >
> > > ---
> > >  arch/x86/include/asm/kvm_host.h | 13 ++++++++
> > >  arch/x86/kvm/mmu/mmu.c          | 55 +++++++++++++++++++--------------
> > >  2 files changed, 45 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index a239a297be33..9b70d764b626 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -913,6 +913,19 @@ struct kvm_arch {
> > >  	struct kvm_page_track_notifier_node mmu_sp_tracker;
> > >  	struct kvm_page_track_notifier_head track_notifier_head;
> > >
> > > +	/*
> > > +	 * Optimization for avoiding lookups in mmu_page_hash. When there's a
> > > +	 * single direct root, a shadow page has at most one parent SPTE
> > > +	 * (non-root SPs have exactly one; the root has none). Thus, if an SPTE
> > > +	 * is non-present, it can be linked to a newly allocated SP without
> > > +	 * first checking if the SP already exists.

I'm pretty sure this will break due to the "zap oldest shadow page"
behavior of make_mmu_pages_available() and mmu_shrink_scan().  In that
case, KVM can zap a parent SP and leave a dangling child.  If/when the
zapped parent SP is rebuilt, it should find and relink the temporarily
orphaned child.

I believe the error will not actively manifest since the new, duplicate
SP will be added to the front of the hlist, i.e. the newest entry will
always be observed first.  But, it will "leak" the child and all its
children, at least until they get zapped in turn.

Hitting the above is probably extremely rare in the current code base.
Presumably the make_mmu_pages_available() path is rarely hit, and the
mmu_shrink_scan() path is basically broken (it zaps at most a single
page after reporting to the scanner that it can potentially free
thousands of pages; I'm working on a series).
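A standalone toy model of why the error wouldn't actively manifest (names are illustrative, not KVM's): with head insertion, a bucket walk always returns the newest duplicate, while the stale entry stays linked behind it, reachable only by walking past the match, i.e. "leaked".

```c
#include <assert.h>
#include <stddef.h>

/* toy hlist node: key identifies the SP, tag distinguishes duplicates */
struct node {
	int key;
	int tag;
	struct node *next;
};

/* insert at the head, as hlist_add_head() does for mmu_page_hash */
static struct node *push(struct node *head, struct node *n)
{
	n->next = head;
	return n;
}

/* the walk stops at the first match, i.e. the newest duplicate */
static struct node *find(struct node *head, int key)
{
	struct node *n;

	for (n = head; n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}
```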
One thought would be to zap the entire family tree when zapping a shadow
page for a direct MMU.  Then the above assumption would hold.  I think
that would be ok/safe-ish?  It definitely would have "interesting" side
effects.

Actually, there's another case that would break, though it's contrived
and silly.  If a VM is configured to have vCPUs with different physical
address widths (yay KVM) and caused KVM to use both 4-level and 5-level
EPT, then the "single direct root" rule above would be violated.

If we do get aggressive and zap all children (or if my analysis is
wrong), and prevent the mixed level insanity, then a simpler approach
would be to skip the lookup if the MMU is direct.  I.e. no need for the
per-VM toggle.  Direct vs. indirect MMUs are guaranteed to have
different roles and so the direct MMU's pages can't be reused/shared.
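The per-MMU variant of the optimization, again as a userspace toy sketch rather than actual KVM code (the 'direct' flag and all names here are hypothetical stand-ins): a direct MMU skips the bucket walk entirely because its pages can't be shared, while an indirect MMU still searches before allocating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 64

struct sp {
	unsigned long gfn;
	struct sp *next;
};

/* toy per-MMU context; 'direct' plays the role of a direct-map MMU */
struct mmu {
	bool direct;
	struct sp *hash[NBUCKETS];
};

/*
 * For a direct MMU, a non-present SPTE always gets a fresh SP, so the
 * bucket walk is skipped.  An indirect MMU may share pages and must
 * still search its bucket before allocating.
 */
static struct sp *mmu_get_page(struct mmu *m, unsigned long gfn)
{
	struct sp **bucket = &m->hash[gfn % NBUCKETS];
	struct sp *s;

	if (!m->direct)
		for (s = *bucket; s; s = s->next)
			if (s->gfn == gfn)
				return s;

	s = calloc(1, sizeof(*s));
	s->gfn = gfn;
	s->next = *bucket;
	*bucket = s;
	return s;
}
```

Keying the skip off the MMU rather than a per-VM toggle also sidesteps the mixed 4-level/5-level case, since each MMU only ever sees its own pages.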