Date: Sat, 9 Feb 2019 19:49:32 -0500
From: "Michael S. Tsirkin"
To: Alexander Duyck
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    rkrcmar@redhat.com, alexander.h.duyck@linux.intel.com, x86@kernel.org,
    mingo@redhat.com, bp@alien8.de, hpa@zytor.com, pbonzini@redhat.com,
    tglx@linutronix.de, akpm@linux-foundation.org
Subject: Re: [RFC PATCH 3/4] kvm: Add guest side support for free memory hints
Message-ID: <20190209194437-mutt-send-email-mst@kernel.org>
References: <20190204181118.12095.38300.stgit@localhost.localdomain>
    <20190204181552.12095.46287.stgit@localhost.localdomain>
In-Reply-To: <20190204181552.12095.46287.stgit@localhost.localdomain>

On Mon, Feb 04, 2019 at 10:15:52AM -0800, Alexander Duyck wrote:
> From: Alexander Duyck
>
> Add guest support for providing free memory hints to the KVM hypervisor for
> freed pages huge TLB size or larger. I am restricting the size to
> huge TLB order and larger because the hypercalls are too expensive to be
> performing one per 4K page.

Even 2M pages start to get expensive with a TB guest.

Really it seems we want a virtio ring so we can pass a batch of these.
E.g. 256 entries, 2M each - that's more like it.

> Using the huge TLB order became the obvious
> choice for the order to use as it allows us to avoid fragmentation of higher
> order memory on the host.
>
> I have limited the functionality so that it doesn't work when page
> poisoning is enabled. I did this because a write to the page after doing an
> MADV_DONTNEED would effectively negate the hint, so it would be wasting
> cycles to do so.

Again, that's leaking a host implementation detail into the guest interface.
We are giving guest page hints to the host, and that makes sense. Weird
interactions with other features due to host implementation details should
be handled by the host.

> Signed-off-by: Alexander Duyck
> ---
>  arch/x86/include/asm/page.h |   13 +++++++++++++
>  arch/x86/kernel/kvm.c       |   23 +++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
>
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index 7555b48803a8..4487ad7a3385 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -18,6 +18,19 @@
>
>  struct page;
>
> +#ifdef CONFIG_KVM_GUEST
> +#include
> +extern struct static_key_false pv_free_page_hint_enabled;
> +
> +#define HAVE_ARCH_FREE_PAGE
> +void __arch_free_page(struct page *page, unsigned int order);
> +static inline void arch_free_page(struct page *page, unsigned int order)
> +{
> +	if (static_branch_unlikely(&pv_free_page_hint_enabled))
> +		__arch_free_page(page, order);
> +}
> +#endif
> +
>  #include
>  extern struct range pfn_mapped[];
>  extern int nr_pfn_mapped;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 5c93a65ee1e5..09c91641c36c 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -48,6 +48,7 @@
>  #include
>
>  static int kvmapf = 1;
> +DEFINE_STATIC_KEY_FALSE(pv_free_page_hint_enabled);
>
>  static int __init parse_no_kvmapf(char *arg)
>  {
> @@ -648,6 +649,15 @@ static void __init kvm_guest_init(void)
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>
> +	/*
> +	 * The free page hinting doesn't add much value if page poisoning
> +	 * is enabled. So we only enable the feature if page poisoning is
> +	 * not present.
> +	 */
> +	if (!page_poisoning_enabled() &&
> +	    kvm_para_has_feature(KVM_FEATURE_PV_UNUSED_PAGE_HINT))
> +		static_branch_enable(&pv_free_page_hint_enabled);
> +
>  #ifdef CONFIG_SMP
>  	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
>  	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
> @@ -762,6 +772,19 @@ static __init int kvm_setup_pv_tlb_flush(void)
>  }
>  arch_initcall(kvm_setup_pv_tlb_flush);
>
> +void __arch_free_page(struct page *page, unsigned int order)
> +{
> +	/*
> +	 * Limit hints to blocks no smaller than pageblock in
> +	 * size to limit the cost for the hypercalls.
> +	 */
> +	if (order < KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER)
> +		return;
> +
> +	kvm_hypercall2(KVM_HC_UNUSED_PAGE_HINT, page_to_phys(page),
> +		       PAGE_SIZE << order);
> +}
> +
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>
>  /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
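
To put rough numbers on the batching suggestion earlier in this mail (256
entries, 2M each), here is a small user-space sketch. It is only a cost
model under stated assumptions, not a proposed implementation: the names
hint_entry, hint_batch, hint_batch_add, hint_batch_flush and
hint_notifications are all hypothetical and appear nowhere in the patch,
KVM or virtio, and the "flush" merely counts guest-to-host notifications
instead of kicking a real ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size, taken from the "256 entries" figure above. */
#define HINT_BATCH_ENTRIES 256

/* One free block to report; layout is an assumption for illustration. */
struct hint_entry {
	uint64_t phys_addr;	/* guest physical address of the block */
	uint64_t len;		/* block length in bytes */
};

struct hint_batch {
	struct hint_entry entries[HINT_BATCH_ENTRIES];
	size_t count;		/* entries currently queued */
	uint64_t flushes;	/* notifications sent to the host so far */
};

/* Stand-in for a single guest-to-host notification; it only counts. */
static void hint_batch_flush(struct hint_batch *b)
{
	if (b->count == 0)
		return;
	b->flushes++;
	b->count = 0;
}

/* Queue one free block and notify the host only when the batch is full. */
static void hint_batch_add(struct hint_batch *b, uint64_t phys, uint64_t len)
{
	assert(b->count < HINT_BATCH_ENTRIES);
	b->entries[b->count].phys_addr = phys;
	b->entries[b->count].len = len;
	if (++b->count == HINT_BATCH_ENTRIES)
		hint_batch_flush(b);
}

/*
 * Count the notifications needed to hint `total` bytes of free memory
 * in pieces of at most `block` bytes, including the final partial batch.
 */
static uint64_t hint_notifications(uint64_t total, uint64_t block)
{
	struct hint_batch b = { .count = 0, .flushes = 0 };
	uint64_t addr;

	if (block == 0)
		return 0;

	for (addr = 0; addr < total; addr += block) {
		uint64_t len = total - addr < block ? total - addr : block;

		hint_batch_add(&b, addr, len);
	}
	hint_batch_flush(&b);	/* send whatever is left over */
	return b.flushes;
}
```

Under these assumptions, hinting a 1 TiB guest in 2 MiB blocks queues
524288 entries, so hint_notifications(1ULL << 40, 2ULL << 20) comes to 2048
batched notifications, compared with 524288 hypercalls when each block is
reported on its own.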