Date: Mon, 4 Feb 2019 23:14:27 -0500
From: "Michael S. Tsirkin"
To: Nitesh Narayan Lal
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com,
	lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com,
	yang.zhang.wz@gmail.com, riel@surriel.com, david@redhat.com,
	dodgen@google.com, konrad.wilk@oracle.com, dhildenb@redhat.com,
	aarcange@redhat.com
Subject: Re: [RFC][Patch v8 1/7] KVM: Support for guest free page hinting
Message-ID: <20190204231122-mutt-send-email-mst@kernel.org>
References: <20190204201854.2328-1-nitesh@redhat.com>
 <20190204201854.2328-2-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-2-nitesh@redhat.com>

On Mon, Feb 04, 2019 at 03:18:48PM -0500, Nitesh Narayan Lal wrote:
> This patch includes the following:
> 1. Basic skeleton for the support
> 2. Enablement of x86 platform to use the same
>
> Signed-off-by: Nitesh Narayan Lal
> ---
>  arch/x86/Kbuild              |  2 +-
>  arch/x86/kvm/Kconfig         |  8 ++++++++
>  arch/x86/kvm/Makefile        |  2 ++
>  include/linux/gfp.h          |  9 +++++++++
>  include/linux/page_hinting.h | 17 +++++++++++++++++
>  virt/kvm/page_hinting.c      | 36 ++++++++++++++++++++++++++++++++++++
>  6 files changed, 73 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/page_hinting.h
>  create mode 100644 virt/kvm/page_hinting.c
>
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index c625f57472f7..3244df4ee311 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -2,7 +2,7 @@ obj-y += entry/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> -obj-$(CONFIG_KVM) += kvm/
> +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
>
>  # Xen paravirtualization support
>  obj-$(CONFIG_XEN) += xen/
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 72fa955f4a15..2fae31459706 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -96,6 +96,14 @@ config KVM_MMU_AUDIT
>  	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
>  	 auditing of KVM MMU events at runtime.
>
> +# KVM_FREE_PAGE_HINTING will allow the guest to report the free pages to the
> +# host in regular interval of time.
> +config KVM_FREE_PAGE_HINTING
> +       def_bool y
> +       depends on KVM
> +       select VIRTIO
> +       select VIRTIO_BALLOON
> +
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
>  source "drivers/vhost/Kconfig"
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 69b3a7c30013..78640a80501e 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -16,6 +16,8 @@ kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
>  			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
>  			   hyperv.o page_track.o debugfs.o
>
> +obj-$(CONFIG_KVM_FREE_PAGE_HINTING)	+= $(KVM)/page_hinting.o
> +
>  kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
>  kvm-amd-y		+= svm.o pmu_amd.o
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 5f5e25fd6149..e596527284ba 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -7,6 +7,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  struct vm_area_struct;
>
> @@ -456,6 +457,14 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>  }
>
> +#ifdef	CONFIG_KVM_FREE_PAGE_HINTING
> +#define HAVE_ARCH_FREE_PAGE
> +static inline void arch_free_page(struct page *page, int order)
> +{
> +	guest_free_page(page, order);
> +}
> +#endif
> +
>  #ifndef HAVE_ARCH_FREE_PAGE
>  static inline void arch_free_page(struct page *page, int order) { }
>  #endif

OK, so the arch_free_page hook is used to tie into the mm code, and with
the follow-up patches the pages get queued in a list and then sent to the
hypervisor so it can free them.

Fair enough, but how do we know a page has not been reused by the time
the hypervisor receives it? If it has been reused, isn't it a problem
that the hypervisor calls MADV_DONTNEED on it?

> diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
> new file mode 100644
> index 000000000000..b54f7428f348
> --- /dev/null
> +++ b/include/linux/page_hinting.h
> @@ -0,0 +1,17 @@
> +/*
> + * Size of the array which is used to store the freed pages is defined by
> + * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which
> + * we can get rid of the hardcoded array size.
> + */
> +#define MAX_FGPT_ENTRIES	1000
> +/*
> + * hypervisor_pages - It is a dummy structure passed with the hypercall.
> + * @pfn: page frame number for the page which needs to be sent to the host.
> + * @order: order of the page needs to be reported to the host.
> + */
> +struct hypervisor_pages {
> +	unsigned long pfn;
> +	unsigned int order;
> +};
> +
> +void guest_free_page(struct page *page, int order);
> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
> new file mode 100644
> index 000000000000..818bd6b84e0c
> --- /dev/null
> +++ b/virt/kvm/page_hinting.c
> @@ -0,0 +1,36 @@
> +#include
> +#include
> +#include
> +
> +/*
> + * struct kvm_free_pages - Tracks the pages which are freed by the guest.
> + * @pfn: page frame number for the page which is freed.
> + * @order: order corresponding to the page freed.
> + * @zonenum: zone number to which the freed page belongs.
> + */
> +struct kvm_free_pages {
> +	unsigned long pfn;
> +	unsigned int order;
> +	int zonenum;
> +};
> +
> +/*
> + * struct page_hinting - holds array objects for the structures used to track
> + * guest free pages, along with an index variable for each of them.
> + * @kvm_pt: array object for the structure kvm_free_pages.
> + * @kvm_pt_idx: index for kvm_free_pages object.
> + * @hypervisor_pagelist: array object for the structure hypervisor_pages.
> + * @hyp_idx: index for hypervisor_pages object.
> + */
> +struct page_hinting {
> +	struct kvm_free_pages kvm_pt[MAX_FGPT_ENTRIES];
> +	int kvm_pt_idx;
> +	struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> +	int hyp_idx;
> +};
> +
> +DEFINE_PER_CPU(struct page_hinting, hinting_obj);
> +
> +void guest_free_page(struct page *page, int order)
> +{
> +}
> -- 
> 2.17.2
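To picture what the empty guest_free_page() stub is meant to grow into, here is a hypothetical, scaled-down userspace model; it is not code from this series. The structures are copied from the patch, with MAX_FGPT_ENTRIES shrunk from 1000 to 4 so a flush is easy to trigger. The names record_free_page() and report_to_host() are invented for illustration, the per-CPU aspect is dropped, and the real follow-up patches may batch and report pages differently. The point it makes is the window behind the reuse question: every entry sitting in kvm_pt can be reallocated by the guest before it is reported.

```c
#include <stddef.h>

/* Hypothetical, scaled-down model of the structures in the patch. */
#define MAX_FGPT_ENTRIES 4	/* the patch uses 1000 */

struct kvm_free_pages {
	unsigned long pfn;
	unsigned int order;
	int zonenum;
};

struct hypervisor_pages {
	unsigned long pfn;
	unsigned int order;
};

struct page_hinting {
	struct kvm_free_pages kvm_pt[MAX_FGPT_ENTRIES];
	int kvm_pt_idx;
	struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
	int hyp_idx;
};

/* Hypothetical stand-in for the hypercall: only counts reported pages. */
static size_t pages_reported;

static void report_to_host(struct page_hinting *h)
{
	pages_reported += (size_t)h->hyp_idx;
	h->hyp_idx = 0;
}

/*
 * Hypothetical body for guest_free_page(): buffer the freed page and,
 * once kvm_pt is full, copy the entries into hypervisor_pagelist and
 * report them. Nothing here prevents a buffered page from being
 * reallocated before report_to_host() runs.
 */
void record_free_page(struct page_hinting *h, unsigned long pfn,
		      unsigned int order, int zonenum)
{
	int i;

	h->kvm_pt[h->kvm_pt_idx].pfn = pfn;
	h->kvm_pt[h->kvm_pt_idx].order = order;
	h->kvm_pt[h->kvm_pt_idx].zonenum = zonenum;
	h->kvm_pt_idx++;

	if (h->kvm_pt_idx < MAX_FGPT_ENTRIES)
		return;

	for (i = 0; i < h->kvm_pt_idx; i++) {
		h->hypervisor_pagelist[h->hyp_idx].pfn = h->kvm_pt[i].pfn;
		h->hypervisor_pagelist[h->hyp_idx].order = h->kvm_pt[i].order;
		h->hyp_idx++;
	}
	h->kvm_pt_idx = 0;
	report_to_host(h);
}
```

Recording six frames into a zeroed struct page_hinting reports four and leaves two in kvm_pt; those two are exactly the entries that may be reused before the next flush.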
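A minimal Linux userspace sketch of why reuse matters, under stated assumptions and not code from this series: a private anonymous mapping stands in for guest memory backed by the host, the page is written again as if the guest had reused it, and MADV_DONTNEED is applied afterwards, as it would be if the host acted on a stale hint. madvise(2) documents that private anonymous memory is zero-filled on the next access after MADV_DONTNEED, so the newly written data is lost. The function name byte_after_stale_hint() is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* exposes MAP_ANONYMOUS and madvise() with glibc */
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical helper): returns the first byte of a
 * page that was written ("reused") and then hit by MADV_DONTNEED
 * ("stale hint acted upon"), or -1 if a system call fails.
 */
int byte_after_stale_hint(unsigned char fill)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *page;
	int result;

	page = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	/* The "freed" page is reused: new data lands in it. */
	memset(page, fill, (size_t)psz);

	/* The host later acts on the old free hint and drops the page. */
	if (madvise(page, (size_t)psz, MADV_DONTNEED) != 0) {
		munmap(page, (size_t)psz);
		return -1;
	}

	/* Private anonymous memory now reads back zero-filled. */
	result = page[0];
	munmap(page, (size_t)psz);
	return result;
}
```

On Linux, byte_after_stale_hint(0xAB) returns 0 rather than 0xAB: the data written after the free hint is silently replaced by a zero page.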