Date: Tue, 5 Feb 2019 11:27:29 -0500
From: "Michael S. Tsirkin"
To: Nitesh Narayan Lal
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com, lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@surriel.com, david@redhat.com, dodgen@google.com, konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com
Subject: Re: [RFC][Patch v8 1/7] KVM: Support for guest free page hinting
Message-ID: <20190205112655-mutt-send-email-mst@kernel.org>
References: <20190204201854.2328-1-nitesh@redhat.com> <20190204201854.2328-2-nitesh@redhat.com> <20190204231122-mutt-send-email-mst@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 05, 2019 at 08:06:33AM -0500, Nitesh Narayan Lal wrote:
> On 2/4/19 11:14 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 04, 2019 at 03:18:48PM -0500, Nitesh Narayan Lal wrote:
> >> This patch includes the following:
> >> 1. Basic skeleton for the support
> >> 2. Enablement of x86 platform to use the same
> >>
> >> Signed-off-by: Nitesh Narayan Lal
> >> ---
> >>  arch/x86/Kbuild              |  2 +-
> >>  arch/x86/kvm/Kconfig         |  8 ++++++++
> >>  arch/x86/kvm/Makefile        |  2 ++
> >>  include/linux/gfp.h          |  9 +++++++++
> >>  include/linux/page_hinting.h | 17 +++++++++++++++++
> >>  virt/kvm/page_hinting.c      | 36 ++++++++++++++++++++++++++++++++++++
> >>  6 files changed, 73 insertions(+), 1 deletion(-)
> >>  create mode 100644 include/linux/page_hinting.h
> >>  create mode 100644 virt/kvm/page_hinting.c
> >>
> >> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> >> index c625f57472f7..3244df4ee311 100644
> >> --- a/arch/x86/Kbuild
> >> +++ b/arch/x86/Kbuild
> >> @@ -2,7 +2,7 @@ obj-y += entry/
> >>
> >>  obj-$(CONFIG_PERF_EVENTS) += events/
> >>
> >> -obj-$(CONFIG_KVM) += kvm/
> >> +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
> >>
> >>  # Xen paravirtualization support
> >>  obj-$(CONFIG_XEN) += xen/
> >> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> >> index 72fa955f4a15..2fae31459706 100644
> >> --- a/arch/x86/kvm/Kconfig
> >> +++ b/arch/x86/kvm/Kconfig
> >> @@ -96,6 +96,14 @@ config KVM_MMU_AUDIT
> >>  	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
> >>  	 auditing of KVM MMU events at runtime.
> >>
> >> +# KVM_FREE_PAGE_HINTING will allow the guest to report the free pages to the
> >> +# host in regular interval of time.
> >> +config KVM_FREE_PAGE_HINTING
> >> +	def_bool y
> >> +	depends on KVM
> >> +	select VIRTIO
> >> +	select VIRTIO_BALLOON
> >> +
> >>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
> >>  # the virtualization menu.
> >>  source "drivers/vhost/Kconfig"
> >> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> >> index 69b3a7c30013..78640a80501e 100644
> >> --- a/arch/x86/kvm/Makefile
> >> +++ b/arch/x86/kvm/Makefile
> >> @@ -16,6 +16,8 @@ kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> >>  	i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> >>  	hyperv.o page_track.o debugfs.o
> >>
> >> +obj-$(CONFIG_KVM_FREE_PAGE_HINTING) += $(KVM)/page_hinting.o
> >> +
> >>  kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
> >>  kvm-amd-y += svm.o pmu_amd.o
> >>
> >> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >> index 5f5e25fd6149..e596527284ba 100644
> >> --- a/include/linux/gfp.h
> >> +++ b/include/linux/gfp.h
> >> @@ -7,6 +7,7 @@
> >>  #include
> >>  #include
> >>  #include
> >> +#include
> >>
> >>  struct vm_area_struct;
> >>
> >> @@ -456,6 +457,14 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> >>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> >>  }
> >>
> >> +#ifdef CONFIG_KVM_FREE_PAGE_HINTING
> >> +#define HAVE_ARCH_FREE_PAGE
> >> +static inline void arch_free_page(struct page *page, int order)
> >> +{
> >> +	guest_free_page(page, order);
> >> +}
> >> +#endif
> >> +
> >>  #ifndef HAVE_ARCH_FREE_PAGE
> >>  static inline void arch_free_page(struct page *page, int order) { }
> >>  #endif
> > OK so arch_free_page hook is used to tie into mm code,
> > with follow-up patches the pages get queued in a list
> > and then sent to hypervisor so it can free them.
> > Fair enough but how do we know the page is
> > not reused by the time it's received by the hypervisor?
> > If it's reused then isn't it a problem that
> > hypervisor calls MADV_DONTNEED on them?
> Hi Michael,
>
> In order to ensure that the page is not reused, we remove it from the
> buddy free list by acquiring the zone lock. After the page is freed by
> the hypervisor it is returned to the buddy free list again.

Thanks, that's good to know. Could you point me to code that does this?

> >
> >
> >> diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
> >> new file mode 100644
> >> index 000000000000..b54f7428f348
> >> --- /dev/null
> >> +++ b/include/linux/page_hinting.h
> >> @@ -0,0 +1,17 @@
> >> +/*
> >> + * Size of the array which is used to store the freed pages is defined by
> >> + * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which
> >> + * we can get rid of the hardcoded array size.
> >> + */
> >> +#define MAX_FGPT_ENTRIES 1000
> >> +/*
> >> + * hypervisor_pages - It is a dummy structure passed with the hypercall.
> >> + * @pfn: page frame number for the page which needs to be sent to the host.
> >> + * @order: order of the page needs to be reported to the host.
> >> + */
> >> +struct hypervisor_pages {
> >> +	unsigned long pfn;
> >> +	unsigned int order;
> >> +};
> >> +
> >> +void guest_free_page(struct page *page, int order);
> >> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
> >> new file mode 100644
> >> index 000000000000..818bd6b84e0c
> >> --- /dev/null
> >> +++ b/virt/kvm/page_hinting.c
> >> @@ -0,0 +1,36 @@
> >> +#include
> >> +#include
> >> +#include
> >> +
> >> +/*
> >> + * struct kvm_free_pages - Tracks the pages which are freed by the guest.
> >> + * @pfn: page frame number for the page which is freed.
> >> + * @order: order corresponding to the page freed.
> >> + * @zonenum: zone number to which the freed page belongs.
> >> + */
> >> +struct kvm_free_pages {
> >> +	unsigned long pfn;
> >> +	unsigned int order;
> >> +	int zonenum;
> >> +};
> >> +
> >> +/*
> >> + * struct page_hinting - holds array objects for the structures used to track
> >> + * guest free pages, along with an index variable for each of them.
> >> + * @kvm_pt: array object for the structure kvm_free_pages.
> >> + * @kvm_pt_idx: index for kvm_free_pages object.
> >> + * @hypervisor_pagelist: array object for the structure hypervisor_pages.
> >> + * @hyp_idx: index for hypervisor_pages object.
> >> + */
> >> +struct page_hinting {
> >> +	struct kvm_free_pages kvm_pt[MAX_FGPT_ENTRIES];
> >> +	int kvm_pt_idx;
> >> +	struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
> >> +	int hyp_idx;
> >> +};
> >> +
> >> +DEFINE_PER_CPU(struct page_hinting, hinting_obj);
> >> +
> >> +void guest_free_page(struct page *page, int order)
> >> +{
> >> +}
> >> --
> >> 2.17.2
> --
> Regards
> Nitesh
>
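
The isolation scheme discussed in this thread (take a free page off the
free list, hint it to the host, put it back afterwards) can be
illustrated with a small userspace sketch. This is not the code from the
follow-up patches and not kernel code. The names toy_zone, NR_PAGES,
isolate_page, alloc_page and return_hinted_pages are hypothetical, only
order-0 pages are modelled, and locking is omitted: in the kernel these
steps would presumably run under the zone lock, as described above. Only
MAX_FGPT_ENTRIES and struct hypervisor_pages are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FGPT_ENTRIES 1000	/* same constant as in the patch */
#define NR_PAGES 8		/* hypothetical size of the toy zone */

/* Mirrors struct hypervisor_pages from the patch. */
struct hypervisor_pages {
	unsigned long pfn;
	unsigned int order;
};

/* Hypothetical stand-in for a zone: one "on the free list" flag per page. */
struct toy_zone {
	bool on_free_list[NR_PAGES];
};

/* Pages that are currently isolated and waiting for the host. */
static struct hypervisor_pages hinted[MAX_FGPT_ENTRIES];
static int hyp_idx;

/*
 * Take a free page off the free list and record it for hinting.
 * While it is off the list, alloc_page() cannot hand it out.
 */
static bool isolate_page(struct toy_zone *z, unsigned long pfn)
{
	if (pfn >= NR_PAGES || !z->on_free_list[pfn])
		return false;	/* not free, so it may already be in use */
	if (hyp_idx >= MAX_FGPT_ENTRIES)
		return false;	/* no room left in the fixed-size array */

	z->on_free_list[pfn] = false;
	hinted[hyp_idx].pfn = pfn;
	hinted[hyp_idx].order = 0;	/* the sketch only handles order 0 */
	hyp_idx++;
	return true;
}

/* Allocation only succeeds for pages that are still on the free list. */
static bool alloc_page(struct toy_zone *z, unsigned long pfn)
{
	if (pfn >= NR_PAGES || !z->on_free_list[pfn])
		return false;
	z->on_free_list[pfn] = false;
	return true;
}

/* After the host has processed the hints, return the pages to the list. */
static void return_hinted_pages(struct toy_zone *z)
{
	while (hyp_idx > 0) {
		hyp_idx--;
		assert(hinted[hyp_idx].pfn < NR_PAGES);
		z->on_free_list[hinted[hyp_idx].pfn] = true;
	}
}
```

In this model a page that has been isolated cannot be allocated until
return_hinted_pages() runs, which is the property that would make a
MADV_DONTNEED on the host side safe. Whether the real series achieves
this in the same way is exactly the open question in the reply above.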