From: Nitesh Narayan Lal
To: "Michael S. Tsirkin"
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com,
 lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com,
 yang.zhang.wz@gmail.com, riel@surriel.com, david@redhat.com,
 dodgen@google.com, konrad.wilk@oracle.com, dhildenb@redhat.com,
 aarcange@redhat.com
Subject: Re: [RFC][Patch v8 1/7] KVM: Support for guest free page hinting
Date: Tue, 5 Feb 2019 11:34:02 -0500
References: <20190204201854.2328-1-nitesh@redhat.com>
 <20190204201854.2328-2-nitesh@redhat.com>
 <20190204231122-mutt-send-email-mst@kernel.org>
 <20190205112655-mutt-send-email-mst@kernel.org>
In-Reply-To: <20190205112655-mutt-send-email-mst@kernel.org>
Organization: Red Hat Inc,
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/5/19 11:27 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 05, 2019 at 08:06:33AM -0500, Nitesh Narayan Lal wrote:
>> On 2/4/19 11:14 PM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 04, 2019 at 03:18:48PM -0500, Nitesh Narayan Lal wrote:
>>>> This patch includes the following:
>>>> 1. Basic skeleton for the support
>>>> 2. Enablement of x86 platform to use the same
>>>>
>>>> Signed-off-by: Nitesh Narayan Lal
>>>> ---
>>>>  arch/x86/Kbuild              |  2 +-
>>>>  arch/x86/kvm/Kconfig         |  8 ++++++++
>>>>  arch/x86/kvm/Makefile        |  2 ++
>>>>  include/linux/gfp.h          |  9 +++++++++
>>>>  include/linux/page_hinting.h | 17 +++++++++++++++++
>>>>  virt/kvm/page_hinting.c      | 36 ++++++++++++++++++++++++++++++++++++
>>>>  6 files changed, 73 insertions(+), 1 deletion(-)
>>>>  create mode 100644 include/linux/page_hinting.h
>>>>  create mode 100644 virt/kvm/page_hinting.c
>>>>
>>>> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
>>>> index c625f57472f7..3244df4ee311 100644
>>>> --- a/arch/x86/Kbuild
>>>> +++ b/arch/x86/Kbuild
>>>> @@ -2,7 +2,7 @@ obj-y += entry/
>>>>
>>>>  obj-$(CONFIG_PERF_EVENTS) += events/
>>>>
>>>> -obj-$(CONFIG_KVM) += kvm/
>>>> +obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
>>>>
>>>>  # Xen paravirtualization support
>>>>  obj-$(CONFIG_XEN) += xen/
>>>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>>>> index 72fa955f4a15..2fae31459706 100644
>>>> --- a/arch/x86/kvm/Kconfig
>>>> +++ b/arch/x86/kvm/Kconfig
>>>> @@ -96,6 +96,14 @@ config KVM_MMU_AUDIT
>>>>  	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
>>>>  	 auditing of KVM MMU events at runtime.
>>>>
>>>> +# KVM_FREE_PAGE_HINTING will allow the guest to report the free pages to the
>>>> +# host in regular interval of time.
>>>> +config KVM_FREE_PAGE_HINTING
>>>> +	def_bool y
>>>> +	depends on KVM
>>>> +	select VIRTIO
>>>> +	select VIRTIO_BALLOON
>>>> +
>>>>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>>>>  # the virtualization menu.
>>>>  source "drivers/vhost/Kconfig"
>>>> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
>>>> index 69b3a7c30013..78640a80501e 100644
>>>> --- a/arch/x86/kvm/Makefile
>>>> +++ b/arch/x86/kvm/Makefile
>>>> @@ -16,6 +16,8 @@ kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
>>>>  	i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
>>>>  	hyperv.o page_track.o debugfs.o
>>>>
>>>> +obj-$(CONFIG_KVM_FREE_PAGE_HINTING) += $(KVM)/page_hinting.o
>>>> +
>>>>  kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
>>>>  kvm-amd-y += svm.o pmu_amd.o
>>>>
>>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>>> index 5f5e25fd6149..e596527284ba 100644
>>>> --- a/include/linux/gfp.h
>>>> +++ b/include/linux/gfp.h
>>>> @@ -7,6 +7,7 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>
>>>>  struct vm_area_struct;
>>>>
>>>> @@ -456,6 +457,14 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>>>>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>>>>  }
>>>>
>>>> +#ifdef CONFIG_KVM_FREE_PAGE_HINTING
>>>> +#define HAVE_ARCH_FREE_PAGE
>>>> +static inline void arch_free_page(struct page *page, int order)
>>>> +{
>>>> +	guest_free_page(page, order);
>>>> +}
>>>> +#endif
>>>> +
>>>>  #ifndef HAVE_ARCH_FREE_PAGE
>>>>  static inline void arch_free_page(struct page *page, int order) { }
>>>>  #endif
>>> OK so arch_free_page hook is used to tie into mm code,
>>> with follow-up patches the pages get queued in a list
>>> and then sent to hypervisor so it can free them.
>>> Fair enough but how do we know the page is
>>> not reused by the time it's received by the hypervisor?
>>> If it's reused then isn't it a problem that
>>> hypervisor calls MADV_DONTNEED on them?
>> Hi Michael,
>>
>> In order to ensure that the page is not reused, we remove it from the
>> buddy free list by acquiring the zone lock. After the page is freed by
>> the hypervisor it is returned to the buddy free list again.
> Thanks that's good to know. Could you point me to code that does this?

See Patch 0006-KVM-Enables-the-kernel-to-isolate-and-report-free-page.
There, hinting_fn() is responsible for scanning the per-cpu array,
acquiring the lock, isolating the page and invoking hyperlist_ready().
hyperlist_ready() makes the hypercall that reports the free pages and,
once the hypercall is done, returns those pages to the buddy free list
from within that same function.

>
>>>
>>>> diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
>>>> new file mode 100644
>>>> index 000000000000..b54f7428f348
>>>> --- /dev/null
>>>> +++ b/include/linux/page_hinting.h
>>>> @@ -0,0 +1,17 @@
>>>> +/*
>>>> + * Size of the array which is used to store the freed pages is defined by
>>>> + * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which
>>>> + * we can get rid of the hardcoded array size.
>>>> + */
>>>> +#define MAX_FGPT_ENTRIES	1000
>>>> +/*
>>>> + * hypervisor_pages - It is a dummy structure passed with the hypercall.
>>>> + * @pfn: page frame number for the page which needs to be sent to the host.
>>>> + * @order: order of the page needs to be reported to the host.
>>>> + */
>>>> +struct hypervisor_pages {
>>>> +	unsigned long pfn;
>>>> +	unsigned int order;
>>>> +};
>>>> +
>>>> +void guest_free_page(struct page *page, int order);
>>>> diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
>>>> new file mode 100644
>>>> index 000000000000..818bd6b84e0c
>>>> --- /dev/null
>>>> +++ b/virt/kvm/page_hinting.c
>>>> @@ -0,0 +1,36 @@
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +
>>>> +/*
>>>> + * struct kvm_free_pages - Tracks the pages which are freed by the guest.
>>>> + * @pfn: page frame number for the page which is freed.
>>>> + * @order: order corresponding to the page freed.
>>>> + * @zonenum: zone number to which the freed page belongs.
>>>> + */
>>>> +struct kvm_free_pages {
>>>> +	unsigned long pfn;
>>>> +	unsigned int order;
>>>> +	int zonenum;
>>>> +};
>>>> +
>>>> +/*
>>>> + * struct page_hinting - holds array objects for the structures used to track
>>>> + * guest free pages, along with an index variable for each of them.
>>>> + * @kvm_pt: array object for the structure kvm_free_pages.
>>>> + * @kvm_pt_idx: index for kvm_free_pages object.
>>>> + * @hypervisor_pagelist: array object for the structure hypervisor_pages.
>>>> + * @hyp_idx: index for hypervisor_pages object.
>>>> + */
>>>> +struct page_hinting {
>>>> +	struct kvm_free_pages kvm_pt[MAX_FGPT_ENTRIES];
>>>> +	int kvm_pt_idx;
>>>> +	struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
>>>> +	int hyp_idx;
>>>> +};
>>>> +
>>>> +DEFINE_PER_CPU(struct page_hinting, hinting_obj);
>>>> +
>>>> +void guest_free_page(struct page *page, int order)
>>>> +{
>>>> +}
>>>> --
>>>> 2.17.2
>> --
>> Regards
>> Nitesh
>>
>
>
--
Regards
Nitesh