2018-12-26 13:40:46

by Fengguang Wu

Subject: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM

From: Yao Yuan <[email protected]>

Signed-off-by: Yao Yuan <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
---
arch/x86/kvm/mmu.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.846720344 +0800
+++ linux/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.842719614 +0800
@@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
kmem_cache_free(cache, mc->objects[--mc->nobjs]);
}

+static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __alloc_pages(gfp_mask, 0, numa_node_id());
+ if (!page)
+ return 0;
+ return (unsigned long) page_address(page);
+}
+
static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
int min)
{
@@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
if (!page)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = page;




2019-01-01 14:39:55

by Aneesh Kumar K.V

Subject: Re: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM

Fengguang Wu <[email protected]> writes:

> From: Yao Yuan <[email protected]>
>
> Signed-off-by: Yao Yuan <[email protected]>
> Signed-off-by: Fengguang Wu <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> --- linux.orig/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.846720344 +0800
> +++ linux/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.842719614 +0800
> @@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
> kmem_cache_free(cache, mc->objects[--mc->nobjs]);
> }
>
> +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __alloc_pages(gfp_mask, 0, numa_node_id());
> + if (!page)
> + return 0;
> + return (unsigned long) page_address(page);
> +}
> +

Maybe it is explained in other patches. What is preventing the
allocation from PMEM here? Is it that we are not using the memory
policy's preferred node id, and hence the zonelist we built won't have
the PMEM node?
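
(To make the distinction concrete, a minimal illustration, assuming
CONFIG_NUMA: alloc_pages() consults the task mempolicy, via
alloc_pages_current(), while the patch pins the preferred node
explicitly:)

	struct page *a, *b;

	/* honors the task's NUMA memory policy */
	a = alloc_pages(GFP_KERNEL_ACCOUNT, 0);

	/* bypasses the mempolicy: preferred node is the local (DRAM) node */
	b = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());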


> static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
> int min)
> {
> @@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
> if (cache->nobjs >= min)
> return 0;
> while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> - page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> + page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
> if (!page)
> return cache->nobjs >= min ? 0 : -ENOMEM;
> cache->objects[cache->nobjs++] = page;

-aneesh


2019-01-02 01:18:51

by Yuan Yao

Subject: Re: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM

On Tue, Jan 01, 2019 at 02:53:07PM +0530, Aneesh Kumar K.V wrote:
> Fengguang Wu <[email protected]> writes:
>
> > From: Yao Yuan <[email protected]>
> >
> > Signed-off-by: Yao Yuan <[email protected]>
> > Signed-off-by: Fengguang Wu <[email protected]>
> > ---
> > arch/x86/kvm/mmu.c | 12 +++++++++++-
> > 1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > --- linux.orig/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.846720344 +0800
> > +++ linux/arch/x86/kvm/mmu.c 2018-12-26 20:54:48.842719614 +0800
> > @@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
> > kmem_cache_free(cache, mc->objects[--mc->nobjs]);
> > }
> >
> > +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> > +{
> > + struct page *page;
> > +
> > + page = __alloc_pages(gfp_mask, 0, numa_node_id());
> > + if (!page)
> > + return 0;
> > + return (unsigned long) page_address(page);
> > +}
> > +
>
> Maybe it is explained in other patches. What is preventing the
> allocation from PMEM here? Is it that we are not using the memory
> policy's preferred node id, and hence the zonelist we built won't have
> the PMEM node?

That's because the PMEM nodes are memory-only nodes in this patchset,
so numa_node_id() (the node of the CPU we are running on) will always
return a DRAM node id.

About the zonelist: yes, in patch 10/21 we build the PMEM nodes into
separate zonelists, so DRAM allocations will not fall back to PMEM
nodes.
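
So to deliberately place an allocation on a PMEM node, the node id
would have to be passed in explicitly, along these lines (illustrative
sketch only; __get_pmem_free_pages() and pmem_nid are made-up names,
not part of the patchset):

static unsigned long __get_pmem_free_pages(gfp_t gfp_mask, int pmem_nid)
{
	struct page *page;

	/* pmem_nid selects the PMEM node's own zonelist (patch 10/21) */
	page = __alloc_pages(gfp_mask, 0, pmem_nid);
	if (!page)
		return 0;
	return (unsigned long)page_address(page);
}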

>
> > static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
> > int min)
> > {
> > @@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
> > if (cache->nobjs >= min)
> > return 0;
> > while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> > - page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> > + page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
> > if (!page)
> > return cache->nobjs >= min ? 0 : -ENOMEM;
> > cache->objects[cache->nobjs++] = page;
>
> -aneesh
>

2019-01-02 19:08:34

by Dave Hansen

Subject: Re: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM

On 12/26/18 5:14 AM, Fengguang Wu wrote:
> +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __alloc_pages(gfp_mask, 0, numa_node_id());
> + if (!page)
> + return 0;
> + return (unsigned long) page_address(page);
> +}

There seems to be a ton of *policy* baked into these patches. For
instance: thou shalt not allocate page table pages from PMEM. That's
surely not a policy we want to inflict on every Linux user until the end
of time.

I think the more important question is how we can have the specific
policy that this patch implements, but also leave open room for other
policies, such as: "I don't care how slow this VM runs, minimize the
amount of fast memory it eats."
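
One could imagine, for instance, making it a knob instead of a
hard-coded rule. A purely hypothetical sketch (the parameter name and
the pick_slow_node() helper are invented here, not something the
patchset provides):

/* hypothetical knob: allow page table pages to live on slow memory */
static bool pgtable_on_pmem __read_mostly;
module_param(pgtable_on_pmem, bool, 0644);

static unsigned long __get_pgtable_free_pages(gfp_t gfp_mask)
{
	/* pick_slow_node() stands in for however a PMEM node gets chosen */
	int nid = pgtable_on_pmem ? pick_slow_node() : numa_node_id();
	struct page *page = __alloc_pages(gfp_mask, 0, nid);

	return page ? (unsigned long)page_address(page) : 0;
}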

2019-01-07 10:26:25

by Fengguang Wu

Subject: Re: [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM

On Wed, Jan 02, 2019 at 08:47:25AM -0800, Dave Hansen wrote:
>On 12/26/18 5:14 AM, Fengguang Wu wrote:
>> +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
>> +{
>> + struct page *page;
>> +
>> + page = __alloc_pages(gfp_mask, 0, numa_node_id());
>> + if (!page)
>> + return 0;
>> + return (unsigned long) page_address(page);
>> +}
>
>There seems to be a ton of *policy* baked into these patches. For
>instance: thou shalt not allocate page table pages from PMEM. That's
>surely not a policy we want to inflict on every Linux user until the end
>of time.

Right. It's a straightforward policy for users that care about
performance. The project is planned in 3 steps; at this moment we are
in phase (1):

1) core functionalities, easy to backport
2) upstream-able total solution
3) upstream when API stabilized

The dumb kernel interface /proc/PID/idle_pages enables implementing
the majority of policies in user space. However, for the other smaller
parts, it looks easier to implement an obvious policy first, then
consider more possibilities.
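
For illustration, the user-space side can stay this simple (a sketch
only: the exact layout of /proc/PID/idle_pages is defined by the
patchset, and the one-bit-per-page reading below is an assumption):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* count idle bits in one chunk of the per-task idle bitmap */
static int count_idle(pid_t pid, off_t offset, size_t len)
{
	char path[64];
	uint8_t buf[4096];
	ssize_t n, i;
	int fd, idle = 0;

	if (len > sizeof(buf))
		len = sizeof(buf);
	snprintf(path, sizeof(path), "/proc/%d/idle_pages", (int)pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, buf, len, offset);
	close(fd);
	if (n < 0)
		return -1;
	for (i = 0; i < n; i++)
		idle += __builtin_popcount(buf[i]);	/* assumes 1 bit per page */
	return idle;
}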

>I think the more important question is how we can have the specific
>policy that this patch implements, but also leave open room for other
>policies, such as: "I don't care how slow this VM runs, minimize the
>amount of fast memory it eats."

Agreed. I'm open to other approaches. We can treat these patches as a
version for soliciting feedback. If anyone sends reasonable
improvements, or even a totally different way of doing it, I'd be
happy to incorporate them.

Thanks,
Fengguang