2021-06-08 18:08:52

by Claudio Imbrenda

Subject: [PATCH v2 0/2] mm: export __vmalloc_node_range and use it

Export __vmalloc_node_range so it can be used in modules.

Use the newly exported __vmalloc_node_range in KVM on s390 to overcome
a hardware limitation.

Claudio Imbrenda (2):
mm/vmalloc: export __vmalloc_node_range
KVM: s390: fix for hugepage vmalloc

arch/s390/kvm/pv.c | 5 ++++-
mm/vmalloc.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)

--
2.31.1


2021-06-08 18:09:27

by Claudio Imbrenda

Subject: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

The recent patches to add support for hugepage vmalloc mappings added a
flag for __vmalloc_node_range that allows callers to request small pages.
This flag is not accessible when calling vmalloc; the only option is to
call __vmalloc_node_range directly, which is not exported.

This means that a module can't vmalloc memory with small pages.

Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with small pages, because of a hardware limitation.

This patch exports __vmalloc_node_range so it can be used in modules
too.

Signed-off-by: Claudio Imbrenda <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/vmalloc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a13ac524f6ff..bd6fa160b31b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2937,6 +2937,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,

 	return NULL;
 }
+EXPORT_SYMBOL_GPL(__vmalloc_node_range);

 /**
  * __vmalloc_node - allocate virtually contiguous memory
--
2.31.1

2021-06-08 18:09:42

by Claudio Imbrenda

Subject: [PATCH v2 2/2] KVM: s390: fix for hugepage vmalloc

The Create Secure Configuration Ultravisor Call does not support using
large pages for the virtual memory area. This is a hardware limitation.

This patch replaces the vzalloc call with a longer but equivalent
__vmalloc_node_range call, also setting the VM_NO_HUGE_VMAP flag, to
guarantee that this allocation will not be performed with large pages.

Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Fixes: 121e6f3258fe393e22c3 ("mm/vmalloc: hugepage vmalloc mappings")
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
---
arch/s390/kvm/pv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 813b6e93dc83..6087fe7ae77c 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -140,7 +140,10 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	/* Allocate variable storage */
 	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
 	vlen += uv_info.guest_virt_base_stor_len;
-	kvm->arch.pv.stor_var = vzalloc(vlen);
+	kvm->arch.pv.stor_var = __vmalloc_node_range(vlen, PAGE_SIZE, VMALLOC_START, VMALLOC_END,
+						     GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+						     VM_NO_HUGE_VMAP, NUMA_NO_NODE,
+						     __builtin_return_address(0));
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
--
2.31.1

2021-06-08 19:06:14

by Christian Borntraeger

Subject: Re: [PATCH v2 2/2] KVM: s390: fix for hugepage vmalloc


On 08.06.21 20:06, Claudio Imbrenda wrote:
> The Create Secure Configuration Ultravisor Call does not support using
> large pages for the virtual memory area. This is a hardware limitation.
>
> This patch replaces the vzalloc call with a longer but equivalent
> __vmalloc_node_range call, also setting the VM_NO_HUGE_VMAP flag, to
> guarantee that this allocation will not be performed with large pages.
>
> Signed-off-by: Claudio Imbrenda <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]>
> Fixes: 121e6f3258fe393e22c3 ("mm/vmalloc: hugepage vmalloc mappings")
> Cc: Andrew Morton <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Uladzislau Rezki (Sony) <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: David Rientjes <[email protected]>

Would be good to have this in 5.13, as for everything else we want to have
hugepages in vmalloc space on s390.

In case Andrew picks this up
Acked-by: Christian Borntraeger <[email protected]>
for the KVM/390 part.

> ---
> arch/s390/kvm/pv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 813b6e93dc83..6087fe7ae77c 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -140,7 +140,10 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>  	/* Allocate variable storage */
>  	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
>  	vlen += uv_info.guest_virt_base_stor_len;
> -	kvm->arch.pv.stor_var = vzalloc(vlen);
> +	kvm->arch.pv.stor_var = __vmalloc_node_range(vlen, PAGE_SIZE, VMALLOC_START, VMALLOC_END,
> +						     GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
> +						     VM_NO_HUGE_VMAP, NUMA_NO_NODE,
> +						     __builtin_return_address(0));
>  	if (!kvm->arch.pv.stor_var)
>  		goto out_err;
>  	return 0;
>

2021-06-09 16:36:04

by Christoph Hellwig

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> The recent patches to add support for hugepage vmalloc mappings added a
> flag for __vmalloc_node_range to allow to request small pages.
> This flag is not accessible when calling vmalloc, the only option is to
> call directly __vmalloc_node_range, which is not exported.
>
> This means that a module can't vmalloc memory with small pages.
>
> Case in point: KVM on s390x needs to vmalloc a large area, and it needs
> to be mapped with small pages, because of a hardware limitation.
>
> This patch exports __vmalloc_node_range so it can be used in modules
> too.

No. I spent a lot of effort to make sure such a low-level API is
not exported.

2021-06-09 16:39:59

by Claudio Imbrenda

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

On Wed, 9 Jun 2021 16:59:17 +0100
Christoph Hellwig <[email protected]> wrote:

> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> > The recent patches to add support for hugepage vmalloc mappings
> > added a flag for __vmalloc_node_range to allow to request small
> > pages. This flag is not accessible when calling vmalloc, the only
> > option is to call directly __vmalloc_node_range, which is not
> > exported.
> >
> > This means that a module can't vmalloc memory with small pages.
> >
> > Case in point: KVM on s390x needs to vmalloc a large area, and it
> > needs to be mapped with small pages, because of a hardware
> > limitation.
> >
> > This patch exports __vmalloc_node_range so it can be used in modules
> > too.
>
> No. I spent a lot of effort to make sure such a low-level API is
> not exported.

ok, but then how can we vmalloc memory with small pages from KVM?

2021-06-09 18:09:44

by Uladzislau Rezki

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

On Wed, Jun 09, 2021 at 06:28:09PM +0200, Claudio Imbrenda wrote:
> On Wed, 9 Jun 2021 16:59:17 +0100
> Christoph Hellwig <[email protected]> wrote:
>
> > On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> > > The recent patches to add support for hugepage vmalloc mappings
> > > added a flag for __vmalloc_node_range to allow to request small
> > > pages. This flag is not accessible when calling vmalloc, the only
> > > option is to call directly __vmalloc_node_range, which is not
> > > exported.
> > >
> > > This means that a module can't vmalloc memory with small pages.
> > >
> > > Case in point: KVM on s390x needs to vmalloc a large area, and it
> > > needs to be mapped with small pages, because of a hardware
> > > limitation.
> > >
> > > This patch exports __vmalloc_node_range so it can be used in modules
> > > too.
> >
> > No. I spent a lot of effort to make sure such a low-level API is
> > not exported.
>
> ok, but then how can we vmalloc memory with small pages from KVM?
Does s390x support CONFIG_HAVE_ARCH_HUGE_VMALLOC, which is arch
specific?

If not, then small pages are used. Or am I missing something?

I agree with Christoph that exporting low-level internals
is not a good idea.

--
Vlad Rezki

2021-06-09 18:16:51

by Christian Borntraeger

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

On 09.06.21 18:28, Claudio Imbrenda wrote:
> On Wed, 9 Jun 2021 16:59:17 +0100
> Christoph Hellwig <[email protected]> wrote:
>
>> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
>>> The recent patches to add support for hugepage vmalloc mappings
>>> added a flag for __vmalloc_node_range to allow to request small
>>> pages. This flag is not accessible when calling vmalloc, the only
>>> option is to call directly __vmalloc_node_range, which is not
>>> exported.
>>>
>>> This means that a module can't vmalloc memory with small pages.
>>>
>>> Case in point: KVM on s390x needs to vmalloc a large area, and it
>>> needs to be mapped with small pages, because of a hardware
>>> limitation.
>>>
>>> This patch exports __vmalloc_node_range so it can be used in modules
>>> too.
>>
>> No. I spent a lot of effort to make sure such a low-level API is
>> not exported.
>
> ok, but then how can we vmalloc memory with small pages from KVM?

An alternative would be to provide a vmalloc_no_huge function in generic
code (similar to vmalloc_32) (or if preferred in s390 base architecture code)
Something like

void *vmalloc_no_huge(unsigned long size)
{
	return __vmalloc_node_flags(size, NUMA_NO_NODE, VM_NO_HUGE_VMAP |
				    GFP_KERNEL | __GFP_ZERO);
}
EXPORT_SYMBOL(vmalloc_no_huge);

or a similar vzalloc variant.

2021-06-09 18:18:32

by Christian Borntraeger

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range



On 09.06.21 18:49, Uladzislau Rezki wrote:
> On Wed, Jun 09, 2021 at 06:28:09PM +0200, Claudio Imbrenda wrote:
>> On Wed, 9 Jun 2021 16:59:17 +0100
>> Christoph Hellwig <[email protected]> wrote:
>>
>>> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
>>>> The recent patches to add support for hugepage vmalloc mappings
>>>> added a flag for __vmalloc_node_range to allow to request small
>>>> pages. This flag is not accessible when calling vmalloc, the only
>>>> option is to call directly __vmalloc_node_range, which is not
>>>> exported.
>>>>
>>>> This means that a module can't vmalloc memory with small pages.
>>>>
>>>> Case in point: KVM on s390x needs to vmalloc a large area, and it
>>>> needs to be mapped with small pages, because of a hardware
>>>> limitation.
>>>>
>>>> This patch exports __vmalloc_node_range so it can be used in modules
>>>> too.
>>>
>>> No. I spent a lot of effort to make sure such a low-level API is
>>> not exported.
>>
>> ok, but then how can we vmalloc memory with small pages from KVM?
> Does s390x support CONFIG_HAVE_ARCH_HUGE_VMALLOC, which is arch
> specific?

Not yet, but we surely want that for almost everything on s390.
Only this particular firmware interface does not handle large pages
for donated memory.

>
> If not, then small pages are used. Or am I missing something?
>
> I agree with Christoph that exporting low-level internals
> is not a good idea.

2021-06-10 05:28:39

by Christoph Hellwig

Subject: Re: [PATCH v2 1/2] mm/vmalloc: export __vmalloc_node_range

On Wed, Jun 09, 2021 at 07:47:43PM +0200, Christian Borntraeger wrote:
> An alternative would be to provide a vmalloc_no_huge function in generic
> code (similar to vmalloc_32) (or if preferred in s390 base architecture code)
> Something like
>
> void *vmalloc_no_huge(unsigned long size)
> {
> 	return __vmalloc_node_flags(size, NUMA_NO_NODE, VM_NO_HUGE_VMAP |
> 				    GFP_KERNEL | __GFP_ZERO);
> }
> EXPORT_SYMBOL(vmalloc_no_huge);
>
> or a similar vzalloc variant.

Exactly. Given that this seems to be a weird peculiarity of legacy s390
interfaces I'd only export it for s390 for now, although for
documentation purposes I'd probably still keep it in vmalloc.c.
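
The wrapper approach the thread converges on can be illustrated with a
small userspace model. This is only a sketch under stated assumptions:
model_vmalloc_node_range(), struct alloc_record, last_alloc, and all
numeric flag values are hypothetical stand-ins so the code compiles
outside the kernel; they are not the kernel's definitions, and the stub
takes a reduced argument list. The name vzalloc_no_huge follows the
"similar vzalloc variant" mentioned in the thread. What the sketch tries
to show is that VM_NO_HUGE_VMAP is a vmalloc area flag, so a wrapper
would pass it in the vm_flags argument rather than OR-ing it into the
GFP mask, and that modules would only see the narrow wrapper, not the
low-level allocator.

```c
/* Userspace model only: kernel types and the allocator are stubbed. */
#include <stdlib.h>

typedef unsigned int gfp_t;

/* Placeholder bit values for illustration, NOT the kernel's real ones. */
#define GFP_KERNEL      0x1u
#define __GFP_ZERO      0x2u
#define VM_NO_HUGE_VMAP 0x100ul
#define NUMA_NO_NODE    (-1)

/* Hypothetical record of what the low-level allocator was asked for. */
struct alloc_record {
	unsigned long size;
	gfp_t gfp_mask;
	unsigned long vm_flags;
	int node;
};

static struct alloc_record last_alloc;

/*
 * Hypothetical stand-in for the unexported low-level allocator.
 * It keeps the GFP mask and the vm_flags as separate arguments,
 * which is the distinction this sketch is about.
 */
static void *model_vmalloc_node_range(unsigned long size, gfp_t gfp_mask,
				      unsigned long vm_flags, int node)
{
	last_alloc.size = size;
	last_alloc.gfp_mask = gfp_mask;
	last_alloc.vm_flags = vm_flags;
	last_alloc.node = node;

	/* Zero the memory only when the caller asked for it. */
	return (gfp_mask & __GFP_ZERO) ? calloc(1, size) : malloc(size);
}

/*
 * Narrow wrapper: the only entry point a module would get.
 * The "no huge pages" request travels in vm_flags, not in the GFP mask.
 */
void *vzalloc_no_huge(unsigned long size)
{
	return model_vmalloc_node_range(size, GFP_KERNEL | __GFP_ZERO,
					VM_NO_HUGE_VMAP, NUMA_NO_NODE);
}
```

With such a wrapper, the call site in kvm_s390_pv_alloc_vm() would shrink
back to a single call taking vlen, and the low-level function could stay
unexported.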