Export __vmalloc_node_range so it can be used in modules.
Use the newly exported __vmalloc_node_range in KVM on s390 to overcome
a hardware limitation.
Claudio Imbrenda (2):
mm/vmalloc: export __vmalloc_node_range
KVM: s390: fix for hugepage vmalloc
arch/s390/kvm/pv.c | 5 ++++-
mm/vmalloc.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
--
2.31.1
The recent patches to add support for hugepage vmalloc mappings added a
flag for __vmalloc_node_range that allows requesting small pages.
This flag is not accessible when calling vmalloc; the only option is to
call __vmalloc_node_range directly, which is not exported.
This means that a module can't vmalloc memory with small pages.
Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with small pages, because of a hardware limitation.
This patch exports __vmalloc_node_range so it can be used in modules
too.
Signed-off-by: Claudio Imbrenda <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/vmalloc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a13ac524f6ff..bd6fa160b31b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2937,6 +2937,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(__vmalloc_node_range);
 /**
  * __vmalloc_node - allocate virtually contiguous memory
--
2.31.1
The Create Secure Configuration Ultravisor Call does not support using
large pages for the virtual memory area. This is a hardware limitation.
This patch replaces the vzalloc call with a longer but equivalent
__vmalloc_node_range call, also setting the VM_NO_HUGE_VMAP flag, to
guarantee that this allocation will not be performed with large pages.
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Fixes: 121e6f3258fe393e22c3 ("mm/vmalloc: hugepage vmalloc mappings")
Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
---
arch/s390/kvm/pv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 813b6e93dc83..6087fe7ae77c 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -140,7 +140,10 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	/* Allocate variable storage */
 	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
 	vlen += uv_info.guest_virt_base_stor_len;
-	kvm->arch.pv.stor_var = vzalloc(vlen);
+	kvm->arch.pv.stor_var = __vmalloc_node_range(vlen, PAGE_SIZE, VMALLOC_START, VMALLOC_END,
+						     GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+						     VM_NO_HUGE_VMAP, NUMA_NO_NODE,
+						     __builtin_return_address(0));
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
--
2.31.1
On 08.06.21 20:06, Claudio Imbrenda wrote:
> The Create Secure Configuration Ultravisor Call does not support using
> large pages for the virtual memory area. This is a hardware limitation.
>
> This patch replaces the vzalloc call with a longer but equivalent
> __vmalloc_node_range call, also setting the VM_NO_HUGE_VMAP flag, to
> guarantee that this allocation will not be performed with large pages.
>
> Signed-off-by: Claudio Imbrenda <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]>
> Fixes: 121e6f3258fe393e22c3 ("mm/vmalloc: hugepage vmalloc mappings")
> Cc: Andrew Morton <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Uladzislau Rezki (Sony) <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: David Rientjes <[email protected]>
Would be good to have this in 5.13, as for everything else we want to have
hugepages in vmalloc space on s390.
In case Andrew picks this up
Acked-by: Christian Borntraeger <[email protected]>
for the KVM/390 part.
> ---
> arch/s390/kvm/pv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 813b6e93dc83..6087fe7ae77c 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -140,7 +140,10 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>  	/* Allocate variable storage */
>  	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
>  	vlen += uv_info.guest_virt_base_stor_len;
> -	kvm->arch.pv.stor_var = vzalloc(vlen);
> +	kvm->arch.pv.stor_var = __vmalloc_node_range(vlen, PAGE_SIZE, VMALLOC_START, VMALLOC_END,
> +						     GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
> +						     VM_NO_HUGE_VMAP, NUMA_NO_NODE,
> +						     __builtin_return_address(0));
>  	if (!kvm->arch.pv.stor_var)
>  		goto out_err;
>  	return 0;
>
On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> The recent patches to add support for hugepage vmalloc mappings added a
> flag for __vmalloc_node_range that allows requesting small pages.
> This flag is not accessible when calling vmalloc; the only option is to
> call __vmalloc_node_range directly, which is not exported.
>
> This means that a module can't vmalloc memory with small pages.
>
> Case in point: KVM on s390x needs to vmalloc a large area, and it needs
> to be mapped with small pages, because of a hardware limitation.
>
> This patch exports __vmalloc_node_range so it can be used in modules
> too.
No. I spent a lot of effort to make sure such a low-level API is
not exported.
On Wed, 9 Jun 2021 16:59:17 +0100
Christoph Hellwig <[email protected]> wrote:
> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> > The recent patches to add support for hugepage vmalloc mappings
> > added a flag for __vmalloc_node_range that allows requesting small
> > pages. This flag is not accessible when calling vmalloc; the only
> > option is to call __vmalloc_node_range directly, which is not
> > exported.
> >
> > This means that a module can't vmalloc memory with small pages.
> >
> > Case in point: KVM on s390x needs to vmalloc a large area, and it
> > needs to be mapped with small pages, because of a hardware
> > limitation.
> >
> > This patch exports __vmalloc_node_range so it can be used in modules
> > too.
>
> No. I spent a lot of effort to make sure such a low-level API is
> not exported.
ok, but then how can we vmalloc memory with small pages from KVM?
On Wed, Jun 09, 2021 at 06:28:09PM +0200, Claudio Imbrenda wrote:
> On Wed, 9 Jun 2021 16:59:17 +0100
> Christoph Hellwig <[email protected]> wrote:
>
> > On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
> > > The recent patches to add support for hugepage vmalloc mappings
> > > added a flag for __vmalloc_node_range that allows requesting small
> > > pages. This flag is not accessible when calling vmalloc; the only
> > > option is to call __vmalloc_node_range directly, which is not
> > > exported.
> > >
> > > This means that a module can't vmalloc memory with small pages.
> > >
> > > Case in point: KVM on s390x needs to vmalloc a large area, and it
> > > needs to be mapped with small pages, because of a hardware
> > > limitation.
> > >
> > > This patch exports __vmalloc_node_range so it can be used in modules
> > > too.
> >
> > No. I spent a lot of effort to make sure such a low-level API is
> > not exported.
>
> ok, but then how can we vmalloc memory with small pages from KVM?
Does s390x support CONFIG_HAVE_ARCH_HUGE_VMALLOC, which is
arch-specific?
If not, then small pages are used. Or am I missing something?
I agree with Christoph that exporting low-level internals
is not a good idea.
--
Vlad Rezki
On 09.06.21 18:28, Claudio Imbrenda wrote:
> On Wed, 9 Jun 2021 16:59:17 +0100
> Christoph Hellwig <[email protected]> wrote:
>
>> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
>>> The recent patches to add support for hugepage vmalloc mappings
>>> added a flag for __vmalloc_node_range that allows requesting small
>>> pages. This flag is not accessible when calling vmalloc; the only
>>> option is to call __vmalloc_node_range directly, which is not
>>> exported.
>>>
>>> This means that a module can't vmalloc memory with small pages.
>>>
>>> Case in point: KVM on s390x needs to vmalloc a large area, and it
>>> needs to be mapped with small pages, because of a hardware
>>> limitation.
>>>
>>> This patch exports __vmalloc_node_range so it can be used in modules
>>> too.
>>
>> No. I spent a lot of effort to make sure such a low-level API is
>> not exported.
>
> ok, but then how can we vmalloc memory with small pages from KVM?
An alternative would be to provide a vmalloc_no_huge function in generic
code (similar to vmalloc_32), or if preferred in s390 base architecture code.
Something like:
void *vmalloc_no_huge(unsigned long size)
{
	return __vmalloc_node_flags(size, NUMA_NO_NODE, VM_NO_HUGE_VMAP |
				    GFP_KERNEL | __GFP_ZERO);
}
EXPORT_SYMBOL(vmalloc_no_huge);
or a similar vzalloc variant.
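
As a rough illustration of the wrapper idea, here is a userspace mock (not kernel code, and not a proposed implementation). It shows a narrow helper that hard-codes the "no huge mappings" request, so callers never need the low-level allocator. Everything prefixed with mock_ or MOCK_ is a hypothetical stand-in with arbitrary values. The mock keeps the no-huge flag in a separate vm_flags argument, mirroring how the pv.c patch in this thread passes VM_NO_HUGE_VMAP separately from the GFP mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the values are arbitrary, not the kernel's. */
#define MOCK_GFP_KERNEL      0x1u
#define MOCK_GFP_ZERO        0x2u
#define MOCK_VM_NO_HUGE_VMAP 0x1ul

/* Records what the low-level mock allocator was last asked to do. */
struct mock_alloc_record {
	size_t size;
	unsigned int gfp_mask;
	unsigned long vm_flags;
};

static struct mock_alloc_record mock_last_alloc;

/*
 * Mock of a low-level, flag-taking allocator. It only records its
 * arguments and honours the zeroing flag; it knows nothing about pages.
 */
static void *mock_vmalloc_node_range(size_t size, unsigned int gfp_mask,
				     unsigned long vm_flags)
{
	mock_last_alloc.size = size;
	mock_last_alloc.gfp_mask = gfp_mask;
	mock_last_alloc.vm_flags = vm_flags;

	if (size == 0)
		return NULL;
	return (gfp_mask & MOCK_GFP_ZERO) ? calloc(1, size) : malloc(size);
}

/*
 * Narrow wrapper: the only thing a caller can choose is the size.
 * The allocation flags and the no-huge mapping flag are fixed here.
 */
void *mock_vzalloc_no_huge(size_t size)
{
	return mock_vmalloc_node_range(size, MOCK_GFP_KERNEL | MOCK_GFP_ZERO,
				       MOCK_VM_NO_HUGE_VMAP);
}

/* Usage: the caller gets zeroed memory and never sees the flags. */
void mock_usage_example(void)
{
	unsigned char *buf = mock_vzalloc_no_huge(128);

	assert(buf != NULL);
	assert(mock_last_alloc.vm_flags & MOCK_VM_NO_HUGE_VMAP);
	assert(buf[0] == 0 && buf[127] == 0);
	free(buf);
}
```

The point of the shape is that only the narrow helper would need to be visible to modules, while the flag-taking allocator stays internal.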
On 09.06.21 18:49, Uladzislau Rezki wrote:
> On Wed, Jun 09, 2021 at 06:28:09PM +0200, Claudio Imbrenda wrote:
>> On Wed, 9 Jun 2021 16:59:17 +0100
>> Christoph Hellwig <[email protected]> wrote:
>>
>>> On Tue, Jun 08, 2021 at 08:06:17PM +0200, Claudio Imbrenda wrote:
>>>> The recent patches to add support for hugepage vmalloc mappings
>>>> added a flag for __vmalloc_node_range that allows requesting small
>>>> pages. This flag is not accessible when calling vmalloc; the only
>>>> option is to call __vmalloc_node_range directly, which is not
>>>> exported.
>>>>
>>>> This means that a module can't vmalloc memory with small pages.
>>>>
>>>> Case in point: KVM on s390x needs to vmalloc a large area, and it
>>>> needs to be mapped with small pages, because of a hardware
>>>> limitation.
>>>>
>>>> This patch exports __vmalloc_node_range so it can be used in modules
>>>> too.
>>>
>>> No. I spent a lot of effort to make sure such a low-level API is
>>> not exported.
>>
>> ok, but then how can we vmalloc memory with small pages from KVM?
> Does s390x support CONFIG_HAVE_ARCH_HUGE_VMALLOC, which is
> arch-specific?
Not yet, but we surely want that for almost everything on s390.
Only this particular firmware interface does not handle large pages
for donated memory.
>
> If not, then small pages are used. Or am I missing something?
>
> I agree with Christoph that exporting low-level internals
> is not a good idea.
On Wed, Jun 09, 2021 at 07:47:43PM +0200, Christian Borntraeger wrote:
> An alternative would be to provide a vmalloc_no_huge function in generic
> code (similar to vmalloc_32), or if preferred in s390 base architecture code.
> Something like:
>
> void *vmalloc_no_huge(unsigned long size)
> {
> 	return __vmalloc_node_flags(size, NUMA_NO_NODE, VM_NO_HUGE_VMAP |
> 				    GFP_KERNEL | __GFP_ZERO);
> }
> EXPORT_SYMBOL(vmalloc_no_huge);
>
> or a similar vzalloc variant.
Exactly. Given that this seems to be a weird peculiarity of legacy s390
interfaces, I'd only export it for s390 for now, although for
documentation purposes I'd probably still keep it in vmalloc.c.