2016-12-06 08:45:47

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 1/1] arm/module: maximum utilization of module area.

This patch defines new macro MODULE_START to ensure kernel text
and module remains within 32 MB of address range.

Tried this patch by inserting 20 MB size module on 4.1 kernel:-

Earlier:-
==========
sh# insmod size.ko
....
insmod: ERROR: could not insert module size.ko: Cannot allocate memory
sh#

With this patch
===============
sh# insmod size.ko
...
sh# lsmod
Module Size Used by
size 20972425 0

Signed-off-by: Vaneet Narang <[email protected]>
Signed-off-by: Maninder Singh <[email protected]>
Reviewed-by: Ajeet Yadav <[email protected]>
---
arch/arm/include/asm/memory.h | 4 ++--
arch/arm/kernel/module.c | 9 ++++++++-
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 76cbd9c..1a0a6e5 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -37,7 +37,7 @@
* TASK_SIZE - the maximum size of a user space task.
* TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
*/
-#define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+#define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_32M))
#define TASK_UNMAPPED_BASE ALIGN(TASK_SIZE / 3, SZ_16M)

/*
@@ -50,7 +50,7 @@
* and PAGE_OFFSET - it must be within 32MB of the kernel text.
*/
#ifndef CONFIG_THUMB2_KERNEL
-#define MODULES_VADDR (PAGE_OFFSET - SZ_16M)
+#define MODULES_VADDR (PAGE_OFFSET - SZ_32M)
#else
/* smaller range for Thumb-2 symbols relocation (2^24)*/
#define MODULES_VADDR (PAGE_OFFSET - SZ_8M)
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 4f14b5c..b8e1f9c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -35,12 +35,19 @@
*/
#undef MODULES_VADDR
#define MODULES_VADDR (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#define MODULES_START MODULES_VADDR
+#else
+#ifndef CONFIG_THUMB2_KERNEL
+#define MODULES_START ALIGN((unsigned long)_etext - SZ_32M, PAGE_SIZE)
+#else
+#define MODULES_START MODULES_VADDR
+#endif
#endif

#ifdef CONFIG_MMU
void *module_alloc(unsigned long size)
{
- void *p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+ void *p = __vmalloc_node_range(size, 1, MODULES_START, MODULES_END,
GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
__builtin_return_address(0));
if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
--
1.9.1


2016-12-06 09:13:18

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On 6 December 2016 at 08:29, Maninder Singh <[email protected]> wrote:
> This patch defines new macro MODULE_START to ensure kernel text
> and module remains within 32 MB of address range.
>
> Tried this patch by inserting 20 MB size module on 4.1 kernel:-
>
> Earlier:-
> ==========
> sh# insmod size.ko
> ....
> insmod: ERROR: could not insert module size.ko: Cannot allocate memory
> sh#
>
> With this patch
> ===============
> sh# insmod size.ko
> ...
> sh# lsmod
> Module Size Used by
> size 20972425 0
>
> Signed-off-by: Vaneet Narang <[email protected]>
> Signed-off-by: Maninder Singh <[email protected]>
> Reviewed-by: Ajeet Yadav <[email protected]>
> ---
> arch/arm/include/asm/memory.h | 4 ++--
> arch/arm/kernel/module.c | 9 ++++++++-
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index 76cbd9c..1a0a6e5 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -37,7 +37,7 @@
> * TASK_SIZE - the maximum size of a user space task.
> * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
> */
> -#define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
> +#define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_32M))
> #define TASK_UNMAPPED_BASE ALIGN(TASK_SIZE / 3, SZ_16M)
>
> /*
> @@ -50,7 +50,7 @@
> * and PAGE_OFFSET - it must be within 32MB of the kernel text.
> */
> #ifndef CONFIG_THUMB2_KERNEL
> -#define MODULES_VADDR (PAGE_OFFSET - SZ_16M)
> +#define MODULES_VADDR (PAGE_OFFSET - SZ_32M)
> #else
> /* smaller range for Thumb-2 symbols relocation (2^24)*/
> #define MODULES_VADDR (PAGE_OFFSET - SZ_8M)
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 4f14b5c..b8e1f9c 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -35,12 +35,19 @@
> */
> #undef MODULES_VADDR
> #define MODULES_VADDR (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
> +#define MODULES_START MODULES_VADDR
> +#else
> +#ifndef CONFIG_THUMB2_KERNEL
> +#define MODULES_START ALIGN((unsigned long)_etext - SZ_32M, PAGE_SIZE)

On a multi_v7_defconfig kernel, the text size is >8 MB, which means
you are only adding ~7 MB to the module area, while consuming 16 MB of
additional address space. Given that 20 MB modules are very uncommon,
I think it is better to enable CONFIG_ARM_MODULE_PLTS instead. That
way, there is no need to update these defaults for everyone.

> +#else
> +#define MODULES_START MODULES_VADDR
> +#endif
> #endif
>
> #ifdef CONFIG_MMU
> void *module_alloc(unsigned long size)
> {
> - void *p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + void *p = __vmalloc_node_range(size, 1, MODULES_START, MODULES_END,
> GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> __builtin_return_address(0));
> if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
> --
> 1.9.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2016-12-06 15:57:25

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On Tue, 6 Dec 2016, Maninder Singh wrote:

> This patch defines new macro MODULE_START to ensure kernel text
> and module remains within 32 MB of address range.
>
> Tried this patch by inserting 20 MB size module on 4.1 kernel:-
>
> Earlier:-
> ==========
> sh# insmod size.ko
> ....
> insmod: ERROR: could not insert module size.ko: Cannot allocate memory
> sh#
>
> With this patch
> ===============
> sh# insmod size.ko
> ...
> sh# lsmod
> Module Size Used by
> size 20972425 0

Could you please try enabling CONFIG_ARM_MODULE_PLTS instead of this
patch?


Nicolas

2016-12-12 10:11:33

by Vaneet Narang

[permalink] [raw]
Subject: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

Hi Ard,

>> +#ifndef CONFIG_THUMB2_KERNEL
>> +#define MODULES_START ALIGN((unsigned long)_etext - SZ_32M, PAGE_SIZE)
>
>On a multi_v7_defconfig kernel, the text size is >8 MB, which means
>you are only adding ~7 MB to the module area, while consuming 16 MB of
>additional address space.

I am not sure if 16MB virtual address space will make any difference on embedded
systems where physical memory is already less than virtual address space.
if required address space wastage can be reduced by keeping TASK_SIZE as
PAGE_OFFSET - 24MB like below

#define MODULES_VADDR (PAGE_OFFSET - SZ_24M)
#define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_24M))


> Given that 20 MB modules are very uncommon,

Size of all modules can be 20MB, this seems to be normal scenario.

>I think it is better to enable CONFIG_ARM_MODULE_PLTS instead. That

CONFIG_ARM_MODULE_PLTS has function call overhead as it refers PLT table
while calling kernel functions. Also size of modules will also gets increased a bit.
So using short calls from modules to kernel will be faster.
These changes trying to utilize best available space for kernel modules for
making short calls.
So CONFIG_ARM_MODULE_PLTS is not required when modules
can be accomdated within 20MB.

>way, there is no need to update these defaults for everyone.
>

>> +#else
>> +#define MODULES_START MODULES_VADDR

Thanks & Regards,
Vaneet Narang

2016-12-12 10:24:08

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On Tue, Dec 06, 2016 at 01:59:35PM +0530, Maninder Singh wrote:
> This patch defines new macro MODULE_START to ensure kernel text
> and module remains within 32 MB of address range.
>
> Tried this patch by inserting 20 MB size module on 4.1 kernel:-
>
> Earlier:-
> ==========
> sh# insmod size.ko
> ....
> insmod: ERROR: could not insert module size.ko: Cannot allocate memory
> sh#
>
> With this patch
> ===============
> sh# insmod size.ko
> ...
> sh# lsmod
> Module Size Used by
> size 20972425 0
>
> Signed-off-by: Vaneet Narang <[email protected]>
> Signed-off-by: Maninder Singh <[email protected]>
> Reviewed-by: Ajeet Yadav <[email protected]>

A PC24 relocation has a range of +/-32MB. This means that where-ever
the module is placed, it must be capable of reaching any function
within the kernel text, which may itself be quite large (eg, 8MB, or
possibly larger). The module area exists to allow modules to be
located in an area where PC24 relocations are able to reach all of the
kernel text on sensibly configured kernels, thereby allowing for
optimal performance.

If you wish to load large modules, then enable ARM_MODULE_PLTS, which
will use the less efficient PLT method (which is basically an indirect
function call) for relocations that PC24 can't handle, and will allow
the module to be loaded into the vmalloc area.

Growing the module area so that smaller modules also get penalised by
the PLT indirection is not sane.

So, I'm afraid this change is not acceptable.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-12-12 12:05:26

by Vaneet Narang

[permalink] [raw]
Subject: RE: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

Hi,

>A PC24 relocation has a range of +/-32MB. This means that where-ever
>the module is placed, it must be capable of reaching any function
>within the kernel text, which may itself be quite large (eg, 8MB, or
>possibly larger). The module area exists to allow modules to be
>located in an area where PC24 relocations are able to reach all of the
>kernel text on sensibly configured kernels, thereby allowing for
>optimal performance.
>
>If you wish to load large modules, then enable ARM_MODULE_PLTS, which
>will use the less efficient PLT method (which is basically an indirect
>function call) for relocations that PC24 can't handle, and will allow
>the module to be loaded into the vmalloc area.
>
>Growing the module area so that smaller modules also get penalised by
>the PLT indirection is not sane.

This is exactly what i am saying. These changes are useful to accomdate
22MB modules without enabling ARM_MODULE_PLTS.
Without these changes ARM_MODULE_PLTS needs to be enabled to insert more than 14MB
modules but with these changes 22MB modules (considering 8MB uImage) can be inserted
without enabling ARM_MODULE_PLTS.

So till 22MB modules are not penalised but without these changes once size of modules
gets over 14MB PLT becomes must. 

Regards,
Vaneet Narang

2016-12-12 12:57:10

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On Mon, Dec 12, 2016 at 11:22:30AM +0000, Vaneet Narang wrote:
> Hi,
>
> >A PC24 relocation has a range of +/-32MB. This means that where-ever
> >the module is placed, it must be capable of reaching any function
> >within the kernel text, which may itself be quite large (eg, 8MB, or
> >possibly larger). The module area exists to allow modules to be
> >located in an area where PC24 relocations are able to reach all of the
> >kernel text on sensibly configured kernels, thereby allowing for
> >optimal performance.
> >
> >If you wish to load large modules, then enable ARM_MODULE_PLTS, which
> >will use the less efficient PLT method (which is basically an indirect
> >function call) for relocations that PC24 can't handle, and will allow
> >the module to be loaded into the vmalloc area.
> >
> >Growing the module area so that smaller modules also get penalised by
> >the PLT indirection is not sane.
>
> This is exactly what i am saying. These changes are useful to accomdate
> 22MB modules without enabling ARM_MODULE_PLTS.

Obviously I wasn't clear enough.

No, I do not like your change for the reasons given above. I'm not going
to accept it.

If you want to use 22MB modules, enable ARM_MODULE_PLTS.

Thank you.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-12-12 15:29:01

by Nicolas Pitre

[permalink] [raw]
Subject: RE: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On Mon, 12 Dec 2016, Vaneet Narang wrote:

> Hi,
>
> >A PC24 relocation has a range of +/-32MB. This means that where-ever
> >the module is placed, it must be capable of reaching any function
> >within the kernel text, which may itself be quite large (eg, 8MB, or
> >possibly larger). The module area exists to allow modules to be
> >located in an area where PC24 relocations are able to reach all of the
> >kernel text on sensibly configured kernels, thereby allowing for
> >optimal performance.
> >
> >If you wish to load large modules, then enable ARM_MODULE_PLTS, which
> >will use the less efficient PLT method (which is basically an indirect
> >function call) for relocations that PC24 can't handle, and will allow
> >the module to be loaded into the vmalloc area.
> >
> >Growing the module area so that smaller modules also get penalised by
> >the PLT indirection is not sane.
>
> This is exactly what i am saying. These changes are useful to accomdate
> 22MB modules without enabling ARM_MODULE_PLTS.

I think you need to figure out why you need such a huge module in the
first place. That is very uncommon indeed.


Nicolas

2016-12-12 16:57:46

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: Re: [PATCH 1/1] arm/module: maximum utilization of module area.

On 12 December 2016 at 15:28, Nicolas Pitre <[email protected]> wrote:
> On Mon, 12 Dec 2016, Vaneet Narang wrote:
>
>> Hi,
>>
>> >A PC24 relocation has a range of +/-32MB. This means that where-ever
>> >the module is placed, it must be capable of reaching any function
>> >within the kernel text, which may itself be quite large (eg, 8MB, or
>> >possibly larger). The module area exists to allow modules to be
>> >located in an area where PC24 relocations are able to reach all of the
>> >kernel text on sensibly configured kernels, thereby allowing for
>> >optimal performance.
>> >
>> >If you wish to load large modules, then enable ARM_MODULE_PLTS, which
>> >will use the less efficient PLT method (which is basically an indirect
>> >function call) for relocations that PC24 can't handle, and will allow
>> >the module to be loaded into the vmalloc area.
>> >
>> >Growing the module area so that smaller modules also get penalised by
>> >the PLT indirection is not sane.
>>
>> This is exactly what i am saying. These changes are useful to accomdate
>> 22MB modules without enabling ARM_MODULE_PLTS.
>
> I think you need to figure out why you need such a huge module in the
> first place. That is very uncommon indeed.
>

Also, note that the module PLT code was recently optimized, to remove
some pathological behavior which severely affected load times of large
modules.

Can you quantify the performance hit you are taking when using module
PLTs? And the actual increase in memory footprint?