2019-01-21 12:40:43

by Eial Czerwacki

Subject: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
causing the struct size to exceed the size ok 8KB.

in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.

the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc

Signed-off-by: Eial Czerwacki <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>
Signed-off-by: Oren Twaig <[email protected]>
---
include/linux/percpu.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 70b7123..6b79693 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -14,7 +14,11 @@

/* enough to cover all DEFINE_PER_CPUs in modules */
#ifdef CONFIG_MODULES
+#ifdef X86_VSMP
+#define PERCPU_MODULE_RESERVE (1 << 16)
+#else
#define PERCPU_MODULE_RESERVE (8 << 10)
+#endif
#else
#define PERCPU_MODULE_RESERVE 0
#endif
--
2.7.4
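
For readers following the sizes involved, the arithmetic behind the patch can be sketched in userspace C. This is only an illustration under stated assumptions: the struct names are hypothetical stand-ins rather than kernel types, the 4096-byte alignment models the 4K internode alignment that the thread attributes to X86_VSMP, and 64 bytes is an assumed L1 cache line size.

```c
#include <assert.h>
#include <stddef.h>

/* Reserve sizes as written in the patch. */
#define RESERVE_DEFAULT	(8 << 10)	/* 8192 bytes */
#define RESERVE_VSMP	(1 << 16)	/* 65536 bytes */

/* Hypothetical payload aligned to an assumed 64-byte L1 cache line. */
struct demo_l1_aligned {
	long counters[4];
} __attribute__((aligned(64)));

/* The same payload aligned to an assumed 4K internode line. */
struct demo_vsmp_aligned {
	long counters[4];
} __attribute__((aligned(4096)));

/* Whole objects of obj_size that fit in a reserve, ignoring start misalignment. */
size_t demo_objects_that_fit(size_t reserve, size_t obj_size)
{
	return reserve / obj_size;
}
```

The alignment attribute pads each object out to a full multiple of the alignment, so a payload of a few dozen bytes occupies 4096 bytes once it is internode-aligned. At 8K per object, the default reserve holds a single object and the proposed 64K reserve holds eight.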



2019-01-30 10:35:25

by Eial Czerwacki

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Greetings,

On 1/21/19 1:47 PM, Eial Czerwacki wrote:
> as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
> by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
> causing the struct size to exceed the size ok 8KB.
>
> in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
>
> the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
> for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>
> Signed-off-by: Eial Czerwacki <[email protected]>
> Signed-off-by: Shai Fultheim <[email protected]>
> Signed-off-by: Oren Twaig <[email protected]>
> ---
> include/linux/percpu.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 70b7123..6b79693 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -14,7 +14,11 @@
>
> /* enough to cover all DEFINE_PER_CPUs in modules */
> #ifdef CONFIG_MODULES
> +#ifdef X86_VSMP
> +#define PERCPU_MODULE_RESERVE (1 << 16)
> +#else
> #define PERCPU_MODULE_RESERVE (8 << 10)
> +#endif
> #else
> #define PERCPU_MODULE_RESERVE 0
> #endif
>
Is it possible to push this patch to mainline?
There seem to be no objections or comments regarding it.
We'd like to fix the bug mentioned above.

2019-03-01 18:34:44

by Barret Rhoden

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hi -

On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
>

Your main issue was that you only sent this patch to LKML, but not the
maintainers of the file. If you don't, your patch might get lost. To
get the appropriate people and lists, run:

scripts/get_maintainer.pl YOUR_PATCH.patch.

For this patch, you'll get this:

Dennis Zhou <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
Tejun Heo <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
Christoph Lameter <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
[email protected] (open list)

I added the three maintainers to this email.

I have a few minor comments below.

> [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

You misspelled 'reservation'. Also, I'd just say: "percpu: increase
module reservation size if X86_VSMP is set" ('change' -> 'increase', and
it only says 'reservation' once).

> as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)

I think you can add a tag for this right above your Signed-off-by tags.
e.g.:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339

> by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
> causing the struct size to exceed the size ok 8KB.
                                             ^of

Which struct are you talking about? I have one in mind, but others
might not know from reading the commit message.

I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511.
In that case, it was because modules (drm and amdkfd) were using
DEFINE_SRCU, which does a DEFINE_PER_CPU on struct srcu_data, and that
used ____cacheline_internodealigned_in_smp.

>
> in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
                                ^increase

>
> the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
                ^calculated

> for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc

Not sure how useful the one-liner is, versus a description of what
you're doing, e.g. "the size of all module percpu data sections", or
something.

Also, how close was that calculated value to 64K? If more modules start
using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.

Thanks,
Barret

> Signed-off-by: Eial Czerwacki <[email protected]>
> Signed-off-by: Shai Fultheim <[email protected]>
> Signed-off-by: Oren Twaig <[email protected]>
> ---
> include/linux/percpu.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 70b7123..6b79693 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -14,7 +14,11 @@
>
> /* enough to cover all DEFINE_PER_CPUs in modules */
> #ifdef CONFIG_MODULES
> +#ifdef X86_VSMP
> +#define PERCPU_MODULE_RESERVE (1 << 16)
> +#else
> #define PERCPU_MODULE_RESERVE (8 << 10)
> +#endif
> #else
> #define PERCPU_MODULE_RESERVE 0
> #endif
>


2019-03-01 20:35:44

by Dennis Zhou

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hi Barret,

On Fri, Mar 01, 2019 at 01:30:15PM -0500, Barret Rhoden wrote:
> Hi -
>
> On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
> >
>
> Your main issue was that you only sent this patch to LKML, but not the
> maintainers of the file. If you don't, your patch might get lost. To get
> the appropriate people and lists, run:
>
> scripts/get_maintainer.pl YOUR_PATCH.patch.
>
> For this patch, you'll get this:
>
> Dennis Zhou <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Tejun Heo <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Christoph Lameter <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> [email protected] (open list)
>
> I added the three maintainers to this email.
>
> I have a few minor comments below.
>
> > [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is
> set
>
> You misspelled 'reservation'. Also, I'd just say: "percpu: increase module
> reservation size if X86_VSMP is set". ('change' -> 'increase'), only says
> 'reservation' once.)
>
> > as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
>
> I think you can add a tag for this right above your Signed-off-by tags.
> e.g.:
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339
>
> > by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
> > causing the struct size to exceed the size ok 8KB.
> ^of
>
> Which struct are you talking about? I have one in mind, but others might
> not know from reading the commit message.
>
> I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511. In
> that case, it was because modules (drm and amdkfd) were using DEFINE_SRCU,
> which does a DEFINE_PER_CPU on struct srcu_data, and that used
> ____cacheline_internodealigned_in_smp.
>
> >
> > in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
> ^increase
>
> >
> > the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
> ^calculated
>
> > for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>
> Not sure how useful the one-liner is, versus a description of what you're
> doing. i.e. "the size of all module percpu data sections, or something."
>
> Also, how close was that calculated value to 64K? If more modules start
> using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.
>
> Thanks,
> Barret
>
> > Signed-off-by: Eial Czerwacki <[email protected]>
> > Signed-off-by: Shai Fultheim <[email protected]>
> > Signed-off-by: Oren Twaig <[email protected]>
> > ---
> > include/linux/percpu.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> > index 70b7123..6b79693 100644
> > --- a/include/linux/percpu.h
> > +++ b/include/linux/percpu.h
> > @@ -14,7 +14,11 @@
> > /* enough to cover all DEFINE_PER_CPUs in modules */
> > #ifdef CONFIG_MODULES
> > +#ifdef X86_VSMP
> > +#define PERCPU_MODULE_RESERVE (1 << 16)
> > +#else
> > #define PERCPU_MODULE_RESERVE (8 << 10)
> > +#endif
> > #else
> > #define PERCPU_MODULE_RESERVE 0
> > #endif
> >
>

Thanks for sending this to me.

I must say, I really do not want to expand the reserved region. In most
cases, it can easily end up unused and thus be wasted memory, as it is hard
allocated on boot. This is done because code gen assumes static
variables are close to the program counter. This would not be true with
dynamic allocations, which begin at the end of the vmalloc area
(summarized from Tejun's account in [1]).

Another note on the reserved region: it starts at the end of the static
region, which means it generally isn't page aligned. So while an 8KB
allocation would fit, one with 4KB alignment would more than likely fail.
Something as large as 8KB should probably be dynamically allocated as
well.

I read through the bugzilla report and it seems that the culprits are:
drivers/gpu/drm/amd/amdkfd/kfd_process.c:DEFINE_SRCU(kfd_processes_srcu);
drivers/gpu/drm/drm_drv.c:DEFINE_STATIC_SRCU(drm_unplug_srcu);

Is there a reason we cannot dynamically initialize these structs? I've
CCed Paul McKenney because we saw an issue with ipmi in December [1].

[1] https://lore.kernel.org/linux-mm/CAJM9R-JWO1P_qJzw2JboMH2dgPX7K1tF49nO5ojvf=iwGddXRQ@mail.gmail.com/

Thanks,
Dennis
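
The alignment point above can be illustrated with a small userspace sketch. It is a simplified model, not the percpu allocator's logic: the function names and the example offsets are hypothetical, and the region is treated as one contiguous range of offsets starting where the static region ends.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align must be a power of two). */
size_t demo_align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Does an object of `size` bytes with `align` alignment fit in a reserved
 * region covering offsets [start, start + reserve)?  Returns 1 if it fits.
 */
int demo_fits(size_t start, size_t reserve, size_t size, size_t align)
{
	size_t first = demo_align_up(start, align);

	return first + size <= start + reserve;
}
```

With a hypothetical start offset of 0x1230, an 8K object with 8-byte alignment exactly fills an 8K reserve, but the same object with 4K alignment has to begin at 0x2000 and runs past the end of the region.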

2019-03-01 21:29:02

by Barret Rhoden

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hi -

On 03/01/2019 03:34 PM, Dennis Zhou wrote:
> Hi Barret,
>
> On Fri, Mar 01, 2019 at 01:30:15PM -0500, Barret Rhoden wrote:
>> Hi -
>>
>> On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
>>>
>>
>> Your main issue was that you only sent this patch to LKML, but not the
>> maintainers of the file. If you don't, your patch might get lost. To get
>> the appropriate people and lists, run:
>>
>> scripts/get_maintainer.pl YOUR_PATCH.patch.
>>
>> For this patch, you'll get this:
>>
>> Dennis Zhou <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> Tejun Heo <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> Christoph Lameter <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
>> [email protected] (open list)
>>
>> I added the three maintainers to this email.
>>
>> I have a few minor comments below.
>>
>>> [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is
>> set
>>
>> You misspelled 'reservation'. Also, I'd just say: "percpu: increase module
>> reservation size if X86_VSMP is set". ('change' -> 'increase'), only says
>> 'reservation' once.)
>>
>>> as reported in bug #201339 (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
>>
>> I think you can add a tag for this right above your Signed-off-by tags.
>> e.g.:
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339
>>
>>> by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from the default one
>>> causing the struct size to exceed the size ok 8KB.
>> ^of
>>
>> Which struct are you talking about? I have one in mind, but others might
>> not know from reading the commit message.
>>
>> I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511. In
>> that case, it was because modules (drm and amdkfd) were using DEFINE_SRCU,
>> which does a DEFINE_PER_CPU on struct srcu_data, and that used
>> ____cacheline_internodealigned_in_smp.
>>
>>>
>>> in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if CONFIG_X86_VSMP is set.
>> ^increase
>>
>>>
>>> the value was caculated on linux 4.20.3, make allmodconfig all and the following oneliner:
>> ^calculated
>>
>>> for f in `find -name *.ko`; do echo $f; readelf -S $f |grep perc; done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r: "; objdump --syms --section=.data..percpu $r|grep data |sort -n |awk '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>>
>> Not sure how useful the one-liner is, versus a description of what you're
>> doing. i.e. "the size of all module percpu data sections, or something."
>>
>> Also, how close was that calculated value to 64K? If more modules start
>> using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.
>>
>> Thanks,
>> Barret
>>
>>> Signed-off-by: Eial Czerwacki <[email protected]>
>>> Signed-off-by: Shai Fultheim <[email protected]>
>>> Signed-off-by: Oren Twaig <[email protected]>
>>> ---
>>> include/linux/percpu.h | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
>>> index 70b7123..6b79693 100644
>>> --- a/include/linux/percpu.h
>>> +++ b/include/linux/percpu.h
>>> @@ -14,7 +14,11 @@
>>> /* enough to cover all DEFINE_PER_CPUs in modules */
>>> #ifdef CONFIG_MODULES
>>> +#ifdef X86_VSMP
>>> +#define PERCPU_MODULE_RESERVE (1 << 16)
>>> +#else
>>> #define PERCPU_MODULE_RESERVE (8 << 10)
>>> +#endif
>>> #else
>>> #define PERCPU_MODULE_RESERVE 0
>>> #endif
>>>
>>
>
> Thanks for sending this to me.
>
> I must say, I really do not want to expand the reserved region. In most
> cases, it can easily end up unused and thus wasted memory as it is hard
> allocated on boot. This is done because code gen assumes static
> variables are close to the program counter. This would not be true with
> dynamic allocations which being at the end of the vmalloc area
> (Summarized from Tejun's account in [1]).
>
> Another note on the reserved region. It starts at the end of the static
> region which means it generally isn't page aligned. So while an 8kb
> allocation would fit, a 4kb alignment more than likely would fail.
> Something as large as 8kb should probably be dynamically allocated as
> well.
>
> I read through the bugzilla report and it seems that the culprits are:
> drivers/gpu/drm/amd/amdkfd/kfd_process.c:DEFINE_SRCU(kfd_processes_srcu);
> drivers/gpu/drm/drm_drv.c:DEFINE_STATIC_SRCU(drm_unplug_srcu);
>
> Is there a reason we cannot dynamically initialize these structs? I've
> cced Paul McKenney because we saw an issue with ipmi in December [1].

I looked at the AMD driver, and it looks like they could dynamically
initialize it. It would require a little extra plumbing. I imagine the
DRM one is the same way.

To catch this in the future, should we disallow DEFINE_SRCU in modules
or something? Otherwise, this will pop up again the next time someone
uses DEFINE_SRCU in a module and builds with CONFIG_X86_VSMP.

That might be a little much, and it still won't be sufficient to catch
all cases. This will also come up any time a module has a static
per-cpu data structure that uses ____cacheline_internodealigned_in_smp,
so it's not limited to SRCU either.

I'm not familiar with VSMP - how bad is it to use L1 cache alignment
instead of 4K page alignment? Maybe some structures can use the smaller
alignment? Or maybe have VSMP require SRCU-using modules to be built-in?

Thanks,

Barret


by Christoph Lameter

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Fri, 1 Mar 2019, Barret Rhoden wrote:

> I'm not familiar with VSMP - how bad is it to use L1 cache alignment instead
> of 4K page alignment? Maybe some structures can use the smaller alignment?
> Or maybe have VSMP require SRCU-using modules to be built-in?

It is very expensive. VSMP exchanges 4K segments via RDMA between servers
to build a large address space and run a kernel in the large address
space. Using smaller segments can cause a lot of
"cacheline" bouncing (meaning transfers of 4K segments back and forth
between servers).
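
The bouncing described above depends on whether two hot objects land in the same 4K transfer unit. A minimal userspace sketch of that check follows; the macro and function names are hypothetical and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The 4K transfer unit described above. */
#define DEMO_SEGMENT_BYTES 4096u

/* Two addresses fall in the same segment when they share a segment index. */
int demo_same_segment(uintptr_t a, uintptr_t b)
{
	return a / DEMO_SEGMENT_BYTES == b / DEMO_SEGMENT_BYTES;
}
```

Two objects 64 bytes apart (say 0x1000 and 0x1040) share a segment, so writers on different servers would keep pulling the whole 4K segment back and forth. Aligning each object to 4K places them in separate segments, which is the reason the larger alignment is used.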

2019-03-04 07:43:03

by Eial Czerwacki

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Greetings Barret,

On 3/1/19 8:30 PM, Barret Rhoden wrote:
> Hi -
>
> On 01/21/2019 06:47 AM, Eial Czerwacki wrote:
>>
>
> Your main issue was that you only sent this patch to LKML, but not the
> maintainers of the file.  If you don't, your patch might get lost.  To
> get the appropriate people and lists, run:
>
>     scripts/get_maintainer.pl YOUR_PATCH.patch.
>
> For this patch, you'll get this:
>
> Dennis Zhou <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Tejun Heo <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> Christoph Lameter <[email protected]> (maintainer:PER-CPU MEMORY ALLOCATOR)
> [email protected] (open list)
>
> I added the three maintainers to this email.
>
> I have a few minor comments below.
>
thanks, I did not know that, I'll use it next time.

>> [PATCH] percpu/module resevation: change resevation size iff X86_VSMP
> is set
>
> You misspelled 'reservation'.  Also, I'd just say: "percpu: increase
> module reservation size if X86_VSMP is set".  ('change' -> 'increase'),
> only says 'reservation' once.)
>
>> as reported in bug #201339
>> (https://bugzilla.kernel.org/show_bug.cgi?id=201339)
>
> I think you can add a tag for this right above your Signed-off-by tags.
> e.g.:
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201339
>
>> by enabling X86_VSMP, INTERNODE_CACHE_BYTES's definition differs from
>> the default one
>> causing the struct size to exceed the size ok 8KB.
>                                             ^of
>
will fix, thanks.

> Which struct are you talking about?  I have one in mind, but others
> might not know from reading the commit message.
>
> I ran into this on https://bugzilla.kernel.org/show_bug.cgi?id=202511.
> In that case, it was because modules (drm and amdkfd) were using
> DEFINE_SRCU, which does a DEFINE_PER_CPU on struct srcu_data, and that
> used ____cacheline_internodealigned_in_smp.
you are correct, the structure in question is struct srcu_data.

>
>>
>> in order to avoid such issue, increse PERCPU_MODULE_RESERVE to 64KB if
>> CONFIG_X86_VSMP is set.
>                                ^increase
>
>>
>> the value was caculated on linux 4.20.3, make allmodconfig all and the
>> following oneliner:
>                ^calculated
>
will fix, thanks.

>> for f in `find -name *.ko`; do echo $f; readelf -S $f  |grep perc;
>> done |grep data..percpu -B 1 |grep ko |while read r; do echo -n "$r:
>> "; objdump --syms --section=.data..percpu $r|grep data |sort -n  |awk
>> '{c++; d=strtonum("0x" $1) + strtonum("0x" $5); if (m < d) m = d;} END
>> {printf("%d vars-> last addr 0x%x ( %d )\n", c, m, m)}' ; done |column
>> -t |sort -k 8 -n | awk '{print $8}'| paste -sd+ | bc
>
> Not sure how useful the one-liner is, versus a description of what
> you're doing.  i.e. "the size of all module percpu data sections, or
> something."
I thought an easy way to reproduce it would suffice, I'll take that into account.

>
> Also, how close was that calculated value to 64K?  If more modules start
> using DEFINE_SRCU, each of which uses 8K, then that 64K might run out.
The biggest module was 12472 bytes in size. As multiple modules use the
same percpu reserve, more is needed; the only size I was able to make it
fit in was 64K.

Of course, it is possible that in some specific scenario 64K will not be
enough, but we have yet to encounter such a scenario.

>
> Thanks,
> Barret
>
>> Signed-off-by: Eial Czerwacki <[email protected]>
>> Signed-off-by: Shai Fultheim <[email protected]>
>> Signed-off-by: Oren Twaig <[email protected]>
>> ---
>>   include/linux/percpu.h | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
>> index 70b7123..6b79693 100644
>> --- a/include/linux/percpu.h
>> +++ b/include/linux/percpu.h
>> @@ -14,7 +14,11 @@
>>     /* enough to cover all DEFINE_PER_CPUs in modules */
>>   #ifdef CONFIG_MODULES
>> +#ifdef X86_VSMP
>> +#define PERCPU_MODULE_RESERVE        (1 << 16)
>> +#else
>>   #define PERCPU_MODULE_RESERVE        (8 << 10)
>> +#endif
>>   #else
>>   #define PERCPU_MODULE_RESERVE        0
>>   #endif
>>
>
>


2019-03-13 19:41:03

by Barret Rhoden

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hi -

On 03/01/2019 04:54 PM, Christopher Lameter wrote:
> On Fri, 1 Mar 2019, Barret Rhoden wrote:
>
>> I'm not familiar with VSMP - how bad is it to use L1 cache alignment instead
>> of 4K page alignment? Maybe some structures can use the smaller alignment?
>> Or maybe have VSMP require SRCU-using modules to be built-in?
>
> It is very expensive. VSMP exchanges 4K segments via RDMA between servers
> to build a large address space and run a kernel in the large address
> space. Using smaller segments can cause a lot of
> "cacheline" bouncing (meaning transfers of 4K segments back and forth
> between servers).
>

Given that these are large machines, would it be OK to statically
reserve 64K on them for modules' percpu data?

The bug that led me to here was from someone running on a non-VSMP
machine but had that config set. Perhaps we make it more clear in the
Kconfig option to not set it on other machines. That might make it less
likely anyone on a non-VSMP machine pays the 64K overhead.

Are there any other alternatives? Not using static SRCU in any code
that could be built as a module seems a little harsh.

Thanks,

Barret


2019-03-13 20:29:04

by Tejun Heo

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hello,

On Wed, Mar 13, 2019 at 03:40:04PM -0400, Barret Rhoden wrote:
> Are there any other alternatives? Not using static SRCU in any code
> that could be built as a module seems a little harsh.

Yes, allocate the srcu dynamically on module init and destroy on
module exit. That's how the other similar case got solved too. We
can't keep bumping up reserved size by the number of static SRCUs in
modules. It's mostly there to make trivial small things easier. We
don't lose anything meaningful by allocating srcu dynamically.

Thanks.

--
tejun
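
The pattern suggested above might look roughly like the following in a module. This is a sketch, not code from any driver named in the thread: demo_srcu, demo_init and demo_exit are hypothetical names, and the #else branch contains userspace stand-ins (also hypothetical) whose only purpose is to let the sketch compile outside a kernel tree. The kernel calls used are init_srcu_struct(), srcu_barrier() and cleanup_srcu_struct().

```c
#ifdef __KERNEL__
#include <linux/module.h>
#include <linux/srcu.h>
#else
/* Userspace stand-ins (hypothetical) so the sketch builds outside the kernel. */
#include <assert.h>
struct srcu_struct { int live; };
static int init_srcu_struct(struct srcu_struct *ssp) { ssp->live = 1; return 0; }
static void srcu_barrier(struct srcu_struct *ssp) { (void)ssp; }
static void cleanup_srcu_struct(struct srcu_struct *ssp) { ssp->live = 0; }
#define __init
#define __exit
#define module_init(fn) int (*const demo_initcall)(void) = fn
#define module_exit(fn) void (*const demo_exitcall)(void) = fn
#define MODULE_LICENSE(s) const char demo_license[] = s
#endif

/* No DEFINE_SRCU(): the per-CPU data is allocated at init time instead. */
static struct srcu_struct demo_srcu;

static int __init demo_init(void)
{
	/* Allocates the per-CPU srcu_data dynamically; returns 0 on success. */
	return init_srcu_struct(&demo_srcu);
}

static void __exit demo_exit(void)
{
	/* Only required if the module ever used call_srcu(). */
	srcu_barrier(&demo_srcu);
	cleanup_srcu_struct(&demo_srcu);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

Because the per-CPU data comes from the dynamic percpu allocator here, it no longer consumes space in the module reserved region.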

2019-03-13 21:23:35

by Paul E. McKenney

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Wed, Mar 13, 2019 at 01:26:40PM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Mar 13, 2019 at 03:40:04PM -0400, Barret Rhoden wrote:
> > Are there any other alternatives? Not using static SRCU in any code
> > that could be built as a module seems a little harsh.
>
> Yes, allocate the srcu dynamically on module init and destroy on
> module exit. That's how the other similar case got solved too. We
> can't keep bumping up reserved size by the number of static SRCUs in
> modules. It's mostly there to make trivial small things easier. We
> don't lose anything meaningful by allocating srcu dynamically.

Should I define DEFINE_SRCU() and DEFINE_STATIC_SRCU() only if
!defined(MODULE)?

Untested (probably doesn't even build) patch below.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 7f7c8c050f63..a979da9cf71f 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -105,6 +105,8 @@ struct srcu_struct {
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
+ * Build-time srcu_struct definition is not allowed in modules.
+ *
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
@@ -120,11 +122,13 @@ struct srcu_struct {
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
-#define __DEFINE_SRCU(name, is_static) \
- static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data);\
+#ifndef MODULE
+# define __DEFINE_SRCU(name, is_static) \
+ static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name##_srcu_data)
-#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
-#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+# define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
+# define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+#endif

void synchronize_srcu_expedited(struct srcu_struct *ssp);
void srcu_barrier(struct srcu_struct *ssp);
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 5ff797fd3715..7cf1e3aed695 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -496,9 +496,18 @@ static struct rcu_torture_ops rcu_busted_ops = {
* Definitions for srcu torture testing.
*/

-DEFINE_STATIC_SRCU(srcu_ctl);
static struct srcu_struct srcu_ctld;
-static struct srcu_struct *srcu_ctlp = &srcu_ctl;
+static struct srcu_struct *srcu_ctlp;
+
+#ifndef MODULE
+DEFINE_STATIC_SRCU(srcu_ctl);
+
+static void srcu_torture_init(void)
+{
+ rcu_sync_torture_init();
+ srcu_ctlp = &srcu_ctl;
+}
+#endif

static int srcu_torture_read_lock(void) __acquires(srcu_ctlp)
{
@@ -565,9 +574,10 @@ static void srcu_torture_synchronize_expedited(void)
synchronize_srcu_expedited(srcu_ctlp);
}

+#ifndef MODULE
static struct rcu_torture_ops srcu_ops = {
.ttype = SRCU_FLAVOR,
- .init = rcu_sync_torture_init,
+ .init = srcu_torture_init,
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
@@ -581,25 +591,25 @@ static struct rcu_torture_ops srcu_ops = {
.irq_capable = 1,
.name = "srcu"
};
+#endif

-static void srcu_torture_init(void)
+static void srcud_torture_init(void)
{
rcu_sync_torture_init();
WARN_ON(init_srcu_struct(&srcu_ctld));
srcu_ctlp = &srcu_ctld;
}

-static void srcu_torture_cleanup(void)
+static void srcud_torture_cleanup(void)
{
cleanup_srcu_struct(&srcu_ctld);
- srcu_ctlp = &srcu_ctl; /* In case of a later rcutorture run. */
}

/* As above, but dynamically allocated. */
static struct rcu_torture_ops srcud_ops = {
.ttype = SRCU_FLAVOR,
- .init = srcu_torture_init,
- .cleanup = srcu_torture_cleanup,
+ .init = srcud_torture_init,
+ .cleanup = srcud_torture_cleanup,
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
@@ -617,8 +627,8 @@ static struct rcu_torture_ops srcud_ops = {
/* As above, but broken due to inappropriate reader extension. */
static struct rcu_torture_ops busted_srcud_ops = {
.ttype = SRCU_FLAVOR,
- .init = srcu_torture_init,
- .cleanup = srcu_torture_cleanup,
+ .init = srcud_torture_init,
+ .cleanup = srcud_torture_cleanup,
.readlock = srcu_torture_read_lock,
.read_delay = rcu_read_delay,
.readunlock = srcu_torture_read_unlock,


2019-03-13 21:29:54

by Tejun Heo

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Hello,

On Wed, Mar 13, 2019 at 02:22:55PM -0700, Paul E. McKenney wrote:
> Should I define DEFINE_SRCU() and DEFINE_STATIC_SRCU() only if
> !defined(MODULE)?

Yeah, that sounds like a great idea with comments explaining why it's
like that.

Thanks.

--
tejun
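
The gating idea agreed on above can be shown in isolation. In this sketch the macro and variable names are hypothetical and only mirror the shape of the proposal: the defining macro exists only when MODULE is not defined, so any module source that tries to use it fails to compile.

```c
#include <assert.h>

/*
 * Hypothetical macro, shaped like the proposed gating: it is defined only
 * for built-in code, i.e. when MODULE is not defined.
 */
#ifndef MODULE
# define DEMO_DEFINE_STATIC_STATE(name) static int name = 1
#endif

#ifdef DEMO_DEFINE_STATIC_STATE
DEMO_DEFINE_STATIC_STATE(demo_state);	/* built-in: build-time definition */
#else
static int demo_state;			/* module: must be set up at init time */
#endif

/* Returns 1 when the build-time definition was available, 0 otherwise. */
int demo_static_definition_available(void)
{
	return demo_state;
}
```

Compiling this file with -DMODULE takes the second branch, which is where a real module would fall back to run-time initialization.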

2019-03-13 23:14:07

by Paul E. McKenney

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Wed, Mar 13, 2019 at 02:29:12PM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Mar 13, 2019 at 02:22:55PM -0700, Paul E. McKenney wrote:
> > Should I define DEFINE_SRCU() and DEFINE_STATIC_SRCU() only if
> > !defined(MODULE)?
>
> Yeah, that sounds like a great idea with comments explaining why it's
> like that.

Like this?

* Build-time srcu_struct definition is not allowed in modules because
* otherwise it is necessary to increase the size of the reserved region
* each time a DEFINE_SRCU() or DEFINE_STATIC_SRCU() is added to a
* kernel module. Kernel modules should instead declare an srcu_struct
* and then invoke init_srcu_struct() from their module_init function and
* cleanup_srcu_struct() from their module_exit function. Note that modules
* using call_srcu() will also need to invoke srcu_barrier() from their
* module_exit function.

Also, it looks like Barret beat me to this suggestion. ;-)

In addition, rcutorture and rcuperf needed to be updated because
they used to use DEFINE_STATIC_SRCU() whether built in or built
as a loadable module.

How does the (very lightly tested) patch below look to you all?

Thanx, Paul

------------------------------------------------------------------------

commit 34f67df09cc0c6bf082a7cfca435373caeeb8d82
Author: Paul E. McKenney <[email protected]>
Date: Wed Mar 13 16:06:22 2019 -0700

srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules

Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module
requires that the size of the reserved region be increased, which is
not something we want to be doing all that often. Instead, loadable
modules should define an srcu_struct and invoke init_srcu_struct()
from their module_init function and cleanup_srcu_struct() from their
module_exit function. Note that modules using call_srcu() will also
need to invoke srcu_barrier() from their module_exit function.

This commit enforces this advice by refusing to define DEFINE_SRCU()
and DEFINE_STATIC_SRCU() within loadable modules.

Suggested-by: Barret Rhoden <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 7f7c8c050f63..ac5ea1c72e97 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -105,6 +105,15 @@ struct srcu_struct {
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
+ * Build-time srcu_struct definition is not allowed in modules because
+ * otherwise it is necessary to increase the size of the reserved region
+ * each time a DEFINE_SRCU() or DEFINE_STATIC_SRCU() is added to a
+ * kernel module. Kernel modules should instead declare an srcu_struct
+ * and then invoke init_srcu_struct() from their module_init function and
+ * cleanup_srcu_struct() from their module_exit function. Note that modules
+ * using call_srcu() will also need to invoke srcu_barrier() from their
+ * module_exit function.
+ *
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
@@ -120,11 +129,13 @@ struct srcu_struct {
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
-#define __DEFINE_SRCU(name, is_static) \
- static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data);\
+#ifndef MODULE
+# define __DEFINE_SRCU(name, is_static) \
+ static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name##_srcu_data)
-#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
-#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+# define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
+# define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+#endif

void synchronize_srcu_expedited(struct srcu_struct *ssp);
void srcu_barrier(struct srcu_struct *ssp);
diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
index c29761152874..b44208b3bf95 100644
--- a/kernel/rcu/rcuperf.c
+++ b/kernel/rcu/rcuperf.c
@@ -139,6 +139,7 @@ struct rcu_perf_ops {
void (*sync)(void);
void (*exp_sync)(void);
const char *name;
+ const char *altname;
};

static struct rcu_perf_ops *cur_ops;
@@ -186,8 +187,16 @@ static struct rcu_perf_ops rcu_ops = {
* Definitions for srcu perf testing.
*/

+static struct srcu_struct *srcu_ctlp;
+
+#ifndef MODULE
DEFINE_STATIC_SRCU(srcu_ctl_perf);
-static struct srcu_struct *srcu_ctlp = &srcu_ctl_perf;
+
+static void srcu_sync_perf_init(void)
+{
+ srcu_ctlp = &srcu_ctl_perf;
+}
+#endif

static int srcu_perf_read_lock(void) __acquires(srcu_ctlp)
{
@@ -224,9 +233,10 @@ static void srcu_perf_synchronize_expedited(void)
synchronize_srcu_expedited(srcu_ctlp);
}

+#ifndef MODULE
static struct rcu_perf_ops srcu_ops = {
.ptype = SRCU_FLAVOR,
- .init = rcu_sync_perf_init,
+ .init = srcu_sync_perf_init,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.get_gp_seq = srcu_perf_completed,
@@ -238,24 +248,25 @@ static struct rcu_perf_ops srcu_ops = {
.exp_sync = srcu_perf_synchronize_expedited,
.name = "srcu"
};
+#endif

static struct srcu_struct srcud;

-static void srcu_sync_perf_init(void)
+static void srcud_sync_perf_init(void)
{
srcu_ctlp = &srcud;
init_srcu_struct(srcu_ctlp);
}

-static void srcu_sync_perf_cleanup(void)
+static void srcud_sync_perf_cleanup(void)
{
cleanup_srcu_struct(srcu_ctlp);
}

static struct rcu_perf_ops srcud_ops = {
.ptype = SRCU_FLAVOR,
- .init = srcu_sync_perf_init,
- .cleanup = srcu_sync_perf_cleanup,
+ .init = srcud_sync_perf_init,
+ .cleanup = srcud_sync_perf_cleanup,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.get_gp_seq = srcu_perf_completed,
@@ -265,7 +276,10 @@ static struct rcu_perf_ops srcud_ops = {
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
- .name = "srcud"
+ .name = "srcud",
+#ifdef MODULE
+ .altname = "srcu" /* Avoid breaking kbuild test robot. */
+#endif
};

/*
@@ -594,7 +608,11 @@ rcu_perf_init(void)
long i;
int firsterr = 0;
static struct rcu_perf_ops *perf_ops[] = {
- &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops,
+ &rcu_ops,
+#ifndef MODULE
+ &srcu_ops,
+#endif
+ &srcud_ops, &tasks_ops,
};

if (!torture_init_begin(perf_type, verbose))
@@ -605,6 +623,11 @@ rcu_perf_init(void)
cur_ops = perf_ops[i];
if (strcmp(perf_type, cur_ops->name) == 0)
break;
+ if (cur_ops->altname &&
+ strcmp(perf_type, cur_ops->altname) == 0) {
+ pr_alert("rcu-perf: substituting perf type: \"%s\" for \"%s\"\n", cur_ops->name, perf_type);
+ break;
+ }
}
if (i == ARRAY_SIZE(perf_ops)) {
pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type);
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 5ff797fd3715..e4674c550b0f 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -300,6 +300,7 @@ struct rcu_torture_ops {
int can_boost;
int extendables;
const char *name;
+ const char *altname;
};

static struct rcu_torture_ops *cur_ops;
@@ -496,9 +497,18 @@ static struct rcu_torture_ops rcu_busted_ops = {
* Definitions for srcu torture testing.
*/

-DEFINE_STATIC_SRCU(srcu_ctl);
static struct srcu_struct srcu_ctld;
-static struct srcu_struct *srcu_ctlp = &srcu_ctl;
+static struct srcu_struct *srcu_ctlp;
+
+#ifndef MODULE
+DEFINE_STATIC_SRCU(srcu_ctl);
+
+static void srcu_torture_init(void)
+{
+ rcu_sync_torture_init();
+ srcu_ctlp = &srcu_ctl;
+}
+#endif

static int srcu_torture_read_lock(void) __acquires(srcu_ctlp)
{
@@ -565,9 +575,10 @@ static void srcu_torture_synchronize_expedited(void)
synchronize_srcu_expedited(srcu_ctlp);
}

+#ifndef MODULE
static struct rcu_torture_ops srcu_ops = {
.ttype = SRCU_FLAVOR,
- .init = rcu_sync_torture_init,
+ .init = srcu_torture_init,
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
@@ -581,25 +592,25 @@ static struct rcu_torture_ops srcu_ops = {
.irq_capable = 1,
.name = "srcu"
};
+#endif

-static void srcu_torture_init(void)
+static void srcud_torture_init(void)
{
rcu_sync_torture_init();
WARN_ON(init_srcu_struct(&srcu_ctld));
srcu_ctlp = &srcu_ctld;
}

-static void srcu_torture_cleanup(void)
+static void srcud_torture_cleanup(void)
{
cleanup_srcu_struct(&srcu_ctld);
- srcu_ctlp = &srcu_ctl; /* In case of a later rcutorture run. */
}

/* As above, but dynamically allocated. */
static struct rcu_torture_ops srcud_ops = {
.ttype = SRCU_FLAVOR,
- .init = srcu_torture_init,
- .cleanup = srcu_torture_cleanup,
+ .init = srcud_torture_init,
+ .cleanup = srcud_torture_cleanup,
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
@@ -611,14 +622,17 @@ static struct rcu_torture_ops srcud_ops = {
.cb_barrier = srcu_torture_barrier,
.stats = srcu_torture_stats,
.irq_capable = 1,
- .name = "srcud"
+ .name = "srcud",
+#ifdef MODULE
+ .altname = "srcu" /* Avoid breaking kbuild test robot. */
+#endif
};

/* As above, but broken due to inappropriate reader extension. */
static struct rcu_torture_ops busted_srcud_ops = {
.ttype = SRCU_FLAVOR,
- .init = srcu_torture_init,
- .cleanup = srcu_torture_cleanup,
+ .init = srcud_torture_init,
+ .cleanup = srcud_torture_cleanup,
.readlock = srcu_torture_read_lock,
.read_delay = rcu_read_delay,
.readunlock = srcu_torture_read_unlock,
@@ -2235,7 +2249,11 @@ rcu_torture_init(void)
int cpu;
int firsterr = 0;
static struct rcu_torture_ops *torture_ops[] = {
- &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
+ &rcu_ops, &rcu_busted_ops,
+#ifndef MODULE
+ &srcu_ops,
+#endif
+ &srcud_ops,
&busted_srcud_ops, &tasks_ops,
};

@@ -2247,6 +2265,11 @@ rcu_torture_init(void)
cur_ops = torture_ops[i];
if (strcmp(torture_type, cur_ops->name) == 0)
break;
+ if (cur_ops->altname &&
+ strcmp(torture_type, cur_ops->altname) == 0) {
+ pr_alert("rcu-torture: substituting torture type: \"%s\" for \"%s\"\n", cur_ops->name, torture_type);
+ break;
+ }
}
if (i == ARRAY_SIZE(torture_ops)) {
pr_alert("rcu-torture: invalid torture type: \"%s\"\n",
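
The name/altname lookup that the patch above adds to rcu_perf_init() and rcu_torture_init() can be sketched in user space as follows. This is an illustration, not the kernel code: struct ops_entry, example_ops and find_ops are hypothetical stand-ins, and the table assumes a build in which the static "srcu" entry is absent and "srcud" carries "srcu" as its alias.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ops structures: only the two name fields. */
struct ops_entry {
	const char *name;
	const char *altname;	/* NULL when the entry has no alias */
};

/* Assumed table: "srcu" survives only as an alias of "srcud". */
static const struct ops_entry example_ops[] = {
	{ .name = "rcu" },
	{ .name = "srcud", .altname = "srcu" },
	{ .name = "tasks" },
};

/* Return the index of the entry matching type by name or alias, or -1. */
static int find_ops(const char *type)
{
	size_t i;

	for (i = 0; i < sizeof(example_ops) / sizeof(example_ops[0]); i++) {
		if (strcmp(type, example_ops[i].name) == 0)
			return (int)i;
		if (example_ops[i].altname &&
		    strcmp(type, example_ops[i].altname) == 0)
			return (int)i;	/* alias hit: caller may log the substitution */
	}
	return -1;
}
```

With this table, find_ops("srcu") and find_ops("srcud") select the same entry, while an unknown type yields -1, which corresponds to the "invalid perf type" and "invalid torture type" paths in the patch.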


2019-03-14 17:38:38

by Tejun Heo

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Wed, Mar 13, 2019 at 04:11:55PM -0700, Paul E. McKenney wrote:
> commit 34f67df09cc0c6bf082a7cfca435373caeeb8d82
> Author: Paul E. McKenney <[email protected]>
> Date: Wed Mar 13 16:06:22 2019 -0700
>
> srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules
>
> Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module
> requires that the size of the reserved region be increased, which is
> not something we want to be doing all that often. Instead, loadable
> modules should define an srcu_struct and invoke init_srcu_struct()
> from their module_init function and cleanup_srcu_struct() from their
> module_exit function. Note that modules using call_srcu() will also
> need to invoke srcu_barrier() from their module_exit function.
>
> This commit enforces this advice by refusing to define DEFINE_SRCU()
> and DEFINE_STATIC_SRCU() within loadable modules.
>
> Suggested-by: Barret Rhoden <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

Looks-great-to-me-by: Tejun Heo <[email protected]>

Thanks. :)

--
tejun

2019-03-14 22:19:57

by Paul E. McKenney

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Thu, Mar 14, 2019 at 10:36:19AM -0700, Tejun Heo wrote:
> On Wed, Mar 13, 2019 at 04:11:55PM -0700, Paul E. McKenney wrote:
> > commit 34f67df09cc0c6bf082a7cfca435373caeeb8d82
> > Author: Paul E. McKenney <[email protected]>
> > Date: Wed Mar 13 16:06:22 2019 -0700
> >
> > srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules
> >
> > Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module
> > requires that the size of the reserved region be increased, which is
> > not something we want to be doing all that often. Instead, loadable
> > modules should define an srcu_struct and invoke init_srcu_struct()
> > from their module_init function and cleanup_srcu_struct() from their
> > module_exit function. Note that modules using call_srcu() will also
> > need to invoke srcu_barrier() from their module_exit function.
> >
> > This commit enforces this advice by refusing to define DEFINE_SRCU()
> > and DEFINE_STATIC_SRCU() within loadable modules.
> >
> > Suggested-by: Barret Rhoden <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Looks-great-to-me-by: Tejun Heo <[email protected]>

Applied. ;-)

Thanx, Paul

> Thanks. :)
>
> --
> tejun
>


2019-03-18 08:19:45

by Eial Czerwacki

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

Greetings Paul,

On 3/15/19 12:19 AM, Paul E. McKenney wrote:
> On Thu, Mar 14, 2019 at 10:36:19AM -0700, Tejun Heo wrote:
>> On Wed, Mar 13, 2019 at 04:11:55PM -0700, Paul E. McKenney wrote:
>>> commit 34f67df09cc0c6bf082a7cfca435373caeeb8d82
>>> Author: Paul E. McKenney <[email protected]>
>>> Date: Wed Mar 13 16:06:22 2019 -0700
>>>
>>> srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules
>>>
>>> Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module
>>> requires that the size of the reserved region be increased, which is
>>> not something we want to be doing all that often. Instead, loadable
>>> modules should define an srcu_struct and invoke init_srcu_struct()
>>> from their module_init function and cleanup_srcu_struct() from their
>>> module_exit function. Note that modules using call_srcu() will also
>>> need to invoke srcu_barrier() from their module_exit function.
>>>
>>> This commit enforces this advice by refusing to define DEFINE_SRCU()
>>> and DEFINE_STATIC_SRCU() within loadable modules.
>>>
>>> Suggested-by: Barret Rhoden <[email protected]>
>>> Signed-off-by: Paul E. McKenney <[email protected]>
>>
>> Looks-great-to-me-by: Tejun Heo <[email protected]>
>
> Applied. ;-)
>
> Thanx, Paul
>
>> Thanks. :)
>>
>> --
>> tejun
>>
>
>

When will this patch be available in the mainline kernel git repo? I'd like
to test whether the problem that started this mail thread still occurs.

Thanks,

Eial.

2019-03-18 14:29:10

by Paul E. McKenney

Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Mon, Mar 18, 2019 at 10:18:48AM +0200, Eial Czerwacki wrote:
> Greetings Paul,
>
> On 3/15/19 12:19 AM, Paul E. McKenney wrote:
> > On Thu, Mar 14, 2019 at 10:36:19AM -0700, Tejun Heo wrote:
> >> On Wed, Mar 13, 2019 at 04:11:55PM -0700, Paul E. McKenney wrote:
> >>> commit 34f67df09cc0c6bf082a7cfca435373caeeb8d82
> >>> Author: Paul E. McKenney <[email protected]>
> >>> Date: Wed Mar 13 16:06:22 2019 -0700
> >>>
> >>> srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules
> >>>
> >>> Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module
> >>> requires that the size of the reserved region be increased, which is
> >>> not something we want to be doing all that often. Instead, loadable
> >>> modules should define an srcu_struct and invoke init_srcu_struct()
> >>> from their module_init function and cleanup_srcu_struct() from their
> >>> module_exit function. Note that modules using call_srcu() will also
> >>> need to invoke srcu_barrier() from their module_exit function.
> >>>
> >>> This commit enforces this advice by refusing to define DEFINE_SRCU()
> >>> and DEFINE_STATIC_SRCU() within loadable modules.
> >>>
> >>> Suggested-by: Barret Rhoden <[email protected]>
> >>> Signed-off-by: Paul E. McKenney <[email protected]>
> >>
> >> Looks-great-to-me-by: Tejun Heo <[email protected]>
> >
> > Applied. ;-)
> >
> > Thanx, Paul
> >
> >> Thanks. :)
> >>
> >> --
> >> tejun
> >>
> >
> >
>
> When will this patch be available in the mainline kernel git repo? I'd like
> to test whether the problem that started this mail thread still occurs.

Thank you for your interest!

It is a3f5f4fae725 ("srcu: Forbid DEFINE{,_STATIC}_SRCU() from modules")
in my -rcu tree. If all goes well, I will submit it to the v5.2 merge
window. I do not expect it to be submitted to -stable.

And -rcu is here:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

Thanx, Paul


Subject: Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

On Wed, 13 Mar 2019, Barret Rhoden wrote:

> > It is very expensive. VSMP exchanges 4K segments via RDMA between servers
> > to build a large address space and run a kernel in the large address
> > space. Using smaller segments can cause a lot of
> > "cacheline" bouncing (meaning transfers of 4K segments back and forth
> > between servers).
> >
>
> Given that these are large machines, would it be OK to statically reserve 64K
> on them for modules' percpu data?

Likely.

> The bug that led me here came from someone running on a non-VSMP machine with
> that config set. Perhaps we make it more clear in the Kconfig option to
> not set it on other machines. That might make it less likely anyone on a
> non-VSMP machine pays the 64K overhead.

Right.

> Are there any other alternatives? Not using static SRCU in any code that
> could be built as a module seems a little harsh.

Sorry this ended up in my spam folder somehow. Just fished it out.