2008-06-04 14:58:26

by Mike Travis

Subject: Re: [patch 02/41] cpu alloc: The allocator

Christoph Lameter wrote:
> On Fri, 30 May 2008, Eric Dumazet wrote:
>
>>> +static DEFINE_PER_CPU(UNIT_TYPE, area[UNITS]);
>>>
>> area[] is not guaranteed to be aligned on anything but 4 bytes.
>>
>> If someone then needs to call cpu_alloc(8, GFP_KERNEL, 8), it might get a
>> non-aligned result.
>>
>> Either you should add an __attribute__((__aligned__(PAGE_SIZE))),
>> or take into account the real address of area[] in cpu_alloc() to avoid waste
>> of up to PAGE_SIZE bytes
>> per cpu.
>
> I think cacheline aligning should be sufficient. People should not
> allocate large page aligned objects here.

I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
ifdef MODULE?

#ifdef MODULE
#define SHARED_ALIGNED_SECTION ".data.percpu"
#else
#define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
#endif

#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
__attribute__((__section__(SHARED_ALIGNED_SECTION))) \
PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name \
____cacheline_aligned_in_smp

Thanks,
Mike
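
On the alignment concern quoted at the top of this message, a minimal user-space sketch may make the failure mode concrete. It assumes the allocator aligns only the offset inside area[] and not the final address; the function names below are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t x, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical allocator step that aligns only the offset within the area.
 * If base itself is only 4-byte aligned, the result can miss an 8-byte
 * alignment request.
 */
static uintptr_t alloc_offset_only(uintptr_t base, uintptr_t offset,
				   uintptr_t align)
{
	return base + align_up(offset, align);
}

/*
 * Hypothetical variant that takes the real address of the area into
 * account, so the returned address honours the requested alignment.
 */
static uintptr_t alloc_address_aware(uintptr_t base, uintptr_t offset,
				     uintptr_t align)
{
	return align_up(base + offset, align);
}
```

With a base of 0x1004 (4-byte aligned) and an 8-byte alignment request, the offset-only variant returns 0x1004, while the address-aware variant returns 0x1008. Aligning the area itself, as suggested in the quoted text, avoids the problem without per-allocation fixups.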


2008-06-04 15:11:32

by Eric Dumazet

Subject: Re: [patch 02/41] cpu alloc: The allocator

Mike Travis wrote:
> Christoph Lameter wrote:
>> On Fri, 30 May 2008, Eric Dumazet wrote:
>>
>>>> +static DEFINE_PER_CPU(UNIT_TYPE, area[UNITS]);
>>>>
>>> area[] is not guaranteed to be aligned on anything but 4 bytes.
>>>
>>> If someone then needs to call cpu_alloc(8, GFP_KERNEL, 8), it might get a
>>> non-aligned result.
>>>
>>> Either you should add an __attribute__((__aligned__(PAGE_SIZE))),
>>> or take into account the real address of area[] in cpu_alloc() to avoid waste
>>> of up to PAGE_SIZE bytes
>>> per cpu.
>> I think cacheline aligning should be sufficient. People should not
>> allocate large page aligned objects here.
>
> I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
> ifdef MODULE?
>
> #ifdef MODULE
> #define SHARED_ALIGNED_SECTION ".data.percpu"
> #else
> #define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
> #endif
>
> #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
> __attribute__((__section__(SHARED_ALIGNED_SECTION))) \
> PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name \
> ____cacheline_aligned_in_smp
>
> Thanks,
> Mike
>
>

Because we had crashes when loading the oprofile module, back when a previous
version of oprofile used a DEFINE_PER_CPU_SHARED_ALIGNED variable.

The module loader only takes into account the special section ".data.percpu"
and ignores ".data.percpu.shared_aligned".

I therefore submitted two patches:

1) commit 8b8b498836942c0c855333d357d121c0adeefbd9
oprofile: don't request cache line alignment for cpu_buffer

Alignment was previously requested because cpu_buffer was an [NR_CPUS]
array, to avoid cache line sharing between CPUs.

After commit 608dfddd845da5ab6accef70154c8910529699f7 (oprofile: change
cpu_buffer from array to per_cpu variable), we don't need to force an
alignment anymore since cpu_buffer sits in the per_cpu zone.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Mike Travis <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>


2) and commit 44c81433e8b05dbc85985d939046f10f95901184
per_cpu: fix DEFINE_PER_CPU_SHARED_ALIGNED for modules

The current module loader looks up the ".data.percpu" ELF section to perform
per_cpu relocation. But DEFINE_PER_CPU_SHARED_ALIGNED() uses another
section (".data.percpu.shared_aligned"), currently only handled in
vmlinux.lds, not by the module loader.

To correct this problem, instead of adding logic into module loader, or
using at build time a module.lds file for all arches to group
".data.percpu.shared_aligned" into ".data.percpu", just use ".data.percpu"
for modules.

Alignment requirements are correctly handled by ld and module loader.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Fenghua Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
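
As a rough illustration of the second commit, the macro can be reproduced in a user-space sketch. This assumes GCC or Clang on an ELF target and a 64-byte cache line; SKETCH_CACHELINE, struct sketch_counter, sketch_stats and the helper functions are hypothetical names, and the variable is an ordinary global here, not real per-CPU data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache line size; stands in for ____cacheline_aligned_in_smp. */
#define SKETCH_CACHELINE 64

/* Same selection as in the thread: modules fall back to ".data.percpu". */
#ifdef MODULE
#define SHARED_ALIGNED_SECTION ".data.percpu"
#else
#define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
#endif

/* User-space stand-in for the kernel macro (PER_CPU_ATTRIBUTES omitted). */
#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
	__attribute__((__section__(SHARED_ALIGNED_SECTION),		\
		       __aligned__(SKETCH_CACHELINE)))			\
	__typeof__(type) per_cpu__##name

/* Hypothetical payload type, for illustration only. */
struct sketch_counter {
	long hits;
	long misses;
};

DEFINE_PER_CPU_SHARED_ALIGNED(struct sketch_counter, sketch_stats);

/* Returns 1 if the variable was assigned to the named section. */
static int sketch_in_section(const char *name)
{
	return strcmp(SHARED_ALIGNED_SECTION, name) == 0;
}

/* Returns 1 if the variable starts on a cache line boundary. */
static int sketch_is_aligned(void)
{
	return ((uintptr_t)&per_cpu__sketch_stats % SKETCH_CACHELINE) == 0;
}
```

Built without -DMODULE, the variable lands in ".data.percpu.shared_aligned"; built with -DMODULE, it lands in ".data.percpu". The alignment attribute applies in both cases, which matches the commit's remark that alignment is handled by ld and the module loader regardless of the section name.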


2008-06-06 00:33:16

by Rusty Russell

Subject: Re: [patch 02/41] cpu alloc: The allocator

On Thursday 05 June 2008 01:11:00 Eric Dumazet wrote:
> Mike Travis wrote:
> > I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned
> > on ifdef MODULE?
> Because we had crashes when loading the oprofile module, back when a
> previous version of oprofile used a DEFINE_PER_CPU_SHARED_ALIGNED variable.
>
> The module loader only takes into account the special section ".data.percpu"
> and ignores ".data.percpu.shared_aligned".
>
> I therefore submitted two patches:

Put one way, putting page-aligned per-cpu data in a separate section is a
space-saving hack: one which is not really required for modules because of
the low frequency of such variables. Put another way, not respecting
the .data.percpu.shared_aligned section in modules is a bug.

But a comment would probably be nice!

Cheers,
Rusty.

2008-06-10 17:33:40

by Christoph Lameter

Subject: Re: [patch 02/41] cpu alloc: The allocator

On Wed, 4 Jun 2008, Mike Travis wrote:

> I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
> ifdef MODULE?
>
> #ifdef MODULE
> #define SHARED_ALIGNED_SECTION ".data.percpu"
> #else
> #define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
> #endif
>
> #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
> __attribute__((__section__(SHARED_ALIGNED_SECTION))) \
> PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name \
> ____cacheline_aligned_in_smp

Looks wrong to me. There can be shared objects even without modules.

2008-06-10 18:05:44

by Eric Dumazet

Subject: Re: [patch 02/41] cpu alloc: The allocator

Christoph Lameter wrote:
> On Wed, 4 Jun 2008, Mike Travis wrote:
>
>> I'm a bit confused. Why is DEFINE_PER_CPU_SHARED_ALIGNED() conditioned on
>> ifdef MODULE?
>>
>> #ifdef MODULE
>> #define SHARED_ALIGNED_SECTION ".data.percpu"
>> #else
>> #define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
>> #endif
>>
>> #define DEFINE_PER_CPU_SHARED_ALIGNED(type, name) \
>> __attribute__((__section__(SHARED_ALIGNED_SECTION))) \
>> PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name \
>> ____cacheline_aligned_in_smp
>
> Looks wrong to me. There can be shared objects even without modules.
>
>

Well, MODULE is not CONFIG_MODULES :)

When compiling an object that is going to be statically linked into the kernel,
MODULE is not defined, so we have shared objects.

When compiling a module, we cannot *yet* use the .data.percpu.shared_aligned
section, since the module loader won't handle this section.

The alternative is to change module linking for all arches to merge
.data.percpu{*} subsections correctly, or tell the module loader to take
into account all .data.percpu sections.

AFAIK no module uses DEFINE_PER_CPU_SHARED_ALIGNED() yet...
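
The difference between the current lookup and the second alternative above can be sketched in plain C. This is a hypothetical simplification that works on an array of section names; find_sec_exact() and count_sec_prefix() are illustrative names and do not reproduce the kernel's module loader code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Exact-name lookup, as the loader is described in this thread: only a
 * section named exactly `want` is found. Returns its index, or -1.
 */
static int find_sec_exact(const char *const names[], size_t n,
			  const char *want)
{
	size_t i;

	assert(names != NULL && want != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(names[i], want) == 0)
			return (int)i;
	return -1;
}

/*
 * Prefix lookup, sketching the "take into account all .data.percpu
 * sections" alternative: counts every section whose name starts with
 * `prefix`.
 */
static size_t count_sec_prefix(const char *const names[], size_t n,
			       const char *prefix)
{
	size_t i, hits = 0, len;

	assert(names != NULL && prefix != NULL);
	len = strlen(prefix);
	for (i = 0; i < n; i++)
		if (strncmp(names[i], prefix, len) == 0)
			hits++;
	return hits;
}
```

For a module containing ".text", ".data.percpu" and ".data.percpu.shared_aligned", the exact lookup finds one per-cpu section and silently skips the other, while the prefix lookup sees both. A real loader taking the prefix approach would also have to lay the matching sections out contiguously, which is why the MODULE fallback was the smaller change.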


2008-06-10 18:28:20

by Christoph Lameter

Subject: Re: [patch 02/41] cpu alloc: The allocator

On Tue, 10 Jun 2008, Eric Dumazet wrote:

> Well, MODULE is not CONFIG_MODULES :)
>
> When compiling an object that is going to be statically linked into the
> kernel, MODULE is not defined, so we have shared objects.
>
> When compiling a module, we cannot *yet* use the .data.percpu.shared_aligned
> section, since the module loader won't handle this section.
>
> The alternative is to change module linking for all arches to merge
> .data.percpu{*} subsections correctly, or tell the module loader to take into
> account all .data.percpu sections.
>
> AFAIK no module uses DEFINE_PER_CPU_SHARED_ALIGNED() yet...

Ahhh. Makes sense. Add a comment to explain this?