2010-12-02 02:03:30

by Shaohua Li

[permalink] [raw]
Subject: [patch 2/3] add new macros to make percpu readmostly section correctly align

percpu readmostly section should start and end at address cachline aligned.
Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
I add new macros for x86.

Signed-off-by: Shaohua Li <[email protected]>

---
include/asm-generic/vmlinux.lds.h | 66 ++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h 2010-12-02 09:22:32.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h 2010-12-02 09:32:42.000000000 +0800
@@ -726,6 +726,72 @@
VMLINUX_SYMBOL(__per_cpu_end) = .; \
}

+/**
+ * PERCPU_VADDR_CACHEALIGNED - define output section for percpu area
+ * @vaddr: explicit base address (optional)
+ * @phdr: destination PHDR (optional)
+ * @cacheline: cachline size required by readmostly percpu data
+ *
+ * Macro which expands to output section for percpu area. If @vaddr
+ * is not blank, it specifies explicit base address and all percpu
+ * symbols will be offset from the given address. If blank, @vaddr
+ * always equals @laddr + LOAD_OFFSET.
+ *
+ * @phdr defines the output PHDR to use if not blank. Be warned that
+ * output PHDR is sticky. If @phdr is specified, the next output
+ * section in the linker script will go there too. @phdr should have
+ * a leading colon.
+ *
+ * Note that this macros defines __per_cpu_load as an absolute symbol.
+ * If there is no need to put the percpu section at a predetermined
+ * address, use PERCPU_CACHEALIGNED().
+ */
+#define PERCPU_VADDR_CACHEALIGNED(vaddr, phdr, cacheline) \
+ VMLINUX_SYMBOL(__per_cpu_load) = .; \
+ .data..percpu vaddr : AT(VMLINUX_SYMBOL(__per_cpu_load) \
+ - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__per_cpu_start) = .; \
+ *(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..page_aligned) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu..readmostly) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu) \
+ *(.data..percpu..shared_aligned) \
+ VMLINUX_SYMBOL(__per_cpu_end) = .; \
+ } phdr \
+ . = VMLINUX_SYMBOL(__per_cpu_load) + SIZEOF(.data..percpu);
+
+/**
+ * PERCPU_CACHEALIGNED - define output section for percpu area, simple version
+ * @align: required alignment
+ * @cacheline: cachline size required by readmostly percpu data
+ *
+ * Align to @align and outputs output section for percpu area. This
+ * macro doesn't maniuplate @vaddr or @phdr and __per_cpu_load and
+ * __per_cpu_start will be identical.
+ *
+ * This macro is equivalent to ALIGN(align); PERCPU_VADDR_CACHEALIGNED( , ) except
+ * that __per_cpu_load is defined as a relative symbol against
+ * .data..percpu which is required for relocatable x86_32
+ * configuration.
+ */
+#define PERCPU_CACHEALIGNED(align, cacheline) \
+ . = ALIGN(align); \
+ .data..percpu : AT(ADDR(.data..percpu) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__per_cpu_load) = .; \
+ VMLINUX_SYMBOL(__per_cpu_start) = .; \
+ *(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..page_aligned) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu..readmostly) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu) \
+ *(.data..percpu..shared_aligned) \
+ VMLINUX_SYMBOL(__per_cpu_end) = .; \
+ }

/*
* Definition of the high level *_SECTION macros


2010-12-10 15:14:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/02/2010 03:02 AM, Shaohua Li wrote:
> percpu readmostly section should start and end at address cachline aligned.
> Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
> I add new macros for x86.
>
> Signed-off-by: Shaohua Li <[email protected]>

Why add another set of macros? Can't you just make the default one
aligned?

--
tejun

2010-12-13 00:41:31

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Fri, 2010-12-10 at 23:14 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/02/2010 03:02 AM, Shaohua Li wrote:
> > percpu readmostly section should start and end at address cachline aligned.
> > Idealy we should change PERCPU_VADDR/PERCPU, but I can't change all arch code, so
> > I add new macros for x86.
> >
> > Signed-off-by: Shaohua Li <[email protected]>
>
> Why add another set of macros? Can't you just make the default one
> aligned?
how to do it without changing all arch code? There isn't cache line size
macro for all arch which could be used in vmlinux.lds.h, IIRC.

Thanks,
Shaohua

2010-12-13 09:46:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/13/2010 01:41 AM, Shaohua Li wrote:
>> Why add another set of macros? Can't you just make the default one
>> aligned?
> how to do it without changing all arch code? There isn't cache line size
> macro for all arch which could be used in vmlinux.lds.h, IIRC.

Can't we get around that with some ifdefferies? And even if we'll
have to change lds on every arch, I think we better do it that way.
It's not like the problem exists only on x86, right?

Thanks.

--
tejun

2010-12-14 01:08:48

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Mon, 2010-12-13 at 17:47 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/13/2010 01:41 AM, Shaohua Li wrote:
> >> Why add another set of macros? Can't you just make the default one
> >> aligned?
> > how to do it without changing all arch code? There isn't cache line size
> > macro for all arch which could be used in vmlinux.lds.h, IIRC.
>
> Can't we get around that with some ifdefferies? And even if we'll
> have to change lds on every arch, I think we better do it that way.
I don't understand what you mean. defining a cachine line size macro for
all archs? There is such macro, but using it in vmlinux.ld.h always
report error. There is some other defines which can't be included in a
link script.
> It's not like the problem exists only on x86, right?
yes, though currently only x86 has it.

Thanks,
Shaohua

2010-12-14 09:57:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/14/2010 02:08 AM, Shaohua Li wrote:
> I don't understand what you mean. defining a cachine line size macro
> for all archs? There is such macro, but using it in vmlinux.ld.h
> always report error. There is some other defines which can't be
> included in a link script.

I haven't really looked through it but wouldn't it be possible to
ifdef it. ie. if cacheline macro is available, align it to it;
otherwise, don't. And, ultimately, the correct thing to do is making
it cacheline aligned on all archs. There can be several different
ways to get there but it might just as well be making cacheline size
available in all archs first and then update the PERCPU macro.

Thank you.

--
tejun

2010-12-15 01:58:11

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Tue, 2010-12-14 at 17:58 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/14/2010 02:08 AM, Shaohua Li wrote:
> > I don't understand what you mean. defining a cachine line size macro
> > for all archs? There is such macro, but using it in vmlinux.ld.h
> > always report error. There is some other defines which can't be
> > included in a link script.
>
> I haven't really looked through it but wouldn't it be possible to
> ifdef it. ie. if cacheline macro is available, align it to it;
> otherwise, don't. And, ultimately, the correct thing to do is making
> it cacheline aligned on all archs. There can be several different
> ways to get there but it might just as well be making cacheline size
> available in all archs first and then update the PERCPU macro.
How about this one?

Subject: Make x86 percpu readmostly section correctly aligned

percpu readmostly section should start and end at address cachline aligned to
avoid cache false sharing. For ARCHs care about the cache false sharing,
they should define INTERNODE_CACHE_BYTES. We use it to do the alignment.
Currently only x86 has percpu readmostly section, so only changed it so far.

Signed-off-by: Shaohua Li <[email protected]>

---
arch/x86/kernel/vmlinux.lds.S | 2 +-
include/asm-generic/vmlinux.lds.h | 11 +++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h 2010-12-15 09:27:26.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h 2010-12-15 09:55:22.000000000 +0800
@@ -665,6 +665,13 @@
*(.discard.*) \
}

+#ifdef INTERNODE_CACHE_BYTES
+#define INTERNODE_CACHEALIGNED \
+ . = ALIGN(INTERNODE_CACHE_BYTES);
+#else
+#define INTERNODE_CACHEALIGNED
+#endif
+
/**
* PERCPU_VADDR - define output section for percpu area
* @vaddr: explicit base address (optional)
@@ -692,7 +699,9 @@
*(.data..percpu..first) \
. = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
+ INTERNODE_CACHEALIGNED \
*(.data..percpu..readmostly) \
+ INTERNODE_CACHEALIGNED \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
@@ -720,7 +729,9 @@
*(.data..percpu..first) \
. = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
+ INTERNODE_CACHEALIGNED \
*(.data..percpu..readmostly) \
+ INTERNODE_CACHEALIGNED \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
Index: linux/arch/x86/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:01.000000000 +0800
+++ linux/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:06.000000000 +0800
@@ -20,11 +20,11 @@
#define LOAD_OFFSET __START_KERNEL_map
#endif

-#include <asm-generic/vmlinux.lds.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/page_types.h>
#include <asm/cache.h>
+#include <asm-generic/vmlinux.lds.h>
#include <asm/boot.h>

#undef i386 /* in case the preprocessor is a 32bit one */

2010-12-15 14:08:58

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/15/2010 02:57 AM, Shaohua Li wrote:
> How about this one?

Much better. :-)

> +#ifdef INTERNODE_CACHE_BYTES
> +#define INTERNODE_CACHEALIGNED \
> + . = ALIGN(INTERNODE_CACHE_BYTES);
> +#else
> +#define INTERNODE_CACHEALIGNED
> +#endif

Yeah, this looks good.

> Index: linux/arch/x86/kernel/vmlinux.lds.S
> ===================================================================
> --- linux.orig/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:01.000000000 +0800
> +++ linux/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:06.000000000 +0800
> @@ -20,11 +20,11 @@
> #define LOAD_OFFSET __START_KERNEL_map
> #endif
>
> -#include <asm-generic/vmlinux.lds.h>
> #include <asm/asm-offsets.h>
> #include <asm/thread_info.h>
> #include <asm/page_types.h>
> #include <asm/cache.h>
> +#include <asm-generic/vmlinux.lds.h>
> #include <asm/boot.h>

Why do we need this chunk?

Thanks.

--
tejun

2010-12-15 14:49:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/15/2010 03:08 PM, Tejun Heo wrote:
>> +#ifdef INTERNODE_CACHE_BYTES
>> +#define INTERNODE_CACHEALIGNED \
>> + . = ALIGN(INTERNODE_CACHE_BYTES);
>> +#else
>> +#define INTERNODE_CACHEALIGNED
>> +#endif
>
> Yeah, this looks good.

Just one more thing. If you look at various arch linker scripts,
cache line size is always present. After all, RW_DATA_SECTION() needs
it. They're different define's or sometimes hardcoded values but it
would be nice if we can take this chance to unify them and use it for
this too. Would you be interested in doing that?

Thanks.

--
tejun

2010-12-16 00:53:42

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > How about this one?
>
> Much better. :-)
>
> > +#ifdef INTERNODE_CACHE_BYTES
> > +#define INTERNODE_CACHEALIGNED \
> > + . = ALIGN(INTERNODE_CACHE_BYTES);
> > +#else
> > +#define INTERNODE_CACHEALIGNED
> > +#endif
>
> Yeah, this looks good.
>
> > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:01.000000000 +0800
> > +++ linux/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:06.000000000 +0800
> > @@ -20,11 +20,11 @@
> > #define LOAD_OFFSET __START_KERNEL_map
> > #endif
> >
> > -#include <asm-generic/vmlinux.lds.h>
> > #include <asm/asm-offsets.h>
> > #include <asm/thread_info.h>
> > #include <asm/page_types.h>
> > #include <asm/cache.h>
> > +#include <asm-generic/vmlinux.lds.h>
> > #include <asm/boot.h>
>
> Why do we need this chunk?
the cache size is defined in cache.h, so I need move vmlinux.lds.h after
cache.h

Thanks,
Shaohua

2010-12-16 00:55:09

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Wed, 2010-12-15 at 22:49 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/15/2010 03:08 PM, Tejun Heo wrote:
> >> +#ifdef INTERNODE_CACHE_BYTES
> >> +#define INTERNODE_CACHEALIGNED \
> >> + . = ALIGN(INTERNODE_CACHE_BYTES);
> >> +#else
> >> +#define INTERNODE_CACHEALIGNED
> >> +#endif
> >
> > Yeah, this looks good.
>
> Just one more thing. If you look at various arch linker scripts,
> cache line size is always present. After all, RW_DATA_SECTION() needs
> it. They're different define's or sometimes hardcoded values but it
> would be nice if we can take this chance to unify them and use it for
> this too. Would you be interested in doing that?
might not, sorry.

2010-12-16 05:46:25

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Thu, Dec 16, 2010 at 08:53:37AM +0800, Shaohua Li wrote:
> On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> > Hello,
> >
> > On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > > How about this one?
> >
> > Much better. :-)
> >
> > > +#ifdef INTERNODE_CACHE_BYTES
> > > +#define INTERNODE_CACHEALIGNED \
> > > + . = ALIGN(INTERNODE_CACHE_BYTES);
> > > +#else
> > > +#define INTERNODE_CACHEALIGNED
> > > +#endif
> >
> > Yeah, this looks good.
> >
> > > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > > ===================================================================
> > > --- linux.orig/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:01.000000000 +0800
> > > +++ linux/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:06.000000000 +0800
> > > @@ -20,11 +20,11 @@
> > > #define LOAD_OFFSET __START_KERNEL_map
> > > #endif
> > >
> > > -#include <asm-generic/vmlinux.lds.h>
> > > #include <asm/asm-offsets.h>
> > > #include <asm/thread_info.h>
> > > #include <asm/page_types.h>
> > > #include <asm/cache.h>
> > > +#include <asm-generic/vmlinux.lds.h>
> > > #include <asm/boot.h>
> >
> > Why do we need this chunk?
> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> cache.h

The right fix is to move the inclusion of cache.h to asm-generic/vmlinux.lds.h.
A quick audit only found sparc that failed to guard non assembler stuff.

Sam

2010-12-16 05:56:50

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Thu, 2010-12-16 at 13:46 +0800, Sam Ravnborg wrote:
> On Thu, Dec 16, 2010 at 08:53:37AM +0800, Shaohua Li wrote:
> > On Wed, 2010-12-15 at 22:08 +0800, Tejun Heo wrote:
> > > Hello,
> > >
> > > On 12/15/2010 02:57 AM, Shaohua Li wrote:
> > > > How about this one?
> > >
> > > Much better. :-)
> > >
> > > > +#ifdef INTERNODE_CACHE_BYTES
> > > > +#define INTERNODE_CACHEALIGNED \
> > > > + . = ALIGN(INTERNODE_CACHE_BYTES);
> > > > +#else
> > > > +#define INTERNODE_CACHEALIGNED
> > > > +#endif
> > >
> > > Yeah, this looks good.
> > >
> > > > Index: linux/arch/x86/kernel/vmlinux.lds.S
> > > > ===================================================================
> > > > --- linux.orig/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:01.000000000 +0800
> > > > +++ linux/arch/x86/kernel/vmlinux.lds.S 2010-12-15 09:37:06.000000000 +0800
> > > > @@ -20,11 +20,11 @@
> > > > #define LOAD_OFFSET __START_KERNEL_map
> > > > #endif
> > > >
> > > > -#include <asm-generic/vmlinux.lds.h>
> > > > #include <asm/asm-offsets.h>
> > > > #include <asm/thread_info.h>
> > > > #include <asm/page_types.h>
> > > > #include <asm/cache.h>
> > > > +#include <asm-generic/vmlinux.lds.h>
> > > > #include <asm/boot.h>
> > >
> > > Why do we need this chunk?
> > the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> > cache.h
>
> The right fix is to move the inclusion of cache.h to asm-generic/vmlinux.lds.h.
> A quick audit only found sparc that failed to guard non assembler stuff.
with this, we need check every arch, at least doing a compile. I'm
afraid I can't, sorry.

2010-12-16 09:51:12

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello, Shaohua.

On 12/16/2010 06:56 AM, Shaohua Li wrote:
>>>>> -#include <asm-generic/vmlinux.lds.h>
>>>>> #include <asm/asm-offsets.h>
>>>>> #include <asm/thread_info.h>
>>>>> #include <asm/page_types.h>
>>>>> #include <asm/cache.h>
>>>>> +#include <asm-generic/vmlinux.lds.h>
>>>>> #include <asm/boot.h>
>>>>
>>>> Why do we need this chunk?
>>> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
>>> cache.h
>>
>> The right fix is to move the inclusion of cache.h to
>> asm-generic/vmlinux.lds.h. A quick audit only found sparc that
>> failed to guard non assembler stuff.
>
> with this, we need check every arch, at least doing a compile. I'm
> afraid I can't, sorry.

Not being able to cross build every arch is okay but you at least need
to make an effort to make things easily applicable to other archs and
avoid adding subtle ugliness like the above. Please at least try to
look at other arch codes and see how things can be made to work across
different archs. Setting up cross compilers for the major archs, for
example, sparc, power and ia64 isn't that difficult either.

Thanks.

--
tejun

2010-12-20 01:29:52

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Thu, 2010-12-16 at 17:50 +0800, Tejun Heo wrote:
> Hello, Shaohua.
>
> On 12/16/2010 06:56 AM, Shaohua Li wrote:
> >>>>> -#include <asm-generic/vmlinux.lds.h>
> >>>>> #include <asm/asm-offsets.h>
> >>>>> #include <asm/thread_info.h>
> >>>>> #include <asm/page_types.h>
> >>>>> #include <asm/cache.h>
> >>>>> +#include <asm-generic/vmlinux.lds.h>
> >>>>> #include <asm/boot.h>
> >>>>
> >>>> Why do we need this chunk?
> >>> the cache size is defined in cache.h, so I need move vmlinux.lds.h after
> >>> cache.h
> >>
> >> The right fix is to move the inclusion of cache.h to
> >> asm-generic/vmlinux.lds.h. A quick audit only found sparc that
> >> failed to guard non assembler stuff.
> >
> > with this, we need check every arch, at least doing a compile. I'm
> > afraid I can't, sorry.
>
> Not being able to cross build every arch is okay but you at least need
> to make an effort to make things easily applicable to other archs and
> avoid adding subtle ugliness like the above. Please at least try to
> look at other arch codes and see how things can be made to work across
> different archs. Setting up cross compilers for the major archs, for
> example, sparc, power and ia64 isn't that difficult either.
This still needs I fix every arch, for example, as Sam pointed out,
spark build will fail. I really have the bandwidth and capability to do
this. Increment changes are always preferred. My original patch is
trying to follow increment changes way.

2010-12-20 15:55:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On 12/20/2010 02:28 AM, Shaohua Li wrote:
>> Not being able to cross build every arch is okay but you at least need
>> to make an effort to make things easily applicable to other archs and
>> avoid adding subtle ugliness like the above. Please at least try to
>> look at other arch codes and see how things can be made to work across
>> different archs. Setting up cross compilers for the major archs, for
>> example, sparc, power and ia64 isn't that difficult either.
>
> This still needs I fix every arch, for example, as Sam pointed out,
> spark build will fail. I really have the bandwidth and capability to do
> this. Increment changes are always preferred. My original patch is
> trying to follow increment changes way.

Yeah, well, in my experience, those approaches usually just end up
piling partial conversions on top of different set of partial
conversions. It's not like we're talking about a major arch dependent
feature it's just a matter of looking through different arch codes,
maybe test building some, pushing things to linux-next and dealing
with fallouts if there is any. I'll try to do it myself.

Thanks.

--
tejun

2010-12-23 02:38:50

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Mon, 2010-12-20 at 23:55 +0800, Tejun Heo wrote:
> Hello,
>
> On 12/20/2010 02:28 AM, Shaohua Li wrote:
> >> Not being able to cross build every arch is okay but you at least need
> >> to make an effort to make things easily applicable to other archs and
> >> avoid adding subtle ugliness like the above. Please at least try to
> >> look at other arch codes and see how things can be made to work across
> >> different archs. Setting up cross compilers for the major archs, for
> >> example, sparc, power and ia64 isn't that difficult either.
> >
> > This still needs I fix every arch, for example, as Sam pointed out,
> > spark build will fail. I really have the bandwidth and capability to do
> > this. Increment changes are always preferred. My original patch is
> > trying to follow increment changes way.
>
> Yeah, well, in my experience, those approaches usually just end up
> piling partial conversions on top of different set of partial
> conversions. It's not like we're talking about a major arch dependent
> feature it's just a matter of looking through different arch codes,
> maybe test building some, pushing things to linux-next and dealing
> with fallouts if there is any. I'll try to do it myself.
Thanks for doing it and sorry I didn't do it.
The first patch in the series isn't related to this issue, please
consider merge it.

Thanks,
Shaohua

2010-12-27 12:15:01

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Thu, Dec 23, 2010 at 10:38:34AM +0800, Shaohua Li wrote:
> The first patch in the series isn't related to this issue, please
> consider merge it.

Can you please re-post it? Thanks.

--
tejun

2010-12-28 00:27:20

by Shaohua Li

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

On Mon, 2010-12-27 at 20:14 +0800, Tejun Heo wrote:
> On Thu, Dec 23, 2010 at 10:38:34AM +0800, Shaohua Li wrote:
> > The first patch in the series isn't related to this issue, please
> > consider merge it.
>
> Can you please re-post it? Thanks.
here it is.

Subject: make readmostly section correctly align

readmostly section should end at cache line aligned address, otherwise the last
several data might share cachline with other data and make the readmostly data
still have cache bounce.
For example, in ia64, secpath_cachep is the last readmostly data, and it shares
cacheline with init_uts_ns.
a000000100e80480 d secpath_cachep
a000000100e80488 D init_uts_ns

Signed-off-by: Shaohua Li <[email protected]>

---
include/asm-generic/vmlinux.lds.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux.orig/include/asm-generic/vmlinux.lds.h 2010-12-01 16:49:48.000000000 +0800
+++ linux/include/asm-generic/vmlinux.lds.h 2010-12-02 09:22:32.000000000 +0800
@@ -192,7 +192,8 @@

#define READ_MOSTLY_DATA(align) \
. = ALIGN(align); \
- *(.data..read_mostly)
+ *(.data..read_mostly) \
+ . = ALIGN(align);

#define CACHELINE_ALIGNED_DATA(align) \
. = ALIGN(align); \

2010-12-28 11:13:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [patch 2/3] add new macros to make percpu readmostly section correctly align

Hello,

On Tue, Dec 28, 2010 at 08:26:57AM +0800, Shaohua Li wrote:
> Subject: make readmostly section correctly align
>
> readmostly section should end at cache line aligned address, otherwise the last
> several data might share cachline with other data and make the readmostly data
> still have cache bounce.
> For example, in ia64, secpath_cachep is the last readmostly data, and it shares
> cacheline with init_uts_ns.
> a000000100e80480 d secpath_cachep
> a000000100e80488 D init_uts_ns
>
> Signed-off-by: Shaohua Li <[email protected]>

Yeap, this looks good to me.

Acked-by: Tejun Heo <[email protected]>

I can route this through percpu tree but think -mm is the better fit.
Andrew, can you please take this one?

Thank you.

--
tejun