2011-03-12 11:50:21

by Tejun Heo

[permalink] [raw]
Subject: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

From: Christoph Lameter <[email protected]>

Add this_cpu_has() which determines if the current cpu has a certain
ability using a segment prefix and a bit test operation.

For that we need to add bit operations to x86s percpu.h.

Many uses of cpu_has use a pointer passed to a function to determine
the current flags. That is no longer necessary after this patch.

However, this patch only converts the straightforward cases where
cpu_has is used with this_cpu_ptr. The rest is work for later.

-tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
of percpu_read_stable().

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
Ingo, can you please route these three patches? These patches are
somewhere inbetween percpu and x86 but fit x86 better. I generated
them on top of tip:x86/mm but if you want it based on something else,
just let me know. If you want these to be routed through x86/mm, I
can apply it there and send pull request too.

Thanks.

arch/x86/include/asm/cpufeature.h | 13 +++++++++----
arch/x86/include/asm/percpu.h | 27 +++++++++++++++++++++++++++
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/process.c | 4 ++--
arch/x86/kernel/smpboot.c | 4 ++--
5 files changed, 41 insertions(+), 9 deletions(-)

Index: work/arch/x86/include/asm/cpufeature.h
===================================================================
--- work.orig/arch/x86/include/asm/cpufeature.h
+++ work/arch/x86/include/asm/cpufeature.h
@@ -206,8 +206,7 @@ extern const char * const x86_power_flag
#define test_cpu_cap(c, bit) \
test_bit(bit, (unsigned long *)((c)->x86_capability))

-#define cpu_has(c, bit) \
- (__builtin_constant_p(bit) && \
+#define REQUIRED_MASK_BIT_SET(bit) \
( (((bit)>>5)==0 && (1UL<<((bit)&31) & REQUIRED_MASK0)) || \
(((bit)>>5)==1 && (1UL<<((bit)&31) & REQUIRED_MASK1)) || \
(((bit)>>5)==2 && (1UL<<((bit)&31) & REQUIRED_MASK2)) || \
@@ -217,10 +216,16 @@ extern const char * const x86_power_flag
(((bit)>>5)==6 && (1UL<<((bit)&31) & REQUIRED_MASK6)) || \
(((bit)>>5)==7 && (1UL<<((bit)&31) & REQUIRED_MASK7)) || \
(((bit)>>5)==8 && (1UL<<((bit)&31) & REQUIRED_MASK8)) || \
- (((bit)>>5)==9 && (1UL<<((bit)&31) & REQUIRED_MASK9)) ) \
- ? 1 : \
+ (((bit)>>5)==9 && (1UL<<((bit)&31) & REQUIRED_MASK9)) )
+
+#define cpu_has(c, bit) \
+ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
test_cpu_cap(c, bit))

+#define this_cpu_has(bit) \
+ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
+ x86_this_cpu_test_bit(bit, (unsigned long *)&cpu_info.x86_capability))
+
#define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)

#define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability))
Index: work/arch/x86/kernel/apic/apic.c
===================================================================
--- work.orig/arch/x86/kernel/apic/apic.c
+++ work/arch/x86/kernel/apic/apic.c
@@ -525,7 +525,7 @@ static void __cpuinit setup_APIC_timer(v
{
struct clock_event_device *levt = &__get_cpu_var(lapic_events);

- if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_ARAT)) {
+ if (this_cpu_has(X86_FEATURE_ARAT)) {
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_C3STOP;
/* Make LAPIC timer preferrable over percpu HPET */
lapic_clockevent.rating = 150;
Index: work/arch/x86/include/asm/percpu.h
===================================================================
--- work.orig/arch/x86/include/asm/percpu.h
+++ work/arch/x86/include/asm/percpu.h
@@ -492,6 +492,33 @@ do { \
old__; \
})

+static __always_inline int x86_this_cpu_constant_test_bit(unsigned int nr,
+ const unsigned long __percpu *addr)
+{
+ unsigned long __percpu *a = (unsigned long *)addr + nr / BITS_PER_LONG;
+
+ return ((1UL << (nr % BITS_PER_LONG)) & percpu_read(*a)) != 0;
+}
+
+static inline int x86_this_cpu_variable_test_bit(int nr,
+ const unsigned long __percpu *addr)
+{
+ int oldbit;
+
+ asm volatile("bt "__percpu_arg(2)",%1\n\t"
+ "sbb %0,%0"
+ : "=r" (oldbit)
+ : "m" (*(unsigned long *)addr), "Ir" (nr));
+
+ return oldbit;
+}
+
+#define x86_this_cpu_test_bit(nr, addr) \
+ (__builtin_constant_p((nr)) \
+ ? x86_this_cpu_constant_test_bit((nr), (addr)) \
+ : x86_this_cpu_variable_test_bit((nr), (addr)))
+
+
#include <asm-generic/percpu.h>

/* We can use this directly for local CPU (faster). */
Index: work/arch/x86/kernel/process.c
===================================================================
--- work.orig/arch/x86/kernel/process.c
+++ work/arch/x86/kernel/process.c
@@ -442,7 +442,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
{
if (!need_resched()) {
- if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);

__monitor((void *)&current_thread_info()->flags, 0, 0);
@@ -458,7 +458,7 @@ static void mwait_idle(void)
if (!need_resched()) {
trace_power_start(POWER_CSTATE, 1, smp_processor_id());
trace_cpu_idle(1, smp_processor_id());
- if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);

__monitor((void *)&current_thread_info()->flags, 0, 0);
Index: work/arch/x86/kernel/smpboot.c
===================================================================
--- work.orig/arch/x86/kernel/smpboot.c
+++ work/arch/x86/kernel/smpboot.c
@@ -1339,9 +1339,9 @@ static inline void mwait_play_dead(void)
void *mwait_ptr;
struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);

- if (!(cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)))
+ if (!this_cpu_has(X86_FEATURE_MWAIT) && mwait_usable(c))
return;
- if (!cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLSH))
+ if (!this_cpu_has(X86_FEATURE_CLFLSH))
return;
if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
return;


2011-03-12 11:50:52

by Tejun Heo

[permalink] [raw]
Subject: [PATCH tip:x86/mm 2/3] x86: Use this_cpu_has for thermal_interrupt current cpu

From: Christoph Lameter <[email protected]>

It is more effective to use a segment prefix instead of calculating the
address of the current cpu area amd then testing flags.

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
arch/x86/kernel/cpu/mcheck/therm_throt.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

Index: work/arch/x86/kernel/cpu/mcheck/therm_throt.c
===================================================================
--- work.orig/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ work/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -355,7 +355,6 @@ static void notify_thresholds(__u64 msr_
static void intel_thermal_interrupt(void)
{
__u64 msr_val;
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());

rdmsrl(MSR_IA32_THERM_STATUS, msr_val);

@@ -367,19 +366,19 @@ static void intel_thermal_interrupt(void
CORE_LEVEL) != 0)
mce_log_therm_throt_event(CORE_THROTTLED | msr_val);

- if (cpu_has(c, X86_FEATURE_PLN))
+ if (this_cpu_has(X86_FEATURE_PLN))
if (therm_throt_process(msr_val & THERM_STATUS_POWER_LIMIT,
POWER_LIMIT_EVENT,
CORE_LEVEL) != 0)
mce_log_therm_throt_event(CORE_POWER_LIMIT | msr_val);

- if (cpu_has(c, X86_FEATURE_PTS)) {
+ if (this_cpu_has(X86_FEATURE_PTS)) {
rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
if (therm_throt_process(msr_val & PACKAGE_THERM_STATUS_PROCHOT,
THERMAL_THROTTLING_EVENT,
PACKAGE_LEVEL) != 0)
mce_log_therm_throt_event(PACKAGE_THROTTLED | msr_val);
- if (cpu_has(c, X86_FEATURE_PLN))
+ if (this_cpu_has(X86_FEATURE_PLN))
if (therm_throt_process(msr_val &
PACKAGE_THERM_STATUS_POWER_LIMIT,
POWER_LIMIT_EVENT,

2011-03-12 11:51:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH tip:x86/mm 3/3] acpi throttling: Use this_cpu_has and simplify code current cpu

From: Christoph Lameter <[email protected]>

With the this_cpu_xx we no longer need to pass an acpi
structure to the msr management code. Simplifies code and improves
performance.

NOTE: This code is x86 specific (see #ifdef CONFIG_X86) but not under
arch/x86.

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
drivers/acpi/processor_throttling.c | 32 ++++++++++----------------------
1 file changed, 10 insertions(+), 22 deletions(-)

Index: work/drivers/acpi/processor_throttling.c
===================================================================
--- work.orig/drivers/acpi/processor_throttling.c
+++ work/drivers/acpi/processor_throttling.c
@@ -710,20 +710,14 @@ static int acpi_processor_get_throttling
}

#ifdef CONFIG_X86
-static int acpi_throttling_rdmsr(struct acpi_processor *pr,
- u64 *value)
+static int acpi_throttling_rdmsr(u64 *value)
{
- struct cpuinfo_x86 *c;
u64 msr_high, msr_low;
- unsigned int cpu;
u64 msr = 0;
int ret = -1;

- cpu = pr->id;
- c = &cpu_data(cpu);
-
- if ((c->x86_vendor != X86_VENDOR_INTEL) ||
- !cpu_has(c, X86_FEATURE_ACPI)) {
+ if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
+ !this_cpu_has(X86_FEATURE_ACPI)) {
printk(KERN_ERR PREFIX
"HARDWARE addr space,NOT supported yet\n");
} else {
@@ -738,18 +732,13 @@ static int acpi_throttling_rdmsr(struct
return ret;
}

-static int acpi_throttling_wrmsr(struct acpi_processor *pr, u64 value)
+static int acpi_throttling_wrmsr(u64 value)
{
- struct cpuinfo_x86 *c;
- unsigned int cpu;
int ret = -1;
u64 msr;

- cpu = pr->id;
- c = &cpu_data(cpu);
-
- if ((c->x86_vendor != X86_VENDOR_INTEL) ||
- !cpu_has(c, X86_FEATURE_ACPI)) {
+ if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
+ !this_cpu_has(X86_FEATURE_ACPI)) {
printk(KERN_ERR PREFIX
"HARDWARE addr space,NOT supported yet\n");
} else {
@@ -761,15 +750,14 @@ static int acpi_throttling_wrmsr(struct
return ret;
}
#else
-static int acpi_throttling_rdmsr(struct acpi_processor *pr,
- u64 *value)
+static int acpi_throttling_rdmsr(u64 *value)
{
printk(KERN_ERR PREFIX
"HARDWARE addr space,NOT supported yet\n");
return -1;
}

-static int acpi_throttling_wrmsr(struct acpi_processor *pr, u64 value)
+static int acpi_throttling_wrmsr(u64 value)
{
printk(KERN_ERR PREFIX
"HARDWARE addr space,NOT supported yet\n");
@@ -801,7 +789,7 @@ static int acpi_read_throttling_status(s
ret = 0;
break;
case ACPI_ADR_SPACE_FIXED_HARDWARE:
- ret = acpi_throttling_rdmsr(pr, value);
+ ret = acpi_throttling_rdmsr(value);
break;
default:
printk(KERN_ERR PREFIX "Unknown addr space %d\n",
@@ -834,7 +822,7 @@ static int acpi_write_throttling_state(s
ret = 0;
break;
case ACPI_ADR_SPACE_FIXED_HARDWARE:
- ret = acpi_throttling_wrmsr(pr, value);
+ ret = acpi_throttling_wrmsr(value);
break;
default:
printk(KERN_ERR PREFIX "Unknown addr space %d\n",

2011-03-14 19:50:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

Tejun Heo <[email protected]> writes:

> Add this_cpu_has() which determines if the current cpu has a certain
> ability using a segment prefix and a bit test operation.


Hmm: if the CPU capability is really tested in a performance critical
path, wouldn't it even be better to just use static_branch() now?

-Andi

--
[email protected] -- Speaking for myself only

2011-03-28 16:41:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 3/3] acpi throttling: Use this_cpu_has and simplify code current cpu

On Sat, Mar 12, 2011 at 12:51:12PM +0100, Tejun Heo wrote:
> From: Christoph Lameter <[email protected]>
>
> With the this_cpu_xx we no longer need to pass an acpi
> structure to the msr management code. Simplifies code and improves
> performance.
>
> NOTE: This code is x86 specific (see #ifdef CONFIG_X86) but not under
> arch/x86.
>
> Signed-off-by: Christoph Lameter <[email protected]>
> Acked-by: Tejun Heo <[email protected]>
> ---

Ingo, are you planning on picking these up?

Thanks.

--
tejun

Subject: Re: [PATCH tip:x86/mm 3/3] acpi throttling: Use this_cpu_has and simplify code current cpu

On Mon, 28 Mar 2011, Tejun Heo wrote:

> On Sat, Mar 12, 2011 at 12:51:12PM +0100, Tejun Heo wrote:
> > From: Christoph Lameter <[email protected]>
> >
> > With the this_cpu_xx we no longer need to pass an acpi
> > structure to the msr management code. Simplifies code and improves
> > performance.
> >
> > NOTE: This code is x86 specific (see #ifdef CONFIG_X86) but not under
> > arch/x86.
> >
> > Signed-off-by: Christoph Lameter <[email protected]>
> > Acked-by: Tejun Heo <[email protected]>
> > ---
>
> Ingo, are you planning on picking these up?

Guess the patches missed the merge period. There could be more
optimizations of arch code if the infrastructure for this_cpu_has() would
be in but I guess I better hold off for awhile.

2011-03-30 15:22:21

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 3/3] acpi throttling: Use this_cpu_has and simplify code current cpu

On Wed, Mar 30, 2011 at 09:20:08AM -0500, Christoph Lameter wrote:
> > Ingo, are you planning on picking these up?
>
> Guess the patches missed the merge period. There could be more
> optimizations of arch code if the infrastructure for this_cpu_has() would
> be in but I guess I better hold off for awhile.

Yeah, merging didn't go very smooth with your recent patches. Sorry
about that. I applied the this_cpu_has() patches to x86-mm and will
soon send pull request to Ingo. Please feel free to send further
patches on top of that tree.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git

Thanks.

--
tejun

2011-03-30 16:00:12

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

On 03/14/2011 12:49 PM, Andi Kleen wrote:
> Tejun Heo <[email protected]> writes:
>
>> Add this_cpu_has() which determines if the current cpu has a certain
>> ability using a segment prefix and a bit test operation.
>
>
> Hmm: if the CPU capability is really tested in a performance critical
> path, wouldn't it even be better to just use static_branch() now?
>

We have static_cpu_has() for this specific purpose (it actually predates
static_branch()).

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-03-30 16:08:03

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

Hello,

On Wed, Mar 30, 2011 at 08:58:23AM -0700, H. Peter Anvin wrote:
> >> Add this_cpu_has() which determines if the current cpu has a certain
> >> ability using a segment prefix and a bit test operation.
> >
> >
> > Hmm: if the CPU capability is really tested in a performance critical
> > path, wouldn't it even be better to just use static_branch() now?
> >
>
> We have static_cpu_has() for this specific purpose (it actually predates
> static_branch()).

These patches have performance benefits but I don't think it would be
noticeable. I think it's more of code cleanup.

Thanks.

--
tejun

Subject: Re: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

On Wed, 30 Mar 2011, Tejun Heo wrote:

> Hello,
>
> On Wed, Mar 30, 2011 at 08:58:23AM -0700, H. Peter Anvin wrote:
> > >> Add this_cpu_has() which determines if the current cpu has a certain
> > >> ability using a segment prefix and a bit test operation.
> > >
> > >
> > > Hmm: if the CPU capability is really tested in a performance critical
> > > path, wouldn't it even be better to just use static_branch() now?
> > >
> >
> > We have static_cpu_has() for this specific purpose (it actually predates
> > static_branch()).
>
> These patches have performance benefits but I don't think it would be
> noticeable. I think it's more of code cleanup.

The main simplification is that the pointer to the cpu_info can be
omitted. Its passed around quite a bit right now in x86 arch code.
Removal of the cpu_info pointer could reduce code size and allow faster
access.

2011-03-30 16:19:11

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH tip:x86/mm 1/3] x86: A fast way to check capabilities of the current cpu

On 03/30/2011 08:58 AM, H. Peter Anvin wrote:
> On 03/14/2011 12:49 PM, Andi Kleen wrote:
>> Tejun Heo <[email protected]> writes:
>>
>>> Add this_cpu_has() which determines if the current cpu has a certain
>>> ability using a segment prefix and a bit test operation.
>>
>>
>> Hmm: if the CPU capability is really tested in a performance critical
>> path, wouldn't it even be better to just use static_branch() now?
>>
>
> We have static_cpu_has() for this specific purpose (it actually predates
> static_branch()).
>

Note that static_cpu_has() has two caveats:

1. Before alternatives patching, it will always be false.
2. It is the equivalent of boot_cpu_has(), i.e. the intersection of all
the CPU feature sets (available at boot time, obviously.)

-hpa