2019-06-13 16:42:23

by Nadav Amit

Subject: [PATCH 9/9] x86/apic: Use non-atomic operations when possible

Using __clear_bit() and __cpumask_clear_cpu() is more efficient than
using their atomic counterparts. Use them when atomicity is not needed,
such as when manipulating bitmasks that are on the stack.

Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/kernel/apic/apic_flat_64.c | 4 ++--
arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
arch/x86/kernel/smp.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index bf083c3f1d73..bbdca603f94a 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -78,7 +78,7 @@ flat_send_IPI_mask_allbutself(const struct cpumask *cpumask, int vector)
 	int cpu = smp_processor_id();

 	if (cpu < BITS_PER_LONG)
-		clear_bit(cpu, &mask);
+		__clear_bit(cpu, &mask);

 	_flat_send_IPI_mask(mask, vector);
 }
@@ -92,7 +92,7 @@ static void flat_send_IPI_allbutself(int vector)
 			unsigned long mask = cpumask_bits(cpu_online_mask)[0];

 			if (cpu < BITS_PER_LONG)
-				clear_bit(cpu, &mask);
+				__clear_bit(cpu, &mask);

 			_flat_send_IPI_mask(mask, vector);
 		}
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 7685444a106b..609e499387a1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -50,7 +50,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 	cpumask_copy(tmpmsk, mask);
 	/* If IPI should not be sent to self, clear current CPU */
 	if (apic_dest != APIC_DEST_ALLINC)
-		cpumask_clear_cpu(smp_processor_id(), tmpmsk);
+		__cpumask_clear_cpu(smp_processor_id(), tmpmsk);

 	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
 	for_each_cpu(cpu, tmpmsk) {
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4693e2f3a03e..96421f97e75c 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -144,7 +144,7 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 	}

 	cpumask_copy(allbutself, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), allbutself);
+	__cpumask_clear_cpu(smp_processor_id(), allbutself);

 	if (cpumask_equal(mask, allbutself) &&
 	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
--
2.20.1


Subject: [tip:x86/apic] x86/apic: Use non-atomic operations when possible

Commit-ID: dde3626f815e38bbf96fddd5185038c4b4d395a8
Gitweb: https://git.kernel.org/tip/dde3626f815e38bbf96fddd5185038c4b4d395a8
Author: Nadav Amit <[email protected]>
AuthorDate: Wed, 12 Jun 2019 23:48:13 -0700
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 23 Jun 2019 14:07:23 +0200

x86/apic: Use non-atomic operations when possible

Using __clear_bit() and __cpumask_clear_cpu() is more efficient than using
their atomic counterparts.

Use them when atomicity is not needed, such as when manipulating bitmasks
that are on the stack.

Signed-off-by: Nadav Amit <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/kernel/apic/apic_flat_64.c | 4 ++--
arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
arch/x86/kernel/smp.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 0005c284a5c5..65072858f553 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -78,7 +78,7 @@ flat_send_IPI_mask_allbutself(const struct cpumask *cpumask, int vector)
 	int cpu = smp_processor_id();

 	if (cpu < BITS_PER_LONG)
-		clear_bit(cpu, &mask);
+		__clear_bit(cpu, &mask);

 	_flat_send_IPI_mask(mask, vector);
 }
@@ -92,7 +92,7 @@ static void flat_send_IPI_allbutself(int vector)
 			unsigned long mask = cpumask_bits(cpu_online_mask)[0];

 			if (cpu < BITS_PER_LONG)
-				clear_bit(cpu, &mask);
+				__clear_bit(cpu, &mask);

 			_flat_send_IPI_mask(mask, vector);
 		}
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 7685444a106b..609e499387a1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -50,7 +50,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 	cpumask_copy(tmpmsk, mask);
 	/* If IPI should not be sent to self, clear current CPU */
 	if (apic_dest != APIC_DEST_ALLINC)
-		cpumask_clear_cpu(smp_processor_id(), tmpmsk);
+		__cpumask_clear_cpu(smp_processor_id(), tmpmsk);

 	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
 	for_each_cpu(cpu, tmpmsk) {
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d60aed..acddd988602d 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -146,7 +146,7 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 	}

 	cpumask_copy(allbutself, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), allbutself);
+	__cpumask_clear_cpu(smp_processor_id(), allbutself);

 	if (cpumask_equal(mask, allbutself) &&
 	    cpumask_equal(cpu_online_mask, cpu_callout_mask))

2019-06-25 22:00:41

by Dave Hansen

Subject: Re: [PATCH 9/9] x86/apic: Use non-atomic operations when possible

> diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
> index 7685444a106b..609e499387a1 100644
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -50,7 +50,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
> cpumask_copy(tmpmsk, mask);
> /* If IPI should not be sent to self, clear current CPU */
> if (apic_dest != APIC_DEST_ALLINC)
> - cpumask_clear_cpu(smp_processor_id(), tmpmsk);
> + __cpumask_clear_cpu(smp_processor_id(), tmpmsk);

tmpmsk is on-stack, but it's a pointer to a per-cpu variable:

tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);

So this one doesn't appear as obviously correct as a mask which itself
is on the stack. The other three look obviously OK, though.
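
The distinction raised above, between a mask whose storage is on the stack and a stack pointer to storage that lives elsewhere, can be illustrated with a minimal userspace sketch. This is not kernel code: `scratch_mask` and `get_scratch_mask()` are hypothetical stand-ins for the per-CPU `ipi_mask` and `this_cpu_cpumask_var_ptr()`, and a plain `unsigned long` stands in for a cpumask.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-CPU scratch mask: the storage outlives
 * any single call and could, in principle, be reached from other contexts. */
static unsigned long scratch_mask;

static inline unsigned long *get_scratch_mask(void)
{
	return &scratch_mask;
}

/* Case 1: the mask itself lives on the stack. No other context can see
 * it, so a plain read-modify-write is obviously sufficient.
 * The caller must pass a bit index smaller than the width of long. */
static inline unsigned long clear_in_local(unsigned long src, unsigned int bit)
{
	unsigned long mask = src;

	mask &= ~(1UL << bit);
	return mask;
}

/* Case 2: only the pointer lives on the stack. The plain read-modify-write
 * is safe only if something else guarantees exclusive access to the
 * storage it points to. */
static inline unsigned long clear_in_scratch(unsigned long src, unsigned int bit)
{
	unsigned long *tmpmsk = get_scratch_mask();

	*tmpmsk = src;
	*tmpmsk &= ~(1UL << bit);
	return *tmpmsk;
}
```

Both functions compute the same result when called from a single context; the difference is only in what has to be true about concurrent access for the second one to stay correct.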

2019-06-25 22:04:28

by Thomas Gleixner

Subject: Re: [PATCH 9/9] x86/apic: Use non-atomic operations when possible

On Tue, 25 Jun 2019, Dave Hansen wrote:

> > diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
> > index 7685444a106b..609e499387a1 100644
> > --- a/arch/x86/kernel/apic/x2apic_cluster.c
> > +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> > @@ -50,7 +50,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
> > cpumask_copy(tmpmsk, mask);
> > /* If IPI should not be sent to self, clear current CPU */
> > if (apic_dest != APIC_DEST_ALLINC)
> > - cpumask_clear_cpu(smp_processor_id(), tmpmsk);
> > + __cpumask_clear_cpu(smp_processor_id(), tmpmsk);
>
> tmpmsk is on-stack, but it's a pointer to a per-cpu variable:
>
> tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
>
> So this one doesn't appear as obviously correct as a mask which itself
> is on the stack. The other three look obviously OK, though.

It's still correct. The mask is per cpu and protected because interrupts
are disabled. I noticed and wanted to amend the change log and forgot.

Thanks,

tglx
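
For readers unfamiliar with the two helper families discussed in this thread, the difference can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions: the function names below are made up for illustration, and the kernel's real clear_bit() and __clear_bit() are architecture-specific (on x86 the atomic variant uses a lock-prefixed instruction, which is where the extra cost comes from). `mask_allbutself()` is a hypothetical helper that mirrors the copy-then-clear shape of the patched functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Rough analogue of __clear_bit(): a plain read-modify-write. Correct only
 * when no other context can modify *word at the same time. */
static inline void clear_bit_nonatomic(unsigned int nr, unsigned long *word)
{
	*word &= ~(1UL << nr);
}

/* Rough analogue of clear_bit(): an atomic read-modify-write, safe against
 * concurrent updates of other bits in the same word. */
static inline void clear_bit_atomic(unsigned int nr, _Atomic unsigned long *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}

/* Hypothetical helper: build an "all but self" mask from a private copy.
 * The copy is on the stack, so the non-atomic variant is enough. */
static inline unsigned long mask_allbutself(unsigned long online, unsigned int cpu)
{
	unsigned long mask = online;

	if (cpu < BITS_PER_LONG)
		clear_bit_nonatomic(cpu, &mask);
	return mask;
}
```

The per-CPU case from the discussion has no direct userspace equivalent: there the exclusivity comes from the mask belonging to the current CPU and interrupts being disabled, rather than from the storage being private to the function.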