2015-11-07 11:00:30

by Noam Camus

[permalink] [raw]
Subject: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

From: Tal Zilcer <[email protected]>

Since the CTOP is SMT hardware multi-threaded, we need to hint
the HW that now will be a very good time to do a hardware
thread context switching. This is done by issuing the schd.rw
instruction (binary coded here so as to not require specific
revision of GCC to build the kernel).
sched.rw means that Thread becomes eligible for execution by
the threads scheduler after all pending read/write
transactions were completed.

Implementing cpu_relax_lowlatency() with barrier()
Since with current semantics of cpu_relax() it may take a
while till yielded CPU will get back.

Signed-off-by: Noam Camus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/processor.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 7266ede..50f9bae 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -58,12 +58,21 @@ struct task_struct;
* get optimised away by gcc
*/
#ifdef CONFIG_SMP
+#ifndef CONFIG_EZNPS_MTM_EXT
#define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
#else
+#define cpu_relax() \
+ __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
+#endif
+#else
#define cpu_relax() do { } while (0)
#endif

+#ifndef CONFIG_EZNPS_MTM_EXT
#define cpu_relax_lowlatency() cpu_relax()
+#else
+#define cpu_relax_lowlatency() barrier()
+#endif

#define copy_segments(tsk, mm) do { } while (0)
#define release_segments(mm) do { } while (0)
--
1.7.1


2015-11-09 10:05:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> index 7266ede..50f9bae 100644
> --- a/arch/arc/include/asm/processor.h
> +++ b/arch/arc/include/asm/processor.h
> @@ -58,12 +58,21 @@ struct task_struct;
> * get optimised away by gcc
> */
> #ifdef CONFIG_SMP
> +#ifndef CONFIG_EZNPS_MTM_EXT
> #define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
> #else
> +#define cpu_relax() \
> + __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
> +#endif
> +#else
> #define cpu_relax() do { } while (0)

I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
be a compiler barrier.

> #endif
>
> +#ifndef CONFIG_EZNPS_MTM_EXT
> #define cpu_relax_lowlatency() cpu_relax()
> +#else
> +#define cpu_relax_lowlatency() barrier()
> +#endif

At which point you can unconditionally use that definition.

>
> #define copy_segments(tsk, mm) do { } while (0)
> #define release_segments(mm) do { } while (0)
> --
> 1.7.1
>

2015-11-09 10:22:32

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

On Monday 09 November 2015 03:35 PM, Peter Zijlstra wrote:
> On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
>> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
>> index 7266ede..50f9bae 100644
>> --- a/arch/arc/include/asm/processor.h
>> +++ b/arch/arc/include/asm/processor.h
>> @@ -58,12 +58,21 @@ struct task_struct;
>> * get optimised away by gcc
>> */
>> #ifdef CONFIG_SMP
>> +#ifndef CONFIG_EZNPS_MTM_EXT
>> #define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
>> #else
>> +#define cpu_relax() \
>> + __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
>> +#endif
>> +#else
>> #define cpu_relax() do { } while (0)
> I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
> be a compiler barrier.

We discussed this a while back (why do https:/lkml.org/lkml/<year>/.... links work
psuedo randomly)

http://marc.info/?l=linux-kernel&m=140350765530113


>
>> #endif
>>
>> +#ifndef CONFIG_EZNPS_MTM_EXT
>> #define cpu_relax_lowlatency() cpu_relax()
>> +#else
>> +#define cpu_relax_lowlatency() barrier()
>> +#endif
> At which point you can unconditionally use that definition.
>
>>
>> #define copy_segments(tsk, mm) do { } while (0)
>> #define release_segments(mm) do { } while (0)
>> --
>> 1.7.1
>>
> _______________________________________________
> linux-snps-arc mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-snps-arc
>

2015-11-09 10:45:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

On Mon, Nov 09, 2015 at 10:22:27AM +0000, Vineet Gupta wrote:
> On Monday 09 November 2015 03:35 PM, Peter Zijlstra wrote:
> > On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
> >> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> >> index 7266ede..50f9bae 100644
> >> --- a/arch/arc/include/asm/processor.h
> >> +++ b/arch/arc/include/asm/processor.h
> >> @@ -58,12 +58,21 @@ struct task_struct;
> >> * get optimised away by gcc
> >> */
> >> #ifdef CONFIG_SMP
> >> +#ifndef CONFIG_EZNPS_MTM_EXT
> >> #define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
> >> #else
> >> +#define cpu_relax() \
> >> + __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
> >> +#endif
> >> +#else
> >> #define cpu_relax() do { } while (0)
> > I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
> > be a compiler barrier.
>
> We discussed this a while back (why do https:/lkml.org/lkml/<year>/.... links work
> psuedo randomly)
>
> http://marc.info/?l=linux-kernel&m=140350765530113

Hurm.. you have a better memory than me ;-)

So in general we assume cpu_relax() implies a barrier() and I think we
have loops like:

while (!var)
cpu_relax();

where var isn't volatile (or casted using READ_ONCE etc).

See for instance: kernel/time/timer.c:lock_timer_base() which has:

for (;;) {
u32 tf = timer->flags;

if (!(tf & TIMER_MIGRATING)) {
...
}

cpu_relax();
}

So while TIMER_MIGRATING is set, it will only ever do regular loads,
which GCC is permitted to lift out if cpu_relax() is not a barrier.

2015-11-09 12:27:34

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

On Monday 09 November 2015 04:15 PM, Peter Zijlstra wrote:
> On Mon, Nov 09, 2015 at 10:22:27AM +0000, Vineet Gupta wrote:
>> On Monday 09 November 2015 03:35 PM, Peter Zijlstra wrote:
>>> On Sat, Nov 07, 2015 at 12:52:34PM +0200, Noam Camus wrote:
>>>> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
>>>> index 7266ede..50f9bae 100644
>>>> --- a/arch/arc/include/asm/processor.h
>>>> +++ b/arch/arc/include/asm/processor.h
>>>> @@ -58,12 +58,21 @@ struct task_struct;
>>>> * get optimised away by gcc
>>>> */
>>>> #ifdef CONFIG_SMP
>>>> +#ifndef CONFIG_EZNPS_MTM_EXT
>>>> #define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
>>>> #else
>>>> +#define cpu_relax() \
>>>> + __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
>>>> +#endif
>>>> +#else
>>>> #define cpu_relax() do { } while (0)
>>> I'm fairly sure this is incorrect. Even on UP we expect cpu_relax() to
>>> be a compiler barrier.
>>
>> We discussed this a while back (why do https:/lkml.org/lkml/<year>/.... links work
>> psuedo randomly)
>>
>> http://marc.info/?l=linux-kernel&m=140350765530113
>
> Hurm.. you have a better memory than me ;-)
>
> So in general we assume cpu_relax() implies a barrier() and I think we
> have loops like:
>
> while (!var)
> cpu_relax();
>
> where var isn't volatile (or casted using READ_ONCE etc).
>
> See for instance: kernel/time/timer.c:lock_timer_base() which has:
>
> for (;;) {
> u32 tf = timer->flags;
>
> if (!(tf & TIMER_MIGRATING)) {
> ...
> }
>
> cpu_relax();
> }
>
> So while TIMER_MIGRATING is set, it will only ever do regular loads,
> which GCC is permitted to lift out if cpu_relax() is not a barrier.

I'll just bite the bullet and make it a compiler barrier and send Linus way in
4.4. Care to provide an Ack or some such.

-------------------->
>From e29de8efa621b825891dcc744c84965b38f6b868 Mon Sep 17 00:00:00 2001
From: Vineet Gupta <[email protected]>
Date: Mon, 9 Nov 2015 17:48:34 +0530
Subject: [PATCH] ARC: cpu_relax() to be compiler barrier even for UP

cpu_relax() on ARC has been barrier only for SMP (and no-op for UP). Per
recent discussions, it is safer to make it a compiler barrier
unconditionally.

Link: http://lkml.kernel.org/r/[email protected]
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
---
arch/arc/include/asm/processor.h | 4 ----
1 file changed, 4 deletions(-)

diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 44545354e9e8..1d694c1ef6d6 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -57,11 +57,7 @@ struct task_struct;
* A lot of busy-wait loops in SMP are based off of non-volatile data otherwise
* get optimised away by gcc
*/
-#ifdef CONFIG_SMP
#define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
-#else
-#define cpu_relax() do { } while (0)
-#endif

#define cpu_relax_lowlatency() cpu_relax()

--
1.9.1

2015-11-09 12:51:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 16/19] ARC: [plat-eznps] Use dedicated cpu_relax()

On Mon, Nov 09, 2015 at 05:57:25PM +0530, Vineet Gupta wrote:
> From e29de8efa621b825891dcc744c84965b38f6b868 Mon Sep 17 00:00:00 2001
> From: Vineet Gupta <[email protected]>
> Date: Mon, 9 Nov 2015 17:48:34 +0530
> Subject: [PATCH] ARC: cpu_relax() to be compiler barrier even for UP
>
> cpu_relax() on ARC has been barrier only for SMP (and no-op for UP). Per
> recent discussions, it is safer to make it a compiler barrier
> unconditionally.
>
> Link: http://lkml.kernel.org/r/[email protected]

Acked-by: Peter Zijlstra (Intel) <[email protected]>

> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> arch/arc/include/asm/processor.h | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> index 44545354e9e8..1d694c1ef6d6 100644
> --- a/arch/arc/include/asm/processor.h
> +++ b/arch/arc/include/asm/processor.h
> @@ -57,11 +57,7 @@ struct task_struct;
> * A lot of busy-wait loops in SMP are based off of non-volatile data otherwise
> * get optimised away by gcc
> */
> -#ifdef CONFIG_SMP
> #define cpu_relax() __asm__ __volatile__ ("" : : : "memory")
> -#else
> -#define cpu_relax() do { } while (0)
> -#endif
>
> #define cpu_relax_lowlatency() cpu_relax()
>
> --
> 1.9.1
>