The membarrier system call now requires all architectures to implement
sync_core(). On Xtensa, it is provided by the EXTW instruction.
[ Completely untested! Can someone on the xtensa side confirm whether
EXTW is the right way to serialize core execution and try it out ? ]
Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Paul E. McKenney <[email protected]>
CC: Chris Zankel <[email protected]>
CC: Max Filippov <[email protected]>
CC: [email protected]
---
arch/xtensa/include/asm/processor.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 30ee8c608853..b435bd6adbd6 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -248,5 +248,14 @@ static inline unsigned long get_er(unsigned long addr)
#endif /* XCHAL_HAVE_EXTERN_REGS */
+static inline void sync_core(void)
+{
+ /*
+ * Synchronize the core execution pipeline. Acts as a compiler
+ * barrier.
+ */
+ asm volatile ("extw" : : : "memory");
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _XTENSA_PROCESSOR_H */
--
2.11.0
Hi Mathieu,
On Mon, Aug 28, 2017 at 12:36 AM, Mathieu Desnoyers
<[email protected]> wrote:
> The membarrier system call now requires all architectures to implement
> sync_core(). On Xtensa, it is provided by the EXTW instruction.
>
> [ Completely untested! Can someone on the xtensa side confirm whether
> EXTW is the right way to serialize core execution and try it out ? ]
Thanks for the patch. I'm currently travelling, I'll give it a try next week
once I'm back at work.
--
Thanks.
-- Max
----- On Aug 28, 2017, at 1:12 PM, Max Filippov [email protected] wrote:
> Hi Mathieu,
>
> On Mon, Aug 28, 2017 at 12:36 AM, Mathieu Desnoyers
> <[email protected]> wrote:
>> The membarrier system call now requires all architectures to implement
>> sync_core(). On Xtensa, it is provided by the EXTW instruction.
>>
>> [ Completely untested! Can someone on the xtensa side confirm whether
>> EXTW is the right way to serialize core execution and try it out ? ]
>
> Thanks for the patch. I'm currently travelling, I'll give it a try next week
> once I'm back at work.
Hi Max,
I think we may need to flush the icache to make it consistent with the dcache
too on xtensa, in addition to the EXTW. The goal here is to allow JIT engines
to reclaim and re-use memory after they discard dynamically generated code.
This is similar to what we'd need to do on arm32, where they have inconsistent
d/i-caches.
Thoughts ?
Thanks,
Mathieu
>
> --
> Thanks.
> -- Max
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Hi Mathieu,
On Tue, Aug 29, 2017 at 11:55 AM, Mathieu Desnoyers
<[email protected]> wrote:
> ----- On Aug 28, 2017, at 1:12 PM, Max Filippov [email protected] wrote:
>> On Mon, Aug 28, 2017 at 12:36 AM, Mathieu Desnoyers
>> <[email protected]> wrote:
>>> The membarrier system call now requires all architectures to implement
>>> sync_core(). On Xtensa, it is provided by the EXTW instruction.
>>>
>>> [ Completely untested! Can someone on the xtensa side confirm whether
>>> EXTW is the right way to serialize core execution and try it out ? ]
>>
>> Thanks for the patch. I'm currently travelling, I'll give it a try next week
>> once I'm back at work.
>
> I think we may need to flush the icache to make it consistent with the dcache
> too on xtensa, in addition to the EXTW. The goal here is to allow JIT engines
> to reclaim and re-use memory after they discard dynamically generated code.
> This is similar to what we'd need to do on arm32, where they have inconsistent
> d/i-caches.
my understanding is that to support JIT engines on xtensa we need to do
icache/dcache synchronization, this procedure is covered in the ISYNC
instruction description in the ISA book, it involves MEMW and ISYNC,
but not EXTW. EXTW is meant to work as a CPU barrier that orders all
externally visible CPU signals, which seems unnecessary.
Interestingly, currently we don't have MEMW between dcache flush and
icache invalidation, so I need to add it to be consistent with the documented
procedure. Then I believe that sync_core implementation should invoke
flush_dcache_all followed by MEMW followed by invalidate_icache_all.
Does that sound right?
--
Thanks.
-- Max
----- On Sep 12, 2017, at 12:37 AM, Max Filippov [email protected] wrote:
> Hi Mathieu,
>
> On Tue, Aug 29, 2017 at 11:55 AM, Mathieu Desnoyers
> <[email protected]> wrote:
>> ----- On Aug 28, 2017, at 1:12 PM, Max Filippov [email protected] wrote:
>>> On Mon, Aug 28, 2017 at 12:36 AM, Mathieu Desnoyers
>>> <[email protected]> wrote:
>>>> The membarrier system call now requires all architectures to implement
>>>> sync_core(). On Xtensa, it is provided by the EXTW instruction.
>>>>
>>>> [ Completely untested! Can someone on the xtensa side confirm whether
>>>> EXTW is the right way to serialize core execution and try it out ? ]
>>>
>>> Thanks for the patch. I'm currently travelling, I'll give it a try next week
>>> once I'm back at work.
>>
>> I think we may need to flush the icache to make it consistent with the dcache
>> too on xtensa, in addition to the EXTW. The goal here is to allow JIT engines
>> to reclaim and re-use memory after they discard dynamically generated code.
>> This is similar to what we'd need to do on arm32, where they have inconsistent
>> d/i-caches.
>
> my understanding is that to support JIT engines on xtensa we need to do
> icache/dcache synchronization, this procedure is covered in the ISYNC
> instruction description in the ISA book, it involves MEMW and ISYNC,
> but not EXTW. EXTW is meant to work as a CPU barrier that orders all
> externally visible CPU signals, which seems unnecessary.
Good point! Yes, combining MEMW and ISYNC seems to be what we need here.
My upcoming proposal for core-serializing membarrier command issues
smp_mb() (memw) and sync_core(). I would then think that just using the
"isync" instruction is a good candidate for the sync_core() implementation,
given that it is similar to the effect of the core serializing "cpuid"
instruction on Intel.
>
> Interestingly, currently we don't have MEMW between dcache flush and
> icache invalidation, so I need to add it to be consistent with the documented
> procedure. Then I believe that sync_core implementation should invoke
> flush_dcache_all followed by MEMW followed by invalidate_icache_all.
> Does that sound right?
In a different leg of the thread targeting arm64, we discussed that we
should probably only make that membarrier command serialize the core,
without taking care of flushing the caches. Cache flushing instructions
can be executed from user-space on some architectures, or through another
arch-specific system call that targets the address range that needs to be
flushed.
For SMP variants of xtensa, that cache flushing system call should consider
whether icache invalidation needs to be performed on all CPUs when reclaiming
JIT code.
Thoughts ?
Thanks,
Mathieu
>
> --
> Thanks.
> -- Max
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com