2023-08-10 05:12:05

by Leonardo Bras

[permalink] [raw]
Subject: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

xchg for variables of size 1-byte and 2-bytes is not yet available for
riscv, even though its present in other architectures such as arm64 and
x86. This could lead to not being able to implement some locking mechanisms
or requiring some rework to make it work properly.

Implement 1-byte and 2-bytes xchg in order to achieve parity with other
architectures.

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/riscv/include/asm/cmpxchg.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index ac9d0eeb74e6..26cea2395aae 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,6 +11,31 @@
#include <asm/barrier.h>
#include <asm/fence.h>

+#define __arch_xchg_masked(prepend, append, r, p, n) \
+({ \
+ u32 *__ptr32b = (u32 *)((ulong)(p) & ~0x3); \
+ ulong __s = ((ulong)(p) & (0x4 - sizeof(*p))) * BITS_PER_BYTE; \
+ ulong __mask = GENMASK(((sizeof(*p)) * BITS_PER_BYTE) - 1, 0) \
+ << __s; \
+ ulong __newx = (ulong)(n) << __s; \
+ ulong __retx; \
+ ulong __rc; \
+ \
+ __asm__ __volatile__ ( \
+ prepend \
+ "0: lr.w %0, %2\n" \
+ " and %1, %0, %z4\n" \
+ " or %1, %1, %z3\n" \
+ " sc.w %1, %1, %2\n" \
+ " bnez %1, 0b\n" \
+ append \
+ : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
+ : "rJ" (__newx), "rJ" (~__mask) \
+ : "memory"); \
+ \
+ r = (__typeof__(*(p)))((__retx & __mask) >> __s); \
+})
+
#define __arch_xchg(sfx, prepend, append, r, p, n) \
({ \
__asm__ __volatile__ ( \
@@ -27,7 +52,13 @@
__typeof__(ptr) __ptr = (ptr); \
__typeof__(*(__ptr)) __new = (new); \
__typeof__(*(__ptr)) __ret; \
+ \
switch (sizeof(*__ptr)) { \
+ case 1: \
+ case 2: \
+ __arch_xchg_masked(prepend, append, \
+ __ret, __ptr, __new); \
+ break; \
case 4: \
__arch_xchg(".w" sfx, prepend, append, \
__ret, __ptr, __new); \
--
2.41.0



2023-08-10 07:11:55

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> xchg for variables of size 1-byte and 2-bytes is not yet available for
> riscv, even though its present in other architectures such as arm64 and
> x86. This could lead to not being able to implement some locking mechanisms
> or requiring some rework to make it work properly.
>
> Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> architectures.
>
> Signed-off-by: Leonardo Bras <[email protected]>

Parity with other architectures by itself is not a reason to do this,
in particular the other architectures you listed have the instructions
in hardware while riscv does not.

Emulating the small xchg() through cmpxchg() is particularly tricky
since it's easy to run into a case where this does not guarantee
forward progress. This is also something that almost no architecture
specific code relies on (generic qspinlock being a notable exception).

I would recommend just dropping this patch from the series, at least
until there is a need for it.

Arnd

2023-08-10 17:19:24

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
>> > riscv, even though its present in other architectures such as arm64 and
>> > x86. This could lead to not being able to implement some locking mechanisms
>> > or requiring some rework to make it work properly.
>> >
>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
>> > architectures.
>> >
>> > Signed-off-by: Leonardo Bras <[email protected]>
>>
>
> Hello Arnd Bergmann, thanks for reviewing!
>
>> Parity with other architectures by itself is not a reason to do this,
>> in particular the other architectures you listed have the instructions
>> in hardware while riscv does not.
>
> Sure, I understand RISC-V don't have native support for xchg on variables of
> size < 4B. My argument is that it's nice to have even an emulated version for
> this in case any future mechanism wants to use it.
>
> Not having it may mean we won't be able to enable given mechanism in RISC-V.

IIUC the ask is to have a user within the kernel for these functions.
That's the general thing to do, and last time this came up there was no
in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
it yet because we're worried about the performance/fairness stuff that
other ports have seen and nobody's got concrete benchmarks yet (though
there's another patch set out that I haven't had time to look through,
so that may have changed).

So if something uses these I'm happy to go look closer.

>> Emulating the small xchg() through cmpxchg() is particularly tricky
>> since it's easy to run into a case where this does not guarantee
>> forward progress.
>>
>
> Didn't get this part:
> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
>
> If so, yeah, it's a fair point: in some extreme case we could have multiple
> threads accessing given cacheline and have sc always failing. On the other hand,
> there are 2 arguments on that:
>
> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> also seem to rely in this mechanism for every xchg size. Another archs like csky
> and loongarch use asm that look like mine to handle size < 4B xchg.
>
>
>> This is also something that almost no architecture
>> specific code relies on (generic qspinlock being a notable exception).
>>
>
> 2 - As you mentioned, there should be very little code that will actually make
> use of xchg for vars < 4B, so it should be safe to assume its fine to not
> guarantee forward progress for those rare usages (like some of above mentioned
> archs).
>
>> I would recommend just dropping this patch from the series, at least
>> until there is a need for it.
>
> While I agree this is a valid point, I believe its more interesting to have it
> implemented if any future mechanism wants to make use of this.
>
>
> Thanks!
> Leo

2023-08-10 18:10:22

by Leonardo Bras

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > riscv, even though its present in other architectures such as arm64 and
> > x86. This could lead to not being able to implement some locking mechanisms
> > or requiring some rework to make it work properly.
> >
> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > architectures.
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
>

Hello Arnd Bergmann, thanks for reviewing!

> Parity with other architectures by itself is not a reason to do this,
> in particular the other architectures you listed have the instructions
> in hardware while riscv does not.

Sure, I understand RISC-V don't have native support for xchg on variables of
size < 4B. My argument is that it's nice to have even an emulated version for
this in case any future mechanism wants to use it.

Not having it may mean we won't be able to enable given mechanism in RISC-V.

>
> Emulating the small xchg() through cmpxchg() is particularly tricky
> since it's easy to run into a case where this does not guarantee
> forward progress.
>

Didn't get this part:
By "emulating small xchg() through cmpxchg()", did you mean like emulating an
xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?

If so, yeah, it's a fair point: in some extreme case we could have multiple
threads accessing given cacheline and have sc always failing. On the other hand,
there are 2 arguments on that:

1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
also seem to rely in this mechanism for every xchg size. Another archs like csky
and loongarch use asm that look like mine to handle size < 4B xchg.


> This is also something that almost no architecture
> specific code relies on (generic qspinlock being a notable exception).
>

2 - As you mentioned, there should be very little code that will actually make
use of xchg for vars < 4B, so it should be safe to assume its fine to not
guarantee forward progress for those rare usages (like some of above mentioned
archs).

> I would recommend just dropping this patch from the series, at least
> until there is a need for it.

While I agree this is a valid point, I believe its more interesting to have it
implemented if any future mechanism wants to make use of this.


Thanks!
Leo



2023-08-10 20:26:57

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
>> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
>>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
>>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
>>> > riscv, even though its present in other architectures such as arm64 and
>>> > x86. This could lead to not being able to implement some locking mechanisms
>>> > or requiring some rework to make it work properly.
>>> >
>>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
>>> > architectures.
>>
>>> Parity with other architectures by itself is not a reason to do this,
>>> in particular the other architectures you listed have the instructions
>>> in hardware while riscv does not.
>>
>> Sure, I understand RISC-V don't have native support for xchg on variables of
>> size < 4B. My argument is that it's nice to have even an emulated version for
>> this in case any future mechanism wants to use it.
>>
>> Not having it may mean we won't be able to enable given mechanism in RISC-V.
>
> IIUC the ask is to have a user within the kernel for these functions.
> That's the general thing to do, and last time this came up there was no
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> it yet because we're worried about the performance/fairness stuff that
> other ports have seen and nobody's got concrete benchmarks yet (though
> there's another patch set out that I haven't had time to look through,
> so that may have changed).

Right. In particular the qspinlock is a good example for something
where having the emulated 16-bit xchg() may end up less efficient
than a natively supported instruction.

The xchg() here is a performance optimization for CPUs that can
do this without touching the other half of the 32-bit word.

>>
>> Didn't get this part:
>> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
>> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
>>
>> If so, yeah, it's a fair point: in some extreme case we could have multiple
>> threads accessing given cacheline and have sc always failing. On the other hand,
>> there are 2 arguments on that:
>>
>> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
>> also seem to rely in this mechanism for every xchg size. Another archs like csky
>> and loongarch use asm that look like mine to handle size < 4B xchg.

I think you misread the arm64 code, which should use native instructions
for all sizes, in both the armv8.0 and LSE atomics.

PowerPC does use the masking for xchg, but I suspect there are no
actual users, at least it actually has its own qspinlock implementation
that avoids xchg().

>>> This is also something that almost no architecture
>>> specific code relies on (generic qspinlock being a notable exception).
>>>
>>
>> 2 - As you mentioned, there should be very little code that will actually make
>> use of xchg for vars < 4B, so it should be safe to assume its fine to not
>> guarantee forward progress for those rare usages (like some of above mentioned
>> archs).

I don't this this is a safe assumption, we've had endless discussions
about using qspinlock on architectures without a native xchg(), which
needs either hardware guarantees or special countermeasures in xchg() itself
to avoid this.

What I'd actually like to do here is to remove the special 8-bit and
16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
only fixed 32-bit and native wordsize (either 32 or 64) as the option,
while dealing with the others the same way we treat the fixed
64-bit cases that hardcode the 64-bit argument types and are only
usable on architectures that provide them.

Arnd

2023-08-11 02:39:41

by Guo Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 3:13 AM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Aug 10, 2023, at 18:23, Palmer Dabbelt wrote:
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> >> On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> >>> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> >>> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> >>> > riscv, even though its present in other architectures such as arm64 and
> >>> > x86. This could lead to not being able to implement some locking mechanisms
> >>> > or requiring some rework to make it work properly.
> >>> >
> >>> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> >>> > architectures.
> >>
> >>> Parity with other architectures by itself is not a reason to do this,
> >>> in particular the other architectures you listed have the instructions
> >>> in hardware while riscv does not.
> >>
> >> Sure, I understand RISC-V don't have native support for xchg on variables of
> >> size < 4B. My argument is that it's nice to have even an emulated version for
> >> this in case any future mechanism wants to use it.
> >>
> >> Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
>
> Right. In particular the qspinlock is a good example for something
> where having the emulated 16-bit xchg() may end up less efficient
> than a natively supported instruction.
The xchg() efficiency depends on micro-architecture. and the number of
instructions is not the key, even one instruction would be separated
into several micro-ops. I thought the Power guys won't agree with this
view :)

>
> The xchg() here is a performance optimization for CPUs that can
> do this without touching the other half of the 32-bit word.
It's useless on a non-SMT system because all operations are cacheline
based. (Ps: Because xchg() has a load semantic, CHI's "Dirty Partial"
& "Clean Empty" can't help anymore.)

>
> >>
> >> Didn't get this part:
> >> By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> >> xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> >>
> >> If so, yeah, it's a fair point: in some extreme case we could have multiple
> >> threads accessing given cacheline and have sc always failing. On the other hand,
> >> there are 2 arguments on that:
> >>
> >> 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> >> also seem to rely in this mechanism for every xchg size. Another archs like csky
> >> and loongarch use asm that look like mine to handle size < 4B xchg.
>
> I think you misread the arm64 code, which should use native instructions
> for all sizes, in both the armv8.0 and LSE atomics.
>
> PowerPC does use the masking for xchg, but I suspect there are no
> actual users, at least it actually has its own qspinlock implementation
> that avoids xchg().
PowerPC still needs similar things, see publish_tail_cpu(), and more
complex cmpxchg semantics.

Paravrit qspinlock and CNA qspinlock still need more:
- xchg8 (RCsc)
- cmpxchg8/16_relaxed
- cmpxchg8/16_release (Rcpc)
- cmpxchg8_acquire (RCpc)
- cmpxchg8 (RCsc)

>
> >>> This is also something that almost no architecture
> >>> specific code relies on (generic qspinlock being a notable exception).
> >>>
> >>
> >> 2 - As you mentioned, there should be very little code that will actually make
> >> use of xchg for vars < 4B, so it should be safe to assume its fine to not
> >> guarantee forward progress for those rare usages (like some of above mentioned
> >> archs).
>
> I don't this this is a safe assumption, we've had endless discussions
> about using qspinlock on architectures without a native xchg(), which
> needs either hardware guarantees or special countermeasures in xchg() itself
> to avoid this.
>
> What I'd actually like to do here is to remove the special 8-bit and
> 16-bit cases from the xchg() and cmpxchg() interfaces at all, leaving
It needs to modify qspinlock, paravirt_qspinlock, and CNA_qspinlock
code to prevent using 8-bit/16-bit xchg/cmpxchg, and cleanup all
architectures' cmpxchg.h. What you do is just get them out of the
common atomic.h, but architectures still need to solve them and
connect to the qspinlock series.

> only fixed 32-bit and native wordsize (either 32 or 64) as the option,
> while dealing with the others the same way we treat the fixed
> 64-bit cases that hardcode the 64-bit argument types and are only
> usable on architectures that provide them.
>
> Arnd



--
Best Regards
Guo Ren

2023-08-11 02:41:31

by Guo Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <[email protected]> wrote:
>
> On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> >> > riscv, even though its present in other architectures such as arm64 and
> >> > x86. This could lead to not being able to implement some locking mechanisms
> >> > or requiring some rework to make it work properly.
> >> >
> >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> >> > architectures.
> >> >
> >> > Signed-off-by: Leonardo Bras <[email protected]>
> >>
> >
> > Hello Arnd Bergmann, thanks for reviewing!
> >
> >> Parity with other architectures by itself is not a reason to do this,
> >> in particular the other architectures you listed have the instructions
> >> in hardware while riscv does not.
> >
> > Sure, I understand RISC-V don't have native support for xchg on variables of
> > size < 4B. My argument is that it's nice to have even an emulated version for
> > this in case any future mechanism wants to use it.
> >
> > Not having it may mean we won't be able to enable given mechanism in RISC-V.
>
> IIUC the ask is to have a user within the kernel for these functions.
> That's the general thing to do, and last time this came up there was no
> in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> it yet because we're worried about the performance/fairness stuff that
> other ports have seen and nobody's got concrete benchmarks yet (though
> there's another patch set out that I haven't had time to look through,
> so that may have changed).
Conor doesn't agree with using an alternative as a detour mechanism
between qspinlock & ticket lock. So I'm preparing V11 with static_key
(jump_label) style. Next version, I would separate paravirt_qspinlock
& CNA_qspinlock from V10. That would make it easy to review the
qspinlock patch series. You can review the next version V11. Now I'm
debugging a static_key init problem when load_modules, which is
triggered by our combo_qspinlock.

The qspinlock is being tested on the riscv platform [1] with 128 cores
with 8 NUMA nodes, next, I would update the comparison results of
qspinlock & ticket lock.

[1]: https://www.sophon.ai/

>
> So if something uses these I'm happy to go look closer.
>
> >> Emulating the small xchg() through cmpxchg() is particularly tricky
> >> since it's easy to run into a case where this does not guarantee
> >> forward progress.
> >>
> >
> > Didn't get this part:
> > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> >
> > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > threads accessing given cacheline and have sc always failing. On the other hand,
> > there are 2 arguments on that:
> >
> > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > and loongarch use asm that look like mine to handle size < 4B xchg.
> >
> >
> >> This is also something that almost no architecture
> >> specific code relies on (generic qspinlock being a notable exception).
> >>
> >
> > 2 - As you mentioned, there should be very little code that will actually make
> > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > guarantee forward progress for those rare usages (like some of above mentioned
> > archs).
> >
> >> I would recommend just dropping this patch from the series, at least
> >> until there is a need for it.
> >
> > While I agree this is a valid point, I believe its more interesting to have it
> > implemented if any future mechanism wants to make use of this.
> >
> >
> > Thanks!
> > Leo



--
Best Regards
Guo Ren

2023-08-11 07:18:10

by Conor Dooley

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <[email protected]> wrote:
> >
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > >> > riscv, even though its present in other architectures such as arm64 and
> > >> > x86. This could lead to not being able to implement some locking mechanisms
> > >> > or requiring some rework to make it work properly.
> > >> >
> > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > >> > architectures.
> > >> >
> > >> > Signed-off-by: Leonardo Bras <[email protected]>
> > >>
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > >> Parity with other architectures by itself is not a reason to do this,
> > >> in particular the other architectures you listed have the instructions
> > >> in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> Conor doesn't agree with using an alternative as a detour mechanism
> between qspinlock & ticket lock.

Hold on a sec, I don't recall having a problem with alternatives - it
was calling the stronger forward progress guarantee an erratum
(which it isn't) and an ISA extension w/o any "abusing" that framework
that I did not like.

> So I'm preparing V11 with static_key
> (jump_label) style.

I don't think there's much point rushing into making it based on static
keys when no progress has been made on implementing support for
non-standard extensions. Changing to a static key doesn't change the
detection mechanism, I've not got a problem with using alternatives for
this stuff.

Thanks,
Conor.

> Next version, I would separate paravirt_qspinlock
> & CNA_qspinlock from V10. That would make it easy to review the
> qspinlock patch series. You can review the next version V11. Now I'm
> debugging a static_key init problem when load_modules, which is
> triggered by our combo_qspinlock.
>
> The qspinlock is being tested on the riscv platform [1] with 128 cores
> with 8 NUMA nodes, next, I would update the comparison results of
> qspinlock & ticket lock.
>
> [1]: https://www.sophon.ai/
>
> >
> > So if something uses these I'm happy to go look closer.
> >
> > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > >> since it's easy to run into a case where this does not guarantee
> > >> forward progress.
> > >>
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > >> This is also something that almost no architecture
> > >> specific code relies on (generic qspinlock being a notable exception).
> > >>
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > >> I would recommend just dropping this patch from the series, at least
> > >> until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
>
>
>
> --
> Best Regards
> Guo Ren


Attachments:
(No filename) (4.77 kB)
signature.asc (235.00 B)
Download all attachments

2023-08-11 10:10:57

by Conor Dooley

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 07:27:28AM +0100, Conor Dooley wrote:
> On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> > On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <[email protected]> wrote:
> > >
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > >> > riscv, even though its present in other architectures such as arm64 and
> > > >> > x86. This could lead to not being able to implement some locking mechanisms
> > > >> > or requiring some rework to make it work properly.
> > > >> >
> > > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > >> > architectures.
> > > >> >
> > > >> > Signed-off-by: Leonardo Bras <[email protected]>
> > > >>
> > > >
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > >
> > > >> Parity with other architectures by itself is not a reason to do this,
> > > >> in particular the other architectures you listed have the instructions
> > > >> in hardware while riscv does not.
> > > >
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > >
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > >
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > Conor doesn't agree with using an alternative as a detour mechanism
> > between qspinlock & ticket lock.
>
> Hold on a sec, I don't recall having a problem with alternatives - it
> was calling the stronger forward progress guarantee an erratum
> (which it isn't) and an ISA extension w/o any "abusing" that framework
> that I did not like.

Hmm, what I wrote here makes no sense reading it back. Let me try again:

Hold on a sec, I don't recall having a problem with alternatives - it
was calling the stronger forward progress guarantee an erratum
(which it isn't) and "abusing" that framework that I did not like.
The series effectively mixed and matched whichever part of ISA
extensions and errata that was convenient.

>
> > So I'm preparing V11 with static_key
> > (jump_label) style.
>
> I don't think there's much point rushing into making it based on static
> keys when no progress has been made on implementing support for
> non-standard extensions. Changing to a static key doesn't change the
> detection mechanism, I've not got a problem with using alternatives for
> this stuff.
>
> Thanks,
> Conor.
>
> > Next version, I would separate paravirt_qspinlock
> > & CNA_qspinlock from V10. That would make it easy to review the
> > qspinlock patch series. You can review the next version V11. Now I'm
> > debugging a static_key init problem when load_modules, which is
> > triggered by our combo_qspinlock.
> >
> > The qspinlock is being tested on the riscv platform [1] with 128 cores
> > with 8 NUMA nodes, next, I would update the comparison results of
> > qspinlock & ticket lock.
> >
> > [1]: https://www.sophon.ai/
> >
> > >
> > > So if something uses these I'm happy to go look closer.
> > >
> > > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > > >> since it's easy to run into a case where this does not guarantee
> > > >> forward progress.
> > > >>
> > > >
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > >
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > >
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > >
> > > >
> > > >> This is also something that almost no architecture
> > > >> specific code relies on (generic qspinlock being a notable exception).
> > > >>
> > > >
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > >
> > > >> I would recommend just dropping this patch from the series, at least
> > > >> until there is a need for it.
> > > >
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this.
> > > >
> > > >
> > > > Thanks!
> > > > Leo
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren



Attachments:
(No filename) (5.45 kB)
signature.asc (235.00 B)
Download all attachments

2023-08-11 11:23:47

by Guo Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 7:10 PM Andrew Jones <[email protected]> wrote:
>
> On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> > On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <[email protected]> wrote:
> > >
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > >> > riscv, even though its present in other architectures such as arm64 and
> > > >> > x86. This could lead to not being able to implement some locking mechanisms
> > > >> > or requiring some rework to make it work properly.
> > > >> >
> > > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > >> > architectures.
> > > >> >
> > > >> > Signed-off-by: Leonardo Bras <[email protected]>
> > > >>
> > > >
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > >
> > > >> Parity with other architectures by itself is not a reason to do this,
> > > >> in particular the other architectures you listed have the instructions
> > > >> in hardware while riscv does not.
> > > >
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > >
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > >
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > Conor doesn't agree with using an alternative as a detour mechanism
> > between qspinlock & ticket lock. So I'm preparing V11 with static_key
> > (jump_label) style. Next version, I would separate paravirt_qspinlock
> > & CNA_qspinlock from V10. That would make it easy to review the
> > qspinlock patch series. You can review the next version V11. Now I'm
> > debugging a static_key init problem when load_modules, which is
> > triggered by our combo_qspinlock.
>
> We've seen problems with static keys and module loading in the past. You
> may want to take a look at commit eb6354e11630 ("riscv: Ensure isa-ext
> static keys are writable")
Thank you for the tip. I would take a look.

>
> Thanks,
> drew
>
> >
> > The qspinlock is being tested on the riscv platform [1] with 128 cores
> > with 8 NUMA nodes, next, I would update the comparison results of
> > qspinlock & ticket lock.
> >
> > [1]: https://www.sophon.ai/
> >
> > >
> > > So if something uses these I'm happy to go look closer.
> > >
> > > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > > >> since it's easy to run into a case where this does not guarantee
> > > >> forward progress.
> > > >>
> > > >
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > >
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > >
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > >
> > > >
> > > >> This is also something that almost no architecture
> > > >> specific code relies on (generic qspinlock being a notable exception).
> > > >>
> > > >
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > >
> > > >> I would recommend just dropping this patch from the series, at least
> > > >> until there is a need for it.
> > > >
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this.
> > > >
> > > >
> > > > Thanks!
> > > > Leo
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv



--
Best Regards
Guo Ren

2023-08-11 11:55:43

by Andrew Jones

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Fri, Aug 11, 2023 at 09:40:30AM +0800, Guo Ren wrote:
> On Fri, Aug 11, 2023 at 12:23 AM Palmer Dabbelt <[email protected]> wrote:
> >
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > >> On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > >> > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > >> > riscv, even though its present in other architectures such as arm64 and
> > >> > x86. This could lead to not being able to implement some locking mechanisms
> > >> > or requiring some rework to make it work properly.
> > >> >
> > >> > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > >> > architectures.
> > >> >
> > >> > Signed-off-by: Leonardo Bras <[email protected]>
> > >>
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > >> Parity with other architectures by itself is not a reason to do this,
> > >> in particular the other architectures you listed have the instructions
> > >> in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> Conor doesn't agree with using an alternative as a detour mechanism
> between qspinlock & ticket lock. So I'm preparing V11 with static_key
> (jump_label) style. Next version, I would separate paravirt_qspinlock
> & CNA_qspinlock from V10. That would make it easy to review the
> qspinlock patch series. You can review the next version V11. Now I'm
> debugging a static_key init problem when load_modules, which is
> triggered by our combo_qspinlock.

We've seen problems with static keys and module loading in the past. You
may want to take a look at commit eb6354e11630 ("riscv: Ensure isa-ext
static keys are writable")

Thanks,
drew

>
> The qspinlock is being tested on the riscv platform [1] with 128 cores
> with 8 NUMA nodes, next, I would update the comparison results of
> qspinlock & ticket lock.
>
> [1]: https://www.sophon.ai/
>
> >
> > So if something uses these I'm happy to go look closer.
> >
> > >> Emulating the small xchg() through cmpxchg() is particularly tricky
> > >> since it's easy to run into a case where this does not guarantee
> > >> forward progress.
> > >>
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > >> This is also something that almost no architecture
> > >> specific code relies on (generic qspinlock being a notable exception).
> > >>
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > >> I would recommend just dropping this patch from the series, at least
> > >> until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
>
>
>
> --
> Best Regards
> Guo Ren
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-12-06 00:57:23

by Leonardo Brás

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

From: Leonardo Bras <[email protected]>

On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Br?s wrote:
> On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > or requiring some rework to make it work properly.
> > > > >
> > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > architectures.
> > > > >
> > > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > >
> > >
> > > Hello Arnd Bergmann, thanks for reviewing!
> > >
> > > > Parity with other architectures by itself is not a reason to do this,
> > > > in particular the other architectures you listed have the instructions
> > > > in hardware while riscv does not.
> > >
> > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > this in case any future mechanism wants to use it.
> > >
> > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> >
> > IIUC the ask is to have a user within the kernel for these functions.
> > That's the general thing to do, and last time this came up there was no
> > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > it yet because we're worried about the performance/fairness stuff that
> > other ports have seen and nobody's got concrete benchmarks yet (though
> > there's another patch set out that I haven't had time to look through,
> > so that may have changed).
> >
> > So if something uses these I'm happy to go look closer.
>
> IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> don't have an use for them for the time being.
>
> Otherwise, any comments on patches 1, 2 & 3?

ping

>
> >
> > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > since it's easy to run into a case where this does not guarantee
> > > > forward progress.
> > > >
> > >
> > > Didn't get this part:
> > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > >
> > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > there are 2 arguments on that:
> > >
> > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > >
> > >
> > > > This is also something that almost no architecture
> > > > specific code relies on (generic qspinlock being a notable exception).
> > > >
> > >
> > > 2 - As you mentioned, there should be very little code that will actually make
> > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > guarantee forward progress for those rare usages (like some of above mentioned
> > > archs).
> > >
> > > > I would recommend just dropping this patch from the series, at least
> > > > until there is a need for it.
> > >
> > > While I agree this is a valid point, I believe its more interesting to have it
> > > implemented if any future mechanism wants to make use of this.
> > >
> > >
> > > Thanks!
> > > Leo
> >
>

2024-01-03 11:18:23

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Tue, Dec 05, 2023 at 09:56:44PM -0300, [email protected] wrote:
> From: Leonardo Bras <[email protected]>
>
> On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Brás wrote:
> > On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > or requiring some rework to make it work properly.
> > > > > >
> > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > architectures.
> > > > > >
> > > > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > >
> > > >
> > > > Hello Arnd Bergmann, thanks for reviewing!
> > > >
> > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > in particular the other architectures you listed have the instructions
> > > > > in hardware while riscv does not.
> > > >
> > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > this in case any future mechanism wants to use it.
> > > >
> > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > >
> > > IIUC the ask is to have a user within the kernel for these functions.
> > > That's the general thing to do, and last time this came up there was no
> > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > it yet because we're worried about the performance/fairness stuff that
> > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > there's another patch set out that I haven't had time to look through,
> > > so that may have changed).
> > >
> > > So if something uses these I'm happy to go look closer.
> >
> > IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> > don't have an use for them for the time being.
> >
> > Otherwise, any comments on patches 1, 2 & 3?
>
> ping

Hi,

I believe the "RFC" makes some reviewers think the series isn't ready
for review, so could you please send a new one w/o RFC?

thanks

>
> >
> > >
> > > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > > since it's easy to run into a case where this does not guarantee
> > > > > forward progress.
> > > > >
> > > >
> > > > Didn't get this part:
> > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > >
> > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > there are 2 arguments on that:
> > > >
> > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > >
> > > >
> > > > > This is also something that almost no architecture
> > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > >
> > > >
> > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > archs).
> > > >
> > > > > I would recommend just dropping this patch from the series, at least
> > > > > until there is a need for it.
> > > >
> > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > implemented if any future mechanism wants to make use of this.
> > > >
> > > >
> > > > Thanks!
> > > > Leo
> > >
> >
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2024-01-03 15:31:30

by Leonardo Bras

[permalink] [raw]
Subject: Re: [RFC PATCH v5 5/5] riscv/cmpxchg: Implement xchg for variables of size 1 and 2

On Wed, Jan 03, 2024 at 07:05:04PM +0800, Jisheng Zhang wrote:
> On Tue, Dec 05, 2023 at 09:56:44PM -0300, [email protected] wrote:
> > From: Leonardo Bras <[email protected]>
> >
> > On Wed, Aug 30, 2023 at 06:59:46PM -0300, Leonardo Br?s wrote:
> > > On Thu, 2023-08-10 at 09:23 -0700, Palmer Dabbelt wrote:
> > > > On Thu, 10 Aug 2023 09:04:04 PDT (-0700), [email protected] wrote:
> > > > > On Thu, 2023-08-10 at 08:51 +0200, Arnd Bergmann wrote:
> > > > > > On Thu, Aug 10, 2023, at 06:03, Leonardo Bras wrote:
> > > > > > > xchg for variables of size 1-byte and 2-bytes is not yet available for
> > > > > > > riscv, even though its present in other architectures such as arm64 and
> > > > > > > x86. This could lead to not being able to implement some locking mechanisms
> > > > > > > or requiring some rework to make it work properly.
> > > > > > >
> > > > > > > Implement 1-byte and 2-bytes xchg in order to achieve parity with other
> > > > > > > architectures.
> > > > > > >
> > > > > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > > >
> > > > >
> > > > > Hello Arnd Bergmann, thanks for reviewing!
> > > > >
> > > > > > Parity with other architectures by itself is not a reason to do this,
> > > > > > in particular the other architectures you listed have the instructions
> > > > > > in hardware while riscv does not.
> > > > >
> > > > > Sure, I understand RISC-V don't have native support for xchg on variables of
> > > > > size < 4B. My argument is that it's nice to have even an emulated version for
> > > > > this in case any future mechanism wants to use it.
> > > > >
> > > > > Not having it may mean we won't be able to enable given mechanism in RISC-V.
> > > >
> > > > IIUC the ask is to have a user within the kernel for these functions.
> > > > That's the general thing to do, and last time this came up there was no
> > > > in-kernel use of it -- the qspinlock stuff would, but we haven't enabled
> > > > it yet because we're worried about the performance/fairness stuff that
> > > > other ports have seen and nobody's got concrete benchmarks yet (though
> > > > there's another patch set out that I haven't had time to look through,
> > > > so that may have changed).
> > > >
> > > > So if something uses these I'm happy to go look closer.
> > >
> > > IIUC patches 4 & 5 will be used by qspinlock, which may not be done yet, so we
> > > don't have an use for them for the time being.
> > >
> > > Otherwise, any comments on patches 1, 2 & 3?
> >
> > ping
>
> Hi,
>
> I believe the "RFC" makes some reviewers think the series isn't ready
> for review, so could you please send a new one w/o RFC?
>
> thanks

Sure, will do.

Thanks!
Leo

>
> >
> > >
> > > >
> > > > > > Emulating the small xchg() through cmpxchg() is particularly tricky
> > > > > > since it's easy to run into a case where this does not guarantee
> > > > > > forward progress.
> > > > > >
> > > > >
> > > > > Didn't get this part:
> > > > > By "emulating small xchg() through cmpxchg()", did you mean like emulating an
> > > > > xchg (usually 1 instruction) with lr & sc (same used in cmpxchg) ?
> > > > >
> > > > > If so, yeah, it's a fair point: in some extreme case we could have multiple
> > > > > threads accessing given cacheline and have sc always failing. On the other hand,
> > > > > there are 2 arguments on that:
> > > > >
> > > > > 1 - Other architectures, (such as powerpc, arm and arm64 without LSE atomics)
> > > > > also seem to rely in this mechanism for every xchg size. Another archs like csky
> > > > > and loongarch use asm that look like mine to handle size < 4B xchg.
> > > > >
> > > > >
> > > > > > This is also something that almost no architecture
> > > > > > specific code relies on (generic qspinlock being a notable exception).
> > > > > >
> > > > >
> > > > > 2 - As you mentioned, there should be very little code that will actually make
> > > > > use of xchg for vars < 4B, so it should be safe to assume its fine to not
> > > > > guarantee forward progress for those rare usages (like some of above mentioned
> > > > > archs).
> > > > >
> > > > > > I would recommend just dropping this patch from the series, at least
> > > > > > until there is a need for it.
> > > > >
> > > > > While I agree this is a valid point, I believe its more interesting to have it
> > > > > implemented if any future mechanism wants to make use of this.
> > > > >
> > > > >
> > > > > Thanks!
> > > > > Leo
> > > >
> > >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>