2021-08-05 21:13:56

by Vineet Gupta

Subject: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV

Signed math causes generation of costlier instructions such as DIV when
the same could be done with the barrel shifter.

The worse part is that this is not caught by things like bloat-o-meter,
since the resulting instruction lengths / symbol sizes are typically the
same.

e.g.

stock (signed math)
__________________

919b4614 <test_taint>:
919b4614: div r2,r0,0x20
^^^
919b4618: add2 r2,0x920f6050,r2
919b4620: ld_s r2,[r2,0]
919b4622: lsr r0,r2,r0
919b4626: j_s.d [blink]
919b4628: bmsk_s r0,r0,0
919b462a: nop_s

(patched) unsigned math
__________________

919b4614 <test_taint>:
919b4614: lsr r2,r0,0x5 @nr/32
^^^
919b4618: add2 r2,0x920f6050,r2
919b4620: ld_s r2,[r2,0]
919b4622: lsr r0,r2,r0 #test_bit()
919b4626: j_s.d [blink]
919b4628: bmsk_s r0,r0,0
919b462a: nop_s
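
For reference, the difference boils down to how BIT_WORD()/BIT_MASK()
divide @nr by BITS_PER_LONG. A minimal standalone sketch (not kernel
code, just an illustration, assuming a 32-bit target such as ARC):

	#define BITS_PER_LONG 32

	unsigned long word_idx_signed(int nr)
	{
		return nr / BITS_PER_LONG;	/* signed: DIV or shift + sign fixup */
	}

	unsigned long word_idx_unsigned(unsigned int nr)
	{
		return nr / BITS_PER_LONG;	/* unsigned: a single LSR by 5 */
	}

With a signed @nr, C requires the division to truncate toward zero, so
the compiler can only emit a plain shift if it can prove @nr is
non-negative; with an unsigned @nr it is always just a logical shift.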

Signed-off-by: Vineet Gupta <[email protected]>
---
This is an RFC for feedback. I understand this impacts every arch,
but as of now it is only build/run tested on ARC.
---
---
include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..c5a7d8eb9c2b 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -13,7 +13,7 @@
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static inline void __set_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +21,7 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
*p |= mask;
}

-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static inline void __clear_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +38,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static inline void __change_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +55,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +74,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,7 +85,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
}

/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
+static inline int __test_and_change_bit(unsigned int nr,
volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
@@ -101,7 +101,7 @@ static inline int __test_and_change_bit(int nr,
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
--
2.25.1


2021-08-06 18:49:37

by Will Deacon

Subject: Re: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV

On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
> Signed math causes generation of costlier instructions such as DIV when
> the same could be done with the barrel shifter.
>
> The worse part is that this is not caught by things like bloat-o-meter,
> since the resulting instruction lengths / symbol sizes are typically the
> same.
>
> e.g.
>
> stock (signed math)
> __________________
>
> 919b4614 <test_taint>:
> 919b4614: div r2,r0,0x20
> ^^^
> 919b4618: add2 r2,0x920f6050,r2
> 919b4620: ld_s r2,[r2,0]
> 919b4622: lsr r0,r2,r0
> 919b4626: j_s.d [blink]
> 919b4628: bmsk_s r0,r0,0
> 919b462a: nop_s
>
> (patched) unsigned math
> __________________
>
> 919b4614 <test_taint>:
> 919b4614: lsr r2,r0,0x5 @nr/32
> ^^^
> 919b4618: add2 r2,0x920f6050,r2
> 919b4620: ld_s r2,[r2,0]
> 919b4622: lsr r0,r2,r0 #test_bit()
> 919b4626: j_s.d [blink]
> 919b4628: bmsk_s r0,r0,0
> 919b462a: nop_s

Just FYI, but on arm64 the existing codegen is alright as we have both
arithmetic and logical shifts.

> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> This is an RFC for feedback. I understand this impacts every arch,
> but as of now it is only build/run tested on ARC.
> ---
> ---
> include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Will Deacon <[email protected]>

We should really move test_bit() into the atomic header, but I failed to fix
the resulting include mess last time I tried that.

Will

2021-08-07 00:05:20

by Vineet Gupta

Subject: Re: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV

On 8/6/21 6:42 AM, Will Deacon wrote:
> On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
>> Signed math causes generation of costlier instructions such as DIV when
>> the same could be done with the barrel shifter.
>>
>> The worse part is that this is not caught by things like bloat-o-meter,
>> since the resulting instruction lengths / symbol sizes are typically the
>> same.
>>
>> e.g.
>>
>> stock (signed math)
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614: div r2,r0,0x20
>> ^^^
>> 919b4618: add2 r2,0x920f6050,r2
>> 919b4620: ld_s r2,[r2,0]
>> 919b4622: lsr r0,r2,r0
>> 919b4626: j_s.d [blink]
>> 919b4628: bmsk_s r0,r0,0
>> 919b462a: nop_s
>>
>> (patched) unsigned math
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614: lsr r2,r0,0x5 @nr/32
>> ^^^
>> 919b4618: add2 r2,0x920f6050,r2
>> 919b4620: ld_s r2,[r2,0]
>> 919b4622: lsr r0,r2,r0 #test_bit()
>> 919b4626: j_s.d [blink]
>> 919b4628: bmsk_s r0,r0,0
>> 919b462a: nop_s
> Just FYI, but on arm64 the existing codegen is alright as we have both
> arithmetic and logical shifts.

ARC does too: there's LSR (logical shift right) and ASR (arithmetic
shift right).
So perhaps this is something to be fixed in the compiler.
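
For instance, a truncating-toward-zero divide by 32 can still be lowered
with shifts only; roughly (illustrative C, not actual compiler output,
and assuming arithmetic right shift of negative values, as gcc does):

	int div_by_32(int nr)
	{
		/* add 31 to negative values so the ASR rounds toward zero */
		return (nr + ((nr >> 31) & 31)) >> 5;
	}

i.e. an ASR-based sequence instead of a full DIV.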

>> Signed-off-by: Vineet Gupta <[email protected]>
>> ---
>> This is an RFC for feedback. I understand this impacts every arch,
>> but as of now it is only build/run tested on ARC.
>> ---
>> ---
>> include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
>> 1 file changed, 7 insertions(+), 7 deletions(-)
> Acked-by: Will Deacon <[email protected]>
>
> We should really move test_bit() into the atomic header, but I failed to fix
> the resulting include mess last time I tried that.

OK I'll give it a try too.