Signed math causes generation of costlier instructions such as DIV when
the work could be done by the barrel shifter.
Worse, this is not caught by tools like bloat-o-meter since the
instruction length and symbol sizes are typically the same.
e.g.
stock (signed math)
__________________
919b4614 <test_taint>:
919b4614: div r2,r0,0x20
^^^
919b4618: add2 r2,0x920f6050,r2
919b4620: ld_s r2,[r2,0]
919b4622: lsr r0,r2,r0
919b4626: j_s.d [blink]
919b4628: bmsk_s r0,r0,0
919b462a: nop_s
patched (unsigned math)
__________________
919b4614 <test_taint>:
919b4614: lsr r2,r0,0x5 @nr/32
^^^
919b4618: add2 r2,0x920f6050,r2
919b4620: ld_s r2,[r2,0]
919b4622: lsr r0,r2,r0 #test_bit()
919b4626: j_s.d [blink]
919b4628: bmsk_s r0,r0,0
919b462a: nop_s
Signed-off-by: Vineet Gupta <[email protected]>
---
This is an RFC for feedback. I understand this impacts every arch,
but as of now it is only build/run tested on ARC.
---
---
include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..c5a7d8eb9c2b 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -13,7 +13,7 @@
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static inline void __set_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +21,7 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
*p |= mask;
}
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static inline void __clear_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +38,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static inline void __change_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +55,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +74,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,7 +85,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
}
/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
+static inline int __test_and_change_bit(unsigned int nr,
volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
@@ -101,7 +101,7 @@ static inline int __test_and_change_bit(int nr,
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
--
2.25.1
On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
> Signed math causes generation of costlier instructions such as DIV when
> the work could be done by the barrel shifter.
>
> Worse, this is not caught by tools like bloat-o-meter since the
> instruction length and symbol sizes are typically the same.
>
> e.g.
>
> stock (signed math)
> __________________
>
> 919b4614 <test_taint>:
> 919b4614: div r2,r0,0x20
> ^^^
> 919b4618: add2 r2,0x920f6050,r2
> 919b4620: ld_s r2,[r2,0]
> 919b4622: lsr r0,r2,r0
> 919b4626: j_s.d [blink]
> 919b4628: bmsk_s r0,r0,0
> 919b462a: nop_s
>
> patched (unsigned math)
> __________________
>
> 919b4614 <test_taint>:
> 919b4614: lsr r2,r0,0x5 @nr/32
> ^^^
> 919b4618: add2 r2,0x920f6050,r2
> 919b4620: ld_s r2,[r2,0]
> 919b4622: lsr r0,r2,r0 #test_bit()
> 919b4626: j_s.d [blink]
> 919b4628: bmsk_s r0,r0,0
> 919b462a: nop_s
Just FYI, but on arm64 the existing codegen is alright as we have both
arithmetic and logical shifts.
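For anyone following along, here is a minimal C sketch of why the signed form is more work than a plain shift: signed division truncates toward zero, so `nr / 32` cannot be lowered to a single shift unless the compiler knows `nr` is non-negative. The helper names and the fixed divisor 32 (a 32-bit long, as in the ARC listing) are assumptions for illustration, not kernel code.

```c
#include <assert.h>

/* Hypothetical helpers, not kernel code: word index for a 32-bit long. */

/* Signed form: C division truncates toward zero, so -1 / 32 == 0. */
static int word_index_signed(int nr)
{
	return nr / 32;
}

/* Unsigned form: dividing by a power of two is exactly a logical shift. */
static unsigned int word_index_unsigned(unsigned int nr)
{
	return nr >> 5;
}

/* The two forms agree for every valid (non-negative) bit number. */
static void check_word_index(void)
{
	for (int nr = 0; nr < 4096; nr++)
		assert((unsigned int)word_index_signed(nr) ==
		       word_index_unsigned((unsigned int)nr));
}
```

Since the two forms only differ for negative `nr`, which is not a meaningful bit number, switching the parameter type should not change behaviour for valid callers.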
> Signed-off-by: Vineet Gupta <[email protected]>
> ---
> This is an RFC for feedback. I understand this impacts every arch,
> but as of now it is only build/run tested on ARC.
> ---
> ---
> include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
Acked-by: Will Deacon <[email protected]>
We should really move test_bit() into the atomic header, but I failed to fix
the resulting include mess last time I tried that.
Will
On 8/6/21 6:42 AM, Will Deacon wrote:
> On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
>> Signed math causes generation of costlier instructions such as DIV when
>> the work could be done by the barrel shifter.
>>
>> Worse, this is not caught by tools like bloat-o-meter since the
>> instruction length and symbol sizes are typically the same.
>>
>> e.g.
>>
>> stock (signed math)
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614: div r2,r0,0x20
>> ^^^
>> 919b4618: add2 r2,0x920f6050,r2
>> 919b4620: ld_s r2,[r2,0]
>> 919b4622: lsr r0,r2,r0
>> 919b4626: j_s.d [blink]
>> 919b4628: bmsk_s r0,r0,0
>> 919b462a: nop_s
>>
>> patched (unsigned math)
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614: lsr r2,r0,0x5 @nr/32
>> ^^^
>> 919b4618: add2 r2,0x920f6050,r2
>> 919b4620: ld_s r2,[r2,0]
>> 919b4622: lsr r0,r2,r0 #test_bit()
>> 919b4626: j_s.d [blink]
>> 919b4628: bmsk_s r0,r0,0
>> 919b462a: nop_s
> Just FYI, but on arm64 the existing codegen is alright as we have both
> arithmetic and logical shifts.
ARC does too: There's LSR (Logical shift right) and ASR (Arithmetic
Shift Right).
So perhaps something to be done in the compiler.
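As a rough sketch of what that could look like, signed division by 32 can be expressed as a bias add followed by an arithmetic shift, with no DIV involved. This is only an illustration in portable C; `asr5` and `div32_by_shift` are made-up names, and it makes no claim about what the ARC backend actually emits.

```c
#include <assert.h>
#include <limits.h>

/* Arithmetic shift right by 5, written portably: for negative x,
 * ~x is non-negative, so the inner shift is well defined. */
static int asr5(int x)
{
	return x < 0 ? ~(~x >> 5) : x >> 5;
}

/* nr / 32 with truncation toward zero: add 31 to negative values first
 * so the flooring shift rounds the same way as C division. */
static int div32_by_shift(int nr)
{
	int bias = nr < 0 ? 31 : 0;

	return asr5(nr + bias);
}

/* Compare against plain C division over a range and at the extremes. */
static void check_div32(void)
{
	for (int nr = -100000; nr <= 100000; nr++)
		assert(div32_by_shift(nr) == nr / 32);
	assert(div32_by_shift(INT_MIN) == INT_MIN / 32);
	assert(div32_by_shift(INT_MAX) == INT_MAX / 32);
}
```

With an unsigned `nr` the bias step disappears entirely, which matches the single LSR in the patched listing.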
>> Signed-off-by: Vineet Gupta <[email protected]>
>> ---
>> This is an RFC for feedback. I understand this impacts every arch,
>> but as of now it is only build/run tested on ARC.
>> ---
>> ---
>> include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
>> 1 file changed, 7 insertions(+), 7 deletions(-)
> Acked-by: Will Deacon <[email protected]>
>
> We should really move test_bit() into the atomic header, but I failed to fix
> the resulting include mess last time I tried that.
OK I'll give it a try too.