2008-06-03 18:26:03

by Ivo Van Doorn

[permalink] [raw]
Subject: [PATCH 01/11] rt2x00: Calculate register offset during compile time

By using __ffs() the register offsets were always calculated
at run-time which all FIELD32/FIELD16 definitions were builtin
constants. This means we can heavily optimize the register handling
by allowing GCC to do all the work during compilation.

Add some compile_ffs() macros to perform the calculation at
compile time. After this each rt2x00 module size is reduced
by ~2500 bytes. And the stack size of several functions is reduced
as well which further limits the number of rt2x00 results in
'make checkstack'.

Signed-off-by: Ivo van Doorn <[email protected]>
---
drivers/net/wireless/rt2x00/rt2x00reg.h | 55 +++++++++++++++++++++++--------
1 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2x00reg.h b/drivers/net/wireless/rt2x00/rt2x00reg.h
index 3f255df..03e846f 100644
--- a/drivers/net/wireless/rt2x00/rt2x00reg.h
+++ b/drivers/net/wireless/rt2x00/rt2x00reg.h
@@ -130,40 +130,67 @@ struct rt2x00_field32 {

/*
* Power of two check, this will check
- * if the mask that has been given contains
- * and contiguous set of bits.
+ * if the mask that has been given contains and contiguous set of bits.
+ * Note that we cannot use the is_power_of_2() function since this
+ * check must be done at compile-time.
*/
#define is_power_of_two(x) ( !((x) & ((x)-1)) )
#define low_bit_mask(x) ( ((x)-1) & ~(x) )
#define is_valid_mask(x) is_power_of_two(1 + (x) + low_bit_mask(x))

+/*
+ * Macro's to find first set bit in a variable.
+ * These macro's behaves the same as the __ffs() function with
+ * the most important difference that this is done during
+ * compile-time rather then run-time.
+ */
+#define compile_ffs2(__x) \
+ ((__x) & 0x1) ? 0 : 1
+
+#define compile_ffs4(__x) \
+ ((__x) & 0x3) ? compile_ffs2(__x) : compile_ffs2(__x >> 2) + 2
+
+#define compile_ffs8(__x) \
+ ((__x) & 0xf) ? compile_ffs4(__x) : compile_ffs4(__x >> 4) + 4
+
+#define compile_ffs16(__x) \
+ ((__x) & 0xff) ? compile_ffs8(__x) : compile_ffs8(__x >> 8) + 8
+
+#define compile_ffs32(__x) \
+ ((__x) & 0xffff) ? compile_ffs16(__x) : compile_ffs16(__x >> 16) + 16
+
+/*
+ * This macro will check the requirements for the FIELD{8,16,32} macros
+ * The mask should be a constant non-zero contiguous set of bits which
+ * does not exceed the given typelimit.
+ */
+#define FIELD_CHECK(__mask, __type) \
+ BUILD_BUG_ON(!__builtin_constant_p(__mask) || \
+ !(__mask) || \
+ !is_valid_mask(__mask) || \
+ (__mask) != (__type)(__mask)) \
+
#define FIELD8(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u8)(__mask)); \
+ FIELD_CHECK(__mask, u8); \
(struct rt2x00_field8) { \
- __ffs(__mask), (__mask) \
+ compile_ffs8(__mask), (__mask) \
}; \
})

#define FIELD16(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u16)(__mask));\
+ FIELD_CHECK(__mask, u16); \
(struct rt2x00_field16) { \
- __ffs(__mask), (__mask) \
+ compile_ffs16(__mask), (__mask) \
}; \
})

#define FIELD32(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u32)(__mask));\
+ FIELD_CHECK(__mask, u32); \
(struct rt2x00_field32) { \
- __ffs(__mask), (__mask) \
+ compile_ffs32(__mask), (__mask) \
}; \
})

--
1.5.5.3



2008-06-03 20:33:03

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 01/11 v2] rt2x00: Calculate register offset during compile time


> +#define compile_ffs2(__x) \
> + ( ((__x) & 0x1) ? 0 : 1 )
> +
> +#define compile_ffs4(__x) \
> + ( ((__x) & 0x3) ? \
> + compile_ffs2(__x) : (compile_ffs2(__x >> 2) + 2) )

It seems you should also add parentheses around the __x used in the
recursive macro invocation, like this:

... : (compile_ffs2((__x) >> 2) + 2) )

or am I missing something?

johannes


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2008-06-03 20:36:32

by Ivo Van Doorn

[permalink] [raw]
Subject: Re: [PATCH 01/11 v2] rt2x00: Calculate register offset during compile time

On Tuesday 03 June 2008, Johannes Berg wrote:
>
> > +#define compile_ffs2(__x) \
> > + ( ((__x) & 0x1) ? 0 : 1 )
> > +
> > +#define compile_ffs4(__x) \
> > + ( ((__x) & 0x3) ? \
> > + compile_ffs2(__x) : (compile_ffs2(__x >> 2) + 2) )
>
> It seems you should also add parentheses around the __x used in the
> recursive macro invocation, like this:
>
> ... : (compile_ffs2((__x) >> 2) + 2) )
>
> or am I missing something?

Nope, you are right. :)
Version 3 on its way.

Thanks,

Ivo

2008-06-03 19:57:45

by Ivo Van Doorn

[permalink] [raw]
Subject: [PATCH 01/11 v2] rt2x00: Calculate register offset during compile time

rt2x00: Calculate register offset during compile time

By using __ffs() the register offsets were always calculated
at run-time which all FIELD32/FIELD16 definitions were builtin
constants. This means we can heavily optimize the register handling
by allowing GCC to do all the work during compilation.

Add some compile_ffs() macros to perform the calculation at
compile time. After this each rt2x00 module size is reduced
by ~2500 bytes. And the stack size of several functions is reduced
as well which further limits the number of rt2x00 results in
'make checkstack'.

v2: Merge GertJan's bugfix of patch [1/11] directly into this patch
instead of providing it as seperate patch.

Signed-off-by: Gertjan van Wingerde <[email protected]>
Signed-off-by: Ivo van Doorn <[email protected]>

---
John, this patch is a combination of patch [1/11] and [2/11] so please
either apply those 2 patches or just this single patch to get the correct result :)
---
diff --git a/drivers/net/wireless/rt2x00/rt2x00reg.h b/drivers/net/wireless/rt2x00/rt2x00reg.h
index 3f255df..7999d54 100644
--- a/drivers/net/wireless/rt2x00/rt2x00reg.h
+++ b/drivers/net/wireless/rt2x00/rt2x00reg.h
@@ -130,40 +130,71 @@ struct rt2x00_field32 {

/*
* Power of two check, this will check
- * if the mask that has been given contains
- * and contiguous set of bits.
+ * if the mask that has been given contains and contiguous set of bits.
+ * Note that we cannot use the is_power_of_2() function since this
+ * check must be done at compile-time.
*/
#define is_power_of_two(x) ( !((x) & ((x)-1)) )
#define low_bit_mask(x) ( ((x)-1) & ~(x) )
#define is_valid_mask(x) is_power_of_two(1 + (x) + low_bit_mask(x))

+/*
+ * Macro's to find first set bit in a variable.
+ * These macro's behaves the same as the __ffs() function with
+ * the most important difference that this is done during
+ * compile-time rather then run-time.
+ */
+#define compile_ffs2(__x) \
+ ( ((__x) & 0x1) ? 0 : 1 )
+
+#define compile_ffs4(__x) \
+ ( ((__x) & 0x3) ? \
+ compile_ffs2(__x) : (compile_ffs2(__x >> 2) + 2) )
+
+#define compile_ffs8(__x) \
+ ( ((__x) & 0xf) ? \
+ compile_ffs4(__x) : (compile_ffs4(__x >> 4) + 4) )
+
+#define compile_ffs16(__x) \
+ ( ((__x) & 0xff) ? \
+ compile_ffs8(__x) : (compile_ffs8(__x >> 8) + 8) )
+
+#define compile_ffs32(__x) \
+ ( ((__x) & 0xffff) ? \
+ compile_ffs16(__x) : (compile_ffs16(__x >> 16) + 16) )
+
+/*
+ * This macro will check the requirements for the FIELD{8,16,32} macros
+ * The mask should be a constant non-zero contiguous set of bits which
+ * does not exceed the given typelimit.
+ */
+#define FIELD_CHECK(__mask, __type) \
+ BUILD_BUG_ON(!__builtin_constant_p(__mask) || \
+ !(__mask) || \
+ !is_valid_mask(__mask) || \
+ (__mask) != (__type)(__mask)) \
+
#define FIELD8(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u8)(__mask)); \
+ FIELD_CHECK(__mask, u8); \
(struct rt2x00_field8) { \
- __ffs(__mask), (__mask) \
+ compile_ffs8(__mask), (__mask) \
}; \
})

#define FIELD16(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u16)(__mask));\
+ FIELD_CHECK(__mask, u16); \
(struct rt2x00_field16) { \
- __ffs(__mask), (__mask) \
+ compile_ffs16(__mask), (__mask) \
}; \
})

#define FIELD32(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) != (u32)(__mask));\
+ FIELD_CHECK(__mask, u32); \
(struct rt2x00_field32) { \
- __ffs(__mask), (__mask) \
+ compile_ffs32(__mask), (__mask) \
}; \
})


2008-06-03 21:19:00

by Harvey Harrison

[permalink] [raw]
Subject: Re: [PATCH 01/11 v3] rt2x00: Calculate register offset during compile time

On Tue, 2008-06-03 at 22:45 +0200, Ivo van Doorn wrote:
> By using __ffs() the register offsets were always calculated
> at run-time which all FIELD32/FIELD16 definitions were builtin
> constants. This means we can heavily optimize the register handling
> by allowing GCC to do all the work during compilation.
>=20
> Add some compile_ffs() macros to perform the calculation at
> compile time. After this each rt2x00 module size is reduced
> by ~2500 bytes. And the stack size of several functions is reduced
> as well which further limits the number of rt2x00 results in
> 'make checkstack'.
> +/*
> + * Macro's to find first set bit in a variable.
> + * These macro's behaves the same as the __ffs() function with
> + * the most important difference that this is done during
> + * compile-time rather then run-time.
> + */

#define const_ffs8(__x) ( \
BUILD_BUG_ON(!__builtin_constant_p(__x)); \
__builtin_choose_expr((__x) & 0x01, 0, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x02, 1, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x04, 2, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x08, 3, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x10, 4, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x20, 5, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x40, 6, \
=EF=BB=BF__builtin_choose_expr((__x) & 0x80, 7, \
8)))))))); )

#define const_ffs16(__x) ( \
__builtin_choose_expr((__x) & 0xff, \
const_ffs8(__x), \
const_ffs8((__x) >> 8) + 8); )

=EF=BB=BF
#define const_ffs32(__x) ( \
__builtin_choose_expr((__x) & 0xffff, \
const_ffs16(__x), \
const_ffs16((__x) >> 16) + 16); )


Just a thought.

Harvey

2008-06-03 20:38:05

by Ivo Van Doorn

[permalink] [raw]
Subject: [PATCH 01/11 v3] rt2x00: Calculate register offset during compile time

By using __ffs() the register offsets were always calculated
at run-time which all FIELD32/FIELD16 definitions were builtin
constants. This means we can heavily optimize the register handling
by allowing GCC to do all the work during compilation.

Add some compile_ffs() macros to perform the calculation at
compile time. After this each rt2x00 module size is reduced
by ~2500 bytes. And the stack size of several functions is reduced
as well which further limits the number of rt2x00 results in
'make checkstack'.

v2: Merge GertJan's bugfix of patch [1/11] directly into this patch
=C2=A0 =C2=A0 =C2=A0 instead of providing it as seperate patch.
v3: Add extra parentheses when bitshifting __x

Signed-off-by: Gertjan van Wingerde <[email protected]>
Signed-off-by: Ivo van Doorn <[email protected]>
---
John, this patch is a combination of patch [1/11] and [2/11] so please
either apply those 2 patches or just this single patch to get the corre=
ct result :)
---
diff --git a/drivers/net/wireless/rt2x00/rt2x00reg.h b/drivers/net/wire=
less/rt2x00/rt2x00reg.h
index 3f255df..7999d54 100644
--- a/drivers/net/wireless/rt2x00/rt2x00reg.h
+++ b/drivers/net/wireless/rt2x00/rt2x00reg.h
@@ -130,40 +130,71 @@ struct rt2x00_field32 {
=20
/*
* Power of two check, this will check
- * if the mask that has been given contains
- * and contiguous set of bits.
+ * if the mask that has been given contains and contiguous set of bits=
=2E
+ * Note that we cannot use the is_power_of_2() function since this
+ * check must be done at compile-time.
*/
#define is_power_of_two(x) ( !((x) & ((x)-1)) )
#define low_bit_mask(x) ( ((x)-1) & ~(x) )
#define is_valid_mask(x) is_power_of_two(1 + (x) + low_bit_mask(x))
=20
+/*
+ * Macro's to find first set bit in a variable.
+ * These macro's behaves the same as the __ffs() function with
+ * the most important difference that this is done during
+ * compile-time rather then run-time.
+ */
+#define compile_ffs2(__x) \
+ ( ((__x) & 0x1) ? 0 : 1 )
+
+#define compile_ffs4(__x) \
+ ( ((__x) & 0x3) ? \
+ compile_ffs2(__x) : (compile_ffs2((__x) >> 2) + 2) )
+
+#define compile_ffs8(__x) \
+ ( ((__x) & 0xf) ? \
+ compile_ffs4(__x) : (compile_ffs4((__x) >> 4) + 4) )
+
+#define compile_ffs16(__x) \
+ ( ((__x) & 0xff) ? \
+ compile_ffs8(__x) : (compile_ffs8((__x) >> 8) + 8) )
+
+#define compile_ffs32(__x) \
+ ( ((__x) & 0xffff) ? \
+ compile_ffs16(__x) : (compile_ffs16((__x) >> 16) + 16) )
+
+/*
+ * This macro will check the requirements for the FIELD{8,16,32} macro=
s
+ * The mask should be a constant non-zero contiguous set of bits which
+ * does not exceed the given typelimit.
+ */
+#define FIELD_CHECK(__mask, __type) \
+ BUILD_BUG_ON(!__builtin_constant_p(__mask) || \
+ !(__mask) || \
+ !is_valid_mask(__mask) || \
+ (__mask) !=3D (__type)(__mask)) \
+
#define FIELD8(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) !=3D (u8)(__mask)); \
+ FIELD_CHECK(__mask, u8); \
(struct rt2x00_field8) { \
- __ffs(__mask), (__mask) \
+ compile_ffs8(__mask), (__mask) \
}; \
})
=20
#define FIELD16(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) !=3D (u16)(__mask));\
+ FIELD_CHECK(__mask, u16); \
(struct rt2x00_field16) { \
- __ffs(__mask), (__mask) \
+ compile_ffs16(__mask), (__mask) \
}; \
})
=20
#define FIELD32(__mask) \
({ \
- BUILD_BUG_ON(!(__mask) || \
- !is_valid_mask(__mask) || \
- (__mask) !=3D (u32)(__mask));\
+ FIELD_CHECK(__mask, u32); \
(struct rt2x00_field32) { \
- __ffs(__mask), (__mask) \
+ compile_ffs32(__mask), (__mask) \
}; \
})
=20

2008-06-03 22:03:13

by Ivo Van Doorn

[permalink] [raw]
Subject: Re: [PATCH 01/11 v3] rt2x00: Calculate register offset during compile time

On Tuesday 03 June 2008, Harvey Harrison wrote:
> On Tue, 2008-06-03 at 22:45 +0200, Ivo van Doorn wrote:
> > By using __ffs() the register offsets were always calculated
> > at run-time which all FIELD32/FIELD16 definitions were builtin
> > constants. This means we can heavily optimize the register handling
> > by allowing GCC to do all the work during compilation.
> >=20
> > Add some compile_ffs() macros to perform the calculation at
> > compile time. After this each rt2x00 module size is reduced
> > by ~2500 bytes. And the stack size of several functions is reduced
> > as well which further limits the number of rt2x00 results in
> > 'make checkstack'.
> > +/*
> > + * Macro's to find first set bit in a variable.
> > + * These macro's behaves the same as the __ffs() function with
> > + * the most important difference that this is done during
> > + * compile-time rather then run-time.
> > + */
>=20
> #define const_ffs8(__x) ( \
> BUILD_BUG_ON(!__builtin_constant_p(__x)); \
> __builtin_choose_expr((__x) & 0x01, 0, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x02, 1, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x04, 2, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x08, 3, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x10, 4, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x20, 5, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x40, 6, \
> =EF=BB=BF__builtin_choose_expr((__x) & 0x80, 7, \
> 8)))))))); )
>=20
> #define const_ffs16(__x) ( \
> __builtin_choose_expr((__x) & 0xff, \
> const_ffs8(__x), \
> const_ffs8((__x) >> 8) + 8); )
>=20
> =EF=BB=BF
> #define const_ffs32(__x) ( \
> __builtin_choose_expr((__x) & 0xffff, \
> const_ffs16(__x), \
> const_ffs16((__x) >> 16) + 16); )
>=20
>=20
> Just a thought.

Thanks for the tipe, that does sound a lot cleaner then the ? : stateme=
nts,
I did a quick compilation check and it doesn't seem to have any influen=
ce.
I'll move the __builtin_choose_expr into rt2x00.git for testing and mov=
e
that patch upstream to wireless-2.6 later.

Ivo

2008-06-03 20:42:55

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 01/11 v3] rt2x00: Calculate register offset during compile time

On Tue, 2008-06-03 at 22:45 +0200, Ivo van Doorn wrote:
> By using __ffs() the register offsets were always calculated
> at run-time which all FIELD32/FIELD16 definitions were builtin
> constants. This means we can heavily optimize the register handling
> by allowing GCC to do all the work during compilation.
>
> Add some compile_ffs() macros to perform the calculation at
> compile time. After this each rt2x00 module size is reduced
> by ~2500 bytes. And the stack size of several functions is reduced
> as well which further limits the number of rt2x00 results in
> 'make checkstack'.
>
> v2: Merge GertJan's bugfix of patch [1/11] directly into this patch
> instead of providing it as seperate patch.
> v3: Add extra parentheses when bitshifting __x
>
> Signed-off-by: Gertjan van Wingerde <[email protected]>
> Signed-off-by: Ivo van Doorn <[email protected]>
> ---
> John, this patch is a combination of patch [1/11] and [2/11] so please
> either apply those 2 patches or just this single patch to get the correct result :)

This, however, is no longer true ;) You need this one, not the other
two.

johannes


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part