2008-07-15 21:16:08

by Mike Travis

Subject: [PATCH 7/8] cpumask: Provide a generic set of CPUMASK_ALLOC macros

* Provide a generic set of CPUMASK_ALLOC macros patterned after the
SCHED_CPUMASK_ALLOC macros. This is used where multiple cpumask_t
variables are declared on the stack to reduce the amount of stack
space required.

Based on linux-2.6.tip/master at the following commit:

commit 0a91813e16ebd5c2d9b5c2acd5b7c91742112c4f
Merge: 9a635fa... 724dce0...
Author: Ingo Molnar <[email protected]>
Date: Tue Jul 15 14:55:17 2008 +0200

Signed-off-by: Mike Travis <[email protected]>
Cc: Paul Jackson <[email protected]>
---
include/linux/cpumask.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

--- linux-2.6.tip.orig/include/linux/cpumask.h
+++ linux-2.6.tip/include/linux/cpumask.h
@@ -75,6 +75,17 @@
* CPU_MASK_NONE Initializer - no bits set
* unsigned long *cpus_addr(mask) Array of unsigned long's in mask
*
+ *if NR_CPUS > BITS_PER_LONG
+ * CPUMASK_ALLOC(m) Declares and allocates struct m *m =
+ * (struct m *)kmalloc(sizeof(*m), ...)
+ * CPUMASK_FREE(m) Macro for kfree(v)
+ *else
+ * CPUMASK_ALLOC(m) Declares struct m _m, *m = &_m
+ * CPUMASK_FREE(m) Nop
+ *endif
+ * CPUMASK_VAR(v, m) Declares cpumask_t *v =
+ * m + offset(struct m, v)
+ *
* int cpumask_scnprintf(buf, len, mask) Format cpumask for printing
* int cpumask_parse_user(ubuf, ulen, mask) Parse ascii string as cpumask
* int cpulist_scnprintf(buf, len, mask) Format cpumask as list for printing
@@ -311,6 +322,16 @@ extern cpumask_t cpu_mask_all;

#define cpus_addr(src) ((src).bits)

+#if NR_CPUS > BITS_PER_LONG
+#define CPUMASK_ALLOC(m) struct m *m = kmalloc(sizeof(*m), GFP_KERNEL)
+#define CPUMASK_FREE(m) kfree(m)
+#else
+#define CPUMASK_ALLOC(m) struct allmasks _m, *m = &_m
+#define CPUMASK_FREE(m)
+#endif
+#define CPUMASK_VAR(v, m) cpumask_t *v = (cpumask_t *) \
+ ((unsigned long)(m) + offsetof(struct m, v))
+
#define cpumask_scnprintf(buf, len, src) \
__cpumask_scnprintf((buf), (len), &(src), NR_CPUS)
static inline int __cpumask_scnprintf(char *buf, int len,

--


2008-07-16 06:41:43

by Bert Wesarg

Subject: Re: [PATCH 7/8] cpumask: Provide a generic set of CPUMASK_ALLOC macros

On Tue, Jul 15, 2008 at 23:14, Mike Travis <[email protected]> wrote:
> * Provide a generic set of CPUMASK_ALLOC macros patterned after the
> SCHED_CPUMASK_ALLOC macros. This is used where multiple cpumask_t
> variables are declared on the stack to reduce the amount of stack
> space required.
>
> Based on linux-2.6.tip/master at the following commit:
>
> commit 0a91813e16ebd5c2d9b5c2acd5b7c91742112c4f
> Merge: 9a635fa... 724dce0...
> Author: Ingo Molnar <[email protected]>
> Date: Tue Jul 15 14:55:17 2008 +0200
>
> Signed-off-by: Mike Travis <[email protected]>
> Cc: Paul Jackson <[email protected]>
> ---
> include/linux/cpumask.h | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> --- linux-2.6.tip.orig/include/linux/cpumask.h
> +++ linux-2.6.tip/include/linux/cpumask.h
> @@ -75,6 +75,17 @@
> * CPU_MASK_NONE Initializer - no bits set
> * unsigned long *cpus_addr(mask) Array of unsigned long's in mask
> *
> + *if NR_CPUS > BITS_PER_LONG
> + * CPUMASK_ALLOC(m) Declares and allocates struct m *m =
> + * (struct m *)kmalloc(sizeof(*m), ...)
Shouldn't you mention the GFP_KERNEL flag? And the cast should not
necessarily be mentioned in a comment.

> + * CPUMASK_FREE(m) Macro for kfree(v)
kfree(m)

> + *else
> + * CPUMASK_ALLOC(m) Declares struct m _m, *m = &_m
> + * CPUMASK_FREE(m) Nop
> + *endif
> + * CPUMASK_VAR(v, m) Declares cpumask_t *v =
> + * m + offset(struct m, v)
offsetof
and why can't you use a &(m->v)?

Regards
Bert

2008-07-16 12:25:42

by Mike Travis

Subject: Re: [PATCH 7/8] cpumask: Provide a generic set of CPUMASK_ALLOC macros

Bert Wesarg wrote:
> On Tue, Jul 15, 2008 at 23:14, Mike Travis <[email protected]> wrote:
>> * Provide a generic set of CPUMASK_ALLOC macros patterned after the
>> SCHED_CPUMASK_ALLOC macros. This is used where multiple cpumask_t
>> variables are declared on the stack to reduce the amount of stack
>> space required.
>>
>> Based on linux-2.6.tip/master at the following commit:
>>
>> commit 0a91813e16ebd5c2d9b5c2acd5b7c91742112c4f
>> Merge: 9a635fa... 724dce0...
>> Author: Ingo Molnar <[email protected]>
>> Date: Tue Jul 15 14:55:17 2008 +0200
>>
>> Signed-off-by: Mike Travis <[email protected]>
>> Cc: Paul Jackson <[email protected]>
>> ---
>> include/linux/cpumask.h | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>>
>> --- linux-2.6.tip.orig/include/linux/cpumask.h
>> +++ linux-2.6.tip/include/linux/cpumask.h
>> @@ -75,6 +75,17 @@
>> * CPU_MASK_NONE Initializer - no bits set
>> * unsigned long *cpus_addr(mask) Array of unsigned long's in mask
>> *
>> + *if NR_CPUS > BITS_PER_LONG
>> + * CPUMASK_ALLOC(m) Declares and allocates struct m *m =
>> + * (struct m *)kmalloc(sizeof(*m), ...)
> Shouldn't you mention the GFP_KERNEL flag? And the cast should not
> necessarily be mentioned in a comment.

Hmm, yes. I encountered the checkpatch warning about casting kmalloc and
changed the macro but not the comment. (And now there's room for adding
GFP_KERNEL to the comment! ;-)
>
>> + * CPUMASK_FREE(m) Macro for kfree(v)
> kfree(m)
>
>> + *else
>> + * CPUMASK_ALLOC(m) Declares struct m _m, *m = &_m
>> + * CPUMASK_FREE(m) Nop
>> + *endif
>> + * CPUMASK_VAR(v, m) Declares cpumask_t *v =
>> + * m + offset(struct m, v)
> offsetof
> and why can't you use a &(m->v)?

I thought about this but what if kmalloc fails? It's ok to add the
offset to a NULL pointer, but dereferencing a null pointer (even though
it's just to get the address) may introduce a fault, yes?

I'll look into this further. There was one other remnant from the
copy/paste that needed to be fixed:

-#define CPUMASK_ALLOC(m) struct allmasks _m, *m = &_m
+#define CPUMASK_ALLOC(m) struct m _m, *m = &_m

Thanks!
Mike

2008-07-16 13:01:40

by Mike Travis

Subject: Re: [PATCH 7/8] cpumask: Provide a generic set of CPUMASK_ALLOC macros

Mike Travis wrote:
> Bert Wesarg wrote:
...
>>> + * CPUMASK_VAR(v, m) Declares cpumask_t *v =
>>> + * m + offset(struct m, v)
>> offsetof
>> and why can't you use a &(m->v)?
>
> I thought about this but what if kmalloc fails? It's ok to add the
> offset to a NULL pointer, but dereferencing a null pointer (even though
> it's just to get the address) may introduce a fault, yes?
>
> I'll look into this further.

Answering my own question, apparently it is ok. The pointer is simply used
to provide the base to which the offset is added.

struct allmasks *allmasks = ((void *)0);
400557: 48 c7 45 d8 00 00 00 movq $0x0,0xffffffffffffffd8(%rbp)
40055e: 00

cpumask_t *online_policy_cpus = &(allmasks->online_policy_cpus);
40055f: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax
400563: 48 89 45 e0 mov %rax,0xffffffffffffffe0(%rbp)
cpumask_t *saved_mask = &(allmasks->saved_mask);
400567: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax
40056b: 48 05 00 02 00 00 add $0x200,%rax
400571: 48 89 45 e8 mov %rax,0xffffffffffffffe8(%rbp)
cpumask_t *set_mask = &(allmasks->set_mask);
400575: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax
400579: 48 05 00 04 00 00 add $0x400,%rax
40057f: 48 89 45 f0 mov %rax,0xfffffffffffffff0(%rbp)
cpumask_t *covered_cpus = &(allmasks->covered_cpus);
400583: 48 8b 45 d8 mov 0xffffffffffffffd8(%rbp),%rax
400587: 48 05 00 06 00 00 add $0x600,%rax
40058d: 48 89 45 f8 mov %rax,0xfffffffffffffff8(%rbp)

Sometimes the most obvious is also the most elusive... ;-)
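
The listing above can be cross-checked in user space. What follows is a
minimal sketch, not kernel code: the NR_CPUS value of 4096, the cpumask_t
stand-in, and the helper name member_addr_matches_offset() are assumptions
made for illustration. It deliberately uses a valid (non-NULL) base pointer,
because ISO C arguably does not define &(m->v) for a NULL m, even though
the compiler output above shows a plain base-plus-offset addition.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumed values, not taken from the kernel tree). */
#define NR_CPUS 4096
#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; } cpumask_t;

/* Same member layout as the structure in the disassembly. */
struct allmasks {
	cpumask_t online_policy_cpus;
	cpumask_t saved_mask;
	cpumask_t set_mask;
	cpumask_t covered_cpus;
};

/* Returns 1 if &(m->v) equals the base address plus offsetof(struct m, v). */
int member_addr_matches_offset(void)
{
	static struct allmasks storage;
	struct allmasks *m = &storage;

	/* 4096 bits is 512 (0x200) bytes, the stride seen in the listing. */
	assert(sizeof(cpumask_t) == NR_CPUS / 8);

	return (char *)&(m->saved_mask) ==
			(char *)m + offsetof(struct allmasks, saved_mask) &&
	       (char *)&(m->covered_cpus) ==
			(char *)m + offsetof(struct allmasks, covered_cpus);
}
```

With these assumed sizes the member offsets come out as 0x0, 0x200, 0x400
and 0x600, which matches the add instructions in the listing.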

I've updated the code and the patch description to clarify that the
structure base must be checked for NULL before the cpumask_t pointers are
used. I've also renamed CPUMASK_VAR to CPUMASK_PTR to make its function a
bit clearer.

* Provide a generic set of CPUMASK_ALLOC macros patterned after the
SCHED_CPUMASK_ALLOC macros. This is used where multiple cpumask_t
variables are declared on the stack to reduce the amount of stack
space required when the NR_CPUS count is large enough to warrant it.

Basically, if NR_CPUS <= BITS_PER_LONG then the multiple cpumask_t
structure (which needs to be pre-defined) is declared as a local
variable and pointers to each mask are provided. The compiler
will optimize out the extra dereference, resulting in code that
is the same as it would be without the pointer reference.

If NR_CPUS > BITS_PER_LONG, then instead of declaring the combined
cpumask_t structure on the stack, kmalloc is used to obtain the
memory space. In this case, the CPUMASK_FREE is now kfree instead
of a nop.

For both cases, CPUMASK_PTR declares and initializes each cpumask_t
pointer but these should *not* be used before the structure pointer
is verified not to be NULL. (This check for NULL will be optimized
out for the case where the structure is declared as local memory.)
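
The usage pattern described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the patch itself:
malloc()/free() stand in for kmalloc(..., GFP_KERNEL)/kfree(), NR_CPUS is
assumed to be 4096, and the cpumask_t stand-in and the function name
use_masks() are hypothetical. CPUMASK_PTR uses the &(m->v) form discussed
earlier in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins (assumed values, not taken from the kernel tree). */
#define NR_CPUS 4096
#if ULONG_MAX == 0xffffffffUL
#define BITS_PER_LONG 32
#else
#define BITS_PER_LONG 64
#endif
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; } cpumask_t;

#if NR_CPUS > BITS_PER_LONG
/* Large masks: take the combined structure from the heap. */
#define CPUMASK_ALLOC(m)	struct m *m = malloc(sizeof(*m))
#define CPUMASK_FREE(m)		free(m)
#else
/* Small masks: keep the structure on the stack; freeing is a nop. */
#define CPUMASK_ALLOC(m)	struct m _m, *m = &_m
#define CPUMASK_FREE(m)
#endif
#define CPUMASK_PTR(v, m)	cpumask_t *v = &(m->v)

/* The combined structure must be defined before CPUMASK_ALLOC is used. */
struct allmasks {
	cpumask_t online_policy_cpus;
	cpumask_t saved_mask;
};

/* Returns 0 on success, -1 if the allocation failed. */
int use_masks(void)
{
	CPUMASK_ALLOC(allmasks);
	CPUMASK_PTR(online_policy_cpus, allmasks);
	CPUMASK_PTR(saved_mask, allmasks);
	int ok;

	/* The mask pointers must not be used before this check. */
	if (!allmasks)
		return -1;

	/* The masks are laid out back to back, with no padding expected. */
	assert(sizeof(*allmasks) == 2 * sizeof(cpumask_t));

	memset(allmasks, 0, sizeof(*allmasks));
	saved_mask->bits[0] = 1UL;		/* mark CPU 0 */
	*online_policy_cpus = *saved_mask;	/* copy one mask to the other */
	ok = online_policy_cpus->bits[0] == 1UL;

	CPUMASK_FREE(allmasks);
	return ok ? 0 : -1;
}
```

Setting NR_CPUS to a value no larger than BITS_PER_LONG in this sketch
selects the stack variant, in which case the NULL check is trivially false
and CPUMASK_FREE expands to nothing.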

One question remains: should the threshold to use kmalloc be
BITS_PER_LONG or something larger? Sched uses NR_CPUS > 128, though
it has about 7 cpumask_t vars it uses. My (obvious) concern is when
NR_CPUS is 4096 (and soon 16384), but where is the line between a
fairly large system and a really huge system?
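
To put rough numbers on the threshold question, the stack cost of the masks
can be computed directly. This is a back-of-the-envelope sketch: the helper
names cpumask_bytes() and stack_bytes() are hypothetical, and the sizes
assume a cpumask_t is simply an array of unsigned longs rounded up from
NR_CPUS bits.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Bytes occupied by one cpumask_t holding nr_cpus bits. */
size_t cpumask_bytes(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return ((nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG) *
		sizeof(unsigned long);
}

/* Stack bytes needed if nr_masks cpumask_t variables are locals. */
size_t stack_bytes(size_t nr_cpus, size_t nr_masks)
{
	return nr_masks * cpumask_bytes(nr_cpus);
}
```

Under these assumptions, 7 masks cost 112 bytes at NR_CPUS = 128, 3584 bytes
at NR_CPUS = 4096, and 14336 bytes at NR_CPUS = 16384.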

Thanks,
Mike