2008-02-14 12:35:42

by Paul Jackson

[permalink] [raw]
Subject: [RFC] bitmap relative operator for mempolicy extensions

From: Paul Jackson <[email protected]>

[Beware - never tested, never booted.]

The following adds one more bitmap operator, with the usual
cpumask and nodemask wrappers. This operator computes one
bitmap relative to another. If the n-th bit in the origin
mask is set, then the m-th bit of the destination mask will
be set, where m is the position of the n-th set bit in the
relative mask.

This is initially to be used by the MPOL_F_RELATIVE_NODES
facility being considered for mm/mempolicy.c.

Signed-off-by: Paul Jackson <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

---
include/linux/bitmap.h | 3 ++
include/linux/cpumask.h | 12 ++++++++
include/linux/nodemask.h | 12 ++++++++
lib/bitmap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 88 insertions(+), 2 deletions(-)

--- 2.6.24-mm1.orig/include/linux/nodemask.h 2008-02-04 10:41:13.360573666 -0800
+++ 2.6.24-mm1/include/linux/nodemask.h 2008-02-14 03:18:08.134310910 -0800
@@ -14,6 +14,7 @@
* bitmap_scnlistprintf() and bitmap_parselist(), also in bitmap.c.
* For details of node_remap(), see bitmap_bitremap in lib/bitmap.c.
* For details of nodes_remap(), see bitmap_remap in lib/bitmap.c.
+ * For details of nodes_relative(), see bitmap_relative in lib/bitmap.c.
*
* The available nodemask operations are:
*
@@ -55,7 +56,8 @@
* int nodelist_scnprintf(buf, len, mask) Format nodemask as list for printing
* int nodelist_parse(buf, map) Parse ascii string as nodelist
* int node_remap(oldbit, old, new) newbit = map(old, new)(oldbit)
- * int nodes_remap(dst, src, old, new) *dst = map(old, new)(dst)
+ * void nodes_remap(dst, src, old, new) *dst = map(old, new)(src)
+ * void nodes_relative(dst, orig, relmap) *dst = orig relative to relmap
*
* for_each_node_mask(node, mask) for-loop node over mask
*
@@ -326,6 +328,14 @@ static inline void __nodes_remap(nodemas
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

+#define nodes_relative(dst, orig, relmap) \
+ __nodes_relative(&(dst), &(orig), &(relmap), MAX_NUMNODES)
+static inline void __nodes_relative(nodemask_t *dstp, const nodemask_t *origp,
+ const nodemask_t *relmapp, int nbits)
+{
+ bitmap_relative(dstp->bits, origp->bits, relmapp->bits, nbits);
+}
+
#if MAX_NUMNODES > 1
#define for_each_node_mask(node, mask) \
for ((node) = first_node(mask); \
--- 2.6.24-mm1.orig/include/linux/bitmap.h 2008-02-04 10:41:02.440391382 -0800
+++ 2.6.24-mm1/include/linux/bitmap.h 2008-02-14 03:18:08.150311160 -0800
@@ -46,6 +46,7 @@
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
* bitmap_bitremap(oldbit, old, new, nbits) newbit = map(old, new)(oldbit)
+ * bitmap_relative(dst, orig, relmap, nbits) *dst = orig relative to relmap
* bitmap_scnprintf(buf, len, src, nbits) Print bitmap src to buf
* bitmap_parse(buf, buflen, dst, nbits) Parse bitmap dst from kernel buf
* bitmap_parse_user(ubuf, ulen, dst, nbits) Parse bitmap dst from user buf
@@ -120,6 +121,8 @@ extern void bitmap_remap(unsigned long *
const unsigned long *old, const unsigned long *new, int bits);
extern int bitmap_bitremap(int oldbit,
const unsigned long *old, const unsigned long *new, int bits);
+extern void bitmap_relative(unsigned long *dst, const unsigned long *orig,
+ const unsigned long *relmap, int bits);
extern int bitmap_find_free_region(unsigned long *bitmap, int bits, int order);
extern void bitmap_release_region(unsigned long *bitmap, int pos, int order);
extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
--- 2.6.24-mm1.orig/include/linux/cpumask.h 2008-02-04 10:53:59.229364809 -0800
+++ 2.6.24-mm1/include/linux/cpumask.h 2008-02-14 03:18:08.174311535 -0800
@@ -14,6 +14,7 @@
* bitmap_scnlistprintf() and bitmap_parselist(), also in bitmap.c.
* For details of cpu_remap(), see bitmap_bitremap in lib/bitmap.c
* For details of cpus_remap(), see bitmap_remap in lib/bitmap.c.
+ * For details of cpus_relative(), see bitmap_relative in lib/bitmap.c.
*
* The available cpumask operations are:
*
@@ -53,7 +54,8 @@
* int cpulist_scnprintf(buf, len, mask) Format cpumask as list for printing
* int cpulist_parse(buf, map) Parse ascii string as cpulist
* int cpu_remap(oldbit, old, new) newbit = map(old, new)(oldbit)
- * int cpus_remap(dst, src, old, new) *dst = map(old, new)(src)
+ * void cpus_remap(dst, src, old, new) *dst = map(old, new)(src)
+ * void cpus_relative(dst, orig, relmap) *dst = orig relative to relmap
*
* for_each_cpu_mask(cpu, mask) for-loop cpu over mask
*
@@ -311,6 +313,14 @@ static inline void __cpus_remap(cpumask_
bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

+#define cpus_relative(dst, orig, relmap) \
+ __cpus_relative(&(dst), &(orig), &(relmap), NR_CPUS)
+static inline void __cpus_relative(cpumask_t *dstp, const cpumask_t *origp,
+ const cpumask_t *relmapp, int nbits)
+{
+ bitmap_relative(dstp->bits, origp->bits, relmapp->bits, nbits);
+}
+
#if NR_CPUS > 1
#define for_each_cpu_mask(cpu, mask) \
for ((cpu) = first_cpu(mask); \
--- 2.6.24-mm1.orig/lib/bitmap.c 2008-02-04 10:41:35.656945848 -0800
+++ 2.6.24-mm1/lib/bitmap.c 2008-02-14 03:18:08.190311785 -0800
@@ -698,6 +698,69 @@ int bitmap_bitremap(int oldbit, const un
}
EXPORT_SYMBOL(bitmap_bitremap);

+/**
+ * bitmap_relative - translate one bitmap relative to another
+ * @dst: resulting translated bitmap
+ * @orig: original untranslated bitmap
+ * @relmap: bitmap relative to which translated
+ * @bits: number of bits in each of these bitmaps
+ *
+ * Set the n-th bit of @dst iff there exists some m such that the
+ * n-th bit of @relmap is set, the m-th bit of @orig is is set,
+ * and the n-th bit of @relmap is the m-th set bit of @relmap.
+ * (If you understood the previous sentence the first time your
+ * read it, you're overqualified for your current job.)
+ *
+ * Example:
+ * Let's say @relmap has bits 30-39 set, and @orig has bits
+ * 1, 3, 5, 7, 9 and 11 set. Then on return from this routine,
+ * @dst will have bits 31, 33, 35, 37 and 39 set.
+ *
+ * When bit 0 is set in @orig, it means turn on the bit in
+ * @dst corresponding to whatever is the first bit (if any)
+ * that is turned on in @relmap. Since bit 0 was off in the
+ * above example, we leave off that bit (bit 30) in @dst.
+ *
+ * When bit 1 is set in @orig (as in the above example), it
+ * means turn on the bit in @dst corresponding to whatever
+ * is the second bit that is turned on in @relmap. The second
+ * bit in @relmap that was turned on in the above example was
+ * bit 31, so we turned on bit 31 in @dst.
+ *
+ * Similarly, we turned on bits 33, 35, 37 and 39 in @dst,
+ * because they were the 4th, 6th, 8th and 10th set bits
+ * set in @relmap, and the 4th, 6th, 8th and 10th bits of
+ * @orig (i.e. bits 3, 5, 7 and 9) were also set.
+ *
+ * When bit 11 is set in @orig, it means turn on the bit in
+ * @dst corresponding to whatever is the twelth bit that is
+ * turned on in @relmap. In the above example, there were
+ * only ten bits turned on in @relmap (30..39), so that bit
+ * 11 was set in @orig had no affect on @dst.
+ *
+ * If either of @orig or @relmap is empty (no set bits), then
+ * @dst will be returned empty.
+ *
+ * All bits in @dst not set by the above rule are cleared.
+ */
+void bitmap_relative(unsigned long *dst, const unsigned long *orig,
+ const unsigned long *relmap, int bits)
+{
+ int n, m; /* same meaning as in above comment */
+
+ bitmap_zero(dst, bits);
+
+ m = 0;
+ for (n = find_first_bit(relmap, bits);
+ n < bits;
+ n = find_next_bit(relmap, bits, n + 1)) {
+ /* m == bitmap_pos_to_ord(relmap, n, bits) */
+ if (test_bit(m, orig))
+ set_bit(n, dst);
+ m++;
+ }
+}
+
/*
* Common code for bitmap_*_region() routines.
* bitmap: array of unsigned longs corresponding to the bitmap

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2008-02-14 14:11:23

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Hi Paul,

> The following adds one more bitmap operator, with the usual
> cpumask and nodemask wrappers. This operator computes one
> bitmap relative to another. If the n-th bit in the origin
> mask is set, then the m-th bit of the destination mask will
> be set, where m is the position of the n-th set bit in the
> relative mask.

this patch is very interesting.

BTW:
Are you think this function name must be "relative" ?
I think "relative" implies ordered set.
but linux bitmap is abstraction of unordered set.
if possible, i prefer another name.

end up, bitmap_relative is map as pecial case, i think.

> This is initially to be used by the MPOL_F_RELATIVE_NODES
> facility being considered for mm/mempolicy.c.

agreed with node_relative idea.
I think this is very useful.

Thanks!

2008-02-14 16:35:54

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Kosaki-san wrote:
> i prefer another name [!relative].

Any suggestions?

I'll give the name some thought myself.
I like good names, and this is the right
time to get this one right.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-14 19:28:18

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Excellent. Relative node masks are a nice feature and may allow us to even
cut down the size of the bitmasks for configurations with large numbers of
nodes.

Reviewed-by: Christoph Lameter <[email protected]>

ccing Mike since he may need something similar for cpu masks which are
getting a bit too large for 4k systems on x86_64.

2008-02-14 20:03:03

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

On Thursday 14 February 2008 20:27:59 Christoph Lameter wrote:
> Excellent. Relative node masks are a nice feature and may allow us to even
> cut down the size of the bitmasks for configurations with large numbers of
> nodes.
>
> Reviewed-by: Christoph Lameter <[email protected]>
>
> ccing Mike since he may need something similar for cpu masks which are
> getting a bit too large for 4k systems on x86_64.

You're saying the kernel should use these relative masks internally?

That means it would be impossible to run workloads that use the complete
machine because you couldn't represent all nodes.

Doesn't sound like a good idea.

-Andi

2008-02-14 20:07:07

by Christoph Lameter

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

On Thu, 14 Feb 2008, Andi Kleen wrote:

> You're saying the kernel should use these relative masks internally?

There is just some thoughts about this. Did not have time to look into the
details. Mike?

> That means it would be impossible to run workloads that use the complete
> machine because you couldn't represent all nodes.

Not sure how they are addressing this.

2008-02-14 20:26:33

by Mike Travis

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Christoph Lameter wrote:
> On Thu, 14 Feb 2008, Andi Kleen wrote:
>
>> You're saying the kernel should use these relative masks internally?
>
> There is just some thoughts about this. Did not have time to look into the
> details. Mike?

There are a few places where the entire cpumask is not needed. For
example, in the area of core siblings on a node. There's a limit
to how many cores/threads can be on a node and the full 4k cpumask
is not needed. How this pertains to this new functionality I'm
not sure yet.

>
>> That means it would be impossible to run workloads that use the complete
>> machine because you couldn't represent all nodes.
>
> Not sure how they are addressing this.

2008-02-14 20:30:24

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

On Thursday 14 February 2008 21:25:59 Mike Travis wrote:
> Christoph Lameter wrote:
> > On Thu, 14 Feb 2008, Andi Kleen wrote:
> >
> >> You're saying the kernel should use these relative masks internally?
> >
> > There is just some thoughts about this. Did not have time to look into the
> > details. Mike?
>
> There are a few places where the entire cpumask is not needed. For
> example, in the area of core siblings on a node. There's a limit
> to how many cores/threads can be on a node and the full 4k cpumask
> is not needed. How this pertains to this new functionality I'm
> not sure yet.

That would require that the BIOS enumerates the CPUs in a way that
the cores of a socket are continuous. While that's usually true
I don't think there's a guarantee. In theory they could be all scattered.

Ok I theory Linux could remap later but that seems hardly worth
the trouble.

I would rather just use arrays of integers in this case with a reasonable fixed
upper limit (e.g. 16 or 32 -- if there are ever >32 thread x86 CPUs presumably they will
require an updated cpufreq driver too...)


-Andi

2008-02-14 20:56:09

by Ray Lee

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

On Thu, Feb 14, 2008 at 8:35 AM, Paul Jackson <[email protected]> wrote:
> Kosaki-san wrote:
> > i prefer another name [!relative].
>
> Any suggestions?
>
> I'll give the name some thought myself.
> I like good names, and this is the right
> time to get this one right.

'Relative map' implies a constant offset. What you have there is just
a map as relmap could be sparse (which, btw, would have been nice to
have in the example).

map_bitmap violates your naming convention, bitmap_map isn't all that
clear, bitmap_remap is taken, and while it is one-to-one and onto, I
think calling it bitmap_bijection would lose everyone except the
mathematicians. bitmap_onto? bitmap_map_onto? bitmap_map_bitmap_onto?

bitmap_read_my_kernel_doc?

Minor suggestion:
+ * and the n-th bit of @relmap is the m-th set bit of @relmap.

Perhaps s/is the/is also the/, so that the reader doesn't try to
second guess if you accidentally wrote @relmap twice instead of one of
them being @orig.

2008-02-14 21:17:19

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

On Thu, 14 Feb 2008, Ray Lee wrote:

> map_bitmap violates your naming convention, bitmap_map isn't all that
> clear, bitmap_remap is taken, and while it is one-to-one and onto, I
> think calling it bitmap_bijection would lose everyone except the
> mathematicians. bitmap_onto? bitmap_map_onto? bitmap_map_bitmap_onto?
>

Whatever this operation ends up being called should be mirrored in the
name of the new mempolicy flag being introduced, so this will need to be
finalized before MPOL_F_RELATIVE_NODES can be proposed.

> Minor suggestion:
> + * and the n-th bit of @relmap is the m-th set bit of @relmap.
>
> Perhaps s/is the/is also the/, so that the reader doesn't try to
> second guess if you accidentally wrote @relmap twice instead of one of
> them being @orig.
>

There's also an extra "is" in the description:

--- 2.6.24-mm1.orig/lib/bitmap.c 2008-02-04 10:41:35.656945848 -0800
+++ 2.6.24-mm1/lib/bitmap.c 2008-02-14 03:18:08.190311785 -0800
@@ -698,6 +698,69 @@ int bitmap_bitremap(int oldbit, const un
}
EXPORT_SYMBOL(bitmap_bitremap);

+/**
+ * bitmap_relative - translate one bitmap relative to another
+ * @dst: resulting translated bitmap
+ * @orig: original untranslated bitmap
+ * @relmap: bitmap relative to which translated
+ * @bits: number of bits in each of these bitmaps
+ *
+ * Set the n-th bit of @dst iff there exists some m such that the
+ * n-th bit of @relmap is set, the m-th bit of @orig is is set,

on the last line of this snippet.

David

2008-02-15 00:24:53

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Hi Ray,

> > > i prefer another name [!relative].
> >
> > Any suggestions?
> >
> > I'll give the name some thought myself.
> > I like good names, and this is the right
> > time to get this one right.
>
> 'Relative map' implies a constant offset. What you have there is just
> a map as relmap could be sparse (which, btw, would have been nice to
> have in the example).
>
> map_bitmap violates your naming convention, bitmap_map isn't all that
> clear, bitmap_remap is taken, and while it is one-to-one and onto, I
> think calling it bitmap_bijection would lose everyone except the
> mathematicians. bitmap_onto? bitmap_map_onto? bitmap_map_bitmap_onto?

Thanks for many idea.
I like bitmap_onto and/or bitmap_map_onto. :)


- kosaki

2008-02-15 09:54:59

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Andi, responding to Christoph, wrote:
> You're saying the kernel should use these relative masks internally?

In a conversation with Christoph Thursday afternoon, I got the
impression that he liked the idea of using some more compact
representation of sparse collections of CPUs (or Nodes) than cpumask_t.
We end up with some big arrays of big cpumask_t's and nodemask_t's,
mostly filled with zero bits ... wasting memory.

I'll agree with Christoph on that, though I'd agree with you that this
bitmap relative operator is probably -not- the key to that more compact
representation.

The place that my thinking grinds to a halt on this is dealing with the
problem that all the more compact representations of a collection of
CPU numbers that I can think of either (1) are variable length (usually
leading to dynamic storage), or (2) impose some artificial restriction
on how many elements they can represent, or perhaps (3) use some
complex data structure to enumerate just the actual elements of the
power set of all cpus (or all nodes) that we actually use, which is
vastly less than the set of all possible subsets of such.

You address this artificial restriction yourself, in another message:
> I would rather just use arrays of integers in this case with a
> reasonable fixed upper limit (e.g. 16 or 32 -- if there are ever
> >32 thread x86 CPUs presumably they will require an updated cpufreq
> driver too...)

That might be the key here. Perhaps as you suggest we can identify
some places where we can replace sparse cpumask_t's with (fixed length?)
arrays of integers, with little or no practical loss of generality, and
with nice reductions in memory footprint.

Mike Travis <[email protected]> -- do you see some places where replacing
cpumask_t's with fixed arrays of 16 or 32 CPU numbers would be a big
enough win to be worth the effort?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-15 10:25:44

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Kosaki-san wrote:
> I like bitmap_onto

I like it too. And easier to type than bitmap_surjection ;).

> relmap could be sparse (which, btw, would have been nice to
> have in the example)

Good idea.

I'll see what I can cook up.

Thank-you, sir.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-15 10:47:33

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

David wrote:
> There's also an extra "is" in the description:

ah so so -- thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-15 10:58:43

by Paul Jackson

[permalink] [raw]
Subject: Re: [RFC] bitmap relative operator for mempolicy extensions

Ray wrote:
> bitmap_onto?

Ah - you were the original person to propose this. Thank-you.

> bitmap_read_my_kernel_doc?

;).

> Minor suggestion:
> + * and the n-th bit of @relmap is the m-th set bit of @relmap.

I'll ponder that confusing line - thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214