nodemask_t-05-x86_64.patch - Changes to x86_64 specific code.
Untested. Code review & testing requested.
I'll be surprised if the following line works:
nodemask_t node_offline_map = nodes_complement(node_online_map);
1) Doesn't nodes_complement return void, and work in place?
2) It might set bits above MAX_NUMNODES, if MAX_NUMNODES isn't a word size multiple.
I am less sure of (2) - the exact details of how the unused bits of
a bitmask are handled still confuse me. But this would be one of the
very rare situations I can find that would actually be sensitive to
any confusion here - most places don't set bits that aren't already
set in some mask, or are careful to initialize a mask with bits set
only in positions 0 to MAX_NUMNODES-1.
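To make the concern in (2) concrete, here's a standalone sketch (purely
illustrative - a plain 32-bit word standing in for a one-word nodemask,
not the kernel's types):

	#include <stdio.h>

	#define MAX_NUMNODES 8	/* deliberately not a word-size multiple */

	int main(void)
	{
		unsigned int online  = 0x3u;	/* nodes 0 and 1 online */
		unsigned int offline = ~online;	/* whole-word complement */

		/* bits 8..31 are now set, though nodes 8..31 don't exist */
		printf("offline = 0x%08x\n", offline);	/* 0xfffffffc */
		return 0;
	}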
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Mon, Mar 22, 2004 at 11:08:50PM -0800, Paul Jackson wrote:
> I'll be surprised if the following line works:
> nodemask_t node_offline_map = nodes_complement(node_online_map);
> 1) Doesn't nodes_complement return void, and work in place?
> 2) It might set bits above MAX_NUMNODES, if MAX_NUMNODES isn't a word size multiple.
> I am less sure of (2) - the exact details of how the unused bits of
> a bitmask are handled still confuse me. But this would be one of the
> very rare situations I can find that would actually be sensitive to
> any confusion here - most places don't set bits that aren't already
> set in some mask, or are careful to initialize a mask with bits set
> only in positions 0 to MAX_NUMNODES-1.
In general I attempted to model things after 3-address code.
bitmap_complement() is a glaring inconsistency I wouldn't mind seeing
shored up with the rest (though I guess it's only got 2 operands).
-- wli
Yes - making the *_complement ops into two operand (dst, src) would be a
good idea, Bill. Thanks. I will likely include that in what I'm doing
now.
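Something along these lines, say (just a sketch of the two-operand
form, leaning on the kernel's BITS_TO_LONGS()/BITS_PER_LONG helpers;
not a final signature):

	static inline void bitmap_complement(unsigned long *dst,
				const unsigned long *src, int nbits)
	{
		int k, lim = BITS_TO_LONGS(nbits);

		for (k = 0; k < lim; k++)
			dst[k] = ~src[k];
		/* clear any unused high bits in the last word */
		if (nbits % BITS_PER_LONG)
			dst[lim - 1] &= (1UL << (nbits % BITS_PER_LONG)) - 1;
	}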
Meanwhile, Matthew's patch 5/7 appears broken here.
My current understanding of the complement op is that it is broken for
non-word-multiple sizes. There's a good chance I'm still confused on
this matter.
It might make sense to redo this particular bit of offline logic not by
using *_complement, but rather by looping over all nodes, and only
acting if not online, thus avoiding the *_complement() operator for now.
I have not thought through the performance implications of such a code
inversion, however.
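Roughly this shape (a sketch, assuming the usual node_online() test;
the loop body is whatever the offline logic needs to do):

	int node;

	for (node = 0; node < MAX_NUMNODES; node++) {
		if (node_online(node))
			continue;
		/* ... act on the offline node here ... */
	}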
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Tue, Mar 23, 2004 at 01:36:50PM -0800, Paul Jackson wrote:
> My current understanding of the complement op is that it is broken for
> non-word-multiple sizes. There's a good chance I'm still confused on
> this matter.
> It might make sense to redo this particular bit of offline logic not by
> using *_complement, but rather by looping over all nodes, and only
> acting if not online, thus avoiding the *_complement() operator for now.
> I have not thought through the performance implications of such a code
> inversion, however.
If someone's not treating the leftover bits as "don't cares", then
there's a bug. Or so goes the current convention. I think originally I
had avoided touching the bits at the end; then the "don't cares"
suggestion was incorporated. This doesn't appear to be problematic on
smaller systems. Any idea where this bogon might be?
-- wli
> Any idea where this bogon might be?
Unless I'm missing something (quite possible), the following would
expose the complement operator's setting of high unused bits.
Suppose you have 32-bit unsigned longs and NR_CPUS == 24: you
complement some cpumask, and then ask if it is empty, ask if it is
equal to another cpumask that differs only in the unused high byte,
or ask its Hamming weight.
The arithmetic cpumask ops don't mask off the unused bits above NR_CPUS.
This might never be noticed, because cpumask_complement is the only op
that sets these unused bits, the complement op is very rarely used, and
none of its current uses can lead to the above bugs that I see offhand.
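A userspace sketch of that scenario (illustrative macros mimicking the
old one-word ops, 32-bit words assumed):

	#include <stdio.h>

	#define NR_CPUS 24			/* not a word-size multiple */

	/* old-style one-word ops: unused high bits are not masked */
	#define cpus_complement(map)	do { (map) = ~(map); } while (0)
	#define cpus_empty(map)		((map) == 0)

	int main(void)
	{
		unsigned int map = 0x00ffffffu;	/* all NR_CPUS cpus set */

		cpus_complement(map);		/* also sets bits 24..31 */

		/* expected empty, but map == 0xff000000 */
		printf("empty? %s\n", cpus_empty(map) ? "yes" : "no (bug)");
		return 0;
	}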
My workarea (vanilla 2.6.4 plus Matthew's nodemask patch) right now has:
1) this complement operator entirely removed
2) its rare uses recoded to not use that op
I'm also experimenting with adding a precondition to each operation,
that no bits above the specified size (NR_CPUS or whatever the
equivalent for NODES is) may be set on input mask arguments. Each
operation need do just enough to avoid setting any such unused high bits
of output masks, given the assumption that they aren't already set on
any input parameters.
With this precondition, operations need make no promise to avoid being
confused by such high bits if improperly set on an input. They make no
promise to mask off such bits in a result mask if passed in. Nor do
they promise not to mask them off - that's at the discretion of the
implementation.
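Expressed as a hypothetical debug check (illustrative only, and only
the single-word NR_CPUS < BITS_PER_LONG case):

	#include <assert.h>

	#define NR_CPUS 24
	#define VALID_BITS ((1UL << NR_CPUS) - 1)

	/* precondition: no bits above NR_CPUS set on any input mask */
	static void check_mask_precondition(unsigned long map)
	{
		assert((map & ~VALID_BITS) == 0);
	}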
This precondition actually fits the existing implementation quite well.
About all I'd have to do, that I see so far, is to add code to the
complement op to zero out the unused bits. Or just get rid of the
complement op entirely ... as I'm considering.
Ah - one other place needs fixing - the array cpumask CPU_MASK_ALL
initializer sets all bits, which is ok now, since the other array
ops more or less all limit their considerations to the first NR_CPUS
bits. But it wouldn't surprise me if a bug showed up here someday,
added in the future perhaps, in the case that NR_CPUS is not an
exact multiple of BITS_PER_LONG.
I still don't know how I will fix the CPU_MASK_ALL static initializer in
the multi-word case - since I can't put runtime code in it.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
At some point in the past, I wrote:
>> Any idea where this bogon might be?
On Tue, Mar 23, 2004 at 08:11:01PM -0800, Paul Jackson wrote:
> Unless I'm missing something (quite possible), the following would
> expose the complement operator's setting of high unused bits.
> Suppose you have 32-bit unsigned longs and NR_CPUS == 24: you
> complement some cpumask, and then ask if it is empty, ask if it is
> equal to another cpumask that differs only in the unused high byte,
> or ask its Hamming weight.
> The arithmetic cpumask ops don't mask off the unused bits above NR_CPUS.
> This might never be noticed, because cpumask_complement is the only op
> that sets these unused bits, the complement op is very rarely used, and
> none of its current uses can lead to the above bugs that I see offhand.
It's supposed to do this. bitmap_weight() masks off the trailing bits to
filter out "don't cares" from whole-word ops. Everything else operates
bitwise. Something like the following will keep the bits clear as you
wish, since apparently the more simplistic single-word ops aren't obeying
this. It's likely possible to better this by exploiting some invariants;
feel free to improve on it.
-- wli
Index: pgcl-2.6.5-rc2/include/asm-generic/cpumask_arith.h
===================================================================
--- pgcl-2.6.5-rc2.orig/include/asm-generic/cpumask_arith.h 2004-03-19 16:11:33.000000000 -0800
+++ pgcl-2.6.5-rc2/include/asm-generic/cpumask_arith.h 2004-03-23 20:34:54.000000000 -0800
@@ -6,6 +6,12 @@
* to contain the whole cpu bitmap.
*/
+#if NR_CPUS % BITS_PER_LONG
+#define __CPU_MASK_VALID_BITS__ ((1UL << (NR_CPUS % BITS_PER_LONG)) - 1)
+#else
+#define __CPU_MASK_VALID_BITS__ (~0UL)
+#endif
+
#define cpu_set(cpu, map) set_bit(cpu, &(map))
#define cpu_clear(cpu, map) clear_bit(cpu, &(map))
#define cpu_isset(cpu, map) test_bit(cpu, &(map))
@@ -14,15 +20,15 @@
#define cpus_and(dst,src1,src2) do { dst = (src1) & (src2); } while (0)
#define cpus_or(dst,src1,src2) do { dst = (src1) | (src2); } while (0)
#define cpus_clear(map) do { map = 0; } while (0)
-#define cpus_complement(map) do { map = ~(map); } while (0)
-#define cpus_equal(map1, map2) ((map1) == (map2))
-#define cpus_empty(map) ((map) == 0)
+#define cpus_complement(map) do { map = ~(map) & __CPU_MASK_VALID_BITS__; } while (0)
+#define cpus_equal(map1, map2) (!(((map1) ^ (map2)) & __CPU_MASK_VALID_BITS__))
+#define cpus_empty(map) (!((map) & __CPU_MASK_VALID_BITS__))
#define cpus_addr(map) (&(map))
#if BITS_PER_LONG == 32
-#define cpus_weight(map) hweight32(map)
+#define cpus_weight(map) hweight32((map) & __CPU_MASK_VALID_BITS__)
#elif BITS_PER_LONG == 64
-#define cpus_weight(map) hweight64(map)
+#define cpus_weight(map) hweight64((map) & __CPU_MASK_VALID_BITS__)
#endif
#define cpus_shift_right(dst, src, n) do { dst = (src) >> (n); } while (0)
@@ -39,11 +45,13 @@
#define CPU_MASK_NONE ((cpumask_t)0)
/* only ever use this for things that are _never_ used on large boxen */
-#define cpus_coerce(map) ((unsigned long)(map))
+#define cpus_coerce(map) ((unsigned long)(map) & __CPU_MASK_VALID_BITS__)
#define cpus_promote(map) ({ map; })
#define cpumask_of_cpu(cpu) ({ ((cpumask_t)1) << (cpu); })
#define first_cpu(map) __ffs(map)
#define next_cpu(cpu, map) find_next_bit(&(map), NR_CPUS, cpu + 1)
+#undef __CPU_MASK_VALID_BITS__
+
#endif /* __ASM_GENERIC_CPUMASK_ARITH_H */
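As a worked check against the earlier scenario: with NR_CPUS == 24 and
32-bit longs, __CPU_MASK_VALID_BITS__ comes out to 0x00ffffff, so
complementing a full 24-cpu map now yields 0, and cpus_empty()
correctly answers yes.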
On Tue, 23 Mar 2004 20:11:01 -0800,
Paul Jackson <[email protected]> wrote:
>I still don't know how I will fix the CPU_MASK_ALL static initializer in
>the multi-word case - since I can't put runtime code in it.
#define NR_CPUS_WORDS ((NR_CPUS+BITS_PER_LONG-1)/BITS_PER_LONG)
#define NR_CPUS_UNDEF (NR_CPUS_WORDS*BITS_PER_LONG-NR_CPUS)
#if NR_CPUS_UNDEF == 0
#define CPU_MASK_ALL { [0 ... NR_CPUS_WORDS-1] = ~0UL }
#else
#define CPU_MASK_ALL { [0 ... NR_CPUS_WORDS-2] = ~0UL, ~0UL << NR_CPUS_UNDEF }
#endif
> CPU_MASK_ALL ...
Yup - now you tell me ... ;).
I just got done figuring it out in a slightly variant way ... ('nbits'
is NR_CPUS or similar):
#define MASK_LAST_WORD(nbits) \
( \
	((nbits) % BITS_PER_LONG) ? \
		(1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL \
)
#define MASK_ALL(nbits) \
{ { \
[0 ... BITS_TO_LONGS(nbits)-1] = ~0UL, \
[BITS_TO_LONGS(nbits)-1] = MASK_LAST_WORD(nbits) \
} }
This way overwrites the last word of the mask, first putting ~0UL
in all words, then however many bits are needed in the last word.
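For instance, with nbits == 40 on a 32-bit machine, BITS_TO_LONGS(40)
is 2 and MASK_LAST_WORD(40) is (1UL<<8)-1 == 0xff, so MASK_ALL(40)
expands to:

	{ { [0 ... 1] = ~0UL, [1] = 0xff } }	/* { 0xffffffff, 0x000000ff } */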
Does your way work if NR_CPUS is less than BITS_PER_LONG?
Won't gcc complain upon seeing something like, for say
NR_CPUS = 4 on a 32 bit system:
{ [ 0 ... -1 ] = ~0UL, ~0UL << 28 }
with the errors and warnings:
error: empty index range in initializer
warning: excess elements in struct initializer
and shouldn't the last word be inverted: ~(~0UL << NR_CPUS_UNDEF) ?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Tue, 23 Mar 2004 20:11:01 -0800, Paul Jackson <[email protected]> wrote:
>> I still don't know how I will fix the CPU_MASK_ALL static initializer in
>> the multi-word case - since I can't put runtime code in it.
On Fri, Mar 26, 2004 at 04:06:34PM +1100, Keith Owens wrote:
> #define NR_CPUS_WORDS ((NR_CPUS+BITS_PER_LONG-1)/BITS_PER_LONG)
This looks suspiciously like BITS_TO_LONGS(NR_CPUS) =)
On Fri, Mar 26, 2004 at 04:06:34PM +1100, Keith Owens wrote:
> #define NR_CPUS_UNDEF (NR_CPUS_WORDS*BITS_PER_LONG-NR_CPUS)
> #if NR_CPUS_UNDEF == 0
> #define CPU_MASK_ALL { [0 ... NR_CPUS_WORDS-1] = ~0UL }
> #else
> #define CPU_MASK_ALL { [0 ... NR_CPUS_WORDS-2] = ~0UL, ~0UL << NR_CPUS_UNDEF }
> #endif
Hmm, shouldn't that last one be ~0UL >> NR_CPUS_UNDEF?
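(Worked check, with 32-bit longs and NR_CPUS == 40: NR_CPUS_UNDEF is
2*32 - 40 == 24, and ~0UL >> 24 == 0xff - exactly the 8 valid bits of
the last word.)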
-- wli
On Thu, 25 Mar 2004 23:14:12 -0800,
Paul Jackson <[email protected]> wrote:
>Does your way work if NR_CPUS is less than BITS_PER_LONG?
>Won't gcc complain upon seeing something like, for say
>NR_CPUS = 4 on a 32 bit system:
>
> { [ 0 ... -1 ] = ~0UL, ~0UL << 28 }
>
>with the errors and warnings:
>
> error: empty index range in initializer
> warning: excess elements in struct initializer
I only did one case, to concentrate on the value for the last word. A
full implementation has to cater for NR_CPUS < BITS_PER_LONG as well.
>and shouldn't the last word be inverted: ~(~0UL << NR_CPUS_UNDEF) ?
For big endian, ~0UL << NR_CPUS_UNDEF is right. For little endian, it
depends on how you represent an incomplete bit map. Is it represented
as a pure bit string, i.e. as if the arch were big endian? Or is it
represented as a mapping onto the bytes of the underlying long?
> how you represent an incomplete bit map
See the following files for the best available documentation of how
bitmaps are laid out. The representation is little-endian friendly, so
the big-endian arches had to struggle with it the most, and ended up
documenting it the best.
include/asm-ppc64/bitops.h
include/asm-s390/bitops.h
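The convention those files document boils down to this addressing rule
(an illustrative sketch of the generic non-atomic test, not the arch
code itself):

	/* bit n lives in word n/BITS_PER_LONG, at bit n%BITS_PER_LONG */
	static inline int test_bit_sketch(int n, const unsigned long *addr)
	{
		return (addr[n / BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1;
	}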
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Mon, Mar 29, 2004 at 03:08:46PM +1000, Keith Owens wrote:
> For big endian, ~0UL << NR_CPUS_UNDEF is right. For little endian, it
> depends on how you represent an incomplete bit map. Is it represented
> as a pure bit string, i.e. as if the arch were big endian? Or is it
> represented as a mapping onto the bytes of the underlying long?
Bitmaps are represented in Linux in such manners as befit wrong
(little) endian machines. I suggest shifting in the opposite direction.
-- wli