2003-09-10 21:36:20

by Jesse Barnes

Subject: [PATCH] you have how many nodes??

Needed this for booting on a 128 node system.

Thanks,
Jesse


diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h Wed Sep 10 14:31:09 2003
+++ b/include/linux/mm.h Wed Sep 10 14:31:09 2003
@@ -324,7 +324,7 @@
* sets it, so none of the operations on it need to be atomic.
*/
#define NODE_SHIFT 4
-#define ZONE_SHIFT (BITS_PER_LONG - 8)
+#define ZONE_SHIFT (BITS_PER_LONG - 10)

struct zone;
extern struct zone *zone_table[];
diff -Nru a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h Wed Sep 10 14:31:09 2003
+++ b/include/linux/mmzone.h Wed Sep 10 14:31:09 2003
@@ -311,8 +311,8 @@

#include <asm/mmzone.h>

-/* page->zone is currently 8 bits ... */
-#define MAX_NR_NODES (255 / MAX_NR_ZONES)
+/* page->zone is currently 10 bits ... */
+#define MAX_NR_NODES NR_NODES

#endif /* !CONFIG_DISCONTIGMEM */


2003-09-10 22:34:10

by Andrew Morton

Subject: Re: [PATCH] you have how many nodes??

Jesse Barnes wrote:
>
> Needed this for booting on a 128 node system.
>
> -#define ZONE_SHIFT (BITS_PER_LONG - 8)
> +#define ZONE_SHIFT (BITS_PER_LONG - 10)

eeek, ia32 just lost another two page flags.

This stuff needs to be controlled by per-arch and per-subarch header files.

Instead of going backwards like this we'd like to actually free up _more_
bits in page->flags. The worst (and controlling) case is on 32-bit NUMA:
eight nodes, three zones per node. That's five bits, leaving us 27 page
flags.

So we'd need

include/asm-foo/zonestuff.h:

#define ARCH_MAX_NODES_SHIFT 3 /* Up to 8 nodes */
#define ARCH_MAX_ZONES_SHIFT 2 /* Up to 4 zones per node */


and all the mm.h/mmzone.h constants use those two.


I think. We could just say "dang numaq needs five bits", so:


#if BITS_PER_LONG == 32
#define ZONE_SHIFT 5
#else
#define ZONE_SHIFT 10
#endif


Bit sleazy, but I think that would suffice.
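
The trade-off above is easier to see with the packing written out. The following is a minimal sketch, not the kernel's code: it assumes the zone-table index lives in the top bits of the flags word and uses hypothetical `sketch_` names throughout. Widening the field from 8 to 10 bits moves the shift down by two, which is where the two lost page flags come from.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names; the kernel's own helpers may differ. */
#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_ZONE_BITS     8   /* assumed width of the zone index field */
#define SKETCH_ZONE_SHIFT    (SKETCH_BITS_PER_LONG - SKETCH_ZONE_BITS)

/* Store a zone-table index in the top SKETCH_ZONE_BITS bits of flags. */
static unsigned long sketch_set_zone(unsigned long flags, unsigned long zone_idx)
{
    assert(zone_idx < (1UL << SKETCH_ZONE_BITS));
    flags &= ~(~0UL << SKETCH_ZONE_SHIFT);   /* clear any old index */
    return flags | (zone_idx << SKETCH_ZONE_SHIFT);
}

/* Recover the index; everything below the shift stays free for page flags. */
static unsigned long sketch_get_zone(unsigned long flags)
{
    return flags >> SKETCH_ZONE_SHIFT;
}
```

With this layout, `sketch_get_zone(sketch_set_zone(flags, idx))` returns `idx` and the low bits of `flags` are untouched.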

2003-09-10 22:54:19

by Andrew Morton

Subject: Re: [PATCH] you have how many nodes??

Jesse Barnes wrote:
>
> On Wed, Sep 10, 2003 at 03:12:54PM -0700, Andrew Morton wrote:
> > I think. We could just say "dang numaq needs five bits", so:
> >
> >
> > #if BITS_PER_LONG == 32
> > #define ZONE_SHIFT 5
> > #else
> > #define ZONE_SHIFT 10
> > #endif
>
> That's fine with me, do you want me to rediff and send a new patch?
>

Well your patch as it stands would appear to break NUMAQ builds, due to
NUMAQ setting MAX_NUMNODES directly in the arch code. ia64 is using
another layer of macroification via NR_NODES instead.

MAX_NUMNODES, NR_NODES and MAX_NR_NODES appear to be a bit of a mess, and
they should all be replaced with shift distances anyway.

Could you please get together with Martin Bligh, come up with something
which works on NUMAQ and your 128 CPU PDA and also cast an eye across the
other architectures (sparc64, sh, ...)? It all needs a bit of thought and
a spring clean.

Thanks.

2003-09-10 22:36:45

by Jesse Barnes

Subject: Re: [PATCH] you have how many nodes??

On Wed, Sep 10, 2003 at 03:12:54PM -0700, Andrew Morton wrote:
> I think. We could just say "dang numaq needs five bits", so:
>
>
> #if BITS_PER_LONG == 32
> #define ZONE_SHIFT 5
> #else
> #define ZONE_SHIFT 10
> #endif

That's fine with me, do you want me to rediff and send a new patch?

Thanks,
Jesse

2003-09-11 00:51:53

by William Lee Irwin III

Subject: Re: [PATCH] you have how many nodes??

On Wed, Sep 10, 2003 at 03:12:54PM -0700, Andrew Morton wrote:
> Instead of going backwards like this we'd like to actually free up _more_
> bits in page->flags. The worst (and controlling) case is on 32-bit NUMA:
> eight nodes, three zones per node. That's five bits, leaving us 27 page
> flags.

The worst case for i386 NUMA is actually 16 nodes, which needs six bits
(four for nodes, two for zones) and leaves 26 page flags.


-- wli

2003-09-10 23:57:53

by Martin J. Bligh

Subject: Re: [PATCH] you have how many nodes??

>> > I think. We could just say "dang numaq needs five bits", so:
>> >
>> >
>> > # if BITS_PER_LONG == 32
>> > # define ZONE_SHIFT 5
>> > # else
>> > # define ZONE_SHIFT 10
>> > # endif
>>
>> That's fine with me, do you want me to rediff and send a new patch?
>
> Well your patch as it stands would appear to break NUMAQ builds, due to
> NUMAQ setting MAX_NUMNODES directly in the arch code. ia64 is using
> another layer of macroification via NR_NODES instead.
>
> MAX_NUMNODES, NR_NODES and MAX_NR_NODES appear to be a bit of a mess, and
> they should all be replaced with shift distances anyway.

;-)

Yes, it's a turgid mess.

I'd prefer to define things in terms of MAX_NUMNODES, and derive the shifts
from that if possible - much more intuitive to maintain.
But other than that I agree completely with you.

> Could you please get together with Martin Bligh, come up with something
> which works on NUMAQ and your 128 CPU PDA and also cast an eye across the
> other architectures (sparc64, sh, ...)? It all needs a bit of thought and
> a spring clean.

I'll have a look, I'm sure we can come up with something between us.

M.
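
Deriving the shifts from the counts, as suggested above, can be done entirely at compile time. This is a minimal sketch under stated assumptions: `SKETCH_SHIFT_FOR` is a hypothetical helper, not an existing kernel macro, and the node count of 16 is the i386 NUMA worst case mentioned earlier in the thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest s with (1 << s) >= n, for n up to 1024.
 * It is a plain constant expression, so it also works inside #if.
 */
#define SKETCH_SHIFT_FOR(n) \
    ((n) <= 1 ? 0 : (n) <= 2 ? 1 : (n) <= 4 ? 2 : (n) <= 8 ? 3 : \
     (n) <= 16 ? 4 : (n) <= 32 ? 5 : (n) <= 64 ? 6 : (n) <= 128 ? 7 : \
     (n) <= 256 ? 8 : (n) <= 512 ? 9 : 10)

#define SKETCH_MAX_NUMNODES 16   /* assumed: i386 NUMA worst case */
#define SKETCH_MAX_NR_ZONES 3    /* three zones per node, as in the thread */

#define SKETCH_NODES_SHIFT SKETCH_SHIFT_FOR(SKETCH_MAX_NUMNODES)
#define SKETCH_ZONES_SHIFT SKETCH_SHIFT_FOR(SKETCH_MAX_NR_ZONES)
#define SKETCH_ZONE_BITS   (SKETCH_NODES_SHIFT + SKETCH_ZONES_SHIFT)

/* 4 bits of node plus 2 bits of zone. */
static_assert(SKETCH_ZONE_BITS == 6, "16 nodes x 3 zones need 6 bits");
```

The maintainer then edits only the counts, and the field width follows from them.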

2003-09-11 00:03:22

by Jesse Barnes

Subject: Re: [PATCH] you have how many nodes??

On Wed, Sep 10, 2003 at 04:46:40PM -0700, Martin J. Bligh wrote:
> Yes, it's a turgid mess.
>
> I'd prefer to define things in terms of MAX_NUMNODES, and derive the shifts
> from that if possible - much more intuitive to maintain.
> But other than that I agree completely with you.

Yeah, I don't mind switching, should just be a search and replace.

> > Could you please get together with Martin Bligh, come up with something
> > which works on NUMAQ and your 128 CPU PDA and also cast an eye across the
> > other architectures (sparc64, sh, ...)? It all needs a bit of thought and
> > a spring clean.
>
> I'll have a look, I'm sure we can come up with something between us.

Cool, thanks.

Jesse

2003-09-16 00:34:52

by Matthew Dobson

Subject: Re: [PATCH] you have how many nodes??

Ok, I made an attempt to clean up this mess quite a while ago (2.5.47),
but that patch is utterly useless now. At Martin's urging I've created
a new series of patches to resolve this.

01 - Make sure MAX_NUMNODES is defined in one and only one place.
Remove superfluous definitions. Instead of defining MAX_NUMNODES in
asm/numnodes.h, we define NODES_SHIFT there. Then in linux/mmzone.h we
turn that NODES_SHIFT value into MAX_NUMNODES.

02 - Remove MAX_NR_NODES. This value is only used in a couple of
places, and it's incorrectly used in all those places as far as I can
tell. Replace with MAX_NUMNODES. Create MAX_NODES_SHIFT and use this
value to check NODES_SHIFT is appropriate. A possible future patch
should make MAX_NODES_SHIFT vary based on 32 vs. 64 bit archs.

03 - Fix up the sh arch. sh defined NR_NODES; change it to use the
standard MAX_NUMNODES instead.

04 - Fix up the arm arch. This needs to be reviewed. Relatively
straightforward replacement of NR_NODES with standard MAX_NUMNODES.

05 - Fix up the ia64 arch. This *definitely* needs to be reviewed.
This code made my head hurt. I think I may have gotten it right.
Totally untested.
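
The scheme in 01 and 02 can be sketched roughly as follows. This is an illustration only, not the patch itself: the `NODES_SHIFT` value is an assumed stand-in for what an arch header would supply, the word-size test via `ULONG_MAX` is a userspace substitute for `BITS_PER_LONG`, the limits 6 and 10 are the ones used in the follow-up patch in this thread, and `sketch_node_is_valid` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for an arch's asm/numnodes.h (assumed value: up to 16 nodes). */
#define NODES_SHIFT 4

/* Generic side: default to a single node, then derive the count once. */
#ifndef NODES_SHIFT
#define NODES_SHIFT 0
#endif
#define MAX_NUMNODES (1 << NODES_SHIFT)

/* Upper bound on the shift, depending on the width of the flags word. */
#if ULONG_MAX == 0xffffffffUL
#define MAX_NODES_SHIFT 6
#else
#define MAX_NODES_SHIFT 10
#endif

/* Reject an arch setting that does not fit in the available bits. */
#if NODES_SHIFT > MAX_NODES_SHIFT
#error "NODES_SHIFT is larger than MAX_NODES_SHIFT"
#endif

static_assert(MAX_NUMNODES == 16, "count is derived from the shift");

/* Hypothetical helper: range check against the single derived constant. */
static int sketch_node_is_valid(int nid)
{
    return nid >= 0 && nid < MAX_NUMNODES;
}
```

Because the count is always `1 << NODES_SHIFT`, there is a single source of truth per arch and the generic code can check it against the space left in the flags word.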

Cheers!

-Matt

Jesse Barnes wrote:
> On Wed, Sep 10, 2003 at 04:46:40PM -0700, Martin J. Bligh wrote:
>
>>Yes, it's a turgid mess.
>>
>>I'd prefer to define things in terms of MAX_NUMNODES, and derive the shifts
>>from that if possible - much more intuitive to maintain.
>>But other than that I agree completely with you.
>
>
> Yeah, I don't mind switching, should just be a search and replace.
>
>
>>>Could you please get together with Martin Bligh, come up with something
>>>which works on NUMAQ and your 128 CPU PDA and also cast an eye across the
>>>other architectures (sparc64, sh, ...)? It all needs a bit of thought and
>>>a spring clean.
>>
>>I'll have a look, I'm sure we can come up with something between us.
>
>
> Cool, thanks.
>
> Jesse


2003-09-19 22:04:51

by Matthew Dobson

Subject: Re: [PATCH] you have how many nodes??

diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/asm-arm/memory.h linux-2.6.0-test5-numnodes_update/include/asm-arm/memory.h
--- linux-2.6.0-test5-mm3/include/asm-arm/memory.h Fri Sep 19 14:05:33 2003
+++ linux-2.6.0-test5-numnodes_update/include/asm-arm/memory.h Fri Sep 19 14:52:11 2003
@@ -84,27 +84,24 @@ static inline void *phys_to_virt(unsigne

#define PHYS_TO_NID(addr) (0)

-#else
+#else /* CONFIG_DISCONTIGMEM */
+
/*
* This is more complex. We have a set of mem_map arrays spread
* around in memory.
*/
-#include <asm/numnodes.h>
-#define NUM_NODES (1 << NODES_SHIFT)
+#include <linux/numa.h>

#define page_to_pfn(page) \
(( (page) - page_zone(page)->zone_mem_map) \
+ page_zone(page)->zone_start_pfn)
-
#define pfn_to_page(pfn) \
(PFN_TO_MAPBASE(pfn) + LOCAL_MAP_NR((pfn) << PAGE_SHIFT))
-
-#define pfn_valid(pfn) (PFN_TO_NID(pfn) < NUM_NODES)
+#define pfn_valid(pfn) (PFN_TO_NID(pfn) < MAX_NUMNODES)

#define virt_to_page(kaddr) \
(ADDR_TO_MAPBASE(kaddr) + LOCAL_MAP_NR(kaddr))
-
-#define virt_addr_valid(kaddr) (KVADDR_TO_NID(kaddr) < NUM_NODES)
+#define virt_addr_valid(kaddr) (KVADDR_TO_NID(kaddr) < MAX_NUMNODES)

/*
* Common discontigmem stuff.
@@ -112,9 +109,7 @@ static inline void *phys_to_virt(unsigne
*/
#define PHYS_TO_NID(addr) PFN_TO_NID((addr) >> PAGE_SHIFT)

-#undef NUM_NODES
-
-#endif
+#endif /* !CONFIG_DISCONTIGMEM */

/*
* For BIO. "will die". Kill me when bio_to_phys() and bvec_to_phys() die.
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/asm-ia64/nodedata.h linux-2.6.0-test5-numnodes_update/include/asm-ia64/nodedata.h
--- linux-2.6.0-test5-mm3/include/asm-ia64/nodedata.h Fri Sep 19 14:05:33 2003
+++ linux-2.6.0-test5-numnodes_update/include/asm-ia64/nodedata.h Fri Sep 19 14:52:13 2003
@@ -8,13 +8,11 @@
* Copyright (c) 2002 Erich Focht <[email protected]>
* Copyright (c) 2002 Kimio Suganuma <[email protected]>
*/
-
-
#ifndef _ASM_IA64_NODEDATA_H
#define _ASM_IA64_NODEDATA_H

-
-#include <linux/mmzone.h>
+#include <linux/numa.h>
+#include <asm/mmzone.h>

/*
* Node Data. One of these structures is located on each node of a NUMA system.
@@ -24,7 +22,7 @@ struct pglist_data;
struct ia64_node_data {
short active_cpu_count;
short node;
- struct pglist_data *pg_data_ptrs[MAX_NUMNODES];
+ struct pglist_data *pg_data_ptrs[MAX_NUMNODES];
struct page *bank_mem_map_base[NR_BANKS];
struct ia64_node_data *node_data_ptrs[MAX_NUMNODES];
short node_id_map[NR_BANKS];
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/asm-ia64/numa.h linux-2.6.0-test5-numnodes_update/include/asm-ia64/numa.h
--- linux-2.6.0-test5-mm3/include/asm-ia64/numa.h Fri Sep 19 14:05:33 2003
+++ linux-2.6.0-test5-numnodes_update/include/asm-ia64/numa.h Fri Sep 19 14:52:13 2003
@@ -13,9 +13,9 @@

#ifdef CONFIG_NUMA

-#include <linux/mmzone.h>
-
+#include <linux/numa.h>
#include <linux/cache.h>
+
extern volatile char cpu_to_node_map[NR_CPUS] __cacheline_aligned;
extern volatile unsigned long node_to_cpu_mask[MAX_NUMNODES] __cacheline_aligned;

diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/asm-ia64/sn/pda.h linux-2.6.0-test5-numnodes_update/include/asm-ia64/sn/pda.h
--- linux-2.6.0-test5-mm3/include/asm-ia64/sn/pda.h Fri Sep 19 14:05:33 2003
+++ linux-2.6.0-test5-numnodes_update/include/asm-ia64/sn/pda.h Fri Sep 19 14:52:13 2003
@@ -10,7 +10,7 @@

#include <linux/config.h>
#include <linux/cache.h>
-#include <linux/mmzone.h>
+#include <linux/numa.h>
#include <asm/percpu.h>
#include <asm/system.h>
#include <asm/processor.h>
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/linux/mm.h linux-2.6.0-test5-numnodes_update/include/linux/mm.h
--- linux-2.6.0-test5-mm3/include/linux/mm.h Fri Sep 19 14:05:37 2003
+++ linux-2.6.0-test5-numnodes_update/include/linux/mm.h Fri Sep 19 14:52:04 2003
@@ -323,7 +323,6 @@ static inline void put_page(struct page
* The zone field is never updated after free_area_init_core()
* sets it, so none of the operations on it need to be atomic.
*/
-#define NODE_SHIFT 4
#define ZONE_SHIFT (BITS_PER_LONG - 8)

struct zone;
diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/linux/mmzone.h linux-2.6.0-test5-numnodes_update/include/linux/mmzone.h
--- linux-2.6.0-test5-mm3/include/linux/mmzone.h Fri Sep 19 14:05:37 2003
+++ linux-2.6.0-test5-numnodes_update/include/linux/mmzone.h Fri Sep 19 14:52:06 2003
@@ -10,14 +10,8 @@
#include <linux/wait.h>
#include <linux/cache.h>
#include <linux/threads.h>
+#include <linux/numa.h>
#include <asm/atomic.h>
-#ifdef CONFIG_DISCONTIGMEM
-#include <asm/numnodes.h>
-#endif
-#ifndef NODES_SHIFT
-#define NODES_SHIFT 0
-#endif
-#define MAX_NUMNODES (1 << NODES_SHIFT)

/* Free memory management - zoned buddy allocator. */
#ifndef CONFIG_FORCE_MAX_ZONEORDER
@@ -313,12 +307,19 @@ extern struct pglist_data contig_page_da
#else /* CONFIG_DISCONTIGMEM */

#include <asm/mmzone.h>
+
+#if BITS_PER_LONG == 32
/*
- * page->zone is currently 8 bits
- * there are 3 zones (2 bits)
- * this leaves 8-2=6 bits for nodes
+ * with 32 bit flags field, page->zone is currently 8 bits.
+ * there are 3 zones (2 bits) and this leaves 8-2=6 bits for nodes.
*/
#define MAX_NODES_SHIFT 6
+#elif BITS_PER_LONG == 64
+/*
+ * with 64 bit flags field, there's plenty of room.
+ */
+#define MAX_NODES_SHIFT 10
+#endif

#endif /* !CONFIG_DISCONTIGMEM */

diff -Nurp --exclude-from=/home/mcd/.dontdiff linux-2.6.0-test5-mm3/include/linux/numa.h linux-2.6.0-test5-numnodes_update/include/linux/numa.h
--- linux-2.6.0-test5-mm3/include/linux/numa.h Wed Dec 31 16:00:00 1969
+++ linux-2.6.0-test5-numnodes_update/include/linux/numa.h Fri Sep 19 14:52:04 2003
@@ -0,0 +1,16 @@
+#ifndef _LINUX_NUMA_H
+#define _LINUX_NUMA_H
+
+#include <linux/config.h>
+
+#ifdef CONFIG_DISCONTIGMEM
+#include <asm/numnodes.h>
+#endif
+
+#ifndef NODES_SHIFT
+#define NODES_SHIFT 0
+#endif
+
+#define MAX_NUMNODES (1 << NODES_SHIFT)
+
+#endif /* _LINUX_NUMA_H */


Attachments:
numnodes_update.patch (5.97 kB)