2011-05-18 21:25:11

by Eric Van Hensbergen

Subject: [PATCH 1/7] [RFC] Mainline BG/P platform support

The Linux kernel patches for the IBM BlueGene/P have been open source
for quite some time, but haven't been integrated into the mainline Linux
kernel source tree. This is the first of several patch series in which I
will attempt to clean up and mainline the already public patches. I
welcome feedback as well as any help I can get. I'm drawing on
the patches available for the IBM Compute Node kernel, the ZeptoOS project,
and the Kittyhawk project
(all available from http://wiki.bg.anl-external.org).

I'll be prioritizing core patches which are harder to keep current with
mainline due to merge conflicts and then slowly incorporating the drivers
and other extensions (if acceptable after community review).

I'll be maintaining the patchset in my kernel.org repository
(/pub/scm/linux/kernel/git/ericvh/bluegene.git) under the bluegene
branch with the source repos (zepto, kittyhawk, ibmcn) available in
respective branches. Ben - if you would prefer me to send pull requests
once we get rolling, I can switch to that -- otherwise I'll stick to
just submitting patches to the list assuming you'll pull them when they
become acceptable. Thanks for your attention reviewing these patches.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
MAINTAINERS | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 69f19f1..3ffca88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3863,6 +3863,14 @@ S: Maintained
F: arch/powerpc/platforms/40x/
F: arch/powerpc/platforms/44x/

+LINUX FOR POWERPC BLUEGENE/P
+M: Eric Van Hensbergen <[email protected]>
+W: http://bg-linux.anl-external.org/wiki/index.php/Main_Page
+L: [email protected]
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/bluegene.git
+S: Maintained
+F: arch/powerpc/platforms/44x/bgp*
+
LINUX FOR POWERPC EMBEDDED XILINX VIRTEX
M: Grant Likely <[email protected]>
W: http://wiki.secretlab.ca/index.php/Linux_on_Xilinx_Virtex
--
1.7.4.1


2011-05-18 21:25:13

by Eric Van Hensbergen

Subject: [PATCH 2/7] [RFC] add bluegene entry to cputable

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/kernel/cputable.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index b9602ee..0eb245e 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1732,6 +1732,20 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check = machine_check_440A,
.platform = "ppc440",
},
+ { /* Blue Gene/P */
+ .pvr_mask = 0xfffffff0,
+ .pvr_value = 0x52131880,
+ .cpu_name = "450 Blue Gene/P",
+ .cpu_features = CPU_FTRS_440x6,
+ .cpu_user_features = COMMON_USER_BOOKE |
+ PPC_FEATURE_HAS_FPU,
+ .mmu_features = MMU_FTR_TYPE_44x,
+ .icache_bsize = 32,
+ .dcache_bsize = 32,
+ .cpu_setup = __setup_cpu_460gt,
+ .machine_check = machine_check_440A,
+ .platform = "ppc440",
+ },
{ /* 460EX */
.pvr_mask = 0xffff0006,
.pvr_value = 0x13020002,
--
1.7.4.1

2011-05-18 21:25:19

by Eric Van Hensbergen

Subject: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

This patch adds save/restore register support for the BlueGene/P
double hummer FPU.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/ppc_asm.h | 39 ++++++++++++++++++++++++-----------
arch/powerpc/kernel/fpu.S | 8 +++---
arch/powerpc/platforms/44x/Kconfig | 9 ++++++++
3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 9821006..daa22bb 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -88,6 +88,13 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
REST_10GPRS(22, base)
#endif

+#ifdef CONFIG_BGP
+#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
+ ((rb)<<11)|(462<<1)
+#define STFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
+ ((rb)<<11)|(974<<1)
+#endif /* CONFIG_BGP */
+
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
#define SAVE_8GPRS(n, base) SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
@@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)

-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
-#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
-#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
-#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
-#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
-#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
-#define REST_16FPRS(n, base) REST_8FPRS(n, base); REST_8FPRS(n+8, base)
-#define REST_32FPRS(n, base) REST_16FPRS(n, base); REST_16FPRS(n+16, base)
+#ifdef CONFIG_BGP
+#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
+#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
+#else /* CONFIG_BGP */
+#define SAVE_FPR(n, b, base) stfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#define REST_FPR(n, b, base) lfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#endif /* CONFIG_BGP */
+
+#define SAVE_2FPRS(n, b, base) SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
+#define SAVE_4FPRS(n, b, base) SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
+#define SAVE_8FPRS(n, b, base) SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
+#define SAVE_16FPRS(n, b, base) SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
+#define SAVE_32FPRS(n, b, base) SAVE_16FPRS(n, b, base); \
+ SAVE_16FPRS(n+16, b, base)
+#define REST_2FPRS(n, b, base) REST_FPR(n, b, base); REST_FPR(n+1, b, base)
+#define REST_4FPRS(n, b, base) REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
+#define REST_8FPRS(n, b, base) REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
+#define REST_16FPRS(n, b, base) REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
+#define REST_32FPRS(n, b, base) REST_16FPRS(n, b, base); \
+ REST_16FPRS(n+16, b, base)

#define SAVE_VR(n,b,base) li b,THREAD_VR0+(16*(n)); stvx n,base,b
#define SAVE_2VRS(n,b,base) SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index de36955..9f11c66 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -30,7 +30,7 @@
BEGIN_FTR_SECTION \
b 2f; \
END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
- REST_32FPRS(n,base); \
+ REST_32FPRS(n,c,base); \
b 3f; \
2: REST_32VSRS(n,c,base); \
3:
@@ -39,13 +39,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
BEGIN_FTR_SECTION \
b 2f; \
END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
- SAVE_32FPRS(n,base); \
+ SAVE_32FPRS(n,c,base); \
b 3f; \
2: SAVE_32VSRS(n,c,base); \
3:
#else
-#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
-#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n,b,base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n,b,base)
#endif

/*
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index f485fc5f..24a515e 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -169,6 +169,15 @@ config YOSEMITE
help
This option enables support for the AMCC PPC440EP evaluation board.

+config BGP
+ bool "Blue Gene/P"
+ depends on 44x
+ default n
+ select PPC_FPU
+ select PPC_DOUBLE_FPU
+ help
+ This option enables support for the IBM BlueGene/P supercomputer.
+
config ISS4xx
bool "ISS 4xx Simulator"
depends on (44x || 40x)
--
1.7.4.1
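
The LFPDX and STFPDX macros in this patch hand-assemble instruction words because the assembler does not know the BG/P mnemonics. As a rough illustration only, the same bit packing can be written in C. The function names below are hypothetical and not part of the patch, and the 5-bit masks on the register fields are an addition the macros do not perform.

```c
#include <stdint.h>

/* Pack an X-form instruction word the way the LFPDX/STFPDX macros do:
 * primary opcode 31 in the top 6 bits, three 5-bit register fields,
 * then the extended opcode shifted left by one.
 * The "& 31u" masks are an extra safety net not present in the macros. */
static uint32_t bgp_xform(uint32_t frt, uint32_t ra, uint32_t rb, uint32_t xo)
{
	return (31u << 26) | ((frt & 31u) << 21) | ((ra & 31u) << 16) |
	       ((rb & 31u) << 11) | (xo << 1);
}

/* Hypothetical helper mirroring LFPDX (extended opcode 462). */
static uint32_t bgp_lfpdx(uint32_t frt, uint32_t ra, uint32_t rb)
{
	return bgp_xform(frt, ra, rb, 462u);
}

/* Hypothetical helper mirroring STFPDX (extended opcode 974). */
static uint32_t bgp_stfpdx(uint32_t frt, uint32_t ra, uint32_t rb)
{
	return bgp_xform(frt, ra, rb, 974u);
}
```

With all register fields zero, the two helpers differ only in the extended opcode bits, which is also the only difference between the two macros.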

2011-05-18 21:26:15

by Eric Van Hensbergen

Subject: [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P

BG/P nodes need to be configured for writethrough to work in SMP
configurations. This patch adds the right hooks in the MMU code
to make sure L1_WRITETHROUGH configurations are set up for BG/P.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/mmu-44x.h | 2 ++
arch/powerpc/kernel/head_44x.S | 24 ++++++++++++++++++++++--
arch/powerpc/kernel/misc_32.S | 15 +++++++++++++++
arch/powerpc/lib/copy_32.S | 10 ++++++++++
arch/powerpc/mm/44x_mmu.c | 7 +++++--
arch/powerpc/platforms/Kconfig | 5 +++++
arch/powerpc/platforms/Kconfig.cputype | 4 ++++
7 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index bf52d70..ca1b90c 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -8,6 +8,7 @@

#define PPC44x_MMUCR_TID 0x000000ff
#define PPC44x_MMUCR_STS 0x00010000
+#define PPC44x_MMUCR_U2 0x00200000

#define PPC44x_TLB_PAGEID 0
#define PPC44x_TLB_XLAT 1
@@ -32,6 +33,7 @@

/* Storage attribute and access control fields */
#define PPC44x_TLB_ATTR_MASK 0x0000ff80
+#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
#define PPC44x_TLB_U0 0x00008000 /* User 0 */
#define PPC44x_TLB_U1 0x00004000 /* User 1 */
#define PPC44x_TLB_U2 0x00002000 /* User 2 */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 5e12b74..1f7ae60 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -429,7 +429,16 @@ finish_tlb_load_44x:
andi. r10,r12,_PAGE_USER /* User page ? */
beq 1f /* nope, leave U bits empty */
rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
-1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
+1:
+#ifdef CONFIG_L1_WRITETHROUGH
+ andi. r10, r11, PPC44x_TLB_I
+ bne 2f
+ oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
+ /* non-inhibited */
+ ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
+2:
+#endif /* CONFIG_L1_WRITETHROUGH */
+ tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */

/* Done...restore registers and get out of here.
*/
@@ -799,7 +808,11 @@ skpinv: addi r4,r4,1 /* Increment */
sync

/* Initialize MMUCR */
+#ifdef CONFIG_L1_WRITETHROUGH
+ lis r5, PPC44x_MMUCR_U2@h
+#else
li r5,0
+#endif /* CONFIG_L1_WRITETHROUGH */
mtspr SPRN_MMUCR,r5
sync

@@ -814,7 +827,14 @@ skpinv: addi r4,r4,1 /* Increment */
/* attrib fields */
/* Added guarded bit to protect against speculative loads/stores */
li r5,0
- ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+#ifdef CONFIG_L1_WRITETHROUGH
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G | PPC44x_TLB_U2)
+ oris r5,r5,PPC44x_TLB_WL1@h
+#else
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G)
+#endif /* CONFIG_L1_WRITETHROUGH */

li r0,63 /* TLB slot 63 */

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 094bd98..d88369b 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
li r0,PAGE_SIZE/L1_CACHE_BYTES
slw r0,r0,r4
mtctr r0
+#ifdef CONFIG_L1_WRITETHROUGH
+ /* assuming 32 byte cacheline */
+ li r4, 0
+1: stw r4, 0(r3)
+ stw r4, 4(r3)
+ stw r4, 8(r3)
+ stw r4, 12(r3)
+ stw r4, 16(r3)
+ stw r4, 20(r3)
+ stw r4, 24(r3)
+ stw r4, 28(r3)
+#else
1: dcbz 0,r3
+#endif /* CONFIG_L1_WRITETHROUGH */
addi r3,r3,L1_CACHE_BYTES
bdnz 1b
blr
@@ -550,7 +563,9 @@ _GLOBAL(copy_page)
mtctr r0
1:
dcbt r11,r4
+#ifndef CONFIG_L1_WRITETHROUGH
dcbz r5,r3
+#endif
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..98a07e3 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
bdnz 4b
3: mtctr r9
li r7,4
+#ifdef CONFIG_L1_WRITETHROUGH
+10:
+#else
10: dcbz r7,r6
+#endif /* CONFIG_L1_WRITETHROUGH */
addi r6,r6,CACHELINE_BYTES
bdnz 10b
clrlwi r5,r8,32-LG_CACHELINE_BYTES
@@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
mtctr r0
beq 63f
53:
+#ifndef CONFIG_L1_WRITETHROUGH
dcbz r11,r6
+#endif /* CONFIG_L1_WRITETHROUGH */
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
@@ -368,7 +374,11 @@ _GLOBAL(__copy_tofrom_user)
mtctr r8

53: dcbt r3,r4
+#ifdef CONFIG_L1_WRITETHROUGH
+54:
+#else
54: dcbz r11,r6
+#endif
.section __ex_table,"a"
.align 2
.long 54b,105f
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 024acab..b684c8a 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
:
#ifdef CONFIG_PPC47x
: "r" (PPC47x_TLB2_S_RWX),
-#else
+#elif defined(CONFIG_L1_WRITETHROUGH)
+ : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
+ | PPC44x_TLB_U2 | PPC44x_TLB_M),
+#else /* neither CONFIG_PPC47x or CONFIG_L1_WRITETHROUGH */
: "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
-#endif
+#endif /* CONFIG_PPC47x */
"r" (phys),
"r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
"r" (entry),
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f7b0772..684a281 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -348,4 +348,9 @@ config XILINX_PCI
bool "Xilinx PCI host bridge support"
depends on PCI && XILINX_VIRTEX

+config L1_WRITETHROUGH
+ bool "Blue Gene/P enabled writethrough mode"
+ depends on BGP
+ default y
+
endmenu
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 111138c..3a3c711 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -329,9 +329,13 @@ config NOT_COHERENT_CACHE
bool
depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
default n if PPC_47x
+ default n if BGP
default y

config CHECK_CACHE_COHERENCY
bool

+config L1_WRITETHROUGH
+ bool
+
endmenu
--
1.7.4.1
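
The attribute fixup added to finish_tlb_load_44x in this patch can be read as a small pure function: cache-inhibited pages are left alone, and every other page gains the write-through L1, U2 and M bits. The sketch below is a C paraphrase of that assembly, not code from the patch. The WL1 and U2 values come from the diff; the I and M values are assumed from the existing mmu-44x.h header, and the function name is hypothetical.

```c
#include <stdint.h>

#define PPC44x_TLB_WL1 0x00100000u /* write-through L1, added by this patch */
#define PPC44x_TLB_U2  0x00002000u /* User 2, shown in the diff context */
#define PPC44x_TLB_I   0x00000400u /* cache inhibited (assumed value) */
#define PPC44x_TLB_M   0x00000200u /* memory coherent (assumed value) */

/* Hypothetical C equivalent of the CONFIG_L1_WRITETHROUGH block:
 * skip cache-inhibited mappings, otherwise request write-through
 * and coherency for the page. */
static uint32_t bgp_fixup_attrib(uint32_t attrib)
{
	if (attrib & PPC44x_TLB_I)
		return attrib;
	return attrib | PPC44x_TLB_WL1 | PPC44x_TLB_U2 | PPC44x_TLB_M;
}
```

The same three bits are what the pinned kernel mapping in ppc44x_pin_tlb() and the early boot entry receive under this option.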

2011-05-18 21:25:27

by Eric Van Hensbergen

Subject: [PATCH 5/7] [RFC] force 32-byte aligned kmallocs

On BG/P, it is convenient for kmalloc() to return 32-byte
aligned buffers for torus DMA.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/page_32.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 68d73b2..fb0a7ae 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -9,7 +9,7 @@

#define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32

-#ifdef CONFIG_NOT_COHERENT_CACHE
+#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
#endif

--
1.7.4.1
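
To make the alignment requirement concrete, here is a minimal C sketch of the two checks a torus DMA user would care about: whether a buffer starts on a 32-byte boundary, and how a length rounds up to whole 32-byte units. The macro and function names are hypothetical, and the value 32 is assumed to match L1_CACHE_BYTES on BG/P (the cputable entry in patch 2 sets both cache block sizes to 32).

```c
#include <stddef.h>
#include <stdint.h>

#define BGP_DMA_MINALIGN 32u /* assumed equal to L1_CACHE_BYTES on BG/P */

/* Returns nonzero when p starts on a 32-byte boundary. */
static int bgp_dma_aligned(const void *p)
{
	return ((uintptr_t)p % BGP_DMA_MINALIGN) == 0;
}

/* Rounds len up to the next multiple of the DMA alignment. */
static size_t bgp_dma_round_up(size_t len)
{
	return (len + BGP_DMA_MINALIGN - 1) & ~(size_t)(BGP_DMA_MINALIGN - 1);
}
```

Defining ARCH_DMA_MINALIGN for CONFIG_BGP is what makes the slab allocator honor this boundary, so drivers would not need such checks on kmalloc'ed buffers.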

2011-05-18 21:25:33

by Eric Van Hensbergen

Subject: [PATCH 6/7] [RFC] enable early TLBs for BG/P

BG/P maps firmware with an early TLB

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/mmu-44x.h | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index ca1b90c..2807d6e 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -115,8 +115,12 @@ typedef struct {
#endif /* !__ASSEMBLY__ */

#ifndef CONFIG_PPC_EARLY_DEBUG_44x
+#ifndef CONFIG_BGP
#define PPC44x_EARLY_TLBS 1
-#else
+#else /* CONFIG_BGP */
+#define PPC44x_EARLY_TLBS 2
+#endif /* CONFIG_BGP */
+#else /* CONFIG_PPC_EARLY_DEBUG_44x */
#define PPC44x_EARLY_TLBS 2
#define PPC44x_EARLY_DEBUG_VIRTADDR (ASM_CONST(0xf0000000) \
| (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0xffff))
--
1.7.4.1

2011-05-18 21:25:41

by Eric Van Hensbergen

Subject: [PATCH 7/7] [RFC] SMP support code

This patch adds the necessary core code to enable SMP support on BlueGene/P

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/kernel/head_44x.S | 72 +++++++++++++++++++++++++++++
arch/powerpc/mm/fault.c | 77 ++++++++++++++++++++++++++++++++
arch/powerpc/platforms/Kconfig.cputype | 2 +-
3 files changed, 150 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 1f7ae60..57d4483 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1133,6 +1133,70 @@ clear_utlb_entry:

#endif /* CONFIG_PPC_47x */

+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+_GLOBAL(start_secondary_bgp)
+ /* U2 will be enabled in TLBs. */
+ lis r7,PPC44x_MMUCR_U2@h
+ mtspr SPRN_MMUCR,r7
+ li r7,0
+ mtspr SPRN_PID,r7
+ sync
+ lis r8,KERNELBASE@h
+
+ /* The tlb_44x_hwater global var (setup by cpu#0) reveals how many
+ * 256M TLBs we need to map.
+ */
+ lis r9, tlb_44x_hwater@ha
+ lwz r9, tlb_44x_hwater@l(r9)
+
+ li r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_M|PPC44x_TLB_U2)
+ oris r5, r5, PPC44x_TLB_WL1@h
+
+ /* tlb_44x_hwater is the biggest TLB slot number for regular TLBs.
+ TLB 63 covers kernel base mapping(256MB) and TLB 62 covers CNS.
+ With 768MB lowmem, it is set to 59.
+ */
+2:
+ addi r9, r9, 1
+ cmpwi r9,62 /* Stop at entry 62 which is the fw */
+ beq 3f
+ addis r7,r7,0x1000 /* add 256M */
+ addis r8,r8,0x1000
+ ori r6,r8,PPC44x_TLB_VALID | PPC44x_TLB_256M
+
+ tlbwe r6,r9,PPC44x_TLB_PAGEID /* Load the pageid fields */
+ tlbwe r7,r9,PPC44x_TLB_XLAT /* Load the translation fields */
+ tlbwe r5,r9,PPC44x_TLB_ATTRIB /* Load the attrib/access fields */
+ b 2b
+
+3: isync
+
+ /* Setup context from global var secondary_ti */
+ lis r1, secondary_ti@ha
+ lwz r1, secondary_ti@l(r1)
+ lwz r2, TI_TASK(r1) /* r2 = task_info */
+
+ addi r3,r2,THREAD /* init task's THREAD */
+ mtspr SPRN_SPRG3,r3
+
+ li r0,0
+ stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+
+ /* Let's move on */
+ lis r4,start_secondary@h
+ ori r4,r4,start_secondary@l
+ lis r3,MSR_KERNEL@h
+ ori r3,r3,MSR_KERNEL@l
+ mtspr SPRN_SRR0,r4
+ mtspr SPRN_SRR1,r3
+ rfi /* change context and jump to start_secondary */
+
+_GLOBAL(start_secondary_resume)
+ /* I don't think this currently happens on BGP */
+ b .
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
/*
* Here we are back to code that is common between 44x and 47x
*
@@ -1144,6 +1208,14 @@ head_start_common:
lis r4,interrupt_base@h /* IVPR only uses the high 16-bits */
mtspr SPRN_IVPR,r4

+#if defined(CONFIG_BGP) && defined(CONFIG_SMP)
+ /* are we an additional CPU */
+ li r0, 0
+ mfspr r4, SPRN_PIR
+ cmpw r4, r0
+ bgt start_secondary_bgp
+#endif /* CONFIG_BGP && CONFIG_SMP */
+
addis r22,r22,KERNELBASE@h
mtlr r22
isync
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 54f4fb9..0e73244 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -103,6 +103,77 @@ static int store_updates_sp(struct pt_regs *regs)
return 0;
}

+#ifdef CONFIG_BGP
+/*
+ * The icbi instruction does not broadcast to all cpus in the ppc450
+ * processor used by Blue Gene/P. It is unlikely this problem will
+ * be exhibited in other processors so this remains ifdef'ed for BGP
+ * specifically.
+ *
+ * We deal with this by marking executable pages either writable, or
+ * executable, but never both. The permissions will fault back and
+ * forth if the thread is actively writing to executable sections.
+ * Each time we fault to become executable we flush the dcache into
+ * icache on all cpus.
+ */
+struct bgp_fixup_parm {
+ struct page *page;
+ unsigned long address;
+ struct vm_area_struct *vma;
+};
+
+static void bgp_fixup_cache_tlb(void *parm)
+{
+ struct bgp_fixup_parm *p = parm;
+
+ if (!PageHighMem(p->page))
+ flush_dcache_icache_page(p->page);
+ local_flush_tlb_page(p->vma, p->address);
+}
+
+static void bgp_fixup_access_perms(struct vm_area_struct *vma,
+ unsigned long address,
+ int is_write, int is_exec)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pte_t *ptep = NULL;
+ pmd_t *pmdp;
+
+ if (get_pteptr(mm, address, &ptep, &pmdp)) {
+ spinlock_t *ptl = pte_lockptr(mm, pmdp);
+ pte_t old;
+
+ spin_lock(ptl);
+ old = *ptep;
+ if (pte_present(old)) {
+ struct page *page = pte_page(old);
+
+ if (is_exec) {
+ struct bgp_fixup_parm param = {
+ .page = page,
+ .address = address,
+ .vma = vma,
+ };
+ pte_update(ptep, _PAGE_HWWRITE, 0);
+ on_each_cpu(bgp_fixup_cache_tlb, &param, 1);
+ pte_update(ptep, 0, _PAGE_EXEC);
+ pte_unmap_unlock(ptep, ptl);
+ return;
+ }
+ if (is_write &&
+ (pte_val(old) & _PAGE_RW) &&
+ (pte_val(old) & _PAGE_DIRTY) &&
+ !(pte_val(old) & _PAGE_HWWRITE)) {
+ pte_update(ptep, _PAGE_EXEC, _PAGE_HWWRITE);
+ }
+ }
+ if (!pte_same(old, *ptep))
+ flush_tlb_page(vma, address);
+ pte_unmap_unlock(ptep, ptl);
+ }
+}
+#endif /* CONFIG_BGP */
+
/*
* For 600- and 800-family processors, the error_code parameter is DSISR
* for a data fault, SRR1 for an instruction fault. For 400-family processors
@@ -333,6 +404,12 @@ good_area:
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, 0,
regs, address);
}
+
+#ifdef CONFIG_BGP
+ /* Fixup _PAGE_EXEC and _PAGE_HWWRITE if necessary */
+ bgp_fixup_access_perms(vma, address, is_write, is_exec);
+#endif /* CONFIG_BGP */
+
up_read(&mm->mmap_sem);
return 0;

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 3a3c711..b77a25f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -300,7 +300,7 @@ config PPC_PERF_CTRS
This enables the powerpc-specific perf_event back-end.

config SMP
- depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
+ depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x || BGP
bool "Symmetric multi-processing support"
---help---
This enables support for systems with more than one CPU. If you have
--
1.7.4.1
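
The TLB loop in start_secondary_bgp is easier to follow when simulated in C. The sketch below mirrors the register arithmetic only: the slot counter starts at tlb_44x_hwater, and each pass advances it by one and adds 256M to both addresses before writing an entry, until slot 62 is reached. The KERNELBASE value is an assumption, the struct and function names are hypothetical, and the extra bound on the output array is not present in the assembly.

```c
#include <stdint.h>

#define KERNELBASE_ASSUMED 0xc0000000u /* assumed value of KERNELBASE */
#define SZ_256M            0x10000000u
#define FW_TLB_SLOT        62          /* slot the patch leaves for firmware */

struct tlb_map {
	int slot;
	uint32_t virt;
	uint32_t phys;
};

/* Simulates the secondary-CPU mapping loop and returns how many
 * entries were produced. "max" bounds the output array; the assembly
 * has no such bound and relies on hwater being below 62. */
static int bgp_secondary_tlbs(int hwater, struct tlb_map *out, int max)
{
	uint32_t phys = 0;                  /* r7 in the assembly */
	uint32_t virt = KERNELBASE_ASSUMED; /* r8 in the assembly */
	int slot = hwater;                  /* r9 in the assembly */
	int n = 0;

	for (;;) {
		slot++;
		if (slot == FW_TLB_SLOT || n == max)
			break;
		phys += SZ_256M;
		virt += SZ_256M;
		out[n].slot = slot;
		out[n].virt = virt;
		out[n].phys = phys;
		n++;
	}
	return n;
}
```

For the 768MB lowmem case mentioned in the comment (hwater of 59), the simulation yields slots 60 and 61, covering the second and third 256MB regions, while slot 63 already covers the first.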

2011-05-19 05:58:41

by Michael Neuling

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

Eric,

> This patch adds save/restore register support for the BlueGene/P
> double hummer FPU.

What does this mean? Needs more details here.

> Signed-off-by: Eric Van Hensbergen <[email protected]>
> ---
> arch/powerpc/include/asm/ppc_asm.h | 39 ++++++++++++++++++++++++-----------
> arch/powerpc/kernel/fpu.S | 8 +++---
> arch/powerpc/platforms/44x/Kconfig | 9 ++++++++
> 3 files changed, 40 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
> index 9821006..daa22bb 100644
> --- a/arch/powerpc/include/asm/ppc_asm.h
> +++ b/arch/powerpc/include/asm/ppc_asm.h
> @@ -88,6 +88,13 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> REST_10GPRS(22, base)
> #endif
>
> +#ifdef CONFIG_BGP
> +#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> + ((rb)<<11)|(462<<1)
> +#define STFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> + ((rb)<<11)|(974<<1)
> +#endif /* CONFIG_BGP */

Put these in arch/powerpc/include/asm/ppc-opcode.h and reformat to fit
what's there already.

Also, there's no need to put these defines inside an #ifdef.

> +
> #define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> #define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> #define SAVE_8GPRS(n, base) SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
> @@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> #define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
> #define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
>
> -#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> -#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
> -#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
> -#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
> -#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
> -#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
> -#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> -#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
> -#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
> -#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
> -#define REST_16FPRS(n, base) REST_8FPRS(n, base); REST_8FPRS(n+8, base)
> -#define REST_32FPRS(n, base) REST_16FPRS(n, base); REST_16FPRS(n+16, base)
> +#ifdef CONFIG_BGP
> +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
> +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)

16*? Are these FP regs 64 or 128 bits wide? If 128, you are going to
have to play with TS_FPRWIDTH to get the size of the FPs correct in the
thread_struct.

I think there's a bug here.
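
To spell out the concern about the save-area size, here is a small C sketch comparing the two offset formulas. It assumes TS_FPRWIDTH is 1 on kernels built without VSX, which would include a 44x build; that value is an assumption about the tree at the time, and the function names are hypothetical.

```c
#include <stddef.h>

/* Offset of FPR n from THREAD_FPR0 as the generic macros compute it. */
static size_t fpr_offset_generic(unsigned int n, unsigned int ts_fprwidth)
{
	return 8u * ts_fprwidth * n;
}

/* Offset of FPR n as the CONFIG_BGP macros in this patch compute it. */
static size_t fpr_offset_bgp(unsigned int n)
{
	return 16u * n;
}

/* Bytes needed to hold 32 registers at a given per-register stride. */
static size_t fpr_area_bytes(size_t stride)
{
	return 32u * stride;
}
```

Under that assumption the thread_struct area holds 32 * 8 = 256 bytes, while the 16-byte stride needs 32 * 16 = 512 bytes, so registers 16 through 31 would land past the end of the array unless the width is 2.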

> +#else /* CONFIG_BGP */
> +#define SAVE_FPR(n, b, base) stfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#define REST_FPR(n, b, base) lfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> +#endif /* CONFIG_BGP */
> +
> +#define SAVE_2FPRS(n, b, base) SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
> +#define SAVE_4FPRS(n, b, base) SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
> +#define SAVE_8FPRS(n, b, base) SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
> +#define SAVE_16FPRS(n, b, base) SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
> +#define SAVE_32FPRS(n, b, base) SAVE_16FPRS(n, b, base); \
> + SAVE_16FPRS(n+16, b, base)
> +#define REST_2FPRS(n, b, base) REST_FPR(n, b, base); REST_FPR(n+1, b, base)
> +#define REST_4FPRS(n, b, base) REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
> +#define REST_8FPRS(n, b, base) REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
> +#define REST_16FPRS(n, b, base) REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
> +#define REST_32FPRS(n, b, base) REST_16FPRS(n, b, base); \
> + REST_16FPRS(n+16, b, base)
>
> #define SAVE_VR(n,b,base) li b,THREAD_VR0+(16*(n)); stvx n,base,b
> #define SAVE_2VRS(n,b,base) SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index de36955..9f11c66 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -30,7 +30,7 @@
> BEGIN_FTR_SECTION \
> b 2f; \
> END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> - REST_32FPRS(n,base); \
> + REST_32FPRS(n,c,base); \
> b 3f; \
> 2: REST_32VSRS(n,c,base); \
> 3:
> @@ -39,13 +39,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> BEGIN_FTR_SECTION \
> b 2f; \
> END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> - SAVE_32FPRS(n,base); \
> + SAVE_32FPRS(n,c,base); \
> b 3f; \
> 2: SAVE_32VSRS(n,c,base); \
> 3:
> #else
> -#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
> -#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
> +#define REST_32FPVSRS(n,b,base) REST_32FPRS(n,b,base)
> +#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n,b,base)
> #endif
>
> /*
> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
> index f485fc5f..24a515e 100644
> --- a/arch/powerpc/platforms/44x/Kconfig
> +++ b/arch/powerpc/platforms/44x/Kconfig
> @@ -169,6 +169,15 @@ config YOSEMITE
> help
> This option enables support for the AMCC PPC440EP evaluation board.
>
> +config BGP

Does this FPU feature have a specific name like double hummer? I'd
rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
something like that...

> + bool "Blue Gene/P"
> + depends on 44x
> + default n
> + select PPC_FPU
> + select PPC_DOUBLE_FPU

... in fact, it seems you are doing something like this here, but you
don't use PPC_DOUBLE_FPU anywhere?

Mikey

> + help
> + This option enables support for the IBM BlueGene/P supercomputer.
> +
> config ISS4xx
> bool "ISS 4xx Simulator"
> depends on (44x || 40x)
> --
> 1.7.4.1
>

2011-05-19 10:42:42

by Josh Boyer

Subject: Re: [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P

On Wed, May 18, 2011 at 04:24:52PM -0500, Eric Van Hensbergen wrote:
>BG/P nodes need to be configured for writethrough to work in SMP
>configurations. This patch adds the right hooks in the MMU code
>to make sure L1_WRITETHROUGH configurations are set up for BG/P.
>
>Signed-off-by: Eric Van Hensbergen <[email protected]>
>---
> arch/powerpc/include/asm/mmu-44x.h | 2 ++
> arch/powerpc/kernel/head_44x.S | 24 ++++++++++++++++++++++--
> arch/powerpc/kernel/misc_32.S | 15 +++++++++++++++
> arch/powerpc/lib/copy_32.S | 10 ++++++++++
> arch/powerpc/mm/44x_mmu.c | 7 +++++--
> arch/powerpc/platforms/Kconfig | 5 +++++
> arch/powerpc/platforms/Kconfig.cputype | 4 ++++
> 7 files changed, 63 insertions(+), 4 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
>index bf52d70..ca1b90c 100644
>--- a/arch/powerpc/include/asm/mmu-44x.h
>+++ b/arch/powerpc/include/asm/mmu-44x.h
>@@ -8,6 +8,7 @@
>
> #define PPC44x_MMUCR_TID 0x000000ff
> #define PPC44x_MMUCR_STS 0x00010000
>+#define PPC44x_MMUCR_U2 0x00200000
>
> #define PPC44x_TLB_PAGEID 0
> #define PPC44x_TLB_XLAT 1
>@@ -32,6 +33,7 @@
>
> /* Storage attribute and access control fields */
> #define PPC44x_TLB_ATTR_MASK 0x0000ff80
>+#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
> #define PPC44x_TLB_U0 0x00008000 /* User 0 */
> #define PPC44x_TLB_U1 0x00004000 /* User 1 */
> #define PPC44x_TLB_U2 0x00002000 /* User 2 */
>diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
>index 5e12b74..1f7ae60 100644
>--- a/arch/powerpc/kernel/head_44x.S
>+++ b/arch/powerpc/kernel/head_44x.S
>@@ -429,7 +429,16 @@ finish_tlb_load_44x:
> andi. r10,r12,_PAGE_USER /* User page ? */
> beq 1f /* nope, leave U bits empty */
> rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
>-1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
>+1:
>+#ifdef CONFIG_L1_WRITETHROUGH
>+ andi. r10, r11, PPC44x_TLB_I
>+ bne 2f
>+ oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
>+ /* non-inhibited */
>+ ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
>+2:
>+#endif /* CONFIG_L1_WRITETHROUGH */
>+ tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
>
> /* Done...restore registers and get out of here.
> */
>@@ -799,7 +808,11 @@ skpinv: addi r4,r4,1 /* Increment */
> sync
>
> /* Initialize MMUCR */
>+#ifdef CONFIG_L1_WRITETHROUGH
>+ lis r5, PPC44x_MMUCR_U2@h
>+#else
> li r5,0
>+#endif /* CONFIG_L1_WRITETHROUGH */
> mtspr SPRN_MMUCR,r5
> sync
>
>@@ -814,7 +827,14 @@ skpinv: addi r4,r4,1 /* Increment */
> /* attrib fields */
> /* Added guarded bit to protect against speculative loads/stores */
> li r5,0
>- ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
>+#ifdef CONFIG_L1_WRITETHROUGH
>+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
>+ PPC44x_TLB_G | PPC44x_TLB_U2)
>+ oris r5,r5,PPC44x_TLB_WL1@h
>+#else
>+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
>+ PPC44x_TLB_G)
>+#endif /* CONFIG_L1_WRITETHROUGH */
>
> li r0,63 /* TLB slot 63 */
>
>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
>index 094bd98..d88369b 100644
>--- a/arch/powerpc/kernel/misc_32.S
>+++ b/arch/powerpc/kernel/misc_32.S
>@@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
> li r0,PAGE_SIZE/L1_CACHE_BYTES
> slw r0,r0,r4
> mtctr r0
>+#ifdef CONFIG_L1_WRITETHROUGH
>+ /* assuming 32 byte cacheline */
>+ li r4, 0
>+1: stw r4, 0(r3)
>+ stw r4, 4(r3)
>+ stw r4, 8(r3)
>+ stw r4, 12(r3)
>+ stw r4, 16(r3)
>+ stw r4, 20(r3)
>+ stw r4, 24(r3)
>+ stw r4, 28(r3)
>+#else
> 1: dcbz 0,r3
>+#endif /* CONFIG_L1_WRITETHROUGH */
> addi r3,r3,L1_CACHE_BYTES
> bdnz 1b
> blr
>@@ -550,7 +563,9 @@ _GLOBAL(copy_page)
> mtctr r0
> 1:
> dcbt r11,r4
>+#ifndef CONFIG_L1_WRITETHROUGH
> dcbz r5,r3
>+#endif
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
>diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
>index 55f19f9..98a07e3 100644
>--- a/arch/powerpc/lib/copy_32.S
>+++ b/arch/powerpc/lib/copy_32.S
>@@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
> bdnz 4b
> 3: mtctr r9
> li r7,4
>+#ifdef CONFIG_L1_WRITETHROUGH
>+10:
>+#else
> 10: dcbz r7,r6
>+#endif /* CONFIG_L1_WRITETHROUGH */
> addi r6,r6,CACHELINE_BYTES
> bdnz 10b
> clrlwi r5,r8,32-LG_CACHELINE_BYTES
>@@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
> mtctr r0
> beq 63f
> 53:
>+#ifndef CONFIG_L1_WRITETHROUGH
> dcbz r11,r6
>+#endif /* CONFIG_L1_WRITETHROUGH */
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
>@@ -368,7 +374,11 @@ _GLOBAL(__copy_tofrom_user)
> mtctr r8
>
> 53: dcbt r3,r4
>+#ifdef CONFIG_L1_WRITETHROUGH
>+54:
>+#else
> 54: dcbz r11,r6
>+#endif
> .section __ex_table,"a"
> .align 2
> .long 54b,105f
>diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
>index 024acab..b684c8a 100644
>--- a/arch/powerpc/mm/44x_mmu.c
>+++ b/arch/powerpc/mm/44x_mmu.c
>@@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
> :
> #ifdef CONFIG_PPC47x
> : "r" (PPC47x_TLB2_S_RWX),
>-#else
>+#elif defined(CONFIG_L1_WRITETHROUGH)
>+ : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
>+ | PPC44x_TLB_U2 | PPC44x_TLB_M),
>+#else /* neither CONFIG_PPC47x or CONFIG_L1_WRITETHROUGH */
> : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
>-#endif
>+#endif /* CONFIG_PPC47x */
> "r" (phys),
> "r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
> "r" (entry),
>diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
>index f7b0772..684a281 100644
>--- a/arch/powerpc/platforms/Kconfig
>+++ b/arch/powerpc/platforms/Kconfig
>@@ -348,4 +348,9 @@ config XILINX_PCI
> bool "Xilinx PCI host bridge support"
> depends on PCI && XILINX_VIRTEX
>
>+config L1_WRITETHROUGH
>+ bool "Blue Gene/P enabled writethrough mode"
>+ depends on BGP
>+ default y

You add this config option here, named generically, but then make it
depend on BGP. It seems it should be named BGP_L1_WRITETHROUGH, and then
just selected by the BGP platform. But then....

> endmenu
>diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
>index 111138c..3a3c711 100644
>--- a/arch/powerpc/platforms/Kconfig.cputype
>+++ b/arch/powerpc/platforms/Kconfig.cputype
>@@ -329,9 +329,13 @@ config NOT_COHERENT_CACHE
> bool
> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> default n if PPC_47x
>+ default n if BGP
> default y
>
> config CHECK_CACHE_COHERENCY
> bool
>
>+config L1_WRITETHROUGH
>+ bool

You add an identical option down here. Confused.

josh

2011-05-19 11:00:06

by Josh Boyer

[permalink] [raw]
Subject: Re: [PATCH 1/7] [RFC] Mainline BG/P platform support

On Wed, May 18, 2011 at 04:24:49PM -0500, Eric Van Hensbergen wrote:
>The Linux kernel patches for the IBM BlueGene/P have been open-sourced
>for quite some time, but haven't been integrated into the mainline Linux
>kernel source tree. This is the first patch series of several where I
>will attempt to cleanup and mainline the already public patches. I
>welcome feedback as well as any help I can get. I'm drawing on
>the patches available for the IBM Compute Node kernel, the ZeptoOS project
>and the Kittyhawk project.
>(all available from http://wiki.bg.anl-external.org)
>
>I'll be prioritizing core patches which are harder to keep current with
>mainline due to merge conflicts and then slowly incorporating the drivers
>and other extensions (if acceptable after community review).
>
>I'll be maintaining the patchset in my kernel.org repository
>(/pub/scm/linux/kernel/git/ericvh/bluegene.git) under the bluegene
>branch with the source repos (zepto, kittyhawk, ibmcn) available in
>respective branches. Ben - if you would prefer me to send pull requests
>once we get rolling, I can switch to that -- otherwise I'll stick to
>just submitting patches to the list assuming you'll pull them when they
>become acceptable. Thanks for your attention reviewing these patches.

This is going to get slightly messy if there are lots of changes in
platforms/44x and/or head_44x.S. Most 4xx changes go through my tree,
and I'm happy to pull from your tree once things get off the ground. We
just need to make sure and coordinate as we go.

My tree is fairly low-churn (as is all of 4xx) so hopefully I'm worried
for nothing.

>Signed-off-by: Eric Van Hensbergen <[email protected]>
>---
> MAINTAINERS | 8 ++++++++
> 1 files changed, 8 insertions(+), 0 deletions(-)
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 69f19f1..3ffca88 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -3863,6 +3863,14 @@ S: Maintained
> F: arch/powerpc/platforms/40x/
> F: arch/powerpc/platforms/44x/
>
>+LINUX FOR POWERPC BLUEGENE/P
>+M: Eric Van Hensbergen <[email protected]>
>+W: http://bg-linux.anl-external.org/wiki/index.php/Main_Page
>+L: [email protected]
>+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/bluegene.git
>+S: Maintained
>+F: arch/powerpc/platforms/44x/bgp*

This should probably be the last patch in the series. You have a file
pattern listed for files that don't exist at all in any of the other
patches you submitted :).

josh

2011-05-19 12:35:14

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 1/7] [RFC] Mainline BG/P platform support

On Thu, May 19, 2011 at 6:01 AM, Josh Boyer <[email protected]> wrote:
> On Wed, May 18, 2011 at 04:24:49PM -0500, Eric Van Hensbergen wrote:
>>
>>I'll be maintaining the patchset in my kernel.org repository
>>(/pub/scm/linux/kernel/git/ericvh/bluegene.git) under the bluegene
>>branch with the source repos (zepto, kittyhawk, ibmcn) available in
>>respective branches. Ben - if you would prefer me to send pull requests
>>once we get rolling, I can switch to that -- otherwise I'll stick to
>>just submitting patches to the list assuming you'll pull them when they
>>become acceptable. Thanks for your attention reviewing these patches.
>
> This is going to get slightly messy if there are lots of changes in
> platforms/44x and/or head_44x.S. Most 4xx changes go through my tree,
> and I'm happy to pull from your tree once things get off the ground. We
> just need to make sure and coordinate as we go.
>

I'm fine with processing the changes through your tree. Most of the items
with conflicts are in this series, so hopefully it won't be too messy (outside
of some Makefile and Kconfig changes which are much easier to merge)
after this. So, should I base changes on:
http://git.kernel.org/?p=linux/kernel/git/jwboyer/powerpc-4xx.git;a=shortlog;h=refs/heads/next
or:
http://git.kernel.org/?p=linux/kernel/git/benh/powerpc.git;a=shortlog;h=refs/heads/next

There are some important questions on code organization which it would
probably be a good idea to discuss at some point -- in particular what I
should do about the device drivers. Pretty much every driver except for
the ethernet is particular to this platform. IIRC some of the embedded
platforms have the SOC drivers in the platforms directory -- but it doesn't
seem like you've done this with 4xx so I was gonna just place them in
the appropriate drivers/* directory. The other question is that there are
a number of patches which involve communication with a somewhat
substantial firmware layer. You can get an idea of the existing patch's
code organization by looking at:

http://git.kernel.org/?p=linux/kernel/git/ericvh/bluegene.git;a=commit;h=bee9f329eeef6c8eb95c35de4c5d22a0c05a1b3e

It's important to point out that I am going through and cleaning up as I
go, so not everything from that patch will make the cut as is (or perhaps
even at all) -- but that should help identify where potential conflicts are
as well as potentially out of place code.

>>
>>+LINUX FOR POWERPC BLUEGENE/P
>>+M: Eric Van Hensbergen <[email protected]>
>>+W: http://bg-linux.anl-external.org/wiki/index.php/Main_Page
>>+L: [email protected]
>>+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/bluegene.git
>>+S: Maintained
>>+F: arch/powerpc/platforms/44x/bgp*
>
> This should probably be the last patch in the series. You have a file
> pattern listed for files that don't exist at all in any of the other
> patches you submitted :).
>

Yeah, I wondered about that; it's just that I hate patch series intro messages
with no patch, and I figured this was a good way out of it. It also adds the
relevant info as far as mailing lists and wiki pages where folks can go for
more info -- but if folks have a problem with it I'll kill it until we get
everything else in.

-eric

2011-05-19 12:53:37

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P

On Thu, May 19, 2011 at 5:43 AM, Josh Boyer <[email protected]> wrote:
> On Wed, May 18, 2011 at 04:24:52PM -0500, Eric Van Hensbergen wrote:
>>
>>+config L1_WRITETHROUGH
>>+ bool "Blue Gene/P enabled writethrough mode"
>>+ depends on BGP
>>+ default y
>
> You add this config option here, named generically, but then make it
> depend on BGP. It seems it should be named BGP_L1_WRITETHROUGH, and then
> just selected by the BGP platform. But then....
>
>> endmenu
>>diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
>>index 111138c..3a3c711 100644
>>--- a/arch/powerpc/platforms/Kconfig.cputype
>>+++ b/arch/powerpc/platforms/Kconfig.cputype
>>@@ -329,9 +329,13 @@ config NOT_COHERENT_CACHE
>> bool
>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>> default n if PPC_47x
>>+ default n if BGP
>> default y
>>
>> config CHECK_CACHE_COHERENCY
>> bool
>>
>>+config L1_WRITETHROUGH
>>+ bool
>
> You add an identical option down here. Confused.
>

Yeah, this was copied from the original patches and it confused me as
well, but I had never modified Kconfig.cputype before so I wasn't sure
if there were some weird rules. I'm happy to remove the vestigial one
and make the changes you suggest to make the naming BGP specific.

-eric

2011-05-19 13:53:54

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
> Eric,
>
>> This patch adds save/restore register support for the BlueGene/P
>> double hummer FPU.
>
> What does this mean? Needs more details here.
>

Hi Mikey,

Are there any specific details you are looking for here? AFAIK these patches
are required for the kernel to save/restore the double hummer
properly.

>>
>> +#ifdef CONFIG_BGP
>> +#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
>> + ((rb)<<11)|(462<<1)
>> +#define STFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
>> + ((rb)<<11)|(974<<1)
>> +#endif /* CONFIG_BGP */
>
> Put these in arch/powerpc/include/asm/ppc-opcode.h and reformat to fit
> what's there already.
>
> Also, don't need to put these defines inside a #ifdef.
>

Sure, I'll fix that up.
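
As a rough illustration of what moving these into ppc-opcode.h style would involve, the two `.long` expressions quoted above can be split into a fixed base word plus three 5-bit register fields. The sketch below only redoes that arithmetic in plain C; the helper names (`xform_base`, `xform_encode`, `check_encodings`) are made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed part of the instruction word: primary opcode in the top 6 bits,
 * extended opcode shifted left by 1, exactly as in the quoted macros. */
static uint32_t xform_base(uint32_t primary, uint32_t xo)
{
    return (primary << 26) | (xo << 1);
}

/* Add the three 5-bit register fields at bit offsets 21, 16 and 11. */
static uint32_t xform_encode(uint32_t base, uint32_t frt,
                             uint32_t ra, uint32_t rb)
{
    return base | ((frt & 0x1f) << 21) | ((ra & 0x1f) << 16) |
           ((rb & 0x1f) << 11);
}

/* Hypothetical self-check: the base words implied by the quoted macros. */
static void check_encodings(void)
{
    assert(xform_base(31, 462) == 0x7c00039cu);   /* LFPDX  */
    assert(xform_base(31, 974) == 0x7c00079cu);   /* STFPDX */
    /* With all register fields zero, the encoding is just the base word. */
    assert(xform_encode(xform_base(31, 462), 0, 0, 0) == 0x7c00039cu);
}
```

Calling `check_encodings()` from a small test program is enough to confirm that a constant-plus-field-macro form would produce the same instruction words as the open-coded shifts.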

>> +#ifdef CONFIG_BGP
>> +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
>> +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
>
> 16*? Are these FP regs 64 or 128 bits wide? If 128 you are going to
> have to play with TS_WIDTH to get the size of the FPs correct in the
> thread_struct.
>
> I think there's a bug here.
>

I actually have three different versions of this code from different
source patches that I'm drawing from - so your help in figuring out
the best way to approach this is appreciated. The kittyhawk version
of the code has 8* instead of 16*. According to the docs:
"Each of the two FPU units contains 32 64-bit floating point registers
for a total of 64 FP registers per processor." which would seem to
point to the kittyhawk version - but they have a second SAVE_32SFPRS
for the second hummer. What wasn't clear to me with this version of
the code was whether or not they were doing something clever like
saving the pair of the 64-bit FPU registers in a single 128-bit slot
(seems plausible). If this is not the way to go, I can certainly
switch to the kittyhawk version of the patch with the 8*, the extra
SAVE_32SFPRS and the extra double hummer specific storage space in the
thread_struct. If it would help I can post an alternate version of
the patch for discussion with the kittyhawk version.

>>  /*
>> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
>> index f485fc5f..24a515e 100644
>> --- a/arch/powerpc/platforms/44x/Kconfig
>> +++ b/arch/powerpc/platforms/44x/Kconfig
>> @@ -169,6 +169,15 @@ config YOSEMITE
>>       help
>>         This option enables support for the AMCC PPC440EP evaluation board.
>>
>> +config BGP
>
> Does this FPU feature have a specific name like double hammer? I'd
> rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> something like that...
>
>> +     bool "Blue Gene/P"
>> +     depends on 44x
>> +     default n
>> +     select PPC_FPU
>> +     select PPC_DOUBLE_FPU
>
> ... in fact, it seems you are doing something like this here but you
> don't use PPC_DOUBLE_FPU anywhere?
>

A fair point. I'm fine with calling it DOUBLE_HUMMER, but I wasn't sure if
that was "too internal" of a name for the kernel. Let me know and
I'll fix it up.
I'll also change the CONFIG_BGP defines in the FPU code to PPC_DOUBLE_FPU
or PPC_DOUBLE_HUMMER depending on what the community decides.

Thanks for the feedback!

-eric

2011-05-19 21:36:36

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

In message <[email protected]> you wrote:
> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
> > Eric,
> >
> >> This patch adds save/restore register support for the BlueGene/P
> >> double hummer FPU.
> >
> > What does this mean? Needs more details here.
> >
>
> Hi Mikey,
>
> any specific details you are looking for here? AFAIK these patches
> are required for the kernel to save/restore the double hummer
> properly.

I should have been more specific. What does double hammer mean?

A description of how double hammer differs from normal and why a change
in the fpu code is needed would be great.

>
> >>
> >> +#ifdef CONFIG_BGP
> >> +#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> >> + ((rb)<<11)|(462<<1)
> >> +#define STFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> >> + ((rb)<<11)|(974<<1)
> >> +#endif /* CONFIG_BGP */
> >
> > Put these in arch/powerpc/include/asm/ppc-opcode.h and reformat to fit
> > what's there already.
> >
> > Also, don't need to put these defines inside a #ifdef.
> >
>
> Sure, I'll fix that up.
>
> >> +#ifdef CONFIG_BGP
> >> +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
> >> +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
> >
> > 16*? Are these FP regs 64 or 128 bits wide? If 128 you are going to
> > have to play with TS_WIDTH to get the size of the FPs correct in the
> > thread_struct.
> >
> > I think there's a bug here.
> >
>
> I actually have three different versions of this code from different
> source patches that I'm drawing from - so your help in figuring out
> the best way to approach this is appreciated. The kittyhawk version
> of the code has 8* instead of 16*. According to the docs:
> "Each of the two FPU units contains 32 64-bit floating point registers
> for a total of 64 FP registers per processor." which would seem to
> point to the kittyhawk version - but they have a second SAVE_32SFPRS
> for the second hummer. What wasn't clear to me with this version of
> the code was whether or not they were doing something clever like
> saving the pair of the 64-bit FPU registers in a single 128-bit slot
> (seems plausible).

Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
32*8 bytes for FP only (ignoring VSX). If this is the case, then you'll
need to make 'fpr' in the thread struct bigger, which you can do by setting
TS_FPRWIDTH = 2 like we do for VSX.
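
To make the size argument concrete, here is a small sketch of the arithmetic only, not kernel code. It assumes 32 slots of 8 bytes each, scaled by a width multiplier that plays the role of TS_FPRWIDTH; the names `fpr_slot_offset`, `fpr_area_size` and `check_fpr_layout` are invented for the example, and offsets are relative to the start of the save area rather than to THREAD_FPR0.

```c
#include <assert.h>
#include <stddef.h>

#define FPR_BYTES 8   /* one 64-bit floating point register */
#define NUM_FPRS  32  /* register slots per save area */

/* Byte offset of slot n from the start of the save area, for a given
 * width multiplier (1 = single FPU, 2 = a pair of registers per slot). */
static size_t fpr_slot_offset(unsigned n, unsigned width)
{
    return (size_t)FPR_BYTES * width * n;
}

/* Total bytes the save area needs for a given width multiplier. */
static size_t fpr_area_size(unsigned width)
{
    return (size_t)FPR_BYTES * width * NUM_FPRS;
}

/* Hypothetical self-check of the numbers discussed in this thread. */
static void check_fpr_layout(void)
{
    assert(fpr_area_size(1) == 256);   /* 32*8 bytes, FP only   */
    assert(fpr_area_size(2) == 512);   /* 32*8*2 bytes, doubled */

    /* With width 2 the per-slot stride is the 16*(n) from the quoted macros. */
    for (unsigned n = 0; n < NUM_FPRS; n++)
        assert(fpr_slot_offset(n, 2) == 16u * n);

    /* A 16-byte stride over a width-1 area runs past its end. */
    assert(16u * (NUM_FPRS - 1) + 16u > fpr_area_size(1));
}
```

The last assertion is the point of the "I think there's a bug here" remark: a 16-byte stride only fits if the save area is sized for two 8-byte registers per slot.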

If there is some instruction that saves and restores two of these at a
time (which LFPDX/STFPDX might I guess), then we can use that, otherwise
we'll have to do 64 saves/restores. Double load/stores will be faster
I'm guessing though.

If two at a time, do we need to increase the index in pairs?

> If this is not the way to go, I can certainly
> switch the kittyhawk version of the patch with the *, the extra
> SAVE32SFPR and the extra double hummer specific storage space in the
> thread_struct.

I'd be tempted to keep it in the 'fpr' part of the struct so you can
then access it with ptrace/signals/core dumps.

> If it would help I can post an alternate version of the patch for
> discussion with the kittyhawk version.

Sure.

The most useful thing would be to see the instruction definition for
STFPDX/LFPDX.

>
> >>  /*
> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
> >> index f485fc5f..24a515e 100644
> >> --- a/arch/powerpc/platforms/44x/Kconfig
> >> +++ b/arch/powerpc/platforms/44x/Kconfig
> >> @@ -169,6 +169,15 @@ config YOSEMITE
> >>       help
> >>         This option enables support for the AMCC PPC440EP evaluation board.
> >>
> >> +config BGP
> >
> > Does this FPU feature have a specific name like double hammer? I'd
> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> > something like that...
> >
> >> +     bool "Blue Gene/P"
> >> +     depends on 44x
> >> +     default n
> >> +     select PPC_FPU
> >> +     select PPC_DOUBLE_FPU
> >
> > ... in fact, it seem you are doing something like these here but you
> > don't use PPC_DOUBLE_FPU anywhere?
> >
>
> A fair point. I'm fine with calling it DOUBLE_HUMMER, but I wasn't sure if
> that was "too internal" of a name for the kernel. Let me know and
> I'll fix it up.

What I'm mostly concerned about is disassociating it with a particular
CPU.

If it has an external name, then all the better.

> I'll also change the CONFIG_BGP defines in the FPU code to PPC_DOUBLE_FPU
> or PPC_DOUBLE_HUMMER depending on what the community decides.

Mikey

2011-05-19 21:41:35

by Eric Van Hensbergen

[permalink] [raw]
Subject: [PATCH 3/7] [RFC][V2] add support for BlueGene/P Double FPU

This patch adds save/restore register support for the BlueGene/P
double FPU. Since there are two FPUs, we need to save and restore
twice the registers. Fortunately BG/P gives us some opcodes to
assist with that task.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/ppc-opcode.h | 9 +++++++++
arch/powerpc/include/asm/ppc_asm.h | 32 ++++++++++++++++++++------------
arch/powerpc/kernel/fpu.S | 8 ++++----
arch/powerpc/platforms/44x/Kconfig | 9 +++++++++
arch/powerpc/platforms/Kconfig.cputype | 4 ++++
5 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 1255569..12a3cc9 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -56,6 +56,9 @@
#define PPC_INST_TLBSRX_DOT 0x7c0006a5
#define PPC_INST_XXLOR 0xf0000510

+#define PPC_INST_LFPDX 0x7c00039c
+#define PPC_INST_STFPDX 0x7c00079c
+
/* macros to insert fields into opcodes */
#define __PPC_RA(a) (((a) & 0x1f) << 16)
#define __PPC_RB(b) (((b) & 0x1f) << 11)
@@ -126,4 +129,10 @@
#define XXLOR(t, a, b) stringify_in_c(.long PPC_INST_XXLOR | \
VSX_XX3((t), (a), (b)))

+#define LFPDX(t, a, b) stringify_in_c(.long PPC_INST_LFPDX | \
+ __PPC_RT(t) | __PPC_RA(a) | __PPC_RB(b))
+#define STFPDX(t, a, b) stringify_in_c(.long PPC_INST_STFPDX | \
+ __PPC_RT(t) | __PPC_RA(a) | __PPC_RB(b))
+
+
#endif /* _ASM_POWERPC_PPC_OPCODE_H */
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 9821006..c5f05ad 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -97,18 +97,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)

-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
-#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
-#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
-#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
-#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
-#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
-#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
-#define REST_16FPRS(n, base) REST_8FPRS(n, base); REST_8FPRS(n+8, base)
-#define REST_32FPRS(n, base) REST_16FPRS(n, base); REST_16FPRS(n+16, base)
+#ifdef CONFIG_PPC_DOUBLE_FPU
+#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
+#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
+#else /* CONFIG_PPC_DOUBLE_FPU */
+#define SAVE_FPR(n, b, base) stfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#define REST_FPR(n, b, base) lfd n, THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
+#endif /* CONFIG_PPC_DOUBLE_FPU */
+
+#define SAVE_2FPRS(n, b, base) SAVE_FPR(n, b, base); SAVE_FPR(n+1, b, base)
+#define SAVE_4FPRS(n, b, base) SAVE_2FPRS(n, b, base); SAVE_2FPRS(n+2, b, base)
+#define SAVE_8FPRS(n, b, base) SAVE_4FPRS(n, b, base); SAVE_4FPRS(n+4, b, base)
+#define SAVE_16FPRS(n, b, base) SAVE_8FPRS(n, b, base); SAVE_8FPRS(n+8, b, base)
+#define SAVE_32FPRS(n, b, base) SAVE_16FPRS(n, b, base); \
+ SAVE_16FPRS(n+16, b, base)
+#define REST_2FPRS(n, b, base) REST_FPR(n, b, base); REST_FPR(n+1, b, base)
+#define REST_4FPRS(n, b, base) REST_2FPRS(n, b, base); REST_2FPRS(n+2, b, base)
+#define REST_8FPRS(n, b, base) REST_4FPRS(n, b, base); REST_4FPRS(n+4, b, base)
+#define REST_16FPRS(n, b, base) REST_8FPRS(n, b, base); REST_8FPRS(n+8, b, base)
+#define REST_32FPRS(n, b, base) REST_16FPRS(n, b, base); \
+ REST_16FPRS(n+16, b, base)

#define SAVE_VR(n,b,base) li b,THREAD_VR0+(16*(n)); stvx n,base,b
#define SAVE_2VRS(n,b,base) SAVE_VR(n,b,base); SAVE_VR(n+1,b,base)
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index de36955..9f11c66 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -30,7 +30,7 @@
BEGIN_FTR_SECTION \
b 2f; \
END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
- REST_32FPRS(n,base); \
+ REST_32FPRS(n,c,base); \
b 3f; \
2: REST_32VSRS(n,c,base); \
3:
@@ -39,13 +39,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
BEGIN_FTR_SECTION \
b 2f; \
END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
- SAVE_32FPRS(n,base); \
+ SAVE_32FPRS(n,c,base); \
b 3f; \
2: SAVE_32VSRS(n,c,base); \
3:
#else
-#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
-#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n,b,base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n,b,base)
#endif

/*
diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
index f485fc5f..24a515e 100644
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -169,6 +169,15 @@ config YOSEMITE
help
This option enables support for the AMCC PPC440EP evaluation board.

+config BGP
+ bool "Blue Gene/P"
+ depends on 44x
+ default n
+ select PPC_FPU
+ select PPC_DOUBLE_FPU
+ help
+ This option enables support for the IBM BlueGene/P supercomputer.
+
config ISS4xx
bool "ISS 4xx Simulator"
depends on (44x || 40x)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 111138c..1ae59c5 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -137,6 +137,10 @@ config PPC_FPU
bool
default y if PPC64

+config PPC_DOUBLE_FPU
+ bool "Bluegene/P Double FPU Support"
+ depends on BGP
+
config FSL_EMB_PERFMON
bool "Freescale Embedded Perfmon"
depends on E500 || PPC_83xx
--
1.7.4.1

2011-05-19 21:42:42

by Eric Van Hensbergen

[permalink] [raw]
Subject: [PATCH 4/7] [RFC][V2] enable BGP_L1_WRITETHROUGH mode for BG/P

BG/P nodes need to be configured for writethrough to work in SMP
configurations. This patch adds the right hooks in the MMU code
to make sure BGP_L1_WRITETHROUGH configurations are set up for BG/P.

Signed-off-by: Eric Van Hensbergen <[email protected]>
---
arch/powerpc/include/asm/mmu-44x.h | 2 ++
arch/powerpc/kernel/head_44x.S | 24 ++++++++++++++++++++++--
arch/powerpc/kernel/misc_32.S | 15 +++++++++++++++
arch/powerpc/lib/copy_32.S | 10 ++++++++++
arch/powerpc/mm/44x_mmu.c | 7 +++++--
arch/powerpc/platforms/Kconfig | 5 +++++
arch/powerpc/platforms/Kconfig.cputype | 1 +
7 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index bf52d70..ca1b90c 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -8,6 +8,7 @@

#define PPC44x_MMUCR_TID 0x000000ff
#define PPC44x_MMUCR_STS 0x00010000
+#define PPC44x_MMUCR_U2 0x00200000

#define PPC44x_TLB_PAGEID 0
#define PPC44x_TLB_XLAT 1
@@ -32,6 +33,7 @@

/* Storage attribute and access control fields */
#define PPC44x_TLB_ATTR_MASK 0x0000ff80
+#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
#define PPC44x_TLB_U0 0x00008000 /* User 0 */
#define PPC44x_TLB_U1 0x00004000 /* User 1 */
#define PPC44x_TLB_U2 0x00002000 /* User 2 */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 5e12b74..f10ac53 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -429,7 +429,16 @@ finish_tlb_load_44x:
andi. r10,r12,_PAGE_USER /* User page ? */
beq 1f /* nope, leave U bits empty */
rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
-1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
+1:
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+ andi. r10, r11, PPC44x_TLB_I
+ bne 2f
+ oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
+ /* non-inhibited */
+ ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
+2:
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
+ tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */

/* Done...restore registers and get out of here.
*/
@@ -799,7 +808,11 @@ skpinv: addi r4,r4,1 /* Increment */
sync

/* Initialize MMUCR */
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+ lis r5, PPC44x_MMUCR_U2@h
+#else
li r5,0
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
mtspr SPRN_MMUCR,r5
sync

@@ -814,7 +827,14 @@ skpinv: addi r4,r4,1 /* Increment */
/* attrib fields */
/* Added guarded bit to protect against speculative loads/stores */
li r5,0
- ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G | PPC44x_TLB_U2)
+ oris r5,r5,PPC44x_TLB_WL1@h
+#else
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G)
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */

li r0,63 /* TLB slot 63 */

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 094bd98..3f56d7b 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
li r0,PAGE_SIZE/L1_CACHE_BYTES
slw r0,r0,r4
mtctr r0
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+ /* assuming 32 byte cacheline */
+ li r4, 0
+1: stw r4, 0(r3)
+ stw r4, 4(r3)
+ stw r4, 8(r3)
+ stw r4, 12(r3)
+ stw r4, 16(r3)
+ stw r4, 20(r3)
+ stw r4, 24(r3)
+ stw r4, 28(r3)
+#else
1: dcbz 0,r3
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
addi r3,r3,L1_CACHE_BYTES
bdnz 1b
blr
@@ -550,7 +563,9 @@ _GLOBAL(copy_page)
mtctr r0
1:
dcbt r11,r4
+#ifndef CONFIG_BGP_L1_WRITETHROUGH
dcbz r5,r3
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..552df54 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
bdnz 4b
3: mtctr r9
li r7,4
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+10:
+#else
10: dcbz r7,r6
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
addi r6,r6,CACHELINE_BYTES
bdnz 10b
clrlwi r5,r8,32-LG_CACHELINE_BYTES
@@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
mtctr r0
beq 63f
53:
+#ifndef CONFIG_BGP_L1_WRITETHROUGH
dcbz r11,r6
+#endif /* CONFIG_BGP_L1_WRITETHROUGH */
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
@@ -368,7 +374,11 @@ _GLOBAL(__copy_tofrom_user)
mtctr r8

53: dcbt r3,r4
+#ifdef CONFIG_BGP_L1_WRITETHROUGH
+54:
+#else
54: dcbz r11,r6
+#endif
.section __ex_table,"a"
.align 2
.long 54b,105f
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 024acab..f5c60b3 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
:
#ifdef CONFIG_PPC47x
: "r" (PPC47x_TLB2_S_RWX),
-#else
+#elif defined(CONFIG_BGP_L1_WRITETHROUGH)
+ : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
+ | PPC44x_TLB_U2 | PPC44x_TLB_M),
+#else /* neither CONFIG_PPC47x or CONFIG_BGP_L1_WRITETHROUGH */
: "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
-#endif
+#endif /* CONFIG_PPC47x */
"r" (phys),
"r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
"r" (entry),
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f7b0772..7defe94 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -348,4 +348,9 @@ config XILINX_PCI
bool "Xilinx PCI host bridge support"
depends on PCI && XILINX_VIRTEX

+config BGP_L1_WRITETHROUGH
+ bool "Blue Gene/P enabled writethrough mode"
+ depends on BGP
+ default y
+
endmenu
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 1ae59c5..caa3bbf 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -333,6 +333,7 @@ config NOT_COHERENT_CACHE
bool
depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
default n if PPC_47x
+ default n if BGP
default y

config CHECK_CACHE_COHERENCY
--
1.7.4.1

2011-05-19 21:55:15

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

Damnit Mikey, just after I hit send on [V2].....

On Thu, May 19, 2011 at 4:36 PM, Michael Neuling <[email protected]> wrote:
> In message <[email protected]> you wrote:
>> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
>> > Eric,
>> >
>> >> This patch adds save/restore register support for the BlueGene/P
>> >> double hummer FPU.
>> >
>> > What does this mean? Needs more details here.
>> >

okay, I've changed it a bit in [V2], if you want more I can do my best.

>> "Each of the two FPU units contains 32 64-bit floating point registers
>> for a total of 64 FP registers per processor." which would seem to
>> point to the kittyhawk version - but they have a second SAVE_32SFPRS
>> for the second hummer. What wasn't clear to me with this version of
>> the code was whether or not they were doing something clever like
>> saving the pair of the 64-bit FPU registers in a single 128-bit slot
>> (seems plausible).
>
> Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
> 32*8 bytes for FP only (ignoring VSX). If this is the case, then you'll
> need make 'fpr' in the thread struct bigger which you can do by setting
> TS_FPRWIDTH = 2 like we do for VSX.
>

Okay, I'll incorporate that into [V3].

> If there is some instruction that saves and restores two of these at a
> time (which LFPDX/STFPDX might I guess), then we can use that, otherwise
> we'll have to do 64 saves/restores. Double load/stores will be faster
> I'm guessing though.

I assume that's true.

>
> If two at a time, do we need to increase the index in pairs?
>

I don't believe so.

>> If this is not the way to go, I can certainly
>> switch the kittyhawk version of the patch with the *, the extra
>> SAVE32SFPR and the extra double hummer specific storage space in the
>> thread_struct.
>
> I'd be tempted to keep it in the 'fpr' part of the struct so you can
> then access it with ptrace/signals/core dumps.
>
>> If it would help I can post an alternate version of the patch for
>> discussion with the kittyhawk version.
>
> Sure.
>

Kittyhawk version can be seen here:

http://git.kernel.org/?p=linux/kernel/git/ericvh/bluegene.git;a=commitdiff;h=94bffe786324b9bd07187b11afd836e3ec362d95

>
> The most useful thing would be to see the instruction definition for
> STFPDX/LFPDX.
>

https://wiki.alcf.anl.gov/images/d/d9/PPC440_FP2_arch.pdf

>>
>> >>  /*
>> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
>> >> index f485fc5f..24a515e 100644
>> >> --- a/arch/powerpc/platforms/44x/Kconfig
>> >> +++ b/arch/powerpc/platforms/44x/Kconfig
>> >> @@ -169,6 +169,15 @@ config YOSEMITE
>> >>       help
>> >>         This option enables support for the AMCC PPC440EP evaluation board.
>> >>
>> >> +config BGP
>> >
>> > Does this FPU feature have a specific name like double hummer? I'd
>> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
>> > something like that...
>> >
>> >> +     bool "Blue Gene/P"
>> >> +     depends on 44x
>> >> +     default n
>> >> +     select PPC_FPU
>> >> +     select PPC_DOUBLE_FPU
>> >
>> > ... in fact, it seems you are doing something like these here but you
>> > don't use PPC_DOUBLE_FPU anywhere?
>> >
>>
>> A fair point. I'm fine with calling it DOUBLE_HUMMER, but I wasn't sure if
>> that was "too internal" of a name for the kernel. Let me know and
>> I'll fix it up.
>
> What I'm mostly concerned about is disassociating it with a particular
> CPU.
>
> If it has an external name, then all the better.
>

Since it isn't available on other chips, should it just be PPC_BGP_FPU
or PPC_BGP_DOUBLE_FPU?

-eric

2011-05-19 23:16:45

by Michael Neuling

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

In message <[email protected]> you wrote:
> Damnit Mikey, just after I hit send on [V2].....
>
> On Thu, May 19, 2011 at 4:36 PM, Michael Neuling <[email protected]> wrote:
> > In message <[email protected]> you wrote:
> >> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
> >> > Eric,
> >> >
> >> >> This patch adds save/restore register support for the BlueGene/P
> >> >> double hummer FPU.
> >> >
> >> > What does this mean? Needs more details here.
> >> >
>
> okay, I've changed it a bit in [V2], if you want more I can do my best.

If you can describe the whole primary and secondary registers that'd be
cool. ASCII art would be awesome! :-)


>
> >> "Each of the two FPU units contains 32 64-bit floating point registers
> >> for a total of 64 FP registers per processor." which would seem to
> >> point to the kittyhawk version - but they have a second SAVE_32SFPRS
> >> for the second hummer. What wasn't clear to me with this version of
> >> the code was whether or not they were doing something clever like
> >> saving the pair of the 64-bit FPU registers in a single 128-bit slot
> >> (seems plausible).
> >
> > Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
> > 32*8 bytes for FP only (ignoring VSX). If this is the case, then you'll
> > need to make 'fpr' in the thread struct bigger which you can do by setting
> > TS_FPRWIDTH = 2 like we do for VSX.
> >
>
> Okay, I'll incorporate that into [V3].
>
> > If there is some instruction that saves and restores two of these at a
> > time (which LFPDX/STFPDX might I guess), then we can use that, otherwise
> > we'll have to do 64 saves/restores. Double load/stores will be faster
> > I'm guessing though.
>
> I assume that's true.
>
> >
> > If two at a time, do we need to increase the index in pairs?
> >
>
> I don't believe so.
>
> >> If this is not the way to go, I can certainly
> >> switch the kittyhawk version of the patch with the *, the extra
> >> SAVE32SFPR and the extra double hummer specific storage space in the
> >> thread_struct.
> >
> > I'd be tempted to keep it in the 'fpr' part of the struct so you can
> > then access it with ptrace/signals/core dumps.
> >
> >> If it would help I can post an alternate version of the patch for
> >> discussion with the kittyhawk version.
> >
> > Sure.
> >
>
> Kittyhawk version can be seen here:
>
> http://git.kernel.org/?p=linux/kernel/git/ericvh/bluegene.git;a=commitdiff;h=94bffe786324b9bd07187b11afd836e3ec362d95

OK. I can see the secondary.

BTW I think it's buggy in a different way.

--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -51,6 +51,9 @@ _GLOBAL(load_up_fpu)
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
SAVE_32FPRS(0, r4)
+#ifdef CONFIG_DOUBLE_HUMMER
+ SAVE_32SFPRS(0, r10, r3)
+#endif /* CONFIG_DOUBLE_HUMMER */
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -78,6 +81,9 @@ _GLOBAL(load_up_fpu)
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
REST_32FPRS(0, r5)
+#ifdef CONFIG_DOUBLE_HUMMER
+ REST_32SFPRS(0, r10, r5)
+#endif /* CONFIG_DOUBLE_HUMMER */

REST uses r5 as the base in both cases (primary and secondary) which is
good. SAVE uses r4 in the primary case and r3 in the secondary, which
is the wrong base.

>
> >
> > The most useful thing would be to see the instruction definition for
> > STFPDX/LFPDX.
> >
>
> https://wiki.alcf.anl.gov/images/d/d9/PPC440_FP2_arch.pdf

stfpdx does Primary->DW[EA] Secondary->DW[EA+8]

I'm tempted to continue to use this and store the data in 'fpr' in the
thread_struct. Doing it this way the primary register will continue to
be in the same location as before, which will mean ptrace etc will
continue to work at least for the primary. The secondary will be
accessible using ptrace etc as well, but it'll be a bit of kludge
because it'll appear in the VSX location.

Putting the secondary register in a new area in the thread struct will
mean it's totally inaccessible for debugging without extra code in
ptrace.c/signals.c etc

We are going to need the 16x spacing, but you are going to have to increase
the size using TS_FPRWIDTH = 2.
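
To make the layout concrete, here is a minimal userspace C sketch (not kernel code) of the arrangement described above. It assumes a simplified stand-in for the 'fpr' array sized with TS_FPRWIDTH = 2; the struct and helper names are hypothetical. Only the 16-byte stride per register and the secondary landing at EA+8 come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_FPRWIDTH 2	/* two 64-bit slots per FP register */

/* Hypothetical stand-in for the FP save area in thread_struct. */
struct fp_state_sketch {
	uint64_t fpr[32][TS_FPRWIDTH];	/* [n][0] = primary, [n][1] = secondary */
};

/* Byte offset of the primary half of register n (same slot lfd/stfd would use). */
static inline size_t primary_offset(unsigned int n)
{
	assert(n < 32);
	return offsetof(struct fp_state_sketch, fpr) + 8 * TS_FPRWIDTH * n;
}

/* Byte offset of the secondary half: stfpdx writes it to DW[EA+8]. */
static inline size_t secondary_offset(unsigned int n)
{
	return primary_offset(n) + 8;
}
```

With this layout the whole save area is 32 * 16 = 512 bytes, and the primary of register n stays at the offset it would have in a VSX-sized 'fpr' array.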

> >>
> >> >>  /*
> >> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/Kconfig
> >> >> index f485fc5f..24a515e 100644
> >> >> --- a/arch/powerpc/platforms/44x/Kconfig
> >> >> +++ b/arch/powerpc/platforms/44x/Kconfig
> >> >> @@ -169,6 +169,15 @@ config YOSEMITE
> >> >>       help
> >> >>         This option enables support for the AMCC PPC440EP evaluation board.
> >> >>
> >> >> +config BGP
> >> >
> >> > Does this FPU feature have a specific name like double hummer? I'd
> >> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> >> > something like that...
> >> >
> >> >> +     bool "Blue Gene/P"
> >> >> +     depends on 44x
> >> >> +     default n
> >> >> +     select PPC_FPU
> >> >> +     select PPC_DOUBLE_FPU
> >> >
> >> > ... in fact, it seems you are doing something like these here but you
> >> > don't use PPC_DOUBLE_FPU anywhere?
> >> >
> >>
> >> A fair point. I'm fine with calling it DOUBLE_HUMMER, but I wasn't sure if
> >> that was "too internal" of a name for the kernel. Let me know and
> >> I'll fix it up.
> >
> > What I'm mostly concerned about is disassociating it with a particular
> > CPU.
> >
> > If it has an external name, then all the better.
> >
>
> Since it isn't available on other chips, should it just be PPC_BGP_FPU
> or PPC_BGP_DOUBLE_FPU?

I'd probably still prefer it disassociated with the CPU name, but we are
really bike shedding here. I'm not too fussed.

Mikey

2011-05-20 00:30:22

by Eric Van Hensbergen

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

On Thu, May 19, 2011 at 6:16 PM, Michael Neuling <[email protected]> wrote:
> In message <[email protected]> you wrote:
>> On Thu, May 19, 2011 at 4:36 PM, Michael Neuling <[email protected]> wrote:
>> > In message <[email protected]> you wrote:
>> >> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
>> >> > Eric,
>> >> >
>> >> >> This patch adds save/restore register support for the BlueGene/P
>> >> >> double hummer FPU.
>> >> >
>> >> > What does this mean? Needs more details here.
>> >> >
>>
>> okay, I've changed it a bit in [V2], if you want more I can do my best.
>
> If you can describe the whole primary and secondary registers that'd be
> cool. ASCII art would be awesome! :-)
>

You sure you don't just want a bitfield.conf? :) I'll do my best, but my
ASCII art isn't what it used to be. I'll also include a reference to the PDF.

>> >
>> > Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
>> > 32*8 bytes for FP only (ignoring VSX). If this is the case, then you'll
>> > need to make 'fpr' in the thread struct bigger which you can do by setting
>> > TS_FPRWIDTH = 2 like we do for VSX.
>> >

Okay - so basically what I have now and TS_FPRWIDTH=2 ?

>>
>> Since it isn't available on other chips, should it just be PPC_BGP_FPU
>> or PPC_BGP_DOUBLE_FPU?
>
> I'd probably still prefer it disassociated with the CPU name, but we are
> really bike shedding here. I'm not too fussed.
>

I'll leave it separate and switch it to PPC_FP2 (or would you prefer
PPC_FP2_FPU to make it clear) since the public PDF refers to it this way.

If that all sounds good, I'll spin [V3] tomorrow.

-eric

2011-05-20 00:35:25

by Benjamin Herrenschmidt

Subject: Re: [PATCH 2/7] [RFC] add bluegene entry to cputable

On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> Signed-off-by: Eric Van Hensbergen <[email protected]>
> ---
> arch/powerpc/kernel/cputable.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index b9602ee..0eb245e 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -1732,6 +1732,20 @@ static struct cpu_spec __initdata cpu_specs[] = {
> .machine_check = machine_check_440A,
> .platform = "ppc440",
> },
> + { /* Blue Gene/P */
> + .pvr_mask = 0xfffffff0,
> + .pvr_value = 0x52131880,
> + .cpu_name = "450 Blue Gene/P",
> + .cpu_features = CPU_FTRS_440x6,
> + .cpu_user_features = COMMON_USER_BOOKE |
> + PPC_FEATURE_HAS_FPU,
> + .mmu_features = MMU_FTR_TYPE_44x,
> + .icache_bsize = 32,
> + .dcache_bsize = 32,
> + .cpu_setup = __setup_cpu_460gt,
^^^^^^^^^^^^^^^^^^
Are you sure ?

Cheers,
Ben.

> + .machine_check = machine_check_440A,
> + .platform = "ppc440",
> + },
> { /* 460EX */
> .pvr_mask = 0xffff0006,
> .pvr_value = 0x13020002,
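
For readers unfamiliar with the table quoted above: each entry is selected by masking the Processor Version Register and comparing against `pvr_value`. A minimal sketch in plain C, assuming a cut-down three-field struct (the real `cpu_spec` has many more fields) and a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of a cputable entry. */
struct cpu_id_sketch {
	uint32_t pvr_mask;
	uint32_t pvr_value;
	const char *cpu_name;
};

/* Mask and value taken from the Blue Gene/P entry in the patch. */
static const struct cpu_id_sketch bgp_entry = {
	0xfffffff0u, 0x52131880u, "450 Blue Gene/P"
};

/* An entry matches when the masked PVR equals the stored value. */
static inline int pvr_matches(const struct cpu_id_sketch *s, uint32_t pvr)
{
	assert(s);
	return (pvr & s->pvr_mask) == s->pvr_value;
}
```

With the 0xfffffff0 mask only the low four bits of the PVR are ignored, so the entry matches exactly one part number across its minor revisions.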

2011-05-20 01:59:51

by Benjamin Herrenschmidt

Subject: Re: [PATCH 5/7] [RFC] force 32-byte aligned kmallocs

On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> For BGP, it is convenient for 'kmalloc' to come back with 32-byte
> aligned units for torus DMA
>
> Signed-off-by: Eric Van Hensbergen <[email protected]>
> ---
> arch/powerpc/include/asm/page_32.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
> index 68d73b2..fb0a7ae 100644
> --- a/arch/powerpc/include/asm/page_32.h
> +++ b/arch/powerpc/include/asm/page_32.h
> @@ -9,7 +9,7 @@
>
> #define VM_DATA_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS32
>
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> +#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
> #define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> #endif

Is DMA cache coherent on BG/P ? That's odd for a 4xx base :-)

Cheers,
Ben.


2011-05-20 00:39:15

by Benjamin Herrenschmidt

Subject: Re: [PATCH 6/7] [RFC] enable early TLBs for BG/P

On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> BG/P maps firmware with an early TLB

That's a bit gross. How often do you call that firmware in practice ?
Aren't you better off instead inserting a TLB entry for it when you call
it instead ? A simple tlbsx. + tlbwe sequence would do. That would free
up a TLB entry for normal use.

Cheers,
Ben.

> Signed-off-by: Eric Van Hensbergen <[email protected]>
> ---
> arch/powerpc/include/asm/mmu-44x.h | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
> index ca1b90c..2807d6e 100644
> --- a/arch/powerpc/include/asm/mmu-44x.h
> +++ b/arch/powerpc/include/asm/mmu-44x.h
> @@ -115,8 +115,12 @@ typedef struct {
> #endif /* !__ASSEMBLY__ */
>
> #ifndef CONFIG_PPC_EARLY_DEBUG_44x
> +#ifndef CONFIG_BGP
> #define PPC44x_EARLY_TLBS 1
> -#else
> +#else /* CONFIG_BGP */
> +#define PPC44x_EARLY_TLBS 2
> +#endif /* CONFIG_BGP */
> +#else /* CONFIG_PPC_EARLY_DEBUG_44x */
> #define PPC44x_EARLY_TLBS 2
> #define PPC44x_EARLY_DEBUG_VIRTADDR (ASM_CONST(0xf0000000) \
> | (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0xffff))

2011-05-20 00:43:42

by Michael Neuling

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

> >> >> > Eric,
> >> >> >
> >> >> >> This patch adds save/restore register support for the BlueGene/P
> >> >> >> double hummer FPU.
> >> >> >
> >> >> > What does this mean? Needs more details here.
> >> >> >
> >>
> >> okay, I've changed it a bit in [V2], if you want more I can do my best.
> >
> > If you can describe the whole primary and secondary registers that'd be
> > cool. ASCII art would be awesome! :-)
> >
>
> You sure you don't just want a bitfield.conf? :)

hehe, maybe an interpretive dance video posted on youtube?

> I'll do my best, but my ASCII art isn't what it used to be. I'll also
> include a reference to the PDF.

Something self contained in the comments would be great as external
links tend to disappear.

> >> > Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
> >> > 32*8 bytes for FP only (ignoring VSX). If this is the case, then you'll
> >> > need to make 'fpr' in the thread struct bigger which you can do by setting
> >> > TS_FPRWIDTH = 2 like we do for VSX.
> >> >
>
> Okay - so basically what I have now and TS_FPRWIDTH=2 ?

Yes.

> >>
> >> Since it isn't available on other chips, should it just be PPC_BGP_FPU
> >> or PPC_BGP_DOUBLE_FPU?
> >
> > I'd probably still prefer it disassociated with the CPU name, but we are
> > really bike shedding here. I'm not too fussed.
> >
>
> I'll leave it separate and switch it to PPC_FP2 (or would you prefer
> PPC_FP2_FPU to make it clear) since the public PDF refers to it this
> way.

PPC_FPU_FP2 would be my vote.

> If that all sounds good, I'll spin [V3] tomorrow.

Thanks!

Mikey

2011-05-20 00:47:58

by Eric Van Hensbergen

Subject: Re: [PATCH 5/7] [RFC] force 32-byte aligned kmallocs

On Thu, May 19, 2011 at 7:36 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>>
>> -#ifdef CONFIG_NOT_COHERENT_CACHE
>> +#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
>>  #define ARCH_DMA_MINALIGN    L1_CACHE_BYTES
>>  #endif
>
> Is DMA cache coherent on BG/P ? That's odd for a 4xx base :-)
>

My understanding of things (which could be totally wrong) is that the
DMA we care about on BG/P (namely the Torus and Collective networks)
is coherent at the L2. Of course the change in question is talking
about L1_CACHE_BYTES, so my reading of this is that it's a sleazy way
of getting aligned mallocs that make interactions with the tightly
coupled networks easier/more-efficient. I'm open to alternative
suggestions.
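
As a userspace illustration of what the alignment buys, here is a rough sketch, assuming a 32-byte line (the dcache_bsize from patch 2/7) and using C11 aligned_alloc as a stand-in for an ARCH_DMA_MINALIGN-aligned kmalloc. The helper names are made up for the example and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_CACHE_BYTES 32u	/* assumed BG/P L1 line size */

/* Round a size up to a whole number of cache lines. */
static inline size_t cacheline_round_up(size_t n)
{
	return (n + L1_CACHE_BYTES - 1) & ~(size_t)(L1_CACHE_BYTES - 1);
}

/* True when a pointer starts on a cache-line boundary. */
static inline int is_cacheline_aligned(const void *p)
{
	return ((uintptr_t)p & (L1_CACHE_BYTES - 1)) == 0;
}

/* Allocate a buffer that starts and ends on line boundaries. */
static inline void *alloc_dma_buffer(size_t n)
{
	assert(n > 0);
	/* aligned_alloc wants the size to be a multiple of the alignment */
	return aligned_alloc(L1_CACHE_BYTES, cacheline_round_up(n));
}
```

A buffer obtained this way never shares a line with a neighbouring allocation, which is the property the torus DMA code appears to rely on.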

-eric

2011-05-20 00:52:55

by Benjamin Herrenschmidt

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

On Thu, 2011-05-19 at 15:58 +1000, Michael Neuling wrote:

> > +
> > #define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> > #define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> > #define SAVE_8GPRS(n, base) SAVE_4GPRS(n, base); SAVE_4GPRS(n+4, base)
> > @@ -97,18 +104,26 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> > #define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
> > #define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
> >
> > -#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> > -#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
> > -#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
> > -#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
> > -#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
> > -#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
> > -#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
> > -#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
> > -#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
> > -#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
> > -#define REST_16FPRS(n, base) REST_8FPRS(n, base); REST_8FPRS(n+8, base)
> > -#define REST_32FPRS(n, base) REST_16FPRS(n, base); REST_16FPRS(n+16, base)
> > +#ifdef CONFIG_BGP
> > +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
> > +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
>
> > 16*? Are these FP regs 64 or 128 bits wide? If 128 you are going to
> have to play with TS_WIDTH to get the size of the FPs correct in the
> thread_struct.
>
> I think there's a bug here.

Regardless of that, btw, I don't think it's very sane to change those
macros that way. I'd rather have a separate set to save/restore the BG
stuff and separate code altogether for loading/saving/flushing/etc...
like FSP SPE. The FPU save/restore code is already too complex as it is.

Also, should we aim to have this co-exist with other 4xx platforms in a
multiplatform kernel ? In that case it should not break the normal FP
case. Feel free to use CPU feature bits, there are 2 or 3 left available
in the 32-bit space, maybe pick a "combo" one for BGP (or one for hummer
and a MMU bit for the odd SMP tricks).

Hrm... thinking of which, what about doing it using the alternate
feature section ? This allows two "alternate" pieces of code to overlay,
the kernel will replace the original one with the alternative one if the
feature bits match. That way you can just stick an alternate around
SAVE/REST_32FPRS that replaces them with your new SAVE/REST_32HFPRS (or
whatever you want to call your new set of macros).
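
A loose C analogy of that idea, for illustration only: the real mechanism patches instructions in place at boot, whereas this sketch just picks a function pointer once from a feature word. The feature bit and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_FTR_FP2_SKETCH (1u << 0)	/* hypothetical "double FPU" feature bit */

typedef int (*fp_save_fn)(void);

/* Stand-ins: each returns how many 64-bit values it would store. */
static int save_32fprs(void)  { return 32; }	/* normal FP case */
static int save_32hfprs(void) { return 64; }	/* primary + secondary */

/* Decide once, from the feature bits, which variant is in effect. */
static inline fp_save_fn select_fp_save(uint32_t cpu_features)
{
	return (cpu_features & CPU_FTR_FP2_SKETCH) ? save_32hfprs : save_32fprs;
}

/* Convenience wrapper used in the example below. */
static inline int run_fp_save(uint32_t cpu_features)
{
	fp_save_fn fn = select_fp_save(cpu_features);

	assert(fn);
	return fn();
}
```

The point of the analogy is that the normal FP path is untouched on CPUs without the bit, which is what a multiplatform 4xx kernel needs.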

Of course you'll probably need a separate area in the thread
struct/pt_regs etc... which means a userspace ABI change, a change of the
sig context etc etc ....

Cheers,
Ben.

2011-05-20 00:53:22

by Benjamin Herrenschmidt

Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU

On Thu, 2011-05-19 at 08:53 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <[email protected]> wrote:
> > Eric,
> >
> >> This patch adds save/restore register support for the BlueGene/P
> >> double hummer FPU.
> >
> > What does this mean? Needs more details here.
> >
>
> Hi Mikey,
>
> any specific details you are looking for here? AFAIK these patches
> are required for the kernel to save/restore the double hummer
> properly.

A description of the double hummer would be good.

Cheers,
Ben.

> >>
> >> +#ifdef CONFIG_BGP
> >> +#define LFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> >> + ((rb)<<11)|(462<<1)
> >> +#define STFPDX(frt, ra, rb) .long (31<<26)|((frt)<<21)|((ra)<<16)| \
> >> + ((rb)<<11)|(974<<1)
> >> +#endif /* CONFIG_BGP */
> >
> > Put these in arch/powerpc/include/asm/ppc-opcode.h and reformat to fit
> > whats there already.
> >
> > Also, don't need to put these defines inside a #ifdef.
> >
>
> Sure, I'll fix that up.
>
> >> +#ifdef CONFIG_BGP
> >> +#define SAVE_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); STFPDX(n, base, b)
> >> +#define REST_FPR(n, b, base) li b, THREAD_FPR0+(16*(n)); LFPDX(n, base, b)
> >
> > > 16*? Are these FP regs 64 or 128 bits wide? If 128 you are going to
> > have to play with TS_WIDTH to get the size of the FPs correct in the
> > thread_struct.
> >
> > I think there's a bug here.
> >
>
> I actually have three different versions of this code from different
> source patches that I'm drawing from - so your help in figuring out
> the best way to approach this is appreciated. The kittyhawk version
> of the code has 8* instead of 16*. According to the docs:
> "Each of the two FPU units contains 32 64-bit floating point registers
> for a total of 64 FP registers per processor." which would seem to
> point to the kittyhawk version - but they have a second SAVE_32SFPRS
> for the second hummer. What wasn't clear to me with this version of
> the code was whether or not they were doing something clever like
> saving the pair of the 64-bit FPU registers in a single 128-bit slot
> (seems plausible). If this is not the way to go, I can certainly
> switch the kittyhawk version of the patch with the *, the extra
> SAVE32SFPR and the extra double hummer specific storage space in the
> thread_struct. If it would help I can post an alternate version of
> the patch for discussion with the kittyhawk version.
>
> >> /*
> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platforms/44x/
> > Kconfig
> >> index f485fc5f..24a515e 100644
> >> --- a/arch/powerpc/platforms/44x/Kconfig
> >> +++ b/arch/powerpc/platforms/44x/Kconfig
> >> @@ -169,6 +169,15 @@ config YOSEMITE
> >> help
> >> This option enables support for the AMCC PPC440EP evaluation board.
> >>
> >> +config BGP
> >
> > > Does this FPU feature have a specific name like double hummer? I'd
> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> > something like that...
> >
> >> + bool "Blue Gene/P"
> >> + depends on 44x
> >> + default n
> >> + select PPC_FPU
> >> + select PPC_DOUBLE_FPU
> >
> > > ... in fact, it seems you are doing something like these here but you
> > don't use PPC_DOUBLE_FPU anywhere?
> >
>
> A fair point. I'm fine with calling it DOUBLE_HUMMER, but I wasn't sure if
> that was "too internal" of a name for the kernel. Let me know and
> I'll fix it up.
> I'll also change the CONFIG_BGP defines in the FPU code to PPC_DOUBLE_FPU
> or PPC_DOUBLE_HUMMER depending on what the community decides.
>
> Thanks for the feedback!
>
> -eric
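
For reference, the LFPDX/STFPDX macros quoted above assemble an X-form instruction word by hand because the assembler does not know these mnemonics. The same arithmetic in plain C, as a small sketch: the extended opcodes 462 and 974 and the field shifts are taken from the quoted macros, while the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Extended opcodes as used in the quoted macros. */
#define XO_LFPDX  462u
#define XO_STFPDX 974u

/* Primary opcode 31, then FRT, RA, RB and the extended opcode field. */
static inline uint32_t xform_word(unsigned int frt, unsigned int ra,
				  unsigned int rb, unsigned int xo)
{
	assert(frt < 32 && ra < 32 && rb < 32);
	return (31u << 26) | (frt << 21) | (ra << 16) | (rb << 11) | (xo << 1);
}

static inline uint32_t encode_lfpdx(unsigned int frt, unsigned int ra, unsigned int rb)
{
	return xform_word(frt, ra, rb, XO_LFPDX);
}

static inline uint32_t encode_stfpdx(unsigned int frt, unsigned int ra, unsigned int rb)
{
	return xform_word(frt, ra, rb, XO_STFPDX);
}
```

For example, storing register pair 0 with base r4 and index r10 gives the word 0x7C04579C, and the matching load gives 0x7C04539C.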

2011-05-20 01:01:20

by Benjamin Herrenschmidt

Subject: Re: [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P

On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> BG/P nodes need to be configured for writethrough to work in SMP
> configurations. This patch adds the right hooks in the MMU code
> to make sure L1_WRITETHROUGH configurations are setup for BG/P.

> /* Storage attribute and access control fields */
> #define PPC44x_TLB_ATTR_MASK 0x0000ff80
> +#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
> #define PPC44x_TLB_U0 0x00008000 /* User 0 */
> #define PPC44x_TLB_U1 0x00004000 /* User 1 */
> #define PPC44x_TLB_U2 0x00002000 /* User 2 */
> diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
> index 5e12b74..1f7ae60 100644
> --- a/arch/powerpc/kernel/head_44x.S
> +++ b/arch/powerpc/kernel/head_44x.S
> @@ -429,7 +429,16 @@ finish_tlb_load_44x:
> andi. r10,r12,_PAGE_USER /* User page ? */
> beq 1f /* nope, leave U bits empty */
> rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
> -1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
> +1:
> +#ifdef CONFIG_L1_WRITETHROUGH
> + andi. r10, r11, PPC44x_TLB_I
> + bne 2f
> + oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
> + /* non-inhibited */
> + ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
> +2:
> +#endif /* CONFIG_L1_WRITETHROUGH */

Make it an MMU feature so it's done at runtime rather than compile time.

Also, you should aim toward avoiding that conditional branch in such a
critical hot path :-) A way to do so would be to shove these in the PTE
instead, there's plenty of unused bits in the top part for example.
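
One way to read that suggestion, sketched in C rather than asm: decide once, when the PTE is built, whether the write-through bits apply, and let the TLB miss path merge the stored bits without testing anything. The WL1 and U2 values come from the patch; the I and M values are assumptions about mmu-44x.h, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PPC44x_TLB_WL1 0x00100000u	/* from the patch */
#define PPC44x_TLB_U2  0x00002000u	/* from the patch */
#define PPC44x_TLB_I   0x00000400u	/* assumed value: cache inhibited */
#define PPC44x_TLB_M   0x00000200u	/* assumed value: memory coherent */

#define WT_BITS (PPC44x_TLB_WL1 | PPC44x_TLB_U2 | PPC44x_TLB_M)

/* Slow path (PTE creation): cacheable pages get the write-through bits. */
static inline uint32_t pte_wt_bits(uint32_t attr)
{
	return (attr & PPC44x_TLB_I) ? 0u : WT_BITS;
}

/* Hot path (TLB miss): merge the precomputed bits, no test-and-branch. */
static inline uint32_t finish_attrib(uint32_t attr, uint32_t pte_extra)
{
	assert((pte_extra & ~WT_BITS) == 0);	/* only WT bits are stored */
	return attr | pte_extra;
}
```

The result is the same attribute word the quoted andi./bne sequence produces, but the conditional has moved out of finish_tlb_load_44x.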

> + tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
>
> /* Done...restore registers and get out of here.
> */
> @@ -799,7 +808,11 @@ skpinv: addi r4,r4,1 /* Increment */
> sync
>
> /* Initialize MMUCR */
> +#ifdef CONFIG_L1_WRITETHROUGH
> + lis r5, PPC44x_MMUCR_U2@h
> +#else
> li r5,0
> +#endif /* CONFIG_L1_WRITETHROUGH */
> mtspr SPRN_MMUCR,r5
> sync
>
> @@ -814,7 +827,14 @@ skpinv: addi r4,r4,1 /* Increment */
> /* attrib fields */
> /* Added guarded bit to protect against speculative loads/stores */
> li r5,0
> - ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
> +#ifdef CONFIG_L1_WRITETHROUGH
> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
> + PPC44x_TLB_G | PPC44x_TLB_U2)
> + oris r5,r5,PPC44x_TLB_WL1@h
> +#else
> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
> + PPC44x_TLB_G)
> +#endif /* CONFIG_L1_WRITETHROUGH */
>
> li r0,63 /* TLB slot 63 */
>
> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> index 094bd98..d88369b 100644
> --- a/arch/powerpc/kernel/misc_32.S
> +++ b/arch/powerpc/kernel/misc_32.S
> @@ -506,7 +506,20 @@ _GLOBAL(clear_pages)
> li r0,PAGE_SIZE/L1_CACHE_BYTES
> slw r0,r0,r4
> mtctr r0
> +#ifdef CONFIG_L1_WRITETHROUGH
> + /* assuming 32 byte cacheline */
> + li r4, 0
> +1: stw r4, 0(r3)
> + stw r4, 4(r3)
> + stw r4, 8(r3)
> + stw r4, 12(r3)
> + stw r4, 16(r3)
> + stw r4, 20(r3)
> + stw r4, 24(r3)
> + stw r4, 28(r3)
> +#else
> 1: dcbz 0,r3
> +#endif /* CONFIG_L1_WRITETHROUGH */

wtf ? dcbz doesn't work ? yuck ! This isn't a HW design, it's a hack :-)

make it an mmu feature btw, as I said, I'd like to keep it a unified
kernel.
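
For comparison, the store-based replacement for dcbz in the quoted clear_pages hunk amounts to the following, written as a plain C sketch. It assumes a 32-byte line and a 4 KiB page; the function names and PAGE_SIZE_SKETCH are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES   32u	/* assumed line size, as in the patch comment */
#define PAGE_SIZE_SKETCH 4096u	/* assumed page size */

/* Zero `lines` cache lines with eight explicit 32-bit stores per line. */
static inline void clear_lines_by_stores(uint32_t *p, size_t lines)
{
	while (lines--) {
		for (size_t i = 0; i < L1_CACHE_BYTES / sizeof(uint32_t); i++)
			p[i] = 0;
		p += L1_CACHE_BYTES / sizeof(uint32_t);
	}
}

/* Clear one whole page, line by line. */
static inline void clear_page_sketch(uint32_t *page)
{
	assert(page);
	clear_lines_by_stores(page, PAGE_SIZE_SKETCH / L1_CACHE_BYTES);
}
```

Unlike dcbz, every word goes through a normal store, so the cost is eight stores per line instead of one cache operation.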

> addi r3,r3,L1_CACHE_BYTES
> bdnz 1b
> blr
> @@ -550,7 +563,9 @@ _GLOBAL(copy_page)
> mtctr r0
> 1:
> dcbt r11,r4
> +#ifndef CONFIG_L1_WRITETHROUGH
> dcbz r5,r3
> +#endif
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> index 55f19f9..98a07e3 100644
> --- a/arch/powerpc/lib/copy_32.S
> +++ b/arch/powerpc/lib/copy_32.S
> @@ -98,7 +98,11 @@ _GLOBAL(cacheable_memzero)
> bdnz 4b
> 3: mtctr r9
> li r7,4
> +#ifdef CONFIG_L1_WRITETHROUGH
> +10:
> +#else
> 10: dcbz r7,r6
> +#endif /* CONFIG_L1_WRITETHROUGH */
> addi r6,r6,CACHELINE_BYTES
> bdnz 10b
> clrlwi r5,r8,32-LG_CACHELINE_BYTES
> @@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
> mtctr r0
> beq 63f
> 53:
> +#ifndef CONFIG_L1_WRITETHROUGH
> dcbz r11,r6
> +#endif /* CONFIG_L1_WRITETHROUGH */
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
> @@ -368,7 +374,11 @@ _GLOBAL(__copy_tofrom_user)
> mtctr r8
>
> 53: dcbt r3,r4
> +#ifdef CONFIG_L1_WRITETHROUGH
> +54:
> +#else
> 54: dcbz r11,r6
> +#endif
> .section __ex_table,"a"
> .align 2
> .long 54b,105f
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index 024acab..b684c8a 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
> :
> #ifdef CONFIG_PPC47x
> : "r" (PPC47x_TLB2_S_RWX),
> -#else
> +#elif defined(CONFIG_L1_WRITETHROUGH)
> + : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
> + | PPC44x_TLB_U2 | PPC44x_TLB_M),
> +#else /* neither CONFIG_PPC47x nor CONFIG_L1_WRITETHROUGH */
> : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
> -#endif
> +#endif /* CONFIG_PPC47x */
> "r" (phys),
> "r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
> "r" (entry),
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index f7b0772..684a281 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -348,4 +348,9 @@ config XILINX_PCI
> bool "Xilinx PCI host bridge support"
> depends on PCI && XILINX_VIRTEX
>
> +config L1_WRITETHROUGH
> + bool "Blue Gene/P enabled writethrough mode"
> + depends on BGP
> + default y
> +
> endmenu
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 111138c..3a3c711 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -329,9 +329,13 @@ config NOT_COHERENT_CACHE
> bool
> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> default n if PPC_47x
> + default n if BGP
> default y
>
> config CHECK_CACHE_COHERENCY
> bool
>
> +config L1_WRITETHROUGH
> + bool
> +
> endmenu

2011-05-20 01:05:42

by Benjamin Herrenschmidt

Subject: Re: [PATCH 7/7] [RFC] SMP support code

On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:

> +#ifdef CONFIG_BGP
> +/*
> + * The icbi instruction does not broadcast to all cpus in the ppc450
> + * processor used by Blue Gene/P. It is unlikely this problem will
> + * be exhibited in other processors so this remains ifdef'ed for BGP
> + * specifically.
> + *
> + * We deal with this by marking executable pages either writable, or
> + * executable, but never both. The permissions will fault back and
> + * forth if the thread is actively writing to executable sections.
> + * Each time we fault to become executable we flush the dcache into
> + * icache on all cpus.
> + *

I know that hack :-) I think I wrote it even (or a version of it, that
was a long time ago) ;-) That doesn't make it pretty tho ...
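
The writable-or-executable flip described in the quoted comment can be modelled in a few lines of C. This is only a sketch of the state changes, with made-up flag values and a flag standing in for the cross-CPU flush; it is not the kernel's PTE handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, for this model only. */
#define PG_HWWRITE 0x1u
#define PG_EXEC    0x2u
#define PG_RW      0x4u
#define PG_DIRTY   0x8u

/*
 * Return the new PTE flags after a fault. *flushed is set when an exec
 * fault would trigger the dcache-to-icache flush on every CPU.
 */
static inline uint32_t fixup_access_perms(uint32_t pte, int is_write,
					  int is_exec, int *flushed)
{
	assert(flushed);
	*flushed = 0;

	if (is_exec) {
		pte &= ~PG_HWWRITE;	/* stop further writes first */
		*flushed = 1;		/* flush on all CPUs happens here */
		pte |= PG_EXEC;		/* then allow execution */
		return pte;
	}
	if (is_write && (pte & PG_RW) && (pte & PG_DIRTY) &&
	    !(pte & PG_HWWRITE)) {
		pte &= ~PG_EXEC;	/* writable again, no longer executable */
		pte |= PG_HWWRITE;
	}
	return pte;
}
```

In this model a page never holds PG_EXEC and PG_HWWRITE at the same time, and each transition to executable costs one flush on all CPUs.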
>

> +struct bgp_fixup_parm {
> + struct page *page;
> + unsigned long address;
> + struct vm_area_struct *vma;
> +};
> +
> +static void bgp_fixup_cache_tlb(void *parm)
> +{
> + struct bgp_fixup_parm *p = parm;
> +
> + if (!PageHighMem(p->page))
> + flush_dcache_icache_page(p->page);
> + local_flush_tlb_page(p->vma, p->address);
> +}
> +
> +static void bgp_fixup_access_perms(struct vm_area_struct *vma,
> + unsigned long address,
> + int is_write, int is_exec)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + pte_t *ptep = NULL;
> + pmd_t *pmdp;
> +
> + if (get_pteptr(mm, address, &ptep, &pmdp)) {
> + spinlock_t *ptl = pte_lockptr(mm, pmdp);
> + pte_t old;
> +
> + spin_lock(ptl);
> + old = *ptep;
> + if (pte_present(old)) {
> + struct page *page = pte_page(old);
> +
> + if (is_exec) {
> + struct bgp_fixup_parm param = {
> + .page = page,
> + .address = address,
> + .vma = vma,
> + };
> + pte_update(ptep, _PAGE_HWWRITE, 0);
> + on_each_cpu(bgp_fixup_cache_tlb, &param, 1);

Gotta be very careful with on_each_cpu() done within a lock. I wonder if
we could fast-path & simplify that using crits, is there a way to shoot
critical IPIs to the other cores ? Might even be able in this case to
do it entirely in asm in the page fault path.

> + pte_update(ptep, 0, _PAGE_EXEC);
> + pte_unmap_unlock(ptep, ptl);
> + return;
> + }
> + if (is_write &&
> + (pte_val(old) & _PAGE_RW) &&
> + (pte_val(old) & _PAGE_DIRTY) &&
> + !(pte_val(old) & _PAGE_HWWRITE)) {
> + pte_update(ptep, _PAGE_EXEC, _PAGE_HWWRITE);
> + }
> + }
> + if (!pte_same(old, *ptep))
> + flush_tlb_page(vma, address);
> + pte_unmap_unlock(ptep, ptl);
> + }
> +}
> +#endif /* CONFIG_BGP */
> +
> /*
> * For 600- and 800-family processors, the error_code parameter is DSISR
> * for a data fault, SRR1 for an instruction fault. For 400-family processors
> @@ -333,6 +404,12 @@ good_area:
> perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, 0,
> regs, address);
> }
> +
> +#ifdef CONFIG_BGP
> + /* Fixup _PAGE_EXEC and _PAGE_HWWRITE if necessary */
> + bgp_fixup_access_perms(vma, address, is_write, is_exec);
> +#endif /* CONFIG_BGP */
> +
> up_read(&mm->mmap_sem);
> return 0;
>
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 3a3c711..b77a25f 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -300,7 +300,7 @@ config PPC_PERF_CTRS
> This enables the powerpc-specific perf_event back-end.
>
> config SMP
> - depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
> + depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x || BGP
> bool "Symmetric multi-processing support"
> ---help---
> This enables support for systems with more than one CPU. If you have

2011-05-20 01:08:49

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 2/7] [RFC] add bluegene entry to cputable

On Thu, May 19, 2011 at 7:35 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>> + .dcache_bsize = 32,
>> + .cpu_setup = __setup_cpu_460gt,
>                 ^^^^^^^^^^^^^^^^^^
> Are you sure ?
>

That surprised me too. I figured it must have been a close enough
match (at least in the 2.6.29.1 time frame, which is where I'm trying
to merge the BG/P patches up from). The kittyhawk patches don't even
use this, so it's possible we could just remove it.


_GLOBAL(__setup_cpu_460ex)
_GLOBAL(__setup_cpu_460gt)
mflr r4
bl __init_fpu_44x
bl __fixup_440A_mcheck
mtlr r4
blr

Looks like the 460 setup invokes a bunch of 440 calls! Would you
prefer I set up my own entry point (setup_cpu_bgp or setup_cpu_450)
which makes the same calls?

-eric

2011-05-20 01:21:56

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [PATCH 6/7] [RFC] enable early TLBs for BG/P

On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>> BG/P maps firmware with an early TLB
>
> That's a bit gross. How often do you call that firmware in practice ?
> Aren't you better off instead inserting a TLB entry for it when you call
> it instead ? A simple tlbsx. + tlbwe sequence would do. That would free
> up a TLB entry for normal use.
>

Well, it depends on who you talk to. The production software BG/P
guys use the firmware constantly; it's the primary interface to the
networks, the console, and the management software which runs the
machine. As such the IO Node guys, the Compute Node Kernel guys and
the ZeptoOS guys use it quite a bit. The kittyhawk guys, on the other
hand, barely use it at all; in fact I believe they do all the
interaction with it during uboot and then shut it off.

IIRC, the sticky question is RAS support: there are certain things it
wants to jump to firmware to deal with, and it expects things to be
mapped and pinned into memory. Furthermore, I think it may make
assumptions about where in the TLB the mappings are. Since the
kittyhawk guys obviously ignore this by shutting it down, it's not
clear just how important this is. I'm game to try the dynamic mapping
as you suggest if you would prefer it.

It's worth mentioning that I believe with BG/Q, the plan is to rely
on the firmware even more extensively, but I haven't looked at any of
the code yet to verify whether or not this is true.

-eric

2011-05-20 01:50:20

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 5/7] [RFC] force 32-byte aligned kmallocs

On Thu, 2011-05-19 at 19:47 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 7:36 PM, Benjamin Herrenschmidt
> <[email protected]> wrote:
> > On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> >>
> >> -#ifdef CONFIG_NOT_COHERENT_CACHE
> >> +#if defined(CONFIG_NOT_COHERENT_CACHE) || defined(CONFIG_BGP)
> >> #define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> >> #endif
> >
> > Is DMA cache coherent on BG/P ? That's odd for a 4xx base :-)
> >
>
> My understanding of things (which could be totally wrong) is that the
> DMA we care about on BG/P (namely the Torus and Collective networks)
> is coherent at the L2. Of course the change in question is talking
> about L1_CACHE_BYTES, so my reading of this is that its a sleazy way
> of getting aligned mallocs that make interactions with the tightly
> coupled networks easier/more-efficient. I'm open to alternative
> suggestions.

But if it's not coherent with L1, then you should have
CONFIG_NOT_COHERENT_CACHE set and not need that patch... or am I missing
something ?

One thing we should do some day as well is make the whole non-coherent
handling runtime selected; it's on the list of things to fix to get
440+47x in the same kernel. Phew....
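
To make concrete what a minimum alignment of one cache line does to
small allocations, here is a small user-space sketch. It assumes the
32-byte L1 line size given by dcache_bsize in the cputable patch;
LINE_BYTES_SKETCH, round_up_line() and share_line() are names made up
for the example, and none of the kernel's own helpers are used.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES_SKETCH 32u	/* assumed L1 line size (dcache_bsize = 32) */

/* Round a size up to a whole number of cache lines. */
static size_t round_up_line(size_t size)
{
	return (size + LINE_BYTES_SKETCH - 1) & ~(size_t)(LINE_BYTES_SKETCH - 1);
}

/* True if two byte addresses fall in the same cache line. */
static int share_line(uintptr_t a, uintptr_t b)
{
	return (a / LINE_BYTES_SKETCH) == (b / LINE_BYTES_SKETCH);
}
```

With this rounding, two 8-byte objects that would otherwise sit at
0x1000 and 0x1008 (same line) end up at 0x1000 and 0x1020 (different
lines), so a flush or invalidate of one buffer cannot touch its
neighbour.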

Cheers,
Ben.

2011-05-20 01:51:04

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/7] [RFC] add bluegene entry to cputable

On Thu, 2011-05-19 at 20:08 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 7:35 PM, Benjamin Herrenschmidt
> <[email protected]> wrote:
> > On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> >> + .dcache_bsize = 32,
> >> + .cpu_setup = __setup_cpu_460gt,
> > ^^^^^^^^^^^^^^^^^^
> > Are you sure ?
> >
>
> That surprised me too, I figured it must have been a close enough
> match (at least in the 2.6.29.1 time frame which is where I'm trying
> to merge the BG/P patches up from. The kittyhawk patches don't even
> use this, so its possible we could just remove it.
>
> _GLOBAL(__setup_cpu_460ex)
> _GLOBAL(__setup_cpu_460gt)
> mflr r4
> bl __init_fpu_44x
> bl __fixup_440A_mcheck
> mtlr r4
> blr
>
> Looks like the 460 setup invokes a bunch of 440 calls! Would you
> prefer I setup my own entry point (setup_cpu_bgp or setup_cpu_450)
> which makes the same calls?

Yes, add an entry. 460's are just 440's btw :-)
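
For illustration only, the cputable side of a dedicated entry can be
modelled in plain user-space C as below. The struct is a cut-down
stand-in for the kernel's struct cpu_spec, and the PVR mask/value
pairs, the names setup_cpu_bgp_sketch and identify_cpu_sketch, and the
call counter are all hypothetical. The real hook would be an assembly
routine making the same two calls as __setup_cpu_460gt.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct cpu_spec (hypothetical). */
struct cpu_spec_sketch {
	uint32_t pvr_mask;
	uint32_t pvr_value;
	const char *cpu_name;
	unsigned int dcache_bsize;
	void (*cpu_setup)(void);
};

static int bgp_setup_calls;	/* lets a caller observe that the hook ran */

/* Placeholder for a dedicated BG/P hook; the real one would be asm. */
static void setup_cpu_bgp_sketch(void)
{
	bgp_setup_calls++;
}

/* PVR values below are made up for the example, not real BG/P PVRs. */
static const struct cpu_spec_sketch cpu_specs_sketch[] = {
	{ 0xffff0000, 0x12340000, "450 Blue Gene/P (sketch)", 32,
	  setup_cpu_bgp_sketch },
	{ 0x00000000, 0x00000000, "(generic 44x)", 32, NULL },
};

/* First entry whose masked PVR matches wins; the last entry matches all. */
static const struct cpu_spec_sketch *identify_cpu_sketch(uint32_t pvr)
{
	size_t i;
	size_t n = sizeof(cpu_specs_sketch) / sizeof(cpu_specs_sketch[0]);

	for (i = 0; i < n; i++) {
		const struct cpu_spec_sketch *s = &cpu_specs_sketch[i];

		if ((pvr & s->pvr_mask) == s->pvr_value) {
			if (s->cpu_setup)
				s->cpu_setup();
			return s;
		}
	}
	return NULL;
}
```

The point of the separate hook is only that the table entry stops
naming a 460 routine; the work done at setup time stays the same.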

Cheers,
Ben.

2011-05-20 01:54:15

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 6/7] [RFC] enable early TLBs for BG/P

On Thu, 2011-05-19 at 20:21 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
> <[email protected]> wrote:
> > On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> >> BG/P maps firmware with an early TLB
> >
> > That's a bit gross. How often do you call that firmware in practice ?
> > Aren't you better off instead inserting a TLB entry for it when you call
> > it instead ? A simple tlbsx. + tlbwe sequence would do. That would free
> > up a TLB entry for normal use.
> >
>
> Well, it depends on who you talk to. The production software BG/P
> guys use the firmware constantly, its the primary interface to the networks, the console,
> and the management software which runs the machine.

Yuck.

> As such the IO Node guys, the Compute Node Kernel guys and the
> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
> barely use it at all, in fact I believe they do all the interaction with
> it during uboot and then shut it off.

I would prefer that approach.

> IIRC, the sticky question is RAS support, there are certain things it
> wants to jump to firmware to deal with and expects things to be mapped
> an pinned into memory.
>
> Furthermore, I think it may make assumptions about where in the TLB the
> mappings are.

This is gross, especially on a system with only 64 SW loaded TLB
entries :-(

> Since the kittyhawk guys
> obviously ignore this by shutting it down, its not clear just how
> important this is. I'm game to
> try the dynamic mapping as you suggest if you would prefer it.

I would yes, we can sort things out later for RAS.
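
As a rough model of the dynamic approach (search for a mapping, write
one only on a miss), here is a toy software TLB in plain C. It only
mirrors the control flow of a tlbsx. + tlbwe sequence; the entry
layout, the round-robin victim choice and every name in it are
assumptions for illustration, not the 44x hardware format.

```c
#include <stdint.h>

#define TLB_ENTRIES_SKETCH 64

struct tlb_entry_sketch {
	int valid;
	uint32_t epn;	/* effective page number */
	uint32_t rpn;	/* real page number */
};

struct tlb_sketch {
	struct tlb_entry_sketch e[TLB_ENTRIES_SKETCH];
	unsigned int next_victim;	/* simple round-robin replacement */
};

/* Analog of tlbsx.: return the matching index, or -1 on a miss. */
static int tlb_search_sketch(const struct tlb_sketch *t, uint32_t epn)
{
	int i;

	for (i = 0; i < TLB_ENTRIES_SKETCH; i++)
		if (t->e[i].valid && t->e[i].epn == epn)
			return i;
	return -1;
}

/* Search first; only on a miss write a new entry (analog of tlbwe). */
static int tlb_map_on_demand_sketch(struct tlb_sketch *t,
				    uint32_t epn, uint32_t rpn)
{
	int idx = tlb_search_sketch(t, epn);

	if (idx >= 0)
		return idx;
	idx = (int)t->next_victim;
	t->next_victim = (t->next_victim + 1) % TLB_ENTRIES_SKETCH;
	t->e[idx].valid = 1;
	t->e[idx].epn = epn;
	t->e[idx].rpn = rpn;
	return idx;
}
```

Calling tlb_map_on_demand_sketch() twice for the same page installs
one entry and then reuses it, so no slot has to stay pinned between
firmware calls.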

> Its worth mentioning that I believe with BG/Q, the plan is to rely on
> the firmware even more extensively, but I haven't looked at any of the code yet to verify
> whether or not this is true.

This is tantamount to linking a binary blob with the kernel ... it's a
fine line. At some point we might refuse the patches if they go too far
in that direction.

Cheers,
Ben.

> -eric

2011-05-20 03:38:05

by Kazutomo Yoshii

[permalink] [raw]
Subject: Re: [bg-linux] [PATCH 6/7] [RFC] enable early TLBs for BG/P

On 05/19/2011 08:54 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-05-19 at 20:21 -0500, Eric Van Hensbergen wrote:
>
>> On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
>> <[email protected]> wrote:
>>
>>> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>>>
>>>> BG/P maps firmware with an early TLB
>>>>
>>> That's a bit gross. How often do you call that firmware in practice ?
>>> Aren't you better off instead inserting a TLB entry for it when you call
>>> it instead ? A simple tlbsx. + tlbwe sequence would do. That would free
>>> up a TLB entry for normal use.
>>>
>>>
>> Well, it depends on who you talk to. The production software BG/P
>> guys use the firmware constantly, its the primary interface to the networks, the console,
>> and the management software which runs the machine.
>>
> Yuck.
>

Unfortunately, the firmware is also required:
- to configure the Blue Gene Interrupt Controller (BIC)
- to configure the Torus DMA unit, e.g. the FIFOs
- to configure the global interrupts (even if we don't use them, we need
to disable some channels correctly)
- to access node personality information (node id, DDR size, HZ, etc.);
or maybe we can directly access SRAM?
etc, etc.

>> As such the IO Node guys, the Compute Node Kernel guys and the
>> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
>> barely use it at all, in fact I believe they do all the interaction with
>> it during uboot and then shut it off.
>>
>
(I'm one of the ZeptoOS guys, btw)

For regular ppc Linux usage, our firmware dependency is minimal as well.
However, with our HPC extension, the firmware functions are called when
it configures BGP-specific network hardware.

We are not planning to submit our HPC extension here anytime soon
because our work is very special-purpose and includes lots of dirty hacks
right now.

Thanks,
Kaz
> I would prefer that approach.
>
>
>> IIRC, the sticky question is RAS support, there are certain things it
>> wants to jump to firmware to deal with and expects things to be mapped
>> an pinned into memory.
>>
>> Furthermore, I think it may make assumptions about where in the TLB the
>> mappings are.
>>
> This is gross, especially on a system with only 64 SW loaded TLB
> entries :-(
>
>
>> Since the kittyhawk guys
>> obviously ignore this by shutting it down, its not clear just how
>> important this is. I'm game to
>> try the dynamic mapping as you suggest if you would prefer it.
>>
> I would yes, we can sort things out later for RAS.
>
>
>> Its worth mentioning that I believe with BG/Q, the plan is to rely on
>> the firmware even more extensively, but I haven't looked at any of the code yet to verify
>> whether or not this is true.
>>
> This is tantamount to linking a binary blob with the kernel ... it's a
> fine line. At some point we might refuse the patches if they go too far
> in that direction.
>
> Cheers,
> Ben.
>
>
>> -eric

2011-05-20 03:52:35

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [bg-linux] [PATCH 6/7] [RFC] enable early TLBs for BG/P

> Unfortunately, the firmware is also required:
> - to configure Blue Gene Interrupt Controller(BIC)

Can't we just write bare metal code for that ?

> - to configure Torus DMA unit. e.g. fifo

Same

> - to configure global interrupt (even we don't use, we need to disable
> some channel correctly)

Same

> - to access node personality information (node id, DDR size, HZ, etc) or
> maybe we can directly access SRAM?

That should be turned into device-tree at boot, possibly from a
bootloader or from the zImage wrapper.

> etc, etc.
>
> >> As such the IO Node guys, the Compute Node Kernel guys and the
> >> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
> >> barely use it at all, in fact I believe they do all the interaction with
> >> it during uboot and then shut it off.
> >>
> >
> (I'm one of the ZeptoOS guys, btw)

Heh ok.

> As a regular ppc linux usage, our firmware dependency is minimum as well.
> However, with our HPC extension, the firmware functions are called when
> it configures BGP specific network hardware.
>
> We are not planning to submit our HPC extension here anytime soon
> because our work is very special purpose and includes lots of dirty hack
> right now.

Ok.

Cheers,
Ben.

> Thanks,
> Kaz
> > I would prefer that approach.
> >
> >
> >> IIRC, the sticky question is RAS support, there are certain things it
> >> wants to jump to firmware to deal with and expects things to be mapped
> >> an pinned into memory.
> >>
> >> Furthermore, I think it may make assumptions about where in the TLB the
> >> mappings are.
> >>
> > This is gross, especially on a system with only 64 SW loaded TLB
> > entries :-(
> >
> >
> >> Since the kittyhawk guys
> >> obviously ignore this by shutting it down, its not clear just how
> >> important this is. I'm game to
> >> try the dynamic mapping as you suggest if you would prefer it.
> >>
> > I would yes, we can sort things out later for RAS.
> >
> >
> >> Its worth mentioning that I believe with BG/Q, the plan is to rely on
> >> the firmware even more extensively, but I haven't looked at any of the code yet to verify
> >> whether or not this is true.
> >>
> > This is tantamount to linking a binary blob with the kernel ... it's a
> > fine line. At some point we might refuse the patches if they go too far
> > in that direction.
> >
> > Cheers,
> > Ben.
> >
> >
> >> -eric

2011-05-20 13:01:50

by Eric Van Hensbergen

[permalink] [raw]
Subject: Re: [bg-linux] [PATCH 6/7] [RFC] enable early TLBs for BG/P

On Thu, May 19, 2011 at 10:52 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
>> Unfortunately, the firmware is also required:
>> - to configure Blue Gene Interrupt Controller(BIC)
>> - to configure Torus DMA unit. e.g. fifo
>> - to configure global interrupt (even we don't use, we need to disable
>> some channel correctly)
>
> Can't we just write bare metal code for that ?
>

The kittyhawk code has the bare-metal equivalents for all of these.
When I get to the drivers, I'll favor the kittyhawk versions for
submission and then we'll see if it would be possible to adapt the HPC
extensions to use the bare-metal versions of the drivers versus the
firmware interface.

>> - to access node personality information (node id, DDR size, HZ, etc) or
>> maybe we can directly access SRAM?
>
> That should be turned into device-tree at boot, possibly from a
> bootloader or from the zImage wrapper.
>

This is the approach used by the kittyhawk u-boot setup.
However, it would also be just as easy to construct an in-memory
device-tree within Linux by mapping the personality page and copying
the relevant bits out. This has the advantage of being able to boot
Linux directly on the nodes without an intermediary boot loader (which
kittyhawk uses just to allow us to customize which kernel boots on a
node-to-node basis, whereas the stock system boots the same kernel on
all the nodes within a partition allocation (64-40,000 nodes)).
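
A minimal sketch of the "copy the relevant bits out" step, in
user-space C. The struct layout, the field names and the property
names are hypothetical stand-ins; the real personality page format is
defined by the BG/P firmware headers and is not reproduced here. The
function just renders three fields as device-tree-source text.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the personality data (not the real layout). */
struct bgp_personality_sketch {
	uint32_t node_id;
	uint32_t ddr_size_mb;
	uint32_t clock_mhz;
};

/*
 * Render the fields as device-tree-source text into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int personality_to_dts_sketch(const struct bgp_personality_sketch *p,
				     char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/ {\n"
			 "\tbgp,node-id = <%u>;\n"
			 "\tbgp,ddr-size-mb = <%u>;\n"
			 "\tbgp,clock-mhz = <%u>;\n"
			 "};\n",
			 (unsigned int)p->node_id,
			 (unsigned int)p->ddr_size_mb,
			 (unsigned int)p->clock_mhz);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The same copy could run in a boot loader, in a wrapper or early in the
kernel; only the place where the personality page gets mapped differs.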

-eric

2011-05-20 22:21:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [bg-linux] [PATCH 6/7] [RFC] enable early TLBs for BG/P

On Fri, 2011-05-20 at 08:01 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 10:52 PM, Benjamin Herrenschmidt
> <[email protected]> wrote:
> >> Unfortunately, the firmware is also required:
> >> - to configure Blue Gene Interrupt Controller(BIC)
> >> - to configure Torus DMA unit. e.g. fifo
> >> - to configure global interrupt (even we don't use, we need to disable
> >> some channel correctly)
> >
> > Can't we just write bare metal code for that ?
> >
>
> The kittyhawk code has the bare-metal equivalents for all of these.
> When I get to the drivers, I'll favor the kittyhawk versions for
> submission and then we'll see if it would be possible to adapt the HPC
> extensions to use the bare-metal versions of the drivers versus the
> firmware interface.

Ok. We can also start with using the FW and then migrate to bare metal.

> >> - to access node personality information (node id, DDR size, HZ, etc) or
> >> maybe we can directly access SRAM?
> >
> > That should be turned into device-tree at boot, possibly from a
> > bootloader or from the zImage wrapper.
> >
>
> This is the approach is used by the kittyhawk u-boot approach.
> However, it would also be just as easy to construct an in-memory
> device-tree within Linux by mapping the personality page and copying
> the relevant bits out. This has the advantage of being able to boot
> Linux directly on the nodes without an intermediary boot loader (which
> kittyhawk uses just to allow us customize which kernel boots on a
> node-to-node basis whereas the stock system boots the same kernel on
> all the nodes within a partition allocation (64-40,000 nodes)).

We can do that from the zImage wrapper... that would be nicer than doing
it from the kernel itself unless there are good reasons to do so, as
with iSeries.

Cheers,
Ben.