2005-11-02 07:26:21

by Benjamin Herrenschmidt

Subject: [PATCH] ppc64: 64K pages support

On Wed, 2005-11-02 at 18:07 +1100, Benjamin Herrenschmidt wrote:
> It took a while, but finally, here is the 64K pages support patch for
> ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled,
> changes the kernel base page size to 64K. The resulting kernel still
> boots on any hardware. On current machines with 4K pages support only,
> the kernel will maintain 16 "subpages" for each 64K page
> transparently.
>
> Note that while real 64K capable HW has been tested, the current patch
> will not enable it yet as such hardware is not released yet, and I'm
> still verifying with the firmware architects the proper way to get the
> information from the newer hypervisors.
>
> Signed-off-by: Benjamin Herrenschmidt <[email protected]>

Oh, and since the mailing lists are probably filtering this out due to
the patch size, here's a URL where you can find it too:

http://gate.crashing.org/~benh/ppc64-64k-pages.diff

Ben.



2005-11-03 03:25:38

by Paul Mackerras

Subject: Re: [PATCH] ppc64: 64K pages support

Benjamin Herrenschmidt writes:

> It took a while, but finally, here is the 64K pages support patch for
> ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled,
> changes the kernel base page size to 64K. The resulting kernel still
> boots on any hardware. On current machines with 4K pages support only,
> the kernel will maintain 16 "subpages" for each 64K page
> transparently.
>
> Note that while real 64K capable HW has been tested, the current patch
> will not enable it yet as such hardware is not released yet, and I'm
> still verifying with the firmware architects the proper way to get the
> information from the newer hypervisors.
>
> Signed-off-by: Benjamin Herrenschmidt <[email protected]>

Acked-by: Paul Mackerras <[email protected]>

2005-11-03 05:26:48

by David Gibson

Subject: ppc64: Fix bug in SLB miss handler for hugepages

On Thu, Nov 03, 2005 at 02:16:29PM +1100, Paul Mackerras wrote:
> Benjamin Herrenschmidt writes:
>
> > It took a while, but finally, here is the 64K pages support patch for
> > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled,
> > changes the kernel base page size to 64K. The resulting kernel still
> > boots on any hardware. On current machines with 4K pages support only,
> > the kernel will maintain 16 "subpages" for each 64K page
> > transparently.
> >
> > Note that while real 64K capable HW has been tested, the current patch
> > will not enable it yet as such hardware is not released yet, and I'm
> > still verifying with the firmware architects the proper way to get the
> > information from the newer hypervisors.
> >
> > Signed-off-by: Benjamin Herrenschmidt <[email protected]>
>
> Acked-by: Paul Mackerras <[email protected]>

This patch, however, should be applied on top of it to fix some
hugepage problems (some pre-existing, one introduced by BenH's patch).

The patch fixes a bug in the SLB miss handler for hugepages on ppc64
introduced by the dynamic hugepage patch (commit id
c594adad5653491813959277fb87a2fef54c4e05) due to a misunderstanding of
the srd instruction's behaviour (mea culpa). The problem arises when
a 64-bit process maps some hugepages in the low 4GB of the address
space (unusual). In this case, as well as the 256M segment in
question being marked for hugepages, other segments at 32G intervals
will be incorrectly marked for hugepages.

In the process, this patch tweaks the semantics of the hugepage
bitmaps to be more sensible. Previously, an address below 4GB was
marked for hugepages if the appropriate segment bit in the "low areas"
bitmap was set *or* if the low bit in the "high areas" bitmap was set
(which would mark all addresses below 1TB for hugepages). With this
patch, any given address is governed by a single bitmap. Addresses
below 4GB are marked for hugepages if and only if their bit is set in
the "low areas" bitmap (256MB granularity). Addresses between 4GB and
1TB are marked for hugepages iff the low bit in the "high areas" bitmap
is set. Higher addresses are marked for hugepages iff their bit in the
"high areas" bitmap is set (1TB granularity).

To avoid conflicts, this patch must be applied on top of BenH's
pending patch for 64k base page size [0]. As such, this patch also
addresses a hugepage problem introduced by that patch. That patch
allows 1MB hugepages on hardware which supports them. However, that
won't work when using 4k pages (4-level pagetables), because in that
case hugepage PTEs are stored at the PMD level, and each PMD entry
maps 2MB. This patch simply disallows 1MB hugepages in that case (we
can do something cleverer to re-enable them some other day).

Built, booted, and passed a handful of hugepage-related tests on a
POWER5 LPAR (both ARCH=powerpc and ARCH=ppc64).

[0] http://gate.crashing.org/~benh/ppc64-64k-pages.diff

Signed-off-by: David Gibson <[email protected]>

Index: working-2.6/arch/powerpc/mm/slb_low.S
===================================================================
--- working-2.6.orig/arch/powerpc/mm/slb_low.S 2005-11-03 14:52:16.000000000 +1100
+++ working-2.6/arch/powerpc/mm/slb_low.S 2005-11-03 14:55:56.000000000 +1100
@@ -80,12 +80,17 @@
BEGIN_FTR_SECTION
b 1f
END_FTR_SECTION_IFCLR(CPU_FTR_16M_PAGE)
+ cmpldi r10,16
+
+ lhz r9,PACALOWHTLBAREAS(r13)
+ mr r11,r10
+ blt 5f
+
lhz r9,PACAHIGHHTLBAREAS(r13)
srdi r11,r10,(HTLB_AREA_SHIFT-SID_SHIFT)
- srd r9,r9,r11
- lhz r11,PACALOWHTLBAREAS(r13)
- srd r11,r11,r10
- or. r9,r9,r11
+
+5: srd r9,r9,r11
+ andi. r9,r9,1
beq 1f
_GLOBAL(slb_miss_user_load_huge)
li r11,0
Index: working-2.6/arch/powerpc/mm/hash_utils_64.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-03 14:52:16.000000000 +1100
+++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-03 15:40:56.000000000 +1100
@@ -329,12 +329,14 @@
*/
if (mmu_psize_defs[MMU_PAGE_16M].shift)
mmu_huge_psize = MMU_PAGE_16M;
+ /* With 4k/4level pagetables, we can't (for now) cope with a
+ * huge page size < PMD_SIZE */
else if (mmu_psize_defs[MMU_PAGE_1M].shift)
mmu_huge_psize = MMU_PAGE_1M;

/* Calculate HPAGE_SHIFT and sanity check it */
- if (mmu_psize_defs[mmu_huge_psize].shift > 16 &&
- mmu_psize_defs[mmu_huge_psize].shift < 28)
+ if (mmu_psize_defs[mmu_huge_psize].shift > MIN_HUGEPTE_SHIFT &&
+ mmu_psize_defs[mmu_huge_psize].shift < SID_SHIFT)
HPAGE_SHIFT = mmu_psize_defs[mmu_huge_psize].shift;
else
HPAGE_SHIFT = 0; /* No huge pages dude ! */
Index: working-2.6/include/asm-ppc64/pgtable-4k.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/pgtable-4k.h 2005-11-03 14:52:16.000000000 +1100
+++ working-2.6/include/asm-ppc64/pgtable-4k.h 2005-11-03 15:38:40.000000000 +1100
@@ -23,6 +23,9 @@
#define PMD_SIZE (1UL << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE-1))

+/* With 4k base page size, hugepage PTEs go at the PMD level */
+#define MIN_HUGEPTE_SHIFT PMD_SHIFT
+
/* PUD_SHIFT determines what a third-level page table entry can map */
#define PUD_SHIFT (PMD_SHIFT + PMD_INDEX_SIZE)
#define PUD_SIZE (1UL << PUD_SHIFT)
Index: working-2.6/include/asm-ppc64/pgtable-64k.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/pgtable-64k.h 2005-11-03 14:52:16.000000000 +1100
+++ working-2.6/include/asm-ppc64/pgtable-64k.h 2005-11-03 15:39:07.000000000 +1100
@@ -14,6 +14,9 @@
#define PTRS_PER_PMD (1 << PMD_INDEX_SIZE)
#define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)

+/* With 64k base page size, hugepage PTEs go at the PTE level */
+#define MIN_HUGEPTE_SHIFT PAGE_SHIFT
+
/* PMD_SHIFT determines what a second-level page table entry can map */
#define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
#define PMD_SIZE (1UL << PMD_SHIFT)
Index: working-2.6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c 2005-11-03 14:52:16.000000000 +1100
+++ working-2.6/arch/powerpc/mm/hugetlbpage.c 2005-11-03 15:56:34.000000000 +1100
@@ -212,6 +212,12 @@

BUG_ON(area >= NUM_HIGH_AREAS);

+ /* Hack, so that each address is controlled by exactly one
+ * of the high or low area bitmaps, the first high area starts
+ * at 4GB, not 0 */
+ if (start == 0)
+ start = 0x100000000UL;
+
/* Check no VMAs are in the region */
vma = find_vma(mm, start);
if (vma && (vma->vm_start < end))


--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson

2005-11-05 00:38:34

by Christoph Hellwig

Subject: Re: [PATCH] ppc64: 64K pages support

So how does the 64k on 4k hardware emulation work? When Hugh did
bigger softpagesize for x86 based on 2.4.x he had to fix drivers all
over to deal with that.

2005-11-05 00:46:14

by Benjamin Herrenschmidt

Subject: Re: [PATCH] ppc64: 64K pages support

On Sat, 2005-11-05 at 01:38 +0100, Christoph Hellwig wrote:
> So how does the 64k on 4k hardware emulation work? When Hugh did
> bigger softpagesize for x86 based on 2.4.x he had to fix drivers all
> over to deal with that.

What was the problem with drivers? On ppc64, it's all hidden in the
arch code. All the kernel sees is a 64k page size. I extended the PTE to
contain tracking information for the 16 subpages (HPTE bits and hash
slot index). Subpages are faulted in on demand and flushed all at once,
but it's all transparent to the generic code.

Ben.


2005-11-05 06:37:25

by Dave Airlie

Subject: Re: [PATCH] ppc64: 64K pages support

> What was the problem with drivers ? On ppc64, it's all hidden in the
> arch code. All the kernel sees is a 64k page size. I extended the PTE to
> contain tracking informations for the 16 sub pages (HPTE bits & hash
> slot index). Sub pages are faulted on demand and flushed all at once,
> but it's all transparent to the generic code.
>

We did that with the VAX port about 5 years ago :-), albeit for
different reasons.

The VAX has 512-byte hardware pages, so we made a 4K page size for the
kernel by grouping 8 hardware pages together and hiding it all in the
arch dir.

Granted, I don't know whether it broke any drivers; we didn't have any...

Dave.

2005-11-09 17:21:57

by Christoph Hellwig

Subject: Re: [PATCH] ppc64: 64K pages support

Booting current mainline with 64K pagesize enabled gives me a purple (!)
screen early during boot.

2005-11-09 20:17:34

by Mike Kravetz

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote:
> Booting current mainline with 64K pagesize enabled gives me a purple (!)
> screen early during boot.

I also seem to be having problems with this patch. My OpenPOWER 720
stopped booting with 2.6.14-git10 (and later), using just defconfig
with the 64k page size NOT enabled. If I back out the 64k page size
patch, 2.6.14-git10 boots. I'm trying to get more info, but it is
painful: it dies before xmon is initialized.

I could have sworn that I booted 2.6.14-git7 with the 64k page size
patch applied, but I can't do that now either.

Some co-workers have successfully booted other POWER systems with these
kernels, so it must be specific to my hardware/LPAR configuration.

--
Mike

2005-11-09 20:34:04

by Benjamin Herrenschmidt

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, 2005-11-09 at 12:17 -0800, Mike Kravetz wrote:
> On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote:
> > Booting current mainline with 64K pagesize enabled gives me a purple (!)
> > screen early during boot.
>
> I seem to also be having problems with this patch. My OpenPOWER 720
> stopped booting with 2.6.14-git10(and later). Just using defconfig.
> 64k page size NOT enabled. If I back out the 64k page size patch,
> 2.6.14-git10 boots. I'm trying to get more info but it is painful.
> It dies before xmon is initialized.

There have been a couple of fixes; try the very latest git. Also, try
enabling early debug in arch/ppc64/kernel/setup.c.

> I could have sworn that I booted 2.6.14-git7 with the 64k page size
> patch applied. But, I can't do that now either.
>
> Some co-workers have successfully booted other POWER systems with these
> kernels. So, it must be specific to my hardware/LPAR configuration.

OK, I'll do more tests here too.

Ben.


2005-11-09 20:38:08

by Benjamin Herrenschmidt

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, 2005-11-09 at 18:21 +0100, Christoph Hellwig wrote:
> Booting current mainline with 64K pagesize enabled gives me a purple (!)
> screen early during boot.

On the G5 ? Weird... I'll test.

Ben.


2005-11-09 21:59:34

by Badari Pulavarty

Subject: Re: [PATCH] ppc64: 64K pages support

On Thu, 2005-11-10 at 07:32 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2005-11-09 at 12:17 -0800, Mike Kravetz wrote:
> > On Wed, Nov 09, 2005 at 06:21:25PM +0100, Christoph Hellwig wrote:
> > > Booting current mainline with 64K pagesize enabled gives me a purple (!)
> > > screen early during boot.
> >
> > I seem to also be having problems with this patch. My OpenPOWER 720
> > stopped booting with 2.6.14-git10(and later). Just using defconfig.
> > 64k page size NOT enabled. If I back out the 64k page size patch,
> > 2.6.14-git10 boots. I'm trying to get more info but it is painful.
> > It dies before xmon is initialized.
>
> There have been a couple of fixes, try the very latest git. Also, try
> enabling early debug in arch/ppc64/kernel/setup.c
>
> > I could have sworn that I booted 2.6.14-git7 with the 64k page size
> > patch applied. But, I can't do that now either.
> >
> > Some co-workers have successfully booted other POWER systems with these
> > kernels. So, it must be specific to my hardware/LPAR configuration.
>
> Ok, i'll do more tests here too.

I didn't have any luck on 2.6.14-git12 either.
I tried 64k page support on my P570.

Here are the console messages:

Thanks,
Badari


Attachments:
64kpage.out (9.88 kB)

2005-11-09 22:03:28

by Benjamin Herrenschmidt

Subject: Re: [PATCH] ppc64: 64K pages support


> I didn't have any luck on 2.6.14-git12 either.
> I tried 64k page support on my P570.
>
> Here are the console messages:

What distro do you use in userland? Some older glibc versions have a
bug that causes issues with 64k pages, though it generally shows up as
login blowing up, not init...

Ben.


2005-11-09 22:07:50

by Badari Pulavarty

Subject: Re: [PATCH] ppc64: 64K pages support

On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> > I didn't have any luck on 2.6.14-git12 either.
> > I tried 64k page support on my P570.
> >
> > Here are the console messages:
>
> What distro do you use in userland ? Some older glibc versions have a
> bug that cause issues with 64k pages, though it generally happens with
> login blowing up, not init ...

SLES9 (could be SLES9 SP1).

Thanks,
Badari

2005-11-09 22:15:13

by Paul Mackerras

Subject: Re: [PATCH] ppc64: 64K pages support

Christoph Hellwig writes:

> Booting current mainline with 64K pagesize enabled gives me a purple (!)
> screen early during boot.

Cool!

Is this on a G5, or what sort of machine? What .config are you using?

Paul.

2005-11-09 23:44:33

by Benjamin Herrenschmidt

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, 2005-11-09 at 18:21 +0100, Christoph Hellwig wrote:
> Booting current mainline with 64K pagesize enabled gives me a purple (!)
> screen early during boot.

Do you use one of the nvidia fbdevs? What if you disable it?

(Also, rivafb has some funky bugs on my iMac G5. nvidiafb works fine
with the latest fixes that are now in -git, but I haven't tried it
with 64K pages enabled in the .config yet.)

Ben.


2005-11-16 23:08:27

by Olaf Hering

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, Nov 09, Badari Pulavarty wrote:

> On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> > > I didn't have any luck on 2.6.14-git12 either.
> > > I tried 64k page support on my P570.
> > >
> > > Here are the console messages:
> >
> > What distro do you use in userland ? Some older glibc versions have a
> > bug that cause issues with 64k pages, though it generally happens with
> > login blowing up, not init ...
>
> SLES9 (could be SLES9 SP1).

Can you double check? rpm -qi glibc | head should be enough.
It would be bad if SP2 or SP3 did not work with 64k.

--
short story of a lazy sysadmin:
alias appserv=wotan

2005-11-16 23:16:52

by Badari Pulavarty

Subject: Re: [PATCH] ppc64: 64K pages support

On Thu, 2005-11-17 at 00:08 +0100, Olaf Hering wrote:
> On Wed, Nov 09, Badari Pulavarty wrote:
>
> > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> > > > I didn't have any luck on 2.6.14-git12 either.
> > > > I tried 64k page support on my P570.
> > > >
> > > > Here are the console messages:
> > >
> > > What distro do you use in userland ? Some older glibc versions have a
> > > bug that cause issues with 64k pages, though it generally happens with
> > > login blowing up, not init ...
> >
> > SLES9 (could be SLES9 SP1).
>
> Can you double check? rpm -qi glibc | head should be enough.
> Would be bad if SP2 or SP3 does not work with 64k.
>

I think I am using SLES9. Planning to update to SP3.

# rpm -qi glibc | head
Name        : glibc                        Relocations: (not relocatable)
Version     : 2.3.3                             Vendor: SuSE Linux AG, Nuernberg, Germany
Release     : 98.28                         Build Date: Wed Jun 30 15:55:45 2004
Install date: Wed Jul 6 17:24:44 2005       Build Host: gooseberry.suse.de
Group       : System/Libraries              Source RPM: glibc-2.3.3-98.28.src.rpm
Size        : 6161800                          License: GPL, LGPL
Signature   : DSA/SHA1, Wed Jun 30 16:00:21 2004, Key ID a84edae89c800aca
Packager    : http://www.suse.de/feedback
URL         : http://www.gnu.org/software/libc/libc.html
Summary     : The standard shared libraries (from the GNU C Library)

Thanks,
Badari

2005-11-16 23:27:23

by Olaf Hering

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, Nov 16, Badari Pulavarty wrote:

> I think I am using SLES9. Planning to update to SP3.
>
> # rpm -qi glibc | head
> Name        : glibc                        Relocations: (not relocatable)
> Version     : 2.3.3                             Vendor: SuSE Linux AG, Nuernberg, Germany
> Release     : 98.28                         Build Date: Wed Jun 30 15:55:45 2004

The release number indicates the GA glibc.spec was used, but the
build date indicates it's slightly older than SLES9 GA.


2005-11-17 00:34:05

by Badari Pulavarty

Subject: Re: [PATCH] ppc64: 64K pages support

On Wed, 2005-11-16 at 17:57 -0600, Sonny Rao wrote:
> On 11/16/05, Badari Pulavarty <[email protected]> wrote:
> On Thu, 2005-11-17 at 00:08 +0100, Olaf Hering wrote:
> > On Wed, Nov 09, Badari Pulavarty wrote:
> >
> > > On Thu, 2005-11-10 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> > > > > I didn't have any luck on 2.6.14-git12 either.
> > > > > I tried 64k page support on my P570.
> > > > >
> > > > > Here are the console messages:
> > > >
> > > > What distro do you use in userland ? Some older glibc versions have a
> > > > bug that cause issues with 64k pages, though it generally happens with
> > > > login blowing up, not init ...
> > >
> > > SLES9 (could be SLES9 SP1).
> >
> > Can you double check? rpm -qi glibc | head should be enough.
> > Would be bad if SP2 or SP3 does not work with 64k.
> >
>
> I think I am using SLES9. Planning to update to SP3.
>
>
> Badari, the problem is with your toolchain.
> The binutils in SLES9 is too old (even in SP3).
>
> The issue is that it cannot align something (the zero page, I think)
> to 64KB.
>
> SLES9 SP3 has "GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)".
>
> But I have to use binutils 2.15.94 to make a 64KB kernel boot
> properly (I can give you the package offline if you need it).

Thank you, Sonny. I updated my binutils package and the 64k pagesize
kernel works fine for me (at least, it boots fine).

Thanks,
Badari

2005-11-17 01:32:54

by Andreas Schwab

Subject: Re: [PATCH] ppc64: 64K pages support

Olaf Hering <[email protected]> writes:

> On Wed, Nov 16, Badari Pulavarty wrote:
>
>> I think I am using SLES9. Planning to update to SP3.
>>
>> # rpm -qi glibc | head
>> Name        : glibc                        Relocations: (not relocatable)
>> Version     : 2.3.3                             Vendor: SuSE Linux AG, Nuernberg, Germany
>> Release     : 98.28                         Build Date: Wed Jun 30 15:55:45 2004
>
> The release number indicates the GA glibc.spec was used, but the
> build date indicates its slightly older than SLES9 GA.

Build date is local time (timezone has been chopped off here).

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."