When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").
This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
more than 64GB RAM
In both cases, a pte could need more than 36 bits of physical address,
and masking it to 36 bits drops the high address bits, so the entry
ends up pointing at the wrong page.
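To illustrate the failure mode, here is a minimal user-space sketch, not kernel code. The helper names `phys_mask` and `pte_phys_addr` are hypothetical; the kernel derives its real mask from `__PHYSICAL_MASK_SHIFT`. It shows how a frame located just above 64GB is aliased to a low address when only 36 bits are kept.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical helper: mask covering `shift` bits of physical address. */
static uint64_t phys_mask(unsigned shift)
{
    return (UINT64_C(1) << shift) - 1;
}

/* Hypothetical helper: extract the page-aligned physical address from a
 * 64-bit PAE pte, keeping only `shift` address bits and dropping the
 * low 12 flag bits. */
static uint64_t pte_phys_addr(uint64_t pte, unsigned shift)
{
    uint64_t page_mask = ~((UINT64_C(1) << PAGE_SHIFT) - 1);

    return pte & phys_mask(shift) & page_mask;
}
```

With a 36-bit mask, a pte for the frame at 64GB + 0x5000 decodes to 0x5000; with a wider mask the full address survives.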
The 46-bit mask used in 64-bit seems pretty arbitrary. The physical
size could be between 40 and 52 bits. Setting the mask to 40 bits
would restrict the physical size to 1TB, which is definitely too
small. Setting it to 52 would be ridiculously large, and runs the
risk that one of the vendors may decide to put flags rather than
physical address in one of the upper reserved bits.
Doing it "properly" would require testing cpuid leaf 0x80000008, but
it would mean that we would lose the ability to make all these
compile-time constants.
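For reference, a user-space sketch of what the runtime query could look like. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid`); the function names `phys_bits_from_eax`, `virt_bits_from_eax` and `query_phys_bits` are made up for this example and are not kernel interfaces. Leaf 0x80000008 reports the physical address width in EAX bits 7:0 and the linear address width in bits 15:8.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Decode EAX of CPUID leaf 0x80000008. */
static unsigned phys_bits_from_eax(uint32_t eax)
{
    return eax & 0xff;          /* bits 7:0: physical address bits */
}

static unsigned virt_bits_from_eax(uint32_t eax)
{
    return (eax >> 8) & 0xff;   /* bits 15:8: linear address bits */
}

/* Returns the physical address width, or 0 if the leaf is unavailable
 * (or when not building for x86). */
static unsigned query_phys_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
        return phys_bits_from_eax(eax);
#endif
    return 0;
}
```

The value is only known at run time, which is why every user of the mask would stop being a compile-time constant.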
So, stick with 46 bits. It's enough for now.
[ Ingo: This needs a test, but I think it should be fairly low-risk.
If it checks out OK, it should be slipped to Linus fairly soon,
since it is a bugfix. It's probably worth putting into stable
too. ]
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Stable Kernel <[email protected]>
diff -r 0eebd30011dc include/asm-x86/page_32.h
--- a/include/asm-x86/page_32.h Wed Jun 04 10:32:01 2008 +0100
+++ b/include/asm-x86/page_32.h Thu Jun 05 16:09:53 2008 +0100
@@ -22,7 +22,7 @@
#ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT 36
+#define __PHYSICAL_MASK_SHIFT 46
#define __VIRTUAL_MASK_SHIFT 32
#define PAGETABLE_LEVELS 3
>The 46-bit mask used in 64-bit seems pretty arbitrary. The physical
>size could be between 40 and 52 bits. Setting the mask to 40 bits
>would restrict the physical size to 1TB, which is definitely too
>small. Setting it to 52 would be ridiculously large, and runs the
>risk that one of the vendors may decide to put flags rather than
>physical address in one of the upper reserved bits.
Hmm? There's 11 bits available - why would anyone want to assign bits
from the sufficiently official (at least as far as AMD is concerned, I'm not
sure I saw a precise statement on Intel's side) frame number bits? And
even if they would, it would certainly take some control register bit to
enable the feature, so shrinking the mask if that would ever happen
would seem more appropriate.
Bottom line - I'd suggest pushing both 32- and 64-bits up to 52.
Jan
Jan Beulich wrote:
> Hmm? There's 11 bits available - why would anyone want to assign bits
> from the sufficiently official (at least as far as AMD is concerned, I'm not
> sure I saw a precise statement on Intel's side) frame number bits?
The Intel docs list those 11 bits as available to software, and they are
not reserved for any future flags Intel may want to add. I was a bit
surprised too.
> And
> even if they would, it would certainly take some control register bit to
> enable the feature, so shrinking the mask if that would ever happen
> would seem more appropriate.
>
I suppose.
> Bottom line - I'd suggest pushing both 32- and 64-bits up to 52.
>
We could have an auction:
Do I hear 46? 47? 48? 50? 52! Going once, twice, 52 bits!
Anyway, we can fix it later in a separate patch. This is a
change-as-little-as-possible bugfix patch.
J
Jeremy Fitzhardinge wrote:
>
> We could have an auction:
>
> Do I hear 46? 47? 48? 50? 52! Going once, twice, 52 bits!
>
> Anyway, we can fix it later in a separate patch. This is a
> change-as-little-as-possible bugfix patch.
>
It should either be 52 bits or dynamic based on CPUID information. The
latter is very expensive.
If there end up being additional control bits assigned in this space we
won't use them since we know the size of the address space (which won't
include the control bits) and thus will leave them at zero.
It's largely theoretical, since I believe Linux on x86-64 relies on
virtual >= physical+N, where I believe N is about 3 bits, and the page
table format or page size need to change to support more than 48 bits of
virtual address space.
-hpa
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>>
>> We could have an auction:
>>
>> Do I hear 46? 47? 48? 50? 52! Going once, twice, 52 bits!
>>
>> Anyway, we can fix it later in a separate patch. This is a
>> change-as-little-as-possible bugfix patch.
>>
>
> It should either be 52 bits or dynamic based on CPUID information.
> The latter is very expensive.
I'm more concerned that it might not be possible. I'm trying to think
how many places have compile-time constants derived from this mask.
Maybe not too many.
> If there end up being additional control bits assigned in this space
> we won't use them since we know the size of the address space (which
> won't include the control bits) and thus will leave them at zero.
You mean, if new bits appear we can just adjust the mask accordingly to
avoid them? And if we don't use them, then they'll be zero?
> It's largely theoretical, since I believe Linux on x86-64 relies on
> virtual >= physical+N, where I believe N is about 3 bits, and the page
> table format or page size need to change to support more than 48 bits
> of virtual address space.
I don't see any relationship between the physical and virtual size.
Certainly virtual is fixed at 48 bits (4*9+12), but I don't think
there's any deep reason why physical needs to be within 3 bits.
J
Jeremy Fitzhardinge <[email protected]> writes:
>
> The 46-bit mask used in 64-bit seems pretty arbitrary.
The rationale for the 46 bits is that the kernel needs roughly 4x as
much virtual space as physical space and the virtual space is limited
to 48bits.
To be exact 47 bits is always user space and the 47 bits remaining
for the kernel are split into half, with one half for the direct mapping
and the other half for random mappings. With some pushing you could
extend it to 46.5 bits or so, but beyond that you'll be in trouble.
It's not arbitrary at all.
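The arithmetic above can be written out as a small sketch. The constant names are illustrative only and do not correspond to kernel macros.

```c
#include <stdint.h>

enum {
    /* four 9-bit table levels plus a 12-bit page offset */
    VIRT_BITS       = 4 * 9 + 12,
    /* user space takes one half of the 48 bits, the kernel the other */
    KERNEL_BITS     = VIRT_BITS - 1,
    /* half of the kernel half is used for the direct (1:1) mapping */
    DIRECT_MAP_BITS = KERNEL_BITS - 1
};

/* Largest physical range, in bytes, that the direct mapping can cover. */
static uint64_t direct_map_limit(void)
{
    return UINT64_C(1) << DIRECT_MAP_BITS;
}
```

This works out to 46 bits, i.e. 64TB of physical range including holes.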
-Andi
On Thu, 05 Jun 2008 16:21:14 +0100
Jeremy Fitzhardinge <[email protected]> wrote:
> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
> potentially have the same number of physical address bits as the
> 64-bit host ("Enhanced Legacy PAE Paging").
>
the problem on 32 bit is that if you have that much ram, you run out of
lowmem FAST.... so you have bigger problems.
>>> Andi Kleen <[email protected]> 06.06.08 03:40 >>>
>Jeremy Fitzhardinge <[email protected]> writes:
>>
>> The 46-bit mask used in 64-bit seems pretty arbitrary.
>
>The rationale for the 46 bits is that the kernel needs roughly 4x as
>much virtual space as physical space and the virtual space is limited
>to 48bits.
>
>To be exact 47 bits is always user space and the 47 bits remaining
>for the kernel are split into half, with one half for the direct mapping
>and the other half for random mappings. With some pushing you could
>extend it to 46.5 bits or so, but beyond that you'll be in trouble.
>
>It's not arbitrary at all.
That is only half of it. Since PHYSICAL_MASK also governs mappings other
than RAM, there are really two constants needed here:
One (46) to indicate how large the 1:1 mapping can possibly get (and
hence what the upper boundary of usable RAM is - without introducing
highmem), and another (52) to indicate how wide a physical address
(perhaps from a 64-bit PCI BAR) can possibly be (i.e. used to validate
physical addresses / page table entries).
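A sketch of the two-constant idea, with hypothetical names (`DIRECT_MAP_SHIFT`, `PHYS_ADDR_SHIFT` and both helpers are invented for illustration and are not existing kernel symbols):

```c
#include <stdbool.h>
#include <stdint.h>

#define DIRECT_MAP_SHIFT 46  /* hypothetical: upper bound for RAM in the 1:1 mapping */
#define PHYS_ADDR_SHIFT  52  /* hypothetical: widest physical address a pte may hold */

/* Can this address be reached through the 1:1 mapping? */
static bool fits_direct_map(uint64_t paddr)
{
    return (paddr >> DIRECT_MAP_SHIFT) == 0;
}

/* Is this a well-formed physical address at all (e.g. from a PCI BAR)? */
static bool valid_phys_addr(uint64_t paddr)
{
    return (paddr >> PHYS_ADDR_SHIFT) == 0;
}
```

An address such as 2^50 from a 64-bit BAR would pass the second check while failing the first, which is the distinction being drawn here.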
Jan
Andi Kleen wrote:
> Jeremy Fitzhardinge <[email protected]> writes:
>
>> The 46-bit mask used in 64-bit seems pretty arbitrary.
>>
>
> The rationale for the 46 bits is that the kernel needs roughly 4x as
> much virtual space as physical space and the virtual space is limited
> to 48bits.
>
> To be exact 47 bits is always user space and the 47 bits remaining
> for the kernel are split into half, with one half for the direct mapping
> and the other half for random mappings. With some pushing you could
> extend it to 46.5 bits or so, but beyond that you'll be in trouble.
>
Why's that? Is the issue the amount of memory needed for pagetables and
page structures if you did have more than 2^48 bytes of physical memory?
> It's not arbitrary at all.
I didn't say it was. That was the introduction to my explanation of why
I didn't think it was arbitrary. Of course, if there had been a comment
there explaining the rationale, I wouldn't have had to make one up...
J
Arjan van de Ven wrote:
> On Thu, 05 Jun 2008 16:21:14 +0100
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
>> potentially have the same number of physical address bits as the
>> 64-bit host ("Enhanced Legacy PAE Paging").
>>
>>
>
> the problem on 32 bit is that if you have that much ram, you run out of
> lowmem FAST.... so you have bigger problems.
>
Sure, you'd have to be barking mad to give a 32-bit system 2^40 bytes of
RAM. But under Xen the host's physical addresses are used in guest
pagetables, so you could have a reasonably sized 32-bit PAE Xen guest be
exposed to huge host physical addresses.
But the basic point is that, given that Enhanced Legacy PAE Paging
exists, 36-bits is not correct, so we should fix it. And if the
platform allows addressable hardware to be physically discontiguous -
either memory or devices - then you may end up using large numbers of
physical bits without having a stupid amount of memory actually present.
J
>>> Jeremy Fitzhardinge <[email protected]> 06.06.08 09:59 >>>
>Andi Kleen wrote:
>> Jeremy Fitzhardinge <[email protected]> writes:
>>
>>> The 46-bit mask used in 64-bit seems pretty arbitrary.
>>>
>>
>> The rationale for the 46 bits is that the kernel needs roughly 4x as
>> much virtual space as physical space and the virtual space is limited
>> to 48bits.
>>
>> To be exact 47 bits is always user space and the 47 bits remaining
>> for the kernel are split into half, with one half for the direct mapping
>> and the other half for random mappings. With some pushing you could
>> extend it to 46.5 bits or so, but beyond that you'll be in trouble.
>>
>
>Why's that? Is the issue the amount of memory needed for pagetables and
>page structures if you did have more than 2^48 bytes of physical memory?
No, it's the fact that the 1:1 mapping needs as much virtual space as
the physical range covered (including all holes).
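As a rough illustration, assuming a linear mapping that starts at physical address 0 (the struct and function names below are made up for this sketch):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one populated physical range. */
struct phys_region {
    uint64_t start;
    uint64_t size;
};

/* Virtual space a linear 1:1 mapping needs: the end of the highest
 * region, because the holes between regions are mapped over as well. */
static uint64_t direct_map_span(const struct phys_region *r, size_t n)
{
    uint64_t top = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = r[i].start + r[i].size;

        if (end > top)
            top = end;
    }
    return top;
}

/* Memory actually present: the sum of the region sizes. */
static uint64_t total_present(const struct phys_region *r, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += r[i].size;
    return sum;
}
```

Two 4GB regions, one at 0 and one at 1TB, hold 8GB of memory but need just over 1TB of virtual space for the mapping.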
Jan
Jan Beulich wrote:
>>>> Jeremy Fitzhardinge <[email protected]> 06.06.08 09:59 >>>
>>>>
>> Andi Kleen wrote:
>>
>>> Jeremy Fitzhardinge <[email protected]> writes:
>>>
>>>
>>>> The 46-bit mask used in 64-bit seems pretty arbitrary.
>>>>
>>>>
>>> The rationale for the 46 bits is that the kernel needs roughly 4x as
>>> much virtual space as physical space and the virtual space is limited
>>> to 48bits.
>>>
>>> To be exact 47 bits is always user space and the 47 bits remaining
>>> for the kernel are split into half, with one half for the direct mapping
>>> and the other half for random mappings. With some pushing you could
>>> extend it to 46.5 bits or so, but beyond that you'll be in trouble.
>>>
>>>
>> Why's that? Is the issue the amount of memory needed for pagetables and
>> page structures if you did have more than 2^48 bytes of physical memory?
>>
>
> No, it's the fact that the 1:1 mapping needs as much virtual space as
> the physical range covered (including all holes).
Right, I see. And suddenly 64-bits seems... constrained. ;)
J
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory,
we could have up to 52 bits of physical address in a pte.
The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide. Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.
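The 44-bit limit can be checked with a small user-space sketch. `pfn32_t` is a stand-in for the 32-bit `unsigned long` used on x86_32, and the helper names are illustrative rather than the kernel's own.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint32_t pfn32_t;   /* stand-in for a 32-bit unsigned long pfn */

/* Widen before shifting so the result is a full 64-bit address. */
static uint64_t pfn_to_phys(pfn32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}

/* Storing the frame number in 32 bits silently drops address bits 44
 * and above. */
static pfn32_t phys_to_pfn(uint64_t paddr)
{
    return (pfn32_t)(paddr >> PAGE_SHIFT);
}
```

The largest representable pfn maps to 2^44 - 4096, and an address of exactly 2^44 wraps around to pfn 0.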
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Jan Beulich <[email protected]>
---
include/asm-x86/page_32.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
===================================================================
--- a/include/asm-x86/page_32.h
+++ b/include/asm-x86/page_32.h
@@ -22,7 +22,8 @@
#ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT 36
+/* 44=32+12, the limit we can fit into an unsigned long pfn */
+#define __PHYSICAL_MASK_SHIFT 44
#define __VIRTUAL_MASK_SHIFT 32
#define PAGETABLE_LEVELS 3
Ah, yes!
Acked-by: Jan Beulich <[email protected]>
Jeremy Fitzhardinge <[email protected]> writes:
> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
> potentially have the same number of physical address bits as the
> 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory,
> we could have up to 52 bits of physical address in a pte.
>
> The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
> This means that it can only represent physical addresses up to 32+12=44
> bits wide. Rather than widening pfns everywhere, just set 2^44 as the
> Linux x86_32-PAE architectural limit for physical address size.
43 bits might actually be safer because of potential sign bugs.
But of course it likely won't work anyway.
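A sketch of the kind of sign bug meant here, assuming some code path stores a pfn in a signed 32-bit variable. The function name is hypothetical, and the out-of-range conversion to `int32_t` is implementation-defined in C (it wraps on GCC and Clang).

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Buggy variant: the pfn passes through a signed 32-bit type, so any
 * pfn with bit 31 set (physical address bit 43) is sign-extended when
 * widened to 64 bits. */
static uint64_t phys_via_signed_pfn(uint32_t pfn)
{
    int32_t spfn = (int32_t)pfn;    /* implementation-defined above INT32_MAX */

    return (uint64_t)(int64_t)spfn << PAGE_SHIFT;
}
```

Frame numbers below 2^31 survive the round trip, which is why a 43-bit limit would sidestep this class of bug.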
-Andi
Andi Kleen wrote:
> 43 bits might actually be safer because of potential sign bugs.
>
> But of course it likely won't work anyway.
>
I thought about it, but I think we're pretty consistent about putting
pfns into unsigned types. But, yes, you're very marginal at that point.
J
Jeremy Fitzhardinge wrote:
>>
>> It should either be 52 bits or dynamic based on CPUID information.
>> The latter is very expensive.
>
> I'm more concerned that it might not be possible. I'm trying to think
> how many places have compile-time constants derived from this mask.
> Maybe not too many.
>
>> If there end up being additional control bits assigned in this space
>> we won't use them since we know the size of the address space (which
>> won't include the control bits) and thus will leave them at zero.
>
> You mean, if new bits appear we can just adjust the mask accordingly to
> avoid them? And if we don't use them, then they'll be zero?
Correct. Remember, the page table entries come from the kernel - not
from some random areas.
>> It's largely theoretical, since I believe Linux on x86-64 relies on
>> virtual >= physical+N, where I believe N is about 3 bits, and the page
>> table format or page size need to change to support more than 48 bits
>> of virtual address space.
>
> I don't see any relationship between the physical and virtual size.
> Certainly virtual is fixed at 48 bits (4*9+12), but I don't think
> there's any deep reason why physical needs to be within 3 bits.
>
Identity-mapping. 1 bit goes to kernel/user split, then the kernel area
is split into multiple regions, one of which is identity-mapping. It
may be just 2.
-hpa
Jeremy Fitzhardinge wrote:
>>
>> No, it's the fact that the 1:1 mapping needs as much virtual space as
>> the physical range covered (including all holes).
>
> Right, I see. And suddenly 64-bits seems... constrained. ;)
>
Not really. The vendors are aware of this constraint -- it's hardly
unique to Linux. The reason for canonical addresses and all that jazz
is to keep people from doing stupid things like store stuff in the upper
16 bits of a pointer (happened a lot on the 68000, where the first
implementation had only 24 address bits.) Thus, all changes needed to
go to a larger virtual address space are all internal to the kernel.
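For readers unfamiliar with the rule, a minimal sketch of the canonical-address check for a 48-bit virtual address space. `is_canonical` is an illustrative name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRT_BITS 48

/* An address is canonical when bits 63..47 are all copies of bit 47,
 * i.e. the top 17 bits are either all zeros or all ones. */
static bool is_canonical(uint64_t vaddr)
{
    uint64_t upper = vaddr >> (VIRT_BITS - 1);
    uint64_t all_ones = (UINT64_C(1) << (64 - VIRT_BITS + 1)) - 1;

    return upper == 0 || upper == all_ones;
}
```

A pointer with data stashed in its upper 16 bits fails this check, so the hardware faults on it instead of silently ignoring the extra bits.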
-hpa
* Jeremy Fitzhardinge <[email protected]> wrote:
> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
> potentially have the same number of physical address bits as the
> 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory, we
> could have up to 52 bits of physical address in a pte.
>
> The 32-bit kernel uses a 32-bit unsigned long to represent a pfn. This
> means that it can only represent physical addresses up to 32+12=44
> bits wide. Rather than widening pfns everywhere, just set 2^44 as the
> Linux x86_32-PAE architectural limit for physical address size.
applied to tip/x86/cleanups - thanks Jeremy. No urgency for v2.6.26,
right?
Ingo
Ingo Molnar wrote:
>> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
>> potentially have the same number of physical address bits as the
>> 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory, we
>> could have up to 52 bits of physical address in a pte.
>>
>> The 32-bit kernel uses a 32-bit unsigned long to represent a pfn. This
>> means that it can only represent physical addresses up to 32+12=44
>> bits wide. Rather than widening pfns everywhere, just set 2^44 as the
>> Linux x86_32-PAE architectural limit for physical address size.
>>
>
> applied to tip/x86/cleanups - thanks Jeremy. No urgency for v2.6.26,
> right?
Not urgent, but it would be nice to have.
J
* Jeremy Fitzhardinge <[email protected]> wrote:
> Ingo Molnar wrote:
>>> When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
>>> potentially have the same number of physical address bits as the
>>> 64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory,
>>> we could have up to 52 bits of physical address in a pte.
>>>
>>> The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
>>> This means that it can only represent physical addresses up to
>>> 32+12=44 bits wide. Rather than widening pfns everywhere, just set
>>> 2^44 as the Linux x86_32-PAE architectural limit for physical address
>>> size.
>>>
>>
>> applied to tip/x86/cleanups - thanks Jeremy. No urgency for v2.6.26,
>> right?
>
> Not urgent, but it would be nice to have.
ok, cherry-picked it into x86/urgent. This aspect makes it eligible for
v2.6.26:
| This is a bugfix for two cases:
| 1. running a 32-bit PAE kernel on a machine with
| more than 64GB RAM.
| 2. running a 32-bit PAE Xen guest on a host machine with
| more than 64GB RAM
|
| In both cases, a pte could need to have more than 36 bits of physical,
| and masking it to 36-bits will cause fairly severe havoc.
also added a [email protected] Cc: to the commit, so it will be picked
up in stable as well.
Ingo