LinuxLists.cc - [PATCH]Fix broken VMI in 2.6.27-rc..

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Alok Kataria wrote:
>
> VMI relies on relocating the fixmap area to make room for the
> hypervisor. These 2 commits started accessing the fixmap area's and
> using them before VMI got a chance to check if it wants to relocate the
> fixmap area. Once VMI got to the point of relocating the fixmap area's
> it resulted in BUG's.
>

Could you describe this in more detail? I am not super-happy about this
solution if there is a better one, like simply locating the fixmap area
out of the way to start with.

-hpa

2008-08-07 21:28:49

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 14:20 -0700, H. Peter Anvin wrote:
> Alok Kataria wrote:
> >
> > VMI relies on relocating the fixmap area to make room for the
> > hypervisor. These 2 commits started accessing the fixmap area's and
> > using them before VMI got a chance to check if it wants to relocate the
> > fixmap area. Once VMI got to the point of relocating the fixmap area's
> > it resulted in BUG's.
> >
>
> Could you describe this in more detail? I am not super-happy about this
> solution if there is a better one, like simply locating the fixmap area
> out of the way to start with.

That can't be done until we know the size of the hole to relocate, which
isn't known until we probe in the first meg of memory to find the
associated ROM. It used to be the case that other things that poked
around with platform specific memory and checking ROM areas lived around
here in setup.c, so it was a nice place to put it. With all the
abstraction and combination and overifdeffing going on here, that might
no longer be the case.

We could move it earlier, but then we'd need another hook to call in
after max_low_pfn is known.

Or we could remove the dependency on max_low_pfn and just create a
liberal linear to physical mapping for lomem which spans all possible
low memory; then it doesn't matter so much where it is called.

Zach

2008-08-07 21:35:31

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
>
> That can't be done until we know the size of the hole to relocate, which
> isn't known until we probe in the first meg of memory to find the
> associated ROM. It used to be the case that other things that poked
> around with platform specific memory and checking ROM areas lived around
> here in setup.c, so it was a nice place to put it. With all the
> abstraction and combination and overifdeffing going on here, that might
> no longer be the case.
>
> We could move it earlier, but then we'd need another hook to call in
> after max_low_pfn is known.
>
> Or we could remove the dependency on max_low_pfn and just create a
> liberal linear to physical mapping for lomem which spans all possible
> low memory; then it doesn't matter so much where it is called.
>

Okay, you lost me about halfway through that... could you perhaps
describe the problem from the beginning, exactly what you're trying to do?

-hpa

2008-08-07 21:41:21

by Alok Kataria

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 14:20 -0700, H. Peter Anvin wrote:
> Alok Kataria wrote:
> >
> > VMI relies on relocating the fixmap area to make room for the
> > hypervisor. These 2 commits started accessing the fixmap area's and
> > using them before VMI got a chance to check if it wants to relocate the
> > fixmap area. Once VMI got to the point of relocating the fixmap area's
> > it resulted in BUG's.
> >
>
> Could you describe this in more detail?

Hi Peter,

The first commit,
x86: use acpi_numa_init to parse on 32-bit numa
commit 1c6e55032e24ff79668581a0f296c278ef7edd4e

Moves the call to dmi_scan_machine before the vmi_initialization is
done, dmi_scan_machine internally calls early_ioremap, which does
early_set_fixmap effectively making use of FIXMAP areas before VMI gets
a chance to relocate it.

Similarly, in the other commit,
x86: move fix mapping page table range early
commit e7b3789524eecc96213dd69d6686efd429235051

There is this new call to early_ioremap_page_table_range_init which is
done from init_memory_mapping, this uses FIXADDR_TOP to initialize the
page table range.

Now if you look at vmi_init, we relocate the fixmap area by changing the
__FIXADDR_TOP value. So this needs to happen before anybody starts
using the fixmap area.

> I am not super-happy about this
> solution if there is a better one,

> like simply locating the fixmap area
> out of the way to start with.

I won't say that i completely understand this statement , but IMO the
patch that i sent effectively does the same thing, we make sure that the
fixmap area is set to a final value before anybody else starts using
it.

Thanks,
Alok
>
> -hpa

2008-08-07 21:43:52

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 14:34 -0700, H. Peter Anvin wrote:
> Zachary Amsden wrote:
> >
> > That can't be done until we know the size of the hole to relocate, which
> > isn't known until we probe in the first meg of memory to find the
> > associated ROM. It used to be the case that other things that poked
> > around with platform specific memory and checking ROM areas lived around
> > here in setup.c, so it was a nice place to put it. With all the
> > abstraction and combination and overifdeffing going on here, that might
> > no longer be the case.
> >
> > We could move it earlier, but then we'd need another hook to call in
> > after max_low_pfn is known.
> >
> > Or we could remove the dependency on max_low_pfn and just create a
> > liberal linear to physical mapping for lomem which spans all possible
> > low memory; then it doesn't matter so much where it is called.
> >
>
> Okay, you lost me about halfway through that... could you perhaps
> describe the problem from the beginning, exactly what you're trying to do?

A kernel compiled with VMI enabled may run on a non-VMI platform. If
that is the case, the fixmap should not be relocated. If however, a VMI
ROM is found, we need to hijack up to 64-MB of linear address space from
the top of memory down. This means moving the fixmap down by the same
amount.

Right now the code is structured in such a way that it wants to know how
much physical memory there is, so it can register a mapping table for
mapping linear addresses in the lowmem area to physical addresses. This
causes the code to depend on max_low_pfn being initialized, which
accounts for the current placement.

But it also must be called before anything that creates the fixmap,
because the same code which registers the linear address mapping also
reserves high memory above the fixmap.

My point is 1) these could be two separate calls, or 2) the lowmem
mapping table need not depend on max_low_pfn at all, it is safe to
create an extra large mapping which covers all possible lowmem instead
of the physical ram that is actually available.

Zach

2008-08-07 21:52:33

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
>>>
>> Okay, you lost me about halfway through that... could you perhaps
>> describe the problem from the beginning, exactly what you're trying to do?
>
> A kernel compiled with VMI enabled may run on a non-VMI platform. If
> that is the case, the fixmap should not be relocated. If however, a VMI
> ROM is found, we need to hijack up to 64-MB of linear address space from
> the top of memory down. This means moving the fixmap down by the same
> amount.
>

I take it there are no alternatives other than putting this at the end
of memory?

> Right now the code is structured in such a way that it wants to know how
> much physical memory there is, so it can register a mapping table for
> mapping linear addresses in the lowmem area to physical addresses. This
> causes the code to depend on max_low_pfn being initialized, which
> accounts for the current placement.
>
> But it also must be called before anything that creates the fixmap,
> because the same code which registers the linear address mapping also
> reserves high memory above the fixmap.
>
> My point is 1) these could be two separate calls, or 2) the lowmem
> mapping table need not depend on max_low_pfn at all, it is safe to
> create an extra large mapping which covers all possible lowmem instead
> of the physical ram that is actually available.

Realistically speaking, any (virtual) machine which does *not* have a
full complement of lowmem (i.e. less than 896 MB in the common case)
will not suffer significatly from losing a few megabytes of address space.

-hpa

2008-08-07 21:57:21

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 14:52 -0700, H. Peter Anvin wrote:
> Zachary Amsden wrote:
> >>>
> >> Okay, you lost me about halfway through that... could you perhaps
> >> describe the problem from the beginning, exactly what you're trying to do?
> >
> > A kernel compiled with VMI enabled may run on a non-VMI platform. If
> > that is the case, the fixmap should not be relocated. If however, a VMI
> > ROM is found, we need to hijack up to 64-MB of linear address space from
> > the top of memory down. This means moving the fixmap down by the same
> > amount.
> >
>
> I take it there are no alternatives other than putting this at the end
> of memory?

Nope, it must be in an area allowing for segmentation protection, while
keeping the kernel on zero-based segments; that means only the end of
linear address space is sufficient.

> Realistically speaking, any (virtual) machine which does *not* have a
> full complement of lowmem (i.e. less than 896 MB in the common case)
> will not suffer significatly from losing a few megabytes of address space.

Yes, the reason to make the fixmap moveable is to allow as much address
space as possible for big memory (physical) machines.

2008-08-07 22:17:49

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
>
>> Realistically speaking, any (virtual) machine which does *not* have a
>> full complement of lowmem (i.e. less than 896 MB in the common case)
>> will not suffer significatly from losing a few megabytes of address space.
>
> Yes, the reason to make the fixmap moveable is to allow as much address
> space as possible for big memory (physical) machines.
>

That being said, the fixmap area being movable more than kind of defeats
a major point of the fixmap area; the addresses in it are no longer fixed.

The only way I can see around that, though, is to move the 1:1 mapping
base up by 2/4 MB (for PAE/no PAE, respectively) and put the fixmap area
there. Kind of sucks, but would be doable.

-hpa

2008-08-07 22:39:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 7 Aug 2008, H. Peter Anvin wrote:
>
> The only way I can see around that, though, is to move the 1:1 mapping base up
> by 2/4 MB (for PAE/no PAE, respectively) and put the fixmap area there. Kind
> of sucks, but would be doable.

So if the address isn't fixed, you'll end up with an indirect pointer, and
it would likely be much better to just use a fixed direct pointer that is
not at the top.

And anything that is within the top 31 bits of the address space should
generate the same good code, since the fixed offset is always going to be
a 32-bit thing anyway. So moving the FIXMAP area down by 4MB sounds like a
fine thing to do with no real downside, if it then means that we don't
need to move the FIXMAP area at all.

Hmm? Am I missing something?

Linus

2008-08-07 22:58:41

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Linus Torvalds wrote:
>
> On Thu, 7 Aug 2008, H. Peter Anvin wrote:
>> The only way I can see around that, though, is to move the 1:1 mapping base up
>> by 2/4 MB (for PAE/no PAE, respectively) and put the fixmap area there. Kind
>> of sucks, but would be doable.
>
> So if the address isn't fixed, you'll end up with an indirect pointer, and
> it would likely be much better to just use a fixed direct pointer that is
> not at the top.
>
> And anything that is within the top 31 bits of the address space should
> generate the same good code, since the fixed offset is always going to be
> a 32-bit thing anyway. So moving the FIXMAP area down by 4MB sounds like a
> fine thing to do with no real downside, if it then means that we don't
> need to move the FIXMAP area at all.
>
> Hmm? Am I missing something?

Just moving it down by 4 MB doesn't help, since the VMI guys want as
much as 64 MB, which is half the standard vmalloc area and hence too
much address space lost. We can't put it at the bottom of the vmalloc
area, since that boundary is not fixed, either.

The one remaining fixed boundary in the machine is the kernel-userspace
boundary. Hence moving the 1:1 area up by one PDE unit and sticking the
fixmap area in that region.

-hpa

2008-08-07 23:09:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 7 Aug 2008, H. Peter Anvin wrote:
>
> Just moving it down by 4 MB doesn't help, since the VMI guys want as much as
> 64 MB, which is half the standard vmalloc area and hence too much address
> space lost. We can't put it at the bottom of the vmalloc area, since that
> boundary is not fixed, either.

Yeah, ok. Since this is a 32-bit only issue, 64MB is actually a fair chunk
of our already limited virtual space.

> The one remaining fixed boundary in the machine is the kernel-userspace
> boundary. Hence moving the 1:1 area up by one PDE unit and sticking the
> fixmap area in that region.

Yeah, ok, but I'd be more nervous about the validation issues there. There
might be a lot of code that assumes that TASK_SIZE is the start of the 1:1
area. It does sound like a good approach, it just makes me worry about the
test coverage.

Linus

2008-08-07 23:13:27

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Linus Torvalds wrote:
>
> On Thu, 7 Aug 2008, H. Peter Anvin wrote:
>> Just moving it down by 4 MB doesn't help, since the VMI guys want as much as
>> 64 MB, which is half the standard vmalloc area and hence too much address
>> space lost. We can't put it at the bottom of the vmalloc area, since that
>> boundary is not fixed, either.
>
> Yeah, ok. Since this is a 32-bit only issue, 64MB is actually a fair chunk
> of our already limited virtual space.
>
>> The one remaining fixed boundary in the machine is the kernel-userspace
>> boundary. Hence moving the 1:1 area up by one PDE unit and sticking the
>> fixmap area in that region.
>
> Yeah, ok, but I'd be more nervous about the validation issues there. There
> might be a lot of code that assumes that TASK_SIZE is the start of the 1:1
> area. It does sound like a good approach, it just makes me worry about the
> test coverage.
>

Indeed. Unfortunately I don't see any other options.

-hpa

2008-08-07 23:21:38

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
> Or we could remove the dependency on max_low_pfn and just create a
> liberal linear to physical mapping for lomem which spans all possible
> low memory; then it doesn't matter so much where it is called.
>

Yes. You could just call reserve_top_address() at a suitably early
point to reserve the space. Its a pvops API call which has been there
since patch one or two of pvops.

It does exactly what the rest of this thread discusses, rendering it moot.

J

2008-08-07 23:23:40

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

H. Peter Anvin wrote:
> That being said, the fixmap area being movable more than kind of
> defeats a major point of the fixmap area; the addresses in it are no
> longer fixed.

It's always been movable with CONFIG_PARAVIRT enabled. Xen needs to
reserve a hole at the top of the address space too.

J

2008-08-07 23:27:40

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 16:08 -0700, Linus Torvalds wrote:
>
> On Thu, 7 Aug 2008, H. Peter Anvin wrote:
> >
> > Just moving it down by 4 MB doesn't help, since the VMI guys want as much as
> > 64 MB, which is half the standard vmalloc area and hence too much address
> > space lost. We can't put it at the bottom of the vmalloc area, since that
> > boundary is not fixed, either.
>
> Yeah, ok. Since this is a 32-bit only issue, 64MB is actually a fair chunk
> of our already limited virtual space.
>
> > The one remaining fixed boundary in the machine is the kernel-userspace
> > boundary. Hence moving the 1:1 area up by one PDE unit and sticking the
> > fixmap area in that region.
>
> Yeah, ok, but I'd be more nervous about the validation issues there. There
> might be a lot of code that assumes that TASK_SIZE is the start of the 1:1
> area. It does sound like a good approach, it just makes me worry about the
> test coverage.

Well, here's an idea from outer space. The fixmap can't possibly be
used until it's got a backing page table and initial mappings installed.
One can imagine a world where references to the fixmap are left as
unresolved, and then those unresolved symbols are linked to the fixmap
area when it gets set up in the kernel page table. Voilla!

The requisite foodling required to massage various gcci and lds into
compliance with this scheme, not to mention the required module loading
changes might be a bit of headache, and even then, I'm not sure that gcc
will be smart enough to allow all the required relocations to generate
optimal code.

But the upshot would be the potential for dynamic registration of fixmap
areas, yet still keeping direct pointers into the thing, and also
removing all the ifdefs from the fixmap definitions for the various
platform specific fixmap pages. Just leave dangling references to some
fixed bad address (fixmap_hole) for things unused. And even allow
kernel modules to register new fixmap types!

All it requires is a well thought out strategy for naming fixmap pages
and then two sprinkles of linker magic. You could even randomize the
non-randomized VDSO location at boot-time. Whee!

Zach

2008-08-07 23:31:51

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Or we could remove the dependency on max_low_pfn and just create a
>> liberal linear to physical mapping for lomem which spans all possible
>> low memory; then it doesn't matter so much where it is called.
>>
>
> Yes. You could just call reserve_top_address() at a suitably early
> point to reserve the space. Its a pvops API call which has been there
> since patch one or two of pvops.
>
> It does exactly what the rest of this thread discusses, rendering it moot.

It's not moot.

The fixmap area should never have been made movable. It's utter
braindamage.

Given the x86 architecture, it's inevitable that PV will want to reserve
address space at the top of memory, and therefore the fixmap area needs
to be moved out of that space.

-hpa

2008-08-07 23:46:53

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

H. Peter Anvin wrote:
> The fixmap area should never have been made movable. It's utter
> braindamage.

Shrug. It's been like that for a couple of years now. It was one of
the very first paravirt-ops patches. It wasn't controversial then, and
nobody seems to have noticed since.

> Given the x86 architecture, it's inevitable that PV will want to
> reserve address space at the top of memory, and therefore the fixmap
> area needs to be moved out of that space.

OK. But there's a few places where the code uses FIXADDR_TOP to mean
"top of kernel address space", so we'd need to come up with a proper
symbol for that.

J

2008-08-07 23:50:25

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
> On Thu, 2008-08-07 at 16:08 -0700, Linus Torvalds wrote:
>
>> On Thu, 7 Aug 2008, H. Peter Anvin wrote:
>>
>>> Just moving it down by 4 MB doesn't help, since the VMI guys want as much as
>>> 64 MB, which is half the standard vmalloc area and hence too much address
>>> space lost. We can't put it at the bottom of the vmalloc area, since that
>>> boundary is not fixed, either.
>>>
>> Yeah, ok. Since this is a 32-bit only issue, 64MB is actually a fair chunk
>> of our already limited virtual space.
>>
>>
>>> The one remaining fixed boundary in the machine is the kernel-userspace
>>> boundary. Hence moving the 1:1 area up by one PDE unit and sticking the
>>> fixmap area in that region.
>>>
>> Yeah, ok, but I'd be more nervous about the validation issues there. There
>> might be a lot of code that assumes that TASK_SIZE is the start of the 1:1
>> area. It does sound like a good approach, it just makes me worry about the
>> test coverage.
>>
>
> Well, here's an idea from outer space. The fixmap can't possibly be
> used until it's got a backing page table and initial mappings installed.
> One can imagine a world where references to the fixmap are left as
> unresolved, and then those unresolved symbols are linked to the fixmap
> area when it gets set up in the kernel page table. Voilla!
>
> The requisite foodling required to massage various gcci and lds into
> compliance with this scheme, not to mention the required module loading
> changes might be a bit of headache, and even then, I'm not sure that gcc
> will be smart enough to allow all the required relocations to generate
> optimal code.
>
> But the upshot would be the potential for dynamic registration of fixmap
> areas, yet still keeping direct pointers into the thing, and also
> removing all the ifdefs from the fixmap definitions for the various
> platform specific fixmap pages. Just leave dangling references to some
> fixed bad address (fixmap_hole) for things unused. And even allow
> kernel modules to register new fixmap types!
>
> All it requires is a well thought out strategy for naming fixmap pages
> and then two sprinkles of linker magic. You could even randomize the
> non-randomized VDSO location at boot-time. Whee!
>

fixmap.ko, except backwards?

That said, isn't this exactly what the immediate values stuff is
supposed to be able to do?

J

2008-08-07 23:54:39

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> The fixmap area should never have been made movable. It's utter
>> braindamage.
>
> Shrug. It's been like that for a couple of years now. It was one of
> the very first paravirt-ops patches. It wasn't controversial then, and
> nobody seems to have noticed since.

The Linux kernel was never a paragon of perfection - it was never meant
to be. Just because a bit of cruft went unnoticed into the kernel
doesn't mean we shouldn't fix it.

>> Given the x86 architecture, it's inevitable that PV will want to
>> reserve address space at the top of memory, and therefore the fixmap
>> area needs to be moved out of that space.
>
> OK. But there's a few places where the code uses FIXADDR_TOP to mean
> "top of kernel address space", so we'd need to come up with a proper
> symbol for that.

I suggest KERNEL_TOP.

-hpa

2008-08-08 00:01:20

by Yinghai Lu

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, Aug 7, 2008 at 4:51 PM, H. Peter Anvin <[email protected]> wrote:
> Jeremy Fitzhardinge wrote:
>>
>> H. Peter Anvin wrote:
>>>
>>> The fixmap area should never have been made movable. It's utter
>>> braindamage.
>>
>> Shrug. It's been like that for a couple of years now. It was one of the
>> very first paravirt-ops patches. It wasn't controversial then, and nobody
>> seems to have noticed since.
>
> The Linux kernel was never a paragon of perfection - it was never meant to
> be. Just because a bit of cruft went unnoticed into the kernel doesn't mean
> we shouldn't fix it.
>
>>> Given the x86 architecture, it's inevitable that PV will want to reserve
>>> address space at the top of memory, and therefore the fixmap area needs to
>>> be moved out of that space.
>>
>> OK. But there's a few places where the code uses FIXADDR_TOP to mean "top
>> of kernel address space", so we'd need to come up with a proper symbol for
>> that.

why not reserving that in e820 table?

YH

2008-08-08 00:10:52

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> H. Peter Anvin wrote:
>>> The fixmap area should never have been made movable. It's utter
>>> braindamage.
>>
>> Shrug. It's been like that for a couple of years now. It was one of
>> the very first paravirt-ops patches. It wasn't controversial then,
>> and nobody seems to have noticed since.
>
> The Linux kernel was never a paragon of perfection - it was never
> meant to be. Just because a bit of cruft went unnoticed into the
> kernel doesn't mean we shouldn't fix it.

I don't really see what the issue is.

Fixmaps are primarily used for things that need to be mapped early
before we can allocate address space dynamically. They're predominantly
used for boot-time init, and rarely on any performance-critical path.
The only vaguely regular use a fixmap gets during runtime is poking at
apics, and that's dominated by IO time, and kmap_atomic. Statically,
there's only 100 references in the kernel. And it only affects 32-bit.

Having fixmaps at link-time fixed addresses would be nice, I suppose,
but hardly worth going to vast effort over.

>>> Given the x86 architecture, it's inevitable that PV will want to
>>> reserve address space at the top of memory, and therefore the fixmap
>>> area needs to be moved out of that space.
>>
>> OK. But there's a few places where the code uses FIXADDR_TOP to mean
>> "top of kernel address space", so we'd need to come up with a proper
>> symbol for that.
>
> I suggest KERNEL_TOP.

Fine by me. It would be easy to plug KERNEL_TOP/__KERNEL_TOP in now,
and then fix up fixmap independently.

J

2008-08-08 00:15:06

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Yinghai Lu wrote:
>
> why not reserving that in e820 table?
>

Virtual space, not physical space.

-hpa

2008-08-08 00:17:14

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Jeremy Fitzhardinge wrote:
>>
>> The Linux kernel was never a paragon of perfection - it was never
>> meant to be. Just because a bit of cruft went unnoticed into the
>> kernel doesn't mean we shouldn't fix it.
>
> I don't really see what the issue is.
>
> Fixmaps are primarily used for things that need to be mapped early
> before we can allocate address space dynamically. They're predominantly
> used for boot-time init, and rarely on any performance-critical path.
> The only vaguely regular use a fixmap gets during runtime is poking at
> apics, and that's dominated by IO time, and kmap_atomic. Statically,
> there's only 100 references in the kernel. And it only affects 32-bit.
>
> Having fixmaps at link-time fixed addresses would be nice, I suppose,
> but hardly worth going to vast effort over.
>

No, but it's hardly vast effort, either.

>>>> Given the x86 architecture, it's inevitable that PV will want to
>>>> reserve address space at the top of memory, and therefore the fixmap
>>>> area needs to be moved out of that space.
>>>
>>> OK. But there's a few places where the code uses FIXADDR_TOP to mean
>>> "top of kernel address space", so we'd need to come up with a proper
>>> symbol for that.
>>
>> I suggest KERNEL_TOP.
>
> Fine by me. It would be easy to plug KERNEL_TOP/__KERNEL_TOP in now,
> and then fix up fixmap independently.

Yes, and we should add a symbol for the bottom of the 1:1 area as well
(to disambiguate it from TASK_SIZE).

-hpa

2008-08-08 00:24:14

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

H. Peter Anvin wrote:
>> Having fixmaps at link-time fixed addresses would be nice, I suppose,
>> but hardly worth going to vast effort over.
>>
>
> No, but it's hardly vast effort, either.

Well, off you go.

> Yes, and we should add a symbol for the bottom of the 1:1 area as well
> (to disambiguate it from TASK_SIZE).
>

Well, that's already called "PAGE_OFFSET". 64-bit needs to be careful
about the distinction anyway, because there's the sign extension hole
between user and kernel space. Xen squeezes itself in just above the
hole, so I moved PAGE_OFFSET up accordingly.

J

2008-08-08 00:32:41

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Jeremy Fitzhardinge wrote:
>
>> Yes, and we should add a symbol for the bottom of the 1:1 area as well
>> (to disambiguate it from TASK_SIZE).
>
> Well, that's already called "PAGE_OFFSET". 64-bit needs to be careful
> about the distinction anyway, because there's the sign extension hole
> between user and kernel space. Xen squeezes itself in just above the
> hole, so I moved PAGE_OFFSET up accordingly.

Ah, that explains that strange offset. I was wondering about that.

-hpa

2008-08-08 01:15:34

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 17:10 -0700, Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
> > Jeremy Fitzhardinge wrote:
> >> H. Peter Anvin wrote:
> >>> The fixmap area should never have been made movable. It's utter
> >>> braindamage.
> >>
> >> Shrug. It's been like that for a couple of years now. It was one of
> >> the very first paravirt-ops patches. It wasn't controversial then,
> >> and nobody seems to have noticed since.
> >
> > The Linux kernel was never a paragon of perfection - it was never
> > meant to be. Just because a bit of cruft went unnoticed into the
> > kernel doesn't mean we shouldn't fix it.
>
> I don't really see what the issue is.
>
> Fixmaps are primarily used for things that need to be mapped early
> before we can allocate address space dynamically. They're predominantly
> used for boot-time init, and rarely on any performance-critical path.
> The only vaguely regular use a fixmap gets during runtime is poking at
> apics, and that's dominated by IO time, and kmap_atomic. Statically,
> there's only 100 references in the kernel. And it only affects 32-bit.
>
> Having fixmaps at link-time fixed addresses would be nice, I suppose,
> but hardly worth going to vast effort over.

Using my interplanetary ideas, linking the fixmap at runtime would allow
optimal placement of the fixmap, any hypervisor areas, vmalloc, and
pkmap space. That might allow one to increase the amount of lowmem
available for direct mapping depending on some platform variables such
as hypervisor reserved space, physical memory size, APIC present.. those
aren't known until boot time. Actually, NCPUs is a good one, since we
require atomic kmap space dependent on NCPUs, which could be given back
to linear memory map.

That might actually be more worthwhile than link time fixed addresses.

Zach

2008-08-08 01:22:57

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Zachary Amsden wrote:
>
> Using my interplanetary ideas, linking the fixmap at runtime would allow
> optimal placement of the fixmap, any hypervisor areas, vmalloc, and
> pkmap space. That might allow one to increase the amount of lowmem
> available for direct mapping depending on some platform variables such
> as hypervisor reserved space, physical memory size, APIC present.. those
> aren't known until boot time. Actually, NCPUs is a good one, since we
> require atomic kmap space dependent on NCPUs, which could be given back
> to linear memory map.
>
> That might actually be more worthwhile than link time fixed addresses.
>

Yeah, but that's a huge project.

-hpa

2008-08-08 01:29:35

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 18:19 -0700, H. Peter Anvin wrote:
> Zachary Amsden wrote:
> >
> > Using my interplanetary ideas, linking the fixmap at runtime would allow
> > optimal placement of the fixmap, any hypervisor areas, vmalloc, and
> > pkmap space. That might allow one to increase the amount of lowmem
> > available for direct mapping depending on some platform variables such
> > as hypervisor reserved space, physical memory size, APIC present.. those
> > aren't known until boot time. Actually, NCPUs is a good one, since we
> > require atomic kmap space dependent on NCPUs, which could be given back
> > to linear memory map.
> >
> > That might actually be more worthwhile than link time fixed addresses.
> >
>
> Yeah, but that's a huge project.

But it's awesome.

2008-08-08 06:10:53

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>>>
>>> The Linux kernel was never a paragon of perfection - it was never
>>> meant to be. Just because a bit of cruft went unnoticed into the
>>> kernel doesn't mean we shouldn't fix it.
>>
>> I don't really see what the issue is.
>>
>> Fixmaps are primarily used for things that need to be mapped early
>> before we can allocate address space dynamically. They're
>> predominantly used for boot-time init, and rarely on any
>> performance-critical path. The only vaguely regular use a fixmap
>> gets during runtime is poking at apics, and that's dominated by IO
>> time, and kmap_atomic. Statically, there's only 100 references in
>> the kernel. And it only affects 32-bit.
>>
>> Having fixmaps at link-time fixed addresses would be nice, I suppose,
>> but hardly worth going to vast effort over.
>>
>
> No, but it's hardly vast effort, either.

Thinking about it, the fixmap really has to be as high as possible. If
it were any lower, then it would either truncate the 1:1 mapping, or
shadow some physical memory.

J

2008-08-08 16:18:08

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

Jeremy Fitzhardinge wrote:
>>
>> No, but it's hardly vast effort, either.
>
> Thinking about it, the fixmap really has to be as high as possible. If
> it were any lower, then it would either truncate the 1:1 mapping, or
> shadow some physical memory.
>

The proposal was to put it *before* the 1:1 mapping:

FIX_HOLE would start at TASK_SIZE
PAGE_OFFSET would shift to TASK_SIZE + PMD_PAGE_SIZE

All the remaining offsets remain as-is.

It's slightly less efficient than the current version, since we can't
share the PDE page for the fixmap with the final vmalloc map, but not
terribly so.

-hpa

2008-08-08 19:16:24

by Alok Kataria

[permalink] [raw]

Subject: Re: [PATCH]Fix broken VMI in 2.6.27-rc..

On Thu, 2008-08-07 at 14:52 -0700, H. Peter Anvin wrote:
> Zachary Amsden wrote:
> > My point is 1) these could be two separate calls, or 2) the lowmem
> > mapping table need not depend on max_low_pfn at all, it is safe to
> > create an extra large mapping which covers all possible lowmem instead
> > of the physical ram that is actually available.
>
> Realistically speaking, any (virtual) machine which does *not* have a
> full complement of lowmem (i.e. less than 896 MB in the common case)
> will not suffer significatly from losing a few megabytes of address space.

Ok, since we are already past rc-2, I think we should fix the VMI
problem sooner than later. Any approach that we eventually take to make
the fixmap's actually *fixed*, would be independent of this fix.

Below is the patch which does away from the dependency of activating VMI
only after max_low_pfn is known. We move vmi_initialization very early
in the setup_arch code.

Patch on top of current git. Please have a look and apply.
--
From: Alok N Kataria <[email protected]>

The lowmem mapping table created by VMI need not depend on max_low_pfn
at all. Instead we now create an extra large mapping which covers all
possible lowmem instead of the physical ram that is actually available.

This allows the vmi initialization to be done before max_low_pfn could
be computed. We also move the vmi_init code very early in the boot process
so that nobody accidentally breaks the fixmap dependancy.

Signed-off-by: Alok N Kataria <[email protected]>
Acked-by: Zachary Amsden <[email protected]>
---

arch/x86/kernel/setup.c | 16 ++++++++--------
arch/x86/kernel/vmi_32.c | 3 ++-
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2d88858..6e5823b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -604,6 +604,14 @@ void __init setup_arch(char **cmdline_p)
early_cpu_init();
early_ioremap_init();

+#if defined(CONFIG_VMI) && defined(CONFIG_X86_32)
+ /*
+ * Must be before kernel pagetables are setup
+ * or fixmap area is touched.
+ */
+ vmi_init();
+#endif
+
ROOT_DEV = old_decode_dev(boot_params.hdr.root_dev);
screen_info = boot_params.screen_info;
edid_info = boot_params.edid_info;
@@ -817,14 +825,6 @@ void __init setup_arch(char **cmdline_p)
kvmclock_init();
#endif

-#if defined(CONFIG_VMI) && defined(CONFIG_X86_32)
- /*
- * Must be after max_low_pfn is determined, and before kernel
- * pagetables are setup.
- */
- vmi_init();
-#endif
-
paravirt_pagetable_setup_start(swapper_pg_dir);
paging_init();
paravirt_pagetable_setup_done(swapper_pg_dir);
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index 0a1b1a9..6ca515d 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -37,6 +37,7 @@
#include <asm/timer.h>
#include <asm/vmi_time.h>
#include <asm/kmap_types.h>
+#include <asm/setup.h>

/* Convenient for calling VMI functions indirectly in the ROM */
typedef u32 __attribute__((regparm(1))) (VROMFUNC)(void);
@@ -683,7 +684,7 @@ void vmi_bringup(void)
{
/* We must establish the lowmem mapping for MMU ops to work */
if (vmi_ops.set_linear_mapping)
- vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, max_low_pfn, 0);
+ vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, MAXMEM_PFN, 0);
}

/*

2008-08-08 22:23:38