2013-07-31 20:54:36

by Borislav Petkov

Subject: Corrupted EFI region

Hi guys,

so I'm seeing this funny thing where an EFI region changes when we enter
efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:

--- before 2013-07-31 22:20:52.316039492 +0200
+++ after 2013-07-31 22:21:30.960731706 +0200
@@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
 efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
 efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
 efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
-efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
+efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
 efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
 efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
 efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)

That second boundary of region mem11 suddenly changes *before* we merge
the regions. edk2 bug?
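
A comparison like the one in the diff can also be scripted. The following is only a minimal sketch: it assumes the two dumps are available as text in the "efi: memNN: ..." format shown here, and the helper names (parse_dump, changed_regions) are made up for illustration and are not kernel code.

```python
import re

# Matches the region lines printed in this thread, e.g.
#   efi: mem11: type=4, attr=0xf, range=[0x...-0x...) (0MB)
REGION_RE = re.compile(
    r"efi: (mem\d+): type=(\d+), attr=(0x[0-9a-f]+), "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)

def parse_dump(text):
    """Map region name -> (type, attr, start, end) for each region line."""
    regions = {}
    for name, typ, attr, start, end in REGION_RE.findall(text):
        regions[name] = (int(typ), int(attr, 16), int(start, 16), int(end, 16))
    return regions

def changed_regions(before_text, after_text):
    """Return {name: (before, after)} for regions in both dumps that differ."""
    before, after = parse_dump(before_text), parse_dump(after_text)
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }

# Three lines from each dump, copied from the report.
BEFORE = """\
efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
"""
AFTER = """\
efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
"""

changed = changed_regions(BEFORE, AFTER)
for name, (old, new) in changed.items():
    print(f"{name}: end {old[3]:#x} -> {new[3]:#x}")
```

Because the pattern is not anchored at the start of the line, the same sketch should also accept dmesg lines that carry a timestamp prefix.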

Whole dmesg attached.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Attachments:
test-x86_64.log.gz (8.90 kB)

2013-07-31 20:59:12

by Matthew Garrett

Subject: Re: Corrupted EFI region

On Wed, Jul 31, 2013 at 10:54:31PM +0200, Borislav Petkov wrote:

> efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
> efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
> efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> -efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
> +efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
> efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
> efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
> efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)

Are we making any EFI calls in between? I certainly wouldn't expect the
memory map to change after ExitBootServices, but up until that point the
firmware's free to mess with it.

--
Matthew Garrett | [email protected]

2013-07-31 21:51:35

by Borislav Petkov

Subject: Re: Corrupted EFI region

On Wed, Jul 31, 2013 at 09:58:58PM +0100, Matthew Garrett wrote:
> On Wed, Jul 31, 2013 at 10:54:31PM +0200, Borislav Petkov wrote:
>
> > efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
> > efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
> > efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> > -efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
> > +efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
> > efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
> > efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
> > efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
>
> Are we making any EFI calls in between?

Probably.

> I certainly wouldn't expect the memory map to change after
> ExitBootServices, but up until that point the firmware's free to mess
> with it.

Well, we call ExitBootServices pretty early in exit_boot().

But the problem is, something messes up the upper boundary of the region
and it is an EFI_BOOT_SERVICES_DATA region which we need for the runtime
services mapping and if we can't map it properly, we're probably going
to miss functionality or not have runtime at all.

I hope this is an edk2 only bug. Btw, I'm not running the latest version
so the hope that it could've been fixed in the meantime is not dead yet.

:)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-31 21:54:29

by Matthew Garrett

Subject: Re: Corrupted EFI region

On Wed, Jul 31, 2013 at 11:51:30PM +0200, Borislav Petkov wrote:

> But the problem is, something messes up the upper boundary of the region
> and it is an EFI_BOOT_SERVICES_DATA region which we need for the runtime
> services mapping and if we can't map it properly, we're probably going
> to miss functionality or not have runtime at all.

"Easiest" way around this would probably be to stash the address map
after ExitBootServices() and compare it at SetVirtualAddressMap() time,
then take the widest boundaries and trim the e820 map to match. This is
obviously dependent upon the system not allocating anything further
after that, but it seems safest. The worst case is finding the firmware
writing over bits of the kernel.
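
The "widest boundaries" step could look roughly like the sketch below. It is an illustration only: it assumes each snapshot has already been reduced to a {name: (start, end)} mapping and that a region keeps its name across both snapshots, which the firmware does not guarantee. The function name is hypothetical, and trimming the e820 map is not covered.

```python
def widest_boundaries(stashed, current):
    """Merge two {region_name: (start, end)} maps into their union spans.

    Assumption for this sketch: a region is identified by the same name
    in both snapshots.
    """
    merged = {}
    for name in sorted(stashed.keys() | current.keys()):
        spans = [m[name] for m in (stashed, current) if name in m]
        merged[name] = (min(s for s, _ in spans), max(e for _, e in spans))
    return merged

# Snapshot taken after ExitBootServices() (values from the report).
stashed = {
    "mem11": (0x7e0ad000, 0x7e0cc000),
    "mem12": (0x7e0cc000, 0x7e0cd000),
}
# Snapshot seen at SetVirtualAddressMap() time, with mem11 collapsed.
current = {
    "mem11": (0x7e0ad000, 0x7e0ad000),
    "mem12": (0x7e0cc000, 0x7e0cd000),
}

merged = widest_boundaries(stashed, current)
for name, (start, end) in merged.items():
    print(f"{name}: [{start:#x}-{end:#x})")
```

With the values from the report, mem11 gets its original end address back while mem12 is unchanged.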

--
Matthew Garrett | [email protected]

2013-07-31 21:55:42

by David Woodhouse

Subject: Re: Corrupted EFI region

On Wed, 2013-07-31 at 22:54 +0200, Borislav Petkov wrote:
> so I'm seeing this funny thing where an EFI region changes when we enter
> efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:

Perhaps the [email protected] list should be in Cc?

--
dwmw2



2013-08-01 16:49:33

by Borislav Petkov

Subject: Re: Corrupted EFI region

On Wed, Jul 31, 2013 at 10:55:27PM +0100, David Woodhouse wrote:
> On Wed, 2013-07-31 at 22:54 +0200, Borislav Petkov wrote:
> > so I'm seeing this funny thing where an EFI region changes when we enter
> > efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:
>
> Perhaps the [email protected] list should be in Cc?

Good idea and message repeated below.

One more thing: I'm using a self-built OVMF with top commit from March:

------------------------------------------------------------------------
r14165 | sfu5 | 2013-03-06 02:42:04 +0100 (Wed, 06 Mar 2013) | 4 lines

Fix a bug that IsSignatureFoundInDatabase() incorrectly computes CertCount.

---

Hi guys,

so I'm seeing this funny thing where an EFI region changes when we enter
efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:

--- before 2013-07-31 22:20:52.316039492 +0200
+++ after 2013-07-31 22:21:30.960731706 +0200
@@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
 efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
 efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
 efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
-efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
+efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
 efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
 efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
 efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)

That second boundary of region mem11 suddenly changes *before* we merge
the regions. edk2 bug?

Whole dmesg attached.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Attachments:
test-x86_64.log.gz (8.90 kB)

2013-08-01 16:51:53

by Borislav Petkov

Subject: Re: Corrupted EFI region

On Wed, Jul 31, 2013 at 10:54:23PM +0100, Matthew Garrett wrote:
> On Wed, Jul 31, 2013 at 11:51:30PM +0200, Borislav Petkov wrote:
>
> > But the problem is, something messes up the upper boundary of the region
> > and it is an EFI_BOOT_SERVICES_DATA region which we need for the runtime
> > services mapping and if we can't map it properly, we're probably going
> > to miss functionality or not have runtime at all.
>
> "Easiest" way around this would probably be to stash the address map
> after ExitBootServices() and compare it at SetVirtualAddressMap()
> time, then take the widest boundaries and trim the e820 map to match.
> This is obviously dependent upon the system not allocating anything
> further after that, but it seems safest. The worst case is finding the
> firmware writing over bits of the kernel.

Actually, with UEFI impl. f*ckup like that, I'd rather kick it back to
the curb and simply not init runtime services instead of fixing stuff
around it. SetVirtualAddressMap in itself is a f*ckup as it is already.

Oh well, let's see what edk2 guys would say first.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 11:25:52

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/01/13 18:49, Borislav Petkov wrote:
> On Wed, Jul 31, 2013 at 10:55:27PM +0100, David Woodhouse wrote:
>> On Wed, 2013-07-31 at 22:54 +0200, Borislav Petkov wrote:
>>> so I'm seeing this funny thing where an EFI region changes when we enter
>>> efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:
>>
>> Perhaps the [email protected] list should be in Cc?
>
> Good idea and message repeated below.
>
> One more thing: I'm using a self-built OVMF with top commit from March:
>
> ------------------------------------------------------------------------
> r14165 | sfu5 | 2013-03-06 02:42:04 +0100 (Wed, 06 Mar 2013) | 4 lines
>
> Fix a bug that IsSignatureFoundInDatabase() incorrectly computes CertCount.
>
> ---
>
> Hi guys,
>
> so I'm seeing this funny thing where an EFI region changes when we enter
> efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:
>
> --- before 2013-07-31 22:20:52.316039492 +0200
> +++ after 2013-07-31 22:21:30.960731706 +0200
> @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
> efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
> efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
> efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> -efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
> +efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)

(type 4 is EfiBootServicesData)

> efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
> efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
> efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
>
> That second boundary of region mem11 suddenly changes *before* we merge
> the regions. edk2 bug?

I take it you mean this change (ie. appearance of the zero-sized range)
occurs when you enable KVM acceleration in qemu?

If so, please locate "gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel"
in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value:

# DEBUG_GCD 0x00100000 Global Coherency Database changes

Then please rebuild OVMF, and capture the debug port output of qemu
("-debugcon file:debug.log -global isa-debugcon.iobase=0x402") both with
and without KVM.

DEBUG_GCD should produce messages related to CoreAllocateSpace(), and
might help us find the spot the difference is introduced.
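
Setting the bit is a plain bitwise OR on whatever value the .dsc currently holds. A small sketch of the arithmetic follows; the starting value 0x8000004F is only an assumed example, so check the actual value in your own tree.

```python
DEBUG_GCD = 0x00100000  # bit value quoted from the .dsc comment above

def with_debug_gcd(pcd_value):
    """Return pcd_value with the DEBUG_GCD bit set; other bits are untouched."""
    return pcd_value | DEBUG_GCD

# 0x8000004F is an assumed example value, not necessarily what your tree has.
example = with_debug_gcd(0x8000004F)
print(f"0x{example:08X}")
```

The OR is idempotent, so applying it to a value that already has the bit set changes nothing.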

BTW does this have anything to do with the NX bit report of yours, or
have you noticed this independently?

(I'm not subscribed to lkml so apologies if this email doesn't end up in
those archives / doesn't reach everyone.)

Thanks
Laszlo

2013-08-05 13:03:04

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 01:27:16PM +0200, Laszlo Ersek wrote:
> > --- before 2013-07-31 22:20:52.316039492 +0200
> > +++ after 2013-07-31 22:21:30.960731706 +0200
> > @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
> > efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
> > efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
> > efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> > -efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
> > +efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
>
> (type 4 is EfiBootServicesData)

Yes.

> > efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
> > efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
> > efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
> >
> > That second boundary of region mem11 suddenly changes *before* we merge
> > the regions. edk2 bug?
>
> I take it you mean this change (ie. appearance of the zero-sized range)
> occurs when you enable KVM acceleration in qemu?

Right. And I'm booting with qemu -enable-kvm, so KVM acceleration is
enabled. Or do you mean something else?

> If so, please locate "gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel"
> in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value:
>
> # DEBUG_GCD 0x00100000 Global Coherency Database changes
>
> Then please rebuild OVMF, and capture the debug port output of qemu
> ("-debugcon file:debug.log -global isa-debugcon.iobase=0x402") both with
> and without KVM.
>
> DEBUG_GCD should produce messages related to CoreAllocateSpace(), and
> might help us find the spot the difference is introduced.

Ok, I'll try to get this thing done before my vacation. If not, we'll
deal with it afterwards but I won't forget, I promise! :-)

> BTW does this have anything to do with the NX bit report of yours, or
> have you noticed this independently?

Independently, while testing my runtime services mapping patchset. I was
getting an empty region and was wondering whether to discard it from the
mapping or not and then I looked at why I get it in the first place.

Basically, I get this empty region which appears at some point. It is
there when we enter efi_enter_virtual_mode in the kernel to setup the
runtime mappings:

[ 0.005012] efi: efi_enter_virtual_mode: enter
[ 0.006004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.007004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.008003] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.009004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.010004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.011004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.012004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.013003] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.014004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.015004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[ 0.018003] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)

When we dump the EFI regions initially, it is ok.

[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)

So what basically happens is the end boundary of the region becomes the
start, practically turning it into a 0-size one.
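
A check for such collapsed regions can be sketched as follows. This is an illustration under assumptions: it takes the dmesg text in the format shown above, and the function name is made up. Besides flagging regions whose start equals their end, it reports how many bytes up to the next listed region are no longer covered by any descriptor.

```python
import re

REGION_RE = re.compile(
    r"efi: (mem\d+): type=(\d+), attr=0x[0-9a-f]+, "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)

def find_empty_regions(dump_text):
    """Return (name, start, uncovered_bytes) for every region with start == end.

    uncovered_bytes is the distance from the empty region to the start of
    the next listed region (0 if the empty region is the last one).
    """
    regions = [
        (name, int(start, 16), int(end, 16))
        for name, _type, start, end in REGION_RE.findall(dump_text)
    ]
    found = []
    for i, (name, start, end) in enumerate(regions):
        if start == end:
            next_start = regions[i + 1][1] if i + 1 < len(regions) else end
            found.append((name, start, next_start - end))
    return found

# Lines copied from the second dump in the report.
DUMP = """\
[ 0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.018003] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
"""

empty = find_empty_regions(DUMP)
for name, start, uncovered in empty:
    print(f"{name} is empty at {start:#x}; {uncovered // 1024} KiB uncovered")
```

The uncovered span here is 0x1f000 bytes (124 KiB), which is also why the region is printed as (0MB) both before and after the change.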

Thanks for looking into it.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 13:37:49

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 15:02, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 01:27:16PM +0200, Laszlo Ersek wrote:
>>> --- before 2013-07-31 22:20:52.316039492 +0200
>>> +++ after 2013-07-31 22:21:30.960731706 +0200
>>> @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
>>> efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
>>> efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
>>> efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
>>> -efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
>>> +efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
>>
>> (type 4 is EfiBootServicesData)
>
> Yes.
>
>>> efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
>>> efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
>>> efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
>>>
>>> That second boundary of region mem11 suddenly changes *before* we merge
>>> the regions. edk2 bug?
>>
>> I take it you mean this change (ie. appearance of the zero-sized range)
>> occurs when you enable KVM acceleration in qemu?
>
> Right. And I'm booting with qemu -enable-kvm, so KVM acceleration is
> enabled. Or do you mean something else?

My question was: is my understanding correct that you only see this
problem with "-enable-kvm"? Because,

On 08/01/13 18:49, Borislav Petkov wrote:
> so I'm seeing this funny thing where an EFI region changes when we
> enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's
> the diff:

You said "on kvm", and provided a diff. I think (hope) I understand the
environment you've denoted with "after", but what's your "before"? The
absence of "-enable-kvm", or something else?

>
>> If so, please locate "gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel"
>> in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value:
>>
>> # DEBUG_GCD 0x00100000 Global Coherency Database changes
>>
>> Then please rebuild OVMF, and capture the debug port output of qemu
>> ("-debugcon file:debug.log -global isa-debugcon.iobase=0x402") both with
>> and without KVM.
>>
>> DEBUG_GCD should produce messages related to CoreAllocateSpace(), and
>> might help us find the spot the difference is introduced.
>
> Ok, I'll try to get this thing done before my vacation. If not, we'll
> deal with it afterwards but I won't forget, I promise! :-)
>
>> BTW does this have anything to do with the NX bit report of yours, or
>> have you noticed this independently?
>
> Independently, while testing my runtime services mapping patchset.

What's the purpose of that series? Can you please provide a link (if you
posted versions of it already)?

> I was
> getting an empty region and was wondering whether to discard it from the
> mapping or not and then I looked at why I get it in the first place.
>
> Basically, I get this empty region which appears at some point. It is
> there when we enter efi_enter_virtual_mode in the kernel to setup the
> runtime mappings:
>
> [ 0.005012] efi: efi_enter_virtual_mode: enter
> [ 0.006004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
> [ 0.007004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
> [ 0.008003] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
> [ 0.009004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
> [ 0.010004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
> [ 0.011004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
> [ 0.012004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
> [ 0.013003] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
> [ 0.014004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
> [ 0.015004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
> [ 0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> [ 0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [ 0.018003] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
>
> When we dump the EFI regions initially, it is ok.
>
> [ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
> [ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
>
> So what basically happens is the end boundary of the region becomes the
> start, practically turning it into a 0-size one.

... and you guys suspect that some firmware code is responsible, code
that runs between the initial memory map dump, and efi_enter_virtual_mode():

https://lkml.org/lkml/2013/7/31/550

> Thanks for looking into it.

Hopefully DEBUG_GCD will tell us something.

Thanks
Laszlo

2013-08-05 14:03:11

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 03:39:31PM +0200, Laszlo Ersek wrote:
> My question was: is my understanding correct that you only see this
> problem with "-enable-kvm"? Because,
>
> On 08/01/13 18:49, Borislav Petkov wrote:
> > so I'm seeing this funny thing where an EFI region changes when we
> > enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's
> > the diff:
>
> You said "on kvm", and provided a diff. I think (hope) I understand the
> environment you've denoted with "after", but what's your "before"? The
> absence of "-enable-kvm", or something else?

Ah, I see.

So 'before' is the initial dump of the EFI regions, very early during
boot:

[ 0.000000] efi: EFI v2.31 by EDK II
[ 0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014
[ 0.000000] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.000000] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.000000] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.000000] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.000000] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.000000] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.000000] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.000000] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.000000] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.000000] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.000000] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.000000] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.000000] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.000000] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.000000] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.000000] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.000000] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.000000] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.000000] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.000000] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.000000] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)

and with 'after' I've denoted the dump of the EFI regions a second time,
a bit later, when we enter efi_enter_virtual_mode():

[ 0.005012] efi: efi_enter_virtual_mode: enter
[ 0.006004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.007004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.008003] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.009004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.010004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.011004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.012004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.013003] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.014004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.015004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.018003] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.019003] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.021010] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.022004] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.023003] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.024004] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.025003] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.026004] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.027003] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.028003] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.029004] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.030004] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.031004] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.032004] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.033004] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.034003] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.035003] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)

during the *same* boot.

So, it is one boot but two dumps of the EFI regions. And yes, I'm
booting with the 'kvm' executable, which has '-enable-kvm'.
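
For reading the type= and attr= fields in these dumps, a small decoder can help. The sketch below uses the memory type names and attribute bit values from the UEFI specification; the decode() helper itself is hypothetical, and only the attribute bits that occur in these dumps are listed.

```python
# EFI_MEMORY_TYPE values 0..10, as named in the UEFI specification.
EFI_MEMORY_TYPES = {
    0: "EfiReservedMemoryType",
    1: "EfiLoaderCode",
    2: "EfiLoaderData",
    3: "EfiBootServicesCode",
    4: "EfiBootServicesData",
    5: "EfiRuntimeServicesCode",
    6: "EfiRuntimeServicesData",
    7: "EfiConventionalMemory",
    8: "EfiUnusableMemory",
    9: "EfiACPIReclaimMemory",
    10: "EfiACPIMemoryNVS",
}

# Subset of the attribute bits: cacheability flags plus EFI_MEMORY_RUNTIME.
ATTR_BITS = [
    (0x1, "UC"),
    (0x2, "WC"),
    (0x4, "WT"),
    (0x8, "WB"),
    (1 << 63, "RUNTIME"),
]

def decode(mem_type, attr):
    """Return (type_name, [flag, ...]) for one descriptor."""
    name = EFI_MEMORY_TYPES.get(mem_type, f"unknown({mem_type})")
    flags = [label for bit, label in ATTR_BITS if attr & bit]
    return name, flags

for mem_type, attr in [(4, 0xF), (5, 0x800000000000000F), (7, 0xF)]:
    name, flags = decode(mem_type, attr)
    print(f"type={mem_type} attr={attr:#x}: {name} {'|'.join(flags)}")
```

In the dumps above, only the type=5 and type=6 regions carry the RUNTIME bit; mem11 (type=4) does not.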

> What's the purpose of that series? Can you please provide a link (if
> you posted versions of it already)?

Not yet posted but working on it.

The idea is to map the runtime regions at stable addresses so that when
we kexec a kernel, it can use runtime services too. And we have to do
that because of the braindead design of SetVirtualAddressMap() being
callable only once per boot.
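
The idea of stable addresses can be pictured with the sketch below. It is not the patchset, which has not been posted: the base address VA_BASE is a hypothetical value chosen for illustration, the function name is made up, and only regions with the RUNTIME attribute are laid out, which is a simplification.

```python
EFI_MEMORY_RUNTIME = 1 << 63
VA_BASE = 0xFFFFFFEF00000000  # hypothetical fixed base, for illustration only

def assign_stable_addresses(regions, va_base=VA_BASE):
    """Lay out runtime regions back to back from a fixed virtual base.

    regions: iterable of (phys_start, phys_end, attr).
    Returns {phys_start: virt_start}. The result depends only on the
    region list, so two kernels computing it independently agree.
    """
    layout = {}
    next_va = va_base
    for start, end, attr in sorted(regions):
        if not attr & EFI_MEMORY_RUNTIME:
            continue  # simplification: skip everything without RUNTIME
        layout[start] = next_va
        next_va += end - start
    return layout

# Regions taken from the dump above.
REGIONS = [
    (0x7E692000, 0x7F992000, 0xF),                 # mem19, no RUNTIME bit
    (0x7E668000, 0x7E67D000, 0x800000000000000F),  # mem17
    (0x7E67D000, 0x7E692000, 0x800000000000000F),  # mem18
    (0x7FB12000, 0x7FB42000, 0x800000000000000F),  # mem22
]

layout = assign_stable_addresses(REGIONS)
for phys, virt in layout.items():
    print(f"phys {phys:#x} -> virt {virt:#x}")
```

Since the layout is a pure function of the memory map, a kexec'ed kernel that sees the same map would arrive at the same virtual addresses without calling SetVirtualAddressMap() again.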

> > So what basically happens is the end boundary of the region becomes the
> > start, practically turning it into a 0-size one.
>
> ... and you guys suspect that some firmware code is responsible, code
> that runs between the initial memory map dump, and efi_enter_virtual_mode():
>
> https://lkml.org/lkml/2013/7/31/550

I wouldn't wonder if we f*cked it up again like the last time. I'll give
it a long hard look.

> > Thanks for looking into it.
>
> Hopefully DEBUG_GCD will tell us something.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 14:26:00

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 16:03, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 03:39:31PM +0200, Laszlo Ersek wrote:
>> My question was: is my understanding correct that you only see this
>> problem with "-enable-kvm"? Because,
>>
>> On 08/01/13 18:49, Borislav Petkov wrote:
>>> so I'm seeing this funny thing where an EFI region changes when we
>>> enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's
>>> the diff:
>>
>> You said "on kvm", and provided a diff. I think (hope) I understand the
>> environment you've denoted with "after", but what's your "before"? The
>> absence of "-enable-kvm", or something else?
>
> Ah, I see.
>
> So 'before' is the initial dump of the EFI regions, very early during
> boot:

<snip>

> and with 'after' I've denoted the dump of the EFI regions a second time,
> a bit later, when we enter efi_enter_virtual_mode():

<snip>

>
> during the *same* boot.
>
> So, it is one boot but two dumps of the EFI regions. And yes, I'm
> booting with the 'kvm' executable, which has '-enable-kvm'.

Okay. Thanks for clarifying it.

>
>> What's the purpose of that series? Can you please provide a link (if
>> you posted versions of it already)?
>
> Not yet posted but working on it.
>
> The idea is to map the runtime regions at stable addresses so that when
> we kexec a kernel, it can use runtime services too. And we have to do
> that because of the braindead design of SetVirtualAddressMap() being
> callable only once per boot.

I wouldn't call the design of SetVirtualAddressMap() braindead.

I'd rather call kexec unique and somewhat unexpected :)

>
>>> So what basically happens is the end boundary of the region becomes the
>>> start, practically turning it into a 0-size one.
>>
>> ... and you guys suspect that some firmware code is responsible, code
>> that runs between the initial memory map dump, and efi_enter_virtual_mode():
>>
>> https://lkml.org/lkml/2013/7/31/550
>
> I wouldn't wonder if we f*cked it up again like the last time. I'll give
> it a long hard look.

Ah sorry, by "and you guys suspect" I didn't mean to imply anything
between the lines, I was simply trying to ascertain your working idea :)

Laszlo

2013-08-05 14:40:15

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote:
> I wouldn't call the design of SetVirtualAddressMap() braindead.

Ok, I've always wondered and you could probably shed some light on the
matter: why is SetVirtualAddressMap() a call-once only? Why can't I
simply call it again and update the mappings?

> I'd rather call kexec unique and somewhat unexpected :)

In all fairness, it was there before UEFI, AFAICT.

> > I wouldn't wonder if we f*cked it up again like the last time. I'll give
> > it a long hard look.
>
> Ah sorry, by "and you guys suspect" I didn't mean to imply anything
> between the lines, I was simply trying to ascertain your working idea :)

As long as we get to the bottom of this, we're all fine. And I'd
pretty much expect everyone who is dealing with EFI to have grown a
sufficiently thick skin before starting to do so, so don't worry.

:-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 15:13:55

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 16:40, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote:
>> I wouldn't call the design of SetVirtualAddressMap() braindead.
>
> Ok, I've always wondered and you could probably shed some light on the
> matter: why is SetVirtualAddressMap() a call-once only? Why can't I
> simply call it again and update the mappings?

The current implementation (how pointers are converted) probably doesn't
accommodate a second call.

Of course you want to know why SetVirtualAddressMap() was designed like
that... I didn't participate in the design so I don't know :)

But, as I said, a kernel directly executing another kernel is an
unexpected idea. IMHO the second kernel in question doesn't fit the UEFI
phases at all. The OS booted like that (ie. the OS whose kernel is the
2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS.

SetVirtualAddressMap() is a firmware interface, but the kexec OS
(including its private boot loader and kernel) are not loaded by firmware.

>
>> I'd rather call kexec unique and somewhat unexpected :)
>
> In all fairness, it was there before UEFI, AFAICT.

That doesn't matter as long as the UEFI designers aren't aware of it :)

(Who should have made whom aware, ie. Linux people approaching UEFI
people, or UEFI people exploring Linux, is a separate topic. As always
I'm apolitical about UEFI; I'm not arguing for it or against it. My
feeble efforts for improving OVMF and interfacing code are motivated by
my employer, not my world view, but as a side-effect of working with the
code I can't help but notice some nice things in edk2 and appreciate
them :))

>>> I wouldn't wonder if we f*cked it up again like the last time. I'll give
>>> it a long hard look.
>>
>> Ah sorry, by "and you guys suspect" I didn't mean to imply anything
>> between the lines, I was simply trying to ascertain your working idea :)
>
> As long as we get to the bottom of this, we're all fine. And I'd
> pretty much expect everyone who is dealing with EFI to have grown a
> sufficiently thick skin before starting to do so, so don't worry.
>
> :-)

This is a unique opportunity for me to point out the following. (Unique
because it wasn't me bringing up the thick skin thing :)) My skin is
*very thin*. It's not even there, you could say. So, if I mess up,
please don't insult me. (As explained before, my own language above
wasn't even tongue-in-cheek.) Insult my code or my analysis pls.

BTW there's another point I'd like to ask about -- you're saying you see
the region corruption during the same boot, from the first (early)
memmap dump to the second one (when just about to enter virtual mode).
But, is this one boot the very first boot, or the kexec one?

Thanks!
Laszlo

2013-08-05 15:34:55

by James Bottomley

Subject: Re: [edk2] Corrupted EFI region

On Mon, 2013-08-05 at 17:15 +0200, Laszlo Ersek wrote:
> On 08/05/13 16:40, Borislav Petkov wrote:
> > On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote:
> >> I wouldn't call the design of SetVirtualAddressMap() braindead.
> >
> > Ok, I've always wondered and you could probably shed some light on the
> > matter: why is SetVirtualAddressMap() a call-once only? Why can't I
> > simply call it again and update the mappings?
>
> The current implementation (how pointers are converted) probably doesn't
> accommodate a second call.

Having actually looked at the code (trying to find why we were getting
an unconverted pointer), I second that. However, the ugliness of the
massive pointer chase should have been an indication that something was
not quite right architecturally (or implementation wise) with
SetVirtualAddressMap().
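
To make the "doesn't accommodate a second call" point concrete, here is a
toy model of in-place pointer conversion. It is not edk2 code: the class
ToyFirmware, its fields and the virtual addresses are made up for
illustration, and only the physical range is taken from the mem17 entry in
the memory map. For what it's worth, the UEFI specification lists
EFI_UNSUPPORTED as the status returned when the firmware is already in
virtual address mapped mode.

```python
class ToyFirmware:
    """Toy model only: not edk2 code, just the in-place conversion idea."""

    def __init__(self, pointers):
        self.pointers = list(pointers)  # internal pointers, physical at first
        self.virtual_mode = False

    @staticmethod
    def convert(addr, memmap):
        """Translate addr using (phys_start, size, virt_start) descriptors."""
        for phys_start, size, virt_start in memmap:
            if phys_start <= addr < phys_start + size:
                return virt_start + (addr - phys_start)
        raise LookupError(f"{addr:#x} is not inside any described physical range")

    def set_virtual_address_map(self, memmap):
        if self.virtual_mode:
            # The stored pointers are virtual by now, so they no longer fall
            # inside the physical ranges that a second map would describe.
            raise RuntimeError("already in virtual mode")
        self.pointers = [self.convert(p, memmap) for p in self.pointers]
        self.virtual_mode = True


# Physical range of mem17 from the log; both virtual bases are hypothetical.
fw = ToyFirmware([0x7e668010, 0x7e67c000])
first_map = [(0x7e668000, 0x15000, 0xfffffffe00000000)]
second_map = [(0x7e668000, 0x15000, 0xfffffffd00000000)]

fw.set_virtual_address_map(first_map)
print([hex(p) for p in fw.pointers])

try:
    fw.set_virtual_address_map(second_map)
except RuntimeError as err:
    print("second call refused:", err)
```

In this model the first call overwrites the only copy of each physical
pointer, so a second call would need either the old map or a saved copy of
the physical values to redo the translation.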

> Of course you want to know why SetVirtualAddressMap() was designed like
> that... I didn't participate in the design so I don't know :)
>
> But, as I said, a kernel directly executing another kernel is an
> unexpected idea. IMHO the second kernel in question doesn't fit the UEFI
> phases at all. The OS booted like that (ie. the OS whose kernel is the
> 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS.

That thinking is a bit last century (not that I'm blaming you for it, it
seems to be ingrained in the way UEFI sometimes goes about things) ...
in the old days, DOS was bootstrapped by the 512 byte jump code in a
well known sector. In the current century, almost every OS is
bootstrapped by a sophisticated loader, which is effectively another OS
(if you don't believe this, try looking at the grub source code one
day); it's a short step from this to one OS booting another, and that's
really what kexec is. The utility of kexec has proven itself over the
past couple of decades or so by allowing us to dump (kexec to a dump
kernel), short circuit the boot process (simply re-kexec the kernel on
crash) and now do rebootless upgrades (checkpoint the userspace and
kexec to the new kernel). It's not even unique to Linux: Solaris used a
hidden kexec system call to do live upgrades as well and I believe
several other UNIXs have this feature.

James

2013-08-05 16:12:51

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 05:15:38PM +0200, Laszlo Ersek wrote:
> The current implementation (how pointers are converted) probably doesn't
> accommodate a second call.
>
> Of course you want to know why SetVirtualAddressMap() was designed like
> that... I didn't participate in the design so I don't know :)
>
> But, as I said, a kernel directly executing another kernel is an
> unexpected idea. IMHO the second kernel in question doesn't fit the UEFI
> phases at all. The OS booted like that (ie. the OS whose kernel is the
> 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS.

Yes, the thing is, imposing unnecessary restrictions is very
counterproductive. And kexec is just an example here - if
SetVirtualAddressMap was callable an arbitrary number of times, this
whole work I'm doing is unnecessary. So I'm jumping through hoops just
to accommodate a braindead design.
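
A rough sketch of what "stable addresses" could mean, assuming the runtime
regions are simply laid out top-down from a fixed virtual address in
memory-map order. This is not the (not yet posted) series: VA_TOP and
assign_stable_addresses() are hypothetical names, and the regions are copied
from the memory map in the log. EFI_MEMORY_RUNTIME is the attribute bit from
the UEFI specification, which is why those regions show
attr=0x800000000000000f.

```python
EFI_MEMORY_RUNTIME = 1 << 63  # attribute bit from the UEFI specification
PAGE_SIZE = 0x1000

# Hypothetical top of the virtual window; a real kernel would pick its own.
VA_TOP = 0xFFFFFFFF00000000


def assign_stable_addresses(memmap, va_top=VA_TOP):
    """Map each runtime region top-down from va_top, in memory-map order.

    memmap is a list of (attr, phys_start, phys_end) with an exclusive end.
    Returns a list of (phys_start, size, virt_start).
    """
    layout = []
    va = va_top
    for attr, start, end in memmap:
        if not attr & EFI_MEMORY_RUNTIME:
            continue  # boot services, conventional memory etc. are skipped
        size = end - start
        assert size % PAGE_SIZE == 0, "regions are page-granular"
        va -= size
        layout.append((start, size, va))
    return layout


# mem16, mem17, mem18, mem22, mem23 and mem28 from the log.
LOG_REGIONS = [
    (0xF, 0x7e5a0000, 0x7e668000),
    (0x800000000000000F, 0x7e668000, 0x7e67d000),
    (0x800000000000000F, 0x7e67d000, 0x7e692000),
    (0x800000000000000F, 0x7fb12000, 0x7fb42000),
    (0x800000000000000F, 0x7fb42000, 0x7fb66000),
    (0x800000000000000F, 0x7ffe0000, 0x80000000),
]

for phys, size, virt in assign_stable_addresses(LOG_REGIONS):
    print(f"phys {phys:#x} size {size:#x} -> virt {virt:#x}")
```

Since the result depends only on the firmware memory map and the fixed top
address, a kexec'ed kernel running the same computation would arrive at the
same virtual addresses without calling SetVirtualAddressMap() again.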

This is what I cannot fathom in the face of people praising UEFI as the
solution to all problems. Where in fact it causes more, and needlessly
at that.

> That doesn't matter as long as the UEFI designers aren't aware of it :)

Well, it wouldn't have hurt if they at least looked around what's out
there...

> (Who should have made whom aware, ie. Linux people approaching UEFI
> people, or UEFI people exploring Linux, is a separate topic. As always
> I'm apolitical about UEFI; I'm not arguing for it or against it. My
> feeble efforts for improving OVMF and interfacing code are motivated by
> my employer, not my world view, but as a side-effect of working with the
> code I can't help but notice some nice things in edk2 and appreciate
> them :))

No, I completely understand. I was simply asking whether you've managed
to see an aspect which made sense for SetVirtualAddressMap to be
callable only once and to enlighten me about it because I can't see one
so far.

> Insult my code or my analysis pls.

I won't and I don't need to insult anybody or anything. :)

> BTW there's another point I'd like to ask about -- you're saying you
> see the region corruption during the same boot, from the first (early)
> memmap dump to the second one (when just about to enter virtual mode).
> But, is this one boot the very first boot, or the kexec one?

No, kexec is not even involved yet. If you look at the timestamps,
there's 0.005 seconds between the two dumps during the *same* kernel
booting on the machine, baremetal, straight from grub.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 16:25:43

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

Apologies in advance for my response because it diverges from the
technical stuff.

On 08/05/13 17:34, James Bottomley wrote:
> On Mon, 2013-08-05 at 17:15 +0200, Laszlo Ersek wrote:
>> On 08/05/13 16:40, Borislav Petkov wrote:
>>> On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote:
>>>> I wouldn't call the design of SetVirtualAddressMap() braindead.
>>>
>>> Ok, I've always wondered and you could probably shed some light on the
>>> matter: why is SetVirtualAddressMap() a call-once only? Why can't I
>>> simply call it again and update the mappings?
>>
>> The current implementation (how pointers are converted) probably doesn't
>> accommodate a second call.
>
> Having actually looked at the code (trying to find why we were getting
> an unconverted pointer), I second that.

I have also "actually" looked at the code, just not right now.

> However, the ugliness of the
> massive pointer chase should have been an indication that something was
> not quite right architecturally (or implementation wise) with
> SetVirtualAddressMap().
>
>> Of course you want to know why SetVirtualAddressMap() was designed like
>> that... I didn't participate in the design so I don't know :)
>>
>> But, as I said, a kernel directly executing another kernel is an
>> unexpected idea. IMHO the second kernel in question doesn't fit the UEFI
>> phases at all. The OS booted like that (ie. the OS whose kernel is the
>> 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS.
>
> That thinking is a bit last century (not that I'm blaming you for it, it
> seems to be ingrained in the way UEFI sometimes goes about things) ...

It is not *my* thinking. (This is a recurrent pattern and it drives me mad.)

I just tried to come up with a plausible explanation for why things have
come to be like this. edk2-devel is CC'd; the above was an invitation
for others who *have* participated in the design phase to chime in.

I put on the UEFI hat for the sake of argument.

What many people don't seem to understand (and I have honestly no idea
if this includes you) is that in order to write edk2 code, in order to
post bugfixes with a straight face and with a clear conscience, a person
with a FLOSS / Linux / UNIX background has to *brainwash*
himself/herself into edk2. You must force yourself to identify with the
code to some extent in order to be able to track it, to follow the
original author's train of thought. You can't afford to reject edk2 100%
if you want to effect gradual improvements.

When I got the assignment, this brainwashing I forced upon myself took
me months. The reward for it now is that "my thinking is last century".

Again, I *did not* participate in the design of UEFI. I must simply
accept its main design ideas, sometimes even make (hopefully plausible)
guesses at them, to be able to reason about it.

> in the old days, DOS was bootstrapped by the 512 byte jump code in a
> well known sector. In the current century, almost every OS is
> bootstrapped by a sophisticated loader, which is effectively another OS
> (if you don't believe this, try looking at the grub source code one
> day);

I've debugged grub and written patches for it. I believe it.

> it's a short step from this to one OS booting another, and that's
> really what kexec is. The utility of kexec has proven itself over the
> past couple of decades or so by allowing us to dump (kexec to a dump
> kernel),

I've debugged and tested kdump / kexec several times, mainly during my
RHEL-5 Xen days at RH. There's no need to convince me about it.

> short circuit the boot process (simply re-kexec the kernel on
> crash) and now do rebootless upgrades (checkpoint the userspace and
> kexec to the new kernel). It's not even unique to Linux: Solaris used a
> hidden kexec system call to do live upgrades as well and I believe
> several other UNIXs have this feature.

I was in fact missing this bit about Solaris and other UNIXen.

In any case, by calling kexec "unique/unexpected", I meant that Windows
probably doesn't have it, which was likely the reason the UEFI designers
didn't consider it.

I did not spell out this windows-based argument and went for the polite
route instead because I know where justifying anything with "windows
market dominance / mindshare" leads: more application of the adjective
"braindead".

I'm aware of the love for "management by perkele" on lkml. I oppose it.
(At least on edk2-devel, and I'm not subscribed to lkml partially
because of it.) My OVMF work depends on the goodwill of people on
edk2-devel, and I won't ignite flames that endanger that.

So, my unwashed *guess* is that SetVirtualAddressMap() was designed like
this because Windows doesn't have a need or use for a second
SetVirtualAddressMap() call. (Not sure about non-Linux kernels on Itanium.)

In any case, calling this status quo "last century", "braindead" or
worse won't make you many friends on edk2-devel. The practice of
attacking people whose input you want doesn't look very efficient. And I
certainly do not count myself among those people -- I'm just trying to
help with my limited skills & experience. You want the input of people
*much smarter* than I am. As I gather, many of them don't give a rat's
ass about Linux. (Notice the lack of tumult in this thread?) Don't
start by alienating them.

"UEFI is an abomination, now help me work with it" is an attitude that
lkml people need to dispense with, in their own best interest.
edk2-devel doesn't appear to be the place where people put up with the
abuse just to get their patches into Linux.

(FWIW I'll help with the issue at hand as much as I can. It may not be
much, apologies.)

Cheers,
Laszlo

2013-08-05 16:39:36

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 18:12, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 05:15:38PM +0200, Laszlo Ersek wrote:
>> The current implementation (how pointers are converted) probably doesn't
>> accommodate a second call.
>>
>> Of course you want to know why SetVirtualAddressMap() was designed like
>> that... I didn't participate in the design so I don't know :)
>>
>> But, as I said, a kernel directly executing another kernel is an
>> unexpected idea. IMHO the second kernel in question doesn't fit the UEFI
>> phases at all. The OS booted like that (ie. the OS whose kernel is the
>> 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS.
>
> Yes, the thing is, imposing unnecessary restrictions is very
> counterproductive. And kexec is just an example here - if
> SetVirtualAddressMap was callable an arbitrary number of times, this
> whole work I'm doing is unnecessary. So I'm jumping through hoops just
> to accommodate a braindead design.

I doubt it was a deliberate restriction. More like, there was no
incentive (... that the designers were aware of) *not* to design
something easy (or easier) to implement. Your use case has come later.

> This is what I cannot fathom in the face of people praising UEFI as the
> solution to all problems.

I agree that such people exist. I'm not one of them.

>> BTW there's another point I'd like to ask about -- you're saying you
>> see the region corruption during the same boot, from the first (early)
>> memmap dump to the second one (when just about to enter virtual mode).
>> But, is this one boot the very first boot, or the kexec one?
>
> No, kexec is not even involved yet. If you look at the timestamps,
> there's 0.005 seconds between the two dumps during the *same* kernel
> booting on the machine, baremetal, straight from grub.

I didn't realize the timestamps survive kexec. (As far as I remember the
kernels I played with kexec on didn't have the automatic timestamps yet
in dmesg, but I might have messed up just as well...)

Laszlo

2013-08-05 16:47:53

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote:
> I didn't realize the timestamps survive kexec. (As far as I remember
> the kernels I played with kexec on didn't have the automatic
> timestamps yet in dmesg, but I might have messed up just as well...)

No, no, no, kexec is not involved at all.

Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered
efi_enter_virtual_mode, the region has changed from

[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)

to

[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)


And yes, I still need to audit whether the kernel actually does that
change. I'm still looking...
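
For anyone wanting to compare two such dumps mechanically, a small sketch
follows. The helper names (parse_memmap, changed_regions, empty_regions) are
made up for this example; the regular expression assumes the exact
"efi: memNN: ..." format shown above, and the two sample lines are the mem11
entries quoted above.

```python
import re

# Matches lines such as:
#   efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
EFI_LINE = re.compile(
    r"efi: mem(\d+): type=(\d+), attr=(0x[0-9a-f]+), "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)


def parse_memmap(dmesg_text):
    """Return {index: (type, attr, start, end)} for every efi memNN line."""
    regions = {}
    for m in EFI_LINE.finditer(dmesg_text):
        idx, typ, attr, start, end = m.groups()
        regions[int(idx)] = (int(typ), int(attr, 16), int(start, 16), int(end, 16))
    return regions


def changed_regions(before, after):
    """List (index, old, new) for regions that differ between two dumps."""
    return [
        (idx, before[idx], after[idx])
        for idx in sorted(before.keys() & after.keys())
        if before[idx] != after[idx]
    ]


def empty_regions(regions):
    """Indices of regions whose end does not lie above their start."""
    return [idx for idx, (_, _, start, end) in sorted(regions.items()) if end <= start]


early = "[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)"
late = "[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)"

before, after = parse_memmap(early), parse_memmap(late)
for idx, old, new in changed_regions(before, after):
    print(f"mem{idx:02d}: end {old[3]:#x} -> {new[3]:#x}")
print("zero-size regions:", empty_regions(after))
```

Feeding it the full early and late dumps instead of the two sample lines
would show whether mem11 is the only descriptor that changes.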


early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013
[ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007e667fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007e668000-0x000000007e691fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007e692000-0x000000007fb11fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007fb12000-0x000000007fb69fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007fb6a000-0x000000007fb71fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000007fb72000-0x000000007fb75fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007fb76000-0x000000007ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.31 by EDK II
[ 0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014
[ 0.000000] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.000000] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.000000] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.000000] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.000000] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.000000] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.000000] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.000000] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.000000] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.000000] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.000000] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.000000] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.000000] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.000000] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.000000] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.000000] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.000000] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.000000] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.000000] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.000000] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.000000] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x7ffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000000000 mask FF80000000 write-back
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106
[ 0.000000] efi: efi_reserve_boot_services: start: 0x800000, size: 0x800000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7c000000, size: 0x20000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e0cd000, size: 0x490000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e55d000, size: 0x3f000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e59c000, size: 0x4000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e5a0000, size: 0xc8000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e692000, size: 0x1300000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7f994000, size: 0x17e000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7fb76000, size: 0x46a000
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x036be000, 0x036befff] PGTABLE
[ 0.000000] BRK [0x036bf000, 0x036bffff] PGTABLE
[ 0.000000] BRK [0x036c0000, 0x036c0fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7de00000-0x7dffffff]
[ 0.000000] [mem 0x7de00000-0x7dffffff] page 2M
[ 0.000000] BRK [0x036c1000, 0x036c1fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7c000000-0x7ddfffff]
[ 0.000000] [mem 0x7c000000-0x7ddfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x7bffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x7e000000-0x7e667fff]
[ 0.000000] [mem 0x7e000000-0x7e5fffff] page 2M
[ 0.000000] [mem 0x7e600000-0x7e667fff] page 4k
[ 0.000000] BRK [0x036c2000, 0x036c2fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] [mem 0x7e692000-0x7e7fffff] page 4k
[ 0.000000] [mem 0x7e800000-0x7f9fffff] page 2M
[ 0.000000] [mem 0x7fa00000-0x7fb11fff] page 4k
[ 0.000000] init_memory_mapping: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] [mem 0x7fb76000-0x7fbfffff] page 4k
[ 0.000000] [mem 0x7fc00000-0x7fdfffff] page 2M
[ 0.000000] [mem 0x7fe00000-0x7ffdffff] page 4k
[ 0.000000] log_buf_len: 16777216
[ 0.000000] early log buf free: 2089788(99%)
[ 0.000000] ACPI: RSDP 000000007fb71014 00024 (v02 OVMF )
[ 0.000000] ACPI: XSDT 000000007fb700e8 0003C (v01 OVMF OVMFEDK2 20130221 01000013)
[ 0.000000] ACPI: FACP 000000007fb6f000 000F4 (v03 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: DSDT 000000007fb6d000 00D57 (v01 INTEL OVMF 00000004 INTL 20100528)
[ 0.000000] ACPI: FACS 000000007fb75000 00040
[ 0.000000] ACPI: APIC 000000007fb6e000 00078 (v01 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: SSDT 000000007fb6c000 00057 (v01 REDHAT OVMF 00000001 INTL 20100528)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x7ffdffff]
[ 0.000000] NODE_DATA [mem 0x7e0ca000-0x7e0cbfff]
[ 0.000000] [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007a000000-ffff88007bffffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x7e667fff]
[ 0.000000] node 0: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] node 0: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] On node 0 totalpages: 524017
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 2070 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 8128 pages used for memmap
[ 0.000000] DMA32 zone: 520018 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 000000007e668000 - 000000007e692000
[ 0.000000] PM: Registered nosave memory: 000000007fb12000 - 000000007fb6a000
[ 0.000000] PM: Registered nosave memory: 000000007fb6a000 - 000000007fb72000
[ 0.000000] PM: Registered nosave memory: 000000007fb72000 - 000000007fb76000
[ 0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007ce00000 s84736 r8192 d21760 u2097152
[ 0.000000] pcpu-alloc: s84736 r8192 d21760 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 513755
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 1983624k/2097024k available (5838k kernel code, 956k absent, 112444k reserved, 5274k data, 1080k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[ 0.000000] NR_IRQS:4352 nr_irqs:256 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled, bootconsole disabled
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013
[ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007e667fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007e668000-0x000000007e691fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007e692000-0x000000007fb11fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007fb12000-0x000000007fb69fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007fb6a000-0x000000007fb71fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000007fb72000-0x000000007fb75fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007fb76000-0x000000007ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.31 by EDK II
[ 0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014
[ 0.000000] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.000000] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.000000] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.000000] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.000000] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.000000] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.000000] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.000000] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.000000] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.000000] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.000000] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.000000] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.000000] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.000000] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.000000] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.000000] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.000000] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.000000] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.000000] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.000000] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.000000] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x7ffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000000000 mask FF80000000 write-back
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106
[ 0.000000] efi: efi_reserve_boot_services: start: 0x800000, size: 0x800000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7c000000, size: 0x20000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e0cd000, size: 0x490000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e55d000, size: 0x3f000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e59c000, size: 0x4000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e5a0000, size: 0xc8000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e692000, size: 0x1300000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7f994000, size: 0x17e000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7fb76000, size: 0x46a000
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x036be000, 0x036befff] PGTABLE
[ 0.000000] BRK [0x036bf000, 0x036bffff] PGTABLE
[ 0.000000] BRK [0x036c0000, 0x036c0fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7de00000-0x7dffffff]
[ 0.000000] [mem 0x7de00000-0x7dffffff] page 2M
[ 0.000000] BRK [0x036c1000, 0x036c1fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7c000000-0x7ddfffff]
[ 0.000000] [mem 0x7c000000-0x7ddfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x7bffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x7e000000-0x7e667fff]
[ 0.000000] [mem 0x7e000000-0x7e5fffff] page 2M
[ 0.000000] [mem 0x7e600000-0x7e667fff] page 4k
[ 0.000000] BRK [0x036c2000, 0x036c2fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] [mem 0x7e692000-0x7e7fffff] page 4k
[ 0.000000] [mem 0x7e800000-0x7f9fffff] page 2M
[ 0.000000] [mem 0x7fa00000-0x7fb11fff] page 4k
[ 0.000000] init_memory_mapping: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] [mem 0x7fb76000-0x7fbfffff] page 4k
[ 0.000000] [mem 0x7fc00000-0x7fdfffff] page 2M
[ 0.000000] [mem 0x7fe00000-0x7ffdffff] page 4k
[ 0.000000] log_buf_len: 16777216
[ 0.000000] early log buf free: 2089788(99%)
[ 0.000000] ACPI: RSDP 000000007fb71014 00024 (v02 OVMF )
[ 0.000000] ACPI: XSDT 000000007fb700e8 0003C (v01 OVMF OVMFEDK2 20130221 01000013)
[ 0.000000] ACPI: FACP 000000007fb6f000 000F4 (v03 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: DSDT 000000007fb6d000 00D57 (v01 INTEL OVMF 00000004 INTL 20100528)
[ 0.000000] ACPI: FACS 000000007fb75000 00040
[ 0.000000] ACPI: APIC 000000007fb6e000 00078 (v01 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: SSDT 000000007fb6c000 00057 (v01 REDHAT OVMF 00000001 INTL 20100528)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x7ffdffff]
[ 0.000000] NODE_DATA [mem 0x7e0ca000-0x7e0cbfff]
[ 0.000000] [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007a000000-ffff88007bffffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x7e667fff]
[ 0.000000] node 0: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] node 0: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] On node 0 totalpages: 524017
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 2070 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 8128 pages used for memmap
[ 0.000000] DMA32 zone: 520018 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 000000007e668000 - 000000007e692000
[ 0.000000] PM: Registered nosave memory: 000000007fb12000 - 000000007fb6a000
[ 0.000000] PM: Registered nosave memory: 000000007fb6a000 - 000000007fb72000
[ 0.000000] PM: Registered nosave memory: 000000007fb72000 - 000000007fb76000
[ 0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007ce00000 s84736 r8192 d21760 u2097152
[ 0.000000] pcpu-alloc: s84736 r8192 d21760 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 513755
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 1983624k/2097024k available (5838k kernel code, 956k absent, 112444k reserved, 5274k data, 1080k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[ 0.000000] NR_IRQS:4352 nr_irqs:256 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled, bootconsole disabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.000000] ... CHAINHASH_SIZE: 16384
[ 0.000000] memory used by lock dependency info: 5855 kB
[ 0.000000] per task-struct memory footprint: 1920 bytes
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2893.477 MHz processor
[ 0.004001] Calibrating delay loop (skipped), value calculated using timer frequency.. 5786.95 BogoMIPS (lpj=2893477)
[ 0.006004] pid_max: default: 32768 minimum: 301
[ 0.006776] efi: efi_enter_virtual_mode: enter
[ 0.007004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.009004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.010004] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.011004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.013004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.014004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.016004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.017004] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.019004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.020004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.021004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.024004] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.026004] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.027004] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.028004] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.030004] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.031004] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.033004] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.035004] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.036004] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.037004] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.039004] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.040004] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.042004] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.043004] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.045004] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.046004] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.048004] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 16:50:21

by Andrew Fish

[permalink] [raw]
Subject: Re: [edk2] Corrupted EFI region


On Aug 5, 2013, at 7:40 AM, Borislav Petkov <[email protected]> wrote:

> On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote:
>> I wouldn't call the design of SetVirtualAddressMap() braindead.
>
> Ok, I've always wondered and you could probably shed some light on the
> matter: why is SetVirtualAddressMap() a call-once only? Why can't I
> simply call it again and update the mappings?
>
>> I'd rather call kexec unique and somewhat unexpected :)
>
> In all fairness, it was there before UEFI, AFAICT.
>

AFAICT EFI predates the kexec merge into mainline by a number of years, as SetVirtualAddressMap() was part of EFI 1.0 (previous millennium).

The EFI to UEFI conversion placed EFI 1.10 into an industry standard, UEFI 2.0. UEFI is an industry standard, so someone just needs to make a proposal to update the spec. The edk2 open source project is not part of the standards body, so complaining on this mailing list is not going to get anything changed.

The conversion of C code to run from address A to address B is a nontrivial operation, and a single conversion is bad enough. The infrastructure code required to do the conversion from physical to virtual addressing currently only runs from physical mode, so a call to change virtual address mappings from virtual mode is more complex than the current scheme.
In general you don't want complexity in the locked NOR flash of the platform that can only be updated by the platform vendor. Even if the platform firmware is easy to update, you want to have the complexity in the OS, as it is easier to change and easier to get right.
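
To make the one-shot nature of the conversion a bit more concrete, here is a minimal sketch in Python, not the actual firmware code. It assumes each map entry carries a physical start, a virtual start and a page count (as an EFI memory descriptor does) and that EFI pages are 4 KiB. The names Region and convert_pointer and the virtual base address are hypothetical, picked for illustration only.

```python
from dataclasses import dataclass

EFI_PAGE_SIZE = 4096  # assumption: 4 KiB EFI pages


@dataclass
class Region:
    phys_start: int
    virt_start: int
    num_pages: int


def convert_pointer(regions, addr):
    """Translate a physical address to its virtual alias, or None if no region covers it."""
    for r in regions:
        size = r.num_pages * EFI_PAGE_SIZE
        if r.phys_start <= addr < r.phys_start + size:
            return r.virt_start + (addr - r.phys_start)
    return None


# mem17 from the dmesg in this thread: [0x7e668000-0x7e67d000), i.e. 21 pages.
# The virtual base is a made-up value for the example.
VIRT_BASE = 0xFFFF88007E668000
regions = [Region(phys_start=0x7E668000, virt_start=VIRT_BASE, num_pages=21)]

once = convert_pointer(regions, 0x7E668010)
twice = convert_pointer(regions, once)
print(hex(once))
print(twice)
```

The lookup is keyed on the physical range, so feeding it an already converted pointer finds no match. A second remap would need the old virtual ranges as well, which is the extra complexity mentioned above. A region with num_pages == 0 never matches either.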

Thanks,

Andrew Fish

>>> I wouldn't wonder if we f*cked it up again like the last time. I'll give
>>> it a long hard look.
>>
>> Ah sorry, by "and you guys suspect" I didn't mean to imply anything
>> between the lines, I was simply trying to ascertain your working idea :)
>
> As long as we get to the bottom of this, we're all fine. And I'd
> pretty much expect everyone who is dealing with EFI to have grown a
> sufficiently thick skin before starting to do so, so don't worry.
>
> :-)
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-08-05 17:01:01

by Kinney, Michael D

[permalink] [raw]
Subject: RE: [edk2] Corrupted EFI region

Boris,

A memory map entry with zero size does not look right to me.

The memory map passed into SetVirtualAddressMap() must contain the exact same set of memory map entries that existed when ExitBootServices() was called with a return result of EFI_SUCCESS.

When you are showing comparisons of memory maps, are you showing the ExitBootServices() one and the SetVirtualAddressMap() one? If the memory maps are not identical, then somehow the memory map is being modified, and we need to figure that out.

If the ExitBootServices() memory map has the zero-sized entry, then we need to see how GetMemoryMap() is returning a zero-sized entry. It is not clear that a zero-sized entry would actually break anything, but it is a good idea to root cause that issue and make sure those types of memory map entries are not passed from the FW to the OS.
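
One way to run both checks mechanically on two captured logs is sketched below. This is only an illustration: it assumes the "efi: memNN: ..." line format printed by the kernel in this thread, and the helper names (parse_memmap, zero_sized, changed) are made up for the example.

```python
import re

# Matches the "efi: memNN: type=..., attr=..., range=[start-end)" lines from dmesg.
EFI_LINE = re.compile(
    r"efi: mem(\d+): type=(\d+), attr=(0x[0-9a-f]+), "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)


def parse_memmap(log_text):
    """Return {index: (type, attr, start, end)} for every efi: memNN line."""
    entries = {}
    for m in EFI_LINE.finditer(log_text):
        idx, typ, attr, start, end = m.groups()
        entries[int(idx)] = (int(typ), int(attr, 16), int(start, 16), int(end, 16))
    return entries


def zero_sized(entries):
    """Indices of entries whose end address equals their start address."""
    return sorted(i for i, (_, _, s, e) in entries.items() if s == e)


def changed(before, after):
    """Indices of entries that differ between the two maps (or exist in only one)."""
    return sorted(i for i in before.keys() | after.keys() if before.get(i) != after.get(i))


before = parse_memmap("""
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
""")
after = parse_memmap("""
[ 0.021004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.024004] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
""")

print(zero_sized(after))       # [11]
print(changed(before, after))  # [11]
```

Note that the sketch keys entries by index, so a log that contains the same dump twice keeps only the last copy of each entry.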

Thanks,

Mike


-----Original Message-----
From: Borislav Petkov [mailto:[email protected]]
Sent: Monday, August 05, 2013 9:48 AM
To: Laszlo Ersek
Cc: [email protected]; Gleb Natapov; [email protected]; lkml; David Woodhouse
Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote:
> I didn't realize the timestamps survive kexec. (As far as I remember
> the kernels I played with kexec on didn't have the automatic
> timestamps yet in dmesg, but I might have messed up just as well...)

No, no, no, kexec is not involved at all.

Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered
efi_enter_virtual_mode, the region has changed from

[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)

to

[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)


And yes, I still need to audit whether the kernel actually does that
change. I'm still looking...
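
For reference, the arithmetic on the two mem11 lines above can be sketched as follows. It assumes 4 KiB EFI pages and that the MB figure in the log is rounded down; the describe() helper is made up for the example.

```python
EFI_PAGE_SHIFT = 12  # assumption: 4 KiB EFI pages


def describe(start, end):
    """Return (size in bytes, size in pages, size in MB rounded down) for [start, end)."""
    size = end - start
    return size, size >> EFI_PAGE_SHIFT, size >> 20


before = describe(0x7E0AD000, 0x7E0CC000)
after = describe(0x7E0AD000, 0x7E0AD000)
print(before)  # (126976, 31, 0)
print(after)   # (0, 0, 0)
```

So the region shrank from 31 pages (124 KiB) to zero pages. Both cases show up as (0MB) in the log, which is why only the end address gives the change away.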


early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013
[ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007e667fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007e668000-0x000000007e691fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007e692000-0x000000007fb11fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007fb12000-0x000000007fb69fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007fb6a000-0x000000007fb71fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000007fb72000-0x000000007fb75fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007fb76000-0x000000007ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.31 by EDK II
[ 0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014
[ 0.000000] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.000000] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.000000] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.000000] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.000000] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.000000] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.000000] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.000000] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.000000] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.000000] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.000000] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.000000] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.000000] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.000000] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.000000] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.000000] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.000000] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.000000] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.000000] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.000000] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.000000] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x7ffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000000000 mask FF80000000 write-back
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106
[ 0.000000] efi: efi_reserve_boot_services: start: 0x800000, size: 0x800000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7c000000, size: 0x20000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e0cd000, size: 0x490000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e55d000, size: 0x3f000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e59c000, size: 0x4000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e5a0000, size: 0xc8000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e692000, size: 0x1300000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7f994000, size: 0x17e000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7fb76000, size: 0x46a000
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x036be000, 0x036befff] PGTABLE
[ 0.000000] BRK [0x036bf000, 0x036bffff] PGTABLE
[ 0.000000] BRK [0x036c0000, 0x036c0fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7de00000-0x7dffffff]
[ 0.000000] [mem 0x7de00000-0x7dffffff] page 2M
[ 0.000000] BRK [0x036c1000, 0x036c1fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7c000000-0x7ddfffff]
[ 0.000000] [mem 0x7c000000-0x7ddfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x7bffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x7e000000-0x7e667fff]
[ 0.000000] [mem 0x7e000000-0x7e5fffff] page 2M
[ 0.000000] [mem 0x7e600000-0x7e667fff] page 4k
[ 0.000000] BRK [0x036c2000, 0x036c2fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] [mem 0x7e692000-0x7e7fffff] page 4k
[ 0.000000] [mem 0x7e800000-0x7f9fffff] page 2M
[ 0.000000] [mem 0x7fa00000-0x7fb11fff] page 4k
[ 0.000000] init_memory_mapping: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] [mem 0x7fb76000-0x7fbfffff] page 4k
[ 0.000000] [mem 0x7fc00000-0x7fdfffff] page 2M
[ 0.000000] [mem 0x7fe00000-0x7ffdffff] page 4k
[ 0.000000] log_buf_len: 16777216
[ 0.000000] early log buf free: 2089788(99%)
[ 0.000000] ACPI: RSDP 000000007fb71014 00024 (v02 OVMF )
[ 0.000000] ACPI: XSDT 000000007fb700e8 0003C (v01 OVMF OVMFEDK2 20130221 01000013)
[ 0.000000] ACPI: FACP 000000007fb6f000 000F4 (v03 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: DSDT 000000007fb6d000 00D57 (v01 INTEL OVMF 00000004 INTL 20100528)
[ 0.000000] ACPI: FACS 000000007fb75000 00040
[ 0.000000] ACPI: APIC 000000007fb6e000 00078 (v01 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: SSDT 000000007fb6c000 00057 (v01 REDHAT OVMF 00000001 INTL 20100528)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x7ffdffff]
[ 0.000000] NODE_DATA [mem 0x7e0ca000-0x7e0cbfff]
[ 0.000000] [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007a000000-ffff88007bffffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x7e667fff]
[ 0.000000] node 0: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] node 0: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] On node 0 totalpages: 524017
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 2070 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 8128 pages used for memmap
[ 0.000000] DMA32 zone: 520018 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 000000007e668000 - 000000007e692000
[ 0.000000] PM: Registered nosave memory: 000000007fb12000 - 000000007fb6a000
[ 0.000000] PM: Registered nosave memory: 000000007fb6a000 - 000000007fb72000
[ 0.000000] PM: Registered nosave memory: 000000007fb72000 - 000000007fb76000
[ 0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007ce00000 s84736 r8192 d21760 u2097152
[ 0.000000] pcpu-alloc: s84736 r8192 d21760 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 513755
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 1983624k/2097024k available (5838k kernel code, 956k absent, 112444k reserved, 5274k data, 1080k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[ 0.000000] NR_IRQS:4352 nr_irqs:256 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled, bootconsole disabled
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013
[ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007e667fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007e668000-0x000000007e691fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007e692000-0x000000007fb11fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007fb12000-0x000000007fb69fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007fb6a000-0x000000007fb71fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000007fb72000-0x000000007fb75fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007fb76000-0x000000007ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.31 by EDK II
[ 0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014
[ 0.000000] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.000000] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.000000] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.000000] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.000000] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.000000] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.000000] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.000000] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.000000] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.000000] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.000000] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.000000] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.000000] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.000000] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.000000] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.000000] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.000000] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.000000] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.000000] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.000000] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.000000] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.000000] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x7ffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000000000 mask FF80000000 write-back
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106
[ 0.000000] efi: efi_reserve_boot_services: start: 0x800000, size: 0x800000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7c000000, size: 0x20000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e0cd000, size: 0x490000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e55d000, size: 0x3f000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e59c000, size: 0x4000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e5a0000, size: 0xc8000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7e692000, size: 0x1300000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7f994000, size: 0x17e000
[ 0.000000] efi: efi_reserve_boot_services: start: 0x7fb76000, size: 0x46a000
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x036be000, 0x036befff] PGTABLE
[ 0.000000] BRK [0x036bf000, 0x036bffff] PGTABLE
[ 0.000000] BRK [0x036c0000, 0x036c0fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7de00000-0x7dffffff]
[ 0.000000] [mem 0x7de00000-0x7dffffff] page 2M
[ 0.000000] BRK [0x036c1000, 0x036c1fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7c000000-0x7ddfffff]
[ 0.000000] [mem 0x7c000000-0x7ddfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x7bffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x7e000000-0x7e667fff]
[ 0.000000] [mem 0x7e000000-0x7e5fffff] page 2M
[ 0.000000] [mem 0x7e600000-0x7e667fff] page 4k
[ 0.000000] BRK [0x036c2000, 0x036c2fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] [mem 0x7e692000-0x7e7fffff] page 4k
[ 0.000000] [mem 0x7e800000-0x7f9fffff] page 2M
[ 0.000000] [mem 0x7fa00000-0x7fb11fff] page 4k
[ 0.000000] init_memory_mapping: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] [mem 0x7fb76000-0x7fbfffff] page 4k
[ 0.000000] [mem 0x7fc00000-0x7fdfffff] page 2M
[ 0.000000] [mem 0x7fe00000-0x7ffdffff] page 4k
[ 0.000000] log_buf_len: 16777216
[ 0.000000] early log buf free: 2089788(99%)
[ 0.000000] ACPI: RSDP 000000007fb71014 00024 (v02 OVMF )
[ 0.000000] ACPI: XSDT 000000007fb700e8 0003C (v01 OVMF OVMFEDK2 20130221 01000013)
[ 0.000000] ACPI: FACP 000000007fb6f000 000F4 (v03 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: DSDT 000000007fb6d000 00D57 (v01 INTEL OVMF 00000004 INTL 20100528)
[ 0.000000] ACPI: FACS 000000007fb75000 00040
[ 0.000000] ACPI: APIC 000000007fb6e000 00078 (v01 OVMF OVMFEDK2 20130221 OVMF 00000099)
[ 0.000000] ACPI: SSDT 000000007fb6c000 00057 (v01 REDHAT OVMF 00000001 INTL 20100528)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x7ffdffff]
[ 0.000000] NODE_DATA [mem 0x7e0ca000-0x7e0cbfff]
[ 0.000000] [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007a000000-ffff88007bffffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x7e667fff]
[ 0.000000] node 0: [mem 0x7e692000-0x7fb11fff]
[ 0.000000] node 0: [mem 0x7fb76000-0x7ffdffff]
[ 0.000000] On node 0 totalpages: 524017
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 2070 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 8128 pages used for memmap
[ 0.000000] DMA32 zone: 520018 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ5 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] ACPI: IRQ10 used by override.
[ 0.000000] ACPI: IRQ11 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 000000007e668000 - 000000007e692000
[ 0.000000] PM: Registered nosave memory: 000000007fb12000 - 000000007fb6a000
[ 0.000000] PM: Registered nosave memory: 000000007fb6a000 - 000000007fb72000
[ 0.000000] PM: Registered nosave memory: 000000007fb72000 - 000000007fb76000
[ 0.000000] e820: [mem 0x80000000-0xffffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007ce00000 s84736 r8192 d21760 u2097152
[ 0.000000] pcpu-alloc: s84736 r8192 d21760 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 513755
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 1983624k/2097024k available (5838k kernel code, 956k absent, 112444k reserved, 5274k data, 1080k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[ 0.000000] NR_IRQS:4352 nr_irqs:256 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled, bootconsole disabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.000000] ... CHAINHASH_SIZE: 16384
[ 0.000000] memory used by lock dependency info: 5855 kB
[ 0.000000] per task-struct memory footprint: 1920 bytes
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2893.477 MHz processor
[ 0.004001] Calibrating delay loop (skipped), value calculated using timer frequency.. 5786.95 BogoMIPS (lpj=2893477)
[ 0.006004] pid_max: default: 32768 minimum: 301
[ 0.006776] efi: efi_enter_virtual_mode: enter
[ 0.007004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
[ 0.009004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
[ 0.010004] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
[ 0.011004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
[ 0.013004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
[ 0.014004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e3000) (22MB)
[ 0.016004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e3000-0x000000003fffb000) (969MB)
[ 0.017004] efi: mem07: type=2, attr=0xf, range=[0x000000003fffb000-0x0000000040000000) (0MB)
[ 0.019004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
[ 0.020004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
[ 0.021004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007e0ad000) (32MB)
[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.024004] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
[ 0.026004] efi: mem13: type=4, attr=0xf, range=[0x000000007e0cd000-0x000000007e55d000) (4MB)
[ 0.027004] efi: mem14: type=3, attr=0xf, range=[0x000000007e55d000-0x000000007e59c000) (0MB)
[ 0.028004] efi: mem15: type=4, attr=0xf, range=[0x000000007e59c000-0x000000007e5a0000) (0MB)
[ 0.030004] efi: mem16: type=3, attr=0xf, range=[0x000000007e5a0000-0x000000007e668000) (0MB)
[ 0.031004] efi: mem17: type=5, attr=0x800000000000000f, range=[0x000000007e668000-0x000000007e67d000) (0MB)
[ 0.033004] efi: mem18: type=6, attr=0x800000000000000f, range=[0x000000007e67d000-0x000000007e692000) (0MB)
[ 0.035004] efi: mem19: type=4, attr=0xf, range=[0x000000007e692000-0x000000007f992000) (19MB)
[ 0.036004] efi: mem20: type=7, attr=0xf, range=[0x000000007f992000-0x000000007f994000) (0MB)
[ 0.037004] efi: mem21: type=3, attr=0xf, range=[0x000000007f994000-0x000000007fb12000) (1MB)
[ 0.039004] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007fb12000-0x000000007fb42000) (0MB)
[ 0.040004] efi: mem23: type=6, attr=0x800000000000000f, range=[0x000000007fb42000-0x000000007fb66000) (0MB)
[ 0.042004] efi: mem24: type=0, attr=0xf, range=[0x000000007fb66000-0x000000007fb6a000) (0MB)
[ 0.043004] efi: mem25: type=9, attr=0xf, range=[0x000000007fb6a000-0x000000007fb72000) (0MB)
[ 0.045004] efi: mem26: type=10, attr=0xf, range=[0x000000007fb72000-0x000000007fb76000) (0MB)
[ 0.046004] efi: mem27: type=4, attr=0xf, range=[0x000000007fb76000-0x000000007ffe0000) (4MB)
[ 0.048004] efi: mem28: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)
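
The two dumps can also be compared mechanically rather than by eye. Below is a minimal Python sketch, not anything from the kernel tree: the regular expression and the helper names (parse_dump, changed_regions) are made up for this illustration, the type names follow the UEFI memory type numbering, and only two lines of each dump are pasted in to keep it short.

```python
import re

# Names for the EFI memory types that show up in the dumps (UEFI numbering).
EFI_TYPES = {
    0: "Reserved",
    2: "LoaderData",
    3: "BootServicesCode",
    4: "BootServicesData",
    5: "RuntimeServicesCode",
    6: "RuntimeServicesData",
    7: "ConventionalMemory",
    9: "ACPIReclaimMemory",
    10: "ACPIMemoryNVS",
}

# Matches the "efi: memNN: type=..., attr=..., range=[a-b)" lines of the dump.
LINE = re.compile(
    r"efi: (mem\d+): type=(\d+), attr=(0x[0-9a-f]+), "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)

def parse_dump(text):
    """Return {name: (type, start, end)} for every 'efi: memNN' line."""
    regions = {}
    for m in LINE.finditer(text):
        name, mtype, _attr, start, end = m.groups()
        regions[name] = (int(mtype), int(start, 16), int(end, 16))
    return regions

def changed_regions(before, after):
    """List (name, old, new) for regions whose type or range differs."""
    return [(n, before[n], after[n])
            for n in sorted(before) if n in after and before[n] != after[n]]

before = parse_dump("""
[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
[ 0.000000] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
""")
after = parse_dump("""
[ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
[ 0.024004] efi: mem12: type=7, attr=0xf, range=[0x000000007e0cc000-0x000000007e0cd000) (0MB)
""")

for name, old, new in changed_regions(before, after):
    print(f"{name} ({EFI_TYPES.get(old[0], old[0])}): "
          f"[{old[1]:#x}-{old[2]:#x}) -> [{new[1]:#x}-{new[2]:#x})")
```

Feeding it the full text of both dumps instead of the two-line excerpts should flag mem11 as the only entry whose range differs.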

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


2013-08-05 17:08:09

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 18:47, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote:
>> I didn't realize the timestamps survive kexec. (As far as I remember
>> the kernels I played with kexec on didn't have the automatic
>> timestamps yet in dmesg, but I might have messed up just as well...)
>
> No, no, no, kexec is not involved at all.

I understand. I just explained why I could not derive that fact from the
timestamps. You said,

> No, kexec is not even involved yet. If you look at the timestamps,
> there's 0.005 seconds between the two dumps during the *same* kernel
> booting on the machine, baremetal, straight from grub.

There are four memmap dumps:

(1) first boot, initial dump,
(2) first boot, dump when entering virtual mode,
(3) kexec boot, initial dump,
(4) kexec boot, dump when entering virtual mode.

I was aware that we were discussing a problem either between (1) and
(2), *or* between (3) and (4); I just didn't know inside "which pair".

I misunderstood your reply and thought that you were implying the
(1)+(2) pair by the low absolute timestamps. I assumed that (3)+(4)
would print low timestamps as well (due to the time offset starting from
zero in the kexec kernel too) and took your message as a correction to
that idea. But, you didn't say anything about the magnitude of the
timestamps, only about the differences between them.

Sorry for the noise, it's clear now that we're looking at (1)->(2).

Thanks
Laszlo

2013-08-05 18:12:28

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 08:50:17AM -0700, Andrew Fish wrote:
> AFAICT EFI pre-dates kexec merge into mainline by a number of years as
> SetVirtualAddressMap() was part of EFI 1.0 (previous millennium)

Ok, fair enough.

> The EFI to UEFI conversion was placing EFI 1.10 into an industry
> standard, UEFI 2.0. UEFI is an industry standard so someone just
> needs to make a proposal to update the spec. The edk2 open source
> project is not part of the standards body so complaining on this
> mailing list is not going to get anything changed.

Right, I don't think that even changing the spec would help - it would
actually make things worse because then we'd have to differentiate
between UEFI versions: those which can do SetVirtualAddressMap() more
than once and the older ones.

So let's drop the discussion here - it is what it is, it is too late to
change anything. At least we talked about it. :-)

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 21:25:07

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 18:47, Borislav Petkov wrote:

> Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered
> efi_enter_virtual_mode, the region has changed from
>
> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
>
> to
>
> [ 0.023004] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0ad000) (0MB)
>
>
> And yes, I still need to audit whether the kernel actually does that
> change. I'm still looking...

The following is a long shot, but I have no better idea for now.

Normally the following relevant sequence of calls is made to UEFI services:
(a) GetMemoryMap() --> returns memory map and map key,
(b) ExitBootServices() <-- takes map key
(c) SetVirtualAddressMap() <-- takes memory map (completed with virtual
addresses)

((a)+(b) can be repeated if (b) fails, and Linux seems to retry once.)

Now see Linux commit


<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=916f676f>

by Matthew. If I understand correctly, it introduces the function
efi_reserve_boot_services(). Normally, immediately after a successful
(b) -- ExitBootServices() -- one should be allowed to free boot services
code and data. However (c) itself -- SetVirtualAddressMap() -- seems to
depend on boot services code and data in some firmware implementations
(probably violating the spec). Therefore this commit keeps boot services
code and data around long enough for SetVirtualAddressMap(), and
releases them after.

I *think* efi_reserve_boot_services() runs between (b) and (c), that is,
after the initial EFI memmap dump, and before efi_enter_virtual_mode()
does its thing (ie. before your debug memmap dump is executed there):

  efi_main() [arch/x86/boot/compressed/eboot.c]
    exit_boot()
      --> covers (a) and (b)

  start_kernel() [init/main.c]
    setup_arch() [arch/x86/kernel/setup.c]
      efi_memblock_x86_reserve_range() [arch/x86/platform/efi/efi.c]
      efi_reserve_boot_services() [arch/x86/platform/efi/efi.c]
    efi_enter_virtual_mode() [arch/x86/platform/efi/efi.c]
      --> covers (c)

That is, efi_reserve_boot_services() is called in a place where it can
potentially alter the EFI memmap between the two dumps.

(I only display efi_memblock_x86_reserve_range() in the callstack above
for completeness; I'll refer back to it lower down.)

Now look at Linux commit


<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7d68dc3f>

This commit changes efi_reserve_boot_services() -- it restricts the
function to reserve the boot services code & data only under some
circumstances. If those don't hold, then:

md->num_pages = 0;

Which I think is exactly the source of the region being truncated to
zero size.

("memmap.phys_map" is set to the EFI memory map in
efi_memblock_x86_reserve_range(), see the above partial callstack, and
"memmap.map" is pointed at "memmap.phys_map" in efi_memmap_init().
efi_reserve_boot_services() iterates over "memmap.map", so we can say it
modifies the EFI memory map.)
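
To see why a zeroed page count shows up in the dump as a range whose end equals its start, here is a small Python model. It is only a sketch: MemDesc is a hypothetical stand-in holding just the descriptor fields that matter here, and the way the range end and the "(NMB)" column are derived from num_pages is an assumption that happens to match the values in the log.

```python
from dataclasses import dataclass

EFI_PAGE_SHIFT = 12  # EFI pages are 4 KiB

@dataclass
class MemDesc:
    """Hypothetical stand-in for the descriptor fields used in this sketch."""
    mem_type: int
    phys_addr: int
    num_pages: int

    def range(self):
        """Half-open [start, end) range, as printed in the dump."""
        return self.phys_addr, self.phys_addr + (self.num_pages << EFI_PAGE_SHIFT)

    def size_mb(self):
        """Size in whole MB, rounded down (assumed to match the '(NMB)' column)."""
        return self.num_pages >> (20 - EFI_PAGE_SHIFT)

# Page counts derived from the ranges in the initial dump.
mem11 = MemDesc(mem_type=4, phys_addr=0x7E0AD000, num_pages=0x1F)
mem13 = MemDesc(mem_type=4, phys_addr=0x7E0CD000, num_pages=0x490)

print([hex(x) for x in mem11.range()], mem11.size_mb())
print([hex(x) for x in mem13.range()], mem13.size_mb())

# Model of the "could not reserve" path: only the page count is cleared,
# the start address stays where it was.
mem11.num_pages = 0
print([hex(x) for x in mem11.range()], mem11.size_mb())
```

The start address survives and the end collapses onto it, which is the shape of the mem11 line in the second dump.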

Granted, memblock_dbg() is called too if num_pages is reset, and the
message it prints is not included in your dmesg. However I think that
could be explained by memblock_debug==0 [include/linux/memblock.h].

What happens if you pass "memblock=debug" on the kernel command line
(see early_memblock() in "mm/memblock.c")?

(I just tried it in my Fedora 19 guest, and it in fact produced the message

[ 0.000000] efi: Could not reserve boot range [0x0000800000-0x0000ffffff]

)


BTW, regarding Michael's answer, I think this is just one of several
ways in which Linux manipulates the EFI memmap between (b) and (c). For
example it seems to merge ranges in the map.

Thanks,
Laszlo

2013-08-05 21:38:18

by H. Peter Anvin

Subject: Re: [edk2] Corrupted EFI region

On 08/05/2013 11:12 AM, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 08:50:17AM -0700, Andrew Fish wrote:
>> AFAICT EFI pre-dates kexec merge into mainline by a number of years as
>> SetVirtualAddressMap() was part of EFI 1.0 (previous millennium)
>
> Ok, fair enough.
>
>> The EFI to UEFI conversion was placing EFI 1.10 into an industry
>> standard, UEFI 2.0. UEFI is an industry standard so someone just
>> needs to make a proposal to update the spec. The edk2 open source
>> project is not part of the standards body so complaining on this
>> mailing list is not going to get anything changed.
>
> Right, I don't think that even changing the spec would help - it would
> actually make things worse because then we'd have to differentiate
> between UEFI versions: those which can do SetVirtualAddressMap() more
> than once and the older ones.
>
> So let's drop the discussion here - it is what it is, it is too late to
> change anything. At least we talked about it. :-)
>

All of this would be a non-problem if there weren't buggy
implementations which can't run *without* SetVirtualAddressMap().

-=hpa

2013-08-05 21:41:44

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote:
> All of this would be a non-problem if there weren't buggy
> implementations which can't run *without* SetVirtualAddressMap().

Oh, you mean, if we were to call the runtime services through their
physical addresses?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 21:52:01

by H. Peter Anvin

Subject: Re: [edk2] Corrupted EFI region

On 08/05/2013 02:41 PM, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote:
>> All of this would be a non-problem if there weren't buggy
>> implementations which can't run *without* SetVirtualAddressMap().
>
> Oh, you mean, if we were to call the runtime services through their
> physical addresses?
>

Yes. It is supposed to work, but at least on some Apple machines it
triggers bugs.

-hpa

2013-08-05 21:54:22

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/05/13 23:41, Borislav Petkov wrote:
> On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote:
>> All of this would be a non-problem if there weren't buggy
>> implementations which can't run *without* SetVirtualAddressMap().
>
> Oh, you mean, if we were to call the runtime services through their
> physical addresses?

I heard that there was a (U)EFI firmware implementation that didn't even
implement SetVirtualAddressMap(). It was okay because the main OS for
that platform didn't want to call it, it thunked to physical mode for
each runtime service call.

(This is not hearsay; I'm omitting the specifics because I'm not sure if
I'm allowed to give any. I've heard about this stuff from a direct
colleague who used to work on these systems.)

Laszlo

2013-08-05 22:08:13

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Mon, Aug 05, 2013 at 11:26:46PM +0200, Laszlo Ersek wrote:
> What happens if you pass "memblock=debug" on the kernel command line
> (see early_memblock() in "mm/memblock.c")?
>
> (I just tried it in my Fedora 19 guest, and it in fact produced the message
>
> [ 0.000000] efi: Could not reserve boot range [0x0000800000-0x0000ffffff]

Note to self: Always look for bugs in Linux' UEFI code first, before
going anywhere else!

Yes, very good analysis and good job Laszlo!

I'll write what I see now but will doublecheck it tomorrow because I'm
almost half asleep.

[ 0.000000] efi: efi_reserve_boot_services: -> start: 0x7e0ad000, size: 0x1f000
[ 0.000000] efi: Could not reserve boot range [0x007e0ad000-0x007e0cbfff]

And yes, this fails because memblock_is_region_reserved(start, size)
returns true.

And why is that:

[ 0.000000] memblock_reserve: [0x000000036be000-0x000000036c3000] setup_arch+0x60e/0xa63
[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x7fef1000 reserved size = 0x1724570
[ 0.000000] memory.cnt = 0x4
[ 0.000000] memory[0x0] [0x00000000001000-0x0000000009ffff], 0x9f000 bytes
[ 0.000000] memory[0x1] [0x00000000100000-0x0000007e667fff], 0x7e568000 bytes
[ 0.000000] memory[0x2] [0x0000007e692000-0x0000007fb11fff], 0x1480000 bytes
[ 0.000000] memory[0x3] [0x0000007fb76000-0x0000007ffdffff], 0x46a000 bytes
[ 0.000000] reserved.cnt = 0x3
[ 0.000000] reserved[0x0] [0x0000000009f000-0x000000000fffff], 0x61000 bytes
[ 0.000000] reserved[0x1] [0x00000002000000-0x000000036c2fff], 0x16c3000 bytes
[ 0.000000] reserved[0x2] [0x0000007e0ad018-0x0000007e0ad587], 0x570 bytes
^^^^^^^^^

There are 0x570 bytes right in this region which are memblock-reserved
and so we truncate it in efi_reserve_boot_services().
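
The overlap is easy to check by hand with the numbers from the MEMBLOCK dump. A minimal Python sketch follows; overlaps() is a simplified stand-in written for this note, not the kernel's memblock_is_region_reserved(), and the region sizes are taken from the ranges in the initial EFI dump.

```python
def overlaps(start, size, reserved):
    """True if [start, start+size) intersects any (base, size) in reserved."""
    end = start + size
    return any(base < end and start < base + rsize for base, rsize in reserved)

# reserved[] entries from the MEMBLOCK configuration dump, as (base, size).
RESERVED = [
    (0x0009F000, 0x61000),
    (0x02000000, 0x16C3000),
    (0x7E0AD018, 0x570),
]

# mem11: boot services data, [0x7e0ad000-0x7e0cc000)
print(overlaps(0x7E0AD000, 0x1F000, RESERVED))
# mem13: boot services data, [0x7e0cd000-0x7e55d000)
print(overlaps(0x7E0CD000, 0x490000, RESERVED))
```

Only the first call reports an overlap: the 0x570 bytes at 0x7e0ad018 sit entirely inside mem11, while mem13 does not touch any reserved entry.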

This makes me say words which will offend this list so I'll instead go
out on the balcony and wake up the neighbors. :-)

Ok, thanks again for finding it, I'll go and try to figure out the whole
mess tomorrow.

Good night!

> BTW, regarding Michael's answer, I think this is just one of several
> ways in which Linux manipulates the EFI memmap between (b) and (c).
> For example it seems to merge ranges in the map.

Yes, it does so in efi_enter_virtual_mode(). That was my initial
suspicion, that's why I dumped the regions before the merging.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-05 22:52:41

by James Bottomley

Subject: Re: [edk2] Corrupted EFI region

On Mon, 2013-08-05 at 23:55 +0200, Laszlo Ersek wrote:
> On 08/05/13 23:41, Borislav Petkov wrote:
> > On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote:
> >> All of this would be a non-problem if there weren't buggy
> >> implementations which can't run *without* SetVirtualAddressMap().
> >
> > Oh, you mean, if we were to call the runtime services through their
> > physical addresses?
>
> I heard that there was a (U)EFI firmware implementation that didn't even
> implement SetVirtualAddressMap(). It was okay because the main OS for
> that platform didn't want to call it, it thunked to physical mode for
> each runtime service call.
>
> (This is not hearsay; I'm omitting the specifics because I'm not sure if
> I'm allowed to give any. I've heard about this stuff from a direct
> colleague who used to work on these systems.)

That's actually the way all non-x86 unix systems operate. If you look
in the firmware mechanisms for almost every non-x86 system in the Linux
kernel architecture directories they do this if they have to access
firmware from Linux (we do it a lot on parisc to get the IODC to give us
the device inventory for instance).

I strongly suspect the origin of this weirdness is that once upon a time
Windows didn't run with a separated address space and so needed a way of
accessing firmware in the same address space, hence the pointer
relocation trick, but even Windows hasn't needed this for a while.

James

2013-08-06 07:25:22

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/06/13 00:52, James Bottomley wrote:
> On Mon, 2013-08-05 at 23:55 +0200, Laszlo Ersek wrote:
>> On 08/05/13 23:41, Borislav Petkov wrote:
>>> On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote:
>>>> All of this would be a non-problem if there weren't buggy
>>>> implementations which can't run *without* SetVirtualAddressMap().
>>>
>>> Oh, you mean, if we were to call the runtime services through their
>>> physical addresses?
>>
>> I heard that there was a (U)EFI firmware implementation that didn't even
>> implement SetVirtualAddressMap(). It was okay because the main OS for
>> that platform didn't want to call it, it thunked to physical mode for
>> each runtime service call.
>>
>> (This is not hearsay; I'm omitting the specifics because I'm not sure if
>> I'm allowed to give any. I've heard about this stuff from a direct
>> colleague who used to work on these systems.)
>
> That's actually the way all non-x86 unix systems operate. If you look
> in the firmware mechanisms for almost every non-x86 system in the Linux
> kernel architecture directories they do this if they have to access
> firmware from Linux (we do it a lot on parisc to get the IODC to give us
> the device inventory for instance).
>
> I strongly suspect the origin of this weirdness is that once upon a time
> Windows didn't run with a separated address space and so needed a way of
> accessing firmware in the same address space, hence the pointer
> relocation trick, but even Windows hasn't needed this for a while.

Thank you for educating me.
Laszlo

2013-08-06 14:10:40

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Tue, Aug 06, 2013 at 12:08:08AM +0200, Borislav Petkov wrote:
> Ok, thanks again for finding it, I'll go and try to figure out the whole
> mess tomorrow.

Ok, some more observations:

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] memblock_reserve: [0x0000000009f000-0x00000000100000] reserve_ebda_region+0x56/0x58
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.11.0-rc4+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #4 SMP PREEMPT Tue Aug 6 15:15:07 CEST 2013
[ 0.000000] memblock_reserve: [0x00000002000000-0x000000036c0000] setup_arch+0x47/0xa63
[ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 memblock=debug
[ 0.000000] efi: efi_memblock_x86_reserve_range: pmap: 0x7e0ad018
[ 0.000000] memblock_reserve: [0x0000007e0ad018-0x0000007e0ad588] efi_memblock_x86_reserve_range+0x70/0x75

And this is it:

efi_memblock_x86_reserve_range() reserves the region which overlaps with
the following region:

[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)

Now, this address 0x7e0ad018 is boot_params.efi_info.efi_memmap which,
AFAICT, we write to in exit_boot() after calling GetMemoryMap(). IOW,
this is the EFI memory map descriptor which we mark as reserved.
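
Two quick sanity checks on those numbers, as a Python sketch. The containing() helper is made up for this note. The 48-byte descriptor stride is an assumption (it is not stated anywhere in this thread); with it, the reserved length works out to exactly the 29 entries (mem00..mem28) that the dump lists.

```python
# (name, type, start, end) for the neighbours of the reserved address,
# copied from the initial dump.
REGIONS = [
    ("mem10", 7, 0x7C020000, 0x7E0AD000),
    ("mem11", 4, 0x7E0AD000, 0x7E0CC000),
    ("mem12", 7, 0x7E0CC000, 0x7E0CD000),
]

def containing(addr, regions):
    """Return the name of the region whose [start, end) holds addr, or None."""
    for name, _mem_type, start, end in regions:
        if start <= addr < end:
            return name
    return None

PMAP = 0x7E0AD018               # boot_params.efi_info.efi_memmap in this boot
MAP_BYTES = 0x7E0AD588 - PMAP   # length taken from the memblock_reserve line

print(containing(PMAP, REGIONS), hex(MAP_BYTES))

# Assumption: each descriptor occupies 48 bytes in the map buffer.
DESC_STRIDE = 48
entries, leftover = divmod(MAP_BYTES, DESC_STRIDE)
print(entries, leftover)
```

So the buffer holding the memory map lives inside the very boot services data region that the map describes as mem11.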

So, hmm, I'm not sure what we want to do here.

Off the top of my head, I'm thinking this: efi_reserve_boot_services()
which truncates this region to 0 should actually check that this special
region is reserved, and *enlarge* it instead of making it of size 0, no?

Right?

Or does anyone have a better idea?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-06 15:30:45

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/06/13 16:10, Borislav Petkov wrote:
> On Tue, Aug 06, 2013 at 12:08:08AM +0200, Borislav Petkov wrote:
>> Ok, thanks again for finding it, I'll go and try to figure out the whole
>> mess tomorrow.
>
> Ok, some more observations:
>
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
> [ 0.000000] memblock_reserve: [0x0000000009f000-0x00000000100000] reserve_ebda_region+0x56/0x58
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 3.11.0-rc4+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #4 SMP PREEMPT Tue Aug 6 15:15:07 CEST 2013
> [ 0.000000] memblock_reserve: [0x00000002000000-0x000000036c0000] setup_arch+0x47/0xa63
> [ 0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 memblock=debug
> [ 0.000000] efi: efi_memblock_x86_reserve_range: pmap: 0x7e0ad018
> [ 0.000000] memblock_reserve: [0x0000007e0ad018-0x0000007e0ad588] efi_memblock_x86_reserve_range+0x70/0x75
>
> And this is it:
>
> efi_memblock_x86_reserve_range() reserves the region which overlaps with
> the following region:
>
> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007e0ad000-0x000000007e0cc000) (0MB)
>
> Now, this address 0x7e0ad018 is boot_params.efi_info.efi_memmap which,
> AFAICT, we write to in exit_boot() after calling GetMemoryMap(). IOW,
> this is the EFI memory map descriptor which we mark as reserved.
>
> So, hmm, I'm not sure what we want to do here.

To me this looks like a genuine conflict.

01 efi_main()
02   exit_boot()
03     low_alloc()
04     GetMemoryMap()
05     ExitBootServices()
06
07 start_kernel()
08   setup_arch()
09     efi_memblock_x86_reserve_range()
10     efi_reserve_boot_services()
11   efi_enter_virtual_mode()
12     SetVirtualAddressMap()

GetMemoryMap() does not itself allocate memory of any kind (which could
potentially change the memory map in-flight). It requires an input
buffer, tries to squeeze all map entries into it. If they fit, OK, if
they don't, the caller will know to allocate a bigger buffer
(potentially changing the memory map) and call GetMemoryMap() again.
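
The size-probe-then-retry pattern can be illustrated with a toy model in Python. Everything here is hypothetical: FakeFirmware is not the UEFI API, the 48-byte descriptor size is an assumption, and the rule that every allocation adds exactly one map entry is a deliberate simplification to show why the caller asks for some slack.

```python
class FakeFirmware:
    """Toy model only, not the UEFI API. The map grows when memory is allocated."""
    DESC_SIZE = 48  # assumed descriptor stride

    def __init__(self, entries):
        self.entries = entries
        self.map_key = 0

    def allocate(self, nbytes):
        # Carving out a buffer may split a region: one more map entry,
        # and the previous map key is no longer current.
        self.entries += 1
        self.map_key += 1
        return bytearray(nbytes)

    def get_memory_map(self, buf):
        """Return (ok, needed_bytes, map_key). Never allocates by itself."""
        needed = self.entries * self.DESC_SIZE
        if len(buf) < needed:
            return False, needed, None
        return True, needed, self.map_key

def fetch_map(fw, slack_entries=2, max_attempts=8):
    """Probe for the required size, then allocate with slack until the map fits."""
    ok, needed, key = fw.get_memory_map(bytearray(0))
    attempts = 1
    while not ok:
        if attempts >= max_attempts:
            raise RuntimeError("memory map never fit the buffer")
        buf = fw.allocate(needed + slack_entries * fw.DESC_SIZE)
        ok, needed, key = fw.get_memory_map(buf)
        attempts += 1
    return needed, key, attempts

print(fetch_map(FakeFirmware(entries=28)))
```

In this toy, calling fetch_map() with slack_entries=0 never converges, because each allocation makes the map one entry larger than the buffer that was just sized for it.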

So, on line 03 we allocate memory for GetMemoryMap(). As you say, this
exact area of 0x570 bytes, holding the memory map, is then marked as
reserved on line 09.

At line 10, when we want to reserve a boot services data region, we find
out that part of it has already been reserved.

I see two problems here. The first problem is what you mention -- the
decision *not* to reserve a region because part of it is already
reserved is hard to comprehend:

> Off the top of my head, I'm thinking this: efi_reserve_boot_services()
> which truncates this region to 0 should actually check that this special
> region is reserved, and *enlarge* it instead of making it of size 0, no?

The second problem is orthogonal and maybe "deeper":

The memory allocated by low_alloc() on line 03, of type
EFI_LOADER_DATA, intersects with a region of type
EFI_BOOT_SERVICES_DATA, according to the GetMemoryMap() call on line
04.

Something is very wrong here.

Clearly, if the 2nd problem didn't exist, then the 1st one wouldn't either.

Allocating the backing store for the memory map itself (on line 03) as
EFI_LOADER_DATA is a good choice. This kind of memory survives
ExitBootServices(), is not relocated, etc.

But, I cannot understand how the subsequent GetMemoryMap() call can
report an overlapping EFI_BOOT_SERVICES_DATA range. (Actually, the
EFI_BOOT_SERVICES_DATA range *surrounds* the EFI_LOADER_DATA range.)

This problem could be related to the logic in low_alloc(). It figures
out an address and allocates (rounded up) pages exactly at that address;
the firmware doesn't have any leeway to change it. The address to
allocate at is a hard requirement (EFI_ALLOCATE_ADDRESS) rather than a hint.

Normally this logic would cleave out a bit of memory from an
EFI_CONVENTIONAL_MEMORY range, and convert it to type EFI_LOADER_DATA.
Which makes it even less understandable how the subsequent
GetMemoryMap() call can report a surrounding EFI_BOOT_SERVICES_DATA range.
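
The "pick an address, then cleave it out" idea can be sketched as follows.
This is a simplified model, not the kernel's low_alloc(): the struct, the
function names, the page alignment and the rule for skipping address 0 are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE          4096ULL
#define TYPE_LOADER_DATA   2
#define TYPE_CONVENTIONAL  7

struct region { int type; uint64_t start; uint64_t end; };  /* end exclusive */

/* Round a byte count up to whole pages. */
static uint64_t size_to_pages(uint64_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Return the lowest page-aligned address inside a conventional-memory
 * region that can hold `bytes`, or 0 if nothing fits. Address 0 itself is
 * skipped so that 0 can serve as the failure value (an assumption here). */
static uint64_t pick_low_address(const struct region *map, size_t n,
                                 uint64_t bytes)
{
    uint64_t need = size_to_pages(bytes) * PAGE_SIZE;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = map[i].start;

        if (map[i].type != TYPE_CONVENTIONAL)
            continue;
        if (start == 0)
            start = PAGE_SIZE;
        if (start + need <= map[i].end)
            return start;
    }
    return 0;
}

/* Split `r` around [addr, addr + len) and retype the middle as loader data.
 * Writes one to three regions into out[] and returns how many. */
static size_t cleave(struct region r, uint64_t addr, uint64_t len,
                     struct region out[3])
{
    size_t k = 0;

    assert(addr >= r.start && addr + len <= r.end);
    if (addr > r.start)
        out[k++] = (struct region){ r.type, r.start, addr };
    out[k++] = (struct region){ TYPE_LOADER_DATA, addr, addr + len };
    if (addr + len < r.end)
        out[k++] = (struct region){ r.type, addr + len, r.end };
    return k;
}
```

In this model the pieces left over on either side keep the type of the
region they were cut from, which is why a surrounding range of a different
type is unexpected.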

Can you capture the OVMF debug output? Do you see

ConvertPages: Incompatible memory types

there?

Can you set the following bits too in the debug mask?

#define DEBUG_POOL 0x00000010 // Alloc & Free's
#define DEBUG_PAGE 0x00000020 // Alloc & Free's

Thanks
Laszlo

2013-08-07 15:19:45

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
> Can you capture the OVMF debug output? Do you see
>
> ConvertPages: Incompatible memory types
>
> there?
>
> Can you set the following bits too in the debug mask?
>
> #define DEBUG_POOL 0x00000010 // Alloc & Free's
> #define DEBUG_PAGE 0x00000020 // Alloc & Free's

Ok, I got debug output; I have to be careful now not to miss
anything. Ok, so here we go:

First of all, I changed debugging mask to:

gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F

(I just set all three bits you requested).
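
A quick way to double-check such a mask is to test the individual bits. The
sketch below uses only the two defines quoted above; the helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEBUG_POOL 0x00000010u  /* Alloc & Free's */
#define DEBUG_PAGE 0x00000020u  /* Alloc & Free's */

/* The value assigned to PcdDebugPrintErrorLevel above. */
#define DEBUG_MASK 0x8010007Fu

/* True when every bit in `bits` is set in `mask`. */
static int mask_has(uint32_t mask, uint32_t bits)
{
    return (mask & bits) == bits;
}

/* Sanity check: both requested message classes are enabled. */
static void check_requested_bits(void)
{
    assert(mask_has(DEBUG_MASK, DEBUG_POOL));
    assert(mask_has(DEBUG_MASK, DEBUG_PAGE));
}
```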

Using the new OVMF.fd changed the addresses, of course, so we're looking
at 0x7dc59XXX ones now.

[ 0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75

So, I've attached an archive of the debug logs. The initial observation
I could make is that the region still gets "squashed" to:

[ 0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)

from

[ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)

And the interesting stuff in the OVMF output is right at the end:

ConvertRange: 7DC59000-7DC5AFFF to 4
AddRange: 7DC59000-7DC5AFFF to 4
AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
Jumping to kernel

We get that same output no matter if I boot it with "-enable-kvm" or
not.

If the order of the debug messages matches the order in which the calls
actually happen, we AllocatePoolI to address 7DC59018, which we have already added
as a range. But I'm not going to pretend I even know the code so I'll
let you comment instead :).

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Attachments:
(No filename) (1.72 kB)
ovmf-dbg.tar.bz2 (140.24 kB)

2013-08-07 17:24:01

by Andrew Fish

Subject: Re: [edk2] Corrupted EFI region


On Aug 7, 2013, at 8:19 AM, Borislav Petkov <[email protected]> wrote:

> On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
>> Can you capture the OVMF debug output? Do you see
>>
>> ConvertPages: Incompatible memory types
>>
>> there?
>>
>> Can you set the following bits too in the debug mask?
>>
>> #define DEBUG_POOL 0x00000010 // Alloc & Free's
>> #define DEBUG_PAGE 0x00000020 // Alloc & Free's
>
> Ok, I got debug output; I have to be careful now of not missing
> anything. Ok, so here we go:
>
> First of all, I changed debugging mask to:
>
> gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F
>
> (I just set all three bits you requested).
>
> Using the new OVMF.fd changed the addresses, of course, so we're looking
> at 0x7dc59XXX ones now.
>
> [ 0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75
>
> So, I've attached an archive of the debug logs. The initial observations
> I could do is that the region still gets "squashed" to:
>
> [ 0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
>
> from
>
> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)
>

OK so I think I need some Cliff Notes here to help me understand what is going on...

type 4 is EfiBootServicesData and attr 0x0f is cache attributes with no request for a runtime mapping. This is not runtime memory so to the OS loader it is just memory EFI has used that will get freed back to the OS after ExitBootServices(), along with EfiBootServicesCode, EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData also get freed back to the OS and they just exist for the convenience of the OS loader.

So I can't figure out why this matters? Given:

typedef enum {
// Boot Services Memory
EfiLoaderCode = 1,
EfiLoaderData = 2,
EfiBootServicesCode = 3,
EfiBootServicesData = 4,
EfiConventionalMemory = 7,

// EFI Runtime Drivers
EfiRuntimeServicesCode = 5,
EfiRuntimeServicesData = 6,

// Stuff that may get mapped into Runtime
EfiReservedMemoryType = 0,
EfiACPIReclaimMemory = 9,
EfiACPIMemoryNVS = 10,
EfiMemoryMappedIO = 11,
EfiMemoryMappedIOPortSpace = 12,
EfiPalCode = 13,

EfiUnusableMemory = 8,
EfiMaxMemoryType = 14
} EFI_MEMORY_TYPE;

[ 0.005012] efi: efi_enter_virtual_mode
**[ 0.006004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
*[ 0.007004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)

**[ 0.008004] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
*[ 0.009004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
**[ 0.010004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
*[ 0.011004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e5000) (22MB)
**[ 0.012004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e5000-0x000000003fffc000) (969MB)
*[ 0.013004] efi: mem07: type=2, attr=0xf, range=[0x000000003fffc000-0x0000000040000000) (0MB)
**[ 0.014004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
*[ 0.015004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
**[ 0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007dc59000) (28MB)
*[ 0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
*[ 0.018004] efi: mem12: type=3, attr=0xf, range=[0x000000007e146000-0x000000007e1c2000) (0MB)
*[ 0.019004] efi: mem13: type=4, attr=0xf, range=[0x000000007e1c2000-0x000000007e1ca000) (0MB)
*[ 0.020004] efi: mem14: type=3, attr=0xf, range=[0x000000007e1ca000-0x000000007e1d4000) (0MB)
*[ 0.021004] efi: mem15: type=4, attr=0xf, range=[0x000000007e1d4000-0x000000007e1d6000) (0MB)
*[ 0.022004] efi: mem16: type=3, attr=0xf, range=[0x000000007e1d6000-0x000000007e368000) (1MB)

[ 0.023004] efi: mem17: type=6, attr=0x800000000000000f, range=[0x000000007e368000-0x000000007e37d000) (0MB)

*[ 0.024004] efi: mem18: type=4, attr=0xf, range=[0x000000007e37d000-0x000000007e8c8000) (5MB)

[ 0.025004] efi: mem19: type=5, attr=0x800000000000000f, range=[0x000000007e8c8000-0x000000007e8cf000) (0MB)

*[ 0.026004] efi: mem20: type=4, attr=0xf, range=[0x000000007e8cf000-0x000000007e923000) (0MB)

[ 0.028010] efi: mem21: type=6, attr=0x800000000000000f, range=[0x000000007e923000-0x000000007e925000) (0MB)
[ 0.029004] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007e925000-0x000000007e934000) (0MB)

*[ 0.031004] efi: mem23: type=4, attr=0xf, range=[0x000000007e934000-0x000000007f881000) (15MB)
*[ 0.032004] efi: mem24: type=3, attr=0xf, range=[0x000000007f881000-0x000000007fa01000) (1MB)

[ 0.033004] efi: mem25: type=5, attr=0x800000000000000f, range=[0x000000007fa01000-0x000000007fa31000) (0MB)
[ 0.034003] efi: mem26: type=6, attr=0x800000000000000f, range=[0x000000007fa31000-0x000000007fa55000) (0MB)

[ 0.035004] efi: mem27: type=0, attr=0xf, range=[0x000000007fa55000-0x000000007fa59000) (0MB)

[ 0.036004] efi: mem28: type=9, attr=0xf, range=[0x000000007fa59000-0x000000007fa61000) (0MB)

[ 0.038004] efi: mem29: type=10, attr=0xf, range=[0x000000007fa61000-0x000000007fa65000) (0MB)

*[ 0.039004] efi: mem30: type=4, attr=0xf, range=[0x000000007fa65000-0x000000007ffe0000) (5MB)

[ 0.040004] efi: mem31: type=6, attr=0x800000000000000f, range=[0x000000007ffe0000-0x0000000080000000) (0MB)

If I look at this list I see EFI free memory (EfiConventionalMemory) being tracked (** in the above printouts) and the various types of boot services memory being tracked (as *). From an OS point of view this is all just system memory so why not make the map simpler?
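
To make "simpler" concrete, here is a hedged sketch of coalescing adjacent
entries whose types are all plain system memory once boot services have
exited (types 1, 2, 3, 4 and 7 in the enum above). It only illustrates the
question; the struct and function names are assumptions, and nothing here is
taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_CONVENTIONAL 7

struct region { int type; uint64_t start; uint64_t end; };  /* end exclusive */

/* Loader code/data, boot services code/data and conventional memory. */
static int is_general_memory(int type)
{
    return type == 1 || type == 2 || type == 3 || type == 4 || type == 7;
}

/* Merge each entry into its predecessor when both are general memory and
 * the two are physically adjacent. Works in place on an address-sorted map
 * and returns the new number of entries. */
static size_t coalesce(struct region *map, size_t n)
{
    size_t out = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            is_general_memory(map[i].type) &&
            is_general_memory(map[out - 1].type) &&
            map[i].start == map[out - 1].end) {
            map[out - 1].end = map[i].end;
            map[out - 1].type = TYPE_CONVENTIONAL;
        } else {
            map[out++] = map[i];
        }
    }
    return out;
}
```

Fed with mem08 through mem12 plus mem17 from the listing, this would fold
mem08 to mem11 into one range and stop at the hole left by the zero-sized
mem11, since mem12 no longer starts where the previous entry ends.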


> And the interesting stuff in the OVMF output is right at the end:
>
> ConvertRange: 7DC59000-7DC5AFFF to 4
> AddRange: 7DC59000-7DC5AFFF to 4
> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072

This is the internal DXE Core call. It looks like a boot services memory allocation is being made. The converts usually are converting from free memory, EfiConventionalMemory, to the type requested.

Thanks,

Andrew Fish

> Jumping to kernel
>
> We get that same output no matter if I boot it with "-enable-kvm" or
> not.
>
> If the order of the debug messages is the same as the calls actually
> happen, we AllocatePoolI to address 7DC59018 which we already have added
> as a range. But I'm not going to pretend I even know the code so I'll
> let you comment instead :).
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> <ovmf-dbg.tar.bz2>

2013-08-07 17:47:44

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 08/07/13 17:19, Borislav Petkov wrote:
> On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
>> Can you capture the OVMF debug output? Do you see
>>
>> ConvertPages: Incompatible memory types
>>
>> there?
>>
>> Can you set the following bits too in the debug mask?
>>
>> #define DEBUG_POOL 0x00000010 // Alloc & Free's
>> #define DEBUG_PAGE 0x00000020 // Alloc & Free's
>
> Ok, I got debug output; I have to be careful now of not missing
> anything. Ok, so here we go:
>
> First of all, I changed debugging mask to:
>
> gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F
>
> (I just set all three bits you requested).
>
> Using the new OVMF.fd changed the addresses, of course, so we're looking
> at 0x7dc59XXX ones now.
>
> [ 0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75
>
> So, I've attached an archive of the debug logs. The initial observations
> I could do is that the region still gets "squashed" to:
>
> [ 0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
>
> from
>
> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)
>
> And the interesting stuff in the OVMF output is right at the end:
>
> ConvertRange: 7DC59000-7DC5AFFF to 4
> AddRange: 7DC59000-7DC5AFFF to 4
> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
> Jumping to kernel
>
> We get that same output no matter if I boot it with "-enable-kvm" or
> not.
>
> If the order of the debug messages is the same as the calls actually
> happen, we AllocatePoolI to address 7DC59018 which we already have added
> as a range. But I'm not going to pretend I even know the code so I'll
> let you comment instead :).

I think this allows us to solve the bug :)

First, forget everything I said :) I was completely lost.

Remember this?

01 efi_main()
02 exit_boot()
03 low_alloc()
04 GetMemoryMap()
05 ExitBootServices()
06
07 start_kernel()
08 setup_arch()
09 efi_memblock_x86_reserve_range()
10 efi_reserve_boot_services()
11 efi_enter_virtual_mode()
12 SetVirtualAddressMap()

Now, lines 01 to 05 *do not happen*.

More precisely, they don't happen in the kernel. They happen in the firmware. Specifically, "OvmfPkg/Library/LoadLinuxLib/Linux.c".

You're booting the kernel from the qemu command line. The kernel you run is also an "[o]ld kernel[] without EFI handover protocol". So what happens is, OVMF downloads the kernel image from qemu over fw_cfg, figures it's an old kernel...

PlatformBdsPolicyBehavior() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c]
// Process QEMU's -kernel command line option:
TryRunningQemuKernel() [OvmfPkg/Library/PlatformBdsLib/QemuKernel.c]
LoadLinux() [OvmfPkg/Library/LoadLinuxLib/Linux.c]
// Old kernels without EFI handover protocol
SetupLinuxBootParams()
SetupLinuxMemmap()
AllocatePool() <-------------- !!!
gBS->GetMemoryMap()
gBS->ExitBootServices()
prints "Jumping to kernel"
JumpToKernel()

Now pull up efi_memblock_x86_reserve_range(). It reserves "boot_params.efi_info->efi_memmap".

I assumed this field would come from the exit_boot() kernel function. It doesn't. It comes from SetupLinuxMemmap(). The former allocates the backing store as EFI_LOADER_DATA. The latter, alas, marked with !!! above, as boot services data. :)

So, what you're seeing in the OVMF debug log:

> ConvertRange: 7DC59000-7DC5AFFF to 4
> AddRange: 7DC59000-7DC5AFFF to 4
> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072

This is self-consistent. It just documents that the AllocatePool() call marked with !!! needs to grab two full pages first (two first lines), carve them up into pool chunks, and then serve the request from them (third line).
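
The arithmetic behind those three log lines can be checked with a few lines
of C. This only reproduces the numbers in the log; the helper names are
hypothetical, a 4 KiB page size is assumed, and the code says nothing about
how the DXE core lays out its pool headers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Number of pages covered by the inclusive range [first, last]. */
static uint64_t pages_in_range(uint64_t first, uint64_t last)
{
    assert(last >= first);
    return (last - first + 1 + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Byte offset of an address within its page. */
static uint64_t offset_in_page(uint64_t addr)
{
    return addr % PAGE_SIZE;
}

/* True when [addr, addr + len) lies entirely inside [first, last]. */
static int chunk_fits(uint64_t first, uint64_t last,
                      uint64_t addr, uint64_t len)
{
    return len > 0 && addr >= first && addr + len - 1 <= last;
}
```

The range 7DC59000-7DC5AFFF is two pages; the returned address sits 0x18
bytes into the first page; and a chunk of length 0x16F0 starting there fits
in two pages but would not fit in one.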

The address displayed here shows up in the linux dmesg later on because the storage for the memory map itself is allocated, and populated, by OVMF, not the EFI stub in the kernel.

In one sentence, efi_memblock_x86_reserve_range() expects that "boot_params.efi_info->efi_memmap" has been allocated as "loader data" (by whomever), but SetupLinuxMemmap() violates this by allocating the storage as "boot services data".

This leads to double reservation attempts between efi_memblock_x86_reserve_range(), and efi_reserve_boot_services().
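
A toy model of the two reservation passes shows the effect. It is a sketch
under stated assumptions, not the kernel's code: the structs and function
names are invented, the logged memblock range is treated as end-exclusive,
and the "after" layout in the usage note is a hypothetical split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define TYPE_LOADER_DATA  2
#define TYPE_BS_CODE      3
#define TYPE_BS_DATA      4
#define MAX_RESERVED      16

struct range { uint64_t start, end; };            /* end exclusive */
struct desc  { int type; uint64_t start, num_pages; };

static struct range reserved[MAX_RESERVED];
static size_t nr_reserved;

/* True when [start, end) intersects any range reserved so far. */
static int overlaps_reserved(uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < nr_reserved; i++)
        if (start < reserved[i].end && reserved[i].start < end)
            return 1;
    return 0;
}

static void reserve(uint64_t start, uint64_t end)
{
    assert(nr_reserved < MAX_RESERVED);
    reserved[nr_reserved++] = (struct range){ start, end };
}

/* Pass 1 reserves the buffer that holds the memory map. Pass 2 walks the
 * map and tries to reserve each boot services region; if any part of a
 * region is already reserved, the region is given up (page count zeroed),
 * which is the truncation seen in the dmesg output. */
static void run_both_passes(struct desc *map, size_t n,
                            uint64_t buf_start, uint64_t buf_end)
{
    nr_reserved = 0;
    reserve(buf_start, buf_end);

    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].start + (map[i].num_pages << PAGE_SHIFT);

        if (map[i].type != TYPE_BS_CODE && map[i].type != TYPE_BS_DATA)
            continue;
        if (overlaps_reserved(map[i].start, end))
            map[i].num_pages = 0;
        else
            reserve(map[i].start, end);
    }
}
```

If the buffer at 0x7dc59018 lives inside a single boot services data region
starting at 0x7dc59000, that region ends up with zero pages. If the buffer
instead sits in its own loader data pages, the boot services region next to
it is reserved intact.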

The attached edk2 patch should fix it. Please confirm.

Thanks,
Laszlo


Attachments:
0001-OvmfPkg-allocate-the-EFI-memory-map-for-Linux-as-Loa.patch (1.69 kB)

2013-08-07 20:19:18

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

[ Readding Matthew Garrett to the Cc list, seeing as we both got removed
for some unknown reason ]

On Wed, 07 Aug, at 10:23:56AM, Andrew Fish wrote:

> OK so I think I need some Cliff Notes here to help me understand what
> is going on...
>
> type 4 is EfiBootServicesData and attr 0x0f is cache attributes with
> no request for a runtime mapping. This is not runtime memory so to the
> OS loader it is just memory EFI has used that will get freed back to
> the OS after ExitBootServices(), along with EfiBootServicesCode,
> EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData
> also get freed back to the OS and they just exist for the convenience
> of the OS loader.
>
> So I can't figure out why this matters? Given:

We've seen a bunch of systems that make calls into EfiBootServicesCode
after ExitBootServices(). There were some Apple machines in that list,
though I don't have the details but Matthew should.

So we map these regions unconditionally and in their original state,
otherwise the firmware will generate fatal page faults when trying to
access those memory regions.

--
Matt Fleming, Intel Open Source Technology Center

2013-08-07 20:24:57

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

[ Adding Matthew for reals this time ]

On Wed, 07 Aug, at 09:19:08PM, Matt Fleming wrote:
> [ Readding Matthew Garrett to the Cc list, seeing as we both got removed
> for some unknown reason ]
>
> On Wed, 07 Aug, at 10:23:56AM, Andrew Fish wrote:
>
> > OK so I think I need some Cliff Notes here to help me understand what
> > is going on...
> >
> > type 4 is EfiBootServicesData and attr 0x0f is cache attributes with
> > no request for a runtime mapping. This is not runtime memory so to the
> > OS loader it is just memory EFI has used that will get freed back to
> > the OS after ExitBootServices(), along with EfiBootServicesCode,
> > EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData
> > also get freed back to the OS and they just exist for the convenience
> > of the OS loader.
> >
> > So I can't figure out why this matters? Given:
>
> We've seen a bunch of systems that make calls into EfiBootServicesCode
> after ExitBootServices(). There were some Apple machines in that list,
> though I don't have the details but Matthew should.
>
> So we map these regions unconditionally and in their original state,
> otherwise the firmware will generate fatal page faults when trying to
> access those memory regions.

--
Matt Fleming, Intel Open Source Technology Center

2013-08-07 21:10:33

by Andrew Fish

Subject: Re: [edk2] Corrupted EFI region


On Aug 7, 2013, at 1:19 PM, Matt Fleming <[email protected]> wrote:

> [ Readding Matthew Garrett to the Cc list, seeing as we both got removed
> for some unknown reason ]
>
> On Wed, 07 Aug, at 10:23:56AM, Andrew Fish wrote:
>
>> OK so I think I need some Cliff Notes here to help me understand what
>> is going on...
>>
>> type 4 is EfiBootServicesData and attr 0x0f is cache attributes with
>> no request for a runtime mapping. This is not runtime memory so to the
>> OS loader it is just memory EFI has used that will get freed back to
>> the OS after ExitBootServices(), along with EfiBootServicesCode,
>> EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData
>> also get freed back to the OS and they just exist for the convenience
>> of the OS loader.
>>
>> So I can't figure out why this matters? Given:
>
> We've seen a bunch of systems that make calls into EfiBootServicesCode
> after ExitBootServices(). There were some Apple machines in that list,
> though I don't have the details but Matthew should.
>

I think there was some very old EDK (pre edk2) bug that caused some SMM code to grab EfiBootServicesCode at runtime. In some older Apple machines I remember working with Matthew to track down a bug in the WiFi driver not shutting down its DMA at ExitBootServices() time. I'm guessing in general that pre-Windows 8 systems may tend to be buggy.

> So we map these regions unconditionally and in their original state,
> otherwise the firmware will generate fatal page faults when trying to
> access those memory regions.
>

Well the issue I see is I don't think OS X or Windows are doing this. So I'm guessing there is some unique thing being done on the Linux side and we don't have good tests to catch bugs in the EFI implementations. If the Linux loader hides the bugs and we don't hit them with other operating systems they are never going to get fixed. It would be good if we could track down some of these issues and make a request for some tests that can help catch these issues. The tests would be part of UEFI.org, but since some of us play in both worlds we can forward the known issues to the UEFI test work group.

Is it possible to have a switch to turn off the not required behavior (hiding EFI implementation bugs) so that bad platforms could be detected? This would be a good thing to try on platforms at the upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI Forum, so the bad behavior can be detected and the vendors can fix the issue.

Thanks,

Andrew Fish

PS Also maybe it would be possible to key this work around behavior on the EFI/UEFI version. So for example no work-around after UEFI v2.3.1?
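
A rough sketch of what keying on the version could look like. UEFI encodes
the specification revision as (major << 16) | minor, with 2.3.1 written as
minor 31; the policy function and its name are purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a revision value the way the UEFI table header encodes it. */
static uint32_t efi_revision(uint32_t major, uint32_t minor)
{
    assert(major <= 0xFFFF && minor <= 0xFFFF);
    return (major << 16) | minor;
}

/* Hypothetical policy: keep the workaround for firmware that reports
 * UEFI 2.3.1 or older, and drop it for anything newer. */
static int needs_boot_services_workaround(uint32_t fw_revision)
{
    return fw_revision <= efi_revision(2, 31);
}
```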

> --
> Matt Fleming, Intel Open Source Technology Center

2013-08-07 21:23:25

by Matthew Garrett

Subject: Re: [edk2] Corrupted EFI region

On Wed, Aug 07, 2013 at 02:10:28PM -0700, Andrew Fish wrote:

> Well the issue I see is I don't think OS X or Windows are doing this.
> So I'm guessing there is some unique thing being done on the Linux
> side and we don't have good tests to catch bugs in the EFI
> implementations. If the Linux loader hides the bugs and we don't hit
> them with other operating systems they are never going to get fixed.
> It would be good if we could track down some of these issues and make
> a request for some tests that can help catch these issues. The tests
> would be part of UEFI.org, but since some of us play in both worlds we
> can forward the known issues to the UEFI test work group.

Linux enables NX before calling SetVirtualAddressMap(). If other OSes
don't do that, you probably won't see the bug.

> Is it possible to have a switch to turn off the not required behavior
> (hiding EFI implementation bugs) so that bad platforms could be
> detected? This would be a good thing to try on platforms at the
> upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI
> Forum, so the bad behavior can be detected and the vendors can fix the
> issue.

It's behaviour that we already have to work around due to shipping
hardware exhibiting it, so while we could certainly develop a test,
Linux is always going to need to include the workaround code.

That being said, some of what we do with the memory map in Linux right
now is probably unnecessary - we're modifying the memory map because
that's a convenient place to store the information, rather than because
the memory map actually needs to be modified. We could do a better job
of that.

--
Matthew Garrett | [email protected]

2013-08-08 10:17:38

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

On Wed, 07 Aug, at 02:10:28PM, Andrew Fish wrote:
> Well the issue I see is I don't think OS X or Windows are doing this.
> So I'm guessing there is some unique thing being done on the Linux
> side and we don't have good tests to catch bugs in the EFI
> implementations. If the Linux loader hides the bugs and we don't hit
> them with other operating systems they are never going to get fixed.
> It would be good if we could track down some of these issues and make
> a request for some tests that can help catch these issues. The tests
> would be part of UEFI.org, but since some of us play in both worlds we
> can forward the known issues to the UEFI test work group.

I'm all for helping to develop tests that catch these kinds of bugs.
What's the next step?

> Is it possible to have a switch to turn off the not required behavior
> (hiding EFI implementation bugs) so that bad platforms could be
> detected? This would be a good thing to try on platforms at the
> upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI
> Forum, so the bad behavior can be detected and the vendors can fix the
> issue.

We don't tend to provide switches for the kernel to turn off workarounds
because users run the risk of inadvertently stopping their machines from
booting correctly. Also, because the major distributions will always
enable the workarounds, the kernel would need to be built manually to
see any kind of informative error message.

What we do have though is the Firmware Testsuite - fwts,

https://wiki.ubuntu.com/Kernel/Reference/fwts

I know that Brian (Cc'd) has been doing some excellent advocacy work,
getting people at plugfests to run this testsuite which tests for
implementation bugs from within a Linux environment.

> PS Also maybe it would be possible to key this work around behavior on
> the EFI/UEFI version. So for example no work-around after UEFI v2.3.1?

That would really depend on who has seen this bug and on which
platforms. Is there a particular reason that mapping the boot services
regions as-is would cause problems?

--
Matt Fleming, Intel Open Source Technology Center

2013-08-08 13:46:10

by Andrew Fish

Subject: Re: [edk2] Corrupted EFI region


On Aug 8, 2013, at 3:17 AM, Matt Fleming <[email protected]> wrote:

> On Wed, 07 Aug, at 02:10:28PM, Andrew Fish wrote:
>> Well the issue I see is I don't think OS X or Windows are doing this.
>> So I'm guessing there is some unique thing being done on the Linux
>> side and we don't have good tests to catch bugs in the EFI
>> implementations. If the Linux loader hides the bugs and we don't hit
>> them with other operating systems they are never going to get fixed.
>> It would be good if we could track down some of these issues and make
>> a request for some tests that can help catch these issues. The tests
>> would be part of UEFI.org, but since some of us play in both worlds we
>> can forward the known issues to the UEFI test work group.
>
> I'm all for helping to develop tests that catch these kinds of bugs.
> What's the next step?
>

I'll bring this up with UEFI.org.

>> Is it possible to have a switch to turn off the not required behavior
>> (hiding EFI implementation bugs) so that bad platforms could be
>> detected? This would be a good thing to try on platforms at the
>> upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI
>> Forum, so the bad behavior can be detected and the vendors can fix the
>> issue.
>
> We don't tend to provide switches for the kernel to turn off workarounds
> because users run the risk of inadvertently stopping their machines from
> booting correctly. Also, because the major distributions will always
> enable the workarounds, the kernel would need to be built manually to
> see any kind of informative error message.
>
> What we do have though is the Firmware Testsuite - fwts,
>
> https://wiki.ubuntu.com/Kernel/Reference/fwts
>
> I know that Brian (Cc'd) has been doing some excellent advocacy work,
> getting people at plugfests to run this testsuite which tests for
> implementation bugs from within a Linux environment.
>
>> PS Also maybe it would be possible to key this work around behavior on
>> the EFI/UEFI version. So for example no work-around after UEFI v2.3.1?
>
> That would really depend on who has seen this bug and on which
> platforms. Is there a particular reason that mapping the boot services
> regions as-is would cause problems?
>

1) The firmware bug could also be a security hole and thus needs to get fixed.
2) The kernel gets locked into a design that does not follow the specification, and this limits future design options.
3) Makes the code more complex to maintain and test.

> --
> Matt Fleming, Intel Open Source Technology Center

2013-08-08 15:02:53

by Borislav Petkov

Subject: Re: [edk2] Corrupted EFI region

On Wed, Aug 07, 2013 at 07:49:16PM +0200, Laszlo Ersek wrote:

[…]

> Now, lines 01 to 05 *do not happen*.
>
> More precisely, they don't happen in the kernel. They happen in the
> firmware. Specifically, "OvmfPkg/Library/LoadLinuxLib/Linux.c".
>
> You're booting the kernel from the qemu command line. The kernel you
> run is also an "[o]ld kernel[] without EFI handover protocol". So what
> happens is, OVMF downloads the kernel image from qemu over fw_cfg,
> figures it's an old kernel...

Right, I think this is easier than having to go into the EFI shell each
time and run bzImage.efi. Unless there's a faster way to do that along
with passing it kernel command line parameters...

[…]

> In one sentence, efi_memblock_x86_reserve_range() expects that
> "boot_params.efi_info->efi_memmap" has been allocated as "loader data"
> (by whomever), but SetupLinuxMemmap() violates this by allocating the
> storage as "boot services data".
>
> This leads to double reservation attempts between
> efi_memblock_x86_reserve_range(), and efi_reserve_boot_services().

Ok, this makes sense.

> The attached edk2 patch should fix it. Please confirm.
>
> Thanks,
> Laszlo
>

> From 4a9e1f10fa2d06496f1983c25c47c6a1373d2f42 Mon Sep 17 00:00:00 2001
> From: Laszlo Ersek <[email protected]>
> Date: Wed, 7 Aug 2013 19:39:30 +0200
> Subject: [PATCH] OvmfPkg: allocate the EFI memory map for Linux as Loader Data
>
> In Linux, efi_memblock_x86_reserve_range() and efi_reserve_boot_services()
> expect that whoever allocates the EFI memmap allocates it in Loader Data
> type memory. Linux's own exit_boot()-->low_alloc() complies, but
> SetupLinuxMemmap() in LoadLinuxLib doesn't.
>
> The memory type discrepancy leads to efi_memblock_x86_reserve_range() and
> efi_reserve_boot_services() both trying to reserve the range backing the
> memmap, resulting in memmap entry truncation in
> efi_reserve_boot_services().
>
> This fix also makes this allocation consistent with all other persistent
> allocations in "OvmfPkg/Library/LoadLinuxLib/Linux.c".
>
> Contributed-under: TianoCore Contribution Agreement 1.0
>
> Signed-off-by: Laszlo Ersek <[email protected]>

Reported-and-tested-by: Borislav Petkov <[email protected]>

Great, thanks for this.

I guess we got that out of the way too. I finally can concentrate on my
patches again :-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-08 21:46:04

by Brian J. Johnson

Subject: Re: [edk2] Corrupted EFI region

On 08/08/2013 10:02 AM, Borislav Petkov wrote:
> On Wed, Aug 07, 2013 at 07:49:16PM +0200, Laszlo Ersek wrote:
>>> Now, lines 01 to 05 *do not happen*.
>>>
>>> More precisely, they don't happen in the kernel. They happen in the
>>> firmware. Specifically, "OvmfPkg/Library/LoadLinuxLib/Linux.c".
>>>
>>> You're booting the kernel from the qemu command line. The kernel you
>>> run is also an "[o]ld kernel[] without EFI handover protocol". So what
>>> happens is, OVMF downloads the kernel image from qemu over fw_cfg,
>>> figures it's an old kernel...
>
> Right, I think this is easier than having to go into the EFI shell each
> time and run bzImage.efi. Unless there's a faster way to do that along
> with passing it kernel command line parameters...

You can use mtools or some other utility to update the kernel image and
bootloader configuration files on the disk image, so it boots the way
you want.

Or you could set OVMF to boot to the shell, and put a startup.nsh file
on the boot partition which invokes the loader with the options you
want. That may be a bit simpler than rewriting a grub config. We use
this technique on our internal simulator.
--

Brian Johnson

--------------------------------------------------------------------

"The lack of explanation demands an explanation."
-- Schaffer

2013-08-18 07:33:53

by Jordan Justen

Subject: Re: [edk2] Corrupted EFI region

0001-OvmfPkg-allocate-the-EFI-memory-map-for-Linux-as-Loa.patch
was applied in r14555.

Thanks for the contribution.

And thanks for the bug report & testing Boris.

On Wed, Aug 7, 2013 at 10:49 AM, Laszlo Ersek <[email protected]> wrote:
> On 08/07/13 17:19, Borislav Petkov wrote:
>> On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
>>> Can you capture the OVMF debug output? Do you see
>>>
>>> ConvertPages: Incompatible memory types
>>>
>>> there?
>>>
>>> Can you set the following bits too in the debug mask?
>>>
>>> #define DEBUG_POOL 0x00000010 // Alloc & Free's
>>> #define DEBUG_PAGE 0x00000020 // Alloc & Free's
>>
>> Ok, I got debug output; I have to be careful now of not missing
>> anything. Ok, so here we go:
>>
>> First of all, I changed debugging mask to:
>>
>> gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F
>>
>> (I just set all three bits you requested).
>>
>> Using the new OVMF.fd changed the addresses, of course, so we're looking
>> at 0x7dc59XXX ones now.
>>
>> [ 0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75
>>
>> So, I've attached an archive of the debug logs. The initial observations
>> I could do is that the region still gets "squashed" to:
>>
>> [ 0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
>>
>> from
>>
>> [ 0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)
>>
>> And the interesting stuff in the OVMF output is right at the end:
>>
>> ConvertRange: 7DC59000-7DC5AFFF to 4
>> AddRange: 7DC59000-7DC5AFFF to 4
>> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
>> Jumping to kernel
>>
>> We get that same output no matter if I boot it with "-enable-kvm" or
>> not.
>>
>> If the order of the debug messages is the same as the calls actually
>> happen, we AllocatePoolI to address 7DC59018 which we already have added
>> as a range. But I'm not going to pretend I even know the code so I'll
>> let you comment instead :).
>
> I think this allows us to solve the bug :)
>
> First, forget everything I said :) I was completely lost.
>
> Remember this?
>
> 01 efi_main()
> 02 exit_boot()
> 03 low_alloc()
> 04 GetMemoryMap()
> 05 ExitBootServices()
> 06
> 07 start_kernel()
> 08 setup_arch()
> 09 efi_memblock_x86_reserve_range()
> 10 efi_reserve_boot_services()
> 11 efi_enter_virtual_mode()
> 12 SetVirtualAddressMap()
>
> Now, lines 01 to 05 *do not happen*.
>
> More precisely, they don't happen in the kernel. They happen in the firmware. Specifically, "OvmfPkg/Library/LoadLinuxLib/Linux.c".
>
> You're booting the kernel from the qemu command line. The kernel you run is also an "[o]ld kernel[] without EFI handover protocol". So what happens is, OVMF downloads the kernel image from qemu over fw_cfg, figures it's an old kernel...
>
> PlatformBdsPolicyBehavior() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c]
> // Process QEMU's -kernel command line option:
> TryRunningQemuKernel() [OvmfPkg/Library/PlatformBdsLib/QemuKernel.c]
> LoadLinux() [OvmfPkg/Library/LoadLinuxLib/Linux.c]
> // Old kernels without EFI handover protocol
> SetupLinuxBootParams()
> SetupLinuxMemmap()
> AllocatePool() <-------------- !!!
> gBS->GetMemoryMap()
> gBS->ExitBootServices()
> prints "Jumping to kernel"
> JumpToKernel()
>
> Now pull up efi_memblock_x86_reserve_range(). It reserves "boot_params.efi_info->efi_memmap".
>
> I assumed this field would come from the exit_boot() kernel function. It doesn't. It comes from SetupLinuxMemmap(). The former allocates the backing store as EFI_LOADER_DATA. The latter, alas, marked with !!! above, as boot services data. :)
>
> So, what you're seeing in the OVMF debug log:
>
>> ConvertRange: 7DC59000-7DC5AFFF to 4
>> AddRange: 7DC59000-7DC5AFFF to 4
>> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
>
> This is self-consistent. It just documents that the AllocatePool() call marked with !!! needs to grab two full pages first (first two lines), carve them up into pool chunks, and then serve the request from them (third line).
>
> The address displayed here shows up in the linux dmesg later on because the storage for the memory map itself is allocated, and populated, by OVMF, not the EFI stub in the kernel.
>
> In one sentence, efi_memblock_x86_reserve_range() expects that "boot_params.efi_info->efi_memmap" has been allocated as "loader data" (by whomever), but SetupLinuxMemmap() violates this by allocating the storage as "boot services data".
>
> This leads to double reservation attempts between efi_memblock_x86_reserve_range(), and efi_reserve_boot_services().
>
> The attached edk2 patch should fix it. Please confirm.
>
> Thanks,
> Laszlo
>
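
The double reservation described above can be sketched as a small model. This is a simplified illustration and not the kernel code: the `Memblock` class and `simulate()` helper are hypothetical, the 0x600-byte size of the memory map buffer is read off the memblock_reserve log line, and the 0x7c000018 address for a "loader data" buffer is invented. It assumes, as the 3.11-era efi_reserve_boot_services() appears to do, that a boot services region overlapping an existing reservation is skipped and its page count set to zero.

```python
EFI_PAGE_SIZE = 0x1000
EFI_BOOT_SERVICES_CODE = 3
EFI_BOOT_SERVICES_DATA = 4


class Memblock:
    """Toy stand-in for the kernel's list of reserved ranges (hypothetical)."""

    def __init__(self):
        self.reserved = []  # half-open (start, end) ranges

    def reserve(self, start, size):
        self.reserved.append((start, start + size))

    def is_region_reserved(self, start, size):
        end = start + size
        return any(start < r_end and r_start < end
                   for r_start, r_end in self.reserved)


def reserve_boot_services(memblock, memmap):
    """Model of efi_reserve_boot_services(): skip regions that overlap
    an earlier reservation and collapse them to zero pages."""
    for md in memmap:
        if md["type"] not in (EFI_BOOT_SERVICES_CODE, EFI_BOOT_SERVICES_DATA):
            continue
        start = md["phys_addr"]
        size = md["num_pages"] * EFI_PAGE_SIZE
        if memblock.is_region_reserved(start, size):
            md["num_pages"] = 0  # "could not reserve": range prints as [start-start)
        else:
            memblock.reserve(start, size)


def simulate(memmap_buffer_addr, memmap_buffer_size=0x600):
    """Return the page count of mem11 after both reservation steps."""
    # mem11 from the dmesg above: type=4, [0x7dc59000-0x7e146000)
    mem11 = {"type": EFI_BOOT_SERVICES_DATA,
             "phys_addr": 0x7DC59000,
             "num_pages": (0x7E146000 - 0x7DC59000) // EFI_PAGE_SIZE}
    memblock = Memblock()
    # Step 1: efi_memblock_x86_reserve_range() reserves the memory map buffer.
    memblock.reserve(memmap_buffer_addr, memmap_buffer_size)
    # Step 2: efi_reserve_boot_services() walks the descriptors.
    reserve_boot_services(memblock, [mem11])
    return mem11["num_pages"]


# Buffer allocated by OVMF as boot services data, inside mem11:
squashed = simulate(0x7DC59018)
# Buffer allocated elsewhere (invented address standing in for loader data):
intact = simulate(0x7C000018)
```

With the buffer inside mem11 the region collapses to zero pages, which matches the "squashed" range in the dmesg; with the buffer outside it, mem11 keeps its 0x4ed pages.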

2013-09-02 08:20:00

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

On Thu, 08 Aug, at 06:46:02AM, Andrew Fish wrote:
>
> On Aug 8, 2013, at 3:17 AM, Matt Fleming <[email protected]> wrote:
>
> > On Wed, 07 Aug, at 02:10:28PM, Andrew Fish wrote:
> >> Well the issue I see is I don't think OS X or Windows are doing this.
> >> So I'm guessing there is some unique thing being done on the Linux
> >> side and we don't have good tests to catch bugs in the EFI
> >> implementations. If the Linux loader hides the bugs and we don't hit
> >> them with other operating systems they are never going to get fixed.
> >> It would be good if we could track down some of these issues and make
> >> a request for some tests that can help catch these issues. The tests
> >> would be part of UEFI.org, but since some of us play in both worlds we
> >> can forward the known issues to the UEFI test work group.
> >
> > I'm all for helping to develop tests that catch these kind of bugs.
> > What's the next step?
> >
>
> I'll bring this up with UEFI.org.

For those attending the UEFI plugfest in New Orleans this would be a
good topic for discussion - figuring out a collaboration process to get
new tests in place.

--
Matt Fleming, Intel Open Source Technology Center

2013-09-13 20:38:47

by Jerry Hoemann

Subject: Re: [edk2] Corrupted EFI region

On Thu, Aug 08, 2013 at 11:17:30AM +0100, Matt Fleming wrote:
> > Is it possible to have a switch to turn off the not required behavior
> > (hiding EFI implementation bugs) so that bad platforms could be
> > detected? This would be a good thing to try on platforms at the
> > upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI
> > Forum, so the bad behavior can be detected and the vendors can fix the
> > issue.
>
> We don't tend to provide switches for the kernel to turn off workarounds
> because users run the risk of inadvertently stopping their machines from
> booting correctly. Also, because the major distributions will always
> enable the workarounds, the kernel would need to be built manually to
> see any kind of informative error message.
>

...

>
> > PS Also maybe it would be possible to key this work around behavior on
> > the EFI/UEFI version. So for example no work-around after UEFI v2.3.1?
>
> That would really depend on who has seen this bug and on which
> platforms. Is there a particular reason that mapping the boot services
> regions as-is would cause problems?
>

Matt,

We have hit an issue on our new platform in development related to the
call of efi_reserve_boot_services() from setup_arch().

The reservation can interfere with allocation of the crash kernel.

In pre 3.9(?) kernels, the crash kernel is required to be allocated from
physically contiguous memory below 896 MB.

Our new platforms are large in both the amount of memory and the amount
of IO. This requires large crash kernels for kdump to work. This is even
after the work done for makedumpfile v 1.5 to allow it to work with a
smaller footprint.


One of the problems is that drivers will allocate memory as boot code and/or
data in the region < 896 MB, which effectively fragments this memory.
With the reservation, we can't reuse the memory when needed for the
crash kernels. If we remove the reservation and allow the kernel
to reuse the memory, the reservation of the crash kernel succeeds.

This is definitely a problem for distros that are pre 3.9. Probably less
so for top of tree, but I haven't been focused there.

So we are definitely interested in finding a mechanism to not
do this reservation on platforms that don't have the issues described
earlier in this thread.
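
The fragmentation effect can be illustrated with a toy search for a contiguous free range. This is only a sketch under stated assumptions: `free_gaps()` and `can_reserve()` are hypothetical helpers, memory below the limit is treated as one flat RAM range, and the reserved ranges in the example are invented numbers, not taken from any real memory map.

```python
MB = 1 << 20


def free_gaps(limit, reserved):
    """Return the free half-open ranges in [0, limit) left between the
    given reserved (start, end) ranges."""
    gaps = []
    cursor = 0
    for start, end in sorted(reserved):
        start = min(start, limit)
        end = min(end, limit)
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < limit:
        gaps.append((cursor, limit))
    return gaps


def can_reserve(size, limit, reserved):
    """True if some contiguous free range below limit can hold size bytes."""
    return any(end - start >= size for start, end in free_gaps(limit, reserved))


LIMIT = 896 * MB
CRASH_SIZE = 512 * MB

# Invented layout: the first 16 MB are taken by the kernel and low memory.
base = [(0, 16 * MB)]
# Invented boot services fragments that split the remaining space.
fragments = [(300 * MB, 301 * MB), (600 * MB, 601 * MB)]

ok_without_fragments = can_reserve(CRASH_SIZE, LIMIT, base)
ok_with_fragments = can_reserve(CRASH_SIZE, LIMIT, base + fragments)
```

In this invented layout two 1 MB fragments are enough to leave no 512 MB hole below 896 MB, even though almost all of that memory is nominally free.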

thanks,

Jerry

--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: [email protected]
----------------------------------------------------------------------------

2013-09-16 10:59:34

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

On Fri, 13 Sep, at 02:38:12PM, [email protected] wrote:
> Matt,
>
> We have hit an issue on our new platform in development related to the
> call of efi_reserve_boot_services() from setup_arch().
>
> The reservation can interfere with allocation of the crash kernel.

Jerry, thanks for bringing this up.

> In pre 3.9(?) kernels, the crash kernel is required to be allocated from
> physically contiguous memory below 896 MB.
>
> Our new platforms are large in both the amount of memory and the amount
> of IO. This requires large crash kernels for kdump to work. This is even
> after the work done for makedumpfile v 1.5 to allow it to work with a
> smaller footprint.
>
>
> One of the problems is that drivers will allocate memory as boot code and/or
> data in the region < 896 MB, which effectively fragments this memory.
> With the reservation, we can't reuse the memory when needed for the
> crash kernels. If we remove the reservation and allow the kernel
> to reuse the memory, the reservation of the crash kernel succeeds.
>
> This is definitely a problem for distros that are pre 3.9. Probably less
> so for top of tree, but I haven't been focused there.
>
> So we are definitely interested in finding a mechanism to not
> do this reservation on platforms that don't have the issues described
> earlier in this thread.

OK, in an ideal world we'd move the crash kernel reservation after
efi_free_boot_services(), because at that point the boot regions are
available again. But it seems that we reserve the boot regions really
early during startup and release them relatively late. The reason is
that the Boot Graphics Resource Table (BGRT) data, if present, is
located in the Boot Services Data regions but we can't extract the
address of the region from the ACPI tables until we've setup the ACPI
subsystem, which happens quite late.

I wonder whether performing the reservation of the crash kernel memory
first, before efi_reserve_boot_services(), would help. That way we'd
only need to reserve remaining regions in efi_reserve_boot_services().
This scheme would rely on nothing writing into the crash kernel area
before we've extracted the BGRT data, however.

--
Matt Fleming, Intel Open Source Technology Center

2013-09-16 11:49:25

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 09/16/13 12:59, Matt Fleming wrote:
> On Fri, 13 Sep, at 02:38:12PM, [email protected] wrote:
>> Matt,
>>
>> We have hit an issue on our new platform in development related to the
>> call of efi_reserve_boot_services() from setup_arch().
>>
>> The reservation can interfere with allocation of the crash kernel.
>
> Jerry, thanks for bringing this up.
>
>> In pre 3.9(?) kernels, the crash kernel is required to be allocated from
>> physically contiguous memory below 896 MB.
>>
>> Our new platforms are large in both the amount of memory and the amount
>> of IO. This requires large crash kernels for kdump to work. This is even
>> after the work done for makedumpfile v 1.5 to allow it to work with a
>> smaller footprint.
>>
>>
>> One of the problems is that drivers will allocate memory as boot code and/or
>> data in the region < 896 MB, which effectively fragments this memory.
>> With the reservation, we can't reuse the memory when needed for the
>> crash kernels. If we remove the reservation and allow the kernel
>> to reuse the memory, the reservation of the crash kernel succeeds.
>>
>> This is definitely a problem for distros that are pre 3.9. Probably less
>> so for top of tree, but I haven't been focused there.
>>
>> So we are definitely interested in finding a mechanism to not
>> do this reservation on platforms that don't have the issues described
>> earlier in this thread.
>
> OK, in an ideal world we'd move the crash kernel reservation after
> efi_free_boot_services(), because at that point the boot regions are
> available again. But it seems that we reserve the boot regions really
> early during startup and release them relatively late. The reason is
> that the Boot Graphics Resource Table (BGRT) data, if present, is
> located in the Boot Services Data regions but we can't extract the
> address of the region from the ACPI tables until we've setup the ACPI
> subsystem, which happens quite late.

Why is BGRT allocated as Boot Services Data?

In file
"MdeModulePkg/Universal/Acpi/BootGraphicsResourceTableDxe/BootGraphicsResourceTableDxe.c":

InstallBootGraphicsResourceTable()
BgrtAllocateBsDataMemoryBelow4G()
gBS->AllocatePages(... EfiBootServicesData ...)

From Table 25. Memory Type Usage before ExitBootServices():

EfiBootServicesData -- The data portions of a loaded Boot Services
Driver, and the default data allocation type
used by a Boot Services Driver to allocate
pool memory.

EfiACPIReclaimMemory -- Memory that holds the ACPI tables.

From Table 26. Memory Type Usage after ExitBootServices():

EfiBootServicesData -- Memory available for general use.

EfiACPIReclaimMemory -- This memory is to be preserved by the loader
and OS until ACPI is enabled. Once ACPI is
enabled, the memory in this range is available
for general use.

I thought that anything referenced by a pointer in any ACPI table was
EfiACPIReclaimMemory or stricter. Specifically, the RSDT or XSDT points
to BGRT, so BGRT is EfiACPIReclaimMemory. BGRT points to the image data
(with its Image Address field), hence the image data should be
EfiACPIReclaimMemory too.

Otherwise, the pointer (BGRT.ImageAddress) can outlive the pointed-to
storage (the image data).

The image data sounds to me like a textbook example for
EfiACPIReclaimMemory. This way the kernel could free Boot Services Data
early, perform the crash kernel reservation right after, and safely
access BGRT whenever the ACPI subsystem is brought up later.


The edk2 commit that flipped the memory type underneath the image data
from EfiReservedMemoryType to EfiBootServicesData is:

https://github.com/tianocore/edk2/commit/4c58575e

I think this commit is wrong. It's fine for OSPM to release the image
data at some point, but not right after ExitBootServices(), because
referencing pointers in ACPI tables survive strictly longer.

... Actually, the commit does follow the ACPI spec 5.0:

5.2.22.4 Image Address

The Image Address contains the location in memory where an
in-memory copy of the boot image can be found. The image should be
stored in EfiBootServicesData, allowing the system to reclaim
the memory when the image is no longer needed.

The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I
take the current wording ("should be stored") as a recommendation only.)

If that's in fact a recommendation (and not a hard requirement), then it
should be easy to change BgrtAllocateBsDataMemoryBelow4G() again.

Thanks,
Laszlo
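
The lifetime argument above can be written down as a small check. This is a sketch only: the `Phase` enum, the `REUSABLE_FROM` table, and both helper functions are hypothetical names, and the table encodes nothing more than the "Memory Type Usage after ExitBootServices()" wording quoted above plus the assumption that EfiReservedMemoryType is never reused by the OS.

```python
from enum import IntEnum


class Phase(IntEnum):
    BOOT_SERVICES = 0             # before ExitBootServices()
    AFTER_EXIT_BOOT_SERVICES = 1  # OS running, ACPI not yet enabled
    AFTER_ACPI_ENABLED = 2


# Earliest phase at which the OS may reuse memory of each type;
# None means the OS may never reuse it.
REUSABLE_FROM = {
    "EfiBootServicesData": Phase.AFTER_EXIT_BOOT_SERVICES,
    "EfiACPIReclaimMemory": Phase.AFTER_ACPI_ENABLED,
    "EfiReservedMemoryType": None,
}


def storage_valid_at(memory_type, phase):
    """True if contents of memory_type are still guaranteed at phase."""
    reusable = REUSABLE_FROM[memory_type]
    return reusable is None or phase < reusable


def pointer_can_dangle(table_type, target_type):
    """True if there is a phase in which the table holding the pointer is
    still guaranteed while the pointed-to storage may have been reused."""
    return any(storage_valid_at(table_type, p)
               and not storage_valid_at(target_type, p)
               for p in Phase)


# BGRT lives in ACPI reclaim memory; vary where the image data is stored.
dangles_bs_data = pointer_can_dangle("EfiACPIReclaimMemory", "EfiBootServicesData")
dangles_reclaim = pointer_can_dangle("EfiACPIReclaimMemory", "EfiACPIReclaimMemory")
dangles_reserved = pointer_can_dangle("EfiACPIReclaimMemory", "EfiReservedMemoryType")
```

Under these assumptions only the EfiBootServicesData placement leaves a window, between ExitBootServices() and ACPI being enabled, in which BGRT.ImageAddress may point at storage that is no longer guaranteed.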

2013-09-16 15:58:22

by Josh Triplett

Subject: Re: [edk2] Corrupted EFI region

On Mon, Sep 16, 2013 at 01:50:46PM +0200, Laszlo Ersek wrote:
> On 09/16/13 12:59, Matt Fleming wrote:
> > On Fri, 13 Sep, at 02:38:12PM, [email protected] wrote:
> >> Matt,
> >>
> >> We have hit an issue on our new platform in development related to the
> >> call of efi_reserve_boot_services() from setup_arch().
> >>
> >> The reservation can interfere with allocation of the crash kernel.
> >
> > Jerry, thanks for bringing this up.
> >
> >> In pre 3.9(?) kernels, the crash kernel is required to be allocated from
> >> physically contiguous memory below 896 MB.
> >>
> >> Our new platforms are large in both the amount of memory and the amount
> >> of IO. This requires large crash kernels for kdump to work. This is even
> >> after the work done for makedumpfile v 1.5 to allow it to work with a
> >> smaller footprint.
> >>
> >>
> >> One of the problems is that drivers will allocate memory as boot code and/or
> >> data in the region < 896 MB, which effectively fragments this memory.
> >> With the reservation, we can't reuse the memory when needed for the
> >> crash kernels. If we remove the reservation and allow the kernel
> >> to reuse the memory, the reservation of the crash kernel succeeds.
> >>
> >> This is definitely a problem for distros that are pre 3.9. Probably less
> >> so for top of tree, but I haven't been focused there.
> >>
> >> So we are definitely interested in finding a mechanism to not
> >> do this reservation on platforms that don't have the issues described
> >> earlier in this thread.
> >
> > OK, in an ideal world we'd move the crash kernel reservation after
> > efi_free_boot_services(), because at that point the boot regions are
> > available again. But it seems that we reserve the boot regions really
> > early during startup and release them relatively late. The reason is
> > that the Boot Graphics Resource Table (BGRT) data, if present, is
> > located in the Boot Services Data regions but we can't extract the
> > address of the region from the ACPI tables until we've setup the ACPI
> > subsystem, which happens quite late.
>
> Why is BGRT allocated as Boot Services Data?
>
> In file
> "MdeModulePkg/Universal/Acpi/BootGraphicsResourceTableDxe/BootGraphicsResourceTableDxe.c":
>
> InstallBootGraphicsResourceTable()
> BgrtAllocateBsDataMemoryBelow4G()
> gBS->AllocatePages(... EfiBootServicesData ...)
>
> From Table 25. Memory Type Usage before ExitBootServices():
>
> EfiBootServicesData -- The data portions of a loaded Boot Services
> Driver, and the default data allocation type
> used by a Boot Services Driver to allocate
> pool memory.
>
> EfiACPIReclaimMemory -- Memory that holds the ACPI tables.
>
> From Table 26. Memory Type Usage after ExitBootServices():
>
> EfiBootServicesData -- Memory available for general use.
>
> EfiACPIReclaimMemory -- This memory is to be preserved by the loader
> and OS until ACPI is enabled. Once ACPI is
> enabled, the memory in this range is available
> for general use.
>
> I thought that anything referenced by a pointer in any ACPI table was
> EfiACPIReclaimMemory or stricter. Specifically, the RSDT or XSDT points
> to BGRT, so BGRT is EfiACPIReclaimMemory. BGRT points to the image data
> (with its Image Address field), hence the image data should be
> EfiACPIReclaimMemory too.
>
> Otherwise, the pointer (BGRT.ImageAddress) can outlive the pointed-to
> storage (the image data).
>
> The image data sounds to me like a textbook example for
> EfiACPIReclaimMemory. This way the kernel could free Boot Services Data
> early, perform the crash kernel reservation right after, and safely
> access BGRT whenever the ACPI subsystem is brought up later.
>
>
> The edk2 commit that flipped the memory type underneath the image data
> from EfiReservedMemoryType to EfiBootServicesData is:
>
> https://github.com/tianocore/edk2/commit/4c58575e
>
> I think this commit is wrong. It's fine for OSPM to release the image
> data at some point, but not right after ExitBootServices(), because
> referencing pointers in ACPI tables survive strictly longer.
>
> ... Actually, the commit does follow the ACPI spec 5.0:
>
> 5.2.22.4 Image Address
>
> The Image Address contains the location in memory where an
> in-memory copy of the boot image can be found. The image should be
> stored in EfiBootServicesData, allowing the system to reclaim
> the memory when the image is no longer needed.
>
> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I
> take the current wording ("should be stored") as a recommendation only.)

I agree that UEFI *should* store the BGRT image in EfiACPIReclaimMemory, but
in practice the UEFI firmware I've seen with a BGRT follows the ACPI spec's
recommendation and stores it in EfiBootServicesData. So, even if the
recommendation in the spec changed, the kernel would still have to
accommodate both possibilities.

- Josh Triplett

2013-09-16 16:23:57

by Laszlo Ersek

Subject: Re: [edk2] Corrupted EFI region

On 09/16/13 17:57, Josh Triplett wrote:

>> The edk2 commit that flipped the memory type underneath the image data
>> from EfiReservedMemoryType to EfiBootServicesData is:
>>
>> https://github.com/tianocore/edk2/commit/4c58575e
>>
>> I think this commit is wrong. It's fine for OSPM to release the image
>> data at some point, but not right after ExitBootServices(), because
>> referencing pointers in ACPI tables survive strictly longer.
>>
>> ... Actually, the commit does follow the ACPI spec 5.0:
>>
>> 5.2.22.4 Image Address
>>
>> The Image Address contains the location in memory where an
>> in-memory copy of the boot image can be found. The image should be
>> stored in EfiBootServicesData, allowing the system to reclaim
>> the memory when the image is no longer needed.
>>
>> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I
>> take the current wording ("should be stored") as a recommendation only.)
>
> I agree that UEFI *should* store the BGRT in EfiACPIReclaimMemory, but
> in practice the UEFI firmware I've seen with a BGRT does follow that
> recommendation and store it in EfiBootServicesData. So, even if the
> recommendation in the spec changed, the kernel would still have to
> accommodate both possibilities.

Just for the theoretical debate:

The edk2 commit linked above is 5 days old. All UEFI firmware in the
wild (on released hardware) should be using EfiReservedMemoryType (the
pre-patch memory type), which is even stricter.

EfiReservedMemoryType can never be released & repurposed, so it should
make no difference for crash kernel allocation, shouldn't it?

- call efi_free_boot_services() -- doesn't touch the image data (which
is in RAM of EfiReservedMemoryType),
- reserve crash kernel,
- access BGRT via ACPI.

BGRT had appeared in edk2 with

https://github.com/tianocore/edk2/commit/0284e90c

and EfiReservedMemoryType used to be the allocation type until commit
4c58575e.

Or are you alluding to UEFI firmware that's not based on TianoCore?

Laszlo

2013-09-16 16:28:07

by Matthew Garrett

Subject: Re: [edk2] Corrupted EFI region

On Mon, Sep 16, 2013 at 06:25:22PM +0200, Laszlo Ersek wrote:

> Or are you alluding to UEFI firmware that's not based on TianoCore?

Most BGRT implementations are IBV specific rather than coming from
Tiano. The ACPI spec says that the image should be stored in
EfiBootServicesData, and most implementations follow that.

--
Matthew Garrett | [email protected]

2013-09-16 16:30:11

by Josh Triplett

Subject: Re: [edk2] Corrupted EFI region

On Mon, Sep 16, 2013 at 06:25:22PM +0200, Laszlo Ersek wrote:
> On 09/16/13 17:57, Josh Triplett wrote:
>
> >> The edk2 commit that flipped the memory type underneath the image data
> >> from EfiReservedMemoryType to EfiBootServicesData is:
> >>
> >> https://github.com/tianocore/edk2/commit/4c58575e
> >>
> >> I think this commit is wrong. It's fine for OSPM to release the image
> >> data at some point, but not right after ExitBootServices(), because
> >> referencing pointers in ACPI tables survive strictly longer.
> >>
> >> ... Actually, the commit does follow the ACPI spec 5.0:
> >>
> >> 5.2.22.4 Image Address
> >>
> >> The Image Address contains the location in memory where an
> >> in-memory copy of the boot image can be found. The image should be
> >> stored in EfiBootServicesData, allowing the system to reclaim
> >> the memory when the image is no longer needed.
> >>
> >> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I
> >> take the current wording ("should be stored") as a recommendation only.)
> >
> > I agree that UEFI *should* store the BGRT in EfiACPIReclaimMemory, but
> > in practice the UEFI firmware I've seen with a BGRT does follow that
> > recommendation and store it in EfiBootServicesData. So, even if the
> > recommendation in the spec changed, the kernel would still have to
> > accommodate both possibilities.
>
> Just for the theoretical debate:
>
> The edk2 commit linked above is 5 days old. All UEFI firmware in the
> wild (on released hardware) should be using EfiReservedMemoryType (the
> pre-patch memory type), which is even stricter.
>
> EfiReservedMemoryType can never be released & repurposed, so it should
> make no difference for crash kernel allocation, shouldn't it?
>
> - call efi_free_boot_services() -- doesn't touch the image data (which
> is in RAM of EfiReservedMemoryType),
> - reserve crash kernel,
> - access BGRT via ACPI.
>
> BGRT had appeared in edk2 with
>
> https://github.com/tianocore/edk2/commit/0284e90c
>
> and EfiReservedMemoryType used to be the allocation type until commit
> 4c58575e.
>
> Or are you alluding to UEFI firmware that's not based on TianoCore?

I'm saying, in practice, that the systems I tested BGRT support on and
submitted patches for stored the BGRT's image in EfiBootServicesData.

- Josh Triplett

2013-09-18 19:24:35

by Jerry Hoemann

Subject: Re: [edk2] Corrupted EFI region

On Mon, Sep 16, 2013 at 11:59:20AM +0100, Matt Fleming wrote:
> On Fri, 13 Sep, at 02:38:12PM, [email protected] wrote:
> > Matt,
> >
> > We have hit an issue on our new platform in development related to the
> > call of efi_reserve_boot_services() from setup_arch().
> >
> > The reservation can interfere with allocation of the crash kernel.
>
> Jerry, thanks for bringing this up.
>
> > In pre 3.9(?) kernels, the crash kernel is required to be allocated from
> > physically contiguous memory below 896 MB.
> >
> > Our new platforms are large in both the amount of memory and the amount
> > of IO. This requires large crash kernels for kdump to work. This is even
> > after the work done for makedumpfile v 1.5 to allow it to work with a
> > smaller footprint.
> >
> >
> > One of the problems is that drivers will allocate memory as boot code and/or
> > data in the region < 896 MB, which effectively fragments this memory.
> > With the reservation, we can't reuse the memory when needed for the
> > crash kernels. If we remove the reservation and allow the kernel
> > to reuse the memory, the reservation of the crash kernel succeeds.
> >
> > This is definitely a problem for distros that are pre 3.9. Probably less
> > so for top of tree, but I haven't been focused there.
> >
> > So we are definitely interested in finding a mechanism to not
> > do this reservation on platforms that don't have the issues described
> > earlier in this thread.
>
> OK, in an ideal world we'd move the crash kernel reservation after
> efi_free_boot_services(), because at that point the boot regions are
> available again. But it seems that we reserve the boot regions really
> early during startup and release them relatively late. The reason is
> that the Boot Graphics Resource Table (BGRT) data, if present, is
> located in the Boot Services Data regions but we can't extract the
> address of the region from the ACPI tables until we've setup the ACPI
> subsystem, which happens quite late.
>
> I wonder whether performing the reservation of the crash kernel memory
> first, before efi_reserve_boot_services(), would help. That way we'd
> only need to reserve remaining regions in efi_reserve_boot_services().
> This scheme would rely on nothing writing into the crash kernel area
> before we've extracted the BGRT data, however.
>
> --
> Matt Fleming, Intel Open Source Technology Center


Matt,

I conducted the following experiments on a 3.11 kernel:

1) Moved the call of reserve_crashkernel to after efi_free_boot_services.
Booted with crashkernel=512M

a) when memory below 896M was *not* fragmented by BootCode segments
reserve_crashkernel succeeded.

b) when memory below 896M *was* fragmented by BootCode segments
reserve_crashkernel failed.

2) Moved the call to reserve_crashkernel to before call to efi_reserve_boot_services.
Booted with crashkernel=512M

reserve_crashkernel succeeded irrespective of whether the memory below 896M was
fragmented by BootCode segments.


I haven't determined why reserve_crashkernel failed in 1b) above.

I don't see the memory reserved for the crash kernel being accessed
before call to efi_free_boot_services.

CC'ing kexec list for their input as I may have missed something.


Jerry


--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: [email protected]
----------------------------------------------------------------------------

2013-09-20 09:06:13

by Matt Fleming

Subject: Re: [edk2] Corrupted EFI region

On Wed, 18 Sep, at 01:24:14PM, [email protected] wrote:
> Matt,
>
> I conducted the following experiments on a 3.11 kernel:

Jerry, could you paste your memory map from the kernel log?

--
Matt Fleming, Intel Open Source Technology Center