2012-05-15 12:17:53

by Jan Beulich

[permalink] [raw]
Subject: [PATCH] x86-64: use EFI to deal with platform wall clock

Other than ix86, x86-64 on EFI so far didn't set the {g,s}et_wallclock
accessors to the EFI routines, thus incorrectly using raw RTC accesses
instead.

Simply removing the #ifdef around the respective code isn't enough,
however: The runtime code must not only be forced to be executable, it
also must have a proper virtual address set (which on at least the
system I'm testing on isn't the case for all necessary regions, or at
least not as early as they're now being required).

The earlier calling of efi_set_executable() in turn require the CPA
code to cope, i.e. during early boot it must be avoided to call
cpa_flush_array(), as the first thing this function does is a
BUG_ON(irqs_disabled()).

Also make the two EFI functions in question here static - they're not
being referenced elsewhere.

Signed-off-by: Jan Beulich <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Matthew Garrett <[email protected]>

---
arch/x86/mm/pageattr.c | 10 ++++++----
arch/x86/platform/efi/efi.c | 6 ++----
arch/x86/platform/efi/efi_64.c | 10 +++++++++-
include/linux/efi.h | 2 --
4 files changed, 17 insertions(+), 11 deletions(-)

--- 3.4-rc7/arch/x86/mm/pageattr.c
+++ 3.4-rc7-x86_64-EFI-time/arch/x86/mm/pageattr.c
@@ -919,11 +919,13 @@ static int change_page_attr_set_clr(unsi

/*
* On success we use clflush, when the CPU supports it to
- * avoid the wbindv. If the CPU does not support it and in the
- * error case we fall back to cpa_flush_all (which uses
- * wbindv):
+ * avoid the wbindv. If the CPU does not support it, in the
+ * error case, and during early boot (for EFI) we fall back
+ * to cpa_flush_all (which uses wbinvd):
*/
- if (!ret && cpu_has_clflush) {
+ if (early_boot_irqs_disabled)
+ __cpa_flush_all((void *)(long)cache);
+ else if (!ret && cpu_has_clflush) {
if (cpa.flags & (CPA_PAGES_ARRAY | CPA_ARRAY)) {
cpa_flush_array(addr, numpages, cache,
cpa.flags, pages);
--- 3.4-rc7/arch/x86/platform/efi/efi.c
+++ 3.4-rc7-x86_64-EFI-time/arch/x86/platform/efi/efi.c
@@ -249,7 +249,7 @@ static efi_status_t __init phys_efi_get_
return status;
}

-int efi_set_rtc_mmss(unsigned long nowtime)
+static int efi_set_rtc_mmss(unsigned long nowtime)
{
int real_seconds, real_minutes;
efi_status_t status;
@@ -278,7 +278,7 @@ int efi_set_rtc_mmss(unsigned long nowti
return 0;
}

-unsigned long efi_get_time(void)
+static unsigned long efi_get_time(void)
{
efi_status_t status;
efi_time_t eft;
@@ -720,12 +720,10 @@ void __init efi_init(void)
efi_enabled = 0;
return;
}
-#ifdef CONFIG_X86_32
if (efi_native) {
x86_platform.get_wallclock = efi_get_time;
x86_platform.set_wallclock = efi_set_rtc_mmss;
}
-#endif

#if EFI_DEBUG
print_efi_memmap();
--- 3.4-rc7/arch/x86/platform/efi/efi_64.c
+++ 3.4-rc7-x86_64-EFI-time/arch/x86/platform/efi/efi_64.c
@@ -53,8 +53,16 @@ static void __init early_code_mapping_se
for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
md = p;
if (md->type == EFI_RUNTIME_SERVICES_CODE ||
- md->type == EFI_BOOT_SERVICES_CODE)
+ md->type == EFI_BOOT_SERVICES_CODE) {
+ if (!md->virt_addr) {
+ /*
+ * We have to hope that the mappings set up so
+ * far cover all that's needed for the call.
+ */
+ md->virt_addr = (u64)__va(md->phys_addr);
+ }
efi_set_executable(md, executable);
+ }
}
}

--- 3.4-rc7/include/linux/efi.h
+++ 3.4-rc7-x86_64-EFI-time/include/linux/efi.h
@@ -503,8 +503,6 @@ extern u64 efi_mem_attribute (unsigned l
extern int __init efi_uart_console_only (void);
extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern unsigned long efi_get_time(void);
-extern int efi_set_rtc_mmss(unsigned long nowtime);
extern void efi_reserve_boot_services(void);
extern struct efi_memory_map memmap;




2012-05-15 12:47:16

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

On Tue, May 15, 2012 at 01:18:19PM +0100, Jan Beulich wrote:

> Simply removing the #ifdef around the respective code isn't enough,
> however: The runtime code must not only be forced to be executable, it
> also must have a proper virtual address set (which on at least the
> system I'm testing on isn't the case for all necessary regions, or at
> least not as early as they're now being required).

I don't understand this. The get_time pointer won't be updated to the
virtual function until the end of efi_enter_virtual_mode, at which point
all runtime regions should have a virtual address mapped. We also call
runtime_code_page_mkexec() immediately after updating that pointer,
although maybe the order should be swapped. So I think the bug you're
fixing is not the bug you think you're fixing...

--
Matthew Garrett | [email protected]

2012-05-15 13:18:40

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

>>> On 15.05.12 at 14:47, Matthew Garrett <[email protected]> wrote:
> On Tue, May 15, 2012 at 01:18:19PM +0100, Jan Beulich wrote:
>
>> Simply removing the #ifdef around the respective code isn't enough,
>> however: The runtime code must not only be forced to be executable, it
>> also must have a proper virtual address set (which on at least the
>> system I'm testing on isn't the case for all necessary regions, or at
>> least not as early as they're now being required).
>
> I don't understand this. The get_time pointer won't be updated to the
> virtual function until the end of efi_enter_virtual_mode, at which point
> all runtime regions should have a virtual address mapped. We also call
> runtime_code_page_mkexec() immediately after updating that pointer,
> although maybe the order should be swapped. So I think the bug you're
> fixing is not the bug you think you're fixing...

I would have expected that things work that way, but they
don't. In particular is the function in efi_64.c that's being
modified here called from efi_call_phys_{pro,epi}log(), and at
that point we can't expect virtual addresses to be uniformly
set yet. So it's a physical call that requires the fixup done
here, as efi_set_executable() simply expects ->virt_addr to
be valid. I suspect that no physical calls other than
phys_efi_set_virtual_address_map() were being done so far
at all on 64-bit, hiding the problem.

Jan

2012-05-15 13:20:16

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

On Tue, May 15, 2012 at 02:19:07PM +0100, Jan Beulich wrote:

> I would have expected that things work that way, but they
> don't. In particular is the function in efi_64.c that's being
> modified here called from efi_call_phys_{pro,epi}log(), and at
> that point we can't expect virtual addresses to be uniformly
> set yet. So it's a physical call that requires the fixup done
> here, as efi_set_executable() simply expects ->virt_addr to
> be valid. I suspect that no physical calls other than
> phys_efi_set_virtual_address_map() were being done so far
> at all on 64-bit, hiding the problem.

In that case we need to split the mapping code into two chunks and
configure the memory map earlier. You can't depend on __va() doing
anything useful on runtime addresses.

--
Matthew Garrett | [email protected]

2012-05-16 12:18:00

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

>>> On 15.05.12 at 15:20, Matthew Garrett <[email protected]> wrote:
> On Tue, May 15, 2012 at 02:19:07PM +0100, Jan Beulich wrote:
>
>> I would have expected that things work that way, but they
>> don't. In particular is the function in efi_64.c that's being
>> modified here called from efi_call_phys_{pro,epi}log(), and at
>> that point we can't expect virtual addresses to be uniformly
>> set yet. So it's a physical call that requires the fixup done
>> here, as efi_set_executable() simply expects ->virt_addr to
>> be valid. I suspect that no physical calls other than
>> phys_efi_set_virtual_address_map() were being done so far
>> at all on 64-bit, hiding the problem.
>
> In that case we need to split the mapping code into two chunks and
> configure the memory map earlier. You can't depend on __va() doing
> anything useful on runtime addresses.

Okay, looks like calling efi_ioremap() at this point is possible.
But why is efi_enter_virtual_mode() being called as late as is
the case currently for x86 anyway?

And then again the current logic in efi_enter_virtual_mode()
looks flawed (it assumes two contiguous pieces of direct
mappings, and while on systems with dis-contiguous physical
memory this currently appears to be true, it's not correct - the
holes could have MMIO assignments in them - and hence
shouldn't be relied upon), and I wouldn't want to copy this
elsewhere.

Similarly for efi_ioremap() itself - it neither honors the
cacheability requested by the firmware (calling
set_memory_uc() subsequently isn't sufficient, as the already
established mappings may be used in prefetches already),
nor does its self-recursion look very reliable (why would
init_memory_mapping() do any better on a second call,
and the code doesn't deal with last_map_pfn pointing below
or precisely at phys_addr at all).

Plus the use of set_virtual_address_map is bogus in the first
place, as it makes it impossible for a kexec-ed kernel to also
use EFI services (as it would have to call the function a
second and possibly third time, yet it is not permitted to be
called more than once). Imo all calls have to happen in
physical mode.

So I'm afraid if the patch as I provided it isn't acceptable, and
if the call to efi_enter_virtual_mode() can't be moved ahead
of the one to timekeeping_init(), this winds down to the whole
logic needing a re-write.

Jan

2012-05-16 12:39:31

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

On Wed, May 16, 2012 at 01:18:28PM +0100, Jan Beulich wrote:

> Okay, looks like calling efi_ioremap() at this point is possible.
> But why is efi_enter_virtual_mode() being called as late as is
> the case currently for x86 anyway?

Assuming that things are as they are for a good reason is not
necessarily true in the EFI code...

> And then again the current logic in efi_enter_virtual_mode()
> looks flawed (it assumes two contiguous pieces of direct
> mappings, and while on systems with dis-contiguous physical
> memory this currently appears to be true, it's not correct - the
> holes could have MMIO assignments in them - and hence
> shouldn't be relied upon), and I wouldn't want to copy this
> elsewhere.

Could you elaborate on that a little?

> Plus the use of set_virtual_address_map is bogus in the first
> place, as it makes it impossible for a kexec-ed kernel to also
> use EFI services (as it would have to call the function a
> second and possibly third time, yet it is not permitted to be
> called more than once). Imo all calls have to happen in
> physical mode.

Platforms don't correctly deal with the case where you make physical
calls after ExitBootServices(). We tried running in physical mode. It
simply doesn't work.

> So I'm afraid if the patch as I provided it isn't acceptable, and
> if the call to efi_enter_virtual_mode() can't be moved ahead
> of the one to timekeeping_init(), this winds down to the whole
> logic needing a re-write.

I have zero objection to this being cleaned up, but I don't know of any
obvious reason why we can't do enter_virtual_mode() earlier.

--
Matthew Garrett | [email protected]

2012-05-16 12:59:32

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

>>> On 16.05.12 at 14:39, Matthew Garrett <[email protected]> wrote:
> On Wed, May 16, 2012 at 01:18:28PM +0100, Jan Beulich wrote:
>
>> Okay, looks like calling efi_ioremap() at this point is possible.
>> But why is efi_enter_virtual_mode() being called as late as is
>> the case currently for x86 anyway?
>
> Assuming that things are as they are for a good reason is not
> necessarily true in the EFI code...

Okay...

>> And then again the current logic in efi_enter_virtual_mode()
>> looks flawed (it assumes two contiguous pieces of direct
>> mappings, and while on systems with dis-contiguous physical
>> memory this currently appears to be true, it's not correct - the
>> holes could have MMIO assignments in them - and hence
>> shouldn't be relied upon), and I wouldn't want to copy this
>> elsewhere.
>
> Could you elaborate on that a little?

There are systems where RAM on individual nodes is always
starting at e.g. a 1Tb boundary. Obviously there can (at
least theoretically) be anything in between, and hence
assuming that __va() is usable here is simply wrong, as likely
no mapping was created at all for the hole space (or if there
is one, it would conflict with the one to be established here
in the EfiMemoryMappedIO case).

>> Plus the use of set_virtual_address_map is bogus in the first
>> place, as it makes it impossible for a kexec-ed kernel to also
>> use EFI services (as it would have to call the function a
>> second and possibly third time, yet it is not permitted to be
>> called more than once). Imo all calls have to happen in
>> physical mode.
>
> Platforms don't correctly deal with the case where you make physical
> calls after ExitBootServices(). We tried running in physical mode. It
> simply doesn't work.

Interesting, especially as we're using physical mode exclusively so
far in Xen. That's a platform issue though, and working around
it by (silently!) sacrificing other functionality is questionable imo.
It should at best be an option (default off), so that on systems
where physical mode works, kexec can work too.

>> So I'm afraid if the patch as I provided it isn't acceptable, and
>> if the call to efi_enter_virtual_mode() can't be moved ahead
>> of the one to timekeeping_init(), this winds down to the whole
>> logic needing a re-write.
>
> I have zero objection to this being cleaned up, but I don't know of any
> obvious reason why we can't do enter_virtual_mode() earlier.

I shall give that a try then. It would imply that we don't need
phys_efi_get_time() then anymore. I don't have a 32-bit EFI box
though to test that things don't break there...

Jan

2012-05-16 13:07:14

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

On Wed, May 16, 2012 at 01:59:57PM +0100, Jan Beulich wrote:
> >>> On 16.05.12 at 14:39, Matthew Garrett <[email protected]> wrote:
> > Could you elaborate on that a little?
>
> There are systems where RAM on individual nodes is always
> starting at e.g. a 1Tb boundary. Obviously there can (at
> least theoretically) be anything in between, and hence
> assuming that __va() is usable here is simply wrong, as likely
> no mapping was created at all for the hole space (or if there
> is one, it would conflict with the one to be established here
> in the EfiMemoryMappedIO case).

Ok, that does sound like it needs fixing.

> > Platforms don't correctly deal with the case where you make physical
> > calls after ExitBootServices(). We tried running in physical mode. It
> > simply doesn't work.
>
> Interesting, especially as we're using physical mode exclusively so
> far in Xen. That's a platform issue though, and working around
> it by (silently!) sacrificing other functionality is questionable imo.
> It should at best be an option (default off), so that on systems
> where physical mode works, kexec can work too.

There was a patchset posted that provided that option. We experimented
with it in RHEL for a while and found that physical mode simply isn't
reliable - no other OS uses it, so it's entirely untested on most
platforms.

--
Matthew Garrett | [email protected]

2012-05-17 08:31:51

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

On Wed, 2012-05-16 at 13:59 +0100, Jan Beulich wrote:
> I shall give that a try then. It would imply that we don't need
> phys_efi_get_time() then anymore. I don't have a 32-bit EFI box
> though to test that things don't break there...

I can test out 32-bit EFI, or at least find someone with access to such
a system.

2012-05-25 15:00:07

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86-64: use EFI to deal with platform wall clock

>>> On 16.05.12 at 14:39, Matthew Garrett <[email protected]> wrote:
> On Wed, May 16, 2012 at 01:18:28PM +0100, Jan Beulich wrote:
>> So I'm afraid if the patch as I provided it isn't acceptable, and
>> if the call to efi_enter_virtual_mode() can't be moved ahead
>> of the one to timekeeping_init(), this winds down to the whole
>> logic needing a re-write.
>
> I have zero objection to this being cleaned up, but I don't know of any
> obvious reason why we can't do enter_virtual_mode() earlier.

So this indeed works (moving enter_virual_mode() invocation
to the end of mm_init(), which seems to most logical place to
me) on the single system I can try this out on. I'll submit an
updated patch soon.

Jan