2007-05-02 20:03:43

by chandramouli narayanan

Subject: [PATCH 2.6.21 1/3] x86_64: EFI64 support

General note on EFI x86_64 support
----------------------------------

The following set of patches implements EFI x86_64 Linux kernel support.
References to EFI and UEFI (Unified Extensible Firmware Interface) are used
interchangeably in the text below.

The UEFI specification can be found here: http://www.uefi.org

UEFI x86_64 support is implemented in the following 3 kernel patches.
For booting the UEFI x86_64 enabled kernel, bootloader support is required.
ELILO with x86_64 support has been submitted to the ELILO project.
Please visit the ELILO source link for boot loader source and instructions.

Testing
-------
The x86_64 UEFI kernel patches below have been applied against 2.6.21
and tested with the x86_64 ELILO bootloader on Intel platforms with EFI 1.10 and
UEFI 2.0 firmware. The firmware I tested also included a CSM (Compatibility
Support Module) for legacy operating systems.

That said, the patches need wider testing on systems with EFI64
firmware support.

Mechanics:
----------

- Create a VFAT partition on the disk
- Copy the following to the VFAT partition:
  - elilo bootloader with x86_64 support and elilo configuration file
  - efi64 kernel image, initrd
- Boot to EFI shell and invoke elilo choosing efi64 kernel image
- On UEFI 2.0 firmware systems, pass vga=fbcon for boot messages to appear
  on console.

Issues noted:
-------------
1. With the patches applied to the 2.6.21rc5-git3 kernel, the KDE desktop was
not repainted when the system woke from sleep. This happened when the screen
lock was in effect due to inactivity and the system then went to sleep.
A key press woke the system, but KMenu did not show up and mouse clicks on
the desktop had no effect, although windows that were open before sleep
reappeared, keyboard input was accepted and the system was otherwise
operational. Taking the system down to 'init 3' and back to 'init 5' restored
the desktop and mouse control.

This behavior did not manifest with the patches applied against 2.6.21rc7-git2
and the final 2.6.21 release.

2. With CALGARY_IOMMU=y in the kernel configuration, the Calgary detection fails
with the message "Calgary: Unable to locate Rio Grande Table in EBDA - bailing".
However, the legacy kernel has no such error.

3. On some x86_64 systems with EFI 1.10 firmware, early boot messages
did not appear on the console. However, I didn't encounter this behavior on
x86_64 systems with UEFI 2.0 firmware. This does not appear to be a kernel issue
but rather a firmware/bootloader issue. I will post an update on the status
of this issue as I learn more.

EFI x86_64 support Patch 1 of 3
-------------------------------
The main change in this patch is the addition of efi.c for x86_64.
This file is modelled on the IA32 EFI implementation. Some x86_64 specifics are
worth noting here. EFI initialization and EFI service mapping are
implemented in efi.c. The NX bit is cleared for the EFI runtime services code
area so that EFI runtime code is executable. On x86_64, parameters passed to
UEFI firmware services need to follow the UEFI calling convention.
For this purpose, uefi_call_wrapper() is implemented; EFI calls go through
this wrapper before reaching the firmware service. An efi_stub.S, which makes
EFI calls in physical mode with interrupts turned off, is added for x86_64.
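
As a rough illustration of the convention the wrapper has to satisfy, here is
a minimal user-space sketch. It is not code from this patch: the names
ms_arg_register(), ms_arg_stack_offset(), ms_call_frame_bytes() and
ms_describe_call() are hypothetical, and the layout follows the Microsoft x64
convention as generally documented (first four integer arguments in rcx, rdx,
r8, r9; 32 bytes of shadow space; further arguments on the stack). It does
not transcribe the stack arithmetic of uefi_call_wrapper() itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative constants; the names are hypothetical, not from the patch. */
enum {
	MS_REG_ARGS     = 4,	/* rcx, rdx, r8, r9 */
	MS_SHADOW_BYTES = 32,	/* home space the caller reserves */
	MS_SLOT_BYTES   = 8	/* each argument occupies one 8-byte slot */
};

static const char *const ms_arg_regs[MS_REG_ARGS] = {
	"rcx", "rdx", "r8", "r9"
};

/* Register holding argument n (0-based), or NULL if it goes on the stack. */
const char *ms_arg_register(unsigned int n)
{
	return n < MS_REG_ARGS ? ms_arg_regs[n] : NULL;
}

/*
 * Offset from %rsp, as seen just before the call instruction, of stack
 * argument n. Returns -1 for arguments that are passed in registers.
 */
long ms_arg_stack_offset(unsigned int n)
{
	if (n < MS_REG_ARGS)
		return -1;
	return MS_SHADOW_BYTES + (long)(n - MS_REG_ARGS) * MS_SLOT_BYTES;
}

/*
 * Bytes the caller reserves for a call with nargs arguments: shadow space
 * plus stack arguments, rounded up to 16. Assumes %rsp was 16-byte aligned
 * before the reservation.
 */
unsigned long ms_call_frame_bytes(unsigned int nargs)
{
	unsigned long bytes = MS_SHADOW_BYTES;

	if (nargs > MS_REG_ARGS)
		bytes += (unsigned long)(nargs - MS_REG_ARGS) * MS_SLOT_BYTES;
	return (bytes + 15UL) & ~15UL;
}

/* Print where each argument of an nargs-argument call would be placed. */
void ms_describe_call(unsigned int nargs)
{
	unsigned int n;

	for (n = 0; n < nargs; n++) {
		const char *reg = ms_arg_register(n);

		if (reg)
			printf("arg %u -> %%%s\n", n, reg);
		else
			printf("arg %u -> %ld(%%rsp)\n", n,
			       ms_arg_stack_offset(n));
	}
	printf("reserve %lu bytes\n", ms_call_frame_bytes(nargs));
}
```

For a five-argument service such as get_variable, ms_describe_call(5) would
list four register arguments and one stack slot. This is why a plain C call
compiled with the SysV convention (rdi, rsi, rdx, rcx, r8, r9) cannot be used
directly on the firmware entry points.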

The boot parameter setup file is updated for x86_64 EFI support. An x86_64 EFI
boot loader must conform to the EFI boot parameter offsets defined in the file
include/asm-x86_64/bootsetup.h (the x86_64 patches submitted to the ELILO
bootloader conform to this).
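
To make the offsets concrete, the following user-space sketch reads the EFI
fields out of a boot parameter buffer. The offsets are the ones added to
bootsetup.h by this patch; everything else is an assumption for illustration:
struct efi_boot_info, efi_parse_boot_params() and efi_nr_map() are
hypothetical names, the 4-byte width of the loader signature is inferred from
the gap between 0x1c0 and 0x1c4, and "unsigned long" is taken to be 8 bytes
as on x86_64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets as defined by the bootsetup.h hunk of this patch. */
#define EFI_SYSTAB_OFF          0x1b8
#define EFI_LOADER_SIG_OFF      0x1c0
#define EFI_MEMDESC_SIZE_OFF    0x1c4
#define EFI_MEMDESC_VERSION_OFF 0x1c8
#define EFI_MEMMAP_SIZE_OFF     0x1cc
#define EFI_MEMMAP_OFF          0x1d0

/* Hypothetical container; the kernel reads these fields through macros. */
struct efi_boot_info {
	uint64_t systab;		/* physical address of the system table */
	unsigned char loader_sig[4];	/* width assumed from the offsets */
	uint32_t memdesc_size;
	uint32_t memdesc_version;
	uint32_t memmap_size;
	uint64_t memmap;		/* physical address of the memory map */
};

/* Copy the EFI fields out of a boot parameter block; -1 if it is too short. */
int efi_parse_boot_params(const unsigned char *param, size_t len,
			  struct efi_boot_info *out)
{
	if (len < EFI_MEMMAP_OFF + sizeof(out->memmap))
		return -1;

	/* memcpy avoids unaligned loads; values are in host byte order. */
	memcpy(&out->systab, param + EFI_SYSTAB_OFF, sizeof(out->systab));
	memcpy(out->loader_sig, param + EFI_LOADER_SIG_OFF,
	       sizeof(out->loader_sig));
	memcpy(&out->memdesc_size, param + EFI_MEMDESC_SIZE_OFF,
	       sizeof(out->memdesc_size));
	memcpy(&out->memdesc_version, param + EFI_MEMDESC_VERSION_OFF,
	       sizeof(out->memdesc_version));
	memcpy(&out->memmap_size, param + EFI_MEMMAP_SIZE_OFF,
	       sizeof(out->memmap_size));
	memcpy(&out->memmap, param + EFI_MEMMAP_OFF, sizeof(out->memmap));
	return 0;
}

/* Number of memory descriptors, computed the same way efi_init() does. */
uint32_t efi_nr_map(const struct efi_boot_info *info)
{
	if (info->memdesc_size == 0)
		return 0;
	return info->memmap_size / info->memdesc_size;
}
```

The descriptor count is derived from the loader-supplied descriptor size
rather than from sizeof(efi_memory_desc_t), which matches how efi_init()
fills in memmap.nr_map in the diff below.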

An EFI x86_64 build option is added to the kernel configuration.

Signed-off-by: Chandramouli Narayanan <[email protected]>

diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig
--- linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig 2007-04-19 12:39:39.000000000 -0700
+++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig 2007-04-19 13:01:02.000000000 -0700
@@ -254,6 +254,20 @@ config X86_HT
depends on SMP && !MK8
default y

+config EFI
+ bool "Boot from EFI support (EXPERIMENTAL)"
+ default n
+ ---help---
+
+ This enables the kernel to boot on EFI platforms using
+ system configuration information passed to it from the firmware.
+ This also enables the kernel to use any EFI runtime services that are
+ available (such as the EFI variable services).
+ This option is only useful on systems that have EFI firmware
+ and will result in a kernel image that is ~8k larger. However,
+ even with this option, the resultant kernel should continue to
+ boot on existing non-EFI platforms.
+
config MATH_EMULATION
bool

diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/drivers/char/Kconfig linux-2.6.21rc7-git2-uefi-finaltest/drivers/char/Kconfig
--- linux-2.6.21rc7-git2-orig/drivers/char/Kconfig 2007-04-19 12:39:39.000000000 -0700
+++ linux-2.6.21rc7-git2-uefi-finaltest/drivers/char/Kconfig 2007-04-19 13:01:02.000000000 -0700
@@ -837,7 +837,7 @@ config GEN_RTC_X

config EFI_RTC
bool "EFI Real Time Clock Services"
- depends on IA64
+ depends on IA64 || X86_64

config DS1302
tristate "DS1302 RTC support"
diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h
--- linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h 2007-04-19 12:39:40.000000000 -0700
+++ linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h 2007-04-19 13:01:02.000000000 -0700
@@ -17,6 +17,12 @@ extern char x86_boot_params[BOOT_PARAM_S
#define APM_BIOS_INFO (*(struct apm_bios_info *) (PARAM+0x40))
#define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
#define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
+#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
+#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
+#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
+#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
+#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
+#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
#define MOUNT_ROOT_RDONLY (*(unsigned short *) (PARAM+0x1F2))
#define RAMDISK_FLAGS (*(unsigned short *) (PARAM+0x1F8))
#define SAVED_VIDEO_MODE (*(unsigned short *) (PARAM+0x1FA))
diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c
--- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c 2007-04-19 13:01:02.000000000 -0700
@@ -0,0 +1,824 @@
+/*
+ * Extensible Firmware Interface
+ *
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond <[email protected]>
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ * David Mosberger-Tang <[email protected]>
+ * Stephane Eranian <[email protected]>
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu <[email protected]>
+ * Bibo Mao <[email protected]>
+ * Chandramouli Narayanan <[email protected]>
+ *
+ * All EFI Runtime Services are not implemented yet as EFI only
+ * supports physical mode addressing on SoftSDV. This is to be fixed
+ * in a future version. --drummond 1999-07-20
+ *
+ * Implemented EFI runtime services and virtual mode calls. --davidm
+ *
+ * Goutham Rao: <[email protected]>
+ * Skip non-WB memory and ignore empty memory ranges.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/efi.h>
+
+#include <asm/setup.h>
+#include <asm/bootsetup.h>
+#include <asm/io.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+#include <asm/proto.h>
+
+#define EFI_DEBUG 0
+#define PFX "EFI: "
+
+#define EFI_ARG_NUM_GET_TIME 2
+#define EFI_ARG_NUM_SET_TIME 1
+#define EFI_ARG_NUM_GET_WAKEUP_TIME 3
+#define EFI_ARG_NUM_SET_WAKEUP_TIME 2
+#define EFI_ARG_NUM_GET_VARIABLE 5
+#define EFI_ARG_NUM_GET_NEXT_VARIABLE 3
+#define EFI_ARG_NUM_SET_VARIABLE 5
+#define EFI_ARG_NUM_GET_NEXT_HIGH_MONO_COUNT 1
+#define EFI_ARG_NUM_RESET_SYSTEM 4
+#define EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP 4
+
+#define EFI_ARG_NUM_MAX 10
+#define EFI_REG_ARG_NUM 4
+
+extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
+struct efi efi;
+EXPORT_SYMBOL(efi);
+struct efi efi_phys __initdata;
+struct efi_memory_map memmap ;
+static efi_system_table_t efi_systab __initdata;
+
+static unsigned long efi_rt_eflags;
+static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
+static pgd_t save_pgd;
+
+/* Convert SysV calling convention to EFI x86_64 calling convention */
+
+static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
+{
+ va_list ap;
+ int i;
+ unsigned long args[EFI_ARG_NUM_MAX];
+ unsigned int arg_size,stack_adjust_size;
+ efi_status_t status;
+
+ if (va_num > EFI_ARG_NUM_MAX || va_num<0) {
+ return EFI_LOAD_ERROR;
+ }
+ if (va_num==0)
+ /* There is no need to convert arguments for void argument. */
+ __asm__ __volatile__("call *%0;ret;"::"r"(fp));
+
+ /* The EFI arguments are stored in an array. Later they are
+ * pushed onto the stack or loaded into registers according to the MS ABI.
+ */
+ va_start(ap, va_num);
+ for (i = 0; i < va_num; i++) {
+ args[i] = va_arg(ap, unsigned long);
+ }
+ va_end(ap);
+ arg_size = va_num*8;
+ stack_adjust_size = (va_num > EFI_REG_ARG_NUM? EFI_REG_ARG_NUM : va_num)*8;
+
+ /* Starting from here, assembly code makes sure all registers used are
+ * controlled by our code itself instead of by gcc.
+ */
+ /* Start converting SysV calling convention to MS calling convention. */
+ __asm__ __volatile__(
+ /* 0. Save preserved registers. EFI call may clobber them. */
+ " pushq %%rbp;pushq %%rbx;pushq %%r12;"
+ " pushq %%r13;pushq %%r14;pushq %%r15;"
+ /* 1. Push arguments passed by stack into stack. */
+ " mov %1, %%r12;"
+ " mov %3, %%r13;"
+ " mov %1, %%rax;"
+ " dec %%rax;"
+ " mov $8, %%bl;"
+ " mul %%bl;"
+ " add %%rax, %%r13;"
+ "lstack:"
+ " cmp $4, %%r12;"
+ " jle lregister;"
+ " pushq (%%r13);"
+ " sub $8, %%r13;"
+ " dec %%r12;"
+ " jmp lstack;"
+ /* 2. Move arguments passed by registers into registers.
+ * rdi->rcx, rsi->rdx, rdx->r8, rcx->r9.
+ */
+ "lregister:"
+ " mov %3, %%r14;"
+ " mov $0, %%r12;"
+ "lloadregister:"
+ " cmp %1, %%r12;"
+ " jge lcall;"
+ " mov (%%r14), %%rcx;"
+ " inc %%r12;"
+ " cmp %1, %%r12;"
+ " jge lcall;"
+ " mov 8(%%r14), %%rdx;"
+ " inc %%r12;"
+ " cmp %1, %%r12;"
+ " jge lcall;"
+ " mov 0x10(%%r14), %%r8;"
+ " inc %%r12;"
+ " cmp %1, %%r12;"
+ " jge lcall;"
+ " mov 0x18(%%r14), %%r9;"
+ /* 3. Save stack space for those register arguments. */
+ "lcall: "
+ " sub %2, %%rsp;"
+ /* 4. Save arg_size to r12 which is preserved in EFI call. */
+ " mov %4, %%r12;"
+ /* 5. Call EFI function. */
+ " call *%5;"
+ " mov %%rax, %0;"
+ /* 6. Restore stack space reserved for those register
+ * arguments.
+ */
+ " add %%r12, %%rsp;"
+ /* 7. Restore preserved registers. */
+ " popq %%r15;popq %%r14;popq %%r13;"
+ " popq %%r12;popq %%rbx;popq %%rbp;"
+ : "=r"(status)
+ :"r"((unsigned long)va_num),
+ "r"((unsigned long)stack_adjust_size),
+ "r"(args),
+ "r"((unsigned long)arg_size),
+ "r"(fp)
+ :"rsp","rbx","rax","r11","r12","r13","r14","rcx","rdx","r8","r9"
+ );
+ return status;
+}
+
+static efi_status_t _efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->get_time,
+ EFI_ARG_NUM_GET_TIME,
+ (u64)tm,
+ (u64)tc);
+}
+
+static efi_status_t _efi_set_time(efi_time_t *tm)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->set_time,
+ EFI_ARG_NUM_SET_TIME,
+ (u64)tm);
+}
+
+static efi_status_t
+_efi_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending,
+ efi_time_t *tm)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->get_wakeup_time,
+ EFI_ARG_NUM_GET_WAKEUP_TIME,
+ (u64)enabled,
+ (u64)pending,
+ (u64)tm);
+}
+
+static efi_status_t _efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->set_wakeup_time,
+ EFI_ARG_NUM_SET_WAKEUP_TIME,
+ (u64)enabled,
+ (u64)(tm));
+}
+
+static efi_status_t
+_efi_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 *attr,
+ unsigned long *data_size, void *data)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->get_variable,
+ EFI_ARG_NUM_GET_VARIABLE,
+ (u64)name,
+ (u64)vendor,
+ (u64)attr,
+ (u64)data_size,
+ (u64)data);
+}
+
+static efi_status_t
+_efi_get_next_variable(unsigned long *name_size, efi_char16_t *name,
+ efi_guid_t *vendor)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->get_next_variable,
+ EFI_ARG_NUM_GET_NEXT_VARIABLE,
+ (u64)name_size,
+ (u64)name,
+ (u64)vendor);
+}
+
+static efi_status_t _efi_set_variable(efi_char16_t *name, efi_guid_t *vendor,
+ u64 attr, u64 data_size,
+ void *data)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->set_variable,
+ EFI_ARG_NUM_SET_VARIABLE,
+ (u64)name,
+ (u64)vendor,
+ (u64)attr,
+ (u64)data_size,
+ (u64)data);
+}
+
+static efi_status_t _efi_get_next_high_mono_count(u32 *count)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->get_next_high_mono_count,
+ EFI_ARG_NUM_GET_NEXT_HIGH_MONO_COUNT,
+ (u64)count);
+}
+
+static efi_status_t _efi_reset_system(int reset_type, efi_status_t status,
+ unsigned long data_size, efi_char16_t *data)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->reset_system,
+ EFI_ARG_NUM_RESET_SYSTEM,
+ (u64)reset_type,
+ (u64)status,
+ (u64)data_size,
+ (u64)data);
+}
+
+static efi_status_t _efi_set_virtual_address_map(unsigned long memory_map_size,
+ unsigned long descriptor_size,
+ u32 descriptor_version,
+ efi_memory_desc_t *virtual_map)
+{
+ return uefi_call_wrapper((void*)efi.systab->runtime->set_virtual_address_map,
+ EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
+ (u64)memory_map_size,
+ (u64)descriptor_size,
+ (u64)descriptor_version,
+ (u64)virtual_map);
+}
+
+static void efi_call_phys_prelog(void)
+{
+ unsigned long vaddress;
+
+ spin_lock(&efi_rt_lock);
+ local_irq_save(efi_rt_eflags);
+
+ vaddress = (unsigned long)__va(0x0UL);
+ pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
+ set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
+
+ local_flush_tlb();
+}
+
+static void efi_call_phys_epilog(void)
+{
+ /*
+ * After the lock is released, the original page table is restored.
+ */
+ set_pgd(pgd_offset_k(0x0UL), save_pgd);
+ local_flush_tlb();
+ local_irq_restore(efi_rt_eflags);
+ spin_unlock(&efi_rt_lock);
+}
+
+static efi_status_t
+phys_efi_set_virtual_address_map(unsigned long memory_map_size,
+ unsigned long descriptor_size,
+ u32 descriptor_version,
+ efi_memory_desc_t *virtual_map)
+{
+ efi_status_t status;
+
+ efi_call_phys_prelog();
+ status = efi_call_phys(efi_phys.set_virtual_address_map,
+ EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
+ (unsigned long)memory_map_size,
+ (unsigned long)descriptor_size,
+ (unsigned long)descriptor_version,
+ (unsigned long)virtual_map);
+ efi_call_phys_epilog();
+ return status;
+}
+
+efi_status_t
+phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
+{
+
+ efi_status_t status;
+
+ efi_call_phys_prelog();
+ status = efi_call_phys(efi_phys.get_time,
+ EFI_ARG_NUM_GET_TIME,
+ (unsigned long)tm,
+ (unsigned long)tc);
+ efi_call_phys_epilog();
+ return status;
+}
+
+inline int efi_set_rtc_mmss(unsigned long nowtime)
+{
+ int real_seconds, real_minutes;
+ efi_status_t status;
+ efi_time_t eft;
+ efi_time_cap_t cap;
+
+ spin_lock(&efi_rt_lock);
+ status = efi.get_time(&eft, &cap);
+ spin_unlock(&efi_rt_lock);
+ if (status != EFI_SUCCESS) {
+ printk("Ooops: efitime: can't read time!\n");
+ return -1;
+ }
+
+ real_seconds = nowtime % 60;
+ real_minutes = nowtime / 60;
+ if (((abs(real_minutes - eft.minute) + 15)/30) & 1)
+ real_minutes += 30;
+ real_minutes %= 60;
+ eft.minute = real_minutes;
+ eft.second = real_seconds;
+
+ spin_lock(&efi_rt_lock);
+ status = efi.set_time(&eft);
+ spin_unlock(&efi_rt_lock);
+ if (status != EFI_SUCCESS) {
+ printk("Ooops: efitime: can't write time!\n");
+ return -1;
+ }
+ return 0;
+}
+/*
+ * This should only be used during kernel init and before runtime
+ * services have been remapped, therefore, we'll need to call in physical
+ * mode. Note, this call isn't used later, so mark it __init.
+ */
+/*
+ * This is used during kernel init before runtime
+ * services have been remapped and also during suspend, therefore,
+ * we'll need to call both in physical and virtual modes.
+ */
+inline unsigned long efi_get_time(void)
+{
+ efi_status_t status;
+ efi_time_t eft;
+ efi_time_cap_t cap;
+
+ if (efi.get_time) {
+ /* if we are in virtual mode use remapped function */
+ status = efi.get_time(&eft, &cap);
+ } else {
+ /* we are in physical mode */
+ status = phys_efi_get_time(&eft, &cap);
+ }
+ if (status != EFI_SUCCESS)
+ printk("Oops: efitime: can't read time status: 0x%lx\n",status);
+
+ return mktime(eft.year, eft.month, eft.day, eft.hour,
+ eft.minute, eft.second);
+}
+
+inline int is_available_memory(efi_memory_desc_t * md)
+{
+ if (!(md->attribute & EFI_MEMORY_WB))
+ return 0;
+
+ switch (md->type) {
+ case EFI_LOADER_CODE:
+ case EFI_LOADER_DATA:
+ case EFI_BOOT_SERVICES_CODE:
+ case EFI_BOOT_SERVICES_DATA:
+ case EFI_CONVENTIONAL_MEMORY:
+ return 1;
+ }
+ return 0;
+}
+
+/* Make EFI runtime code executable */
+static void
+phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
+{
+ int i = pmd_index(address);
+
+ for (; i < PTRS_PER_PMD && address < end; i++, address += PMD_SIZE) {
+ unsigned long entry;
+ pmd_t *pmd = pmd_page + pmd_index(address);
+
+ entry = pmd_val(*pmd);
+ entry &= ~_PAGE_NX;
+ set_pmd(pmd, __pmd(entry));
+ }
+}
+
+static void
+phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
+{
+ int i = pud_index(addr);
+
+ for (; i < PTRS_PER_PUD && addr < end; i++, addr += PUD_SIZE ) {
+ pud_t *pud = pud_page + pud_index(addr);
+ pmd_t *pmd;
+
+ if (pud_val(*pud)) {
+ pmd = pmd_offset(pud,0);
+ phys_pmd_init(pmd, addr, end);
+ }
+ }
+}
+
+static void change_rt_pmd(unsigned long start, unsigned long end)
+{
+ unsigned long next;
+
+ start = (unsigned long)__va(start);
+ end = (unsigned long)__va(end);
+
+ for (; start < end; start = next) {
+ pgd_t *pgd = pgd_offset_k(start);
+ pud_t *pud;
+
+ pud = pud_offset(pgd, start & PGDIR_MASK);
+ next = start + PGDIR_SIZE;
+ if (next > end)
+ next = end;
+ phys_pud_init(pud, __pa(start), __pa(next));
+ }
+ __flush_tlb_all();
+}
+/*
+ * We need to map the EFI memory map again after paging_init().
+ */
+void __init efi_map_memmap(void)
+{
+ efi_memory_desc_t *md;
+ void *p;
+
+ memmap.map = __va((unsigned long) memmap.phys_map);
+ memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
+
+ /* Make EFI runtime code executable */
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+ if (md->type == EFI_RUNTIME_SERVICES_CODE &&
+ (__supported_pte_mask & _PAGE_NX))
+ change_rt_pmd(md->phys_addr, md->phys_addr +
+ (md->num_pages << EFI_PAGE_SHIFT));
+ }
+}
+
+#if EFI_DEBUG
+void __init print_efi_memmap(void)
+{
+ efi_memory_desc_t *md;
+ void *p;
+ int i;
+
+ for (p = memmap.map, i = 0; p < memmap.map_end; p += memmap.desc_size, i++) {
+ md = p;
+ early_printk("mem%02u: type=%u, attr=0x%lx, "
+ "range=[0x%016lx-0x%016lx) (%luMB)\n",
+ i, md->type, md->attribute, md->phys_addr,
+ md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
+ (md->num_pages >> (20 - EFI_PAGE_SHIFT)));
+ }
+}
+#endif /* EFI_DEBUG */
+
+void __init efi_init(void)
+{
+ efi_config_table_t *config_tables;
+ efi_runtime_services_t *runtime;
+ efi_char16_t *c16;
+ char vendor[100] = "unknown";
+ int i = 0;
+
+ memset(&efi, 0, sizeof(efi) );
+ memset(&efi_phys, 0, sizeof(efi_phys));
+
+ efi_phys.systab = (efi_system_table_t *)EFI_SYSTAB;
+ memmap.phys_map = (void*)EFI_MEMMAP;
+ memmap.nr_map = EFI_MEMMAP_SIZE/EFI_MEMDESC_SIZE;
+ memmap.desc_version = EFI_MEMDESC_VERSION;
+ memmap.desc_size = EFI_MEMDESC_SIZE;
+
+ efi.systab = (efi_system_table_t *) early_ioremap(
+ (unsigned long)efi_phys.systab,
+ sizeof(efi_system_table_t));
+ memcpy(&efi_systab, efi.systab, sizeof(efi_system_table_t));
+ efi.systab = &efi_systab;
+ /*
+ * Verify the EFI Table
+ */
+ if (efi.systab->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
+ printk(KERN_ERR PFX "Woah! EFI system table signature incorrect\n");
+ if ((efi.systab->hdr.revision ^ EFI_SYSTEM_TABLE_REVISION) >> 16 != 0)
+ printk(KERN_ERR PFX
+ "Warning: EFI system table major version mismatch: "
+ "got %d.%02d, expected %d.%02d\n",
+ efi.systab->hdr.revision >> 16,
+ efi.systab->hdr.revision & 0xffff,
+ EFI_SYSTEM_TABLE_REVISION >> 16,
+ EFI_SYSTEM_TABLE_REVISION & 0xffff);
+ /*
+ * Grab some details from the system table
+ */
+ config_tables = (efi_config_table_t *)efi.systab->tables;
+ runtime = efi.systab->runtime;
+
+ /*
+ * Show what we know for posterity
+ */
+ c16 = (efi_char16_t *) early_ioremap(efi.systab->fw_vendor, 2);
+ if (c16) {
+ for (i = 0; i < sizeof(vendor) - 1 && *c16; ++i)
+ vendor[i] = *c16++;
+ vendor[i] = '\0';
+ } else
+ printk(KERN_ERR PFX "Could not map the firmware vendor!\n");
+
+ printk(KERN_INFO PFX "EFI v%u.%.02u by %s \n",
+ efi.systab->hdr.revision >> 16,
+ efi.systab->hdr.revision & 0xffff, vendor);
+
+ /*
+ * Let's see what config tables the firmware passed to us.
+ */
+ config_tables = (efi_config_table_t *)early_ioremap( efi.systab->tables,
+ efi.systab->nr_tables * sizeof(efi_config_table_t));
+ if (config_tables == NULL)
+ printk(KERN_ERR PFX "Could not map EFI Configuration Table!\n");
+
+ for (i = 0; i < efi.systab->nr_tables; i++) {
+ if (efi_guidcmp(config_tables[i].guid, MPS_TABLE_GUID) == 0) {
+ efi.mps = config_tables[i].table;
+ printk(KERN_INFO " MPS=0x%lx ", config_tables[i].table);
+ } else
+ if (efi_guidcmp(config_tables[i].guid, ACPI_20_TABLE_GUID) == 0) {
+ efi.acpi20 = config_tables[i].table;
+ printk(KERN_INFO " ACPI 2.0=0x%lx ", config_tables[i].table);
+ } else
+ if (efi_guidcmp(config_tables[i].guid, ACPI_TABLE_GUID) == 0) {
+ efi.acpi = config_tables[i].table;
+ printk(KERN_INFO " ACPI=0x%lx ", config_tables[i].table);
+ } else
+ if (efi_guidcmp(config_tables[i].guid, SMBIOS_TABLE_GUID) == 0) {
+ efi.smbios = config_tables[i].table;
+ printk(KERN_INFO " SMBIOS=0x%lx ", config_tables[i].table);
+ } else
+ if (efi_guidcmp(config_tables[i].guid, HCDP_TABLE_GUID) == 0) {
+ efi.hcdp = config_tables[i].table;
+ printk(KERN_INFO " HCDP=0x%lx ", config_tables[i].table);
+ } else
+ if (efi_guidcmp(config_tables[i].guid, UGA_IO_PROTOCOL_GUID) == 0) {
+ efi.uga = config_tables[i].table;
+ printk(KERN_INFO " UGA=0x%lx ", config_tables[i].table);
+ }
+ }
+ printk(KERN_INFO "\n");
+
+ /*
+ * Check out the runtime services table. We need to map
+ * the runtime services table so that we can grab the physical
+ * address of several of the EFI runtime functions, needed to
+ * set the firmware into virtual mode.
+ */
+ runtime = (efi_runtime_services_t *) early_ioremap((unsigned long)
+ efi.systab->runtime,
+ sizeof(efi_runtime_services_t));
+ if (runtime != NULL) {
+ /*
+ * We will only need *early* access to the following
+ * two EFI runtime services before set_virtual_address_map
+ * is invoked.
+ */
+ efi_phys.get_time = (efi_get_time_t *) runtime->get_time;
+ efi_phys.set_virtual_address_map =
+ (efi_set_virtual_address_map_t *)runtime->set_virtual_address_map;
+ } else
+ printk(KERN_ERR PFX "Could not map the runtime service table!\n");
+
+ /* Map the EFI memory map for use until paging_init() */
+ memmap.map = (efi_memory_desc_t *) early_ioremap(
+ (unsigned long) EFI_MEMMAP,
+ EFI_MEMMAP_SIZE);
+ if (memmap.map == NULL)
+ printk(KERN_ERR PFX "Could not map the EFI memory map!\n");
+ if (EFI_MEMDESC_SIZE != sizeof(efi_memory_desc_t)) {
+ printk(KERN_WARNING PFX "Kernel-defined memdesc "
+ "doesn't match the one from EFI!\n");
+ }
+ memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
+#if EFI_DEBUG
+ print_efi_memmap();
+#endif
+}
+
+/*
+ * This function will switch the EFI runtime services to virtual mode.
+ * Essentially, look through the EFI memmap and map every region that
+ * has the runtime attribute bit set in its memory descriptor and update
+ * that memory descriptor with the virtual address obtained from ioremap().
+ * This enables the runtime services to be called without having to
+ * thunk back into physical mode for every invocation.
+ */
+void __init efi_enter_virtual_mode(void)
+{
+ efi_memory_desc_t *md;
+ efi_status_t status;
+ unsigned long end;
+ void *p;
+
+ efi.systab = NULL;
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+ if (!(md->attribute & EFI_MEMORY_RUNTIME))
+ continue;
+ if (md->attribute & EFI_MEMORY_WB)
+ md->virt_addr = (unsigned long)__va(md->phys_addr);
+ else if (md->attribute & (EFI_MEMORY_UC | EFI_MEMORY_WC))
+ md->virt_addr = (unsigned long)ioremap(md->phys_addr,
+ md->num_pages << EFI_PAGE_SHIFT);
+ end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
+ if ((md->phys_addr <= (unsigned long)efi_phys.systab) &&
+ ((unsigned long)efi_phys.systab < end))
+ efi.systab = (efi_system_table_t *)
+ (md->virt_addr - md->phys_addr +
+ (unsigned long)efi_phys.systab);
+ }
+
+ if (!efi.systab)
+ BUG();
+
+ status = phys_efi_set_virtual_address_map(
+ memmap.desc_size * memmap.nr_map,
+ memmap.desc_size,
+ memmap.desc_version,
+ memmap.phys_map);
+
+ if (status != EFI_SUCCESS) {
+ printk (KERN_ALERT "You are screwed! "
+ "Unable to switch EFI into virtual mode "
+ "(status=%lx)\n", status);
+ panic("EFI call to SetVirtualAddressMap() failed!");
+ }
+
+ /*
+ * Now that EFI is in virtual mode, update the function
+ * pointers in the runtime service table to the new virtual addresses.
+ *
+ * Since x86_64 EFI follows MS calling convention, we can not call
+ * the services directly. We put a wrapper around the real service
+ * calls and call the wrapper directly.
+ */
+
+ efi.get_time = (efi_get_time_t *)_efi_get_time;
+ efi.set_time = (efi_set_time_t *)_efi_set_time;
+ efi.get_wakeup_time = (efi_get_wakeup_time_t *)_efi_get_wakeup_time;
+ efi.set_wakeup_time = (efi_set_wakeup_time_t *)_efi_set_wakeup_time;
+ efi.get_variable = (efi_get_variable_t *)_efi_get_variable;
+ efi.get_next_variable = (efi_get_next_variable_t *)_efi_get_next_variable;
+ efi.set_variable = (efi_set_variable_t *)_efi_set_variable;
+ efi.get_next_high_mono_count = (efi_get_next_high_mono_count_t *)
+ _efi_get_next_high_mono_count;
+ efi.reset_system = (efi_reset_system_t *)_efi_reset_system;
+ efi.set_virtual_address_map = (efi_set_virtual_address_map_t *)
+ _efi_set_virtual_address_map;
+}
+
+void __init
+efi_initialize_iomem_resources(struct resource *code_resource,
+ struct resource *data_resource)
+{
+ struct resource *res;
+ efi_memory_desc_t *md;
+ void *p;
+
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+
+ if ((md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT)) >
+ 0x100000000ULL)
+ continue;
+ res = alloc_bootmem_low(sizeof(struct resource));
+ switch (md->type) {
+ case EFI_RESERVED_TYPE:
+ res->name = "Reserved Memory";
+ break;
+ case EFI_LOADER_CODE:
+ res->name = "Loader Code";
+ break;
+ case EFI_LOADER_DATA:
+ res->name = "Loader Data";
+ break;
+ case EFI_BOOT_SERVICES_DATA:
+ res->name = "BootServices Data";
+ break;
+ case EFI_BOOT_SERVICES_CODE:
+ res->name = "BootServices Code";
+ break;
+ case EFI_RUNTIME_SERVICES_CODE:
+ res->name = "Runtime Service Code";
+ break;
+ case EFI_RUNTIME_SERVICES_DATA:
+ res->name = "Runtime Service Data";
+ break;
+ case EFI_CONVENTIONAL_MEMORY:
+ res->name = "Conventional Memory";
+ break;
+ case EFI_UNUSABLE_MEMORY:
+ res->name = "Unusable Memory";
+ break;
+ case EFI_ACPI_RECLAIM_MEMORY:
+ res->name = "ACPI Reclaim";
+ break;
+ case EFI_ACPI_MEMORY_NVS:
+ res->name = "ACPI NVS";
+ break;
+ case EFI_MEMORY_MAPPED_IO:
+ res->name = "Memory Mapped IO";
+ break;
+ case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
+ res->name = "Memory Mapped IO Port Space";
+ break;
+ default:
+ res->name = "Reserved";
+ break;
+ }
+ res->start = md->phys_addr;
+ res->end = res->start + ((md->num_pages << EFI_PAGE_SHIFT) - 1);
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ if (request_resource(&iomem_resource, res) < 0)
+ printk(KERN_ERR PFX "Failed to allocate res %s : 0x%llx-0x%llx\n",
+ res->name, res->start, res->end);
+ /*
+ * We don't know which region contains kernel data so we try
+ * it repeatedly and let the resource manager test it.
+ */
+ if (md->type == EFI_CONVENTIONAL_MEMORY) {
+ request_resource(res, code_resource);
+ request_resource(res, data_resource);
+ }
+ }
+}
+
+/*
+ * Convenience functions to obtain memory types and attributes
+ */
+u32 efi_mem_type(unsigned long phys_addr)
+{
+ efi_memory_desc_t *md;
+ void *p;
+
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+ if ((md->phys_addr <= phys_addr) && (phys_addr <
+ (md->phys_addr + (md-> num_pages << EFI_PAGE_SHIFT)) ))
+ return md->type;
+ }
+ return 0;
+}
+
+u64 efi_mem_attributes(unsigned long phys_addr)
+{
+ efi_memory_desc_t *md;
+ void *p;
+
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+ if ((md->phys_addr <= phys_addr) && (phys_addr <
+ (md->phys_addr + (md-> num_pages << EFI_PAGE_SHIFT))))
+ return md->attribute;
+ }
+ return 0;
+}
+
+__init void reserve_efi_runtime(void)
+{
+ efi_memory_desc_t *md;
+ unsigned long start, end;
+ void *p;
+
+ for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+ md = p;
+ start = md->phys_addr;
+ end = start + (md->num_pages << EFI_PAGE_SHIFT);
+ if (is_available_memory(md))
+ continue;
+ reserve_bootmem_generic(start, md->num_pages << EFI_PAGE_SHIFT);
+ }
+}
+
diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi_stub.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi_stub.S
--- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi_stub.S 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi_stub.S 2007-04-19 13:01:02.000000000 -0700
@@ -0,0 +1,101 @@
+/*
+ * EFI call stub for x86_64.
+ *
+ * This stub allows us to make EFI calls in physical mode with interrupts
+ * turned off.
+ *
+ * Copyright (C) 2006 Fenghua Yu <[email protected]>
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/page.h>
+
+/*
+ * efi_call_phys(void *,unsigned long, ...)
+ * is a function with variable parameters.
+ * argument 0: function pointer to EFI runtime service function.
+ * argument 1: number of arguments passed to EFI runtime service function.
+ * argument 2,...: arguments passed to EFI runtime service function.
+ * All the callers of this function assure that all the parameters are 8 bytes.
+ * Currently only two runtime services are called in the kernel:
+ * get_time - passes 2 arguments
+ * set_virtual_address_map - passes 4 arguments.
+ * Conversion from SysV calling convention to UEFI calling convention
+ * is needed to call x86_64 EFI:
+ * For 2 arguments: rdx->rcx, rcx->rdx
+ * For 4 arguments: rdx->rcx, rcx->rdx, r8->r8, r9->r9
+ *
+ * The current efi_call_phys() only considers these two cases to simplify
+ * situation. In the future, if other runtime services are called and other
+ * number of arguments than 2 and 4 are passed, this code may be changed
+ * correspondingly.
+ *
+ */
+
+/*
+ * In gcc calling convention, EBX, ESP, EBP, ESI and EDI are all callee save.
+ * So we'd better save all of them at the beginning of this function and restore
+ * at the end no matter how many we use, because we can not assure EFI runtime
+ * service functions will comply with gcc calling convention, too.
+ */
+
+.text
+ENTRY(efi_call_phys)
+ /*
+ * 0. The function can only be called in Linux kernel. So CS has been
+ * set to 0x0010, DS and SS have been set to 0x0018. In EFI, I found
+ * the values of these registers are the same. And, the corresponding
+ * GDT entries are identical. So I will do nothing about segment reg
+ * and GDT, but change GDT base register in prelog and epilog.
+ */
+
+ /*
+ * 1. Arguments conversion.
+ */
+ mov %rcx, %r12
+ mov %rdx, %rcx
+ mov %r12, %rdx
+
+ /*
+ * 2. Adjust stack pointer.
+ */
+ /* Reserve stack space for arguments. Save the register stack size
+ * into efi_reg_stack_size which will be used after callee.
+ */
+ mov %rsi, %rax
+ mov $8, %bl
+ mul %bl
+ mov %rax, efi_reg_stack_size
+ sub %rax, %rsp
+ /* Make physical address */
+ mov $__START_KERNEL_map, %rsi
+ sub %rsi, %rsp
+
+ /* set up return address from EFI callee. */
+ mov $1f, %r12
+ push %r12
+
+ /*
+ * 3. Call the physical function.
+ */
+ xor %rax, %rax
+ jmp *%rdi
+
+1:
+ /*
+ * 4. After EFI runtime service returns, control will return to
+ * following instruction. We'd better readjust stack pointer first.
+ */
+ /* Restore stack space for arguments */
+ add efi_reg_stack_size, %rsp
+ /* Make virtual address */
+ mov $__START_KERNEL_map, %rsi
+ add %rsi, %rsp
+
+ ret
+.previous
+
+.data
+efi_reg_stack_size:
+ .long 0
diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S
--- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S 2007-04-19 12:39:39.000000000 -0700
+++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S 2007-04-19 13:01:02.000000000 -0700
@@ -94,12 +94,29 @@ startup_32:
* EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
* the new gdt/idt that has __KERNEL_CS with CS.L = 1.
*/
- ljmp $__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
+ ljmp $__KERNEL_CS, $(long64 - __START_KERNEL_map)

.code64
.org 0x100
.globl startup_64
startup_64:
+ /*
+ * At this point the CPU runs in long64 bit with
+ * paging disabled. First of all, need to load new DS, GDT, and CS.
+ * There is no stack until we set one up.
+ */
+
+ /* Initialize the %ds segment register */
+ mov $__KERNEL_DS,%eax
+ mov %eax,%ds
+
+ /* Load new GDT and CS. At this point, BP is in physical mode. */
+ lgdt cpu_gdt_descr_phys-__START_KERNEL_map
+
+ mov $(ljumpvector64 - __START_KERNEL_map), %rax
+ ljmp *(%rax)
+long64:
+
/* We come here either from startup_32
* or directly from a 64bit bootloader.
* Since we may have come directly from a bootloader we
@@ -357,6 +374,16 @@ gdt:
.endr
#endif

+ .align 16
+ .globl cpu_gdt_descr_phys
+cpu_gdt_descr_phys:
+ .word gdt_end-cpu_gdt_table-1
+gdt_phys:
+ .quad cpu_gdt_table-__START_KERNEL_map
+ljumpvector64:
+ .long long64-__START_KERNEL_map
+ .word __KERNEL_CS
+
/* We need valid kernel segments for data and code in long mode too
* IRET will check the segment types kkeil 2000/10/28
* Also sysret mandates a special GDT layout
diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/Makefile linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/Makefile
--- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/Makefile 2007-04-19 12:39:39.000000000 -0700
+++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/Makefile 2007-04-19 13:01:02.000000000 -0700
@@ -37,6 +37,7 @@ obj-$(CONFIG_X86_PM_TIMER) += pmtimer.o
obj-$(CONFIG_X86_VSMP) += vsmp.o
obj-$(CONFIG_K8_NB) += k8.o
obj-$(CONFIG_AUDIT) += audit.o
+obj-$(CONFIG_EFI) += efi.o efi_stub.o

obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_PCI) += early-quirks.o

--


2007-05-03 02:53:06

by Randy Dunlap

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

On Tue, 01 May 2007 11:59:46 -0700 Chandramouli Narayanan wrote:

> EFI x86_64 build option is added to the kernel configuration.


Hi Mouli,

Can you share EFI code as much as possible among ia64, i386,
and x86_64 instead of duplicating it?


A diffstat patch summary would be good.
(see Documentation/SubmittingPatches)


> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig 2007-04-19 12:39:39.000000000 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig 2007-04-19 13:01:02.000000000 -0700
> @@ -254,6 +254,20 @@ config X86_HT
> depends on SMP && !MK8
> default y
>
> +config EFI
> + bool "Boot from EFI support (EXPERIMENTAL)"
> + default n
> + ---help---
> +

No blank line above.
Indent following lines by 2 spaces: i.e., <tab><space><space>
as in Documentation/CodingStyle.

> + This enables the the kernel to boot on EFI platforms using
> + system configuration information passed to it from the firmware.
> + This also enables the kernel to use any EFI runtime services that are
> + available (such as the EFI variable services).
> + This option is only useful on systems that have EFI firmware
> + and will result in a kernel image that is ~8k larger. However,
> + even with this option, the resultant kernel should continue to
> + boot on existing non-EFI platforms.
> +
> config MATH_EMULATION
> bool


> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h
> --- linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h 2007-04-19 12:39:40.000000000 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h 2007-04-19 13:01:02.000000000 -0700
> @@ -17,6 +17,12 @@ extern char x86_boot_params[BOOT_PARAM_S
> #define APM_BIOS_INFO (*(struct apm_bios_info *) (PARAM+0x40))
> #define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
> #define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
> +#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
> +#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
> +#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
> +#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
> +#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
> +#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
> #define MOUNT_ROOT_RDONLY (*(unsigned short *) (PARAM+0x1F2))
> #define RAMDISK_FLAGS (*(unsigned short *) (PARAM+0x1F8))
> #define SAVED_VIDEO_MODE (*(unsigned short *) (PARAM+0x1FA))
> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c 1969-12-31 16:00:00.000000000 -0800
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c 2007-04-19 13:01:02.000000000 -0700
> @@ -0,0 +1,824 @@

> +extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
> +struct efi efi;
> +EXPORT_SYMBOL(efi);
> +struct efi efi_phys __initdata;
> +struct efi_memory_map memmap ;

no space before ;

> +static efi_system_table_t efi_systab __initdata;
> +
> +static unsigned long efi_rt_eflags;
> +static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
> +static pgd_t save_pgd;
> +
> +/* Convert SysV calling convention to EFI x86_64 calling convention */
> +
> +static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
> +{
> + va_list ap;
> + int i;
> + unsigned long args[EFI_ARG_NUM_MAX];
> + unsigned int arg_size,stack_adjust_size;

space after comma.

> + efi_status_t status;
> +
> + if (va_num > EFI_ARG_NUM_MAX || va_num<0) {

va_num < 0) {

> + return EFI_LOAD_ERROR;
> + }
> + if (va_num==0)

if (va_num == 0)

> + /* There is no need to convert arguments for void argument. */
> + __asm__ __volatile__("call *%0;ret;"::"r"(fp));
> +
> + /* The EFI arguments is stored in an array. Then later on it will be
> + * pushed into stack or passed to registers according to MS ABI.

passed _to_ registers? passed via or thru registers?

> + */
> + va_start(ap, va_num);
> + for (i = 0; i < va_num; i++) {
> + args[i] = va_arg(ap, unsigned long);
> + }
> + va_end(ap);
> + arg_size = va_num*8;

arg_size = va_num * 8;

> + stack_adjust_size = (va_num > EFI_REG_ARG_NUM? EFI_REG_ARG_NUM : va_num)*8;

Please re-read Documentation/CodingStyle.
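
For illustration, the two size computations flagged above could be
spaced per CodingStyle roughly as follows. This is a userspace sketch,
not the patch's code: the helper names efi_arg_num_valid(),
efi_arg_size() and efi_stack_adjust_size() are hypothetical, and only
EFI_ARG_NUM_MAX and EFI_REG_ARG_NUM come from the patch. Because va_num
is an unsigned long, the "va_num < 0" half of the quoted range check can
never be true, so the sketch keeps only the upper bound.

```c
#include <assert.h>

/* Constants as defined in the quoted patch. */
#define EFI_ARG_NUM_MAX 10
#define EFI_REG_ARG_NUM 4

/* Hypothetical helper: va_num is unsigned, so only the upper bound can fail. */
static int efi_arg_num_valid(unsigned long va_num)
{
	return va_num <= EFI_ARG_NUM_MAX;
}

/* Hypothetical helper: bytes occupied by all arguments, 8 bytes each. */
static unsigned long efi_arg_size(unsigned long va_num)
{
	assert(efi_arg_num_valid(va_num));
	return va_num * 8;
}

/* Hypothetical helper: bytes for the register-passed arguments only. */
static unsigned long efi_stack_adjust_size(unsigned long va_num)
{
	unsigned long nreg;

	assert(efi_arg_num_valid(va_num));
	nreg = va_num > EFI_REG_ARG_NUM ? EFI_REG_ARG_NUM : va_num;
	return nreg * 8;
}
```

With five arguments (the get_variable case) this gives an arg_size of 40
bytes and a stack_adjust_size of 32 bytes.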
> +
> + /* Starting from here, assembly code makes sure all registers used are
> + * under controlled by our code itself instead of by gcc.
> + */
> + /* Start converting SysV calling convention to MS calling convention. */
> + __asm__ __volatile__(
> + /* 0. Save preserved registers. EFI call may clobbered them. */
> + " pushq %%rbp;pushq %%rbx;pushq %%r12;"
> + " pushq %%r13;pushq %%r14;pushq %%r15;"
> + /* 1. Push arguments passed by stack into stack. */
> + " mov %1, %%r12;"
> + " mov %3, %%r13;"
> + " mov %1, %%rax;"
> + " dec %%rax;"
> + " mov $8, %%bl;"
> + " mul %%bl;"
> + " add %%rax, %%r13;"
> + "lstack:"
> + " cmp $4, %%r12;"
> + " jle lregister;"
> + " pushq (%%r13);"
> + " sub $8, %%r13;"
> + " dec %%r12;"
> + " jmp lstack;"
> + /* 2. Move arguments passed by registers into registers.
> + * rdi->rcx, rsi->rdx, rdx->r8, rcx->r9.
> + */
> + "lregister:"
> + " mov %3, %%r14;"
> + " mov $0, %%r12;"
> + "lloadregister:"
> + " cmp %1, %%r12;"
> + " jge lcall;"
> + " mov (%%r14), %%rcx;"
> + " inc %%r12;"
> + " cmp %1, %%r12;"
> + " jge lcall;"
> + " mov 8(%%r14), %%rdx;"
> + " inc %%r12;"
> + " cmp %1, %%r12;"
> + " jge lcall;"
> + " mov 0x10(%%r14), %%r8;"
> + " inc %%r12;"
> + " cmp %1, %%r12;"
> + " jge lcall;"
> + " mov 0x18(%%r14), %%r9;"
> + /* 3. Save stack space for those register arguments. */
> + "lcall: "
> + " sub %2, %%rsp;"
> + /* 4. Save arg_size to r12 which is preserved in EFI call. */
> + " mov %4, %%r12;"
> + /* 5. Call EFI function. */
> + " call *%5;"
> + " mov %%rax, %0;"
> + /* 6. Restore stack space reserved for those register
> + * arguments.
> + */
> + " add %%r12, %%rsp;"
> + /* 7. Restore preserved registers. */
> + " popq %%r15;popq %%r14;popq %%r13;"
> + " popq %%r12;popq %%rbx;popq %%rbp;"
> + : "=r"(status)
> + :"r"((unsigned long)va_num),
> + "r"((unsigned long)stack_adjust_size),
> + "r"(args),
> + "r"((unsigned long)arg_size),
> + "r"(fp)
> + :"rsp","rbx","rax","r11","r12","r13","r14","rcx","rdx","r8","r9"
> + );
> + return status;
> +}
> +
> +static efi_status_t
> +_efi_get_wakeup_time(efi_bool_t *enabled, efi_bool_t *pending,
> + efi_time_t *tm)

Move 'static efi_status_t' to same line as function name.
Move formal parameters down as needed.

> +{
> + return uefi_call_wrapper((void*)efi.systab->runtime->get_wakeup_time,
> + EFI_ARG_NUM_GET_WAKEUP_TIME,
> + (u64)enabled,
> + (u64)pending,
> + (u64)tm);
> +}
> +
> +static efi_status_t
> +_efi_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 *attr,
> + unsigned long *data_size, void *data)

ditto

> +{
> + return uefi_call_wrapper((void*)efi.systab->runtime->get_variable,
> + EFI_ARG_NUM_GET_VARIABLE,
> + (u64)name,
> + (u64)vendor,
> + (u64)attr,
> + (u64)data_size,
> + (u64)data);
> +}
> +
> +static efi_status_t
> +_efi_get_next_variable(unsigned long *name_size, efi_char16_t *name,
> + efi_guid_t *vendor)

ditto

> +{
> + return uefi_call_wrapper((void*)efi.systab->runtime->get_next_variable,
> + EFI_ARG_NUM_GET_NEXT_VARIABLE,
> + (u64)name_size,
> + (u64)name,
> + (u64)vendor);
> +}
> +
> +static efi_status_t
> +phys_efi_set_virtual_address_map(unsigned long memory_map_size,
> + unsigned long descriptor_size,
> + u32 descriptor_version,
> + efi_memory_desc_t *virtual_map)

ditto

> +{
> + efi_status_t status;
> +
> + efi_call_phys_prelog();
> + status = efi_call_phys(efi_phys.set_virtual_address_map,
> + EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
> + (unsigned long)memory_map_size,
> + (unsigned long)descriptor_size,
> + (unsigned long)descriptor_version,
> + (unsigned long)virtual_map);
> + efi_call_phys_epilog();
> + return status;
> +}
> +
> +efi_status_t
> +phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)

ditto

> +{
> +
> + efi_status_t status;
> +
> + efi_call_phys_prelog();
> + status = efi_call_phys(efi_phys.get_time,
> + EFI_ARG_NUM_GET_TIME,
> + (unsigned long)tm,
> + (unsigned long)tc);
> + efi_call_phys_epilog();
> + return status;
> +}
> +
> +inline int efi_set_rtc_mmss(unsigned long nowtime)
> +{
> + int real_seconds, real_minutes;
> + efi_status_t status;
> + efi_time_t eft;
> + efi_time_cap_t cap;
> +
> + spin_lock(&efi_rt_lock);
> + status = efi.get_time(&eft, &cap);
> + spin_unlock(&efi_rt_lock);
> + if (status != EFI_SUCCESS) {
> + printk("Ooops: efitime: can't read time!\n");

Creative spelling of Ooops. :)
(also below)

> + return -1;
> + }
> +
> +}
> +/*
> + * This should only be used during kernel init and before runtime
> + * services have been remapped, therefore, we'll need to call in physical
> + * mode. Note, this call isn't used later, so mark it __init.
> + */

Confusing comments being adjacent as they are...

> +/*
> + * This is used during kernel init before runtime
> + * services have been remapped and also during suspend, therefore,
> + * we'll need to call both in physical and virtual modes.
> + */
> +inline unsigned long efi_get_time(void)
> +{
> + efi_status_t status;
> + efi_time_t eft;
> + efi_time_cap_t cap;
> +
> + if (efi.get_time) {
> + /* if we are in virtual mode use remapped function */
> + status = efi.get_time(&eft, &cap);
> + } else {
> + /* we are in physical mode */
> + status = phys_efi_get_time(&eft, &cap);
> + }
> + if (status != EFI_SUCCESS)
> + printk("Oops: efitime: can't read time status: 0x%lx\n",status);
> +
> + return mktime(eft.year, eft.month, eft.day, eft.hour,
> + eft.minute, eft.second);
> +}
> +
> +
> +/* Make EFI runtime code executable */
> +static void
> +phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)

Fix 'static void'.

> +{
> + int i = pmd_index(address);
> +
> + for (; i < PTRS_PER_PMD && address < end; i++, address += PMD_SIZE) {
> + unsigned long entry;
> + pmd_t *pmd = pmd_page + pmd_index(address);
> +
> + entry = pmd_val(*pmd);
> + entry &= ~_PAGE_NX;
> + set_pmd(pmd, __pmd(entry));
> + }
> +}
> +
> +static void
> +phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)

ditto

> +{
> + int i = pud_index(addr);
> +
> + for (; i < PTRS_PER_PUD && addr < end; i++, addr += PUD_SIZE ) {
> + pud_t *pud = pud_page + pud_index(addr);
> + pmd_t *pmd;
> +
> + if (pud_val(*pud)) {
> + pmd = pmd_offset(pud,0);
> + phys_pmd_init(pmd, addr, end);
> + }
> + }
> +}
> +
> +static void change_rt_pmd(unsigned long start, unsigned long end)
> +{
> + unsigned long next;
> +
> + start = (unsigned long)__va(start);
> + end = (unsigned long)__va(end);
> +
> + for (; start < end; start = next) {
> + pgd_t *pgd = pgd_offset_k(start);
> + pud_t *pud;
> +
> + pud = pud_offset(pgd, start & PGDIR_MASK);
> + next = start + PGDIR_SIZE;
> + if (next > end)
> + next = end;
> + phys_pud_init(pud, __pa(start), __pa(next));
> + }
> + __flush_tlb_all();
> +}
> +/*
> + * We need to map the EFI memory map again after paging_init().
> + */
> +void __init efi_map_memmap(void)
> +{
> + efi_memory_desc_t *md;
> + void *p;

Use tab(s) instead of spaces to indent.

> +
> + memmap.map = __va((unsigned long) memmap.phys_map);
> + memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
> +
> + /* Make EFI runtime code executable */
> + for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + md = p;
> + if (md->type == EFI_RUNTIME_SERVICES_CODE &&

Use tabs instead of spaces to indent.

> + (__supported_pte_mask & _PAGE_NX))
> + change_rt_pmd(md->phys_addr, md->phys_addr +
> + (md->num_pages << EFI_PAGE_SHIFT));
> + }
> +}

> +/*
> + * This function will switch the EFI runtime services to virtual mode.
> + * Essentially, look through the EFI memmap and map every region that
> + * has the runtime attribute bit set in its memory descriptor and update
> + * that memory descriptor with the virtual address obtained from ioremap().
> + * This enables the runtime services to be called without having to
> + * thunk back into physical mode for every invocation.
> + */
> +void __init efi_enter_virtual_mode(void)
> +{
> + efi_memory_desc_t *md;
> + efi_status_t status;
> + unsigned long end;
> + void *p;
> +
> + efi.systab = NULL;
> + for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + md = p;
> + if (!(md->attribute & EFI_MEMORY_RUNTIME))
> + continue;
> + if (md->attribute & EFI_MEMORY_WB)
> + md->virt_addr = (unsigned long)__va(md->phys_addr);
> + else if (md->attribute & (EFI_MEMORY_UC | EFI_MEMORY_WC))
> + md->virt_addr = (unsigned long)ioremap(md->phys_addr,
> + md->num_pages << EFI_PAGE_SHIFT);
> + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
> + if ((md->phys_addr <= (unsigned long)efi_phys.systab) &&
> + ((unsigned long)efi_phys.systab < end))
> + efi.systab = (efi_system_table_t *)
> + (md->virt_addr - md->phys_addr +
> + (unsigned long)efi_phys.systab);
> + }
> +
> + if (!efi.systab)
> + BUG();
> +
> + status = phys_efi_set_virtual_address_map(
> + memmap.desc_size * memmap.nr_map,
> + memmap.desc_size,
> + memmap.desc_version,
> + memmap.phys_map);
> +
> + if (status != EFI_SUCCESS) {
> + printk (KERN_ALERT "You are screwed! "
> + "Unable to switch EFI into virtual mode "
> + "(status=%lx)\n", status);
> + panic("EFI call to SetVirtualAddressMap() failed!");
> + }
> +
> + /*
> + * Now that EFI is in virtual mode, update the function
> + * pointers in the runtime service table to the new virtual addresses.
> + *
> + * Since x86_64 EFI follows MS calling convention, we can not call

cannot

> + * the services directly. We put a wrapper around the real service
> + * calls and call the wrapper directly.
> + */
> +
> +}
> +
> +void __init

Wrong line.

> +efi_initialize_iomem_resources(struct resource *code_resource,
> + struct resource *data_resource)
> +{
> +}

> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi_stub.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi_stub.S
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi_stub.S 1969-12-31 16:00:00.000000000 -0800
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi_stub.S 2007-04-19 13:01:02.000000000 -0700
> @@ -0,0 +1,101 @@

> +/*
> + * In gcc calling convention, EBX, ESP, EBP, ESI and EDI are all callee save.
> + * So we'd better save all of them at the beginning of this function and restore
> + * at the end no matter how many we use, because we can not assure EFI runtime

cannot

> + * service functions will comply with gcc calling convention, too.
> + */

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-05-03 19:20:34

by chandramouli narayanan

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

Randy Dunlap wrote:
> Can you share EFI code as much as possible among ia64, i386,
> and x86_64 instead of duplicating it?
Hi Randy,

Based on the feedback from Andi and you, these are the areas:

1. conversion of EFI memory map to e820 map
2. Consolidation/sharing of efi code among the architectures.
3. Coding style violations/typos/clarity in comments.

First I apologize for the coding style violations where I should have
done better. I will fix these in the next patch.

I would like to address the above in stages with patch updates. This
will take me some time with the fixes and testing.

A while back, Edgar Hucek submitted a patch (probably 2.6.18 mm tree?)
for efi to e820 memory mapping for i386. I don't see this code in the
2.6.21 release which makes me wonder whether it was accepted and the
history behind it. Any pointers in this regard?

thanks
- mouli

2007-05-03 20:14:22

by Randy Dunlap

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

On Thu, 03 May 2007 12:20:30 -0700 chandramouli narayanan wrote:

> Randy Dunlap wrote:
> > Can you share EFI code as much as possible among ia64, i386,
> > and x86_64 instead of duplicating it?
> Hi Randy,
>
> Based on the feedback from Andi and you, these are the areas:
>
> 1. conversion of EFI memory map to e820 map
> 2. Consolidation/sharing of efi code among the architectures.
> 3. Coding style violations/typos/clarity in comments.
>
> First I apologize for the coding style violations where I should have
> done better. I will fix these in the next patch.
>
> I would like to address the above in stages with patch updates. This
> will take me some time with the fixes and testing.
>
> A while back, Edgar Hucek submitted a patch (probably 2.6.18 mm tree?)
> for efi to e820 memory mapping for i386. I don't see this code in the
> 2.6.21 release which makes me wonder whether it was accepted and the
> history behind it. Any pointers in this regard?

I guess that Edgar stopped pushing it. Sometimes persistence is
required.

Linus said that the patch "looks fine per se",
http://marc.info/?l=linux-kernel&m=115380564326171&w=2
but needs more testers.

Presumably you have done more testing. :)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-05-04 13:02:07

by Andi Kleen

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

On Tuesday 01 May 2007 20:59:46 Chandramouli Narayanan wrote:
> General note on EFI x86_64 support
> ----------------------------------

More review. This code unfortunately has some problems.

First, this seems to be quite different from what the 32-bit EFI
support does (which I suppose is pre-UEFI). Are there plans to sync this
up eventually?

Also, your howto above should be somewhere in Documentation/.

> - Create a VFAT partition on the disk
> - Copy the following to the VFAT partition:
> elilo bootloader with x86_64 support and elilo configuration file
> efi64 kernel image, initrd

What format is the kernel image?

> - Boot to EFI shell and invoke elilo choosing efi64 kernel image
> - On UEFI2.0 firmware systems, pass vga=fbcon for boot messages to appear
> on console.

They don't have a compat mode for vga anymore?

>
> 2. With CALGARY_IOMMU=y in the kernel configuration, the Calgary detection fails
> with the message "Calgary: Unable to locate Rio Grande Table in EBDA - bailing".
> However, the legacy kernel has no such error.

Getting that message when you don't have an IBM Summit system with Calgary is expected.
The more worrying question is why the old kernel didn't give it.

> +config EFI
> + bool "Boot from EFI support (EXPERIMENTAL)"
> + default n

The config option should only be added once the feature works, i.e. later in the series.

Drop the "default n".
> + ---help---
> +
> + This enables the the kernel to boot on EFI platforms using
> + system configuration information passed to it from the firmware.
> + This also enables the kernel to use any EFI runtime services that are
> + available (such as the EFI variable services).
> + This option is only useful on systems that have EFI firmware
> + and will result in a kernel image that is ~8k larger. However,
> + even with this option, the resultant kernel should continue to
> + boot on existing non-EFI platforms.

The description should probably have a reference to the Documentation describing
how to set this up.

Mention UEFI?

> +#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
> +#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
> +#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
> +#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
> +#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
> +#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))

This needs to be documented in Documentation/i386/zero-page.txt

But it might be already obsolete with the early conversion to e820 change?
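
As a possible starting point for that zero-page.txt entry, the new
fields can be tabulated from the macros quoted above. This is a
userspace sketch under stated assumptions: the offsets come from the
bootsetup.h hunk, the 8-byte sizes follow from unsigned long on x86_64,
the 4-byte signature size is inferred from the gap to the next offset,
and SKETCH_BOOT_PARAM_SIZE plus every helper name here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed buffer size for this sketch only; not taken from the patch. */
#define SKETCH_BOOT_PARAM_SIZE 0x1000

/* Offsets from the quoted bootsetup.h hunk. */
#define OFF_EFI_SYSTAB          0x1b8
#define OFF_EFI_LOADER_SIG      0x1c0
#define OFF_EFI_MEMDESC_SIZE    0x1c4
#define OFF_EFI_MEMDESC_VERSION 0x1c8
#define OFF_EFI_MEMMAP_SIZE     0x1cc
#define OFF_EFI_MEMMAP          0x1d0

struct bp_field {
	const char *name;
	size_t off;
	size_t size;	/* bytes; see the assumptions in the lead-in */
};

static const struct bp_field efi_bp_fields[] = {
	{ "EFI_SYSTAB",          OFF_EFI_SYSTAB,          8 },
	{ "EFI_LOADER_SIG",      OFF_EFI_LOADER_SIG,      4 },
	{ "EFI_MEMDESC_SIZE",    OFF_EFI_MEMDESC_SIZE,    4 },
	{ "EFI_MEMDESC_VERSION", OFF_EFI_MEMDESC_VERSION, 4 },
	{ "EFI_MEMMAP_SIZE",     OFF_EFI_MEMMAP_SIZE,     4 },
	{ "EFI_MEMMAP",          OFF_EFI_MEMMAP,          8 },
};

/* Return 1 if each field starts exactly where the previous one ends. */
static int efi_bp_fields_contiguous(void)
{
	size_t n = sizeof(efi_bp_fields) / sizeof(efi_bp_fields[0]);
	size_t i;

	for (i = 1; i < n; i++) {
		const struct bp_field *prev = &efi_bp_fields[i - 1];

		if (prev->off + prev->size != efi_bp_fields[i].off)
			return 0;
	}
	return 1;
}

/* Read a 32-bit field; memcpy avoids unaligned or aliasing casts. */
static uint32_t bp_read32(const unsigned char *bp, size_t off)
{
	uint32_t v;

	assert(off + sizeof(v) <= SKETCH_BOOT_PARAM_SIZE);
	memcpy(&v, bp + off, sizeof(v));
	return v;
}

/* Read a 64-bit field the same way. */
static uint64_t bp_read64(const unsigned char *bp, size_t off)
{
	uint64_t v;

	assert(off + sizeof(v) <= SKETCH_BOOT_PARAM_SIZE);
	memcpy(&v, bp + off, sizeof(v));
	return v;
}
```

Under these assumed sizes the six fields pack back to back from 0x1b8 up
to 0x1d8, which is the kind of fact the documentation could state.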

> +#define EFI_ARG_NUM_GET_TIME 2
> +#define EFI_ARG_NUM_SET_TIME 1
> +#define EFI_ARG_NUM_GET_WAKEUP_TIME 3
> +#define EFI_ARG_NUM_SET_WAKEUP_TIME 2
> +#define EFI_ARG_NUM_GET_VARIABLE 5
> +#define EFI_ARG_NUM_GET_NEXT_VARIABLE 3
> +#define EFI_ARG_NUM_SET_VARIABLE 5
> +#define EFI_ARG_NUM_GET_NEXT_HIGH_MONO_COUNT 1
> +#define EFI_ARG_NUM_RESET_SYSTEM 4
> +#define EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP 4
> +
> +#define EFI_ARG_NUM_MAX 10
> +#define EFI_REG_ARG_NUM 4
> +
> +extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
> +struct efi efi;
> +EXPORT_SYMBOL(efi);
> +struct efi efi_phys __initdata;
> +struct efi_memory_map memmap ;
> +static efi_system_table_t efi_systab __initdata;
> +
> +static unsigned long efi_rt_eflags;
> +static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;

Each lock needs a comment saying what it protects and whether there is a locking order.

> +static pgd_t save_pgd;

That looks dubious, more comments later.

> +
> +/* Convert SysV calling convention to EFI x86_64 calling convention */
> +
> +static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
> +{

Any reason you can't do something simple like (untested)

/* rdi, rsi, rdx, rcx, r8, r9 -> rcx,rdx,r8,r9,stack,stack
arg1 function to call */

call_ms:
mov %rsi,%rcx
mov %rdx,%rdx
mov %rcx,%r8
mov %r8,%r9
push %r9
call *%rdi
pop %r9
ret

I assume none of the calls has more than 6 arguments.
ndiswrapper probably has already tested variants if you're lazy.
Then you also wouldn't need the defines for the number of arguments.

Also such code is better written in pure assembly; some versions of
gcc don't like clobbering of too many registers.
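
A different route from the pure-assembly stub suggested above, shown
here only as a sketch: on compilers that support the ms_abi function
attribute for x86_64 (which toolchains contemporary with this thread
may not), the service pointer can be typed with the Microsoft
convention and the compiler does the SysV-to-MS marshalling itself,
including the stack-passed fifth argument. All names below are
hypothetical, fake_get_variable() is a stand-in for a firmware service,
and the macro falls back to nothing on other targets so that the sketch
still builds there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: the compiler accepts __attribute__((ms_abi)) on x86_64.
 * Elsewhere the macro is empty and the native convention is used.
 */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_EFIAPI __attribute__((ms_abi))
#else
#define SKETCH_EFIAPI
#endif

typedef uint64_t sketch_status_t;

/* Pointer type for a runtime service taking five 8-byte arguments. */
typedef sketch_status_t (SKETCH_EFIAPI *sketch_service5_t)(uint64_t, uint64_t,
							   uint64_t, uint64_t,
							   uint64_t);

/*
 * Stand-in for a firmware service. The weighted sum makes any swapped
 * or dropped argument show up in the result.
 */
static sketch_status_t SKETCH_EFIAPI fake_get_variable(uint64_t name,
						       uint64_t vendor,
						       uint64_t attr,
						       uint64_t data_size,
						       uint64_t data)
{
	return name + 10 * vendor + 100 * attr + 1000 * data_size +
	       10000 * data;
}

/* SysV-side caller: the compiler emits the register and stack shuffle. */
static sketch_status_t call_service5(sketch_service5_t fn, uint64_t a,
				     uint64_t b, uint64_t c, uint64_t d,
				     uint64_t e)
{
	assert(fn != NULL);
	return fn(a, b, c, d, e);
}
```

Calling call_service5(fake_get_variable, 1, 2, 3, 4, 5) returns 54321
when every argument arrives in its intended slot. A hand-written stub
has to achieve the same ordering itself, taking care that no source
register is overwritten before it has been copied.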

> +static efi_status_t _efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
> +{
> + return uefi_call_wrapper((void*)efi.systab->runtime->get_time,
> + EFI_ARG_NUM_GET_TIME,
> + (u64)tm,
> + (u64)tc

Are the casts really needed?


> +static void efi_call_phys_prelog(void)
> +{
> + unsigned long vaddress;
> +
> + spin_lock(&efi_rt_lock);
> + local_irq_save(efi_rt_eflags);
> +
> + vaddress = (unsigned long)__va(0x0UL);
> + pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
> + set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));

Who tells you the other CPUs don't use the current pgd? If it's only used in early
boot it should be __init. If not it's broken.

Most likely you need to create a separate thread with its own page tables.

> + local_flush_tlb();

That won't flush the global mappings.

> +}
> +
> +static void efi_call_phys_epilog(void)
> +{
> + /*
> + * After the lock is released, the original page table is restored.
> + */
> + set_pgd(pgd_offset_k(0x0UL), save_pgd);
> + local_flush_tlb();

Same

> + local_irq_restore(efi_rt_eflags);
> + spin_unlock(&efi_rt_lock);
> +}
> +
> +static efi_status_t
> +phys_efi_set_virtual_address_map(unsigned long memory_map_size,
> + unsigned long descriptor_size,
> + u32 descriptor_version,
> + efi_memory_desc_t *virtual_map)
> +{
> + efi_status_t status;
> +
> + efi_call_phys_prelog();
> + status = efi_call_phys(efi_phys.set_virtual_address_map,
> + EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
> + (unsigned long)memory_map_size,
> + (unsigned long)descriptor_size,
> + (unsigned long)descriptor_version,
> + (unsigned long)virtual_map);
> + efi_call_phys_epilog();
> + return status;
> +}
> +
> +efi_status_t
> +phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
> +{

Looks broken -- I think that is called later, so the pgds
can be messed up.

Ok, there is suspend/resume -- if you're careful you might be able
to call this before the other CPUs are put online again. But that
is also currently being changed to use frozen CPUs. You probably
need to coordinate with Rafael.

[I suspect your resume hang is somehow related to this]

> +inline int efi_set_rtc_mmss(unsigned long nowtime)

I think that can be called any time so definitely broken
regarding page tables.

> +inline unsigned long efi_get_time(void)

inline without static usually leads to wasted code because
the compiler has to generate an out of line copy.

I don't see why all these functions need to be inline anyways.
Best just drop that everywhere and let the compiler decide.

That would probably eliminate some of your 8k.

> + if (status != EFI_SUCCESS)
> + printk("Oops: efitime: can't read time status: 0x%lx\n",status);

This should have suitable KERN_* prefixes (missing in most printks)

> +/* Make EFI runtime code executable */

It would be better to integrate this into the standard page table setup
instead of cut'n'pasting so much code here.


> + * We need to map the EFI memory map again after paging_init().
> + */
> +void __init efi_map_memmap(void)
> +{
> + efi_memory_desc_t *md;
> + void *p;
> +
> + memmap.map = __va((unsigned long) memmap.phys_map);
> + memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);

White space broken.


> +#if EFI_DEBUG
> +void __init print_efi_memmap(void)

The memory map should probably always be printed; it's very useful
for debugging. But if you integrate it with e820 that code can do it.

> + /*
> + * Show what we know for posterity
> + */
> + c16 = (efi_char16_t *) early_ioremap(efi.systab->fw_vendor, 2);
> + if (c16) {
> + for (i = 0; i < sizeof(vendor) && *c16; ++i)
> + vendor[i] = *c16++;

Probably safer to use probe_kernel_address() here and bail out if it's broken.
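
Separately from the fault handling that probe_kernel_address() would
add, the quoted loop can run until i == sizeof(vendor) and then store
the terminator at vendor[i], one element past the end of the buffer. A
minimal userspace sketch of a bounded copy follows; the helper name and
the src_max parameter (standing in for however many code units are
known to be mapped) are hypothetical, and it does not model the
faulting-read check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy a NUL-terminated UTF-16 string into a char
 * buffer, truncating each code unit to 8 bits as the quoted loop does.
 * At most dst_len - 1 characters are copied so the terminator always
 * fits, and at most src_max code units are read from the source.
 * Returns the number of characters written, excluding the terminator.
 */
static size_t copy_vendor_string(char *dst, size_t dst_len,
				 const uint16_t *src, size_t src_max)
{
	size_t i;

	assert(dst != NULL && dst_len > 0);

	if (src == NULL) {
		dst[0] = '\0';
		return 0;
	}

	for (i = 0; i + 1 < dst_len && i < src_max && src[i] != 0; i++)
		dst[i] = (char)src[i];

	dst[i] = '\0';
	return i;
}
```

With a four-byte destination and the string "Intel", three characters
are copied and the terminator lands in the last slot.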


> + vendor[i] = '\0';
> + } else
> + printk(KERN_ERR PFX "Could not map the firmware vendor!\n");

EFI should be in the error string


> + if (status != EFI_SUCCESS) {
> + printk (KERN_ALERT "You are screwed! "
> + "Unable to switch EFI into virtual mode "
> + "(status=%lx)\n", status);
> + panic("EFI call to SetVirtualAddressMap() failed!");

Is the panic really needed?


> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S 2007-04-19 12:39:39.000000000 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S 2007-04-19 13:01:02.000000000 -0700
> @@ -94,12 +94,29 @@ startup_32:
> * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
> * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
> */
> - ljmp $__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
> + ljmp $__KERNEL_CS, $(long64 - __START_KERNEL_map)


What is the head.S change good for?

-Andi

2007-05-25 22:46:37

by chandramouli narayanan

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

Andi Kleen wrote:
> On Tuesday 01 May 2007 20:59:46 Chandramouli Narayanan wrote:
>
>> General note on EFI x86_64 support
>> ----------------------------------
>>
>
> More review. This code unfortunately has some problems.
>
> First this seems to be quite different from what the 32bit EFI
> support does (which i suppose is pre UEFI) Are there plans to sync this
> up eventually?
>
Consolidation of the efi support across architectures needs to be done
at some point and this will be a bigger task. But right now my focus is
x86_64.
> Also your howto above should be somewhere in Documentation/
>
Will do.
>
>> - Create a VFAT partition on the disk
>> - Copy the following to the VFAT partition:
>> elilo bootloader with x86_64 support and elilo configuration file
>> efi64 kernel image, initrd
>>
>
> What format is the kernel image?
>
bzImage
>
>> - Boot to EFI shell and invoke elilo choosing efi64 kernel image
>> - On UEFI2.0 firmware systems, pass vga=fbcon for boot messages to appear
>> on console.
>>
>
> They don't have a compat mode for vga anymore?
>
I'm not sure what you mean by VGA compatibility mode. There is no
requirement in [U]EFI for VGA.
>
>> 2. With CALGARY_IOMMU=y in the kernel configuration, the Calgary detection fails
>> with the message "Calgary: Unable to locate Rio Grande Table in EBDA - bailing".
>> However, the legacy kernel has no such error.
>>
>
> Getting that message when you don't have a IBM Summit system with Calgary is expected.
> It would be more worrying why the old kernel didn't give it.
>
All right. I will double-check the older kernel.
>
>> +config EFI
>> + bool "Boot from EFI support (EXPERIMENTAL)"
>> + default n
>>
>
> The config should be only added after the feature works -- later in the series.
>
> Drop the default n
>
To the extent I have tested, the feature works. Should this still be 'n'?
>
>> + ---help---
>> +
>> + This enables the kernel to boot on EFI platforms using
>> + system configuration information passed to it from the firmware.
>> + This also enables the kernel to use any EFI runtime services that are
>> + available (such as the EFI variable services).
>> + This option is only useful on systems that have EFI firmware
>> + and will result in a kernel image that is ~8k larger. However,
>> + even with this option, the resultant kernel should continue to
>> + boot on existing non-EFI platforms.
>>
>
> The description should probably have a reference to the Documentation describing
> how to set this up.
>
> Mention UEFI?
>
Will add to the doc.
>
>> +#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
>> +#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
>> +#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
>> +#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
>> +#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
>> +#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
>>
>
> This needs to be documented in Documentation/i386/zero-page.txt
>
> But it might be already obsolete with the early conversion to e820 change?
>
I'm not sure whether this becomes obsolete with the e820 change. I will
look into it and add it to the doc.
>
>> +#define EFI_ARG_NUM_GET_TIME 2
>> +#define EFI_ARG_NUM_SET_TIME 1
>> +#define EFI_ARG_NUM_GET_WAKEUP_TIME 3
>> +#define EFI_ARG_NUM_SET_WAKEUP_TIME 2
>> +#define EFI_ARG_NUM_GET_VARIABLE 5
>> +#define EFI_ARG_NUM_GET_NEXT_VARIABLE 3
>> +#define EFI_ARG_NUM_SET_VARIABLE 5
>> +#define EFI_ARG_NUM_GET_NEXT_HIGH_MONO_COUNT 1
>> +#define EFI_ARG_NUM_RESET_SYSTEM 4
>> +#define EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP 4
>> +
>> +#define EFI_ARG_NUM_MAX 10
>> +#define EFI_REG_ARG_NUM 4
>> +
>> +extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
>> +struct efi efi;
>> +EXPORT_SYMBOL(efi);
>> +struct efi efi_phys __initdata;
>> +struct efi_memory_map memmap ;
>> +static efi_system_table_t efi_systab __initdata;
>> +
>> +static unsigned long efi_rt_eflags;
>> +static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
>>
>
> Each lock needs a comment what it protects and if there is a locking order.
>
>
I will add the comments. Ditto for i386.
>> +static pgd_t save_pgd;
>>
>
> That looks dubious, more comments later.
>
>
>> +
>> +/* Convert SysV calling convention to EFI x86_64 calling convention */
>> +
>> +static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
>> +{
>>
>
> Any reason you can't do something simple like (untested)
>
> /* rdi, rsi, rdx, rcx, r8, r9 -> rcx,rdx,r8,r9,stack,stack
> arg1 function to call */
>
> call_ms:
> mov %rsi,%rcx
> mov %rdx,%rdx
> mov %rcx,%r8
> mov %r8,%r9
> push %r9
> call *%rdi
> pop %r9
> ret
>
> I assume none of the calls has more than 6 arguments.
> ndiswrapper probably has already tested variants if you're lazy.
> Then you also wouldn't need the defines for the number of arguments.
>
> Also such code is better written in pure assembly; some versions of
> gcc don't like clobbering of too many registers.
>
This wrapper code was tested and added to elilo with x86_64 support.
elilo uses some EFI calls with more than 6 args, so the wrapper code is
more elaborate to deal with those cases. Although the EFI calls used
here in the kernel do not exceed 6 args, the same code is used here for
the sake of generality. Is there a better way to handle a larger number
of arguments? I tested with the LIN2WINxx macros from the ndiswrapper
code ( http://ndiswrapper.sourceforge.net/joomla/) and that works. Each
LIN2WINxx macro makes a call with a fixed number of arguments. For
instance, an EFI runtime call with one argument should be called with
LIN2WIN1() and so on. If this approach is acceptable for inclusion, it
is a possible candidate for replacement.
I had an issue with the code snippet you provided for an EFI call with 5
args to get EFI variables. I didn't investigate this further.
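
The register shuffle under discussion can be modelled in plain C. The
sketch below is only a userspace simulation, not kernel code: struct regs,
struct ms_args and both function names are hypothetical. It assumes the
SysV entry state rdi = function pointer, rsi..r9 = arg1..arg5, and the
Microsoft x64 target state rcx, rdx, r8, r9 = arg1..arg4 with arg5 on the
stack. It suggests one possible reason a 5-argument call could misbehave
with the quoted (untested) snippet: that sequence overwrites rcx before
reading it. Whether this is what caused the issue mentioned above has not
been established. A real wrapper must also reserve the 32 bytes of shadow
space that the Microsoft convention requires, with the fifth argument
placed above it, which this model does not show.

```c
#include <stdint.h>

/* Hypothetical model of the integer argument registers at SysV entry. */
struct regs {
	uint64_t rdi, rsi, rdx, rcx, r8, r9;
};

/* What a Microsoft x64 callee would see for a 5-argument call. */
struct ms_args {
	uint64_t rcx, rdx, r8, r9;
	uint64_t stack5;	/* fifth argument, passed on the stack */
};

/* Shuffle that reads every source register before overwriting it. */
static struct ms_args sysv_to_ms(struct regs r)
{
	struct ms_args out;

	out.stack5 = r.r9;	/* arg5 goes to the stack first */
	r.r9 = r.r8;		/* arg4 */
	r.r8 = r.rcx;		/* arg3 */
	r.rcx = r.rsi;		/* arg1 */
	/* rdx already holds arg2 */

	out.rcx = r.rcx;
	out.rdx = r.rdx;
	out.r8 = r.r8;
	out.r9 = r.r9;
	return out;
}

/* Same moves, in the order used by the quoted snippet. */
static struct ms_args sysv_to_ms_quoted_order(struct regs r)
{
	struct ms_args out;

	r.rcx = r.rsi;		/* mov %rsi,%rcx: arg3 is overwritten here */
	/* mov %rdx,%rdx leaves rdx unchanged */
	r.r8 = r.rcx;		/* mov %rcx,%r8: copies arg1, not arg3 */
	r.r9 = r.r8;		/* mov %r8,%r9: copies arg1 again */
	out.stack5 = r.r9;	/* push %r9: pushes arg1, not arg5 */

	out.rcx = r.rcx;
	out.rdx = r.rdx;
	out.r8 = r.r8;
	out.r9 = r.r9;
	return out;
}
```

Feeding both functions the arguments 1 through 5 shows the difference:
the first delivers them in order, the second delivers arg1 in every
slot except rdx.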
>
>> +static efi_status_t _efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
>> +{
>> + return uefi_call_wrapper((void*)efi.systab->runtime->get_time,
>> + EFI_ARG_NUM_GET_TIME,
>> + (u64)tm,
>> + (u64)tc
>>
>
> Are the casts really needed?
>
Not needed. I fixed it.
>
>> +static void efi_call_phys_prelog(void)
>> +{
>> + unsigned long vaddress;
>> +
>> + spin_lock(&efi_rt_lock);
>> + local_irq_save(efi_rt_eflags);
>> +
>> + vaddress = (unsigned long)__va(0x0UL);
>> + pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
>> + set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
>>
>
> Who tells you the other CPUs don't use the current pgd? If it's only used in early
> boot it should be __init. If not it's broken.
>
This code is called only during early boot, so it should be __init. The
same applies to the i386 version, which also lacks __init.
> Most likely you need to create an own thread with special page tables.
>
>
>> + local_flush_tlb();
>>
>
> That won't flush the global mappings.
>
>
>> +}
>> +
>> +static void efi_call_phys_epilog(void)
>> +{
>> + /*
>> + * After the lock is released, the original page table is restored.
>> + */
>> + set_pgd(pgd_offset_k(0x0UL), save_pgd);
>> + local_flush_tlb();
>>
>
> Same
>
>
>> + local_irq_restore(efi_rt_eflags);
>> + spin_unlock(&efi_rt_lock);
>> +}
>> +
>> +static efi_status_t
>> +phys_efi_set_virtual_address_map(unsigned long memory_map_size,
>> + unsigned long descriptor_size,
>> + u32 descriptor_version,
>> + efi_memory_desc_t *virtual_map)
>> +{
>> + efi_status_t status;
>> +
>> + efi_call_phys_prelog();
>> + status = efi_call_phys(efi_phys.set_virtual_address_map,
>> + EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
>> + (unsigned long)memory_map_size,
>> + (unsigned long)descriptor_size,
>> + (unsigned long)descriptor_version,
>> + (unsigned long)virtual_map);
>> + efi_call_phys_epilog();
>> + return status;
>> +}
>> +
>> +efi_status_t
>> +phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
>> +{
>>
>
> Looks broken -- i think that is called later, so the pgds
> can be messed up.
>
I don't think so. For instance, when efi_get_time is called later, the
virtual mode is set up and it does not cause the physical mode call.
> Ok there is suspend/resume -- if you're careful you might be able
> to call this before the other CPUs are put online again. But that
> is also currently being changed to use frozen CPUs. You probably
> need to coordinate with Rafael.
>
> [i suspect your resume hang is somehow related to this]
>
>
I don't have a suspend/resume hang. The issue I documented concerned the
behavior of the desktop with just one of the git kernels prior to the
2.6.21 release. I tested suspend/resume with the patch applied to
2.6.21 by writing the state directly to /sys/power/state. However,
powersave -U did not seem to do anything. This seems to me to be a
user-mode utility issue rather than a kernel support issue.
>> +inline int efi_set_rtc_mmss(unsigned long nowtime)
>>
>
> I think that can be called any time so definitely broken
> regarding page tables.
>
efi init code sets up efi.get_time to a virtual mode function that can
be called later. So, I don't think there is an issue here. Correct me if
I'm wrong.
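
The argument above rests on the runtime-service pointers being swapped
once virtual mode is entered, so later callers never reach the physical
mode path and its page-table juggling. A minimal userspace sketch of that
pattern follows. All names here (struct efi_ops, efi_ops, pgd_swaps,
enter_virtual_mode and the two get_time stand-ins) are hypothetical and
only model the idea; the returned time value is a dummy.

```c
#include <stddef.h>

typedef unsigned long efi_status_t;
#define EFI_SUCCESS 0UL

/* Hypothetical table of runtime-service entry points. */
struct efi_ops {
	efi_status_t (*get_time)(unsigned long *seconds);
};

/* Counts how often the simulated low-memory pgd swap would have run. */
static int pgd_swaps;

/* Early-boot variant: stands in for prelog, physical call, epilog. */
static efi_status_t phys_get_time(unsigned long *seconds)
{
	pgd_swaps++;
	*seconds = 100;		/* dummy value */
	return EFI_SUCCESS;
}

/* Variant used after SetVirtualAddressMap(): no page-table changes. */
static efi_status_t virt_get_time(unsigned long *seconds)
{
	*seconds = 100;		/* dummy value */
	return EFI_SUCCESS;
}

/* Starts out pointing at the physical mode implementation. */
static struct efi_ops efi_ops = { .get_time = phys_get_time };

/* Models the switch done once during init. */
static void enter_virtual_mode(void)
{
	efi_ops.get_time = virt_get_time;
}
```

After enter_virtual_mode() has run, calls through efi_ops.get_time no
longer touch pgd_swaps, which is the property being claimed for callers
such as efi_set_rtc_mmss().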
>
>> +inline unsigned long efi_get_time(void)
>>
>
> inline without static usually leads to wasted code because
> the compiler has to generate an out of line copy.
>
> I don't see why all these functions need to be inline anyways.
> Best just drop that everywhere and let the compiler decide.
>
> That would probably eliminate some of your 8k.
>
Agreed. However, the function is declared extern in include/linux/efi.h,
so it cannot be both static and inline. The same goes for the i386 code.
>
>> + if (status != EFI_SUCCESS)
>> + printk("Oops: efitime: can't read time status: 0x%lx\n",status);
>>
>
> This should have suitable KERN_* prefixes (missing in most printks)
>
>
Ok, I will fix this. Ditto for i386/efi code.
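
For reference, the log-level prefix is just a string literal that is
concatenated with the format string at compile time. The sketch below
mimics that in userspace with snprintf(). The KERN_ERR value matches the
"<3>" definition used by kernels of this era; the exact PFX string and
the function name format_time_error are assumptions made for the
example.

```c
#include <stdio.h>

/* Stand-in for the kernel's log-level macro as defined around 2.6.21. */
#define KERN_ERR "<3>"
/* Assumed prefix; the thread only says PFX is set to EFI. */
#define PFX "EFI: "

/* Builds the string that printk(KERN_ERR PFX ...) would pass to the log. */
static int format_time_error(char *buf, size_t len, unsigned long status)
{
	return snprintf(buf, len,
			KERN_ERR PFX "can't read time, status: 0x%lx\n",
			status);
}
```

With a prefix on every message, the log level no longer depends on the
default console level.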
>> +/* Make EFI runtime code executable */
>>
>
> It would be better to integrate this into the standard page table setup
> instead of cut'n'pasting so much code here.
>
>
I finally have a version that works with the standard page table setup
code. I also got rid of the duplicate code and integrated the EFI
memory map into the e820 space.
>
>> + * We need to map the EFI memory map again after paging_init().
>> + */
>> +void __init efi_map_memmap(void)
>> +{
>> + efi_memory_desc_t *md;
>> + void *p;
>> +
>> + memmap.map = __va((unsigned long) memmap.phys_map);
>> + memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
>>
>
> White space broken.
>
Will fix.
>
>> +#if EFI_DEBUG
>> +void __init print_efi_memmap(void)
>>
>
> The memory map should be probably always printed; it's very useful
> for debugging. But if you integrate it with e820 that code can do it.
>
>
I integrated the EFI memory map into e820 and it takes care of printing
that. So, print_efi_memmap() can be removed.
>> + /*
>> + * Show what we know for posterity
>> + */
>> + c16 = (efi_char16_t *) early_ioremap(efi.systab->fw_vendor, 2);
>> + if (c16) {
>> + for (i = 0; i < sizeof(vendor) && *c16; ++i)
>> + vendor[i] = *c16++;
>>
>
> Probably safer to use probe_kernel_address() here and bail out if it's broken.
>
>
I will make the needed changes.
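
Independently of how the firmware string is mapped, the copy loop in the
quoted hunk appears to allow i to reach sizeof(vendor), in which case the
terminating write lands one byte past the buffer. A bounded copy could
look like the sketch below. It is a userspace illustration only:
copy_vendor and its parameters are hypothetical, and it does not model
early_ioremap() or probe_kernel_address().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t efi_char16_t;

/*
 * Copies a NUL-terminated UCS-2 string into dst, keeping only the low
 * byte of each code unit. Writes at most dst_len - 1 characters, reads
 * at most src_units code units, and always terminates dst.
 * Returns the number of characters written, excluding the terminator.
 */
static size_t copy_vendor(char *dst, size_t dst_len,
			  const efi_char16_t *src, size_t src_units)
{
	size_t i = 0;

	if (dst_len == 0)
		return 0;

	while (i < dst_len - 1 && i < src_units && src[i] != 0) {
		dst[i] = (char)(src[i] & 0xff);
		i++;
	}
	dst[i] = '\0';
	return i;
}
```

Passing the mapped length as src_units keeps the read side bounded as
well, which matters if only a small window of the firmware string has
been mapped.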
>
>> + vendor[i] = '\0';
>> + } else
>> + printk(KERN_ERR PFX "Could not map the firmware vendor!\n");
>>
>
> EFI should be in the error string
>
PFX is set to EFI and that should take care of it.
>
>
>> + if (status != EFI_SUCCESS) {
>> + printk (KERN_ALERT "You are screwed! "
>> + "Unable to switch EFI into virtual mode "
>> + "(status=%lx)\n", status);
>> + panic("EFI call to SetVirtualAddressMap() failed!");
>>
>
> Is the panic really needed?
>
Probably not needed. This is lifted from the i386 EFI init code.
>
>> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S
>> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S 2007-04-19 12:39:39.000000000 -0700
>> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S 2007-04-19 13:01:02.000000000 -0700
>> @@ -94,12 +94,29 @@ startup_32:
>> * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
>> * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
>> */
>> - ljmp $__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
>> + ljmp $__KERNEL_CS, $(long64 - __START_KERNEL_map)
>>
>
>
> What is the head.S change good for?
>
Sorry, this is remnant code that should _not_ have been part of the patch.
>
> -Andi
>
>

I'm testing the next set of patches with fixes and will post them for
review. Thanks for the feedback,
- mouli

2007-06-01 16:47:57

by Yinghai Lu

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

On 5/3/07, chandramouli narayanan <[email protected]> wrote:
> Randy Dunlap wrote:
> > Can you share EFI code as much as possible among ia64, i386,
> > and x86_64 instead of duplicating it?
> Hi Randy,
>
> Based on the feedback from Andi and you, these are the areas:
>
> 1. conversion of EFI memory map to e820 map

For the e820 map, isn't the bootloader supposed to do the conversion and
pass it via real_mode_data?
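
Wherever the conversion ends up living, its core is a type mapping plus
a pages-to-bytes conversion. The sketch below shows one way to do it. The
EFI memory type values and the descriptor layout follow the UEFI
specification, and the e820 type numbers are the usual ones. The struct
and function names are hypothetical, and treating loader and boot
services memory as usable RAM is a policy assumption of this sketch, not
something the thread has settled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_PAGE_SHIFT 12	/* EFI pages are 4 KiB */

/* EFI memory types (subset, values from the UEFI specification). */
enum {
	EFI_LOADER_CODE = 1, EFI_LOADER_DATA = 2,
	EFI_BOOT_SERVICES_CODE = 3, EFI_BOOT_SERVICES_DATA = 4,
	EFI_RUNTIME_SERVICES_CODE = 5, EFI_RUNTIME_SERVICES_DATA = 6,
	EFI_CONVENTIONAL_MEMORY = 7,
	EFI_ACPI_RECLAIM_MEMORY = 9, EFI_ACPI_MEMORY_NVS = 10,
};

enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

struct efi_mem_desc {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

static uint32_t efi_type_to_e820(uint32_t efi_type)
{
	switch (efi_type) {
	case EFI_LOADER_CODE:
	case EFI_LOADER_DATA:
	case EFI_BOOT_SERVICES_CODE:
	case EFI_BOOT_SERVICES_DATA:
	case EFI_CONVENTIONAL_MEMORY:
		return E820_RAM;	/* assumption: reclaimable after boot */
	case EFI_ACPI_RECLAIM_MEMORY:
		return E820_ACPI;
	case EFI_ACPI_MEMORY_NVS:
		return E820_NVS;
	default:
		return E820_RESERVED;	/* runtime services, MMIO, unknown */
	}
}

/*
 * Walks the EFI map using the firmware-reported descriptor stride and
 * emits one e820 entry per descriptor (no merging of adjacent ranges).
 * Returns the number of entries written.
 */
static size_t efi_memmap_to_e820(const void *map, size_t nr_map,
				 size_t desc_size,
				 struct e820_entry *out, size_t max_out)
{
	const unsigned char *p = map;
	size_t i, n = 0;

	for (i = 0; i < nr_map && n < max_out; i++, p += desc_size) {
		struct efi_mem_desc md;

		memcpy(&md, p, sizeof(md));
		out[n].addr = md.phys_addr;
		out[n].size = md.num_pages << EFI_PAGE_SHIFT;
		out[n].type = efi_type_to_e820(md.type);
		n++;
	}
	return n;
}
```

Stepping by desc_size rather than sizeof(struct efi_mem_desc) matters
because the firmware may report a descriptor size larger than the
structure known to the consumer.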

YH

2007-06-01 18:41:52

by Eric W. Biederman

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support


Sorry for the late reply; I missed this patch series going by the first time.
> +static efi_status_t
> +phys_efi_set_virtual_address_map(unsigned long memory_map_size,
> + unsigned long descriptor_size,
> + u32 descriptor_version,
> + efi_memory_desc_t *virtual_map)
> +{
> + efi_status_t status;
> +
> + efi_call_phys_prelog();
> + status = efi_call_phys(efi_phys.set_virtual_address_map,
> + EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
> + (unsigned long)memory_map_size,
> + (unsigned long)descriptor_size,
> + (unsigned long)descriptor_version,
> + (unsigned long)virtual_map);
> + efi_call_phys_epilog();
> + return status;
> +}

Please, Please kill this.

As far as I can tell virtual mode is incompatible with kexec.
It is unnecessary because none of the EFI calls are fast path.

Further I believe that using virtual addresses is likely to
make things more brittle.

So please drop the EFI virtual mode nonsense.

Thank you,
Eric

2007-06-01 18:45:25

by Eric W. Biederman

Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

Chandramouli Narayanan <[email protected]> writes:
>
> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/drivers/char/Kconfig linux-2.6.21rc7-git2-uefi-finaltest/drivers/char/Kconfig
> --- linux-2.6.21rc7-git2-orig/drivers/char/Kconfig 2007-04-19 12:39:39.000000000 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/drivers/char/Kconfig 2007-04-19 13:01:02.000000000 -0700
> @@ -837,7 +837,7 @@ config GEN_RTC_X
>
> config EFI_RTC
> bool "EFI Real Time Clock Services"
> - depends on IA64
> + depends on IA64 || X86_64

Please remove this.

We have an architecturally defined hardware realtime clock
on x86_64. We don't need EFI to abstract it for us.

My condolences to hardware manufacturers that can't follow a
20 year old standard.

Eric