This is an initial attempt at implementing kexec_file_load() support
on arm64. [1]
Most of the code is based on kexec-tools (along with some kernel code
from x86 and from powerpc, which also came from kexec-tools).
This patch series enables us to
* load the kernel, either Image or vmlinux, with the kexec_file_load
system call, and
* optionally verify its signature at load time for trusted boot.
To load the kernel via the kexec_file_load system call, a small change
needs to be applied to kexec-tools. See [2]. This enables the '-s'
option; an example invocation is shown below.
As we discussed a long time ago, users are not allowed to specify the
device-tree file of the 2nd kernel explicitly with kexec-tools; the blob
of the first kernel is re-used instead.
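For illustration, loading and booting the second kernel with the patched
kexec-tools [2] might look like this (the file paths are examples):

	$ kexec -s -l /boot/Image --initrd=/boot/initrd.img --reuse-cmdline
	$ kexec -e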
Regarding the method of placing the signature in the kernel binary:
* for 'Image', we conform to the x86 (or rather Microsoft?) style of
signing, since the binary can also be seen as being in PE format
(assuming that CONFIG_EFI is enabled),
* for 'vmlinux', we follow the powerpc approach [3]: the signature will
be appended just after the binary itself, as module signing does.
This implies that we need to enable CONFIG_MODULE_SIG, too.
Powerpc is also going to support extended-file-attribute-based
verification [3], but arm64 doesn't for now, partly because we don't
have TPM-based IMA at this moment.
Accordingly, we can use the existing commands, sbsign and sign-file
respectively, to sign the kernel; see the examples below. Please note
that it is totally up to the system what key/certificate is used for
signing.
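For illustration (all key/certificate paths below are placeholders):

	$ sbsign --key db.key --cert db.crt --output Image.signed Image
	$ scripts/sign-file sha256 certs/signing_key.pem \
		certs/signing_key.x509 vmlinux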
Some concerns (or future work):
* Even if the kernel is configured with CONFIG_RANDOMIZE_BASE, the 2nd
kernel won't be placed at a randomized address. We will have to
add some boot code similar to the EFI stub to implement this feature.
* While a big-endian kernel can support kernel signing, I'm not sure that
a big-endian Image can be recognized as being in PE format, because the
PE standard only defines a little-endian format.
So I tested big-endian kernel signing only with vmlinux.
* IMA (and file extended attribute)-based kexec
Patches #1 to #7 are all preparatory patches on the generic side.
(Patch #1 is not mine, but a prerequisite from [4].)
Patches #8 and #9 are purgatory code.
Patches #10 to #12 are common code for enabling kexec_file_load.
Patch #13 is for 'Image' support.
Patch #14 is for 'vmlinux' support.
[1] http://git.linaro.org/people/takahiro.akashi/linux-aarch64.git
branch:arm64/kexec_file
[2] http://git.linaro.org/people/takahiro.akashi/kexec-tools.git
branch:arm64/kexec_file
[3] http://lkml.iu.edu//hypermail/linux/kernel/1707.0/03669.html
[4] http://lkml.iu.edu//hypermail/linux/kernel/1707.0/03670.html
AKASHI Takahiro (13):
include: pe.h: remove message[] from mz header definition
resource: add walk_system_ram_res_rev()
kexec_file: factor out vmlinux (elf) parser from powerpc
kexec_file: factor out crashdump elf header function from x86
kexec_file: add kexec_add_segment()
asm-generic: add kexec_file_load system call to unistd.h
arm64: kexec_file: create purgatory
arm64: kexec_file: add sha256 digest check in purgatory
arm64: kexec_file: load initrd, device-tree and purgatory segments
arm64: kexec_file: set up for crash dump adding elf core header
arm64: enable KEXEC_FILE config
arm64: kexec_file: add Image format support
arm64: kexec_file: add vmlinux format support
Thiago Jung Bauermann (1):
MODSIGN: Export module signature definitions
arch/Kconfig | 3 +
arch/arm64/Kconfig | 33 ++
arch/arm64/Makefile | 1 +
arch/arm64/crypto/sha256-core.S_shipped | 2 +
arch/arm64/include/asm/kexec.h | 23 ++
arch/arm64/include/asm/kexec_file.h | 84 +++++
arch/arm64/kernel/Makefile | 5 +-
arch/arm64/kernel/kexec_elf.c | 216 ++++++++++++
arch/arm64/kernel/kexec_image.c | 112 ++++++
arch/arm64/kernel/machine_kexec_file.c | 606 ++++++++++++++++++++++++++++++++
arch/arm64/purgatory/Makefile | 43 +++
arch/arm64/purgatory/entry.S | 41 +++
arch/arm64/purgatory/purgatory.c | 20 ++
arch/arm64/purgatory/sha256-core.S | 1 +
arch/arm64/purgatory/sha256.c | 79 +++++
arch/arm64/purgatory/sha256.h | 1 +
arch/arm64/purgatory/string.c | 32 ++
arch/arm64/purgatory/string.h | 5 +
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/kexec_elf_64.c | 464 ------------------------
arch/x86/kernel/crash.c | 324 -----------------
include/linux/elf.h | 62 ++++
include/linux/ioport.h | 3 +
include/linux/kexec.h | 39 ++
include/linux/module.h | 3 -
include/linux/module_signature.h | 47 +++
include/linux/pe.h | 2 +-
include/uapi/asm-generic/unistd.h | 4 +-
init/Kconfig | 6 +-
kernel/Makefile | 3 +-
kernel/crash_core.c | 333 ++++++++++++++++++
kernel/kexec_file.c | 47 +++
kernel/kexec_file_elf.c | 454 ++++++++++++++++++++++++
kernel/module.c | 1 +
kernel/module_signing.c | 74 ++--
kernel/resource.c | 48 +++
36 files changed, 2383 insertions(+), 839 deletions(-)
create mode 100644 arch/arm64/include/asm/kexec_file.h
create mode 100644 arch/arm64/kernel/kexec_elf.c
create mode 100644 arch/arm64/kernel/kexec_image.c
create mode 100644 arch/arm64/kernel/machine_kexec_file.c
create mode 100644 arch/arm64/purgatory/Makefile
create mode 100644 arch/arm64/purgatory/entry.S
create mode 100644 arch/arm64/purgatory/purgatory.c
create mode 100644 arch/arm64/purgatory/sha256-core.S
create mode 100644 arch/arm64/purgatory/sha256.c
create mode 100644 arch/arm64/purgatory/sha256.h
create mode 100644 arch/arm64/purgatory/string.c
create mode 100644 arch/arm64/purgatory/string.h
create mode 100644 include/linux/module_signature.h
create mode 100644 kernel/kexec_file_elf.c
--
2.14.1
This is a basic purgatory, a kind of glue code between the two kernels,
for arm64. We will later add a feature for verifying a digest of the
loaded memory segments.
arch_kexec_apply_relocations_add() is responsible for re-linking any
relative symbols in the purgatory. Please note that the purgatory is not
an executable, but a relocatable (partially linked) object, so the
relative symbols contained in it must be resolved at kexec load time.
Although arm64_kernel_entry and arm64_dtb_addr are the only such global
variables for now, arch_kexec_apply_relocations_add() can handle a wider
variety of relocation types.
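As a minimal sketch (assuming the generic kexec_file purgatory API,
kexec_purgatory_get_set_symbol(), is used by the arm64 loader in a later
patch), those purgatory variables would be patched at load time roughly
like this:

	/* illustration only: values are written into the relocated
	 * purgatory image after kexec_load_purgatory() has run
	 */
	kexec_purgatory_get_set_symbol(image, "arm64_kernel_entry",
				       &kernel_load_addr,
				       sizeof(kernel_load_addr), false);
	kexec_purgatory_get_set_symbol(image, "arm64_dtb_addr",
				       &dtb_addr, sizeof(dtb_addr), false);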
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Makefile | 1 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/machine_kexec_file.c | 199 +++++++++++++++++++++++++++++++++
arch/arm64/purgatory/Makefile | 24 ++++
arch/arm64/purgatory/entry.S | 28 +++++
5 files changed, 253 insertions(+)
create mode 100644 arch/arm64/kernel/machine_kexec_file.c
create mode 100644 arch/arm64/purgatory/Makefile
create mode 100644 arch/arm64/purgatory/entry.S
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 9b41f1e3b1a0..429f60728c0a 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -105,6 +105,7 @@ core-$(CONFIG_XEN) += arch/arm64/xen/
core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
libs-y := arch/arm64/lib/ $(libs-y)
core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
+core-$(CONFIG_KEXEC_FILE) += arch/arm64/purgatory/
# Default target when executing plain make
boot := arch/arm64/boot
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f2b4e816b6de..16e9f56b536a 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
+arm64-obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o
arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
new file mode 100644
index 000000000000..183f7776d6dd
--- /dev/null
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -0,0 +1,199 @@
+/*
+ * kexec_file for arm64
+ *
+ * Copyright (C) 2017 Linaro Limited
+ * Author: AKASHI Takahiro <[email protected]>
+ *
+ * Most code is derived from arm64 port of kexec-tools
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) "kexec_file: " fmt
+
+#include <linux/elf.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+/*
+ * Apply purgatory relocations.
+ *
+ * ehdr: Pointer to elf headers
+ * sechdrs: Pointer to section headers.
+ * relsec: section index of SHT_RELA section.
+ *
+ * Note:
+ * Currently R_AARCH64_ABS64, R_AARCH64_LD_PREL_LO19 and R_AARCH64_CALL26
+ * are the only types to be generated from purgatory code.
+ * If we add more functionalities, other types may also be used.
+ */
+int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr,
+ Elf64_Shdr *sechdrs, unsigned int relsec)
+{
+ Elf64_Rela *rel;
+ Elf64_Shdr *section, *symtabsec;
+ Elf64_Sym *sym;
+ const char *strtab, *name, *shstrtab;
+ unsigned long address, sec_base, value;
+ void *location;
+ u64 *loc64;
+ u32 *loc32, imm;
+ unsigned int i;
+
+ /*
+ * ->sh_offset has been modified to keep the pointer to section
+ * contents in memory
+ */
+ rel = (void *)sechdrs[relsec].sh_offset;
+
+ /* Section to which relocations apply */
+ section = &sechdrs[sechdrs[relsec].sh_info];
+
+ pr_debug("reloc: Applying relocate section %u to %u\n", relsec,
+ sechdrs[relsec].sh_info);
+
+ /* Associated symbol table */
+ symtabsec = &sechdrs[sechdrs[relsec].sh_link];
+
+ /* String table */
+ if (symtabsec->sh_link >= ehdr->e_shnum) {
+ /* Invalid strtab section number */
+ pr_err("reloc: Invalid string table section index %d\n",
+ symtabsec->sh_link);
+ return -ENOEXEC;
+ }
+
+ strtab = (char *)sechdrs[symtabsec->sh_link].sh_offset;
+
+ /* section header string table */
+ shstrtab = (char *)sechdrs[ehdr->e_shstrndx].sh_offset;
+
+ for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
+
+ /*
+ * rel[i].r_offset contains byte offset from beginning
+ * of section to the storage unit affected.
+ *
+ * This is location to update (->sh_offset). This is temporary
+ * buffer where section is currently loaded. This will finally
+ * be loaded to a different address later, pointed to by
+ * ->sh_addr. kexec takes care of moving it
+ * (kexec_load_segment()).
+ */
+ location = (void *)(section->sh_offset + rel[i].r_offset);
+
+ /* Final address of the location */
+ address = section->sh_addr + rel[i].r_offset;
+
+ /*
+ * rel[i].r_info contains information about symbol table index
+ * w.r.t which relocation must be made and type of relocation
+ * to apply. ELF64_R_SYM() and ELF64_R_TYPE() macros get
+ * these respectively.
+ */
+ sym = (Elf64_Sym *)symtabsec->sh_offset +
+ ELF64_R_SYM(rel[i].r_info);
+
+ if (sym->st_name)
+ name = strtab + sym->st_name;
+ else
+ name = shstrtab + sechdrs[sym->st_shndx].sh_name;
+
+ pr_debug("Symbol: %-16s info: %02x shndx: %02x value=%llx size: %llx reloc type:%d\n",
+ name, sym->st_info, sym->st_shndx, sym->st_value,
+ sym->st_size, (int)ELF64_R_TYPE(rel[i].r_info));
+
+ if (sym->st_shndx == SHN_UNDEF) {
+ pr_err("reloc: Undefined symbol: %s\n", name);
+ return -ENOEXEC;
+ }
+
+ if (sym->st_shndx == SHN_COMMON) {
+ pr_err("reloc: symbol '%s' in common section\n", name);
+ return -ENOEXEC;
+ }
+
+ if (sym->st_shndx == SHN_ABS) {
+ sec_base = 0;
+ } else if (sym->st_shndx < ehdr->e_shnum) {
+ sec_base = sechdrs[sym->st_shndx].sh_addr;
+ } else {
+ pr_err("reloc: Invalid section %d for symbol %s\n",
+ sym->st_shndx, name);
+ return -ENOEXEC;
+ }
+
+ value = sym->st_value;
+ value += sec_base;
+ value += rel[i].r_addend;
+
+ switch (ELF64_R_TYPE(rel[i].r_info)) {
+ case R_AARCH64_ABS64:
+ loc64 = location;
+ *loc64 = cpu_to_elf64(ehdr,
+ elf64_to_cpu(ehdr, *loc64) + value);
+ break;
+ case R_AARCH64_PREL32:
+ loc32 = location;
+ *loc32 = cpu_to_elf32(ehdr,
+ elf32_to_cpu(ehdr, *loc32) + value
+ - address);
+ break;
+ case R_AARCH64_LD_PREL_LO19:
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + (((value - address) << 3) & 0xffffe0));
+ break;
+		case R_AARCH64_ADR_PREL_LO21:
+			/* ADR: imm21 is split as immlo (bits 30:29) and
+			 * immhi (bits 23:5); byte granularity, so no
+			 * alignment check is needed.
+			 */
+			imm = value - address;
+			loc32 = location;
+			*loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+				+ ((imm & 3) << 29)
+				+ ((imm & 0x1ffffc) << (5 - 2)));
+			break;
+ case R_AARCH64_ADR_PREL_PG_HI21:
+ imm = ((value & ~0xfff) - (address & ~0xfff)) >> 12;
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + ((imm & 3) << 29)
+ + ((imm & 0x1ffffc) << (5 - 2)));
+ break;
+ case R_AARCH64_ADD_ABS_LO12_NC:
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + ((value & 0xfff) << 10));
+ break;
+ case R_AARCH64_JUMP26:
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + (((value - address) >> 2) & 0x3ffffff));
+ break;
+ case R_AARCH64_CALL26:
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + (((value - address) >> 2) & 0x3ffffff));
+ break;
+ case R_AARCH64_LDST64_ABS_LO12_NC:
+ if (value & 7) {
+ pr_err("reloc: Unaligned value: %lx\n", value);
+ return -ENOEXEC;
+ }
+ loc32 = location;
+ *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
+ + ((value & 0xff8) << (10 - 3)));
+ break;
+ default:
+ pr_err("reloc: Unknown relocation type: %llu\n",
+ ELF64_R_TYPE(rel[i].r_info));
+ return -ENOEXEC;
+ }
+ }
+
+ return 0;
+}
diff --git a/arch/arm64/purgatory/Makefile b/arch/arm64/purgatory/Makefile
new file mode 100644
index 000000000000..c2127a2cbd51
--- /dev/null
+++ b/arch/arm64/purgatory/Makefile
@@ -0,0 +1,24 @@
+OBJECT_FILES_NON_STANDARD := y
+
+purgatory-y := entry.o
+
+targets += $(purgatory-y)
+PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
+
+LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined \
+ -nostdlib -z nodefaultlib
+targets += purgatory.ro
+
+$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+ $(call if_changed,ld)
+
+targets += kexec_purgatory.c
+
+CMD_BIN2C = $(objtree)/scripts/basic/bin2c
+quiet_cmd_bin2c = BIN2C $@
+ cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
+
+$(obj)/kexec_purgatory.c: $(obj)/purgatory.ro FORCE
+ $(call if_changed,bin2c)
+
+obj-${CONFIG_KEXEC_FILE} += kexec_purgatory.o
diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
new file mode 100644
index 000000000000..bc4e6b3bf8a1
--- /dev/null
+++ b/arch/arm64/purgatory/entry.S
@@ -0,0 +1,28 @@
+/*
+ * kexec core purgatory
+ */
+#include <linux/linkage.h>
+
+.text
+
+ENTRY(purgatory_start)
+ /* Start new image. */
+ ldr x17, arm64_kernel_entry
+ ldr x0, arm64_dtb_addr
+ mov x1, xzr
+ mov x2, xzr
+ mov x3, xzr
+ br x17
+END(purgatory_start)
+
+.data
+
+.align 3
+
+ENTRY(arm64_kernel_entry)
+ .quad 0
+END(arm64_kernel_entry)
+
+ENTRY(arm64_dtb_addr)
+ .quad 0
+END(arm64_dtb_addr)
--
2.14.1
The message[] field need not be part of the definition of the mz header.
This change is crucial for enabling kexec_file_load on arm64 because
arm64's "Image" binary, seen as being in PE format, doesn't carry any
data for this field, and accordingly the following check in
pefile_parse_binary() would otherwise fail:
chkaddr(cursor, mz->peaddr, sizeof(*pe));
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: David S. Miller <[email protected]>
---
include/linux/pe.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/pe.h b/include/linux/pe.h
index 143ce75be5f0..3482b18a48b5 100644
--- a/include/linux/pe.h
+++ b/include/linux/pe.h
@@ -166,7 +166,7 @@ struct mz_hdr {
uint16_t oem_info; /* oem specific */
uint16_t reserved1[10]; /* reserved */
uint32_t peaddr; /* address of pe header */
- char message[64]; /* message to print */
+ char message[]; /* message to print */
};
struct mz_reloc {
--
2.14.1
The first PT_LOAD segment in vmlinux, which is assumed to be the "text"
code, will be loaded at the offset of TEXT_OFFSET from the beginning of
system memory. The other PT_LOAD segments are placed relative to the
first one.
Regarding kernel verification, since there is no standard way to contain
a signature within an ELF binary, we follow PowerPC's (not yet upstreamed)
approach, that is, appending a signature right after the kernel binary
itself, as module signing does.
This way, the signature can be easily retrieved and verified with
verify_pkcs7_signature().
We can sign the kernel with the sign-file command.
Unlike PowerPC, we don't support IMA-based kexec for now since arm64
doesn't have any secure solution for system appraisal at this moment.
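For illustration, signing vmlinux with the in-tree sign-file utility
might look like this (the key and certificate paths are placeholders):

	$ scripts/sign-file sha256 certs/signing_key.pem \
		certs/signing_key.x509 vmlinux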
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 8 ++
arch/arm64/include/asm/kexec_file.h | 1 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/kexec_elf.c | 216 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/machine_kexec_file.c | 3 +
5 files changed, 229 insertions(+)
create mode 100644 arch/arm64/kernel/kexec_elf.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c8f603700bdd..94021e66b826 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -772,11 +772,19 @@ config KEXEC_FILE_IMAGE_FMT
---help---
Select this option to enable 'Image' kernel loading.
+config KEXEC_FILE_ELF_FMT
+ bool "Enable vmlinux/elf support"
+ depends on KEXEC_FILE
+ select KEXEC_FILE_ELF
+ ---help---
+ Select this option to enable 'vmlinux' kernel loading.
+
config KEXEC_VERIFY_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
select SYSTEM_DATA_VERIFICATION
select SIGNED_PE_FILE_VERIFICATION if KEXEC_FILE_IMAGE_FMT
+ select MODULE_SIG_FORMAT if KEXEC_FILE_ELF_FMT
---help---
This option makes kernel signature verification mandatory for
the kexec_file_load() syscall.
diff --git a/arch/arm64/include/asm/kexec_file.h b/arch/arm64/include/asm/kexec_file.h
index 5df899aa0d2e..eaf2adc1121c 100644
--- a/arch/arm64/include/asm/kexec_file.h
+++ b/arch/arm64/include/asm/kexec_file.h
@@ -2,6 +2,7 @@
#define _ASM_KEXEC_FILE_H
extern struct kexec_file_ops kexec_image_ops;
+extern struct kexec_file_ops kexec_elf64_ops;
/**
* struct arm64_image_header - arm64 kernel image header.
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a1161bab6810..1463337160ea 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -52,6 +52,7 @@ arm64-obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
arm64-obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o
arm64-obj-$(CONFIG_KEXEC_FILE_IMAGE_FMT) += kexec_image.o
+arm64-obj-$(CONFIG_KEXEC_FILE_ELF_FMT) += kexec_elf.o
arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
diff --git a/arch/arm64/kernel/kexec_elf.c b/arch/arm64/kernel/kexec_elf.c
new file mode 100644
index 000000000000..7bd3c1e1f65a
--- /dev/null
+++ b/arch/arm64/kernel/kexec_elf.c
@@ -0,0 +1,216 @@
+/*
+ * Kexec vmlinux loader
+ *
+ * Copyright (C) 2017 Linaro Limited
+ * Authors: AKASHI Takahiro <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) "kexec_file(elf): " fmt
+
+#include <linux/elf.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/module_signature.h>
+#include <linux/types.h>
+#include <linux/verification.h>
+#include <asm/byteorder.h>
+#include <asm/kexec_file.h>
+#include <asm/memory.h>
+
+static int elf64_probe(const char *buf, unsigned long len)
+{
+ struct elfhdr ehdr;
+
+ /* Check for magic and architecture */
+ memcpy(&ehdr, buf, sizeof(ehdr));
+ if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) ||
+ (elf16_to_cpu(&ehdr, ehdr.e_machine) != EM_AARCH64))
+ return -ENOEXEC;
+
+ return 0;
+}
+
+static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
+ struct elf_info *elf_info,
+ unsigned long *kernel_load_addr)
+{
+ struct kexec_buf kbuf;
+ const struct elf_phdr *phdr;
+ const struct arm64_image_header *h;
+ unsigned long text_offset, rand_offset;
+ unsigned long page_offset, phys_offset;
+ int first_segment, i, ret = -ENOEXEC;
+
+ kbuf.image = image;
+ if (image->type == KEXEC_TYPE_CRASH) {
+ kbuf.buf_min = crashk_res.start;
+ kbuf.buf_max = crashk_res.end + 1;
+ } else {
+ kbuf.buf_min = 0;
+ kbuf.buf_max = ULONG_MAX;
+ }
+ kbuf.top_down = 0;
+
+ /* Load PT_LOAD segments. */
+ for (i = 0, first_segment = 1; i < ehdr->e_phnum; i++) {
+ phdr = &elf_info->proghdrs[i];
+ if (phdr->p_type != PT_LOAD)
+ continue;
+
+ kbuf.buffer = (void *) elf_info->buffer + phdr->p_offset;
+ kbuf.bufsz = min(phdr->p_filesz, phdr->p_memsz);
+ kbuf.memsz = phdr->p_memsz;
+ kbuf.buf_align = phdr->p_align;
+
+ if (first_segment) {
+ /*
+ * Identify TEXT_OFFSET:
+ * When CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET=y the image
+ * header could be offset in the elf segment. The linker
+ * script sets ehdr->e_entry to the start of text.
+ *
+ * NOTE: In v3.16 or older, h->text_offset is 0,
+ * so use the default, 0x80000
+ */
+ rand_offset = ehdr->e_entry - phdr->p_vaddr;
+ h = (struct arm64_image_header *)
+ (elf_info->buffer + phdr->p_offset +
+ rand_offset);
+
+ if (!arm64_header_check_magic(h))
+ goto out;
+
+ if (h->image_size)
+ text_offset = le64_to_cpu(h->text_offset);
+ else
+ text_offset = 0x80000;
+
+ /* Adjust kernel segment with TEXT_OFFSET */
+ kbuf.memsz += text_offset - rand_offset;
+
+ ret = kexec_add_buffer(&kbuf);
+ if (ret)
+ goto out;
+
+ image->segment[image->nr_segments - 1].mem
+ += text_offset - rand_offset;
+ image->segment[image->nr_segments - 1].memsz
+ -= text_offset - rand_offset;
+
+ *kernel_load_addr = kbuf.mem + text_offset;
+
+			/* for succeeding segments */
+ page_offset = ALIGN_DOWN(phdr->p_vaddr, SZ_2M);
+ phys_offset = kbuf.mem;
+
+ first_segment = 0;
+ } else {
+ /* Calculate physical address */
+ kbuf.mem = phdr->p_vaddr - page_offset + phys_offset;
+
+ ret = kexec_add_segment(&kbuf);
+ if (ret)
+ goto out;
+ }
+ }
+
+out:
+ return ret;
+}
+
+static void *elf64_load(struct kimage *image, char *kernel_buf,
+ unsigned long kernel_len, char *initrd,
+ unsigned long initrd_len, char *cmdline,
+ unsigned long cmdline_len)
+{
+ struct elfhdr ehdr;
+ struct elf_info elf_info;
+ unsigned long kernel_load_addr;
+ int ret;
+
+ /* Create elf core header segment */
+ ret = load_crashdump_segments(image);
+ if (ret)
+ goto out;
+
+ /* Load the kernel */
+ ret = build_elf_exec_info(kernel_buf, kernel_len, &ehdr, &elf_info);
+ if (ret)
+ goto out;
+
+ ret = elf_exec_load(image, &ehdr, &elf_info, &kernel_load_addr);
+ if (ret)
+ goto out;
+ pr_debug("Loaded the kernel at 0x%lx\n", kernel_load_addr);
+
+ /* Load additional data */
+ ret = load_other_segments(image, kernel_load_addr,
+ initrd, initrd_len, cmdline, cmdline_len);
+
+out:
+ elf_free_info(&elf_info);
+
+ return ERR_PTR(ret);
+}
+
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+/*
+ * The file format is the exact same as module signing:
+ * <kernel> := <Image> + <signature part> + <marker>
+ * <signature part> := <signature data> + <struct module_signature>
+ */
+static int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
+{
+ const size_t marker_len = sizeof(MODULE_SIG_STRING) - 1;
+ const struct module_signature *sig;
+ size_t file_len = kernel_len;
+ size_t sig_len;
+ const void *p;
+ int rc;
+
+ if (kernel_len <= marker_len + sizeof(*sig))
+ return -ENOENT;
+
+ /* Check for marker */
+ p = kernel + kernel_len - marker_len;
+ if (memcmp(p, MODULE_SIG_STRING, marker_len)) {
+ pr_err("probably the kernel is not signed.\n");
+ return -ENOENT;
+ }
+
+ /* Validate signature */
+ sig = (const struct module_signature *) (p - sizeof(*sig));
+ file_len -= marker_len;
+
+ rc = validate_module_sig(sig, kernel_len - marker_len);
+ if (rc) {
+ pr_err("signature is not valid\n");
+ return rc;
+ }
+
+ /* Verify kernel with signature */
+ sig_len = be32_to_cpu(sig->sig_len);
+ p -= sig_len + sizeof(*sig);
+ file_len -= sig_len + sizeof(*sig);
+
+ rc = verify_pkcs7_signature(kernel, p - (void *)kernel, p, sig_len,
+ NULL, VERIFYING_MODULE_SIGNATURE,
+ NULL, NULL);
+
+ return rc;
+}
+#endif
+
+struct kexec_file_ops kexec_elf64_ops = {
+ .probe = elf64_probe,
+ .load = elf64_load,
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+ .verify_sig = elf64_verify_sig,
+#endif
+};
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index ab3b19d51727..cb1f24d98f87 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -31,6 +31,9 @@ static struct kexec_file_ops *kexec_file_loaders[] = {
#ifdef CONFIG_KEXEC_FILE_IMAGE_FMT
&kexec_image_ops,
#endif
+#ifdef CONFIG_KEXEC_FILE_ELF_FMT
+ &kexec_elf64_ops,
+#endif
};
int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
--
2.14.1
From: Thiago Jung Bauermann <[email protected]>
IMA will use the module_signature format for appended signatures, so
export the relevant definitions and factor out the code which verifies
that the appended signature trailer is valid.
Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use validate_module_sig() without having to depend on
CONFIG_MODULE_SIG.
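With these definitions exported, a consumer of an appended signature can
locate and sanity-check the trailer as in the following minimal sketch
(the 'file'/'file_len' variables are hypothetical; it assumes the file
has already been checked to end with MODULE_SIG_STRING):

	const size_t marker_len = sizeof(MODULE_SIG_STRING) - 1;
	const struct module_signature *ms;
	int err;

	/* layout: <payload><signature data><module_signature><marker> */
	ms = (const struct module_signature *)
		(file + file_len - marker_len - sizeof(*ms));
	err = validate_module_sig(ms, file_len - marker_len);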
Signed-off-by: Thiago Jung Bauermann <[email protected]>
---
include/linux/module.h | 3 --
include/linux/module_signature.h | 47 +++++++++++++++++++++++++
init/Kconfig | 6 +++-
kernel/Makefile | 2 +-
kernel/module.c | 1 +
kernel/module_signing.c | 74 +++++++++++++++++-----------------------
6 files changed, 85 insertions(+), 48 deletions(-)
create mode 100644 include/linux/module_signature.h
diff --git a/include/linux/module.h b/include/linux/module.h
index e7bdd549e527..672ad2016262 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -23,9 +23,6 @@
#include <linux/percpu.h>
#include <asm/module.h>
-/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
-#define MODULE_SIG_STRING "~Module signature appended~\n"
-
/* Not Yet Implemented */
#define MODULE_SUPPORTED_DEVICE(name)
diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
new file mode 100644
index 000000000000..e80728e5b86c
--- /dev/null
+++ b/include/linux/module_signature.h
@@ -0,0 +1,47 @@
+/* Module signature handling.
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_MODULE_SIGNATURE_H
+#define _LINUX_MODULE_SIGNATURE_H
+
+/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
+#define MODULE_SIG_STRING "~Module signature appended~\n"
+
+enum pkey_id_type {
+ PKEY_ID_PGP, /* OpenPGP generated key ID */
+ PKEY_ID_X509, /* X.509 arbitrary subjectKeyIdentifier */
+ PKEY_ID_PKCS7, /* Signature in PKCS#7 message */
+};
+
+/*
+ * Module signature information block.
+ *
+ * The constituents of the signature section are, in order:
+ *
+ * - Signer's name
+ * - Key identifier
+ * - Signature data
+ * - Information block
+ */
+struct module_signature {
+ u8 algo; /* Public-key crypto algorithm [0] */
+ u8 hash; /* Digest algorithm [0] */
+ u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */
+ u8 signer_len; /* Length of signer's name [0] */
+ u8 key_id_len; /* Length of key identifier [0] */
+ u8 __pad[3];
+ __be32 sig_len; /* Length of signature data */
+};
+
+int validate_module_sig(const struct module_signature *ms, size_t file_len);
+int mod_verify_sig(const void *mod, unsigned long *_modlen);
+
+#endif /* _LINUX_MODULE_SIGNATURE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 8514b25db21c..c3ac1170b93a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1734,7 +1734,7 @@ config MODULE_SRCVERSION_ALL
config MODULE_SIG
bool "Module signature verification"
depends on MODULES
- select SYSTEM_DATA_VERIFICATION
+ select MODULE_SIG_FORMAT
help
Check modules for valid signatures upon load: the signature
is simply appended to the module. For more information see
@@ -1749,6 +1749,10 @@ config MODULE_SIG
debuginfo strip done by some packagers (such as rpmbuild) and
inclusion into an initramfs that wants the module size reduced.
+config MODULE_SIG_FORMAT
+ def_bool n
+ select SYSTEM_DATA_VERIFICATION
+
config MODULE_SIG_FORCE
bool "Require modules to be validly signed"
depends on MODULE_SIG
diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb8e8b23c6e..d5f9748ab19f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -56,7 +56,7 @@ obj-y += up.o
endif
obj-$(CONFIG_UID16) += uid16.o
obj-$(CONFIG_MODULES) += module.o
-obj-$(CONFIG_MODULE_SIG) += module_signing.o
+obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signing.o
obj-$(CONFIG_KALLSYMS) += kallsyms.o
obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
obj-$(CONFIG_CRASH_CORE) += crash_core.o
diff --git a/kernel/module.c b/kernel/module.c
index 40f983cbea81..52921fccb51a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -19,6 +19,7 @@
#include <linux/export.h>
#include <linux/extable.h>
#include <linux/moduleloader.h>
+#include <linux/module_signature.h>
#include <linux/trace_events.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
diff --git a/kernel/module_signing.c b/kernel/module_signing.c
index 937c844bee4a..204c60d4cc9f 100644
--- a/kernel/module_signing.c
+++ b/kernel/module_signing.c
@@ -11,36 +11,38 @@
#include <linux/kernel.h>
#include <linux/errno.h>
+#include <linux/module_signature.h>
#include <linux/string.h>
#include <linux/verification.h>
#include <crypto/public_key.h>
#include "module-internal.h"
-enum pkey_id_type {
- PKEY_ID_PGP, /* OpenPGP generated key ID */
- PKEY_ID_X509, /* X.509 arbitrary subjectKeyIdentifier */
- PKEY_ID_PKCS7, /* Signature in PKCS#7 message */
-};
-
-/*
- * Module signature information block.
- *
- * The constituents of the signature section are, in order:
+/**
+ * validate_module_sig - validate that the given signature is sane
*
- * - Signer's name
- * - Key identifier
- * - Signature data
- * - Information block
+ * @ms: Signature to validate.
+ * @file_len: Size of the file to which @ms is appended.
*/
-struct module_signature {
- u8 algo; /* Public-key crypto algorithm [0] */
- u8 hash; /* Digest algorithm [0] */
- u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */
- u8 signer_len; /* Length of signer's name [0] */
- u8 key_id_len; /* Length of key identifier [0] */
- u8 __pad[3];
- __be32 sig_len; /* Length of signature data */
-};
+int validate_module_sig(const struct module_signature *ms, size_t file_len)
+{
+ if (be32_to_cpu(ms->sig_len) >= file_len - sizeof(*ms))
+ return -EBADMSG;
+ else if (ms->id_type != PKEY_ID_PKCS7) {
+ pr_err("Module is not signed with expected PKCS#7 message\n");
+ return -ENOPKG;
+ } else if (ms->algo != 0 ||
+ ms->hash != 0 ||
+ ms->signer_len != 0 ||
+ ms->key_id_len != 0 ||
+ ms->__pad[0] != 0 ||
+ ms->__pad[1] != 0 ||
+ ms->__pad[2] != 0) {
+ pr_err("PKCS#7 signature info has unexpected non-zero params\n");
+ return -EBADMSG;
+ }
+
+ return 0;
+}
/*
* Verify the signature on a module.
@@ -49,6 +51,7 @@ int mod_verify_sig(const void *mod, unsigned long *_modlen)
{
struct module_signature ms;
size_t modlen = *_modlen, sig_len;
+ int ret;
pr_devel("==>%s(,%zu)\n", __func__, modlen);
@@ -56,30 +59,15 @@ int mod_verify_sig(const void *mod, unsigned long *_modlen)
return -EBADMSG;
memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
- modlen -= sizeof(ms);
+
+ ret = validate_module_sig(&ms, modlen);
+ if (ret)
+ return ret;
sig_len = be32_to_cpu(ms.sig_len);
- if (sig_len >= modlen)
- return -EBADMSG;
- modlen -= sig_len;
+ modlen -= sig_len + sizeof(ms);
*_modlen = modlen;
- if (ms.id_type != PKEY_ID_PKCS7) {
- pr_err("Module is not signed with expected PKCS#7 message\n");
- return -ENOPKG;
- }
-
- if (ms.algo != 0 ||
- ms.hash != 0 ||
- ms.signer_len != 0 ||
- ms.key_id_len != 0 ||
- ms.__pad[0] != 0 ||
- ms.__pad[1] != 0 ||
- ms.__pad[2] != 0) {
- pr_err("PKCS#7 signature info has unexpected non-zero params\n");
- return -EBADMSG;
- }
-
return verify_pkcs7_signature(mod, modlen, mod + modlen, sig_len,
NULL, VERIFYING_MODULE_SIGNATURE,
NULL, NULL);
--
2.14.1
This function, a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through the list of all System RAM resources in
reverse order, i.e., from higher to lower addresses.
It will be used in the kexec_file implementation on arm64.
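As a minimal sketch of the intended use (the callback and its stop
condition are made up for illustration), a caller can find the highest
System RAM region satisfying some constraint:

	static int first_fit_cb(u64 start, u64 end, void *arg)
	{
		u64 *base = arg;

		/* regions are reported from highest to lowest */
		if (end - start + 1 >= SZ_2M) {
			*base = start;
			return 1;	/* non-zero stops the walk */
		}
		return 0;
	}

	...
	u64 base = 0;

	walk_system_ram_res_rev(0, ULONG_MAX, &base, first_fit_cb);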
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
include/linux/ioport.h | 3 +++
kernel/resource.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6230064d7f95..9a212266299f 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -271,6 +271,9 @@ extern int
walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *));
extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(u64, u64, void *));
+extern int
walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
void *arg, int (*func)(u64, u64, void *));
diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f04404152..1d6d734c75ac 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,7 @@
#include <linux/pfn.h>
#include <linux/mm.h>
#include <linux/resource_ext.h>
+#include <linux/vmalloc.h>
#include <asm/io.h>
@@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
return ret;
}
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+ int (*func)(u64, u64, void *))
+{
+ struct resource res, *rams;
+ u64 orig_end;
+ int count, i;
+ int ret = -1;
+
+ count = 16; /* initial */
+again:
+ /* create a list */
+ rams = vmalloc(sizeof(struct resource) * count);
+ if (!rams)
+ return ret;
+
+ res.start = start;
+ res.end = end;
+ res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ orig_end = res.end;
+ i = 0;
+ while ((res.start < res.end) &&
+ (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
+ if (i >= count) {
+ /* unlikely but */
+ vfree(rams);
+ count += 16;
+ goto again;
+ }
+
+ rams[i].start = res.start;
+ rams[i++].end = res.end;
+
+ res.start = res.end + 1;
+ res.end = orig_end;
+ }
+
+ /* go reverse */
+ for (i--; i >= 0; i--) {
+ ret = (*func)(rams[i].start, rams[i].end, arg);
+ if (ret)
+ break;
+ }
+
+ vfree(rams);
+ return ret;
+}
+
#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
/*
--
2.14.1
The "Image" binary will be loaded at the offset of TEXT_OFFSET from
the start of system memory. TEXT_OFFSET is basically determined from
the header of the image.
Regarding kernel verification, it will be done through
verify_pefile_signature() as arm64's "Image" binary can be seen as
in PE format. This approach is consistent with x86 implementation.
we can sign it with sbsign command.
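For illustration, signing might look like this (the key and certificate
names below are placeholders):

	$ sbsign --key db.key --cert db.crt --output Image.signed Image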
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 11 ++--
arch/arm64/include/asm/kexec_file.h | 83 ++++++++++++++++++++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/kexec_image.c | 112 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/machine_kexec_file.c | 6 +-
5 files changed, 208 insertions(+), 5 deletions(-)
create mode 100644 arch/arm64/include/asm/kexec_file.h
create mode 100644 arch/arm64/kernel/kexec_image.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cf10bc720d9e..c8f603700bdd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -766,18 +766,21 @@ config KEXEC_FILE
for kernel and initramfs as opposed to list of segments as
accepted by previous system call.
+config KEXEC_FILE_IMAGE_FMT
+ bool "Enable Image support"
+ depends on KEXEC_FILE
+ ---help---
+ Select this option to enable 'Image' kernel loading.
+
config KEXEC_VERIFY_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
select SYSTEM_DATA_VERIFICATION
+ select SIGNED_PE_FILE_VERIFICATION if KEXEC_FILE_IMAGE_FMT
---help---
This option makes kernel signature verification mandatory for
the kexec_file_load() syscall.
- In addition to that option, you need to enable signature
- verification for the corresponding kernel image type being
- loaded in order for this to work.
-
config CRASH_DUMP
bool "Build kdump crash kernel"
help
diff --git a/arch/arm64/include/asm/kexec_file.h b/arch/arm64/include/asm/kexec_file.h
new file mode 100644
index 000000000000..5df899aa0d2e
--- /dev/null
+++ b/arch/arm64/include/asm/kexec_file.h
@@ -0,0 +1,83 @@
+#ifndef _ASM_KEXEC_FILE_H
+#define _ASM_KEXEC_FILE_H
+
+extern struct kexec_file_ops kexec_image_ops;
+
+/**
+ * struct arm64_image_header - arm64 kernel image header.
+ *
+ * @pe_sig: Optional PE format 'MZ' signature.
+ * @branch_code: Reserved for instructions to branch to stext.
+ * @text_offset: The image load offset in LSB byte order.
+ * @image_size: An estimated size of the memory image size in LSB byte order.
+ * @flags: Bit flags:
+ * Bit 7.0: Image byte order, 1=MSB.
+ * @reserved_1: Reserved.
+ * @magic: Magic number, "ARM\x64".
+ * @pe_header: Optional offset to a PE format header.
+ **/
+
+struct arm64_image_header {
+ u8 pe_sig[2];
+ u16 branch_code[3];
+ u64 text_offset;
+ u64 image_size;
+ u8 flags[8];
+ u64 reserved_1[3];
+ u8 magic[4];
+ u32 pe_header;
+};
+
+static const u8 arm64_image_magic[4] = {'A', 'R', 'M', 0x64U};
+static const u8 arm64_image_pe_sig[2] = {'M', 'Z'};
+static const u64 arm64_image_flag_7_be = 0x01U;
+
+/**
+ * arm64_header_check_magic - Helper to check the arm64 image header.
+ *
+ * Returns non-zero if header is OK.
+ */
+
+static inline int arm64_header_check_magic(const struct arm64_image_header *h)
+{
+ if (!h)
+ return 0;
+
+ if (!h->text_offset)
+ return 0;
+
+ return (h->magic[0] == arm64_image_magic[0]
+ && h->magic[1] == arm64_image_magic[1]
+ && h->magic[2] == arm64_image_magic[2]
+ && h->magic[3] == arm64_image_magic[3]);
+}
+
+/**
+ * arm64_header_check_pe_sig - Helper to check the arm64 image header.
+ *
+ * Returns non-zero if 'MZ' signature is found.
+ */
+
+static inline int arm64_header_check_pe_sig(const struct arm64_image_header *h)
+{
+ if (!h)
+ return 0;
+
+ return (h->pe_sig[0] == arm64_image_pe_sig[0]
+ && h->pe_sig[1] == arm64_image_pe_sig[1]);
+}
+
+/**
+ * arm64_header_check_msb - Helper to check the arm64 image header.
+ *
+ * Returns non-zero if the image was built as big endian.
+ */
+
+static inline int arm64_header_check_msb(const struct arm64_image_header *h)
+{
+ if (!h)
+ return 0;
+
+ return !!(h->flags[7] & arm64_image_flag_7_be);
+}
+#endif /* _ASM_KEXEC_FILE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 5df003d6157c..a1161bab6810 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -51,6 +51,7 @@ arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
arm64-obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
arm64-obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o
+arm64-obj-$(CONFIG_KEXEC_FILE_IMAGE_FMT) += kexec_image.o
arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
new file mode 100644
index 000000000000..db4aa1379fec
--- /dev/null
+++ b/arch/arm64/kernel/kexec_image.c
@@ -0,0 +1,112 @@
+/*
+ * Kexec image loader
+ *
+ * Copyright (C) 2017 Linaro Limited
+ * Authors: AKASHI Takahiro <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) "kexec_file(Image): " fmt
+
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/verification.h>
+#include <asm/byteorder.h>
+#include <asm/kexec_file.h>
+#include <asm/memory.h>
+
+static int image_probe(const char *kernel_buf, unsigned long kernel_len)
+{
+ const struct arm64_image_header *h;
+
+ h = (const struct arm64_image_header *)(kernel_buf);
+
+ if ((kernel_len < sizeof(*h)) || !arm64_header_check_magic(h))
+ return -EINVAL;
+
+ pr_debug("%s: PE format: %s\n", __func__,
+ (arm64_header_check_pe_sig(h) ? "yes" : "no"));
+
+ return 0;
+}
+
+static void *image_load(struct kimage *image, char *kernel,
+ unsigned long kernel_len, char *initrd,
+ unsigned long initrd_len, char *cmdline,
+ unsigned long cmdline_len)
+{
+ struct kexec_buf kbuf;
+ struct arm64_image_header *h = (struct arm64_image_header *)kernel;
+ unsigned long text_offset, kernel_load_addr;
+ int ret;
+
+ /* Create elf core header segment */
+ ret = load_crashdump_segments(image);
+ if (ret)
+ goto out;
+
+ /* Load the kernel */
+ kbuf.image = image;
+ if (image->type == KEXEC_TYPE_CRASH) {
+ kbuf.buf_min = crashk_res.start;
+ kbuf.buf_max = crashk_res.end + 1;
+ } else {
+ kbuf.buf_min = 0;
+ kbuf.buf_max = ULONG_MAX;
+ }
+ kbuf.top_down = 0;
+
+ kbuf.buffer = kernel;
+ kbuf.bufsz = kernel_len;
+ if (h->image_size) {
+ kbuf.memsz = le64_to_cpu(h->image_size);
+ text_offset = le64_to_cpu(h->text_offset);
+ } else {
+ /* v3.16 or older */
+ kbuf.memsz = kbuf.bufsz; /* NOTE: not including BSS */
+ text_offset = 0x80000;
+ }
+ kbuf.buf_align = SZ_2M;
+
+ /* Adjust kernel segment with TEXT_OFFSET */
+ kbuf.memsz += text_offset;
+
+ ret = kexec_add_buffer(&kbuf);
+ if (ret)
+ goto out;
+
+ image->segment[image->nr_segments - 1].mem += text_offset;
+ image->segment[image->nr_segments - 1].memsz -= text_offset;
+ kernel_load_addr = kbuf.mem + text_offset;
+
+ pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+ kernel_load_addr, kbuf.bufsz, kbuf.memsz);
+
+ /* Load additional data */
+ ret = load_other_segments(image, kernel_load_addr,
+ initrd, initrd_len, cmdline, cmdline_len);
+
+out:
+ return ERR_PTR(ret);
+}
+
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+static int image_verify_sig(const char *kernel, unsigned long kernel_len)
+{
+ return verify_pefile_signature(kernel, kernel_len, NULL,
+ VERIFYING_KEXEC_PE_SIGNATURE);
+}
+#endif
+
+struct kexec_file_ops kexec_image_ops = {
+ .probe = image_probe,
+ .load = image_load,
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+ .verify_sig = image_verify_sig,
+#endif
+};
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 012063307001..ab3b19d51727 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -27,7 +27,11 @@
static int __dt_root_addr_cells;
static int __dt_root_size_cells;
-static struct kexec_file_ops *kexec_file_loaders[0];
+static struct kexec_file_ops *kexec_file_loaders[] = {
+#ifdef CONFIG_KEXEC_FILE_IMAGE_FMT
+ &kexec_image_ops,
+#endif
+};
int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
unsigned long buf_len)
--
2.14.1
build_elf_exec_info() can also be useful for other architectures,
including arm64, so factor it out.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
---
arch/Kconfig | 3 +
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/kexec_elf_64.c | 464 -------------------------------------
include/linux/elf.h | 62 +++++
include/linux/kexec.h | 19 ++
kernel/Makefile | 1 +
kernel/kexec_file_elf.c | 454 ++++++++++++++++++++++++++++++++++++
7 files changed, 540 insertions(+), 464 deletions(-)
create mode 100644 kernel/kexec_file_elf.c
diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089117fe..e940d16412f4 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -9,6 +9,9 @@ config KEXEC_CORE
select CRASH_CORE
bool
+config KEXEC_FILE_ELF
+ bool
+
config HAVE_IMA_KEXEC
bool
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36f858c37ca7..f73921bbe29a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -529,6 +529,7 @@ config KEXEC
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
+ select KEXEC_FILE_ELF
select HAVE_IMA_KEXEC
select BUILD_BIN2C
depends on PPC64
diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
index 9a42309b091a..a0c92bd14259 100644
--- a/arch/powerpc/kernel/kexec_elf_64.c
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -26,475 +26,11 @@
#include <linux/elf.h>
#include <linux/kexec.h>
#include <linux/libfdt.h>
-#include <linux/module.h>
#include <linux/of_fdt.h>
#include <linux/slab.h>
-#include <linux/types.h>
#define PURGATORY_STACK_SIZE (16 * 1024)
-#define elf_addr_to_cpu elf64_to_cpu
-
-#ifndef Elf_Rel
-#define Elf_Rel Elf64_Rel
-#endif /* Elf_Rel */
-
-struct elf_info {
- /*
- * Where the ELF binary contents are kept.
- * Memory managed by the user of the struct.
- */
- const char *buffer;
-
- const struct elfhdr *ehdr;
- const struct elf_phdr *proghdrs;
- struct elf_shdr *sechdrs;
-};
-
-static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
-{
- return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
-}
-
-static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
-{
- if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
- value = le64_to_cpu(value);
- else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
- value = be64_to_cpu(value);
-
- return value;
-}
-
-static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
-{
- if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
- value = le16_to_cpu(value);
- else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
- value = be16_to_cpu(value);
-
- return value;
-}
-
-static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value)
-{
- if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
- value = le32_to_cpu(value);
- else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
- value = be32_to_cpu(value);
-
- return value;
-}
-
-/**
- * elf_is_ehdr_sane - check that it is safe to use the ELF header
- * @buf_len: size of the buffer in which the ELF file is loaded.
- */
-static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len)
-{
- if (ehdr->e_phnum > 0 && ehdr->e_phentsize != sizeof(struct elf_phdr)) {
- pr_debug("Bad program header size.\n");
- return false;
- } else if (ehdr->e_shnum > 0 &&
- ehdr->e_shentsize != sizeof(struct elf_shdr)) {
- pr_debug("Bad section header size.\n");
- return false;
- } else if (ehdr->e_ident[EI_VERSION] != EV_CURRENT ||
- ehdr->e_version != EV_CURRENT) {
- pr_debug("Unknown ELF version.\n");
- return false;
- }
-
- if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) {
- size_t phdr_size;
-
- /*
- * e_phnum is at most 65535 so calculating the size of the
- * program header cannot overflow.
- */
- phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum;
-
- /* Sanity check the program header table location. */
- if (ehdr->e_phoff + phdr_size < ehdr->e_phoff) {
- pr_debug("Program headers at invalid location.\n");
- return false;
- } else if (ehdr->e_phoff + phdr_size > buf_len) {
- pr_debug("Program headers truncated.\n");
- return false;
- }
- }
-
- if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) {
- size_t shdr_size;
-
- /*
- * e_shnum is at most 65536 so calculating
- * the size of the section header cannot overflow.
- */
- shdr_size = sizeof(struct elf_shdr) * ehdr->e_shnum;
-
- /* Sanity check the section header table location. */
- if (ehdr->e_shoff + shdr_size < ehdr->e_shoff) {
- pr_debug("Section headers at invalid location.\n");
- return false;
- } else if (ehdr->e_shoff + shdr_size > buf_len) {
- pr_debug("Section headers truncated.\n");
- return false;
- }
- }
-
- return true;
-}
-
-static int elf_read_ehdr(const char *buf, size_t len, struct elfhdr *ehdr)
-{
- struct elfhdr *buf_ehdr;
-
- if (len < sizeof(*buf_ehdr)) {
- pr_debug("Buffer is too small to hold ELF header.\n");
- return -ENOEXEC;
- }
-
- memset(ehdr, 0, sizeof(*ehdr));
- memcpy(ehdr->e_ident, buf, sizeof(ehdr->e_ident));
- if (!elf_is_elf_file(ehdr)) {
- pr_debug("No ELF header magic.\n");
- return -ENOEXEC;
- }
-
- if (ehdr->e_ident[EI_CLASS] != ELF_CLASS) {
- pr_debug("Not a supported ELF class.\n");
- return -ENOEXEC;
- } else if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB &&
- ehdr->e_ident[EI_DATA] != ELFDATA2MSB) {
- pr_debug("Not a supported ELF data format.\n");
- return -ENOEXEC;
- }
-
- buf_ehdr = (struct elfhdr *) buf;
- if (elf16_to_cpu(ehdr, buf_ehdr->e_ehsize) != sizeof(*buf_ehdr)) {
- pr_debug("Bad ELF header size.\n");
- return -ENOEXEC;
- }
-
- ehdr->e_type = elf16_to_cpu(ehdr, buf_ehdr->e_type);
- ehdr->e_machine = elf16_to_cpu(ehdr, buf_ehdr->e_machine);
- ehdr->e_version = elf32_to_cpu(ehdr, buf_ehdr->e_version);
- ehdr->e_entry = elf_addr_to_cpu(ehdr, buf_ehdr->e_entry);
- ehdr->e_phoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_phoff);
- ehdr->e_shoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_shoff);
- ehdr->e_flags = elf32_to_cpu(ehdr, buf_ehdr->e_flags);
- ehdr->e_phentsize = elf16_to_cpu(ehdr, buf_ehdr->e_phentsize);
- ehdr->e_phnum = elf16_to_cpu(ehdr, buf_ehdr->e_phnum);
- ehdr->e_shentsize = elf16_to_cpu(ehdr, buf_ehdr->e_shentsize);
- ehdr->e_shnum = elf16_to_cpu(ehdr, buf_ehdr->e_shnum);
- ehdr->e_shstrndx = elf16_to_cpu(ehdr, buf_ehdr->e_shstrndx);
-
- return elf_is_ehdr_sane(ehdr, len) ? 0 : -ENOEXEC;
-}
-
-/**
- * elf_is_phdr_sane - check that it is safe to use the program header
- * @buf_len: size of the buffer in which the ELF file is loaded.
- */
-static bool elf_is_phdr_sane(const struct elf_phdr *phdr, size_t buf_len)
-{
-
- if (phdr->p_offset + phdr->p_filesz < phdr->p_offset) {
- pr_debug("ELF segment location wraps around.\n");
- return false;
- } else if (phdr->p_offset + phdr->p_filesz > buf_len) {
- pr_debug("ELF segment not in file.\n");
- return false;
- } else if (phdr->p_paddr + phdr->p_memsz < phdr->p_paddr) {
- pr_debug("ELF segment address wraps around.\n");
- return false;
- }
-
- return true;
-}
-
-static int elf_read_phdr(const char *buf, size_t len, struct elf_info *elf_info,
- int idx)
-{
- /* Override the const in proghdrs, we are the ones doing the loading. */
- struct elf_phdr *phdr = (struct elf_phdr *) &elf_info->proghdrs[idx];
- const char *pbuf;
- struct elf_phdr *buf_phdr;
-
- pbuf = buf + elf_info->ehdr->e_phoff + (idx * sizeof(*buf_phdr));
- buf_phdr = (struct elf_phdr *) pbuf;
-
- phdr->p_type = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_type);
- phdr->p_offset = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_offset);
- phdr->p_paddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_paddr);
- phdr->p_vaddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_vaddr);
- phdr->p_flags = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_flags);
-
- /*
- * The following fields have a type equivalent to Elf_Addr
- * both in 32 bit and 64 bit ELF.
- */
- phdr->p_filesz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_filesz);
- phdr->p_memsz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_memsz);
- phdr->p_align = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_align);
-
- return elf_is_phdr_sane(phdr, len) ? 0 : -ENOEXEC;
-}
-
-/**
- * elf_read_phdrs - read the program headers from the buffer
- *
- * This function assumes that the program header table was checked for sanity.
- * Use elf_is_ehdr_sane() if it wasn't.
- */
-static int elf_read_phdrs(const char *buf, size_t len,
- struct elf_info *elf_info)
-{
- size_t phdr_size, i;
- const struct elfhdr *ehdr = elf_info->ehdr;
-
- /*
- * e_phnum is at most 65535 so calculating the size of the
- * program header cannot overflow.
- */
- phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum;
-
- elf_info->proghdrs = kzalloc(phdr_size, GFP_KERNEL);
- if (!elf_info->proghdrs)
- return -ENOMEM;
-
- for (i = 0; i < ehdr->e_phnum; i++) {
- int ret;
-
- ret = elf_read_phdr(buf, len, elf_info, i);
- if (ret) {
- kfree(elf_info->proghdrs);
- elf_info->proghdrs = NULL;
- return ret;
- }
- }
-
- return 0;
-}
-
-/**
- * elf_is_shdr_sane - check that it is safe to use the section header
- * @buf_len: size of the buffer in which the ELF file is loaded.
- */
-static bool elf_is_shdr_sane(const struct elf_shdr *shdr, size_t buf_len)
-{
- bool size_ok;
-
- /* SHT_NULL headers have undefined values, so we can't check them. */
- if (shdr->sh_type == SHT_NULL)
- return true;
-
- /* Now verify sh_entsize */
- switch (shdr->sh_type) {
- case SHT_SYMTAB:
- size_ok = shdr->sh_entsize == sizeof(Elf_Sym);
- break;
- case SHT_RELA:
- size_ok = shdr->sh_entsize == sizeof(Elf_Rela);
- break;
- case SHT_DYNAMIC:
- size_ok = shdr->sh_entsize == sizeof(Elf_Dyn);
- break;
- case SHT_REL:
- size_ok = shdr->sh_entsize == sizeof(Elf_Rel);
- break;
- case SHT_NOTE:
- case SHT_PROGBITS:
- case SHT_HASH:
- case SHT_NOBITS:
- default:
- /*
- * This is a section whose entsize requirements
- * I don't care about. If I don't know about
- * the section I can't care about it's entsize
- * requirements.
- */
- size_ok = true;
- break;
- }
-
- if (!size_ok) {
- pr_debug("ELF section with wrong entry size.\n");
- return false;
- } else if (shdr->sh_addr + shdr->sh_size < shdr->sh_addr) {
- pr_debug("ELF section address wraps around.\n");
- return false;
- }
-
- if (shdr->sh_type != SHT_NOBITS) {
- if (shdr->sh_offset + shdr->sh_size < shdr->sh_offset) {
- pr_debug("ELF section location wraps around.\n");
- return false;
- } else if (shdr->sh_offset + shdr->sh_size > buf_len) {
- pr_debug("ELF section not in file.\n");
- return false;
- }
- }
-
- return true;
-}
-
-static int elf_read_shdr(const char *buf, size_t len, struct elf_info *elf_info,
- int idx)
-{
- struct elf_shdr *shdr = &elf_info->sechdrs[idx];
- const struct elfhdr *ehdr = elf_info->ehdr;
- const char *sbuf;
- struct elf_shdr *buf_shdr;
-
- sbuf = buf + ehdr->e_shoff + idx * sizeof(*buf_shdr);
- buf_shdr = (struct elf_shdr *) sbuf;
-
- shdr->sh_name = elf32_to_cpu(ehdr, buf_shdr->sh_name);
- shdr->sh_type = elf32_to_cpu(ehdr, buf_shdr->sh_type);
- shdr->sh_addr = elf_addr_to_cpu(ehdr, buf_shdr->sh_addr);
- shdr->sh_offset = elf_addr_to_cpu(ehdr, buf_shdr->sh_offset);
- shdr->sh_link = elf32_to_cpu(ehdr, buf_shdr->sh_link);
- shdr->sh_info = elf32_to_cpu(ehdr, buf_shdr->sh_info);
-
- /*
- * The following fields have a type equivalent to Elf_Addr
- * both in 32 bit and 64 bit ELF.
- */
- shdr->sh_flags = elf_addr_to_cpu(ehdr, buf_shdr->sh_flags);
- shdr->sh_size = elf_addr_to_cpu(ehdr, buf_shdr->sh_size);
- shdr->sh_addralign = elf_addr_to_cpu(ehdr, buf_shdr->sh_addralign);
- shdr->sh_entsize = elf_addr_to_cpu(ehdr, buf_shdr->sh_entsize);
-
- return elf_is_shdr_sane(shdr, len) ? 0 : -ENOEXEC;
-}
-
-/**
- * elf_read_shdrs - read the section headers from the buffer
- *
- * This function assumes that the section header table was checked for sanity.
- * Use elf_is_ehdr_sane() if it wasn't.
- */
-static int elf_read_shdrs(const char *buf, size_t len,
- struct elf_info *elf_info)
-{
- size_t shdr_size, i;
-
- /*
- * e_shnum is at most 65536 so calculating
- * the size of the section header cannot overflow.
- */
- shdr_size = sizeof(struct elf_shdr) * elf_info->ehdr->e_shnum;
-
- elf_info->sechdrs = kzalloc(shdr_size, GFP_KERNEL);
- if (!elf_info->sechdrs)
- return -ENOMEM;
-
- for (i = 0; i < elf_info->ehdr->e_shnum; i++) {
- int ret;
-
- ret = elf_read_shdr(buf, len, elf_info, i);
- if (ret) {
- kfree(elf_info->sechdrs);
- elf_info->sechdrs = NULL;
- return ret;
- }
- }
-
- return 0;
-}
-
-/**
- * elf_read_from_buffer - read ELF file and sets up ELF header and ELF info
- * @buf: Buffer to read ELF file from.
- * @len: Size of @buf.
- * @ehdr: Pointer to existing struct which will be populated.
- * @elf_info: Pointer to existing struct which will be populated.
- *
- * This function allows reading ELF files with different byte order than
- * the kernel, byte-swapping the fields as needed.
- *
- * Return:
- * On success returns 0, and the caller should call elf_free_info(elf_info) to
- * free the memory allocated for the section and program headers.
- */
-int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr,
- struct elf_info *elf_info)
-{
- int ret;
-
- ret = elf_read_ehdr(buf, len, ehdr);
- if (ret)
- return ret;
-
- elf_info->buffer = buf;
- elf_info->ehdr = ehdr;
- if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) {
- ret = elf_read_phdrs(buf, len, elf_info);
- if (ret)
- return ret;
- }
- if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) {
- ret = elf_read_shdrs(buf, len, elf_info);
- if (ret) {
- kfree(elf_info->proghdrs);
- return ret;
- }
- }
-
- return 0;
-}
-
-/**
- * elf_free_info - free memory allocated by elf_read_from_buffer
- */
-void elf_free_info(struct elf_info *elf_info)
-{
- kfree(elf_info->proghdrs);
- kfree(elf_info->sechdrs);
- memset(elf_info, 0, sizeof(*elf_info));
-}
-/**
- * build_elf_exec_info - read ELF executable and check that we can use it
- */
-static int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr,
- struct elf_info *elf_info)
-{
- int i;
- int ret;
-
- ret = elf_read_from_buffer(buf, len, ehdr, elf_info);
- if (ret)
- return ret;
-
- /* Big endian vmlinux has type ET_DYN. */
- if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) {
- pr_err("Not an ELF executable.\n");
- goto error;
- } else if (!elf_info->proghdrs) {
- pr_err("No ELF program header.\n");
- goto error;
- }
-
- for (i = 0; i < ehdr->e_phnum; i++) {
- /*
- * Kexec does not support loading interpreters.
- * In addition this check keeps us from attempting
- * to kexec ordinay executables.
- */
- if (elf_info->proghdrs[i].p_type == PT_INTERP) {
- pr_err("Requires an ELF interpreter.\n");
- goto error;
- }
- }
-
- return 0;
-error:
- elf_free_info(elf_info);
- return -ENOEXEC;
-}
-
static int elf64_probe(const char *buf, unsigned long len)
{
struct elfhdr ehdr;
diff --git a/include/linux/elf.h b/include/linux/elf.h
index ba069e8f4f78..e758bb4365c1 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -1,6 +1,8 @@
#ifndef _LINUX_ELF_H
#define _LINUX_ELF_H
+#include <linux/types.h>
+#include <asm/byteorder.h>
#include <asm/elf.h>
#include <uapi/linux/elf.h>
@@ -55,4 +57,64 @@ static inline int elf_coredump_extra_notes_write(struct coredump_params *cprm) {
extern int elf_coredump_extra_notes_size(void);
extern int elf_coredump_extra_notes_write(struct coredump_params *cprm);
#endif
+
+static inline u16 elf16_to_cpu(const struct elfhdr *ehdr, u16 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = le16_to_cpu(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = be16_to_cpu(value);
+
+ return value;
+}
+
+static inline u16 cpu_to_elf16(const struct elfhdr *ehdr, u16 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = cpu_to_le16(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = cpu_to_be16(value);
+
+ return value;
+}
+
+static inline u32 elf32_to_cpu(const struct elfhdr *ehdr, u32 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = le32_to_cpu(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = be32_to_cpu(value);
+
+ return value;
+}
+
+static inline u32 cpu_to_elf32(const struct elfhdr *ehdr, u32 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = cpu_to_le32(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = cpu_to_be32(value);
+
+ return value;
+}
+
+static inline u64 elf64_to_cpu(const struct elfhdr *ehdr, u64 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = le64_to_cpu(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = be64_to_cpu(value);
+
+ return value;
+}
+
+static inline u64 cpu_to_elf64(const struct elfhdr *ehdr, u64 value)
+{
+ if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+ value = cpu_to_le64(value);
+ else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+ value = cpu_to_be64(value);
+
+ return value;
+}
#endif /* _LINUX_ELF_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index dd056fab9e35..db98e3459e90 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -22,6 +22,7 @@
#ifdef CONFIG_KEXEC_CORE
#include <linux/list.h>
#include <linux/compat.h>
+#include <linux/elf.h>
#include <linux/ioport.h>
#include <linux/module.h>
#include <asm/kexec.h>
@@ -162,6 +163,24 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
int (*func)(u64, u64, void *));
extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
+
+#ifdef CONFIG_KEXEC_FILE_ELF
+struct elf_info {
+ /*
+ * Where the ELF binary contents are kept.
+ * Memory managed by the user of the struct.
+ */
+ const char *buffer;
+
+ const struct elfhdr *ehdr;
+ const struct elf_phdr *proghdrs;
+ struct elf_shdr *sechdrs;
+};
+
+extern void elf_free_info(struct elf_info *elf_info);
+extern int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr,
+ struct elf_info *elf_info);
+#endif /* CONFIG_KEXEC_FILE_ELF */
#endif /* CONFIG_KEXEC_FILE */
struct kimage {
diff --git a/kernel/Makefile b/kernel/Makefile
index d5f9748ab19f..d07492eb3804 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
obj-$(CONFIG_KEXEC) += kexec.o
obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
+obj-$(CONFIG_KEXEC_FILE_ELF) += kexec_file_elf.o
obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
obj-$(CONFIG_COMPAT) += compat.o
obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kexec_file_elf.c b/kernel/kexec_file_elf.c
new file mode 100644
index 000000000000..4fc049e9731b
--- /dev/null
+++ b/kernel/kexec_file_elf.c
@@ -0,0 +1,454 @@
+/*
+ * Load ELF vmlinux file for the kexec_file_load syscall.
+ *
+ * Copyright (C) 2004 Adam Litke ([email protected])
+ * Copyright (C) 2004 IBM Corp.
+ * Copyright (C) 2005 R Sharada ([email protected])
+ * Copyright (C) 2006 Mohan Kumar M ([email protected])
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf-exec.c and kexec-elf-ppc64.c.
+ * Heavily modified for the kernel by
+ * Thiago Jung Bauermann <[email protected]>.
+ * Factored out for general use by
+ * AKASHI Takahiro <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) "kexec_file_elf: " fmt
+
+#include <linux/elf.h>
+#include <linux/errno.h>
+#include <linux/kexec.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+#if ELF_CLASS == ELFCLASS32
+#define elf_addr_to_cpu elf32_to_cpu
+#else
+#define elf_addr_to_cpu elf64_to_cpu
+#endif
+
+/**
+ * elf_is_ehdr_sane - check that it is safe to use the ELF header
+ * @buf_len: size of the buffer in which the ELF file is loaded.
+ */
+static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len)
+{
+ if (ehdr->e_phnum > 0 && ehdr->e_phentsize != sizeof(struct elf_phdr)) {
+ pr_debug("Bad program header size.\n");
+ return false;
+ } else if (ehdr->e_shnum > 0 &&
+ ehdr->e_shentsize != sizeof(struct elf_shdr)) {
+ pr_debug("Bad section header size.\n");
+ return false;
+ } else if (ehdr->e_ident[EI_VERSION] != EV_CURRENT ||
+ ehdr->e_version != EV_CURRENT) {
+ pr_debug("Unknown ELF version.\n");
+ return false;
+ }
+
+ if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) {
+ size_t phdr_size;
+
+ /*
+ * e_phnum is at most 65535 so calculating the size of the
+ * program header cannot overflow.
+ */
+ phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum;
+
+ /* Sanity check the program header table location. */
+ if (ehdr->e_phoff + phdr_size < ehdr->e_phoff) {
+ pr_debug("Program headers at invalid location.\n");
+ return false;
+ } else if (ehdr->e_phoff + phdr_size > buf_len) {
+ pr_debug("Program headers truncated.\n");
+ return false;
+ }
+ }
+
+ if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) {
+ size_t shdr_size;
+
+ /*
+		 * e_shnum is at most 65535 so calculating
+ * the size of the section header cannot overflow.
+ */
+ shdr_size = sizeof(struct elf_shdr) * ehdr->e_shnum;
+
+ /* Sanity check the section header table location. */
+ if (ehdr->e_shoff + shdr_size < ehdr->e_shoff) {
+ pr_debug("Section headers at invalid location.\n");
+ return false;
+ } else if (ehdr->e_shoff + shdr_size > buf_len) {
+ pr_debug("Section headers truncated.\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static int elf_read_ehdr(const char *buf, size_t len, struct elfhdr *ehdr)
+{
+ struct elfhdr *buf_ehdr;
+
+ if (len < sizeof(*buf_ehdr)) {
+ pr_debug("Buffer is too small to hold ELF header.\n");
+ return -ENOEXEC;
+ }
+
+ memset(ehdr, 0, sizeof(*ehdr));
+ memcpy(ehdr->e_ident, buf, sizeof(ehdr->e_ident));
+ if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG)) {
+ pr_debug("No ELF header magic.\n");
+ return -ENOEXEC;
+ }
+
+ if (ehdr->e_ident[EI_CLASS] != ELF_CLASS) {
+ pr_debug("Not a supported ELF class.\n");
+ return -ENOEXEC;
+ } else if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB &&
+ ehdr->e_ident[EI_DATA] != ELFDATA2MSB) {
+ pr_debug("Not a supported ELF data format.\n");
+ return -ENOEXEC;
+ }
+
+ buf_ehdr = (struct elfhdr *) buf;
+ if (elf16_to_cpu(ehdr, buf_ehdr->e_ehsize) != sizeof(*buf_ehdr)) {
+ pr_debug("Bad ELF header size.\n");
+ return -ENOEXEC;
+ }
+
+ ehdr->e_type = elf16_to_cpu(ehdr, buf_ehdr->e_type);
+ ehdr->e_machine = elf16_to_cpu(ehdr, buf_ehdr->e_machine);
+ ehdr->e_version = elf32_to_cpu(ehdr, buf_ehdr->e_version);
+ ehdr->e_entry = elf_addr_to_cpu(ehdr, buf_ehdr->e_entry);
+ ehdr->e_phoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_phoff);
+ ehdr->e_shoff = elf_addr_to_cpu(ehdr, buf_ehdr->e_shoff);
+ ehdr->e_flags = elf32_to_cpu(ehdr, buf_ehdr->e_flags);
+ ehdr->e_phentsize = elf16_to_cpu(ehdr, buf_ehdr->e_phentsize);
+ ehdr->e_phnum = elf16_to_cpu(ehdr, buf_ehdr->e_phnum);
+ ehdr->e_shentsize = elf16_to_cpu(ehdr, buf_ehdr->e_shentsize);
+ ehdr->e_shnum = elf16_to_cpu(ehdr, buf_ehdr->e_shnum);
+ ehdr->e_shstrndx = elf16_to_cpu(ehdr, buf_ehdr->e_shstrndx);
+
+ return elf_is_ehdr_sane(ehdr, len) ? 0 : -ENOEXEC;
+}
+
+/**
+ * elf_is_phdr_sane - check that it is safe to use the program header
+ * @buf_len: size of the buffer in which the ELF file is loaded.
+ */
+static bool elf_is_phdr_sane(const struct elf_phdr *phdr, size_t buf_len)
+{
+
+ if (phdr->p_offset + phdr->p_filesz < phdr->p_offset) {
+ pr_debug("ELF segment location wraps around.\n");
+ return false;
+ } else if (phdr->p_offset + phdr->p_filesz > buf_len) {
+ pr_debug("ELF segment not in file.\n");
+ return false;
+ } else if (phdr->p_paddr + phdr->p_memsz < phdr->p_paddr) {
+ pr_debug("ELF segment address wraps around.\n");
+ return false;
+ }
+
+ return true;
+}
+
+static int elf_read_phdr(const char *buf, size_t len, struct elf_info *elf_info,
+ int idx)
+{
+ /* Override the const in proghdrs, we are the ones doing the loading. */
+ struct elf_phdr *phdr = (struct elf_phdr *) &elf_info->proghdrs[idx];
+ const char *pbuf;
+ struct elf_phdr *buf_phdr;
+
+ pbuf = buf + elf_info->ehdr->e_phoff + (idx * sizeof(*buf_phdr));
+ buf_phdr = (struct elf_phdr *) pbuf;
+
+ phdr->p_type = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_type);
+ phdr->p_offset = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_offset);
+ phdr->p_paddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_paddr);
+ phdr->p_vaddr = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_vaddr);
+ phdr->p_flags = elf32_to_cpu(elf_info->ehdr, buf_phdr->p_flags);
+
+ /*
+ * The following fields have a type equivalent to Elf_Addr
+ * both in 32 bit and 64 bit ELF.
+ */
+ phdr->p_filesz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_filesz);
+ phdr->p_memsz = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_memsz);
+ phdr->p_align = elf_addr_to_cpu(elf_info->ehdr, buf_phdr->p_align);
+
+ return elf_is_phdr_sane(phdr, len) ? 0 : -ENOEXEC;
+}
+
+/**
+ * elf_read_phdrs - read the program headers from the buffer
+ *
+ * This function assumes that the program header table was checked for sanity.
+ * Use elf_is_ehdr_sane() if it wasn't.
+ */
+static int elf_read_phdrs(const char *buf, size_t len,
+ struct elf_info *elf_info)
+{
+ size_t phdr_size, i;
+ const struct elfhdr *ehdr = elf_info->ehdr;
+
+ /*
+ * e_phnum is at most 65535 so calculating the size of the
+ * program header cannot overflow.
+ */
+ phdr_size = sizeof(struct elf_phdr) * ehdr->e_phnum;
+
+ elf_info->proghdrs = kzalloc(phdr_size, GFP_KERNEL);
+ if (!elf_info->proghdrs)
+ return -ENOMEM;
+
+ for (i = 0; i < ehdr->e_phnum; i++) {
+ int ret;
+
+ ret = elf_read_phdr(buf, len, elf_info, i);
+ if (ret) {
+ kfree(elf_info->proghdrs);
+ elf_info->proghdrs = NULL;
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * elf_is_shdr_sane - check that it is safe to use the section header
+ * @buf_len: size of the buffer in which the ELF file is loaded.
+ */
+static bool elf_is_shdr_sane(const struct elf_shdr *shdr, size_t buf_len)
+{
+ bool size_ok;
+
+ /* SHT_NULL headers have undefined values, so we can't check them. */
+ if (shdr->sh_type == SHT_NULL)
+ return true;
+
+ /* Now verify sh_entsize */
+ switch (shdr->sh_type) {
+ case SHT_SYMTAB:
+ size_ok = shdr->sh_entsize == sizeof(Elf_Sym);
+ break;
+#ifdef Elf_Rela
+ case SHT_RELA:
+ size_ok = shdr->sh_entsize == sizeof(Elf_Rela);
+ break;
+#endif
+ case SHT_DYNAMIC:
+ size_ok = shdr->sh_entsize == sizeof(Elf_Dyn);
+ break;
+#ifdef Elf_Rel
+ case SHT_REL:
+ size_ok = shdr->sh_entsize == sizeof(Elf_Rel);
+ break;
+#endif
+ case SHT_NOTE:
+ case SHT_PROGBITS:
+ case SHT_HASH:
+ case SHT_NOBITS:
+ default:
+ /*
+ * This is a section whose entsize requirements
+ * I don't care about. If I don't know about
+		 * the section I can't care about its entsize
+ * requirements.
+ */
+ size_ok = true;
+ break;
+ }
+
+ if (!size_ok) {
+ pr_debug("ELF section with wrong entry size.\n");
+ return false;
+ } else if (shdr->sh_addr + shdr->sh_size < shdr->sh_addr) {
+ pr_debug("ELF section address wraps around.\n");
+ return false;
+ }
+
+ if (shdr->sh_type != SHT_NOBITS) {
+ if (shdr->sh_offset + shdr->sh_size < shdr->sh_offset) {
+ pr_debug("ELF section location wraps around.\n");
+ return false;
+ } else if (shdr->sh_offset + shdr->sh_size > buf_len) {
+ pr_debug("ELF section not in file.\n");
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static int elf_read_shdr(const char *buf, size_t len, struct elf_info *elf_info,
+ int idx)
+{
+ struct elf_shdr *shdr = &elf_info->sechdrs[idx];
+ const struct elfhdr *ehdr = elf_info->ehdr;
+ const char *sbuf;
+ struct elf_shdr *buf_shdr;
+
+ sbuf = buf + ehdr->e_shoff + idx * sizeof(*buf_shdr);
+ buf_shdr = (struct elf_shdr *) sbuf;
+
+ shdr->sh_name = elf32_to_cpu(ehdr, buf_shdr->sh_name);
+ shdr->sh_type = elf32_to_cpu(ehdr, buf_shdr->sh_type);
+ shdr->sh_addr = elf_addr_to_cpu(ehdr, buf_shdr->sh_addr);
+ shdr->sh_offset = elf_addr_to_cpu(ehdr, buf_shdr->sh_offset);
+ shdr->sh_link = elf32_to_cpu(ehdr, buf_shdr->sh_link);
+ shdr->sh_info = elf32_to_cpu(ehdr, buf_shdr->sh_info);
+
+ /*
+ * The following fields have a type equivalent to Elf_Addr
+ * both in 32 bit and 64 bit ELF.
+ */
+ shdr->sh_flags = elf_addr_to_cpu(ehdr, buf_shdr->sh_flags);
+ shdr->sh_size = elf_addr_to_cpu(ehdr, buf_shdr->sh_size);
+ shdr->sh_addralign = elf_addr_to_cpu(ehdr, buf_shdr->sh_addralign);
+ shdr->sh_entsize = elf_addr_to_cpu(ehdr, buf_shdr->sh_entsize);
+
+ return elf_is_shdr_sane(shdr, len) ? 0 : -ENOEXEC;
+}
+
+/**
+ * elf_read_shdrs - read the section headers from the buffer
+ *
+ * This function assumes that the section header table was checked for sanity.
+ * Use elf_is_ehdr_sane() if it wasn't.
+ */
+static int elf_read_shdrs(const char *buf, size_t len,
+ struct elf_info *elf_info)
+{
+ size_t shdr_size, i;
+
+ /*
+	 * e_shnum is at most 65535 so calculating
+ * the size of the section header cannot overflow.
+ */
+ shdr_size = sizeof(struct elf_shdr) * elf_info->ehdr->e_shnum;
+
+ elf_info->sechdrs = kzalloc(shdr_size, GFP_KERNEL);
+ if (!elf_info->sechdrs)
+ return -ENOMEM;
+
+ for (i = 0; i < elf_info->ehdr->e_shnum; i++) {
+ int ret;
+
+ ret = elf_read_shdr(buf, len, elf_info, i);
+ if (ret) {
+ kfree(elf_info->sechdrs);
+ elf_info->sechdrs = NULL;
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * elf_read_from_buffer - read the ELF file and set up the ELF header and info
+ * @buf: Buffer to read ELF file from.
+ * @len: Size of @buf.
+ * @ehdr: Pointer to existing struct which will be populated.
+ * @elf_info: Pointer to existing struct which will be populated.
+ *
+ * This function allows reading ELF files with different byte order than
+ * the kernel, byte-swapping the fields as needed.
+ *
+ * Return:
+ * On success returns 0, and the caller should call elf_free_info(elf_info) to
+ * free the memory allocated for the section and program headers.
+ */
+static int elf_read_from_buffer(const char *buf, size_t len,
+ struct elfhdr *ehdr, struct elf_info *elf_info)
+{
+ int ret;
+
+ ret = elf_read_ehdr(buf, len, ehdr);
+ if (ret)
+ return ret;
+
+ elf_info->buffer = buf;
+ elf_info->ehdr = ehdr;
+ if (ehdr->e_phoff > 0 && ehdr->e_phnum > 0) {
+ ret = elf_read_phdrs(buf, len, elf_info);
+ if (ret)
+ return ret;
+ }
+ if (ehdr->e_shoff > 0 && ehdr->e_shnum > 0) {
+ ret = elf_read_shdrs(buf, len, elf_info);
+ if (ret) {
+ kfree(elf_info->proghdrs);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * elf_free_info - free memory allocated by elf_read_from_buffer
+ */
+void elf_free_info(struct elf_info *elf_info)
+{
+ kfree(elf_info->proghdrs);
+ kfree(elf_info->sechdrs);
+ memset(elf_info, 0, sizeof(*elf_info));
+}
+
+/**
+ * build_elf_exec_info - read ELF executable and check that we can use it
+ */
+int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr,
+ struct elf_info *elf_info)
+{
+ int i;
+ int ret;
+
+ ret = elf_read_from_buffer(buf, len, ehdr, elf_info);
+ if (ret)
+ return ret;
+
+ /* Big endian vmlinux has type ET_DYN. */
+ if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) {
+ pr_err("Not an ELF executable.\n");
+ goto error;
+ } else if (!elf_info->proghdrs) {
+ pr_err("No ELF program header.\n");
+ goto error;
+ }
+
+ for (i = 0; i < ehdr->e_phnum; i++) {
+ /*
+ * Kexec does not support loading interpreters.
+ * In addition this check keeps us from attempting
+ * to kexec ordinary executables.
+ */
+ if (elf_info->proghdrs[i].p_type == PT_INTERP) {
+ pr_err("Requires an ELF interpreter.\n");
+ goto error;
+ }
+ }
+
+ return 0;
+error:
+ elf_free_info(elf_info);
+ return -ENOEXEC;
+}
--
2.14.1
prepare_elf_headers() can also be useful for other architectures,
including arm64. So factor it out of x86 into kernel/crash_core.c.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
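For illustration, here is a minimal sketch of how another architecture
could consume the factored-out helper; arch_load_elf_headers() is a
hypothetical name and error handling is trimmed:

  /* Hypothetical arch-side caller of the now-generic helper. */
  static int arch_load_elf_headers(struct kimage *image)
  {
  	void *hdrs;
  	unsigned long hdrs_sz;
  	int ret;

  	/* walks System RAM and builds the ELF core header */
  	ret = prepare_elf_headers(image, &hdrs, &hdrs_sz);
  	if (ret)
  		return ret;

  	/* hdrs/hdrs_sz are then placed with kexec_add_buffer() */
  	return 0;
  }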
---
arch/x86/kernel/crash.c | 324 ----------------------------------------------
include/linux/kexec.h | 19 +++
kernel/crash_core.c | 333 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 352 insertions(+), 324 deletions(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2307bb..3c6b880f6dbf 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -21,7 +21,6 @@
#include <linux/elf.h>
#include <linux/elfcore.h>
#include <linux/export.h>
-#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <asm/processor.h>
@@ -41,34 +40,6 @@
/* Alignment required for elf header segment */
#define ELF_CORE_HEADER_ALIGN 4096
-/* This primarily represents number of split ranges due to exclusion */
-#define CRASH_MAX_RANGES 16
-
-struct crash_mem_range {
- u64 start, end;
-};
-
-struct crash_mem {
- unsigned int nr_ranges;
- struct crash_mem_range ranges[CRASH_MAX_RANGES];
-};
-
-/* Misc data about ram ranges needed to prepare elf headers */
-struct crash_elf_data {
- struct kimage *image;
- /*
- * Total number of ram ranges we have after various adjustments for
- * crash reserved region, etc.
- */
- unsigned int max_nr_ranges;
-
- /* Pointer to elf header */
- void *ehdr;
- /* Pointer to next phdr */
- void *bufp;
- struct crash_mem mem;
-};
-
/* Used while preparing memory map entries for second kernel */
struct crash_memmap_data {
struct boot_params *params;
@@ -209,301 +180,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
}
#ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
-{
- unsigned int *nr_ranges = arg;
-
- (*nr_ranges)++;
- return 0;
-}
-
-
-/* Gather all the required information to prepare elf headers for ram regions */
-static void fill_up_crash_elf_data(struct crash_elf_data *ced,
- struct kimage *image)
-{
- unsigned int nr_ranges = 0;
-
- ced->image = image;
-
- walk_system_ram_res(0, -1, &nr_ranges,
- get_nr_ram_ranges_callback);
-
- ced->max_nr_ranges = nr_ranges;
-
- /* Exclusion of crash region could split memory ranges */
- ced->max_nr_ranges++;
-
- /* If crashk_low_res is not 0, another range split possible */
- if (crashk_low_res.end)
- ced->max_nr_ranges++;
-}
-
-static int exclude_mem_range(struct crash_mem *mem,
- unsigned long long mstart, unsigned long long mend)
-{
- int i, j;
- unsigned long long start, end;
- struct crash_mem_range temp_range = {0, 0};
-
- for (i = 0; i < mem->nr_ranges; i++) {
- start = mem->ranges[i].start;
- end = mem->ranges[i].end;
-
- if (mstart > end || mend < start)
- continue;
-
- /* Truncate any area outside of range */
- if (mstart < start)
- mstart = start;
- if (mend > end)
- mend = end;
-
- /* Found completely overlapping range */
- if (mstart == start && mend == end) {
- mem->ranges[i].start = 0;
- mem->ranges[i].end = 0;
- if (i < mem->nr_ranges - 1) {
- /* Shift rest of the ranges to left */
- for (j = i; j < mem->nr_ranges - 1; j++) {
- mem->ranges[j].start =
- mem->ranges[j+1].start;
- mem->ranges[j].end =
- mem->ranges[j+1].end;
- }
- }
- mem->nr_ranges--;
- return 0;
- }
-
- if (mstart > start && mend < end) {
- /* Split original range */
- mem->ranges[i].end = mstart - 1;
- temp_range.start = mend + 1;
- temp_range.end = end;
- } else if (mstart != start)
- mem->ranges[i].end = mstart - 1;
- else
- mem->ranges[i].start = mend + 1;
- break;
- }
-
- /* If a split happend, add the split to array */
- if (!temp_range.end)
- return 0;
-
- /* Split happened */
- if (i == CRASH_MAX_RANGES - 1) {
- pr_err("Too many crash ranges after split\n");
- return -ENOMEM;
- }
-
- /* Location where new range should go */
- j = i + 1;
- if (j < mem->nr_ranges) {
- /* Move over all ranges one slot towards the end */
- for (i = mem->nr_ranges - 1; i >= j; i--)
- mem->ranges[i + 1] = mem->ranges[i];
- }
-
- mem->ranges[j].start = temp_range.start;
- mem->ranges[j].end = temp_range.end;
- mem->nr_ranges++;
- return 0;
-}
-
-/*
- * Look for any unwanted ranges between mstart, mend and remove them. This
- * might lead to split and split ranges are put in ced->mem.ranges[] array
- */
-static int elf_header_exclude_ranges(struct crash_elf_data *ced,
- unsigned long long mstart, unsigned long long mend)
-{
- struct crash_mem *cmem = &ced->mem;
- int ret = 0;
-
- memset(cmem->ranges, 0, sizeof(cmem->ranges));
-
- cmem->ranges[0].start = mstart;
- cmem->ranges[0].end = mend;
- cmem->nr_ranges = 1;
-
- /* Exclude crashkernel region */
- ret = exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
- if (ret)
- return ret;
-
- if (crashk_low_res.end) {
- ret = exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
- if (ret)
- return ret;
- }
-
- return ret;
-}
-
-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
-{
- struct crash_elf_data *ced = arg;
- Elf64_Ehdr *ehdr;
- Elf64_Phdr *phdr;
- unsigned long mstart, mend;
- struct kimage *image = ced->image;
- struct crash_mem *cmem;
- int ret, i;
-
- ehdr = ced->ehdr;
-
- /* Exclude unwanted mem ranges */
- ret = elf_header_exclude_ranges(ced, start, end);
- if (ret)
- return ret;
-
- /* Go through all the ranges in ced->mem.ranges[] and prepare phdr */
- cmem = &ced->mem;
-
- for (i = 0; i < cmem->nr_ranges; i++) {
- mstart = cmem->ranges[i].start;
- mend = cmem->ranges[i].end;
-
- phdr = ced->bufp;
- ced->bufp += sizeof(Elf64_Phdr);
-
- phdr->p_type = PT_LOAD;
- phdr->p_flags = PF_R|PF_W|PF_X;
- phdr->p_offset = mstart;
-
- /*
- * If a range matches backup region, adjust offset to backup
- * segment.
- */
- if (mstart == image->arch.backup_src_start &&
- (mend - mstart + 1) == image->arch.backup_src_sz)
- phdr->p_offset = image->arch.backup_load_addr;
-
- phdr->p_paddr = mstart;
- phdr->p_vaddr = (unsigned long long) __va(mstart);
- phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
- phdr->p_align = 0;
- ehdr->e_phnum++;
- pr_debug("Crash PT_LOAD elf header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
- phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
- ehdr->e_phnum, phdr->p_offset);
- }
-
- return ret;
-}
-
-static int prepare_elf64_headers(struct crash_elf_data *ced,
- void **addr, unsigned long *sz)
-{
- Elf64_Ehdr *ehdr;
- Elf64_Phdr *phdr;
- unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
- unsigned char *buf, *bufp;
- unsigned int cpu;
- unsigned long long notes_addr;
- int ret;
-
- /* extra phdr for vmcoreinfo elf note */
- nr_phdr = nr_cpus + 1;
- nr_phdr += ced->max_nr_ranges;
-
- /*
- * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
- * area on x86_64 (ffffffff80000000 - ffffffffa0000000).
- * I think this is required by tools like gdb. So same physical
- * memory will be mapped in two elf headers. One will contain kernel
- * text virtual addresses and other will have __va(physical) addresses.
- */
-
- nr_phdr++;
- elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
- elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
-
- buf = vzalloc(elf_sz);
- if (!buf)
- return -ENOMEM;
-
- bufp = buf;
- ehdr = (Elf64_Ehdr *)bufp;
- bufp += sizeof(Elf64_Ehdr);
- memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
- ehdr->e_ident[EI_CLASS] = ELFCLASS64;
- ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
- ehdr->e_ident[EI_VERSION] = EV_CURRENT;
- ehdr->e_ident[EI_OSABI] = ELF_OSABI;
- memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
- ehdr->e_type = ET_CORE;
- ehdr->e_machine = ELF_ARCH;
- ehdr->e_version = EV_CURRENT;
- ehdr->e_phoff = sizeof(Elf64_Ehdr);
- ehdr->e_ehsize = sizeof(Elf64_Ehdr);
- ehdr->e_phentsize = sizeof(Elf64_Phdr);
-
- /* Prepare one phdr of type PT_NOTE for each present cpu */
- for_each_present_cpu(cpu) {
- phdr = (Elf64_Phdr *)bufp;
- bufp += sizeof(Elf64_Phdr);
- phdr->p_type = PT_NOTE;
- notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
- phdr->p_offset = phdr->p_paddr = notes_addr;
- phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
- (ehdr->e_phnum)++;
- }
-
- /* Prepare one PT_NOTE header for vmcoreinfo */
- phdr = (Elf64_Phdr *)bufp;
- bufp += sizeof(Elf64_Phdr);
- phdr->p_type = PT_NOTE;
- phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
- phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
- (ehdr->e_phnum)++;
-
-#ifdef CONFIG_X86_64
- /* Prepare PT_LOAD type program header for kernel text region */
- phdr = (Elf64_Phdr *)bufp;
- bufp += sizeof(Elf64_Phdr);
- phdr->p_type = PT_LOAD;
- phdr->p_flags = PF_R|PF_W|PF_X;
- phdr->p_vaddr = (Elf64_Addr)_text;
- phdr->p_filesz = phdr->p_memsz = _end - _text;
- phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
- (ehdr->e_phnum)++;
-#endif
-
- /* Prepare PT_LOAD headers for system ram chunks. */
- ced->ehdr = ehdr;
- ced->bufp = bufp;
- ret = walk_system_ram_res(0, -1, ced,
- prepare_elf64_ram_headers_callback);
- if (ret < 0)
- return ret;
-
- *addr = buf;
- *sz = elf_sz;
- return 0;
-}
-
-/* Prepare elf headers. Return addr and size */
-static int prepare_elf_headers(struct kimage *image, void **addr,
- unsigned long *sz)
-{
- struct crash_elf_data *ced;
- int ret;
-
- ced = kzalloc(sizeof(*ced), GFP_KERNEL);
- if (!ced)
- return -ENOMEM;
-
- fill_up_crash_elf_data(ced, image);
-
- /* By default prepare 64bit headers */
- ret = prepare_elf64_headers(ced, addr, sz);
- kfree(ced);
- return ret;
-}
-
static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
{
unsigned int nr_e820_entries;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index db98e3459e90..acaecd72b134 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -163,6 +163,25 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
int (*func)(u64, u64, void *));
extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
+#ifdef CONFIG_CRASH_CORE
+extern int prepare_elf_headers(struct kimage *image, void **addr,
+ unsigned long *sz);
+
+/* This primarily represents number of split ranges due to exclusion */
+#define CRASH_MAX_RANGES 16
+
+struct crash_mem_range {
+ u64 start, end;
+};
+
+struct crash_mem {
+ unsigned int nr_ranges;
+ struct crash_mem_range ranges[CRASH_MAX_RANGES];
+};
+
+extern int exclude_mem_range(struct crash_mem *mem,
+ unsigned long long mstart, unsigned long long mend);
+#endif
#ifdef CONFIG_KEXEC_FILE_ELF
struct elf_info {
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 6db80fc0810b..f2385590e94b 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,11 @@
*/
#include <linux/crash_core.h>
+#include <linux/elf.h>
+#include <linux/elfcore.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/slab.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
@@ -469,3 +474,331 @@ static int __init crash_save_vmcoreinfo_init(void)
}
subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_KEXEC_FILE
+/*
+ * The following definitions are for local use only.
+ */
+
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN 4096
+
+/* Misc data about ram ranges needed to prepare elf headers */
+struct crash_elf_data {
+ struct kimage *image;
+ /*
+ * Total number of ram ranges we have after various adjustments for
+ * crash reserved region, etc.
+ */
+ unsigned int max_nr_ranges;
+
+ /* Pointer to elf header */
+ void *ehdr;
+ /* Pointer to next phdr */
+ void *bufp;
+ struct crash_mem mem;
+};
+
+static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+{
+ unsigned int *nr_ranges = arg;
+
+ (*nr_ranges)++;
+ return 0;
+}
+
+
+/* Gather all the required information to prepare elf headers for ram regions */
+static void fill_up_crash_elf_data(struct crash_elf_data *ced,
+ struct kimage *image)
+{
+ unsigned int nr_ranges = 0;
+
+ ced->image = image;
+
+ walk_system_ram_res(0, -1, &nr_ranges,
+ get_nr_ram_ranges_callback);
+
+ ced->max_nr_ranges = nr_ranges;
+
+ /* Exclusion of crash region could split memory ranges */
+ ced->max_nr_ranges++;
+
+#ifdef CONFIG_X86_64
+ /* If crashk_low_res is not 0, another range split possible */
+ if (crashk_low_res.end)
+ ced->max_nr_ranges++;
+#endif
+}
+
+int exclude_mem_range(struct crash_mem *mem,
+ unsigned long long mstart, unsigned long long mend)
+{
+ int i, j;
+ unsigned long long start, end;
+ struct crash_mem_range temp_range = {0, 0};
+
+ for (i = 0; i < mem->nr_ranges; i++) {
+ start = mem->ranges[i].start;
+ end = mem->ranges[i].end;
+
+ if (mstart > end || mend < start)
+ continue;
+
+ /* Truncate any area outside of range */
+ if (mstart < start)
+ mstart = start;
+ if (mend > end)
+ mend = end;
+
+ /* Found completely overlapping range */
+ if (mstart == start && mend == end) {
+ mem->ranges[i].start = 0;
+ mem->ranges[i].end = 0;
+ if (i < mem->nr_ranges - 1) {
+ /* Shift rest of the ranges to left */
+ for (j = i; j < mem->nr_ranges - 1; j++) {
+ mem->ranges[j].start =
+ mem->ranges[j+1].start;
+ mem->ranges[j].end =
+ mem->ranges[j+1].end;
+ }
+ }
+ mem->nr_ranges--;
+ return 0;
+ }
+
+ if (mstart > start && mend < end) {
+ /* Split original range */
+ mem->ranges[i].end = mstart - 1;
+ temp_range.start = mend + 1;
+ temp_range.end = end;
+ } else if (mstart != start)
+ mem->ranges[i].end = mstart - 1;
+ else
+ mem->ranges[i].start = mend + 1;
+ break;
+ }
+
+ /* If a split happened, add the split to array */
+ if (!temp_range.end)
+ return 0;
+
+ /* Split happened */
+ if (i == CRASH_MAX_RANGES - 1) {
+ pr_err("Too many crash ranges after split\n");
+ return -ENOMEM;
+ }
+
+ /* Location where new range should go */
+ j = i + 1;
+ if (j < mem->nr_ranges) {
+ /* Move over all ranges one slot towards the end */
+ for (i = mem->nr_ranges - 1; i >= j; i--)
+ mem->ranges[i + 1] = mem->ranges[i];
+ }
+
+ mem->ranges[j].start = temp_range.start;
+ mem->ranges[j].end = temp_range.end;
+ mem->nr_ranges++;
+ return 0;
+}
+
+/*
+ * Look for any unwanted ranges between mstart, mend and remove them. This
+ * might lead to split and split ranges are put in ced->mem.ranges[] array
+ */
+static int elf_header_exclude_ranges(struct crash_elf_data *ced,
+ unsigned long long mstart, unsigned long long mend)
+{
+ struct crash_mem *cmem = &ced->mem;
+ int ret = 0;
+
+ memset(cmem->ranges, 0, sizeof(cmem->ranges));
+
+ cmem->ranges[0].start = mstart;
+ cmem->ranges[0].end = mend;
+ cmem->nr_ranges = 1;
+
+ /* Exclude crashkernel region */
+ ret = exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+ if (ret)
+ return ret;
+
+#ifdef CONFIG_X86_64
+ if (crashk_low_res.end) {
+ ret = exclude_mem_range(cmem, crashk_low_res.start,
+ crashk_low_res.end);
+ if (ret)
+ return ret;
+ }
+#endif
+
+ return ret;
+}
+
+static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+{
+ struct crash_elf_data *ced = arg;
+ Elf64_Ehdr *ehdr;
+ Elf64_Phdr *phdr;
+ unsigned long mstart, mend;
+#ifdef CONFIG_X86_64
+ struct kimage *image = ced->image;
+#endif
+ struct crash_mem *cmem;
+ int ret, i;
+
+ ehdr = ced->ehdr;
+
+ /* Exclude unwanted mem ranges */
+ ret = elf_header_exclude_ranges(ced, start, end);
+ if (ret)
+ return ret;
+
+ /* Go through all the ranges in ced->mem.ranges[] and prepare phdr */
+ cmem = &ced->mem;
+
+ for (i = 0; i < cmem->nr_ranges; i++) {
+ mstart = cmem->ranges[i].start;
+ mend = cmem->ranges[i].end;
+
+ phdr = ced->bufp;
+ ced->bufp += sizeof(Elf64_Phdr);
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_flags = PF_R|PF_W|PF_X;
+ phdr->p_offset = mstart;
+
+#ifdef CONFIG_X86_64
+ /*
+ * If a range matches backup region, adjust offset to backup
+ * segment.
+ */
+ if (mstart == image->arch.backup_src_start &&
+ (mend - mstart + 1) == image->arch.backup_src_sz)
+ phdr->p_offset = image->arch.backup_load_addr;
+#endif
+
+ phdr->p_paddr = mstart;
+ phdr->p_vaddr = (unsigned long long) __va(mstart);
+ phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
+ phdr->p_align = 0;
+ ehdr->e_phnum++;
+ pr_debug("Crash PT_LOAD elf header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
+ phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
+ ehdr->e_phnum, phdr->p_offset);
+ }
+
+ return ret;
+}
+
+static int prepare_elf64_headers(struct crash_elf_data *ced,
+ void **addr, unsigned long *sz)
+{
+ Elf64_Ehdr *ehdr;
+ Elf64_Phdr *phdr;
+ unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+ unsigned char *buf, *bufp;
+ unsigned int cpu;
+ unsigned long long notes_addr;
+ int ret;
+
+ /* extra phdr for vmcoreinfo elf note */
+ nr_phdr = nr_cpus + 1;
+ nr_phdr += ced->max_nr_ranges;
+
+ /*
+ * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+ * area on x86_64 (ffffffff80000000 - ffffffffa0000000).
+ * I think this is required by tools like gdb. So same physical
+ * memory will be mapped in two elf headers. One will contain kernel
+ * text virtual addresses and other will have __va(physical) addresses.
+ */
+
+ nr_phdr++;
+ elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+ elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+ buf = vzalloc(elf_sz);
+ if (!buf)
+ return -ENOMEM;
+
+ bufp = buf;
+ ehdr = (Elf64_Ehdr *)bufp;
+ bufp += sizeof(Elf64_Ehdr);
+ memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+ ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+ ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+ ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+ ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+ memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+ ehdr->e_type = ET_CORE;
+ ehdr->e_machine = ELF_ARCH;
+ ehdr->e_version = EV_CURRENT;
+ ehdr->e_phoff = sizeof(Elf64_Ehdr);
+ ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+ ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+ /* Prepare one phdr of type PT_NOTE for each present cpu */
+ for_each_present_cpu(cpu) {
+ phdr = (Elf64_Phdr *)bufp;
+ bufp += sizeof(Elf64_Phdr);
+ phdr->p_type = PT_NOTE;
+ notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+ phdr->p_offset = phdr->p_paddr = notes_addr;
+ phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+ (ehdr->e_phnum)++;
+ }
+
+ /* Prepare one PT_NOTE header for vmcoreinfo */
+ phdr = (Elf64_Phdr *)bufp;
+ bufp += sizeof(Elf64_Phdr);
+ phdr->p_type = PT_NOTE;
+ phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
+ phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
+ (ehdr->e_phnum)++;
+
+#ifdef CONFIG_X86_64
+ /* Prepare PT_LOAD type program header for kernel text region */
+ phdr = (Elf64_Phdr *)bufp;
+ bufp += sizeof(Elf64_Phdr);
+ phdr->p_type = PT_LOAD;
+ phdr->p_flags = PF_R|PF_W|PF_X;
+ phdr->p_vaddr = (Elf64_Addr)_text;
+ phdr->p_filesz = phdr->p_memsz = _end - _text;
+ phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
+ (ehdr->e_phnum)++;
+#endif
+
+ /* Prepare PT_LOAD headers for system ram chunks. */
+ ced->ehdr = ehdr;
+ ced->bufp = bufp;
+ ret = walk_system_ram_res(0, -1, ced,
+ prepare_elf64_ram_headers_callback);
+ if (ret < 0)
+ return ret;
+
+ *addr = buf;
+ *sz = elf_sz;
+ return 0;
+}
+
+/* Prepare elf headers. Return addr and size */
+int prepare_elf_headers(struct kimage *image, void **addr, unsigned long *sz)
+{
+ struct crash_elf_data *ced;
+ int ret;
+
+ ced = kzalloc(sizeof(*ced), GFP_KERNEL);
+ if (!ced)
+ return -ENOMEM;
+
+ fill_up_crash_elf_data(ced, image);
+
+ /* By default prepare 64bit headers */
+ ret = prepare_elf64_headers(ced, addr, sz);
+ kfree(ced);
+ return ret;
+}
+#endif /* CONFIG_KEXEC_FILE */
--
2.14.1
In contrast to kexec_add_buffer(), this function assumes that kbuf->mem is
already assigned by the caller. This type of allocation is commonly used
in kexec-tools.
This function will be used on arm64 when loading the vmlinux (ELF) binary.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Baoquan He <[email protected]>
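A minimal sketch of the intended usage, assuming a caller that has
already computed the physical destination itself; 'fixed_addr' is a
hypothetical value, e.g. taken from an ELF program header:

  static int place_segment_at(struct kimage *image, void *data,
  			    unsigned long len, unsigned long fixed_addr)
  {
  	struct kexec_buf kbuf;

  	kbuf.image = image;
  	kbuf.buffer = data;
  	kbuf.bufsz = len;
  	kbuf.memsz = len;
  	/* no hole search: the caller decides the destination */
  	kbuf.mem = fixed_addr;

  	return kexec_add_segment(&kbuf);
  }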
---
include/linux/kexec.h | 1 +
kernel/kexec_file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index acaecd72b134..be5e99afaf77 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -162,6 +162,7 @@ struct kexec_buf {
int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
int (*func)(u64, u64, void *));
extern int kexec_add_buffer(struct kexec_buf *kbuf);
+extern int kexec_add_segment(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
#ifdef CONFIG_CRASH_CORE
extern int prepare_elf_headers(struct kimage *image, void **addr,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 9f48f4412297..d898dec37816 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -519,6 +519,53 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
return 0;
}
+/**
+ * kexec_add_segment - place a buffer in a kexec segment
+ * @kbuf: Buffer contents and memory parameters.
+ *
+ * In contrast to kexec_add_buffer(), this function assumes
+ * that kbuf->mem is already assigned by caller.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_add_segment(struct kexec_buf *kbuf)
+{
+
+ struct kexec_segment *ksegment;
+
+ /* Currently adding segment this way is allowed only in file mode */
+ if (!kbuf->image->file_mode)
+ return -EINVAL;
+
+ if (kbuf->image->nr_segments >= KEXEC_SEGMENT_MAX)
+ return -EINVAL;
+
+ /*
+ * Make sure we are not trying to add buffer after allocating
+ * control pages. All segments need to be placed first before
+ * any control pages are allocated. As control page allocation
+ * logic goes through list of segments to make sure there are
+ * no destination overlaps.
+ */
+ if (!list_empty(&kbuf->image->control_pages)) {
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
+ /* Ensure minimum alignment needed for segments. */
+ kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE);
+
+ /* Found a suitable memory range */
+ ksegment = &kbuf->image->segment[kbuf->image->nr_segments];
+ ksegment->kbuf = kbuf->buffer;
+ ksegment->bufsz = kbuf->bufsz;
+ ksegment->mem = kbuf->mem;
+ ksegment->memsz = kbuf->memsz;
+ kbuf->image->nr_segments++;
+
+ return 0;
+}
+
/* Calculate and store the digest of segments */
static int kexec_calculate_store_digests(struct kimage *image)
{
--
2.14.1
Modify arm64/Kconfig and Makefile to enable kexec_file_load support.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 22 ++++++++++++++++++++++
arch/arm64/kernel/Makefile | 2 +-
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd908630631..cf10bc720d9e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -756,6 +756,28 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.
+config KEXEC_FILE
+ bool "kexec file based system call"
+ select KEXEC_CORE
+ select BUILD_BIN2C
+ ---help---
+	  This is a new version of the kexec system call. This system
+	  call is file based and takes file descriptors as its arguments
+	  for the kernel and initramfs, as opposed to the list of
+	  segments accepted by the previous system call.
+
+config KEXEC_VERIFY_SIG
+ bool "Verify kernel signature during kexec_file_load() syscall"
+ depends on KEXEC_FILE
+ select SYSTEM_DATA_VERIFICATION
+ ---help---
+ This option makes kernel signature verification mandatory for
+ the kexec_file_load() syscall.
+
+ In addition to that option, you need to enable signature
+ verification for the corresponding kernel image type being
+ loaded in order for this to work.
+
config CRASH_DUMP
bool "Build kdump crash kernel"
help
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 16e9f56b536a..5df003d6157c 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,7 +48,7 @@ arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL) += acpi_parking_protocol.o
arm64-obj-$(CONFIG_PARAVIRT) += paravirt.o
arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
-arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
+arm64-obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
cpu-reset.o
arm64-obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o
arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
--
2.14.1
The initial user of this system call number is arm64.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Arnd Bergmann <[email protected]>
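For reference, a minimal userspace sketch exercising the new syscall
number; the file paths and command line are hypothetical and error
handling is omitted:

  #include <fcntl.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef __NR_kexec_file_load
  #define __NR_kexec_file_load 292	/* from this patch */
  #endif

  int main(void)
  {
  	const char *cmdline = "console=ttyAMA0";
  	int kfd = open("/boot/Image", O_RDONLY);
  	int ifd = open("/boot/initrd.img", O_RDONLY);

  	/* cmdline_len must include the trailing NUL */
  	return syscall(__NR_kexec_file_load, kfd, ifd,
  		       strlen(cmdline) + 1, cmdline, 0UL);
  }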
---
include/uapi/asm-generic/unistd.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 061185a5eb51..086697fe3917 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -731,9 +731,11 @@ __SYSCALL(__NR_pkey_alloc, sys_pkey_alloc)
__SYSCALL(__NR_pkey_free, sys_pkey_free)
#define __NR_statx 291
__SYSCALL(__NR_statx, sys_statx)
+#define __NR_kexec_file_load 292
+__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
#undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
/*
* All syscalls below here should go away really,
--
2.14.1
Most of the sha256 code is based on crypto/sha256-glue.c, specifically
its non-NEON version.
Please note that we cannot re-use lib/mem*.S for purgatory because
unaligned memory access is not allowed there: purgatory runs with the
MMU turned off.
Since purgatory is not linked with the rest of the kernel, care must be
taken in selecting an appropriate set of compiler options so that no
undefined symbol references are generated.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
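For reference, a hedged sketch of the load-time counterpart: the generic
kexec_file code pokes the digest and region list into the purgatory blob
by symbol name, the way x86 does it with
kexec_purgatory_get_set_symbol(). set_purgatory_digest() is a
hypothetical wrapper shown only to illustrate the flow:

  static int set_purgatory_digest(struct kimage *image, u8 *digest,
  				unsigned int digest_sz,
  				struct kexec_sha_region *regions,
  				unsigned int regions_sz)
  {
  	int ret;

  	/* third argument of 'false' means "set", not "get" */
  	ret = kexec_purgatory_get_set_symbol(image,
  			"purgatory_sha256_digest",
  			digest, digest_sz, false);
  	if (ret)
  		return ret;

  	return kexec_purgatory_get_set_symbol(image,
  			"purgatory_sha_regions",
  			regions, regions_sz, false);
  }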
---
arch/arm64/crypto/sha256-core.S_shipped | 2 +
arch/arm64/purgatory/Makefile | 21 ++++++++-
arch/arm64/purgatory/entry.S | 13 ++++++
arch/arm64/purgatory/purgatory.c | 20 +++++++++
arch/arm64/purgatory/sha256-core.S | 1 +
arch/arm64/purgatory/sha256.c | 79 +++++++++++++++++++++++++++++++++
arch/arm64/purgatory/sha256.h | 1 +
arch/arm64/purgatory/string.c | 32 +++++++++++++
arch/arm64/purgatory/string.h | 5 +++
9 files changed, 173 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/purgatory/purgatory.c
create mode 100644 arch/arm64/purgatory/sha256-core.S
create mode 100644 arch/arm64/purgatory/sha256.c
create mode 100644 arch/arm64/purgatory/sha256.h
create mode 100644 arch/arm64/purgatory/string.c
create mode 100644 arch/arm64/purgatory/string.h
diff --git a/arch/arm64/crypto/sha256-core.S_shipped b/arch/arm64/crypto/sha256-core.S_shipped
index 3ce82cc860bc..9ce7419c9152 100644
--- a/arch/arm64/crypto/sha256-core.S_shipped
+++ b/arch/arm64/crypto/sha256-core.S_shipped
@@ -1210,6 +1210,7 @@ sha256_block_armv8:
ret
.size sha256_block_armv8,.-sha256_block_armv8
#endif
+#ifndef __PURGATORY__
#ifdef __KERNEL__
.globl sha256_block_neon
#endif
@@ -2056,6 +2057,7 @@ sha256_block_neon:
add sp,sp,#16*4+16
ret
.size sha256_block_neon,.-sha256_block_neon
+#endif
#ifndef __KERNEL__
.comm OPENSSL_armcap_P,4,4
#endif
diff --git a/arch/arm64/purgatory/Makefile b/arch/arm64/purgatory/Makefile
index c2127a2cbd51..d9b38be31e0a 100644
--- a/arch/arm64/purgatory/Makefile
+++ b/arch/arm64/purgatory/Makefile
@@ -1,14 +1,33 @@
OBJECT_FILES_NON_STANDARD := y
-purgatory-y := entry.o
+purgatory-y := entry.o purgatory.o sha256.o sha256-core.o string.o
targets += $(purgatory-y)
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
+# Purgatory is expected to be ET_REL, not an executable
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined \
-nostdlib -z nodefaultlib
+
targets += purgatory.ro
+GCOV_PROFILE := n
+KASAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+
+# Some kernel configurations may generate additional code containing
+# undefined symbols, like _mcount for ftrace and __stack_chk_guard
+# for stack-protector. Those should be removed from purgatory.
+
+CFLAGS_REMOVE_purgatory.o = -pg
+CFLAGS_REMOVE_sha256.o = -pg
+CFLAGS_REMOVE_string.o = -pg
+
+NO_PROTECTOR := $(call cc-option, -fno-stack-protector)
+KBUILD_CFLAGS += $(NO_PROTECTOR)
+
+KBUILD_AFLAGS += -D__PURGATORY__
+
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
index bc4e6b3bf8a1..74d028b838bd 100644
--- a/arch/arm64/purgatory/entry.S
+++ b/arch/arm64/purgatory/entry.S
@@ -6,6 +6,11 @@
.text
ENTRY(purgatory_start)
+ adr x19, .Lstack
+ mov sp, x19
+
+ bl purgatory
+
/* Start new image. */
ldr x17, arm64_kernel_entry
ldr x0, arm64_dtb_addr
@@ -15,6 +20,14 @@ ENTRY(purgatory_start)
br x17
END(purgatory_start)
+.ltorg
+
+.align 4
+ .rept 256
+ .quad 0
+ .endr
+.Lstack:
+
.data
.align 3
diff --git a/arch/arm64/purgatory/purgatory.c b/arch/arm64/purgatory/purgatory.c
new file mode 100644
index 000000000000..7fcbefa786bc
--- /dev/null
+++ b/arch/arm64/purgatory/purgatory.c
@@ -0,0 +1,20 @@
+/*
+ * purgatory: Runs between two kernels
+ *
+ * Copyright (c) 2017 Linaro Limited
+ * Author: AKASHI Takahiro <[email protected]>
+ */
+
+#include "sha256.h"
+
+void purgatory(void)
+{
+ int ret;
+
+ ret = verify_sha256_digest();
+ if (ret) {
+ /* loop forever */
+ for (;;)
+ ;
+ }
+}
diff --git a/arch/arm64/purgatory/sha256-core.S b/arch/arm64/purgatory/sha256-core.S
new file mode 100644
index 000000000000..24f5ce25b61e
--- /dev/null
+++ b/arch/arm64/purgatory/sha256-core.S
@@ -0,0 +1 @@
+#include "../crypto/sha256-core.S_shipped"
diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
new file mode 100644
index 000000000000..5d20d81767e3
--- /dev/null
+++ b/arch/arm64/purgatory/sha256.c
@@ -0,0 +1,79 @@
+#include <linux/kexec.h>
+#include <linux/purgatory.h>
+#include <linux/types.h>
+
+/*
+ * Under KASAN, these are redefined to the un-instrumented versions, __memxxx().
+ */
+#undef memcmp
+#undef memcpy
+#undef memset
+
+#include "string.h"
+#include <crypto/hash.h>
+#include <crypto/sha.h>
+#include <crypto/sha256_base.h>
+
+u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
+struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX]
+ __section(.kexec-purgatory);
+
+asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
+ unsigned int num_blks);
+
+static int sha256_init(struct shash_desc *desc)
+{
+ return sha256_base_init(desc);
+}
+
+static int sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ return sha256_base_do_update(desc, data, len,
+ (sha256_block_fn *)sha256_block_data_order);
+}
+
+static int __sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+ /* we can't do crypto_shash_digestsize(desc->tfm) */
+ unsigned int digest_size = 32;
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *digest = (__be32 *)out;
+ int i;
+
+ for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
+ put_unaligned_be32(sctx->state[i], digest++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+ sha256_base_do_finalize(desc,
+ (sha256_block_fn *)sha256_block_data_order);
+
+ return __sha256_base_finish(desc, out);
+}
+
+int verify_sha256_digest(void)
+{
+ char __sha256_desc[sizeof(struct shash_desc) +
+ sizeof(struct sha256_state)] CRYPTO_MINALIGN_ATTR;
+ struct shash_desc *desc = (struct shash_desc *)__sha256_desc;
+ struct kexec_sha_region *ptr, *end;
+ u8 digest[SHA256_DIGEST_SIZE];
+
+ sha256_init(desc);
+
+ end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);
+ for (ptr = purgatory_sha_regions; ptr < end; ptr++)
+ sha256_update(desc, (uint8_t *)(ptr->start), ptr->len);
+
+ sha256_final(desc, digest);
+
+ if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
+ return 1;
+
+ return 0;
+}
diff --git a/arch/arm64/purgatory/sha256.h b/arch/arm64/purgatory/sha256.h
new file mode 100644
index 000000000000..54dc3c33c469
--- /dev/null
+++ b/arch/arm64/purgatory/sha256.h
@@ -0,0 +1 @@
+extern int verify_sha256_digest(void);
diff --git a/arch/arm64/purgatory/string.c b/arch/arm64/purgatory/string.c
new file mode 100644
index 000000000000..33233a210a65
--- /dev/null
+++ b/arch/arm64/purgatory/string.c
@@ -0,0 +1,32 @@
+#include <linux/types.h>
+
+void *memcpy(void *dst, const void *src, size_t len)
+{
+ int i;
+
+ for (i = 0; i < len; i++)
+ ((u8 *)dst)[i] = ((u8 *)src)[i];
+
+	return dst;
+}
+
+void *memset(void *dst, int c, size_t len)
+{
+ int i;
+
+ for (i = 0; i < len; i++)
+ ((u8 *)dst)[i] = (u8)c;
+
+	return dst;
+}
+
+int memcmp(const void *src, const void *dst, size_t len)
+{
+ int i;
+
+ for (i = 0; i < len; i++)
+		if (((char *)src)[i] != ((char *)dst)[i])
+ return 1;
+
+ return 0;
+}
diff --git a/arch/arm64/purgatory/string.h b/arch/arm64/purgatory/string.h
new file mode 100644
index 000000000000..cb5f68dd84ef
--- /dev/null
+++ b/arch/arm64/purgatory/string.h
@@ -0,0 +1,5 @@
+#include <linux/types.h>
+
+int memcmp(const void *s1, const void *s2, size_t len);
+void *memcpy(void *dst, const void *src, size_t len);
+void *memset(void *dst, int c, size_t len);
--
2.14.1
load_crashdump_segments() creates and loads a memory segment holding the
elf core header for crash dump.
The "linux,usable-memory-range" and "linux,elfcorehdr" properties will be
added to the 2nd kernel's device-tree blob. The logic of this code also
comes from kexec-tools.
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
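For illustration, a hedged sketch of how the 2nd kernel could read the
property back during early flat-DT scanning; it follows the
of_get_flat_dt_prop()/dt_mem_next_cell() API, and the function name
early_read_elfcorehdr() is hypothetical:

  static int __init early_read_elfcorehdr(unsigned long node)
  {
  	const __be32 *reg;
  	int len;

  	reg = of_get_flat_dt_prop(node, "linux,elfcorehdr", &len);
  	if (!reg || len != (dt_root_addr_cells + dt_root_size_cells) *
  			   sizeof(__be32))
  		return -EINVAL;

  	/* decode one address cell group, then one size cell group */
  	elfcorehdr_addr = dt_mem_next_cell(dt_root_addr_cells, &reg);
  	elfcorehdr_size = dt_mem_next_cell(dt_root_size_cells, &reg);

  	return 0;
  }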
---
arch/arm64/include/asm/kexec.h | 5 ++
arch/arm64/kernel/machine_kexec_file.c | 145 +++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index ebc4aaa707ae..46b21512efbe 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -98,6 +98,10 @@ static inline void crash_post_resume(void) {}
struct kimage_arch {
void *dtb_buf;
+ /* Core ELF header buffer */
+ void *elf_headers;
+ unsigned long elf_headers_sz;
+ unsigned long elf_load_addr;
};
struct kimage;
@@ -109,6 +113,7 @@ extern int load_other_segments(struct kimage *image,
unsigned long kernel_load_addr,
char *initrd, unsigned long initrd_len,
char *cmdline, unsigned long cmdline_len);
+extern int load_crashdump_segments(struct kimage *image);
#endif
#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index cd12e451e474..012063307001 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -281,6 +281,77 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
}
+static int __init arch_kexec_file_init(void)
+{
+	/* These values are used later, when loading the kernel */
+ __dt_root_addr_cells = dt_root_addr_cells;
+ __dt_root_size_cells = dt_root_size_cells;
+
+ return 0;
+}
+late_initcall(arch_kexec_file_init);
+
+#define FDT_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
+#define FDT_TAGALIGN(x) (FDT_ALIGN((x), FDT_TAGSIZE))
+
+static int fdt_prop_len(const char *prop_name, int len)
+{
+ return (strlen(prop_name) + 1) +
+ sizeof(struct fdt_property) +
+ FDT_TAGALIGN(len);
+}
+
+static bool cells_size_fitted(unsigned long base, unsigned long size)
+{
+ /* if *_cells >= 2, cells can hold 64-bit values anyway */
+ if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
+ return false;
+
+ if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
+ return false;
+
+ return true;
+}
+
+static void fill_property(void *buf, u64 val64, int cells)
+{
+ u32 val32;
+ int i;
+
+ if (cells == 1) {
+ val32 = cpu_to_fdt32((u32)val64);
+ memcpy(buf, &val32, sizeof(val32));
+ } else {
+ for (i = 0; i < (cells * sizeof(u32) - sizeof(u64)); i++)
+ *(char *)buf++ = 0;
+
+ val64 = cpu_to_fdt64(val64);
+ memcpy(buf, &val64, sizeof(val64));
+ }
+}
+
+static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
+ unsigned long addr, unsigned long size)
+{
+ u64 range[2];
+ void *prop;
+ size_t buf_size;
+ int result;
+
+ prop = range;
+ buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+
+ fill_property(prop, addr, __dt_root_addr_cells);
+ prop += __dt_root_addr_cells * sizeof(u32);
+
+ fill_property(prop, size, __dt_root_size_cells);
+ prop += __dt_root_size_cells * sizeof(u32);
+
+ result = fdt_setprop(fdt, nodeoffset, name, range, buf_size);
+
+ return result;
+}
+
int setup_dtb(struct kimage *image,
unsigned long initrd_load_addr, unsigned long initrd_len,
char *cmdline, unsigned long cmdline_len,
@@ -293,10 +364,26 @@ int setup_dtb(struct kimage *image,
int range_len;
int ret;
+ /* check ranges against root's #address-cells and #size-cells */
+ if (image->type == KEXEC_TYPE_CRASH &&
+ (!cells_size_fitted(image->arch.elf_load_addr,
+ image->arch.elf_headers_sz) ||
+ !cells_size_fitted(crashk_res.start,
+ crashk_res.end - crashk_res.start + 1))) {
+ pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
+ ret = -EINVAL;
+ goto out_err;
+ }
+
/* duplicate dt blob */
buf_size = fdt_totalsize(initial_boot_params);
range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+ if (image->type == KEXEC_TYPE_CRASH)
+ buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
+ + fdt_prop_len("linux,usable-memory-range",
+ range_len);
+
if (initrd_load_addr)
buf_size += fdt_prop_len("initrd-start", sizeof(u64))
+ fdt_prop_len("initrd-end", sizeof(u64));
@@ -318,6 +405,23 @@ int setup_dtb(struct kimage *image,
if (nodeoffset < 0)
goto out_err;
+ if (image->type == KEXEC_TYPE_CRASH) {
+ /* add linux,elfcorehdr */
+ ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
+ image->arch.elf_load_addr,
+ image->arch.elf_headers_sz);
+ if (ret)
+ goto out_err;
+
+ /* add linux,usable-memory-range */
+ ret = fdt_setprop_range(buf, nodeoffset,
+ "linux,usable-memory-range",
+ crashk_res.start,
+ crashk_res.end - crashk_res.start + 1);
+ if (ret)
+ goto out_err;
+ }
+
/* add bootargs */
if (cmdline) {
ret = fdt_setprop(buf, nodeoffset, "bootargs",
@@ -452,3 +556,44 @@ int load_other_segments(struct kimage *image, unsigned long kernel_load_addr,
image->arch.dtb_buf = NULL;
return ret;
}
+
+int load_crashdump_segments(struct kimage *image)
+{
+ void *elf_addr;
+ unsigned long elf_sz;
+ struct kexec_buf kbuf;
+ int ret;
+
+ if (image->type != KEXEC_TYPE_CRASH)
+ return 0;
+
+ /* Prepare elf headers and add a segment */
+ ret = prepare_elf_headers(image, &elf_addr, &elf_sz);
+ if (ret) {
+ pr_err("Preparing elf core header failed\n");
+ return ret;
+ }
+
+ kbuf.image = image;
+ kbuf.buffer = elf_addr;
+ kbuf.bufsz = elf_sz;
+ kbuf.memsz = elf_sz;
+ kbuf.buf_align = PAGE_SIZE;
+ kbuf.buf_min = crashk_res.start;
+ kbuf.buf_max = crashk_res.end + 1;
+ kbuf.top_down = 1;
+
+ ret = kexec_add_buffer(&kbuf);
+ if (ret) {
+ vfree(elf_addr);
+ return ret;
+ }
+ image->arch.elf_headers = elf_addr;
+ image->arch.elf_headers_sz = elf_sz;
+ image->arch.elf_load_addr = kbuf.mem;
+
+ pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+ image->arch.elf_load_addr, elf_sz, elf_sz);
+
+ return ret;
+}
--
2.14.1
load_other_segments() sets up and adds all the memory segments necessary
other than the kernel itself, including the initrd, device-tree blob and
purgatory. Most of the code was borrowed from kexec-tools' counterpart.
In addition, arch_kexec_kernel_image_probe(), arch_kexec_kernel_image_load()
and arch_kexec_kernel_verify_sig() are stubs for supporting multiple types
of kernel image formats.
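For illustration, a format loader fills in a struct kexec_file_ops
roughly as follows (hypothetical sketch; the actual Image and vmlinux
loaders only appear in later patches of this series):
	static struct kexec_file_ops image_ops = {
		.probe		= image_probe,
		.load		= image_load,
	#ifdef CONFIG_KEXEC_VERIFY_SIG
		.verify_sig	= image_verify_sig,
	#endif
	};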
Signed-off-by: AKASHI Takahiro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/kexec.h | 18 +++
arch/arm64/kernel/machine_kexec_file.c | 255 +++++++++++++++++++++++++++++++++
2 files changed, 273 insertions(+)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index e17f0529a882..ebc4aaa707ae 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -93,6 +93,24 @@ static inline void crash_prepare_suspend(void) {}
static inline void crash_post_resume(void) {}
#endif
+#ifdef CONFIG_KEXEC_FILE
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+ void *dtb_buf;
+};
+
+struct kimage;
+extern int setup_dtb(struct kimage *image,
+ unsigned long initrd_load_addr, unsigned long initrd_len,
+ char *cmdline, unsigned long cmdline_len,
+ char **dtb_buf, size_t *dtb_buf_len);
+extern int load_other_segments(struct kimage *image,
+ unsigned long kernel_load_addr,
+ char *initrd, unsigned long initrd_len,
+ char *cmdline, unsigned long cmdline_len);
+#endif
+
#endif /* __ASSEMBLY__ */
#endif
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 183f7776d6dd..cd12e451e474 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -16,8 +16,78 @@
#include <linux/elf.h>
#include <linux/errno.h>
#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/libfdt.h>
+#include <linux/memblock.h>
+#include <linux/of_fdt.h>
#include <linux/types.h>
#include <asm/byteorder.h>
+#include <asm/kexec_file.h>
+
+static int __dt_root_addr_cells;
+static int __dt_root_size_cells;
+
+static struct kexec_file_ops *kexec_file_loaders[0];
+
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len)
+{
+ struct kexec_file_ops *fops;
+ int i, ret;
+
+ for (i = 0; i < ARRAY_SIZE(kexec_file_loaders); i++) {
+ fops = kexec_file_loaders[i];
+ if (!fops || !fops->probe)
+ continue;
+
+ ret = fops->probe(buf, buf_len);
+ if (!ret) {
+ image->fops = fops;
+ return 0;
+ }
+ }
+
+ return -ENOEXEC;
+}
+
+void *arch_kexec_kernel_image_load(struct kimage *image)
+{
+ if (!image->fops || !image->fops->load)
+ return ERR_PTR(-ENOEXEC);
+
+ return image->fops->load(image, image->kernel_buf,
+ image->kernel_buf_len, image->initrd_buf,
+ image->initrd_buf_len, image->cmdline_buf,
+ image->cmdline_buf_len);
+}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+ vfree(image->arch.dtb_buf);
+ image->arch.dtb_buf = NULL;
+
+ vfree(image->arch.elf_headers);
+ image->arch.elf_headers = NULL;
+ image->arch.elf_headers_sz = 0;
+
+ if (!image->fops || !image->fops->cleanup)
+ return 0;
+
+ return image->fops->cleanup(image->image_loader_data);
+}
+
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+int arch_kexec_kernel_verify_sig(struct kimage *image, void *kernel,
+ unsigned long kernel_len)
+{
+ if (!image->fops || !image->fops->verify_sig) {
+ pr_debug("kernel loader does not support signature verification.\n");
+ return -EKEYREJECTED;
+ }
+
+ return image->fops->verify_sig(kernel, kernel_len);
+}
+#endif
/*
* Apply purgatory relocations.
@@ -197,3 +267,188 @@ int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr,
return 0;
}
+
+int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+{
+ if (kbuf->image->type == KEXEC_TYPE_CRASH)
+ return walk_iomem_res_desc(crashk_res.desc,
+ IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+ crashk_res.start, crashk_res.end,
+ kbuf, func);
+ else if (kbuf->top_down)
+ return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
+ else
+ return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
+int setup_dtb(struct kimage *image,
+ unsigned long initrd_load_addr, unsigned long initrd_len,
+ char *cmdline, unsigned long cmdline_len,
+ char **dtb_buf, size_t *dtb_buf_len)
+{
+ char *buf = NULL;
+ size_t buf_size;
+ int nodeoffset;
+ u64 value;
+ int range_len;
+ int ret;
+
+ /* duplicate dt blob */
+ buf_size = fdt_totalsize(initial_boot_params);
+ range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+
+ if (initrd_load_addr)
+ buf_size += fdt_prop_len("initrd-start", sizeof(u64))
+ + fdt_prop_len("initrd-end", sizeof(u64));
+
+ if (cmdline)
+ buf_size += fdt_prop_len("bootargs", cmdline_len + 1);
+
+ buf = vmalloc(buf_size);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ ret = fdt_open_into(initial_boot_params, buf, buf_size);
+ if (ret)
+ goto out_err;
+
+ nodeoffset = fdt_path_offset(buf, "/chosen");
+ if (nodeoffset < 0)
+ goto out_err;
+
+ /* add bootargs */
+ if (cmdline) {
+ ret = fdt_setprop(buf, nodeoffset, "bootargs",
+ cmdline, cmdline_len + 1);
+ if (ret)
+ goto out_err;
+ }
+
+ /* add initrd-* */
+ if (initrd_load_addr) {
+ value = cpu_to_fdt64(initrd_load_addr);
+ ret = fdt_setprop(buf, nodeoffset, "initrd-start",
+ &value, sizeof(value));
+ if (ret)
+ goto out_err;
+
+ value = cpu_to_fdt64(initrd_load_addr + initrd_len);
+ ret = fdt_setprop(buf, nodeoffset, "initrd-end",
+ &value, sizeof(value));
+ if (ret)
+ goto out_err;
+ }
+
+ /* trim a buffer */
+ fdt_pack(buf);
+ *dtb_buf = buf;
+ *dtb_buf_len = fdt_totalsize(buf);
+
+ return 0;
+
+out_err:
+ vfree(buf);
+ return ret;
+}
+
+int load_other_segments(struct kimage *image, unsigned long kernel_load_addr,
+ char *initrd, unsigned long initrd_len,
+ char *cmdline, unsigned long cmdline_len)
+{
+ struct kexec_buf kbuf;
+ unsigned long initrd_load_addr = 0;
+ unsigned long purgatory_load_addr, dtb_load_addr;
+ char *dtb = NULL;
+ unsigned long dtb_len;
+ int ret = 0;
+
+ kbuf.image = image;
+
+ /* Load initrd */
+ if (initrd) {
+ kbuf.buffer = initrd;
+ kbuf.bufsz = initrd_len;
+ kbuf.memsz = initrd_len;
+ kbuf.buf_align = PAGE_SIZE;
+ /* within 1GB-aligned window of up to 32GB in size */
+ kbuf.buf_min = kernel_load_addr;
+ kbuf.buf_max = round_down(kernel_load_addr, SZ_1G)
+ + (unsigned long)SZ_1G * 31;
+ kbuf.top_down = 0;
+
+ ret = kexec_add_buffer(&kbuf);
+ if (ret)
+ goto out_err;
+ initrd_load_addr = kbuf.mem;
+
+ pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+ initrd_load_addr, initrd_len, initrd_len);
+ }
+
+ /* Load dtb blob */
+ ret = setup_dtb(image, initrd_load_addr, initrd_len,
+ cmdline, cmdline_len, &dtb, &dtb_len);
+ if (ret) {
+ pr_err("Preparing for new dtb failed\n");
+ goto out_err;
+ }
+
+ kbuf.buffer = dtb;
+ kbuf.bufsz = dtb_len;
+ kbuf.memsz = dtb_len;
+ /* not across 2MB boundary */
+ kbuf.buf_align = SZ_2M;
+ /*
+ * Note for backporting:
+ * On kernel prior to v4.2, fdt must reside within 512MB block
+ * where the kernel also resides. So
+ * kbuf.buf_min = round_down(kernel_load_addr, SZ_512M);
+ * kbuf.buf_max = round_up(kernel_load_addr, SZ_512M);
+ * would be required.
+ */
+ kbuf.buf_min = kernel_load_addr;
+ kbuf.buf_max = ULONG_MAX;
+ kbuf.top_down = 1;
+
+ ret = kexec_add_buffer(&kbuf);
+ if (ret)
+ goto out_err;
+ dtb_load_addr = kbuf.mem;
+ image->arch.dtb_buf = dtb;
+
+ pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+ dtb_load_addr, dtb_len, dtb_len);
+
+ /* Load purgatory */
+ ret = kexec_load_purgatory(image, kernel_load_addr, ULONG_MAX, 1,
+ &purgatory_load_addr);
+ if (ret) {
+ pr_err("Loading purgatory failed\n");
+ goto out_err;
+ }
+
+ ret = kexec_purgatory_get_set_symbol(image, "arm64_kernel_entry",
+ &kernel_load_addr, sizeof(kernel_load_addr), 0);
+ if (ret) {
+ pr_err("Relocating symbol (arm64_kernel_entry) failed.\n");
+ goto out_err;
+ }
+
+ ret = kexec_purgatory_get_set_symbol(image, "arm64_dtb_addr",
+ &dtb_load_addr, sizeof(dtb_load_addr), 0);
+ if (ret) {
+ pr_err("Relocating symbol (arm64_dtb_addr) failed.\n");
+ goto out_err;
+ }
+
+ pr_debug("Loaded purgatory at 0x%lx\n", purgatory_load_addr);
+
+ return 0;
+
+out_err:
+ vfree(dtb);
+ image->arch.dtb_buf = NULL;
+ return ret;
+}
--
2.14.1
On 24 August 2017 at 09:17, AKASHI Takahiro <[email protected]> wrote:
> Make message[] a flexible array member so that it is no longer a fixed
> part of the mz header definition.
>
> This change is crucial for enabling kexec_file_load on arm64 because
> arm64's "Image" binary, when seen as a PE file, doesn't carry any data
> for that field, and so the following check in pefile_parse_binary()
> would otherwise fail:
>
> chkaddr(cursor, mz->peaddr, sizeof(*pe));
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Herbert Xu <[email protected]>
> Cc: David S. Miller <[email protected]>
> ---
> include/linux/pe.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/pe.h b/include/linux/pe.h
> index 143ce75be5f0..3482b18a48b5 100644
> --- a/include/linux/pe.h
> +++ b/include/linux/pe.h
> @@ -166,7 +166,7 @@ struct mz_hdr {
> uint16_t oem_info; /* oem specific */
> uint16_t reserved1[10]; /* reserved */
> uint32_t peaddr; /* address of pe header */
> - char message[64]; /* message to print */
> + char message[]; /* message to print */
> };
>
> struct mz_reloc {
Reviewed-by: Ard Biesheuvel <[email protected]>
On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through the list of all System RAM resources in
> reverse order, i.e., from higher to lower addresses.
>
> It will be used in the kexec_file implementation on arm64.
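>
> The arm64 user later in this series calls it from arch_kexec_walk_mem()
> as:
>
>	return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);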
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> ---
> include/linux/ioport.h | 3 +++
> kernel/resource.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 51 insertions(+)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 6230064d7f95..9a212266299f 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -271,6 +271,9 @@ extern int
> walk_system_ram_res(u64 start, u64 end, void *arg,
> int (*func)(u64, u64, void *));
> extern int
> +walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(u64, u64, void *));
> +extern int
> walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
> void *arg, int (*func)(u64, u64, void *));
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 9b5f04404152..1d6d734c75ac 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -23,6 +23,7 @@
> #include <linux/pfn.h>
> #include <linux/mm.h>
> #include <linux/resource_ext.h>
> +#include <linux/vmalloc.h>
> #include <asm/io.h>
>
>
> @@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> return ret;
> }
>
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(u64, u64, void *))
> +{
> + struct resource res, *rams;
> + u64 orig_end;
> + int count, i;
> + int ret = -1;
> +
> + count = 16; /* initial */
> +again:
> + /* create a list */
> + rams = vmalloc(sizeof(struct resource) * count);
> + if (!rams)
> + return ret;
> +
> + res.start = start;
> + res.end = end;
> + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + orig_end = res.end;
> + i = 0;
> + while ((res.start < res.end) &&
> + (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> + if (i >= count) {
> + /* unlikely but */
> + vfree(rams);
> + count += 16;
If the count is likely to be < 16, why are we using vmalloc() here?
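Something like the below (untested sketch) would avoid that for the
common case of a small list:
	rams = kmalloc_array(count, sizeof(struct resource), GFP_KERNEL);
	if (!rams)
		return ret;
	/* ... with kfree(rams) on all exit paths */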
> + goto again;
> + }
> +
> + rams[i].start = res.start;
> + rams[i++].end = res.end;
> +
> + res.start = res.end + 1;
> + res.end = orig_end;
> + }
> +
> + /* go reverse */
> + for (i--; i >= 0; i--) {
> + ret = (*func)(rams[i].start, rams[i].end, arg);
> + if (ret)
> + break;
> + }
> +
> + vfree(rams);
> + return ret;
> +}
> +
> #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
>
> /*
> --
> 2.14.1
>
On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> This is a basic purgatory, or a kind of glue code between the two kernels,
> for arm64. We will later add a feature for verifying a digest against the
> loaded memory segments.
>
> arch_kexec_apply_relocations_add() is responsible for re-linking any
> relative symbols in purgatory. Please note that the purgatory is not
> an executable, but a non-linked archive of binaries, so relative symbols
> contained here must be resolved at kexec load time.
This sounds fragile to me. What is the reason we cannot let the linker
deal with this, similar to, e.g., how the VDSO gets linked?
Otherwise, couldn't we reuse the module loader to get these objects
relocated in memory? I'm sure there are differences that would require
some changes there, but implementing all of this again sounds like
overkill to me.
> Although arm64_kernel_entry and arm64_dtb_addr are the only such global
> variables for now, arch_kexec_apply_relocations_add() can handle various
> other relocation types.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/Makefile | 1 +
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/machine_kexec_file.c | 199 +++++++++++++++++++++++++++++++++
> arch/arm64/purgatory/Makefile | 24 ++++
> arch/arm64/purgatory/entry.S | 28 +++++
> 5 files changed, 253 insertions(+)
> create mode 100644 arch/arm64/kernel/machine_kexec_file.c
> create mode 100644 arch/arm64/purgatory/Makefile
> create mode 100644 arch/arm64/purgatory/entry.S
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 9b41f1e3b1a0..429f60728c0a 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -105,6 +105,7 @@ core-$(CONFIG_XEN) += arch/arm64/xen/
> core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
> libs-y := arch/arm64/lib/ $(libs-y)
> core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
> +core-$(CONFIG_KEXEC_FILE) += arch/arm64/purgatory/
>
> # Default target when executing plain make
> boot := arch/arm64/boot
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index f2b4e816b6de..16e9f56b536a 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
> arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \
> cpu-reset.o
> +arm64-obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o
> arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
> arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
> arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> new file mode 100644
> index 000000000000..183f7776d6dd
> --- /dev/null
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -0,0 +1,199 @@
> +/*
> + * kexec_file for arm64
> + *
> + * Copyright (C) 2017 Linaro Limited
> + * Author: AKASHI Takahiro <[email protected]>
> + *
> + * Most code is derived from arm64 port of kexec-tools
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#define pr_fmt(fmt) "kexec_file: " fmt
> +
> +#include <linux/elf.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +
> +/*
> + * Apply purgatory relocations.
> + *
> + * ehdr: Pointer to elf headers
> + * sechdrs: Pointer to section headers.
> + * relsec: section index of SHT_RELA section.
> + *
> + * Note:
> + * Currently R_AARCH64_ABS64, R_AARCH64_LD_PREL_LO19 and R_AARCH64_CALL26
> + * are the only types to be generated from purgatory code.
> + * If we add more functionalities, other types may also be used.
> + */
> +int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr,
> + Elf64_Shdr *sechdrs, unsigned int relsec)
> +{
> + Elf64_Rela *rel;
> + Elf64_Shdr *section, *symtabsec;
> + Elf64_Sym *sym;
> + const char *strtab, *name, *shstrtab;
> + unsigned long address, sec_base, value;
> + void *location;
> + u64 *loc64;
> + u32 *loc32, imm;
> + unsigned int i;
> +
> + /*
> + * ->sh_offset has been modified to keep the pointer to section
> + * contents in memory
> + */
> + rel = (void *)sechdrs[relsec].sh_offset;
> +
> + /* Section to which relocations apply */
> + section = &sechdrs[sechdrs[relsec].sh_info];
> +
> + pr_debug("reloc: Applying relocate section %u to %u\n", relsec,
> + sechdrs[relsec].sh_info);
> +
> + /* Associated symbol table */
> + symtabsec = &sechdrs[sechdrs[relsec].sh_link];
> +
> + /* String table */
> + if (symtabsec->sh_link >= ehdr->e_shnum) {
> + /* Invalid strtab section number */
> + pr_err("reloc: Invalid string table section index %d\n",
> + symtabsec->sh_link);
> + return -ENOEXEC;
> + }
> +
> + strtab = (char *)sechdrs[symtabsec->sh_link].sh_offset;
> +
> + /* section header string table */
> + shstrtab = (char *)sechdrs[ehdr->e_shstrndx].sh_offset;
> +
> + for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
> +
> + /*
> + * rel[i].r_offset contains byte offset from beginning
> + * of section to the storage unit affected.
> + *
> + * This is location to update (->sh_offset). This is temporary
> + * buffer where section is currently loaded. This will finally
> + * be loaded to a different address later, pointed to by
> + * ->sh_addr. kexec takes care of moving it
> + * (kexec_load_segment()).
> + */
> + location = (void *)(section->sh_offset + rel[i].r_offset);
> +
> + /* Final address of the location */
> + address = section->sh_addr + rel[i].r_offset;
> +
> + /*
> + * rel[i].r_info contains information about symbol table index
> + * w.r.t which relocation must be made and type of relocation
> + * to apply. ELF64_R_SYM() and ELF64_R_TYPE() macros get
> + * these respectively.
> + */
> + sym = (Elf64_Sym *)symtabsec->sh_offset +
> + ELF64_R_SYM(rel[i].r_info);
> +
> + if (sym->st_name)
> + name = strtab + sym->st_name;
> + else
> + name = shstrtab + sechdrs[sym->st_shndx].sh_name;
> +
> + pr_debug("Symbol: %-16s info: %02x shndx: %02x value=%llx size: %llx reloc type:%d\n",
> + name, sym->st_info, sym->st_shndx, sym->st_value,
> + sym->st_size, (int)ELF64_R_TYPE(rel[i].r_info));
> +
> + if (sym->st_shndx == SHN_UNDEF) {
> + pr_err("reloc: Undefined symbol: %s\n", name);
> + return -ENOEXEC;
> + }
> +
> + if (sym->st_shndx == SHN_COMMON) {
> + pr_err("reloc: symbol '%s' in common section\n", name);
> + return -ENOEXEC;
> + }
> +
> + if (sym->st_shndx == SHN_ABS) {
> + sec_base = 0;
> + } else if (sym->st_shndx < ehdr->e_shnum) {
> + sec_base = sechdrs[sym->st_shndx].sh_addr;
> + } else {
> + pr_err("reloc: Invalid section %d for symbol %s\n",
> + sym->st_shndx, name);
> + return -ENOEXEC;
> + }
> +
> + value = sym->st_value;
> + value += sec_base;
> + value += rel[i].r_addend;
> +
> + switch (ELF64_R_TYPE(rel[i].r_info)) {
> + case R_AARCH64_ABS64:
> + loc64 = location;
> + *loc64 = cpu_to_elf64(ehdr,
> + elf64_to_cpu(ehdr, *loc64) + value);
> + break;
> + case R_AARCH64_PREL32:
> + loc32 = location;
> + *loc32 = cpu_to_elf32(ehdr,
> + elf32_to_cpu(ehdr, *loc32) + value
> + - address);
> + break;
> + case R_AARCH64_LD_PREL_LO19:
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + (((value - address) << 3) & 0xffffe0));
> + break;
> + case R_AARCH64_ADR_PREL_LO21:
> + if (value & 3) {
> + pr_err("reloc: Unaligned value: %lx\n", value);
> + return -ENOEXEC;
> + }
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + (((value - address) << 3) & 0xffffe0));
> + break;
> + case R_AARCH64_ADR_PREL_PG_HI21:
> + imm = ((value & ~0xfff) - (address & ~0xfff)) >> 12;
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + ((imm & 3) << 29)
> + + ((imm & 0x1ffffc) << (5 - 2)));
> + break;
> + case R_AARCH64_ADD_ABS_LO12_NC:
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + ((value & 0xfff) << 10));
> + break;
> + case R_AARCH64_JUMP26:
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + (((value - address) >> 2) & 0x3ffffff));
> + break;
> + case R_AARCH64_CALL26:
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + (((value - address) >> 2) & 0x3ffffff));
> + break;
> + case R_AARCH64_LDST64_ABS_LO12_NC:
> + if (value & 7) {
> + pr_err("reloc: Unaligned value: %lx\n", value);
> + return -ENOEXEC;
> + }
> + loc32 = location;
> + *loc32 = cpu_to_le32(le32_to_cpu(*loc32)
> + + ((value & 0xff8) << (10 - 3)));
> + break;
> + default:
> + pr_err("reloc: Unknown relocation type: %llu\n",
> + ELF64_R_TYPE(rel[i].r_info));
> + return -ENOEXEC;
> + }
> + }
> +
> + return 0;
> +}
> diff --git a/arch/arm64/purgatory/Makefile b/arch/arm64/purgatory/Makefile
> new file mode 100644
> index 000000000000..c2127a2cbd51
> --- /dev/null
> +++ b/arch/arm64/purgatory/Makefile
> @@ -0,0 +1,24 @@
> +OBJECT_FILES_NON_STANDARD := y
> +
> +purgatory-y := entry.o
> +
> +targets += $(purgatory-y)
> +PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> +
> +LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined \
> + -nostdlib -z nodefaultlib
> +targets += purgatory.ro
> +
> +$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> + $(call if_changed,ld)
> +
> +targets += kexec_purgatory.c
> +
> +CMD_BIN2C = $(objtree)/scripts/basic/bin2c
> +quiet_cmd_bin2c = BIN2C $@
> + cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
> +
> +$(obj)/kexec_purgatory.c: $(obj)/purgatory.ro FORCE
> + $(call if_changed,bin2c)
> +
> +obj-${CONFIG_KEXEC_FILE} += kexec_purgatory.o
> diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> new file mode 100644
> index 000000000000..bc4e6b3bf8a1
> --- /dev/null
> +++ b/arch/arm64/purgatory/entry.S
> @@ -0,0 +1,28 @@
> +/*
> + * kexec core purgatory
> + */
> +#include <linux/linkage.h>
> +
> +.text
> +
> +ENTRY(purgatory_start)
> + /* Start new image. */
> + ldr x17, arm64_kernel_entry
> + ldr x0, arm64_dtb_addr
> + mov x1, xzr
> + mov x2, xzr
> + mov x3, xzr
> + br x17
> +END(purgatory_start)
> +
> +.data
> +
> +.align 3
> +
> +ENTRY(arm64_kernel_entry)
> + .quad 0
> +END(arm64_kernel_entry)
> +
> +ENTRY(arm64_dtb_addr)
> + .quad 0
> +END(arm64_dtb_addr)
> --
> 2.14.1
>
On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> using the non-NEON version.
>
> Please note that we won't be able to re-use lib/mem*.S for purgatory
> because unaligned memory accesses are not allowed in purgatory, where
> the MMU is turned off.
>
> Since purgatory is not linked with the rest of the kernel, care must be
> taken to select an appropriate set of compiler options so that no
> undefined symbol references are generated.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/crypto/sha256-core.S_shipped | 2 +
> arch/arm64/purgatory/Makefile | 21 ++++++++-
> arch/arm64/purgatory/entry.S | 13 ++++++
> arch/arm64/purgatory/purgatory.c | 20 +++++++++
> arch/arm64/purgatory/sha256-core.S | 1 +
> arch/arm64/purgatory/sha256.c | 79 +++++++++++++++++++++++++++++++++
> arch/arm64/purgatory/sha256.h | 1 +
> arch/arm64/purgatory/string.c | 32 +++++++++++++
> arch/arm64/purgatory/string.h | 5 +++
> 9 files changed, 173 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/purgatory/purgatory.c
> create mode 100644 arch/arm64/purgatory/sha256-core.S
> create mode 100644 arch/arm64/purgatory/sha256.c
> create mode 100644 arch/arm64/purgatory/sha256.h
> create mode 100644 arch/arm64/purgatory/string.c
> create mode 100644 arch/arm64/purgatory/string.h
>
> diff --git a/arch/arm64/crypto/sha256-core.S_shipped b/arch/arm64/crypto/sha256-core.S_shipped
> index 3ce82cc860bc..9ce7419c9152 100644
> --- a/arch/arm64/crypto/sha256-core.S_shipped
> +++ b/arch/arm64/crypto/sha256-core.S_shipped
> @@ -1210,6 +1210,7 @@ sha256_block_armv8:
> ret
> .size sha256_block_armv8,.-sha256_block_armv8
> #endif
> +#ifndef __PURGATORY__
> #ifdef __KERNEL__
> .globl sha256_block_neon
> #endif
> @@ -2056,6 +2057,7 @@ sha256_block_neon:
> add sp,sp,#16*4+16
> ret
> .size sha256_block_neon,.-sha256_block_neon
> +#endif
> #ifndef __KERNEL__
> .comm OPENSSL_armcap_P,4,4
> #endif
Could you please try to find another way to address this?
sha256-core.S_shipped is generated code from the accompanying Perl
script, and that script is kept in sync with upstream OpenSSL. Also,
the performance delta between this and the generic code is not /that/
spectacular, so we may simply use the generic code instead.
> diff --git a/arch/arm64/purgatory/Makefile b/arch/arm64/purgatory/Makefile
> index c2127a2cbd51..d9b38be31e0a 100644
> --- a/arch/arm64/purgatory/Makefile
> +++ b/arch/arm64/purgatory/Makefile
> @@ -1,14 +1,33 @@
> OBJECT_FILES_NON_STANDARD := y
>
> -purgatory-y := entry.o
> +purgatory-y := entry.o purgatory.o sha256.o sha256-core.o string.o
>
> targets += $(purgatory-y)
> PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
>
> +# Purgatory is expected to be ET_REL, not an executable
> LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined \
> -nostdlib -z nodefaultlib
> +
> targets += purgatory.ro
>
> +GCOV_PROFILE := n
> +KASAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +# Some kernel configurations may generate additional code containing
> +# undefined symbols, like _mcount for ftrace and __stack_chk_guard
> +# for stack-protector. Those should be removed from purgatory.
> +
> +CFLAGS_REMOVE_purgatory.o = -pg
> +CFLAGS_REMOVE_sha256.o = -pg
> +CFLAGS_REMOVE_string.o = -pg
> +
> +NO_PROTECTOR := $(call cc-option, -fno-stack-protector)
> +KBUILD_CFLAGS += $(NO_PROTECTOR)
> +
> +KBUILD_AFLAGS += -D__PURGATORY__
> +
> $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> $(call if_changed,ld)
>
> diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> index bc4e6b3bf8a1..74d028b838bd 100644
> --- a/arch/arm64/purgatory/entry.S
> +++ b/arch/arm64/purgatory/entry.S
> @@ -6,6 +6,11 @@
> .text
>
> ENTRY(purgatory_start)
> + adr x19, .Lstack
> + mov sp, x19
> +
> + bl purgatory
> +
> /* Start new image. */
> ldr x17, arm64_kernel_entry
> ldr x0, arm64_dtb_addr
> @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> br x17
> END(purgatory_start)
>
> +.ltorg
> +
> +.align 4
> + .rept 256
> + .quad 0
> + .endr
> +.Lstack:
> +
> .data
>
> .align 3
> diff --git a/arch/arm64/purgatory/purgatory.c b/arch/arm64/purgatory/purgatory.c
> new file mode 100644
> index 000000000000..7fcbefa786bc
> --- /dev/null
> +++ b/arch/arm64/purgatory/purgatory.c
> @@ -0,0 +1,20 @@
> +/*
> + * purgatory: Runs between two kernels
> + *
> + * Copyright (c) 2017 Linaro Limited
> + * Author: AKASHI Takahiro <[email protected]>
> + */
> +
> +#include "sha256.h"
> +
> +void purgatory(void)
> +{
> + int ret;
> +
> + ret = verify_sha256_digest();
> + if (ret) {
> + /* loop forever */
> + for (;;)
> + ;
> + }
> +}
> diff --git a/arch/arm64/purgatory/sha256-core.S b/arch/arm64/purgatory/sha256-core.S
> new file mode 100644
> index 000000000000..24f5ce25b61e
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256-core.S
> @@ -0,0 +1 @@
> +#include "../crypto/sha256-core.S_shipped"
> diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> new file mode 100644
> index 000000000000..5d20d81767e3
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256.c
> @@ -0,0 +1,79 @@
> +#include <linux/kexec.h>
> +#include <linux/purgatory.h>
> +#include <linux/types.h>
> +
> +/*
> + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> + */
> +#undef memcmp
> +#undef memcpy
> +#undef memset
> +
> +#include "string.h"
> +#include <crypto/hash.h>
> +#include <crypto/sha.h>
> +#include <crypto/sha256_base.h>
> +
> +u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
> +struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX]
> + __section(.kexec-purgatory);
> +
> +asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
> + unsigned int num_blks);
> +
> +static int sha256_init(struct shash_desc *desc)
> +{
> + return sha256_base_init(desc);
> +}
> +
> +static int sha256_update(struct shash_desc *desc, const u8 *data,
> + unsigned int len)
> +{
> + return sha256_base_do_update(desc, data, len,
> + (sha256_block_fn *)sha256_block_data_order);
> +}
> +
> +static int __sha256_base_finish(struct shash_desc *desc, u8 *out)
> +{
> + /* we can't do crypto_shash_digestsize(desc->tfm) */
> + unsigned int digest_size = 32;
> + struct sha256_state *sctx = shash_desc_ctx(desc);
> + __be32 *digest = (__be32 *)out;
> + int i;
> +
> + for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
> + put_unaligned_be32(sctx->state[i], digest++);
> +
> + *sctx = (struct sha256_state){};
> + return 0;
> +}
> +
> +static int sha256_final(struct shash_desc *desc, u8 *out)
> +{
> + sha256_base_do_finalize(desc,
> + (sha256_block_fn *)sha256_block_data_order);
> +
> + return __sha256_base_finish(desc, out);
> +}
> +
> +int verify_sha256_digest(void)
> +{
> + char __sha256_desc[sizeof(struct shash_desc) +
> + sizeof(struct sha256_state)] CRYPTO_MINALIGN_ATTR;
> + struct shash_desc *desc = (struct shash_desc *)__sha256_desc;
> + struct kexec_sha_region *ptr, *end;
> + u8 digest[SHA256_DIGEST_SIZE];
> +
> + sha256_init(desc);
> +
> + end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);
> + for (ptr = purgatory_sha_regions; ptr < end; ptr++)
> + sha256_update(desc, (uint8_t *)(ptr->start), ptr->len);
> +
> + sha256_final(desc, digest);
> +
> + if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
> + return 1;
> +
> + return 0;
> +}
> diff --git a/arch/arm64/purgatory/sha256.h b/arch/arm64/purgatory/sha256.h
> new file mode 100644
> index 000000000000..54dc3c33c469
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256.h
> @@ -0,0 +1 @@
> +extern int verify_sha256_digest(void);
> diff --git a/arch/arm64/purgatory/string.c b/arch/arm64/purgatory/string.c
> new file mode 100644
> index 000000000000..33233a210a65
> --- /dev/null
> +++ b/arch/arm64/purgatory/string.c
> @@ -0,0 +1,32 @@
> +#include <linux/types.h>
> +
> +void *memcpy(void *dst, const void *src, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + ((u8 *)dst)[i] = ((u8 *)src)[i];
> +
> + return NULL;
> +}
> +
> +void *memset(void *dst, int c, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + ((u8 *)dst)[i] = (u8)c;
> +
> + return NULL;
> +}
> +
> +int memcmp(const void *src, const void *dst, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + if (*(char *)src != *(char *)dst)
> + return 1;
> +
> + return 0;
> +}
> diff --git a/arch/arm64/purgatory/string.h b/arch/arm64/purgatory/string.h
> new file mode 100644
> index 000000000000..cb5f68dd84ef
> --- /dev/null
> +++ b/arch/arm64/purgatory/string.h
> @@ -0,0 +1,5 @@
> +#include <linux/types.h>
> +
> +int memcmp(const void *s1, const void *s2, size_t len);
> +void *memcpy(void *dst, const void *src, size_t len);
> +void *memset(void *dst, int c, size_t len);
> --
> 2.14.1
>
On Thu, Aug 24, 2017 at 10:18 AM, AKASHI Takahiro
<[email protected]> wrote:
> The initial user of this system call number is arm64.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
On Thu, Aug 24, 2017 at 05:18:05PM +0900, AKASHI Takahiro wrote:
> This is a basic purgatory, or a kind of glue code between the two kernels,
> for arm64. We will later add a feature for verifying a digest against the
> loaded memory segments.
>
> arch_kexec_apply_relocations_add() is responsible for re-linking any
> relative symbols in purgatory. Please note that the purgatory is not
> an executable, but a non-linked archive of binaries, so relative symbols
> contained here must be resolved at kexec load time.
> Although arm64_kernel_entry and arm64_dtb_addr are the only such global
> variables for now, arch_kexec_apply_relocations_add() can handle various
> other relocation types.
Why does the purgatory code need to be so complex?
Why is it not possible to write this as position-independent asm?
> +/*
> + * Apply purgatory relocations.
> + *
> + * ehdr: Pointer to elf headers
> + * sechdrs: Pointer to section headers.
> + * relsec: section index of SHT_RELA section.
> + *
> + * Note:
> + * Currently R_AARCH64_ABS64, R_AARCH64_LD_PREL_LO19 and R_AARCH64_CALL26
> + * are the only types to be generated from purgatory code.
Is this all that has been observed, or is this ensured somehow?
The arch_kexec_apply_relocations_add() function below duplicates a lot
of logic that already exists in the arm64 module loader's
apply_relocate_add() function.
Please reuse that code. Having a duplicate or alternative implementation
is just asking for subtle bugs.
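Roughly (a sketch only, not a drop-in: apply_relocate_add() reads
section contents via sh_addr whereas this code keeps them at sh_offset,
and it takes a struct module purely for error reporting):
	int arch_kexec_apply_relocations_add(const Elf64_Ehdr *ehdr,
					     Elf64_Shdr *sechdrs,
					     unsigned int relsec)
	{
		unsigned int symindex = sechdrs[relsec].sh_link;
		const char *strtab = (char *)
			sechdrs[sechdrs[symindex].sh_link].sh_offset;
		struct module dummy = { .name = "kexec purgatory" };
		return apply_relocate_add(sechdrs, strtab, symindex,
					  relsec, &dummy);
	}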
Thanks,
Mark.
On Thu, Aug 24, 2017 at 05:18:06PM +0900, AKASHI Takahiro wrote:
> Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> using the non-NEON version.
>
> Please note that we won't be able to re-use lib/mem*.S for purgatory
> because unaligned memory accesses are not allowed in purgatory, where
> the MMU is turned off.
>
> Since purgatory is not linked with the rest of the kernel, care must be
> taken to select an appropriate set of compiler options so that no
> undefined symbol references are generated.
What is the point in performing this check in the purgatory code, when
this will presumably have been checked when the image is loaded?
[...]
> diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> index bc4e6b3bf8a1..74d028b838bd 100644
> --- a/arch/arm64/purgatory/entry.S
> +++ b/arch/arm64/purgatory/entry.S
> @@ -6,6 +6,11 @@
> .text
>
> ENTRY(purgatory_start)
> + adr x19, .Lstack
> + mov sp, x19
> +
> + bl purgatory
> +
> /* Start new image. */
> ldr x17, arm64_kernel_entry
> ldr x0, arm64_dtb_addr
> @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> br x17
> END(purgatory_start)
>
> +.ltorg
> +
> +.align 4
> + .rept 256
> + .quad 0
> + .endr
> +.Lstack:
> +
> .data
Why is the stack in .text?
Does this need to be zeroed?
If it does, why not something like:
.fill PURGATORY_STACK_SIZE, 1, 0
>
> .align 3
> diff --git a/arch/arm64/purgatory/purgatory.c b/arch/arm64/purgatory/purgatory.c
> new file mode 100644
> index 000000000000..7fcbefa786bc
> --- /dev/null
> +++ b/arch/arm64/purgatory/purgatory.c
> @@ -0,0 +1,20 @@
> +/*
> + * purgatory: Runs between two kernels
> + *
> + * Copyright (c) 2017 Linaro Limited
> + * Author: AKASHI Takahiro <[email protected]>
> + */
> +
> +#include "sha256.h"
> +
> +void purgatory(void)
> +{
> + int ret;
> +
> + ret = verify_sha256_digest();
> + if (ret) {
> + /* loop forever */
> + for (;;)
> + ;
> + }
> +}
Surely we can do something slightly better than a busy loop? e.g.
something like the __no_granule_support loop in head.S?
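i.e. something like the tail of that loop (sketch):
1:	wfe
	wfi
	b	1b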
> diff --git a/arch/arm64/purgatory/sha256-core.S b/arch/arm64/purgatory/sha256-core.S
> new file mode 100644
> index 000000000000..24f5ce25b61e
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256-core.S
> @@ -0,0 +1 @@
> +#include "../crypto/sha256-core.S_shipped"
> diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> new file mode 100644
> index 000000000000..5d20d81767e3
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256.c
> @@ -0,0 +1,79 @@
> +#include <linux/kexec.h>
> +#include <linux/purgatory.h>
> +#include <linux/types.h>
> +
> +/*
> + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> + */
> +#undef memcmp
> +#undef memcpy
> +#undef memset
This doesn't look like the right place for this undeffery; it looks
rather fragile.
> +
> +#include "string.h"
> +#include <crypto/hash.h>
> +#include <crypto/sha.h>
> +#include <crypto/sha256_base.h>
> +
> +u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
> +struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX]
> + __section(.kexec-purgatory);
> +
> +asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
> + unsigned int num_blks);
> +
> +static int sha256_init(struct shash_desc *desc)
> +{
> + return sha256_base_init(desc);
> +}
> +
> +static int sha256_update(struct shash_desc *desc, const u8 *data,
> + unsigned int len)
> +{
> + return sha256_base_do_update(desc, data, len,
> + (sha256_block_fn *)sha256_block_data_order);
> +}
> +
> +static int __sha256_base_finish(struct shash_desc *desc, u8 *out)
> +{
> + /* we can't do crypto_shash_digestsize(desc->tfm) */
> + unsigned int digest_size = 32;
> + struct sha256_state *sctx = shash_desc_ctx(desc);
> + __be32 *digest = (__be32 *)out;
> + int i;
> +
> + for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
> + put_unaligned_be32(sctx->state[i], digest++);
> +
> + *sctx = (struct sha256_state){};
> + return 0;
> +}
> +
> +static int sha256_final(struct shash_desc *desc, u8 *out)
> +{
> + sha256_base_do_finalize(desc,
> + (sha256_block_fn *)sha256_block_data_order);
> +
> + return __sha256_base_finish(desc, out);
> +}
> +
> +int verify_sha256_digest(void)
> +{
> + char __sha256_desc[sizeof(struct shash_desc) +
> + sizeof(struct sha256_state)] CRYPTO_MINALIGN_ATTR;
> + struct shash_desc *desc = (struct shash_desc *)__sha256_desc;
> + struct kexec_sha_region *ptr, *end;
> + u8 digest[SHA256_DIGEST_SIZE];
> +
> + sha256_init(desc);
> +
> + end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);
> + for (ptr = purgatory_sha_regions; ptr < end; ptr++)
> + sha256_update(desc, (uint8_t *)(ptr->start), ptr->len);
> +
> + sha256_final(desc, digest);
> +
> + if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
> + return 1;
> +
> + return 0;
> +}
> diff --git a/arch/arm64/purgatory/sha256.h b/arch/arm64/purgatory/sha256.h
> new file mode 100644
> index 000000000000..54dc3c33c469
> --- /dev/null
> +++ b/arch/arm64/purgatory/sha256.h
> @@ -0,0 +1 @@
> +extern int verify_sha256_digest(void);
> diff --git a/arch/arm64/purgatory/string.c b/arch/arm64/purgatory/string.c
> new file mode 100644
> index 000000000000..33233a210a65
> --- /dev/null
> +++ b/arch/arm64/purgatory/string.c
> @@ -0,0 +1,32 @@
> +#include <linux/types.h>
> +
> +void *memcpy(void *dst, const void *src, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + ((u8 *)dst)[i] = ((u8 *)src)[i];
> +
> + return NULL;
> +}
> +
> +void *memset(void *dst, int c, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + ((u8 *)dst)[i] = (u8)c;
> +
> + return NULL;
> +}
> +
> +int memcmp(const void *src, const void *dst, size_t len)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++)
> + if (*(char *)src != *(char *)dst)
> + return 1;
> +
> + return 0;
> +}
How is the compiler prevented from "optimising" these into calls to
themselves?
I suspect these will need to be written in asm.
Thanks,
Mark.
On Thu, Aug 24, 2017 at 05:18:07PM +0900, AKASHI Takahiro wrote:
> load_other_segments() sets up and adds all the memory segments necessary
> other than the kernel itself, including the initrd, device-tree blob and
> purgatory. Most of the code was borrowed from kexec-tools' counterpart.
>
> In addition, arch_kexec_kernel_image_probe(), arch_kexec_kernel_image_load()
> and arch_kexec_kernel_verify_sig() are stubs for supporting multiple types
> of kernel image formats.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/kexec.h | 18 +++
> arch/arm64/kernel/machine_kexec_file.c | 255 +++++++++++++++++++++++++++++++++
> 2 files changed, 273 insertions(+)
> +int load_other_segments(struct kimage *image, unsigned long kernel_load_addr,
> + char *initrd, unsigned long initrd_len,
> + char *cmdline, unsigned long cmdline_len)
> +{
> + struct kexec_buf kbuf;
> + unsigned long initrd_load_addr = 0;
> + unsigned long purgatory_load_addr, dtb_load_addr;
> + char *dtb = NULL;
> + unsigned long dtb_len;
> + int ret = 0;
> +
> + kbuf.image = image;
> +
> + /* Load initrd */
> + if (initrd) {
> + kbuf.buffer = initrd;
> + kbuf.bufsz = initrd_len;
> + kbuf.memsz = initrd_len;
> + kbuf.buf_align = PAGE_SIZE;
> + /* within 1GB-aligned window of up to 32GB in size */
> + kbuf.buf_min = kernel_load_addr;
> + kbuf.buf_max = round_down(kernel_load_addr, SZ_1G)
> + + (unsigned long)SZ_1G * 31;
> + kbuf.top_down = 0;
> +
> + ret = kexec_add_buffer(&kbuf);
> + if (ret)
> + goto out_err;
> + initrd_load_addr = kbuf.mem;
> +
> + pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> + initrd_load_addr, initrd_len, initrd_len);
> + }
> +
> + /* Load dtb blob */
> + ret = setup_dtb(image, initrd_load_addr, initrd_len,
> + cmdline, cmdline_len, &dtb, &dtb_len);
> + if (ret) {
> + pr_err("Preparing for new dtb failed\n");
> + goto out_err;
> + }
> +
> + kbuf.buffer = dtb;
> + kbuf.bufsz = dtb_len;
> + kbuf.memsz = dtb_len;
> + /* not across 2MB boundary */
> + kbuf.buf_align = SZ_2M;
> + /*
> + * Note for backporting:
> + * On kernel prior to v4.2, fdt must reside within 512MB block
> + * where the kernel also resides. So
> + * kbuf.buf_min = round_down(kernel_load_addr, SZ_512M);
> + * kbuf.buf_max = round_up(kernel_load_addr, SZ_512M);
> + * would be required.
> + */
> + kbuf.buf_min = kernel_load_addr;
> + kbuf.buf_max = ULONG_MAX;
> + kbuf.top_down = 1;
IIUC, this is trying to load the DTB above the kernel. Is that correct?
Assuming so, shouldn't that kernel_load_addr be kernel_load_addr +
image_size from the kernel header?
Otherwise, if the kernel is loaded close to the end of memory, the DTB
could overlap.
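e.g. something like (sketch, with h being the parsed Image header):
	kbuf.buf_min = kernel_load_addr + le64_to_cpu(h->image_size);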
Thanks,
Mark.
On Thu, Aug 24, 2017 at 05:18:10PM +0900, AKASHI Takahiro wrote:
> The "Image" binary will be loaded at the offset of TEXT_OFFSET from
> the start of system memory. TEXT_OFFSET is basically determined from
> the header of the image.
What's the policy for the binary types kexec_file_load() will load, and
how are these identified? AFAICT, there are no flags, so it looks like
we're just checking the magic and hoping.
> Regarding kernel verification, it will be done through
> verify_pefile_signature() as arm64's "Image" binary can be seen as
> in PE format. This approach is consistent with x86 implementation.
This will not work for kernels built without CONFIG_EFI, where we don't
have a PE header.
What happens in that case?
[...]
> +/**
> + * arm64_header_check_msb - Helper to check the arm64 image header.
> + *
> + * Returns non-zero if the image was built as big endian.
> + */
> +
> +static inline int arm64_header_check_msb(const struct arm64_image_header *h)
> +{
> + if (!h)
> + return 0;
> +
> + return !!(h->flags[7] & arm64_image_flag_7_be);
> +}
What are we going to use this for?
In the kernel, we use the term "BE" rather than "MSB", and it's unfortunate
to have code with varying naming conventions.
[...]
> +static void *image_load(struct kimage *image, char *kernel,
> + unsigned long kernel_len, char *initrd,
> + unsigned long initrd_len, char *cmdline,
> + unsigned long cmdline_len)
> +{
> + struct kexec_buf kbuf;
> + struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> + unsigned long text_offset, kernel_load_addr;
> + int ret;
> +
> + /* Create elf core header segment */
> + ret = load_crashdump_segments(image);
> + if (ret)
> + goto out;
> +
> + /* Load the kernel */
> + kbuf.image = image;
> + if (image->type == KEXEC_TYPE_CRASH) {
> + kbuf.buf_min = crashk_res.start;
> + kbuf.buf_max = crashk_res.end + 1;
> + } else {
> + kbuf.buf_min = 0;
> + kbuf.buf_max = ULONG_MAX;
> + }
> + kbuf.top_down = 0;
> +
> + kbuf.buffer = kernel;
> + kbuf.bufsz = kernel_len;
> + if (h->image_size) {
> + kbuf.memsz = le64_to_cpu(h->image_size);
> + text_offset = le64_to_cpu(h->text_offset);
> + } else {
> + /* v3.16 or older */
> + kbuf.memsz = kbuf.bufsz; /* NOTE: not including BSS */
Why bother supporting < 3.16 kernels?
They predate regular kexec; we know we don't have enough information to
boot such kernels reliably, and arguably attempting to load one would
indicate some kind of rollback attack.
Thanks,
Mark.
On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> The first PT_LOAD segment in vmlinux, which is assumed to contain the
> "text" code, will be loaded at the offset of TEXT_OFFSET from the
> beginning of system memory. The other PT_LOAD segments are placed
> relative to the first one.
I really don't like assuming things about the vmlinux ELF file.
> Regarding kernel verification, since there is no standard way to embed
> a signature within an ELF binary, we follow PowerPC's (not yet upstreamed)
> approach, that is, appending a signature right after the kernel binary
> itself, as module signing does.
I also *really* don't like this. It's a bizarre in-band mechanism,
without explicit information. It's not a nice ABI.
If we can load an Image, why do we need to be able to load a vmlinux?
[...]
> diff --git a/arch/arm64/kernel/kexec_elf.c b/arch/arm64/kernel/kexec_elf.c
> new file mode 100644
> index 000000000000..7bd3c1e1f65a
> --- /dev/null
> +++ b/arch/arm64/kernel/kexec_elf.c
> @@ -0,0 +1,216 @@
> +/*
> + * Kexec vmlinux loader
> +
> + * Copyright (C) 2017 Linaro Limited
> + * Authors: AKASHI Takahiro <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#define pr_fmt(fmt) "kexec_file(elf): " fmt
> +
> +#include <linux/elf.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/kexec.h>
> +#include <linux/module_signature.h>
> +#include <linux/types.h>
> +#include <linux/verification.h>
> +#include <asm/byteorder.h>
> +#include <asm/kexec_file.h>
> +#include <asm/memory.h>
> +
> +static int elf64_probe(const char *buf, unsigned long len)
> +{
> + struct elfhdr ehdr;
> +
> + /* Check for magic and architecture */
> + memcpy(&ehdr, buf, sizeof(ehdr));
> + if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) ||
> + (elf16_to_cpu(&ehdr, ehdr.e_machine) != EM_AARCH64))
> + return -ENOEXEC;
> +
> + return 0;
> +}
> +
> +static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
> + struct elf_info *elf_info,
> + unsigned long *kernel_load_addr)
> +{
> + struct kexec_buf kbuf;
> + const struct elf_phdr *phdr;
> + const struct arm64_image_header *h;
> + unsigned long text_offset, rand_offset;
> + unsigned long page_offset, phys_offset;
> + int first_segment, i, ret = -ENOEXEC;
> +
> + kbuf.image = image;
> + if (image->type == KEXEC_TYPE_CRASH) {
> + kbuf.buf_min = crashk_res.start;
> + kbuf.buf_max = crashk_res.end + 1;
> + } else {
> + kbuf.buf_min = 0;
> + kbuf.buf_max = ULONG_MAX;
> + }
> + kbuf.top_down = 0;
> +
> + /* Load PT_LOAD segments. */
> + for (i = 0, first_segment = 1; i < ehdr->e_phnum; i++) {
> + phdr = &elf_info->proghdrs[i];
> + if (phdr->p_type != PT_LOAD)
> + continue;
> +
> + kbuf.buffer = (void *) elf_info->buffer + phdr->p_offset;
> + kbuf.bufsz = min(phdr->p_filesz, phdr->p_memsz);
> + kbuf.memsz = phdr->p_memsz;
> + kbuf.buf_align = phdr->p_align;
> +
> + if (first_segment) {
> + /*
> + * Identify TEXT_OFFSET:
> + * When CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET=y the image
> + * header could be offset in the elf segment. The linker
> + * script sets ehdr->e_entry to the start of text.
Please, let's not have to go delving into the vmlinux, knowing intimate
details about how it's put together.
> + *
> + * NOTE: In v3.16 or older, h->text_offset is 0,
> + * so use the default, 0x80000
> + */
> + rand_offset = ehdr->e_entry - phdr->p_vaddr;
> + h = (struct arm64_image_header *)
> + (elf_info->buffer + phdr->p_offset +
> + rand_offset);
> +
> + if (!arm64_header_check_magic(h))
> + goto out;
> +
> + if (h->image_size)
> + text_offset = le64_to_cpu(h->text_offset);
> + else
> + text_offset = 0x80000;
Surely we can share the Image header parsing with the Image parser?
The Image code has practically the exact same logic operating on the
header struct.
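e.g. a shared helper along these lines (sketch; name hypothetical):
	static u64 arm64_header_text_offset(const struct arm64_image_header *h)
	{
		/* v3.16 or older images have image_size == 0 */
		if (!h->image_size)
			return 0x80000;
		return le64_to_cpu(h->text_offset);
	}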
Thanks,
Mark.
On Thu, Aug 24, 2017 at 10:06:28AM +0100, Ard Biesheuvel wrote:
> On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> > + /* create a list */
> > + rams = vmalloc(sizeof(struct resource) * count);
> > + if (!rams)
> > + return ret;
> > +
> > + res.start = start;
> > + res.end = end;
> > + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > + orig_end = res.end;
> > + i = 0;
> > + while ((res.start < res.end) &&
> > + (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> > + if (i >= count) {
> > + /* unlikely but */
> > + vfree(rams);
> > + count += 16;
>
> If the count is likely to be < 16, why are we using vmalloc() here?
Ah, you're right :)
-Takahiro AKASHI
On Thu, Aug 24, 2017 at 05:56:17PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:05PM +0900, AKASHI Takahiro wrote:
> > This is a basic purgatory, or a kind of glue code between the two kernels,
> > for arm64. We will later add a feature for verifying a digest against the
> > loaded memory segments.
> >
> > arch_kexec_apply_relocations_add() is responsible for re-linking any
> > relative symbols in purgatory. Please note that the purgatory is not
> > an executable, but a non-linked archive of binaries, so relative symbols
> > contained here must be resolved at kexec load time.
> > Although arm64_kernel_entry and arm64_dtb_addr are the only such global
> > variables for now, arch_kexec_apply_relocations_add() can handle various
> > other relocation types.
>
> Why does the purgatory code need to be so complex?
>
> Why is it not possible to write this as position-independent asm?
I don't get your point, but please note that these values are also
re-written by the 1st kernel when it loads the 2nd kernel, and so
they must appear as globals.
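The first kernel writes them via kexec_purgatory_get_set_symbol(), as
in load_other_segments():
	kexec_purgatory_get_set_symbol(image, "arm64_kernel_entry",
			&kernel_load_addr, sizeof(kernel_load_addr), 0);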
> > +/*
> > + * Apply purgatory relocations.
> > + *
> > + * ehdr: Pointer to elf headers
> > + * sechdrs: Pointer to section headers.
> > + * relsec: section index of SHT_RELA section.
> > + *
> > + * Note:
> > + * Currently R_AARCH64_ABS64, R_AARCH64_LD_PREL_LO19 and R_AARCH64_CALL26
> > + * are the only types to be generated from purgatory code.
>
> Is this all that has been observed, or is this ensured somehow?
It was observed by inserting a debug print message in this function;
I'm not sure whether we can restrict the set to only those three types.
> The arch_kexec_apply_relocations_add() function below duplicates a lot
> of logic that already exists in the arm64 module loader's
> apply_relocate_add() function.
>
> Please reuse that code. Having a duplicate or alternative implementation
> is just asking for subtle bugs.
Okay, I'll look at it.
Thanks,
-Takahiro AKASHI
> Thanks,
> Mark.
On Thu, Aug 24, 2017 at 10:10:37AM +0100, Ard Biesheuvel wrote:
> On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> > This is a basic purgatory, or a kind of glue code between the two kernels,
> > for arm64. We will later add a feature for verifying a digest against the
> > loaded memory segments.
> >
> > arch_kexec_apply_relocations_add() is responsible for re-linking any
> > relative symbols in purgatory. Please note that the purgatory is not
> > an executable, but a non-linked archive of binaries, so relative symbols
> > contained here must be resolved at kexec load time.
>
> This sounds fragile to me. What is the reason we cannot let the linker
> deal with this, similar to, e.g., how the VDSO gets linked?
Please note this is exactly what the x86 code does.
I guess the reason is that the x86 guys borrowed the logic directly
from kexec-tools.
> Otherwise, couldn't we reuse the module loader to get these objects
> relocated in memory? I'm sure there are differences that would require
> some changes there, but implementing all of this again sounds like
> overkill to me.
I'll look at both of your suggestions.
Thanks,
-Takahiro AKASHI
On Thu, Aug 24, 2017 at 06:04:40PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:06PM +0900, AKASHI Takahiro wrote:
> > Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> > using the non-NEON version.
> >
> > Please note that we won't be able to re-use lib/mem*.S for purgatory
> > because unaligned memory accesses are not allowed in purgatory, where
> > the MMU is turned off.
> >
> > Since purgatory is not linked with the rest of the kernel, care must be
> > taken to select an appropriate set of compiler options so that no
> > undefined symbol references are generated.
>
> What is the point in performing this check in the purgatory code, when
> this will presumably have been checked when the image is loaded?
Well, this is what x86 does :)
On powerpc, meanwhile, they don't have this check.
Maybe to avoid booting a corrupted kernel after loading?
(The loaded data are now protected by being unmapped, though.)
> [...]
>
> > diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> > index bc4e6b3bf8a1..74d028b838bd 100644
> > --- a/arch/arm64/purgatory/entry.S
> > +++ b/arch/arm64/purgatory/entry.S
> > @@ -6,6 +6,11 @@
> > .text
> >
> > ENTRY(purgatory_start)
> > + adr x19, .Lstack
> > + mov sp, x19
> > +
> > + bl purgatory
> > +
> > /* Start new image. */
> > ldr x17, arm64_kernel_entry
> > ldr x0, arm64_dtb_addr
> > @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> > br x17
> > END(purgatory_start)
> >
> > +.ltorg
> > +
> > +.align 4
> > + .rept 256
> > + .quad 0
> > + .endr
> > +.Lstack:
> > +
> > .data
>
> Why is the stack in .text?
So that we can call verify_sha256_digest() from asm.
> Does this need to be zeroed?
No :)
> If it does, why not something like:
>
> .fill PURGATORY_STACK_SIZE 1, 0
>
> >
> > .align 3
> > diff --git a/arch/arm64/purgatory/purgatory.c b/arch/arm64/purgatory/purgatory.c
> > new file mode 100644
> > index 000000000000..7fcbefa786bc
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/purgatory.c
> > @@ -0,0 +1,20 @@
> > +/*
> > + * purgatory: Runs between two kernels
> > + *
> > + * Copyright (c) 2017 Linaro Limited
> > + * Author: AKASHI Takahiro <[email protected]>
> > + */
> > +
> > +#include "sha256.h"
> > +
> > +void purgatory(void)
> > +{
> > + int ret;
> > +
> > + ret = verify_sha256_digest();
> > + if (ret) {
> > + /* loop forever */
> > + for (;;)
> > + ;
> > + }
> > +}
>
> Surely we can do something slightly better than a busy loop? e.g.
> something like the __no_granule_support loop in head.s?
Okay.
> > diff --git a/arch/arm64/purgatory/sha256-core.S b/arch/arm64/purgatory/sha256-core.S
> > new file mode 100644
> > index 000000000000..24f5ce25b61e
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/sha256-core.S
> > @@ -0,0 +1 @@
> > +#include "../crypto/sha256-core.S_shipped"
> > diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> > new file mode 100644
> > index 000000000000..5d20d81767e3
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/sha256.c
> > @@ -0,0 +1,79 @@
> > +#include <linux/kexec.h>
> > +#include <linux/purgatory.h>
> > +#include <linux/types.h>
> > +
> > +/*
> > + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> > + */
> > +#undef memcmp
> > +#undef memcpy
> > +#undef memset
>
> This doesn't look like the right place for this undeffery; it looks
> rather fragile.
Yeah, I agree, but without the #undefs there, __memxxx() end up being used.
> > diff --git a/arch/arm64/purgatory/string.c b/arch/arm64/purgatory/string.c
> > new file mode 100644
> > index 000000000000..33233a210a65
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/string.c
> > @@ -0,0 +1,32 @@
> > +#include <linux/types.h>
> > +
> > +void *memcpy(void *dst, const void *src, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + ((u8 *)dst)[i] = ((u8 *)src)[i];
> > +
> > + return NULL;
> > +}
> > +
> > +void *memset(void *dst, int c, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + ((u8 *)dst)[i] = (u8)c;
> > +
> > + return NULL;
> > +}
> > +
> > +int memcmp(const void *src, const void *dst, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + if (*(char *)src != *(char *)dst)
> > + return 1;
> > +
> > + return 0;
> > +}
>
> How is the compiler prevented from "optimising" these into calls to
> themselves?
I don't get what you mean by "calls to themselves."
Thanks,
-Takahiro AKASHI
> I suspect these will need to be written in asm.
>
> Thanks,
> Mark.
On Thu, Aug 24, 2017 at 10:13:49AM +0100, Ard Biesheuvel wrote:
> On 24 August 2017 at 09:18, AKASHI Takahiro <[email protected]> wrote:
> > Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> > using the non-NEON version.
> >
> > Please note that we won't be able to re-use lib/mem*.S for purgatory
> > because unaligned memory access is not allowed in purgatory, where the
> > MMU is turned off.
> >
> > Since purgatory is not linked with the rest of the kernel, care must be
> > taken to select an appropriate set of compiler options in order to
> > prevent undefined symbol references from being generated.
> >
> > Signed-off-by: AKASHI Takahiro <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: Ard Biesheuvel <[email protected]>
> > ---
> > arch/arm64/crypto/sha256-core.S_shipped | 2 +
> > arch/arm64/purgatory/Makefile | 21 ++++++++-
> > arch/arm64/purgatory/entry.S | 13 ++++++
> > arch/arm64/purgatory/purgatory.c | 20 +++++++++
> > arch/arm64/purgatory/sha256-core.S | 1 +
> > arch/arm64/purgatory/sha256.c | 79 +++++++++++++++++++++++++++++++++
> > arch/arm64/purgatory/sha256.h | 1 +
> > arch/arm64/purgatory/string.c | 32 +++++++++++++
> > arch/arm64/purgatory/string.h | 5 +++
> > 9 files changed, 173 insertions(+), 1 deletion(-)
> > create mode 100644 arch/arm64/purgatory/purgatory.c
> > create mode 100644 arch/arm64/purgatory/sha256-core.S
> > create mode 100644 arch/arm64/purgatory/sha256.c
> > create mode 100644 arch/arm64/purgatory/sha256.h
> > create mode 100644 arch/arm64/purgatory/string.c
> > create mode 100644 arch/arm64/purgatory/string.h
> >
> > diff --git a/arch/arm64/crypto/sha256-core.S_shipped b/arch/arm64/crypto/sha256-core.S_shipped
> > index 3ce82cc860bc..9ce7419c9152 100644
> > --- a/arch/arm64/crypto/sha256-core.S_shipped
> > +++ b/arch/arm64/crypto/sha256-core.S_shipped
> > @@ -1210,6 +1210,7 @@ sha256_block_armv8:
> > ret
> > .size sha256_block_armv8,.-sha256_block_armv8
> > #endif
> > +#ifndef __PURGATORY__
> > #ifdef __KERNEL__
> > .globl sha256_block_neon
> > #endif
> > @@ -2056,6 +2057,7 @@ sha256_block_neon:
> > add sp,sp,#16*4+16
> > ret
> > .size sha256_block_neon,.-sha256_block_neon
> > +#endif
> > #ifndef __KERNEL__
> > .comm OPENSSL_armcap_P,4,4
> > #endif
>
> Could you please try to find another way to address this?
> sha256-core.S_shipped is generated code from the accompanying Perl
> script, and that script is kept in sync with upstream OpenSSL. Also,
> the performance delta relative to the generic code is not /that/
> spectacular, so we may simply use that instead.
I see.
Do you mean that the "generic" code is the C implementation?
Thanks,
-Takahiro AKASHI
>
> > diff --git a/arch/arm64/purgatory/Makefile b/arch/arm64/purgatory/Makefile
> > index c2127a2cbd51..d9b38be31e0a 100644
> > --- a/arch/arm64/purgatory/Makefile
> > +++ b/arch/arm64/purgatory/Makefile
> > @@ -1,14 +1,33 @@
> > OBJECT_FILES_NON_STANDARD := y
> >
> > -purgatory-y := entry.o
> > +purgatory-y := entry.o purgatory.o sha256.o sha256-core.o string.o
> >
> > targets += $(purgatory-y)
> > PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> >
> > +# Purgatory is expected to be ET_REL, not an executable
> > LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined \
> > -nostdlib -z nodefaultlib
> > +
> > targets += purgatory.ro
> >
> > +GCOV_PROFILE := n
> > +KASAN_SANITIZE := n
> > +KCOV_INSTRUMENT := n
> > +
> > +# Some kernel configurations may generate additional code containing
> > +# undefined symbols, like _mcount for ftrace and __stack_chk_guard
> > +# for stack-protector. Those should be removed from purgatory.
> > +
> > +CFLAGS_REMOVE_purgatory.o = -pg
> > +CFLAGS_REMOVE_sha256.o = -pg
> > +CFLAGS_REMOVE_string.o = -pg
> > +
> > +NO_PROTECTOR := $(call cc-option, -fno-stack-protector)
> > +KBUILD_CFLAGS += $(NO_PROTECTOR)
> > +
> > +KBUILD_AFLAGS += -D__PURGATORY__
> > +
> > $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> > $(call if_changed,ld)
> >
> > diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> > index bc4e6b3bf8a1..74d028b838bd 100644
> > --- a/arch/arm64/purgatory/entry.S
> > +++ b/arch/arm64/purgatory/entry.S
> > @@ -6,6 +6,11 @@
> > .text
> >
> > ENTRY(purgatory_start)
> > + adr x19, .Lstack
> > + mov sp, x19
> > +
> > + bl purgatory
> > +
> > /* Start new image. */
> > ldr x17, arm64_kernel_entry
> > ldr x0, arm64_dtb_addr
> > @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> > br x17
> > END(purgatory_start)
> >
> > +.ltorg
> > +
> > +.align 4
> > + .rept 256
> > + .quad 0
> > + .endr
> > +.Lstack:
> > +
> > .data
> >
> > .align 3
> > diff --git a/arch/arm64/purgatory/purgatory.c b/arch/arm64/purgatory/purgatory.c
> > new file mode 100644
> > index 000000000000..7fcbefa786bc
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/purgatory.c
> > @@ -0,0 +1,20 @@
> > +/*
> > + * purgatory: Runs between two kernels
> > + *
> > + * Copyright (c) 2017 Linaro Limited
> > + * Author: AKASHI Takahiro <[email protected]>
> > + */
> > +
> > +#include "sha256.h"
> > +
> > +void purgatory(void)
> > +{
> > + int ret;
> > +
> > + ret = verify_sha256_digest();
> > + if (ret) {
> > + /* loop forever */
> > + for (;;)
> > + ;
> > + }
> > +}
> > diff --git a/arch/arm64/purgatory/sha256-core.S b/arch/arm64/purgatory/sha256-core.S
> > new file mode 100644
> > index 000000000000..24f5ce25b61e
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/sha256-core.S
> > @@ -0,0 +1 @@
> > +#include "../crypto/sha256-core.S_shipped"
> > diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> > new file mode 100644
> > index 000000000000..5d20d81767e3
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/sha256.c
> > @@ -0,0 +1,79 @@
> > +#include <linux/kexec.h>
> > +#include <linux/purgatory.h>
> > +#include <linux/types.h>
> > +
> > +/*
> > + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> > + */
> > +#undef memcmp
> > +#undef memcpy
> > +#undef memset
> > +
> > +#include "string.h"
> > +#include <crypto/hash.h>
> > +#include <crypto/sha.h>
> > +#include <crypto/sha256_base.h>
> > +
> > +u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
> > +struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX]
> > + __section(.kexec-purgatory);
> > +
> > +asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
> > + unsigned int num_blks);
> > +
> > +static int sha256_init(struct shash_desc *desc)
> > +{
> > + return sha256_base_init(desc);
> > +}
> > +
> > +static int sha256_update(struct shash_desc *desc, const u8 *data,
> > + unsigned int len)
> > +{
> > + return sha256_base_do_update(desc, data, len,
> > + (sha256_block_fn *)sha256_block_data_order);
> > +}
> > +
> > +static int __sha256_base_finish(struct shash_desc *desc, u8 *out)
> > +{
> > + /* we can't do crypto_shash_digestsize(desc->tfm) */
> > + unsigned int digest_size = 32;
> > + struct sha256_state *sctx = shash_desc_ctx(desc);
> > + __be32 *digest = (__be32 *)out;
> > + int i;
> > +
> > + for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
> > + put_unaligned_be32(sctx->state[i], digest++);
> > +
> > + *sctx = (struct sha256_state){};
> > + return 0;
> > +}
> > +
> > +static int sha256_final(struct shash_desc *desc, u8 *out)
> > +{
> > + sha256_base_do_finalize(desc,
> > + (sha256_block_fn *)sha256_block_data_order);
> > +
> > + return __sha256_base_finish(desc, out);
> > +}
> > +
> > +int verify_sha256_digest(void)
> > +{
> > + char __sha256_desc[sizeof(struct shash_desc) +
> > + sizeof(struct sha256_state)] CRYPTO_MINALIGN_ATTR;
> > + struct shash_desc *desc = (struct shash_desc *)__sha256_desc;
> > + struct kexec_sha_region *ptr, *end;
> > + u8 digest[SHA256_DIGEST_SIZE];
> > +
> > + sha256_init(desc);
> > +
> > + end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);
> > + for (ptr = purgatory_sha_regions; ptr < end; ptr++)
> > + sha256_update(desc, (uint8_t *)(ptr->start), ptr->len);
> > +
> > + sha256_final(desc, digest);
> > +
> > + if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
> > + return 1;
> > +
> > + return 0;
> > +}
> > diff --git a/arch/arm64/purgatory/sha256.h b/arch/arm64/purgatory/sha256.h
> > new file mode 100644
> > index 000000000000..54dc3c33c469
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/sha256.h
> > @@ -0,0 +1 @@
> > +extern int verify_sha256_digest(void);
> > diff --git a/arch/arm64/purgatory/string.c b/arch/arm64/purgatory/string.c
> > new file mode 100644
> > index 000000000000..33233a210a65
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/string.c
> > @@ -0,0 +1,32 @@
> > +#include <linux/types.h>
> > +
> > +void *memcpy(void *dst, const void *src, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + ((u8 *)dst)[i] = ((u8 *)src)[i];
> > +
> > + return NULL;
> > +}
> > +
> > +void *memset(void *dst, int c, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + ((u8 *)dst)[i] = (u8)c;
> > +
> > + return NULL;
> > +}
> > +
> > +int memcmp(const void *src, const void *dst, size_t len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < len; i++)
> > + if (*(char *)src != *(char *)dst)
> > + return 1;
> > +
> > + return 0;
> > +}
> > diff --git a/arch/arm64/purgatory/string.h b/arch/arm64/purgatory/string.h
> > new file mode 100644
> > index 000000000000..cb5f68dd84ef
> > --- /dev/null
> > +++ b/arch/arm64/purgatory/string.h
> > @@ -0,0 +1,5 @@
> > +#include <linux/types.h>
> > +
> > +int memcmp(const void *s1, const void *s2, size_t len);
> > +void *memcpy(void *dst, const void *src, size_t len);
> > +void *memset(void *dst, int c, size_t len);
> > --
> > 2.14.1
> >
On Thu, Aug 24, 2017 at 06:11:31PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:07PM +0900, AKASHI Takahiro wrote:
> > load_other_segments() sets up and adds all the memory segments necessary
> > other than the kernel, including initrd, device-tree blob and purgatory.
> > Most of the code was borrowed from kexec-tools' counterpart.
> >
> > In addition, arch_kexec_image_probe(), arch_kexec_image_load() and
> > arch_kexec_kernel_verify_sig() are stubs for supporting multiple types
> > of kernel image formats.
> >
> > Signed-off-by: AKASHI Takahiro <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > ---
> > arch/arm64/include/asm/kexec.h | 18 +++
> > arch/arm64/kernel/machine_kexec_file.c | 255 +++++++++++++++++++++++++++++++++
> > 2 files changed, 273 insertions(+)
>
> > +int load_other_segments(struct kimage *image, unsigned long kernel_load_addr,
> > + char *initrd, unsigned long initrd_len,
> > + char *cmdline, unsigned long cmdline_len)
> > +{
> > + struct kexec_buf kbuf;
> > + unsigned long initrd_load_addr = 0;
> > + unsigned long purgatory_load_addr, dtb_load_addr;
> > + char *dtb = NULL;
> > + unsigned long dtb_len;
> > + int ret = 0;
> > +
> > + kbuf.image = image;
> > +
> > + /* Load initrd */
> > + if (initrd) {
> > + kbuf.buffer = initrd;
> > + kbuf.bufsz = initrd_len;
> > + kbuf.memsz = initrd_len;
> > + kbuf.buf_align = PAGE_SIZE;
> > + /* within 1GB-aligned window of up to 32GB in size */
> > + kbuf.buf_min = kernel_load_addr;
> > + kbuf.buf_max = round_down(kernel_load_addr, SZ_1G)
> > + + (unsigned long)SZ_1G * 31;
> > + kbuf.top_down = 0;
> > +
> > + ret = kexec_add_buffer(&kbuf);
> > + if (ret)
> > + goto out_err;
> > + initrd_load_addr = kbuf.mem;
> > +
> > + pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > + initrd_load_addr, initrd_len, initrd_len);
> > + }
> > +
> > + /* Load dtb blob */
> > + ret = setup_dtb(image, initrd_load_addr, initrd_len,
> > + cmdline, cmdline_len, &dtb, &dtb_len);
> > + if (ret) {
> > + pr_err("Preparing for new dtb failed\n");
> > + goto out_err;
> > + }
> > +
> > + kbuf.buffer = dtb;
> > + kbuf.bufsz = dtb_len;
> > + kbuf.memsz = dtb_len;
> > + /* not across 2MB boundary */
> > + kbuf.buf_align = SZ_2M;
> > + /*
> > + * Note for backporting:
> > + * On kernel prior to v4.2, fdt must reside within 512MB block
> > + * where the kernel also resides. So
> > + * kbuf.buf_min = round_down(kernel_load_addr, SZ_512M);
> > + * kbuf.buf_max = round_up(kernel_load_addr, SZ_512M);
> > + * would be required.
> > + */
> > + kbuf.buf_min = kernel_load_addr;
> > + kbuf.buf_max = ULONG_MAX;
> > + kbuf.top_down = 1;
>
> IIUC, this is trying to load the DTB above the kernel. Is that correct?
Yes.
> Assuming so, shouldn't that kernel_load_addr be kernel_load_addr +
> image_size from the kernel header?
Okay, it would be much safer.
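(As a minimal sketch of that adjustment, assuming the parsed image header,
h, is made available at this point as it already is in the Image loader:)

	/* start the DTB window above the whole kernel image, not just its
	 * base address, so the allocation cannot overlap it */
	kbuf.buf_min = kernel_load_addr + le64_to_cpu(h->image_size);
	kbuf.buf_max = ULONG_MAX;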
> Otherwise, if the kernel is loaded close to the end of memory, the DTB
> could overlap.
Right, but we allocate the kernel "bottom up" (top_down=0),
so such a corruption is very unlikely. If it happens, it means
that system memory is just too small.
Thanks,
-Takahiro AKASHI
> Thanks,
> Mark.
On Thu, Aug 24, 2017 at 06:23:37PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:10PM +0900, AKASHI Takahiro wrote:
> > The "Image" binary will be loaded at the offset of TEXT_OFFSET from
> > the start of system memory. TEXT_OFFSET is basically determined from
> > the header of the image.
>
> What's the policy for the binary types kexec_file_load() will load, and
> how are these identified? AFAICT, there are no flags, so it looks like
> we're just checking the magic and hoping.
Yes, please see image_probe().
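(For reference, a sketch of what such a magic-only probe looks like; the
header layout follows Documentation/arm64/booting.txt, and the h->magic
field name is assumed from the arm64_image_header struct in this series:)

static int image_probe(const char *kernel_buf, unsigned long kernel_len)
{
	const struct arm64_image_header *h =
			(const struct arm64_image_header *)kernel_buf;

	if (!h || kernel_len < sizeof(*h))
		return -EINVAL;

	/* "ARM\x64" at offset 56; with no flags in the syscall ABI, the
	 * format really is identified by magic alone */
	if (memcmp(h->magic, "ARM\x64", 4))
		return -EINVAL;

	return 0;
}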
> > Regarding kernel verification, it will be done through
> > verify_pefile_signature() as arm64's "Image" binary can be seen as
> > in PE format. This approach is consistent with x86 implementation.
>
> This will not work for kernels built without CONFIG_EFI, where we don't
> have a PE header.
Right.
> What happens in that case?
In this case, we cannot find a signature in the binary when loading,
so kexec just fails.
A signature is a must if the kernel is configured with KEXEC_FILE_VERIFY.
Thanks,
-Takahiro AKASHI
> [...]
>
> > +/**
> > + * arm64_header_check_msb - Helper to check the arm64 image header.
> > + *
> > + * Returns non-zero if the image was built as big endian.
> > + */
> > +
> > +static inline int arm64_header_check_msb(const struct arm64_image_header *h)
> > +{
> > + if (!h)
> > + return 0;
> > +
> > + return !!(h->flags[7] & arm64_image_flag_7_be);
> > +}
>
> What are we going to use this for?
Nowhere. I forgot to remove it.
> In kernel, we use the term "BE" rather than "MSB", and it's unfortunate
> to have code with varying naming conventions.
>
> [...]
>
> > +static void *image_load(struct kimage *image, char *kernel,
> > + unsigned long kernel_len, char *initrd,
> > + unsigned long initrd_len, char *cmdline,
> > + unsigned long cmdline_len)
> > +{
> > + struct kexec_buf kbuf;
> > + struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> > + unsigned long text_offset, kernel_load_addr;
> > + int ret;
> > +
> > + /* Create elf core header segment */
> > + ret = load_crashdump_segments(image);
> > + if (ret)
> > + goto out;
> > +
> > + /* Load the kernel */
> > + kbuf.image = image;
> > + if (image->type == KEXEC_TYPE_CRASH) {
> > + kbuf.buf_min = crashk_res.start;
> > + kbuf.buf_max = crashk_res.end + 1;
> > + } else {
> > + kbuf.buf_min = 0;
> > + kbuf.buf_max = ULONG_MAX;
> > + }
> > + kbuf.top_down = 0;
> > +
> > + kbuf.buffer = kernel;
> > + kbuf.bufsz = kernel_len;
> > + if (h->image_size) {
> > + kbuf.memsz = le64_to_cpu(h->image_size);
> > + text_offset = le64_to_cpu(h->text_offset);
> > + } else {
> > + /* v3.16 or older */
> > + kbuf.memsz = kbuf.bufsz; /* NOTE: not including BSS */
>
> Why bother supporting < 3.16 kernels?
Because kexec-tools does :)
> They predate regular kexec, we know we don't have enough information to
> boot such kernels reliably, and arguably attempting to load one would
> indicate some kind of rollback attack.
Around the time when Geoff was originally working on kexec,
there was some possibility that people might want to boot a somewhat
older kernel, I guess.
Thanks,
-Takahiro AKASHI
> Thanks,
> Mark.
On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
> > will be loaded at the offset of TEXT_OFFSET from the beginning of system
> > memory. The other PT_LOAD segments are placed relative to the first one.
>
> I really don't like assuming things about the vmlinux ELF file.
If so, vmlinux is not an appropriate format for loading.
> > Regarding kernel verification, since there is no standard way to contain
> > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
> > approach, that is, appending a signature right after the kernel binary
> > itself like module signing.
>
> I also *really* don't like this. It's a bizarre in-band mechanism,
> without explicit information. It's not a nice ABI.
>
> If we can load an Image, why do we need to be able to load a vmlinux?
Well, kexec-tools does. I don't know why Geoff wanted to support vmlinux.
I'm just trying to support what kexec-tools does support.
> [...]
>
> > diff --git a/arch/arm64/kernel/kexec_elf.c b/arch/arm64/kernel/kexec_elf.c
> > new file mode 100644
> > index 000000000000..7bd3c1e1f65a
> > --- /dev/null
> > +++ b/arch/arm64/kernel/kexec_elf.c
> > @@ -0,0 +1,216 @@
> > +/*
> > + * Kexec vmlinux loader
> > +
> > + * Copyright (C) 2017 Linaro Limited
> > + * Authors: AKASHI Takahiro <[email protected]>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#define pr_fmt(fmt) "kexec_file(elf): " fmt
> > +
> > +#include <linux/elf.h>
> > +#include <linux/err.h>
> > +#include <linux/errno.h>
> > +#include <linux/kernel.h>
> > +#include <linux/kexec.h>
> > +#include <linux/module_signature.h>
> > +#include <linux/types.h>
> > +#include <linux/verification.h>
> > +#include <asm/byteorder.h>
> > +#include <asm/kexec_file.h>
> > +#include <asm/memory.h>
> > +
> > +static int elf64_probe(const char *buf, unsigned long len)
> > +{
> > + struct elfhdr ehdr;
> > +
> > + /* Check for magic and architecture */
> > + memcpy(&ehdr, buf, sizeof(ehdr));
> > + if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) ||
> > + (elf16_to_cpu(&ehdr, ehdr.e_machine) != EM_AARCH64))
> > + return -ENOEXEC;
> > +
> > + return 0;
> > +}
> > +
> > +static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
> > + struct elf_info *elf_info,
> > + unsigned long *kernel_load_addr)
> > +{
> > + struct kexec_buf kbuf;
> > + const struct elf_phdr *phdr;
> > + const struct arm64_image_header *h;
> > + unsigned long text_offset, rand_offset;
> > + unsigned long page_offset, phys_offset;
> > + int first_segment, i, ret = -ENOEXEC;
> > +
> > + kbuf.image = image;
> > + if (image->type == KEXEC_TYPE_CRASH) {
> > + kbuf.buf_min = crashk_res.start;
> > + kbuf.buf_max = crashk_res.end + 1;
> > + } else {
> > + kbuf.buf_min = 0;
> > + kbuf.buf_max = ULONG_MAX;
> > + }
> > + kbuf.top_down = 0;
> > +
> > + /* Load PT_LOAD segments. */
> > + for (i = 0, first_segment = 1; i < ehdr->e_phnum; i++) {
> > + phdr = &elf_info->proghdrs[i];
> > + if (phdr->p_type != PT_LOAD)
> > + continue;
> > +
> > + kbuf.buffer = (void *) elf_info->buffer + phdr->p_offset;
> > + kbuf.bufsz = min(phdr->p_filesz, phdr->p_memsz);
> > + kbuf.memsz = phdr->p_memsz;
> > + kbuf.buf_align = phdr->p_align;
> > +
> > + if (first_segment) {
> > + /*
> > + * Identify TEXT_OFFSET:
> > + * When CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET=y the image
> > + * header could be offset in the elf segment. The linker
> > + * script sets ehdr->e_entry to the start of text.
>
> Please, let's not have to go delving into the vmlinux, knowing intimate
> details about how it's put together.
If we don't need to take care of RANDOMIZE_TEXT_OFFSET, the code would
be much simpler and look similar to the Image code.
>
> > + *
> > + * NOTE: In v3.16 or older, h->text_offset is 0,
> > + * so use the default, 0x80000
> > + */
> > + rand_offset = ehdr->e_entry - phdr->p_vaddr;
> > + h = (struct arm64_image_header *)
> > + (elf_info->buffer + phdr->p_offset +
> > + rand_offset);
> > +
> > + if (!arm64_header_check_magic(h))
> > + goto out;
> > +
> > + if (h->image_size)
> > + text_offset = le64_to_cpu(h->text_offset);
> > + else
> > + text_offset = 0x80000;
>
> Surely we can share the Image header parsing with the Image parser?
>
> The Image code had practically the exact same logic operating on the
> header struct.
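(One possible shape for such a shared helper; the function name and the
image_size convention below are invented for illustration:)

static int arm64_header_get_layout(const struct arm64_image_header *h,
				   unsigned long *text_offset,
				   unsigned long *image_size)
{
	if (!arm64_header_check_magic(h))
		return -EINVAL;

	if (h->image_size) {
		*text_offset = le64_to_cpu(h->text_offset);
		*image_size = le64_to_cpu(h->image_size);
	} else {
		/* v3.16 or older: the fields are zero, use the default */
		*text_offset = 0x80000;
		*image_size = 0;	/* unknown; caller uses file size */
	}

	return 0;
}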
Thanks,
-Takahiro AKASHI
> Thanks,
> Mark.
On 08/24/17 at 05:18pm, AKASHI Takahiro wrote:
> prepare_elf_headers() can also be useful for other architectures,
> including arm64. So let it be factored out.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Baoquan He <[email protected]>
> ---
> arch/x86/kernel/crash.c | 324 ----------------------------------------------
> include/linux/kexec.h | 19 +++
> kernel/crash_core.c | 333 ++++++++++++++++++++++++++++++++++++++++++++++++
It looks better to add these to kexec_file.c instead.
Thanks
Dave
On 08/25/17 at 11:03am, AKASHI Takahiro wrote:
> On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
> > On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> > > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
> > > will be loaded at the offset of TEXT_OFFSET from the beginning of system
> > > memory. The other PT_LOAD segments are placed relative to the first one.
> >
> > I really don't like assuming things about the vmlinux ELF file.
>
> If so, vmlinux is not an appropriate format for loading.
>
> > > Regarding kernel verification, since there is no standard way to contain
> > > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
> > > approach, that is, appending a signature right after the kernel binary
> > > itself like module signing.
> >
> > I also *really* don't like this. It's a bizarre in-band mechanism,
> > without explicit information. It's not a nice ABI.
> >
> > If we can load an Image, why do we need to be able to load a vmlinux?
>
> Well, kexec-tools does. I don't know why Geoff wanted to support vmlinux.
> I'm just trying to support what kexec-tools does support.
We only add things when they are really necessary; kexec-tools'
functionality exists for historical reasons.
If it is only a matter of doing what kexec-tools has done, I would say
just not to do it.
Thanks
Dave
On Fri, Aug 25, 2017 at 10:00:59AM +0900, AKASHI Takahiro wrote:
> On Thu, Aug 24, 2017 at 05:56:17PM +0100, Mark Rutland wrote:
> > On Thu, Aug 24, 2017 at 05:18:05PM +0900, AKASHI Takahiro wrote:
> > > This is a basic purgatory, or a kind of glue code between the two kernels,
> > > for arm64. We will later add a feature of verifying a digest against the
> > > loaded memory segments.
> > >
> > > arch_kexec_apply_relocations_add() is responsible for re-linking any
> > > relative symbols in purgatory. Please note that the purgatory is not
> > an executable, but a non-linked archive of binaries, so relative symbols
> > contained here must be resolved at kexec load time.
> > Although arm64_kernel_start and arm64_dtb_addr are the only such global
> > variables now, arch_kexec_apply_relocations_add() can handle various
> > other types of relocations.
> >
> > Why does the purgatory code need to be so complex?
> >
> > Why is it not possible to write this as position-independent asm?
>
> I don't get your point, but please note that these values are also
> re-written by the 1st kernel when it loads the 2nd kernel and so
> they must appear as globals.
My fear about complexity is that we must "re-link" the purgatory.
I don't understand why that has to be necessary. Surely we can have the
purgatory code be position independent, and store those globals in a
single struct purgatory_info that we can fill in from the host?
i.e. similar to what we do for values shared with the VDSO, where we
just poke vdso_data->field, no re-linking required.
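(A sketch of that idea; every name below is illustrative, not from the
series: a single exported data block that the loading kernel pokes, so
purgatory itself needs no load-time relocation:)

/* filled in by kexec_file_load(), the way the kernel pokes vdso_data */
struct arm64_purgatory_data {
	unsigned long kernel_entry;	/* physical entry of next kernel */
	unsigned long dtb_addr;		/* physical address of the DTB */
};

struct arm64_purgatory_data purgatory_data
		__attribute__((section(".purgatory.data")));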
Otherwise, why can't the purgatory code be written in assembly? AFAICT,
the only complex part is the hashing code, which I don't believe is
strictly necessary.
[...]
> > > +/*
> > > + * Apply purgatory relocations.
> > > + *
> > > + * ehdr: Pointer to elf headers
> > > + * sechdrs: Pointer to section headers.
> > > + * relsec: section index of SHT_RELA section.
> > > + *
> > > + * Note:
> > > + * Currently R_AARCH64_ABS64, R_AARCH64_LD_PREL_LO19 and R_AARCH64_CALL26
> > > + * are the only types to be generated from purgatory code.
> >
> > Is this all that has been observed, or is this ensured somehow?
>
> It was observed by inserting a debug print message in this function;
> I'm not sure whether we can restrict it to only those three types.
If we have to perform linking, I don't think we can assume the above is
sufficient.
> > The arch_kexec_apply_relocations_add() function below duplicates a lot
> > of logic that already exists in the arm64 module loader's
> > apply_relocate_add() function.
> >
> > Please reuse that code. Having a duplicate or alternative implementation
> > is just asking for subtle bugs.
>
> Okay, I'll look at it.
Ok.
As above, I think it would be preferable that we avoid linking entirely.
Thanks,
Mark.
On Fri, Aug 25, 2017 at 10:21:06AM +0900, AKASHI Takahiro wrote:
> On Thu, Aug 24, 2017 at 06:04:40PM +0100, Mark Rutland wrote:
> > On Thu, Aug 24, 2017 at 05:18:06PM +0900, AKASHI Takahiro wrote:
> > > Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> > > using the non-NEON version.
> > >
> > > Please note that we won't be able to re-use lib/mem*.S for purgatory
> > > because unaligned memory access is not allowed in purgatory, where the
> > > MMU is turned off.
> > >
> > > Since purgatory is not linked with the rest of the kernel, care must be
> > > taken to select an appropriate set of compiler options in order to
> > > prevent undefined symbol references from being generated.
> >
> > What is the point in performing this check in the purgatory code, when
> > this will presumably have been checked when the image is loaded?
>
> Well, this is what x86 does :)
> On powerpc, meanwhile, they don't have this check.
>
> Maybe to avoid booting a corrupted kernel after loading?
> (The loaded data are now protected by being unmapped, though.)
I'd really prefer to avoid this, since it seems to be what necessitates
all the complexity for executing C code (linking and all), and it's
going to be very slow to execute with the MMU off.
If you can deliberately corrupt the next kernel, you could also have
corrupted the purgatory to skip the check.
Unless we have a strong reason to want the hash check, I think it should
be dropped.
> > > diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> > > index bc4e6b3bf8a1..74d028b838bd 100644
> > > --- a/arch/arm64/purgatory/entry.S
> > > +++ b/arch/arm64/purgatory/entry.S
> > > @@ -6,6 +6,11 @@
> > > .text
> > >
> > > ENTRY(purgatory_start)
> > > + adr x19, .Lstack
> > > + mov sp, x19
> > > +
> > > + bl purgatory
> > > +
> > > /* Start new image. */
> > > ldr x17, arm64_kernel_entry
> > > ldr x0, arm64_dtb_addr
> > > @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> > > br x17
> > > END(purgatory_start)
> > >
> > > +.ltorg
> > > +
> > > +.align 4
> > > + .rept 256
> > > + .quad 0
> > > + .endr
> > > +.Lstack:
> > > +
> > > .data
> >
> > Why is the stack in .text?
>
> to call verify_sha256_digest() from asm
Won't that also work if the stack is in .data? or .bss?
... or is there a particular need for it to be in .text?
> > Does this need to be zeroed?
>
> No :)
Ok, so we can probably do:
.data
.align 4
. += PURGATORY_STACK_SIZE
.Lstack_ptr:
... assuming we need to run C code.
[...]
> > > diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> > > new file mode 100644
> > > index 000000000000..5d20d81767e3
> > > --- /dev/null
> > > +++ b/arch/arm64/purgatory/sha256.c
> > > @@ -0,0 +1,79 @@
> > > +#include <linux/kexec.h>
> > > +#include <linux/purgatory.h>
> > > +#include <linux/types.h>
> > > +
> > > +/*
> > > + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> > > + */
> > > +#undef memcmp
> > > +#undef memcpy
> > > +#undef memset
> >
> > This doesn't look like the right place for this undeffery; it looks
> > rather fragile.
>
> Yeah, I agree, but without the #undefs there, __memxxx() get used.
Ok, but we'll have to add this to every C file used in the purgatory
code, or at the start of any header that uses a memxxx() function, or it
might still be overridden to use __memxxx(), before the undef takes
effect.
Can we define __memxxx() instead?
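(A sketch of that alternative, in purgatory's string.c itself; alias is a
standard GCC attribute and must live in the file defining the targets:)

#include <linux/types.h>

void *memcpy(void *dst, const void *src, size_t len);
void *memset(void *dst, int c, size_t len);
int memcmp(const void *s1, const void *s2, size_t len);

/* provide the names the KASAN headers redirect callers to, so no #undef
 * is needed in each purgatory C file */
void *__memcpy(void *dst, const void *src, size_t len)
		__attribute__((alias("memcpy")));
void *__memset(void *dst, int c, size_t len)
		__attribute__((alias("memset")));
int __memcmp(const void *s1, const void *s2, size_t len)
		__attribute__((alias("memcmp")));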
[...]
> > > +void *memcpy(void *dst, const void *src, size_t len)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + ((u8 *)dst)[i] = ((u8 *)src)[i];
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +void *memset(void *dst, int c, size_t len)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + ((u8 *)dst)[i] = (u8)c;
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +int memcmp(const void *src, const void *dst, size_t len)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + if (*(char *)src != *(char *)dst)
> > > + return 1;
> > > +
> > > + return 0;
> > > +}
> >
> > How is the compiler prevented from "optimising" these into calls to
> > themselves?
>
> I don't get what you mean by "calls to themselves."
There are compiler optimizations that recognise sequences like:
for (i = 0; i < len; i++)
dst[i] = src[i];
... and turn those into:
memcpy(dst, src, len);
... these have been known to "optimize" memcpy implementations into
calls to themselves. Likewise for other string operations.
One way we avoid that today is by writing our memcpy in assembly.
Do we have a guarantee that this will not happen here? e.g. do we pass
some compiler flag that prevents this?
Thanks,
Mark.
Mark Rutland <[email protected]> writes:
> On Fri, Aug 25, 2017 at 10:00:59AM +0900, AKASHI Takahiro wrote:
>> On Thu, Aug 24, 2017 at 05:56:17PM +0100, Mark Rutland wrote:
>> > On Thu, Aug 24, 2017 at 05:18:05PM +0900, AKASHI Takahiro wrote:
>> > > This is a basic purgatory, or a kind of glue code between the two kernels,
>> > > for arm64. We will later add a feature of verifying a digest against the
>> > > loaded memory segments.
>> > >
>> > > arch_kexec_apply_relocations_add() is responsible for re-linking any
>> > > relative symbols in purgatory. Please note that the purgatory is not
>> > > an executable, but a non-linked archive of binaries, so relative symbols
>> > > contained here must be resolved at kexec load time.
>> > > Although arm64_kernel_start and arm64_dtb_addr are the only such global
>> > > variables now, arch_kexec_apply_relocations_add() can handle various
>> > > other types of relocations.
>> >
>> > Why does the purgatory code need to be so complex?
>> >
>> > Why is it not possible to write this as position-independent asm?
>>
>> I don't get your point, but please note that these values are also
>> re-written by the 1st kernel when it loads the 2nd kernel and so
>> they must appear as globals.
>
> My fear about complexity is that we must "re-link" the purgatory.
>
> I don't understand why that has to be necessary. Surely we can have the
> purgatory code be position independent, and store those globals in a
> single struct purgatory_info that we can fill in from the host?
>
> i.e. similar to what we do for values shared with the VDSO, where we
> just poke vdso_data->field, no re-linking required.
Right. I'm not sure why it is a partially linked object. I believe that
the purgatory could be linked at build time into a PIE executable with
exported symbols for the variables that need to be filled in from the
host.
On some architectures (e.g., powerpc), this would greatly reduce the
number of relocation types that the kernel needs to know how to process.
On x86 it makes less of a difference because the partially linked object
already has just a handful of relocation types.
> Otherwise, why can't the purgatory code be written in assembly? AFAICT,
> the only complex part is the hashing code, which I don't believe is
> strictly necessary.
When I posted a similar series for powerpc with similar changes to
handle a partially linked purgatory in the kernel, Michael Ellerman
preferred to go for a purgatory written in assembly, partially based on
the one from kexec-lite. That purgatory doesn't do the checksum
verification of the segments.
--
Thiago Jung Bauermann
IBM Linux Technology Center
Mark Rutland <[email protected]> writes:
> On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
>> On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
>> > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
>> > will be loaded at the offset of TEXT_OFFSET from the beginning of system
>> > memory. The other PT_LOAD segments are placed relative to the first one.
>>
>> I really don't like assuming things about the vmlinux ELF file.
>>
>> > Regarding kernel verification, since there is no standard way to contain
>> > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
>> > approach, that is, appending a signature right after the kernel binary
>> > itself like module signing.
>>
>> I also *really* don't like this. It's a bizarre in-band mechanism,
>> without explicit information. It's not a nice ABI.
>>
>> If we can load an Image, why do we need to be able to load a vmlinux?
>
> So IIUC, the whole point of this is to be able to kexec_file_load() a
> vmlinux + signature bundle, for !CONFIG_EFI kernels.
>
> For that, I think that we actually need a new kexec_file_load${N}
> syscall, where we can pass the signature for the kernel as a separate
> file. Ideally also with a flags argument and perhaps the ability to sign
> the initrd too.
>
> That way we don't have to come up with a magic vmlinux+signature format,
> as we can just pass a regular image and a signature for that image
> separately. That should work for PPC and others, too.
powerpc uses the same format that is used for signed kernel modules,
which is a signature appended at the end of the file. It doesn't need to
be passed separately since it's embedded in the file itself.
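(For reference, a sketch of that appended-signature layout; the struct
mirrors the module-signing code that the MODSIGN patch in this series
exports, and the trailer sits at the very end of the signed file:)

#include <linux/types.h>

struct module_signature {
	u8	algo;		/* public-key crypto algorithm [0] */
	u8	hash;		/* digest algorithm [0] */
	u8	id_type;	/* key identifier type [PKEY_ID_PKCS7] */
	u8	signer_len;	/* length of signer's name [0] */
	u8	key_id_len;	/* length of key identifier [0] */
	u8	__pad[3];
	__be32	sig_len;	/* length of the signature data */
};

/* file = payload || signature || struct module_signature
 *	  || "~Module signature appended~\n" */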
The kernel already has a mechanism to verify signatures that aren't
embedded in the file: it's possible to use IMA via the LSM hook in
kernel_read_file_from_fd (which is called in
kimage_file_prepare_segments) to verify a signature stored in an
extended attribute by using an IMA policy rule such as:
appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig
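(Simplified, the call path being described is the fd-based read in
kimage_file_prepare_segments(), where the LSM hook can appraise the
xattr signature before the buffer is used; a sketch, not the exact code:)

	ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
				       &size, INT_MAX, READING_KEXEC_IMAGE);
	if (ret)
		return ret;	/* -EACCES when IMA appraisal fails */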
Of course, that only works if the kernel image is stored in a filesystem
which supports extended attributes. But that is the case of most
filesystems nowadays, with the notable exception of FAT-based
filesystems.
evmctl, the IMA userspace tool, also supports signatures stored in a
separate file ("sidecar" signatures), but the kernel can only
verify them if they are copied into an xattr (which I believe the
userspace tool can do).
--
Thiago Jung Bauermann
IBM Linux Technology Center
On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
> > will be loaded at the offset of TEXT_OFFSET from the beginning of system
> > memory. The other PT_LOAD segments are placed relative to the first one.
>
> I really don't like assuming things about the vmlinux ELF file.
>
> > Regarding kernel verification, since there is no standard way to contain
> > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
> > approach, that is, appending a signature right after the kernel binary
> > itself like module signing.
>
> I also *really* don't like this. It's a bizarre in-band mechanism,
> without explicit information. It's not a nice ABI.
>
> If we can load an Image, why do we need to be able to load a vmlinux?
So IIUC, the whole point of this is to be able to kexec_file_load() a
vmlinux + signature bundle, for !CONFIG_EFI kernels.
For that, I think that we actually need a new kexec_file_load${N}
syscall, where we can pass the signature for the kernel as a separate
file. Ideally also with a flags argument and perhaps the ability to sign
the initrd too.
That way we don't have to come up with a magic vmlinux+signature format,
as we can just pass a regular image and a signature for that image
separately. That should work for PPC and others, too.
Thanks,
Mark.
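(For concreteness, a purely hypothetical prototype of such a syscall;
every name, parameter and flag here is invented, not part of any ABI:)

/* kexec_file_load variant taking a detached signature as its own fd */
long kexec_file_load2(int kernel_fd, int initrd_fd, int kernel_sig_fd,
		      unsigned long cmdline_len, const char __user *cmdline,
		      unsigned long flags);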
Mark Rutland <[email protected]> writes:
> On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
>> On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
>> > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
>> > will be loaded at the offset of TEXT_OFFSET from the beginning of system
>> > memory. The other PT_LOAD segments are placed relative to the first one.
>>
>> I really don't like assuming things about the vmlinux ELF file.
>>
>> > Regarding kernel verification, since there is no standard way to contain
>> > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
>> > approach, that is, appending a signature right after the kernel binary
>> > itself like module signing.
>>
>> I also *really* don't like this. It's a bizarre in-band mechanism,
>> without explicit information. It's not a nice ABI.
>>
>> If we can load an Image, why do we need to be able to load a vmlinux?
>
> So IIUC, the whole point of this is to be able to kexec_file_load() a
> vmlinux + signature bundle, for !CONFIG_EFI kernels.
>
> For that, I think that we actually need a new kexec_file_load${N}
> syscall, where we can pass the signature for the kernel as a separate
> file. Ideally also with a flags argument and perhaps the ability to sign
> the initrd too.
>
> That way we don't have to come up with a magic vmlinux+signature format,
You don't have to come up with one, it already exists. We've been using
it for signed modules for ~5 years.
It also has the advantages of being a signature of the entire ELF, no
silly games about which sections are included, and it's attached to the
vmlinux so you don't have to remember to copy it around. And the code to
produce it and verify it already exists.
cheers
On Thursday 24 August 2017 01:48 PM, AKASHI Takahiro wrote:
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reverse order, i.e., from higher to lower addresses.
>
> It will be used in kexec_file implementation on arm64.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> ---
> include/linux/ioport.h | 3 +++
> kernel/resource.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 51 insertions(+)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 6230064d7f95..9a212266299f 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -271,6 +271,9 @@ extern int
> walk_system_ram_res(u64 start, u64 end, void *arg,
> int (*func)(u64, u64, void *));
> extern int
> +walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(u64, u64, void *));
> +extern int
> walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
> void *arg, int (*func)(u64, u64, void *));
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 9b5f04404152..1d6d734c75ac 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -23,6 +23,7 @@
> #include <linux/pfn.h>
> #include <linux/mm.h>
> #include <linux/resource_ext.h>
> +#include <linux/vmalloc.h>
> #include <asm/io.h>
>
>
> @@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> return ret;
> }
>
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(u64, u64, void *))
> +{
> + struct resource res, *rams;
> + u64 orig_end;
> + int count, i;
> + int ret = -1;
> +
> + count = 16; /* initial */
> +again:
> + /* create a list */
> + rams = vmalloc(sizeof(struct resource) * count);
> + if (!rams)
> + return ret;
> +
> + res.start = start;
> + res.end = end;
> + res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + orig_end = res.end;
> + i = 0;
> + while ((res.start < res.end) &&
> + (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> + if (i >= count) {
> + /* unlikely but */
> + vfree(rams);
> + count += 16;
> + goto again;
Wouldn't it be better to realloc a bigger space, copy the previous values,
and free the previous pointer, instead of going *again*?
> + }
> +
> + rams[i].start = res.start;
> + rams[i++].end = res.end;
> +
> + res.start = res.end + 1;
> + res.end = orig_end;
> + }
> +
> + /* go reverse */
> + for (i--; i >= 0; i--) {
> + ret = (*func)(rams[i].start, rams[i].end, arg);
> + if (ret)
> + break;
> + }
> +
> + vfree(rams);
> + return ret;
> +}
> +
> #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
>
> /*
>
--
Regards
Pratyush
On Fri, Aug 25, 2017 at 01:47:49PM +0800, Dave Young wrote:
> On 08/24/17 at 05:18pm, AKASHI Takahiro wrote:
> > prepare_elf_headers() can also be useful for other architectures,
> > including arm64. So let it be factored out.
> >
> > Signed-off-by: AKASHI Takahiro <[email protected]>
> > Cc: Dave Young <[email protected]>
> > Cc: Vivek Goyal <[email protected]>
> > Cc: Baoquan He <[email protected]>
> > ---
> > arch/x86/kernel/crash.c | 324 ----------------------------------------------
> > include/linux/kexec.h | 19 +++
> > kernel/crash_core.c | 333 ++++++++++++++++++++++++++++++++++++++++++++++++
>
> It looks better to add these to kexec_file.c instead.
Sure
-Takahiro AKASHI
> Thanks
> Dave
On Thu, Aug 31, 2017 at 08:04:51AM +0530, Pratyush Anand wrote:
>
>
> On Thursday 24 August 2017 01:48 PM, AKASHI Takahiro wrote:
> >This function, being a variant of walk_system_ram_res() introduced in
> >commit 8c86e70acead ("resource: provide new functions to walk through
> >resources"), walks through a list of all the resources of System RAM
> >in reverse order, i.e., from higher to lower addresses.
> >
> >It will be used in kexec_file implementation on arm64.
> >
> >Signed-off-by: AKASHI Takahiro <[email protected]>
> >Cc: Vivek Goyal <[email protected]>
> >Cc: Andrew Morton <[email protected]>
> >Cc: Linus Torvalds <[email protected]>
> >---
> > include/linux/ioport.h | 3 +++
> > kernel/resource.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 51 insertions(+)
> >
> >diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> >index 6230064d7f95..9a212266299f 100644
> >--- a/include/linux/ioport.h
> >+++ b/include/linux/ioport.h
> >@@ -271,6 +271,9 @@ extern int
> > walk_system_ram_res(u64 start, u64 end, void *arg,
> > int (*func)(u64, u64, void *));
> > extern int
> >+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> >+ int (*func)(u64, u64, void *));
> >+extern int
> > walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
> > void *arg, int (*func)(u64, u64, void *));
> >diff --git a/kernel/resource.c b/kernel/resource.c
> >index 9b5f04404152..1d6d734c75ac 100644
> >--- a/kernel/resource.c
> >+++ b/kernel/resource.c
> >@@ -23,6 +23,7 @@
> > #include <linux/pfn.h>
> > #include <linux/mm.h>
> > #include <linux/resource_ext.h>
> >+#include <linux/vmalloc.h>
> > #include <asm/io.h>
> >@@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
> > return ret;
> > }
> >+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> >+ int (*func)(u64, u64, void *))
> >+{
> >+ struct resource res, *rams;
> >+ u64 orig_end;
> >+ int count, i;
> >+ int ret = -1;
> >+
> >+ count = 16; /* initial */
> >+again:
> >+ /* create a list */
> >+ rams = vmalloc(sizeof(struct resource) * count);
> >+ if (!rams)
> >+ return ret;
> >+
> >+ res.start = start;
> >+ res.end = end;
> >+ res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >+ orig_end = res.end;
> >+ i = 0;
> >+ while ((res.start < res.end) &&
> >+ (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> >+ if (i >= count) {
> >+ /* unlikely but */
> >+ vfree(rams);
> >+ count += 16;
> >+ goto again;
>
> Wouldn't it be better to realloc a bigger space, copy the previous values,
> and free the previous pointer, instead of going *again*?
Okay, I will do that.
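(A sketch of that, growing the snapshot array in place; vmalloc has no
realloc, so this copies and frees by hand:)

		if (i >= count) {
			struct resource *new;
			int new_count = count + 16;

			new = vmalloc(sizeof(struct resource) * new_count);
			if (!new) {
				vfree(rams);
				return ret;
			}
			/* keep the entries already collected */
			memcpy(new, rams, sizeof(struct resource) * count);
			vfree(rams);
			rams = new;
			count = new_count;
		}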
Thanks,
-Takahiro AKASHI
> >+ }
> >+
> >+ rams[i].start = res.start;
> >+ rams[i++].end = res.end;
> >+
> >+ res.start = res.end + 1;
> >+ res.end = orig_end;
> >+ }
> >+
> >+ /* go reverse */
> >+ for (i--; i >= 0; i--) {
> >+ ret = (*func)(rams[i].start, rams[i].end, arg);
> >+ if (ret)
> >+ break;
> >+ }
> >+
> >+ vfree(rams);
> >+ return ret;
> >+}
> >+
> > #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
> > /*
> >
>
> --
> Regards
> Pratyush
On Fri, Aug 25, 2017 at 01:16:06PM -0300, Thiago Jung Bauermann wrote:
>
> Mark Rutland <[email protected]> writes:
>
> > On Fri, Aug 25, 2017 at 10:00:59AM +0900, AKASHI Takahiro wrote:
> >> On Thu, Aug 24, 2017 at 05:56:17PM +0100, Mark Rutland wrote:
> >> > On Thu, Aug 24, 2017 at 05:18:05PM +0900, AKASHI Takahiro wrote:
> >> > > This is a basic purgatory, or a kind of glue code between the two kernels,
> >> > > for arm64. We will later add a feature of verifying a digest against the
> >> > > loaded memory segments.
> >> > >
> >> > > arch_kexec_apply_relocations_add() is responsible for re-linking any
> >> > > relative symbols in purgatory. Please note that the purgatory is not
> >> > > an executable, but a non-linked archive of binaries, so relative symbols
> >> > > contained here must be resolved at kexec load time.
> >> > > Although arm64_kernel_start and arm64_dtb_addr are the only such global
> >> > > variables now, arch_kexec_apply_relocations_add() can handle various
> >> > > other types of relocations.
> >> >
> >> > Why does the purgatory code need to be so complex?
> >> >
> >> > Why is it not possible to write this as position-independent asm?
> >>
> >> I don't get your point, but please note that these values are also
> >> re-written by the 1st kernel when it loads the 2nd kernel and so
> >> they must appear as globals.
> >
> > My fear about complexity is that we must "re-link" the purgatory.
> >
> > I don't understand why that has to be necessary. Surely we can have the
> > purgatory code be position independent, and store those globals in a
> > single struct purgatory_info that we can fill in from the host?
> >
> > i.e. similar to what we do for values shared with the VDSO, where we
> > just poke vdso_data->field, no re-linking required.
>
> Right. I'm not sure why it is a partially linked object. I believe that
> the purgatory could be linked at build time into a PIE executable with
> exported symbols for the variables that need to be filled in from the
> host.
For clarification, the generic kexec code expects the purgatory to be
*relocatable* (ET_REL, not executable in ELF terms), as compiled with
gcc's -r option.
On arm64, in this case, all the *global* symbols remain unresolved even
if the references are local within a single section (in a file).
This would require re-linking at purgatory load time.
I'm going to resolve this issue by adding extra *local labels*.
(See my v2.)
> On some architectures (e.g., powerpc), this would greatly reduce the
> number of relocation types that the kernel needs to know how to process.
> On x86 it makes less of a difference because the partially linked object
> already has just a handful of relocation types.
>
> > Otherwise, why can't the purgatory code be written in assembly? AFAICT,
> > the only complex part is the hashing code, which I don't believe is
> > strictly necessary.
>
> When I posted a similar series for powerpc with similar changes to
> handle a partially linked purgatory in the kernel, Michael Ellerman
> preferred to go for a purgatory written in assembly, partially based on
> the one from kexec-lite. That purgatory doesn't do the checksum
> verification of the segments.
Anyhow, I will drop the hash-check code from the purgatory in v2 so that
it becomes quite simple asm.
Thanks,
-Takahiro AKASHI
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
>
On Fri, Aug 25, 2017 at 11:41:33AM +0100, Mark Rutland wrote:
> On Fri, Aug 25, 2017 at 10:21:06AM +0900, AKASHI Takahiro wrote:
> > On Thu, Aug 24, 2017 at 06:04:40PM +0100, Mark Rutland wrote:
> > > On Thu, Aug 24, 2017 at 05:18:06PM +0900, AKASHI Takahiro wrote:
> > > > Most of the sha256 code is based on crypto/sha256-glue.c, particularly
> > > > using the non-NEON version.
> > > >
> > > > Please note that we won't be able to re-use lib/mem*.S for purgatory
> > > > because unaligned memory access is not allowed in purgatory, where the
> > > > MMU is turned off.
> > > >
> > > > Since purgatory is not linked with the rest of the kernel, care must be
> > > > taken to select an appropriate set of compiler options in order to
> > > > prevent undefined symbol references from being generated.
> > >
> > > What is the point in performing this check in the purgatory code, when
> > > this will presumably have been checked when the image is loaded?
> >
> > Well, this is what x86 does :)
> > On powerpc, meanwhile, they don't have this check.
> >
> > Maybe to avoid booting a corrupted kernel after loading?
> > (The loaded data are now protected by being unmapped, though.)
>
> I'd really prefer to avoid this, since it seems to be what necessitates
> all the complexity for executing C code (linking and all), and it's
> going to be very slow to execute with the MMU off.
>
> If you can deliberately corrupt the next kernel, you could also have
> corrupted the purgatory to skip the check.
>
> Unless we have a strong reason to want the hash check, I think it should
> be dropped.
As I said, I will drop the code in v2 :)
> > > > diff --git a/arch/arm64/purgatory/entry.S b/arch/arm64/purgatory/entry.S
> > > > index bc4e6b3bf8a1..74d028b838bd 100644
> > > > --- a/arch/arm64/purgatory/entry.S
> > > > +++ b/arch/arm64/purgatory/entry.S
> > > > @@ -6,6 +6,11 @@
> > > > .text
> > > >
> > > > ENTRY(purgatory_start)
> > > > + adr x19, .Lstack
> > > > + mov sp, x19
> > > > +
> > > > + bl purgatory
> > > > +
> > > > /* Start new image. */
> > > > ldr x17, arm64_kernel_entry
> > > > ldr x0, arm64_dtb_addr
> > > > @@ -15,6 +20,14 @@ ENTRY(purgatory_start)
> > > > br x17
> > > > END(purgatory_start)
> > > >
> > > > +.ltorg
> > > > +
> > > > +.align 4
> > > > + .rept 256
> > > > + .quad 0
> > > > + .endr
> > > > +.Lstack:
> > > > +
> > > > .data
> > >
> > > Why is the stack in .text?
> >
> > to call verify_sha256_digest() from asm
>
> Won't that also work if the stack is in .data? or .bss?
>
> ... or is there a particular need for it to be in .text?
>
> > > Does this need to be zeroed?
> >
> > No :)
>
> Ok, so we can probably do:
>
> .data
> .align 4
> . += PURGATORY_STACK_SIZE
> .Lstack_ptr:
>
> ... assuming we need to run C code.
>
> [...]
>
> > > > diff --git a/arch/arm64/purgatory/sha256.c b/arch/arm64/purgatory/sha256.c
> > > > new file mode 100644
> > > > index 000000000000..5d20d81767e3
> > > > --- /dev/null
> > > > +++ b/arch/arm64/purgatory/sha256.c
> > > > @@ -0,0 +1,79 @@
> > > > +#include <linux/kexec.h>
> > > > +#include <linux/purgatory.h>
> > > > +#include <linux/types.h>
> > > > +
> > > > +/*
> > > > + * Under KASAN, those are defined as un-instrumented version, __memxxx()
> > > > + */
> > > > +#undef memcmp
> > > > +#undef memcpy
> > > > +#undef memset
> > >
> > > This doesn't look like the right place for this undeffery; it looks
> > > rather fragile.
> >
> > Yeah, I agree, but without the #undefs there, __memxxx() get used.
>
> Ok, but we'll have to add this to every C file used in the purgatory
> code, or at the start of any header that uses a memxxx() function, or it
> might still be overridden to use __memxxx(), before the undef takes
> effect.
>
> Can we define __memxxx() instead?
>
> [...]
>
> > > > +void *memcpy(void *dst, const void *src, size_t len)
> > > > +{
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < len; i++)
> > > > + ((u8 *)dst)[i] = ((u8 *)src)[i];
> > > > +
> > > > + return NULL;
> > > > +}
> > > > +
> > > > +void *memset(void *dst, int c, size_t len)
> > > > +{
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < len; i++)
> > > > + ((u8 *)dst)[i] = (u8)c;
> > > > +
> > > > + return NULL;
> > > > +}
> > > > +
> > > > +int memcmp(const void *src, const void *dst, size_t len)
> > > > +{
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < len; i++)
> > > > + if (*(char *)src != *(char *)dst)
> > > > + return 1;
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > How is the compiler prevented from "optimising" these into calls to
> > > themselves?
> >
> > I don't get what you mean by "calls to themselves."
>
> There are compiler optimizations that recognise sequences like:
>
> for (i = 0; i < len; i++)
> dst[i] = src[i];
>
> ... and turn those into:
>
> memcpy(dst, src, len);
>
> ... these have been known to "optimize" memcpy implementations into
> calls to themselves. Likewise for other string operations.
>
> One way we avoid that today is by writing our memcpy in assembly.
I see, thanks.
> Do we have a guarantee that this will not happen here? e.g. do we pass
> some compiler flag that prevents this?
I don't know of any option to do this.
(Maybe -nostdlib?)
-Takahiro AKASHI
> Thanks,
> Mark.
On Fri, Aug 25, 2017 at 02:13:53PM +0800, Dave Young wrote:
> On 08/25/17 at 11:03am, AKASHI Takahiro wrote:
> > On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
> > > On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> > > > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
> > > > will be loaded at the offset of TEXT_OFFSET from the beginning of system
> > > > memory. The other PT_LOAD segments are placed relative to the first one.
> > >
> > > I really don't like assuming things about the vmlinux ELF file.
> >
> > If so, vmlinux is not an appropriate format for loading.
> >
> > > > Regarding kernel verification, since there is no standard way to contain
> > > > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
> > > > approach, that is, appending a signature right after the kernel binary
> > > > itself like module signing.
> > >
> > > I also *really* don't like this. It's a bizarre in-band mechanism,
> > > without explicit information. It's not a nice ABI.
> > >
> > > If we can load an Image, why do we need to be able to load a vmlinux?
> >
> > Well, kexec-tools does. I don't know why Geoff wanted to support vmlinux.
> > I'm just trying to support what kexec-tools does support.
>
> We only add things when it is really necessary, kexec-tools
> functionalities should have some historic reasons.
Geoff had been working on kexec since old kernels (3.14 or 15?).
> If only for doing kexec-tools has done I would say just not to do it.
Sure
-Takahiro AKASHI
> Thanks
> Dave
On Tue, Aug 29, 2017 at 11:01:12AM +0100, Mark Rutland wrote:
> On Thu, Aug 24, 2017 at 06:30:50PM +0100, Mark Rutland wrote:
> > On Thu, Aug 24, 2017 at 05:18:11PM +0900, AKASHI Takahiro wrote:
> > > The first PT_LOAD segment in vmlinux, which is assumed to be "text" code,
> > > will be loaded at the offset of TEXT_OFFSET from the beginning of system
> > > memory. The other PT_LOAD segments are placed relative to the first one.
> >
> > I really don't like assuming things about the vmlinux ELF file.
> >
> > > Regarding kernel verification, since there is no standard way to contain
> > > a signature within elf binary, we follow PowerPC's (not yet upstreamed)
> > > approach, that is, appending a signature right after the kernel binary
> > > itself like module signing.
> >
> > I also *really* don't like this. It's a bizarre in-band mechanism,
> > without explicit information. It's not a nice ABI.
> >
> > If we can load an Image, why do we need to be able to load a vmlinux?
>
> So IIUC, the whole point of this is to be able to kexec_file_load() a
> vmlinux + signature bundle, for !CONFIG_EFI kernels.
>
> For that, I think that we actually need a new kexec_file_load${N}
> syscall, where we can pass the signature for the kernel as a separate
> file. Ideally also with a flags argument and perhaps the ability to sign
> the initrd too.
Verifying the root file system would be another topic in general.
> That way we don't have to come up with a magic vmlinux+signature format,
> as we can just pass a regular image and a signature for that image
> separately. That should work for PPC and others, too.
Since some discussions are to be expected around vmlinux signing,
I will drop vmlinux support in v2.
(This means, as you mentioned, that we have no way to sign
a !CONFIG_EFI kernel for now. A possible future solution would be
to utilize file extended attributes, as proposed by the powerpc folks?)
Thanks,
-Takahiro AKASHI
> Thanks,
> Mark.
AKASHI Takahiro <[email protected]> writes:
> On Fri, Aug 25, 2017 at 11:41:33AM +0100, Mark Rutland wrote:
>> On Fri, Aug 25, 2017 at 10:21:06AM +0900, AKASHI Takahiro wrote:
>> > On Thu, Aug 24, 2017 at 06:04:40PM +0100, Mark Rutland wrote:
>> > > On Thu, Aug 24, 2017 at 05:18:06PM +0900, AKASHI Takahiro wrote:
>> > > > +void *memcpy(void *dst, const void *src, size_t len)
>> > > > +{
>> > > > + int i;
>> > > > +
>> > > > + for (i = 0; i < len; i++)
>> > > > + ((u8 *)dst)[i] = ((u8 *)src)[i];
>> > > > +
>> > > > + return NULL;
>> > > > +}
>> > > > +
>> > > > +void *memset(void *dst, int c, size_t len)
>> > > > +{
>> > > > + int i;
>> > > > +
>> > > > + for (i = 0; i < len; i++)
>> > > > + ((u8 *)dst)[i] = (u8)c;
>> > > > +
>> > > > + return NULL;
>> > > > +}
>> > > > +
>> > > > +int memcmp(const void *src, const void *dst, size_t len)
>> > > > +{
>> > > > + int i;
>> > > > +
>> > > > + for (i = 0; i < len; i++)
>> > > > + if (*(char *)src != *(char *)dst)
>> > > > + return 1;
>> > > > +
>> > > > + return 0;
>> > > > +}
>> > >
>> > > How is the compiler prevented from "optimising" these into calls to
>> > > themselves?
>> >
>> > I don't get what you mean by "calls to themselves."
>>
>> There are compiler optimizations that recognise sequences like:
>>
>> for (i = 0; i < len; i++)
>> dst[i] = src[i];
>>
>> ... and turn those into:
>>
>> memcpy(dst, src, len);
>>
>> ... these have been known to "optimize" memcpy implementations into
>> calls to themselves. Likewise for other string operations.
>>
>> One way we avoid that today is by writing our memcpy in assembly.
>
> I see, thanks.
>
>> Do we have a guarantee that this will not happen here? e.g. do we pass
>> some compiler flag that prevents this?
>
> I don't know of any option to do this.
> (Maybe -nostdlib?)
kexec-tools calls gcc with -fno-builtin -ffreestanding (though according
to the man page, the former is implied in the latter), which tells the
compiler that the standard library may not exist. I don't know
specifically that these options turn off the memcpy optimization, but it
seems logical that they do.
--
Thiago Jung Bauermann
IBM Linux Technology Center
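(To close the loop on this sub-thread, a hedged sketch of string routines
that resist the transformation. The empty asm barrier and GCC's
-fno-tree-loop-distribute-patterns flag, which glibc uses for its own
string code, are suggestions beyond what the thread itself settled on.
Note also that the posted patch returns NULL from memcpy()/memset() and
never advances the pointers in memcmp(); both are fixed below:)

#include <stddef.h>

void *memcpy(void *dst, const void *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		((unsigned char *)dst)[i] = ((const unsigned char *)src)[i];
		/* keep GCC from pattern-matching the loop into memcpy() */
		__asm__ volatile("" : : : "memory");
	}

	return dst;	/* C requires the destination to be returned */
}

int memcmp(const void *s1, const void *s2, size_t len)
{
	const unsigned char *a = s1, *b = s2;
	size_t i;

	for (i = 0; i < len; i++)	/* walk both buffers */
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;

	return 0;
}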