Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp4251733ybb; Tue, 7 Apr 2020 03:55:26 -0700 (PDT) X-Google-Smtp-Source: APiQypIJTs4/PkPfztUqddeI5Mh7OU4TvU0Ik1f/1gq2bm7jgojDcvkNSKCgJLCwPngLI/OIDUmx X-Received: by 2002:a9d:6d09:: with SMTP id o9mr1068666otp.34.1586256926632; Tue, 07 Apr 2020 03:55:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586256926; cv=none; d=google.com; s=arc-20160816; b=K5m1zP+FjouVfvyIobf7zUA9F8o/RSfqlZkMJTNBBbz5UZhARfSmikrgq7qc4Qj3uN KvP+zNPhWSshfuDvhEj66PJ+M0x/mBcKEtlESDB085kS7KNL3RZs2HCh3xriZ/hPQEWT ZC5vd7SmAppOYba/1JnQIsQP9+9YqSitD9xtOhnic1sfs7TYi59Ft6dUWa2OVSsO/bWb rrcwZoxEcSWdSUNijN1vL/5CqcCVG1X+1U3djTdpwlaQO9NJ0wTZBHjIufe5kHgBIAo2 OJdSVLPBqz/f2ZKZp4enhUM18ltc6zRaVE1Zw40BZgJ2cdWCY1t55+ttoLof16C/f+B/ 4Lqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=RN6iwjaWxIunMmL8QdfQpUmYP7DLox7aE1L1I4XVcLY=; b=N9zkj0M2aJ4UQVbX4YR6GIiWZmI2t+KXyZ0UXG7xKMcIFvcFwFe4wV3x7NlEKRx0H1 wD29JO1e3G+SQpIzAJmiPxetKE1eeyPrnwwaKDAGKV7rHJa2L39NNOfzJY87K6ZxoKrj 9ZMUQxynuPNRqFlXshDGWQKYQFifFovXhEsB8C0raV0wOsod1fwm98hv+xAjtwE9BFG8 V1oNCYZU8oPRXIxv3WiUYMEp150z+N4naV13ljvkauLkt7EuiNg4ft9APMvO7qKj2Ibu mlkal/+mtJPaS6dMPms6WKohFNSVEztNdv3JUwEj0seQznO0otbkB/RlJdl9PQ3fPSHD gxRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sifive.com header.s=google header.b=Lz30Oln4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v9si483673oib.201.2020.04.07.03.55.13; Tue, 07 Apr 2020 03:55:26 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@sifive.com header.s=google header.b=Lz30Oln4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728176AbgDGKyG (ORCPT + 99 others); Tue, 7 Apr 2020 06:54:06 -0400 Received: from mail-ot1-f68.google.com ([209.85.210.68]:45234 "EHLO mail-ot1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726562AbgDGKyG (ORCPT ); Tue, 7 Apr 2020 06:54:06 -0400 Received: by mail-ot1-f68.google.com with SMTP id c9so1230722otl.12 for ; Tue, 07 Apr 2020 03:54:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=RN6iwjaWxIunMmL8QdfQpUmYP7DLox7aE1L1I4XVcLY=; b=Lz30Oln4zPOuLnoJL4TgWloTJSht5CAs/hC9RRbuhEwaUBAQY4oW92ZfUjV+OrrwkO 3BMIKuI2Cs7gSOPc4Va0o2u3+Z35U9rKLFehkOTpVOx+PMCkQGGXAh5xVQKlTenO5KQ9 mDcZ4H7hiL5kpmYJa58LpzMZN76qHeP/SATPjYzjI3cQ/9SAYle/+8+lJgRHHE1N25yr khgOXvzv/Zj6qXNGcrJCB3qiGq/KcQ/ULZg+2WDk0j9oxEYv8kJSpcxC+w4LOdeiKcuP QbRz/OxMibA6nD1gr7OI1dnHs6+kx04BZaj/cuXIQkdqeVqk3gnqSoOe6FTdnmGDWtcr Z9iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=RN6iwjaWxIunMmL8QdfQpUmYP7DLox7aE1L1I4XVcLY=; b=nx5WsV7xP7yQVBFgmfzzhScZ79pOf1i/8cYlbhxZlu0lL/8quHShnfWs2YPlEFjkAP WUrtb+E5ZoCvl+Cfv6MTmL256eNNaMGuFsEtKZntS5x3ID9pft4tvN90eUKV/G4OyZ5h QpE2xx2afay6wg7nb2bIuE7NI1k+0Sq7wcv7zkV5GwkjJDo4iglr1dkGPhgAvmd2CYqy JWOod2vk4pQ9PI6Z/2SxLSXfUSbsh/CkNAwzk87ZGhbH8h9t4GvPoze+yTkWWi/j+QjX TWP06d8oV7o9K6TEnwHOaSXiTgdAqW+T5qkE6zMyGFY2+WHGJ4IzpikITOnQ+d2hAQGV ZxAQ== X-Gm-Message-State: AGi0PuZtXPZnwOKmmLMWEZW+9jwUcHbX7l0CA9wc4ZAnPFx+Qd9gXIXe snFqYXTHpPNo3qApG18H9SmUrFPmwkSsz5fIMCT5bQ== X-Received: by 2002:a4a:c28e:: with SMTP id b14mr1320189ooq.39.1586256843514; Tue, 07 Apr 2020 03:54:03 -0700 (PDT) MIME-Version: 1.0 References: <16924c3f07b142688a3c0562d229cd67dc7bf8e6.1584352425.git.zong.li@sifive.com> <71cc2070-e867-17e1-cc64-66b634e3f48e@ghiti.fr> In-Reply-To: <71cc2070-e867-17e1-cc64-66b634e3f48e@ghiti.fr> From: Zong Li Date: Tue, 7 Apr 2020 18:53:53 +0800 Message-ID: Subject: Re: [PATCH RFC 4/8] riscv/kaslr: randomize the kernel image offset To: Alex Ghiti Cc: Palmer Dabbelt , Paul Walmsley , linux-riscv , "linux-kernel@vger.kernel.org List" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 7, 2020 at 1:11 PM Alex Ghiti wrote: > > > On 3/24/20 3:30 AM, Zong Li wrote: > > Entropy is derived from the banner and timer, it is better than nothing > > but not enough secure, so previous stage may pass entropy via the device > > tree /chosen/kaslr-seed node. > > > > We limit randomization range within 1GB, so we can exploit early page > > table to map new destination of kernel image. Additionally, the kernel > > offset need 2M alignment to ensure it's good in PMD page table. > > > > We also checks the kernel offset whether it's safe by avoiding to > > overlaps with dtb, initrd and reserved memory regions. > > > > That maybe changes the way my sv48 patchset will be implemented: I can't > get user preference (3-level or 4-level) by any means, device-tree or > kernel parameter. > > But I don't see how you could get a random offset without info from the > device tree anyway (reserved memory regions especially), so maybe I > could parse dtb for allowing the user to choose. I'll move this > discussion to the sv48 introduction. Maybe I'm a little bit misunderstanding here, but I think I got the random offset through some information by parsing dtb. > > > Signed-off-by: Zong Li > > --- > > arch/riscv/kernel/kaslr.c | 274 +++++++++++++++++++++++++++++++++++++- > > arch/riscv/mm/init.c | 2 +- > > 2 files changed, 273 insertions(+), 3 deletions(-) > > > > diff --git a/arch/riscv/kernel/kaslr.c b/arch/riscv/kernel/kaslr.c > > index 281b5fcca5c8..9ec2b608eb7f 100644 > > --- a/arch/riscv/kernel/kaslr.c > > +++ b/arch/riscv/kernel/kaslr.c > > @@ -11,23 +11,293 @@ > > #include > > > > extern char _start[], _end[]; > > +extern void *dtb_early_va; > > +extern phys_addr_t dtb_early_pa; > > extern void secondary_random_target(void); > > extern void kaslr_create_page_table(uintptr_t start, uintptr_t end); > > > > uintptr_t secondary_next_target __initdata; > > static uintptr_t kaslr_offset __initdata; > > > > +static const __init u32 *get_reg_address(int root_cells, > > + const u32 *value, u64 *result) > > +{ > > + int cell; > > + *result = 0; > > + > > + for (cell = root_cells; cell > 0; --cell) > > + *result = (*result << 32) + fdt32_to_cpu(*value++); > > + > > + return value; > > +} > > + > > +static __init int get_node_addr_size_cells(const char *path, int *addr_cell, > > + int *size_cell) > > +{ > > + int node = fdt_path_offset(dtb_early_va, path); > > + fdt64_t *prop; > > + > > + if (node < 0) > > + return -EINVAL; > > + > > + prop = fdt_getprop_w(dtb_early_va, node, "#address-cells", NULL); > > + if (!prop) > > + return -EINVAL; > > + *addr_cell = fdt32_to_cpu(*prop); > > + > > + prop = fdt_getprop_w(dtb_early_va, node, "#size-cells", NULL); > > + if (!prop) > > + return -EINVAL; > > + *size_cell = fdt32_to_cpu(*prop); > > + > > + return node; > > +} > > + > > +static __init void kaslr_get_mem_info(uintptr_t *mem_start, > > + uintptr_t *mem_size) > > +{ > > + int node, root, addr_cells, size_cells; > > + u64 base, size; > > + > > + /* Get root node's address cells and size cells. */ > > + root = get_node_addr_size_cells("/", &addr_cells, &size_cells); > > + if (root < 0) > > + return; > > + > > + /* Get memory base address and size. */ > > + fdt_for_each_subnode(node, dtb_early_va, root) { > > + const char *dev_type; > > + const u32 *reg; > > + > > + dev_type = fdt_getprop(dtb_early_va, node, "device_type", NULL); > > + if (!dev_type) > > + continue; > > + > > + if (!strcmp(dev_type, "memory")) { > > + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); > > + if (!reg) > > + return; > > + > > + reg = get_reg_address(addr_cells, reg, &base); > > + reg = get_reg_address(size_cells, reg, &size); > > + > > + *mem_start = base; > > + *mem_size = size; > > + > > + break; > > + } > > + } > > +} > > + > > +/* Return a default seed if there is no HW generator. */ > > +static u64 kaslr_default_seed = ULL(-1); > > +static __init u64 kaslr_get_seed(void) > > +{ > > + int node, len; > > + fdt64_t *prop; > > + u64 ret; > > + > > + node = fdt_path_offset(dtb_early_va, "/chosen"); > > + if (node < 0) > > + return kaslr_default_seed++; > > + > > + prop = fdt_getprop_w(dtb_early_va, node, "kaslr-seed", &len); > > + if (!prop || len != sizeof(u64)) > > + return kaslr_default_seed++; > > + > > + ret = fdt64_to_cpu(*prop); > > + > > + /* Re-write to zero for checking whether get seed at second time */ > > + *prop = 0; > > + > > + return ret; > > +} > > + > > +static __init bool is_overlap(uintptr_t s1, uintptr_t e1, uintptr_t s2, > > + uintptr_t e2) > > +{ > > + return e1 >= s2 && e2 >= s1; > > +} > > Inline this function or use a macro maybe. Yes, sure. Thanks. > > > + > > +static __init bool is_overlap_reserved_mem(uintptr_t start_addr, > > + uintptr_t end_addr) > > +{ > > + int node, rsv_mem, addr_cells, size_cells; > > + > > + /* Get the reserved-memory node. */ > > + rsv_mem = get_node_addr_size_cells("/reserved-memory", > > + &addr_cells, > > + &size_cells); > > + if (rsv_mem < 0) > > + return false; > > + > > + /* Get memory base address and size. */ > > + fdt_for_each_subnode(node, dtb_early_va, rsv_mem) { > > + uint64_t base, size; > > + const uint32_t *reg; > > + > > + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); > > + if (!reg) > > + return 0; > > + > > + reg = get_reg_address(addr_cells, reg, &base); > > + reg = get_reg_address(size_cells, reg, &size); > > + > > + if (is_overlap(start_addr, end_addr, base, base + size)) > > + return true; > > + } > > + > > + return false; > > +} > > + > > +static __init bool is_overlap_initrd(uintptr_t start_addr, uintptr_t end_addr) > > +{ > > + int node; > > + uintptr_t initrd_start, initrd_end; > > + fdt64_t *prop; > > + > > + node = fdt_path_offset(dtb_early_va, "/chosen"); > > + if (node < 0) > > + return false; > > + > > + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-start", NULL); > > + if (!prop) > > + return false; > > + > > + initrd_start = fdt64_to_cpu(*prop); > > + > > + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-end", NULL); > > + if (!prop) > > + return false; > > + > > + initrd_end = fdt64_to_cpu(*prop); > > + > > + return is_overlap(start_addr, end_addr, initrd_start, initrd_end); > > +} > > + > > +static __init bool is_overlap_dtb(uintptr_t start_addr, uintptr_t end_addr) > > +{ > > + uintptr_t dtb_start = dtb_early_pa; > > + uintptr_t dtb_end = dtb_start + fdt_totalsize(dtb_early_va); > > + > > + return is_overlap(start_addr, end_addr, dtb_start, dtb_end); > > +} > > + > > +static __init bool has_regions_overlapping(uintptr_t start_addr, > > + uintptr_t end_addr) > > +{ > > + if (is_overlap_dtb(start_addr, end_addr)) > > + return true; > > + > > + if (is_overlap_initrd(start_addr, end_addr)) > > + return true; > > + > > + if (is_overlap_reserved_mem(start_addr, end_addr)) > > + return true; > > + > > + return false; > > +} > > + > > +static inline __init unsigned long get_legal_offset(int random_index, > > + int max_index, > > + uintptr_t mem_start, > > + uintptr_t kernel_size) > > +{ > > + uintptr_t start_addr, end_addr; > > + int idx, stop_idx; > > + > > + idx = stop_idx = random_index; > > + > > + do { > > + start_addr = mem_start + idx * SZ_2M + kernel_size; > > + end_addr = start_addr + kernel_size; > > + > > + /* Check overlap to other regions. */ > > + if (!has_regions_overlapping(start_addr, end_addr)) > > + return idx * SZ_2M + kernel_size; > > + > > + if (idx-- < 0) > > + idx = max_index; > > Isn't the fallback to max_index a security breach ? Because at some > point, the kernel will be loaded at this specific address. The max_index is the maximum safe index for destination of new kernel image. Could you give more explain here? > > > + > > + } while (idx != stop_idx); > > + > > + return 0; > > +} > > + > > +static inline __init u64 rotate_xor(u64 hash, const void *area, size_t size) > > +{ > > + size_t i; > > + uintptr_t *ptr = (uintptr_t *) area; > > + > > + for (i = 0; i < size / sizeof(hash); i++) { > > + /* Rotate by odd number of bits and XOR. */ > > + hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7); > > + hash ^= ptr[i]; > > + } > > + > > + return hash; > > +} > > + > > +#define MEM_RESERVE_START __pa(PAGE_OFFSET) > > +static __init uintptr_t get_random_offset(u64 seed, uintptr_t kernel_size) > > +{ > > + uintptr_t mem_start = 0, mem_size= 0, random_size; > > + uintptr_t kernel_size_align = round_up(kernel_size, SZ_2M); > > + int index; > > + u64 random = 0; > > + cycles_t time_base; > > + > > + /* Attempt to create a simple but unpredictable starting entropy */ > > + random = rotate_xor(random, linux_banner, strlen(linux_banner)); > > + > > + /* > > + * If there is no HW random number generator, use timer to get a random > > + * number. This is better than nothing but not enough secure. > > + */ > > + time_base = get_cycles() << 32; > > + time_base ^= get_cycles(); > > + random = rotate_xor(random, &time_base, sizeof(time_base)); > > + > > + if (seed) > > + random = rotate_xor(random, &seed, sizeof(seed)); > > + > > + kaslr_get_mem_info(&mem_start, &mem_size); > > + if (!mem_size) > > + return 0; > > + > > + if (mem_start < MEM_RESERVE_START) { > > + mem_size -= MEM_RESERVE_START - mem_start; > > + mem_start = MEM_RESERVE_START; > > + } > > + > > + /* > > + * Limit randomization range within 1G, so we can exploit > > + * early_pmd/early_pte during early page table phase. > > + */ > > + random_size = min_t(u64, > > + mem_size - (kernel_size_align * 2), > > + SZ_1G - (kernel_size_align * 2)); > > pgdir size is 30 bits in sv39, but it's 39 bits in sv48, you should use > PGDIR_SIZE macro here. OK, change it in the next version. Thanks. > > > + > > + /* The index of 2M block in whole avaliable region */ > > + index = random % (random_size / SZ_2M); > > + > > + return get_legal_offset(index, random_size / SZ_2M, > > + mem_start, kernel_size_align); > > +} > > + > > uintptr_t __init kaslr_early_init(void) > > { > > + u64 seed; > > uintptr_t dest_start, dest_end; > > uintptr_t kernel_size = (uintptr_t) _end - (uintptr_t) _start; > > > > /* Get zero value at second time to avoid doing randomization again. */ > > - if (kaslr_offset) > > + seed = kaslr_get_seed(); > > + if (!seed) > > return 0; > > > > /* Get the random number for kaslr offset. */ > > - kaslr_offset = 0x10000000; > > + kaslr_offset = get_random_offset(seed, kernel_size); > > > > /* Update kernel_virt_addr for get_kaslr_offset. */ > > kernel_virt_addr += kaslr_offset; > > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > > index 2f5b25f02b6c..34c6ecf2c599 100644 > > --- a/arch/riscv/mm/init.c > > +++ b/arch/riscv/mm/init.c > > @@ -125,7 +125,7 @@ static void __init setup_initrd(void) > > } > > #endif /* CONFIG_BLK_DEV_INITRD */ > > > > -static phys_addr_t dtb_early_pa __initdata; > > +phys_addr_t dtb_early_pa __initdata; > > > > void __init setup_bootmem(void) > > { > > > > Alex