Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp4019801ybb; Mon, 6 Apr 2020 22:13:12 -0700 (PDT) X-Google-Smtp-Source: APiQypLBo8CC5WiPPIF3ymOePcvwPZI1gT90wHa6UmzGWQXoX0KmHvs5oF8Q6mOKrxbcdwC2GEHv X-Received: by 2002:aca:1c13:: with SMTP id c19mr402923oic.178.1586236392795; Mon, 06 Apr 2020 22:13:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586236392; cv=none; d=google.com; s=arc-20160816; b=PfW9s8ZS0kW0V1zsBC5qpb46pVRGu6bKqxMGquzRNB0kmBekf7OZxyxsGvSevTcD0g NsKDQ0fsi/BFI7o4bnz5qKPZZCIv9q2FaZn/Q3/RWQ1+E27Ykcatprlb6feNDtZcvrJg qG4NSYcgMNNlgq3UGXGaXORNnWrhWYKYiibubtClogjurdaHMJA/zkodWvi0ztobPbgM yEJJbQRAFMXT5Q5yg6I8EhxCdtP2oqupCE2WLwuYpIl7YO8PTonjTmcPodrpXEHHKbmJ TLbHEI7nifzaLc3a+aivcUdKCVWjZ8RbsAWsBQrEVjmsEYRuTz49pdLGkEEXbK9tq0d5 Kghg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:references:to:subject:from; bh=61SbMYoeo0OI3W06xxi1TeUueoXsFirvrqy0R2BI43k=; b=h3otXBatJW0i2LtDNn3+o25w5xpnVPCUkQ3tpACK0DCLIoKL/GNpvy9nrxxHbAp1+x xqdlPnJFeiwAK/Ms09r3eqirs27kv9VrjA9Hv9klOTUxV03zWMZeXTomLm2aHzSHkooT QADKnr+4WcPSfU1Xz5sKHbxTr0on5OY5x33EMGmXNlZR6L5iyn9kxy0b9/hatt8uUyYB KBuYj673Za5t+mRW1WROMOKJcWaxCDAZs59f5RvJQ39dX0EwVN7bdYzs6uRXvb9bdB6J 0SfIYi2HftTEJer1V3Agm1l9hsAF+grNmXREpgA4i6jeXkwELCkTiYL3lytw1Pj6zYno z7hw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r6si760272oti.159.2020.04.06.22.13.00; Mon, 06 Apr 2020 22:13:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726787AbgDGFLP (ORCPT + 99 others); Tue, 7 Apr 2020 01:11:15 -0400 Received: from relay6-d.mail.gandi.net ([217.70.183.198]:48597 "EHLO relay6-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726761AbgDGFLO (ORCPT ); Tue, 7 Apr 2020 01:11:14 -0400 X-Originating-IP: 2.7.45.25 Received: from [192.168.1.101] (lfbn-lyo-1-453-25.w2-7.abo.wanadoo.fr [2.7.45.25]) (Authenticated sender: alex@ghiti.fr) by relay6-d.mail.gandi.net (Postfix) with ESMTPSA id E0927C0003; Tue, 7 Apr 2020 05:11:10 +0000 (UTC) From: Alex Ghiti Subject: Re: [PATCH RFC 4/8] riscv/kaslr: randomize the kernel image offset To: Zong Li , palmer@dabbelt.com, paul.walmsley@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org References: <16924c3f07b142688a3c0562d229cd67dc7bf8e6.1584352425.git.zong.li@sifive.com> Message-ID: <71cc2070-e867-17e1-cc64-66b634e3f48e@ghiti.fr> Date: Tue, 7 Apr 2020 01:11:10 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0 MIME-Version: 1.0 In-Reply-To: <16924c3f07b142688a3c0562d229cd67dc7bf8e6.1584352425.git.zong.li@sifive.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/24/20 3:30 AM, Zong Li wrote: > Entropy is derived from the banner and timer, it is better than nothing > but not enough secure, so previous stage may pass entropy via the device > tree /chosen/kaslr-seed node. > > We limit randomization range within 1GB, so we can exploit early page > table to map new destination of kernel image. Additionally, the kernel > offset need 2M alignment to ensure it's good in PMD page table. > > We also checks the kernel offset whether it's safe by avoiding to > overlaps with dtb, initrd and reserved memory regions. > That maybe changes the way my sv48 patchset will be implemented: I can't get user preference (3-level or 4-level) by any means, device-tree or kernel parameter. But I don't see how you could get a random offset without info from the device tree anyway (reserved memory regions especially), so maybe I could parse dtb for allowing the user to choose. I'll move this discussion to the sv48 introduction. > Signed-off-by: Zong Li > --- > arch/riscv/kernel/kaslr.c | 274 +++++++++++++++++++++++++++++++++++++- > arch/riscv/mm/init.c | 2 +- > 2 files changed, 273 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/kernel/kaslr.c b/arch/riscv/kernel/kaslr.c > index 281b5fcca5c8..9ec2b608eb7f 100644 > --- a/arch/riscv/kernel/kaslr.c > +++ b/arch/riscv/kernel/kaslr.c > @@ -11,23 +11,293 @@ > #include > > extern char _start[], _end[]; > +extern void *dtb_early_va; > +extern phys_addr_t dtb_early_pa; > extern void secondary_random_target(void); > extern void kaslr_create_page_table(uintptr_t start, uintptr_t end); > > uintptr_t secondary_next_target __initdata; > static uintptr_t kaslr_offset __initdata; > > +static const __init u32 *get_reg_address(int root_cells, > + const u32 *value, u64 *result) > +{ > + int cell; > + *result = 0; > + > + for (cell = root_cells; cell > 0; --cell) > + *result = (*result << 32) + fdt32_to_cpu(*value++); > + > + return value; > +} > + > +static __init int get_node_addr_size_cells(const char *path, int *addr_cell, > + int *size_cell) > +{ > + int node = fdt_path_offset(dtb_early_va, path); > + fdt64_t *prop; > + > + if (node < 0) > + return -EINVAL; > + > + prop = fdt_getprop_w(dtb_early_va, node, "#address-cells", NULL); > + if (!prop) > + return -EINVAL; > + *addr_cell = fdt32_to_cpu(*prop); > + > + prop = fdt_getprop_w(dtb_early_va, node, "#size-cells", NULL); > + if (!prop) > + return -EINVAL; > + *size_cell = fdt32_to_cpu(*prop); > + > + return node; > +} > + > +static __init void kaslr_get_mem_info(uintptr_t *mem_start, > + uintptr_t *mem_size) > +{ > + int node, root, addr_cells, size_cells; > + u64 base, size; > + > + /* Get root node's address cells and size cells. */ > + root = get_node_addr_size_cells("/", &addr_cells, &size_cells); > + if (root < 0) > + return; > + > + /* Get memory base address and size. */ > + fdt_for_each_subnode(node, dtb_early_va, root) { > + const char *dev_type; > + const u32 *reg; > + > + dev_type = fdt_getprop(dtb_early_va, node, "device_type", NULL); > + if (!dev_type) > + continue; > + > + if (!strcmp(dev_type, "memory")) { > + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); > + if (!reg) > + return; > + > + reg = get_reg_address(addr_cells, reg, &base); > + reg = get_reg_address(size_cells, reg, &size); > + > + *mem_start = base; > + *mem_size = size; > + > + break; > + } > + } > +} > + > +/* Return a default seed if there is no HW generator. */ > +static u64 kaslr_default_seed = ULL(-1); > +static __init u64 kaslr_get_seed(void) > +{ > + int node, len; > + fdt64_t *prop; > + u64 ret; > + > + node = fdt_path_offset(dtb_early_va, "/chosen"); > + if (node < 0) > + return kaslr_default_seed++; > + > + prop = fdt_getprop_w(dtb_early_va, node, "kaslr-seed", &len); > + if (!prop || len != sizeof(u64)) > + return kaslr_default_seed++; > + > + ret = fdt64_to_cpu(*prop); > + > + /* Re-write to zero for checking whether get seed at second time */ > + *prop = 0; > + > + return ret; > +} > + > +static __init bool is_overlap(uintptr_t s1, uintptr_t e1, uintptr_t s2, > + uintptr_t e2) > +{ > + return e1 >= s2 && e2 >= s1; > +} Inline this function or use a macro maybe. > + > +static __init bool is_overlap_reserved_mem(uintptr_t start_addr, > + uintptr_t end_addr) > +{ > + int node, rsv_mem, addr_cells, size_cells; > + > + /* Get the reserved-memory node. */ > + rsv_mem = get_node_addr_size_cells("/reserved-memory", > + &addr_cells, > + &size_cells); > + if (rsv_mem < 0) > + return false; > + > + /* Get memory base address and size. */ > + fdt_for_each_subnode(node, dtb_early_va, rsv_mem) { > + uint64_t base, size; > + const uint32_t *reg; > + > + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); > + if (!reg) > + return 0; > + > + reg = get_reg_address(addr_cells, reg, &base); > + reg = get_reg_address(size_cells, reg, &size); > + > + if (is_overlap(start_addr, end_addr, base, base + size)) > + return true; > + } > + > + return false; > +} > + > +static __init bool is_overlap_initrd(uintptr_t start_addr, uintptr_t end_addr) > +{ > + int node; > + uintptr_t initrd_start, initrd_end; > + fdt64_t *prop; > + > + node = fdt_path_offset(dtb_early_va, "/chosen"); > + if (node < 0) > + return false; > + > + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-start", NULL); > + if (!prop) > + return false; > + > + initrd_start = fdt64_to_cpu(*prop); > + > + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-end", NULL); > + if (!prop) > + return false; > + > + initrd_end = fdt64_to_cpu(*prop); > + > + return is_overlap(start_addr, end_addr, initrd_start, initrd_end); > +} > + > +static __init bool is_overlap_dtb(uintptr_t start_addr, uintptr_t end_addr) > +{ > + uintptr_t dtb_start = dtb_early_pa; > + uintptr_t dtb_end = dtb_start + fdt_totalsize(dtb_early_va); > + > + return is_overlap(start_addr, end_addr, dtb_start, dtb_end); > +} > + > +static __init bool has_regions_overlapping(uintptr_t start_addr, > + uintptr_t end_addr) > +{ > + if (is_overlap_dtb(start_addr, end_addr)) > + return true; > + > + if (is_overlap_initrd(start_addr, end_addr)) > + return true; > + > + if (is_overlap_reserved_mem(start_addr, end_addr)) > + return true; > + > + return false; > +} > + > +static inline __init unsigned long get_legal_offset(int random_index, > + int max_index, > + uintptr_t mem_start, > + uintptr_t kernel_size) > +{ > + uintptr_t start_addr, end_addr; > + int idx, stop_idx; > + > + idx = stop_idx = random_index; > + > + do { > + start_addr = mem_start + idx * SZ_2M + kernel_size; > + end_addr = start_addr + kernel_size; > + > + /* Check overlap to other regions. */ > + if (!has_regions_overlapping(start_addr, end_addr)) > + return idx * SZ_2M + kernel_size; > + > + if (idx-- < 0) > + idx = max_index; Isn't the fallback to max_index a security breach ? Because at some point, the kernel will be loaded at this specific address. > + > + } while (idx != stop_idx); > + > + return 0; > +} > + > +static inline __init u64 rotate_xor(u64 hash, const void *area, size_t size) > +{ > + size_t i; > + uintptr_t *ptr = (uintptr_t *) area; > + > + for (i = 0; i < size / sizeof(hash); i++) { > + /* Rotate by odd number of bits and XOR. */ > + hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7); > + hash ^= ptr[i]; > + } > + > + return hash; > +} > + > +#define MEM_RESERVE_START __pa(PAGE_OFFSET) > +static __init uintptr_t get_random_offset(u64 seed, uintptr_t kernel_size) > +{ > + uintptr_t mem_start = 0, mem_size= 0, random_size; > + uintptr_t kernel_size_align = round_up(kernel_size, SZ_2M); > + int index; > + u64 random = 0; > + cycles_t time_base; > + > + /* Attempt to create a simple but unpredictable starting entropy */ > + random = rotate_xor(random, linux_banner, strlen(linux_banner)); > + > + /* > + * If there is no HW random number generator, use timer to get a random > + * number. This is better than nothing but not enough secure. > + */ > + time_base = get_cycles() << 32; > + time_base ^= get_cycles(); > + random = rotate_xor(random, &time_base, sizeof(time_base)); > + > + if (seed) > + random = rotate_xor(random, &seed, sizeof(seed)); > + > + kaslr_get_mem_info(&mem_start, &mem_size); > + if (!mem_size) > + return 0; > + > + if (mem_start < MEM_RESERVE_START) { > + mem_size -= MEM_RESERVE_START - mem_start; > + mem_start = MEM_RESERVE_START; > + } > + > + /* > + * Limit randomization range within 1G, so we can exploit > + * early_pmd/early_pte during early page table phase. > + */ > + random_size = min_t(u64, > + mem_size - (kernel_size_align * 2), > + SZ_1G - (kernel_size_align * 2)); pgdir size is 30 bits in sv39, but it's 39 bits in sv48, you should use PGDIR_SIZE macro here. > + > + /* The index of 2M block in whole avaliable region */ > + index = random % (random_size / SZ_2M); > + > + return get_legal_offset(index, random_size / SZ_2M, > + mem_start, kernel_size_align); > +} > + > uintptr_t __init kaslr_early_init(void) > { > + u64 seed; > uintptr_t dest_start, dest_end; > uintptr_t kernel_size = (uintptr_t) _end - (uintptr_t) _start; > > /* Get zero value at second time to avoid doing randomization again. */ > - if (kaslr_offset) > + seed = kaslr_get_seed(); > + if (!seed) > return 0; > > /* Get the random number for kaslr offset. */ > - kaslr_offset = 0x10000000; > + kaslr_offset = get_random_offset(seed, kernel_size); > > /* Update kernel_virt_addr for get_kaslr_offset. */ > kernel_virt_addr += kaslr_offset; > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 2f5b25f02b6c..34c6ecf2c599 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -125,7 +125,7 @@ static void __init setup_initrd(void) > } > #endif /* CONFIG_BLK_DEV_INITRD */ > > -static phys_addr_t dtb_early_pa __initdata; > +phys_addr_t dtb_early_pa __initdata; > > void __init setup_bootmem(void) > { > Alex