Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp731363ybb; Fri, 10 Apr 2020 09:00:20 -0700 (PDT) X-Google-Smtp-Source: APiQypJsjC3ubUNvFvX4yGz3W6pM7qSV7wGcD9pc+V3bq8/UWwttXAD6+2kjGcfi5lhJqjetFcrj X-Received: by 2002:a37:454d:: with SMTP id s74mr4635340qka.271.1586534420070; Fri, 10 Apr 2020 09:00:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586534420; cv=none; d=google.com; s=arc-20160816; b=FutbXQOtZPS/o79QwcgzGnnORHiHBqmQaV8RkcltijWDEsOrwEnmmoLgZE1MfmahUK CoiNPJpV1HA8TC9VyJHoEPl0g3p0bWix8908b12czn/W7bvrnCq7QUPA8TV6EZiINg26 EU1ym37wj237vKx85RbYq2VL7u5TmE9nNAplDw+q4mt9mBkMWnZOOO/yFYTGE8QvVH60 u62H5RQJL9pLca1XskuqXyvb1TrCjLNhnzblU6XLrNJKQ5+2atteEtrxW255tuAtc4Ei dGaUlQjuQ8R9gw7T0EXDNjPjmjZB8ylW2tkNYGiy1P+xSa2PEuWXIkyL04IZHxjFWcfG T3ig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=kyqJ7QBGWtAfDP95CZetubRC+41VpS0Cd3puSg6qyWs=; b=n5stugXFb+x1Y6jdPBGceClJ8T3MWK7XgJetzI4Xn1PNXkcPa+yz4n5vtVw+s0E63u 47WnmRvl0+dHJA18zhQ6OO9S1e5RqAhUTUCtZBUh0njfMH/Y6i59RM8GAYuep0Qe8BoA 1Jbpvos2prgsR8ce+P3AL9WKsyZapj7Kd6fFRrQ69F9I3Lbr8YCInrDqukQV5/fEw5jl Os2Zu8+xmujLoRfp7pBI7hmHOUIMmNeWZNDFakp5h38ST8NJiaW++aF2NkjonPd+WoN/ 9artp/YMy5+sn6MLxKXTgzppB58v6uNpRy5980mbiZQYORuqrsd2g4IG0/N0XVREXSuD yLbg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id cf18si1247535qvb.119.2020.04.10.09.00.04; Fri, 10 Apr 2020 09:00:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726626AbgDJP7A (ORCPT + 99 others); Fri, 10 Apr 2020 11:59:00 -0400 Received: from relay7-d.mail.gandi.net ([217.70.183.200]:49447 "EHLO relay7-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726145AbgDJP67 (ORCPT ); Fri, 10 Apr 2020 11:58:59 -0400 X-Originating-IP: 90.112.45.105 Received: from [192.168.1.101] (lfbn-gre-1-325-105.w90-112.abo.wanadoo.fr [90.112.45.105]) (Authenticated sender: alex@ghiti.fr) by relay7-d.mail.gandi.net (Postfix) with ESMTPSA id C1D9C20007; Fri, 10 Apr 2020 15:58:57 +0000 (UTC) Subject: Re: [PATCH RFC 4/8] riscv/kaslr: randomize the kernel image offset To: Zong Li Cc: Palmer Dabbelt , Paul Walmsley , linux-riscv , "linux-kernel@vger.kernel.org List" References: <16924c3f07b142688a3c0562d229cd67dc7bf8e6.1584352425.git.zong.li@sifive.com> <71cc2070-e867-17e1-cc64-66b634e3f48e@ghiti.fr> <69a9ecc5-550e-24a0-6f91-d65af3e00f18@ghiti.fr> From: Alex Ghiti Message-ID: Date: Fri, 10 Apr 2020 11:58:57 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Zong, On 4/9/20 6:31 AM, Zong Li wrote: > On Thu, Apr 9, 2020 at 1:51 PM Alex Ghiti wrote: >> >> >> >> On 4/7/20 6:53 AM, Zong Li wrote: >>> On Tue, Apr 7, 2020 at 1:11 PM Alex Ghiti wrote: >>>> >>>> >>>> On 3/24/20 3:30 AM, Zong Li wrote: >>>>> Entropy is derived from the banner and timer, it is better than nothing >>>>> but not enough secure, so previous stage may pass entropy via the device >>>>> tree /chosen/kaslr-seed node. >>>>> >>>>> We limit randomization range within 1GB, so we can exploit early page >>>>> table to map new destination of kernel image. Additionally, the kernel >>>>> offset need 2M alignment to ensure it's good in PMD page table. >>>>> >>>>> We also checks the kernel offset whether it's safe by avoiding to >>>>> overlaps with dtb, initrd and reserved memory regions. >>>>> >>>> >>>> That maybe changes the way my sv48 patchset will be implemented: I can't >>>> get user preference (3-level or 4-level) by any means, device-tree or >>>> kernel parameter. >>>> >>>> But I don't see how you could get a random offset without info from the >>>> device tree anyway (reserved memory regions especially), so maybe I >>>> could parse dtb for allowing the user to choose. I'll move this >>>> discussion to the sv48 introduction. >>> >>> Maybe I'm a little bit misunderstanding here, but I think I got the >>> random offset through some information by parsing dtb. >>> >> >> I was just saying that I may use the dtb too in sv48 patchset to make it >> possible for users to choose sv39 even if sv48 is supported by hardware >> (which is not the case in my current patchset). >> >>>> >>>>> Signed-off-by: Zong Li >>>>> --- >>>>> arch/riscv/kernel/kaslr.c | 274 +++++++++++++++++++++++++++++++++++++- >>>>> arch/riscv/mm/init.c | 2 +- >>>>> 2 files changed, 273 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/arch/riscv/kernel/kaslr.c b/arch/riscv/kernel/kaslr.c >>>>> index 281b5fcca5c8..9ec2b608eb7f 100644 >>>>> --- a/arch/riscv/kernel/kaslr.c >>>>> +++ b/arch/riscv/kernel/kaslr.c >>>>> @@ -11,23 +11,293 @@ >>>>> #include >>>>> >>>>> extern char _start[], _end[]; >>>>> +extern void *dtb_early_va; >>>>> +extern phys_addr_t dtb_early_pa; >>>>> extern void secondary_random_target(void); >>>>> extern void kaslr_create_page_table(uintptr_t start, uintptr_t end); >>>>> >>>>> uintptr_t secondary_next_target __initdata; >>>>> static uintptr_t kaslr_offset __initdata; >>>>> >>>>> +static const __init u32 *get_reg_address(int root_cells, >>>>> + const u32 *value, u64 *result) >>>>> +{ >>>>> + int cell; >>>>> + *result = 0; >>>>> + >>>>> + for (cell = root_cells; cell > 0; --cell) >>>>> + *result = (*result << 32) + fdt32_to_cpu(*value++); >>>>> + >>>>> + return value; >>>>> +} >>>>> + >>>>> +static __init int get_node_addr_size_cells(const char *path, int *addr_cell, >>>>> + int *size_cell) >>>>> +{ >>>>> + int node = fdt_path_offset(dtb_early_va, path); >>>>> + fdt64_t *prop; >>>>> + >>>>> + if (node < 0) >>>>> + return -EINVAL; >>>>> + >>>>> + prop = fdt_getprop_w(dtb_early_va, node, "#address-cells", NULL); >>>>> + if (!prop) >>>>> + return -EINVAL; >>>>> + *addr_cell = fdt32_to_cpu(*prop); >>>>> + >>>>> + prop = fdt_getprop_w(dtb_early_va, node, "#size-cells", NULL); >>>>> + if (!prop) >>>>> + return -EINVAL; >>>>> + *size_cell = fdt32_to_cpu(*prop); >>>>> + >>>>> + return node; >>>>> +} >>>>> + >>>>> +static __init void kaslr_get_mem_info(uintptr_t *mem_start, >>>>> + uintptr_t *mem_size) >>>>> +{ >>>>> + int node, root, addr_cells, size_cells; >>>>> + u64 base, size; >>>>> + >>>>> + /* Get root node's address cells and size cells. */ >>>>> + root = get_node_addr_size_cells("/", &addr_cells, &size_cells); >>>>> + if (root < 0) >>>>> + return; >>>>> + >>>>> + /* Get memory base address and size. */ >>>>> + fdt_for_each_subnode(node, dtb_early_va, root) { >>>>> + const char *dev_type; >>>>> + const u32 *reg; >>>>> + >>>>> + dev_type = fdt_getprop(dtb_early_va, node, "device_type", NULL); >>>>> + if (!dev_type) >>>>> + continue; >>>>> + >>>>> + if (!strcmp(dev_type, "memory")) { >>>>> + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); >>>>> + if (!reg) >>>>> + return; >>>>> + >>>>> + reg = get_reg_address(addr_cells, reg, &base); >>>>> + reg = get_reg_address(size_cells, reg, &size); >>>>> + >>>>> + *mem_start = base; >>>>> + *mem_size = size; >>>>> + >>>>> + break; >>>>> + } >>>>> + } >>>>> +} >>>>> + >>>>> +/* Return a default seed if there is no HW generator. */ >>>>> +static u64 kaslr_default_seed = ULL(-1); >>>>> +static __init u64 kaslr_get_seed(void) >>>>> +{ >>>>> + int node, len; >>>>> + fdt64_t *prop; >>>>> + u64 ret; >>>>> + >>>>> + node = fdt_path_offset(dtb_early_va, "/chosen"); >>>>> + if (node < 0) >>>>> + return kaslr_default_seed++; >>>>> + >>>>> + prop = fdt_getprop_w(dtb_early_va, node, "kaslr-seed", &len); >>>>> + if (!prop || len != sizeof(u64)) >>>>> + return kaslr_default_seed++; >>>>> + >>>>> + ret = fdt64_to_cpu(*prop); >>>>> + >>>>> + /* Re-write to zero for checking whether get seed at second time */ >>>>> + *prop = 0; >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +static __init bool is_overlap(uintptr_t s1, uintptr_t e1, uintptr_t s2, >>>>> + uintptr_t e2) >>>>> +{ >>>>> + return e1 >= s2 && e2 >= s1; >>>>> +} >>>> >>>> Inline this function or use a macro maybe. >>> >>> Yes, sure. Thanks. >>> >>>> >>>>> + >>>>> +static __init bool is_overlap_reserved_mem(uintptr_t start_addr, >>>>> + uintptr_t end_addr) >>>>> +{ >>>>> + int node, rsv_mem, addr_cells, size_cells; >>>>> + >>>>> + /* Get the reserved-memory node. */ >>>>> + rsv_mem = get_node_addr_size_cells("/reserved-memory", >>>>> + &addr_cells, >>>>> + &size_cells); >>>>> + if (rsv_mem < 0) >>>>> + return false; >>>>> + >>>>> + /* Get memory base address and size. */ >>>>> + fdt_for_each_subnode(node, dtb_early_va, rsv_mem) { >>>>> + uint64_t base, size; >>>>> + const uint32_t *reg; >>>>> + >>>>> + reg = fdt_getprop(dtb_early_va, node, "reg", NULL); >>>>> + if (!reg) >>>>> + return 0; >>>>> + >>>>> + reg = get_reg_address(addr_cells, reg, &base); >>>>> + reg = get_reg_address(size_cells, reg, &size); >>>>> + >>>>> + if (is_overlap(start_addr, end_addr, base, base + size)) >>>>> + return true; >>>>> + } >>>>> + >>>>> + return false; >>>>> +} >>>>> + >>>>> +static __init bool is_overlap_initrd(uintptr_t start_addr, uintptr_t end_addr) >>>>> +{ >>>>> + int node; >>>>> + uintptr_t initrd_start, initrd_end; >>>>> + fdt64_t *prop; >>>>> + >>>>> + node = fdt_path_offset(dtb_early_va, "/chosen"); >>>>> + if (node < 0) >>>>> + return false; >>>>> + >>>>> + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-start", NULL); >>>>> + if (!prop) >>>>> + return false; >>>>> + >>>>> + initrd_start = fdt64_to_cpu(*prop); >>>>> + >>>>> + prop = fdt_getprop_w(dtb_early_va, node, "linux,initrd-end", NULL); >>>>> + if (!prop) >>>>> + return false; >>>>> + >>>>> + initrd_end = fdt64_to_cpu(*prop); >>>>> + >>>>> + return is_overlap(start_addr, end_addr, initrd_start, initrd_end); >>>>> +} >>>>> + >>>>> +static __init bool is_overlap_dtb(uintptr_t start_addr, uintptr_t end_addr) >>>>> +{ >>>>> + uintptr_t dtb_start = dtb_early_pa; >>>>> + uintptr_t dtb_end = dtb_start + fdt_totalsize(dtb_early_va); >>>>> + >>>>> + return is_overlap(start_addr, end_addr, dtb_start, dtb_end); >>>>> +} >>>>> + >>>>> +static __init bool has_regions_overlapping(uintptr_t start_addr, >>>>> + uintptr_t end_addr) >>>>> +{ >>>>> + if (is_overlap_dtb(start_addr, end_addr)) >>>>> + return true; >>>>> + >>>>> + if (is_overlap_initrd(start_addr, end_addr)) >>>>> + return true; >>>>> + >>>>> + if (is_overlap_reserved_mem(start_addr, end_addr)) >>>>> + return true; >>>>> + >>>>> + return false; >>>>> +} >>>>> + >>>>> +static inline __init unsigned long get_legal_offset(int random_index, >>>>> + int max_index, >>>>> + uintptr_t mem_start, >>>>> + uintptr_t kernel_size) >>>>> +{ >>>>> + uintptr_t start_addr, end_addr; >>>>> + int idx, stop_idx; >>>>> + >>>>> + idx = stop_idx = random_index; >>>>> + >>>>> + do { >>>>> + start_addr = mem_start + idx * SZ_2M + kernel_size; >>>>> + end_addr = start_addr + kernel_size; >>>>> + >>>>> + /* Check overlap to other regions. */ >>>>> + if (!has_regions_overlapping(start_addr, end_addr)) >>>>> + return idx * SZ_2M + kernel_size; >>>>> + >>>>> + if (idx-- < 0) >>>>> + idx = max_index; >>>> >>>> Isn't the fallback to max_index a security breach ? Because at some >>>> point, the kernel will be loaded at this specific address. >>> >>> The max_index is the maximum safe index for destination of new kernel >>> image. Could you give more explain here? >>> >> >> But max_index is not random at all. I really don't know if that's a >> problem, I just found intriguing the fact the kernel could be loaded at >> some specific location. Would it be more secure, instead of picking >> max_index as fallback when reaching 0, to pick another random number >> between random_index and max_index ? > > ok, I can get your point. The original idea here is that we get a > random index first, then we decrease the index to retry to find a good > place if there are overlapping with other regions. A bit like the ring > buffer, the end of index traversing is not zero, but the random_index > - 1, we might consider it as continuity, so we don't know where is the > end point because the start point is random, whether we stop at zero > or random_index - 1. > > Pick another random number is more secure when occurring overlapping, > but I a little bit worry that it would take very long time to retry > many times in the worst case. for example, there is just only one > index could fit kernel image in (except for original location). In the > meantime, we don't need to wait the index being decreased to zero, > because it seems to me that they are the same to stop at zero or > random_index - 1, so if we decide to re-calculate a new random number, > maybe we could remove the index decreasing here. But you're right that it could take some time before converging to a "good" index. Maybe we could restrict the index range to indexes that we know for sure will be good ? Alex > >> >> Alex >> >>>> >>>>> + >>>>> + } while (idx != stop_idx); >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static inline __init u64 rotate_xor(u64 hash, const void *area, size_t size) >>>>> +{ >>>>> + size_t i; >>>>> + uintptr_t *ptr = (uintptr_t *) area; >>>>> + >>>>> + for (i = 0; i < size / sizeof(hash); i++) { >>>>> + /* Rotate by odd number of bits and XOR. */ >>>>> + hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7); >>>>> + hash ^= ptr[i]; >>>>> + } >>>>> + >>>>> + return hash; >>>>> +} >>>>> + >>>>> +#define MEM_RESERVE_START __pa(PAGE_OFFSET) >>>>> +static __init uintptr_t get_random_offset(u64 seed, uintptr_t kernel_size) >>>>> +{ >>>>> + uintptr_t mem_start = 0, mem_size= 0, random_size; >>>>> + uintptr_t kernel_size_align = round_up(kernel_size, SZ_2M); >>>>> + int index; >>>>> + u64 random = 0; >>>>> + cycles_t time_base; >>>>> + >>>>> + /* Attempt to create a simple but unpredictable starting entropy */ >>>>> + random = rotate_xor(random, linux_banner, strlen(linux_banner)); >>>>> + >>>>> + /* >>>>> + * If there is no HW random number generator, use timer to get a random >>>>> + * number. This is better than nothing but not enough secure. >>>>> + */ >>>>> + time_base = get_cycles() << 32; >>>>> + time_base ^= get_cycles(); >>>>> + random = rotate_xor(random, &time_base, sizeof(time_base)); >>>>> + >>>>> + if (seed) >>>>> + random = rotate_xor(random, &seed, sizeof(seed)); >>>>> + >>>>> + kaslr_get_mem_info(&mem_start, &mem_size); >>>>> + if (!mem_size) >>>>> + return 0; >>>>> + >>>>> + if (mem_start < MEM_RESERVE_START) { >>>>> + mem_size -= MEM_RESERVE_START - mem_start; >>>>> + mem_start = MEM_RESERVE_START; >>>>> + } >>>>> + >>>>> + /* >>>>> + * Limit randomization range within 1G, so we can exploit >>>>> + * early_pmd/early_pte during early page table phase. >>>>> + */ >>>>> + random_size = min_t(u64, >>>>> + mem_size - (kernel_size_align * 2), >>>>> + SZ_1G - (kernel_size_align * 2)); >>>> >>>> pgdir size is 30 bits in sv39, but it's 39 bits in sv48, you should use >>>> PGDIR_SIZE macro here. >>> >>> OK, change it in the next version. Thanks. >>> >>>> >>>>> + >>>>> + /* The index of 2M block in whole avaliable region */ >>>>> + index = random % (random_size / SZ_2M); >>>>> + >>>>> + return get_legal_offset(index, random_size / SZ_2M, >>>>> + mem_start, kernel_size_align); >>>>> +} >>>>> + >>>>> uintptr_t __init kaslr_early_init(void) >>>>> { >>>>> + u64 seed; >>>>> uintptr_t dest_start, dest_end; >>>>> uintptr_t kernel_size = (uintptr_t) _end - (uintptr_t) _start; >>>>> >>>>> /* Get zero value at second time to avoid doing randomization again. */ >>>>> - if (kaslr_offset) >>>>> + seed = kaslr_get_seed(); >>>>> + if (!seed) >>>>> return 0; >>>>> >>>>> /* Get the random number for kaslr offset. */ >>>>> - kaslr_offset = 0x10000000; >>>>> + kaslr_offset = get_random_offset(seed, kernel_size); >>>>> >>>>> /* Update kernel_virt_addr for get_kaslr_offset. */ >>>>> kernel_virt_addr += kaslr_offset; >>>>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c >>>>> index 2f5b25f02b6c..34c6ecf2c599 100644 >>>>> --- a/arch/riscv/mm/init.c >>>>> +++ b/arch/riscv/mm/init.c >>>>> @@ -125,7 +125,7 @@ static void __init setup_initrd(void) >>>>> } >>>>> #endif /* CONFIG_BLK_DEV_INITRD */ >>>>> >>>>> -static phys_addr_t dtb_early_pa __initdata; >>>>> +phys_addr_t dtb_early_pa __initdata; >>>>> >>>>> void __init setup_bootmem(void) >>>>> { >>>>> >>>> >>>> Alex