From: Zong Li
Date: Thu, 4 Jun 2020 10:52:19 +0800
Subject: Re: [PATCH v4 1/4] riscv: Move kernel mapping to vmalloc zone
To: Alexandre Ghiti
Cc: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou, Anup Patel, Atish Patra, "linux-kernel@vger.kernel.org List", linuxppc-dev@lists.ozlabs.org, linux-riscv
References: <20200603080010.13366-1-alex@ghiti.fr> <20200603080010.13366-2-alex@ghiti.fr>
In-Reply-To: <20200603080010.13366-2-alex@ghiti.fr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 3, 2020 at 4:01 PM Alexandre Ghiti wrote:
>
> This is a preparatory patch for relocatable kernel.
>
> The kernel used to be linked at PAGE_OFFSET address and used to be loaded
> physically at the beginning of the main memory. Therefore, we could use
> the linear mapping for the kernel mapping.
>
> But the relocated kernel base address will be different from PAGE_OFFSET
> and since in the linear mapping, two different virtual addresses cannot
> point to the same physical address, the kernel mapping needs to lie outside
> the linear mapping.
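
As a side note, the constraint described above can be illustrated with a
small user-space sketch. This is only an illustration under assumed values,
not code from the patch: the EX_* constants and ex_va_to_pa() are
hypothetical names, and the addresses are made up. It mirrors the idea of
keeping one offset for the linear mapping and a second one for the kernel
mapping, so the kernel image can be reached at a virtual address outside the
linear mapping while the linear mapping itself stays one-to-one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values, chosen for illustration only. */
#define EX_PAGE_OFFSET UINT64_C(0xffffffe000000000) /* start of the linear mapping */
#define EX_KERNEL_VA   UINT64_C(0xffffffdf80000000) /* EX_PAGE_OFFSET - 2GB */
#define EX_LOAD_PA     UINT64_C(0x0000000080200000) /* physical load address */

/* Offset between a linear-mapping virtual address and its physical address. */
static const uint64_t ex_va_pa_offset = EX_PAGE_OFFSET - EX_LOAD_PA;
/* Offset between a kernel-mapping virtual address and its physical address. */
static const uint64_t ex_va_kernel_pa_offset = EX_KERNEL_VA - EX_LOAD_PA;

/*
 * Translate a virtual address to a physical one. Addresses at or above
 * EX_PAGE_OFFSET are treated as linear-mapping addresses, anything below
 * as kernel-mapping addresses.
 */
static uint64_t ex_va_to_pa(uint64_t va)
{
	/* Addresses below the kernel mapping are outside this sketch. */
	assert(va >= EX_KERNEL_VA);

	if (va >= EX_PAGE_OFFSET)
		return va - ex_va_pa_offset;
	return va - ex_va_kernel_pa_offset;
}
```

With these example values, both EX_PAGE_OFFSET and EX_KERNEL_VA translate to
EX_LOAD_PA, each through its own offset.
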
>
> In addition, because modules and BPF must be close to the kernel (inside
> +-2GB window), the kernel is placed at the end of the vmalloc zone minus
> 2GB, which leaves room for modules and BPF. The kernel could not be
> placed at the beginning of the vmalloc zone since other vmalloc
> allocations from the kernel could get all the +-2GB window around the
> kernel which would prevent new modules and BPF programs to be loaded.
>
> Signed-off-by: Alexandre Ghiti
> ---
>  arch/riscv/boot/loader.lds.S     |  3 +-
>  arch/riscv/include/asm/page.h    | 10 +++++-
>  arch/riscv/include/asm/pgtable.h | 38 ++++++++++++++-------
>  arch/riscv/kernel/head.S         |  3 +-
>  arch/riscv/kernel/module.c       |  4 +--
>  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
>  arch/riscv/mm/init.c             | 58 +++++++++++++++++++++++++-------
>  arch/riscv/mm/physaddr.c         |  2 +-
>  8 files changed, 88 insertions(+), 33 deletions(-)
>
> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
> index 47a5003c2e28..62d94696a19c 100644
> --- a/arch/riscv/boot/loader.lds.S
> +++ b/arch/riscv/boot/loader.lds.S
> @@ -1,13 +1,14 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>
>  #include
> +#include
>
>  OUTPUT_ARCH(riscv)
>  ENTRY(_start)
>
>  SECTIONS
>  {
> -	. = PAGE_OFFSET;
> +	. = KERNEL_LINK_ADDR;
>
>  	.payload : {
>  		*(.payload)
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 2d50f76efe48..48bb09b6a9b7 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_MMU
>  extern unsigned long va_pa_offset;
> +extern unsigned long va_kernel_pa_offset;
>  extern unsigned long pfn_base;
>  #define ARCH_PFN_OFFSET		(pfn_base)
>  #else
>  #define va_pa_offset		0
> +#define va_kernel_pa_offset	0
>  #define ARCH_PFN_OFFSET		(PAGE_OFFSET >> PAGE_SHIFT)
>  #endif /* CONFIG_MMU */
>
>  extern unsigned long max_low_pfn;
>  extern unsigned long min_low_pfn;
> +extern unsigned long kernel_virt_addr;
>
>  #define __pa_to_va_nodebug(x)	((void *)((unsigned long) (x) + va_pa_offset))
> -#define __va_to_pa_nodebug(x)	((unsigned long)(x) - va_pa_offset)
> +#define linear_mapping_va_to_pa(x)	((unsigned long)(x) - va_pa_offset)
> +#define kernel_mapping_va_to_pa(x)	\
> +	((unsigned long)(x) - va_kernel_pa_offset)
> +#define __va_to_pa_nodebug(x)		\
> +	(((x) >= PAGE_OFFSET) ?		\
> +		linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  extern phys_addr_t __virt_to_phys(unsigned long x);
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 35b60035b6b0..94ef3b49dfb6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -11,23 +11,29 @@
>
>  #include
>
> -#ifndef __ASSEMBLY__
> -
> -/* Page Upper Directory not used in RISC-V */
> -#include
> -#include
> -#include
> -#include
> -
> -#ifdef CONFIG_MMU
> +#ifndef CONFIG_MMU
> +#define KERNEL_VIRT_ADDR	PAGE_OFFSET
> +#define KERNEL_LINK_ADDR	PAGE_OFFSET
> +#else
> +/*
> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
> + * the kernel.
> + */
> +#define KERNEL_VIRT_ADDR	(VMALLOC_END - SZ_2G + 1)
> +#define KERNEL_LINK_ADDR	KERNEL_VIRT_ADDR
>
>  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
>  #define VMALLOC_END      (PAGE_OFFSET - 1)
>  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>
>  #define BPF_JIT_REGION_SIZE	(SZ_128M)
> -#define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> -#define BPF_JIT_REGION_END	(VMALLOC_END)
> +#define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
> +#define BPF_JIT_REGION_END	(BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
> +
> +#ifdef CONFIG_64BIT
> +#define VMALLOC_MODULE_START	BPF_JIT_REGION_END
> +#define VMALLOC_MODULE_END	(((unsigned long)&_start & PAGE_MASK) + SZ_2G)
> +#endif
>
>  /*
>   * Roughly size the vmemmap space to be large enough to fit enough
> @@ -57,9 +63,16 @@
>  #define FIXADDR_SIZE     PGDIR_SIZE
>  #endif
>  #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
> -
>  #endif
>
> +#ifndef __ASSEMBLY__
> +
> +/* Page Upper Directory not used in RISC-V */
> +#include
> +#include
> +#include
> +#include
> +
>  #ifdef CONFIG_64BIT
>  #include
>  #else
> @@ -483,6 +496,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
>
>  #define kern_addr_valid(addr)   (1) /* FIXME */
>
> +extern char _start[];
>  extern void *dtb_early_va;
>  void setup_bootmem(void);
>  void paging_init(void);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 98a406474e7d..8f5bb7731327 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -49,7 +49,8 @@ ENTRY(_start)
>  #ifdef CONFIG_MMU
>  relocate:
>  	/* Relocate return address */
> -	li a1, PAGE_OFFSET
> +	la a1, kernel_virt_addr
> +	REG_L a1, 0(a1)
>  	la a2, _start
>  	sub a1, a1, a2
>  	add ra, ra, a1
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 8bbe5dbe1341..1a8fbe05accf 100644
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -392,12 +392,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  }
>
>  #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> -#define VMALLOC_MODULE_START \
> -	 max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>  void *module_alloc(unsigned long size)
>  {
>  	return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> -				    VMALLOC_END, GFP_KERNEL,
> +				    VMALLOC_MODULE_END, GFP_KERNEL,
>  				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>  				    __builtin_return_address(0));
>  }
> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
> index 0339b6bbe11a..a9abde62909f 100644
> --- a/arch/riscv/kernel/vmlinux.lds.S
> +++ b/arch/riscv/kernel/vmlinux.lds.S
> @@ -4,7 +4,8 @@
>   * Copyright (C) 2017 SiFive
>   */
>
> -#define LOAD_OFFSET PAGE_OFFSET
> +#include
> +#define LOAD_OFFSET KERNEL_LINK_ADDR
>  #include
>  #include
>  #include
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 736de6c8739f..37be2eb45e58 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -22,6 +22,9 @@
>
>  #include "../kernel/head.h"
>
> +unsigned long kernel_virt_addr = KERNEL_VIRT_ADDR;
> +EXPORT_SYMBOL(kernel_virt_addr);
> +
>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>  							__page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);
> @@ -178,8 +181,12 @@ void __init setup_bootmem(void)
>  }
>
>  #ifdef CONFIG_MMU
> +/* Offset between linear mapping virtual address and kernel load address */
>  unsigned long va_pa_offset;
>  EXPORT_SYMBOL(va_pa_offset);
> +/* Offset between kernel mapping virtual address and kernel load address */
> +unsigned long va_kernel_pa_offset;
> +EXPORT_SYMBOL(va_kernel_pa_offset);
>  unsigned long pfn_base;
>  EXPORT_SYMBOL(pfn_base);
>
> @@ -271,7 +278,7 @@ static phys_addr_t __init alloc_pmd(uintptr_t va)
>  	if (mmu_enabled)
>  		return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
>
> -	pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
> +	pmd_num = (va - kernel_virt_addr) >> PGDIR_SHIFT;
>  	BUG_ON(pmd_num >= NUM_EARLY_PMDS);
>  	return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
>  }
> @@ -372,14 +379,30 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>  #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>  #endif
>
> +static uintptr_t load_pa, load_sz;
> +
> +void create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)

It could be static if this function is only used in this file, as kbuild test
reported. Apart from this, it looks good to me.

Reviewed-by: Zong Li

> +{
> +	uintptr_t va, end_va;
> +
> +	end_va = kernel_virt_addr + load_sz;
> +	for (va = kernel_virt_addr; va < end_va; va += map_size)
> +		create_pgd_mapping(pgdir, va,
> +				   load_pa + (va - kernel_virt_addr),
> +				   map_size, PAGE_KERNEL_EXEC);
> +}
> +
>  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>  {
>  	uintptr_t va, end_va;
> -	uintptr_t load_pa = (uintptr_t)(&_start);
> -	uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
>  	uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>
> +	load_pa = (uintptr_t)(&_start);
> +	load_sz = (uintptr_t)(&_end) - load_pa;
> +
>  	va_pa_offset = PAGE_OFFSET - load_pa;
> +	va_kernel_pa_offset = kernel_virt_addr - load_pa;
> +
>  	pfn_base = PFN_DOWN(load_pa);
>
>  	/*
> @@ -402,26 +425,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>  	create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>  			   (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>  	/* Setup trampoline PGD and PMD */
> -	create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
> +	create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>  			   (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
> -	create_pmd_mapping(trampoline_pmd, PAGE_OFFSET,
> +	create_pmd_mapping(trampoline_pmd, kernel_virt_addr,
>  			   load_pa, PMD_SIZE, PAGE_KERNEL_EXEC);
>  #else
>  	/* Setup trampoline PGD */
> -	create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
> +	create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>  			   load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
>  #endif
>
>  	/*
> -	 * Setup early PGD covering entire kernel which will allows
> +	 * Setup early PGD covering entire kernel which will allow
>  	 * us to reach paging_init(). We map all memory banks later
>  	 * in setup_vm_final() below.
>  	 */
> -	end_va = PAGE_OFFSET + load_sz;
> -	for (va = PAGE_OFFSET; va < end_va; va += map_size)
> -		create_pgd_mapping(early_pg_dir, va,
> -				   load_pa + (va - PAGE_OFFSET),
> -				   map_size, PAGE_KERNEL_EXEC);
> +	create_kernel_page_table(early_pg_dir, map_size);
>
>  	/* Create fixed mapping for early FDT parsing */
>  	end_va = __fix_to_virt(FIX_FDT) + FIX_FDT_SIZE;
> @@ -441,6 +460,7 @@ static void __init setup_vm_final(void)
>  	uintptr_t va, map_size;
>  	phys_addr_t pa, start, end;
>  	struct memblock_region *reg;
> +	static struct vm_struct vm_kernel = { 0 };
>
>  	/* Set mmu_enabled flag */
>  	mmu_enabled = true;
> @@ -467,10 +487,22 @@ static void __init setup_vm_final(void)
>  		for (pa = start; pa < end; pa += map_size) {
>  			va = (uintptr_t)__va(pa);
>  			create_pgd_mapping(swapper_pg_dir, va, pa,
> -					   map_size, PAGE_KERNEL_EXEC);
> +					   map_size, PAGE_KERNEL);
>  		}
>  	}
>
> +	/* Map the kernel */
> +	create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
> +
> +	/* Reserve the vmalloc area occupied by the kernel */
> +	vm_kernel.addr = (void *)kernel_virt_addr;
> +	vm_kernel.phys_addr = load_pa;
> +	vm_kernel.size = (load_sz + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
> +	vm_kernel.flags = VM_MAP | VM_NO_GUARD;
> +	vm_kernel.caller = __builtin_return_address(0);
> +
> +	vm_area_add_early(&vm_kernel);
> +
>  	/* Clear fixmap PTE and PMD mappings */
>  	clear_fixmap(FIX_PTE);
>  	clear_fixmap(FIX_PMD);
> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> index e8e4dcd39fed..35703d5ef5fd 100644
> --- a/arch/riscv/mm/physaddr.c
> +++ b/arch/riscv/mm/physaddr.c
> @@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);
>
>  phys_addr_t __phys_addr_symbol(unsigned long x)
>  {
> -	unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
> +	unsigned long kernel_start = (unsigned long)kernel_virt_addr;
>  	unsigned long kernel_end = (unsigned long)_end;
>
>  	/*
> --
> 2.20.1
>
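
For reference, the layout arithmetic from the pgtable.h hunk can be sketched
in user space as follows. This is a rough sketch under assumptions, not code
from the patch: EX_PAGE_OFFSET is a made-up value, all ex_* / EX_* names are
hypothetical, and image_size stands in for (&_end - &_start). It follows the
quoted macros: the kernel starts at VMALLOC_END - SZ_2G + 1, the BPF JIT
region starts at the page-aligned end of the image, and modules get what is
left up to _start + 2GB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values, chosen for illustration only. */
#define EX_SZ_2G            UINT64_C(0x80000000)
#define EX_SZ_128M          UINT64_C(0x08000000)
#define EX_PAGE_SIZE        UINT64_C(0x1000)
#define EX_PAGE_MASK        (~(EX_PAGE_SIZE - 1))
#define EX_PAGE_OFFSET      UINT64_C(0xffffffe000000000)

/* Same relations as the VMALLOC_END and KERNEL_VIRT_ADDR macros above. */
#define EX_VMALLOC_END      (EX_PAGE_OFFSET - 1)
#define EX_KERNEL_VIRT_ADDR (EX_VMALLOC_END - EX_SZ_2G + 1)

struct ex_layout {
	uint64_t bpf_start; /* BPF_JIT_REGION_START */
	uint64_t bpf_end;   /* BPF_JIT_REGION_END */
	uint64_t mod_start; /* VMALLOC_MODULE_START */
	uint64_t mod_end;   /* VMALLOC_MODULE_END */
};

/* Round up to the next page boundary, like PFN_ALIGN(). */
static uint64_t ex_page_align_up(uint64_t x)
{
	return (x + EX_PAGE_SIZE - 1) & EX_PAGE_MASK;
}

/* Compute the BPF and module regions for a kernel image of image_size bytes. */
static struct ex_layout ex_compute_layout(uint64_t image_size)
{
	uint64_t start = EX_KERNEL_VIRT_ADDR;   /* stands in for &_start */
	uint64_t end = start + image_size;      /* stands in for &_end */
	struct ex_layout l;

	/* The image plus the BPF region must leave room for modules. */
	assert(ex_page_align_up(image_size) + EX_SZ_128M < EX_SZ_2G);

	l.bpf_start = ex_page_align_up(end);
	l.bpf_end = l.bpf_start + EX_SZ_128M;
	l.mod_start = l.bpf_end;
	l.mod_end = (start & EX_PAGE_MASK) + EX_SZ_2G;
	return l;
}
```

With these example values the module region ends exactly at EX_PAGE_OFFSET,
so everything handed out to BPF and modules stays within 2GB of the start of
the image.
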