Subject: Re: [PATCH] riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
From: Alex Ghiti
Date: Thu, 17 Jun 2021 09:23:04 +0200
To: Jisheng Zhang
Cc: Andreas Schwab, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Björn Töpel, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
 Luke Nelson, Xi Wang, linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
 netdev@vger.kernel.org, bpf@vger.kernel.org
Message-ID: <4cdb1261-6474-8ae6-7a92-a3be81ce8cb5@ghiti.fr>
In-Reply-To: <20210616080328.6548e762@xhacker>
X-Mailing-List: linux-kernel@vger.kernel.org

On 16/06/2021 at 02:03, Jisheng Zhang wrote:
> On Tue, 15 Jun 2021 20:54:19 +0200
> Alex Ghiti wrote:
>
>> Hi Jisheng,
>
> Hi Alex,
>
>>
>> On 14/06/2021 at 18:49, Jisheng Zhang wrote:
>>> From: Jisheng Zhang
>>>
>>> Andreas reported commit fc8504765ec5 ("riscv: bpf: Avoid breaking W^X")
>>> breaks booting with one kind of config file, I reproduced a kernel panic
>>> with the config:
>>>
>>> [ 0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
>>> [ 0.139159] Oops [#1]
>>> [ 0.139303] Modules linked in:
>>> [ 0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
>>> [ 0.139934] Hardware name: riscv-virtio,qemu (DT)
>>> [ 0.140193] epc : __memset+0xc4/0xfc
>>> [ 0.140416] ra : skb_flow_dissector_init+0x1e/0x82
>>> [ 0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
>>> [ 0.140878] gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
>>> [ 0.141156] t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
>>> [ 0.141424] s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
>>> [ 0.141654] a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
>>> [ 0.141893] a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
>>> [ 0.142126] s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
>>> [ 0.142353] s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
>>> [ 0.142584] s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
>>> [ 0.142810] s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
>>> [ 0.143042] t5 : 0000000000000155 t6 : 00000000000003ff
>>> [ 0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
>>> [ 0.143560] [] __memset+0xc4/0xfc
>>> [ 0.143859] [] init_default_flow_dissectors+0x22/0x60
>>> [ 0.144092] [] do_one_initcall+0x3e/0x168
>>> [ 0.144278] [] kernel_init_freeable+0x1c8/0x224
>>> [ 0.144479] [] kernel_init+0x12/0x110
>>> [ 0.144658] [] ret_from_exception+0x0/0xc
>>> [ 0.145124] ---[ end trace f1e9643daa46d591 ]---
>>>
>>> After some investigation, I think I found the root cause: commit
>>> 2bfc6cd81bd ("move kernel mapping outside of linear mapping") moves
>>> BPF JIT region after the kernel:
>>>
>>> The &_end is unlikely to be aligned with PMD size, so the front of the
>>> bpf jit region sits with part of kernel .data section in one PMD size
>>> mapping. But the kernel is mapped in PMD SIZE, so when
>>> bpf_jit_binary_lock_ro() is called to make the first bpf jit prog ROX,
>>> we will make part of kernel .data section RO too, so when we write to,
>>> for example memset, the .data section, MMU will trigger a store page
>>> fault.
>>
>> Good catch, we make sure no physical allocation happens between _end and
>> the next PMD aligned address, but I missed this one.
>>
>>>
>>> To fix the issue, we need to ensure the BPF JIT region is PMD size
>>> aligned. This patch achieves this goal by restoring the BPF JIT region
>>> to its original position, i.e. the 128MB before kernel .text section.
>>
>> But I disagree with your solution: I made sure modules and BPF programs
>> get their own virtual regions to avoid the worst case scenario where one
>> could allocate all the space and leave nothing to the other (we are
>> limited to +- 2GB offset). Why not just align BPF_JIT_REGION_START to
>> the next PMD aligned address?
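
The alignment problem described in the quoted commit message can be illustrated with a small userspace sketch. This is not kernel code: the SKETCH_* constants and sketch_* helpers are hypothetical names, and it assumes 4 KiB pages with a 2 MiB PMD mapping (as on Sv39). It only shows that a page-aligned address right after _end can share a 2 MiB mapping with kernel .data, while a PMD-aligned address cannot.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, one PMD entry maps 2 MiB (Sv39). */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << 12)
#define SKETCH_PMD_SIZE  (UINT64_C(1) << 21)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Round addr down to the previous multiple of align (a power of two). */
static uint64_t sketch_align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/*
 * Return 1 when both addresses are covered by the same PMD-sized mapping,
 * i.e. a permission change on one also hits the other.
 */
static int sketch_same_pmd(uint64_t a, uint64_t b)
{
    return sketch_align_down(a, SKETCH_PMD_SIZE) ==
           sketch_align_down(b, SKETCH_PMD_SIZE);
}
```

For example, with a hypothetical _end of 0xffffffff81234000, a page-aligned JIT start stays at 0xffffffff81234000 and shares a PMD with the faulting address 0xffffffff81201220 from the trace, whereas sketch_align_up(_end, SKETCH_PMD_SIZE) gives 0xffffffff81400000, which does not.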
>
> Originally, I planned to fix the issue by aligning BPF_JIT_REGION_START, but
> IIRC, BPF experts are adding (or have added) the "Calling kernel functions
> from BPF" feature, so there's a risk that the BPF JIT region is beyond the
> 2GB of the module region:
>
> ------
> module
> ------
> kernel
> ------
> BPF_JIT
>
> So I made this patch finally. In this patch, we let the BPF JIT region sit
> between module and kernel.
>

From what I read in the LWN article, I'm not sure BPF programs can call
module functions. Can someone tell us if it is possible? Or planned?

> To address "make sure modules and BPF programs get their own virtual regions",
> what about something as below (applied against this patch)?
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 380cd3a7e548..da1158f10b09 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -31,7 +31,7 @@
>  #define BPF_JIT_REGION_SIZE (SZ_128M)
>  #ifdef CONFIG_64BIT
>  #define BPF_JIT_REGION_START (BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
> -#define BPF_JIT_REGION_END (MODULES_END)
> +#define BPF_JIT_REGION_END (PFN_ALIGN((unsigned long)&_start))
>  #else
>  #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>  #define BPF_JIT_REGION_END (VMALLOC_END)
> @@ -40,7 +40,7 @@
>  /* Modules always live before the kernel */
>  #ifdef CONFIG_64BIT
>  #define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
> -#define MODULES_END (PFN_ALIGN((unsigned long)&_start))
> +#define MODULES_END (BPF_JIT_REGION_END)
>  #endif
>
>

In case it is possible, I would let the vmalloc allocator handle the case
where modules steal room from BPF: I would then not implement the above
but rather your first patch.

And do not forget to modify Documentation/riscv/vm-layout.rst accordingly
and remove the comment "/* KASLR should leave at least 128MB for BPF
after the kernel */"

Thanks,

Alex

>
>>
>> Again, good catch, thanks,
>>
>> Alex
>>
>>>
>>> Reported-by: Andreas Schwab
>>> Signed-off-by: Jisheng Zhang
>>> ---
>>>  arch/riscv/include/asm/pgtable.h | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index 9469f464e71a..380cd3a7e548 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -30,9 +30,8 @@
>>>
>>>  #define BPF_JIT_REGION_SIZE (SZ_128M)
>>>  #ifdef CONFIG_64BIT
>>> -/* KASLR should leave at least 128MB for BPF after the kernel */
>>> -#define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end)
>>> -#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
>>> +#define BPF_JIT_REGION_START (BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
>>> +#define BPF_JIT_REGION_END (MODULES_END)
>>>  #else
>>>  #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>>>  #define BPF_JIT_REGION_END (VMALLOC_END)
>>>
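
The 64-bit layout that results from the quoted first patch can be worked through in userspace as a rough sketch. The names below (struct sketch_layout, the sketch_* helpers, the SKETCH_* constants) are hypothetical, the macros are only mirrored from the diff rather than taken from a kernel build, and the kernel addresses used in the example are assumptions. It shows that the JIT window becomes the top 128MB of the module range, ends at the page-aligned _start, and stays within 2GB of _end.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages and a 2 MiB PMD mapping. */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << 12)
#define SKETCH_PMD_SIZE  (UINT64_C(1) << 21)
#define SKETCH_SZ_128M   (UINT64_C(128) << 20)
#define SKETCH_SZ_2G     (UINT64_C(2) << 30)

struct sketch_layout {
    uint64_t modules_vaddr;  /* mirrors MODULES_VADDR */
    uint64_t modules_end;    /* mirrors MODULES_END */
    uint64_t bpf_jit_start;  /* mirrors BPF_JIT_REGION_START */
    uint64_t bpf_jit_end;    /* mirrors BPF_JIT_REGION_END */
};

/* Round up to a page boundary, as PFN_ALIGN() does in the diff. */
static uint64_t sketch_pfn_align(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Layout from the first patch: the JIT region ends where modules end. */
static struct sketch_layout sketch_layout_before_kernel(uint64_t start,
                                                        uint64_t end)
{
    struct sketch_layout l;

    l.modules_vaddr = sketch_pfn_align(end) - SKETCH_SZ_2G;
    l.modules_end   = sketch_pfn_align(start);
    l.bpf_jit_end   = l.modules_end;
    l.bpf_jit_start = l.bpf_jit_end - SKETCH_SZ_128M;
    return l;
}

/* Return 1 when a region starting at region_start reaches kernel_end
 * with an offset of at most 2 GiB. */
static int sketch_within_2g(uint64_t region_start, uint64_t kernel_end)
{
    return kernel_end - region_start <= SKETCH_SZ_2G;
}
```

With an assumed _start of 0xffffffff80000000 and a hypothetical _end of 0xffffffff81234000, the JIT region is 0xffffffff78000000..0xffffffff80000000. Its start is PMD aligned only because the assumed _start is, and the whole window lies inside the module range, which is the sharing I was worried about above.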