From: Liao Chang
Subject: [PATCH] riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe
Date: Tue, 30 Mar 2021 16:18:48 +0800
Message-ID: <20210330081848.14043-1-liaochang1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

After a kprobe is installed at sys_read, executing sys_read ends up
hitting a BUG_ON() in __find_get_block. The BUG message looks like the
following:

[ 65.708663] ------------[ cut here ]------------
[ 65.709987] kernel BUG at fs/buffer.c:1251!
[ 65.711283] Kernel BUG [#1]
[ 65.712032] Modules linked in:
[ 65.712925] CPU: 0 PID: 51 Comm: sh Not tainted 5.12.0-rc4 #1
[ 65.714407] Hardware name: riscv-virtio,qemu (DT)
[ 65.715696] epc : __find_get_block+0x218/0x2c8
[ 65.716835] ra : __getblk_gfp+0x1c/0x4a
[ 65.717831] epc : ffffffe00019f11e ra : ffffffe00019f56a sp : ffffffe002437930
[ 65.719553] gp : ffffffe000f06030 tp : ffffffe0015abc00 t0 : ffffffe00191e038
[ 65.721290] t1 : ffffffe00191e038 t2 : 000000000000000a s0 : ffffffe002437960
[ 65.723051] s1 : ffffffe00160ad00 a0 : ffffffe00160ad00 a1 : 000000000000012a
[ 65.724772] a2 : 0000000000000400 a3 : 0000000000000008 a4 : 0000000000000040
[ 65.726545] a5 : 0000000000000000 a6 : ffffffe00191e000 a7 : 0000000000000000
[ 65.728308] s2 : 000000000000012a s3 : 0000000000000400 s4 : 0000000000000008
[ 65.730049] s5 : 000000000000006c s6 : ffffffe00240f800 s7 : ffffffe000f080a8
[ 65.731802] s8 : 0000000000000001 s9 : 000000000000012a s10: 0000000000000008
[ 65.733516] s11: 0000000000000008 t3 : 00000000000003ff t4 : 000000000000000f
[ 65.734434] t5 : 00000000000003ff t6 : 0000000000040000
[ 65.734613] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[ 65.734901] Call Trace:
[ 65.735076] [] __find_get_block+0x218/0x2c8
[ 65.735417] [] __ext4_get_inode_loc+0xb2/0x2f6
[ 65.735618] [] ext4_get_inode_loc+0x3a/0x8a
[ 65.735802] [] ext4_reserve_inode_write+0x2e/0x8c
[ 65.735999] [] __ext4_mark_inode_dirty+0x4c/0x18e
[ 65.736208] [] ext4_dirty_inode+0x46/0x66
[ 65.736387] [] __mark_inode_dirty+0x12c/0x3da
[ 65.736576] [] touch_atime+0x146/0x150
[ 65.736748] [] filemap_read+0x234/0x246
[ 65.736920] [] generic_file_read_iter+0xc0/0x114
[ 65.737114] [] ext4_file_read_iter+0x42/0xea
[ 65.737310] [] new_sync_read+0xe2/0x15a
[ 65.737483] [] vfs_read+0xca/0xf2
[ 65.737641] [] ksys_read+0x5e/0xc8
[ 65.737816] [] sys_read+0xe/0x16
[ 65.737973] [] ret_from_syscall+0x0/0x2
[ 65.738858] ---[ end trace fe93f985456c935d ]---

A simple reproducer looks
like:

  echo 'p:myprobe sys_read fd=%a0 buf=%a1 count=%a2' > /sys/kernel/debug/tracing/kprobe_events
  echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
  cat /sys/kernel/debug/tracing/trace

Here is how the BUG_ON() is hit:

1) After a kprobe is installed at the entry of sys_read, the first
   instruction is replaced by an 'ebreak' instruction on the riscv64
   platform.

2) When the kernel reaches the 'ebreak' instruction at the entry of
   sys_read, it traps into the riscv breakpoint handler, which prepares
   to single-step the original instruction. This includes backing up
   'sstatus' in pt_regs and then disabling interrupts during the single
   step by clearing the 'SIE' bit of 'sstatus' in pt_regs.

3) The kernel then returns to the instruction slot, which contains two
   instructions: the original instruction from the entry of sys_read,
   followed by an 'ebreak'. This triggers an 'Instruction page fault'
   exception ('scause' is 0xc) if the page for that slot is not yet
   present in the page table.

4) The kernel traps again, this time into the page fault exception
   handler, which chooses a policy according to the state of the
   running kprobe. Because the state after 2) is KPROBE_HIT_SS, the
   kernel resets the current kprobe and points 'pc' back to the probe
   address.

5) Because 'epc' points back to the 'ebreak' instruction at the sys_read
   probe, the kernel traps into the breakpoint handler again and repeats
   the operations in 2). However, the 'sstatus' with 'SIE' cleared was
   kept in 4), so the real 'sstatus' saved in 2) is overwritten by the
   one without 'SIE'.

6) When the kernel passes the probe, the 'sstatus' CSR is restored with
   the value without 'SIE', and execution reaches __find_get_block,
   which requires interrupts to be enabled.

The fix is simple: when the instruction being single-stepped causes a
page fault, restore the value of 'sstatus' in pt_regs from the backup
taken in 2).
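The sequence in 2) through 6) can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
code: every model_* name and MODEL_* constant is hypothetical, a single
bit stands in for the interrupt-enable state held in 'sstatus', and the
page fault in 3) is simply assumed to happen once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
#define MODEL_SR_IE 0x2u /* models the interrupt-enable bit of 'sstatus' */

enum model_state { MODEL_IDLE, MODEL_HIT_SS };

struct model_regs { uint64_t status; };
struct model_kcb { uint64_t saved_status; enum model_state state; };

/* Step 2): back up the status, then mask interrupts for the single step. */
static void model_breakpoint(struct model_kcb *kcb, struct model_regs *regs)
{
	kcb->saved_status = regs->status;
	regs->status &= ~(uint64_t)MODEL_SR_IE;
	kcb->state = MODEL_HIT_SS;
}

/* Step 4): a fault during the single step resets the current probe.
 * With 'restore' set, the backed-up status is put back first (the fix). */
static void model_fault(struct model_kcb *kcb, struct model_regs *regs,
			bool restore)
{
	if (kcb->state != MODEL_HIT_SS)
		return;
	if (restore)
		regs->status = kcb->saved_status;
	kcb->state = MODEL_IDLE;
}

/* Step 6): once the single step completes, the backup is written back. */
static void model_post_step(struct model_kcb *kcb, struct model_regs *regs)
{
	regs->status = kcb->saved_status;
	kcb->state = MODEL_IDLE;
}

/* Run steps 2) to 6) once; report whether interrupts end up enabled. */
bool model_irqs_enabled_after_probe(bool restore_on_fault)
{
	struct model_regs regs = { .status = MODEL_SR_IE };
	struct model_kcb kcb = { .saved_status = 0, .state = MODEL_IDLE };

	model_breakpoint(&kcb, &regs);              /* 2) first hit */
	model_fault(&kcb, &regs, restore_on_fault); /* 3) and 4) */
	model_breakpoint(&kcb, &regs);              /* 5) second hit */
	model_post_step(&kcb, &regs);               /* 6) */
	return (regs.status & MODEL_SR_IE) != 0;
}

/* Usage: without the restore the enable bit is lost; with it, it survives. */
void model_selftest(void)
{
	assert(!model_irqs_enabled_after_probe(false));
	assert(model_irqs_enabled_after_probe(true));
}
```

In the model, the second breakpoint hit backs up whatever status the
fault path left behind, so restoring the backup in the fault path is
what keeps the enable bit alive across the retry.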

Fixes: c22b0bcb1dd02 ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang
---
 arch/riscv/kernel/probes/kprobes.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 7e2c78e2ca6b..d71f7c49a721 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -260,8 +260,10 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr)
 
 		if (kcb->kprobe_status == KPROBE_REENTER)
 			restore_previous_kprobe(kcb);
-		else
+		else {
+			kprobes_restore_local_irqflag(kcb, regs);
 			reset_current_kprobe();
+		}
 
 		break;
 	case KPROBE_HIT_ACTIVE:
-- 
2.17.1