From: Dmitry Vyukov
Date: Sat, 13 Mar 2021 08:20:57 +0100
Subject: Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
To: Ben Dooks
Cc: Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar.eggemann@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
List: linux-kernel@vger.kernel.org

On Fri, Mar 12, 2021 at 9:12 PM Ben Dooks wrote:
>
> On 12/03/2021 16:25, Alex Ghiti wrote:
> >
> > On 3/12/21 at 10:12 AM, Dmitry Vyukov wrote:
> >> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks wrote:
> >>>
> >>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
> >>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> syzbot found the following issue on:
> >>>>>
> >>>>> HEAD commit:
> >>>>> 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
> >>>>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> >>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
> >>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> >>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> >>>>> userspace arch: riscv64
> >>>>>
> >>>>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>>>
> >>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>
> >>>> +riscv maintainers
> >>>>
> >>>> This is riscv64-specific.
> >>>> I've seen similar crashes in put_user in other places. It looks like
> >>>> put_user crashes when the user address is not mapped/protected (?).
> >>>
> >>> I've been having a look, and this seems to be down to the access of the
> >>> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
> >>> bad address to clone?
> >>>
> >>> From looking at the code, the put_user() code should have set the
> >>> relevant SR_SUM bit (the value for this, which is 1<<18, is in the
> >>> s2 register in the crash report) and from looking at the compiler
> >>> output from my gcc-10, the code looks to be doing the relevant csrs
> >>> and then csrc around the put_user.
> >>>
> >>> So currently I do not understand how the above could have happened,
> >>> other than something re-tried the code sequence and ended up retrying
> >>> the faulting instruction without the SR_SUM bit set.
> >>
> >> I would maybe blame qemu for randomly resetting SR_SUM, but it's
> >> strange that 99% of these crashes are in schedule_tail. If it were
> >> qemu, they would be more evenly distributed...
> >>
> >> Another observation: looking at a dozen of these crash logs, in none of
> >> the cases was the fuzzer actually trying to fuzz clone with some insane
> >> arguments. So it looks like completely normal clones (e.g. coming
> >> from pthread_create) result in this crash.
> >>
> >> I also wonder why there is ret_from_exception, is it normal? I see
> >> handle_exception disables SR_SUM:
> >
> > csrrc does the right thing: it clears the SR_SUM bit in status but saves the
> > previous value, which will get correctly restored.
> >
> > ("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
> > value of the CSR, zero-extends the value to XLEN bits, and writes it to
> > integer register rd. The initial value in integer register rs1 is treated
> > as a bit mask that specifies bit positions to be cleared in the CSR. Any
> > bit that is high in rs1 will cause the corresponding bit to be cleared in
> > the CSR, if that CSR bit is writable. Other bits in the CSR are
> > unaffected.")
>
> I think there may also be an understanding issue on what the SR_SUM
> bit does. I thought that if it is set, M->U accesses would fault, which is
> why it gets set early on. But from reading the uaccess code it looks
> like the uaccess code sets it on entry and then clears it on exit.
>
> I am very confused. Is there a master reference for rv64?
>
> https://people.eecs.berkeley.edu/~krste/papers/riscv-privileged-v1.9.pdf
> seems to state PUM is the SR_SUM bit, and that (if set) it disables
> S-mode access to user-accessible pages.
>
> Quote:
> The PUM (Protect User Memory) bit modifies the privilege with which
> S-mode loads, stores, and instruction fetches access virtual memory.
> When PUM=0, translation and protection behave as normal. When PUM=1,
> S-mode memory accesses to pages that are accessible by U-mode (U=1 in
> Figure 4.19) will fault.
> PUM has no effect when executing in U-mode.
>
> >> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
> >>
> >
> > Still no luck for the moment, I can't reproduce it locally; my test is
> > maybe not that good (I created threads all day long in order to trigger
> > the put_user of schedule_tail).
>
> It may of course depend on memory and other stuff. I did try to see if
> it was possible to clone() with the child_tid address being a valid but
> not mapped page...
>
> > Given that the path you mention works most of the time, and that the
> > status register in the stack trace shows the SUM bit is not set whereas
> > it is set in put_user, I'm leaning toward some race condition (maybe an
> > interrupt that arrives at the "wrong" time) or a qemu issue as you
> > mentioned.
>
> I suppose this is possible. From what I read it should get to the
> point of being there with the SUM flag cleared, so either something
> went wrong in trying to fix the instruction up or there's some other
> error we're missing.
>
> > To eliminate qemu issues, do you have access to some HW? Or to
> > different qemu versions?
>
> I do have access to a Microchip Polarfire board. I just need the
> instructions on how to set up the test code to make it work on the
> hardware.

For full syzkaller support, it would need to know how to reboot these
boards and get access to the console. syzkaller has a stop-gap VM backend
which just uses ssh to a physical machine and expects the kernel to
reboot on its own after any crashes.

But I actually managed to reproduce it in an even simpler setup.
Assuming you have Go 1.15 and a riscv64 cross-compiler gcc installed:

$ go get -u -d github.com/google/syzkaller/...
$ cd $GOPATH/src/github.com/google/syzkaller
$ make stress executor TARGETARCH=riscv64
$ scp bin/linux_riscv64/syz-execprog bin/linux_riscv64/syz-executor your_machine:/

Then run ./syz-stress on the machine.
On the first run it crashed with some other bug; on the second run I got
the crash in schedule_tail.

With qemu tcg I also added the -slowdown=10 flag to syz-stress to scale
all timeouts; if native execution is faster, you don't need it.