Subject: Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
From: Ben Dooks
To: Dmitry Vyukov
Cc: syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar.eggemann@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
References: <000000000000b74f1b05bd316729@google.com> <84b0471d-42c1-175f-ae1d-a18c310c7f77@codethink.co.uk> <816870e9-9354-ffbd-936b-40e38e4276a4@codethink.co.uk>
Organization: Codethink Limited.
Message-ID: <4ce57c7e-6e5d-d136-0a81-395a4207ba44@codethink.co.uk>
Date: Fri, 12 Mar 2021 16:36:50 +0000
In-Reply-To: <816870e9-9354-ffbd-936b-40e38e4276a4@codethink.co.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/03/2021 16:34, Ben Dooks wrote:
> On 12/03/2021 16:30, Ben Dooks wrote:
>> On 12/03/2021 15:12, Dmitry Vyukov wrote:
>>> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks wrote:
>>>>
>>>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
>>>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit:    0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>>>>>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>>>>>> userspace arch: riscv64
>>>>>>
>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to
>>>>>> the commit:
>>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>
>>>>> +riscv maintainers
>>>>>
>>>>> This is riscv64-specific.
>>>>> I've seen similar crashes in put_user in other places. It looks like
>>>>> put_user crashes if the user address is not mapped/protected (?).
>>>>
>>>> I've been having a look, and this seems to be down to access of the
>>>> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
>>>> bad address to clone?
>>>>
>>>> From looking at the code, the put_user() code should have set the
>>>> relevant SR_SUM bit (the value for this, which is 1<<18, is in the
>>>> s2 register in the crash report) and from looking at the compiler
>>>> output from my gcc-10, the code looks to be doing the relevant csrs
>>>> and then csrc around the put_user.
>>>>
>>>> So currently I do not understand how the above could have happened
>>>> other than something re-tried the code sequence and ended up retrying
>>>> the faulting instruction without the SR_SUM bit set.
>>>
>>> I would maybe blame qemu for randomly resetting SR_SUM, but it's
>>> strange that 99% of these crashes are in schedule_tail. If it were
>>> qemu, then they would be more evenly distributed...
>>>
>>> Another observation: looking at a dozen crash logs, in none of
>>> these cases was the fuzzer actually trying to fuzz clone with some
>>> insane arguments. So it looks like completely normal clones (e.g.
>>> coming from pthread_create) result in this crash.
>>>
>>> I also wonder why there is ret_from_exception, is it normal? I see
>>> handle_exception disables SR_SUM:
>>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
>>>
>>
>> So I think if SR_SUM is set, then it faults the access to user memory
>> which the _user() routines clear to allow them access.
>>
>> I'm thinking there is at least one issue here:
>>
>> - the test in fault is the wrong way around for die kernel
>> - the handler only catches this if the page has yet to be mapped.
>>
>> So I think the test should be:
>>
>>          if (!user_mode(regs) && addr < TASK_SIZE &&
>>                          unlikely(regs->status & SR_SUM))
>>
>> This then should continue on and allow the rest of the handler to
>> complete mapping the page if it is not there.
>>
>> I have been trying to create a very simple clone test, but so far it
>> has yet to actually trigger anything.
>
> I should have added there doesn't seem to be a good way to use mmap()
> to allocate memory but not insert a vm-mapping post the mmap().
>

How difficult is it to try building a branch with the above test modified?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html
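
For reference, a minimal sketch of the kind of "very simple clone test" discussed in the thread. It is not the test used by anyone in this thread, and it is not known to reproduce the crash. It assumes that an anonymous private mapping that has never been touched has no page present until first access, so that the kernel's store to set_child_tid in schedule_tail() has to fault the page in. The names run_clone_settid_test, child_fn and CHILD_STACK_SIZE are hypothetical; clone() is the glibc wrapper with the CLONE_CHILD_SETTID flag.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)	/* arbitrary size for this sketch */

/* Runs in the child; by now the kernel should have stored our TID. */
static int child_fn(void *arg)
{
	volatile pid_t *tid_slot = arg;

	/* Without CLONE_THREAD the child's TID equals its PID. */
	return (*tid_slot == getpid()) ? 0 : 1;
}

/*
 * Returns 0 if the child saw its own TID in the slot, 1 if it did not,
 * and -1 if the test could not be set up.
 */
int run_clone_settid_test(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int result = -1;
	int status;

	/* Fresh anonymous mapping that the parent never reads or writes. */
	pid_t *tid_slot = mmap(NULL, page, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (tid_slot == MAP_FAILED)
		return -1;

	char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		munmap(tid_slot, page);
		return -1;
	}

	/*
	 * No CLONE_VM: the child gets its own copy of the address space,
	 * and the kernel writes the TID into the child's view of tid_slot.
	 * The stack grows down, so pass the top of the mapping.
	 */
	pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
			  CLONE_CHILD_SETTID | SIGCHLD, tid_slot,
			  NULL, NULL, tid_slot);

	if (pid > 0 && waitpid(pid, &status, 0) == pid)
		result = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;

	munmap(stack, CHILD_STACK_SIZE);
	munmap(tid_slot, page);
	return result;
}
```

A main() that simply returns run_clone_settid_test() is enough to run it; looping the call many times may be more useful when hunting an intermittent fault. The glibc wrapper is used rather than a raw syscall because the raw clone argument order differs between architectures.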