Date: Wed, 6 Mar 2019 16:12:01 +0800
From: Peter Xu
To: zhong jiang
Cc: Mike Rapoport, Andrea Arcangeli, Dmitry Vyukov, syzbot, Michal Hocko,
 cgroups@vger.kernel.org, Johannes Weiner, LKML, Linux-MM, syzkaller-bugs,
 Vladimir Davydov, David Rientjes, Hugh Dickins, Matthew Wilcox, Mel Gorman,
 Vlastimil Babka, Mike Rapoport
Subject: Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm
Message-ID: <20190306081201.GC11093@xz-x1>
References: <5C7D2F82.40907@huawei.com> <5C7D4500.3070607@huawei.com>
 <5C7E1A38.2060906@huawei.com> <20190306020540.GA23850@redhat.com>
 <5C7F6048.2050802@huawei.com>
 <20190306062625.GA3549@rapoport-lnx> <5C7F7992.7050806@huawei.com>
In-Reply-To: <5C7F7992.7050806@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 06, 2019 at 03:41:06PM +0800, zhong jiang wrote:
> On 2019/3/6 14:26, Mike Rapoport wrote:
> > Hi,
> >
> > On Wed, Mar 06, 2019 at 01:53:12PM +0800, zhong jiang wrote:
> >> On 2019/3/6 10:05, Andrea Arcangeli wrote:
> >>> Hello everyone,
> >>>
> >>> [ CC'ed Mike and Peter ]
> >>>
> >>> On Tue, Mar 05, 2019 at 02:42:00PM +0800, zhong jiang wrote:
> >>>> On 2019/3/5 14:26, Dmitry Vyukov wrote:
> >>>>> On Mon, Mar 4, 2019 at 4:32 PM zhong jiang wrote:
> >>>>>> On 2019/3/4 22:11, Dmitry Vyukov wrote:
> >>>>>>> On Mon, Mar 4, 2019 at 3:00 PM zhong jiang wrote:
> >>>>>>>> On 2019/3/4 15:40, Dmitry Vyukov wrote:
> >>>>>>>>> On Sun, Mar 3, 2019 at 5:19 PM zhong jiang wrote:
> >>>>>>>>>> Hi, guys
> >>>>>>>>>>
> >>>>>>>>>> I also hit the following issue, but I failed to reproduce it from the log.
> >>>>>>>>>>
> >>>>>>>>>> It seems to be the case that we access mm->owner, and dereferencing it results in the UAF.
> >>>>>>>>>> But it should not be possible to set an incomplete process as the mm->owner.
> >>>>>>>>>>
> >>>>>>>>>> Any thoughts?
> >>>>>>>>> FWIW syzbot was able to reproduce this with this reproducer.
> >>>>>>>>> This looks like a very subtle race (threaded reproducer that runs
> >>>>>>>>> repeatedly in multiple processes), so most likely we are looking for
> >>>>>>>>> something like a few-instruction inconsistency window.
> >>>>>>>>>
> >>>>>>>> I am a little doubtful about the instruction inconsistency window.
> >>>>>>>>
> >>>>>>>> I guess that you mean some smp barriers should be taken into account. :-)
> >>>>>>>>
> >>>>>>>> Because IMO, it should not be a locking case that results in the issue.
> >>>>>>> Since the crash was triggered on x86, _most likely_ this is not a
> >>>>>>> missed barrier. What I meant is that one thread needs to execute some
> >>>>>>> code while another thread is stopped within a few instructions.
> >>>>>>>
> >>>>>>>
> >>>>>> It is weird, and I cannot see how what you said relates to the issue. :-(
> >>>>>>
> >>>>>> Because the cause is that mm->owner has been freed while we still dereference it.
> >>>>>>
> >>>>>> From the latest freed-task call trace, it failed to create the process.
> >>>>>>
> >>>>>> Am I missing something, or did I misunderstand your meaning? Please correct me.
> >>>>> Your analysis looks correct. I am just saying that the root cause of
> >>>>> this use-after-free seems to be a race condition.
> >>>>>
> >>>>>
> >>>>>
> >>>> Yep, indeed, I cannot figure out how the race works. I will dig further.
> >>> Yes it's a race condition.
> >>>
> >>> We were aware of the non-cooperative fork userfaultfd feature
> >>> creating userfaultfd file descriptors that get reported to the parent
> >>> uffd, even though they belong to an mm created by a failed fork.
> >>>
> >>> https://www.spinics.net/lists/linux-mm/msg136357.html
> >>>
> >> Hi, Andrea
> >>
> >> I am still not clear why the uffd ioctl can use the incomplete process as the mm->owner,
> >> and how to produce the race.
> > There is a C reproducer in the syzkaller report:
> >
> > https://syzkaller.appspot.com/x/repro.c?x=172fa5a3400000
> >
> >> From your explanations above, my understanding is that the process handling do_execve
> >> will have a temporary mm, which will be used by the UFFD ioctl.
> > The race is between userfaultfd operation and fork() failure:
> >
> > forking thread                  | userfaultfd monitor thread
> > --------------------------------+-------------------------------
> > fork()                          |
> >   dup_mmap()                    |
> >   dup_userfaultfd()             |
> >   dup_userfaultfd_complete()    |
> >                                 | read(UFFD_EVENT_FORK)
> >                                 | uffdio_copy()
> >                                 | mmget_not_zero()
> >   goto bad_fork_something       |
> >   ...                           |
> > bad_fork_free:                  |
> >   free_task()                   |
> >                                 | mem_cgroup_from_task()
> >                                 | /* access stale mm->owner */
> >
> Hi, Mike

Hi, Zhong,

>
> The forking thread fails to create the process, and then frees the allocated task struct.
> Other userfaultfd monitor threads should not access the stale mm->owner.
>
> The parent process and child process do not share the mm struct. The userfaultfd monitor thread's
> mm->owner should not point to the freed child task_struct.

IIUC the problem is that the mm above (the one whose mm->owner is
accessed) is the child process's mm rather than the uffd monitor's.
When dup_userfaultfd_complete() is called, a new userfaultfd context
that is linked to the child process's mm is sent to the uffd monitor
thread, and if the monitor thread does UFFDIO_COPY on the newly
received userfaultfd, it'll operate on that new mm too.

>
> And due to the existence of tasklist_lock, we cannot set mm->owner to a freed task_struct.
>
> Am I missing something? =-O
>
> Thanks,
> zhong jiang
> >> Thanks,
> >> zhong jiang
> >

Regards,

-- 
Peter Xu
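
The interleaving in the diagram above can be modelled in user space. What follows is only a minimal Python sketch, not kernel code: the classes Task, MM and UffdCtx and the function run_race() are made-up names for illustration, the refcount is a plain integer, and the steps simply run one after another in the order the diagram shows. It is meant to show why the mm reached through the new userfaultfd context is the child's, and why its owner can already be freed.

```python
class Task:
    """Hypothetical stand-in for a task_struct."""
    def __init__(self, name):
        self.name = name
        self.freed = False      # set to True where free_task() would run


class MM:
    """Hypothetical stand-in for an mm_struct."""
    def __init__(self, owner):
        self.owner = owner      # models mm->owner
        self.users = 1          # models the mm user count (simplified)


class UffdCtx:
    """Hypothetical stand-in for a userfaultfd context bound to one mm."""
    def __init__(self, mm):
        self.mm = mm


def run_race():
    # The monitor has its own task and mm; they are never touched below.
    monitor_mm = MM(owner=Task("monitor"))

    # Forking thread: the child gets its own task, mm and uffd context.
    child = Task("child")
    child_mm = MM(owner=child)
    # dup_userfaultfd_complete(): the new context is queued for the monitor.
    monitor_queue = [UffdCtx(child_mm)]

    # Monitor thread: read(UFFD_EVENT_FORK), then start a copy on the new fd.
    ctx = monitor_queue.pop(0)
    ctx.mm.users += 1           # mmget_not_zero(): the mm itself stays alive

    # Forking thread: fork fails afterwards and the child task is freed.
    child.freed = True          # free_task()

    # Monitor thread: looks up the cgroup through mm->owner.
    owner = ctx.mm.owner
    return ctx.mm is child_mm, ctx.mm is monitor_mm, owner.freed
```

Calling run_race() returns (True, False, True): the context points at the child's mm, not the monitor's, and the owner of that mm has already been freed by the time it is read.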