Date: Mon, 5 Apr 2021 16:18:58 +0000
From: Al Viro
To: Christian Brauner
Cc: Jens Axboe, syzbot, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	io-uring@vger.kernel.org
Subject: Re: [syzbot] WARNING in mntput_no_expire (2)
Message-ID:
References: <20210404113445.xo6ntgfpxigcb3x6@wittgenstein>
	<20210404164040.vtxdcfzgliuzghwk@wittgenstein>
	<20210404170513.mfl5liccdaxjnpls@wittgenstein>
	<20210405114437.hjcojekyp5zt6huu@wittgenstein>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20210405114437.hjcojekyp5zt6huu@wittgenstein>
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote:
> On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro
wrote:
> > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote:
> >
> > > > Yeah, I have at least namei.o
> > > >
> > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing
> > >
> > > *grumble*
> > >
> > > Is it reproducible without KASAN? Would be much easier to follow the produced
> > > asm...
> >
> > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e.
> > nd->inode == NULL.
>
> Yeah, I already saw that.
>
> >
> > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in
> > link_path_walk() and trying to reproduce that oops?
>
> Yep, no problem. If you run the reproducer in a loop for a little while
> you eventually trigger the BUG_ON() and then you get the following splat
> (and then an endless loop) in [1] with nd->inode NULL.
>
> _But_ I managed to debug this further and was able to trigger the BUG_ON()
> directly in path_init() in the AT_FDCWD branch (after all, it's AT_FDCWD(./file0))
> with the patch in [3] (it's in LOOKUP_RCU); the corresponding splat is in [2].
> So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the
> PF_IO_WORKER's pwd (the PF_IO_WORKER seems to be in async context).

So we find current->fs->pwd.dentry negative, with current->fs->seq
sampled equal before and after that? Lovely...
The only places where we assign anything to ->pwd.dentry are

void set_fs_pwd(struct fs_struct *fs, const struct path *path)
{
	struct path old_pwd;

	path_get(path);
	spin_lock(&fs->lock);
	write_seqcount_begin(&fs->seq);
	old_pwd = fs->pwd;
	fs->pwd = *path;
	write_seqcount_end(&fs->seq);
	spin_unlock(&fs->lock);

	if (old_pwd.dentry)
		path_put(&old_pwd);
}

where we have ->seq bumped between dget new/assignment/dput old,

copy_fs_struct() where we have

	spin_lock(&old->lock);
	fs->root = old->root;
	path_get(&fs->root);
	fs->pwd = old->pwd;
	path_get(&fs->pwd);
	spin_unlock(&old->lock);

fs being freshly allocated instance that couldn't have been observed
by anyone, and chroot_fs_refs(), where we have

		spin_lock(&fs->lock);
		write_seqcount_begin(&fs->seq);
		hits += replace_path(&fs->root, old_root, new_root);
		hits += replace_path(&fs->pwd, old_root, new_root);
		write_seqcount_end(&fs->seq);
		while (hits--) {
			count++;
			path_get(new_root);
		}
		spin_unlock(&fs->lock);
...
static inline int replace_path(struct path *p,
			       const struct path *old,
			       const struct path *new)
{
	if (likely(p->dentry != old->dentry || p->mnt != old->mnt))
		return 0;
	*p = *new;
	return 1;
}

Here we have new_root->dentry pinned from the very beginning, and
assignments are wrapped into bumps of ->seq. Moreover, we are holding
->lock through that sequence (as all writers do), so these references
can't be dropped before path_get() bumps new_root->dentry refcount.

chroot_fs_refs() is called only by pivot_root(2):

	chroot_fs_refs(&root, &new);

and there new is set by

	error = user_path_at(AT_FDCWD, new_root,
			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
	if (error)
		goto out0;

which pins new.dentry *and* verifies that it's positive and a directory,
at that. Since a pinned positive dentry can't be made negative by anybody
else, we know it will remain in that state until

	path_put(&new);

well downstream of chroot_fs_refs(). In copy_fs_struct() we are copying
someone's ->pwd, so it's also pinned positive.
And it won't be dropped outside of old->lock, so by the time somebody
manages to drop the reference in old, path_get() effects will be visible
(old->lock serving as a barrier).

That leaves the set_fs_pwd() calls:

fs/init.c:54:			set_fs_pwd(current->fs, &path);
	init_chdir(), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.

fs/namespace.c:4207:		set_fs_pwd(current->fs, &root);
	init_mount_tree(), root.dentry being ->mnt_root of rootfs.
	Pinned positive (and it would've oopsed much earlier had that been it).

fs/namespace.c:4485:		set_fs_pwd(fs, &root);
	mntns_install(), root filled by successful LOOKUP_DOWN for "/"
	from mnt_ns->root. Should be pinned positive.

fs/open.c:501:			set_fs_pwd(current->fs, &path);
	chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.

fs/open.c:528:			set_fs_pwd(current->fs, &f.file->f_path);
	fchdir(2), file->f_path of any opened file. Pinned positive.

kernel/usermode_driver.c:130:	set_fs_pwd(current->fs, &umd_info->wd);
	umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be
	pinned positive.

kernel/nsproxy.c:509:		set_fs_pwd(me->fs, &nsset->fs->pwd);
	commit_nsset(). Let's see what's going on there...

	if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) {
		set_fs_root(me->fs, &nsset->fs->root);
		set_fs_pwd(me->fs, &nsset->fs->pwd);
	}

In those conditions nsset.fs has come from copy_fs_struct() done in
prepare_nsset(). And the only thing that might've been done to it would
be those set_fs_pwd() in mntns_install().

(I'm not fond of the entire nsset->fs thing - looks like papering over
bad calling conventions, but anyway.)

Now, I might've missed some insanity (direct assignments to ->pwd.dentry,
etc. - wouldn't be the first time io_uring folks went "layering? wassat?
we'll just poke in whatever we can reach"), but I don't see anything
obvious of that sort in the area...
OK, how about this: in path_init(), right after

	do {
		seq = read_seqcount_begin(&fs->seq);
		nd->path = fs->pwd;
		nd->inode = nd->path.dentry->d_inode;
		nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
	} while (read_seqcount_retry(&fs->seq, seq));

slap

	if (!nd->inode) { // should never, ever happen
		struct dentry *fucked = nd->path.dentry;
		printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked,
		       d_count(fucked), fucked->d_flags,
		       fs, fs->users, seq);
		BUG_ON(1);
		return ERR_PTR(-EINVAL);
	}

and see what it catches?