Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp3085268ybx; Sun, 3 Nov 2019 10:22:18 -0800 (PST) X-Google-Smtp-Source: APXvYqx2jXVE3lnlDo8f6WopCkdl1XBwz/lfja6p5JRLngvbij73bz8YMGra9lGCCr2cDX2WqXNV X-Received: by 2002:a50:b83d:: with SMTP id j58mr24874543ede.84.1572805338183; Sun, 03 Nov 2019 10:22:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1572805338; cv=none; d=google.com; s=arc-20160816; b=UKosmCTGOfavBtdMZjefe5UcZIbP2mpfT4HRlmQXIYepGshrUbMPwnzgCfqxXVlIuR IcG6LAG7RtSt32dznp768UVZDK51bFlojg5lawO96hXeBFSucrFhsXGdrGpBFXUnB7Pl /93Wv/vEHbvzc9oQ4NdI1v4b6atFdo+Grk7aiNThATtkMpBAnib+MesDJwf98Frenk5Z 0nyZUWgNESjAZJVJvuteYom2xzIbR/OyFSdpmBxaM7dTI+PnPI6PSU/bacW2XF0pmQJV mSswOf3hTP/Q1MBA7tvuvz/5OXnaYpP0NTeq3l2yrfgQGuretSce7HWPSqS2JblOTyCk jYyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=rQcShPWuZuZT/NghcW2oBgowY4NPrKLWn6N88Zw6I6s=; b=lwXxSzSApVxztMHlIEiBhZeSMq3+NP63lSvIE0PtyDZ7u2gBo9StxTzKp/EupeA54O 3hc28h+RPUkaU/3yVp09ODLDTlJ/v2Mqu6Sy8oEjMt9rVHYWVBzmF96aZOUXcxYdfrVr ZLoVJThsDjwp+gF9FvHo9kZlx32U15F1gQQb9hGMosARhbcCi4K2Ku5ejZrnxl1M7G5u eSs/ZwQu2W9TOMsdhGNKokwo4GtWWMxIqwmkgAfBhFl+9DhAmUZI39BZ3/pWUzWyIq6e XyGNN46KrSZvLW68a1JsQjrHUe0y2i6AcGEcegpvO9oironZEBdP0Mb0poLDEpXe7AS0 Cdng== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k51si5051944edb.411.2019.11.03.10.21.54; Sun, 03 Nov 2019 10:22:18 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728014AbfKCSVB (ORCPT + 99 others); Sun, 3 Nov 2019 13:21:01 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:40442 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727322AbfKCSVB (ORCPT ); Sun, 3 Nov 2019 13:21:01 -0500 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1iRKUk-0002oj-47; Sun, 03 Nov 2019 18:20:58 +0000 Date: Sun, 3 Nov 2019 18:20:58 +0000 From: Al Viro To: linux-fsdevel@vger.kernel.org Cc: Ritesh Harjani , linux-kernel@vger.kernel.org, wugyuan@cn.ibm.com, jlayton@kernel.org, hsiangkao@aol.com, Jan Kara , Linus Torvalds , ecryptfs@vger.kernel.org Subject: Re: [RFC] lookup_one_len_unlocked() lousy calling conventions Message-ID: <20191103182058.GQ26530@ZenIV.linux.org.uk> References: <20190927044243.18856-1-riteshh@linux.ibm.com> <20191015040730.6A84742047@d06av24.portsmouth.uk.ibm.com> <20191022133855.B1B4752050@d06av21.portsmouth.uk.ibm.com> <20191022143736.GX26530@ZenIV.linux.org.uk> <20191022201131.GZ26530@ZenIV.linux.org.uk> <20191023110551.D04AE4C044@d06av22.portsmouth.uk.ibm.com> <20191101234622.GM26530@ZenIV.linux.org.uk> <20191102172229.GT20975@paulmck-ThinkPad-P72> <20191102180842.GN26530@ZenIV.linux.org.uk> <20191103163524.GO26530@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191103163524.GO26530@ZenIV.linux.org.uk> User-Agent: Mutt/1.12.1 (2019-06-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 03, 2019 at 04:35:24PM +0000, Al Viro wrote: > lookup_one_len_unlocked() calling conventions are wrong for its callers. > Namely, 11 out of 12 callers really want ERR_PTR(-ENOENT) on negatives. > Most of them take care to check, some rely upon that being impossible in > their case. Interactions with dentry turning positive right after > lookup_one_len_unlocked() has returned it are of varying bugginess... > > The only exception is ecryptfs, where we do lookup in the underlying fs > on ecryptfs_lookup() and want to retain a negative dentry if we get one. Looking at that code... the thing that deals with the result of lookup in underlying fs is ecryptfs_lookup_interpose(), and there we have this: struct inode *inode, *lower_inode = d_inode(lower_dentry); ... dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL); ... if (d_really_is_negative(lower_dentry)) { /* We want to add because we couldn't find in lower */ d_add(dentry, NULL); return NULL; } inode = __ecryptfs_get_inode(lower_inode, dentry->d_sb); If lower dentry used to be negative, but went positive while we'd been doing allocation, we'll get d_really_is_negative() (i.e. !lower_dentry->d_inode) false, but lower_inode (fetched earlier) still NULL. __ecryptfs_get_inode() starts with if (lower_inode->i_sb != ecryptfs_superblock_to_lower(sb)) return ERR_PTR(-EXDEV); which won't be happy in that situation... That has nothing to do with barriers, ->d_flags, etc. - the window is rather wide here. GFP_KERNEL allocation can block just fine. IOW, the only caller of lookup_one_len_unlocked() that does not reject negative dentries doesn't manage to handle them correctly ;-/ And then in the same ecryptfs_lookup_interpose() we have e.g. fsstack_copy_attr_atime(d_inode(dentry->d_parent), d_inode(lower_dentry->d_parent)); Now, dentry->d_parent is stable; dentry is guaranteed to be new and not yet visible to anybody else, besides it's negative and the parent is held shared, so it couldn't have been moved around even if it had been seen by somebody else. However, lower_dentry->d_parent is a different story. We are not holding the lock on its parent anymore; it could've been moved around by somebody mounting the underlying layer elsewhere and accessing it directly. Moreover, there's nothing to guarantee that the pointer we fetch from lower_dentry->d_parent won't be pointing to freed memory by the time we get around to looking at its inode - lose the timeslice to preemption just after fetching ->d_parent, have another process move the damn thing around and there's nothing to keep the ex-parent around by the time you regain CPU. The problem goes all way back to addd65ad8d19 "eCryptfs: Filename Encryption: filldir, lookup, and readlink" from 2009. That turned lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent); into lower_dir_dentry = lower_dentry->d_parent; and it had hit the fan... Sure, "somebody mounted the underlying fs elsewhere and is actively trying to screw us over" is not how ecryptfs is supposed to be used and it can demonstrate unexpected behavior - odd errors, etc. But that behaviour should not include oopsen and access to freed kernel memory...