Date: Fri, 10 Aug 2018 16:16:06 +0100
From: Al Viro
To: "Eric W. Biederman"
Cc: David Howells, John Johansen, Tejun Heo, selinux@tycho.nsa.gov,
    Paul Moore, Li Zefan, linux-api@vger.kernel.org,
    apparmor@lists.ubuntu.com, Casey Schaufler, fenghua.yu@intel.com,
    Greg Kroah-Hartman, Eric Biggers,
    linux-security-module@vger.kernel.org, Tetsuo Handa,
    Johannes Weiner, Stephen Smalley,
    tomoyo-dev-en@lists.sourceforge.jp, cgroups@vger.kernel.org,
    torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, "Theodore Y. Ts'o", Miklos Szeredi
Subject: Re: BUG: Mount ignores mount options
Message-ID: <20180810151606.GA6515@ZenIV.linux.org.uk>
References: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk> <87d0uqpba5.fsf@xmission.com>
In-Reply-To: <87d0uqpba5.fsf@xmission.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 10, 2018 at 09:05:22AM -0500, Eric W. Biederman wrote:
>
> There is a serious problem with mount options today that fsopen does not
> address.  The problem is that mount options are ignored for block based
> filesystems, and any other type of filesystem that follows the same
> pattern.
>
> The script below demonstrates this bug.  Showing this bug can cause the
> ext4 "acl" "quota" and "user_xattr" options to be silently ignored.
>
> fsopen has my nack until it addresses this issue.
>
> I don't know if we can fix this in the context of sys_mount.  But we if
> we are redoing the option parsing of how we mount filesystems this needs
> to be fixed before we start worrying about bug compatibility.
>
> Hopefully this report is simple and clear enough that we can at least
> agree on the problem.

Sure, it is simple.  So's the solution: MNT_USERNS_SPECIAL_SEMANTICS that
would get passed to filesystems, so that Eric would be able to implement
his mount(2)-incompatible behaviour at leisure, without worrying about
compatibility issues.

Does that address your complaint?  Because one thing we are not going to
do is change mount(2) behaviour.  Reason: userland-visible behaviour
of hell knows how many local scripts.

Another thing that is flat-out not feasible is some kind of blanket
"compare options" stuff; it *can* be done as helpers to be used by
a filesystem when it sees that new flag, but it's simply not going to
work at the fs-independent level.

Trivial example with the same ext4:
	mount /dev/sda1 /mnt/a -o bsddf
vs.
	mount /dev/sda1 /mnt/b
ext4 can tell that these are the same.  The syscall itself has no clue.

What's more, it's not just explicitly spelled default options - it's the
stuff that has more than one form.  And while we are at it, the things
like two NFS mounts of different trees from the same server; they might
or might not get the same superblock.  Depending upon the options.

Convenience helper that would allow ext4 to compare options and reject
the incompatible mount?  Not sure how much ext4-specific knowledge would
have to go in it, but if you can come up with one - more power to you.
But the decision to use it *must* be ext4-specific.  Because for e.g. NFS
such thing as -o fsid=..., while certainly a part of options, has a very
different meaning - it's "use a separate fs instance" (and let the server
deal with coherency issues on its end).

Decision to use sget() (and the way it's used) is up to the filesystem.
We *can't* lift that into the syscall.  Not without breaking the fuck out
of existing behaviour.

Having something like a second callback for mount_bdev() that would be
called when we'd found an existing instance for the same block device?
Sure, no problem.  Having a helper for doing such comparison that would
work in enough cases to bother, so that different fs could avoid
boilerplate in that callback?  Again, more power to you.

But I don't see what the hell that has to do with the syscall interface.
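
To make the "second callback" idea concrete, here is a userspace toy in C.
It is only a sketch under loud assumptions: every name in it (toy_sb,
toy_parse, toy_mount_bdev, toy_strict_reuse, the TOY_OPT_* bits) is
hypothetical and none of it is kernel API.  It models a filesystem whose
parser starts from its own defaults, so that "-o bsddf" and no options at
all compare equal, and a mount helper that calls a filesystem-supplied
callback when an instance for the same device already exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names are real kernel interfaces. */

#define TOY_OPT_BSDDF 0x1u          /* on by default, like ext4's bsddf */
#define TOY_OPT_ACL   0x2u
#define TOY_DEFAULTS  TOY_OPT_BSDDF

struct toy_sb {
	const char *bdev;           /* device name; caller keeps it alive */
	unsigned int opts;          /* filesystem-private option bits */
	bool in_use;
};

/* Called when an instance for the same device is found; 0 means "reuse". */
typedef int (*toy_reuse_cb)(const struct toy_sb *existing, const char *data);

static struct toy_sb toy_table[4];

/* fs-specific parser: start from defaults, then apply each "-o" token. */
static int toy_parse(const char *data, unsigned int *opts)
{
	char buf[128];
	char *tok;

	*opts = TOY_DEFAULTS;
	if (!data || !*data)
		return 0;
	if (strlen(data) >= sizeof(buf))
		return -1;
	strcpy(buf, data);
	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (!strcmp(tok, "bsddf"))
			*opts |= TOY_OPT_BSDDF;
		else if (!strcmp(tok, "minixdf"))
			*opts &= ~TOY_OPT_BSDDF;
		else if (!strcmp(tok, "acl"))
			*opts |= TOY_OPT_ACL;
		else if (!strcmp(tok, "noacl"))
			*opts &= ~TOY_OPT_ACL;
		else
			return -1;  /* unknown option */
	}
	return 0;
}

/* One possible fs policy: refuse reuse unless parsed options match. */
static int toy_strict_reuse(const struct toy_sb *existing, const char *data)
{
	unsigned int want;

	if (toy_parse(data, &want))
		return -1;
	return want == existing->opts ? 0 : -1;
}

/*
 * Find or create the instance for bdev.  With reuse == NULL this keeps
 * the legacy behaviour: an existing instance is returned as is and the
 * new options are silently ignored.
 */
static struct toy_sb *toy_mount_bdev(const char *bdev, const char *data,
				     toy_reuse_cb reuse)
{
	struct toy_sb *free_slot = NULL;
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		struct toy_sb *sb = &toy_table[i];

		if (sb->in_use && !strcmp(sb->bdev, bdev)) {
			if (reuse && reuse(sb, data))
				return NULL;    /* fs rejected the mount */
			return sb;
		}
		if (!sb->in_use && !free_slot)
			free_slot = sb;
	}
	if (!free_slot || toy_parse(data, &free_slot->opts))
		return NULL;
	free_slot->bdev = bdev;
	free_slot->in_use = true;
	return free_slot;
}
```

In this toy, mounting "/dev/sda1" with "bsddf" and then again with an empty
option string returns the same instance under toy_strict_reuse, a later
"acl" mount is refused, and passing NULL as the callback hands back the
existing instance with "acl" ignored.  Whether to compare at all, and what
counts as equal, stays in the filesystem's callback; the generic helper
never interprets the option string.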