2018-05-29 13:24:16

by Miklos Szeredi

Subject: [GIT PULL] overlayfs update for 4.18

Hi Al,

I'm sending this pull request to you instead of Linus, because a bigger than
usual chunk involves the VFS.

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git for-viro

This update contains the following:

- Deal with vfs_mkdir() not instantiating dentry.

- Stack file operations. This solves the ro/rw file descriptor inconsistency,
weirdness with ioctl, as well as removing a bunch of overlay specific hacks
from the VFS.

- Allow metadata-only copy-up when data is unchanged.

- Various cleanups in VFS and overlayfs.
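For readers who want to try the metadata-only copy-up item, the sketch below shows one way a mount with the per-mount "metacopy=on" option (named in the commit list and documentation hunk of this thread) might be requested from C. The helper names ovl_build_opts() and ovl_mount() and all paths are hypothetical; only mount(2) and the option names come from existing interfaces. This is a minimal illustration, not code from the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build an overlayfs option string.
 * Returns 0 on success, -1 if a path contains a comma (it would be
 * parsed as an option separator; this sketch rejects it instead of
 * escaping it) or if the buffer is too small.
 */
int ovl_build_opts(char *buf, size_t len, const char *lower,
                   const char *upper, const char *work, int metacopy)
{
    int n;

    assert(buf != NULL && len > 0);

    if (strchr(lower, ',') || strchr(upper, ',') || strchr(work, ','))
        return -1;

    n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s,metacopy=%s",
                 lower, upper, work, metacopy ? "on" : "off");
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/* Hypothetical helper: mount an overlay at target (needs CAP_SYS_ADMIN). */
int ovl_mount(const char *target, const char *opts)
{
    return mount("overlay", target, "overlay", 0, opts);
}
```

As the documentation hunk later in the thread warns, metacopy=on should not be used with untrusted upper/lower directories.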

Thanks,
Miklos

---
Amir Goldstein (8):
ovl: update documentation for unionmount-testsuite
ovl: remove WARN_ON() real inode attributes mismatch
ovl: strip debug argument from ovl_do_ helpers
ovl: struct cattr cleanups
ovl: return dentry from ovl_create_real()
ovl: create helper ovl_create_temp()
ovl: make ovl_create_real() cope with vfs_mkdir() safely
ovl: use inode_insert5() to hash a newly created inode

Miklos Szeredi (41):
ovl: clean up copy-up error paths
vfs: factor out inode_insert5()
vfs: dedpue: return loff_t
vfs: dedupe: rationalize args
vfs: dedupe: extract helper for a single dedup
vfs: add path_open()
vfs: optionally don't account file in nr_files
vfs: add f_op->pre_mmap()
vfs: export vfs_ioctl() to modules
vfs: export vfs_dedupe_file_range_one() to modules
ovl: copy up times
ovl: copy up inode flags
Revert "Revert "ovl: get_write_access() in truncate""
ovl: copy up file size as well
ovl: deal with overlay files in ovl_d_real()
ovl: stack file ops
ovl: add helper to return real file
ovl: add ovl_read_iter()
ovl: add ovl_write_iter()
ovl: add ovl_fsync()
ovl: add ovl_mmap()
ovl: add ovl_fallocate()
ovl: add lsattr/chattr support
ovl: add ovl_fiemap()
ovl: add O_DIRECT support
ovl: add reflink/copyfile/dedup support
vfs: don't open real
ovl: copy-up on MAP_SHARED
ovl: obsolete "check_copy_up" module option
ovl: fix documentation of non-standard behavior
vfs: simplify dentry_open()
Revert "ovl: fix may_write_real() for overlayfs directories"
Revert "ovl: don't allow writing ioctl on lower layer"
vfs: fix freeze protection in mnt_want_write_file() for overlayfs
Revert "ovl: fix relatime for directories"
Revert "vfs: update ovl inode before relatime check"
Revert "vfs: add flags to d_real()"
Revert "vfs: do get_write_access() on upper layer of overlayfs"
Partially revert "locks: fix file locking on overlayfs"
Revert "fsnotify: support overlayfs"
vfs: remove open_flags from d_real()

Vivek Goyal (29):
ovl: Pass argument to ovl_get_inode() in a structure
ovl: Initialize ovl_inode->redirect in ovl_get_inode()
ovl: Move the copy up helpers to copy_up.c
ovl: Provide a mount option metacopy=on/off for metadata copyup
ovl: During copy up, first copy up metadata and then data
ovl: Copy up only metadata during copy up where it makes sense
ovl: Add helper ovl_already_copied_up()
ovl: A new xattr OVL_XATTR_METACOPY for file on upper
ovl: Use out_err instead of out_nomem
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
ovl: Copy up meta inode data from lowest data inode
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
ovl: Fix ovl_getattr() to get number of blocks from lower
ovl: Store lower data inode in ovl_inode
ovl: Add helper ovl_inode_realdata()
ovl: Open file with data except for the case of fsync
ovl: Do not expose metacopy only dentry from d_real()
ovl: Move some dir related ovl_lookup_single() code in else block
ovl: Check redirects for metacopy files
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
ovl: Add an inode flag OVL_CONST_INO
ovl: Do not set dentry type ORIGIN for broken hardlinks
ovl: Set redirect on metacopy files upon rename
ovl: Set redirect on upper inode when it is linked
ovl: Check redirect on index as well
ovl: Disbale metacopy for MAP_SHARED mmap()
ovl: Do not do metadata only copy-up for truncate operation
ovl: Do not do metacopy only for ioctl modifying file attr
ovl: Enable metadata only feature

---
Documentation/filesystems/Locking | 4 +-
Documentation/filesystems/overlayfs.txt | 97 ++++--
Documentation/filesystems/vfs.txt | 19 +-
fs/btrfs/ctree.h | 5 +-
fs/btrfs/ioctl.c | 7 +-
fs/file_table.c | 13 +-
fs/inode.c | 210 +++++--------
fs/internal.h | 17 +-
fs/ioctl.c | 1 +
fs/locks.c | 20 +-
fs/namei.c | 2 +-
fs/namespace.c | 69 +----
fs/ocfs2/file.c | 10 +-
fs/open.c | 74 ++---
fs/overlayfs/Kconfig | 40 +++
fs/overlayfs/Makefile | 4 +-
fs/overlayfs/copy_up.c | 273 +++++++++-------
fs/overlayfs/dir.c | 312 +++++++++++++------
fs/overlayfs/export.c | 11 +-
fs/overlayfs/file.c | 530 ++++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 203 ++++++++----
fs/overlayfs/namei.c | 205 +++++++-----
fs/overlayfs/overlayfs.h | 119 ++++---
fs/overlayfs/ovl_entry.h | 7 +-
fs/overlayfs/super.c | 134 +++++---
fs/overlayfs/util.c | 252 ++++++++++++++-
fs/read_write.c | 91 +++---
fs/xattr.c | 9 +-
fs/xfs/xfs_file.c | 8 +-
include/linux/dcache.h | 15 +-
include/linux/fs.h | 36 ++-
include/linux/fsnotify.h | 14 +-
include/uapi/linux/fs.h | 1 -
mm/util.c | 5 +
34 files changed, 1981 insertions(+), 836 deletions(-)
create mode 100644 fs/overlayfs/file.c


2018-05-29 14:02:50

by Christoph Hellwig

Subject: Re: [GIT PULL] overlayfs update for 4.18

> vfs: add f_op->pre_mmap()

We've been through these pre-mmap games a few times, and always rejected
them, why is this any different?

> vfs: export vfs_dedupe_file_range_one() to modules

Please use EXPORT_SYMBOL_GPL for all these crazy low-level exports.

To be honest I'd really like to see the whole thing as a patch series
instead of a pull request. Very little seems to have gotten any
reviewed-by tags which makes me very suspicious.

2018-05-29 14:14:56

by Miklos Szeredi

Subject: Re: [GIT PULL] overlayfs update for 4.18

On Tue, May 29, 2018 at 3:59 PM, Christoph Hellwig <[email protected]> wrote:
>> vfs: add f_op->pre_mmap()
>
> We've been through these pre-mmap games a few times, and always rejected
> them, why is this any different?

Don't know what the other cases were.

Overlayfs case is completely state free. It just does a copy-up in
the case of a shared mapping so that subsequent modifications of that
file get reflected in the shared mapping. Can't do the copy-up with
mmap_sem held due to locking dependencies.
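
The ordering argument can be illustrated with a small userspace toy model. Everything here is hypothetical (struct toy_file, toy_ovl_pre_mmap(), toy_do_mmap(), and the flag standing in for mmap_sem); it is not the kernel code from the series, only a sketch of "run the hook, which may copy up, before the mapping lock is taken".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Toy stand-in for a file with an optional pre-mmap hook. */
struct toy_file {
    bool copied_up;
    int (*pre_mmap)(struct toy_file *f, int flags);
};

/* Toy stand-in for "mmap_sem is held"; a real lock is not needed here. */
static bool toy_mmap_sem_held;

/* Hook: copy up for shared mappings; must run without the lock held. */
int toy_ovl_pre_mmap(struct toy_file *f, int flags)
{
    assert(!toy_mmap_sem_held);
    if (flags & MAP_SHARED)
        f->copied_up = true;   /* models the copy-up */
    return 0;
}

/* Toy mmap path: call the hook first, then take the lock and map. */
int toy_do_mmap(struct toy_file *f, int flags)
{
    int err = 0;

    if (f->pre_mmap)
        err = f->pre_mmap(f, flags);
    if (err)
        return err;

    toy_mmap_sem_held = true;
    /* ... the mapping would be set up here, under the lock ... */
    toy_mmap_sem_held = false;
    return 0;
}
```

The hook keeps no state between calls, which matches the "completely state free" description above: a private mapping leaves the file alone, a shared one triggers the copy-up before any lock is held.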

>
>> vfs: export vfs_dedupe_file_range_one() to modules
>
> Please use EXPORT_SYMBOL_GPL for all these crazy low-level exports.
>
> To be honest I'd really like to see the whole thing as a patch series
> instead of a pull request. Very little seems to have gotten any
> reviewed-by tags which makes me very suspicious.

Did a couple of iterations and got some good feedback. But can post
the current version, if you think that's useful.

Thanks,
Miklos

2018-05-30 08:38:07

by Miklos Szeredi

Subject: Re: [GIT PULL] overlayfs update for 4.18

On Tue, May 29, 2018 at 4:12 PM, Miklos Szeredi <[email protected]> wrote:
> On Tue, May 29, 2018 at 3:59 PM, Christoph Hellwig <[email protected]> wrote:

>>> vfs: export vfs_dedupe_file_range_one() to modules
>>
>> Please use EXPORT_SYMBOL_GPL for all these crazy low-level exports.

I'd argue with the "crazy" part. This should have been the primary
interface from the start. The batched dedupe interface is the crazy
one:

- deduping is page size granularity at worst; performance would not
be horrible even if we had to do one syscall per page
- vast majority of the time it will be file size granularity

Why was that batching invented in the first place?
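
For context, the batched interface under discussion is the FIDEDUPERANGE ioctl from <linux/fs.h>, which takes one source range and an array of destinations. The sketch below wraps it for the single-destination case; the helper names dedupe_one_request() and dedupe_one() are hypothetical and are not the in-kernel vfs_dedupe_file_range_one().

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical helper: allocate a batched request with exactly one destination. */
struct file_dedupe_range *dedupe_one_request(uint64_t src_off, uint64_t len,
                                             int dest_fd, uint64_t dest_off)
{
    struct file_dedupe_range *r;

    r = calloc(1, sizeof(*r) + sizeof(struct file_dedupe_range_info));
    if (!r)
        return NULL;
    r->src_offset = src_off;
    r->src_length = len;
    r->dest_count = 1;
    r->info[0].dest_fd = dest_fd;
    r->info[0].dest_offset = dest_off;
    return r;
}

/*
 * Hypothetical helper: dedupe one range.
 * Returns 0 if deduped, 1 if the ranges differ, -1 with errno set on error.
 */
int dedupe_one(int src_fd, uint64_t src_off, uint64_t len,
               int dest_fd, uint64_t dest_off, uint64_t *bytes_deduped)
{
    struct file_dedupe_range *r;
    int ret, saved_errno;

    assert(bytes_deduped != NULL);

    r = dedupe_one_request(src_off, len, dest_fd, dest_off);
    if (!r) {
        errno = ENOMEM;
        return -1;
    }

    ret = ioctl(src_fd, FIDEDUPERANGE, r);
    if (ret == 0) {
        if (r->info[0].status < 0) {
            /* Per-destination failure is reported as a negative errno. */
            errno = -r->info[0].status;
            ret = -1;
        } else if (r->info[0].status == FILE_DEDUPE_RANGE_DIFFERS) {
            ret = 1;
        } else {
            *bytes_deduped = r->info[0].bytes_deduped;
        }
    }

    saved_errno = errno;
    free(r);
    errno = saved_errno;
    return ret;
}
```

A caller deduping one pair of ranges still has to allocate the variable-length request and check a per-destination status, which is the overhead the batching design imposes on the common case.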

Thanks,
Miklos

2018-05-30 22:29:58

by Dave Chinner

Subject: Re: [GIT PULL] overlayfs update for 4.18

On Wed, May 30, 2018 at 10:36:46AM +0200, Miklos Szeredi wrote:
> On Tue, May 29, 2018 at 4:12 PM, Miklos Szeredi <[email protected]> wrote:
> > On Tue, May 29, 2018 at 3:59 PM, Christoph Hellwig <[email protected]> wrote:
>
> >>> vfs: export vfs_dedupe_file_range_one() to modules
> >>
> >> Please use EXPORT_SYMBOL_GPL for all these crazy low-level exports.
>
> I'd argue with the "crazy" part. This should have been the primary
> interface from the start. The batched dedupe interface is the crazy
> one:
>
> - deduping is page size granularity at worst; performance would not
> be horrible even if we had to do one syscall per page
> - vast majority of the time it will be file size granularity
>
> Why was that batching invented in the first place?

No idea - the batching ioctl interface is what we inherited from the
btrfs ioctl years ago and there are several dedupe applications out
there that use it. That's why we pulled it up to the vfs rather than
invent a new one and have to wait years for apps to start using
it...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2018-06-01 15:27:27

by Miklos Szeredi

Subject: Re: [GIT PULL] overlayfs update for 4.18

On Tue, May 29, 2018 at 03:21:48PM +0200, Miklos Szeredi wrote:
> Hi Al,
>
> I'm sending this pull request to you instead of Linus, because a bigger than
> usual chunk involves the VFS.
>
> Please pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git for-viro
>
> This update contains the following:
>
> - Deal with vfs_mkdir() not instantiating dentry.
>
> - Stack file operations. This solves the ro/rw file descriptor inconsistency,
> weirdness with ioctl, as well as removing a bunch of overlay specific hacks
> from the VFS.
>
> - Allow metadata-only copy-up when data is unchanged.
>
> - Various cleanups in VFS and overlayfs.

Updated tree pushed to same place.

Incremental patch against previous pull and posted patchset.

Thanks,
Miklos
---

diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 0a8e3c4543d1..79be4a77ca08 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -280,7 +280,7 @@ parameter metacopy=on/off. Lastly, there is also a per mount option
metacopy=on/off to enable/disable this feature per mount.

Do not use metacopy=on with untrusted upper/lower directories. Otherwise
-it is possible that an attacker can create a handcrafted file with
+it is possible that an attacker can create an handcrafted file with
appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
pointed by REDIRECT. This should not be possible on local system as setting
"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
@@ -318,7 +318,7 @@ does not support NFS export, lower filesystem does not have a valid UUID or
if the upper filesystem does not support extended attributes.

For "metadata only copy up" feature there is no verification mechanism at
-mount time. So if same upper is mounted with different set of lower, mount
+mount time. So if same upper is mouted with different set of lower, mount
probably will succeed but expect the unexpected later on. So don't do it.

It is quite a common practice to copy overlay layers to a different
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index 08b04d9fd6e6..e0a090eca65e 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -11,7 +11,7 @@ config OVERLAY_FS
For more information see Documentation/filesystems/overlayfs.txt

config OVERLAY_FS_REDIRECT_DIR
- bool "Overlayfs: turn on redirect directory feature by default"
+ bool "Overlayfs: turn on redirect dir feature by default"
depends on OVERLAY_FS
help
If this config option is enabled then overlay filesystems will use
@@ -46,7 +46,7 @@ config OVERLAY_FS_INDEX
depends on OVERLAY_FS
help
If this config option is enabled then overlay filesystems will use
- the index directory to map lower inodes to upper inodes by default.
+ the inodes index dir to map lower inodes to upper inodes by default.
In this case it is still possible to turn off index globally with the
"index=off" module option or on a filesystem instance basis with the
"index=off" mount option.
@@ -67,7 +67,7 @@ config OVERLAY_FS_NFS_EXPORT
depends on !OVERLAY_FS_METACOPY
help
If this config option is enabled then overlay filesystems will use
- the index directory to decode overlay NFS file handles by default.
+ the inodes index dir to decode overlay NFS file handles by default.
In this case, it is still possible to turn off NFS export support
globally with the "nfs_export=off" module option or on a filesystem
instance basis with the "nfs_export=off" mount option.
@@ -133,7 +133,7 @@ config OVERLAY_FS_METACOPY
help
If this config option is enabled then overlay filesystems will
copy up only metadata where appropriate and data copy up will
- happen when a file is opened for WRITE operation. It is still
+ happen when a file is opended for WRITE operation. It is still
possible to turn off this feature globally with the "metacopy=off"
module option or on a filesystem instance basis with the
"metacopy=off" mount option.
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 296037afecdb..bdadedf73e51 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -27,7 +27,7 @@

static int ovl_ccup_set(const char *buf, const struct kernel_param *param)
{
- pr_warn("overlayfs: \"check_copy_up\" module option is obsolete\n");
+ WARN(1, "overlayfs: \"check_copy_up\" module option is obsolete\n");
return 0;
}

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ec350d4d921c..7063e0f588cc 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -116,35 +116,35 @@ int ovl_cleanup_and_whiteout(struct dentry *workdir, struct inode *dir,
goto out;
}

-static int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry,
- umode_t mode)
+static struct dentry *ovl_mkdir_real(struct inode *dir, struct dentry *dentry,
+ umode_t mode)
{
int err;
- struct dentry *d, *dentry = *newdentry;

err = ovl_do_mkdir(dir, dentry, mode);
- if (err)
- return err;
-
- if (likely(!d_unhashed(dentry)))
- return 0;
+ if (err) {
+ dput(dentry);
+ return ERR_PTR(err);
+ }

/*
* vfs_mkdir() may succeed and leave the dentry passed
* to it unhashed and negative. If that happens, try to
* lookup a new hashed and positive dentry.
*/
- d = lookup_one_len(dentry->d_name.name, dentry->d_parent,
- dentry->d_name.len);
- if (IS_ERR(d)) {
- pr_warn("overlayfs: failed lookup after mkdir (%pd2, err=%i).\n",
- dentry, err);
- return PTR_ERR(d);
+ if (unlikely(d_unhashed(dentry))) {
+ struct dentry *d;
+
+ d = lookup_one_len(dentry->d_name.name, dentry->d_parent,
+ dentry->d_name.len);
+ if (IS_ERR(d)) {
+ pr_warn("overlayfs: failed lookup after mkdir (%pd2, err=%i).\n",
+ dentry, err);
+ }
+ dput(dentry);
+ dentry = d;
}
- dput(dentry);
- *newdentry = d;
-
- return 0;
+ return dentry;
}

struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
@@ -169,8 +169,7 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,

case S_IFDIR:
/* mkdir is special... */
- err = ovl_mkdir_real(dir, &newdentry, attr->mode);
- break;
+ return ovl_mkdir_real(dir, newdentry, attr->mode);

case S_IFCHR:
case S_IFBLK:
@@ -193,7 +192,7 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
* Not quite sure if non-instantiated dentry is legal or not.
* VFS doesn't seem to care so check and warn here.
*/
- err = -EIO;
+ err = -ENOENT;
}
out:
if (err) {
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index ca7c3461e424..31f32fc1004b 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -128,7 +128,7 @@ static int ovl_open(struct inode *inode, struct file *file)
/* No longer need these flags, so don't pass them on to underlying fs */
file->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);

- realfile = ovl_open_realfile(file, ovl_inode_realdata(inode));
+ realfile = ovl_open_realfile(file, ovl_inode_real(file_inode(file)));
if (IS_ERR(realfile))
return PTR_ERR(realfile);


2018-06-01 16:18:48

by Randy Dunlap

Subject: Re: [GIT PULL] overlayfs update for 4.18

On 06/01/2018 08:26 AM, Miklos Szeredi wrote:
> On Tue, May 29, 2018 at 03:21:48PM +0200, Miklos Szeredi wrote:
>> Hi Al,
>>
>> I'm sending this pull request to you instead of Linus, because a bigger than
>> usual chunk involves the VFS.
>>
>> Please pull from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git for-viro
>>
>> This update contains the following:


> ---
>
> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
> index 0a8e3c4543d1..79be4a77ca08 100644
> --- a/Documentation/filesystems/overlayfs.txt
> +++ b/Documentation/filesystems/overlayfs.txt
> @@ -280,7 +280,7 @@ parameter metacopy=on/off. Lastly, there is also a per mount option
> metacopy=on/off to enable/disable this feature per mount.
>
> Do not use metacopy=on with untrusted upper/lower directories. Otherwise
> -it is possible that an attacker can create a handcrafted file with
> +it is possible that an attacker can create an handcrafted file with

bad change:
create a handcrafted

Wait. Is this patch -R (reversed)?

> appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
> pointed by REDIRECT. This should not be possible on local system as setting
> "trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
> @@ -318,7 +318,7 @@ does not support NFS export, lower filesystem does not have a valid UUID or
> if the upper filesystem does not support extended attributes.
>
> For "metadata only copy up" feature there is no verification mechanism at
> -mount time. So if same upper is mounted with different set of lower, mount
> +mount time. So if same upper is mouted with different set of lower, mount

mounted

> probably will succeed but expect the unexpected later on. So don't do it.
>
> It is quite a common practice to copy overlay layers to a different
> diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> index 08b04d9fd6e6..e0a090eca65e 100644
> --- a/fs/overlayfs/Kconfig
> +++ b/fs/overlayfs/Kconfig
> @@ -11,7 +11,7 @@ config OVERLAY_FS
> For more information see Documentation/filesystems/overlayfs.txt
>
> config OVERLAY_FS_REDIRECT_DIR
> - bool "Overlayfs: turn on redirect directory feature by default"
> + bool "Overlayfs: turn on redirect dir feature by default"

nope.

> depends on OVERLAY_FS
> help
> If this config option is enabled then overlay filesystems will use
> @@ -46,7 +46,7 @@ config OVERLAY_FS_INDEX
> depends on OVERLAY_FS
> help
> If this config option is enabled then overlay filesystems will use
> - the index directory to map lower inodes to upper inodes by default.
> + the inodes index dir to map lower inodes to upper inodes by default.
> In this case it is still possible to turn off index globally with the
> "index=off" module option or on a filesystem instance basis with the
> "index=off" mount option.
> @@ -67,7 +67,7 @@ config OVERLAY_FS_NFS_EXPORT
> depends on !OVERLAY_FS_METACOPY
> help
> If this config option is enabled then overlay filesystems will use
> - the index directory to decode overlay NFS file handles by default.
> + the inodes index dir to decode overlay NFS file handles by default.
> In this case, it is still possible to turn off NFS export support
> globally with the "nfs_export=off" module option or on a filesystem
> instance basis with the "nfs_export=off" mount option.
> @@ -133,7 +133,7 @@ config OVERLAY_FS_METACOPY
> help
> If this config option is enabled then overlay filesystems will
> copy up only metadata where appropriate and data copy up will
> - happen when a file is opened for WRITE operation. It is still
> + happen when a file is opended for WRITE operation. It is still

nope.

> possible to turn off this feature globally with the "metacopy=off"
> module option or on a filesystem instance basis with the
> "metacopy=off" mount option.


--
~Randy

2018-06-01 17:04:21

by Miklos Szeredi

Subject: Re: [GIT PULL] overlayfs update for 4.18

On Fri, Jun 1, 2018 at 6:18 PM, Randy Dunlap <[email protected]> wrote:
> On 06/01/2018 08:26 AM, Miklos Szeredi wrote:
>> On Tue, May 29, 2018 at 03:21:48PM +0200, Miklos Szeredi wrote:
>>> Hi Al,
>>>
>>> I'm sending this pull request to you instead of Linus, because a bigger than
>>> usual chunk involves the VFS.
>>>
>>> Please pull from:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git for-viro
>>>
>>> This update contains the following:
>
>
>> ---
>>
>> diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
>> index 0a8e3c4543d1..79be4a77ca08 100644
>> --- a/Documentation/filesystems/overlayfs.txt
>> +++ b/Documentation/filesystems/overlayfs.txt
>> @@ -280,7 +280,7 @@ parameter metacopy=on/off. Lastly, there is also a per mount option
>> metacopy=on/off to enable/disable this feature per mount.
>>
>> Do not use metacopy=on with untrusted upper/lower directories. Otherwise
>> -it is possible that an attacker can create a handcrafted file with
>> +it is possible that an attacker can create an handcrafted file with
>
> bad change:
> create a handcrafted
>
> Wait. Is this patch -R (reversed)?

Oops, yes, reversed diff.

Thanks,
Miklos