2014-02-07 16:48:37

by Miklos Szeredi

Subject: [PATCH 00/13] cross rename v4

Changes since the last version (based on Al's review):

- cross-rename: fix locking of nondirectories for NFSv4
- ext4: split cross-rename and plain rename into separate functions
- introduce i_op->rename2 with flags, don't touch ->rename
- last (optional) patch to merge ->rename2 back into ->rename

Splitting the ext4 implementation was indeed a good idea: it uncovered a
memory leak and small inconsistencies in the merged implementation.

Splitting out rename2 will lessen the code churn, but I think it is ugly.
However, this is a matter of taste; the last patch can be omitted without loss
of functionality.

Bruce, could you please review the locking and delegation thing in patch #8
"vfs: add cross-rename"?

Git tree is here:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git cross-rename

Thanks,
Miklos

---
Miklos Szeredi (13):
vfs: add d_is_dir()
vfs: rename: move d_move() up
vfs: rename: use common code for dir and non-dir
vfs: add renameat2 syscall
vfs: add RENAME_NOREPLACE flag
security: add flags to rename hooks
vfs: lock_two_nondirectories: allow directory args
vfs: add cross-rename
ext4: rename: create ext4_renament structure for local vars
ext4: rename: move EMLINK check up
ext4: rename: split out helper functions
ext4: add cross rename support
vfs: merge rename2 into rename

---
Documentation/filesystems/Locking | 2 +-
Documentation/filesystems/vfs.txt | 14 +-
arch/x86/syscalls/syscall_64.tbl | 1 +
.../lustre/lustre/include/linux/lustre_compat25.h | 4 +-
drivers/staging/lustre/lustre/llite/namei.c | 3 +-
drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 2 +-
fs/9p/v9fs.h | 3 +-
fs/9p/vfs_inode.c | 4 +-
fs/affs/affs.h | 3 +-
fs/affs/namei.c | 3 +-
fs/afs/dir.c | 6 +-
fs/bad_inode.c | 3 +-
fs/bfs/dir.c | 3 +-
fs/btrfs/inode.c | 3 +-
fs/cachefiles/namei.c | 4 +-
fs/ceph/dir.c | 3 +-
fs/cifs/cifsfs.h | 2 +-
fs/cifs/inode.c | 3 +-
fs/coda/dir.c | 8 +-
fs/dcache.c | 45 +-
fs/debugfs/inode.c | 2 +-
fs/ecryptfs/inode.c | 5 +-
fs/exofs/namei.c | 3 +-
fs/ext2/namei.c | 5 +-
fs/ext3/namei.c | 5 +-
fs/ext4/namei.c | 483 +++++++++++++++------
fs/ext4/super.c | 6 +-
fs/f2fs/namei.c | 3 +-
fs/fat/namei_msdos.c | 3 +-
fs/fat/namei_vfat.c | 3 +-
fs/fuse/dir.c | 3 +-
fs/gfs2/inode.c | 3 +-
fs/hfs/dir.c | 3 +-
fs/hfsplus/dir.c | 3 +-
fs/hostfs/hostfs_kern.c | 3 +-
fs/hpfs/namei.c | 3 +-
fs/inode.c | 20 +-
fs/jffs2/dir.c | 5 +-
fs/jfs/namei.c | 3 +-
fs/kernfs/dir.c | 3 +-
fs/libfs.c | 3 +-
fs/logfs/dir.c | 3 +-
fs/minix/namei.c | 5 +-
fs/namei.c | 310 +++++++------
fs/ncpfs/dir.c | 5 +-
fs/nfs/dir.c | 3 +-
fs/nfs/internal.h | 3 +-
fs/nfsd/vfs.c | 2 +-
fs/nilfs2/namei.c | 3 +-
fs/ocfs2/namei.c | 3 +-
fs/omfs/dir.c | 3 +-
fs/reiserfs/namei.c | 3 +-
fs/sysv/namei.c | 5 +-
fs/ubifs/dir.c | 3 +-
fs/udf/namei.c | 3 +-
fs/ufs/namei.c | 3 +-
fs/xfs/xfs_iops.c | 3 +-
include/linux/dcache.h | 8 +-
include/linux/fs.h | 7 +-
include/linux/security.h | 12 +-
include/uapi/linux/fs.h | 3 +
kernel/cgroup.c | 5 +-
mm/shmem.c | 2 +-
security/security.c | 22 +-
64 files changed, 736 insertions(+), 372 deletions(-)
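
The documentation change in the last patch of this series says that a filesystem must return -EINVAL for any unsupported or unknown rename flags. A minimal user-space sketch of that check follows. The helper name check_rename_flags() is hypothetical and not part of the series, and the value of RENAME_NOREPLACE is assumed to match the current Linux uapi headers.

```c
#include <errno.h>

/* Assumed value: RENAME_NOREPLACE as defined in current Linux uapi headers. */
#define RENAME_NOREPLACE (1U << 0)

/*
 * Hypothetical helper, not taken from the patches: reject any flag bit
 * outside the set the filesystem declares as supported.
 * Returns 0 when every requested flag is supported, -EINVAL otherwise.
 */
static int check_rename_flags(unsigned int flags, unsigned int supported)
{
	if (flags & ~supported)
		return -EINVAL;
	return 0;
}
```

A filesystem that supports no flags would pass 0 as the supported mask, so any non-zero flags argument is rejected.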


2014-02-07 16:48:39

by Miklos Szeredi

Subject: [PATCH 02/13] vfs: rename: move d_move() up

From: Miklos Szeredi <[email protected]>

Move the d_move() in vfs_rename_dir() up, similarly to how it's done in
vfs_rename_other(). The next patch will consolidate these two functions
and this is the only structural difference between them.

I'm not sure whether doing the d_move() after the dput() is even valid,
though there may be a logical explanation for it. Either way, moving the
d_move() before the dput() (and the mutex_unlock()) should definitely not hurt.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/namei.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 258c06ae26a7..1409090d0913 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4018,13 +4018,12 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
target->i_flags |= S_DEAD;
dont_mount(new_dentry);
}
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+ d_move(old_dentry, new_dentry);
out:
if (target)
mutex_unlock(&target->i_mutex);
dput(new_dentry);
- if (!error)
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry,new_dentry);
return error;
}

--
1.8.1.4

2014-02-07 16:48:45

by Miklos Szeredi

Subject: [PATCH 07/13] vfs: lock_two_nondirectories: allow directory args

From: Miklos Szeredi <[email protected]>

lock_two_nondirectories warned if either of its args was a directory.
Instead just ignore the directory args. This is needed for locking in
cross rename.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/inode.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4bcdad3c9361..763010771cf4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -944,18 +944,21 @@ EXPORT_SYMBOL(unlock_new_inode);

/**
* lock_two_nondirectories - take two i_mutexes on non-directory objects
+ *
+ * If either or both arguments are directories, then ignore those.
+ * Therefore zero, one or two objects may be locked by this function.
+ *
* @inode1: first inode to lock
* @inode2: second inode to lock
*/
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
- WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
- if (inode1 == inode2 || !inode2) {
+ if (S_ISDIR(inode1->i_mode)) {
+ if (inode2 && !S_ISDIR(inode2->i_mode))
+ mutex_lock(&inode2->i_mutex);
+ } else if (inode1 == inode2 || !inode2 || S_ISDIR(inode2->i_mode)) {
mutex_lock(&inode1->i_mutex);
- return;
- }
- WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
- if (inode1 < inode2) {
+ } else if (inode1 < inode2) {
mutex_lock(&inode1->i_mutex);
mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
} else {
@@ -972,8 +975,9 @@ EXPORT_SYMBOL(lock_two_nondirectories);
*/
void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
- mutex_unlock(&inode1->i_mutex);
- if (inode2 && inode2 != inode1)
+ if (!S_ISDIR(inode1->i_mode))
+ mutex_unlock(&inode1->i_mutex);
+ if (inode2 && inode2 != inode1 && !S_ISDIR(inode2->i_mode))
mutex_unlock(&inode2->i_mutex);
}
EXPORT_SYMBOL(unlock_two_nondirectories);
--
1.8.1.4
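
The locking rule introduced by the lock_two_nondirectories patch above can be modelled in user space roughly as follows. This is only a sketch: struct toy_inode and the toy_* functions are hypothetical stand-ins, pthread mutexes replace i_mutex, and the lockdep nesting annotation (I_MUTEX_NONDIR2) has no equivalent here. The lock function returns the number of locks taken purely to make the "zero, one or two objects" behaviour visible.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode: a lock plus an "is directory" bit. */
struct toy_inode {
	pthread_mutex_t lock;
	bool is_dir;
};

/* Lock every non-directory argument; return how many locks were taken. */
static int toy_lock_two_nondirectories(struct toy_inode *a, struct toy_inode *b)
{
	if (a->is_dir) {
		/* First argument is ignored; lock the second if it qualifies. */
		if (b && !b->is_dir) {
			pthread_mutex_lock(&b->lock);
			return 1;
		}
		return 0;
	}
	if (a == b || !b || b->is_dir) {
		pthread_mutex_lock(&a->lock);
		return 1;
	}
	/* Two distinct non-directories: always lock in address order. */
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
	return 2;
}

/* Mirror of the lock function: unlock exactly what was locked. */
static void toy_unlock_two_nondirectories(struct toy_inode *a, struct toy_inode *b)
{
	if (!a->is_dir)
		pthread_mutex_unlock(&a->lock);
	if (b && b != a && !b->is_dir)
		pthread_mutex_unlock(&b->lock);
}
```

Ordering by address means two callers passing the same pair in opposite order still take the locks in the same sequence, which is what avoids an ABBA deadlock.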

2014-02-07 16:48:51

by Miklos Szeredi

Subject: [PATCH 09/13] ext4: rename: create ext4_renament structure for local vars

From: Miklos Szeredi <[email protected]>

ext4_rename() needs to be split up into helpers, but there are too many
local variables involved, so collect them in a new structure. This also,
apparently, makes the generated code size slightly smaller.

Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/namei.c | 211 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 114 insertions(+), 97 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5f19171b3e1f..7193cea805ff 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3000,6 +3000,22 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle,
return ext4_get_first_inline_block(inode, parent_de, retval);
}

+struct ext4_renament {
+ struct inode *dir;
+ struct dentry *dentry;
+ struct inode *inode;
+
+ /* entry for "dentry" */
+ struct buffer_head *bh;
+ struct ext4_dir_entry_2 *de;
+ int inlined;
+
+ /* entry for ".." in inode if it's a directory */
+ struct buffer_head *dir_bh;
+ struct ext4_dir_entry_2 *parent_de;
+ int dir_inlined;
+};
+
/*
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
@@ -3012,193 +3028,194 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
handle_t *handle = NULL;
- struct inode *old_inode, *new_inode;
- struct buffer_head *old_bh, *new_bh, *dir_bh;
- struct ext4_dir_entry_2 *old_de, *new_de;
+ struct ext4_renament old = {
+ .dir = old_dir,
+ .dentry = old_dentry,
+ .inode = old_dentry->d_inode,
+ };
+ struct ext4_renament new = {
+ .dir = new_dir,
+ .dentry = new_dentry,
+ .inode = new_dentry->d_inode,
+ };
int retval;
- int inlined = 0, new_inlined = 0;
- struct ext4_dir_entry_2 *parent_de;
-
- dquot_initialize(old_dir);
- dquot_initialize(new_dir);

- old_bh = new_bh = dir_bh = NULL;
+ dquot_initialize(old.dir);
+ dquot_initialize(new.dir);

/* Initialize quotas before so that eventual writes go
* in separate transaction */
- if (new_dentry->d_inode)
- dquot_initialize(new_dentry->d_inode);
+ if (new.inode)
+ dquot_initialize(new.inode);

- old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de, NULL);
+ old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
/*
* Check for inode number is _not_ due to possible IO errors.
* We might rmdir the source, keep it as pwd of some process
* and merrily kill the link to whatever was created under the
* same name. Goodbye sticky bit ;-<
*/
- old_inode = old_dentry->d_inode;
retval = -ENOENT;
- if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
+ if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
goto end_rename;

- new_inode = new_dentry->d_inode;
- new_bh = ext4_find_entry(new_dir, &new_dentry->d_name,
- &new_de, &new_inlined);
- if (new_bh) {
- if (!new_inode) {
- brelse(new_bh);
- new_bh = NULL;
+ new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
+ &new.de, &new.inlined);
+ if (new.bh) {
+ if (!new.inode) {
+ brelse(new.bh);
+ new.bh = NULL;
}
}
- if (new_inode && !test_opt(new_dir->i_sb, NO_AUTO_DA_ALLOC))
- ext4_alloc_da_blocks(old_inode);
+ if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
+ ext4_alloc_da_blocks(old.inode);

- handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
- (2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
+ handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
+ (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
if (IS_ERR(handle))
return PTR_ERR(handle);

- if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
+ if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
ext4_handle_sync(handle);

- if (S_ISDIR(old_inode->i_mode)) {
- if (new_inode) {
+ if (S_ISDIR(old.inode->i_mode)) {
+ if (new.inode) {
retval = -ENOTEMPTY;
- if (!empty_dir(new_inode))
+ if (!empty_dir(new.inode))
goto end_rename;
}
retval = -EIO;
- dir_bh = ext4_get_first_dir_block(handle, old_inode,
- &retval, &parent_de,
- &inlined);
- if (!dir_bh)
+ old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
+ &retval, &old.parent_de,
+ &old.dir_inlined);
+ if (!old.dir_bh)
goto end_rename;
- if (le32_to_cpu(parent_de->inode) != old_dir->i_ino)
+ if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
goto end_rename;
retval = -EMLINK;
- if (!new_inode && new_dir != old_dir &&
- EXT4_DIR_LINK_MAX(new_dir))
+ if (!new.inode && new.dir != old.dir &&
+ EXT4_DIR_LINK_MAX(new.dir))
goto end_rename;
- BUFFER_TRACE(dir_bh, "get_write_access");
- retval = ext4_journal_get_write_access(handle, dir_bh);
+ BUFFER_TRACE(old.dir_bh, "get_write_access");
+ retval = ext4_journal_get_write_access(handle, old.dir_bh);
if (retval)
goto end_rename;
}
- if (!new_bh) {
- retval = ext4_add_entry(handle, new_dentry, old_inode);
+ if (!new.bh) {
+ retval = ext4_add_entry(handle, new.dentry, old.inode);
if (retval)
goto end_rename;
} else {
- BUFFER_TRACE(new_bh, "get write access");
- retval = ext4_journal_get_write_access(handle, new_bh);
+ BUFFER_TRACE(new.bh, "get write access");
+ retval = ext4_journal_get_write_access(handle, new.bh);
if (retval)
goto end_rename;
- new_de->inode = cpu_to_le32(old_inode->i_ino);
- if (EXT4_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
+ new.de->inode = cpu_to_le32(old.inode->i_ino);
+ if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
EXT4_FEATURE_INCOMPAT_FILETYPE))
- new_de->file_type = old_de->file_type;
- new_dir->i_version++;
- new_dir->i_ctime = new_dir->i_mtime =
- ext4_current_time(new_dir);
- ext4_mark_inode_dirty(handle, new_dir);
- BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
- if (!new_inlined) {
+ new.de->file_type = old.de->file_type;
+ new.dir->i_version++;
+ new.dir->i_ctime = new.dir->i_mtime =
+ ext4_current_time(new.dir);
+ ext4_mark_inode_dirty(handle, new.dir);
+ BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
+ if (!new.inlined) {
retval = ext4_handle_dirty_dirent_node(handle,
- new_dir, new_bh);
+ new.dir, new.bh);
if (unlikely(retval)) {
- ext4_std_error(new_dir->i_sb, retval);
+ ext4_std_error(new.dir->i_sb, retval);
goto end_rename;
}
}
- brelse(new_bh);
- new_bh = NULL;
+ brelse(new.bh);
+ new.bh = NULL;
}

/*
* Like most other Unix systems, set the ctime for inodes on a
* rename.
*/
- old_inode->i_ctime = ext4_current_time(old_inode);
- ext4_mark_inode_dirty(handle, old_inode);
+ old.inode->i_ctime = ext4_current_time(old.inode);
+ ext4_mark_inode_dirty(handle, old.inode);

/*
* ok, that's it
*/
- if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
- old_de->name_len != old_dentry->d_name.len ||
- strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
- (retval = ext4_delete_entry(handle, old_dir,
- old_de, old_bh)) == -ENOENT) {
- /* old_de could have moved from under us during htree split, so
+ if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
+ old.de->name_len != old.dentry->d_name.len ||
+ strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
+ (retval = ext4_delete_entry(handle, old.dir,
+ old.de, old.bh)) == -ENOENT) {
+ /* old.de could have moved from under us during htree split, so
* make sure that we are deleting the right entry. We might
* also be pointing to a stale entry in the unused part of
- * old_bh so just checking inum and the name isn't enough. */
+ * old.bh so just checking inum and the name isn't enough. */
struct buffer_head *old_bh2;
struct ext4_dir_entry_2 *old_de2;

- old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name,
+ old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
&old_de2, NULL);
if (old_bh2) {
- retval = ext4_delete_entry(handle, old_dir,
+ retval = ext4_delete_entry(handle, old.dir,
old_de2, old_bh2);
brelse(old_bh2);
}
}
if (retval) {
- ext4_warning(old_dir->i_sb,
+ ext4_warning(old.dir->i_sb,
"Deleting old file (%lu), %d, error=%d",
- old_dir->i_ino, old_dir->i_nlink, retval);
+ old.dir->i_ino, old.dir->i_nlink, retval);
}

- if (new_inode) {
- ext4_dec_count(handle, new_inode);
- new_inode->i_ctime = ext4_current_time(new_inode);
+ if (new.inode) {
+ ext4_dec_count(handle, new.inode);
+ new.inode->i_ctime = ext4_current_time(new.inode);
}
- old_dir->i_ctime = old_dir->i_mtime = ext4_current_time(old_dir);
- ext4_update_dx_flag(old_dir);
- if (dir_bh) {
- parent_de->inode = cpu_to_le32(new_dir->i_ino);
- BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
- if (!inlined) {
- if (is_dx(old_inode)) {
+ old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
+ ext4_update_dx_flag(old.dir);
+ if (old.dir_bh) {
+ old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
+ BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
+ if (!old.dir_inlined) {
+ if (is_dx(old.inode)) {
retval = ext4_handle_dirty_dx_node(handle,
- old_inode,
- dir_bh);
+ old.inode,
+ old.dir_bh);
} else {
retval = ext4_handle_dirty_dirent_node(handle,
- old_inode, dir_bh);
+ old.inode, old.dir_bh);
}
} else {
- retval = ext4_mark_inode_dirty(handle, old_inode);
+ retval = ext4_mark_inode_dirty(handle, old.inode);
}
if (retval) {
- ext4_std_error(old_dir->i_sb, retval);
+ ext4_std_error(old.dir->i_sb, retval);
goto end_rename;
}
- ext4_dec_count(handle, old_dir);
- if (new_inode) {
+ ext4_dec_count(handle, old.dir);
+ if (new.inode) {
/* checked empty_dir above, can't have another parent,
* ext4_dec_count() won't work for many-linked dirs */
- clear_nlink(new_inode);
+ clear_nlink(new.inode);
} else {
- ext4_inc_count(handle, new_dir);
- ext4_update_dx_flag(new_dir);
- ext4_mark_inode_dirty(handle, new_dir);
+ ext4_inc_count(handle, new.dir);
+ ext4_update_dx_flag(new.dir);
+ ext4_mark_inode_dirty(handle, new.dir);
}
}
- ext4_mark_inode_dirty(handle, old_dir);
- if (new_inode) {
- ext4_mark_inode_dirty(handle, new_inode);
- if (!new_inode->i_nlink)
- ext4_orphan_add(handle, new_inode);
+ ext4_mark_inode_dirty(handle, old.dir);
+ if (new.inode) {
+ ext4_mark_inode_dirty(handle, new.inode);
+ if (!new.inode->i_nlink)
+ ext4_orphan_add(handle, new.inode);
}
retval = 0;

end_rename:
- brelse(dir_bh);
- brelse(old_bh);
- brelse(new_bh);
+ brelse(old.dir_bh);
+ brelse(old.bh);
+ brelse(new.bh);
if (handle)
ext4_journal_stop(handle);
return retval;
--
1.8.1.4
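
One detail of the ext4_renament patch above is that the explicit "old_bh = new_bh = dir_bh = NULL" and "inlined = 0" initializations disappear: with a C designated initializer, every member that is not named is zero-initialized. The sketch below illustrates this with a hypothetical, much simplified structure (toy_renament and its fields are illustrative, not the ext4 ones).

```c
#include <stddef.h>

/* Hypothetical, simplified analogue of struct ext4_renament. */
struct toy_renament {
	int dir_ino;       /* inode number of the parent directory */
	const char *name;  /* name of the directory entry */
	void *bh;          /* stands in for the buffer_head pointer */
	int inlined;       /* stands in for the "inlined" flag */
};

/* Build one side of a rename; bh and inlined are not named on purpose. */
static struct toy_renament toy_renament_init(int dir_ino, const char *name)
{
	struct toy_renament ent = {
		.dir_ino = dir_ino,
		.name = name,
		/* .bh and .inlined are implicitly NULL and 0 */
	};
	return ent;
}
```

Because the unnamed members start out as NULL and 0, a common cleanup path can release them unconditionally, as the end_rename label does with brelse().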

2014-02-07 16:48:56

by Miklos Szeredi

Subject: [PATCH 11/13] ext4: rename: split out helper functions

From: Miklos Szeredi <[email protected]>

Cross rename (exchange source and dest) will need to call some of these
helpers for both source and dest, while overwriting rename currently only
calls them for one or the other. This also makes the code easier to
follow.

Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/namei.c | 199 +++++++++++++++++++++++++++++++++++---------------------
1 file changed, 126 insertions(+), 73 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 87a8a6e613ba..75f1bde43dcc 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3016,6 +3016,125 @@ struct ext4_renament {
int dir_inlined;
};

+static int ext4_rename_dir_prepare(handle_t *handle, struct ext4_renament *ent)
+{
+ int retval;
+
+ ent->dir_bh = ext4_get_first_dir_block(handle, ent->inode,
+ &retval, &ent->parent_de,
+ &ent->dir_inlined);
+ if (!ent->dir_bh)
+ return retval;
+ if (le32_to_cpu(ent->parent_de->inode) != ent->dir->i_ino)
+ return -EIO;
+ BUFFER_TRACE(ent->dir_bh, "get_write_access");
+ return ext4_journal_get_write_access(handle, ent->dir_bh);
+}
+
+static int ext4_rename_dir_finish(handle_t *handle, struct ext4_renament *ent,
+ unsigned dir_ino)
+{
+ int retval;
+
+ ent->parent_de->inode = cpu_to_le32(dir_ino);
+ BUFFER_TRACE(ent->dir_bh, "call ext4_handle_dirty_metadata");
+ if (!ent->dir_inlined) {
+ if (is_dx(ent->inode)) {
+ retval = ext4_handle_dirty_dx_node(handle,
+ ent->inode,
+ ent->dir_bh);
+ } else {
+ retval = ext4_handle_dirty_dirent_node(handle,
+ ent->inode,
+ ent->dir_bh);
+ }
+ } else {
+ retval = ext4_mark_inode_dirty(handle, ent->inode);
+ }
+ if (retval) {
+ ext4_std_error(ent->dir->i_sb, retval);
+ return retval;
+ }
+ return 0;
+}
+
+static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
+ unsigned ino, unsigned file_type)
+{
+ int retval;
+
+ BUFFER_TRACE(ent->bh, "get write access");
+ retval = ext4_journal_get_write_access(handle, ent->bh);
+ if (retval)
+ return retval;
+ ent->de->inode = cpu_to_le32(ino);
+ if (EXT4_HAS_INCOMPAT_FEATURE(ent->dir->i_sb,
+ EXT4_FEATURE_INCOMPAT_FILETYPE))
+ ent->de->file_type = file_type;
+ ent->dir->i_version++;
+ ent->dir->i_ctime = ent->dir->i_mtime =
+ ext4_current_time(ent->dir);
+ ext4_mark_inode_dirty(handle, ent->dir);
+ BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
+ if (!ent->inlined) {
+ retval = ext4_handle_dirty_dirent_node(handle,
+ ent->dir, ent->bh);
+ if (unlikely(retval)) {
+ ext4_std_error(ent->dir->i_sb, retval);
+ return retval;
+ }
+ }
+ brelse(ent->bh);
+ ent->bh = NULL;
+
+ return 0;
+}
+
+static int ext4_find_delete_entry(handle_t *handle, struct inode *dir,
+ const struct qstr *d_name)
+{
+ int retval = -ENOENT;
+ struct buffer_head *bh;
+ struct ext4_dir_entry_2 *de;
+
+ bh = ext4_find_entry(dir, d_name, &de, NULL);
+ if (bh) {
+ retval = ext4_delete_entry(handle, dir, de, bh);
+ brelse(bh);
+ }
+ return retval;
+}
+
+static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
+{
+ int retval;
+ /*
+ * ent->de could have moved from under us during htree split, so make
+ * sure that we are deleting the right entry. We might also be pointing
+ * to a stale entry in the unused part of ent->bh so just checking inum
+ * and the name isn't enough.
+ */
+ if (le32_to_cpu(ent->de->inode) != ent->inode->i_ino ||
+ ent->de->name_len != ent->dentry->d_name.len ||
+ strncmp(ent->de->name, ent->dentry->d_name.name,
+ ent->de->name_len)) {
+ retval = ext4_find_delete_entry(handle, ent->dir,
+ &ent->dentry->d_name);
+ } else {
+ retval = ext4_delete_entry(handle, ent->dir, ent->de, ent->bh);
+ if (retval == -ENOENT) {
+ retval = ext4_find_delete_entry(handle, ent->dir,
+ &ent->dentry->d_name);
+ }
+ }
+
+ if (retval) {
+ ext4_warning(ent->dir->i_sb,
+ "Deleting old file (%lu), %d, error=%d",
+ ent->dir->i_ino, ent->dir->i_nlink, retval);
+ }
+}
+
/*
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
@@ -3089,16 +3208,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
goto end_rename;
}
- retval = -EIO;
- old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
- &retval, &old.parent_de,
- &old.dir_inlined);
- if (!old.dir_bh)
- goto end_rename;
- if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
- goto end_rename;
- BUFFER_TRACE(old.dir_bh, "get_write_access");
- retval = ext4_journal_get_write_access(handle, old.dir_bh);
+ retval = ext4_rename_dir_prepare(handle, &old);
if (retval)
goto end_rename;
}
@@ -3107,29 +3217,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (retval)
goto end_rename;
} else {
- BUFFER_TRACE(new.bh, "get write access");
- retval = ext4_journal_get_write_access(handle, new.bh);
+ retval = ext4_setent(handle, &new,
+ old.inode->i_ino, old.de->file_type);
if (retval)
goto end_rename;
- new.de->inode = cpu_to_le32(old.inode->i_ino);
- if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
- EXT4_FEATURE_INCOMPAT_FILETYPE))
- new.de->file_type = old.de->file_type;
- new.dir->i_version++;
- new.dir->i_ctime = new.dir->i_mtime =
- ext4_current_time(new.dir);
- ext4_mark_inode_dirty(handle, new.dir);
- BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
- if (!new.inlined) {
- retval = ext4_handle_dirty_dirent_node(handle,
- new.dir, new.bh);
- if (unlikely(retval)) {
- ext4_std_error(new.dir->i_sb, retval);
- goto end_rename;
- }
- }
- brelse(new.bh);
- new.bh = NULL;
}

/*
@@ -3142,31 +3233,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
/*
* ok, that's it
*/
- if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
- old.de->name_len != old.dentry->d_name.len ||
- strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
- (retval = ext4_delete_entry(handle, old.dir,
- old.de, old.bh)) == -ENOENT) {
- /* old.de could have moved from under us during htree split, so
- * make sure that we are deleting the right entry. We might
- * also be pointing to a stale entry in the unused part of
- * old.bh so just checking inum and the name isn't enough. */
- struct buffer_head *old_bh2;
- struct ext4_dir_entry_2 *old_de2;
-
- old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
- &old_de2, NULL);
- if (old_bh2) {
- retval = ext4_delete_entry(handle, old.dir,
- old_de2, old_bh2);
- brelse(old_bh2);
- }
- }
- if (retval) {
- ext4_warning(old.dir->i_sb,
- "Deleting old file (%lu), %d, error=%d",
- old.dir->i_ino, old.dir->i_nlink, retval);
- }
+ ext4_rename_delete(handle, &old);

if (new.inode) {
ext4_dec_count(handle, new.inode);
@@ -3175,24 +3242,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
ext4_update_dx_flag(old.dir);
if (old.dir_bh) {
- old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
- BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
- if (!old.dir_inlined) {
- if (is_dx(old.inode)) {
- retval = ext4_handle_dirty_dx_node(handle,
- old.inode,
- old.dir_bh);
- } else {
- retval = ext4_handle_dirty_dirent_node(handle,
- old.inode, old.dir_bh);
- }
- } else {
- retval = ext4_mark_inode_dirty(handle, old.inode);
- }
- if (retval) {
- ext4_std_error(old.dir->i_sb, retval);
+ retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
+ if (retval)
goto end_rename;
- }
+
ext4_dec_count(handle, old.dir);
if (new.inode) {
/* checked empty_dir above, can't have another parent,
--
1.8.1.4
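
The idea behind ext4_rename_delete() in the patch above (use the cached directory entry if it still describes the expected inode and name, otherwise look the entry up again by name) can be sketched in user space as follows. All toy_* names and the fixed-size directory are hypothetical. The sketch also leaves out the second fallback in the real helper, which retries the lookup when deleting the cached entry itself returns -ENOENT.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_DIR_SLOTS 4

/* Hypothetical directory entry; ino == 0 marks an unused slot. */
struct toy_dirent {
	unsigned ino;
	char name[16];
};

struct toy_dir {
	struct toy_dirent slot[TOY_DIR_SLOTS];
};

/* Linear lookup by name among the used slots. */
static struct toy_dirent *toy_find_entry(struct toy_dir *dir, const char *name)
{
	for (size_t i = 0; i < TOY_DIR_SLOTS; i++)
		if (dir->slot[i].ino && strcmp(dir->slot[i].name, name) == 0)
			return &dir->slot[i];
	return NULL;
}

/* Look the entry up again by name and delete it, or report -ENOENT. */
static int toy_find_delete_entry(struct toy_dir *dir, const char *name)
{
	struct toy_dirent *de = toy_find_entry(dir, name);

	if (!de)
		return -ENOENT;
	de->ino = 0;
	return 0;
}

/*
 * Delete through the cached pointer only if it still matches both the
 * inode number and the name; otherwise fall back to a fresh lookup.
 */
static int toy_rename_delete(struct toy_dir *dir, struct toy_dirent *cached,
			     unsigned ino, const char *name)
{
	if (cached->ino != ino || strcmp(cached->name, name) != 0)
		return toy_find_delete_entry(dir, name);
	cached->ino = 0;
	return 0;
}
```

In the test scenario for this sketch, the entry has moved to another slot after the pointer was cached, so the stale pointer is detected and the entry is found and removed by name instead.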

2014-02-07 16:49:05

by Miklos Szeredi

Subject: [PATCH 13/13] vfs: merge rename2 into rename

From: Miklos Szeredi <[email protected]>

This merges i_op->rename2 back into ->rename.

Signed-off-by: Miklos Szeredi <[email protected]>
---
Documentation/filesystems/Locking | 6 +-----
Documentation/filesystems/vfs.txt | 8 ++------
drivers/staging/lustre/lustre/llite/namei.c | 3 ++-
fs/9p/v9fs.h | 3 ++-
fs/9p/vfs_inode.c | 4 +++-
fs/affs/affs.h | 3 ++-
fs/affs/namei.c | 3 ++-
fs/afs/dir.c | 6 ++++--
fs/bad_inode.c | 3 ++-
fs/bfs/dir.c | 3 ++-
fs/btrfs/inode.c | 3 ++-
fs/ceph/dir.c | 3 ++-
fs/cifs/cifsfs.h | 2 +-
fs/cifs/inode.c | 3 ++-
fs/coda/dir.c | 8 +++++---
fs/debugfs/inode.c | 2 +-
fs/ecryptfs/inode.c | 3 ++-
fs/exofs/namei.c | 3 ++-
fs/ext2/namei.c | 5 +++--
fs/ext3/namei.c | 5 +++--
fs/ext4/namei.c | 9 ++++-----
fs/ext4/super.c | 6 +++---
fs/f2fs/namei.c | 3 ++-
fs/fat/namei_msdos.c | 3 ++-
fs/fat/namei_vfat.c | 3 ++-
fs/fuse/dir.c | 3 ++-
fs/gfs2/inode.c | 3 ++-
fs/hfs/dir.c | 3 ++-
fs/hfsplus/dir.c | 3 ++-
fs/hostfs/hostfs_kern.c | 3 ++-
fs/hpfs/namei.c | 3 ++-
fs/jffs2/dir.c | 5 +++--
fs/jfs/namei.c | 3 ++-
fs/kernfs/dir.c | 3 ++-
fs/libfs.c | 3 ++-
fs/logfs/dir.c | 3 ++-
fs/minix/namei.c | 5 +++--
fs/namei.c | 11 +++--------
fs/ncpfs/dir.c | 5 +++--
fs/nfs/dir.c | 3 ++-
fs/nfs/internal.h | 3 ++-
fs/nilfs2/namei.c | 3 ++-
fs/ocfs2/namei.c | 3 ++-
fs/omfs/dir.c | 3 ++-
fs/reiserfs/namei.c | 3 ++-
fs/sysv/namei.c | 5 +++--
fs/ubifs/dir.c | 3 ++-
fs/udf/namei.c | 3 ++-
fs/ufs/namei.c | 3 ++-
fs/xfs/xfs_iops.c | 3 ++-
include/linux/fs.h | 5 ++---
kernel/cgroup.c | 5 +++--
mm/shmem.c | 2 +-
53 files changed, 119 insertions(+), 87 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f424e0e5b46b..3bbd4140a150 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -46,8 +46,6 @@ prototypes:
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
- struct inode *, struct dentry *);
- int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
void * (*follow_link) (struct dentry *, struct nameidata *);
@@ -80,7 +78,6 @@ mkdir: yes
unlink: yes (both)
rmdir: yes (both) (see below)
rename: yes (all) (see below)
-rename2: yes (all) (see below)
readlink: no
follow_link: no
put_link: no
@@ -99,8 +96,7 @@ tmpfile: no

Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on
victim.
- cross-directory ->rename() and rename2() has (per-superblock)
-->s_vfs_rename_sem.
+ cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.

See Documentation/filesystems/directory-locking for more detailed discussion
of the locking scheme for directory operations.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 94eb86287bcb..88a32804b9e9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -346,8 +346,6 @@ struct inode_operations {
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
- struct inode *, struct dentry *);
- int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
void * (*follow_link) (struct dentry *, struct nameidata *);
@@ -415,10 +413,8 @@ otherwise noted.

rename: called by the rename(2) system call to rename the object to
have the parent and name given by the second inode and dentry.
-
- rename2: this has an additional flags argument compared to rename.
- If no flags are supported by the filesystem then this method
- need not be implemented. If some flags are supported then the
+ If the filesystem supports some flags (fifth argument), then
+ it needs to set FS_RENAME_FLAGS in the filesystem type. The
filesystem must return -EINVAL for any unsupported or unknown
flags. Currently the following flags are implemented:
(1) RENAME_NOREPLACE: this flag indicates that if the target
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index fc8d264f6c9a..0191e7d9482e 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -1215,7 +1215,8 @@ static int ll_link(struct dentry *old_dentry, struct inode *dir,
}

static int ll_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int err;
err = ll_rename_generic(old_dir, NULL,
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 099c7712631c..6433da41fdc9 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -149,7 +149,8 @@ extern struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry);
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags);
extern void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd,
void *p);
extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index bb7991c7e5c7..b67d47bf6f8f 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -951,12 +951,14 @@ int v9fs_vfs_rmdir(struct inode *i, struct dentry *d)
* @old_dentry: old dentry
* @new_dir: new dir inode
* @new_dentry: new dentry
+ * @flags: rename flags
*
*/

int
v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int retval;
struct inode *old_inode;
diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index 3952121f2f28..badf97e81250 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -163,7 +163,8 @@ extern int affs_link(struct dentry *olddentry, struct inode *dir,
extern int affs_symlink(struct inode *dir, struct dentry *dentry,
const char *symname);
extern int affs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry);
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags);

/* inode.c */

diff --git a/fs/affs/namei.c b/fs/affs/namei.c
index c36cbb4537a2..56ba5ee23514 100644
--- a/fs/affs/namei.c
+++ b/fs/affs/namei.c
@@ -401,7 +401,8 @@ affs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)

int
affs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct super_block *sb = old_dir->i_sb;
struct buffer_head *bh = NULL;
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 529300327f45..09a27fa2f995 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -38,7 +38,8 @@ static int afs_link(struct dentry *from, struct inode *dir,
static int afs_symlink(struct inode *dir, struct dentry *dentry,
const char *content);
static int afs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry);
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags);

const struct file_operations afs_dir_file_operations = {
.open = afs_dir_open,
@@ -1088,7 +1089,8 @@ error:
* rename a file in an AFS filesystem and/or move it between directories
*/
static int afs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct afs_vnode *orig_dvnode, *new_dvnode, *vnode;
struct key *key;
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 7c93953030fb..02673d57ceda 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -219,7 +219,8 @@ static int bad_inode_mknod (struct inode *dir, struct dentry *dentry,
}

static int bad_inode_rename (struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
return -EIO;
}
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index a399e6d9dc74..3ae986541dcd 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -209,7 +209,8 @@ out_brelse:
}

static int bfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode, *new_inode;
struct buffer_head *old_bh, *new_bh;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5c4ab9c18940..d68f292e15dc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8179,7 +8179,8 @@ static int btrfs_getattr(struct vfsmount *mnt,
}

static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct btrfs_trans_handle *trans;
struct btrfs_root *root = BTRFS_I(old_dir)->root;
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 6da4df84ba30..80b827da02c1 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -883,7 +883,8 @@ out:
}

static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct ceph_fs_client *fsc = ceph_sb_to_client(old_dir->i_sb);
struct ceph_mds_client *mdsc = fsc->mdsc;
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 26a754f49ba1..b9e81af57e46 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -61,7 +61,7 @@ extern int cifs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
extern int cifs_mkdir(struct inode *, struct dentry *, umode_t);
extern int cifs_rmdir(struct inode *, struct dentry *);
extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
- struct dentry *);
+ struct dentry *, unsigned int);
extern int cifs_revalidate_file_attr(struct file *filp);
extern int cifs_revalidate_dentry_attr(struct dentry *);
extern int cifs_revalidate_file(struct file *filp);
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 9cb9679d7357..7aed67deefb1 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1619,7 +1619,8 @@ do_rename_exit:

int
cifs_rename(struct inode *source_dir, struct dentry *source_dentry,
- struct inode *target_dir, struct dentry *target_dentry)
+ struct inode *target_dir, struct dentry *target_dentry,
+ unsigned int flags)
{
char *from_name = NULL;
char *to_name = NULL;
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 5efbb5ee0adc..73f4d2b97b4d 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -39,8 +39,9 @@ static int coda_symlink(struct inode *dir_inode, struct dentry *entry,
const char *symname);
static int coda_mkdir(struct inode *dir_inode, struct dentry *entry, umode_t mode);
static int coda_rmdir(struct inode *dir_inode, struct dentry *entry);
-static int coda_rename(struct inode *old_inode, struct dentry *old_dentry,
- struct inode *new_inode, struct dentry *new_dentry);
+static int coda_rename(struct inode *old_inode, struct dentry *old_dentry,
+ struct inode *new_inode, struct dentry *new_dentry,
+ unsigned int flags);

/* dir file-ops */
static int coda_readdir(struct file *file, struct dir_context *ctx);
@@ -347,7 +348,8 @@ static int coda_rmdir(struct inode *dir, struct dentry *de)

/* rename */
static int coda_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
const char *old_name = old_dentry->d_name.name;
const char *new_name = new_dentry->d_name.name;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 9c0444cccbe1..70fc09be0ffd 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -618,7 +618,7 @@ struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
old_name = fsnotify_oldname_init(old_dentry->d_name.name);

error = simple_rename(old_dir->d_inode, old_dentry, new_dir->d_inode,
- dentry);
+ dentry, 0);
if (error) {
fsnotify_oldname_free(old_name);
goto exit;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index d4a9431ec73c..d35675fda228 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -611,7 +611,8 @@ out:

static int
ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int rc;
struct dentry *lower_old_dentry;
diff --git a/fs/exofs/namei.c b/fs/exofs/namei.c
index 4731fd991efe..bc120ed5f566 100644
--- a/fs/exofs/namei.c
+++ b/fs/exofs/namei.c
@@ -228,7 +228,8 @@ static int exofs_rmdir(struct inode *dir, struct dentry *dentry)
}

static int exofs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index c268d0af1db9..24b24c1f6560 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -320,8 +320,9 @@ static int ext2_rmdir (struct inode * dir, struct dentry *dentry)
return err;
}

-static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
- struct inode * new_dir, struct dentry * new_dentry )
+static int ext2_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode * old_inode = old_dentry->d_inode;
struct inode * new_inode = new_dentry->d_inode;
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index f197736dccfa..b1633143bd74 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -2375,8 +2375,9 @@ retry:
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
*/
-static int ext3_rename (struct inode * old_dir, struct dentry *old_dentry,
- struct inode * new_dir,struct dentry *new_dentry)
+static int ext3_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
handle_t *handle;
struct inode * old_inode, * new_inode;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1cb84f78909e..6f1ee2140ee5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3156,8 +3156,8 @@ static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
* while new_{dentry,inode) refers to the destination dentry/inode
* This comes from rename(const char *oldpath, const char *newpath)
*/
-static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+static int ext4_plain_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
{
handle_t *handle = NULL;
struct ext4_renament old = {
@@ -3403,7 +3403,7 @@ end_rename:
return retval;
}

-static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
+static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
@@ -3418,7 +3418,7 @@ static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
* Existence checking was done by the VFS, otherwise "RENAME_NOREPLACE"
* is equivalent to regular rename.
*/
- return ext4_rename(old_dir, old_dentry, new_dir, new_dentry);
+ return ext4_plain_rename(old_dir, old_dentry, new_dir, new_dentry);
}

/*
@@ -3435,7 +3435,6 @@ const struct inode_operations ext4_dir_inode_operations = {
.mknod = ext4_mknod,
.tmpfile = ext4_tmpfile,
.rename = ext4_rename,
- .rename2 = ext4_rename2,
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1f7784de05b6..9a380ed56495 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -90,7 +90,7 @@ static struct file_system_type ext2_fs_type = {
.name = "ext2",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_RENAME_FLAGS,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -106,7 +106,7 @@ static struct file_system_type ext3_fs_type = {
.name = "ext3",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_RENAME_FLAGS,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -5432,7 +5432,7 @@ static struct file_system_type ext4_fs_type = {
.name = "ext4",
.mount = ext4_mount,
.kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_RENAME_FLAGS,
};
MODULE_ALIAS_FS("ext4");

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 397d459e97bf..a8627157d49d 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -374,7 +374,8 @@ out:
}

static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct super_block *sb = old_dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index a783b0e1272a..1bd1ece2f752 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -598,7 +598,8 @@ error_inode:

/***** Rename, a wrapper for rename_same_dir & rename_diff_dir */
static int msdos_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct super_block *sb = old_dir->i_sb;
unsigned char old_msdos_name[MSDOS_NAME], new_msdos_name[MSDOS_NAME];
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 6df8d3d885e5..87ab4e09821f 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -903,7 +903,8 @@ out:
}

static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct buffer_head *dotdot_bh;
struct msdos_dir_entry *dotdot_de;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1d1292c581c3..954f08fd9e72 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -744,7 +744,8 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
}

static int fuse_rename(struct inode *olddir, struct dentry *oldent,
- struct inode *newdir, struct dentry *newent)
+ struct inode *newdir, struct dentry *newent,
+ unsigned int flags)
{
int err;
struct fuse_rename_in inarg;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 5c524180c98e..fb9dbae2a985 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1288,7 +1288,8 @@ static int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to)
*/

static int gfs2_rename(struct inode *odir, struct dentry *odentry,
- struct inode *ndir, struct dentry *ndentry)
+ struct inode *ndir, struct dentry *ndentry,
+ unsigned int flags)
{
struct gfs2_inode *odip = GFS2_I(odir);
struct gfs2_inode *ndip = GFS2_I(ndir);
diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index 145566851e7a..3205d4496471 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -280,7 +280,8 @@ static int hfs_remove(struct inode *dir, struct dentry *dentry)
* XXX: how do you handle must_be dir?
*/
static int hfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int res;

diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index bdec66522de3..80cb1fcc83aa 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -494,7 +494,8 @@ static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
}

static int hfsplus_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int res;

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index fe649d325b1f..03ffa2505248 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -742,7 +742,8 @@ static int hostfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
}

static int hostfs_rename(struct inode *from_ino, struct dentry *from,
- struct inode *to_ino, struct dentry *to)
+ struct inode *to_ino, struct dentry *to,
+ unsigned int flags)
{
char *from_name, *to_name;
int err;
diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
index 1b39afdd86fd..4e2fd3ea59a6 100644
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -516,7 +516,8 @@ const struct address_space_operations hpfs_symlink_aops = {
};

static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
const unsigned char *old_name = old_dentry->d_name.name;
unsigned old_len = old_dentry->d_name.len;
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 938556025d64..eb6855ddbb72 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -35,7 +35,7 @@ static int jffs2_mkdir (struct inode *,struct dentry *,umode_t);
static int jffs2_rmdir (struct inode *,struct dentry *);
static int jffs2_mknod (struct inode *,struct dentry *,umode_t,dev_t);
static int jffs2_rename (struct inode *, struct dentry *,
- struct inode *, struct dentry *);
+ struct inode *, struct dentry *, unsigned int);

const struct file_operations jffs2_dir_operations =
{
@@ -757,7 +757,8 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, umode_t mode
}

static int jffs2_rename (struct inode *old_dir_i, struct dentry *old_dentry,
- struct inode *new_dir_i, struct dentry *new_dentry)
+ struct inode *new_dir_i, struct dentry *new_dentry,
+ unsigned int flags)
{
int ret;
struct jffs2_sb_info *c = JFFS2_SB_INFO(old_dir_i->i_sb);
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index d59c7defb1ef..005ebea78a2e 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1062,7 +1062,8 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
* FUNCTION: rename a file or directory
*/
static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct btstack btstack;
ino_t ino;
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 5104cf5d25c5..451e2cd9bc16 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -744,7 +744,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry)
}

static int kernfs_iop_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct kernfs_node *kn = old_dentry->d_fsdata;
struct kernfs_node *new_parent = new_dir->i_private;
diff --git a/fs/libfs.c b/fs/libfs.c
index a1844244246f..b32dad65f350 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -325,7 +325,8 @@ int simple_rmdir(struct inode *dir, struct dentry *dentry)
EXPORT_SYMBOL(simple_rmdir);

int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *inode = old_dentry->d_inode;
int they_are_dirs = S_ISDIR(old_dentry->d_inode->i_mode);
diff --git a/fs/logfs/dir.c b/fs/logfs/dir.c
index 6bdc347008f5..ce1ed61305f6 100644
--- a/fs/logfs/dir.c
+++ b/fs/logfs/dir.c
@@ -717,7 +717,8 @@ out:
}

static int logfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
if (new_dentry->d_inode)
return logfs_rename_target(old_dir, old_dentry,
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index cd950e2331b6..e4bdbb3edf00 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -184,8 +184,9 @@ static int minix_rmdir(struct inode * dir, struct dentry *dentry)
return err;
}

-static int minix_rename(struct inode * old_dir, struct dentry *old_dentry,
- struct inode * new_dir, struct dentry *new_dentry)
+static int minix_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode * old_inode = old_dentry->d_inode;
struct inode * new_inode = new_dentry->d_inode;
diff --git a/fs/namei.c b/fs/namei.c
index 50b0ca3dddc3..75e6ff6be70b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4030,7 +4030,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (!old_dir->i_op->rename)
return -EPERM;

- if (flags && !old_dir->i_op->rename2)
+ if (flags && !(old_dir->i_sb->s_type->fs_flags & FS_RENAME_FLAGS))
return -EINVAL;

/*
@@ -4086,13 +4086,8 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (error)
goto out;
}
- if (!flags) {
- error = old_dir->i_op->rename(old_dir, old_dentry,
- new_dir, new_dentry);
- } else {
- error = old_dir->i_op->rename2(old_dir, old_dentry,
- new_dir, new_dentry, flags);
- }
+ error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry,
+ flags);
if (error)
goto out;

diff --git a/fs/ncpfs/dir.c b/fs/ncpfs/dir.c
index c320ac52353e..bcbd159c86ba 100644
--- a/fs/ncpfs/dir.c
+++ b/fs/ncpfs/dir.c
@@ -36,7 +36,7 @@ static int ncp_unlink(struct inode *, struct dentry *);
static int ncp_mkdir(struct inode *, struct dentry *, umode_t);
static int ncp_rmdir(struct inode *, struct dentry *);
static int ncp_rename(struct inode *, struct dentry *,
- struct inode *, struct dentry *);
+ struct inode *, struct dentry *, unsigned int);
static int ncp_mknod(struct inode * dir, struct dentry *dentry,
umode_t mode, dev_t rdev);
#if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
@@ -1113,7 +1113,8 @@ static int ncp_unlink(struct inode *dir, struct dentry *dentry)
}

static int ncp_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct ncp_server *server = NCP_SERVER(old_dir);
int error;
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index be38b573495a..ed335994569a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1901,7 +1901,8 @@ EXPORT_SYMBOL_GPL(nfs_link);
* the rename.
*/
int nfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8b5cc04a8611..a2e287eb6b7d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -303,7 +303,8 @@ int nfs_unlink(struct inode *, struct dentry *);
int nfs_symlink(struct inode *, struct dentry *, const char *);
int nfs_link(struct dentry *, struct inode *, struct dentry *);
int nfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
-int nfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+int nfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *,
+ unsigned int);

/* file.c */
int nfs_file_fsync_commit(struct file *, loff_t, loff_t, int);
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 9de78f08989e..33c474cf61cc 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -347,7 +347,8 @@ static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
}

static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index f4d609be9400..d8837e37b9dd 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -1058,7 +1058,8 @@ static void ocfs2_double_unlock(struct inode *inode1, struct inode *inode2)
static int ocfs2_rename(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
- struct dentry *new_dentry)
+ struct dentry *new_dentry,
+ unsigned int flags)
{
int status = 0, rename_lock = 0, parents_locked = 0, target_exists = 0;
int old_child_locked = 0, new_child_locked = 0, update_dot_dot = 0;
diff --git a/fs/omfs/dir.c b/fs/omfs/dir.c
index 1b8e9e8405b2..b0df602aeeea 100644
--- a/fs/omfs/dir.c
+++ b/fs/omfs/dir.c
@@ -371,7 +371,8 @@ static bool omfs_fill_chain(struct inode *dir, struct dir_context *ctx,
}

static int omfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *new_inode = new_dentry->d_inode;
struct inode *old_inode = old_dentry->d_inode;
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index e825f8b63e6b..361e3f36c32d 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -1202,7 +1202,8 @@ static void set_ino_in_dir_entry(struct reiserfs_dir_entry *de,
* get_empty_nodes or its clones
*/
static int reiserfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int retval;
INITIALIZE_PATH(old_entry_path);
diff --git a/fs/sysv/namei.c b/fs/sysv/namei.c
index 731b2bbcaab3..34000fa0c9bd 100644
--- a/fs/sysv/namei.c
+++ b/fs/sysv/namei.c
@@ -205,8 +205,9 @@ static int sysv_rmdir(struct inode * dir, struct dentry * dentry)
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
*/
-static int sysv_rename(struct inode * old_dir, struct dentry * old_dentry,
- struct inode * new_dir, struct dentry * new_dentry)
+static int sysv_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode * old_inode = old_dentry->d_inode;
struct inode * new_inode = new_dentry->d_inode;
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ea41649e4ca5..d47373e5438a 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -950,7 +950,8 @@ static void unlock_3_inodes(struct inode *inode1, struct inode *inode2,
}

static int ubifs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct ubifs_info *c = old_dir->i_sb->s_fs_info;
struct inode *old_inode = old_dentry->d_inode;
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 9737cba1357d..6a16d2e4c823 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1079,7 +1079,8 @@ static int udf_link(struct dentry *old_dentry, struct inode *dir,
* higher-level routines.
*/
static int udf_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
diff --git a/fs/ufs/namei.c b/fs/ufs/namei.c
index 90d74b8f8eba..85f5be154e8a 100644
--- a/fs/ufs/namei.c
+++ b/fs/ufs/namei.c
@@ -259,7 +259,8 @@ static int ufs_rmdir (struct inode * dir, struct dentry *dentry)
}

static int ufs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f35d5c953ff9..d79bcda51fc9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -349,7 +349,8 @@ xfs_vn_rename(
struct inode *odir,
struct dentry *odentry,
struct inode *ndir,
- struct dentry *ndentry)
+ struct dentry *ndentry,
+ unsigned int flags)
{
struct inode *new_inode = ndentry->d_inode;
struct xfs_name oname;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4fbdfae87410..23b47e4d9fd9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1566,8 +1566,6 @@ struct inode_operations {
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
- struct inode *, struct dentry *);
- int (*rename2) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
@@ -1819,6 +1817,7 @@ struct file_system_type {
#define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
#define FS_USERNS_DEV_MOUNT 16 /* A userns mount does not imply MNT_NODEV */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
+#define FS_RENAME_FLAGS (1 << 16) /* ->rename supports flag(s) */
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
void (*kill_sb) (struct super_block *);
@@ -2615,7 +2614,7 @@ extern int simple_open(struct inode *inode, struct file *file);
extern int simple_link(struct dentry *, struct inode *, struct dentry *);
extern int simple_unlink(struct inode *, struct dentry *);
extern int simple_rmdir(struct inode *, struct dentry *);
-extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int);
extern int noop_fsync(struct file *, loff_t, loff_t, int);
extern int simple_empty(struct dentry *);
extern int simple_readpage(struct file *file, struct page *page);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e2f46ba37f72..66426304d2f6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2439,7 +2439,8 @@ static int cgroup_file_release(struct inode *inode, struct file *file)
* cgroup_rename - Only allow simple rename of directories in place.
*/
static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int ret;
struct cgroup_name *name, *old_name;
@@ -2471,7 +2472,7 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
if (!name)
return -ENOMEM;

- ret = simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+ ret = simple_rename(old_dir, old_dentry, new_dir, new_dentry, 0);
if (ret) {
kfree(name);
return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index 1f18c9d0d93e..d92d2d61e8b2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2053,7 +2053,7 @@ static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
* it exists so that the VFS layer correctly free's it when it
* gets overwritten.
*/
-static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)
+static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags)
{
struct inode *inode = old_dentry->d_inode;
int they_are_dirs = S_ISDIR(inode->i_mode);
--
1.8.1.4
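
For readers following the fs/namei.c hunk above, here is a minimal userspace sketch of the dispatch rule after the merge: a single ->rename callback that always receives flags, with non-zero flags rejected unless the filesystem type advertises FS_RENAME_FLAGS. This is only a model, not kernel code. struct model_fs, model_vfs_rename() and the two sample callbacks are hypothetical names made up for illustration, and the numeric values of RENAME_NOREPLACE and RENAME_EXCHANGE are assumed here, since they are defined in other patches of the series.

```c
#include <assert.h>
#include <errno.h>

/* Value taken from the include/linux/fs.h hunk of the patch. */
#define FS_RENAME_FLAGS (1 << 16)

/* Assumed numeric values; the series defines these in other patches. */
#define RENAME_NOREPLACE (1 << 0)
#define RENAME_EXCHANGE  (1 << 1)

/* Hypothetical stand-in for the parts of a filesystem used here. */
struct model_fs {
	int fs_flags;                        /* models file_system_type.fs_flags */
	int (*rename)(unsigned int flags);   /* models i_op->rename, may be NULL */
};

/* Mirrors the two early checks and the single call in vfs_rename(). */
int model_vfs_rename(const struct model_fs *fs, unsigned int flags)
{
	if (!fs->rename)
		return -EPERM;
	if (flags && !(fs->fs_flags & FS_RENAME_FLAGS))
		return -EINVAL;
	return fs->rename(flags);
}

/* A converted filesystem that takes the new argument but ignores it. */
int legacy_rename(unsigned int flags)
{
	(void)flags;
	return 0;
}

/* A filesystem that validates the flags itself, as ext4_rename() does. */
int flag_aware_rename(unsigned int flags)
{
	if (flags & ~(unsigned int)(RENAME_NOREPLACE | RENAME_EXCHANGE))
		return -EINVAL;
	return 0;
}

/* Small self-check showing the expected results of the gate. */
void check_rename_gate(void)
{
	struct model_fs legacy = { 0, legacy_rename };
	struct model_fs aware = { FS_RENAME_FLAGS, flag_aware_rename };
	struct model_fs none = { 0, 0 };

	assert(model_vfs_rename(&none, 0) == -EPERM);
	assert(model_vfs_rename(&legacy, 0) == 0);
	assert(model_vfs_rename(&legacy, RENAME_EXCHANGE) == -EINVAL);
	assert(model_vfs_rename(&aware, RENAME_EXCHANGE) == 0);
	assert(model_vfs_rename(&aware, 1u << 5) == -EINVAL);
}
```

The point of the gate is that filesystems converted mechanically (the bulk of this patch) never see a non-zero flags value, so they can ignore the new argument safely; only filesystems that set FS_RENAME_FLAGS have to validate it.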

2014-02-07 16:53:09

by Miklos Szeredi

Subject: [PATCH 12/13] ext4: add cross rename support

From: Miklos Szeredi <[email protected]>

Implement the RENAME_EXCHANGE flag of the renameat2 syscall for ext4.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 138 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 75f1bde43dcc..1cb84f78909e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3004,6 +3004,8 @@ struct ext4_renament {
struct inode *dir;
struct dentry *dentry;
struct inode *inode;
+ bool is_dir;
+ int dir_nlink_delta;

/* entry for "dentry" */
struct buffer_head *bh;
@@ -3135,6 +3137,17 @@ static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
}
}

+static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
+{
+ if (ent->dir_nlink_delta) {
+ if (ent->dir_nlink_delta == -1)
+ ext4_dec_count(handle, ent->dir);
+ else
+ ext4_inc_count(handle, ent->dir);
+ ext4_mark_inode_dirty(handle, ent->dir);
+ }
+}
+
/*
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
@@ -3274,13 +3287,137 @@ end_rename:
return retval;
}

+static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ handle_t *handle = NULL;
+ struct ext4_renament old = {
+ .dir = old_dir,
+ .dentry = old_dentry,
+ .inode = old_dentry->d_inode,
+ };
+ struct ext4_renament new = {
+ .dir = new_dir,
+ .dentry = new_dentry,
+ .inode = new_dentry->d_inode,
+ };
+ u8 new_file_type;
+ int retval;
+
+ dquot_initialize(old.dir);
+ dquot_initialize(new.dir);
+
+ old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
+ &old.de, &old.inlined);
+ /*
+ * Check for inode number is _not_ due to possible IO errors.
+ * We might rmdir the source, keep it as pwd of some process
+ * and merrily kill the link to whatever was created under the
+ * same name. Goodbye sticky bit ;-<
+ */
+ retval = -ENOENT;
+ if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
+ goto end_rename;
+
+ new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
+ &new.de, &new.inlined);
+
+ /* RENAME_EXCHANGE case: old *and* new must both exist */
+ if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino)
+ goto end_rename;
+
+ handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
+ (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
+ 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
+ ext4_handle_sync(handle);
+
+ if (S_ISDIR(old.inode->i_mode)) {
+ old.is_dir = true;
+ retval = ext4_rename_dir_prepare(handle, &old);
+ if (retval)
+ goto end_rename;
+ }
+ if (S_ISDIR(new.inode->i_mode)) {
+ new.is_dir = true;
+ retval = ext4_rename_dir_prepare(handle, &new);
+ if (retval)
+ goto end_rename;
+ }
+
+ /*
+ * Other than the special case of overwriting a directory, parents'
+ * nlink only needs to be modified if this is a cross directory rename.
+ */
+ if (old.dir != new.dir && old.is_dir != new.is_dir) {
+ old.dir_nlink_delta = old.is_dir ? -1 : 1;
+ new.dir_nlink_delta = -old.dir_nlink_delta;
+ retval = -EMLINK;
+ if ((old.dir_nlink_delta > 0 && EXT4_DIR_LINK_MAX(old.dir)) ||
+ (new.dir_nlink_delta > 0 && EXT4_DIR_LINK_MAX(new.dir)))
+ goto end_rename;
+ }
+
+ new_file_type = new.de->file_type;
+ retval = ext4_setent(handle, &new, old.inode->i_ino, old.de->file_type);
+ if (retval)
+ goto end_rename;
+
+ retval = ext4_setent(handle, &old, new.inode->i_ino, new_file_type);
+ if (retval)
+ goto end_rename;
+
+ /*
+ * Like most other Unix systems, set the ctime for inodes on a
+ * rename.
+ */
+ old.inode->i_ctime = ext4_current_time(old.inode);
+ new.inode->i_ctime = ext4_current_time(new.inode);
+ ext4_mark_inode_dirty(handle, old.inode);
+ ext4_mark_inode_dirty(handle, new.inode);
+
+ if (old.dir_bh) {
+ retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
+ if (retval)
+ goto end_rename;
+ }
+ if (new.dir_bh) {
+ retval = ext4_rename_dir_finish(handle, &new, old.dir->i_ino);
+ if (retval)
+ goto end_rename;
+ }
+ ext4_update_dir_count(handle, &old);
+ ext4_update_dir_count(handle, &new);
+ retval = 0;
+
+end_rename:
+ brelse(old.dir_bh);
+ brelse(new.dir_bh);
+ brelse(old.bh);
+ brelse(new.bh);
+ if (handle)
+ ext4_journal_stop(handle);
+ return retval;
+}
+
static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
- if (flags & ~RENAME_NOREPLACE)
+ if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
return -EINVAL;

+ if (flags & RENAME_EXCHANGE) {
+ return ext4_cross_rename(old_dir, old_dentry,
+ new_dir, new_dentry);
+ }
+ /*
+ * Existence checking was done by the VFS, otherwise "RENAME_NOREPLACE"
+ * is equivalent to regular rename.
+ */
return ext4_rename(old_dir, old_dentry, new_dir, new_dentry);
}

--
1.8.1.4

2014-02-07 16:53:44

by Miklos Szeredi

Subject: [PATCH 10/13] ext4: rename: move EMLINK check up

From: Miklos Szeredi <[email protected]>

Move the i_nlink check from after the call to ext4_get_first_dir_block() to
before it. The check doesn't rely on the result of that function, and the
function only fails on fs corruption, so the order shouldn't matter.

Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/namei.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 7193cea805ff..87a8a6e613ba 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3084,6 +3084,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = -ENOTEMPTY;
if (!empty_dir(new.inode))
goto end_rename;
+ } else {
+ retval = -EMLINK;
+ if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
+ goto end_rename;
}
retval = -EIO;
old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
@@ -3093,10 +3097,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
goto end_rename;
if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
goto end_rename;
- retval = -EMLINK;
- if (!new.inode && new.dir != old.dir &&
- EXT4_DIR_LINK_MAX(new.dir))
- goto end_rename;
BUFFER_TRACE(old.dir_bh, "get_write_access");
retval = ext4_journal_get_write_access(handle, old.dir_bh);
if (retval)
--
1.8.1.4

2014-02-07 16:54:08

by Miklos Szeredi

Subject: [PATCH 08/13] vfs: add cross-rename

From: Miklos Szeredi <[email protected]>

If flags contain RENAME_EXCHANGE then exchange source and destination files.
There's no restriction on the type of the files; e.g. a directory can be
exchanged with a symlink.
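
From userspace, the exchange could be exercised roughly as follows. This is
only a sketch and not part of the patch: it assumes headers that define
SYS_renameat2, it goes through the raw syscall rather than any libc wrapper,
and exchange_paths() is a hypothetical helper name. The flag value is the one
this patch adds to include/uapi/linux/fs.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2 (assumed to be defined) */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)  /* value from the uapi hunk in this patch */
#endif

/*
 * Hypothetical helper: atomically swap the objects that the two names
 * refer to. Returns 0 on success, -1 with errno set on failure, e.g.
 * ENOENT if either name does not exist, or EINVAL if the filesystem
 * does not implement the flag.
 */
static int exchange_paths(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return (int)syscall(SYS_renameat2, AT_FDCWD, a, AT_FDCWD, b,
			    RENAME_EXCHANGE);
}
```

Because both names must already exist, a caller that wants "replace or
create" semantics would still fall back to a plain rename on ENOENT.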

Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dcache.c | 45 +++++++++++++++++----
fs/namei.c | 104 +++++++++++++++++++++++++++++++++---------------
include/linux/dcache.h | 1 +
include/uapi/linux/fs.h | 1 +
security/security.c | 16 ++++++++
5 files changed, 127 insertions(+), 40 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 265e0ce9769c..bd6f96497402 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2483,12 +2483,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
dentry->d_name.name = dentry->d_iname;
} else {
/*
- * Both are internal. Just copy target to dentry
+ * Both are internal.
*/
- memcpy(dentry->d_iname, target->d_name.name,
- target->d_name.len + 1);
- dentry->d_name.len = target->d_name.len;
- return;
+ unsigned int i;
+ BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+ for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+ swap(((long *) &dentry->d_iname)[i],
+ ((long *) &target->d_iname)[i]);
+ }
}
}
swap(dentry->d_name.len, target->d_name.len);
@@ -2545,13 +2547,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
* __d_move - move a dentry
* @dentry: entry to move
* @target: new dentry
+ * @exchange: exchange the two dentries
*
* Update the dcache to reflect the move of a file name. Negative
* dcache entries should not be moved in this way. Caller must hold
* rename_lock, the i_mutex of the source and target directories,
* and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
*/
-static void __d_move(struct dentry * dentry, struct dentry * target)
+static void __d_move(struct dentry *dentry, struct dentry *target,
+ bool exchange)
{
if (!dentry->d_inode)
printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -2575,6 +2579,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)

/* Unhash the target: dput() will then get rid of it */
__d_drop(target);
+ if (exchange) {
+ __d_rehash(target,
+ d_hash(dentry->d_parent, dentry->d_name.hash));
+ }

list_del(&dentry->d_u.d_child);
list_del(&target->d_u.d_child);
@@ -2601,6 +2609,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
write_seqcount_end(&dentry->d_seq);

dentry_unlock_parents_for_move(dentry, target);
+ if (exchange)
+ fsnotify_d_move(target);
spin_unlock(&target->d_lock);
fsnotify_d_move(dentry);
spin_unlock(&dentry->d_lock);
@@ -2618,11 +2628,30 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
void d_move(struct dentry *dentry, struct dentry *target)
{
write_seqlock(&rename_lock);
- __d_move(dentry, target);
+ __d_move(dentry, target, false);
write_sequnlock(&rename_lock);
}
EXPORT_SYMBOL(d_move);

+/*
+ * d_exchange - exchange two dentries
+ * @dentry1: first dentry
+ * @dentry2: second dentry
+ */
+void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
+{
+ write_seqlock(&rename_lock);
+
+ WARN_ON(!dentry1->d_inode);
+ WARN_ON(!dentry2->d_inode);
+ WARN_ON(IS_ROOT(dentry1));
+ WARN_ON(IS_ROOT(dentry2));
+
+ __d_move(dentry1, dentry2, true);
+
+ write_sequnlock(&rename_lock);
+}
+
/**
* d_ancestor - search for an ancestor
* @p1: ancestor dentry
@@ -2670,7 +2699,7 @@ static struct dentry *__d_unalias(struct inode *inode,
m2 = &alias->d_parent->d_inode->i_mutex;
out_unalias:
if (likely(!d_mountpoint(alias))) {
- __d_move(alias, dentry);
+ __d_move(alias, dentry, false);
ret = alias;
}
out_err:
diff --git a/fs/namei.c b/fs/namei.c
index 1974a07835f7..50b0ca3dddc3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4004,6 +4004,8 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
const unsigned char *old_name;
struct inode *source = old_dentry->d_inode;
struct inode *target = new_dentry->d_inode;
+ bool new_is_dir = false;
+ unsigned max_links = new_dir->i_sb->s_max_links;

if (source == target)
return 0;
@@ -4012,10 +4014,16 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (error)
return error;

- if (!target)
+ if (!target) {
error = may_create(new_dir, new_dentry);
- else
- error = may_delete(new_dir, new_dentry, is_dir);
+ } else {
+ new_is_dir = d_is_dir(new_dentry);
+
+ if (!(flags & RENAME_EXCHANGE))
+ error = may_delete(new_dir, new_dentry, is_dir);
+ else
+ error = may_delete(new_dir, new_dentry, new_is_dir);
+ }
if (error)
return error;

@@ -4029,10 +4037,17 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
* If we are going to change the parent - check write permissions,
* we'll need to flip '..'.
*/
- if (is_dir && new_dir != old_dir) {
- error = inode_permission(source, MAY_WRITE);
- if (error)
- return error;
+ if (new_dir != old_dir) {
+ if (is_dir) {
+ error = inode_permission(source, MAY_WRITE);
+ if (error)
+ return error;
+ }
+ if ((flags & RENAME_EXCHANGE) && new_is_dir) {
+ error = inode_permission(target, MAY_WRITE);
+ if (error)
+ return error;
+ }
}

error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry,
@@ -4042,7 +4057,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,

old_name = fsnotify_oldname_init(old_dentry->d_name.name);
dget(new_dentry);
- if (!is_dir)
+ if (!is_dir || (flags & RENAME_EXCHANGE))
lock_two_nondirectories(source, target);
else if (target)
mutex_lock(&target->i_mutex);
@@ -4051,25 +4066,25 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
goto out;

- if (is_dir) {
- unsigned max_links = new_dir->i_sb->s_max_links;
-
+ if (max_links && new_dir != old_dir) {
error = -EMLINK;
- if (max_links && !target && new_dir != old_dir &&
- new_dir->i_nlink >= max_links)
+ if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
goto out;
-
- if (target)
- shrink_dcache_parent(new_dentry);
- } else {
+ if ((flags & RENAME_EXCHANGE) && !is_dir && new_is_dir &&
+ old_dir->i_nlink >= max_links)
+ goto out;
+ }
+ if (is_dir && !(flags & RENAME_EXCHANGE) && target)
+ shrink_dcache_parent(new_dentry);
+ if (!is_dir) {
error = try_break_deleg(source, delegated_inode);
if (error)
goto out;
- if (target) {
- error = try_break_deleg(target, delegated_inode);
- if (error)
- goto out;
- }
+ }
+ if (target && !new_is_dir) {
+ error = try_break_deleg(target, delegated_inode);
+ if (error)
+ goto out;
}
if (!flags) {
error = old_dir->i_op->rename(old_dir, old_dentry,
@@ -4081,22 +4096,31 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (error)
goto out;

- if (target) {
+ if (!(flags & RENAME_EXCHANGE) && target) {
if (is_dir)
target->i_flags |= S_DEAD;
dont_mount(new_dentry);
}
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry, new_dentry);
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
+ if (!(flags & RENAME_EXCHANGE))
+ d_move(old_dentry, new_dentry);
+ else
+ d_exchange(old_dentry, new_dentry);
+ }
out:
- if (!is_dir)
+ if (!is_dir || (flags & RENAME_EXCHANGE))
unlock_two_nondirectories(source, target);
else if (target)
mutex_unlock(&target->i_mutex);
dput(new_dentry);
- if (!error)
+ if (!error) {
fsnotify_move(old_dir, new_dir, old_name, is_dir,
- target, old_dentry);
+ !(flags & RENAME_EXCHANGE) ? target : NULL, old_dentry);
+ if (flags & RENAME_EXCHANGE) {
+ fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
+ new_is_dir, NULL, new_dentry);
+ }
+ }
fsnotify_oldname_free(old_name);

return error;
@@ -4116,7 +4140,10 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
bool should_retry = false;
int error;

- if (flags & ~RENAME_NOREPLACE)
+ if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
+ return -EINVAL;
+
+ if ((flags & RENAME_NOREPLACE) && (flags & RENAME_EXCHANGE))
return -EINVAL;

retry:
@@ -4153,7 +4180,8 @@ retry:

oldnd.flags &= ~LOOKUP_PARENT;
newnd.flags &= ~LOOKUP_PARENT;
- newnd.flags |= LOOKUP_RENAME_TARGET;
+ if (!(flags & RENAME_EXCHANGE))
+ newnd.flags |= LOOKUP_RENAME_TARGET;

retry_deleg:
trap = lock_rename(new_dir, old_dir);
@@ -4173,12 +4201,23 @@ retry_deleg:
error = -EEXIST;
if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
goto exit5;
+ if (flags & RENAME_EXCHANGE) {
+ error = -ENOENT;
+ if (d_is_negative(new_dentry))
+ goto exit5;
+
+ if (!d_is_dir(new_dentry)) {
+ error = -ENOTDIR;
+ if (newnd.last.name[newnd.last.len])
+ goto exit5;
+ }
+ }
/* unless the source is a directory trailing slashes give -ENOTDIR */
if (!d_is_dir(old_dentry)) {
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
goto exit5;
- if (newnd.last.name[newnd.last.len])
+ if (!(flags & RENAME_EXCHANGE) && newnd.last.name[newnd.last.len])
goto exit5;
}
/* source should not be ancestor of target */
@@ -4186,7 +4225,8 @@ retry_deleg:
if (old_dentry == trap)
goto exit5;
/* target should not be an ancestor of source */
- error = -ENOTEMPTY;
+ if (!(flags & RENAME_EXCHANGE))
+ error = -ENOTEMPTY;
if (new_dentry == trap)
goto exit5;

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3b50cac7ccb3..3b9bfdb83ba6 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -308,6 +308,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);

/* used for rename() and baskets */
extern void d_move(struct dentry *, struct dentry *);
+extern void d_exchange(struct dentry *, struct dentry *);
extern struct dentry *d_ancestor(struct dentry *, struct dentry *);

/* appendix may either be NULL or be used for transname suffixes */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9250f4dd7d96..ca1a11bb4443 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -36,6 +36,7 @@
#define SEEK_MAX SEEK_HOLE

#define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
+#define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */

struct fstrim_range {
__u64 start;
diff --git a/security/security.c b/security/security.c
index edc179f1ade0..3dd2258b7a38 100644
--- a/security/security.c
+++ b/security/security.c
@@ -439,6 +439,14 @@ int security_path_rename(struct path *old_dir, struct dentry *old_dentry,
if (unlikely(IS_PRIVATE(old_dentry->d_inode) ||
(new_dentry->d_inode && IS_PRIVATE(new_dentry->d_inode))))
return 0;
+
+ if (flags & RENAME_EXCHANGE) {
+ int err = security_ops->path_rename(new_dir, new_dentry,
+ old_dir, old_dentry);
+ if (err)
+ return err;
+ }
+
return security_ops->path_rename(old_dir, old_dentry, new_dir,
new_dentry);
}
@@ -531,6 +539,14 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
if (unlikely(IS_PRIVATE(old_dentry->d_inode) ||
(new_dentry->d_inode && IS_PRIVATE(new_dentry->d_inode))))
return 0;
+
+ if (flags & RENAME_EXCHANGE) {
+ int err = security_ops->inode_rename(new_dir, new_dentry,
+ old_dir, old_dentry);
+ if (err)
+ return err;
+ }
+
return security_ops->inode_rename(old_dir, old_dentry,
new_dir, new_dentry);
}
--
1.8.1.4

2014-02-07 16:48:41

by Miklos Szeredi

Subject: [PATCH 06/13] security: add flags to rename hooks

From: Miklos Szeredi <[email protected]>

Add flags to security_path_rename() and security_inode_rename() hooks.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/cachefiles/namei.c | 2 +-
fs/namei.c | 5 +++--
include/linux/security.h | 12 ++++++++----
security/security.c | 6 ++++--
4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 31088a969351..6494d9f673aa 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -391,7 +391,7 @@ try_again:
path.dentry = dir;
path_to_graveyard.mnt = cache->mnt;
path_to_graveyard.dentry = cache->graveyard;
- ret = security_path_rename(&path, rep, &path_to_graveyard, grave);
+ ret = security_path_rename(&path, rep, &path_to_graveyard, grave, 0);
if (ret < 0) {
cachefiles_io_error(cache, "Rename security error %d", ret);
} else {
diff --git a/fs/namei.c b/fs/namei.c
index 9031abac50b1..1974a07835f7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4035,7 +4035,8 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
return error;
}

- error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
+ error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry,
+ flags);
if (error)
return error;

@@ -4190,7 +4191,7 @@ retry_deleg:
goto exit5;

error = security_path_rename(&oldnd.path, old_dentry,
- &newnd.path, new_dentry);
+ &newnd.path, new_dentry, flags);
if (error)
goto exit5;
error = vfs_rename(old_dir->d_inode, old_dentry,
diff --git a/include/linux/security.h b/include/linux/security.h
index 5623a7f965b7..95cfccc213fb 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1792,7 +1792,8 @@ int security_inode_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
int security_inode_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev);
int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry);
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags);
int security_inode_readlink(struct dentry *dentry);
int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
int security_inode_permission(struct inode *inode, int mask);
@@ -2160,7 +2161,8 @@ static inline int security_inode_mknod(struct inode *dir,
static inline int security_inode_rename(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
- struct dentry *new_dentry)
+ struct dentry *new_dentry,
+ unsigned int flags)
{
return 0;
}
@@ -2951,7 +2953,8 @@ int security_path_symlink(struct path *dir, struct dentry *dentry,
int security_path_link(struct dentry *old_dentry, struct path *new_dir,
struct dentry *new_dentry);
int security_path_rename(struct path *old_dir, struct dentry *old_dentry,
- struct path *new_dir, struct dentry *new_dentry);
+ struct path *new_dir, struct dentry *new_dentry,
+ unsigned int flags);
int security_path_chmod(struct path *path, umode_t mode);
int security_path_chown(struct path *path, kuid_t uid, kgid_t gid);
int security_path_chroot(struct path *path);
@@ -2999,7 +3002,8 @@ static inline int security_path_link(struct dentry *old_dentry,
static inline int security_path_rename(struct path *old_dir,
struct dentry *old_dentry,
struct path *new_dir,
- struct dentry *new_dentry)
+ struct dentry *new_dentry,
+ unsigned int flags)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 15b6928592ef..edc179f1ade0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -433,7 +433,8 @@ int security_path_link(struct dentry *old_dentry, struct path *new_dir,
}

int security_path_rename(struct path *old_dir, struct dentry *old_dentry,
- struct path *new_dir, struct dentry *new_dentry)
+ struct path *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
if (unlikely(IS_PRIVATE(old_dentry->d_inode) ||
(new_dentry->d_inode && IS_PRIVATE(new_dentry->d_inode))))
@@ -524,7 +525,8 @@ int security_inode_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
}

int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
if (unlikely(IS_PRIVATE(old_dentry->d_inode) ||
(new_dentry->d_inode && IS_PRIVATE(new_dentry->d_inode))))
--
1.8.1.4

2014-02-07 16:55:24

by Miklos Szeredi

Subject: [PATCH 05/13] vfs: add RENAME_NOREPLACE flag

From: Miklos Szeredi <[email protected]>

If this flag is specified and the target of the rename exists then the
rename syscall fails with EEXIST.

The VFS does the existence checking, so it is trivial to enable for most
local filesystems. This patch only enables it in ext4.

For network filesystems the VFS check is not enough as there may be a race
between a remote create and the rename, so these filesystems need to handle
this flag in their ->rename() implementations to ensure atomicity.

Andy writes about why this is useful:

"The trivial answer: to eliminate the race condition from 'mv -i'.

Another answer: there's a common pattern to atomically create a file
with contents: open a temporary file, write to it, optionally fsync
it, close it, then link(2) it to the final name, then unlink the
temporary file.

The reason to use link(2) is because it won't silently clobber the destination.

This is annoying:
- It requires an extra system call that shouldn't be necessary.
- It doesn't work on (IMO sensible) filesystems that don't support
hard links (e.g. vfat).
- It's not atomic -- there's an intermediate state where both files exist.
- It's ugly.

The new rename flag will make this totally sensible.

To be fair, on new enough kernels, you can also use O_TMPFILE and
linkat to achieve the same thing even more cleanly."
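
With the flag, the link(2)/unlink(2) pair described above could collapse into
one call. A minimal userspace sketch, assuming headers that define
SYS_renameat2; rename_noreplace() is a hypothetical helper name and the flag
value is the one this patch adds to include/uapi/linux/fs.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2 (assumed to be defined) */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)  /* value from the uapi hunk in this patch */
#endif

/*
 * Hypothetical helper: move a fully written temporary file to its final
 * name. If the destination already exists the call fails with
 * errno == EEXIST instead of silently replacing it.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int rename_noreplace(const char *tmp, const char *final)
{
	assert(tmp != NULL && final != NULL);
	return (int)syscall(SYS_renameat2, AT_FDCWD, tmp, AT_FDCWD, final,
			    RENAME_NOREPLACE);
}
```

A caller would write (and optionally fsync) the temporary file first, then
call rename_noreplace(tmp, final) and treat EEXIST as "somebody else created
the destination first".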

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 11 +++++++++++
fs/namei.c | 21 +++++++++++++--------
include/uapi/linux/fs.h | 2 ++
3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index d050e043e884..5f19171b3e1f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3204,6 +3204,16 @@ end_rename:
return retval;
}

+static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
+{
+ if (flags & ~RENAME_NOREPLACE)
+ return -EINVAL;
+
+ return ext4_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
/*
* directories can handle most operations...
*/
@@ -3218,6 +3228,7 @@ const struct inode_operations ext4_dir_inode_operations = {
.mknod = ext4_mknod,
.tmpfile = ext4_tmpfile,
.rename = ext4_rename,
+ .rename2 = ext4_rename2,
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
diff --git a/fs/namei.c b/fs/namei.c
index 93a98a303db5..9031abac50b1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4115,7 +4115,7 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
bool should_retry = false;
int error;

- if (flags)
+ if (flags & ~RENAME_NOREPLACE)
return -EINVAL;

retry:
@@ -4141,6 +4141,8 @@ retry:
goto exit2;

new_dir = newnd.path.dentry;
+ if (flags & RENAME_NOREPLACE)
+ error = -EEXIST;
if (newnd.last_type != LAST_NORM)
goto exit2;

@@ -4163,22 +4165,25 @@ retry_deleg:
error = -ENOENT;
if (d_is_negative(old_dentry))
goto exit4;
+ new_dentry = lookup_hash(&newnd);
+ error = PTR_ERR(new_dentry);
+ if (IS_ERR(new_dentry))
+ goto exit4;
+ error = -EEXIST;
+ if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry))
+ goto exit5;
/* unless the source is a directory trailing slashes give -ENOTDIR */
if (!d_is_dir(old_dentry)) {
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
- goto exit4;
+ goto exit5;
if (newnd.last.name[newnd.last.len])
- goto exit4;
+ goto exit5;
}
/* source should not be ancestor of target */
error = -EINVAL;
if (old_dentry == trap)
- goto exit4;
- new_dentry = lookup_hash(&newnd);
- error = PTR_ERR(new_dentry);
- if (IS_ERR(new_dentry))
- goto exit4;
+ goto exit5;
/* target should not be an ancestor of source */
error = -ENOTEMPTY;
if (new_dentry == trap)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 6c28b61bb690..9250f4dd7d96 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -35,6 +35,8 @@
#define SEEK_HOLE 4 /* seek to the next hole */
#define SEEK_MAX SEEK_HOLE

+#define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
+
struct fstrim_range {
__u64 start;
__u64 len;
--
1.8.1.4

2014-02-07 16:56:01

by Miklos Szeredi

Subject: [PATCH 04/13] vfs: add renameat2 syscall

From: Miklos Szeredi <[email protected]>

Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

Pass flags to vfs_rename() and on to the new i_op->rename2() method;
->rename() itself is left untouched.
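
The dispatch that the fs/namei.c hunk of this patch adds to vfs_rename() can
be modelled in userspace as below. This is an illustrative sketch only:
struct model_iops, model_vfs_rename() and the toy_* methods are made-up
stand-ins, not kernel types, and everything except the method selection is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct inode_operations (not the kernel type). */
struct model_iops {
	int (*rename)(void);
	int (*rename2)(unsigned int flags);
};

/* Mirrors only the method checks and selection done in vfs_rename(). */
static int model_vfs_rename(const struct model_iops *ops, unsigned int flags)
{
	assert(ops != NULL);
	if (!ops->rename)
		return -EPERM;          /* filesystem cannot rename at all */
	if (flags && !ops->rename2)
		return -EINVAL;         /* flags given, but no ->rename2 */
	if (!flags)
		return ops->rename();   /* plain rename keeps using ->rename */
	return ops->rename2(flags);     /* any flag goes through ->rename2 */
}

/* Toy methods: the toy ->rename2 accepts only flag bit 0. */
static int toy_rename(void)
{
	return 0;
}

static int toy_rename2(unsigned int flags)
{
	return (flags & ~1u) ? -EINVAL : 0;
}
```

The point of the split is that filesystems without ->rename2 keep working
unchanged for flags == 0, and reject any flag with -EINVAL without having to
know about it.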

Signed-off-by: Miklos Szeredi <[email protected]>
---
Documentation/filesystems/Locking | 6 +++-
Documentation/filesystems/vfs.txt | 16 ++++++++++
arch/x86/syscalls/syscall_64.tbl | 1 +
.../lustre/lustre/include/linux/lustre_compat25.h | 4 +--
drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 2 +-
fs/cachefiles/namei.c | 2 +-
fs/ecryptfs/inode.c | 2 +-
fs/namei.c | 34 +++++++++++++++++-----
fs/nfsd/vfs.c | 2 +-
include/linux/fs.h | 4 ++-
10 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 5b0c083d7c0e..f424e0e5b46b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -47,6 +47,8 @@ prototypes:
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+ int (*rename2) (struct inode *, struct dentry *,
+ struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
void * (*follow_link) (struct dentry *, struct nameidata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
@@ -78,6 +80,7 @@ mkdir: yes
unlink: yes (both)
rmdir: yes (both) (see below)
rename: yes (all) (see below)
+rename2: yes (all) (see below)
readlink: no
follow_link: no
put_link: no
@@ -96,7 +99,8 @@ tmpfile: no

Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on
victim.
- cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
+ cross-directory ->rename() and ->rename2() have (per-superblock)
+->s_vfs_rename_sem.

See Documentation/filesystems/directory-locking for more detailed discussion
of the locking scheme for directory operations.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index c53784c119c8..94eb86287bcb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -347,6 +347,8 @@ struct inode_operations {
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+ int (*rename2) (struct inode *, struct dentry *,
+ struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
void * (*follow_link) (struct dentry *, struct nameidata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
@@ -414,6 +416,20 @@ otherwise noted.
rename: called by the rename(2) system call to rename the object to
have the parent and name given by the second inode and dentry.

+ rename2: this has an additional flags argument compared to rename.
+ If no flags are supported by the filesystem then this method
+ need not be implemented. If some flags are supported then the
+ filesystem must return -EINVAL for any unsupported or unknown
+ flags. Currently the following flags are implemented:
+ (1) RENAME_NOREPLACE: this flag indicates that if the target
+ of the rename exists the rename should fail with -EEXIST
+ instead of replacing the target. The VFS already checks for
+ existence, so for local filesystems the RENAME_NOREPLACE
+ implementation is equivalent to plain rename.
+ (2) RENAME_EXCHANGE: exchange source and target. Both must
+ exist; this is checked by the VFS. Unlike plain rename,
+ source and target may be of different type.
+
readlink: called by the readlink(2) system call. Only required if
you want to support reading symbolic links

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index a12bddc7ccea..04376ac3d9ef 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -322,6 +322,7 @@
313 common finit_module sys_finit_module
314 common sched_setattr sys_sched_setattr
315 common sched_getattr sys_sched_getattr
+316 common renameat2 sys_renameat2

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h b/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
index eefdb8d061b1..81cc7a0134bb 100644
--- a/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
+++ b/drivers/staging/lustre/lustre/include/linux/lustre_compat25.h
@@ -105,8 +105,8 @@ static inline void ll_set_fs_pwd(struct fs_struct *fs, struct vfsmount *mnt,
#define ll_vfs_unlink(inode,entry,mnt) vfs_unlink(inode,entry)
#define ll_vfs_mknod(dir,entry,mnt,mode,dev) vfs_mknod(dir,entry,mode,dev)
#define ll_security_inode_unlink(dir,entry,mnt) security_inode_unlink(dir,entry)
-#define ll_vfs_rename(old,old_dir,mnt,new,new_dir,mnt1,delegated_inode) \
- vfs_rename(old,old_dir,new,new_dir,delegated_inode)
+#define ll_vfs_rename(old, old_dir, mnt, new, new_dir, mnt1) \
+ vfs_rename(old, old_dir, new, new_dir, NULL, 0)

#define cfs_bio_io_error(a,b) bio_io_error((a))
#define cfs_bio_endio(a,b,c) bio_endio((a),(c))
diff --git a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
index 428ffd8c37b7..d50822be3230 100644
--- a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
+++ b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
@@ -223,7 +223,7 @@ int lustre_rename(struct dentry *dir, struct vfsmount *mnt,
GOTO(put_old, err = PTR_ERR(dchild_new));

err = ll_vfs_rename(dir->d_inode, dchild_old, mnt,
- dir->d_inode, dchild_new, mnt, NULL);
+ dir->d_inode, dchild_new, mnt);

dput(dchild_new);
put_old:
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ca65f39dc8dc..31088a969351 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -396,7 +396,7 @@ try_again:
cachefiles_io_error(cache, "Rename security error %d", ret);
} else {
ret = vfs_rename(dir->d_inode, rep,
- cache->graveyard->d_inode, grave, NULL);
+ cache->graveyard->d_inode, grave, NULL, 0);
if (ret != 0 && ret != -ENOMEM)
cachefiles_io_error(cache,
"Rename failed with error %d", ret);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b167ca48b8ee..d4a9431ec73c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -641,7 +641,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
}
rc = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
lower_new_dir_dentry->d_inode, lower_new_dentry,
- NULL);
+ NULL, 0);
if (rc)
goto out_lock;
if (target_inode)
diff --git a/fs/namei.c b/fs/namei.c
index b6976bd59cbb..93a98a303db5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3953,6 +3953,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
* @new_dir: parent of destination
* @new_dentry: destination
* @delegated_inode: returns an inode needing a delegation break
+ * @flags: rename flags
*
* The caller must hold multiple mutexes--see lock_rename()).
*
@@ -3996,7 +3997,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
*/
int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
- struct inode **delegated_inode)
+ struct inode **delegated_inode, unsigned int flags)
{
int error;
bool is_dir = d_is_dir(old_dentry);
@@ -4021,6 +4022,9 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
if (!old_dir->i_op->rename)
return -EPERM;

+ if (flags && !old_dir->i_op->rename2)
+ return -EINVAL;
+
/*
* If we are going to change the parent - check write permissions,
* we'll need to flip '..'.
@@ -4066,7 +4070,13 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
goto out;
}
}
- error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
+ if (!flags) {
+ error = old_dir->i_op->rename(old_dir, old_dentry,
+ new_dir, new_dentry);
+ } else {
+ error = old_dir->i_op->rename2(old_dir, old_dentry,
+ new_dir, new_dentry, flags);
+ }
if (error)
goto out;

@@ -4091,8 +4101,8 @@ out:
return error;
}

-SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
- int, newdfd, const char __user *, newname)
+SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
+ int, newdfd, const char __user *, newname, unsigned int, flags)
{
struct dentry *old_dir, *new_dir;
struct dentry *old_dentry, *new_dentry;
@@ -4104,6 +4114,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
unsigned int lookup_flags = 0;
bool should_retry = false;
int error;
+
+ if (flags)
+ return -EINVAL;
+
retry:
from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
if (IS_ERR(from)) {
@@ -4175,8 +4189,8 @@ retry_deleg:
if (error)
goto exit5;
error = vfs_rename(old_dir->d_inode, old_dentry,
- new_dir->d_inode, new_dentry,
- &delegated_inode);
+ new_dir->d_inode, new_dentry,
+ &delegated_inode, flags);
exit5:
dput(new_dentry);
exit4:
@@ -4206,9 +4220,15 @@ exit:
return error;
}

+SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
+ int, newdfd, const char __user *, newname)
+{
+ return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
+}
+
SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
{
- return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
+ return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
}

int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 017d3cb5e99b..f88df9c1336d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1693,7 +1693,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
goto out_dput_new;

- host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL);
+ host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
if (!host_err) {
host_err = commit_metadata(tfhp);
if (!host_err)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09f553c59813..4fbdfae87410 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1456,7 +1456,7 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *);
extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
extern int vfs_rmdir(struct inode *, struct dentry *);
extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
-extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);
+extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);

/*
* VFS dentry helper functions.
@@ -1567,6 +1567,8 @@ struct inode_operations {
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+ int (*rename2) (struct inode *, struct dentry *,
+ struct inode *, struct dentry *, unsigned int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
--
1.8.1.4
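
For readers following the dispatch logic in the patch above, here is a minimal user-space sketch in C of how the flags value selects the method. It is not kernel code: struct model_iops, the model_* functions and the demo callbacks are hypothetical stand-ins, and path strings replace the inode and dentry arguments.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two inode_operations rename methods. */
struct model_iops {
	int (*rename)(const char *oldname, const char *newname);
	int (*rename2)(const char *oldname, const char *newname,
		       unsigned int flags);
};

/* Mirrors the branch added to vfs_rename(): no flags means the legacy method. */
static int model_vfs_rename(const struct model_iops *ops, const char *oldname,
			    const char *newname, unsigned int flags)
{
	if (!flags)
		return ops->rename(oldname, newname);
	return ops->rename2(oldname, newname, flags);
}

/* Mirrors renameat2() at this point in the series: every flag is rejected. */
static int model_renameat2(const struct model_iops *ops, const char *oldname,
			   const char *newname, unsigned int flags)
{
	if (flags)
		return -EINVAL;
	return model_vfs_rename(ops, oldname, newname, flags);
}

/* rename() and renameat() become thin wrappers that pass flags == 0. */
static int model_rename(const struct model_iops *ops, const char *oldname,
			const char *newname)
{
	return model_renameat2(ops, oldname, newname, 0);
}

/* Demo callbacks that only count how often each method is reached. */
static int legacy_calls, flagged_calls;

static int demo_rename(const char *oldname, const char *newname)
{
	(void)oldname;
	(void)newname;
	legacy_calls++;
	return 0;
}

static int demo_rename2(const char *oldname, const char *newname,
			unsigned int flags)
{
	(void)oldname;
	(void)newname;
	(void)flags;
	flagged_calls++;
	return 0;
}

static const struct model_iops demo_ops = {
	.rename = demo_rename,
	.rename2 = demo_rename2,
};

static void model_demo(void)
{
	legacy_calls = flagged_calls = 0;
	/* The flag-less wrapper ends up in the legacy method. */
	assert(model_rename(&demo_ops, "a", "b") == 0);
	assert(legacy_calls == 1 && flagged_calls == 0);
	/* The syscall entry point refuses any flag for now. */
	assert(model_renameat2(&demo_ops, "a", "b", 1) == -EINVAL);
	/* An in-kernel caller passing a flag reaches the new method. */
	assert(model_vfs_rename(&demo_ops, "a", "b", 1) == 0);
	assert(flagged_calls == 1);
}
```

Calling model_demo() from a test program exercises all three paths. The sketch leaves out everything else vfs_rename() does (permission checks, locking, fsnotify).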

2014-02-07 16:48:35

by Miklos Szeredi

Subject: [PATCH 01/13] vfs: add d_is_dir()

From: Miklos Szeredi <[email protected]>

Add d_is_dir(dentry) helper which is analogous to S_ISDIR().

To avoid confusion, rename d_is_directory() to d_can_lookup().

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/namei.c | 23 +++++++++++------------
include/linux/dcache.h | 7 ++++++-
2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d580df2e6804..258c06ae26a7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1769,7 +1769,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (err)
return err;
}
- if (!d_is_directory(nd->path.dentry)) {
+ if (!d_can_lookup(nd->path.dentry)) {
err = -ENOTDIR;
break;
}
@@ -1790,7 +1790,7 @@ static int path_init(int dfd, const char *name, unsigned int flags,
struct dentry *root = nd->root.dentry;
struct inode *inode = root->d_inode;
if (*name) {
- if (!d_is_directory(root))
+ if (!d_can_lookup(root))
return -ENOTDIR;
retval = inode_permission(inode, MAY_EXEC);
if (retval)
@@ -1846,7 +1846,7 @@ static int path_init(int dfd, const char *name, unsigned int flags,
dentry = f.file->f_path.dentry;

if (*name) {
- if (!d_is_directory(dentry)) {
+ if (!d_can_lookup(dentry)) {
fdput(f);
return -ENOTDIR;
}
@@ -1928,7 +1928,7 @@ static int path_lookupat(int dfd, const char *name,
err = complete_walk(nd);

if (!err && nd->flags & LOOKUP_DIRECTORY) {
- if (!d_is_directory(nd->path.dentry)) {
+ if (!d_can_lookup(nd->path.dentry)) {
path_put(&nd->path);
err = -ENOTDIR;
}
@@ -2387,11 +2387,11 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
IS_IMMUTABLE(inode) || IS_SWAPFILE(inode))
return -EPERM;
if (isdir) {
- if (!d_is_directory(victim) && !d_is_autodir(victim))
+ if (!d_is_dir(victim))
return -ENOTDIR;
if (IS_ROOT(victim))
return -EBUSY;
- } else if (d_is_directory(victim) || d_is_autodir(victim))
+ } else if (d_is_dir(victim))
return -EISDIR;
if (IS_DEADDIR(dir))
return -ENOENT;
@@ -2989,11 +2989,10 @@ finish_open:
}
audit_inode(name, nd->path.dentry, 0);
error = -EISDIR;
- if ((open_flag & O_CREAT) &&
- (d_is_directory(nd->path.dentry) || d_is_autodir(nd->path.dentry)))
+ if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
goto out;
error = -ENOTDIR;
- if ((nd->flags & LOOKUP_DIRECTORY) && !d_is_directory(nd->path.dentry))
+ if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
goto out;
if (!S_ISREG(nd->inode->i_mode))
will_truncate = false;
@@ -3717,7 +3716,7 @@ exit1:
slashes:
if (d_is_negative(dentry))
error = -ENOENT;
- else if (d_is_directory(dentry) || d_is_autodir(dentry))
+ else if (d_is_dir(dentry))
error = -EISDIR;
else
error = -ENOTDIR;
@@ -4096,7 +4095,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode **delegated_inode)
{
int error;
- int is_dir = d_is_directory(old_dentry) || d_is_autodir(old_dentry);
+ int is_dir = d_is_dir(old_dentry);
const unsigned char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
@@ -4189,7 +4188,7 @@ retry_deleg:
if (d_is_negative(old_dentry))
goto exit4;
/* unless the source is a directory trailing slashes give -ENOTDIR */
- if (!d_is_directory(old_dentry) && !d_is_autodir(old_dentry)) {
+ if (!d_is_dir(old_dentry)) {
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
goto exit4;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bf72e9ac6de0..3b50cac7ccb3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -429,7 +429,7 @@ static inline unsigned __d_entry_type(const struct dentry *dentry)
return dentry->d_flags & DCACHE_ENTRY_TYPE;
}

-static inline bool d_is_directory(const struct dentry *dentry)
+static inline bool d_can_lookup(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_DIRECTORY_TYPE;
}
@@ -439,6 +439,11 @@ static inline bool d_is_autodir(const struct dentry *dentry)
return __d_entry_type(dentry) == DCACHE_AUTODIR_TYPE;
}

+static inline bool d_is_dir(const struct dentry *dentry)
+{
+ return d_can_lookup(dentry) || d_is_autodir(dentry);
+}
+
static inline bool d_is_symlink(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_SYMLINK_TYPE;
--
1.8.1.4

2014-02-07 16:57:00

by Miklos Szeredi

Subject: [PATCH 03/13] vfs: rename: use common code for dir and non-dir

From: Miklos Szeredi <[email protected]>

There's actually very little difference between vfs_rename_dir() and
vfs_rename_other() so move both inline into vfs_rename() which still stays
reasonably readable.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/namei.c | 187 +++++++++++++++++++++++++------------------------------------
1 file changed, 75 insertions(+), 112 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1409090d0913..b6976bd59cbb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3946,7 +3946,27 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
return sys_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
}

-/*
+/**
+ * vfs_rename - rename a filesystem object
+ * @old_dir: parent of source
+ * @old_dentry: source
+ * @new_dir: parent of destination
+ * @new_dentry: destination
+ * @delegated_inode: returns an inode needing a delegation break
+ *
+ * The caller must hold multiple mutexes--see lock_rename()).
+ *
+ * If vfs_rename discovers a delegation in need of breaking at either
+ * the source or destination, it will return -EWOULDBLOCK and return a
+ * reference to the inode in delegated_inode. The caller should then
+ * break the delegation and retry. Because breaking a delegation may
+ * take a long time, the caller should drop all locks before doing
+ * so.
+ *
+ * Alternatively, a caller may pass NULL for delegated_inode. This may
+ * be appropriate for callers that expect the underlying filesystem not
+ * to be NFS exported.
+ *
* The worst of all namespace operations - renaming directory. "Perverted"
* doesn't even start to describe it. Somebody in UCB had a heck of a trip...
* Problems:
@@ -3974,19 +3994,39 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
* ->i_mutex on parents, which works but leads to some truly excessive
* locking].
*/
-static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ struct inode **delegated_inode)
{
- int error = 0;
+ int error;
+ bool is_dir = d_is_dir(old_dentry);
+ const unsigned char *old_name;
+ struct inode *source = old_dentry->d_inode;
struct inode *target = new_dentry->d_inode;
- unsigned max_links = new_dir->i_sb->s_max_links;
+
+ if (source == target)
+ return 0;
+
+ error = may_delete(old_dir, old_dentry, is_dir);
+ if (error)
+ return error;
+
+ if (!target)
+ error = may_create(new_dir, new_dentry);
+ else
+ error = may_delete(new_dir, new_dentry, is_dir);
+ if (error)
+ return error;
+
+ if (!old_dir->i_op->rename)
+ return -EPERM;

/*
* If we are going to change the parent - check write permissions,
* we'll need to flip '..'.
*/
- if (new_dir != old_dir) {
- error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+ if (is_dir && new_dir != old_dir) {
+ error = inode_permission(source, MAY_WRITE);
if (error)
return error;
}
@@ -3995,134 +4035,57 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
if (error)
return error;

+ old_name = fsnotify_oldname_init(old_dentry->d_name.name);
dget(new_dentry);
- if (target)
+ if (!is_dir)
+ lock_two_nondirectories(source, target);
+ else if (target)
mutex_lock(&target->i_mutex);

error = -EBUSY;
if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
goto out;

- error = -EMLINK;
- if (max_links && !target && new_dir != old_dir &&
- new_dir->i_nlink >= max_links)
- goto out;
-
- if (target)
- shrink_dcache_parent(new_dentry);
- error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
- if (error)
- goto out;
-
- if (target) {
- target->i_flags |= S_DEAD;
- dont_mount(new_dentry);
- }
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry, new_dentry);
-out:
- if (target)
- mutex_unlock(&target->i_mutex);
- dput(new_dentry);
- return error;
-}
-
-static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry,
- struct inode **delegated_inode)
-{
- struct inode *target = new_dentry->d_inode;
- struct inode *source = old_dentry->d_inode;
- int error;
-
- error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
- if (error)
- return error;
-
- dget(new_dentry);
- lock_two_nondirectories(source, target);
+ if (is_dir) {
+ unsigned max_links = new_dir->i_sb->s_max_links;

- error = -EBUSY;
- if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
- goto out;
+ error = -EMLINK;
+ if (max_links && !target && new_dir != old_dir &&
+ new_dir->i_nlink >= max_links)
+ goto out;

- error = try_break_deleg(source, delegated_inode);
- if (error)
- goto out;
- if (target) {
- error = try_break_deleg(target, delegated_inode);
+ if (target)
+ shrink_dcache_parent(new_dentry);
+ } else {
+ error = try_break_deleg(source, delegated_inode);
if (error)
goto out;
+ if (target) {
+ error = try_break_deleg(target, delegated_inode);
+ if (error)
+ goto out;
+ }
}
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (error)
goto out;

- if (target)
+ if (target) {
+ if (is_dir)
+ target->i_flags |= S_DEAD;
dont_mount(new_dentry);
+ }
if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
d_move(old_dentry, new_dentry);
out:
- unlock_two_nondirectories(source, target);
+ if (!is_dir)
+ unlock_two_nondirectories(source, target);
+ else if (target)
+ mutex_unlock(&target->i_mutex);
dput(new_dentry);
- return error;
-}
-
-/**
- * vfs_rename - rename a filesystem object
- * @old_dir: parent of source
- * @old_dentry: source
- * @new_dir: parent of destination
- * @new_dentry: destination
- * @delegated_inode: returns an inode needing a delegation break
- *
- * The caller must hold multiple mutexes--see lock_rename()).
- *
- * If vfs_rename discovers a delegation in need of breaking at either
- * the source or destination, it will return -EWOULDBLOCK and return a
- * reference to the inode in delegated_inode. The caller should then
- * break the delegation and retry. Because breaking a delegation may
- * take a long time, the caller should drop all locks before doing
- * so.
- *
- * Alternatively, a caller may pass NULL for delegated_inode. This may
- * be appropriate for callers that expect the underlying filesystem not
- * to be NFS exported.
- */
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry,
- struct inode **delegated_inode)
-{
- int error;
- int is_dir = d_is_dir(old_dentry);
- const unsigned char *old_name;
-
- if (old_dentry->d_inode == new_dentry->d_inode)
- return 0;
-
- error = may_delete(old_dir, old_dentry, is_dir);
- if (error)
- return error;
-
- if (!new_dentry->d_inode)
- error = may_create(new_dir, new_dentry);
- else
- error = may_delete(new_dir, new_dentry, is_dir);
- if (error)
- return error;
-
- if (!old_dir->i_op->rename)
- return -EPERM;
-
- old_name = fsnotify_oldname_init(old_dentry->d_name.name);
-
- if (is_dir)
- error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
- else
- error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry,delegated_inode);
if (!error)
fsnotify_move(old_dir, new_dir, old_name, is_dir,
- new_dentry->d_inode, old_dentry);
+ target, old_dentry);
fsnotify_oldname_free(old_name);

return error;
--
1.8.1.4

2014-02-07 17:36:43

by J. Bruce Fields

Subject: Re: [PATCH 01/13] vfs: add d_is_dir()

On Fri, Feb 07, 2014 at 05:48:59PM +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Add d_is_dir(dentry) helper which is analogous to S_ISDIR().

While trying to get up to speed I notice that these flags were
introduced by b18825a7c8e37a7cf6abb97a12a6ad71af160de7 "VFS: Put a small
type field into struct dentry::d_flags" whose changelog tells me "what"
but not "why". So out of curiosity: was that some kind of optimization,
or was there some other reason for it?

--b.

2014-02-07 19:58:14

by David Howells

Subject: Re: [PATCH 01/13] vfs: add d_is_dir()

J. Bruce Fields <[email protected]> wrote:

> On Fri, Feb 07, 2014 at 05:48:59PM +0100, Miklos Szeredi wrote:
> > From: Miklos Szeredi <[email protected]>
> >
> > Add d_is_dir(dentry) helper which is analogous to S_ISDIR().
>
> While trying to get up to speed I notice that these flags were
> introduced by b18825a7c8e37a7cf6abb97a12a6ad71af160de7 "VFS: Put a small
> type field into struct dentry::d_flags" whose changelog tells me "what"
> but not "why". So out of curiosity: was that some kind of optimization,
> or was there some other reason for it?

The idea is to put into the dentry the information about objects that pathwalk
specifically needs to know about, i.e. negative dentries, dirs, automount
points and symlinks, so that you don't have to look this up in the inode.

We can use this to distinguish lookupable dirs from nonlookupable dirs
(i.e. automounts), thereby subsuming the can_lookup query.

However, the real power comes in support for fs unioning. There are two
additional uses for this field:

(1) Whiteout dentry type. An additional type is added for this field which
causes d_is_negative() to return true, but can be distinguished when
trying to decide whether to look in a lower layer or not.

(2) Fallthroughs. Whilst the dentry may appear negative by virtue of having
a NULL d_inode pointer, it does in fact correspond to a real object on a
lower layer. The type of the lower object is set in this field so that
pathwalk can immediately know how to deal with it.

David
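
As an illustration of the type field described above, here is a small user-space model in C. The enum values and the model_* names are hypothetical and chosen for readability; they are not the kernel's DCACHE_*_TYPE values, and the union-related types (whiteouts, fallthroughs) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type codes; the kernel's DCACHE_*_TYPE values differ. */
enum model_type {
	MODEL_MISS_TYPE,	/* negative dentry */
	MODEL_DIRECTORY_TYPE,	/* directory that pathwalk can look up in */
	MODEL_AUTODIR_TYPE,	/* automount point: a directory, not lookupable */
	MODEL_SYMLINK_TYPE,
	MODEL_FILE_TYPE,
};

struct model_dentry {
	enum model_type type;	/* cached in the dentry, no inode needed */
};

static bool model_can_lookup(const struct model_dentry *d)
{
	return d->type == MODEL_DIRECTORY_TYPE;
}

static bool model_is_autodir(const struct model_dentry *d)
{
	return d->type == MODEL_AUTODIR_TYPE;
}

/* Analogous to S_ISDIR(): true for both kinds of directory. */
static bool model_is_dir(const struct model_dentry *d)
{
	return model_can_lookup(d) || model_is_autodir(d);
}

static bool model_is_negative(const struct model_dentry *d)
{
	return d->type == MODEL_MISS_TYPE;
}

static void model_demo(void)
{
	struct model_dentry autodir = { MODEL_AUTODIR_TYPE };
	struct model_dentry dir = { MODEL_DIRECTORY_TYPE };
	struct model_dentry miss = { MODEL_MISS_TYPE };

	/* An automount point is a directory but cannot be looked up in. */
	assert(model_is_dir(&autodir) && !model_can_lookup(&autodir));
	assert(model_is_dir(&dir) && model_can_lookup(&dir));
	assert(model_is_negative(&miss) && !model_is_dir(&miss));
}
```

The point of the split is that callers such as may_delete() want the S_ISDIR()-like answer, while pathwalk wants to know whether it can descend.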

2014-02-07 21:16:54

by J. Bruce Fields

Subject: Re: [PATCH 07/13] vfs: lock_two_nondirectories: allow directory args

On Fri, Feb 07, 2014 at 05:49:05PM +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> lock_two_nondirectories warned if either of its args was a directory.
> Instead just ignore the directory args. This is needed for locking in
> cross rename.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/inode.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 4bcdad3c9361..763010771cf4 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -944,18 +944,21 @@ EXPORT_SYMBOL(unlock_new_inode);
>
> /**
> * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + *
> + * If either or both arguments are directories, then ignore those.
> + * Therefore zero, one or two objects may be locked by this function.
> + *
> * @inode1: first inode to lock
> * @inode2: second inode to lock
> */
> void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> {
> - WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
> - if (inode1 == inode2 || !inode2) {
> + if (S_ISDIR(inode1->i_mode)) {
> + if (inode2 && !S_ISDIR(inode2->i_mode))
> + mutex_lock(&inode2->i_mutex);
> + } else if (inode1 == inode2 || !inode2 || S_ISDIR(inode2->i_mode)) {
> mutex_lock(&inode1->i_mutex);
> - return;
> - }
> - WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
> - if (inode1 < inode2) {
> + } else if (inode1 < inode2) {
> mutex_lock(&inode1->i_mutex);
> mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
> } else {

Nit: I find the conditionals here a little complicated.

Would something like this be clearer? (Untested):

	if (inode1 > inode2)
		swap(inode1, inode2);

	if (inode1 && !S_ISDIR(inode1->i_mode))
		mutex_lock(&inode1->i_mutex);

	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);

--b.

diff --git a/fs/inode.c b/fs/inode.c
index 4bcdad3..94e41c8 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -944,24 +944,23 @@ EXPORT_SYMBOL(unlock_new_inode);

/**
* lock_two_nondirectories - take two i_mutexes on non-directory objects
+ *
+ * Lock any non-NULL argument that is not a directory.
+ * Zero, one or two objects may be locked by this function.
+ *
* @inode1: first inode to lock
* @inode2: second inode to lock
*/
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
- WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
- if (inode1 == inode2 || !inode2) {
- mutex_lock(&inode1->i_mutex);
- return;
- }
- WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
- if (inode1 < inode2) {
+ if (inode1 > inode2)
+ swap(inode1, inode2);
+
+ if (inode1 && !S_ISDIR(inode1->i_mode))
mutex_lock(&inode1->i_mutex);
+
+ if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
- } else {
- mutex_lock(&inode2->i_mutex);
- mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
- }
}
EXPORT_SYMBOL(lock_two_nondirectories);

@@ -972,8 +971,9 @@ EXPORT_SYMBOL(lock_two_nondirectories);
*/
void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
- mutex_unlock(&inode1->i_mutex);
- if (inode2 && inode2 != inode1)
+ if (inode1 && !S_ISDIR(inode1->i_mode))
+ mutex_unlock(&inode1->i_mutex);
+ if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
mutex_unlock(&inode2->i_mutex);
}
EXPORT_SYMBOL(unlock_two_nondirectories);
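
The ordering rule in the suggestion above can be tried out in user space. The following is a rough model in C, not the kernel function: struct model_inode and model_lock_two() are hypothetical, a boolean stands in for i_mutex, and the order[] array only exists so that the locking order can be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; "locked" models i_mutex. */
struct model_inode {
	bool is_dir;
	bool locked;
};

/*
 * Lock every non-NULL, non-directory argument, lower address first.
 * order[] and *count report which objects were locked and in what order.
 */
static void model_lock_two(struct model_inode *a, struct model_inode *b,
			   struct model_inode *order[2], int *count)
{
	*count = 0;

	/* A fixed address order means two callers cannot deadlock. */
	if ((uintptr_t)a > (uintptr_t)b) {
		struct model_inode *tmp = a;

		a = b;
		b = tmp;
	}
	if (a && !a->is_dir) {
		a->locked = true;
		order[(*count)++] = a;
	}
	if (b && !b->is_dir && b != a) {
		b->locked = true;
		order[(*count)++] = b;
	}
}

static void model_demo(void)
{
	struct model_inode pair[2] = { { false, false }, { false, false } };
	struct model_inode dir = { true, false };
	struct model_inode *order[2];
	int count;

	/* Argument order does not matter: the lower address goes first. */
	model_lock_two(&pair[1], &pair[0], order, &count);
	assert(count == 2 && order[0] == &pair[0] && order[1] == &pair[1]);

	/* A directory and a NULL pointer are both skipped. */
	model_lock_two(&dir, NULL, order, &count);
	assert(count == 0 && !dir.locked);
}
```

Note that after the swap a NULL argument always ends up first, so the single remaining inode is handled by the second test. In the kernel version that means it is taken with the I_MUTEX_NONDIR2 annotation.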

2014-02-07 22:40:57

by J. Bruce Fields

Subject: Re: [PATCH 08/13] vfs: add cross-rename

On Fri, Feb 07, 2014 at 05:49:06PM +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> If flags contain RENAME_EXCHANGE then exchange source and destination files.
> There's no restriction on the type of the files; e.g. a directory can be
> exchanged with a symlink.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>

I don't see any problem with the delegation stuff. Some random
bikeshedding:

> @@ -2575,6 +2579,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>
> /* Unhash the target: dput() will then get rid of it */

I never understood the point of this comment. It's not even right, is
it? And if anything this makes it less so. Delete?

> __d_drop(target);
> + if (exchange) {
> + __d_rehash(target,
> + d_hash(dentry->d_parent, dentry->d_name.hash));
> + }
>
> list_del(&dentry->d_u.d_child);
> list_del(&target->d_u.d_child);
...
> @@ -4042,7 +4057,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>
> old_name = fsnotify_oldname_init(old_dentry->d_name.name);
> dget(new_dentry);
> - if (!is_dir)
> + if (!is_dir || (flags & RENAME_EXCHANGE))
> lock_two_nondirectories(source, target);
> else if (target)
> mutex_lock(&target->i_mutex);

I had to stop to think about that for a minute: OK, so in the normal
rename case we still need to lock the to-be-deleted target, and
lock_two_nondirectories won't do that for us because it ignores
directories. Got it.

This feels a bit ugly but I don't have a better idea.
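
To make the cases in the hunk quoted above easier to see, here is a small truth-table model in C. It is only a sketch: struct model_locks and model_rename_locks() are hypothetical names, and it assumes the v4 behaviour of lock_two_nondirectories(), which silently skips directories and a missing target.

```c
#include <assert.h>
#include <stdbool.h>

/* Which inode locks the model takes; names are illustrative only. */
struct model_locks {
	bool source;
	bool target;
};

/*
 * Models the locking branch in vfs_rename(). "have_target" is false
 * when the destination name does not exist yet.
 */
static struct model_locks model_rename_locks(bool source_is_dir,
					     bool have_target,
					     bool target_is_dir,
					     bool exchange)
{
	struct model_locks l = { false, false };

	if (!source_is_dir || exchange) {
		/* lock_two_nondirectories(): directories and NULL are skipped */
		l.source = !source_is_dir;
		l.target = have_target && !target_is_dir;
	} else if (have_target) {
		/* plain directory rename: the victim directory is locked */
		l.target = true;
	}
	return l;
}

static void model_demo(void)
{
	struct model_locks l;

	/* Plain rename of a directory over a directory: lock the victim. */
	l = model_rename_locks(true, true, true, false);
	assert(!l.source && l.target);

	/* Exchange of a directory with a regular file: only the file. */
	l = model_rename_locks(true, true, false, true);
	assert(!l.source && l.target);

	/* Exchange of two directories: neither is locked here. */
	l = model_rename_locks(true, true, true, true);
	assert(!l.source && !l.target);
}
```

The first case is the one discussed above: without the else branch, a directory that is about to be replaced would never have its i_mutex taken.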

> @@ -4051,25 +4066,25 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,

Most of this function is under (flags & RENAME_EXCHANGE) conditionals at
this point. Have you looked at how much is duplicated if you split this
into something like vfs_rename and vfs_exchange?

--b.

2014-02-07 22:46:35

by J. Bruce Fields

Subject: Re: [PATCH 00/13] cross rename v4

On Fri, Feb 07, 2014 at 05:48:58PM +0100, Miklos Szeredi wrote:
> Changes since the last version (based on Al's review):
>
> - cross-rename: fix locking of nondirectories for NFSv4
> - ext4: split cross-rename and plain rename into separate functions
> - introduce i_op->rename2 with flags, don't touch ->rename
> - last (optional) patch to merge ->rename2 back into ->rename
>
> The splitting of the ext4 implementation was indeed a good idea as it uncovered a
> memory leak and small inconsistencies with the merged implementation.
>
> Splitting out rename2 will lessen the code churn, but I think it is ugly. However
> this is a question of taste, last patch can be omitted without loss of
> functionality.
>
> Bruce, could you please review the locking and delegation thing in patch #8
> "vfs: add cross-rename"?

Yep, done. I'll also try running this through my nfs tests, for what
it's worth. (Not today as there's some unrelated regression to sort
out first.)

Feel free to add a

Reviewed-by: J. Bruce Fields <[email protected]>

for any but the ext4 patches, which I skipped.

--b.

>
> Git tree is here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git cross-rename
>
> Thanks,
> Miklos
>
> ---
> Miklos Szeredi (13):
> vfs: add d_is_dir()
> vfs: rename: move d_move() up
> vfs: rename: use common code for dir and non-dir
> vfs: add renameat2 syscall
> vfs: add RENAME_NOREPLACE flag
> security: add flags to rename hooks
> vfs: lock_two_nondirectories: allow directory args
> vfs: add cross-rename
> ext4: rename: create ext4_renament structure for local vars
> ext4: rename: move EMLINK check up
> ext4: rename: split out helper functions
> ext4: add cross rename support
> vfs: merge rename2 into rename
>
> ---
> Documentation/filesystems/Locking | 2 +-
> Documentation/filesystems/vfs.txt | 14 +-
> arch/x86/syscalls/syscall_64.tbl | 1 +
> .../lustre/lustre/include/linux/lustre_compat25.h | 4 +-
> drivers/staging/lustre/lustre/llite/namei.c | 3 +-
> drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 2 +-
> fs/9p/v9fs.h | 3 +-
> fs/9p/vfs_inode.c | 4 +-
> fs/affs/affs.h | 3 +-
> fs/affs/namei.c | 3 +-
> fs/afs/dir.c | 6 +-
> fs/bad_inode.c | 3 +-
> fs/bfs/dir.c | 3 +-
> fs/btrfs/inode.c | 3 +-
> fs/cachefiles/namei.c | 4 +-
> fs/ceph/dir.c | 3 +-
> fs/cifs/cifsfs.h | 2 +-
> fs/cifs/inode.c | 3 +-
> fs/coda/dir.c | 8 +-
> fs/dcache.c | 45 +-
> fs/debugfs/inode.c | 2 +-
> fs/ecryptfs/inode.c | 5 +-
> fs/exofs/namei.c | 3 +-
> fs/ext2/namei.c | 5 +-
> fs/ext3/namei.c | 5 +-
> fs/ext4/namei.c | 483 +++++++++++++++------
> fs/ext4/super.c | 6 +-
> fs/f2fs/namei.c | 3 +-
> fs/fat/namei_msdos.c | 3 +-
> fs/fat/namei_vfat.c | 3 +-
> fs/fuse/dir.c | 3 +-
> fs/gfs2/inode.c | 3 +-
> fs/hfs/dir.c | 3 +-
> fs/hfsplus/dir.c | 3 +-
> fs/hostfs/hostfs_kern.c | 3 +-
> fs/hpfs/namei.c | 3 +-
> fs/inode.c | 20 +-
> fs/jffs2/dir.c | 5 +-
> fs/jfs/namei.c | 3 +-
> fs/kernfs/dir.c | 3 +-
> fs/libfs.c | 3 +-
> fs/logfs/dir.c | 3 +-
> fs/minix/namei.c | 5 +-
> fs/namei.c | 310 +++++++------
> fs/ncpfs/dir.c | 5 +-
> fs/nfs/dir.c | 3 +-
> fs/nfs/internal.h | 3 +-
> fs/nfsd/vfs.c | 2 +-
> fs/nilfs2/namei.c | 3 +-
> fs/ocfs2/namei.c | 3 +-
> fs/omfs/dir.c | 3 +-
> fs/reiserfs/namei.c | 3 +-
> fs/sysv/namei.c | 5 +-
> fs/ubifs/dir.c | 3 +-
> fs/udf/namei.c | 3 +-
> fs/ufs/namei.c | 3 +-
> fs/xfs/xfs_iops.c | 3 +-
> include/linux/dcache.h | 8 +-
> include/linux/fs.h | 7 +-
> include/linux/security.h | 12 +-
> include/uapi/linux/fs.h | 3 +
> kernel/cgroup.c | 5 +-
> mm/shmem.c | 2 +-
> security/security.c | 22 +-
> 64 files changed, 736 insertions(+), 372 deletions(-)
>

2014-02-10 10:51:52

by Dave Chinner

Subject: Re: [PATCH 00/13] cross rename v4

On Fri, Feb 07, 2014 at 05:48:58PM +0100, Miklos Szeredi wrote:
> Changes since the last version (based on Al's review):
>
> - cross-rename: fix locking of nondirectories for NFSv4
> - ext4: split cross-rename and plain rename into separate functions
> - introduce i_op->rename2 with flags, don't touch ->rename
> - last (optional) patch to merge ->rename2 back into ->rename
>
> The splitting of the ext4 implementation was indeed a good idea as it uncovered a
> memory leak and small inconsistencies with the merged implementation.
>
> Splitting out rename2 will lessen the code churn, but I think it is ugly. However
> this is a question of taste, last patch can be omitted without loss of
> functionality.
>
> Bruce, could you please review the locking and delegation thing in patch #8
> "vfs: add cross-rename"?
>
> Git tree is here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git cross-rename

Miklos, can you please write an xfstest for this new API? That way
we can verify that the behaviour is as documented, and we can ensure
that when we implement it on other filesystems it works exactly the
same everywhere.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-02-11 15:31:54

by Miklos Szeredi

Subject: Re: [PATCH 07/13] vfs: lock_two_nondirectories: allow directory args

On Fri, Feb 07, 2014 at 04:16:44PM -0500, J. Bruce Fields wrote:
>
> Nit: I find the conditionals here a little complicated.
>
> Would something like this be clearer? (Untested):
>
> 	if (inode1 > inode2)
> 		swap(inode1, inode2);
>
> 	if (inode1 && !S_ISDIR(inode1->i_mode))
> 		mutex_lock(&inode1->i_mutex);
>
> 	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
> 		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);

Yes, much better. And it becomes nicely symmetric with the unlock function.

Thanks,
Miklos


>
> --b.
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 4bcdad3..94e41c8 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -944,24 +944,23 @@ EXPORT_SYMBOL(unlock_new_inode);
>
> /**
> * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + *
> + * Lock any non-NULL argument that is not a directory.
> + * Zero, one or two objects may be locked by this function.
> + *
> * @inode1: first inode to lock
> * @inode2: second inode to lock
> */
> void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> {
> - WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
> - if (inode1 == inode2 || !inode2) {
> - mutex_lock(&inode1->i_mutex);
> - return;
> - }
> - WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
> - if (inode1 < inode2) {
> + if (inode1 > inode2)
> + swap(inode1, inode2);
> +
> + if (inode1 && !S_ISDIR(inode1->i_mode))
> mutex_lock(&inode1->i_mutex);
> +
> + if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
> mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
> - } else {
> - mutex_lock(&inode2->i_mutex);
> - mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
> - }
> }
> EXPORT_SYMBOL(lock_two_nondirectories);
>
> @@ -972,8 +971,9 @@ EXPORT_SYMBOL(lock_two_nondirectories);
> */
> void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> {
> - mutex_unlock(&inode1->i_mutex);
> - if (inode2 && inode2 != inode1)
> + if (inode1 && !S_ISDIR(inode1->i_mode))
> + mutex_unlock(&inode1->i_mutex);
> + if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
> mutex_unlock(&inode2->i_mutex);
> }
> EXPORT_SYMBOL(unlock_two_nondirectories);

2014-02-11 15:54:04

by Miklos Szeredi

Subject: Re: [PATCH 08/13] vfs: add cross-rename

On Fri, Feb 07, 2014 at 05:40:44PM -0500, J. Bruce Fields wrote:
> On Fri, Feb 07, 2014 at 05:49:06PM +0100, Miklos Szeredi wrote:
> > From: Miklos Szeredi <[email protected]>
> >
> > If flags contain RENAME_EXCHANGE then exchange source and destination files.
> > There's no restriction on the type of the files; e.g. a directory can be
> > exchanged with a symlink.
> >
> > Signed-off-by: Miklos Szeredi <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
>
> I don't see any problem with the delegation stuff. Some random
> bikeshedding:
>
> > @@ -2575,6 +2579,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
> >
> > /* Unhash the target: dput() will then get rid of it */
>
> I never understood the point of this comment. It's not even right, is
> it? And if anything this makes it less so. Delete?

Not sure, but I think the comment refers to the fact that we can't use
d_delete() for the target, so instead we just unhash it here (which is exactly
what happens for d_delete() if the dentry is still used).

You're right, it makes no sense for the cross-rename case. So the adjusted
comment is:

	/*
	 * Unhash the target (d_delete() is not usable here). If exchanging
	 * the two dentries, then rehash onto the other's hash queue.
	 */
>
> > __d_drop(target);
> > + if (exchange) {
> > + __d_rehash(target,
> > + d_hash(dentry->d_parent, dentry->d_name.hash));
> > + }
> >
> > list_del(&dentry->d_u.d_child);
> > list_del(&target->d_u.d_child);
> ...
> > @@ -4042,7 +4057,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> >
> > old_name = fsnotify_oldname_init(old_dentry->d_name.name);
> > dget(new_dentry);
> > - if (!is_dir)
> > + if (!is_dir || (flags & RENAME_EXCHANGE))
> > lock_two_nondirectories(source, target);
> > else if (target)
> > mutex_lock(&target->i_mutex);
>
> I had to stop to think about that for a minute: OK, so in the normal
> rename case we still need to lock the to-be-deleted target, and
> lock_two_nondirectories won't do that for us because it ignores
> directories. Got it.
>
> This feels a bit ugly but I don't have a better idea.
>
> > @@ -4051,25 +4066,25 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>
> Most of this function is under (flags & RENAME_EXCHANGE) conditionals at
> this point. Have you looked at how much is duplicated if you split this
> into something like vfs_rename and vfs_exchange?

Split it up and it becomes 106 + 90 lines. Combine it and it's 130 lines. That
comes to 66 common, 64 conditional, doesn't it? So it's half and half.

And I really can't tell which is better in this case.
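
To spell out the arithmetic behind those numbers: write c for the lines common to both paths, r for the lines needed only by plain rename and x for the lines needed only by exchange. This assumes that each split function would consist of exactly the common lines plus its own lines, which is a simplification.

```latex
\begin{align*}
c + r &= 106, & c + x &= 90, & c + r + x &= 130 \\
c &= 106 + 90 - 130 = 66, & r + x &= 130 - 66 = 64
\end{align*}
```

Under that assumption r = 40 and x = 24, so the combined function is about half shared code and half conditional code, which matches the "half and half" estimate.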

Thanks,
Miklos

2014-02-11 15:56:34

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Fri, Feb 07, 2014 at 05:46:30PM -0500, J. Bruce Fields wrote:

> > Bruce, could you please review the locking and delegation thing in patch #8
> > "vfs: add cross-rename"?
>
> Yep, done. I'll also try running this through my nfs tests, for what
> it's worth. (Not today as there's some unrelated regression to sort
> out first.)
>
> Feel free to add a
>
> Reviewed-by: J. Bruce Fields <[email protected]>
>
> for any but the ext4 patches, which I skipped.

Thanks for the review!

I've changed the authorship of the lock_two_nondirectories() patch to you. Can
I also add your sign-off?

Thanks,
Miklos

2014-02-11 16:00:35

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Mon, Feb 10, 2014 at 09:51:45PM +1100, Dave Chinner wrote:
> On Fri, Feb 07, 2014 at 05:48:58PM +0100, Miklos Szeredi wrote:
> > Changes since the last version (based on Al's review):
> >
> > - cross-rename: fix locking of nondirectories for NFSv4
> > - ext4: split cross-rename and plain rename into separate functions
> > - introduce i_op->rename2 with flags, don't touch ->rename
> > - last (optional) patch to merge ->rename2 back into ->rename
> >
> > The splitting of the ext4 implementation was indeed a good idea as it uncovered a
> > memory leak and small inconsistencies with the merged implementation.
> >
> > Splitting out rename2 will lessen the code churn, but I think is ugly. However
> > this is a question of taste, last patch can be omitted without loss of
> > functionality.
> >
> > Bruce, could you please review the locking and delegation thing in patch #8
> > "vfs: add cross-rename"?
> >
> > Git tree is here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git cross-rename
>
> Miklos, can you please write an xfstest for this new API? That way
> we can verify that the behaviour is as documented, and we can ensure
> that when we implement it on other filesystems it works exactly the
> same on all filesystems?

Splendid idea. Will do.

Thanks,
Miklos

2014-02-11 21:23:50

by Jan Kara

Subject: Re: [PATCH 12/13] ext4: add cross rename support

On Fri 07-02-14 17:49:10, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Implement RENAME_EXCHANGE flag in renameat2 syscall.
Hum, I guess the nice symmetry of ext4_cross_rename() outweighs the code
duplication. So you can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/ext4/namei.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 138 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 75f1bde43dcc..1cb84f78909e 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3004,6 +3004,8 @@ struct ext4_renament {
> struct inode *dir;
> struct dentry *dentry;
> struct inode *inode;
> + bool is_dir;
> + int dir_nlink_delta;
>
> /* entry for "dentry" */
> struct buffer_head *bh;
> @@ -3135,6 +3137,17 @@ static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
> }
> }
>
> +static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
> +{
> + if (ent->dir_nlink_delta) {
> + if (ent->dir_nlink_delta == -1)
> + ext4_dec_count(handle, ent->dir);
> + else
> + ext4_inc_count(handle, ent->dir);
> + ext4_mark_inode_dirty(handle, ent->dir);
> + }
> +}
> +
> /*
> * Anybody can rename anything with this: the permission checks are left to the
> * higher-level routines.
> @@ -3274,13 +3287,137 @@ end_rename:
> return retval;
> }
>
> +static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> +{
> + handle_t *handle = NULL;
> + struct ext4_renament old = {
> + .dir = old_dir,
> + .dentry = old_dentry,
> + .inode = old_dentry->d_inode,
> + };
> + struct ext4_renament new = {
> + .dir = new_dir,
> + .dentry = new_dentry,
> + .inode = new_dentry->d_inode,
> + };
> + u8 new_file_type;
> + int retval;
> +
> + dquot_initialize(old.dir);
> + dquot_initialize(new.dir);
> +
> + old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
> + &old.de, &old.inlined);
> + /*
> + * Check for inode number is _not_ due to possible IO errors.
> + * We might rmdir the source, keep it as pwd of some process
> + * and merrily kill the link to whatever was created under the
> + * same name. Goodbye sticky bit ;-<
> + */
> + retval = -ENOENT;
> + if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
> + goto end_rename;
> +
> + new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
> + &new.de, &new.inlined);
> +
> + /* RENAME_EXCHANGE case: old *and* new must both exist */
> + if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino)
> + goto end_rename;
> +
> + handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
> + (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
> + 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +
> + if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
> + ext4_handle_sync(handle);
> +
> + if (S_ISDIR(old.inode->i_mode)) {
> + old.is_dir = true;
> + retval = ext4_rename_dir_prepare(handle, &old);
> + if (retval)
> + goto end_rename;
> + }
> + if (S_ISDIR(new.inode->i_mode)) {
> + new.is_dir = true;
> + retval = ext4_rename_dir_prepare(handle, &new);
> + if (retval)
> + goto end_rename;
> + }
> +
> + /*
> + * Other than the special case of overwriting a directory, parents'
> + * nlink only needs to be modified if this is a cross directory rename.
> + */
> + if (old.dir != new.dir && old.is_dir != new.is_dir) {
> + old.dir_nlink_delta = old.is_dir ? -1 : 1;
> + new.dir_nlink_delta = -old.dir_nlink_delta;
> + retval = -EMLINK;
> + if ((old.dir_nlink_delta > 0 && EXT4_DIR_LINK_MAX(old.dir)) ||
> + (new.dir_nlink_delta > 0 && EXT4_DIR_LINK_MAX(new.dir)))
> + goto end_rename;
> + }
> +
> + new_file_type = new.de->file_type;
> + retval = ext4_setent(handle, &new, old.inode->i_ino, old.de->file_type);
> + if (retval)
> + goto end_rename;
> +
> + retval = ext4_setent(handle, &old, new.inode->i_ino, new_file_type);
> + if (retval)
> + goto end_rename;
> +
> + /*
> + * Like most other Unix systems, set the ctime for inodes on a
> + * rename.
> + */
> + old.inode->i_ctime = ext4_current_time(old.inode);
> + new.inode->i_ctime = ext4_current_time(new.inode);
> + ext4_mark_inode_dirty(handle, old.inode);
> + ext4_mark_inode_dirty(handle, new.inode);
> +
> + if (old.dir_bh) {
> + retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
> + if (retval)
> + goto end_rename;
> + }
> + if (new.dir_bh) {
> + retval = ext4_rename_dir_finish(handle, &new, old.dir->i_ino);
> + if (retval)
> + goto end_rename;
> + }
> + ext4_update_dir_count(handle, &old);
> + ext4_update_dir_count(handle, &new);
> + retval = 0;
> +
> +end_rename:
> + brelse(old.dir_bh);
> + brelse(new.dir_bh);
> + brelse(old.bh);
> + brelse(new.bh);
> + if (handle)
> + ext4_journal_stop(handle);
> + return retval;
> +}
> +
> static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
> struct inode *new_dir, struct dentry *new_dentry,
> unsigned int flags)
> {
> - if (flags & ~RENAME_NOREPLACE)
> + if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
> return -EINVAL;
>
> + if (flags & RENAME_EXCHANGE) {
> + return ext4_cross_rename(old_dir, old_dentry,
> + new_dir, new_dentry);
> + }
> + /*
> + * Existence checking was done by the VFS, otherwise "RENAME_NOREPLACE"
> + * is equivalent to regular rename.
> + */
> return ext4_rename(old_dir, old_dentry, new_dir, new_dentry);
> }
>
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
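
The parent link-count rule in the quoted ext4_cross_rename() can be read in isolation. The following is a minimal sketch, not code from the patch: `struct nlink_delta` and `exchange_nlink_delta()` are hypothetical names introduced here for illustration, and the sketch assumes only the rule stated in the patch comment (parents' nlink changes only for a cross-directory exchange of a directory with a non-directory).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type, not part of the patch: the change in link count of
 * each parent directory caused by exchanging two entries. */
struct nlink_delta {
	int old_parent;
	int new_parent;
};

/* Hypothetical helper mirroring the condition in ext4_cross_rename(). */
static struct nlink_delta exchange_nlink_delta(bool same_parent,
					       bool old_is_dir, bool new_is_dir)
{
	struct nlink_delta d = { 0, 0 };

	/* A subdirectory's ".." entry counts as a link to its parent, so
	 * the counts change only when exactly one of the two entries is a
	 * directory and it ends up under a different parent. */
	if (!same_parent && old_is_dir != new_is_dir) {
		d.old_parent = old_is_dir ? -1 : 1;
		d.new_parent = -d.old_parent;
	}
	/* Whatever one parent loses, the other gains. */
	assert(d.old_parent + d.new_parent == 0);
	return d;
}
```

Exchanging a directory in one parent with a regular file in another gives -1 for the old parent and +1 for the new one; every other combination leaves both counts unchanged.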

2014-02-12 17:17:47

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Tue, Feb 11, 2014 at 05:01:41PM +0100, Miklos Szeredi wrote:
> On Mon, Feb 10, 2014 at 09:51:45PM +1100, Dave Chinner wrote:

> > Miklos, can you please write an xfstest for this new API? That way
> > we can verify that the behaviour is as documented, and we can ensure
> > that when we implement it on other filesystems it works exactly the
> > same on all filesystems?

This is a standalone testprog, but I guess it's trivial to integrate into
xfstests.

Please let me know what you think.

Thanks,
Miklos
----

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <dirent.h>
#include <utime.h>
#include <errno.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>


static char testfile[1024];
static char testfile2[1024];
static char testdir[1024];
static char testdir2[1024];

static char testname[256];
static char testdata[] = "abcdefghijklmnopqrstuvwxyz";
static char testdata2[] = "1234567890-=qwertyuiop[]\asdfghjkl;'zxcvbnm,./";
static const char *testdir_files[] = { "f1", "f2", NULL};
static const char *testdir_files2[] = { "f3", "f4", "f5", NULL};
static const char *testdir_empty[] = { NULL};
static int testdatalen = sizeof(testdata) - 1;
static int testdata2len = sizeof(testdata2) - 1;
static unsigned int testnum = 1;
static unsigned int select_test = 0;
static unsigned int skip_test = 0;

#define swap(a, b) \
do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

#define MAX_ENTRIES 1024

static void test_perror(const char *func, const char *msg)
{
printf("%s %s() - %s: %s\n", testname, func, msg,
strerror(errno));
}

static void test_error(const char *func, const char *msg, ...)
__attribute__ ((format (printf, 2, 3)));

static void __start_test(const char *fmt, ...)
__attribute__ ((format (printf, 1, 2)));

static void test_error(const char *func, const char *msg, ...)
{
va_list ap;
printf("%s %s() - ", testname, func);
va_start(ap, msg);
vfprintf(stdout, msg, ap);
va_end(ap);
fprintf(stdout, "\n");
}

static void success(void)
{
printf("%s OK\n", testname);
}

static void __start_test(const char *fmt, ...)
{
unsigned int n;
va_list ap;
n = sprintf(testname, "%3i [", testnum++);
va_start(ap, fmt);
n += vsprintf(testname + n, fmt, ap);
va_end(ap);
sprintf(testname + n, "]");
}

#define start_test(msg, args...) { \
if ((select_test && testnum != select_test) || \
(testnum == skip_test)) { \
testnum++; \
return 0; \
} \
__start_test(msg, ##args); \
}

#define PERROR(msg) test_perror(__FUNCTION__, msg)
#define ERROR(msg, args...) test_error(__FUNCTION__, msg, ##args)

static int check_size(const char *path, int len)
{
struct stat stbuf;
int res = stat(path, &stbuf);
if (res == -1) {
PERROR("stat");
return -1;
}
if (stbuf.st_size != len) {
ERROR("length %i instead of %i", (int) stbuf.st_size,
(int) len);
return -1;
}
return 0;
}

static int check_type(const char *path, mode_t type)
{
struct stat stbuf;
int res = lstat(path, &stbuf);
if (res == -1) {
PERROR("lstat");
return -1;
}
if ((stbuf.st_mode & S_IFMT) != type) {
ERROR("type 0%o instead of 0%o", stbuf.st_mode & S_IFMT, type);
return -1;
}
return 0;
}
static int check_mode(const char *path, mode_t mode)
{
struct stat stbuf;
int res = lstat(path, &stbuf);
if (res == -1) {
PERROR("lstat");
return -1;
}
if ((stbuf.st_mode & 07777) != mode) {
ERROR("mode 0%o instead of 0%o", stbuf.st_mode & 07777, mode);
return -1;
}
return 0;
}

static int check_nlink(const char *path, nlink_t nlink)
{
struct stat stbuf;
int res = lstat(path, &stbuf);
if (res == -1) {
PERROR("lstat");
return -1;
}
if (stbuf.st_nlink != nlink) {
ERROR("nlink %li instead of %li", (long) stbuf.st_nlink,
(long) nlink);
return -1;
}
return 0;
}

static int check_nonexist(const char *path)
{
struct stat stbuf;
int res = lstat(path, &stbuf);
if (res == 0) {
ERROR("file should not exist");
return -1;
}
if (errno != ENOENT) {
ERROR("file should not exist: %s", strerror(errno));
return -1;
}
return 0;
}

static int check_buffer(const char *buf, const char *data, unsigned len)
{
if (memcmp(buf, data, len) != 0) {
ERROR("data mismatch");
return -1;
}
return 0;
}

static int check_data(const char *path, const char *data, int offset,
unsigned len)
{
char buf[4096];
int res;
int fd = open(path, O_RDONLY);
if (fd == -1) {
PERROR("open");
return -1;
}
if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
PERROR("lseek");
close(fd);
return -1;
}
while (len) {
int rdlen = len < sizeof(buf) ? len : sizeof(buf);
res = read(fd, buf, rdlen);
if (res == -1) {
PERROR("read");
close(fd);
return -1;
}
if (res != rdlen) {
ERROR("short read: %i instead of %i", res, rdlen);
close(fd);
return -1;
}
if (check_buffer(buf, data, rdlen) != 0) {
close(fd);
return -1;
}
data += rdlen;
len -= rdlen;
}
res = close(fd);
if (res == -1) {
PERROR("close");
return -1;
}
return 0;
}

static int check_dir_contents(const char *path, const char **contents)
{
int i;
int res;
int err = 0;
int found[MAX_ENTRIES];
const char *cont[MAX_ENTRIES];
DIR *dp;

for (i = 0; contents[i]; i++) {
assert(i < MAX_ENTRIES - 3);
found[i] = 0;
cont[i] = contents[i];
}
found[i] = 0;
cont[i++] = ".";
found[i] = 0;
cont[i++] = "..";
cont[i] = NULL;

dp = opendir(path);
if (dp == NULL) {
PERROR("opendir");
return -1;
}
memset(found, 0, sizeof(found));
while(1) {
struct dirent *de;
errno = 0;
de = readdir(dp);
if (de == NULL) {
if (errno) {
PERROR("readdir");
closedir(dp);
return -1;
}
break;
}
for (i = 0; cont[i] != NULL; i++) {
assert(i < MAX_ENTRIES);
if (strcmp(cont[i], de->d_name) == 0) {
if (found[i]) {
ERROR("duplicate entry <%s>",
de->d_name);
err--;
} else
found[i] = 1;
break;
}
}
if (!cont[i]) {
ERROR("unexpected entry <%s>", de->d_name);
err --;
}
}
for (i = 0; cont[i] != NULL; i++) {
if (!found[i]) {
ERROR("missing entry <%s>", cont[i]);
err--;
}
}
res = closedir(dp);
if (res == -1) {
PERROR("closedir");
return -1;
}
if (err)
return -1;

return 0;
}

static int create_file(const char *path, const char *data, int len)
{
int res;
int fd;

unlink(path);
fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_EXCL, 0644);
if (fd == -1) {
PERROR("creat");
return -1;
}
if (len) {
res = write(fd, data, len);
if (res == -1) {
PERROR("write");
close(fd);
return -1;
}
if (res != len) {
ERROR("write is short: %i instead of %i", res, len);
close(fd);
return -1;
}
}
res = close(fd);
if (res == -1) {
PERROR("close");
return -1;
}
res = check_type(path, S_IFREG);
if (res == -1)
return -1;
res = check_mode(path, 0644);
if (res == -1)
return -1;
res = check_nlink(path, 1);
if (res == -1)
return -1;
res = check_size(path, len);
if (res == -1)
return -1;

if (len) {
res = check_data(path, data, 0, len);
if (res == -1)
return -1;
}

return 0;
}

static int cleanup_dir(const char *path, const char **dir_files, int quiet)
{
int i;
int err = 0;

for (i = 0; dir_files[i]; i++) {
int res;
char fpath[1024];
sprintf(fpath, "%s/%s", path, dir_files[i]);
res = unlink(fpath);
if (res == -1 && !quiet) {
PERROR("unlink");
err --;
}
}
if (err)
return -1;

return 0;
}

static int create_dir(const char *path, const char **dir_files)
{
int res;
int i;

rmdir(path);
res = mkdir(path, 0755);
if (res == -1) {
PERROR("mkdir");
return -1;
}
res = check_type(path, S_IFDIR);
if (res == -1)
return -1;
res = check_mode(path, 0755);
if (res == -1)
return -1;

for (i = 0; dir_files[i]; i++) {
char fpath[1024];
sprintf(fpath, "%s/%s", path, dir_files[i]);
res = create_file(fpath, "", 0);
if (res == -1) {
cleanup_dir(path, dir_files, 1);
return -1;
}
}
res = check_dir_contents(path, dir_files);
if (res == -1) {
cleanup_dir(path, dir_files, 1);
return -1;
}

return 0;
}

static void cleanup_one(const char *path)
{
int res;

res = unlink(path);
if (res == -1 && errno != ENOENT) {
res = rmdir(path);
if (res == -1) {
DIR *dp = opendir(path);
if (dp != NULL) {
int fd = dirfd(dp);
while (1) {
struct dirent *de = readdir(dp);
if (de == NULL)
break;
res = unlinkat(fd, de->d_name, 0);
if (res == -1) {
unlinkat(fd, de->d_name,
AT_REMOVEDIR);
}
}
closedir(dp);
rmdir(path);
}
}
}
}

static void cleanup(void)
{
cleanup_one(testfile);
cleanup_one(testfile2);
cleanup_one(testdir);
cleanup_one(testdir2);
}

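/* x86_64 syscall number; other architectures use a different value */
#ifndef SYS_renameat2
#define SYS_renameat2 316
#endif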
#define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
#define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */

static int sys_renameat2(int dfd1, const char *path1,
int dfd2, const char *path2,
unsigned int flags)
{
return syscall(SYS_renameat2, dfd1, path1, dfd2, path2, flags);
}

static const char *type_name(int type, int empty)
{
switch (type) {
case S_IFREG:
return "REG ";
case S_IFLNK:
return "LNK ";
case S_IFDIR:
if (empty)
return "DIR-";
else
return "DIR+";
case 0:
return "- ";
default:
return "????";
}
}

static int check_any(const char *path, int type, void *data, int data_len)
{
int res;
char buf[1024];

if (type) {
res = check_type(path, type);
if (res == -1)
return -1;

}

switch (type) {
case S_IFDIR:
res = check_mode(path, 0755);
if (res == -1)
return -1;
res = check_dir_contents(path, data);
if (res == -1)
return -1;
res = cleanup_dir(path, data, 0);
if (res == -1)
return -1;
res = rmdir(path);
if (res == -1)
return -1;
break;

case S_IFREG:
res = check_mode(path, 0644);
if (res == -1)
return -1;
res = check_nlink(path, 1);
if (res == -1)
return -1;
res = check_size(path, data_len);
if (res == -1)
return -1;
res = check_data(path, data, 0, data_len);
if (res == -1)
return -1;
res = unlink(path);
if (res == -1) {
PERROR("unlink");
return -1;
}
break;

case S_IFLNK:
res = check_mode(path, 0777);
if (res == -1)
return -1;
res = readlink(path, buf, sizeof(buf));
if (res == -1) {
PERROR("readlink");
return -1;
}
if (res != data_len) {
ERROR("short readlink: %i instead of %i", res,
data_len);
return -1;
}
if (memcmp(buf, data, data_len) != 0) {
ERROR("link mismatch");
return -1;
}
res = unlink(path);
if (res == -1) {
PERROR("unlink");
return -1;
}
break;
}

res = check_nonexist(path);
if (res == -1)
return -1;

return 0;
}

static int create_any(const char *path, int type, void *data, int data_len)
{
int res = -1; /* fail for any type not handled below */

switch (type) {
case S_IFREG:
res = create_file(path, data, data_len);
break;
case S_IFLNK:
res = symlink(data, path);
if (res == -1)
PERROR("symlink");
break;
case S_IFDIR:
res = create_dir(path, data);
break;
case 0:
res = check_nonexist(path);
break;
}
return res;
}

static const char *rename_flag_name(unsigned int flags)
{
switch (flags) {
case 0:
return "(none)";
case RENAME_NOREPLACE:
return "(NOREPLACE)";
case RENAME_EXCHANGE:
return "(EXCHANGE)";
case RENAME_NOREPLACE | RENAME_EXCHANGE:
return "(NOREPLACE | EXCHANGE)";
default:
return "????";
}
}

static int test_rename(unsigned int flags, int src_type, int src_empty,
int dst_type, int dst_empty, int err)
{
int res;
const char *src = NULL;
const char *dst = NULL;
void *src_data = NULL;
void *dst_data = NULL;
int src_datalen = 0;
int dst_datalen = 0;

start_test("rename %-11s %s -> %s error: '%s'",
rename_flag_name(flags),
type_name(src_type, src_empty),
type_name(dst_type, dst_empty),
strerror(err));

res = 0;
if (src_type == S_IFDIR) {
src_data = src_empty ? testdir_empty : testdir_files;
src = testdir;
} else {
src = testfile;
src_data = testdata;
src_datalen = testdatalen;
}
if (dst_type == S_IFDIR) {
dst = testdir2;
dst_data = dst_empty ? testdir_empty : testdir_files2;
} else {
dst = testfile2;
dst_data = testdata2;
dst_datalen = testdata2len;
}

res = create_any(src, src_type, src_data, src_datalen);
if (res == -1)
goto cleanup;
res = create_any(dst, dst_type, dst_data, dst_datalen);
if (res == -1)
goto cleanup;

res = sys_renameat2(AT_FDCWD, src, AT_FDCWD, dst, flags);
if (res == 0) {
if (err) {
ERROR("renameat2 should have failed");
res = -1;
goto cleanup;
}
if (!(flags & RENAME_EXCHANGE)) {
dst_type = src_type;
dst_data = src_data;
dst_datalen = src_datalen;
src_type = 0;
src_data = NULL;
src_datalen = 0;
} else {
swap(src_type, dst_type);
swap(src_data, dst_data);
swap(dst_datalen, src_datalen);
}
} else {
if (errno == ENOSYS || errno == EINVAL) {
success(); /* not supported, most likely */
res = 0;
goto cleanup;
}
if (err != errno) {
PERROR("wrong errno");
res = -1;
goto cleanup;
}
}

res = check_any(src, src_type, src_data, src_datalen);
if (res == -1)
goto cleanup;
res = check_any(dst, dst_type, dst_data, dst_datalen);
if (res == -1)
goto cleanup;

success();
return 0;

cleanup:
cleanup();
return res;
}

static int test_renames(void)
{
int err = 0;

err += test_rename(0, 0, 0, S_IFREG, 0, ENOENT);
err += test_rename(0, 0, 0, S_IFLNK, 0, ENOENT);
err += test_rename(0, 0, 0, S_IFDIR, 0, ENOENT);
err += test_rename(0, 0, 0, S_IFDIR, 1, ENOENT);
err += test_rename(0, 0, 0, 0, 0, ENOENT);

err += test_rename(0, S_IFREG, 0, S_IFREG, 0, 0);
err += test_rename(0, S_IFREG, 0, S_IFLNK, 0, 0);
err += test_rename(0, S_IFREG, 0, S_IFDIR, 0, EISDIR);
err += test_rename(0, S_IFREG, 0, S_IFDIR, 1, EISDIR);
err += test_rename(0, S_IFREG, 0, 0, 0, 0);

err += test_rename(0, S_IFLNK, 0, S_IFREG, 0, 0);
err += test_rename(0, S_IFLNK, 0, S_IFLNK, 0, 0);
err += test_rename(0, S_IFLNK, 0, S_IFDIR, 0, EISDIR);
err += test_rename(0, S_IFLNK, 0, S_IFDIR, 1, EISDIR);
err += test_rename(0, S_IFLNK, 0, 0, 0, 0);

err += test_rename(0, S_IFDIR, 0, S_IFREG, 0, ENOTDIR);
err += test_rename(0, S_IFDIR, 0, S_IFLNK, 0, ENOTDIR);
err += test_rename(0, S_IFDIR, 0, S_IFDIR, 0, ENOTEMPTY);
err += test_rename(0, S_IFDIR, 0, S_IFDIR, 1, 0);
err += test_rename(0, S_IFDIR, 0, 0, 0, 0);

err += test_rename(0, S_IFDIR, 1, S_IFREG, 0, ENOTDIR);
err += test_rename(0, S_IFDIR, 1, S_IFLNK, 0, ENOTDIR);
err += test_rename(0, S_IFDIR, 1, S_IFDIR, 0, ENOTEMPTY);
err += test_rename(0, S_IFDIR, 1, S_IFDIR, 1, 0);
err += test_rename(0, S_IFDIR, 1, 0, 0, 0);

err += test_rename(RENAME_NOREPLACE, 0, 0, S_IFREG, 0, ENOENT);
err += test_rename(RENAME_NOREPLACE, 0, 0, S_IFLNK, 0, ENOENT);
err += test_rename(RENAME_NOREPLACE, 0, 0, S_IFDIR, 0, ENOENT);
err += test_rename(RENAME_NOREPLACE, 0, 0, S_IFDIR, 1, ENOENT);
err += test_rename(RENAME_NOREPLACE, 0, 0, 0, 0, ENOENT);

err += test_rename(RENAME_NOREPLACE, S_IFREG, 0, S_IFREG, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFREG, 0, S_IFLNK, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFREG, 0, S_IFDIR, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFREG, 0, S_IFDIR, 1, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFREG, 0, 0, 0, 0);

err += test_rename(RENAME_NOREPLACE, S_IFLNK, 0, S_IFREG, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFLNK, 0, S_IFLNK, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFLNK, 0, S_IFDIR, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFLNK, 0, S_IFDIR, 1, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFLNK, 0, 0, 0, 0);

err += test_rename(RENAME_NOREPLACE, S_IFDIR, 0, S_IFREG, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 0, S_IFLNK, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 0, S_IFDIR, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 0, S_IFDIR, 1, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 0, 0, 0, 0);

err += test_rename(RENAME_NOREPLACE, S_IFDIR, 1, S_IFREG, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 1, S_IFLNK, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 1, S_IFDIR, 0, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 1, S_IFDIR, 1, EEXIST);
err += test_rename(RENAME_NOREPLACE, S_IFDIR, 1, 0, 0, 0);


err += test_rename(RENAME_EXCHANGE, 0, 0, S_IFREG, 0, ENOENT);
err += test_rename(RENAME_EXCHANGE, 0, 0, S_IFLNK, 0, ENOENT);
err += test_rename(RENAME_EXCHANGE, 0, 0, S_IFDIR, 0, ENOENT);
err += test_rename(RENAME_EXCHANGE, 0, 0, S_IFDIR, 1, ENOENT);
err += test_rename(RENAME_EXCHANGE, 0, 0, 0, 0, ENOENT);

err += test_rename(RENAME_EXCHANGE, S_IFREG, 0, S_IFREG, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFREG, 0, S_IFLNK, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFREG, 0, S_IFDIR, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFREG, 0, S_IFDIR, 1, 0);
err += test_rename(RENAME_EXCHANGE, S_IFREG, 0, 0, 0, ENOENT);

err += test_rename(RENAME_EXCHANGE, S_IFLNK, 0, S_IFREG, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFLNK, 0, S_IFLNK, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFLNK, 0, S_IFDIR, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFLNK, 0, S_IFDIR, 1, 0);
err += test_rename(RENAME_EXCHANGE, S_IFLNK, 0, 0, 0, ENOENT);

err += test_rename(RENAME_EXCHANGE, S_IFDIR, 0, S_IFREG, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 0, S_IFLNK, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 0, S_IFDIR, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 0, S_IFDIR, 1, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 0, 0, 0, ENOENT);

err += test_rename(RENAME_EXCHANGE, S_IFDIR, 1, S_IFREG, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 1, S_IFLNK, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 1, S_IFDIR, 0, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 1, S_IFDIR, 1, 0);
err += test_rename(RENAME_EXCHANGE, S_IFDIR, 1, 0, 0, ENOENT);

err += test_rename(RENAME_NOREPLACE | RENAME_EXCHANGE,
S_IFREG, 0, S_IFREG, 0, EINVAL);

return err;
}

int main(int argc, char *argv[])
{
const char *basepath;
int err = 0;
int res;

umask(0);
if (argc != 2) {
fprintf(stderr, "usage: %s testdir\n", argv[0]);
return 1;
}
basepath = argv[1];
assert(strlen(basepath) < 512);
if (basepath[0] != '/') {
fprintf(stderr, "testdir must be an absolute path\n");
return 1;
}

sprintf(testfile, "%s/testfile", basepath);
sprintf(testfile2, "%s/testfile2", basepath);
sprintf(testdir, "%s/testdir", basepath);
sprintf(testdir2, "%s/testdir2", basepath);

if (check_nonexist(testfile) == -1 ||
check_nonexist(testfile2) == -1 ||
check_nonexist(testdir) == -1 ||
check_nonexist(testdir2) == -1)
return 1;

err += test_renames();

res = mkdir(testdir, 0755);
if (res == -1) {
perror(testdir);
return 1;
}
res = mkdir(testdir2, 0755);
if (res == -1) {
perror(testdir2);
return 1;
}
printf("------- Doing cross-directory renames...\n");

sprintf(testfile, "%s/testdir/subfile", basepath);
sprintf(testdir, "%s/testdir/subdir", basepath);
sprintf(testfile2, "%s/testdir2/subfile2", basepath);
sprintf(testdir2, "%s/testdir2/subdir2", basepath);

err += test_renames();

sprintf(testdir, "%s/testdir", basepath);
sprintf(testdir2, "%s/testdir2", basepath);
cleanup();

if (err) {
fprintf(stderr, "%i tests failed\n", -err);
return 1;
}

return 0;
}
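
The test program above defines `SYS_renameat2` as 316, which is the x86_64 number. For trying the exchange operation on its own, here is a minimal sketch that takes the number from `<sys/syscall.h>` when the installed headers provide it. `exchange_paths()` is a hypothetical helper name, and the fallback definition of `RENAME_EXCHANGE` assumes the value used by the test program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the syscall() declaration */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)	/* same value as in the test program */
#endif

/* Hypothetical helper: atomically swap two existing paths.
 * Returns 0 on success, -1 with errno set on failure. */
static int exchange_paths(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
#ifdef SYS_renameat2
	return (int) syscall(SYS_renameat2, AT_FDCWD, a, AT_FDCWD, b,
			     RENAME_EXCHANGE);
#else
	/* The installed headers do not know the syscall number. */
	errno = ENOSYS;
	return -1;
#endif
}
```

As in the test program, a caller would probably want to treat ENOSYS or EINVAL as "not supported by this kernel or filesystem" rather than as a failure of the exchange itself.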

2014-02-13 15:54:53

by David Howells

Subject: Re: [PATCH 00/13] cross rename v4


If we're going to be adding a new rename inode op, can we make it take a flag
to white out the source for union type things? This would mean that
rename-and-white-out can be done atomically.

David

2014-02-13 16:24:23

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 03:54:14PM +0000, David Howells wrote:
>
> If we're going to be adding a new rename inode op, can we make it take a flag
> to white out the source for union type things? This would mean that
> rename-and-white-out can be done atomically.

That is an option, yes.

Regarding whiteouts, I raised a couple of questions that nobody answered yet, so
let me ask again.

- If a filesystem containing whiteouts (fallthroughs, etc...) is mounted as not
part of a union, how are these special entities represented to userspace?

- Can the user remove them?

Thanks,
Miklos

2014-02-13 16:43:27

by David Howells

Subject: Re: [PATCH 00/13] cross rename v4

Miklos Szeredi <[email protected]> wrote:

> Regarding whiteouts, I raised a couple of questions that nobody answered
> yet, so let me ask again.
>
> - If a filesystem containing whiteouts (fallthroughs, etc...) is mounted as
> not part of a union, how are these special entities represented to
> userspace?

I would suggest that whiteouts appear as otherwise negative dentries and that
they don't appear in getdents().

Fallthroughs are far more 'interesting'. Maybe they should appear in
getdents() with a dentry type saying what they are, but give you EREMOTE or
something if you try to follow them.

Note that there is space in d_flags & DCACHE_ENTRY_TYPE for a whiteout type.
I would, however, mark fallthroughs by a separate flag. So that the union
dentry will mirror the source dentry's type.

> - Can the user remove them?

Overwriting whiteouts and fallthroughs and unlinking fallthroughs I don't see
as a problem where they can be treated as normal negative dentries and normal
files in this regard.

However, what do you do about non-opaque directories that may or may not have
been unioned if you try and follow a dirent that would be a subdirectory that
hasn't been copied up?

David

2014-02-13 17:28:41

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 5:42 PM, David Howells <[email protected]> wrote:
> Miklos Szeredi <[email protected]> wrote:
>
>> Regarding whiteouts, I raised a couple of questions that nobody answered
>> yet, so let me ask again.
>>
>> - If a filesystem containing whiteouts (fallthroughs, etc...) is mounted as
>> not part of a union, how are these special entities represented to
>> userspace?
>
> I would suggest that whiteouts appear as otherwise negative dentries and that
> they don't appear in getdents().

I'd argue that this is an administration nightmare. E.g. what if a
backup needs to be made of the rw layer?

Will rmdir work normally in a directory containing whiteouts? Will
the VFS take care of that, just like if it was part of a union? Or
will it fail with ENOTEMPTY despite *appearing* empty?

And a zillion other problems related to the fact that things happen to a
filesystem even when they do not appear to happen ("mv foo bar; mv bar
foo" has side effects).

Thanks,
Miklos

2014-02-13 18:22:08

by Andy Lutomirski

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 9:28 AM, Miklos Szeredi <[email protected]> wrote:
> On Thu, Feb 13, 2014 at 5:42 PM, David Howells <[email protected]> wrote:
>> Miklos Szeredi <[email protected]> wrote:
>>
>>> Regarding whiteouts, I raised a couple of questions that nobody answered
>>> yet, so let me ask again.
>>>
>>> - If a filesystem containing whiteouts (fallthroughs, etc...) is mounted as
>>> not part of a union, how are these special entities represented to
>>> userspace?
>>
>> I would suggest that whiteouts appear as otherwise negative dentries and that
>> they don't appear in getdents().
>
> I'd argue that this is an administration nightmare. E.g. what if a
> backup needs to be made of the rw layer?
>
> Will rmdir work normally in a directory containing whiteouts? Will
> the VFS take care of that, just like if it was part of a union? Or
> will it fail with ENOTEMPTY despite *appearing* empty?
>
> And a zillion other problems related to the fact that things happen to a
> filesystem even when they do not appear to happen ("mv foo bar; mv bar
> foo" has side effects).

Are there any users of unions / overlays who will want to modify the
bottom layer after creating the top layer? I'm starting to think that
changing the bottom layer should require userspace to do a three-way
merge or something and explicitly decide what it wants to do.

--Andy

2014-02-13 18:29:22

by Linus Torvalds

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 9:28 AM, Miklos Szeredi <[email protected]> wrote:
>>
>> I would suggest that whiteouts appear as otherwise negative dentries and that
>> they don't appear in getdents().
>
> I'd argue that this is an administration nightmare. E.g. what if a
> backup needs to be made of the rw layer?

The major issue is user space support.

So what do others that support this do? Looking at the gitweb for
ls.c in coreutils, we find:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/ls.c

# ifdef DT_WHT
case DT_WHT: type = whiteout; break;
# endif

so that's presumably what we should use.

Linus
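
To make the `DT_WHT` suggestion concrete, here is a minimal sketch of how a directory walker could notice whiteout entries, using the same `#ifdef DT_WHT` guard as the coreutils snippet. `count_whiteouts()` is a hypothetical helper name, and the sketch assumes a C library whose `struct dirent` has a `d_type` field. Whether any filesystem actually reports `DT_WHT` is exactly what is being discussed in this thread.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for the DT_* constants in <dirent.h> */
#endif
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of a directory that readdir()
 * reports as whiteouts.  Returns -1 if the directory cannot be opened. */
static int count_whiteouts(const char *path)
{
	DIR *dp;
	struct dirent *de;
	int count = 0;

	assert(path != NULL);
	dp = opendir(path);
	if (dp == NULL)
		return -1;
	while ((de = readdir(dp)) != NULL) {
#ifdef DT_WHT
		if (de->d_type == DT_WHT)
			count++;
#endif
	}
	closedir(dp);
	return count;
}
```

Software that does not check for `DT_WHT` would simply see an entry of a type it does not recognise.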

2014-02-13 18:56:26

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 7:29 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Feb 13, 2014 at 9:28 AM, Miklos Szeredi <[email protected]> wrote:
>>>
>>> I would suggest that whiteouts appear as otherwise negative dentries and that
>>> they don't appear in getdents().
>>
>> I'd argue that this is an administration nightmare. E.g. what if a
>> backup needs to be made of the rw layer?
>
> The major issue is user space support.
>
> So what do others that support this do? Looking at the gitweb for
> ls.c in coreutils, we find:
>
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/ls.c
>
> # ifdef DT_WHT
> case DT_WHT: type = whiteout; break;
> # endif
>
> so that's presumably what we should use.

Fair enough, that allows the thing to be listed, at least.

What about creation? A new syscall?

Removal? unlink(2)?

Should stat(2) succeed with a new filetype?

Thanks,
Miklos

2014-02-13 19:03:24

by David Howells

Subject: Re: [PATCH 00/13] cross rename v4

Linus Torvalds <[email protected]> wrote:

> So what do others that support this do? Looking at the gitweb for
> ls.c in coreutils, we find:
>
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/ls.c
>
> # ifdef DT_WHT
> case DT_WHT: type = whiteout; break;
> # endif
>
> so that's presumably what we should use.

Whilst that does seem reasonable, what about all the other software that
iterates over a directory? Some of that is surely not going to know about
DT_WHT.

Further, while that may sort whiteouts, what about fallthroughs? There isn't
a DT_ symbol for that... Fallthroughs are 'really there' in the sense that
they're positive, but they should take on the underlying object type - which
in this situation we can't retrieve:-/

I wonder if it would be possible to require filesystems that can store
fallthroughs to store the lower type in the upper dentry (ie. there is no
fallthrough type per se, but rather fallthrough-to-file, f-to-char, f-to-sym,
etc.).

David

2014-02-13 19:20:44

by Linus Torvalds

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 10:56 AM, Miklos Szeredi <[email protected]> wrote:
>
> Fair enough, that allows the thing to be listed, at least.
>
> What about creation? A new syscall?
>
> Removal? unlink(2)?
>
> Should stat(2) succeed with a new filetype?

I think it had better work exactly like a special node (eg character
device etc). I don't know about creation (yes, we might even fake it
with mknod(), or just say that the only way to create them is as part
of the union-fs), but removal and renaming should absolutely *not* be
a new system call. That would be a disaster for any system admin,
having to use special tools to edit the filesystem.

Obviously when it is part of a union mount, whiteouts work differently
- they must *not* show up in getdents, and you can't rename/remove a
whiteout anywhere else. But that is obviously part of the union-fs,
not the low-level filesystem itself.

Linus

2014-02-13 19:32:16

by J. Bruce Fields

Subject: Re: [PATCH 00/13] cross rename v4

On Tue, Feb 11, 2014 at 04:57:44PM +0100, Miklos Szeredi wrote:
> On Fri, Feb 07, 2014 at 05:46:30PM -0500, J. Bruce Fields wrote:
>
> > > Bruce, could you please review the locking and delegation thing in patch #8
> > > "vfs: add cross-rename"?
> >
> > Yep, done. I'll also try running this through my nfs tests, for what
> > it's worth. (Not today as there's some unrelated regression to sort
> > out first.)
> >
> > Feel free to add a
> >
> > Reviewed-by: J. Bruce Fields <[email protected]>
> >
> > for any but the ext4 patches, which I skipped.
>
> Thanks for the review!
>
> I've changed the authorship of the lock_two_nondirectories() patch to you. Can
> I also add your sign-off?

Sure; thanks!

--b.

2014-02-13 19:32:35

by Linus Torvalds

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 11:02 AM, David Howells <[email protected]> wrote:
>
> Whilst that does seem reasonable, what about all the other software that
> iterates over a directory? Some of that is surely not going to know about
> DT_WHT.

So?

Remember: whiteout entries do not exist "normally". No normal apps
should care or see them, since the whole and only point of them is
when they are part of a union mount (in which case they are not
visible).

So the "how do you see whiteouts" is really only about the raw
filesystem mount when *not* in the normal place.

IOW, it's not like these guys are going to show up in users home
directories etc. It's more like a special device node than a file - we
need to care about some basic system management interfaces, not about
"random apps". So "coreutils" is the primary user, although I guess a
few IT people would prefer for things like Nautilus etc random file
managers to be able to show them nicely too. But if they show up as an
icon with a question mark on them or whatever, that's really not a big
deal either.

Sure, maybe they'll look odd in some graphical file chooser *if*
somebody makes them show up, but I think creation of a whiteout - if
we allow it at all outside of the union mount itself - should be a
root-only thing (the same way mknod is) so quite frankly, it falls
under "filesystem corruption makes my directory listings look odd -
cry me a river".

(I do think we should allow creation - but for root only - for
management and testing purposes, but I really think it's a secondary
issue, and I do think we should literally use "mknod()" - either with
a new S_IFWHT or even just making use of existing S_IFCHR just so you
could use the user-space "mknod" to create it with some magic
major/minor combination.)

Linus

2014-02-13 20:17:52

by Eric W. Biederman

Subject: Re: [PATCH 00/13] cross rename v4

Linus Torvalds <[email protected]> writes:

> On Thu, Feb 13, 2014 at 11:02 AM, David Howells <[email protected]> wrote:
>>
>> Whilst that does seem reasonable, what about all the other software that
>> iterates over a directory? Some of that is surely not going to know about
>> DT_WHT.
>
> So?
>
> Remember: whiteout entries do not exist "normally". No normal apps
> should care or see them, since the whole and only point of them is
> when they are part of a union mount (in which case they are not
> visible).
>
> So the "how do you see whiteouts" is really only about the raw
> filesystem mount when *not* in the normal place.
>
> IOW, it's not like these guys are going to show up in users home
> directories etc. It's more like a special device node than a file - we
> need to care about some basic system management interfaces, not about
> "random apps". So "coreutils" is the primary user, although I guess a
> few IT people would prefer for things like Nautilus etc random file
> managers to be able to show them nicely too. But if they show up as an
> icon with a question mark on them or whatever, that's really not a big
> deal either.
>
> Sure, maybe they'll look odd in some graphical file chooser *if*
> somebody makes them show up, but I think creation of a whiteout - if
> we allow it at all outside of the union mount itself - should be a
> root-only thing (the same way mknod is) so quite frankly, it falls
> under "filesystem corruption makes my directory listings look odd -
> cry me a river".
>
> (I do think we should allow creation - but for root only - for
> management and testing purposes, but I really think it's a secondary
> issue, and I do think we should literally use "mknod()" - either with
> a new S_IFWHT or even just making use of existing S_IFCHR just so you
> could use the user-space "mknod" to create it with some magic
> major/minor combination.)

I see two interesting questions.

- How do I back up and restore the top layer of my union mount/filesystem?
- How do I use union mounts in a container?

Backup and restore argues that mknod be able to create these things, and
unlink be able to remove them. rename shrug.

I expect whiteouts on a filesystem will all belong to some inode with
i_nlink == 0, one that is likely not even represented on disk.

Using union mounts in a container effectively boils down to letting
non-root users create these things, so unless applications handle these
very badly I don't know why we should restrict their creation to
root. Quotas restrict the size of directories and the number of inodes
you can have, and the number of directory blocks you can have, which
handles everything except applications that misbehave in the face of the
unknown.

Eric

2014-02-13 20:28:52

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 8:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Feb 13, 2014 at 11:02 AM, David Howells <[email protected]> wrote:
>>
>> Whilst that does seem reasonable, what about all the other software that
>> iterates over a directory? Some of that is surely not going to know about
>> DT_WHT.
>
> So?
>
> Remember: whiteout entries do not exist "normally". No normal apps
> should care or see them, since the whole and only point of them is
> when they are part of a union mount (in which case they are not
> visible).
>
> So the "how do you see whiteouts" is really only about the raw
> filesystem mount when *not* in the normal place.
>
> IOW, it's not like these guys are going to show up in users home
> directories etc. It's more like a special device node than a file - we
> need to care about some basic system management interfaces, not about
> "random apps". So "coreutils" is the primary user, although I guess a
> few IT people would prefer for things like Nautilus etc random file
> managers to be able to show them nicely too. But if they show up as an
> icon with a question mark on them or whatever, that's really not a big
> deal either.
>
> Sure, maybe they'll look odd in some graphical file chooser *if*
> somebody makes them show up, but I think creation of a whiteout - if
> we allow it at all outside of the union mount itself - should be a
> root-only thing (the same way mknod is) so quite frankly, it falls
> under "filesystem corruption makes my directory listings look odd -
> cry me a river".
>
> (I do think we should allow creation - but for root only - for
> management and testing purposes, but I really think it's a secondary
> issue, and I do think we should literally use "mknod()" - either with
> a new S_IFWHT or even just making use of existing S_IFCHR just so you
> could use the user-space "mknod" to create it with some magic
> major/minor combination.)

And IMO the magic S_IFCHR is a lot better in many respects than a new
filetype, since now all backup tools automatically work. And I think
that's a lot more important than looking like a nice new design.
Sure, if S_IFWHT was there from the start, it would be wonderful. But
as it stands, it's a lot more difficult to add support for such a
thing to userspace than adding a hack, using the existing interfaces,
to the kernel.

Thanks,
Miklos




>
> Linus

2014-02-17 08:19:20

by Dave Chinner

Subject: Re: [PATCH 00/13] cross rename v4

On Wed, Feb 12, 2014 at 06:18:52PM +0100, Miklos Szeredi wrote:
> On Tue, Feb 11, 2014 at 05:01:41PM +0100, Miklos Szeredi wrote:
> > On Mon, Feb 10, 2014 at 09:51:45PM +1100, Dave Chinner wrote:
>
> > > Miklos, can you please write an xfstest for this new API? That way
> > > we can verify that the behaviour is as documented, and we can ensure
> > > that when we implement it on other filesystems it works exactly the
> > > same on all filesystems?
>
> This is a standalone testprog, but I guess it's trivial to integrate into
> xfstests.

Same problem with integrating any standalone test program into
xfstests - we end up with a standalone pass/fail test instead of a
bunch of components we can reuse and refactor for other tests. But
we can work around that for the moment.

[ FWIW, the normal way to write an xfstest like this is to write a
small helper program that just does the renameat2() syscall (we
often use xfs_io to provide this) and everything is just shell
scripts to drive the helper program in the necessary way. We don't
directly check that mode, size, destination of a file is correct -
just stat(1) on the expected destinations is sufficient to capture
this information. stdout is captured by the test harness and used to match
against a golden output. If the match fails, the test fails.

This would allow us to use the same test infrastructure for testing
a coreutils binary that implemented renameat2 when that comes
along... ]

> Please let me know what you think.

We need to be able to test whether the syscall exists so we
add a function like:

_requires_renameat2()
{
[ -x src/renameat2_test ] || _notrun "renameat2_test not found."
src/renameat2_test -t || _notrun "kernel does not support renameat2"
}

to gracefully avoid the test on kernels that don't support the
syscall. Indeed, the test needs to be able to be built on kernels
that don't have the right header files that define the syscall
number, either, so there's going to need to be some #ifndef
__NR_renameat2 type clauses in there as well...

And finally, it needs comments to explain what the test is actually
testing - if you don't document what the test is supposed to be
checking, how do we know that what it is testing is actually correct?

Cheers,

Dave.
--
Dave Chinner
[email protected]
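
For illustration only (not code from the thread): a minimal C sketch of the
syscall wrapper and the "-t" style probe such a helper program might contain.
The names sys_renameat2() and renameat2_supported() are hypothetical, and
probing with empty paths is an assumption of this sketch: it cannot rename
anything, so only the errno value is of interest.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>		/* AT_FDCWD */
#include <sys/syscall.h>	/* __NR_renameat2, if the headers know it */
#include <unistd.h>		/* syscall() */

/*
 * Thin wrapper around the raw syscall. When the build headers do not
 * define __NR_renameat2, behave like a kernel without the syscall.
 */
static int sys_renameat2(int olddfd, const char *oldpath,
			 int newdfd, const char *newpath, unsigned int flags)
{
#ifdef __NR_renameat2
	return (int)syscall(__NR_renameat2, olddfd, oldpath,
			    newdfd, newpath, flags);
#else
	(void)olddfd; (void)oldpath;
	(void)newdfd; (void)newpath; (void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Returns 1 if the call is not rejected with ENOSYS, 0 otherwise. */
static int renameat2_supported(void)
{
	if (sys_renameat2(AT_FDCWD, "", AT_FDCWD, "", 0) == 0)
		return 1;
	return errno != ENOSYS;
}
```

A helper built this way compiles on systems with old headers and lets the
shell side turn "not supported" into a _notrun instead of a failure.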

2014-02-17 18:04:28

by Theodore Ts'o

Subject: Re: [PATCH 00/13] cross rename v4

On Mon, Feb 17, 2014 at 07:19:11PM +1100, Dave Chinner wrote:
> [ FWIW, the normal way to write an xfstest like this is to write a
> small helper program that just does the renameat2() syscall (we
> often use xfs_io to provide this) and everything is just shell
> scripts to drive the helper program in the necessary way. We don't
> directly check that mode, size, destination of a file is correct -
> just stat(1) on the expected destinations is sufficient to capture
> this information. stdout is captured by the test harness and used to match
> against a golden output. If the match fails, the test fails.

The other reason why it's really nice to use a small helper program is
that it becomes much easier for file system developers to debug kernel
problems without having to create their own single-shot C programs.

It also becomes easier to debug a test failure by looking at the shell
script and manually running the commands one at a time, perhaps
changing some of the arguments after getting an xfstest failure from
inside a VM running a test kernel, since the VM very often won't even
have a C compiler.

It also becomes easier to add new tests simply by updating the
shell script, which is another win.

> And finally, it needs comments to explain what the test is actually
> testing - if you don't document what the test is supposed to be
> checking, how do we know that what it is testing is actually correct?

Yes, please!

- Ted

2014-02-24 17:11:25

by Miklos Szeredi

Subject: Re: [PATCH 00/13] cross rename v4

On Thu, Feb 13, 2014 at 09:28:50PM +0100, Miklos Szeredi wrote:
> On Thu, Feb 13, 2014 at 8:32 PM, Linus Torvalds
> <[email protected]> wrote:

> > (I do think we should allow creation - but for root only - for
> > management and testing purposes, but I really think it's a secondary
> > issue, and I do think we should literally use "mknod()" - either with
> > a new S_IFWHT or even just making use of existing S_IFCHR just so you
> > could use the user-space "mknod" to create it with some magic
> > major/minor combination.)
>
> And IMO the magic S_IFCHR is a lot better in many respects than a new
> filetype, since now all backup tools automatically work. And I think
> that's a lot more important than looking like a nice new design.
> Sure, if S_IFWHT was there from the start, it would be wonderful. But
> as it stands, it's a lot more difficult to add support for such a
> thing to userspace than adding a hack, using the existing interfaces,
> to the kernel.

And here's a patch to actually implement the "rename and whiteout source"
operation.

Jan, I suspect that the "credits" calculation for the number of journal blocks
is excessive, since here we don't actually need to create the directory entry,
only create the inode and populate an already existing entry. But I don't
understand the logic of ext4 enough to see what's the right number to use here.

And another ext4 specific question: I added ext4_setent(old, whiteout) before
ext4_add_entry(new, old.inode) because I suspect from the comment in
ext4_rename_delete() that ext4_add_entry() invalidates the looked up entry. Is
that correct? Are there any other traps in this area to look out for?

Thanks,
Miklos

----
Subject: vfs: add RENAME_WHITEOUT
From: Miklos Szeredi <[email protected]>

This adds a new RENAME_WHITEOUT flag. This flag makes rename() create a
whiteout of source. The whiteout creation is atomic relative to the
rename.

As per Linus' suggestion, a whiteout is represented as a dummy char device.
This patch uses the 0/0 device number, but the actual number doesn't matter
as long as it doesn't conflict with a real device.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 69 ++++++++++++++++++++++++++++++++++++++----------
fs/namei.c | 8 ++++-
include/linux/fs.h | 7 ++++
include/uapi/linux/fs.h | 1
4 files changed, 70 insertions(+), 15 deletions(-)

--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3157,7 +3157,8 @@ static void ext4_update_dir_count(handle
* This comes from rename(const char *oldpath, const char *newpath)
*/
static int ext4_plain_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
handle_t *handle = NULL;
struct ext4_renament old = {
@@ -3171,6 +3172,9 @@ static int ext4_plain_rename(struct inod
.inode = new_dentry->d_inode,
};
int retval;
+ struct inode *whiteout = NULL;
+ int credits;
+ u8 old_file_type;

dquot_initialize(old.dir);
dquot_initialize(new.dir);
@@ -3202,11 +3206,33 @@ static int ext4_plain_rename(struct inod
if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
ext4_alloc_da_blocks(old.inode);

- handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
- (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ credits = (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
+ if (!(flags & RENAME_WHITEOUT)) {
+ handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ } else {
+ int retries = 0;
+
+ credits += (EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
+retry:
+ whiteout = ext4_new_inode_start_handle(old.dir, S_IFCHR | WHITEOUT_MODE, &old.dentry->d_name, 0, NULL, EXT4_HT_DIR, credits);
+ handle = ext4_journal_current_handle();
+ if (IS_ERR(whiteout)) {
+ if (handle)
+ ext4_journal_stop(handle);
+ retval = PTR_ERR(whiteout);
+ if (retval == -ENOSPC &&
+ ext4_should_retry_alloc(old.dir->i_sb, &retries))
+ goto retry;
+
+ return retval;
+ }
+ init_special_inode(whiteout, whiteout->i_mode, WHITEOUT_DEV);
+ whiteout->i_op = &ext4_special_inode_operations;
+ }

if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
ext4_handle_sync(handle);
@@ -3225,13 +3251,21 @@ static int ext4_plain_rename(struct inod
if (retval)
goto end_rename;
}
+ old_file_type = old.de->file_type;
+ if (whiteout) {
+ retval = ext4_setent(handle, &old, whiteout->i_ino,
+ EXT4_FT_CHRDEV);
+ if (retval)
+ goto end_rename;
+ ext4_mark_inode_dirty(handle, whiteout);
+ }
if (!new.bh) {
retval = ext4_add_entry(handle, new.dentry, old.inode);
if (retval)
goto end_rename;
} else {
retval = ext4_setent(handle, &new,
- old.inode->i_ino, old.de->file_type);
+ old.inode->i_ino, old_file_type);
if (retval)
goto end_rename;
}
@@ -3243,10 +3277,12 @@ static int ext4_plain_rename(struct inod
old.inode->i_ctime = ext4_current_time(old.inode);
ext4_mark_inode_dirty(handle, old.inode);

- /*
- * ok, that's it
- */
- ext4_rename_delete(handle, &old);
+ if (!whiteout) {
+ /*
+ * ok, that's it
+ */
+ ext4_rename_delete(handle, &old);
+ }

if (new.inode) {
ext4_dec_count(handle, new.inode);
@@ -3282,6 +3318,12 @@ static int ext4_plain_rename(struct inod
brelse(old.dir_bh);
brelse(old.bh);
brelse(new.bh);
+ if (whiteout) {
+ if (retval)
+ drop_nlink(whiteout);
+ unlock_new_inode(whiteout);
+ iput(whiteout);
+ }
if (handle)
ext4_journal_stop(handle);
return retval;
@@ -3407,7 +3449,7 @@ static int ext4_rename(struct inode *old
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
- if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
+ if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
return -EINVAL;

if (flags & RENAME_EXCHANGE) {
@@ -3418,7 +3460,8 @@ static int ext4_rename(struct inode *old
* Existence checking was done by the VFS, otherwise "RENAME_NOREPLACE"
* is equivalent to regular rename.
*/
- return ext4_plain_rename(old_dir, old_dentry, new_dir, new_dentry);
+ return ext4_plain_rename(old_dir, old_dentry, new_dir, new_dentry,
+ flags);
}

/*
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4165,12 +4165,16 @@ SYSCALL_DEFINE5(renameat2, int, olddfd,
bool should_retry = false;
int error;

- if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
+ if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
return -EINVAL;

- if ((flags & RENAME_NOREPLACE) && (flags & RENAME_EXCHANGE))
+ if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
+ (flags & RENAME_EXCHANGE))
return -EINVAL;

+ if ((flags & RENAME_WHITEOUT) && !capable(CAP_MKNOD))
+ return -EPERM;
+
retry:
from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
if (IS_ERR(from)) {
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -37,6 +37,7 @@

#define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
#define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */
+#define RENAME_WHITEOUT (1 << 2) /* Whiteout source */

struct fstrim_range {
__u64 start;
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -215,6 +215,13 @@ typedef void (dio_iodone_t)(struct kiocb
#define ATTR_TIMES_SET (1 << 16)

/*
+ * Whiteout is represented by a char device. The following constants define the
+ * mode and device number to use.
+ */
+#define WHITEOUT_MODE 0
+#define WHITEOUT_DEV 0
+
+/*
* This is the Inode Attributes structure, used for notify_change(). It
* uses the above definitions as flags, to know which values have changed.
* Also, in this manner, a Filesystem can look at only the values it cares
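
For illustration only (not part of the patch): the flag checks from the
fs/namei.c hunk above, restated as a small userspace C function so the
allowed combinations are easy to see. check_rename_flags() is a hypothetical
name, and the boolean parameter stands in for capable(CAP_MKNOD).

```c
#include <errno.h>
#include <stdbool.h>

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)	/* Don't overwrite target */
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE  (1 << 1)	/* Exchange source and dest */
#endif
#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT  (1 << 2)	/* Whiteout source */
#endif

/* Returns 0 if the flags are acceptable, else a negative errno value. */
static int check_rename_flags(unsigned int flags, bool has_cap_mknod)
{
	/* Unknown bits are rejected outright. */
	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
		return -EINVAL;

	/* EXCHANGE cannot be combined with NOREPLACE or WHITEOUT. */
	if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
	    (flags & RENAME_EXCHANGE))
		return -EINVAL;

	/* Creating a whiteout is creating a device node. */
	if ((flags & RENAME_WHITEOUT) && !has_cap_mknod)
		return -EPERM;

	return 0;
}
```

Note that RENAME_NOREPLACE | RENAME_WHITEOUT passes these checks; only the
combinations involving RENAME_EXCHANGE are refused.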

2014-02-24 17:49:50

by Linus Torvalds

Subject: Re: [PATCH 00/13] cross rename v4

On Mon, Feb 24, 2014 at 9:12 AM, Miklos Szeredi <[email protected]> wrote:
>
> This patch uses the 0/0 device number, but the actual number doesn't matter
> as long as it doesn't conflict with a real device.

Side note: I think 0/0 is the right choice, for a very specific
reason: it is already documented as being special. No other
combination has that.

We've had "major number 0" documented as being for unnamed devices,
and minor 0 is "reserved as null device number", which is just bad
documentation (it's *not* /dev/null, it doesn't exist). You cannot
register a character device with major/minor 0 in Linux, for example.

(The block layer similarly considers MKDEV(0,0) to be an unallocated device)

Linus

2014-02-25 04:07:17

by J. R. Okajima

Subject: Re: [PATCH 00/13] cross rename v4


Miklos Szeredi:
> As per Linus' suggestion, a whiteout is represented as a dummy char device.
> This patch uses the 0/0 device number, but the actual number doesn't matter
> as long as it doesn't conflict with a real device.

I have no objection to the char device.
But why do we need an inode for every whiteout? I'd suggest making a
hardlink. For some filesystems which don't support hardlinks, we have to
consume an inode per whiteout. But when the fs supports hardlinks, we
can re-use the inode and consume a few inodes only.


J. R. Okajima

2014-02-26 15:15:26

by Jan Kara

Subject: Re: [PATCH 00/13] cross rename v4

On Mon 24-02-14 18:12:38, Miklos Szeredi wrote:
> On Thu, Feb 13, 2014 at 09:28:50PM +0100, Miklos Szeredi wrote:
> > On Thu, Feb 13, 2014 at 8:32 PM, Linus Torvalds
> > <[email protected]> wrote:
>
> > > (I do think we should allow creation - but for root only - for
> > > management and testing purposes, but I really think it's a secondary
> > > issue, and I do think we should literally use "mknod()" - either with
> > > a new S_IFWHT or even just making use of existing S_IFCHR just so you
> > > could use the user-space "mknod" to create it with some magic
> > > > major/minor combination.)
> >
> > And IMO the magic S_IFCHR is a lot better in many respects than a new
> > filetype, since now all backup tools automatically work. And I think
> > that's a lot more important than looking like a nice new design.
> > Sure, if S_IFWHT was there from the start, it would be wonderful. But
> > as it stands, it's a lot more difficult to add support for such a
> > > thing to userspace than adding a hack, using the existing interfaces,
> > to the kernel.
>
> And here's a patch to actually implement the "rename and whiteout source"
> operation.
>
> Jan, I suspect that the "credits" calculation for the number of journal
> blocks is excessive, since here we don't actually need to create the
> directory entry, only create the inode and populate an already existing
> entry. But I don't understand the logic of ext4 enough to see what's the
> right number to use here.
Yes. It should be enough to reserve
(EXT4_MAXQUOTAS_TRANS_BLOCKS(sb) + EXT4_XATTR_TRANS_BLOCKS + 4) blocks
(for inode block, sb block, group summaries, and inode bitmap). Please add
this summary to the comment before the estimate.

> And another ext4 specific question: I added ext4_setent(old, whiteout) before
> ext4_add_entry(new, old.inode) because I suspect from the comment in
> ext4_rename_delete() that ext4_add_entry() invalidates the looked up entry. Is
> that correct?
Yes, that is correct. As soon as you add any entry into a directory, any
looked up position in that directory may be invalid (the directory is
implemented as a tree and addition can trigger balancing which moves
things around in the directory).

> Are there any other traps in this area to look out for?
:-) Hum, no I cannot remember any.

Honza
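
For illustration only (a userspace analogy, not ext4 code): the ordering
issue discussed above, shown with a growable array in place of the directory
tree. Adding an entry may move the storage, so everything needed from the old
slot is read, and the slot is rewritten, before the add. All names here are
made up for the sketch.

```c
#include <stdlib.h>

struct entry { unsigned long ino; unsigned char type; };
struct dir   { struct entry *ents; size_t n, cap; };

/* Append an entry. May realloc, invalidating pointers into d->ents. */
static int dir_add(struct dir *d, unsigned long ino, unsigned char type)
{
	if (d->n == d->cap) {
		size_t cap = d->cap ? d->cap * 2 : 1;
		struct entry *p = realloc(d->ents, cap * sizeof(*p));

		if (!p)
			return -1;
		d->ents = p;
		d->cap = cap;
	}
	d->ents[d->n].ino = ino;
	d->ents[d->n].type = type;
	d->n++;
	return 0;
}

/* Turn slot old_idx into a whiteout and add a new entry for its inode. */
static int move_and_whiteout(struct dir *d, size_t old_idx,
			     unsigned long wh_ino, unsigned char wh_type)
{
	struct entry *old = &d->ents[old_idx];
	/* Save what is needed from the old slot first ... */
	unsigned long moved_ino = old->ino;
	unsigned char moved_type = old->type;

	/* ... and rewrite the slot while 'old' is still valid ... */
	old->ino = wh_ino;
	old->type = wh_type;

	/* ... because this may move the array; 'old' is dead afterwards. */
	return dir_add(d, moved_ino, moved_type);
}
```

This mirrors the shape of the patch: old_file_type is saved and
ext4_setent(old, whiteout) is done before ext4_add_entry() runs.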

> ----
> Subject: vfs: add RENAME_WHITEOUT
> From: Miklos Szeredi <[email protected]>
>
> This adds a new RENAME_WHITEOUT flag. This flag makes rename() create a
> whiteout of source. The whiteout creation is atomic relative to the
> rename.
>
> As per Linus' suggestion, a whiteout is represented as a dummy char device.
> This patch uses the 0/0 device number, but the actual number doesn't matter
> as long as it doesn't conflict with a real device.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/ext4/namei.c | 69 ++++++++++++++++++++++++++++++++++++++----------
> fs/namei.c | 8 ++++-
> include/linux/fs.h | 7 ++++
> include/uapi/linux/fs.h | 1
> 4 files changed, 70 insertions(+), 15 deletions(-)
>
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3157,7 +3157,8 @@ static void ext4_update_dir_count(handle
> * This comes from rename(const char *oldpath, const char *newpath)
> */
> static int ext4_plain_rename(struct inode *old_dir, struct dentry *old_dentry,
> - struct inode *new_dir, struct dentry *new_dentry)
> + struct inode *new_dir, struct dentry *new_dentry,
> + unsigned int flags)
> {
> handle_t *handle = NULL;
> struct ext4_renament old = {
> @@ -3171,6 +3172,9 @@ static int ext4_plain_rename(struct inod
> .inode = new_dentry->d_inode,
> };
> int retval;
> + struct inode *whiteout = NULL;
> + int credits;
> + u8 old_file_type;
>
> dquot_initialize(old.dir);
> dquot_initialize(new.dir);
> @@ -3202,11 +3206,33 @@ static int ext4_plain_rename(struct inod
> if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
> ext4_alloc_da_blocks(old.inode);
>
> - handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
> - (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> + credits = (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2);
> + if (!(flags & RENAME_WHITEOUT)) {
> + handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> + } else {
> + int retries = 0;
> +
> + credits += (EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
> +retry:
> + whiteout = ext4_new_inode_start_handle(old.dir, S_IFCHR | WHITEOUT_MODE, &old.dentry->d_name, 0, NULL, EXT4_HT_DIR, credits);
> + handle = ext4_journal_current_handle();
> + if (IS_ERR(whiteout)) {
> + if (handle)
> + ext4_journal_stop(handle);
> + retval = PTR_ERR(whiteout);
> + if (retval == -ENOSPC &&
> + ext4_should_retry_alloc(old.dir->i_sb, &retries))
> + goto retry;
> +
> + return retval;
> + }
> + init_special_inode(whiteout, whiteout->i_mode, WHITEOUT_DEV);
> + whiteout->i_op = &ext4_special_inode_operations;
> + }
>
> if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
> ext4_handle_sync(handle);
> @@ -3225,13 +3251,21 @@ static int ext4_plain_rename(struct inod
> if (retval)
> goto end_rename;
> }
> + old_file_type = old.de->file_type;
> + if (whiteout) {
> + retval = ext4_setent(handle, &old, whiteout->i_ino,
> + EXT4_FT_CHRDEV);
> + if (retval)
> + goto end_rename;
> + ext4_mark_inode_dirty(handle, whiteout);
> + }
> if (!new.bh) {
> retval = ext4_add_entry(handle, new.dentry, old.inode);
> if (retval)
> goto end_rename;
> } else {
> retval = ext4_setent(handle, &new,
> - old.inode->i_ino, old.de->file_type);
> + old.inode->i_ino, old_file_type);
> if (retval)
> goto end_rename;
> }
> @@ -3243,10 +3277,12 @@ static int ext4_plain_rename(struct inod
> old.inode->i_ctime = ext4_current_time(old.inode);
> ext4_mark_inode_dirty(handle, old.inode);
>
> - /*
> - * ok, that's it
> - */
> - ext4_rename_delete(handle, &old);
> + if (!whiteout) {
> + /*
> + * ok, that's it
> + */
> + ext4_rename_delete(handle, &old);
> + }
>
> if (new.inode) {
> ext4_dec_count(handle, new.inode);
> @@ -3282,6 +3318,12 @@ static int ext4_plain_rename(struct inod
> brelse(old.dir_bh);
> brelse(old.bh);
> brelse(new.bh);
> + if (whiteout) {
> + if (retval)
> + drop_nlink(whiteout);
> + unlock_new_inode(whiteout);
> + iput(whiteout);
> + }
> if (handle)
> ext4_journal_stop(handle);
> return retval;
> @@ -3407,7 +3449,7 @@ static int ext4_rename(struct inode *old
> struct inode *new_dir, struct dentry *new_dentry,
> unsigned int flags)
> {
> - if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
> + if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
> return -EINVAL;
>
> if (flags & RENAME_EXCHANGE) {
> @@ -3418,7 +3460,8 @@ static int ext4_rename(struct inode *old
> * Existence checking was done by the VFS, otherwise "RENAME_NOREPLACE"
> * is equivalent to regular rename.
> */
> - return ext4_plain_rename(old_dir, old_dentry, new_dir, new_dentry);
> + return ext4_plain_rename(old_dir, old_dentry, new_dir, new_dentry,
> + flags);
> }
>
> /*
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4165,12 +4165,16 @@ SYSCALL_DEFINE5(renameat2, int, olddfd,
> bool should_retry = false;
> int error;
>
> - if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
> + if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
> return -EINVAL;
>
> - if ((flags & RENAME_NOREPLACE) && (flags & RENAME_EXCHANGE))
> + if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
> + (flags & RENAME_EXCHANGE))
> return -EINVAL;
>
> + if ((flags & RENAME_WHITEOUT) && !capable(CAP_MKNOD))
> + return -EPERM;
> +
> retry:
> from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
> if (IS_ERR(from)) {
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -37,6 +37,7 @@
>
> #define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
> #define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */
> +#define RENAME_WHITEOUT (1 << 2) /* Whiteout source */
>
> struct fstrim_range {
> __u64 start;
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -215,6 +215,13 @@ typedef void (dio_iodone_t)(struct kiocb
> #define ATTR_TIMES_SET (1 << 16)
>
> /*
> + * Whiteout is represented by a char device. The following constants define the
> + * mode and device number to use.
> + */
> +#define WHITEOUT_MODE 0
> +#define WHITEOUT_DEV 0
> +
> +/*
> * This is the Inode Attributes structure, used for notify_change(). It
> * uses the above definitions as flags, to know which values have changed.
> * Also, in this manner, a Filesystem can look at only the values it cares
--
Jan Kara <[email protected]>
SUSE Labs, CR