2013-10-01 16:00:36

by Miklos Szeredi

[permalink] [raw]
Subject: [RFC PATCH 0/7] cross rename

This series adds a new syscall, renameat2(), which is the same as renameat() but
with a flags argument. Internally i_op->reaname2() is also added, which can
later be merged with ->rename() but is kept separately for now, since this would
just blow up this patch without helping review.

The purpose of extending rename is to add cross-rename, a symmetric variant of
rename, which exchanges the two files. This allows interesting things, which
were not possible before, for example atomically replacing a directory tree with
a symlink, etc...

The other reason to introduce this is for whiteout handling in union/overlay
solutions in an atomic manner without having to add complex code to each
filesystem's rmdir, mkdir and rename just for handling whiteouts. With
cross-rename it becomes possible to handle whiteouts in a generic manner in most
of the cases:

rmdir(P) where P needs to be whiteout:
- create-whiteout(orphan/X)
- exchange(orphan/X, P)
- rmdir(orphan/X)

mkdir(P) where P is currently a whiteout:
- mkdir(orphan/X)
- exchange(orphan/X, P)
- unlink(orphan/X)

rename(P, Q) where P needs to be whiteout and Q is negative:
- create-whiteout(Q)
- exchange(P, Q)

rename(P, Q) where Q is a directory containing only whiteouts:
- mkdir(orphan/X) (marked as opaque)
- exchange(orphan/X, Q)
- recursive-remove(orphan/X)
- rename(P, Q)

The case that cannot be handled with cross rename is:

rename(P, Q) where P needs to be whiteout and Q exists

For this case a new rename flag will be propsed that atomically creates the
whiteout.

Thanks,
Miklos

---
Miklos Szeredi (7):
vfs: rename: move d_move() up
vfs: rename: use common code for dir and non-dir
vfs: add renameat2 syscall and cross-rename
ext4: rename: create ext4_renament structure for local vars
ext4: rename: move EMLINK check up
ext4: rename: split out helper functions
ext4: add cross rename support

---
arch/x86/syscalls/syscall_64.tbl | 1 +
fs/dcache.c | 46 ++++-
fs/ext4/namei.c | 379 +++++++++++++++++++++++++--------------
fs/namei.c | 218 ++++++++++++----------
include/linux/dcache.h | 1 +
include/linux/fs.h | 2 +
include/uapi/linux/fs.h | 2 +
7 files changed, 415 insertions(+), 234 deletions(-)


2013-10-01 16:00:19

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 5/7] ext4: rename: move EMLINK check up

From: Miklos Szeredi <[email protected]>

Move checking i_nlink from after ext4_get_first_dir_block() to before. The
check doesn't rely on the result of that function and the function only
fails on fs corruption, so the order shouldn't matter.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 01bd80e..1348251 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3082,6 +3082,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = -ENOTEMPTY;
if (!empty_dir(new.inode))
goto end_rename;
+ } else {
+ retval = -EMLINK;
+ if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
+ goto end_rename;
}
retval = -EIO;
old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
@@ -3091,10 +3095,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
goto end_rename;
if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
goto end_rename;
- retval = -EMLINK;
- if (!new.inode && new.dir != old.dir &&
- EXT4_DIR_LINK_MAX(new.dir))
- goto end_rename;
BUFFER_TRACE(old.dir_bh, "get_write_access");
retval = ext4_journal_get_write_access(handle, old.dir_bh);
if (retval)
--
1.8.1.4

2013-10-01 16:00:27

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 3/7] vfs: add renameat2 syscall and cross-rename

From: Miklos Szeredi <[email protected]>

Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

If flags is zero then this is a plain overwriting rename. If flags contain
RENAME_EXCHANGE then exchange source and destination files. There's no
restriction on the type of the files; e.g. a directory can be exchanged
with a symlink.

Signed-off-by: Miklos Szeredi <[email protected]>
---
arch/x86/syscalls/syscall_64.tbl | 1 +
fs/dcache.c | 46 ++++++++++---
fs/namei.c | 139 ++++++++++++++++++++++++++++++---------
include/linux/dcache.h | 1 +
include/linux/fs.h | 2 +
include/uapi/linux/fs.h | 2 +
6 files changed, 152 insertions(+), 39 deletions(-)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..fafd734 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
311 64 process_vm_writev sys_process_vm_writev
312 common kcmp sys_kcmp
313 common finit_module sys_finit_module
+314 common renameat2 sys_renameat2

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/dcache.c b/fs/dcache.c
index 4100030..1735bac 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
dentry->d_name.name = dentry->d_iname;
} else {
/*
- * Both are internal. Just copy target to dentry
+ * Both are internal.
*/
- memcpy(dentry->d_iname, target->d_name.name,
- target->d_name.len + 1);
- dentry->d_name.len = target->d_name.len;
- return;
+ unsigned int i;
+ BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+ for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+ swap(((long *) &dentry->d_iname)[i],
+ ((long *) &target->d_iname)[i]);
+ }
}
}
swap(dentry->d_name.len, target->d_name.len);
@@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
* __d_move - move a dentry
* @dentry: entry to move
* @target: new dentry
+ * @exchange: exchange the two dentries
*
* Update the dcache to reflect the move of a file name. Negative
* dcache entries should not be moved in this way. Caller must hold
* rename_lock, the i_mutex of the source and target directories,
* and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
*/
-static void __d_move(struct dentry * dentry, struct dentry * target)
+static void __d_move(struct dentry *dentry, struct dentry *target,
+ bool exchange)
{
if (!dentry->d_inode)
printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)

/* Unhash the target: dput() will then get rid of it */
__d_drop(target);
+ if (exchange) {
+ __d_rehash(target,
+ d_hash(dentry->d_parent, dentry->d_name.hash));
+ }

list_del(&dentry->d_u.d_child);
list_del(&target->d_u.d_child);
@@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
write_seqcount_end(&dentry->d_seq);

dentry_unlock_parents_for_move(dentry, target);
+ if (exchange)
+ fsnotify_d_move(target);
spin_unlock(&target->d_lock);
fsnotify_d_move(dentry);
spin_unlock(&dentry->d_lock);
@@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
void d_move(struct dentry *dentry, struct dentry *target)
{
write_seqlock(&rename_lock);
- __d_move(dentry, target);
+ __d_move(dentry, target, false);
write_sequnlock(&rename_lock);
}
EXPORT_SYMBOL(d_move);

+/*
+ * d_exchange - exchange two dentries
+ * @dentry1: first dentry
+ * @dentry2: second dentry
+ */
+void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
+{
+ write_seqlock(&rename_lock);
+
+ WARN_ON(!dentry1->d_inode);
+ WARN_ON(!dentry2->d_inode);
+ WARN_ON(IS_ROOT(dentry1));
+ WARN_ON(IS_ROOT(dentry2));
+
+ __d_move(dentry1, dentry2, true);
+
+ write_sequnlock(&rename_lock);
+}
+
+
/**
* d_ancestor - search for an ancestor
* @p1: ancestor dentry
@@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
m2 = &alias->d_parent->d_inode->i_mutex;
out_unalias:
if (likely(!d_mountpoint(alias))) {
- __d_move(alias, dentry);
+ __d_move(alias, dentry, false);
ret = alias;
}
out_err:
diff --git a/fs/namei.c b/fs/namei.c
index 7ec6a12..55700b3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
* ->i_mutex on parents, which works but leads to some truly excessive
* locking].
*/
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
int error;
const unsigned char *old_name;
struct inode *source = old_dentry->d_inode;
struct inode *target = new_dentry->d_inode;
bool is_dir = S_ISDIR(source->i_mode);
+ bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
+ bool overwrite = !(flags & RENAME_EXCHANGE);
+ unsigned max_links = new_dir->i_sb->s_max_links;

if (source == target)
return 0;
@@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,

if (!target)
error = may_create(new_dir, new_dentry);
- else
+ else if (overwrite)
error = may_delete(new_dir, new_dentry, is_dir);
+ else
+ error = may_delete(new_dir, new_dentry, new_is_dir);
if (error)
return error;

- if (!old_dir->i_op->rename)
+ if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
return -EPERM;

+ if (flags && !old_dir->i_op->rename2)
+ return -EOPNOTSUPP;
+
/*
* If we are going to change the parent - check write permissions,
* we'll need to flip '..'.
*/
- if (is_dir && new_dir != old_dir) {
- error = inode_permission(source, MAY_WRITE);
- if (error)
- return error;
+ if (new_dir != old_dir) {
+ if (is_dir) {
+ error = inode_permission(source, MAY_WRITE);
+ if (error)
+ return error;
+ }
+ if (!overwrite && new_is_dir) {
+ error = inode_permission(target, MAY_WRITE);
+ if (error)
+ return error;
+ }
}

error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
if (error)
return error;

+ if (!overwrite) {
+ error = security_inode_rename(new_dir, new_dentry,
+ old_dir, old_dentry);
+ if (error)
+ return error;
+ }
+
old_name = fsnotify_oldname_init(old_dentry->d_name.name);
dget(new_dentry);
- if (target)
+ if (overwrite && target)
mutex_lock(&target->i_mutex);

error = -EBUSY;
if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
goto out;

- if (is_dir) {
- unsigned max_links = new_dir->i_sb->s_max_links;
-
+ if (max_links && new_dir != old_dir) {
error = -EMLINK;
- if (max_links && !target && new_dir != old_dir &&
- new_dir->i_nlink >= max_links)
+ if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
goto out;
+ if (!overwrite && !is_dir && new_is_dir &&
+ old_dir->i_nlink > max_links)
+ goto out;
+ }
+
+ if (overwrite && is_dir && target)
+ shrink_dcache_parent(new_dentry);

- if (target)
- shrink_dcache_parent(new_dentry);
+ if (old_dir->i_op->rename2) {
+ error = old_dir->i_op->rename2(old_dir, old_dentry,
+ new_dir, new_dentry, flags);
+ } else {
+ error = old_dir->i_op->rename(old_dir, old_dentry,
+ new_dir, new_dentry);
}

- error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (error)
goto out;

- if (target) {
+ if (overwrite && target) {
if (is_dir)
target->i_flags |= S_DEAD;
dont_mount(new_dentry);
}
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry, new_dentry);
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
+ if (overwrite)
+ d_move(old_dentry, new_dentry);
+ else
+ d_exchange(old_dentry, new_dentry);
+ }
out:
- if (target)
+ if (overwrite && target)
mutex_unlock(&target->i_mutex);
dput(new_dentry);
- if (!error)
+ if (!error) {
fsnotify_move(old_dir, new_dir, old_name, is_dir,
- target, old_dentry);
+ overwrite ? target : NULL, old_dentry);
+ if (!overwrite) {
+ fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
+ new_is_dir, NULL, new_dentry);
+ }
+ }
fsnotify_oldname_free(old_name);

return error;
}

-SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
- int, newdfd, const char __user *, newname)
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
+}
+
+
+SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
+ int, newdfd, const char __user *, newname, unsigned int, flags)
{
struct dentry *old_dir, *new_dir;
struct dentry *old_dentry, *new_dentry;
@@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
struct filename *to;
unsigned int lookup_flags = 0;
bool should_retry = false;
+ bool overwrite = !(flags & RENAME_EXCHANGE);
int error;
+
+ error = -EOPNOTSUPP;
+ if (flags & ~RENAME_EXCHANGE)
+ goto exit;
+
retry:
from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
if (IS_ERR(from)) {
@@ -4091,7 +4143,8 @@ retry:

oldnd.flags &= ~LOOKUP_PARENT;
newnd.flags &= ~LOOKUP_PARENT;
- newnd.flags |= LOOKUP_RENAME_TARGET;
+ if (overwrite)
+ newnd.flags |= LOOKUP_RENAME_TARGET;

trap = lock_rename(new_dir, old_dir);

@@ -4108,7 +4161,7 @@ retry:
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
goto exit4;
- if (newnd.last.name[newnd.last.len])
+ if (overwrite && newnd.last.name[newnd.last.len])
goto exit4;
}
/* source should not be ancestor of target */
@@ -4119,8 +4172,19 @@ retry:
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto exit4;
+ if (!overwrite) {
+ error = -ENOENT;
+ if (!new_dentry->d_inode)
+ goto exit4;
+
+ if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
+ error = -ENOTDIR;
+ if (newnd.last.name[newnd.last.len])
+ goto exit4;
+ }
+ }
/* target should not be an ancestor of source */
- error = -ENOTEMPTY;
+ error = overwrite ? -ENOTEMPTY : -EINVAL;
if (new_dentry == trap)
goto exit5;

@@ -4128,8 +4192,15 @@ retry:
&newnd.path, new_dentry);
if (error)
goto exit5;
- error = vfs_rename(old_dir->d_inode, old_dentry,
- new_dir->d_inode, new_dentry);
+ if (!overwrite) {
+ error = security_path_rename(&newnd.path, new_dentry,
+ &oldnd.path, old_dentry);
+ if (error)
+ goto exit5;
+ }
+
+ error = vfs_rename2(old_dir->d_inode, old_dentry,
+ new_dir->d_inode, new_dentry, flags);
exit5:
dput(new_dentry);
exit4:
@@ -4154,9 +4225,15 @@ exit:
return error;
}

+SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
+ int, newdfd, const char __user *, newname)
+{
+ return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
+}
+
SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
{
- return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
+ return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
}

int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 59066e0..ce5ebed 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);

/* used for rename() and baskets */
extern void d_move(struct dentry *, struct dentry *);
+extern void d_exchange(struct dentry *, struct dentry *);
extern struct dentry *d_ancestor(struct dentry *, struct dentry *);

/* appendix may either be NULL or be used for transname suffixes */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f40547..71c6cf9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1572,6 +1572,8 @@ struct inode_operations {
int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+ int (*rename2) (struct inode *, struct dentry *,
+ struct inode *, struct dentry *, unsigned int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 6c28b61..ebdafb6 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -35,6 +35,8 @@
#define SEEK_HOLE 4 /* seek to the next hole */
#define SEEK_MAX SEEK_HOLE

+#define RENAME_EXCHANGE (1 << 0) /* Exchange source and dest */
+
struct fstrim_range {
__u64 start;
__u64 len;
--
1.8.1.4

2013-10-01 16:00:30

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 1/7] vfs: rename: move d_move() up

From: Miklos Szeredi <[email protected]>

Move the d_move() in vfs_rename_dir() up, similarly to how it's done in
vfs_rename_other(). The next patch will consolidate these two functions
and this is the only structural difference between them.

I'm not sure if doing the d_move() after the dput is even valid. But there
may be a logical explanation for that. But moving the d_move() before the
dput() (and the mutex_unlock()) should definitely not hurt.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/namei.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 645268f..911aadc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4007,13 +4007,12 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
target->i_flags |= S_DEAD;
dont_mount(new_dentry);
}
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+ d_move(old_dentry, new_dentry);
out:
if (target)
mutex_unlock(&target->i_mutex);
dput(new_dentry);
- if (!error)
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry,new_dentry);
return error;
}

--
1.8.1.4

2013-10-01 16:00:39

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 2/7] vfs: rename: use common code for dir and non-dir

From: Miklos Szeredi <[email protected]>

There's actually very little difference between vfs_rename_dir() and
vfs_rename_other() so move both inline into vfs_rename() which still stays
reasonably readable.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/namei.c | 118 ++++++++++++++++++++-----------------------------------------
1 file changed, 39 insertions(+), 79 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 911aadc..7ec6a12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3963,19 +3963,38 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
* ->i_mutex on parents, which works but leads to some truly excessive
* locking].
*/
-static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
{
- int error = 0;
+ int error;
+ const unsigned char *old_name;
+ struct inode *source = old_dentry->d_inode;
struct inode *target = new_dentry->d_inode;
- unsigned max_links = new_dir->i_sb->s_max_links;
+ bool is_dir = S_ISDIR(source->i_mode);
+
+ if (source == target)
+ return 0;
+
+ error = may_delete(old_dir, old_dentry, is_dir);
+ if (error)
+ return error;
+
+ if (!target)
+ error = may_create(new_dir, new_dentry);
+ else
+ error = may_delete(new_dir, new_dentry, is_dir);
+ if (error)
+ return error;
+
+ if (!old_dir->i_op->rename)
+ return -EPERM;

/*
* If we are going to change the parent - check write permissions,
* we'll need to flip '..'.
*/
- if (new_dir != old_dir) {
- error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+ if (is_dir && new_dir != old_dir) {
+ error = inode_permission(source, MAY_WRITE);
if (error)
return error;
}
@@ -3984,6 +4003,7 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
if (error)
return error;

+ old_name = fsnotify_oldname_init(old_dentry->d_name.name);
dget(new_dentry);
if (target)
mutex_lock(&target->i_mutex);
@@ -3992,96 +4012,36 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
goto out;

- error = -EMLINK;
- if (max_links && !target && new_dir != old_dir &&
- new_dir->i_nlink >= max_links)
- goto out;
+ if (is_dir) {
+ unsigned max_links = new_dir->i_sb->s_max_links;

- if (target)
- shrink_dcache_parent(new_dentry);
- error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
- if (error)
- goto out;
+ error = -EMLINK;
+ if (max_links && !target && new_dir != old_dir &&
+ new_dir->i_nlink >= max_links)
+ goto out;

- if (target) {
- target->i_flags |= S_DEAD;
- dont_mount(new_dentry);
+ if (target)
+ shrink_dcache_parent(new_dentry);
}
- if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
- d_move(old_dentry, new_dentry);
-out:
- if (target)
- mutex_unlock(&target->i_mutex);
- dput(new_dentry);
- return error;
-}
-
-static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
-{
- struct inode *target = new_dentry->d_inode;
- int error;
-
- error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
- if (error)
- return error;
-
- dget(new_dentry);
- if (target)
- mutex_lock(&target->i_mutex);
-
- error = -EBUSY;
- if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
- goto out;

error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (error)
goto out;

- if (target)
+ if (target) {
+ if (is_dir)
+ target->i_flags |= S_DEAD;
dont_mount(new_dentry);
+ }
if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
d_move(old_dentry, new_dentry);
out:
if (target)
mutex_unlock(&target->i_mutex);
dput(new_dentry);
- return error;
-}
-
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
-{
- int error;
- int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
- const unsigned char *old_name;
-
- if (old_dentry->d_inode == new_dentry->d_inode)
- return 0;
-
- error = may_delete(old_dir, old_dentry, is_dir);
- if (error)
- return error;
-
- if (!new_dentry->d_inode)
- error = may_create(new_dir, new_dentry);
- else
- error = may_delete(new_dir, new_dentry, is_dir);
- if (error)
- return error;
-
- if (!old_dir->i_op->rename)
- return -EPERM;
-
- old_name = fsnotify_oldname_init(old_dentry->d_name.name);
-
- if (is_dir)
- error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
- else
- error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error)
fsnotify_move(old_dir, new_dir, old_name, is_dir,
- new_dentry->d_inode, old_dentry);
+ target, old_dentry);
fsnotify_oldname_free(old_name);

return error;
--
1.8.1.4

2013-10-01 16:00:21

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars

From: Miklos Szeredi <[email protected]>

Need to split up ext4_rename() into helpers but there are two many local
variables involved, so create a new structure. This also, apparently,
makes the generated code size slightly smaller.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 207 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 110 insertions(+), 97 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1bec5a5..01bd80e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3002,6 +3002,18 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle,
return ext4_get_first_inline_block(inode, parent_de, retval);
}

+struct ext4_renament {
+ struct inode *dir;
+ struct dentry *dentry;
+ struct inode *inode;
+ struct buffer_head *bh;
+ struct buffer_head *dir_bh;
+ struct ext4_dir_entry_2 *de;
+ struct ext4_dir_entry_2 *parent_de;
+ int inlined;
+ int parent_inlined;
+};
+
/*
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
@@ -3014,193 +3026,194 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
handle_t *handle = NULL;
- struct inode *old_inode, *new_inode;
- struct buffer_head *old_bh, *new_bh, *dir_bh;
- struct ext4_dir_entry_2 *old_de, *new_de;
+ struct ext4_renament old = {
+ .dir = old_dir,
+ .dentry = old_dentry,
+ };
+ struct ext4_renament new = {
+ .dir = new_dir,
+ .dentry = new_dentry,
+ };
int retval;
- int inlined = 0, new_inlined = 0;
- struct ext4_dir_entry_2 *parent_de;
-
- dquot_initialize(old_dir);
- dquot_initialize(new_dir);

- old_bh = new_bh = dir_bh = NULL;
+ dquot_initialize(old.dir);
+ dquot_initialize(new.dir);

/* Initialize quotas before so that eventual writes go
* in separate transaction */
- if (new_dentry->d_inode)
- dquot_initialize(new_dentry->d_inode);
+ if (new.dentry->d_inode)
+ dquot_initialize(new.dentry->d_inode);

- old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de, NULL);
+ old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
/*
* Check for inode number is _not_ due to possible IO errors.
* We might rmdir the source, keep it as pwd of some process
* and merrily kill the link to whatever was created under the
* same name. Goodbye sticky bit ;-<
*/
- old_inode = old_dentry->d_inode;
+ old.inode = old.dentry->d_inode;
retval = -ENOENT;
- if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
+ if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
goto end_rename;

- new_inode = new_dentry->d_inode;
- new_bh = ext4_find_entry(new_dir, &new_dentry->d_name,
- &new_de, &new_inlined);
- if (new_bh) {
- if (!new_inode) {
- brelse(new_bh);
- new_bh = NULL;
+ new.inode = new.dentry->d_inode;
+ new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
+ &new.de, &new.inlined);
+ if (new.bh) {
+ if (!new.inode) {
+ brelse(new.bh);
+ new.bh = NULL;
}
}
- if (new_inode && !test_opt(new_dir->i_sb, NO_AUTO_DA_ALLOC))
- ext4_alloc_da_blocks(old_inode);
+ if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
+ ext4_alloc_da_blocks(old.inode);

- handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
- (2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
+ handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
+ (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
if (IS_ERR(handle))
return PTR_ERR(handle);

- if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
+ if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
ext4_handle_sync(handle);

- if (S_ISDIR(old_inode->i_mode)) {
- if (new_inode) {
+ if (S_ISDIR(old.inode->i_mode)) {
+ if (new.inode) {
retval = -ENOTEMPTY;
- if (!empty_dir(new_inode))
+ if (!empty_dir(new.inode))
goto end_rename;
}
retval = -EIO;
- dir_bh = ext4_get_first_dir_block(handle, old_inode,
- &retval, &parent_de,
- &inlined);
- if (!dir_bh)
+ old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
+ &retval, &old.parent_de,
+ &old.parent_inlined);
+ if (!old.dir_bh)
goto end_rename;
- if (le32_to_cpu(parent_de->inode) != old_dir->i_ino)
+ if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
goto end_rename;
retval = -EMLINK;
- if (!new_inode && new_dir != old_dir &&
- EXT4_DIR_LINK_MAX(new_dir))
+ if (!new.inode && new.dir != old.dir &&
+ EXT4_DIR_LINK_MAX(new.dir))
goto end_rename;
- BUFFER_TRACE(dir_bh, "get_write_access");
- retval = ext4_journal_get_write_access(handle, dir_bh);
+ BUFFER_TRACE(old.dir_bh, "get_write_access");
+ retval = ext4_journal_get_write_access(handle, old.dir_bh);
if (retval)
goto end_rename;
}
- if (!new_bh) {
- retval = ext4_add_entry(handle, new_dentry, old_inode);
+ if (!new.bh) {
+ retval = ext4_add_entry(handle, new.dentry, old.inode);
if (retval)
goto end_rename;
} else {
- BUFFER_TRACE(new_bh, "get write access");
- retval = ext4_journal_get_write_access(handle, new_bh);
+ BUFFER_TRACE(new.bh, "get write access");
+ retval = ext4_journal_get_write_access(handle, new.bh);
if (retval)
goto end_rename;
- new_de->inode = cpu_to_le32(old_inode->i_ino);
- if (EXT4_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
+ new.de->inode = cpu_to_le32(old.inode->i_ino);
+ if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
EXT4_FEATURE_INCOMPAT_FILETYPE))
- new_de->file_type = old_de->file_type;
- new_dir->i_version++;
- new_dir->i_ctime = new_dir->i_mtime =
- ext4_current_time(new_dir);
- ext4_mark_inode_dirty(handle, new_dir);
- BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
- if (!new_inlined) {
+ new.de->file_type = old.de->file_type;
+ new.dir->i_version++;
+ new.dir->i_ctime = new.dir->i_mtime =
+ ext4_current_time(new.dir);
+ ext4_mark_inode_dirty(handle, new.dir);
+ BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
+ if (!new.inlined) {
retval = ext4_handle_dirty_dirent_node(handle,
- new_dir, new_bh);
+ new.dir, new.bh);
if (unlikely(retval)) {
- ext4_std_error(new_dir->i_sb, retval);
+ ext4_std_error(new.dir->i_sb, retval);
goto end_rename;
}
}
- brelse(new_bh);
- new_bh = NULL;
+ brelse(new.bh);
+ new.bh = NULL;
}

/*
* Like most other Unix systems, set the ctime for inodes on a
* rename.
*/
- old_inode->i_ctime = ext4_current_time(old_inode);
- ext4_mark_inode_dirty(handle, old_inode);
+ old.inode->i_ctime = ext4_current_time(old.inode);
+ ext4_mark_inode_dirty(handle, old.inode);

/*
* ok, that's it
*/
- if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
- old_de->name_len != old_dentry->d_name.len ||
- strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
- (retval = ext4_delete_entry(handle, old_dir,
- old_de, old_bh)) == -ENOENT) {
- /* old_de could have moved from under us during htree split, so
+ if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
+ old.de->name_len != old.dentry->d_name.len ||
+ strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
+ (retval = ext4_delete_entry(handle, old.dir,
+ old.de, old.bh)) == -ENOENT) {
+ /* old.de could have moved from under us during htree split, so
* make sure that we are deleting the right entry. We might
* also be pointing to a stale entry in the unused part of
- * old_bh so just checking inum and the name isn't enough. */
+ * old.bh so just checking inum and the name isn't enough. */
struct buffer_head *old_bh2;
struct ext4_dir_entry_2 *old_de2;

- old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name,
+ old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
&old_de2, NULL);
if (old_bh2) {
- retval = ext4_delete_entry(handle, old_dir,
+ retval = ext4_delete_entry(handle, old.dir,
old_de2, old_bh2);
brelse(old_bh2);
}
}
if (retval) {
- ext4_warning(old_dir->i_sb,
+ ext4_warning(old.dir->i_sb,
"Deleting old file (%lu), %d, error=%d",
- old_dir->i_ino, old_dir->i_nlink, retval);
+ old.dir->i_ino, old.dir->i_nlink, retval);
}

- if (new_inode) {
- ext4_dec_count(handle, new_inode);
- new_inode->i_ctime = ext4_current_time(new_inode);
+ if (new.inode) {
+ ext4_dec_count(handle, new.inode);
+ new.inode->i_ctime = ext4_current_time(new.inode);
}
- old_dir->i_ctime = old_dir->i_mtime = ext4_current_time(old_dir);
- ext4_update_dx_flag(old_dir);
- if (dir_bh) {
- parent_de->inode = cpu_to_le32(new_dir->i_ino);
- BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
- if (!inlined) {
- if (is_dx(old_inode)) {
+ old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
+ ext4_update_dx_flag(old.dir);
+ if (old.dir_bh) {
+ old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
+ BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
+ if (!old.parent_inlined) {
+ if (is_dx(old.inode)) {
retval = ext4_handle_dirty_dx_node(handle,
- old_inode,
- dir_bh);
+ old.inode,
+ old.dir_bh);
} else {
retval = ext4_handle_dirty_dirent_node(handle,
- old_inode, dir_bh);
+ old.inode, old.dir_bh);
}
} else {
- retval = ext4_mark_inode_dirty(handle, old_inode);
+ retval = ext4_mark_inode_dirty(handle, old.inode);
}
if (retval) {
- ext4_std_error(old_dir->i_sb, retval);
+ ext4_std_error(old.dir->i_sb, retval);
goto end_rename;
}
- ext4_dec_count(handle, old_dir);
- if (new_inode) {
+ ext4_dec_count(handle, old.dir);
+ if (new.inode) {
/* checked empty_dir above, can't have another parent,
* ext4_dec_count() won't work for many-linked dirs */
- clear_nlink(new_inode);
+ clear_nlink(new.inode);
} else {
- ext4_inc_count(handle, new_dir);
- ext4_update_dx_flag(new_dir);
- ext4_mark_inode_dirty(handle, new_dir);
+ ext4_inc_count(handle, new.dir);
+ ext4_update_dx_flag(new.dir);
+ ext4_mark_inode_dirty(handle, new.dir);
}
}
- ext4_mark_inode_dirty(handle, old_dir);
- if (new_inode) {
- ext4_mark_inode_dirty(handle, new_inode);
- if (!new_inode->i_nlink)
- ext4_orphan_add(handle, new_inode);
+ ext4_mark_inode_dirty(handle, old.dir);
+ if (new.inode) {
+ ext4_mark_inode_dirty(handle, new.inode);
+ if (!new.inode->i_nlink)
+ ext4_orphan_add(handle, new.inode);
}
retval = 0;

end_rename:
- brelse(dir_bh);
- brelse(old_bh);
- brelse(new_bh);
+ brelse(old.dir_bh);
+ brelse(old.bh);
+ brelse(new.bh);
if (handle)
ext4_journal_stop(handle);
return retval;
--
1.8.1.4

2013-10-01 16:02:04

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 7/7] ext4: add cross rename support

From: Miklos Szeredi <[email protected]>

Implement RENAME_EXCHANGE flag in renameat2 syscall.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 103 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 73 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fb0f1db..6d87a09 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3141,8 +3141,9 @@ static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
* while new_{dentry,inode) refers to the destination dentry/inode
* This comes from rename(const char *oldpath, const char *newpath)
*/
-static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
{
handle_t *handle = NULL;
struct ext4_renament old = {
@@ -3154,16 +3155,18 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
.dentry = new_dentry,
};
int retval;
+ bool overwrite = !(flags & RENAME_EXCHANGE);

dquot_initialize(old.dir);
dquot_initialize(new.dir);

/* Initialize quotas before so that eventual writes go
* in separate transaction */
- if (new.dentry->d_inode)
+ if (overwrite && new.dentry->d_inode)
dquot_initialize(new.dentry->d_inode);

- old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
+ old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
+ &old.de, &old.inlined);
/*
* Check for inode number is _not_ due to possible IO errors.
* We might rmdir the source, keep it as pwd of some process
@@ -3178,18 +3181,22 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
new.inode = new.dentry->d_inode;
new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
&new.de, &new.inlined);
- if (new.bh) {
- if (!new.inode) {
- brelse(new.bh);
- new.bh = NULL;
+ if (overwrite) {
+ if (new.bh) {
+ if (!new.inode) {
+ brelse(new.bh);
+ new.bh = NULL;
+ }
}
+ if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
+ ext4_alloc_da_blocks(old.inode);
+ } else if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino) {
+ goto end_rename;
}
- if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
- ext4_alloc_da_blocks(old.inode);

handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
+ 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -3197,11 +3204,12 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
ext4_handle_sync(handle);

if (S_ISDIR(old.inode->i_mode)) {
- if (new.inode) {
+ if (overwrite && new.inode) {
retval = -ENOTEMPTY;
if (!empty_dir(new.inode))
goto end_rename;
- } else {
+ }
+ if (!new.inode || !S_ISDIR(new.inode->i_mode)) {
retval = -EMLINK;
if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
goto end_rename;
@@ -3210,15 +3218,34 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (retval)
goto end_rename;
}
+ if (!overwrite && S_ISDIR(new.inode->i_mode)) {
+ if (!S_ISDIR(old.inode->i_mode)) {
+ retval = -EMLINK;
+ if (new.dir != old.dir && EXT4_DIR_LINK_MAX(old.dir))
+ goto end_rename;
+ }
+ retval = ext4_rename_dir_prepare(handle, &new);
+ if (retval)
+ goto end_rename;
+ }
+
if (!new.bh) {
retval = ext4_add_entry(handle, new.dentry, old.inode);
if (retval)
goto end_rename;
} else {
+ u8 new_file_type = new.de->file_type;
retval = ext4_setent(handle, &new,
old.inode->i_ino, old.de->file_type);
if (retval)
goto end_rename;
+
+ if (!overwrite) {
+ retval = ext4_setent(handle, &old,
+ new.inode->i_ino, new_file_type);
+ if (retval)
+ goto end_rename;
+ }
}

/*
@@ -3228,35 +3255,51 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
old.inode->i_ctime = ext4_current_time(old.inode);
ext4_mark_inode_dirty(handle, old.inode);

- /*
- * ok, that's it
- */
- ext4_rename_delete(handle, &old);
+ if (overwrite) {
+ /*
+ * ok, that's it
+ */
+ ext4_rename_delete(handle, &old);

- if (new.inode) {
- ext4_dec_count(handle, new.inode);
- new.inode->i_ctime = ext4_current_time(new.inode);
+ old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
+ ext4_update_dx_flag(old.dir);
}
- old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
- ext4_update_dx_flag(old.dir);
+ /* S_ISDIR(old.inode->i_mode */
if (old.dir_bh) {
retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
if (retval)
goto end_rename;

- ext4_dec_count(handle, old.dir);
- if (new.inode) {
- /* checked empty_dir above, can't have another parent,
- * ext4_dec_count() won't work for many-linked dirs */
- clear_nlink(new.inode);
- } else {
+ if (overwrite || !S_ISDIR(new.inode->i_mode))
+ ext4_dec_count(handle, old.dir);
+
+ if (!new.inode || !S_ISDIR(new.inode->i_mode)) {
ext4_inc_count(handle, new.dir);
ext4_update_dx_flag(new.dir);
ext4_mark_inode_dirty(handle, new.dir);
}
}
+ /* !overwrite && S_ISDIR(new.inode->i_mode */
+ if (new.dir_bh) {
+ retval = ext4_rename_dir_finish(handle, &new, old.dir->i_ino);
+ if (retval)
+ goto end_rename;
+
+ if (!S_ISDIR(old.inode->i_mode)) {
+ ext4_dec_count(handle, new.dir);
+ ext4_inc_count(handle, old.dir);
+ ext4_mark_inode_dirty(handle, new.dir);
+ }
+ }
ext4_mark_inode_dirty(handle, old.dir);
- if (new.inode) {
+ if (overwrite && new.inode) {
+ ext4_dec_count(handle, new.inode);
+ new.inode->i_ctime = ext4_current_time(new.inode);
+ if (S_ISDIR(old.inode->i_mode)) {
+ /* checked empty_dir above, can't have another parent,
+ * ext4_dec_count() won't work for many-linked dirs */
+ clear_nlink(new.inode);
+ }
ext4_mark_inode_dirty(handle, new.inode);
if (!new.inode->i_nlink)
ext4_orphan_add(handle, new.inode);
@@ -3285,7 +3328,7 @@ const struct inode_operations ext4_dir_inode_operations = {
.rmdir = ext4_rmdir,
.mknod = ext4_mknod,
.tmpfile = ext4_tmpfile,
- .rename = ext4_rename,
+ .rename2 = ext4_rename2,
.setattr = ext4_setattr,
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
--
1.8.1.4

2013-10-01 16:02:26

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 6/7] ext4: rename: split out helper functions

From: Miklos Szeredi <[email protected]>

Cross rename (exchange source and dest) will need to call some of these
helpers for both source and dest, while overwriting rename currently only
calls them for one or the other. This also makes the code easier to
follow.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/ext4/namei.c | 199 +++++++++++++++++++++++++++++++++++---------------------
1 file changed, 126 insertions(+), 73 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1348251..fb0f1db 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3014,6 +3014,125 @@ struct ext4_renament {
int parent_inlined;
};

+static int ext4_rename_dir_prepare(handle_t *handle, struct ext4_renament *ent)
+{
+ int retval;
+
+ ent->dir_bh = ext4_get_first_dir_block(handle, ent->inode,
+ &retval, &ent->parent_de,
+ &ent->parent_inlined);
+ if (!ent->dir_bh)
+ return retval;
+ if (le32_to_cpu(ent->parent_de->inode) != ent->dir->i_ino)
+ return -EIO;
+ BUFFER_TRACE(ent->dir_bh, "get_write_access");
+ return ext4_journal_get_write_access(handle, ent->dir_bh);
+}
+
+static int ext4_rename_dir_finish(handle_t *handle, struct ext4_renament *ent,
+ unsigned dir_ino)
+{
+ int retval;
+
+ ent->parent_de->inode = cpu_to_le32(dir_ino);
+ BUFFER_TRACE(ent->dir_bh, "call ext4_handle_dirty_metadata");
+ if (!ent->parent_inlined) {
+ if (is_dx(ent->inode)) {
+ retval = ext4_handle_dirty_dx_node(handle,
+ ent->inode,
+ ent->dir_bh);
+ } else {
+ retval = ext4_handle_dirty_dirent_node(handle,
+ ent->inode,
+ ent->dir_bh);
+ }
+ } else {
+ retval = ext4_mark_inode_dirty(handle, ent->inode);
+ }
+ if (retval) {
+ ext4_std_error(ent->dir->i_sb, retval);
+ return retval;
+ }
+ return 0;
+}
+
+static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
+ unsigned ino, unsigned file_type)
+{
+ int retval;
+
+ BUFFER_TRACE(ent->bh, "get write access");
+ retval = ext4_journal_get_write_access(handle, ent->bh);
+ if (retval)
+ return retval;
+ ent->de->inode = cpu_to_le32(ino);
+ if (EXT4_HAS_INCOMPAT_FEATURE(ent->dir->i_sb,
+ EXT4_FEATURE_INCOMPAT_FILETYPE))
+ ent->de->file_type = file_type;
+ ent->dir->i_version++;
+ ent->dir->i_ctime = ent->dir->i_mtime =
+ ext4_current_time(ent->dir);
+ ext4_mark_inode_dirty(handle, ent->dir);
+ BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
+ if (!ent->inlined) {
+ retval = ext4_handle_dirty_dirent_node(handle,
+ ent->dir, ent->bh);
+ if (unlikely(retval)) {
+ ext4_std_error(ent->dir->i_sb, retval);
+ return retval;
+ }
+ }
+ brelse(ent->bh);
+ ent->bh = NULL;
+
+ return 0;
+}
+
+static int ext4_find_delete_entry(handle_t *handle, struct inode *dir,
+ const struct qstr *d_name)
+{
+ int retval = -ENOENT;
+ struct buffer_head *bh;
+ struct ext4_dir_entry_2 *de;
+
+ bh = ext4_find_entry(dir, d_name, &de, NULL);
+ if (bh) {
+ retval = ext4_delete_entry(handle, dir, de, bh);
+ brelse(bh);
+ }
+ return retval;
+}
+
+static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
+{
+ int retval;
+ /*
+ * ent->de could have moved from under us during htree split, so make
+ * sure that we are deleting the right entry. We might also be pointing
+ * to a stale entry in the unused part of ent->bh so just checking inum
+ * and the name isn't enough.
+ */
+ if (le32_to_cpu(ent->de->inode) != ent->inode->i_ino ||
+ ent->de->name_len != ent->dentry->d_name.len ||
+ strncmp(ent->de->name, ent->dentry->d_name.name,
+ ent->de->name_len)) {
+ retval = ext4_find_delete_entry(handle, ent->dir,
+ &ent->dentry->d_name);
+ } else {
+ retval = ext4_delete_entry(handle, ent->dir, ent->de, ent->bh);
+ if (retval == -ENOENT) {
+ retval = ext4_find_delete_entry(handle, ent->dir,
+ &ent->dentry->d_name);
+ }
+ }
+
+ if (retval) {
+ ext4_warning(ent->dir->i_sb,
+ "Deleting old file (%lu), %d, error=%d",
+ ent->dir->i_ino, ent->dir->i_nlink, retval);
+ }
+}
+
/*
* Anybody can rename anything with this: the permission checks are left to the
* higher-level routines.
@@ -3087,16 +3206,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
goto end_rename;
}
- retval = -EIO;
- old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
- &retval, &old.parent_de,
- &old.parent_inlined);
- if (!old.dir_bh)
- goto end_rename;
- if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
- goto end_rename;
- BUFFER_TRACE(old.dir_bh, "get_write_access");
- retval = ext4_journal_get_write_access(handle, old.dir_bh);
+ retval = ext4_rename_dir_prepare(handle, &old);
if (retval)
goto end_rename;
}
@@ -3105,29 +3215,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (retval)
goto end_rename;
} else {
- BUFFER_TRACE(new.bh, "get write access");
- retval = ext4_journal_get_write_access(handle, new.bh);
+ retval = ext4_setent(handle, &new,
+ old.inode->i_ino, old.de->file_type);
if (retval)
goto end_rename;
- new.de->inode = cpu_to_le32(old.inode->i_ino);
- if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
- EXT4_FEATURE_INCOMPAT_FILETYPE))
- new.de->file_type = old.de->file_type;
- new.dir->i_version++;
- new.dir->i_ctime = new.dir->i_mtime =
- ext4_current_time(new.dir);
- ext4_mark_inode_dirty(handle, new.dir);
- BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
- if (!new.inlined) {
- retval = ext4_handle_dirty_dirent_node(handle,
- new.dir, new.bh);
- if (unlikely(retval)) {
- ext4_std_error(new.dir->i_sb, retval);
- goto end_rename;
- }
- }
- brelse(new.bh);
- new.bh = NULL;
}

/*
@@ -3140,31 +3231,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
/*
* ok, that's it
*/
- if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
- old.de->name_len != old.dentry->d_name.len ||
- strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
- (retval = ext4_delete_entry(handle, old.dir,
- old.de, old.bh)) == -ENOENT) {
- /* old.de could have moved from under us during htree split, so
- * make sure that we are deleting the right entry. We might
- * also be pointing to a stale entry in the unused part of
- * old.bh so just checking inum and the name isn't enough. */
- struct buffer_head *old_bh2;
- struct ext4_dir_entry_2 *old_de2;
-
- old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
- &old_de2, NULL);
- if (old_bh2) {
- retval = ext4_delete_entry(handle, old.dir,
- old_de2, old_bh2);
- brelse(old_bh2);
- }
- }
- if (retval) {
- ext4_warning(old.dir->i_sb,
- "Deleting old file (%lu), %d, error=%d",
- old.dir->i_ino, old.dir->i_nlink, retval);
- }
+ ext4_rename_delete(handle, &old);

if (new.inode) {
ext4_dec_count(handle, new.inode);
@@ -3173,24 +3240,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
ext4_update_dx_flag(old.dir);
if (old.dir_bh) {
- old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
- BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
- if (!old.parent_inlined) {
- if (is_dx(old.inode)) {
- retval = ext4_handle_dirty_dx_node(handle,
- old.inode,
- old.dir_bh);
- } else {
- retval = ext4_handle_dirty_dirent_node(handle,
- old.inode, old.dir_bh);
- }
- } else {
- retval = ext4_mark_inode_dirty(handle, old.inode);
- }
- if (retval) {
- ext4_std_error(old.dir->i_sb, retval);
+ retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
+ if (retval)
goto end_rename;
- }
+
ext4_dec_count(handle, old.dir);
if (new.inode) {
/* checked empty_dir above, can't have another parent,
--
1.8.1.4

2013-10-02 12:01:48

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars

On Tue 01-10-13 18:00:36, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Need to split up ext4_rename() into helpers but there are two many local
> variables involved, so create a new structure. This also, apparently,
> makes the generated code size slightly smaller.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/ext4/namei.c | 207 ++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 110 insertions(+), 97 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 1bec5a5..01bd80e 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3002,6 +3002,18 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle,
> return ext4_get_first_inline_block(inode, parent_de, retval);
> }
>
> +struct ext4_renament {
> + struct inode *dir;
> + struct dentry *dentry;
> + struct inode *inode;
> + struct buffer_head *bh;
> + struct buffer_head *dir_bh;
> + struct ext4_dir_entry_2 *de;
> + struct ext4_dir_entry_2 *parent_de;
> + int inlined;
> + int parent_inlined;
> +};
The patch looks good to me. Only the inline handling looks a bit strange.
The structure have inlined and parent_inlined however you only use
new.inlined and old.parent_inlined. Now new.inlined actually means whether
new.dir is inlined (i.e. the destination directory of rename) and
old.parent_inlined is set when the source for rename is an inlined
directory. So at least the naming looks the other way around to me.

Also as the code currently is including 'inlined' and 'parent_inlined' in
ext4_renament looks like overdoing it a bit. But I haven't checked the
other two patches yet.

Honza
> /*
> * Anybody can rename anything with this: the permission checks are left to the
> * higher-level routines.
> @@ -3014,193 +3026,194 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> struct inode *new_dir, struct dentry *new_dentry)
> {
> handle_t *handle = NULL;
> - struct inode *old_inode, *new_inode;
> - struct buffer_head *old_bh, *new_bh, *dir_bh;
> - struct ext4_dir_entry_2 *old_de, *new_de;
> + struct ext4_renament old = {
> + .dir = old_dir,
> + .dentry = old_dentry,
> + };
> + struct ext4_renament new = {
> + .dir = new_dir,
> + .dentry = new_dentry,
> + };
> int retval;
> - int inlined = 0, new_inlined = 0;
> - struct ext4_dir_entry_2 *parent_de;
> -
> - dquot_initialize(old_dir);
> - dquot_initialize(new_dir);
>
> - old_bh = new_bh = dir_bh = NULL;
> + dquot_initialize(old.dir);
> + dquot_initialize(new.dir);
>
> /* Initialize quotas before so that eventual writes go
> * in separate transaction */
> - if (new_dentry->d_inode)
> - dquot_initialize(new_dentry->d_inode);
> + if (new.dentry->d_inode)
> + dquot_initialize(new.dentry->d_inode);
>
> - old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de, NULL);
> + old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
> /*
> * Check for inode number is _not_ due to possible IO errors.
> * We might rmdir the source, keep it as pwd of some process
> * and merrily kill the link to whatever was created under the
> * same name. Goodbye sticky bit ;-<
> */
> - old_inode = old_dentry->d_inode;
> + old.inode = old.dentry->d_inode;
> retval = -ENOENT;
> - if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
> + if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
> goto end_rename;
>
> - new_inode = new_dentry->d_inode;
> - new_bh = ext4_find_entry(new_dir, &new_dentry->d_name,
> - &new_de, &new_inlined);
> - if (new_bh) {
> - if (!new_inode) {
> - brelse(new_bh);
> - new_bh = NULL;
> + new.inode = new.dentry->d_inode;
> + new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
> + &new.de, &new.inlined);
> + if (new.bh) {
> + if (!new.inode) {
> + brelse(new.bh);
> + new.bh = NULL;
> }
> }
> - if (new_inode && !test_opt(new_dir->i_sb, NO_AUTO_DA_ALLOC))
> - ext4_alloc_da_blocks(old_inode);
> + if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
> + ext4_alloc_da_blocks(old.inode);
>
> - handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
> - (2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
> + handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
> + (2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
> EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
> - if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
> + if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
> ext4_handle_sync(handle);
>
> - if (S_ISDIR(old_inode->i_mode)) {
> - if (new_inode) {
> + if (S_ISDIR(old.inode->i_mode)) {
> + if (new.inode) {
> retval = -ENOTEMPTY;
> - if (!empty_dir(new_inode))
> + if (!empty_dir(new.inode))
> goto end_rename;
> }
> retval = -EIO;
> - dir_bh = ext4_get_first_dir_block(handle, old_inode,
> - &retval, &parent_de,
> - &inlined);
> - if (!dir_bh)
> + old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> + &retval, &old.parent_de,
> + &old.parent_inlined);
> + if (!old.dir_bh)
> goto end_rename;
> - if (le32_to_cpu(parent_de->inode) != old_dir->i_ino)
> + if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
> goto end_rename;
> retval = -EMLINK;
> - if (!new_inode && new_dir != old_dir &&
> - EXT4_DIR_LINK_MAX(new_dir))
> + if (!new.inode && new.dir != old.dir &&
> + EXT4_DIR_LINK_MAX(new.dir))
> goto end_rename;
> - BUFFER_TRACE(dir_bh, "get_write_access");
> - retval = ext4_journal_get_write_access(handle, dir_bh);
> + BUFFER_TRACE(old.dir_bh, "get_write_access");
> + retval = ext4_journal_get_write_access(handle, old.dir_bh);
> if (retval)
> goto end_rename;
> }
> - if (!new_bh) {
> - retval = ext4_add_entry(handle, new_dentry, old_inode);
> + if (!new.bh) {
> + retval = ext4_add_entry(handle, new.dentry, old.inode);
> if (retval)
> goto end_rename;
> } else {
> - BUFFER_TRACE(new_bh, "get write access");
> - retval = ext4_journal_get_write_access(handle, new_bh);
> + BUFFER_TRACE(new.bh, "get write access");
> + retval = ext4_journal_get_write_access(handle, new.bh);
> if (retval)
> goto end_rename;
> - new_de->inode = cpu_to_le32(old_inode->i_ino);
> - if (EXT4_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
> + new.de->inode = cpu_to_le32(old.inode->i_ino);
> + if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
> EXT4_FEATURE_INCOMPAT_FILETYPE))
> - new_de->file_type = old_de->file_type;
> - new_dir->i_version++;
> - new_dir->i_ctime = new_dir->i_mtime =
> - ext4_current_time(new_dir);
> - ext4_mark_inode_dirty(handle, new_dir);
> - BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
> - if (!new_inlined) {
> + new.de->file_type = old.de->file_type;
> + new.dir->i_version++;
> + new.dir->i_ctime = new.dir->i_mtime =
> + ext4_current_time(new.dir);
> + ext4_mark_inode_dirty(handle, new.dir);
> + BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
> + if (!new.inlined) {
> retval = ext4_handle_dirty_dirent_node(handle,
> - new_dir, new_bh);
> + new.dir, new.bh);
> if (unlikely(retval)) {
> - ext4_std_error(new_dir->i_sb, retval);
> + ext4_std_error(new.dir->i_sb, retval);
> goto end_rename;
> }
> }
> - brelse(new_bh);
> - new_bh = NULL;
> + brelse(new.bh);
> + new.bh = NULL;
> }
>
> /*
> * Like most other Unix systems, set the ctime for inodes on a
> * rename.
> */
> - old_inode->i_ctime = ext4_current_time(old_inode);
> - ext4_mark_inode_dirty(handle, old_inode);
> + old.inode->i_ctime = ext4_current_time(old.inode);
> + ext4_mark_inode_dirty(handle, old.inode);
>
> /*
> * ok, that's it
> */
> - if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
> - old_de->name_len != old_dentry->d_name.len ||
> - strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
> - (retval = ext4_delete_entry(handle, old_dir,
> - old_de, old_bh)) == -ENOENT) {
> - /* old_de could have moved from under us during htree split, so
> + if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
> + old.de->name_len != old.dentry->d_name.len ||
> + strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
> + (retval = ext4_delete_entry(handle, old.dir,
> + old.de, old.bh)) == -ENOENT) {
> + /* old.de could have moved from under us during htree split, so
> * make sure that we are deleting the right entry. We might
> * also be pointing to a stale entry in the unused part of
> - * old_bh so just checking inum and the name isn't enough. */
> + * old.bh so just checking inum and the name isn't enough. */
> struct buffer_head *old_bh2;
> struct ext4_dir_entry_2 *old_de2;
>
> - old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name,
> + old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
> &old_de2, NULL);
> if (old_bh2) {
> - retval = ext4_delete_entry(handle, old_dir,
> + retval = ext4_delete_entry(handle, old.dir,
> old_de2, old_bh2);
> brelse(old_bh2);
> }
> }
> if (retval) {
> - ext4_warning(old_dir->i_sb,
> + ext4_warning(old.dir->i_sb,
> "Deleting old file (%lu), %d, error=%d",
> - old_dir->i_ino, old_dir->i_nlink, retval);
> + old.dir->i_ino, old.dir->i_nlink, retval);
> }
>
> - if (new_inode) {
> - ext4_dec_count(handle, new_inode);
> - new_inode->i_ctime = ext4_current_time(new_inode);
> + if (new.inode) {
> + ext4_dec_count(handle, new.inode);
> + new.inode->i_ctime = ext4_current_time(new.inode);
> }
> - old_dir->i_ctime = old_dir->i_mtime = ext4_current_time(old_dir);
> - ext4_update_dx_flag(old_dir);
> - if (dir_bh) {
> - parent_de->inode = cpu_to_le32(new_dir->i_ino);
> - BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
> - if (!inlined) {
> - if (is_dx(old_inode)) {
> + old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
> + ext4_update_dx_flag(old.dir);
> + if (old.dir_bh) {
> + old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
> + BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
> + if (!old.parent_inlined) {
> + if (is_dx(old.inode)) {
> retval = ext4_handle_dirty_dx_node(handle,
> - old_inode,
> - dir_bh);
> + old.inode,
> + old.dir_bh);
> } else {
> retval = ext4_handle_dirty_dirent_node(handle,
> - old_inode, dir_bh);
> + old.inode, old.dir_bh);
> }
> } else {
> - retval = ext4_mark_inode_dirty(handle, old_inode);
> + retval = ext4_mark_inode_dirty(handle, old.inode);
> }
> if (retval) {
> - ext4_std_error(old_dir->i_sb, retval);
> + ext4_std_error(old.dir->i_sb, retval);
> goto end_rename;
> }
> - ext4_dec_count(handle, old_dir);
> - if (new_inode) {
> + ext4_dec_count(handle, old.dir);
> + if (new.inode) {
> /* checked empty_dir above, can't have another parent,
> * ext4_dec_count() won't work for many-linked dirs */
> - clear_nlink(new_inode);
> + clear_nlink(new.inode);
> } else {
> - ext4_inc_count(handle, new_dir);
> - ext4_update_dx_flag(new_dir);
> - ext4_mark_inode_dirty(handle, new_dir);
> + ext4_inc_count(handle, new.dir);
> + ext4_update_dx_flag(new.dir);
> + ext4_mark_inode_dirty(handle, new.dir);
> }
> }
> - ext4_mark_inode_dirty(handle, old_dir);
> - if (new_inode) {
> - ext4_mark_inode_dirty(handle, new_inode);
> - if (!new_inode->i_nlink)
> - ext4_orphan_add(handle, new_inode);
> + ext4_mark_inode_dirty(handle, old.dir);
> + if (new.inode) {
> + ext4_mark_inode_dirty(handle, new.inode);
> + if (!new.inode->i_nlink)
> + ext4_orphan_add(handle, new.inode);
> }
> retval = 0;
>
> end_rename:
> - brelse(dir_bh);
> - brelse(old_bh);
> - brelse(new_bh);
> + brelse(old.dir_bh);
> + brelse(old.bh);
> + brelse(new.bh);
> if (handle)
> ext4_journal_stop(handle);
> return retval;
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-10-02 12:06:02

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 5/7] ext4: rename: move EMLINK check up

On Tue 01-10-13 18:00:37, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Move checking i_nlink from after ext4_get_first_dir_block() to before. The
> check doesn't rely on the result of that function and the function only
> fails on fs corruption, so the order shouldn't matter.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
This looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/namei.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 01bd80e..1348251 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3082,6 +3082,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> retval = -ENOTEMPTY;
> if (!empty_dir(new.inode))
> goto end_rename;
> + } else {
> + retval = -EMLINK;
> + if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
> + goto end_rename;
> }
> retval = -EIO;
> old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> @@ -3091,10 +3095,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> goto end_rename;
> if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
> goto end_rename;
> - retval = -EMLINK;
> - if (!new.inode && new.dir != old.dir &&
> - EXT4_DIR_LINK_MAX(new.dir))
> - goto end_rename;
> BUFFER_TRACE(old.dir_bh, "get_write_access");
> retval = ext4_journal_get_write_access(handle, old.dir_bh);
> if (retval)
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-10-02 12:19:32

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 6/7] ext4: rename: split out helper functions

On Tue 01-10-13 18:00:38, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Cross rename (exchange source and dest) will need to call some of these
> helpers for both source and dest, while overwriting rename currently only
> calls them for one or the other. This also makes the code easier to
> follow.
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/ext4/namei.c | 199 +++++++++++++++++++++++++++++++++++---------------------
> 1 file changed, 126 insertions(+), 73 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 1348251..fb0f1db 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3014,6 +3014,125 @@ struct ext4_renament {
> int parent_inlined;
> };
>
> +static int ext4_rename_dir_prepare(handle_t *handle, struct ext4_renament *ent)
> +{
> + int retval;
> +
> + ent->dir_bh = ext4_get_first_dir_block(handle, ent->inode,
> + &retval, &ent->parent_de,
> + &ent->parent_inlined);
> + if (!ent->dir_bh)
> + return retval;
> + if (le32_to_cpu(ent->parent_de->inode) != ent->dir->i_ino)
> + return -EIO;
> + BUFFER_TRACE(ent->dir_bh, "get_write_access");
> + return ext4_journal_get_write_access(handle, ent->dir_bh);
> +}
> +
> +static int ext4_rename_dir_finish(handle_t *handle, struct ext4_renament *ent,
> + unsigned dir_ino)
> +{
> + int retval;
> +
> + ent->parent_de->inode = cpu_to_le32(dir_ino);
> + BUFFER_TRACE(ent->dir_bh, "call ext4_handle_dirty_metadata");
> + if (!ent->parent_inlined) {
> + if (is_dx(ent->inode)) {
> + retval = ext4_handle_dirty_dx_node(handle,
> + ent->inode,
> + ent->dir_bh);
> + } else {
> + retval = ext4_handle_dirty_dirent_node(handle,
> + ent->inode,
> + ent->dir_bh);
> + }
> + } else {
> + retval = ext4_mark_inode_dirty(handle, ent->inode);
> + }
> + if (retval) {
> + ext4_std_error(ent->dir->i_sb, retval);
> + return retval;
> + }
> + return 0;
> +}
> +
> +static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
> + unsigned ino, unsigned file_type)
> +{
> + int retval;
> +
> + BUFFER_TRACE(ent->bh, "get write access");
> + retval = ext4_journal_get_write_access(handle, ent->bh);
> + if (retval)
> + return retval;
> + ent->de->inode = cpu_to_le32(ino);
> + if (EXT4_HAS_INCOMPAT_FEATURE(ent->dir->i_sb,
> + EXT4_FEATURE_INCOMPAT_FILETYPE))
> + ent->de->file_type = file_type;
> + ent->dir->i_version++;
> + ent->dir->i_ctime = ent->dir->i_mtime =
> + ext4_current_time(ent->dir);
> + ext4_mark_inode_dirty(handle, ent->dir);
> + BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
> + if (!ent->inlined) {
> + retval = ext4_handle_dirty_dirent_node(handle,
> + ent->dir, ent->bh);
> + if (unlikely(retval)) {
> + ext4_std_error(ent->dir->i_sb, retval);
> + return retval;
> + }
> + }
> + brelse(ent->bh);
> + ent->bh = NULL;
> +
> + return 0;
> +}
> +
> +static int ext4_find_delete_entry(handle_t *handle, struct inode *dir,
> + const struct qstr *d_name)
> +{
> + int retval = -ENOENT;
> + struct buffer_head *bh;
> + struct ext4_dir_entry_2 *de;
> +
> + bh = ext4_find_entry(dir, d_name, &de, NULL);
> + if (bh) {
> + retval = ext4_delete_entry(handle, dir, de, bh);
> + brelse(bh);
> + }
> + return retval;
> +}
> +
> +static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
> +{
> + int retval;
> + /*
> + * ent->de could have moved from under us during htree split, so make
> + * sure that we are deleting the right entry. We might also be pointing
> + * to a stale entry in the unused part of ent->bh so just checking inum
> + * and the name isn't enough.
> + */
> + if (le32_to_cpu(ent->de->inode) != ent->inode->i_ino ||
> + ent->de->name_len != ent->dentry->d_name.len ||
> + strncmp(ent->de->name, ent->dentry->d_name.name,
> + ent->de->name_len)) {
> + retval = ext4_find_delete_entry(handle, ent->dir,
> + &ent->dentry->d_name);
> + } else {
> + retval = ext4_delete_entry(handle, ent->dir, ent->de, ent->bh);
> + if (retval == -ENOENT) {
> + retval = ext4_find_delete_entry(handle, ent->dir,
> + &ent->dentry->d_name);
> + }
> + }
> +
> + if (retval) {
> + ext4_warning(ent->dir->i_sb,
> + "Deleting old file (%lu), %d, error=%d",
> + ent->dir->i_ino, ent->dir->i_nlink, retval);
> + }
> +}
> +
> /*
> * Anybody can rename anything with this: the permission checks are left to the
> * higher-level routines.
> @@ -3087,16 +3206,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
> goto end_rename;
> }
> - retval = -EIO;
> - old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> - &retval, &old.parent_de,
> - &old.parent_inlined);
> - if (!old.dir_bh)
> - goto end_rename;
> - if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
> - goto end_rename;
> - BUFFER_TRACE(old.dir_bh, "get_write_access");
> - retval = ext4_journal_get_write_access(handle, old.dir_bh);
> + retval = ext4_rename_dir_prepare(handle, &old);
> if (retval)
> goto end_rename;
> }
> @@ -3105,29 +3215,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> if (retval)
> goto end_rename;
> } else {
> - BUFFER_TRACE(new.bh, "get write access");
> - retval = ext4_journal_get_write_access(handle, new.bh);
> + retval = ext4_setent(handle, &new,
> + old.inode->i_ino, old.de->file_type);
> if (retval)
> goto end_rename;
> - new.de->inode = cpu_to_le32(old.inode->i_ino);
> - if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
> - EXT4_FEATURE_INCOMPAT_FILETYPE))
> - new.de->file_type = old.de->file_type;
> - new.dir->i_version++;
> - new.dir->i_ctime = new.dir->i_mtime =
> - ext4_current_time(new.dir);
> - ext4_mark_inode_dirty(handle, new.dir);
> - BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
> - if (!new.inlined) {
> - retval = ext4_handle_dirty_dirent_node(handle,
> - new.dir, new.bh);
> - if (unlikely(retval)) {
> - ext4_std_error(new.dir->i_sb, retval);
> - goto end_rename;
> - }
> - }
> - brelse(new.bh);
> - new.bh = NULL;
> }
>
> /*
> @@ -3140,31 +3231,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> /*
> * ok, that's it
> */
> - if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
> - old.de->name_len != old.dentry->d_name.len ||
> - strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
> - (retval = ext4_delete_entry(handle, old.dir,
> - old.de, old.bh)) == -ENOENT) {
> - /* old.de could have moved from under us during htree split, so
> - * make sure that we are deleting the right entry. We might
> - * also be pointing to a stale entry in the unused part of
> - * old.bh so just checking inum and the name isn't enough. */
> - struct buffer_head *old_bh2;
> - struct ext4_dir_entry_2 *old_de2;
> -
> - old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
> - &old_de2, NULL);
> - if (old_bh2) {
> - retval = ext4_delete_entry(handle, old.dir,
> - old_de2, old_bh2);
> - brelse(old_bh2);
> - }
> - }
> - if (retval) {
> - ext4_warning(old.dir->i_sb,
> - "Deleting old file (%lu), %d, error=%d",
> - old.dir->i_ino, old.dir->i_nlink, retval);
> - }
> + ext4_rename_delete(handle, &old);
>
> if (new.inode) {
> ext4_dec_count(handle, new.inode);
> @@ -3173,24 +3240,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
> ext4_update_dx_flag(old.dir);
> if (old.dir_bh) {
> - old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
> - BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
> - if (!old.parent_inlined) {
> - if (is_dx(old.inode)) {
> - retval = ext4_handle_dirty_dx_node(handle,
> - old.inode,
> - old.dir_bh);
> - } else {
> - retval = ext4_handle_dirty_dirent_node(handle,
> - old.inode, old.dir_bh);
> - }
> - } else {
> - retval = ext4_mark_inode_dirty(handle, old.inode);
> - }
> - if (retval) {
> - ext4_std_error(old.dir->i_sb, retval);
> + retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
> + if (retval)
> goto end_rename;
> - }
> +
> ext4_dec_count(handle, old.dir);
> if (new.inode) {
> /* checked empty_dir above, can't have another parent,
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-10-02 12:27:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 3/7] vfs: add renameat2 syscall and cross-rename

On Tue 01-10-13 18:00:35, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> Add new renameat2 syscall, which is the same as renameat with an added
> flags argument.
>
> If flags is zero then this is a plain overwriting rename. If flags contain
> RENAME_EXCHANGE then exchange source and destination files. There's no
> restriction on the type of the files; e.g. a directory can be exchanged
> with a symlink.
It's not completely clear to me what should happen if RENAME_EXCHANGE is
set but destination doesn't exist. Return -ENOENT or just do standard
rename?

Honza
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> arch/x86/syscalls/syscall_64.tbl | 1 +
> fs/dcache.c | 46 ++++++++++---
> fs/namei.c | 139 ++++++++++++++++++++++++++++++---------
> include/linux/dcache.h | 1 +
> include/linux/fs.h | 2 +
> include/uapi/linux/fs.h | 2 +
> 6 files changed, 152 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 38ae65d..fafd734 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -320,6 +320,7 @@
> 311 64 process_vm_writev sys_process_vm_writev
> 312 common kcmp sys_kcmp
> 313 common finit_module sys_finit_module
> +314 common renameat2 sys_renameat2
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 4100030..1735bac 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
> dentry->d_name.name = dentry->d_iname;
> } else {
> /*
> - * Both are internal. Just copy target to dentry
> + * Both are internal.
> */
> - memcpy(dentry->d_iname, target->d_name.name,
> - target->d_name.len + 1);
> - dentry->d_name.len = target->d_name.len;
> - return;
> + unsigned int i;
> + BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> + for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> + swap(((long *) &dentry->d_iname)[i],
> + ((long *) &target->d_iname)[i]);
> + }
> }
> }
> swap(dentry->d_name.len, target->d_name.len);
> @@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
> * __d_move - move a dentry
> * @dentry: entry to move
> * @target: new dentry
> + * @exchange: exchange the two dentries
> *
> * Update the dcache to reflect the move of a file name. Negative
> * dcache entries should not be moved in this way. Caller must hold
> * rename_lock, the i_mutex of the source and target directories,
> * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
> */
> -static void __d_move(struct dentry * dentry, struct dentry * target)
> +static void __d_move(struct dentry *dentry, struct dentry *target,
> + bool exchange)
> {
> if (!dentry->d_inode)
> printk(KERN_WARNING "VFS: moving negative dcache entry\n");
> @@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>
> /* Unhash the target: dput() will then get rid of it */
> __d_drop(target);
> + if (exchange) {
> + __d_rehash(target,
> + d_hash(dentry->d_parent, dentry->d_name.hash));
> + }
>
> list_del(&dentry->d_u.d_child);
> list_del(&target->d_u.d_child);
> @@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
> write_seqcount_end(&dentry->d_seq);
>
> dentry_unlock_parents_for_move(dentry, target);
> + if (exchange)
> + fsnotify_d_move(target);
> spin_unlock(&target->d_lock);
> fsnotify_d_move(dentry);
> spin_unlock(&dentry->d_lock);
> @@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
> void d_move(struct dentry *dentry, struct dentry *target)
> {
> write_seqlock(&rename_lock);
> - __d_move(dentry, target);
> + __d_move(dentry, target, false);
> write_sequnlock(&rename_lock);
> }
> EXPORT_SYMBOL(d_move);
>
> +/*
> + * d_exchange - exchange two dentries
> + * @dentry1: first dentry
> + * @dentry2: second dentry
> + */
> +void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
> +{
> + write_seqlock(&rename_lock);
> +
> + WARN_ON(!dentry1->d_inode);
> + WARN_ON(!dentry2->d_inode);
> + WARN_ON(IS_ROOT(dentry1));
> + WARN_ON(IS_ROOT(dentry2));
> +
> + __d_move(dentry1, dentry2, true);
> +
> + write_sequnlock(&rename_lock);
> +}
> +
> +
> /**
> * d_ancestor - search for an ancestor
> * @p1: ancestor dentry
> @@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
> m2 = &alias->d_parent->d_inode->i_mutex;
> out_unalias:
> if (likely(!d_mountpoint(alias))) {
> - __d_move(alias, dentry);
> + __d_move(alias, dentry, false);
> ret = alias;
> }
> out_err:
> diff --git a/fs/namei.c b/fs/namei.c
> index 7ec6a12..55700b3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
> * ->i_mutex on parents, which works but leads to some truly excessive
> * locking].
> */
> -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> - struct inode *new_dir, struct dentry *new_dentry)
> +static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry,
> + unsigned int flags)
> {
> int error;
> const unsigned char *old_name;
> struct inode *source = old_dentry->d_inode;
> struct inode *target = new_dentry->d_inode;
> bool is_dir = S_ISDIR(source->i_mode);
> + bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
> + bool overwrite = !(flags & RENAME_EXCHANGE);
> + unsigned max_links = new_dir->i_sb->s_max_links;
>
> if (source == target)
> return 0;
> @@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>
> if (!target)
> error = may_create(new_dir, new_dentry);
> - else
> + else if (overwrite)
> error = may_delete(new_dir, new_dentry, is_dir);
> + else
> + error = may_delete(new_dir, new_dentry, new_is_dir);
> if (error)
> return error;
>
> - if (!old_dir->i_op->rename)
> + if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
> return -EPERM;
>
> + if (flags && !old_dir->i_op->rename2)
> + return -EOPNOTSUPP;
> +
> /*
> * If we are going to change the parent - check write permissions,
> * we'll need to flip '..'.
> */
> - if (is_dir && new_dir != old_dir) {
> - error = inode_permission(source, MAY_WRITE);
> - if (error)
> - return error;
> + if (new_dir != old_dir) {
> + if (is_dir) {
> + error = inode_permission(source, MAY_WRITE);
> + if (error)
> + return error;
> + }
> + if (!overwrite && new_is_dir) {
> + error = inode_permission(target, MAY_WRITE);
> + if (error)
> + return error;
> + }
> }
>
> error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
> if (error)
> return error;
>
> + if (!overwrite) {
> + error = security_inode_rename(new_dir, new_dentry,
> + old_dir, old_dentry);
> + if (error)
> + return error;
> + }
> +
> old_name = fsnotify_oldname_init(old_dentry->d_name.name);
> dget(new_dentry);
> - if (target)
> + if (overwrite && target)
> mutex_lock(&target->i_mutex);
>
> error = -EBUSY;
> if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
> goto out;
>
> - if (is_dir) {
> - unsigned max_links = new_dir->i_sb->s_max_links;
> -
> + if (max_links && new_dir != old_dir) {
> error = -EMLINK;
> - if (max_links && !target && new_dir != old_dir &&
> - new_dir->i_nlink >= max_links)
> + if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
> goto out;
> + if (!overwrite && !is_dir && new_is_dir &&
> + old_dir->i_nlink > max_links)
> + goto out;
> + }
> +
> + if (overwrite && is_dir && target)
> + shrink_dcache_parent(new_dentry);
>
> - if (target)
> - shrink_dcache_parent(new_dentry);
> + if (old_dir->i_op->rename2) {
> + error = old_dir->i_op->rename2(old_dir, old_dentry,
> + new_dir, new_dentry, flags);
> + } else {
> + error = old_dir->i_op->rename(old_dir, old_dentry,
> + new_dir, new_dentry);
> }
>
> - error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
> if (error)
> goto out;
>
> - if (target) {
> + if (overwrite && target) {
> if (is_dir)
> target->i_flags |= S_DEAD;
> dont_mount(new_dentry);
> }
> - if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
> - d_move(old_dentry, new_dentry);
> + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
> + if (overwrite)
> + d_move(old_dentry, new_dentry);
> + else
> + d_exchange(old_dentry, new_dentry);
> + }
> out:
> - if (target)
> + if (overwrite && target)
> mutex_unlock(&target->i_mutex);
> dput(new_dentry);
> - if (!error)
> + if (!error) {
> fsnotify_move(old_dir, new_dir, old_name, is_dir,
> - target, old_dentry);
> + overwrite ? target : NULL, old_dentry);
> + if (!overwrite) {
> + fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
> + new_is_dir, NULL, new_dentry);
> + }
> + }
> fsnotify_oldname_free(old_name);
>
> return error;
> }
>
> -SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> - int, newdfd, const char __user *, newname)
> +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> +{
> + return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
> +}
> +
> +
> +SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
> + int, newdfd, const char __user *, newname, unsigned int, flags)
> {
> struct dentry *old_dir, *new_dir;
> struct dentry *old_dentry, *new_dentry;
> @@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> struct filename *to;
> unsigned int lookup_flags = 0;
> bool should_retry = false;
> + bool overwrite = !(flags & RENAME_EXCHANGE);
> int error;
> +
> + error = -EOPNOTSUPP;
> + if (flags & ~RENAME_EXCHANGE)
> + goto exit;
> +
> retry:
> from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
> if (IS_ERR(from)) {
> @@ -4091,7 +4143,8 @@ retry:
>
> oldnd.flags &= ~LOOKUP_PARENT;
> newnd.flags &= ~LOOKUP_PARENT;
> - newnd.flags |= LOOKUP_RENAME_TARGET;
> + if (overwrite)
> + newnd.flags |= LOOKUP_RENAME_TARGET;
>
> trap = lock_rename(new_dir, old_dir);
>
> @@ -4108,7 +4161,7 @@ retry:
> error = -ENOTDIR;
> if (oldnd.last.name[oldnd.last.len])
> goto exit4;
> - if (newnd.last.name[newnd.last.len])
> + if (overwrite && newnd.last.name[newnd.last.len])
> goto exit4;
> }
> /* source should not be ancestor of target */
> @@ -4119,8 +4172,19 @@ retry:
> error = PTR_ERR(new_dentry);
> if (IS_ERR(new_dentry))
> goto exit4;
> + if (!overwrite) {
> + error = -ENOENT;
> + if (!new_dentry->d_inode)
> + goto exit4;
> +
> + if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
> + error = -ENOTDIR;
> + if (newnd.last.name[newnd.last.len])
> + goto exit4;
> + }
> + }
> /* target should not be an ancestor of source */
> - error = -ENOTEMPTY;
> + error = overwrite ? -ENOTEMPTY : -EINVAL;
> if (new_dentry == trap)
> goto exit5;
>
> @@ -4128,8 +4192,15 @@ retry:
> &newnd.path, new_dentry);
> if (error)
> goto exit5;
> - error = vfs_rename(old_dir->d_inode, old_dentry,
> - new_dir->d_inode, new_dentry);
> + if (!overwrite) {
> + error = security_path_rename(&newnd.path, new_dentry,
> + &oldnd.path, old_dentry);
> + if (error)
> + goto exit5;
> + }
> +
> + error = vfs_rename2(old_dir->d_inode, old_dentry,
> + new_dir->d_inode, new_dentry, flags);
> exit5:
> dput(new_dentry);
> exit4:
> @@ -4154,9 +4225,15 @@ exit:
> return error;
> }
>
> +SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> + int, newdfd, const char __user *, newname)
> +{
> + return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
> +}
> +
> SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
> {
> - return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
> + return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
> }
>
> int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 59066e0..ce5ebed 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);
>
> /* used for rename() and baskets */
> extern void d_move(struct dentry *, struct dentry *);
> +extern void d_exchange(struct dentry *, struct dentry *);
> extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
>
> /* appendix may either be NULL or be used for transname suffixes */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3f40547..71c6cf9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1572,6 +1572,8 @@ struct inode_operations {
> int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
> int (*rename) (struct inode *, struct dentry *,
> struct inode *, struct dentry *);
> + int (*rename2) (struct inode *, struct dentry *,
> + struct inode *, struct dentry *, unsigned int);
> int (*setattr) (struct dentry *, struct iattr *);
> int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
> int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 6c28b61..ebdafb6 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -35,6 +35,8 @@
> #define SEEK_HOLE 4 /* seek to the next hole */
> #define SEEK_MAX SEEK_HOLE
>
> +#define RENAME_EXCHANGE (1 << 0) /* Exchange source and dest */
> +
> struct fstrim_range {
> __u64 start;
> __u64 len;
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-10-02 14:59:40

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 3/7] vfs: add renameat2 syscall and cross-rename

On Wed, Oct 2, 2013 at 2:26 PM, Jan Kara <[email protected]> wrote:
> On Tue 01-10-13 18:00:35, Miklos Szeredi wrote:
>> From: Miklos Szeredi <[email protected]>
>>
>> Add new renameat2 syscall, which is the same as renameat with an added
>> flags argument.
>>
>> If flags is zero then this is a plain overwriting rename. If flags contain
>> RENAME_EXCHANGE then exchange source and destination files. There's no
>> restriction on the type of the files; e.g. a directory can be exchanged
>> with a symlink.
> It's not completely clear to me what should happen if RENAME_EXCHANGE is
> set but destination doesn't exist. Return -ENOENT or just do standard
> rename?

For RENAME_EXCHANGE the vfs part should ensure this doesn't happen
(it's -ENOENT if either of the dentries are negative).

Another interesting case, which is disallowed for now, is if one of
the dentries is unlinked, e.g. generated by ->tmpfile(). That one
might be useful and the semantics are well defined: the unlinked one
gets linked and the linked one gets unlinked. But it would need
additional code in vfs and filesystems, and for now I'd rather keep
this as simple as possible.

Thanks,
Miklos

>
> Honza
>>
>> Signed-off-by: Miklos Szeredi <[email protected]>
>> ---
>> arch/x86/syscalls/syscall_64.tbl | 1 +
>> fs/dcache.c | 46 ++++++++++---
>> fs/namei.c | 139 ++++++++++++++++++++++++++++++---------
>> include/linux/dcache.h | 1 +
>> include/linux/fs.h | 2 +
>> include/uapi/linux/fs.h | 2 +
>> 6 files changed, 152 insertions(+), 39 deletions(-)
>>
>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>> index 38ae65d..fafd734 100644
>> --- a/arch/x86/syscalls/syscall_64.tbl
>> +++ b/arch/x86/syscalls/syscall_64.tbl
>> @@ -320,6 +320,7 @@
>> 311 64 process_vm_writev sys_process_vm_writev
>> 312 common kcmp sys_kcmp
>> 313 common finit_module sys_finit_module
>> +314 common renameat2 sys_renameat2
>>
>> #
>> # x32-specific system call numbers start at 512 to avoid cache impact
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index 4100030..1735bac 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
>> dentry->d_name.name = dentry->d_iname;
>> } else {
>> /*
>> - * Both are internal. Just copy target to dentry
>> + * Both are internal.
>> */
>> - memcpy(dentry->d_iname, target->d_name.name,
>> - target->d_name.len + 1);
>> - dentry->d_name.len = target->d_name.len;
>> - return;
>> + unsigned int i;
>> + BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
>> + for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>> + swap(((long *) &dentry->d_iname)[i],
>> + ((long *) &target->d_iname)[i]);
>> + }
>> }
>> }
>> swap(dentry->d_name.len, target->d_name.len);
>> @@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
>> * __d_move - move a dentry
>> * @dentry: entry to move
>> * @target: new dentry
>> + * @exchange: exchange the two dentries
>> *
>> * Update the dcache to reflect the move of a file name. Negative
>> * dcache entries should not be moved in this way. Caller must hold
>> * rename_lock, the i_mutex of the source and target directories,
>> * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
>> */
>> -static void __d_move(struct dentry * dentry, struct dentry * target)
>> +static void __d_move(struct dentry *dentry, struct dentry *target,
>> + bool exchange)
>> {
>> if (!dentry->d_inode)
>> printk(KERN_WARNING "VFS: moving negative dcache entry\n");
>> @@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>>
>> /* Unhash the target: dput() will then get rid of it */
>> __d_drop(target);
>> + if (exchange) {
>> + __d_rehash(target,
>> + d_hash(dentry->d_parent, dentry->d_name.hash));
>> + }
>>
>> list_del(&dentry->d_u.d_child);
>> list_del(&target->d_u.d_child);
>> @@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>> write_seqcount_end(&dentry->d_seq);
>>
>> dentry_unlock_parents_for_move(dentry, target);
>> + if (exchange)
>> + fsnotify_d_move(target);
>> spin_unlock(&target->d_lock);
>> fsnotify_d_move(dentry);
>> spin_unlock(&dentry->d_lock);
>> @@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>> void d_move(struct dentry *dentry, struct dentry *target)
>> {
>> write_seqlock(&rename_lock);
>> - __d_move(dentry, target);
>> + __d_move(dentry, target, false);
>> write_sequnlock(&rename_lock);
>> }
>> EXPORT_SYMBOL(d_move);
>>
>> +/*
>> + * d_exchange - exchange two dentries
>> + * @dentry1: first dentry
>> + * @dentry2: second dentry
>> + */
>> +void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
>> +{
>> + write_seqlock(&rename_lock);
>> +
>> + WARN_ON(!dentry1->d_inode);
>> + WARN_ON(!dentry2->d_inode);
>> + WARN_ON(IS_ROOT(dentry1));
>> + WARN_ON(IS_ROOT(dentry2));
>> +
>> + __d_move(dentry1, dentry2, true);
>> +
>> + write_sequnlock(&rename_lock);
>> +}
>> +
>> +
>> /**
>> * d_ancestor - search for an ancestor
>> * @p1: ancestor dentry
>> @@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
>> m2 = &alias->d_parent->d_inode->i_mutex;
>> out_unalias:
>> if (likely(!d_mountpoint(alias))) {
>> - __d_move(alias, dentry);
>> + __d_move(alias, dentry, false);
>> ret = alias;
>> }
>> out_err:
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 7ec6a12..55700b3 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
>> * ->i_mutex on parents, which works but leads to some truly excessive
>> * locking].
>> */
>> -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>> - struct inode *new_dir, struct dentry *new_dentry)
>> +static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
>> + struct inode *new_dir, struct dentry *new_dentry,
>> + unsigned int flags)
>> {
>> int error;
>> const unsigned char *old_name;
>> struct inode *source = old_dentry->d_inode;
>> struct inode *target = new_dentry->d_inode;
>> bool is_dir = S_ISDIR(source->i_mode);
>> + bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
>> + bool overwrite = !(flags & RENAME_EXCHANGE);
>> + unsigned max_links = new_dir->i_sb->s_max_links;
>>
>> if (source == target)
>> return 0;
>> @@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>>
>> if (!target)
>> error = may_create(new_dir, new_dentry);
>> - else
>> + else if (overwrite)
>> error = may_delete(new_dir, new_dentry, is_dir);
>> + else
>> + error = may_delete(new_dir, new_dentry, new_is_dir);
>> if (error)
>> return error;
>>
>> - if (!old_dir->i_op->rename)
>> + if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
>> return -EPERM;
>>
>> + if (flags && !old_dir->i_op->rename2)
>> + return -EOPNOTSUPP;
>> +
>> /*
>> * If we are going to change the parent - check write permissions,
>> * we'll need to flip '..'.
>> */
>> - if (is_dir && new_dir != old_dir) {
>> - error = inode_permission(source, MAY_WRITE);
>> - if (error)
>> - return error;
>> + if (new_dir != old_dir) {
>> + if (is_dir) {
>> + error = inode_permission(source, MAY_WRITE);
>> + if (error)
>> + return error;
>> + }
>> + if (!overwrite && new_is_dir) {
>> + error = inode_permission(target, MAY_WRITE);
>> + if (error)
>> + return error;
>> + }
>> }
>>
>> error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
>> if (error)
>> return error;
>>
>> + if (!overwrite) {
>> + error = security_inode_rename(new_dir, new_dentry,
>> + old_dir, old_dentry);
>> + if (error)
>> + return error;
>> + }
>> +
>> old_name = fsnotify_oldname_init(old_dentry->d_name.name);
>> dget(new_dentry);
>> - if (target)
>> + if (overwrite && target)
>> mutex_lock(&target->i_mutex);
>>
>> error = -EBUSY;
>> if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
>> goto out;
>>
>> - if (is_dir) {
>> - unsigned max_links = new_dir->i_sb->s_max_links;
>> -
>> + if (max_links && new_dir != old_dir) {
>> error = -EMLINK;
>> - if (max_links && !target && new_dir != old_dir &&
>> - new_dir->i_nlink >= max_links)
>> + if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
>> goto out;
>> + if (!overwrite && !is_dir && new_is_dir &&
>> + old_dir->i_nlink > max_links)
>> + goto out;
>> + }
>> +
>> + if (overwrite && is_dir && target)
>> + shrink_dcache_parent(new_dentry);
>>
>> - if (target)
>> - shrink_dcache_parent(new_dentry);
>> + if (old_dir->i_op->rename2) {
>> + error = old_dir->i_op->rename2(old_dir, old_dentry,
>> + new_dir, new_dentry, flags);
>> + } else {
>> + error = old_dir->i_op->rename(old_dir, old_dentry,
>> + new_dir, new_dentry);
>> }
>>
>> - error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>> if (error)
>> goto out;
>>
>> - if (target) {
>> + if (overwrite && target) {
>> if (is_dir)
>> target->i_flags |= S_DEAD;
>> dont_mount(new_dentry);
>> }
>> - if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
>> - d_move(old_dentry, new_dentry);
>> + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
>> + if (overwrite)
>> + d_move(old_dentry, new_dentry);
>> + else
>> + d_exchange(old_dentry, new_dentry);
>> + }
>> out:
>> - if (target)
>> + if (overwrite && target)
>> mutex_unlock(&target->i_mutex);
>> dput(new_dentry);
>> - if (!error)
>> + if (!error) {
>> fsnotify_move(old_dir, new_dir, old_name, is_dir,
>> - target, old_dentry);
>> + overwrite ? target : NULL, old_dentry);
>> + if (!overwrite) {
>> + fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
>> + new_is_dir, NULL, new_dentry);
>> + }
>> + }
>> fsnotify_oldname_free(old_name);
>>
>> return error;
>> }
>>
>> -SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>> - int, newdfd, const char __user *, newname)
>> +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>> + struct inode *new_dir, struct dentry *new_dentry)
>> +{
>> + return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
>> +}
>> +
>> +
>> +SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
>> + int, newdfd, const char __user *, newname, unsigned int, flags)
>> {
>> struct dentry *old_dir, *new_dir;
>> struct dentry *old_dentry, *new_dentry;
>> @@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>> struct filename *to;
>> unsigned int lookup_flags = 0;
>> bool should_retry = false;
>> + bool overwrite = !(flags & RENAME_EXCHANGE);
>> int error;
>> +
>> + error = -EOPNOTSUPP;
>> + if (flags & ~RENAME_EXCHANGE)
>> + goto exit;
>> +
>> retry:
>> from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
>> if (IS_ERR(from)) {
>> @@ -4091,7 +4143,8 @@ retry:
>>
>> oldnd.flags &= ~LOOKUP_PARENT;
>> newnd.flags &= ~LOOKUP_PARENT;
>> - newnd.flags |= LOOKUP_RENAME_TARGET;
>> + if (overwrite)
>> + newnd.flags |= LOOKUP_RENAME_TARGET;
>>
>> trap = lock_rename(new_dir, old_dir);
>>
>> @@ -4108,7 +4161,7 @@ retry:
>> error = -ENOTDIR;
>> if (oldnd.last.name[oldnd.last.len])
>> goto exit4;
>> - if (newnd.last.name[newnd.last.len])
>> + if (overwrite && newnd.last.name[newnd.last.len])
>> goto exit4;
>> }
>> /* source should not be ancestor of target */
>> @@ -4119,8 +4172,19 @@ retry:
>> error = PTR_ERR(new_dentry);
>> if (IS_ERR(new_dentry))
>> goto exit4;
>> + if (!overwrite) {
>> + error = -ENOENT;
>> + if (!new_dentry->d_inode)
>> + goto exit4;
>> +
>> + if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
>> + error = -ENOTDIR;
>> + if (newnd.last.name[newnd.last.len])
>> + goto exit4;
>> + }
>> + }
>> /* target should not be an ancestor of source */
>> - error = -ENOTEMPTY;
>> + error = overwrite ? -ENOTEMPTY : -EINVAL;
>> if (new_dentry == trap)
>> goto exit5;
>>
>> @@ -4128,8 +4192,15 @@ retry:
>> &newnd.path, new_dentry);
>> if (error)
>> goto exit5;
>> - error = vfs_rename(old_dir->d_inode, old_dentry,
>> - new_dir->d_inode, new_dentry);
>> + if (!overwrite) {
>> + error = security_path_rename(&newnd.path, new_dentry,
>> + &oldnd.path, old_dentry);
>> + if (error)
>> + goto exit5;
>> + }
>> +
>> + error = vfs_rename2(old_dir->d_inode, old_dentry,
>> + new_dir->d_inode, new_dentry, flags);
>> exit5:
>> dput(new_dentry);
>> exit4:
>> @@ -4154,9 +4225,15 @@ exit:
>> return error;
>> }
>>
>> +SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>> + int, newdfd, const char __user *, newname)
>> +{
>> + return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
>> +}
>> +
>> SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
>> {
>> - return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
>> + return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
>> }
>>
>> int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>> index 59066e0..ce5ebed 100644
>> --- a/include/linux/dcache.h
>> +++ b/include/linux/dcache.h
>> @@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);
>>
>> /* used for rename() and baskets */
>> extern void d_move(struct dentry *, struct dentry *);
>> +extern void d_exchange(struct dentry *, struct dentry *);
>> extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
>>
>> /* appendix may either be NULL or be used for transname suffixes */
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 3f40547..71c6cf9 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1572,6 +1572,8 @@ struct inode_operations {
>> int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
>> int (*rename) (struct inode *, struct dentry *,
>> struct inode *, struct dentry *);
>> + int (*rename2) (struct inode *, struct dentry *,
>> + struct inode *, struct dentry *, unsigned int);
>> int (*setattr) (struct dentry *, struct iattr *);
>> int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
>> int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index 6c28b61..ebdafb6 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -35,6 +35,8 @@
>> #define SEEK_HOLE 4 /* seek to the next hole */
>> #define SEEK_MAX SEEK_HOLE
>>
>> +#define RENAME_EXCHANGE (1 << 0) /* Exchange source and dest */
>> +
>> struct fstrim_range {
>> __u64 start;
>> __u64 len;
>> --
>> 1.8.1.4
>>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2013-10-03 02:00:25

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] cross rename

On 10/01/2013 09:00 AM, Miklos Szeredi wrote:
> This series adds a new syscall, renameat2(), which is the same as renameat() but
> with a flags argument. Internally i_op->reaname2() is also added, which can
> later be merged with ->rename() but is kept separately for now, since this would
> just blow up this patch without helping review.
>
> The purpose of extending rename is to add cross-rename, a symmetric variant of
> rename, which exchanges the two files. This allows interesting things, which
> were not possible before, for example atomically replacing a directory tree with
> a symlink, etc...
>

I would suggest it shouldn't be renameat2() but rather renameat3(), i.e.
rename file A -> B, if B exists rename B to C. It may not be desirable
to expose the stale B in the same namespace as A, but still want it to
be possible to scavenge it. Obviously, A=C is a valid subcase.

-hpa

2013-10-03 05:34:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] cross rename

On Wed, Oct 2, 2013 at 6:58 PM, H. Peter Anvin <[email protected]> wrote:
>
> I would suggest it shouldn't be renameat2() but rather renameat3(), i.e.
> rename file A -> B, if B exists rename B to C. It may not be desirable
> to expose the stale B in the same namespace as A, but still want it to
> be possible to scavenge it. Obviously, A=C is a valid subcase.

I really *really* prefer to stay with two names. Miklos had an earlier
three-name version, and it was hugely more complex, and it does not
fit nearly as well in the model.

Two directory entries is also what the current rename() effectively
always does (clearing one, changing another). So doing the
cross-rename model is actually fairly close to a normal rename. A
three-way one is not actually at all similar.

So I was actually very relieved to see this much simpler and cleaner
model, because the alternative really was nasty. This one looks fairly
simple and clean and straightforward. The previous was none of that.

Linus

2013-10-03 05:37:36

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] cross rename

Yes... Al and I had a brief conversation about the complexities over IRC this evening.

Linus Torvalds <[email protected]> wrote:
>On Wed, Oct 2, 2013 at 6:58 PM, H. Peter Anvin <[email protected]> wrote:
>>
>> I would suggest it shouldn't be renameat2() but rather renameat3(),
>i.e.
>> rename file A -> B, if B exists rename B to C. It may not be
>desirable
>> to expose the stale B in the same namespace as A, but still want it
>to
>> be possible to scavenge it. Obviously, A=C is a valid subcase.
>
>I really *really* prefer to stay with two names. Miklos had an earlier
>three-name version, and it was hugely more complex, and it does not
>fit nearly as well in the model.
>
>Two directory entries is also what the current rename() effectively
>always does (clearing one, changing another). So doing the
>cross-rename model is actually fairly close to a normal rename. A
>three-way one is not actually at all similar.
>
>So I was actually very relieved to see this much simpler and cleaner
>model, because the alternative really was nasty. This one looks fairly
>simple and clean and straightforward. The previous was none of that.
>
> Linus

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2013-10-04 23:58:35

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] cross rename

On 10/01/2013 09:00 AM, Miklos Szeredi wrote:
> This series adds a new syscall, renameat2(), which is the same as renameat() but
> with a flags argument. Internally i_op->reaname2() is also added, which can
> later be merged with ->rename() but is kept separately for now, since this would
> just blow up this patch without helping review.

How hard would it be to also add RENAME_NOREPLACE that fails if the
destination already exists?

IMO this would get rid of the last sane use of hard links (link + unlink
to simulate non-clobbering rename), and it would be nice on filesystems
that don't have hard links.

Windows has supported this since Windows 98, IIRC (using MoveFileEx).

--Andy

2013-10-05 00:11:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] cross rename

On Fri, Oct 4, 2013 at 4:58 PM, Andy Lutomirski <[email protected]> wrote:
>
> How hard would it be to also add RENAME_NOREPLACE that fails if the
> destination already exists?

Trivial. And I agree that that should be a good flag to have.

Linus